
4 posts tagged with "scout"


Android Mobile Observability with OpenTelemetry

· 19 min read
Ranjan Sakalley
Founder & CPO at base14

A user opens a ticket: "the app froze when I tried to upload a photo." They were on the metro, on cellular, on a Samsung Galaxy A54 running Android 13. You're on a Pixel 8 on office Wi-Fi and the upload completes in 400 ms every time you try it. Crashlytics says "no crash logged." Play Console ANR rate looks normal. Was it the network? A frozen frame that swallowed the tap? A backend timeout? An OOM kill on a device with 4 GB of RAM and a busy launcher?

You can't tell. None of the tools you have were built to answer that question.

This is the gap OpenTelemetry fills on Android. Backend services have had distributed tracing for a decade. The mobile app, the thing the user actually touches, gets crash reports and a five-row Play Console dashboard. We've spent the last year helping teams close that gap with the OpenTelemetry Android Agent, and this post is a deep walkthrough of what it solves, how to wire it up, and how to ship the data to a collector you control.
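The post walks through the agent wiring in detail; as a taste of the "collector you control" piece, a minimal, illustrative collector config that accepts OTLP over HTTP from the app might look like the sketch below. The endpoints and exporter are placeholders, not defaults the agent ships with.

```yaml
# Illustrative sketch: a collector that receives OTLP/HTTP from the mobile app,
# batches the data, and forwards it to a backend you operate.
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318   # the app's OTLP exporter points here

processors:
  batch: {}                      # batch before export to cut request volume

exporters:
  otlphttp:
    endpoint: https://otel.example.internal   # placeholder backend, not a vendor SDK

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
```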

Stop Deploying Broken OTel Configs: Validate & Test Before You Ship

· 10 min read
Nitin Misra
Engineer at base14

OpenTelemetry Collector configurations are YAML files. There's no schema, no type system, and no IDE that will tell you that tail_smapling isn't a real processor. You find out when your pipeline goes dark and someone starts paging the on-call.

The collector ships with otelcol validate, which catches syntax errors and fails on unknown component types. That covers a slice of the problem. It won't tell you that your send_batch_max_size is smaller than your send_batch_size, that your memory limiter is effectively disabled, or that you've hardcoded an API key in plain text.
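As an example, a config like the sketch below (the hostname and key are made up) passes validation cleanly: every component type and field is legal, so `otelcol validate --config=collector.yaml` exits without complaint. What it cannot see is that a plaintext API key is checked into the repo and that the memory limit is set so far above the container's actual memory that the limiter will never engage.

```yaml
# Valid per `otelcol validate`, but still a config you don't want in production.
receivers:
  otlp:
    protocols:
      grpc: {}

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 64000             # far above the container's real memory: never fires
  batch: {}

exporters:
  otlphttp:
    endpoint: https://ingest.example.com
    headers:
      x-api-key: sk-live-0b1c2d3e4f   # plaintext secret committed alongside the config

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```

Catching that second class of problem is what the rest of the post is about.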

LLM Prompt Lifecycle: From Observability to Optimization

· 22 min read
Nitin Misra
Engineer at base14

Rachel, a Staff Engineer at a mid-size SaaS company, woke up to a Slack message from the support lead: "Why are half our billing tickets going to the technical team?" She checked the deployment log: nothing had shipped in a week. She checked the model configuration: same gpt-4o endpoint, same parameters, same code. No errors in the logs, no latency spikes, no alerts fired. But customer complaints about misrouted tickets had doubled in three weeks. Something was wrong.

This is prompt drift, a slow, invisible degradation in LLM output quality that no dashboard catches until a human notices the downstream effects. Rachel's triage prompt, which classifies support tickets and routes them to the right team, worked perfectly at launch. The team tested it carefully, tuned the wording, validated it against sample tickets, and shipped it with confidence. Three months later, it was failing, and nothing in the monitoring stack surfaced the problem until the support lead noticed a pattern in Slack complaints.