11 posts tagged with "opentelemetry"

Android Mobile Observability with OpenTelemetry

April 29, 2026 · 19 min read

Founder & CPO at base14

A user opens a ticket: "the app froze when I tried to upload a photo." They were on the metro, on cellular, on a Samsung Galaxy A54 running Android 13. You're on a Pixel 8 on office Wi-Fi and the upload completes in 400 ms every time you try it. Crashlytics says "no crash logged." Play Console ANR rate looks normal. Was it the network? A frozen frame that swallowed the tap? A backend timeout? An OOM kill on a device with 4 GB of RAM and a busy launcher?

You can't tell. None of the tools you have were built to answer that question.

This is the gap OpenTelemetry fills on Android. Backend services have had distributed tracing for a decade. The mobile app, the thing the user actually touches, gets crash reports and a five-row Play Console dashboard. We've spent the last year helping teams close that gap with the OpenTelemetry Android Agent, and this post is a deep walkthrough of what it solves, how to wire it up, and how to ship the data to a collector you control.

Stop Deploying Broken OTel Configs: Validate & Test Before You Ship

April 8, 2026 · 12 min read

Nitin Misra

Engineer at base14

OpenTelemetry Collector configurations are YAML files. There's no schema, no type system, and no IDE that will tell you that tail_smapling isn't a real processor. You find out when your pipeline goes dark and someone starts paging the on-call.

The collector ships with otelcol validate, which catches syntax errors and fails on unknown component types. That covers a slice of the problem. It won't tell you that your send_batch_max_size is smaller than your send_batch_size, that your memory limiter is effectively disabled, or that you've hardcoded an API key in plain text.
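The three gaps named above can be sketched as checks over an already-parsed config. This is a minimal illustration, not the tooling the post describes: `lint_collector_config`, the key-name hints used to spot secrets, and the rule that a memory limiter with no limit counts as "effectively disabled" are all assumptions made for the example.

```python
import re

# Values like ${env:API_KEY} are treated as references, not literal secrets.
ENV_REF = re.compile(r"\$\{[^}]+\}")
# Hypothetical key-name hints; a real linter would need a broader list.
SECRET_KEY_HINTS = ("api_key", "api-key", "apikey", "authorization", "token")


def _walk(node, path=""):
    """Yield (dotted_path, leaf_value) pairs for a nested dict/list config."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, f"{path}.{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _walk(value, f"{path}[{index}]")
    else:
        yield path, node


def lint_collector_config(config):
    """Return a list of human-readable findings for a parsed collector config."""
    findings = []
    for name, settings in (config.get("processors") or {}).items():
        settings = settings or {}
        kind = name.split("/", 1)[0]  # "batch/traces" -> "batch"
        if kind == "batch":
            size = settings.get("send_batch_size", 8192)
            max_size = settings.get("send_batch_max_size", 0)  # 0 = no upper limit
            if max_size and max_size < size:
                findings.append(f"{name}: send_batch_max_size {max_size} < send_batch_size {size}")
        if kind == "memory_limiter":
            # Assumption: no limit_mib and no limit_percentage means no real limit.
            if not settings.get("limit_mib") and not settings.get("limit_percentage"):
                findings.append(f"{name}: no memory limit configured")
    for path, value in _walk(config):
        last_key = path.rsplit(".", 1)[-1].lower()
        looks_secret = any(hint in last_key for hint in SECRET_KEY_HINTS)
        if looks_secret and isinstance(value, str) and value and not ENV_REF.search(value):
            findings.append(f"{path}: value looks like a hardcoded secret")
    return findings


bad_config = {
    "processors": {
        "batch": {"send_batch_size": 1000, "send_batch_max_size": 500},
        "memory_limiter": {"check_interval": "1s"},
    },
    "exporters": {"otlphttp": {"endpoint": "https://example.invalid", "headers": {"x-api-key": "abc123"}}},
}
good_config = {
    "processors": {
        "batch": {"send_batch_size": 1000, "send_batch_max_size": 2000},
        "memory_limiter": {"check_interval": "1s", "limit_mib": 512},
    },
    "exporters": {"otlphttp": {"headers": {"x-api-key": "${env:API_KEY}"}}},
}
```

Running `lint_collector_config(bad_config)` returns one finding per problem, and the same call on `good_config` returns an empty list. The input is a plain dictionary so the sketch stays independent of any particular YAML parser.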

Instrumenting Google Apps Script with OpenTelemetry

March 23, 2026 · 8 min read

Nimisha G J

Engineer at base14

Google Apps Script powers a surprising amount of business infrastructure. Approval workflows, hiring pipelines, invoice generators, and CRM integrations all run as serverless functions triggered by form submissions, chat messages, or time-based triggers. When something breaks, you get a stack trace in the Apps Script logs and nothing else. No traces, no metrics, no correlation between the email that failed and the spreadsheet write that succeeded two seconds earlier. You're debugging with Logger.log and guesswork.

We run a hiring automation bot on Apps Script that touches Gmail, Sheets, Drive, Calendar, GitHub, and Google Chat in a single command invocation. When a candidate's assignment email silently failed to send, we had no way to tell whether the issue was the template fetch, the Gmail API, or the spreadsheet update that stores the thread ID. The execution log just said "success." This is the story of how we instrumented it with OpenTelemetry.

LLM Prompt Lifecycle: From Observability to Optimization

March 18, 2026 · 23 min read

Nitin Misra

Engineer at base14

Rachel, a Staff Engineer at a mid-size SaaS company, woke up to a Slack message from the support lead: "Why are half our billing tickets going to the technical team?" She checked the deployment log: nothing had shipped in a week. She checked the model configuration: same gpt-4o endpoint, same parameters, same code. No errors in the logs, no latency spikes, no alerts fired. But customer complaints about misrouted tickets had doubled in three weeks. Something was wrong.

This is prompt drift, a slow, invisible degradation in LLM output quality that no dashboard catches until a human notices the downstream effects. Rachel's triage prompt, which classifies support tickets and routes them to the right team, worked perfectly at launch. The team tested it carefully, tuned the wording, validated it against sample tickets, and shipped it with confidence. Three months later, it was failing, and nothing in the monitoring stack surfaced the problem until the support lead noticed a pattern in Slack complaints.

Kubernetes Scheduling: Observing Silent Failures

March 9, 2026 · 12 min read

Irfan Shah

Founder & CTO at base14

A Pending Pod means Kubernetes has accepted your workload but can't run it. The classic culprits are insufficient capacity, overly restrictive placement constraints, unbound PVCs, autoscaler ceilings, and namespace quota exhaustion. Most teams discover this during an incident. You don't have to. Wire up the OTel Collector's k8s_cluster, kubeletstats, and k8sobjects receivers, alert on FailedScheduling events and Pending pod duration, and you'll catch scheduling failures before your users do. This post covers the five root causes, a kubectl debugging workflow, and a complete OTel instrumentation setup with collector config, deployment topology, and alert conditions.
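The "Pending pod duration" alert condition can be illustrated with a small sketch. The record shape (`name`, `phase`, `created_at`), the function name, and the five-minute threshold are hypothetical. In practice the phase and timestamps would come from whatever backend stores the collector's output, and the condition would more likely be written in that backend's query language.

```python
from datetime import datetime, timedelta, timezone


def pods_pending_too_long(pods, now, threshold=timedelta(minutes=5)):
    """Return (name, pending_duration) for Pending pods at or over the threshold.

    Results are sorted with the longest-pending pod first.
    """
    stuck = []
    for pod in pods:
        if pod["phase"] != "Pending":
            continue  # Running, Succeeded, etc. are not scheduling problems
        pending_for = now - pod["created_at"]
        if pending_for >= threshold:
            stuck.append((pod["name"], pending_for))
    return sorted(stuck, key=lambda item: item[1], reverse=True)


# Hypothetical snapshot of pod state at a single point in time.
now = datetime(2026, 3, 9, 12, 0, tzinfo=timezone.utc)
pods = [
    {"name": "api-7f9c", "phase": "Pending", "created_at": now - timedelta(minutes=10)},
    {"name": "web-5d2a", "phase": "Running", "created_at": now - timedelta(hours=2)},
    {"name": "job-91bc", "phase": "Pending", "created_at": now - timedelta(minutes=2)},
    {"name": "db-0e44", "phase": "Pending", "created_at": now - timedelta(minutes=30)},
]
stuck_pods = pods_pending_too_long(pods, now)
```

A duration threshold is used here rather than a simple count of Pending pods because a pod that is briefly Pending during a normal rollout should not page anyone.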

Coding Agent Observability - Monitor AI Coding Assistants

March 6, 2026 · 10 min read

Ranjan Sakalley

Founder & CPO at base14

Coding agents like Claude Code, OpenAI Codex CLI, and Google Gemini CLI now ship with native OpenTelemetry support. This means you can collect structured telemetry covering token usage, cost attribution, tool calls, sessions, and lines of code modified, the same way you instrument any other production system.

This post covers what each agent emits, how to enable collection, and what we learned running Claude Code telemetry across a team.

Flutter Mobile Observability with OpenTelemetry

March 4, 2026 · 5 min read

Nimisha G J

Engineer at base14

Most teams have solid observability on their backend. Structured logs, distributed traces, SLOs, alerting. The mobile app, which is often the first thing a user touches, gets crash reports at best.

A user taps a button and nothing happens. Was it the network? A janky frame that swallowed the tap? A backend timeout? A state management bug? Without telemetry on the device, you are guessing.

This post explains a couple of approaches we have used to help our customers instrument their Flutter apps and when to use each approach.

Zero-Code Instrumentation for Go with eBPF and OpenTelemetry

February 17, 2026 · 12 min read

Ranjan Sakalley

Founder & CPO at base14

Auto-instrumentation is well-established for Java, Python, and Node.js. Runtime agents hook into the interpreter or bytecode layer to inject tracing, metrics, and logging without requiring code changes. Go compiles to a static native binary, so JVM-style bytecode patching does not apply. But Go is not without options. Compile-time tools like Datadog's Orchestrion and Alibaba's opentelemetry-go-auto-instrumentation can inject tracing at build time, and eBPF provides a runtime alternative that requires no rebuild at all.

This post focuses on the eBPF approach. It attaches kernel-level probes to running Go binaries, extracting telemetry without modifying source code, recompiling, or restarting the process. OpenTelemetry now has two official projects built on this mechanism. We cover how it works, how to deploy it on Kubernetes, and where the practical limits are.

Production-Ready OpenTelemetry: Configure, Harden, and Debug Your Collector

February 13, 2026 · 12 min read

Ranjan Sakalley

Founder & CPO at base14

The OpenTelemetry Collector works out of the box with minimal configuration. You point a receiver at port 4317, wire up an exporter, and telemetry flows. In development, this is sufficient. In production, it is not.

Default settings ship without memory limits, without retry logic, without queue sizing, and without any self-monitoring. The collector will accept data until it runs out of memory, drop data silently when the queue fills up, and give you no signal that anything went wrong. These failures surface as gaps in your dashboards hours or days later, when the context to diagnose them is gone.

This post covers the practical steps to close that gap: hardening the collector's configuration, enabling its built-in diagnostic tools, and diagnosing the failure patterns that show up most often in production.

GitHub Actions Observability with Scout

February 12, 2026 · 8 min read

Ranjan Sakalley

Founder & CPO at base14

CI/CD pipelines are critical infrastructure. Builds slow down over weeks, flaky tests waste developer time, and when a pipeline breaks, diagnosing the root cause means clicking through GitHub's UI one run at a time.

The Scout OpenTelemetry CI/CD Action solves this by exporting your GitHub Actions workflow runs as OpenTelemetry traces. Each workflow becomes a trace, each job becomes a child span, and each step becomes a span within its job. You get the same structured observability for your pipelines that you already have for your applications.
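The workflow-to-trace mapping described above can be sketched with plain data classes. This is not the action's implementation: the `Span` class, the `workflow_to_spans` function, and the shape of the `run` dictionary are hypothetical, and timing, status, and attributes are left out to keep the parent-child structure visible.

```python
from __future__ import annotations

import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]


def _new_span_id() -> str:
    # 8 random bytes rendered as 16 hex characters.
    return secrets.token_hex(8)


def workflow_to_spans(run: dict) -> list[Span]:
    """Map one workflow run to a flat list of spans sharing a single trace ID."""
    trace_id = secrets.token_hex(16)  # 16 random bytes, 32 hex characters
    root = Span(run["name"], trace_id, _new_span_id(), None)
    spans = [root]
    for job in run["jobs"]:
        # Each job is a child of the workflow's root span.
        job_span = Span(job["name"], trace_id, _new_span_id(), root.span_id)
        spans.append(job_span)
        for step_name in job["steps"]:
            # Each step is a child of the job it ran in.
            spans.append(Span(step_name, trace_id, _new_span_id(), job_span.span_id))
    return spans


# Hypothetical workflow run with two jobs.
run = {
    "name": "ci",
    "jobs": [
        {"name": "build", "steps": ["checkout", "compile"]},
        {"name": "test", "steps": ["checkout", "unit-tests", "upload-report"]},
    ],
}
spans = workflow_to_spans(run)
```

For this run the sketch yields eight spans: one root, two job spans, and five step spans, all carrying the same trace ID.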