
Kubernetes Scheduling: Observing Silent Failures

· 10 min read
Irfan Shah
Founder & CTO at base14

A Pending Pod means Kubernetes has accepted your workload but can't run it. The classic culprits are insufficient capacity, overly restrictive placement constraints, unbound PVCs, autoscaler ceilings, and namespace quota exhaustion. Most teams discover this during an incident. You don't have to. Wire up the OTel Collector's k8s_cluster, kubeletstats, and k8sobjects receivers, alert on FailedScheduling events and on how long pods stay Pending, and you'll catch scheduling failures before your users do. This post covers the five root causes, a kubectl debugging workflow, and a complete OTel instrumentation setup with collector config, deployment topology, and alert conditions.
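The Pending-duration alert boils down to a threshold check. Here is a minimal Python sketch, assuming pod records have already been fetched from the Kubernetes API and flattened into plain dicts with hypothetical `name`, `phase`, and `created_at` keys. The function name and the 5-minute threshold are illustrative choices, not values from the post.

```python
from datetime import datetime, timedelta, timezone

# Illustrative threshold; tune it to how long your workloads normally take to start.
PENDING_THRESHOLD = timedelta(minutes=5)


def stuck_pending_pods(pods, now, threshold=PENDING_THRESHOLD):
    """Return names of pods that have been Pending longer than `threshold`.

    `pods` is a list of dicts with hypothetical keys:
      name       - pod name
      phase      - pod phase string, e.g. "Pending" or "Running"
      created_at - timezone-aware datetime of pod creation
    A pod is Pending from creation until it is scheduled and its containers
    start, so creation time serves as the start of the wait.
    """
    stuck = []
    for pod in pods:
        if pod["phase"] != "Pending":
            continue
        waited = now - pod["created_at"]
        if waited > threshold:
            stuck.append(pod["name"])
    return stuck


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
pods = [
    {"name": "api-1", "phase": "Running", "created_at": now - timedelta(hours=1)},
    {"name": "batch-7", "phase": "Pending", "created_at": now - timedelta(minutes=12)},
    {"name": "web-3", "phase": "Pending", "created_at": now - timedelta(seconds=30)},
]
print(stuck_pending_pods(pods, now))  # ['batch-7']
```

Note that the Pending phase also covers time spent pulling container images after scheduling, so a real alert would likely pair a duration check like this with FailedScheduling events to separate scheduling failures from slow pulls.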

Coding Agent Observability for Your Team

· 8 min read
Ranjan Sakalley
Founder & CPO at base14

Coding agents like Claude Code, OpenAI Codex CLI, and Google Gemini CLI now ship with native OpenTelemetry support. This means you can collect structured telemetry covering token usage, cost attribution, tool calls, sessions, and lines of code modified, the same way you would for any other production system.

This post covers what each agent emits, how to enable collection, and what we learned running Claude Code telemetry across a team.

Flutter Mobile Observability with OpenTelemetry

· 5 min read
Nimisha G J
Engineer at base14

Most teams have solid observability on their backend. Structured logs, distributed traces, SLOs, alerting. The mobile app, which is often the first thing a user touches, gets crash reports at best.

A user taps a button and nothing happens. Was it the network? A janky frame that swallowed the tap? A backend timeout? A state management bug? Without telemetry on the device, you are guessing.

This post explains a couple of approaches we have used to help our customers instrument their Flutter apps, and when to use each.

Zero-Code Instrumentation for Go with eBPF and OpenTelemetry

· 10 min read
Ranjan Sakalley
Founder & CPO at base14

Auto-instrumentation is well-established for Java, Python, and Node.js. Runtime agents hook into the interpreter or bytecode layer to inject tracing, metrics, and logging without requiring code changes. Go compiles to a static native binary, so JVM-style bytecode patching does not apply. But Go is not without options. Compile-time tools like Datadog's Orchestrion and Alibaba's opentelemetry-go-auto-instrumentation can inject tracing at build time, and eBPF provides a runtime alternative that requires no rebuild at all.

This post focuses on the eBPF approach. It attaches kernel-level probes to running Go binaries, extracting telemetry without modifying source code, recompiling, or restarting the process. OpenTelemetry now has two official projects built on this mechanism. We cover how it works, how to deploy it on Kubernetes, and where the practical limits are.

Production-Ready OpenTelemetry: Configure, Harden, and Debug Your Collector

· 11 min read
Ranjan Sakalley
Founder & CPO at base14

The OpenTelemetry Collector works out of the box with minimal configuration. You enable the OTLP receiver on port 4317, wire up an exporter, and telemetry flows. In development, this is sufficient. In production, it is not.

A minimal configuration has no memory limiter, leaves retry and queue settings at defaults that were never sized for your traffic, and has no self-monitoring wired into alerting. The collector will accept data until it runs out of memory, drop data when the sending queue fills up, and give you no signal that anything went wrong. These failures surface as gaps in your dashboards hours or days later, when the context to diagnose them is gone.

This post covers the practical steps to close that gap: hardening the collector's configuration, enabling its built-in diagnostic tools, and diagnosing the failure patterns that show up most often in production.
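To see why a full queue loses data without a visible error, here is a minimal Python model of a bounded sending queue. It illustrates the general pattern only and is not the Collector's actual exporter implementation; `BoundedSendQueue`, its `dropped` counter, and the capacity of 1,000 are hypothetical choices for illustration. The counter stands in for self-monitoring: unless something exports and alerts on it, the drops go unnoticed.

```python
from collections import deque


class BoundedSendQueue:
    """Toy model of an exporter sending queue with a fixed capacity.

    When the queue is full, new items are rejected and only a counter
    records the loss; the caller receives no exception.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()
        self.dropped = 0  # the only trace of data loss

    def offer(self, item):
        """Enqueue `item`; return False and count a drop if the queue is full."""
        if len(self.items) >= self.capacity:
            self.dropped += 1
            return False
        self.items.append(item)
        return True

    def drain(self, n):
        """Simulate the exporter sending up to `n` items to the backend."""
        sent = []
        while self.items and len(sent) < n:
            sent.append(self.items.popleft())
        return sent


# A burst of 1,500 batches arrives while the backend only accepts 200.
queue = BoundedSendQueue(capacity=1000)
for batch_id in range(1500):
    queue.offer(batch_id)
queue.drain(200)
print(len(queue.items), queue.dropped)  # 800 500
```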

GitHub Actions Observability with Scout

· 8 min read
Ranjan Sakalley
Founder & CPO at base14

CI/CD pipelines are critical infrastructure. Builds slow down over weeks, flaky tests waste developer time, and when a pipeline breaks, diagnosing the root cause means clicking through GitHub's UI one run at a time.

The Scout OpenTelemetry CI/CD Action solves this by exporting your GitHub Actions workflow runs as OpenTelemetry traces. Each workflow becomes a trace, each job becomes a child span, and each step becomes a span within its job. You get the same structured observability for your pipelines that you already have for your applications.
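The workflow-to-trace mapping is a simple tree. The following minimal Python sketch assumes a simplified, hypothetical dict shape for a workflow run; it is not the real GitHub API payload, and it omits the timing and attributes the Action actually records. The workflow becomes the root span, each job a child of the root, and each step a child of its job, with all spans sharing one trace ID.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span


def workflow_to_spans(workflow):
    """Flatten a workflow run into spans: one trace, jobs under the root, steps under jobs.

    `workflow` is a dict with hypothetical keys:
      name - workflow name
      jobs - list of {"name": str, "steps": [str, ...]}
    """
    trace_id = secrets.token_hex(16)  # 128-bit trace ID (32 hex chars)
    root = Span(workflow["name"], trace_id, secrets.token_hex(8), None)
    spans = [root]
    for job in workflow["jobs"]:
        job_span = Span(job["name"], trace_id, secrets.token_hex(8), root.span_id)
        spans.append(job_span)
        for step in job["steps"]:
            # Each step hangs off its job span, not the workflow root.
            spans.append(Span(step, trace_id, secrets.token_hex(8), job_span.span_id))
    return spans


run = {
    "name": "CI",
    "jobs": [
        {"name": "build", "steps": ["checkout", "compile"]},
        {"name": "test", "steps": ["checkout", "unit-tests"]},
    ],
}
spans = workflow_to_spans(run)
print(len(spans))  # 7: 1 workflow + 2 jobs + 4 steps
```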

The Multi-Cloud Design: Engineering Your Code for Portability

· 6 min read
Irfan Shah
Founder & CTO at base14

In our previous post on cloud-native foundations, we explored why running on one cloud isn't lock-in, but designing for one cloud is. Now let's look at how to implement that portability.

Portability is not the ability to run everywhere simultaneously; chasing that is often a path toward over-engineering. It is, more accurately, a function of reversibility: the technical confidence that if a migration becomes necessary, the system can support it. That quality does not come from a specific cloud provider but from the deliberate layering of code and environment. While many teams focus on the destination of their deployment, true portability is found in the methodology of the build.
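One common way to achieve that layering, shown here as a minimal sketch rather than the post's own method, is to make application code depend on a small provider-neutral interface and put each cloud behind an adapter. `BlobStore`, `InMemoryBlobStore`, and `archive_report` are hypothetical names. Swapping providers then means writing one new adapter, not rewriting business logic, which is the kind of reversibility described above.

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Provider-neutral interface the application depends on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore:
    """Stand-in adapter. A real deployment would add, for example, an
    S3- or GCS-backed class with the same two methods, chosen by config."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_report(store: BlobStore, report_id: str, body: str) -> str:
    """Business logic: knows nothing about which cloud holds the bytes."""
    key = f"reports/{report_id}.txt"
    store.put(key, body.encode("utf-8"))
    return key


store = InMemoryBlobStore()
key = archive_report(store, "2024-q1", "revenue up")
print(key, store.get(key))  # reports/2024-q1.txt b'revenue up'
```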

Live Metric Registry: Find and Understand Observability Metrics Across Your Stack

· 9 min read
Ranjan Sakalley
Founder & CPO at base14

Introducing Metric Registry: a live, searchable catalog of 3,700+ observability metrics (and growing rapidly), extracted directly from source repositories across the OpenTelemetry, Prometheus, and Kubernetes ecosystems, including cloud provider metrics. Metric Registry is open source and built to stay current automatically as projects evolve.

What you can do today with Metric Registry

Search across your entire observability stack. Find metrics by name, description, or component, whether you're looking for HTTP-related histograms or database connection metrics.

Understand what metrics actually exist. The registry covers 15 sources, including OpenTelemetry Collector receivers, Prometheus exporters (PostgreSQL, Redis, MySQL, MongoDB, Kafka), Kubernetes metrics (kube-state-metrics, cAdvisor), and LLM observability libraries.

See which metrics follow standards. Each metric shows whether it complies with OpenTelemetry Semantic Conventions, helping you understand what's standardized versus custom.

Trace back to the source. Every metric links to its origin: the repository, file path, and commit hash. When you need to understand a metric's exact definition, you can go straight to the source.

Trust the data. Metrics are extracted automatically from source code and official metadata files, and the registry refreshes nightly to stay current as projects evolve.
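Searching by name, description, or component amounts to matching a query against those three fields. The following is a minimal Python sketch using case-insensitive substring matching over a hypothetical list of records; the field names, the `search_metrics` function, and the sample entries are illustrative and do not reflect the registry's actual schema or search implementation.

```python
def search_metrics(metrics, query):
    """Return names of metrics whose name, description, or component contains `query`.

    Matching is case-insensitive substring matching; `metrics` is a list of
    dicts with hypothetical keys `name`, `description`, and `component`.
    """
    q = query.lower()
    return [
        m["name"]
        for m in metrics
        if any(q in m[key].lower() for key in ("name", "description", "component"))
    ]


# Illustrative sample records, not an export of the registry.
catalog = [
    {"name": "http.server.request.duration",
     "description": "Duration of HTTP server requests.",
     "component": "semantic-conventions"},
    {"name": "postgresql.backends",
     "description": "The number of backends.",
     "component": "postgresqlreceiver"},
    {"name": "kube_pod_status_phase",
     "description": "The pod's current phase.",
     "component": "kube-state-metrics"},
]
print(search_metrics(catalog, "http"))      # ['http.server.request.duration']
print(search_metrics(catalog, "postgres"))  # ['postgresql.backends']
```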

Can't find what you're looking for? Open an issue or, better yet, submit a PR to add new sources or improve existing extractors.

Sources already indexed

Category          | Sources
OpenTelemetry     | Collector Contrib, Semantic Conventions, Python, Java, JavaScript
Prometheus        | node_exporter, postgres_exporter, redis_exporter, mysql_exporter, mongodb_exporter, kafka_exporter
Kubernetes        | kube-state-metrics, cAdvisor
LLM Observability | OpenLLMetry, OpenLIT
CloudWatch        | RDS, ALB, DynamoDB, Lambda, EC2, S3, SQS, API Gateway