
Azure SQL Database Monitoring with OpenTelemetry - DTU, Connections & Deadlocks

Overview

This guide covers monitoring an Azure SQL Database (the managed PaaS) with the OpenTelemetry Collector's azure_monitor receiver. The collector polls Azure Monitor's REST API every 60 seconds for the metrics published by Microsoft.Sql/servers/databases, transforms them to OTel-style names, and ships them via OTLP/HTTP to base14 Scout.

The azure_monitor receiver does not connect to SQL directly. It queries Azure Monitor's metrics surface for any database Azure auto-publishes to — so the same pattern applies across all SKUs (DTU model, vCore, Serverless, Hyperscale), single databases and elastic pools, and to other Azure services like Cosmos DB, Storage, and Service Bus. The configuration shape below generalises to those.

Self-hosted SQL Server vs Azure SQL Database

If you run SQL Server yourself — on a VM, on-premises, or in a container — use the self-hosted SQL Server guide instead. That path uses the OTel sqlserverreceiver to scrape Dynamic Management Views (DMVs) directly, which works without an Azure subscription and surfaces internals (query plans, wait stats, buffer cache hit ratio) that Azure Monitor doesn't expose.

| Surface | Mechanism | Subscription | Internals |
|---|---|---|---|
| Azure SQL Database (PaaS) | azure_monitor receiver, this guide | Required | DTU, connections, storage |
| SQL Server (VM / on-prem / container) | sqlserverreceiver, other guide | Not required | DMV scrapes — wait stats, query plans, buffer pool |

Pick exactly one per database. Running both against the same workload produces double-counted dashboards because the metric names overlap with different dimensions.

What you'll monitor

Two Azure namespaces, scraped together on each poll:

  1. Microsoft.Sql/servers/databases — per-database metrics: DTU, connections, storage, deadlocks, replication lag, in-memory OLTP usage.
  2. Microsoft.Sql/servers/elasticPools — pool-level capacity and saturation. Omitted if no elastic pools exist on the target server.

The receiver emits one OTel metric per Azure aggregation; for example, dtu_consumption_percent becomes azure_dtu_consumption_percent_average, azure_dtu_consumption_percent_maximum, and azure_dtu_consumption_percent_minimum. Metrics with both Total and Count aggregations (e.g., connection_successful) emit two OTel series — pick whichever you prefer for dashboards; they carry the same information at the per-minute grain.

Database-level (Microsoft.Sql/servers/databases)

| Azure REST name | OTel emitted | Unit | What it tells you |
|---|---|---|---|
| cpu_percent | azure_cpu_percent_{average,maximum,minimum} | Percent | Database CPU usage. Page at sustained 80%+. |
| dtu_consumption_percent | azure_dtu_consumption_percent_* | Percent | Composite DTU saturation (DTU model only). |
| dtu_used / dtu_limit | azure_dtu_used_* / azure_dtu_limit_* | DTU | Absolute DTU consumption + tier ceiling. |
| log_write_percent | azure_log_write_percent_* | Percent | Write-log throughput saturation. |
| physical_data_read_percent | azure_physical_data_read_percent_* | Percent | Read-IO saturation (page reads from storage). |
| storage | azure_storage_{average,maximum,minimum} | Bytes | Allocated storage in bytes. |
| storage_percent | azure_storage_percent_* | Percent | % of maxSizeBytes used. Alert at 80%. |
| sessions_percent | azure_sessions_percent_* | Percent | Sessions vs. tier ceiling. |
| workers_percent | azure_workers_percent_* | Percent | Workers vs. tier ceiling. |
| connection_successful | azure_connection_successful_{count,total} | Count | Successful connections per minute. |
| connection_failed | azure_connection_failed_* | Count | Failed connections — only emitted when non-zero. |
| blocked_by_firewall | azure_blocked_by_firewall_* | Count | Connections rejected by server firewall rules. |
| deadlock | azure_deadlock_* | Count | Deadlock count — page on any non-zero. |
| availability | azure_availability_* | Percent | Database availability % (PT1H grain). |
| replication_lag_seconds | azure_replication_lag_seconds_* | Seconds | Geo-replication / active geo-replication lag. Premium / Business Critical / Hyperscale / geo-replicated only. |
| xtp_storage_percent | azure_xtp_storage_percent_* | Percent | In-memory OLTP storage. Premium / Business Critical only. |

Pool-level (Microsoft.Sql/servers/elasticPools)

| Azure REST name | OTel emitted | Unit | What it tells you |
|---|---|---|---|
| eDTU_limit, eDTU_used | azure_edtu_limit_*, azure_edtu_used_* | DTU | Pool capacity vs. used (DTU pools). |
| dtu_consumption_percent | azure_dtu_consumption_percent_* | Percent | Pool DTU saturation. |
| cpu_percent, log_write_percent, physical_data_read_percent | azure_cpu_percent_*, azure_log_write_percent_*, azure_physical_data_read_percent_* | Percent | Pool CPU + I/O saturation. |
| storage_used, storage_limit, storage_percent | azure_storage_used_*, azure_storage_limit_*, azure_storage_percent_* | Bytes / Percent | Pool storage capacity vs. used. |
| allocated_data_storage, allocated_data_storage_percent | azure_allocated_data_storage_*, azure_allocated_data_storage_percent_* | Bytes / Percent | Allocated-data view of pool storage. |
| sessions_percent, sessions_count, workers_percent | azure_sessions_percent_*, azure_sessions_count_*, azure_workers_percent_* | Percent / Count | Pool connection pressure. |
| xtp_storage_percent | azure_xtp_storage_percent_* | Percent | Pool-level in-memory OLTP storage. |

The connection_failed, blocked_by_firewall, and deadlock count metrics stay absent until real occurrences — Azure Monitor doesn't emit zero data points for them. The receiver also discovers the system master database alongside your application database and emits the same database-scope series for both; filter by cloud.resource_id (which encodes the full Azure resource ID for each emitted series) if you want to drop master in Scout.
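If you do want to drop master in the collector rather than in Scout, a filter processor sketch could look like the following — this assumes the emitted series carry the full ARM resource ID in cloud.resource_id as described above; check the attribute key your pipeline actually stamps before relying on it:

```yaml
processors:
  # Drop every metric series whose resource ID ends in /databases/master.
  # The OTTL match below is an illustrative assumption, not a shipped config.
  filter/drop_master:
    error_mode: ignore
    metrics:
      datapoint:
        - 'IsMatch(resource.attributes["cloud.resource_id"], ".*/databases/master$")'
```

Add filter/drop_master to the metrics pipeline's processors list ahead of batch so the series never reach the exporter.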

What Azure Monitor does NOT see

Wait stats, buffer pool hit ratio, query store, individual replica health on Business Critical / Hyperscale, and the deadlock graph XML are SQL Server internals — Azure Monitor doesn't expose them. Point the OTel sqlserverreceiver at the SQL endpoint to add that depth; see the self-hosted SQL Server guide. The two paths are complementary, not redundant — run both with distinct service.name values when you need both views.

Prerequisites

| Requirement | Minimum |
|---|---|
| An Azure SQL Database (any tier) | DTU, vCore, Serverless, Hyperscale |
| OTel Collector contrib | v0.148.0+ (snake_case YAML keys) |
| Microsoft.Sql provider | registered on the subscription |
| Service principal | Monitoring Reader on the SQL RG |
| base14 Scout | any tenant |

This guide is the SQL-DB-specific addition to a working OpenTelemetry Collector. For collector deployment and the Scout exporter pieces (which are the same for every Azure surface), see the shared collector deployment and Scout exporter guides.

Access setup

The azure_monitor receiver needs read-only access to Azure Monitor metrics on the resource groups containing your SQL servers. Grant Monitoring Reader to a service principal:

# Create the SP (once per tenant — reuse it for every Azure surface).
az ad sp create-for-rbac --name sp-otel-azure-monitor --skip-assignment

# Scope Monitoring Reader to each SQL resource group.
RG_ID=$(az group show --name <your-rg> --query id -o tsv)
az role assignment create \
  --assignee <appId from the create-for-rbac output> \
  --role "Monitoring Reader" \
  --scope "$RG_ID"

Capture appId, password, and tenant from the create output — they become AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID in the collector's environment. RBAC propagation on the legacy ARM /metrics endpoint is immediate; the data-plane batch API can lag 5-30 minutes (see Operations).

Inside Azure? If your collector runs in Azure (VM, Container Apps, AKS pod), prefer a User-assigned Managed Identity over a service principal. The azure_auth extension supports managed_identity: and workload_identity: modes; only the auth block changes.
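A minimal sketch of the managed-identity variant follows. The key names are assumed from the azure_auth extension's documented modes — verify them against the extension README for your collector version; the receiver configuration itself is unchanged:

```yaml
extensions:
  azure_auth:
    managed_identity:
      # Client ID of the user-assigned identity attached to the VM / pod.
      # For a system-assigned identity, this field may be omitted (assumption —
      # confirm against the azure_auth extension docs).
      client_id: ${env:AZURE_CLIENT_ID}
```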

Receiver configuration

This is the SQL-DB-specific addition to your collector. Add the azure_auth extension and azure_monitor receiver to your existing config, then wire the receiver into a metrics pipeline that exports to Scout (see Scout Exporter for the exporter half — it's the same OAuth2 + OTLP/HTTP setup used by every Azure surface).

otel-collector.yaml (excerpt)
extensions:
  azure_auth:
    service_principal:
      tenant_id: ${env:AZURE_TENANT_ID}
      client_id: ${env:AZURE_CLIENT_ID}
      client_secret: ${env:AZURE_CLIENT_SECRET}

receivers:
  azure_monitor:
    subscription_ids: ["${env:AZURE_SUBSCRIPTION_ID}"]
    resource_groups: ["${env:AZURE_RESOURCE_GROUP}"]
    services:
      - Microsoft.Sql/servers/databases
      - Microsoft.Sql/servers/elasticPools
    auth: { authenticator: azure_auth }
    collection_interval: 60s
    use_batch_api: false
    cache_resources: 60
    dimensions: { enabled: true }
    metrics:
      "Microsoft.Sql/servers/databases":
        cpu_percent: []
        dtu_consumption_percent: []
        dtu_used: []
        dtu_limit: []
        physical_data_read_percent: []
        log_write_percent: []
        storage: []
        storage_percent: []
        connection_successful: []
        connection_failed: []
        blocked_by_firewall: []
        deadlock: []
        sessions_percent: []
        workers_percent: []
        availability: []
        replication_lag_seconds: []
        xtp_storage_percent: []
      "Microsoft.Sql/servers/elasticPools":
        cpu_percent: []
        dtu_consumption_percent: []
        eDTU_limit: []
        eDTU_used: []
        physical_data_read_percent: []
        log_write_percent: []
        storage_used: []
        storage_limit: []
        storage_percent: []
        allocated_data_storage: []
        allocated_data_storage_percent: []
        sessions_percent: []
        sessions_count: []
        workers_percent: []
        xtp_storage_percent: []

processors:
  resource:
    attributes:
      - { key: cloud.provider, value: azure, action: insert }
      - { key: cloud.platform, value: azure_sql_database, action: insert }
      - { key: cloud.account.id, value: "${env:AZURE_SUBSCRIPTION_ID}", action: insert }
      - { key: cloud.region, value: "${env:AZURE_REGION}", action: insert }
      - { key: cloud.resource_id, value: "${env:AZURE_SQL_DATABASE_RESOURCE_ID}", action: insert }
      - { key: service.name, value: "${env:SERVICE_NAME}", action: insert }

service:
  extensions: [azure_auth] # plus your existing extensions (oauth2client, etc.)
  pipelines:
    metrics:
      receivers: [azure_monitor]
      processors: [resource, batch] # plus your existing processors
      exporters: [otlphttp/b14] # the Scout exporter from the shared setup

Once metrics: is set for a namespace, the receiver only emits the metrics you list — there is no implicit "default + my picks" merge. Extend the list to add more (e.g., tempdb_data_size: [] on Premium tier). The empty aggregation list [] per metric collects all aggregations Azure publishes for that metric.
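For example, to narrow a metric to specific aggregations instead of collecting all of them — the aggregation names here (Average, Maximum) are an assumption based on Azure Monitor's standard aggregation set; confirm against the receiver README:

```yaml
receivers:
  azure_monitor:
    metrics:
      "Microsoft.Sql/servers/databases":
        # Empty list = every aggregation Azure publishes;
        # a non-empty list restricts to just those named.
        dtu_consumption_percent: ["Average", "Maximum"]
        tempdb_data_size: []   # Premium-tier extra, all aggregations
```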

Environment variables

.env
# From `az ad sp create-for-rbac` output.
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=

# From your Azure subscription / resource group.
AZURE_SUBSCRIPTION_ID=
AZURE_RESOURCE_GROUP=
AZURE_REGION=
AZURE_SQL_DATABASE_RESOURCE_ID= # az sql db show -g <rg> -s <server> -n <db> --query id -o tsv

# Resource attribute defaults.
SERVICE_NAME=azure-sql-database

AZURE_SQL_DATABASE_RESOURCE_ID is the full ARM resource ID of the database whose metrics you want to label most explicitly with cloud.resource_id. The azure_monitor receiver discovers every database in the configured resource group; this attribute is for the dashboard / filter convention, not for scoping the scrape.

Key alerts to configure

Threshold guidance for the most operationally useful series. Tune to your workload; these are starting points for a Basic/Standard tier with real traffic.

| Metric (OTel name) | Warning | Critical | Why it matters |
|---|---|---|---|
| azure_dtu_consumption_percent_average | > 75% / 5m | > 90% / 5m | Sustained DTU saturation throttles queries; precedes connection failures. |
| azure_cpu_percent_average | > 80% / 5m | > 95% / 5m | CPU-bound workload; correlate with query plans before scaling tier. |
| azure_storage_percent_average | > 75% | > 90% | Approaching maxSizeBytes; out-of-space halts writes on Basic / Standard. |
| azure_sessions_percent_average | > 70% | > 90% | Connection-pool exhaustion; tier ceiling differs per SKU. |
| azure_workers_percent_average | > 70% | > 90% | Concurrent-request ceiling; tier-specific. |
| azure_log_write_percent_average | > 80% / 5m | > 95% / 5m | Write-throughput saturation; impacts replication lag. |
| azure_connection_failed_total | > 0 / 5m | sustained > 0 / 15m | Auth, firewall, or TLS issues; investigate immediately when sustained. |
| azure_blocked_by_firewall_total | > 0 / 5m | > 10 / 5m | Firewall blocking traffic; usually a misconfiguration. |
| azure_deadlock_total | > 0 | > 5 / 5m | Application-side concurrency bug; any non-zero deserves investigation. |
| azure_replication_lag_seconds_average | > 5s | > 30s | Geo-replication / read-scale-out drift; only relevant on Premium / BC / Hyperscale. |
| azure_availability_average | < 100% / 1h | < 99.9% / 1h | SLA-compliant availability; PT1H grain. |

For elastic pools, mirror azure_dtu_consumption_percent_*, azure_storage_percent_*, azure_sessions_percent_*, and azure_workers_percent_* against the pool resource using the same thresholds.
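If your alerting backend accepts Prometheus-style rule files (an assumption — Scout's native rule syntax may differ), the DTU row above translates to something like:

```yaml
groups:
  - name: azure-sql-database
    rules:
      - alert: AzureSqlDtuSaturationWarning
        # Warning row: DTU consumption averaging above 75% for 5 minutes.
        expr: avg_over_time(azure_dtu_consumption_percent_average[5m]) > 75
        for: 5m
        labels:
          severity: warning
      - alert: AzureSqlDtuSaturationCritical
        # Critical row: same series, 90% threshold.
        expr: avg_over_time(azure_dtu_consumption_percent_average[5m]) > 90
        for: 5m
        labels:
          severity: critical
```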

Operations

  • Collection interval. 60 seconds is the sweet spot — Azure Monitor's ingestion lag is 1-3 minutes, so faster polls just re-read stale data.
  • cache_resources. This is the receiver's resource-list cache TTL in seconds (default 24h). The shipped config sets it to 60 so newly-created databases are visible to the receiver on the next poll — appropriate for a validation pass or for environments where databases come and go frequently. In a stable production fleet, raise it back toward the default (e.g., 3600 or higher) to skip the per-minute ARM resource-list call.
  • RBAC propagation. The legacy ARM /metrics endpoint propagates Monitoring Reader immediately. The newer data-plane batch API at *.metrics.monitor.azure.com requires separate RBAC propagation that can lag 5-30 minutes after grant.
  • Switching to use_batch_api: true raises Azure Monitor's per-tenant query rate ceiling from 12,000 to 360,000 calls/hour. Worth it once you're scraping more than a handful of databases. Wait for data-plane RBAC to settle before enabling.
  • System master database. The receiver auto-discovers the system master database alongside your application databases and emits the same database-scope series for both. master is mostly noise — filter by cloud.resource_id in Scout if you want to ignore it.
  • Tier-gated metrics. A few names in the whitelist only emit on specific tiers: replication_lag_seconds needs geo-replication or active geo-replication; xtp_storage_percent needs Premium or Business Critical (in-memory OLTP isn't available below Premium); app_cpu_billed / app_cpu_percent only emit on Serverless databases. The receiver polls these names regardless and silently returns no series on tiers that don't support the underlying feature, so the same config works across DTU, vCore, Serverless, and Hyperscale fleets. Add tempdb_data_size, tempdb_log_size, tempdb_log_used_percent, sql_instance_cpu_percent, and sql_instance_memory_percent to the whitelist if you need the InstanceAndAppAdvanced category on Premium or Hyperscale.

Troubleshooting

AuthorizationFailed from the receiver

The role assignment hasn't propagated. Wait 60 seconds after creating it; on the legacy ARM endpoint propagation is usually immediate. If you've enabled use_batch_api: true, allow up to 30 minutes for data-plane propagation — or temporarily flip back to false to confirm the role itself is correct.

403 Forbidden from the receiver

The service principal client_secret has expired. Rotate with az ad sp credential reset --id $AZURE_CLIENT_ID --years 1 and update your collector's AZURE_CLIENT_SECRET env var.

No metrics in the first 3 minutes

Azure Monitor has a 1-3 minute ingestion lag. azure_storage and azure_dtu_limit emit on every database from the first poll. DTU, connection, and lock metrics only show non-zero values after real workload on the database — control-plane calls (az sql db show) don't drive them. If the database is idle, that's expected.
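To confirm that Azure Monitor itself has data points, independent of the collector, you can query the same API the receiver uses via the Azure CLI — a quick check, assuming the env vars from the .env above are loaded in your shell:

```shell
# Read recent DTU datapoints straight from Azure Monitor, bypassing the collector.
az monitor metrics list \
  --resource "$AZURE_SQL_DATABASE_RESOURCE_ID" \
  --metric dtu_consumption_percent \
  --interval PT1M \
  --output table
```

If this returns rows but the collector emits nothing, the problem is in the collector's auth or pipeline, not in Azure.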

connection_failed / blocked_by_firewall / deadlock are absent

Expected. Azure Monitor does not emit zero data points for these count-style metrics — they appear in your collector output only when real occurrences have happened in the polling window. Missing data implies zero, not a collection failure.

RequestThrottled warnings from the receiver

Azure Monitor's per-tenant query rate limit (12,000/hour on the legacy endpoint, 360,000/hour on the batch API). Either lower polling rate (collection_interval: 120s), narrow the scope (resource_groups: filter), or enable use_batch_api: true once data-plane RBAC has settled.

Collector container can't resolve login.microsoftonline.com

Docker Desktop networking glitch — the container's DNS resolver becomes unreachable. docker compose down && docker compose up -d typically fixes it. If persistent, restart Docker Desktop.

Scout OAuth2 returns 401

Verify the SCOUT_CLIENT_ID, SCOUT_CLIENT_SECRET, and SCOUT_TOKEN_URL your collector is using match the values in your Scout console. The endpoint_params.audience MUST be b14collector — that's what the Scout token endpoint expects.
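For reference, the oauth2client extension shape is sketched below — the values come from your Scout console, and the audience line is the part that most often goes wrong:

```yaml
extensions:
  oauth2client:
    client_id: ${env:SCOUT_CLIENT_ID}
    client_secret: ${env:SCOUT_CLIENT_SECRET}
    token_url: ${env:SCOUT_TOKEN_URL}
    endpoint_params:
      audience: b14collector   # Scout's token endpoint requires this exact value
```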

Frequently Asked Questions

How do I monitor Azure SQL Database with OpenTelemetry?

Run the OpenTelemetry Collector with the azure_monitor receiver targeting Microsoft.Sql/servers/databases (and optionally Microsoft.Sql/servers/elasticPools for pool-level metrics). The receiver polls Azure Monitor's REST API every 60 seconds, transforms metrics from Azure's lowercase names (like dtu_consumption_percent) to OTel-style names (azure_dtu_consumption_percent_average), and ships them via OTLP/HTTP to base14 Scout. Authentication uses the azure_auth extension in service-principal or managed-identity mode.

Should I use this guide or the self-hosted SQL Server guide?

Use this guide for Azure SQL Database (the managed PaaS). Use the self-hosted SQL Server guide if you run SQL Server yourself on a VM, on-premises, or in a container — that path uses sqlserverreceiver to scrape DMVs directly instead of polling Azure Monitor. The two are complementary, not redundant: azure_monitor reports Azure's external view (DTU billing, blocked-by-firewall, geo-replication lag, storage-vs-cap), while sqlserverreceiver reports SQL Server internals (wait stats, buffer pool, query store). Production deployments commonly run both with distinct service.name values to keep the two views separate in dashboards.

Why does connection_failed return no data points?

Azure Monitor does not emit zero-valued data points for connection_failed. The metric only surfaces in your collector logs when there are real connection failures (auth errors, firewall blocks, TLS handshake failures). This is consistent with how Azure handles several count-style metrics — missing data on a count metric implies zero occurrences, not a collection failure.

Why does the receiver emit both _count and _total suffixes for connection_successful?

Azure Monitor publishes connection_successful with two supported aggregations: Total (sum) and Count. The receiver emits one OTel metric per published aggregation, producing azure_connection_successful_total and azure_connection_successful_count for the same source metric. Same applies to connection_failed, blocked_by_firewall, and deadlock. Pick whichever aggregation you prefer for dashboards; they carry the same information at the per-minute grain.

Which metrics need higher tiers to emit non-zero values?

replication_lag_seconds requires geo-replication or active geo-replication to be configured (Premium / Business Critical / Hyperscale, or any tier with a configured geo-secondary). xtp_storage_percent requires Premium or Business Critical (in-memory OLTP is not available below Premium). app_cpu_billed and app_cpu_percent only emit on Serverless databases. The receiver always polls these names; they simply return no series on tiers that don't support the underlying feature.

Can I monitor elastic pools alongside individual databases?

Yes — the shipped config covers both Microsoft.Sql/servers/databases and Microsoft.Sql/servers/elasticPools. The receiver returns no series for the elastic-pool namespace if the target server has no pools, so the same config is safe to run against servers that don't use pools.

How does this differ from Application Insights for Azure SQL Database?

Application Insights for Azure SQL is Azure-tenant-bound, billed per-GB ingested, and visualised in Azure dashboards or workbooks. The OpenTelemetry Collector is vendor-neutral — the same image ships to base14 Scout or any OTLP-compatible backend without redeployment. The metric coverage is identical — both surfaces draw from the same Azure Monitor REST API.

  • Self-hosted SQL Server — paired guide for SQL Server you run yourself (VM, on-prem, container). Uses sqlserverreceiver to scrape DMVs directly.
  • Azure Cosmos DB — sister guide; same azure_monitor pattern, NoSQL surface.
  • Azure Kubernetes Service — sister guide; same receiver pattern but adds in-cluster collectors for kubeletstats + cluster-state.