Azure SQL Database Monitoring with OpenTelemetry - DTU, Connections & Deadlocks
Overview
This guide covers monitoring an Azure SQL Database (the managed PaaS) with
the OpenTelemetry Collector's azure_monitor receiver. The collector polls
Azure Monitor's REST API every 60 seconds for the metrics published by
Microsoft.Sql/servers/databases, transforms them to OTel-style names, and
ships them via OTLP/HTTP to base14 Scout.
The azure_monitor receiver does not connect to SQL directly. It queries
Azure Monitor's metrics surface for any database Azure auto-publishes to — so
the same pattern applies across all SKUs (DTU model, vCore, Serverless,
Hyperscale), single databases and elastic pools, and to other Azure
services like Cosmos DB, Storage, and Service Bus. The configuration shape
below generalises to those.
Self-hosted SQL Server vs Azure SQL Database
If you run SQL Server yourself — on a VM, on-premises, or in a container —
use the self-hosted SQL Server guide instead.
That path uses the OTel sqlserverreceiver to scrape Dynamic Management
Views (DMVs) directly, which works without an Azure subscription and surfaces
internals (query plans, wait stats, buffer cache hit ratio) that Azure
Monitor doesn't expose.
| Surface | Mechanism | Subscription | What it surfaces |
|---|---|---|---|
| Azure SQL Database (PaaS) | azure_monitor receiver, this guide | Required | DTU, connections, storage |
| SQL Server (VM / on-prem / container) | sqlserverreceiver, other guide | Not required | DMV scrapes — wait stats, query plans, buffer pool |
Pick exactly one per database as the primary source. Running both against the same workload under a single service.name produces double-counted dashboards, because the metric names overlap with different dimensions; if you need both views, split them by service.name as described below.
What you'll monitor
Two Azure namespaces, scraped together on each poll:
- Microsoft.Sql/servers/databases: per-database metrics such as DTU, connections, storage, deadlocks, replication lag, and in-memory OLTP usage.
- Microsoft.Sql/servers/elasticPools: pool-level capacity and saturation. Omitted if no elastic pools exist on the target server.
The receiver emits one OTel metric per Azure aggregation; for example,
dtu_consumption_percent becomes azure_dtu_consumption_percent_average,
azure_dtu_consumption_percent_maximum, and azure_dtu_consumption_percent_minimum.
Metrics with both Total and Count aggregations (e.g., connection_successful)
emit two OTel series — pick whichever you prefer for dashboards; they carry
the same information at the per-minute grain.
Database-level (Microsoft.Sql/servers/databases)
| Azure REST name | OTel emitted | Unit | What it tells you |
|---|---|---|---|
| cpu_percent | azure_cpu_percent_{average,maximum,minimum} | Percent | Database CPU usage. Page at sustained 80%+. |
| dtu_consumption_percent | azure_dtu_consumption_percent_* | Percent | Composite DTU saturation (DTU model only). |
| dtu_used / dtu_limit | azure_dtu_used_* / azure_dtu_limit_* | DTU | Absolute DTU consumption + tier ceiling. |
| log_write_percent | azure_log_write_percent_* | Percent | Write-log throughput saturation. |
| physical_data_read_percent | azure_physical_data_read_percent_* | Percent | Read-IO saturation (page reads from storage). |
| storage | azure_storage_{average,maximum,minimum} | Bytes | Allocated storage in bytes. |
| storage_percent | azure_storage_percent_* | Percent | % of maxSizeBytes used. Alert at 80%. |
| sessions_percent | azure_sessions_percent_* | Percent | Sessions vs. tier ceiling. |
| workers_percent | azure_workers_percent_* | Percent | Workers vs. tier ceiling. |
| connection_successful | azure_connection_successful_{count,total} | Count | Successful connections per minute. |
| connection_failed | azure_connection_failed_* | Count | Failed connections; only emitted when non-zero. |
| blocked_by_firewall | azure_blocked_by_firewall_* | Count | Connections rejected by server firewall rules. |
| deadlock | azure_deadlock_* | Count | Deadlock count; page on any non-zero. |
| availability | azure_availability_* | Percent | Database availability % (PT1H grain). |
| replication_lag_seconds | azure_replication_lag_seconds_* | Seconds | Geo-replication / active geo-replication lag. Premium / Business Critical / Hyperscale / geo-replicated only. |
| xtp_storage_percent | azure_xtp_storage_percent_* | Percent | In-memory OLTP storage. Premium / Business Critical only. |
Pool-level (Microsoft.Sql/servers/elasticPools)
| Azure REST name | OTel emitted | Unit | What it tells you |
|---|---|---|---|
| eDTU_limit, eDTU_used | azure_edtu_limit_*, azure_edtu_used_* | DTU | Pool capacity vs. used (DTU pools). |
| dtu_consumption_percent | azure_dtu_consumption_percent_* | Percent | Pool DTU saturation. |
| cpu_percent, log_write_percent, physical_data_read_percent | azure_cpu_percent_*, azure_log_write_percent_*, azure_physical_data_read_percent_* | Percent | Pool CPU + I/O saturation. |
| storage_used, storage_limit, storage_percent | azure_storage_used_*, azure_storage_limit_*, azure_storage_percent_* | Bytes / Percent | Pool storage capacity vs. used. |
| allocated_data_storage, allocated_data_storage_percent | azure_allocated_data_storage_* | Bytes / Percent | Allocated-data view of pool storage. |
| sessions_percent, sessions_count, workers_percent | azure_sessions_percent_*, azure_sessions_count_*, azure_workers_percent_* | Percent / Count | Pool connection pressure. |
| xtp_storage_percent | azure_xtp_storage_percent_* | Percent | Pool-level in-memory OLTP storage. |
The connection_failed, blocked_by_firewall, and deadlock count metrics
stay absent until real occurrences happen; Azure Monitor doesn't emit zero
data points for them. The receiver also discovers the system master database
alongside your application database and emits the same database-scope series
for both; filter by cloud.resource_id (which encodes the full Azure
resource ID for each emitted series) if you want to drop master in Scout.
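If you would rather drop master at the collector instead of filtering in Scout, the filter processor can match on that attribute. A minimal sketch, assuming your resource IDs end in /databases/master and that cloud.resource_id is present on each series:

```yaml
processors:
  filter/drop_master:
    error_mode: ignore
    metrics:
      datapoint:
        # Drop every data point whose resource ID ends in /databases/master.
        - IsMatch(resource.attributes["cloud.resource_id"], ".*/databases/master$")
```

Add filter/drop_master to the metrics pipeline's processor list ahead of batch.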
What Azure Monitor does NOT see
Wait stats, buffer pool hit ratio, query store, individual replica health
on Business Critical / Hyperscale, and the deadlock graph XML are SQL
Server internals — Azure Monitor doesn't expose them. Point the OTel
sqlserverreceiver at the SQL endpoint to add that depth; see the
self-hosted SQL Server guide. The two paths
are complementary, not redundant — run both with distinct service.name
values when you need both views.
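A minimal sketch of that dual-pipeline shape, assuming the sqlserver receiver from the other guide is already configured and that the service names here are your own conventions:

```yaml
processors:
  resource/paas:
    attributes:
      - { key: service.name, value: azure-sql-database, action: upsert }
  resource/dmv:
    attributes:
      - { key: service.name, value: sqlserver-dmv, action: upsert }

service:
  pipelines:
    metrics/paas:   # Azure's external view
      receivers: [azure_monitor]
      processors: [resource/paas, batch]
      exporters: [otlphttp/b14]
    metrics/dmv:    # SQL Server internals via DMVs
      receivers: [sqlserver]
      processors: [resource/dmv, batch]
      exporters: [otlphttp/b14]
```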
Prerequisites
| Requirement | Minimum |
|---|---|
| An Azure SQL Database (any tier) | DTU, vCore, Serverless, Hyperscale |
| OTel Collector contrib | v0.148.0+ (snake_case YAML keys) |
| Microsoft.Sql provider | registered on the subscription |
| Service principal | Monitoring Reader on the SQL RG |
| base14 Scout | any tenant |
This guide is the SQL-DB-specific addition to a working OpenTelemetry Collector. For collector deployment + the Scout exporter pieces (which are the same for every Azure surface), see:
- Docker Compose Setup, or Kubernetes Helm Setup / Linux Setup for other runtimes.
- Scout Exporter for the OAuth2 + OTLP/HTTP exporter config.
Access setup
The azure_monitor receiver needs read-only access to Azure Monitor metrics
on the resource groups containing your SQL servers. Grant Monitoring Reader
to a service principal:
```bash
# Create the SP (once per tenant; reuse it for every Azure surface).
az ad sp create-for-rbac --name sp-otel-azure-monitor --skip-assignment

# Scope Monitoring Reader to each SQL resource group.
RG_ID=$(az group show --name <your-rg> --query id -o tsv)
az role assignment create \
  --assignee <appId from the create-for-rbac output> \
  --role "Monitoring Reader" \
  --scope "$RG_ID"
```
Capture appId, password, and tenant from the create output — they
become AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, and AZURE_TENANT_ID in
the collector's environment. RBAC propagation on the legacy ARM /metrics
endpoint is immediate; the data-plane batch API can lag 5-30 minutes (see
Operations).
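To confirm the grant landed before starting the collector, list the assignment with the standard az CLI (the appId placeholder is the one captured above):

```bash
# Expect "Monitoring Reader" in the output.
az role assignment list \
  --assignee <appId> \
  --scope "$RG_ID" \
  --query "[].roleDefinitionName" -o tsv
```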
Inside Azure? If your collector runs in Azure (VM, Container Apps, AKS pod), prefer a user-assigned managed identity over a service principal. The azure_auth extension supports managed_identity: and workload_identity: modes; only the auth block changes.
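A minimal sketch of the managed-identity variant; the field names are assumed from the extension's documentation, so verify against your collector version:

```yaml
extensions:
  azure_auth:
    managed_identity:
      # Client ID of the user-assigned identity attached to the VM / pod.
      client_id: ${env:AZURE_CLIENT_ID}
```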
Receiver configuration
This is the SQL-DB-specific addition to your collector. Add the
azure_auth extension and azure_monitor receiver to your existing
config, then wire the receiver into a metrics pipeline that exports to
Scout (see Scout Exporter for the
exporter half — it's the same OAuth2 + OTLP/HTTP setup used by every Azure
surface).
```yaml
extensions:
  azure_auth:
    service_principal:
      tenant_id: ${env:AZURE_TENANT_ID}
      client_id: ${env:AZURE_CLIENT_ID}
      client_secret: ${env:AZURE_CLIENT_SECRET}

receivers:
  azure_monitor:
    subscription_ids: ["${env:AZURE_SUBSCRIPTION_ID}"]
    resource_groups: ["${env:AZURE_RESOURCE_GROUP}"]
    services:
      - Microsoft.Sql/servers/databases
      - Microsoft.Sql/servers/elasticPools
    auth: { authenticator: azure_auth }
    collection_interval: 60s
    use_batch_api: false
    cache_resources: 60
    dimensions: { enabled: true }
    metrics:
      "Microsoft.Sql/servers/databases":
        cpu_percent: []
        dtu_consumption_percent: []
        dtu_used: []
        dtu_limit: []
        physical_data_read_percent: []
        log_write_percent: []
        storage: []
        storage_percent: []
        connection_successful: []
        connection_failed: []
        blocked_by_firewall: []
        deadlock: []
        sessions_percent: []
        workers_percent: []
        availability: []
        replication_lag_seconds: []
        xtp_storage_percent: []
      "Microsoft.Sql/servers/elasticPools":
        cpu_percent: []
        dtu_consumption_percent: []
        eDTU_limit: []
        eDTU_used: []
        physical_data_read_percent: []
        log_write_percent: []
        storage_used: []
        storage_limit: []
        storage_percent: []
        allocated_data_storage: []
        allocated_data_storage_percent: []
        sessions_percent: []
        sessions_count: []
        workers_percent: []
        xtp_storage_percent: []

processors:
  resource:
    attributes:
      - { key: cloud.provider, value: azure, action: insert }
      - { key: cloud.platform, value: azure_sql_database, action: insert }
      - { key: cloud.account.id, value: "${env:AZURE_SUBSCRIPTION_ID}", action: insert }
      - { key: cloud.region, value: "${env:AZURE_REGION}", action: insert }
      - { key: cloud.resource_id, value: "${env:AZURE_SQL_DATABASE_RESOURCE_ID}", action: insert }
      - { key: service.name, value: "${env:SERVICE_NAME}", action: insert }

service:
  extensions: [azure_auth] # plus your existing extensions (oauth2client, etc.)
  pipelines:
    metrics:
      receivers: [azure_monitor]
      processors: [resource, batch] # plus your existing processors
      exporters: [otlphttp/b14] # the Scout exporter from the shared setup
```
Once metrics: is set for a namespace, the receiver only emits the metrics
you list — there is no implicit "default + my picks" merge. Extend the list
to add more (e.g., tempdb_data_size: [] on Premium tier). The empty
aggregation list [] per metric collects all aggregations Azure publishes
for that metric.
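For example, to pin specific aggregations rather than take everything (a sketch; the Average / Maximum spellings are assumed from the receiver's docs, so check your collector version):

```yaml
receivers:
  azure_monitor:
    metrics:
      "Microsoft.Sql/servers/databases":
        dtu_consumption_percent: [Average, Maximum]  # only these two series
        deadlock: []                                 # [] = every aggregation Azure publishes
```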
Environment variables
```bash
# From `az ad sp create-for-rbac` output.
AZURE_TENANT_ID=
AZURE_CLIENT_ID=
AZURE_CLIENT_SECRET=

# From your Azure subscription / resource group.
AZURE_SUBSCRIPTION_ID=
AZURE_RESOURCE_GROUP=
AZURE_REGION=
AZURE_SQL_DATABASE_RESOURCE_ID=   # az sql db show -g <rg> -s <server> -n <db> --query id -o tsv

# Resource attribute defaults.
SERVICE_NAME=azure-sql-database
```
AZURE_SQL_DATABASE_RESOURCE_ID is the full ARM resource ID of the
database you most want labelled with cloud.resource_id. The azure_monitor
receiver discovers every database in the configured resource group
regardless; this attribute exists as a dashboard and filter convention,
not to scope the scrape.
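To preview exactly which databases that discovery will pick up (including master), list them with the CLI:

```bash
# Full ARM resource IDs of every database on the server.
az sql db list \
  --resource-group "$AZURE_RESOURCE_GROUP" \
  --server <your-server> \
  --query "[].id" -o tsv
```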
Key alerts to configure
Threshold guidance for the most operationally useful series. Tune to your workload; these are starting points for a Basic/Standard tier with real traffic.
| Metric (OTel name) | Warning | Critical | Why it matters |
|---|---|---|---|
| azure_dtu_consumption_percent_average | > 75% / 5m | > 90% / 5m | Sustained DTU saturation throttles queries; precedes connection failures. |
| azure_cpu_percent_average | > 80% / 5m | > 95% / 5m | CPU-bound workload; correlate with query plans before scaling tier. |
| azure_storage_percent_average | > 75% | > 90% | Approaching maxSizeBytes; out-of-space halts writes on Basic / Standard. |
| azure_sessions_percent_average | > 70% | > 90% | Connection-pool exhaustion; tier ceiling differs per SKU. |
| azure_workers_percent_average | > 70% | > 90% | Concurrent-request ceiling; tier-specific. |
| azure_log_write_percent_average | > 80% / 5m | > 95% / 5m | Write-throughput saturation; impacts replication lag. |
| azure_connection_failed_total | > 0 / 5m | sustained > 0 / 15m | Auth, firewall, or TLS issues; investigate immediately when sustained. |
| azure_blocked_by_firewall_total | > 0 / 5m | > 10 / 5m | Firewall blocking traffic; usually a misconfiguration. |
| azure_deadlock_total | > 0 | > 5 / 5m | Application-side concurrency bug; any non-zero deserves investigation. |
| azure_replication_lag_seconds_average | > 5s | > 30s | Geo-replication / read-scale-out drift; only relevant on Premium / BC / Hyperscale. |
| azure_availability_average | < 100% / 1h | < 99.9% / 1h | SLA-compliant availability; PT1H grain. |
For elastic pools, mirror azure_dtu_consumption_percent_*,
azure_storage_percent_*, azure_sessions_percent_*, and
azure_workers_percent_* against the pool resource using the same
thresholds.
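If your alerting layer accepts PromQL-style rules, the table translates mechanically. A hypothetical sketch for two of the rows; rule syntax and series availability depend on your backend, so treat this as a shape, not Scout-specific config:

```yaml
groups:
  - name: azure-sql-database
    rules:
      - alert: AzureSqlDtuSaturationCritical
        expr: avg_over_time(azure_dtu_consumption_percent_average[5m]) > 90
        for: 5m
        labels: { severity: critical }
      - alert: AzureSqlDeadlockDetected
        # Azure only publishes this series when deadlocks occur, so any sample is actionable.
        expr: azure_deadlock_total > 0
        labels: { severity: critical }
```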
Operations
- Collection interval. 60 seconds is the sweet spot: Azure Monitor's ingestion lag is 1-3 minutes, so faster polls just re-read stale data.
- cache_resources. The receiver's resource-list cache TTL in seconds (default 24h). The shipped config sets it to 60 so newly created databases are visible to the receiver on the next poll, which suits a validation pass or environments where databases come and go frequently. In a stable production fleet, raise it back toward the default (e.g., 3600 or higher) to skip the per-minute ARM resource-list call.
- RBAC propagation. The legacy ARM /metrics endpoint picks up Monitoring Reader immediately. The newer data-plane batch API at *.metrics.monitor.azure.com requires separate RBAC propagation that can lag 5-30 minutes after grant.
- Batch API. Switching to use_batch_api: true raises Azure Monitor's per-tenant query rate ceiling from 12,000 to 360,000 calls/hour. Worth it once you're scraping more than a handful of databases. Wait for data-plane RBAC to settle before enabling.
- System master database. The receiver auto-discovers the system master database alongside your application databases and emits the same database-scope series for both. master is mostly noise; filter by cloud.resource_id in Scout (or at the collector, as shown earlier) if you want to ignore it.
- Tier-gated metrics. A few names in the whitelist only emit on specific tiers: replication_lag_seconds needs geo-replication or active geo-replication; xtp_storage_percent needs Premium or Business Critical (in-memory OLTP isn't available below Premium); app_cpu_billed / app_cpu_percent only emit on Serverless databases. The receiver polls these names regardless and silently returns no series on tiers that don't support the underlying feature, so the same config works across DTU, vCore, Serverless, and Hyperscale fleets. Add tempdb_data_size, tempdb_log_size, tempdb_log_used_percent, sql_instance_cpu_percent, and sql_instance_memory_percent to the whitelist if you need the InstanceAndAppAdvanced category on Premium or Hyperscale, as sketched below.
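A sketch of that whitelist extension (same namespace block as the shipped config; the added names emit nothing on tiers without the feature):

```yaml
receivers:
  azure_monitor:
    metrics:
      "Microsoft.Sql/servers/databases":
        # InstanceAndAppAdvanced category; Premium / Hyperscale only.
        tempdb_data_size: []
        tempdb_log_size: []
        tempdb_log_used_percent: []
        sql_instance_cpu_percent: []
        sql_instance_memory_percent: []
```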
Troubleshooting
AuthorizationFailed from the receiver
The role assignment hasn't propagated yet. Give it up to a minute; on the
legacy ARM endpoint propagation is usually near-immediate. If you've enabled
use_batch_api: true, allow up to 30 minutes for data-plane propagation —
or temporarily flip back to false to confirm the role itself is correct.
403 Forbidden from the receiver
The service principal client_secret has expired. Rotate with
az ad sp credential reset --id $AZURE_CLIENT_ID --years 1 and update your
collector's AZURE_CLIENT_SECRET env var.
No metrics in the first 3 minutes
Azure Monitor has a 1-3 minute ingestion lag. azure_storage and
azure_dtu_limit emit on every database from the first poll. DTU,
connection, and lock metrics only show non-zero values after real workload
on the database — control-plane calls (az sql db show) don't drive
them. If the database is idle, that's expected.
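To nudge an idle test database into emitting workload metrics, run any real data-plane query. A hypothetical smoke test with sqlcmd, assuming SQL authentication and client firewall access:

```bash
# A genuine query; unlike control-plane calls, this drives DTU / connection metrics.
sqlcmd -S <server>.database.windows.net -d <database> \
  -U <sql-user> -P '<password>' \
  -Q "SELECT COUNT(*) FROM sys.objects;"
```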
connection_failed / blocked_by_firewall / deadlock are absent
Expected. Azure Monitor does not emit zero data points for these count-style metrics — they appear in your collector output only when real occurrences have happened in the polling window. Missing data implies zero, not a collection failure.
RequestThrottled warnings from the receiver
You've hit Azure Monitor's per-tenant query rate limit (12,000/hour on the
legacy endpoint, 360,000/hour on the batch API). Lower the polling rate
(collection_interval: 120s), narrow the scope (the resource_groups: filter),
or enable use_batch_api: true once data-plane RBAC has settled.
Collector container can't resolve login.microsoftonline.com
Docker Desktop networking glitch — the container's DNS resolver becomes
unreachable. docker compose down && docker compose up -d typically fixes
it. If persistent, restart Docker Desktop.
Scout OAuth2 returns 401
Verify the SCOUT_CLIENT_ID, SCOUT_CLIENT_SECRET, and SCOUT_TOKEN_URL
your collector is using match the values in your Scout console. The
endpoint_params.audience MUST be b14collector — that's what the Scout
token endpoint expects.
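For reference, the extension block usually looks like the following; the env var names follow the shared setup, and the authoritative version lives in the Scout Exporter guide:

```yaml
extensions:
  oauth2client:
    client_id: ${env:SCOUT_CLIENT_ID}
    client_secret: ${env:SCOUT_CLIENT_SECRET}
    token_url: ${env:SCOUT_TOKEN_URL}
    endpoint_params:
      audience: b14collector   # the audience Scout's token endpoint expects
```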
Frequently Asked Questions
How do I monitor Azure SQL Database with OpenTelemetry?
Run the OpenTelemetry Collector with the azure_monitor receiver targeting
Microsoft.Sql/servers/databases (and optionally
Microsoft.Sql/servers/elasticPools for pool-level metrics). The receiver
polls Azure Monitor's REST API every 60 seconds, transforms metrics from
Azure's lowercase names (like dtu_consumption_percent) to OTel-style
names (azure_dtu_consumption_percent_average), and ships them via
OTLP/HTTP to base14 Scout. Authentication uses the azure_auth extension
in service-principal or managed-identity mode.
Should I use this guide or the self-hosted SQL Server guide?
Use this guide for Azure SQL Database (the managed PaaS). Use the
self-hosted SQL Server guide if you run SQL
Server yourself on a VM, on-premises, or in a container — that path uses
sqlserverreceiver to scrape DMVs directly instead of polling Azure
Monitor. The two are complementary, not redundant: azure_monitor reports
Azure's external view (DTU billing, blocked-by-firewall, geo-replication
lag, storage-vs-cap), while sqlserverreceiver reports SQL Server
internals (wait stats, buffer pool, query store). Production deployments
commonly run both with distinct service.name values to keep the two
views separate in dashboards.
Why does connection_failed return no data points?
Azure Monitor does not emit zero-valued data points for
connection_failed. The metric only surfaces in your collector output when
there are real connection failures (auth errors, firewall blocks, TLS
handshake failures). This is consistent with how Azure handles several
count-style metrics — missing data on a count metric implies zero
occurrences, not a collection failure.
Why does the receiver emit both _count and _total suffixes for connection_successful?
Azure Monitor publishes connection_successful with two supported
aggregations: Total (sum) and Count. The receiver emits one OTel
metric per published aggregation, producing
azure_connection_successful_total and azure_connection_successful_count
for the same source metric. Same applies to connection_failed,
blocked_by_firewall, and deadlock. Pick whichever aggregation you
prefer for dashboards; they carry the same information at the per-minute
grain.
Which metrics need higher tiers to emit non-zero values?
replication_lag_seconds requires geo-replication or active geo-replication
to be configured (Premium / Business Critical / Hyperscale, or any tier
with a configured geo-secondary). xtp_storage_percent requires Premium or
Business Critical (in-memory OLTP is not available below Premium).
app_cpu_billed and app_cpu_percent only emit on Serverless databases.
The receiver always polls these names; they simply return no series on
tiers that don't support the underlying feature.
Can I monitor elastic pools alongside individual databases?
Yes — the shipped config covers both Microsoft.Sql/servers/databases and
Microsoft.Sql/servers/elasticPools. The receiver returns no series for
the elastic-pool namespace if the target server has no pools, so the same
config is safe to run against servers that don't use pools.
How does this differ from Application Insights for Azure SQL Database?
Application Insights for Azure SQL is Azure-tenant-bound, billed per-GB ingested, and visualised in Azure dashboards or workbooks. The OpenTelemetry Collector is vendor-neutral — the same image ships to base14 Scout or any OTLP-compatible backend without redeployment. The metric coverage is identical — both surfaces draw from the same Azure Monitor REST API.
Related Guides
- Self-hosted SQL Server: paired guide for SQL Server you run yourself (VM, on-prem, container). Uses sqlserverreceiver to scrape DMVs directly.
- Azure Cosmos DB: sister guide; same azure_monitor pattern, NoSQL surface.
- Azure Kubernetes Service: sister guide; same receiver pattern but adds in-cluster collectors for kubeletstats + cluster-state.