
Azure Firewall Monitoring with OpenTelemetry

Overview

This guide is the execution playbook for Azure Firewall (Standard SKU). For the cross-surface architecture (auth, push vs pull, latency, the trace gap), read Azure Monitoring with OpenTelemetry - Architecture for base14 Scout first.

The collector polls Azure Monitor's REST API for Microsoft.Network/azureFirewalls every 60 seconds, emits OTel metric series, and exports via OTLP/HTTP. The receiver does not touch the firewall data plane.

Azure Firewall is a managed L4-7 stateful firewall, distinct from Azure Web Application Firewall (which runs on Application Gateway and Front Door). This guide covers the stateful firewall surface: network-rule hits, application-rule hits, throughput, SNAT-port utilization, and threat-intel signals. For WAF metrics, see the Application Gateway and Front Door guides.

SKU choice

| SKU | Metric coverage | When to choose |
|---|---|---|
| Basic | Reduced subset (no SNAT, no IDPS) | SMB-tier deployments below ~250 Mbps; not recommended for production fleets. |
| Standard | Full 8-metric whitelist in this guide | Production default; covers rule hits, throughput, SNAT, health. |
| Premium | Standard set + SignatureLookupHits (IDPS) | When intrusion detection or TLS inspection is required. |

The receiver shape is identical across all three SKUs; the metric whitelist is the only thing that changes.

Receiver configuration

Add this fragment to your existing collector config. It contributes the azure_auth extension, an azure_monitor receiver scoped to the firewall namespace, a resource processor, a transform processor for the rule-hit dimensions, and a metrics pipeline. Component keys are suffixed /firewall so the fragment composes cleanly with other Azure-surface receivers in the same collector.

The transform/firewall_dim_lowercase processor below is a workaround for receiver bug #45942 — the receiver currently emits the rule-hit metrics' dimensions in both metadata_Status and metadata_status forms, doubling cardinality. The transform lowercases and deduplicates them. See Bug #45942 for the full diagnosis; drop the processor block if doubled cardinality is acceptable.

otel-collector.yaml (Firewall addition)

```yaml
extensions:
  azure_auth:
    # Pick one of: service_principal, managed_identity, workload_identity.
    # See the Authentication section below for the right choice per
    # collector deployment surface.
    service_principal:
      tenant_id: ${env:AZURE_TENANT_ID}
      client_id: ${env:AZURE_CLIENT_ID}
      client_secret: ${env:AZURE_CLIENT_SECRET}

receivers:
  azure_monitor/firewall:
    subscription_ids:
      - ${env:AZURE_SUBSCRIPTION_ID}
    resource_groups:
      - ${env:FIREWALL_RESOURCE_GROUP}
      # Multi-resource-group scoping. Omit resource_groups entirely to
      # scrape every resource group in the listed subscriptions.
    services:
      - Microsoft.Network/azureFirewalls
    auth:
      authenticator: azure_auth
    collection_interval: 60s
    initial_delay: 1s
    # Data-plane batch API. Lifts the per-subscription rate ceiling
    # from 12k to 360k calls/hour and is the recommended default. Flip
    # to false only as a temporary fallback while data-plane RBAC
    # propagates after a fresh Monitoring Reader grant (5-30 min lag).
    use_batch_api: true
    cache_resources: 86400
    dimensions:
      enabled: true
    metrics:
      "Microsoft.Network/azureFirewalls":
        ApplicationRuleHit: [Total]
        NetworkRuleHit: [Total]
        DataProcessed: [Total]
        SNATPortUtilization: [Average, Maximum]
        Throughput: [Average]
        FirewallHealth: [Average]
        ObservedCapacity: [Average, Maximum]
        # FirewallLatencyPng is in Preview. Surface it if you want
        # firewall-traversal latency, otherwise drop the line.
        FirewallLatencyPng: [Average]

processors:
  resource/firewall:
    attributes:
      - {key: cloud.provider, value: azure, action: insert}
      - {key: cloud.platform, value: azure_firewall, action: insert}
      - {key: cloud.account.id, value: "${env:AZURE_SUBSCRIPTION_ID}", action: insert}
      - {key: cloud.region, value: "${env:FIREWALL_REGION}", action: insert}
      # cloud.resource_id pins all metrics to one firewall. Drop this
      # line for multi-firewall fleets; the receiver injects
      # azuremonitor.resource_id per-resource automatically.
      - {key: cloud.resource_id, value: "${env:FIREWALL_RESOURCE_ID}", action: insert}
      - {key: service.name, value: "${env:FIREWALL_SERVICE_NAME}", action: insert}

  # Workaround for receiver bug #45942 (case-mismatch on metadata_*
  # dimensions, observed on v0.151.0). Lowercases the PascalCase
  # variants to deduplicate. The `set(...) where ... == nil` guard
  # prevents overwriting any legitimate lowercase value that the
  # receiver already emitted on the same data point. See Cardinality
  # control below.
  transform/firewall_dim_lowercase:
    metric_statements:
      - context: datapoint
        statements:
          - set(attributes["metadata_status"], attributes["metadata_Status"]) where attributes["metadata_Status"] != nil and attributes["metadata_status"] == nil
          - delete_key(attributes, "metadata_Status") where attributes["metadata_Status"] != nil
          - set(attributes["metadata_reason"], attributes["metadata_Reason"]) where attributes["metadata_Reason"] != nil and attributes["metadata_reason"] == nil
          - delete_key(attributes, "metadata_Reason") where attributes["metadata_Reason"] != nil
          - set(attributes["metadata_protocol"], attributes["metadata_Protocol"]) where attributes["metadata_Protocol"] != nil and attributes["metadata_protocol"] == nil
          - delete_key(attributes, "metadata_Protocol") where attributes["metadata_Protocol"] != nil

service:
  extensions: [azure_auth]  # keep your existing extensions alongside
  pipelines:
    metrics/firewall:
      receivers: [azure_monitor/firewall]
      processors: [memory_limiter, resource/firewall, transform/firewall_dim_lowercase, batch]
      exporters: [otlphttp/b14]
```

The receiver, resource processor, transform processor, and pipeline are all keyed /firewall so they coexist with other Azure receivers in a single collector. Your Scout exporter (oauth2client + otlphttp/b14) stays unchanged; one Scout pipeline serves every Azure surface.

For multi-subscription scoping, add entries to subscription_ids:. The alternative discover_subscriptions: true scrapes every subscription the identity has Monitoring Reader on; prefer the explicit list in production, since discovery silently includes sandbox and dormant subscriptions.
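As a sketch, an explicit multi-subscription scope looks like this (the subscription IDs and comments below are placeholders):

```yaml
receivers:
  azure_monitor/firewall:
    # Explicit allow-list: only these subscriptions are scraped, so a new
    # sandbox subscription never silently joins the pipeline.
    subscription_ids:
      - 00000000-0000-0000-0000-000000000001   # prod-network
      - 00000000-0000-0000-0000-000000000002   # prod-dmz
    # Alternative, not recommended in production: scrape every subscription
    # the identity has Monitoring Reader on.
    # discover_subscriptions: true
    services:
      - Microsoft.Network/azureFirewalls
```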

Authentication and RBAC

Pick the azure_auth mode for where the collector runs:

  • AKS pod: workload_identity (federated credential, no secret).
  • Container Apps / VMSS / Azure VM: managed_identity (user-assigned survives instance replacement; system-assigned dies with the instance).
  • External or on-prem: service_principal.
  • Local dev only: use_default: true (Azure SDK credential chain).
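For orientation only, a minimal sketch of the two keyless modes; the field names here are assumptions, so prefer the validated mode-by-mode YAML in the Service Bus guide:

```yaml
extensions:
  azure_auth:
    # AKS with a federated credential; nothing to rotate. The token-file
    # path is the usual workload-identity mount and may differ per cluster.
    workload_identity:
      tenant_id: ${env:AZURE_TENANT_ID}
      client_id: ${env:AZURE_CLIENT_ID}
      federated_token_file: /var/run/secrets/azure/tokens/azure-identity-token

# Or, on Container Apps / VMSS / Azure VM with a user-assigned identity:
# extensions:
#   azure_auth:
#     managed_identity:
#       client_id: ${env:AZURE_MANAGED_IDENTITY_CLIENT_ID}
```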

Grant Monitoring Reader at the resource group containing your firewalls. For mode-by-mode YAML, federation-credential setup, and the az role assignment create snippet, see Azure Service Bus § Authentication — the configuration is identical except for the receiver's services: line and the resource processor's cloud.platform value.

This guide defaults use_batch_api: true for the 360k-calls/hour ceiling. Data-plane RBAC lags 5-30 minutes after a fresh Monitoring Reader grant; if the receiver returns 401s in that window, temporarily flip to false (legacy ARM /metrics, immediate propagation) and revert once the data-plane RBAC settles.

If you run a service principal (collector outside Azure), rotate the client secret before its expiry; procedure mirrors other azure-monitor surfaces — see Service Bus § Service principal credential lifecycle.

What you'll monitor

Azure Firewall publishes 8 metrics on the Microsoft.Network/azureFirewalls namespace, all at PT1M time grain. The receiver renames Azure's PascalCase names (e.g. NetworkRuleHit) to OTel-style azure_<lowercased>_<aggregation> (e.g. azure_networkrulehit_total).

| Azure REST name | OTel emitted | Unit | What it tells you |
|---|---|---|---|
| NetworkRuleHit | azure_networkrulehit_total | Count | Hits on network-rule collections (5-tuple TCP / UDP / ICMP filtering). Splits by Status (Allow / Deny / DNAT) and Reason. Primary L4 traffic-shape metric. |
| ApplicationRuleHit | azure_applicationrulehit_total | Count | Hits on application-rule collections (FQDN-based filtering). Splits by Status, Reason, and Protocol. Only emits when application rules exist in the policy and traffic matches them. |
| DataProcessed | azure_dataprocessed_total | Bytes | Total bytes processed by the firewall per minute. Primary data-volume metric; pairs with the data-processing fee ($0.016/GB) for cost forecasting. |
| Throughput | azure_throughput_average | bps | Throughput in bits per second. Use for capacity-planning and alerting against the per-firewall ceiling; see Azure Firewall performance for current per-tier limits. |
| FirewallHealth | azure_firewallhealth_average | % | Overall firewall health gauge. Below 100% indicates Azure-side degradation; cross-check Service Health for the region. |
| SNATPortUtilization | azure_snatportutilization_average (and _maximum) | % | Percentage of allocated SNAT ports currently in use. Above 80% indicates approaching SNAT exhaustion on outbound traffic. Splits by Protocol. |
| ObservedCapacity | azure_observedcapacity_average (and _maximum) | Count | Reported capacity-unit usage. Tracks horizontal scale of the firewall instance; per-CU throughput and connection limits are documented in the Azure Firewall performance reference. |
| FirewallLatencyPng | azure_firewalllatencypng_average | ms | (Preview) Estimated firewall-traversal latency from internal latency probes. Preview metrics may change shape or disappear between receiver versions; gate alerting accordingly and revalidate on each upgrade. |

Three metadata_* dimensions split the rule-hit and health metrics:

  • metadata_Status (NetworkRuleHit, ApplicationRuleHit, FirewallHealth) — Allow, Deny, DNAT. The Deny slice on NetworkRuleHit is the primary security-incident signal.
  • metadata_Reason (NetworkRuleHit, ApplicationRuleHit, FirewallHealth) — short reason code per rule firing (e.g. RuleNotMatched, Allowed, RuleMatched).
  • metadata_Protocol (ApplicationRuleHit, SNATPortUtilization) — TCP, UDP, ICMP, Any.

Receiver bug #45942 emits these dimensions in both PascalCase and lowercase forms on the same metric; the transform processor in the receiver config above normalises to lowercase. See Bug #45942.

Silent-when-quiet caveat. Azure Monitor returns data points for NetworkRuleHit, ApplicationRuleHit, and DataProcessed only when matching activity occurs. A firewall with no traffic emits zero series for those three. Wire alerts to fire on series presence in window (any non-zero point) rather than threshold crossings, since absence is the steady state for under-utilised firewalls.

FirewallHealth, Throughput, SNATPortUtilization, and ObservedCapacity flow continuously every minute regardless of traffic.

Scale and rate limits

The receiver fans out per-resource queries to Azure Monitor's REST API. A single firewall with the full 8-metric whitelist costs roughly 60 calls per hour at 60s collection_interval.

Azure Monitor enforces two ceilings:

| Endpoint | Rate limit | When it applies |
|---|---|---|
| Data-plane batch (use_batch_api: true) | 360,000 calls / hour / subscription | Default in this guide. RBAC lags 5-30 min after the Monitoring Reader grant. |
| Legacy Azure Resource Manager /metrics (use_batch_api: false) | 12,000 calls / hour / subscription | Temporary fallback if the data plane is still 401-ing after RBAC propagation should have completed. RBAC propagates immediately on this endpoint. |

A 50-firewall fleet polling at 60s costs ~3,000 calls/hour against the 360k ceiling — under 1% utilization, leaving room for sibling surfaces on the same collector. Even small fleets benefit from use_batch_api: true.

Cardinality control

The fan-out per firewall is moderate at baseline (~7 series for single-rule traffic, ~25 series for a multi-rule policy with non-trivial traffic). The dimension shape, however, has a significant gotcha:

Bug #45942: case-mismatched dimension keys

Receiver bug #45942 manifests on Microsoft.Network/azureFirewalls. The same logical dimension value appears under both PascalCase and lowercase keys on the same metric. For example, azure_networkrulehit_total for a single Allow rule firing emits data points with metadata_Status = "Allow" and separate data points with metadata_status = "Allow". Aggregating across the case-mismatched values double-counts.

Validation 2026-05-06 confirmed the bug applies to:

  • azure_networkrulehit_total (Status, Reason)
  • azure_applicationrulehit_total (Status, Reason, Protocol; less fully validated since application-rule hits are policy-shape dependent)
  • azure_firewallhealth_average (Status, Reason)
  • azure_snatportutilization_* (Protocol)

Three remediations, in order of operational ease:

  1. Apply a transform processor in the collector to lowercase the dimension keys before they ride downstream. The receiver configuration above includes this workaround.
  2. Normalise on the Scout side in dashboard / alert queries by coalescing the two casing variants. Useful as a stop-gap while the transform processor is being rolled out.
  3. Drop the affected dimensions via dimensions.overrides if per-Status / per-Reason granularity is not actionable for your alerting. Reduces fan-out at the cost of incident-investigation detail.

Track the issue for upstream resolution: v0.151.0 (Apr 2026) has the bug; future releases may not. Re-validate on each receiver upgrade.

Standard cardinality levers

The override config uses the bare Azure dimension name (e.g. Status, not metadata_Status); the receiver adds the metadata_ prefix when it emits. Overrides apply at the receiver, before the transform/firewall_dim_lowercase processor runs — so the override key matches Azure's PascalCase regardless of what the transform emits downstream.

For single-firewall fleets:

```yaml
azure_monitor/firewall:
  dimensions:
    enabled: true
    overrides:
      "Microsoft.Network/azureFirewalls":
        NetworkRuleHit:
          - Status  # keep
          # drop Reason if per-rule-firing-reason granularity is not actionable
        FirewallHealth: []  # drop all dimensions; metric is per-firewall and needs no splits
        SNATPortUtilization:
          - Protocol  # keep; protocol-split helps SNAT-exhaustion triage
```

Watch the otelcol_processor_batch_metadata_cardinality self-metric on the collector's port-8888 Prometheus endpoint to see actual cardinality after overrides apply.
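One way to watch that self-metric from the same collector is a Prometheus self-scrape pipeline. A sketch, assuming the default telemetry endpoint on 127.0.0.1:8888 and reuse of the existing Scout exporter:

```yaml
receivers:
  prometheus/self:
    config:
      scrape_configs:
        - job_name: otelcol-self
          scrape_interval: 60s
          static_configs:
            - targets: ["127.0.0.1:8888"]  # collector's own telemetry endpoint

service:
  pipelines:
    metrics/self:
      receivers: [prometheus/self]
      processors: [batch]
      exporters: [otlphttp/b14]
```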

Alert tuning

Threshold guidance for the high-signal series. Numbers are starting points; derive your own from observed 99th-percentile baselines over a representative week.

| Metric | Warning | Critical | Why it matters |
|---|---|---|---|
| azure_firewallhealth_average | < 100% over 5m | < 99% over 15m | Azure-side firewall degradation. Cross-check Azure Service Health for the region. |
| azure_snatportutilization_average | > 80% over 5m | > 95% over 5m | SNAT-port exhaustion is imminent. The pool is shared across every workload behind the firewall (unlike Standard LB, where SNAT is per-backend), so a single noisy VM can exhaust the whole firewall: investigate the top-talker in firewall logs before scaling. Add public IPs (each adds ~2496 ports) once the offender is identified. |
| azure_networkrulehit_total filtered to metadata_status="Deny" | sustained presence over 15m | spike > 10x baseline | Deny rules firing at an unusual rate. Either you have a misconfigured client or an active probe / scan. Cross-check the Application / Network rule logs. |
| azure_observedcapacity_maximum | sustained > 8 capacity units | sustained > 12 capacity units, or > 80% of your SKU's documented ceiling | The firewall is auto-scaling toward the per-instance limit. ObservedCapacity saturates before Throughput does, since the per-instance bandwidth ceiling moves with auto-scale state. Plan multi-firewall topology before this alert fires sustained. Verify your SKU's capacity-unit ceiling in the Azure Firewall performance reference. |
| azure_throughput_average | > 70% of capacity-unit headroom | > 90% of capacity-unit headroom | Approaching the firewall's bandwidth headroom (each capacity unit ≈ 250 Mbps). Use alongside ObservedCapacity rather than alone; the throughput ceiling moves with auto-scale state. |
| azure_applicationrulehit_total filtered to metadata_status="Deny" | sustained presence over 15m | spike > 10x baseline | FQDN-rule denials. Indicates either policy misalignment with application traffic or active threat-intel-driven blocks (see Threat Intel section). |

For Deny-filtered alerts, fire on series presence in window rather than numeric thresholds — see the silent-when-quiet caveat above.
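As an illustration of presence-based firing, here is a hypothetical Prometheus-style rule; Scout's rule syntax may differ, and the expression assumes the rule-hit metric lands as a counter named azure_networkrulehit_total:

```yaml
groups:
  - name: firewall-deny
    rules:
      - alert: FirewallDenyPresence
        # Fires when any Deny-labelled series produced a non-zero point in
        # the window; absence of the series is the healthy steady state, so
        # a numeric threshold on a missing series would never fire.
        expr: sum(increase(azure_networkrulehit_total{metadata_status="Deny"}[15m])) > 0
        for: 15m
        labels:
          severity: warning
```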

Threat-intel mode

Azure Firewall has three threat-intel modes that affect how ApplicationRuleHit and NetworkRuleHit slices appear:

  • Off — threat-intel signals do not generate rule hits.
  • Alert (default) — threat-intel matches surface as metadata_Reason="ThreatIntelAlert" data points without blocking traffic. Some Azure Monitor slices also tag these with metadata_Status="Deny" even though the packet was forwarded unchanged. A Deny-rate alert in Alert mode therefore reads as "potentially-malicious traffic observed but allowed", not "traffic blocked." On-call runbooks must distinguish between the two — the metric on its own does not.
  • Alert and deny — threat-intel matches block traffic and the rule-hit metric records them as Status=Deny, Reason=ThreatIntelDeny. Same metric shape, different operational meaning.

Document your firewall's threat-intel mode in runbooks; what reads as "the firewall is blocking attacks" in "Alert and deny" mode reads as "the firewall is observing potential threats" in "Alert" mode, and the same metadata_status="Deny" alert fires in both.
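One way to keep that context attached to the telemetry itself is to stamp the mode onto the resource processor. The attribute key and FIREWALL_TI_MODE env var below are our own convention, not receiver outputs:

```yaml
processors:
  resource/firewall:
    attributes:
      # ...existing inserts from the receiver configuration above...
      # FIREWALL_TI_MODE is one of: Off, Alert, AlertAndDeny. Dashboards can
      # then disambiguate "threat observed" from "threat blocked" per series.
      - {key: azure.firewall.threat_intel_mode, value: "${env:FIREWALL_TI_MODE}", action: insert}
```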

Logs

Azure Firewall logs are not optional for any non-trivial deployment. The metric set in this guide gives you rate, errors, duration. The four log categories give you the per-flow and per-rule-firing detail required for incident investigation.

```shell
FW_RES_ID=$(az network firewall show -n <fw> -g <rg> --query id -o tsv)
az monitor diagnostic-settings create \
  --resource "$FW_RES_ID" \
  --name "fw-to-eventhubs" \
  --logs '[
    {"category":"AzureFirewallApplicationRule","enabled":true},
    {"category":"AzureFirewallNetworkRule","enabled":true},
    {"category":"AzureFirewallThreatIntelLog","enabled":true},
    {"category":"AzureFirewallDnsProxy","enabled":true}
  ]' \
  --event-hub-rule <eh-namespace-rule-id>
```

Architecture for the Diagnostic Settings → Event Hubs → azure_event_hub path is in the overview. Pair the log stream with the metric stream to correlate alert firings with the specific source IPs, destination FQDNs, and rule names involved.

AzureFirewallThreatIntelLog records every threat-intel match with source IP, destination, and matched signature; pair it with the metadata_status="Deny" metric alerts above for security-team workflows. AzureFirewallDnsProxy captures the firewall's DNS-proxy decisions — the source for DNS-based exfiltration detection.
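For orientation, a minimal sketch of the log-side receiver fragment; the component key and field names are assumptions based on the contrib Event Hub receiver, so validate them against the overview's architecture page:

```yaml
receivers:
  azure_event_hub/firewall_logs:
    # Connection string for the Event Hub the Diagnostic Setting streams
    # into (hypothetical env var name).
    connection: ${env:FIREWALL_EVENTHUB_CONNECTION}
    format: azure  # parse Azure resource-log envelopes

service:
  pipelines:
    logs/firewall:
      receivers: [azure_event_hub/firewall_logs]
      processors: [memory_limiter, batch]
      exporters: [otlphttp/b14]
```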

Premium SKU additions

Premium SKU adds Intrusion Detection and Prevention System (IDPS) support on top of Standard, surfaced in two distinct streams: extra metrics on the same azure_monitor receiver, and extra log categories on the Diagnostic Settings stream described in Logs.

Premium metrics

Extend the whitelist on the existing receiver — shape is identical to the Standard-tier metrics:

```yaml
metrics:
  "Microsoft.Network/azureFirewalls":
    # ...the eight Standard-tier metrics above...
    SignatureLookupHits: [Total]  # IDPS signature match rate
```

SignatureLookupHits is the only Premium-exclusive metric on the namespace. If you are running Standard, omit it; if you are running Premium, alert on sustained presence as a per-firewall security signal.

Premium log categories

Premium adds two log categories. They are not metrics — they ride the same Diagnostic Settings → Event Hubs → azure_event_hub path as the four Standard categories:

Append these to the --logs JSON in the Diagnostic Settings command above:

```json
{"category":"AzureFirewallApplicationRuleAggregation","enabled":true},
{"category":"AzureFirewallIDPSSignatureMatch","enabled":true}
```

AzureFirewallIDPSSignatureMatch records each IDPS hit with the signature that triggered — pair it with the SignatureLookupHits metric to follow an alert back to the specific signatures. AzureFirewallApplicationRuleAggregation is a pre-aggregated form of AzureFirewallApplicationRule that lowers log volume when application-rule traffic is dense.

Apps-side instrumentation

This guide is metrics-only. Standard SKU is L4-7 transparent (the client and server applications do not see the firewall as a hop), so there is no apps-side trace integration on Standard.

Premium SKU caveat. Premium TLS inspection decrypts and re-encrypts traffic on the firewall, so it is not transparent at L7 — traceparent headers and other request-context attributes may not survive the round-trip. Validate trace continuity end-to-end before relying on cross-firewall span propagation under Premium.

The only "firewall in the trace" signal you can get from instrumentation is end-to-end client latency that includes the firewall hop; the metric azure_firewalllatencypng_average is the firewall's own estimate of that hop's latency.

Troubleshooting

AuthorizationFailed from the receiver

Data-plane batch API (use_batch_api: true, the default) propagates Monitoring Reader 5-30 minutes after grant; legacy ARM /metrics (use_batch_api: false) propagates immediately. If you've just granted the role and the receiver is 401-ing, temporarily flip to false to confirm the role itself is correct, then revert.

403 Forbidden from the receiver

If using a service principal: the client_secret has expired. See Service Bus § Service principal credential lifecycle. If using managed identity: check that the firewall is in a subscription / resource group where the managed identity has Monitoring Reader.

Metrics never appear after a fresh firewall provision

Two distinct delays compound:

  1. Firewall control-plane provisioning takes 20-30 minutes for Standard SKU on a fresh deployment. Until the firewall reaches provisioningState=Succeeded, metrics do not flow regardless of the collector configuration. Verify with az network firewall show -n <fw> -g <rg> --query provisioningState -o tsv.
  2. The receiver caches metric definitions for the cache_resources interval (default 86400s / 24h). On the first poll after a fresh firewall is created, Azure Monitor's metric-definition catalogue may not yet have populated. Restart the collector after the firewall reaches Succeeded to reset the discovery cache. The receiver log line metrics_definitions_count: 0 confirms the diagnosis; recovery is verified when the next poll cycle logs metrics_definitions_count: <N> with N > 0.

This is the same first-poll race documented for Load Balancer and Storage; the long firewall provisioning latency makes it more conspicuous.
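If a restart is operationally awkward, shortening the definition cache during initial rollout is one alternative; a sketch, assuming cache_resources is in seconds as in the receiver configuration above. Restore 86400 once metrics flow:

```yaml
receivers:
  azure_monitor/firewall:
    # Re-discover metric definitions hourly during rollout so a freshly
    # provisioned firewall's catalogue is picked up without a restart.
    cache_resources: 3600
```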

ApplicationRuleHit always zero

The firewall policy contains only network-rule collections. Network rules and application rules are separate concepts; ApplicationRuleHit only emits data points when traffic matches an application-rule collection (FQDN-based filtering). Add an application-rule collection to the policy, generate matching traffic, and the metric will populate. If your deployment intends to be network-rules-only, drop ApplicationRuleHit from the whitelist to avoid alerting confusion.

metadata_* dimensions appear with mixed casing

Bug #45942. See Cardinality control; apply the transform processor in the receiver configuration to normalise.

RequestThrottled warnings from the receiver

You have hit Azure Monitor's per-subscription rate ceiling. Remediations, in order of preference:

  • Lower polling rate: collection_interval: 120s for the fast receiver.
  • Confirm use_batch_api: true is set (the guide default).
  • Split heavy subscriptions across multiple collector instances.
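The first two levers as a config sketch; 120s halves the per-firewall call rate relative to this guide's 60s default:

```yaml
receivers:
  azure_monitor/firewall:
    collection_interval: 120s  # ~30 calls/hour per firewall instead of ~60
    use_batch_api: true        # 360k calls/hour ceiling instead of 12k
```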

Cardinality blowup on Scout volume

The case-mismatch bug is the most common cause; apply the transform processor first. If still high, apply dimensions.overrides (see Cardinality control) or split the noisy firewall into a separate receiver instance.

Scout OAuth2 returns 401

Verify SCOUT_CLIENT_ID, SCOUT_CLIENT_SECRET, and SCOUT_TOKEN_URL match the values in your Scout console. The endpoint_params.audience must be b14collector.

Frequently Asked Questions

How do I add Azure Firewall metrics to my OTel collector?

Add the azure_auth extension and an azure_monitor receiver scoped to Microsoft.Network/azureFirewalls, then route the receiver into a metrics pipeline that exports to Scout via the oauth2client-authenticated OTLP/HTTP exporter. The receiver polls Azure Monitor's REST API every 60 seconds and emits one OTel metric per Azure aggregation. RBAC requirement is Monitoring Reader at resource-group scope. Standard SKU emits the full metric set; Basic SKU emits a reduced subset.

Why are my ApplicationRuleHit metrics not appearing?

ApplicationRuleHit only emits when traffic matches an application-rule collection (FQDN-based filtering) in your firewall policy. If your policy contains only network-rule collections (5-tuple TCP / UDP / ICMP filtering), only NetworkRuleHit will emit. To validate the metric flow, add an application-rule collection that matches the traffic generated by your test backend. Threat-intel mode also affects whether ApplicationRuleHit slices include the Threat-Intel-derived blocks; see the threat-intel section above.

Why are my rule-hit dimensions appearing twice with different casing?

This is a known receiver bug — opentelemetry-collector-contrib issue 45942 — that manifests on Microsoft.Network/azureFirewalls. The same logical dimension value (Status=Allow, for example) appears under both metadata_Status and metadata_status keys, doubling cardinality silently. Workaround: apply a transform processor in the collector to lowercase the dimension keys, or normalise downstream in Scout queries. The bug is namespace-specific to Azure Firewall and Storage; it does not always manifest, but on Firewall the rule-hit and FirewallHealth metrics consistently show the doubling.

How do I detect SNAT port exhaustion on the firewall?

Warn on azure_snatportutilization_average above 80% over 5 minutes and go critical at 95%, matching the alert-tuning table above. SNAT exhaustion on Azure Firewall presents differently from Standard Load Balancer: outbound connections from any backend behind the firewall start timing out or returning EADDRNOTAVAIL even when the firewall itself is healthy. The fix is to add more public IP frontends to the firewall; each public IP adds 2,496 SNAT ports to the shared pool, which the firewall draws on as needed.

Should I run Azure Firewall logs through this collector?

Yes, but via Diagnostic Settings → Event Hubs → azure_event_hub receiver, not via this metrics collector. The four log categories (AzureFirewallApplicationRule, AzureFirewallNetworkRule, AzureFirewallThreatIntelLog, AzureFirewallDnsProxy) are not optional for any non-trivial Firewall deployment; they are the primary investigation surface during incidents. Configure once per firewall, ingest into the same collector via a separate fragment under the long-lived shared scraper.
