AWS RDS PostgreSQL Monitoring with OpenTelemetry - Metrics, Logs & Alerts

Overview

This guide covers monitoring AWS RDS PostgreSQL instances using OpenTelemetry and CloudWatch Metrics Stream. You'll collect infrastructure metrics from CloudWatch, database-specific metrics from the PostgreSQL receiver, and logs from CloudWatch Logs — all flowing into base14 Scout for unified visibility.

What You'll Monitor

RDS PostgreSQL monitoring combines two metric sources that together provide complete visibility:

CloudWatch Metrics Stream (infrastructure):

| Metric | What it tells you |
|---|---|
| CPUUtilization | Instance CPU usage (%) |
| FreeableMemory | Available RAM (bytes) |
| FreeStorageSpace | Remaining disk space (bytes) |
| ReadIOPS / WriteIOPS | Disk read/write operations per second |
| ReadLatency / WriteLatency | Average time per disk I/O operation |
| DatabaseConnections | Active database connections |
| ReplicaLag | Replication delay for read replicas (seconds) |
| DiskQueueDepth | Number of I/O requests waiting |
| NetworkReceiveThroughput / NetworkTransmitThroughput | Network bytes in/out |
| SwapUsage | Swap space used (bytes) |
| BurstBalance | Remaining I/O burst credits (gp2/gp3) |

OTel PostgreSQL receiver (database internals):

| Metric | What it tells you |
|---|---|
| postgresql.backends | Active connections per database |
| postgresql.commits / postgresql.rollbacks | Transaction rates |
| postgresql.database.locks | Active locks by type |
| postgresql.deadlocks | Deadlock count |
| postgresql.sequential_scans / postgresql.index.scans | Scan type distribution |
| postgresql.rows | Rows affected by operations |
| postgresql.table.size / postgresql.index.size | Storage per table/index |
| postgresql.table.vacuum.count | Vacuum frequency |
| postgresql.blks_hit / postgresql.blks_read | Buffer cache hit ratio |
| postgresql.replication.data_delay | Replication byte lag |
| postgresql.tup_inserted / postgresql.tup_updated / postgresql.tup_deleted | Tuple operations |

Prerequisites

| Requirement | Minimum | Recommended |
|---|---|---|
| RDS PostgreSQL | 11 | 14+ |
| OTel Collector Contrib | 0.90.0 | latest |
| base14 Scout | Any | - |
| AWS permissions | CloudWatch, Kinesis Firehose, S3 | - |

Before starting:

  • RDS instance must be accessible from the host running the OTel Collector (same VPC or VPC peering)
  • A monitoring user with pg_monitor role for the PostgreSQL receiver
  • CloudWatch Metrics Stream infrastructure set up (see Step 1)

Step 1: Set up CloudWatch Metrics Stream

Follow our comprehensive CloudWatch Metrics Stream guide to set up the streaming infrastructure (S3 bucket, Kinesis Firehose, Metrics Stream).

When configuring the Metrics Stream, select the AWS/RDS namespace instead of "All namespaces" to only collect RDS metrics and reduce costs.

Step 2: Create a monitoring user on RDS

Connect to your RDS PostgreSQL instance and create a dedicated monitoring user:

CREATE USER otel_monitor WITH PASSWORD '<your_password>';
GRANT pg_monitor TO otel_monitor;

The pg_monitor role provides read-only access to all statistics views needed for monitoring; no write permissions are required.

For RDS instances, ensure the security group allows connections from the Collector host on port 5432.

Step 3: Configure the OTel Collector for PostgreSQL metrics

Create rds-postgres-config.yaml with both the PostgreSQL receiver and the CloudWatch metrics pipeline:

rds-postgres-config.yaml
receivers:
  postgresql:
    endpoint: ${env:RDS_ENDPOINT}
    collection_interval: 10s
    username: ${env:RDS_MONITOR_USER}
    password: ${env:RDS_MONITOR_PASSWORD}
    databases: ["${env:RDS_DATABASE}"]
    tls:
      insecure_skip_verify: true
    metrics:
      postgresql.database.locks:
        enabled: true
      postgresql.deadlocks:
        enabled: true
      postgresql.sequential_scans:
        enabled: true
      postgresql.index.scans:
        enabled: true
      postgresql.backends:
        enabled: true
      postgresql.commits:
        enabled: true
      postgresql.rollbacks:
        enabled: true
      postgresql.db_size:
        enabled: true
      postgresql.table.count:
        enabled: true
      postgresql.table.size:
        enabled: true
      postgresql.index.size:
        enabled: true
      postgresql.table.vacuum.count:
        enabled: true
      postgresql.rows:
        enabled: true
      postgresql.blks_hit:
        enabled: true
      postgresql.blks_read:
        enabled: true
      postgresql.tup_inserted:
        enabled: true
      postgresql.tup_updated:
        enabled: true
      postgresql.tup_deleted:
        enabled: true
      postgresql.tup_fetched:
        enabled: true
      postgresql.replication.data_delay:
        enabled: true

processors:
  resource:
    attributes:
      - key: environment
        value: ${env:ENVIRONMENT}
        action: upsert
      - key: service.name
        value: ${env:SERVICE_NAME}
        action: upsert
      - key: cloud.provider
        value: aws
        action: upsert
  batch:
    timeout: 10s
    send_batch_size: 1024

exporters:
  otlphttp/b14:
    endpoint: ${env:OTEL_EXPORTER_OTLP_ENDPOINT}
    tls:
      insecure_skip_verify: true

service:
  pipelines:
    metrics:
      receivers: [postgresql]
      processors: [resource, batch]
      exporters: [otlphttp/b14]

Environment variables

.env
RDS_ENDPOINT=your-rds-instance.xxxxx.us-east-1.rds.amazonaws.com:5432
RDS_MONITOR_USER=otel_monitor
RDS_MONITOR_PASSWORD=your_password
RDS_DATABASE=your_database
ENVIRONMENT=production
SERVICE_NAME=rds-postgres
OTEL_EXPORTER_OTLP_ENDPOINT=https://<your-tenant>.base14.io

Note: CloudWatch Metrics Stream delivers the infrastructure metrics (CPU, memory, IOPS) automatically. The PostgreSQL receiver above collects the database-internal metrics. Together they give you the full picture.

Step 4: Collect RDS PostgreSQL logs

RDS PostgreSQL publishes logs to CloudWatch Log Groups. Use the CloudWatch Logs receiver to forward them:

rds-postgres-logs-config.yaml
receivers:
  awscloudwatchlogs/rds_postgres:
    region: ${env:AWS_REGION}
    logs:
      poll_interval: 1m
      groups:
        named:
          # Replace with your RDS log group name
          /aws/rds/instance/${env:RDS_INSTANCE_ID}/postgresql:

processors:
  attributes/add_source:
    actions:
      - key: source
        value: "rds_postgres"
        action: insert
      - key: cloud.provider
        value: "aws"
        action: insert
  batch:
    send_batch_size: 10000
    send_batch_max_size: 11000
    timeout: 10s

exporters:
  otlphttp/b14:
    endpoint: ${env:OTEL_EXPORTER_OTLP_ENDPOINT}
    tls:
      insecure_skip_verify: true

service:
  pipelines:
    logs/rds:
      receivers: [awscloudwatchlogs/rds_postgres]
      processors: [attributes/add_source, batch]
      exporters: [otlphttp/b14]

In the RDS console under Configuration > Log exports, enable:

  • PostgreSQL log — query errors, connection events, autovacuum
  • Upgrade log — major version upgrade details

For query-level logging, set these RDS parameter group values:

log_statement = 'ddl'
log_min_duration_statement = 1000 # Log queries over 1 second
log_connections = on
log_disconnections = on
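With log_min_duration_statement set, slow statements show up in the log stream as entries containing `duration: <ms> ms  statement: <query>` (the text before `LOG:` depends on your log_line_prefix setting). As a rough illustration of consuming these entries downstream, a sketch that extracts the duration from such a line; the sample line and function name are illustrative, not an exact RDS format guarantee:

```python
import re

# Anchor only on the "duration: ... ms  statement: ..." fragment, since the
# leading prefix varies with the log_line_prefix parameter.
DURATION_RE = re.compile(r"duration: (?P<ms>\d+(?:\.\d+)?) ms\s+statement: (?P<stmt>.*)")

def parse_slow_query(line: str):
    """Return (duration_ms, statement) for a slow-query log entry, else None."""
    m = DURATION_RE.search(line)
    if m is None:
        return None
    return float(m.group("ms")), m.group("stmt").strip()

# Hypothetical log line in the shape produced by log_min_duration_statement
sample = ("2024-05-01 12:00:00 UTC:10.0.0.5(51234):app@mydb:[12345]:"
          "LOG:  duration: 1523.456 ms  statement: SELECT * FROM orders WHERE status = 'open'")
print(parse_slow_query(sample))
```

Connection and disconnection events from log_connections/log_disconnections will not match this pattern and are returned as None.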

Verify the setup

Start the Collector and check for metrics within 60 seconds:

# Test PostgreSQL connectivity from the Collector host
psql -h ${RDS_ENDPOINT%:*} -p 5432 -U otel_monitor \
  -d ${RDS_DATABASE} -c "SELECT version();"

-- Verify monitoring permissions (run inside psql)
SELECT * FROM pg_stat_database WHERE datname = 'your_database';
SELECT * FROM pg_stat_user_tables LIMIT 5;

Check Scout for both CloudWatch metrics (prefixed aws.rds.*) and PostgreSQL metrics (prefixed postgresql.*).

Key alerts to configure

Once metrics are flowing, set up alerts on these thresholds:

| Metric | Warning | Critical | Why |
|---|---|---|---|
| CPUUtilization | > 70% | > 85% | Sustained high CPU degrades query performance |
| DatabaseConnections | > 80% of max | > 90% of max | Connection exhaustion causes application errors |
| FreeStorageSpace | < 20% | < 10% | Running out of storage crashes the instance |
| ReplicaLag | > 10s | > 60s | High lag means read replicas serve stale data |
| ReadLatency / WriteLatency | > 10ms | > 20ms | I/O latency spikes indicate storage bottlenecks |
| DiskQueueDepth | > 10 | > 20 | Deep queue means I/O is saturated |
| postgresql.deadlocks | > 0 | > 5/min | Deadlocks indicate application-level locking issues |
| Buffer hit ratio | < 95% | < 90% | Low hit ratio means too many disk reads |

Buffer hit ratio: calculate as blks_hit / (blks_hit + blks_read) * 100.
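That calculation, together with the warning/critical bands from the table above, can be sketched as follows; the function names are illustrative and not part of any Scout or OTel API:

```python
def buffer_hit_ratio(blks_hit: int, blks_read: int) -> float:
    """Cache hit ratio (%) = blks_hit / (blks_hit + blks_read) * 100."""
    total = blks_hit + blks_read
    if total == 0:
        return 100.0  # no block access yet; treat as healthy
    return blks_hit / total * 100

def classify(ratio: float) -> str:
    """Map a hit ratio onto the alert bands from the table above."""
    if ratio < 90:
        return "critical"
    if ratio < 95:
        return "warning"
    return "ok"

ratio = buffer_hit_ratio(blks_hit=980_000, blks_read=20_000)
print(round(ratio, 1), classify(ratio))  # 98.0 ok
```

The counters come straight from the postgresql.blks_hit and postgresql.blks_read metrics enabled in Step 3.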

Troubleshooting

PostgreSQL receiver shows no metrics

Cause: Collector can't reach the RDS instance.

Fix:

  1. Verify the RDS instance security group allows inbound on port 5432 from the Collector's IP or security group
  2. Confirm the RDS instance is not in a private subnet without a route to the Collector
  3. Test connectivity: psql -h <rds-endpoint> -U otel_monitor -d <db>
  4. Check the monitoring user has pg_monitor role: SELECT rolname FROM pg_roles WHERE pg_has_role('otel_monitor', oid, 'member');

CloudWatch metrics not appearing

Cause: Metrics Stream not configured for the AWS/RDS namespace.

Fix:

  1. In CloudWatch > Metrics > Streams, verify the stream is active
  2. Check that the namespace filter includes AWS/RDS
  3. Verify Kinesis Firehose delivery is succeeding (check the S3 error bucket)
  4. Allow 5-10 minutes for initial metrics to flow

Replication lag metrics showing zero

Cause: No read replicas configured, or the instance is a replica (not the primary).

Fix:

  1. ReplicaLag is only populated on read replica instances
  2. postgresql.replication.data_delay requires at least one replica connected to the primary
  3. On the primary, check: SELECT * FROM pg_stat_replication;

High connection count but low CPU

Cause: Idle connections consuming connection slots.

Fix:

  1. Check for idle connections: SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
  2. Consider connection pooling (PgBouncer or RDS Proxy)
  3. Set idle_in_transaction_session_timeout in the parameter group

FAQ

How do I monitor RDS PostgreSQL query performance?

Enable the pg_stat_statements extension for per-query statistics, and see the PostgreSQL Advanced guide for detailed query-level monitoring.

What's the difference between CloudWatch and Enhanced Monitoring?

CloudWatch metrics are collected at 1-minute intervals and cover instance-level stats. Enhanced Monitoring provides OS-level metrics at up to 1-second granularity (per-process CPU, memory, file system). Enable Enhanced Monitoring when you need to diagnose issues that 1-minute intervals miss.

Can I monitor multiple RDS instances with one Collector?

Yes. Add multiple PostgreSQL receiver blocks with distinct names:

receivers:
  postgresql/primary:
    endpoint: primary.xxxxx.rds.amazonaws.com:5432
  postgresql/replica:
    endpoint: replica.xxxxx.rds.amazonaws.com:5432

Then include both in the pipeline: receivers: [postgresql/primary, postgresql/replica].
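A fuller sketch of the same idea, assuming both instances share the monitoring credentials from Step 2 and reusing the processors and exporter defined in Step 3:

```yaml
receivers:
  postgresql/primary:
    endpoint: primary.xxxxx.rds.amazonaws.com:5432
    username: ${env:RDS_MONITOR_USER}   # assumes the same otel_monitor user exists on both instances
    password: ${env:RDS_MONITOR_PASSWORD}
  postgresql/replica:
    endpoint: replica.xxxxx.rds.amazonaws.com:5432
    username: ${env:RDS_MONITOR_USER}
    password: ${env:RDS_MONITOR_PASSWORD}

service:
  pipelines:
    metrics:
      receivers: [postgresql/primary, postgresql/replica]
      processors: [resource, batch]
      exporters: [otlphttp/b14]
```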

How do I filter which CloudWatch metrics are streamed?

When configuring the Metrics Stream, select specific namespaces and choose only AWS/RDS instead of all namespaces. This reduces costs and data volume.
