Skip to main content

One post tagged with "pgx"

View All Tags

pgX: Comprehensive PostgreSQL Monitoring at Scale

· 10 min read
Engineering Team at base14

Watch: Tracing a slow query from application latency to PostgreSQL stats with pgX.

For many teams, PostgreSQL monitoring begins and often ends with pg_stat_statements. That choice is understandable. It provides normalized query statistics, execution counts, timing data, and enough signal to identify slow queries and obvious inefficiencies. For a long time, that is sufficient.

But as PostgreSQL clusters grow in size and importance, the questions engineers need to answer change. Instead of "Which query is slow?", the questions become harder and more operational:

  • Why is replication lagging right now?
  • Which application is exhausting the connection pool?
  • What is blocking this transaction?
  • Is autovacuum keeping up with write volume?
  • Did performance degrade because of query shape, data growth, or resource pressure?

These are not questions pg_stat_statements is designed to answer.

Most teams eventually respond by stitching together ad-hoc queries against pg_stat_activity, pg_locks, pg_stat_replication, pg_stat_user_tables, and related system views. This works until an incident demands answers in minutes, not hours.

This post lays out what comprehensive PostgreSQL monitoring actually looks like at scale: the nine observability domains that matter, the kinds of metrics each domain requires, and why moving beyond query-only monitoring is unavoidable for serious production systems.