Query Optimization: How to Read EXPLAIN Plans and Fix Slow Queries

Why Queries Are Slow

A query is slow for exactly one of three reasons:

It reads more data than it needs to — full table scans instead of index lookups, fetching columns you do not use, no partition pruning
It does more work than necessary — inefficient join algorithms, re-computing the same subquery for every row, sorting a million rows when you only need the top 10
It waits for something — lock contention, I/O saturation, network latency to a remote node

The database's query planner decides how to execute your SQL — which indexes to use, which join algorithm to choose, in what order to process joins. EXPLAIN shows you exactly what the planner decided and how much it cost. Understanding that output is the skill that turns a 30-second query into a 50-millisecond one.

EXPLAIN vs EXPLAIN ANALYZE

PostgreSQL has two modes:

-- EXPLAIN: shows the plan the optimizer *would* use — does NOT execute the query
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- EXPLAIN ANALYZE: actually executes the query and shows real timing
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;

-- Full detail: buffers (cache hits/misses), verbose node output
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = 42;

Use EXPLAIN first on queries that are too slow to run (e.g. a query you suspect will do a 10-minute full table scan). Use EXPLAIN ANALYZE when you need real timing numbers to confirm the optimization worked.

Warning: EXPLAIN ANALYZE actually executes the query — including writes. Wrap destructive queries in a transaction and roll back:

BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'archived' WHERE created_at < '2020-01-01';
ROLLBACK;

Reading an EXPLAIN Plan: The Basics

EXPLAIN output is a tree. Each node is a plan node — a step in the execution. The tree is read bottom-up and inside-out: inner nodes feed their output into outer nodes. The root node produces the final result.

EXPLAIN SELECT o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 500
ORDER BY o.amount DESC
LIMIT 10;

Limit  (cost=1823.45..1823.48 rows=10 width=52)
  ->  Sort  (cost=1823.45..1834.67 rows=4489 width=52)
        Sort Key: o.amount DESC
        ->  Hash Join  (cost=312.00..1601.23 rows=4489 width=52)
              Hash Cond: (o.customer_id = c.id)
              ->  Seq Scan on orders o  (cost=0.00..1201.00 rows=4489 width=36)
                    Filter: (amount > 500)
              ->  Hash  (cost=186.00..186.00 rows=10080 width=20)
                    ->  Seq Scan on customers c  (cost=0.00..186.00 rows=10080 width=20)

Reading the cost numbers: (cost=start..total rows=N width=W)

start cost: cost to return the first row (important for LIMIT queries)
total cost: estimated cost to return all rows (in arbitrary planner units)
rows: estimated number of rows this node outputs
width: estimated average row width in bytes

Costs are estimates, not real time. They are computed from table statistics. After EXPLAIN ANALYZE, you see both estimated and actual:

Seq Scan on orders o
  (cost=0.00..1201.00 rows=4489 width=36)
  (actual time=0.043..18.234 rows=4612 loops=1)
--                                    ↑ actual rows vs estimated 4489

When estimated rows differ significantly from actual rows, the planner made a bad plan decision. Run ANALYZE tablename to update statistics.

The Scan Nodes: How Data Is Read

The most important nodes to recognize are the scan types — they tell you how the database is reading data from a table.

Sequential Scan (Seq Scan) — Reads Everything

Seq Scan on orders  (cost=0.00..1201.00 rows=50000 width=36)
  Filter: (amount > 500)

Reads every page of the table from start to end, applies the filter to each row. O(n) in table size. Appropriate when:

The query returns a large fraction of the table (>5–10%)
The table is small (fits in memory easily)
No usable index exists

A Seq Scan on a large table with a low-selectivity filter is almost always the problem you are looking for.

Index Scan — Uses B-Tree, Fetches Rows

Index Scan using idx_orders_customer on orders
  (cost=0.43..8.45 rows=12 width=36)
  Index Cond: (customer_id = 42)

Uses the index to find matching entries, then fetches each row from the heap (table storage) by pointer. O(log n + k) where k is matching rows. Appropriate for high-selectivity queries returning a small fraction of the table.

The heap fetch cost: If k is large (thousands of rows scattered across many pages), random heap fetches can make Index Scan slower than Seq Scan. The planner knows this and may choose Seq Scan even when an index exists.

Index Only Scan — Index Contains Everything Needed

Index Only Scan using idx_covering on orders
  (cost=0.43..4.12 rows=12 width=16)
  Index Cond: (customer_id = 42)
  Heap Fetches: 0

All columns needed by the query are in the index. Zero heap fetches. The fastest scan type for covered queries. Heap Fetches: 0 confirms the index is covering this query.

If Heap Fetches is nonzero, the visibility map has not been updated — run VACUUM on the table.

Bitmap Index Scan — Batched Heap Access

Bitmap Heap Scan on orders  (cost=24.50..412.00 rows=450 width=36)
  Recheck Cond: (customer_id = 42)
  ->  Bitmap Index Scan on idx_orders_customer
        (cost=0.00..24.39 rows=450 width=0)
        Index Cond: (customer_id = 42)

Two steps: first builds a bitmap of matching page locations from the index, then fetches those pages in sequential order (better I/O locality than random Index Scan). Used when returning a moderate number of rows — too many for Index Scan, too few for Seq Scan. Automatically chosen by the planner.

The Join Nodes: How Tables Are Combined

Nested Loop Join

Nested Loop  (cost=0.43..156.89 rows=12 width=52)
  ->  Index Scan on orders (outer, 12 rows)
  ->  Index Scan on customers (inner, 1 row per outer row)

For each row in the outer table, scan the inner table for matches. O(outer × inner). Efficient when the outer table is small and the inner table has an index. Terrible when both tables are large.

Hash Join

Hash Join  (cost=312.00..1601.23 rows=4489 width=52)
  Hash Cond: (o.customer_id = c.id)
  ->  Seq Scan on orders (build side: larger table)
  ->  Hash  (cost=186.00..186.00 rows=10080 width=20)
        ->  Seq Scan on customers (probe side: smaller table)

Builds a hash table from the smaller table, then probes it for each row of the larger table. O(n + m). The go-to for large table joins with no usable index. Requires memory — work_mem controls how much. If the hash table spills to disk (look for "Batches > 1"), increase work_mem.

-- Temporarily increase work_mem for a specific session
SET work_mem = '256MB';
EXPLAIN ANALYZE SELECT ... (your join query);
-- If Hash Join batches drop to 1, the plan is more efficient

Merge Join

Merge Join  (cost=0.86..2456.78 rows=4489 width=52)
  Merge Cond: (o.customer_id = c.id)
  ->  Index Scan on orders using idx_customer_id
  ->  Index Scan on customers using customers_pkey

Requires both inputs to be sorted on the join key. Scans both in parallel. O(n log n) if sorting needed, O(n + m) if already sorted (via index). Efficient for large tables where both sides have sorted access available.

The EXPLAIN ANALYZE Output: What to Look For

With EXPLAIN ANALYZE, you get actual timing alongside estimates. Know what to look for:

EXPLAIN (ANALYZE, BUFFERS)
SELECT o.order_id, c.name, o.amount
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.amount > 500
ORDER BY o.amount DESC
LIMIT 10;

Limit  (cost=1823.45..1823.48 rows=10 width=52)
       (actual time=312.451..312.453 rows=10 loops=1)
  ->  Sort  (cost=1823.45..1834.67 rows=4489 width=52)
            (actual time=312.448..312.449 rows=10 loops=1)
        Sort Key: o.amount DESC
        Sort Method: top-N heapsort  Memory: 27kB
        ->  Hash Join  (cost=312.00..1601.23 rows=4489 width=52)
                       (actual time=45.123..298.234 rows=4612 loops=1)
              Buffers: shared hit=892 read=1245
              ->  Seq Scan on orders o  (cost=0.00..1201.00 rows=4489 width=36)
                                        (actual time=0.043..180.234 rows=4612 loops=1)
                    Filter: (amount > 500)
                    Rows Removed by Filter: 45388
                    Buffers: shared hit=312 read=1245
              ->  Hash  (cost=186.00..186.00 rows=10080 width=20)
                        (actual time=44.891..44.891 rows=10080 loops=1)
                    Buckets: 16384  Batches: 1  Memory Usage: 641kB
                    Buffers: shared hit=580
                    ->  Seq Scan on customers c  (cost=0.00..186.00 rows=10080 width=20)
                                                  (actual time=0.012..22.134 rows=10080 loops=1)
                        Buffers: shared hit=580
Planning Time: 2.341 ms
Execution Time: 312.891 ms

The red flags to scan for:

What you see	What it means	Fix
`Seq Scan` on large table with low row count	Missing index	Add index on filter column
`Rows Removed by Filter: 45388`	High waste ratio	Index or partial index
`Buffers: read=1245` (high disk reads)	Cache miss — data not in memory	Increase `shared_buffers`, add covering index
`Batches: 4` on Hash Join	Hash table spilled to disk	Increase `work_mem`
Estimated rows `4489` vs actual `4612`	Statistics stale	Run `ANALYZE tablename`
`loops=1000` on nested loop inner node	Inner scan runs 1000 times	Add index on inner join column
Sort `Method: external merge`	Sort spilled to disk	Increase `work_mem`
High `Planning Time` (>100ms)	Complex query, many tables	Simplify or use `pg_hint_plan`

The Optimization Workflow

When a query is slow, follow this sequence:

Step 1 — Find the slow query. Do not optimize blind. Use pg_stat_statements to identify the actual worst offenders:

-- Enable pg_stat_statements in postgresql.conf:
-- shared_preload_libraries = 'pg_stat_statements'

-- Top 10 slowest queries by total time
SELECT
    query,
    calls,
    ROUND(total_exec_time::NUMERIC, 2)        AS total_ms,
    ROUND(mean_exec_time::NUMERIC, 2)         AS mean_ms,
    ROUND(stddev_exec_time::NUMERIC, 2)       AS stddev_ms,
    rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

Step 2 — Run EXPLAIN ANALYZE. Get the actual plan with timing and buffer information.

Step 3 — Find the most expensive node. Look for the node with the highest actual time. That is where the work is happening.

Step 4 — Apply the fix. The fix depends on the bottleneck:

Most expensive node is Seq Scan on large table?
  └─ Check: does a usable index exist?
     ├─ No index → CREATE INDEX on the filter/join column
     ├─ Index exists but unused → function on column? implicit cast? leftmost prefix wrong?
     └─ Index exists, query returns >10% of rows → Seq Scan may be correct (add covering index instead)

Most expensive node is Sort?
  └─ Can ORDER BY be served by an index? → Add index matching ORDER BY columns
  └─ Sort spilling to disk? → Increase work_mem for this session

Most expensive node is Hash Join with Batches > 1?
  └─ Hash table too large for memory → Increase work_mem

Nested Loop with inner loop running thousands of times?
  └─ Missing index on inner join column → CREATE INDEX

High disk reads (Buffers: read=...)?
  └─ Data not cached → Covering index reduces pages read; increase shared_buffers

Estimated rows wildly off from actual?
  └─ Stale statistics → ANALYZE tablename; consider increasing default_statistics_target

Step 5 — Verify. Run EXPLAIN ANALYZE again after the fix. Confirm the plan changed and execution time improved.

Practical Optimizations

1. Add the Missing Index

-- Before: Seq Scan, 312ms
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

-- Add index
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

-- After: Index Scan, 0.8ms
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42 AND status = 'pending';

2. Turn Index Scan Into Index Only Scan

-- Before: Index Scan — fetches heap for each row to get `amount`
EXPLAIN ANALYZE
SELECT order_id, amount FROM orders WHERE customer_id = 42;

-- Add amount to the index (covering index)
DROP INDEX idx_orders_customer;
CREATE INDEX idx_orders_customer_covering ON orders (customer_id, order_id, amount);

-- After: Index Only Scan — zero heap fetches

3. Fix Function on Indexed Column

-- Before: Seq Scan — function prevents index use
EXPLAIN ANALYZE
SELECT * FROM orders WHERE DATE(created_at) = '2025-03-15';

-- After: Index Scan — range condition on raw column
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE created_at >= '2025-03-15'
  AND created_at <  '2025-03-16';

-- Index on created_at is now usable

4. Update Stale Statistics

-- When estimated rows are far from actual:
ANALYZE orders;          -- update statistics for one table
ANALYZE;                 -- update all tables (runs quickly, non-blocking)

-- For columns with skewed distributions (many NULLs, power-law distribution):
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 500;
-- Default is 100 (samples 300 values). Higher = better estimates, slower ANALYZE.
ANALYZE orders;

5. Rewrite Correlated Subquery as JOIN or Window Function

-- Before: correlated subquery executes once per row — O(n²)
SELECT order_id, amount,
    (SELECT AVG(amount) FROM orders o2 WHERE o2.customer_id = o.customer_id) AS avg
FROM orders o;
-- Execution: 8400ms

-- After: window function — one pass
SELECT order_id, amount,
    AVG(amount) OVER (PARTITION BY customer_id) AS avg
FROM orders;
-- Execution: 45ms

The 5-Minute Slow Query Checklist

When handed a slow query, run through this in order:

1. Run EXPLAIN (ANALYZE, BUFFERS) — get the actual plan
2. Find the highest-cost node — that is the bottleneck
3. Is it a Seq Scan on a large table?
   → Check for missing index; check for function on column preventing index use
4. Is estimated row count wildly off from actual?
   → ANALYZE the table to refresh statistics
5. Is it a Hash Join with Batches > 1 or Sort with external merge?
   → Temporarily SET work_mem = '256MB' and re-run — if plan improves, tune permanently
6. Is it a Nested Loop with inner loops=N (large N)?
   → Add index on the inner join column
7. High Buffers: read (disk I/O)?
   → Covering index to reduce pages touched; check if shared_buffers is too small
8. None of the above?
   → Query rewrite: eliminate correlated subqueries, use CTEs, reduce result set earlier with WHERE

🧭 What's Next

Post 08: Schema Design — normalization keeps data consistent, denormalization makes reads fast; every real schema is a deliberate trade-off between them