How SQL Query Planners Decide What Your Query Does

You write SELECT * FROM users WHERE email = 'a@b.com' and the database returns rows. You never said how to find them — never said "use the email index" or "scan the table from the start." That decision happened inside the planner, a small program that turns declarative SQL into an actual sequence of disk reads, memory operations, and joins. When it picks well, your query takes 2 ms. When it picks badly, the same query takes 40 seconds. Understanding what the planner is thinking is the difference between guessing at performance problems and fixing them.

This post opens up the box. We'll use Postgres EXPLAIN output as the working example, but the concepts apply to MySQL, SQLite, SQL Server, and Oracle — same job, slightly different vocabulary.

What "Planning" Actually Means

flowchart LR
  SQL["SELECT u.name, o.total<br/>FROM users u<br/>JOIN orders o ON o.user_id = u.id<br/>WHERE u.active = true"]
  Parse[Parse + bind]
  Rewrite[Rewriter]
  Plan[Planner / optimizer]
  Stats[(pg_statistic<br/>row counts<br/>histograms<br/>correlations)]
  Exec[Executor]
  Rows[(Rows)]
  SQL --> Parse --> Rewrite --> Plan
  Stats -. estimates feed cost model .-> Plan
  Plan --> Exec --> Rows
  Plan -.- Note["Many valid plans;<br/>pick lowest estimated cost"]

SQL is declarative: you describe the result you want, not the steps to compute it. Internally, the database has to convert your query into an execution plan — a tree of physical operators like sequential scan, index scan, hash join, sort, limit. There are usually many valid plans for any given query. Choosing among them is the planner's job.

The planner has two sub-jobs: enumerate plausible plans, and estimate the cost of each one. The lowest-cost plan wins. Cost here is an abstract number meant to approximate disk I/O, CPU, and memory pressure. It is not measured in seconds, and the numbers are not comparable across database versions or hardware settings; within a single planning decision, though, lower is meant to mean faster.

The estimates rely on statistics the database keeps about each table: row counts, the distribution of values in each column, how many distinct values exist, correlation between columns, and on-disk page counts. Postgres updates these via ANALYZE (manual) or autovacuum (background). When statistics drift away from reality, plan quality drops. This is the single most common cause of "the query was fast yesterday and slow today" with no code changes.
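
If you want to see what the planner sees, these statistics are queryable. A minimal sketch against the standard Postgres catalogs, assuming the users table used in the next section:

SELECT relname, reltuples::bigint AS estimated_rows, relpages
FROM pg_class WHERE relname = 'users';

SELECT attname, n_distinct, most_common_vals, correlation
FROM pg_stats WHERE tablename = 'users' AND attname IN ('email', 'active');

reltuples and relpages are the row and page counts the cost model starts from; if they are wildly off from reality, so is every plan built on them.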

Reading Your First EXPLAIN

Let's start concrete. Suppose you have a users table with 1 million rows, indexed on email.

EXPLAIN SELECT * FROM users WHERE email = 'a@b.com';

Output:

Index Scan using users_email_idx on users  (cost=0.42..8.44 rows=1 width=180)
  Index Cond: (email = 'a@b.com'::text)

Three numbers matter here. cost=0.42..8.44 is the planner's estimate of work needed to return the first row (0.42) and the last row (8.44). rows=1 is the estimated number of rows the operator will return. width=180 is the average row size in bytes. The planner picked an index scan because it expects exactly one row out, and walking the B-tree to find it is much cheaper than reading the whole table.
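
If you're curious what the rejected alternative would have cost, you can switch the chosen path off for the session and re-run EXPLAIN. This is a diagnostic trick, not something to ship; the settings below are real Postgres knobs, but the plan line shown is illustrative:

SET enable_indexscan = off;
SET enable_bitmapscan = off;
EXPLAIN SELECT * FROM users WHERE email = 'a@b.com';
-- Seq Scan on users  (cost=0.00..36040.00 rows=1 width=180)   illustrative; the point is it dwarfs 8.44
RESET enable_indexscan;
RESET enable_bitmapscan;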

Now compare with a query that doesn't filter:

EXPLAIN SELECT * FROM users WHERE active = true;
Seq Scan on users  (cost=0.00..23540.00 rows=900000 width=180)
  Filter: active

Sequential scan — read every row, drop the inactive ones. Why? Because if 90% of rows match, an index lookup would actually be slower than a sequential read. Disks (and OS page caches) are great at sequential I/O; random I/O for 900,000 separate B-tree lookups would be brutal. The planner correctly chose the brute-force option.

This is the core lesson: indexes help when they are selective. An index on a boolean column where 90% of rows are true is useless and the planner knows it.
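
If the queries you actually care about target the rare value, a partial index is the usual escape hatch: index only the interesting slice and the planner will use it when the predicate matches. A sketch, not part of the original example:

-- covers only the ~10% of rows that are inactive
CREATE INDEX users_inactive_idx ON users (id) WHERE active = false;
EXPLAIN SELECT * FROM users WHERE active = false;
-- expect an index or bitmap heap scan over the small partial index instead of a full-table walk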

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN shows estimates. EXPLAIN ANALYZE actually runs the query and shows you both estimates and reality, side by side:

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
Index Scan using orders_customer_idx on orders
  (cost=0.43..16.85 rows=4 width=120)
  (actual time=0.028..0.105 rows=312 loops=1)
  Index Cond: (customer_id = 42)
Planning Time: 0.214 ms
Execution Time: 0.197 ms

The planner estimated 4 rows. It actually returned 312. That gap — almost 80x off — is the single most useful signal in EXPLAIN ANALYZE. When the planner is wrong about row counts, every downstream decision compounds the error. A small estimated row count makes nested loop joins look cheap; if the real number is 80x higher, you've just done 80x as many lookups as the planner thought you would.

Whenever a query is mysteriously slow, the first thing to check is whether rows= matches actual rows= at every node. If not, run ANALYZE table_name; and try again.
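
A quick way to check whether statistics are fresh at all, using the standard Postgres activity view (the table name is just an example):

SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname = 'orders';

A large n_mod_since_analyze relative to the table size means the planner is working from an outdated picture.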

Join Strategies: Nested Loop, Hash, Merge

flowchart TB
  subgraph NL["Nested loop"]
    direction LR
    NLO[(Outer:<br/>small)] --> NLLoop["for row in outer:<br/>  index lookup in inner"]
    NLLoop --> NLI[(Inner:<br/>indexed)]
    NLOut[Result]
    NLLoop --> NLOut
  end
  subgraph H["Hash join"]
    direction LR
    HS[(Smaller side)] --> HBuild["Build hash table<br/>in memory"]
    HBuild --> HProbe["Probe with<br/>larger side row by row"]
    HL[(Larger side)] --> HProbe
    HProbe --> HOut[Result]
  end
  subgraph M["Merge join"]
    direction LR
    MA[(Sorted A)] --> MMerge["Walk both in lockstep<br/>(like merging sorted lists)"]
    MB[(Sorted B)] --> MMerge
    MMerge --> MOut[Result]
  end
  classDef good fill:#1f1f1f,stroke:#4ade80,color:#e4e4e4;
  classDef ok fill:#1f1f1f,stroke:#fb923c,color:#e4e4e4;
  classDef alt fill:#1f1f1f,stroke:#60a5fa,color:#e4e4e4;
  class NL good
  class H ok
  class M alt

When you join two tables, the planner picks one of three strategies. Each has a sweet spot.

Nested loop join. For every row in the outer table, look up matching rows in the inner table — usually via an index. Cheap when the outer is small and the inner has a good index. Catastrophic when the outer is large.

EXPLAIN SELECT u.name, o.total
FROM users u JOIN orders o ON o.user_id = u.id
WHERE u.id = 5;
Nested Loop  (cost=0.85..18.43 rows=4 width=40)
  ->  Index Scan using users_pkey on users u  (rows=1)
  ->  Index Scan using orders_user_idx on orders o  (rows=4)
        Index Cond: (user_id = 5)

One user, look up their orders by index — perfect.

Hash join. Build a hash table from the smaller table in memory, then scan the larger table and probe the hash. Linear in both inputs. Wins when neither side has a useful index for the join key, or when both sides are large.

Hash Join  (cost=27.50..1503.42 rows=50000 width=40)
  Hash Cond: (o.user_id = u.id)
  ->  Seq Scan on orders o  (rows=1000000)
  ->  Hash  (rows=10000)
        ->  Seq Scan on users u

Merge join. If both sides are already sorted on the join key (because they came from an index, or a prior sort), walk them in lockstep like merging two sorted lists. Memory-efficient on huge inputs.
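
A query shape that tends to produce a merge join: both inputs arrive already ordered on the join key, here from index scans. The plan below is illustrative and assumes a payments table indexed on order_id:

EXPLAIN SELECT o.id, p.amount
FROM orders o JOIN payments p ON p.order_id = o.id;
Merge Join  (cost=1.31..54630.15 rows=1000000 width=12)
  Merge Cond: (o.id = p.order_id)
  ->  Index Scan using orders_pkey on orders o
  ->  Index Scan using payments_order_idx on payments p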

The planner's job is to know which one is cheapest given the row counts and available indexes. Mostly it gets it right. When it doesn't, the fix is usually to fix the statistics — not to override the planner.
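
For local experiments, the same session-level enable_* switches let you see what the planner's second choice would have cost: turn one strategy off and compare the estimates (diagnostic only, never in production code):

SET enable_hashjoin = off;
EXPLAIN SELECT u.name, o.total FROM users u JOIN orders o ON o.user_id = u.id;
-- the planner now reports its next-best plan (merge join or nested loop) with its estimated cost
RESET enable_hashjoin;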

Why the Planner Sometimes Gets It Wrong

flowchart TD
  Slow([Query is mysteriously slow])
  Run["Run EXPLAIN ANALYZE"]
  Diff{rows= vs<br/>actual rows=<br/>diverge &gt;10×?}
  Stale{Recent bulk<br/>load or<br/>schema change?}
  Skew{One value<br/>dominates the<br/>filtered column?}
  Corr{Filter mixes<br/>correlated<br/>columns?}
  Stat["Run ANALYZE<br/>(or tune autovacuum)"]
  Hist[ALTER COLUMN ...<br/>SET STATISTICS 1000]
  CrStat[CREATE STATISTICS<br/>for dependencies]
  Idx[Re-examine indexes<br/>or rewrite query]
  Slow --> Run --> Diff
  Diff -- yes --> Stale
  Stale -- yes --> Stat
  Stale -- no --> Skew
  Skew -- yes --> Hist
  Skew -- no --> Corr
  Corr -- yes --> CrStat
  Corr -- no --> Idx
  Diff -- no --> Idx

Three failure modes show up over and over. Stale statistics: you bulk-loaded 5 million rows yesterday and ANALYZE hasn't run yet. The planner still thinks the table has 100k rows and chooses a nested loop that's now 50x slower than a hash join would be. Fix: ANALYZE. Better fix: tune autovacuum so it triggers earlier on write-heavy tables.
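
The same fix in runnable form; the table name is just an example, and the scale factor is a per-table override of the 10% default:

-- one-off repair after a bulk load
ANALYZE orders;
-- longer term: re-analyze this table after ~1% of its rows change instead of 10%
ALTER TABLE orders SET (autovacuum_analyze_scale_factor = 0.01);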

Skewed data: a column has 1 million distinct values, but 50% of rows have the value 'pending'. Default statistics may underestimate this skew, so the planner thinks WHERE status = 'pending' returns 1 in a million when it actually returns half. Fix with ALTER TABLE ... ALTER COLUMN status SET STATISTICS 1000 to capture more of the histogram.
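
Spelled out, assuming the skewed column is orders.status; the larger target only takes effect once ANALYZE has rebuilt the statistics:

ALTER TABLE orders ALTER COLUMN status SET STATISTICS 1000;  -- default target is 100
ANALYZE orders;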

Correlated columns: the planner usually assumes columns are independent. If you filter WHERE city = 'Tokyo' AND country = 'Japan', it multiplies the two selectivities, even though country = 'Japan' is implied by city = 'Tokyo'. Estimates collapse, plans go sideways. Postgres has CREATE STATISTICS for multi-column dependencies — use it when this pattern appears.
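
A sketch of that fix, assuming the columns live in an addresses table; the extended statistics are only populated by the next ANALYZE:

CREATE STATISTICS addresses_city_country (dependencies) ON city, country FROM addresses;
ANALYZE addresses;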

A practical workflow: when a query is slow, run EXPLAIN ANALYZE, find the node where estimated and actual row counts diverge most, and trace that mis-estimate to one of the three causes above.

Indexes the Planner Will Actually Use

[Chart: index scan vs. sequential scan as a function of selectivity. Index scan for highly selective predicates (WHERE id = 42, under roughly 5% of rows matching), bitmap heap scan for moderate selectivity (WHERE city = 'Tokyo', up to roughly 25%), sequential scan beyond that (WHERE active = true, 90% true).]
Indexes only help when the predicate filters most rows out. Past roughly 25% selectivity, a sequential scan through contiguous pages almost always wins.

An index is only useful if the planner picks it. Several common mistakes lead to indexes sitting unused on disk.

A query like WHERE LOWER(email) = 'a@b.com' cannot use an index on email — the function transforms the value the index has stored. Either store the data already lowercased, or build a functional index: CREATE INDEX ON users (LOWER(email)).
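
Side by side, the failing and the working form (the index name is arbitrary):

-- a plain index on email cannot serve this predicate: the stored keys were never lowercased
EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'a@b.com';   -- Seq Scan + Filter
CREATE INDEX users_email_lower_idx ON users (LOWER(email));
EXPLAIN SELECT * FROM users WHERE LOWER(email) = 'a@b.com';   -- Index Scan using users_email_lower_idx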

Leading-wildcard LIKE patterns (WHERE name LIKE '%smith') cannot use a B-tree, because B-trees rely on prefix ordering. Trailing-wildcard ('smith%') can. For full substring search, a trigram index (pg_trgm) is the right tool.
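
A minimal pg_trgm setup, if substring search is genuinely needed (the extension must be available on the server, and GIN indexes are not small):

CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_name_trgm_idx ON users USING gin (name gin_trgm_ops);
EXPLAIN SELECT * FROM users WHERE name LIKE '%smith%';
-- can now use a bitmap index scan on the trigram index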

Composite indexes follow leftmost-prefix rules. An index on (country, city, age) helps queries that filter by country, or country + city, or all three. It does not help a query that filters only by city or only by age.
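
Concretely, with a hypothetical people table:

CREATE INDEX people_geo_idx ON people (country, city, age);
-- can use the index: the leading column is constrained
EXPLAIN SELECT * FROM people WHERE country = 'JP' AND city = 'Tokyo';
-- cannot use it for an index scan: the leading column is missing
EXPLAIN SELECT * FROM people WHERE city = 'Tokyo';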

Type mismatches can also defeat an index. In MySQL, comparing a VARCHAR column to a numeric literal casts the stored value on every row, so the index is skipped silently; Postgres refuses the comparison outright, which at least fails loudly. Either way, keep the literal's type matched to the column's.

Before tuning indexes, format your query so it's actually readable. Long inline SELECT statements with nested subqueries hide the structure that determines plan shape. Drop your query into the SQL Formatter to see the join order and predicates clearly. When you're ready to ship a hot query into application code, the SQL Minifier collapses it back to a single-line form for embedding in source. If your query lives inside JSON config or an API payload, the JSON Formatter helps you extract it cleanly first.

Build Intuition With a Tiny EXPLAIN Lab

You don't need a production database to build intuition. Spin up a local Postgres or use the in-browser SQL Online Runner to play with plan shapes. A useful starter exercise:

CREATE TABLE events (
  id SERIAL PRIMARY KEY,
  user_id INT,
  type TEXT,
  created_at TIMESTAMP
);

INSERT INTO events (user_id, type, created_at)
SELECT (random()*1000)::int, 'click', now() - (random()*30 || ' days')::interval
FROM generate_series(1, 100000);

ANALYZE events;
EXPLAIN ANALYZE SELECT * FROM events WHERE user_id = 42;

You'll see a sequential scan, since there's no index yet. Add one with CREATE INDEX ON events (user_id); and rerun: now you'll see an index scan (or a bitmap heap scan) with a much lower cost. Then index created_at as well and try a query that returns 50% of rows (WHERE created_at > now() - interval '15 days'); the planner will refuse that index, exactly as it should. The commands are spelled out below.
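
The same steps in runnable form, with the extra created_at index added so the planner has something to decline:

CREATE INDEX ON events (user_id);
EXPLAIN ANALYZE SELECT * FROM events WHERE user_id = 42;
-- expect an index or bitmap heap scan over roughly a hundred rows

CREATE INDEX ON events (created_at);
EXPLAIN ANALYZE SELECT * FROM events WHERE created_at > now() - interval '15 days';
-- expect a sequential scan despite the index: ~50% of rows match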

A note on hints: some databases (Oracle, MySQL, SQL Server) let you embed planner hints directly in the query. Postgres deliberately doesn't, though the pg_hint_plan extension exists. Hints are seductive and almost always wrong — they freeze a plan that was good for the data shape on the day you wrote it. When data grows or distributions shift, the hint stays and the planner can't help. The right response to a bad plan is almost never a hint; it's to fix the statistics, add or drop an index, or rewrite the query to be more selective.

What to Take Away

Three things will catch 80% of query performance problems.

First, run EXPLAIN ANALYZE and compare estimated vs actual row counts. Mismatches are where the planner is flying blind.

Second, run ANALYZE after bulk loads, schema changes, or any time a query gets mysteriously slower. Stale statistics are the silent killer.

Third, learn the three join strategies — nested loop, hash, merge — and the conditions that make each one cheap. Once you can predict which one the planner should pick, it becomes obvious when it picked wrong.

For related reading on this site, Understanding Database Indexes covers B-tree structure and selectivity in more depth, SQL Basics: Zero to Queries is the right starting point if any syntax above felt unfamiliar, and NoSQL vs SQL discusses when a relational planner is the right tool at all. For external reading: the PostgreSQL EXPLAIN documentation is the canonical reference, Use the Index, Luke! is the best book-length guide to indexing, the Wikipedia article on query optimization covers the academic foundations, and the SQLite query planner overview shows how a different engine makes the same decisions.

The planner is not magic. It's a cost estimator with imperfect information, doing its best to translate your intent into bytes off disk. Once you can read what it's thinking, fixing slow queries stops being guesswork.

FAQ

Why does the same query suddenly become slow without any code change?

Almost always stale statistics. After a bulk load, schema migration, or unusual write pattern, the planner's row-count estimates drift away from reality. Run ANALYZE table_name; (Postgres) or ANALYZE TABLE (MySQL) and re-check EXPLAIN. If the new plan beats the old one, autovacuum is running too rarely on that table — tune autovacuum_analyze_scale_factor or schedule manual ANALYZEs for write-heavy tables.

Should I use planner hints to force a specific plan?

Almost never. Hints freeze a plan that was good for the data shape on the day you wrote it; when distributions shift, the hint stays and the planner can't help. Postgres deliberately omits hints for this reason (though pg_hint_plan exists). MySQL, Oracle, and SQL Server have them, but the right response to a bad plan is fix the statistics, add or drop an index, or rewrite the query — not pin it.

How is `EXPLAIN ANALYZE` different from regular `EXPLAIN`?

EXPLAIN shows estimates only; the query doesn't run. EXPLAIN ANALYZE actually executes the query and shows estimates alongside actual row counts and timing. The actual row counts are the most useful debugging signal — when estimated and actual diverge by 10× or more, the planner was flying blind. Beware: EXPLAIN ANALYZE really runs the statement, so don't use it on a DELETE or UPDATE unless you want the side effects (wrap it in a transaction and roll back if you must).

Why won't my index be used even though the column is indexed?

Common reasons: the WHERE clause wraps the column in a function (LOWER(email)), the column type doesn't match the literal type (text column compared to numeric), the leading-prefix rule for composite indexes isn't met, the index isn't selective enough (boolean column with 90% true values), or the planner correctly decided a sequential scan is faster. Check EXPLAIN to see the actual cost comparison; the planner is usually right.

What's the difference between a hash join and a merge join?

Hash join builds an in-memory hash table from the smaller side, then probes with the larger side — linear in both inputs, no sort needed. Merge join requires both sides already sorted on the join key (often from index scans) and walks them in lockstep — memory-efficient on huge inputs. The planner picks based on input sizes, available indexes, and work_mem — hash join wins for small-to-medium joins, merge join wins for already-sorted huge inputs.

How big does my data need to be before query planning matters?

Tens of thousands of rows is usually where you start to see plan differences. At a few hundred rows, sequential scan is faster than any index lookup regardless of plan choice. At a few million, plan quality dominates everything else — a wrong join order can make a 50ms query take 50 seconds. The threshold also depends on column selectivity: a 10K-row table with a unique-key lookup needs a good plan even at small scale.

Are stored procedures faster than ad-hoc queries?

Sometimes — they skip plan re-parsing and benefit from cached plans. But cached plans can also become stale across data shape changes (the "parameter sniffing" problem in SQL Server). For modern Postgres and MySQL, prepared statements give you most of the parsing savings without the operational complexity of stored procs. Don't migrate to procs purely for performance unless profiling shows parsing is the bottleneck.

What's the modern alternative to reading EXPLAIN output by hand?

Tools like explain.depesz.com for Postgres and tenser.dev visualize plan trees with timing heatmaps, making bottlenecks visually obvious. PgHero, pganalyze, and DataGrip's plan visualizer all do similar work. They don't change what the planner does — just make the output legible to humans, which is enough to catch most issues without memorizing operator vocabulary.