Understanding Database Indexes: How They Work and When to Use Them

Understanding Database Indexes: How They Work and When to Use Them

At some point, every developer writes a query that works fine in development and runs for 30 seconds in production. The data grew, nobody noticed, and now you have a problem. Indexes are almost always the fix — but adding them without understanding them leads to a different set of problems: bloated storage, slow writes, and indexes the query planner never actually uses.

What an Index Actually Is

The most common database index type is a B-tree (balanced tree). Conceptually, it's a separate data structure maintained alongside your table that stores sorted copies of a column's values, with pointers back to the original rows.

Think of it like a book index. The book itself has information scattered across hundreds of pages. The index at the back lets you jump directly to "concurrency" on page 217 without reading the whole book. The database table is the book; the index is the lookup structure.

A B-tree index lets the database do binary search rather than a linear scan. For a table with 1 million rows, a full scan examines 1,000,000 rows. A B-tree binary search examines roughly 20 (log₂ of 1,000,000). That's a genuine order-of-magnitude difference on large tables.

Full Table Scan vs Index Scan

When you run a query, the database engine creates an execution plan. It decides whether to use an available index or just scan every row in the table. The decision is based on cost estimation.

-- Without an index on email:
-- Full table scan: reads every row, checks each one
SELECT * FROM users WHERE email = 'alice@example.com';

-- With an index on email:
-- Index scan: jumps directly to matching rows
SELECT * FROM users WHERE email = 'alice@example.com';

The query looks identical. The difference is entirely in the execution plan. You can see it with EXPLAIN:

EXPLAIN SELECT * FROM users WHERE email = 'alice@example.com';

PostgreSQL will show you whether it's doing a Seq Scan (sequential/full scan) or an Index Scan. For a detailed breakdown including actual timing:

EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@example.com';

EXPLAIN ANALYZE actually runs the query, so you get both the plan and the real execution times. It's the most useful diagnostic tool in your SQL toolkit. MySQL has the same feature with slightly different output format.

Cardinality and Selectivity

Not every column benefits equally from an index. The key concept is selectivity — how much an index narrows down the result set.

High cardinality means many distinct values: email addresses, UUIDs, timestamps. An index on a high-cardinality column is very selective — it eliminates most rows quickly.

Low cardinality means few distinct values: boolean flags, status codes with 3–5 values. An index on WHERE active = true when 90% of users are active is nearly useless. It eliminates almost nothing, and the query planner will often prefer a full scan anyway.

A rough rule: if a column's values aren't very selective (less than ~10% of rows for a typical query), an index on that column alone probably won't help and may be ignored by the planner.

Composite Indexes and Column Order

A composite index covers multiple columns:

CREATE INDEX idx_orders_user_status ON orders (user_id, status);

This index can speed up queries that filter on user_id alone, or on user_id AND status together. It cannot efficiently speed up queries that filter on status alone (without user_id).

This is the left-prefix rule: a composite index is usable for any prefix of its columns, left to right. So (user_id, status, created_at) helps queries that filter on:

  • user_id
  • user_id, status
  • user_id, status, created_at

But not on just status or just created_at.

Column order in composite indexes matters enormously. Put the most selective column first, and put columns used in equality conditions (WHERE col = ?) before columns used in range conditions (WHERE col > ?).

Covering Indexes

A covering index is one that contains all the columns a query needs — so the database can answer entirely from the index without touching the table at all.

-- Query: get names and emails of active users
SELECT name, email FROM users WHERE active = true;

-- Covering index for this query:
CREATE INDEX idx_users_active_name_email ON users (active, name, email);

When the index includes the WHERE column plus every SELECT column, the database never needs to look up the actual table row. For high-traffic read queries, this can be a meaningful performance win.

When Indexes Hurt

Indexes aren't free. Every index you add has costs:

Write overhead. Every INSERT, UPDATE, or DELETE must maintain all indexes on that table. A table with eight indexes takes eight times as many index updates on write. For tables with heavy write traffic — event logs, audit tables, queues — that overhead adds up.

Storage. Indexes take disk space. A large table with many indexes can have indexes as large as the table itself. On cloud databases priced by storage, this matters.

Query planner confusion. Too many similar or redundant indexes means the planner has more options to consider and can occasionally make a worse choice. Keep your index set clean.

Indexes You Probably Need

Primary keys — your database creates these automatically. They're unique indexes on the PK column.

Foreign keys — if you JOIN on orders.user_id = users.id, you need an index on orders.user_id. Many databases don't create these automatically (PostgreSQL doesn't; MySQL's InnoDB does). Missing FK indexes are a common source of slow JOINs.

Columns in frequent WHERE clauses — if you query by email, username, or any field that acts as a lookup key, it needs an index.

Columns you ORDER BY on large result sets — sorting a large table without an index is expensive.

Indexes That Are Probably Wasted

Low-cardinality boolean or status columns in isolation — as covered above, these rarely help.

Columns that are never actually queried — review your index list occasionally. Indexes accumulate over time as features are added and removed.

Duplicate or redundant indexes — if you have (user_id) and (user_id, status), the single-column index is redundant for any query the composite index covers. Drop it.

A Practical Debugging Workflow

When a query is slow:

  1. Run EXPLAIN ANALYZE on the slow query.
  2. Look for Seq Scan on large tables — that's where you need an index.
  3. Check whether an existing index exists but isn't being used (the column might not be selective enough, or the query might not match the index's left prefix).
  4. Add the index, re-run EXPLAIN ANALYZE, and confirm it's being used.
  5. Measure the actual query time before and after.

The PostgreSQL documentation on `EXPLAIN` is thorough and worth bookmarking. MySQL's EXPLAIN output is similar but formatted differently.

Our SQL Formatter is useful for cleaning up complex queries before you analyze them — a well-formatted query is much easier to reason about when you're reading an execution plan.

For context on how indexes relate to broader database design, see SQL Basics: From Zero to Writing Real Queries and NoSQL vs SQL: Choosing the Right Database.

Indexes are one of the highest-leverage tools in database performance. Understand them rather than just adding them blindly — you'll add the right ones, skip the useless ones, and know exactly where to look when a query slows down.