Database Index Optimization for Web Applications
An index is the fastest way to speed up a query, and also the fastest way to slow down writes, consume dozens of gigabytes of disk, and mislead the planner. Let's look at which indexes to add and which to remove.
Index Types in PostgreSQL
B-tree: the default; suitable for equality, ranges, ORDER BY, and LIKE 'prefix%' (the latter requires the C locale or a text_pattern_ops operator class).
GIN: for arrays, JSONB, tsvector (full-text search); supports operators such as @>, ?, @@.
GiST: for geometric types, range types, and full-text search (an alternative to GIN that is cheaper to update, but lossy and typically slower on lookup).
BRIN: for very large tables whose values correlate with physical row order (time-series metrics, logs). Minimal size, but slower than B-tree on lookup.
Hash: equality (=) only. Rarely needed; B-tree is usually the better choice.
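As a rough illustration of the list above, the sketch below creates one index of each type. The table events_demo and all of its columns are hypothetical and exist only to show the USING syntax; they are not part of the schema used elsewhere in this article.

```sql
-- Hypothetical table, used only to illustrate the syntax
CREATE TABLE events_demo (
    id         bigserial PRIMARY KEY,
    tenant_id  integer NOT NULL,
    tags       text[] NOT NULL DEFAULT '{}',
    period     tstzrange,
    token      text,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- B-tree is the default when USING is omitted
CREATE INDEX idx_events_demo_tenant  ON events_demo (tenant_id);
-- GIN for array containment (tags @> ARRAY['x'])
CREATE INDEX idx_events_demo_tags    ON events_demo USING GIN (tags);
-- GiST for range overlap (period && tstzrange(...))
CREATE INDEX idx_events_demo_period  ON events_demo USING GiST (period);
-- BRIN for an append-only timestamp column
CREATE INDEX idx_events_demo_created ON events_demo USING BRIN (created_at);
-- Hash for equality-only lookups
CREATE INDEX idx_events_demo_token   ON events_demo USING HASH (token);
```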
Mandatory Indexes
All FK columns: without an index, every DELETE of a parent row (or UPDATE of its key) forces a sequential scan of the child table to check the constraint:
-- PostgreSQL auto-indexes PK, but not FK
CREATE INDEX idx_products_category_id ON products (category_id);
CREATE INDEX idx_order_items_order_id ON order_items (order_id);
CREATE INDEX idx_order_items_product_id ON order_items (product_id);
CREATE INDEX idx_comments_post_id ON comments (post_id);
CREATE INDEX idx_comments_user_id ON comments (user_id);
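Missing FK indexes can also be found from the system catalogs rather than by reading the schema. The query below is a minimal sketch under two simplifying assumptions: it only looks at single-column foreign keys, and it treats any index whose first column is the FK column as sufficient (including partial indexes, which may not actually cover the check).

```sql
-- Single-column foreign keys with no index that starts with the FK column
SELECT
    c.conrelid::regclass::text AS child_table,
    a.attname                  AS fk_column,
    c.conname                  AS constraint_name
FROM pg_constraint c
JOIN pg_attribute a
  ON a.attrelid = c.conrelid
 AND a.attnum = c.conkey[1]          -- conkey is a 1-based int2[]
WHERE c.contype = 'f'
  AND array_length(c.conkey, 1) = 1
  AND NOT EXISTS (
      SELECT 1
      FROM pg_index i
      WHERE i.indrelid = c.conrelid
        AND i.indkey[0] = c.conkey[1] -- indkey is a 0-based int2vector
  )
ORDER BY child_table, fk_column;
```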
Unique constraints — automatically create an index:
ALTER TABLE users ADD CONSTRAINT uq_users_email UNIQUE (email);
ALTER TABLE products ADD CONSTRAINT uq_products_slug UNIQUE (slug);
Composite Indexes
Column order is critical. Rule: equality first, range/sort last.
-- Query: WHERE status = 'published' AND created_at > '2024-01-01' ORDER BY created_at DESC
-- Correct composite index:
CREATE INDEX idx_products_status_created ON products (status, created_at DESC);
-- Wrong: range condition first — index partially used
CREATE INDEX idx_products_created_status ON products (created_at, status); -- worse
Check: EXPLAIN (ANALYZE, BUFFERS) SELECT ... WHERE status = 'published' ORDER BY created_at DESC LIMIT 20;
Expect: Index Scan using idx_products_status_created, with no "Rows Removed by Filter" line (ideal) or only a small number of rows removed.
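The column-order rule can be checked against a few query shapes. The sketch below assumes the products table and the idx_products_status_created index from above; the 'draft' status value and the LIMIT are illustrative assumptions, and the actual plan still depends on table size and statistics.

```sql
-- Can use the index fully: equality on the leading column, range and sort on the second
EXPLAIN
SELECT id FROM products
WHERE status = 'published' AND created_at > '2024-01-01'
ORDER BY created_at DESC
LIMIT 20;

-- Can use the index: predicate on the leading column only
EXPLAIN
SELECT count(*) FROM products WHERE status = 'draft';

-- Usually cannot use it efficiently: the leading column is missing from the predicate
EXPLAIN
SELECT id FROM products WHERE created_at > '2024-01-01';
```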
Partial Indexes
A partial index covers only a subset of rows, so it is smaller, faster to build, cheaper to maintain, and more selective:
-- Index only for published products
CREATE INDEX idx_products_published_created
ON products (created_at DESC)
WHERE status = 'published';
-- Index for incomplete orders (few of them)
CREATE INDEX idx_orders_pending
ON orders (user_id, created_at DESC)
WHERE status IN ('pending', 'processing');
-- For soft delete: index only active records
CREATE INDEX idx_users_active_email
ON users (email)
WHERE deleted_at IS NULL;
A partial index is used only when the planner can prove that the query's WHERE clause implies the index predicate; in practice, repeat the index condition in the query.
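To make the matching rule concrete, here is a sketch against the idx_orders_pending index defined above. The user_id value and the 'shipped' status are made-up examples. Note that with prepared statements the planner may not be able to prove the implication when the status is passed as a parameter.

```sql
-- Eligible: the predicate is identical to the index condition
EXPLAIN
SELECT * FROM orders
WHERE user_id = 42 AND status IN ('pending', 'processing')
ORDER BY created_at DESC;

-- Expected to be eligible: status = 'pending' implies the index condition
EXPLAIN
SELECT * FROM orders
WHERE user_id = 42 AND status = 'pending'
ORDER BY created_at DESC;

-- Not eligible: nothing restricts status, so rows outside the index may qualify
EXPLAIN
SELECT * FROM orders WHERE user_id = 42;

-- Not eligible: 'shipped' rows are not in the index at all
EXPLAIN
SELECT * FROM orders WHERE user_id = 42 AND status = 'shipped';
```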
Covering Indexes (INCLUDE)
PostgreSQL 11+ supports INCLUDE, which stores extra columns in the index leaf pages without making them part of the key or affecting sort order:
-- Query: SELECT id, title, price FROM products WHERE status = 'published' ORDER BY created_at DESC
-- Covering index — no heap fetch needed
CREATE INDEX idx_products_published_cover
ON products (status, created_at DESC)
INCLUDE (id, title, price);
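With this index the query can run as an Index Only Scan: data comes directly from the index. Heap reads are skipped only for pages marked all-visible in the visibility map, so the table must be vacuumed regularly for the benefit to hold.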
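One way to verify this, sketched here with the products table and idx_products_published_cover index from above (the LIMIT is an added assumption to keep the output small):

```sql
-- Refresh the visibility map and planner statistics first
VACUUM (ANALYZE) products;

EXPLAIN (ANALYZE, BUFFERS)
SELECT id, title, price
FROM products
WHERE status = 'published'
ORDER BY created_at DESC
LIMIT 20;
-- Look for "Index Only Scan using idx_products_published_cover"
-- and a "Heap Fetches" value at or near zero
```

A high Heap Fetches count on a frequently updated table usually means autovacuum is not keeping up, not that the index is wrong.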
GIN for JSONB
-- specs: {"ram": "16GB", "storage": "512GB", "os": "linux"}
CREATE INDEX idx_products_specs ON products USING GIN (specs);
-- Search by key-value membership
SELECT * FROM products WHERE specs @> '{"os": "linux"}';
-- Search by key presence
SELECT * FROM products WHERE specs ? 'ram';
-- GIN with jsonb_path_ops: smaller and faster for @>, but does not support key-existence operators (?, ?|, ?&)
CREATE INDEX idx_products_specs_path ON products USING GIN (specs jsonb_path_ops);
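A common pitfall is filtering with the ->> operator, which the GIN operator classes for JSONB do not cover. The sketch below assumes the products.specs column and GIN index from above; the two queries return the same rows only when the value under "os" is a top-level string.

```sql
-- Cannot use the GIN index on specs: ->> with = is not an indexable operator for GIN
EXPLAIN
SELECT * FROM products WHERE specs->>'os' = 'linux';

-- Containment form of the same filter, which the GIN index can serve
EXPLAIN
SELECT * FROM products WHERE specs @> '{"os": "linux"}';
```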
Finding Unused Indexes
-- Indexes never scanned since statistics were last reset (pg_stat_reset)
SELECT
    schemaname,
    relname AS table_name,
    indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND schemaname = 'public'
ORDER BY pg_relation_size(indexrelid) DESC;
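Indexes with idx_scan = 0 are candidates for removal. Exceptions: unique and primary-key indexes, which enforce constraints on INSERT/UPDATE even if no query ever scans them. Usage counters are kept per server, so check read replicas as well before dropping anything.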
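Before trusting a zero counter, it helps to know how long statistics have been accumulating, and to filter out constraint-backing indexes automatically. The following is a sketch of both steps using only system catalogs and statistics views.

```sql
-- When were statistics for this database last reset?
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();

-- Unused indexes, excluding unique indexes and indexes that back a constraint
SELECT
    s.schemaname,
    s.relname      AS table_name,
    s.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique
  AND NOT EXISTS (
      SELECT 1 FROM pg_constraint c WHERE c.conindid = s.indexrelid
  )
ORDER BY pg_relation_size(s.indexrelid) DESC;
```

If stats_reset is recent, or the workload has monthly or quarterly jobs, a zero count is weak evidence.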
Finding Duplicate Indexes
-- Lists pairs of indexes on the same table that share their first column.
-- These are overlap candidates to review, not only exact duplicates.
SELECT
    t.relname AS table_name,
    ix1.relname AS index1,
    ix2.relname AS index2,
    array_to_string(a1.attnames, ', ') AS columns1,
    array_to_string(a2.attnames, ', ') AS columns2
FROM pg_index i1
JOIN pg_index i2 ON i1.indrelid = i2.indrelid AND i1.indexrelid < i2.indexrelid
JOIN pg_class t ON t.oid = i1.indrelid
JOIN pg_class ix1 ON ix1.oid = i1.indexrelid
JOIN pg_class ix2 ON ix2.oid = i2.indexrelid
CROSS JOIN LATERAL (
    SELECT array_agg(a.attname ORDER BY ordinality) AS attnames
    FROM unnest(i1.indkey) WITH ORDINALITY AS u(attnum, ordinality)
    JOIN pg_attribute a ON a.attrelid = i1.indrelid AND a.attnum = u.attnum
) a1
CROSS JOIN LATERAL (
    SELECT array_agg(a.attname ORDER BY ordinality) AS attnames
    FROM unnest(i2.indkey) WITH ORDINALITY AS u(attnum, ordinality)
    JOIN pg_attribute a ON a.attrelid = i2.indrelid AND a.attnum = u.attnum
) a2
WHERE i1.indkey[0] = i2.indkey[0] -- first column matches
ORDER BY t.relname;
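For exact duplicates only (same table, same columns in the same order, same operator classes, same expressions and same partial predicate), a stricter grouping query can be used. This is a sketch adapted from a commonly used catalog pattern; it deliberately ignores uniqueness, so a unique and a non-unique index on identical columns are reported together.

```sql
SELECT
    indrelid::regclass              AS table_name,
    array_agg(indexrelid::regclass) AS duplicate_indexes,
    pg_size_pretty(sum(pg_relation_size(indexrelid))::bigint) AS total_size
FROM pg_index
GROUP BY
    indrelid,
    indkey::text,                   -- column list, in order
    indclass::text,                 -- operator classes (also distinguishes access methods)
    indoption::text,                -- ASC/DESC and NULLS FIRST/LAST flags
    coalesce(indexprs::text, ''),   -- expression index definitions
    coalesce(indpred::text, '')     -- partial index predicate
HAVING count(*) > 1
ORDER BY sum(pg_relation_size(indexrelid)) DESC;
```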
Creating Indexes Without Blocking
In production, create indexes only with CONCURRENTLY:
CREATE INDEX CONCURRENTLY idx_products_new ON products (new_column);
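Disadvantages of CONCURRENTLY: the build takes longer (two passes over the table), cannot run inside a transaction block, and leaves an INVALID index behind if it fails. In exchange, it does not block INSERT/UPDATE/DELETE during the build.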
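A failed or cancelled concurrent build leaves an index that is still maintained on every write but never used for reads. A minimal sketch for detecting and cleaning up such leftovers, reusing the idx_products_new name from the example above:

```sql
-- Indexes left invalid by a failed CREATE INDEX CONCURRENTLY
SELECT
    indexrelid::regclass AS index_name,
    indrelid::regclass   AS table_name
FROM pg_index
WHERE NOT indisvalid;

-- Drop the leftover without blocking writes, then retry the build
DROP INDEX CONCURRENTLY IF EXISTS idx_products_new;
CREATE INDEX CONCURRENTLY idx_products_new ON products (new_column);
```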
Bloat and Rebuild
Over time, indexes accumulate bloat and fragmentation. Check it:
-- via pgstattuple extension (requires CREATE EXTENSION pgstattuple)
SELECT * FROM pgstattuple('idx_products_status_created');
-- dead_tuple_percent > 20% — time to REINDEX
-- Rebuild without blocking (PG 12+)
REINDEX INDEX CONCURRENTLY idx_products_status_created;
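For B-tree indexes, the same extension also provides pgstatindex, which reports leaf density and fragmentation directly. The sketch below is an alternative check, not the only one; what counts as too low a density is a judgment call that depends on the index fillfactor and write pattern.

```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- B-tree only: low avg_leaf_density or high leaf_fragmentation suggests a rebuild
SELECT
    pg_size_pretty(index_size) AS index_size,
    leaf_pages,
    deleted_pages,
    avg_leaf_density,
    leaf_fragmentation
FROM pgstatindex('idx_products_status_created');
```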
Timelines
Audit of indexes in an existing project (unused indexes, duplicates, missing FK indexes, recommendations): 1 day. Designing and adding optimal indexes for a specific set of queries: 1–2 days.







