PostgreSQL Deep Dive Part 12 — Advanced Indexing Strategies and Optimization
Having established our foundation in index fundamentals, let's dive into the art and science of advanced indexing strategies. This is where the real performance optimization begins — it's not just about creating indexes, but about creating the right indexes for your specific use cases.
Published At
1/4/2025
Reading Time
~ 16 min read
Access complete, unfiltered study notes maintained on GitHub for easily maintaing Second Brain Dir
Strategic Index Creation: Where and When 🎨
Creating indexes is more art than science. While building a schema is scientific, indexing requires understanding your specific access patterns and query behaviors. Simply adding indexes to every column is a common pitfall — remember that each index is a separate data structure storing copied data, which can impact performance and storage.
-- Sample users table with common columns
CREATE TABLE users (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT,
birthday DATE,
is_pro BOOLEAN,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Common query pattern without index
SELECT * FROM users WHERE birthday = '1989-02-14';
-- Analyzing current performance
EXPLAIN SELECT * FROM users WHERE birthday = '1989-02-14';
-- Result shows: Seq Scan on users - scanning entire table
-- Sample users table with common columns
CREATE TABLE users (
id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
first_name TEXT,
last_name TEXT,
email TEXT,
birthday DATE,
is_pro BOOLEAN,
deleted_at TIMESTAMPTZ,
created_at TIMESTAMPTZ DEFAULT NOW(),
updated_at TIMESTAMPTZ DEFAULT NOW()
);
-- Common query pattern without index
SELECT * FROM users WHERE birthday = '1989-02-14';
-- Analyzing current performance
EXPLAIN SELECT * FROM users WHERE birthday = '1989-02-14';
-- Result shows: Seq Scan on users - scanning entire table
When analyzing potential index locations, consider these key factors:
- WHERE clause columns - The most obvious candidates for indexing, as they directly filter your data
- JOIN conditions - Critical for optimizing table relationships and preventing expensive operations
- ORDER BY clauses - Can eliminate costly sorting operations when properly indexed
- GROUP BY expressions - Important for aggregation performance
Understanding Index Selectivity 📊
Index selectivity is crucial for determining whether a column is a good candidate for indexing. It helps us understand if creating an index will actually improve query performance.
Think of it this way: if you have a column where most values are the same (like a first_name column where everyone is named "Abhishek"), an index won't help much since it won't narrow down the results effectively.
Two key concepts determine index effectiveness:
- Cardinality: The number of unique values in a column (e.g., a boolean column has cardinality of 2)
- Selectivity: The ratio of unique values to total rows (cardinality/total rows)
-- Calculating column cardinality and selectivity
SELECT
COUNT(DISTINCT birthday) as cardinality,
(COUNT(DISTINCT birthday)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as selectivity
FROM users;
-- Example results:
-- cardinality | selectivity
-- 10950 | 0.0111
-- Comparing with perfect selectivity (primary key)
SELECT
(COUNT(DISTINCT id)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as id_selectivity
FROM users;
-- Result: 1.0000
-- Calculating column cardinality and selectivity
SELECT
COUNT(DISTINCT birthday) as cardinality,
(COUNT(DISTINCT birthday)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as selectivity
FROM users;
-- Example results:
-- cardinality | selectivity
-- 10950 | 0.0111
-- Comparing with perfect selectivity (primary key)
SELECT
(COUNT(DISTINCT id)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as id_selectivity
FROM users;
-- Result: 1.0000
Key selectivity principles:
- Higher selectivity (closer to 1.0) = better index candidate - like primary keys which have perfect selectivity
- Very low selectivity (like 0.0000024 for boolean columns) might lead PostgreSQL to ignore the index
- PostgreSQL uses internal statistics to decide whether to use indexes - if a query returns more than ~50% of rows, it might skip the index entirely
- Statistics can be updated manually with ANALYZE or automatically via auto-vacuum
Mastering Composite Indexes: Understanding the Left-Most Prefix Rule 🔧
Composite indexes can dramatically improve query performance by indexing multiple columns together. While PostgreSQL can combine results from multiple single-column indexes, a well-designed composite index usually yields better performance, especially for queries that filter on multiple columns.
Creating and Using Composite Indexes
Let's look at a practical example:
-- Creating a composite index on three columns
CREATE INDEX multi_idx ON users USING btree(first_name, last_name, birthday);
-- These queries effectively use the index:
SELECT * FROM users WHERE first_name = 'Abhishek';
-- Uses index (microseconds)
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index (microseconds)
-- The order of conditions doesn't matter - both use the index efficiently:
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index
SELECT * FROM users WHERE last_name = 'Singh' AND first_name = 'Abhishek';
-- Also uses index
-- This query won't use the index (takes ~6 seconds):
SELECT * FROM users WHERE last_name = 'Singh'; -- Sequential scan
-- Creating a composite index on three columns
CREATE INDEX multi_idx ON users USING btree(first_name, last_name, birthday);
-- These queries effectively use the index:
SELECT * FROM users WHERE first_name = 'Abhishek';
-- Uses index (microseconds)
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index (microseconds)
-- The order of conditions doesn't matter - both use the index efficiently:
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index
SELECT * FROM users WHERE last_name = 'Singh' AND first_name = 'Abhishek';
-- Also uses index
-- This query won't use the index (takes ~6 seconds):
SELECT * FROM users WHERE last_name = 'Singh'; -- Sequential scan
Understanding the Left-Most Prefix Rule
The Left-Most Prefix Rule is fundamental to composite index behavior. Here's how it works:
- Column Order in Index Definition:
- The order of columns in your
CREATE INDEX
statement is crucial - The query optimizer respects this order, regardless of the order of conditions in your WHERE clause
- Think of it as a hierarchical phone book where you must know the first name to find the last name
- The order of columns in your
- B-tree Structure:
- The composite index creates a nested B-tree structure
- first_name is the primary sort key
- last_name is sorted within each first_name group
- birthday is sorted within each first_name + last_name combination
- Key Rules:
- Must use columns left-to-right from the index definition
- Cannot skip the leftmost column(s)
- PostgreSQL has optimizations for middle column access, but the leftmost column is always required
- Range conditions > or < or BETWEEN typically stop further index usage
Important Implications
-- These queries CAN use the index:
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh'
AND birthday = '1990-01-01'; -- Uses all columns
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND birthday = '1990-01-01'; -- PostgreSQL can skip last_name
-- These queries CANNOT use the index effectively:
SELECT * FROM users
WHERE last_name = 'Singh'; -- Missing leftmost column
SELECT * FROM users
WHERE birthday = '1990-01-01'; -- Missing leftmost column
-- These queries CAN use the index:
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh'
AND birthday = '1990-01-01'; -- Uses all columns
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND birthday = '1990-01-01'; -- PostgreSQL can skip last_name
-- These queries CANNOT use the index effectively:
SELECT * FROM users
WHERE last_name = 'Singh'; -- Missing leftmost column
SELECT * FROM users
WHERE birthday = '1990-01-01'; -- Missing leftmost column
If you frequently need to search by last_name
alone, you have two options:
- Create a separate index on
last_name
:
CREATE INDEX idx_lastname ON users(last_name);
CREATE INDEX idx_lastname ON users(last_name);
- Create a different composite index with
last_name
as the leftmost column:
CREATE INDEX idx_lastname_firstname ON users(last_name, first_name);
CREATE INDEX idx_lastname_firstname ON users(last_name, first_name);
This understanding of the Left-Most Prefix Rule and column ordering is crucial for designing efficient indexes that match your application's query patterns.
Understanding Composite Range Queries 🎯
Composite indexes with range conditions require special consideration in PostgreSQL, particularly in how the B-tree structure processes these queries. Understanding this behavior is crucial for optimizing query performance.
B-tree Index Behavior
In PostgreSQL, B-tree indexes organize data hierarchically in a sorted structure. When designing composite indexes that include range conditions, the order of columns significantly impacts how PostgreSQL traverses the B-tree:
- Each level contains keys dividing data into ranges
- The tree is traversed top-to-bottom for matching records
- The leftmost prefix is crucial for efficient navigation
Index Structure Comparison
Let's examine two different approaches to indexing the same columns:
-- Approach 1: Range condition last (Optimal)
CREATE INDEX first_last_birth ON users USING btree(
first_name,
last_name,
birthday
);
-- Approach 2: Range condition in middle (Suboptimal)
CREATE INDEX first_birth_last ON users USING btree(
first_name,
birthday,
last_name
);
-- Example query using these indexes
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh'
AND birthday < '1989-12-31';
-- Approach 1: Range condition last (Optimal)
CREATE INDEX first_last_birth ON users USING btree(
first_name,
last_name,
birthday
);
-- Approach 2: Range condition in middle (Suboptimal)
CREATE INDEX first_birth_last ON users USING btree(
first_name,
birthday,
last_name
);
-- Example query using these indexes
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh'
AND birthday < '1989-12-31';
Performance Implications
Optimal Approach (first_last_birth)
- PostgreSQL first matches
first_name = 'Abhishek'
- Navigates directly to the correct branch
- Then matches
last_name = 'Singh'
- Further narrows within that branch
- Finally, scans
birthday < '1989-12-31'
- Sequential scan within already narrowed range
Suboptimal Approach (first_birth_last)
- Matches
first_name = 'Abhishek'
- For
birthday < '1989-12-31'
:- Must scan multiple paths
- Widens the search space
last_name
comparison becomes less effective- Range condition has already broadened the search
Best Practices for Range Conditions
- Place equality conditions first
- Most selective columns should lead
- Creates distinct, narrow paths in B-tree
- Enables precise search space reduction
- Put range conditions last
- Range conditions create multiple potential paths
- Placing them last minimizes the search space
- Prevents undermining subsequent column filters
- Consider query patterns when ordering columns
- Match index structure to common query patterns
- Monitor query performance with EXPLAIN
- Create specific indexes for critical queries
Index Design Guidelines
When designing composite indexes with range conditions:
-- Good: Range condition at end
CREATE INDEX good_index ON users(
equality_column1, -- Most selective
equality_column2, -- Less selective
range_column -- Range conditions last
);
-- Avoid: Range condition in middle
CREATE INDEX suboptimal_index ON users(
equality_column1,
range_column, -- Creates inefficient scan
equality_column2
);
-- Good: Range condition at end
CREATE INDEX good_index ON users(
equality_column1, -- Most selective
equality_column2, -- Less selective
range_column -- Range conditions last
);
-- Avoid: Range condition in middle
CREATE INDEX suboptimal_index ON users(
equality_column1,
range_column, -- Creates inefficient scan
equality_column2
);
Following these principles ensures optimal B-tree utilization and query performance. The key is understanding that once a range condition is encountered, PostgreSQL must consider multiple paths in the B-tree, making any subsequent column conditions less effective for filtering.
Combining Multiple Indexes 🔄
PostgreSQL can use multiple indexes in a single query when appropriate:
-- Creating separate indexes
CREATE INDEX idx_first_name ON users(first_name);
CREATE INDEX idx_last_name ON users(last_name);
-- PostgreSQL can combine these indexes for queries like:
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh';
-- Check the execution plan
EXPLAIN ANALYZE
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh';
-- Creating separate indexes
CREATE INDEX idx_first_name ON users(first_name);
CREATE INDEX idx_last_name ON users(last_name);
-- PostgreSQL can combine these indexes for queries like:
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh';
-- Check the execution plan
EXPLAIN ANALYZE
SELECT * FROM users
WHERE first_name = 'Abhishek'
AND last_name = 'Singh';
Index combination strategies:
- AND conditions might use multiple indexes
- OR conditions usually need separate indexes
- Cost-based optimizer decides final approach
Covering Indexes: Optimizing Query Performance 🎯
Covering indexes represent one of PostgreSQL's most powerful optimization techniques. They can eliminate heap lookups entirely, providing significant performance benefits:
-- Basic index
CREATE INDEX idx_first ON users(first_name);
-- This index only stores first_name and a heap pointer (ctid)
-- Covering index
CREATE INDEX idx_first_cover ON users(first_name) INCLUDE (id);
-- Stores first_name in B-tree structure and id in leaf nodes only
-- The difference becomes clear in execution plans
EXPLAIN
SELECT first_name, last_name
FROM users
WHERE first_name = 'Abhishek';
-- Requires heap lookup because last_name isn't in index
-- You'll see "Index Scan" in the plan
EXPLAIN
SELECT first_name, id
FROM users
WHERE first_name = 'Abhishek';
-- Index only scan! No heap access needed
-- You'll see "Index Only Scan" in the plan
-- You can also create composite indexes for covering
CREATE INDEX idx_composite ON users(first_name, last_name);
-- Different from INCLUDE - both columns participate in B-tree
-- Basic index
CREATE INDEX idx_first ON users(first_name);
-- This index only stores first_name and a heap pointer (ctid)
-- Covering index
CREATE INDEX idx_first_cover ON users(first_name) INCLUDE (id);
-- Stores first_name in B-tree structure and id in leaf nodes only
-- The difference becomes clear in execution plans
EXPLAIN
SELECT first_name, last_name
FROM users
WHERE first_name = 'Abhishek';
-- Requires heap lookup because last_name isn't in index
-- You'll see "Index Scan" in the plan
EXPLAIN
SELECT first_name, id
FROM users
WHERE first_name = 'Abhishek';
-- Index only scan! No heap access needed
-- You'll see "Index Only Scan" in the plan
-- You can also create composite indexes for covering
CREATE INDEX idx_composite ON users(first_name, last_name);
-- Different from INCLUDE - both columns participate in B-tree
Important considerations:
- Don't include large columns
- Only include frequently accessed columns
- Balance storage cost against performance gain
Partial Indexes: Targeted Performance Boost 🎯
Partial indexes allow you to index only specific portions of your table, reducing overhead while maintaining performance where it matters:
-- Create partial index for pro users only
CREATE INDEX idx_email_pro ON users(email)
WHERE is_pro IS TRUE;
-- This query will use the partial index
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'abhishek@gmail.com'
AND is_pro IS TRUE; -- important to pass this to use index
-- Unique partial index for active users
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE deleted_at IS NULL;
-- Create partial index for pro users only
CREATE INDEX idx_email_pro ON users(email)
WHERE is_pro IS TRUE;
-- This query will use the partial index
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'abhishek@gmail.com'
AND is_pro IS TRUE; -- important to pass this to use index
-- Unique partial index for active users
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE deleted_at IS NULL;
This approach is particularly powerful for:
- Tables with distinct subsets of data
- Enforcing unique constraints on subset
- Optimizing specific query patterns
Index Ordering Strategies 🔄
Index ordering can significantly impact query performance, particularly for queries requiring sorted results. Understanding and optimizing index ordering ensures efficient execution of queries without unnecessary sorting operations.
Key Concepts for Index Ordering
-
Single-Column Indexes:
- When creating an index on a single column, the order (ascending or descending) typically does not matter for performance.
- PostgreSQL automatically scans the index in forward or reverse order depending on the query's
ORDER BY
clause.
sql-- Example of a single-column index CREATE INDEX idx_birthday ON users(birthday); -- Queries utilizing the index efficiently EXPLAIN SELECT * FROM users ORDER BY birthday LIMIT 10; EXPLAIN SELECT * FROM users ORDER BY birthday DESC LIMIT 10;
sql-- Example of a single-column index CREATE INDEX idx_birthday ON users(birthday); -- Queries utilizing the index efficiently EXPLAIN SELECT * FROM users ORDER BY birthday LIMIT 10; EXPLAIN SELECT * FROM users ORDER BY birthday DESC LIMIT 10;
-
Composite Indexes:
- For multi-column (composite) indexes, the order of columns and their sort direction matters significantly.
- PostgreSQL can perform backward scans for composite indexes where all columns share the same order direction.
sql-- Composite index with both columns in ascending order CREATE INDEX idx_birthday_created_at ON users(birthday, created_at); -- Queries utilizing the composite index efficiently EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10; EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
sql-- Composite index with both columns in ascending order CREATE INDEX idx_birthday_created_at ON users(birthday, created_at); -- Queries utilizing the composite index efficiently EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10; EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
-
Mixed Ordering Challenges:
- Queries with mixed ordering (e.g.,
birthday DESC, created_at ASC
) require incremental sorting if the index does not match the exact ordering. - Incremental sorting adds overhead and can reduce performance.
sql-- Example query requiring mixed ordering EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10; -- Output: Incremental sort is used if no suitable index exists.
sql-- Example query requiring mixed ordering EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10; -- Output: Incremental sort is used if no suitable index exists.
- Queries with mixed ordering (e.g.,
-
Fixing Mixed Ordering:
- Create an index that explicitly matches the query's ordering requirements.
- Use descending or mixed orderings as needed for your query patterns.
sql-- Index explicitly designed for mixed ordering CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC); -- This index optimizes the mixed ordering query EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
sql-- Index explicitly designed for mixed ordering CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC); -- This index optimizes the mixed ordering query EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
Design Strategies
When designing index ordering:
- Analyze Common Query Patterns:
- Study the
ORDER BY
clauses in your queries. - Determine the most frequently used sorting combinations.
- Study the
- Match Index Order to Query Requirements:
- Align the index ordering with the sorting directions in the queries.
- Avoid unnecessary indexes for rare query patterns.
- Leverage PostgreSQL's Scan Capabilities:
- PostgreSQL can perform forward and backward scans effectively when all columns in the index share the same order direction.
- Mixed orderings require explicit support via carefully designed composite indexes.
Example
Consider these queries and their corresponding index strategies:
-- Queries with ascending or descending order
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
-- Supported by the following index:
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
-- Query with mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
-- Supported by the following index:
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);
-- Queries with ascending or descending order
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
-- Supported by the following index:
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
-- Query with mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
-- Supported by the following index:
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);
Summary
- Single-column indexes adapt flexibly to query sort directions.
- Composite indexes require careful design to match query patterns.
- Mixed orderings should be explicitly supported when frequent queries demand it.
- Always evaluate query performance using
EXPLAIN
to ensure indexes are utilized effectively.
Ordering NULLs in Indexes 📊
PostgreSQL offers precise control over how NULL
values are handled in indexes and queries, allowing developers to tailor behavior to specific requirements:
-- Default NULL handling in queries
SELECT * FROM users
ORDER BY birthday DESC; -- NULLs appear last by default
-- Explicit NULL ordering
SELECT * FROM users
ORDER BY birthday DESC NULLS LAST; -- Makes the default behavior explicit
-- Creating an index with NULL ordering
CREATE INDEX idx_birthday_null_first
ON users(birthday ASC NULLS FIRST); -- NULLs are treated as the smallest values
-- Default NULL handling in queries
SELECT * FROM users
ORDER BY birthday DESC; -- NULLs appear last by default
-- Explicit NULL ordering
SELECT * FROM users
ORDER BY birthday DESC NULLS LAST; -- Makes the default behavior explicit
-- Creating an index with NULL ordering
CREATE INDEX idx_birthday_null_first
ON users(birthday ASC NULLS FIRST); -- NULLs are treated as the smallest values
Key Considerations for NULL Ordering:
- Default Behavior: In PostgreSQL,
NULL
values are treated as larger than any non-NULL value by default in descending order and smaller in ascending order. - Custom NULL Ordering in Queries: Use the
NULLS FIRST
orNULLS LAST
clause inORDER BY
to explicitly define howNULL
values should be handled in query results. - Index Definition: When creating indexes, you can specify
NULLS FIRST
orNULLS LAST
to optimize query performance for specific use cases.- For example, if most queries require
NULL
values to appear first in results, defining an index withNULLS FIRST
can speed up retrieval.
- For example, if most queries require
- Matching Indexes with Queries: To ensure optimal performance, align the
NULL
handling in your query'sORDER BY
clause with theNULL
ordering in your index definition.
By understanding and leveraging these capabilities, you can achieve more efficient query execution and consistent result ordering tailored to your application's needs.
Remember: The art of indexing isn't just about creating indexes - it's about creating the right indexes for your specific use cases and maintaining them effectively over time.
📇
Do you have any questions, or simply wish to contact me privately? Don't hesitate to shoot me a DM on Twitter.
Have a wonderful day.
Abhishek 🙏