PostgreSQL Deep Dive Part 12 — Advanced Indexing Strategies and Optimization | Abhishek Kumar Singh

Access complete, unfiltered study notes maintained on GitHub for easily maintaing Second Brain Dir

Strategic Index Creation: Where and When 🎨

Creating indexes is more art than science. While building a schema is scientific, indexing requires understanding your specific access patterns and query behaviors. Simply adding indexes to every column is a common pitfall — remember that each index is a separate data structure storing copied data, which can impact performance and storage.

sql

-- Sample users table with common columns
CREATE TABLE users (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT,
    birthday DATE,
    is_pro BOOLEAN,
    deleted_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
 
-- Common query pattern without index
SELECT * FROM users WHERE birthday = '1989-02-14';
 
-- Analyzing current performance
EXPLAIN SELECT * FROM users WHERE birthday = '1989-02-14';
-- Result shows: Seq Scan on users - scanning entire table

sql

-- Sample users table with common columns
CREATE TABLE users (
    id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    first_name TEXT,
    last_name TEXT,
    email TEXT,
    birthday DATE,
    is_pro BOOLEAN,
    deleted_at TIMESTAMPTZ,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
 
-- Common query pattern without index
SELECT * FROM users WHERE birthday = '1989-02-14';
 
-- Analyzing current performance
EXPLAIN SELECT * FROM users WHERE birthday = '1989-02-14';
-- Result shows: Seq Scan on users - scanning entire table

When analyzing potential index locations, consider these key factors:

WHERE clause columns - The most obvious candidates for indexing, as they directly filter your data
JOIN conditions - Critical for optimizing table relationships and preventing expensive operations
ORDER BY clauses - Can eliminate costly sorting operations when properly indexed
GROUP BY expressions - Important for aggregation performance

Understanding Index Selectivity 📊

Index selectivity is crucial for determining whether a column is a good candidate for indexing. It helps us understand if creating an index will actually improve query performance.

Think of it this way: if you have a column where most values are the same (like a first_name column where everyone is named "Abhishek"), an index won't help much since it won't narrow down the results effectively.

Two key concepts determine index effectiveness:

Cardinality: The number of unique values in a column (e.g., a boolean column has cardinality of 2)
Selectivity: The ratio of unique values to total rows (cardinality/total rows)

sql

-- Calculating column cardinality and selectivity
SELECT
    COUNT(DISTINCT birthday) as cardinality,
    (COUNT(DISTINCT birthday)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as selectivity
FROM users;
 
-- Example results:
-- cardinality | selectivity
-- 10950       | 0.0111
 
-- Comparing with perfect selectivity (primary key)
SELECT
    (COUNT(DISTINCT id)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as id_selectivity
FROM users;
-- Result: 1.0000

sql

-- Calculating column cardinality and selectivity
SELECT
    COUNT(DISTINCT birthday) as cardinality,
    (COUNT(DISTINCT birthday)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as selectivity
FROM users;
 
-- Example results:
-- cardinality | selectivity
-- 10950       | 0.0111
 
-- Comparing with perfect selectivity (primary key)
SELECT
    (COUNT(DISTINCT id)::DECIMAL / COUNT(*)::DECIMAL)::DECIMAL(7,4) as id_selectivity
FROM users;
-- Result: 1.0000

Key selectivity principles:

Higher selectivity (closer to 1.0) = better index candidate - like primary keys which have perfect selectivity
Very low selectivity (like 0.0000024 for boolean columns) might lead PostgreSQL to ignore the index
PostgreSQL uses internal statistics to decide whether to use indexes - if a query returns more than ~50% of rows, it might skip the index entirely
Statistics can be updated manually with ANALYZE or automatically via auto-vacuum

Mastering Composite Indexes: Understanding the Left-Most Prefix Rule 🔧

Composite indexes can dramatically improve query performance by indexing multiple columns together. While PostgreSQL can combine results from multiple single-column indexes, a well-designed composite index usually yields better performance, especially for queries that filter on multiple columns.

Creating and Using Composite Indexes

Let's look at a practical example:

sql

-- Creating a composite index on three columns
CREATE INDEX multi_idx ON users USING btree(first_name, last_name, birthday);
 
-- These queries effectively use the index:
SELECT * FROM users WHERE first_name = 'Abhishek';
-- Uses index (microseconds)
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index (microseconds)
 
-- The order of conditions doesn't matter - both use the index efficiently:
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index
SELECT * FROM users WHERE last_name = 'Singh' AND first_name = 'Abhishek';
-- Also uses index
 
-- This query won't use the index (takes ~6 seconds):
SELECT * FROM users WHERE last_name = 'Singh';  -- Sequential scan

sql

-- Creating a composite index on three columns
CREATE INDEX multi_idx ON users USING btree(first_name, last_name, birthday);
 
-- These queries effectively use the index:
SELECT * FROM users WHERE first_name = 'Abhishek';
-- Uses index (microseconds)
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index (microseconds)
 
-- The order of conditions doesn't matter - both use the index efficiently:
SELECT * FROM users WHERE first_name = 'Abhishek' AND last_name = 'Singh';
-- Uses index
SELECT * FROM users WHERE last_name = 'Singh' AND first_name = 'Abhishek';
-- Also uses index
 
-- This query won't use the index (takes ~6 seconds):
SELECT * FROM users WHERE last_name = 'Singh';  -- Sequential scan

Understanding the Left-Most Prefix Rule

The Left-Most Prefix Rule is fundamental to composite index behavior. Here's how it works:

Column Order in Index Definition:
- The order of columns in your CREATE INDEX statement is crucial
- The query optimizer respects this order, regardless of the order of conditions in your WHERE clause
- Think of it as a hierarchical phone book where you must know the first name to find the last name
B-tree Structure:
- The composite index creates a nested B-tree structure
- first_name is the primary sort key
- last_name is sorted within each first_name group
- birthday is sorted within each first_name + last_name combination
Key Rules:
- Must use columns left-to-right from the index definition
- Cannot skip the leftmost column(s)
- PostgreSQL has optimizations for middle column access, but the leftmost column is always required
- Range conditions > or < or BETWEEN typically stop further index usage

Important Implications

sql

-- These queries CAN use the index:
SELECT * FROM users
WHERE first_name = 'Abhishek'
  AND last_name = 'Singh'
  AND birthday = '1990-01-01';  -- Uses all columns
 
SELECT * FROM users
WHERE first_name = 'Abhishek'
  AND birthday = '1990-01-01';  -- PostgreSQL can skip last_name
 
-- These queries CANNOT use the index effectively:
SELECT * FROM users
WHERE last_name = 'Singh';  -- Missing leftmost column
 
SELECT * FROM users
WHERE birthday = '1990-01-01';  -- Missing leftmost column

sql

-- These queries CAN use the index:
SELECT * FROM users
WHERE first_name = 'Abhishek'
  AND last_name = 'Singh'
  AND birthday = '1990-01-01';  -- Uses all columns
 
SELECT * FROM users
WHERE first_name = 'Abhishek'
  AND birthday = '1990-01-01';  -- PostgreSQL can skip last_name
 
-- These queries CANNOT use the index effectively:
SELECT * FROM users
WHERE last_name = 'Singh';  -- Missing leftmost column
 
SELECT * FROM users
WHERE birthday = '1990-01-01';  -- Missing leftmost column

If you frequently need to search by last_name alone, you have two options:

Create a separate index on last_name:

sql

CREATE INDEX idx_lastname ON users(last_name);

sql

CREATE INDEX idx_lastname ON users(last_name);

Create a different composite index with last_name as the leftmost column:

sql

CREATE INDEX idx_lastname_firstname ON users(last_name, first_name);

sql

CREATE INDEX idx_lastname_firstname ON users(last_name, first_name);

This understanding of the Left-Most Prefix Rule and column ordering is crucial for designing efficient indexes that match your application's query patterns.

Understanding Composite Range Queries 🎯

Composite indexes with range conditions require special consideration in PostgreSQL, particularly in how the B-tree structure processes these queries. Understanding this behavior is crucial for optimizing query performance.

B-tree Index Behavior

In PostgreSQL, B-tree indexes organize data hierarchically in a sorted structure. When designing composite indexes that include range conditions, the order of columns significantly impacts how PostgreSQL traverses the B-tree:

Each level contains keys dividing data into ranges
The tree is traversed top-to-bottom for matching records
The leftmost prefix is crucial for efficient navigation

Index Structure Comparison

Let's examine two different approaches to indexing the same columns:

sql

-- Approach 1: Range condition last (Optimal)
CREATE INDEX first_last_birth ON users USING btree(
    first_name,
    last_name,
    birthday
);
 
-- Approach 2: Range condition in middle (Suboptimal)
CREATE INDEX first_birth_last ON users USING btree(
    first_name,
    birthday,
    last_name
);
 
-- Example query using these indexes
SELECT * FROM users
WHERE first_name = 'Abhishek'
    AND last_name = 'Singh'
    AND birthday < '1989-12-31';

sql

-- Approach 1: Range condition last (Optimal)
CREATE INDEX first_last_birth ON users USING btree(
    first_name,
    last_name,
    birthday
);
 
-- Approach 2: Range condition in middle (Suboptimal)
CREATE INDEX first_birth_last ON users USING btree(
    first_name,
    birthday,
    last_name
);
 
-- Example query using these indexes
SELECT * FROM users
WHERE first_name = 'Abhishek'
    AND last_name = 'Singh'
    AND birthday < '1989-12-31';

Performance Implications

Optimal Approach (first_last_birth)

PostgreSQL first matches first_name = 'Abhishek'
- Navigates directly to the correct branch
Then matches last_name = 'Singh'
- Further narrows within that branch
Finally, scans birthday < '1989-12-31'
- Sequential scan within already narrowed range

Suboptimal Approach (first_birth_last)

Matches first_name = 'Abhishek'
For birthday < '1989-12-31':
- Must scan multiple paths
- Widens the search space
last_name comparison becomes less effective
- Range condition has already broadened the search

Best Practices for Range Conditions

Place equality conditions first
- Most selective columns should lead
- Creates distinct, narrow paths in B-tree
- Enables precise search space reduction
Put range conditions last
- Range conditions create multiple potential paths
- Placing them last minimizes the search space
- Prevents undermining subsequent column filters
Consider query patterns when ordering columns
- Match index structure to common query patterns
- Monitor query performance with EXPLAIN
- Create specific indexes for critical queries

Index Design Guidelines

When designing composite indexes with range conditions:

sql

-- Good: Range condition at end
CREATE INDEX good_index ON users(
    equality_column1,    -- Most selective
    equality_column2,    -- Less selective
    range_column        -- Range conditions last
);
 
-- Avoid: Range condition in middle
CREATE INDEX suboptimal_index ON users(
    equality_column1,
    range_column,       -- Creates inefficient scan
    equality_column2
);

sql

-- Good: Range condition at end
CREATE INDEX good_index ON users(
    equality_column1,    -- Most selective
    equality_column2,    -- Less selective
    range_column        -- Range conditions last
);
 
-- Avoid: Range condition in middle
CREATE INDEX suboptimal_index ON users(
    equality_column1,
    range_column,       -- Creates inefficient scan
    equality_column2
);

Following these principles ensures optimal B-tree utilization and query performance. The key is understanding that once a range condition is encountered, PostgreSQL must consider multiple paths in the B-tree, making any subsequent column conditions less effective for filtering.

Combining Multiple Indexes 🔄

PostgreSQL can use multiple indexes in a single query when appropriate:

sql

-- Creating separate indexes
CREATE INDEX idx_first_name ON users(first_name);
CREATE INDEX idx_last_name ON users(last_name);
 
-- PostgreSQL can combine these indexes for queries like:
SELECT * FROM users
WHERE first_name = 'Abhishek'
			AND last_name = 'Singh';
 
-- Check the execution plan
EXPLAIN ANALYZE
	SELECT * FROM users
	WHERE first_name = 'Abhishek'
			AND last_name = 'Singh';

sql

-- Creating separate indexes
CREATE INDEX idx_first_name ON users(first_name);
CREATE INDEX idx_last_name ON users(last_name);
 
-- PostgreSQL can combine these indexes for queries like:
SELECT * FROM users
WHERE first_name = 'Abhishek'
			AND last_name = 'Singh';
 
-- Check the execution plan
EXPLAIN ANALYZE
	SELECT * FROM users
	WHERE first_name = 'Abhishek'
			AND last_name = 'Singh';

Index combination strategies:

AND conditions might use multiple indexes
OR conditions usually need separate indexes
Cost-based optimizer decides final approach

Covering Indexes: Optimizing Query Performance 🎯

Covering indexes represent one of PostgreSQL's most powerful optimization techniques. They can eliminate heap lookups entirely, providing significant performance benefits:

sql

-- Basic index
CREATE INDEX idx_first ON users(first_name);
-- This index only stores first_name and a heap pointer (ctid)
 
-- Covering index
CREATE INDEX idx_first_cover ON users(first_name) INCLUDE (id);
-- Stores first_name in B-tree structure and id in leaf nodes only
 
-- The difference becomes clear in execution plans
EXPLAIN
    SELECT first_name, last_name
    FROM users
    WHERE first_name = 'Abhishek';
    -- Requires heap lookup because last_name isn't in index
    -- You'll see "Index Scan" in the plan
 
EXPLAIN
    SELECT first_name, id
    FROM users
    WHERE first_name = 'Abhishek';
    -- Index only scan! No heap access needed
    -- You'll see "Index Only Scan" in the plan
 
-- You can also create composite indexes for covering
CREATE INDEX idx_composite ON users(first_name, last_name);
-- Different from INCLUDE - both columns participate in B-tree

sql

-- Basic index
CREATE INDEX idx_first ON users(first_name);
-- This index only stores first_name and a heap pointer (ctid)
 
-- Covering index
CREATE INDEX idx_first_cover ON users(first_name) INCLUDE (id);
-- Stores first_name in B-tree structure and id in leaf nodes only
 
-- The difference becomes clear in execution plans
EXPLAIN
    SELECT first_name, last_name
    FROM users
    WHERE first_name = 'Abhishek';
    -- Requires heap lookup because last_name isn't in index
    -- You'll see "Index Scan" in the plan
 
EXPLAIN
    SELECT first_name, id
    FROM users
    WHERE first_name = 'Abhishek';
    -- Index only scan! No heap access needed
    -- You'll see "Index Only Scan" in the plan
 
-- You can also create composite indexes for covering
CREATE INDEX idx_composite ON users(first_name, last_name);
-- Different from INCLUDE - both columns participate in B-tree

Important considerations:

Don't include large columns
Only include frequently accessed columns
Balance storage cost against performance gain

Partial Indexes: Targeted Performance Boost 🎯

Partial indexes allow you to index only specific portions of your table, reducing overhead while maintaining performance where it matters:

sql

-- Create partial index for pro users only
CREATE INDEX idx_email_pro ON users(email)
WHERE is_pro IS TRUE;
 
-- This query will use the partial index
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'abhishek@gmail.com'
			AND is_pro IS TRUE; -- important to pass this to use index
 
-- Unique partial index for active users
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE deleted_at IS NULL;

sql

-- Create partial index for pro users only
CREATE INDEX idx_email_pro ON users(email)
WHERE is_pro IS TRUE;
 
-- This query will use the partial index
EXPLAIN ANALYZE
SELECT * FROM users
WHERE email = 'abhishek@gmail.com'
			AND is_pro IS TRUE; -- important to pass this to use index
 
-- Unique partial index for active users
CREATE UNIQUE INDEX idx_unique_active_email
ON users(email)
WHERE deleted_at IS NULL;

This approach is particularly powerful for:

Tables with distinct subsets of data
Enforcing unique constraints on subset
Optimizing specific query patterns

Index Ordering Strategies 🔄

Index ordering can significantly impact query performance, particularly for queries requiring sorted results. Understanding and optimizing index ordering ensures efficient execution of queries without unnecessary sorting operations.

Key Concepts for Index Ordering

Single-Column Indexes:

When creating an index on a single column, the order (ascending or descending) typically does not matter for performance.
PostgreSQL automatically scans the index in forward or reverse order depending on the query's ORDER BY clause.

sql

-- Example of a single-column index
CREATE INDEX idx_birthday ON users(birthday);
 
-- Queries utilizing the index efficiently
EXPLAIN SELECT * FROM users ORDER BY birthday LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC LIMIT 10;

sql

-- Example of a single-column index
CREATE INDEX idx_birthday ON users(birthday);
 
-- Queries utilizing the index efficiently
EXPLAIN SELECT * FROM users ORDER BY birthday LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC LIMIT 10;

Composite Indexes:

For multi-column (composite) indexes, the order of columns and their sort direction matters significantly.
PostgreSQL can perform backward scans for composite indexes where all columns share the same order direction.

sql

-- Composite index with both columns in ascending order
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
 
-- Queries utilizing the composite index efficiently
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;

sql

-- Composite index with both columns in ascending order
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
 
-- Queries utilizing the composite index efficiently
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;

Mixed Ordering Challenges:

Queries with mixed ordering (e.g., birthday DESC, created_at ASC) require incremental sorting if the index does not match the exact ordering.
Incremental sorting adds overhead and can reduce performance.

sql

-- Example query requiring mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
 
-- Output: Incremental sort is used if no suitable index exists.

sql

-- Example query requiring mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
 
-- Output: Incremental sort is used if no suitable index exists.

Fixing Mixed Ordering:

Create an index that explicitly matches the query's ordering requirements.
Use descending or mixed orderings as needed for your query patterns.

sql

-- Index explicitly designed for mixed ordering
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);
 
-- This index optimizes the mixed ordering query
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;

sql

-- Index explicitly designed for mixed ordering
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);
 
-- This index optimizes the mixed ordering query
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;

Design Strategies

When designing index ordering:

Analyze Common Query Patterns:
- Study the ORDER BY clauses in your queries.
- Determine the most frequently used sorting combinations.
Match Index Order to Query Requirements:
- Align the index ordering with the sorting directions in the queries.
- Avoid unnecessary indexes for rare query patterns.
Leverage PostgreSQL's Scan Capabilities:
- PostgreSQL can perform forward and backward scans effectively when all columns in the index share the same order direction.
- Mixed orderings require explicit support via carefully designed composite indexes.

Example

Consider these queries and their corresponding index strategies:

sql

-- Queries with ascending or descending order
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
 
-- Supported by the following index:
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
 
-- Query with mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
 
-- Supported by the following index:
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);

sql

-- Queries with ascending or descending order
EXPLAIN SELECT * FROM users ORDER BY birthday, created_at LIMIT 10;
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at DESC LIMIT 10;
 
-- Supported by the following index:
CREATE INDEX idx_birthday_created_at ON users(birthday, created_at);
 
-- Query with mixed ordering
EXPLAIN SELECT * FROM users ORDER BY birthday DESC, created_at ASC LIMIT 10;
 
-- Supported by the following index:
CREATE INDEX idx_birthday_created_mixed ON users(birthday DESC, created_at ASC);

Summary

Single-column indexes adapt flexibly to query sort directions.
Composite indexes require careful design to match query patterns.
Mixed orderings should be explicitly supported when frequent queries demand it.
Always evaluate query performance using EXPLAIN to ensure indexes are utilized effectively.

Ordering NULLs in Indexes 📊

PostgreSQL offers precise control over how NULL values are handled in indexes and queries, allowing developers to tailor behavior to specific requirements:

sql

-- Default NULL handling in queries
SELECT * FROM users
ORDER BY birthday DESC;  -- NULLs appear last by default
 
-- Explicit NULL ordering
SELECT * FROM users
ORDER BY birthday DESC NULLS LAST;  -- Makes the default behavior explicit
 
-- Creating an index with NULL ordering
CREATE INDEX idx_birthday_null_first
ON users(birthday ASC NULLS FIRST);  -- NULLs are treated as the smallest values

sql

-- Default NULL handling in queries
SELECT * FROM users
ORDER BY birthday DESC;  -- NULLs appear last by default
 
-- Explicit NULL ordering
SELECT * FROM users
ORDER BY birthday DESC NULLS LAST;  -- Makes the default behavior explicit
 
-- Creating an index with NULL ordering
CREATE INDEX idx_birthday_null_first
ON users(birthday ASC NULLS FIRST);  -- NULLs are treated as the smallest values

Key Considerations for NULL Ordering:

Default Behavior: In PostgreSQL, NULL values are treated as larger than any non-NULL value by default in descending order and smaller in ascending order.
Custom NULL Ordering in Queries: Use the NULLS FIRST or NULLS LAST clause in ORDER BY to explicitly define how NULL values should be handled in query results.
Index Definition: When creating indexes, you can specify NULLS FIRST or NULLS LAST to optimize query performance for specific use cases.
- For example, if most queries require NULL values to appear first in results, defining an index with NULLS FIRST can speed up retrieval.
Matching Indexes with Queries: To ensure optimal performance, align the NULL handling in your query's ORDER BY clause with the NULL ordering in your index definition.

By understanding and leveraging these capabilities, you can achieve more efficient query execution and consistent result ordering tailored to your application's needs.

Remember: The art of indexing isn't just about creating indexes - it's about creating the right indexes for your specific use cases and maintaining them effectively over time.

📇

PostgreSQL Deep Dive Part 12 — Advanced Indexing Strategies and Optimization