PostgreSQL Deep Dive Part 15 — EXPLAIN - Understanding Query Execution Plans | Abhishek Kumar Singh

Access complete, unfiltered study notes maintained on GitHub for easily maintaing Second Brain Dir

Understanding EXPLAIN Command Structure

The EXPLAIN command in PostgreSQL reveals the execution plan that the PostgreSQL planner generates for a given query. Let's start with basic examples and progressively analyze more complex scenarios:

sql

EXPLAIN SELECT * FROM users;

sql

EXPLAIN SELECT * FROM users;

This basic query will produce an output containing several key components:

The scan method chosen
Estimated startup and total cost
Estimated number of rows
Estimated average width of each row

Let's look at how the plan changes with different query conditions:

sql

EXPLAIN SELECT * FROM users LIMIT 10;
 
EXPLAIN SELECT * FROM users
WHERE first_name = 'Abhi'
LIMIT 10;

sql

EXPLAIN SELECT * FROM users LIMIT 10;
 
EXPLAIN SELECT * FROM users
WHERE first_name = 'Abhi'
LIMIT 10;

The execution plan shows how PostgreSQL decides to retrieve the data, whether through sequential scans, index scans, or other methods. Each node in the plan represents an operation, with costs and row estimates provided for each step.

Understanding Different Scan Methods

PostgreSQL employs various scanning methods to retrieve data efficiently. Let's explore each type in detail:

1. Index Only Scan (Most Efficient)

This is the optimal scanning method where PostgreSQL can satisfy the query using only the information stored in the index, without accessing the table heap.

When it occurs:

All required columns are present in the index
No additional columns need to be fetched from the table
The WHERE clause references only indexed columns

Example scenario:

sql

CREATE INDEX idx_email ON users(email);
EXPLAIN SELECT email FROM users WHERE email = 'user@example.com';

sql

CREATE INDEX idx_email ON users(email);
EXPLAIN SELECT email FROM users WHERE email = 'user@example.com';

2. Index Scan

A standard index scan occurs when PostgreSQL uses an index to locate specific rows but needs to fetch additional columns from the table.

Characteristics:

Uses the index to find relevant rows
Requires additional heap fetches for non-indexed columns
Generally efficient for selective queries

3. Bitmap Index Scan

This method is used when PostgreSQL needs to access a larger number of rows but not the entire table.

Process:

Scans the index and creates a bitmap in memory
Sorts the heap accesses in physical order
Retrieves the actual rows

This method is particularly efficient when:

Multiple indexes need to be combined
A moderate number of rows need to be retrieved
The rows are scattered across the table

4. Sequential Scan

A sequential scan reads the entire table in physical order.

When it's used:

No suitable indexes are available
A large portion of the table needs to be read
The table is small enough that sequential scanning is faster than index usage

Understanding Cost and Row Metrics

PostgreSQL's EXPLAIN provides several key metrics to understand query performance:

1. Startup Cost

Initial cost before the first row can be returned
Includes index traversal time
Important for queries that need to sort or aggregate data

2. Total Cost

Estimated total cost to complete the query
Includes all I/O and processing costs
Used by the planner to compare different execution strategies

3. Rows

Estimated number of rows that will be output
Based on table statistics
Critical for understanding query selectivity

4. Width

Estimated average width of rows in bytes
Helps understand memory usage
Important for optimizing data retrieval

EXPLAIN ANALYZE: Getting Real Execution Statistics

While EXPLAIN shows estimated costs and row counts, EXPLAIN ANALYZE actually executes the query and provides real timing and row counts.

sql

EXPLAIN ANALYZE SELECT * FROM users WHERE email < 'b';

sql

EXPLAIN ANALYZE SELECT * FROM users WHERE email < 'b';

Important considerations:

EXPLAIN ANALYZE actually executes the query
Use with caution on UPDATE, DELETE, or INSERT statements
Provides actual timing and row counts
Shows buffer usage and I/O statistics

The output includes:

Actual time spent at each node
Actual number of rows processed
Number of loops executed
Memory usage statistics

Example output analysis:

txt

Seq Scan on users  (cost=0.00..1638.00 rows=50000 width=90)
  (actual time=0.016..8.571 rows=49999 loops=1)
  Filter: (email < 'b'::text)
  Rows Removed by Filter: 1
Planning Time: 0.083 ms
Execution Time: 11.752 ms

txt

Seq Scan on users  (cost=0.00..1638.00 rows=50000 width=90)
  (actual time=0.016..8.571 rows=49999 loops=1)
  Filter: (email < 'b'::text)
  Rows Removed by Filter: 1
Planning Time: 0.083 ms
Execution Time: 11.752 ms

This shows:

Estimated vs actual row counts
Time spent planning vs executing
Filter effectiveness
Actual execution statistics

Understanding these metrics helps in:

Identifying query bottlenecks
Optimizing index usage
Improving query performance
Making informed decisions about database design

The EXPLAIN command and its various components provide invaluable insights into query execution and performance. By understanding these elements, developers and database administrators can make informed decisions about query optimization, index creation, and database design.

Remember that while EXPLAIN provides estimates based on statistics, EXPLAIN ANALYZE gives actual execution metrics but comes with the cost of actually running the query. Use these tools judiciously based on your specific needs and circumstances.

🏃‍♂️‍➡️

PostgreSQL Deep Dive Part 15 — EXPLAIN - Understanding Query Execution Plans