Skip to content

Hands on system design labs for learning scaling strategies of postgres

Notifications You must be signed in to change notification settings

farmanp/postgres-system-design-labs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Hands-on Guide to System Design with PostgreSQL

PostgreSQL is a powerful relational database system with extensive features to help developers and engineers build scalable and efficient systems. This hands-on guide demonstrates how to approach system design using PostgreSQL, emphasizing practical experimentation and concepts like indexing, caching, and query optimization. It leverages what we've explored to foster a deeper understanding of PostgreSQL's role in system design.


Introduction to System Design with PostgreSQL

System design involves creating efficient, scalable, and maintainable systems to meet functional and non-functional requirements. PostgreSQL’s features make it an excellent choice for designing systems with:

  • High Performance: Optimized for both OLTP and OLAP workloads.
  • Scalability: Parallel queries, partitioning, and support for massive datasets.
  • Reliability: ACID compliance and strong support for constraints.
  • Extensibility: Rich extensions (e.g., PostGIS, pgAudit) and advanced features (e.g., JSON support).

This guide will help you:

  1. Understand PostgreSQL's core scaling strategies.
  2. Explore practical experiments to test system behavior.
  3. Apply learnings to real-world system design.

Experiment 1: Query Optimization and Indexing

Objective:

Understand how PostgreSQL optimizes queries with and without indexes.

Steps:

1. Create a Large Dataset

Populate a table with 1 million customer records:

CREATE TABLE customers (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    email VARCHAR(255) UNIQUE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

Generate synthetic data using Python or SQL loops.

2. Query Without an Index

Run a query to filter records by email:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM customers WHERE email = '[email protected]';
  • Expected Behavior: Sequential scans for large tables are costly.
  • Observation: Use query plans to see the query’s execution path.

3. Add an Index

CREATE INDEX idx_customers_email ON customers(email);

Re-run the query:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM customers WHERE email = '[email protected]';
  • Expected Behavior: Index scan significantly reduces execution time.
  • Observation: Compare query execution times and buffer usage.

Insights:

  • Understand the trade-offs between sequential and indexed scans.
  • Learn how indexes improve query performance.

Experiment 2: Parallelism in PostgreSQL

Objective:

Observe PostgreSQL’s parallel query execution and its impact on performance.

Steps:

1. Enable Parallelism

Ensure parallel queries are enabled in PostgreSQL:

SHOW max_parallel_workers_per_gather;
SHOW parallel_setup_cost;
SHOW parallel_tuple_cost;

2. Execute a Parallel Query

Run a query on a large dataset without indexing:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS) SELECT * FROM customers WHERE name LIKE '%example%';
  • Expected Behavior: PostgreSQL distributes the scan across multiple workers.
  • Observation: Examine the Gather and Parallel Seq Scan nodes in the query plan.

3. Adjust Parallelism

Test the impact of different worker settings:

SET max_parallel_workers_per_gather = 4;

Re-run the query and observe the performance difference.

Insights:

  • Learn how PostgreSQL dynamically scales queries using parallel workers.
  • Understand the trade-offs between parallelism and resource usage.

Experiment 3: Caching and Memory Optimization

Objective:

Understand PostgreSQL’s caching mechanisms and their impact on performance.

Steps:

1. Monitor Cache Usage

Query PostgreSQL’s pg_stat_database to view cache hit ratios:

SELECT datname, blks_hit, blks_read,
       blks_hit * 100.0 / NULLIF(blks_hit + blks_read, 0) AS cache_hit_ratio
FROM pg_stat_database;
  • Expected Behavior: High cache hit ratio (>90%) for frequently accessed data.
  • Observation: Identify workloads that benefit from memory optimization.

2. Force Disk Reads

Restart the database server to clear cache:

docker restart postgres-container

Re-run queries to observe increased disk reads:

EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM customers WHERE email = '[email protected]';

3. Tune Memory Settings

Modify PostgreSQL’s shared_buffers to allocate more memory:

SHOW shared_buffers;
ALTER SYSTEM SET shared_buffers = '512MB';

Reload the configuration and observe performance improvements.

Insights:

  • Learn how caching minimizes disk I/O.
  • Tune memory settings for optimized resource usage.

Experiment 4: Constraints and Scaling Strategies

Objective:

Test the role of constraints (e.g., unique constraints) and their impact on scaling.

Steps:

1. Drop a Unique Constraint

Remove the unique constraint to simulate a system without strict guarantees:

ALTER TABLE customers DROP CONSTRAINT customers_email_key;

Re-run queries and observe the absence of unique index benefits.

2. Add the Constraint Back

Recreate the unique constraint:

ALTER TABLE customers ADD CONSTRAINT customers_email_key UNIQUE (email);
  • Expected Behavior: Queries become faster due to the automatic unique index.
  • Observation: Constraints enforce data integrity and improve performance.

Building Scalable Systems with PostgreSQL

Use these experiments as building blocks for designing scalable and efficient systems:

  1. Indexing Strategies:

    • Use compound indexes for multi-column queries.
    • Analyze query patterns to decide which columns to index.
  2. Partitioning:

    • Partition large tables for better performance.
    • Use declarative partitioning for time-series or sharded datasets.
  3. Connection Pooling:

    • Implement pooling with tools like pgbouncer for high-concurrency systems.
  4. High Availability:

    • Set up replication for fault tolerance.
    • Use tools like Patroni for automated failover.
  5. Monitoring and Tuning:

    • Monitor performance using tools like pg_stat_statements.
    • Continuously tune parameters based on workload.

Conclusion

System design with PostgreSQL is a combination of leveraging its advanced features, understanding its internal mechanisms, and applying best practices. By conducting hands-on experiments and interpreting query plans, you can gain valuable insights into PostgreSQL’s scaling strategies and design systems that perform efficiently under real-world workloads.

About

Hands on system design labs for learning scaling strategies of postgres

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published