[WIP] - Partitioning support #78

ggam · 2024-05-01T11:29:46Z

Problem

This is a draft PR adding partitioning support for Postgres, as requested in dbt-labs/dbt-adapters#679. It doesn't follow dbt coding best practices and lacks tests, but it works. I'm opening it to gather feedback on how to proceed.

The syntax is copied from dbt-bigquery partitioning:

{{ config(
    materialized = 'table',
    partition_by = {
      "field": "created_at",
      "granularity": "month"
    }
) }}

select generate_series(current_date - interval '1000 day', current_date, '1 month'::interval)::date as created_at,
        'hello' as dummy_text
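
To make the config above concrete, here is a minimal Python sketch of the kind of DDL that monthly range partitioning on `created_at` implies. It is only an illustration: the PR implements this in Jinja macros that are not shown in this thread, and the function names, the `fct_test` table name and the `<table>_<YYYYMMDD>` naming pattern are assumptions chosen for the example.

```python
from datetime import date


def next_month(d: date) -> date:
    """Return the first day of the month following d."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def parent_ddl(table: str, columns: dict[str, str], field: str) -> str:
    """Build the DDL for the partitioned parent table (range partitioning on `field`)."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return f"create table {table} ({cols}) partition by range ({field});"


def monthly_partition_ddl(table: str, first: date, last: date) -> list[str]:
    """Build one `partition of` statement per calendar month touching [first, last].

    Partition names use the first day of the month as a suffix (an assumed
    naming pattern, not necessarily the one used by the PR).
    """
    statements = []
    start = first.replace(day=1)
    while start <= last:
        end = next_month(start)  # upper bound is exclusive in Postgres
        statements.append(
            f"create table {table}_{start:%Y%m%d} partition of {table} "
            f"for values from ('{start}') to ('{end}');"
        )
        start = end
    return statements


if __name__ == "__main__":
    print(parent_ddl("fct_test", {"created_at": "date", "dummy_text": "text"}, "created_at"))
    for stmt in monthly_partition_ddl("fct_test", date(2023, 12, 15), date(2024, 1, 10)):
        print(stmt)
```

Each partition covers a half-open range, so consecutive months share a boundary value without overlapping.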

Benefits this brings:

Postgres doesn't allow partitioning an existing table. If dbt creates the table as partitioned from the start, a DBA can then tune the partitions manually or manage them with the pg_partman extension. That's not possible unless dbt is able to create partitioned tables.
When using the insert+write incremental strategy, you can change the access method of old partitions to Citus columnar or Hydra, which don't allow deletes or updates. Incremental methods are currently not usable in those environments.

Moreover, creating a custom incremental strategy replacing partitions is then relatively easy as shown in dbt-labs/dbt-adapters#679. That incremental strategy is out of scope for this PR as there are a lot of possible variations.

IMO this patch is already at a stage where it can be reviewed. I would appreciate help with best practices and with adding tests, though.

Solution

To Do:

Contract config is not enforced
Tests
Partitioning means executing the model's SQL twice, or temporarily storing its result twice. This needs to be documented.
Partition names concatenate the start day of the partition (regardless of granularity). Names longer than Postgres's 63-byte identifier limit are still not handled correctly.
Add a default partition to handle uncovered ranges. Without specific partition support in incremental strategies, one must handle the creation of future partitions. This can be done via a cron job or manually. Having a default partition means inserts won't fail.
Handle incremental partitions
Handle adding new fields to the parent table
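
Two of the to-do items above (over-long partition names and the default partition) can be sketched as follows. This is a hedged illustration in Python, not the PR's Jinja implementation: `partition_name`, `default_partition_ddl` and the hash-based shortening scheme are hypothetical, and the limit assumes a default Postgres build, where identifiers are capped at 63 bytes and longer names are truncated.

```python
import hashlib

# Default Postgres builds limit identifiers to NAMEDATALEN - 1 = 63 bytes.
MAX_IDENTIFIER_BYTES = 63


def partition_name(table: str, suffix: str) -> str:
    """Return `<table>_<suffix>`, shortened so the suffix is never truncated away.

    When the plain name is too long, the table part is cut and a short hash of
    the full table name is inserted to keep names distinct (hypothetical scheme).
    """
    name = f"{table}_{suffix}"
    if len(name.encode("utf-8")) <= MAX_IDENTIFIER_BYTES:
        return name
    digest = hashlib.sha256(table.encode("utf-8")).hexdigest()[:8]
    tail = f"_{digest}_{suffix}"
    budget = MAX_IDENTIFIER_BYTES - len(tail.encode("utf-8"))
    if budget < 1:
        raise ValueError("suffix is too long to fit in a Postgres identifier")
    # Cut on a byte budget; drop any multi-byte character split by the cut.
    head = table.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return head + tail


def default_partition_ddl(table: str) -> str:
    """Build the DDL for a catch-all partition that receives rows outside all ranges."""
    return f"create table {partition_name(table, 'default')} partition of {table} default;"


if __name__ == "__main__":
    print(partition_name("fct_test", "20240101"))
    print(partition_name("x" * 80, "20240101"))
    print(default_partition_ddl("fct_test"))
```

Keeping the suffix intact matters because silent truncation by Postgres could otherwise make two partitions of the same long-named table collide on the same identifier.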

Checklist

I have read the contributing guide and understand what's expected of me
I have run this code in development and it appears to resolve the stated issue
This PR includes tests, or tests are not required/relevant for this PR
This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

ggam · 2024-05-01T11:34:14Z

dbt/include/postgres/macros/adapters.sql

-      {{ adapter.dispatch('get_column_names', 'dbt')() }}
-    )
-    {%- set sql = get_select_subquery(sql) %}
+  {% if config.get('partition_by') != None %}


I separated the logic for partitioning as it's easier to read. When partitioning is not enabled, the code is exactly the same as before.

ggam · 2024-05-01T11:35:56Z

dbt/include/postgres/macros/adapters.sql

+    alter table {{ from_relation }} rename to {{ target_name }};
+
+    {# If the relation is partitioned, rename the subtables #}
+    {% set existing_partitions_query %}


Partitions are created with the __dbt_tmp suffix and so they need to be individually renamed. Sadly I don't think the config variable is available here so I don't know if there's a way to determine if the relation is partitioned.

HyuHiguchi · 2024-08-02T21:36:01Z

dbt/include/postgres/macros/materializations/incremental_strategies.sql

+{% endmacro %}
+
+{% macro postgres__get_incremental_delete_insert_sql(arg_dict) %}
+  {{ create_incremental_missing_partitions(arg_dict) }}


This was parsed but not executed.
When I wrote the following, it was executed and the partition was created.

{% do return( create_incremental_missing_partitions(arg_dict) + default__get_incremental_delete_insert_sql(arg_dict) ) %}

AniPatel · 2024-10-09T11:51:11Z

Do you have any plans for when these changes will be available in the master branch?

SChalupaBraiins · 2025-02-24T14:16:54Z

Any update on this? It is a great feature which would make our implementation much easier.

LoicEm · 2025-03-27T15:43:27Z

Hey @ggam, what is your situation on this?
This feature would be incredibly helpful in our current infrastructure, so I might take some time to iterate on your work and make a fully viable PR :)

ggam · 2025-03-27T17:08:35Z

@LoicEm I've been using it in production since I opened the PR. I labeled it WIP since I was hoping for a maintainer's review (there are no tests and I'm sure the code can be improved), but it definitely works.

The only caveats I've found after nearly a year of use:

For now I'm stuck on dbt 1.8 since I'm not completely sure the code will work as-is on 1.9.
When adding a new column, you need to manually add it to the parent table (alter table fct_test add new_field text;). Otherwise, the partition will have the column, but you won't see it in the partitioned table.
Partitions end up with a strange name as I create them with a suffix. This can be solved but I haven't needed to fix it yet.
Postgres doesn't automatically analyze the parent table of a partitioned table. You need to schedule an analyze on the parent table for the global statistics to be computed.

If anyone has the time and knowledge to review and improve my code, I'd be more than happy to incorporate it into my PR. Even if the code is not merged, it can be useful for other people.

CamFromStar · 2025-05-21T13:01:19Z

Just here to share my support for this

cla-bot bot added the cla:yes label May 1, 2024

ggam force-pushed the main branch from 135d274 to 23a0854 Compare May 1, 2024 11:30

ggam mentioned this pull request May 1, 2024

Add Table Partitioning Option for PostgreSQL dbt-labs/dbt-adapters#679

Open

ggam commented May 1, 2024

ggam force-pushed the main branch from dbd2a9a to 1bf527c Compare May 1, 2024 16:44

WIP - Partitioning support

b46ab1c

ggam force-pushed the main branch from 1bf527c to b46ab1c Compare May 1, 2024 16:45

ggam added 2 commits May 1, 2024 18:55

refactor

dc630af

Rough handling of incremental partitions

8361eeb

njuguna-n mentioned this pull request May 17, 2024

Improve BRAC model performance medic/cht-pipeline#82

Closed

HyuHiguchi reviewed Aug 2, 2024
