Clickhouse - Default Partition should be monthly (toYYYYMM) rather than daily #5079

Open
redsquare opened this issue Sep 10, 2024 · 8 comments

Comments

@redsquare

On ClickHouse, the advice is to keep each partition between 30 GB and 150 GB. I'm sure most people using RudderStack -> ClickHouse don't have that volume daily, so the default should be monthly; keeping the number of parts on disk lower should be the preferred option. The current default partitions by day:

partitionByClause = fmt.Sprintf(`PARTITION BY toDate(%s)`, partitionField)

@contributor-support

Thanks for opening this issue! We'll get back to you shortly. If it is a bug, please make sure to add steps to reproduce the issue.

@redsquare redsquare changed the title Default Partition should be monthly (toYYYYMM) rather than daily Clickhouse - Default Partition should be monthly (toYYYYMM) rather than daily Sep 10, 2024
@ericdodds

@redsquare we are going to slot this into an upcoming sprint. I'll reach out to you for more info as we get closer to starting the work.

@redsquare
Author

@ericdodds, any update on this? :)

@elliotdickison

This would be super helpful to us, as it's not possible to change a table's partitioning after the table is created, so fixing the problem after the fact is quite tricky.

@redsquare
Author

@elliotdickison Agreed. @ericdodds, any update on this, please? :)

@gitcommitshow
Collaborator

Not shipped yet. I am following up with the team to prioritise this.

@elliotdickison

elliotdickison commented Jan 9, 2025

We've come up with an SOP to work around this:

1. Get the CREATE SQL for a RudderStack table.
2. Modify the SQL to use the partitioning we want.
3. Create the new table.
4. Copy the data over to it.
5. Drop the old table.
6. Rename the new table to the old table's name.

We've automated most of this with a script; we just have to remember to run it any time we add a new event and RudderStack creates a new table.

Since good partitioning depends on the use case, I think a config option to set the default partition strategy (hourly, daily, monthly, quarterly, yearly) might be helpful, although if you have to pick a single default, I'd guess monthly is better than daily.
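A config option like this could look roughly as follows. The strategy names come from the list above; the setting itself, the helper `buildPartitionByClause(strategy, field)` and the ClickHouse function chosen for each strategy (`toStartOfHour`, `toDate`, `toYYYYMM`, `toStartOfQuarter`, `toYear`) are assumptions for illustration, with an empty setting falling back to monthly.

```go
package main

import "fmt"

// partitionExprs maps a hypothetical partition-strategy setting to a
// ClickHouse partition expression template.
var partitionExprs = map[string]string{
	"hourly":    "toStartOfHour(%s)",
	"daily":     "toDate(%s)",
	"monthly":   "toYYYYMM(%s)",
	"quarterly": "toStartOfQuarter(%s)",
	"yearly":    "toYear(%s)",
}

// buildPartitionByClause returns the PARTITION BY clause for the configured
// strategy, defaulting to monthly when no strategy is set.
func buildPartitionByClause(strategy, partitionField string) (string, error) {
	if strategy == "" {
		strategy = "monthly"
	}
	format, ok := partitionExprs[strategy]
	if !ok {
		return "", fmt.Errorf("unknown partition strategy %q", strategy)
	}
	return "PARTITION BY " + fmt.Sprintf(format, partitionField), nil
}

func main() {
	for _, s := range []string{"", "daily", "quarterly"} {
		clause, err := buildPartitionByClause(s, "received_at")
		if err != nil {
			panic(err)
		}
		fmt.Println(clause)
	}
}
```

Rejecting unknown strategy names with an error, rather than silently falling back, would make a typo in the config visible before any table is created with a partitioning that can't be changed later.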

@redsquare
Author

redsquare commented Jan 9, 2025 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants