
[Enhancement]: Compression job should process chunks in order of range_start #6755

Open
RobAtticus opened this issue Mar 8, 2024 · 4 comments · May be fixed by #7148
Labels
enhancement, internal-team-ask

Comments

@RobAtticus
Member

What type of enhancement is this?

User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

The compression job should process chunks in order of their range_start so that the experimental rollup functionality is more effective. Without a defined order, chunks may be processed in a sequence that prevents full rollups: the job may start rolling up a chunk "later" in the timeline, then go back in the timeline, but by then the partially rolled-up chunk is too large to roll up into the one further back.
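
For illustration only, a minimal sketch of the oldest-first selection the job could apply, written against the documented timescaledb_information.chunks view (the hypertable name `metrics` is a placeholder, not from this issue):

```sql
-- Sketch: list uncompressed chunks oldest-first by range_start, so each
-- rollup merges forward in the timeline instead of skipping around.
SELECT chunk_schema, chunk_name, range_start
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics'
  AND NOT is_compressed
ORDER BY range_start;
```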

Implementation challenges

No response

RobAtticus added the enhancement label on Mar 8, 2024
@nikkhils
Contributor

@RobAtticus the current show_chunks logic uses the hypertable_id and table_id numbering values to sort the returned chunks. For append-only data insertions, that ordering is typically in sync with the time ranges.

We could return the chunks in dimension slice order, though.
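
For context, a minimal show_chunks call (the hypertable name `metrics` is a placeholder); the ordering under discussion is the order of the chunk regclasses this returns:

```sql
-- show_chunks currently returns chunks sorted by internal id numbering,
-- which matches time order only for append-only ingestion.
SELECT show_chunks('metrics', older_than => INTERVAL '7 days');
```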

@RobAtticus
Member Author

Is show_chunks used as part of the compression policy job? Basically, what I've found is that the compression job sometimes skips around in the set of chunks to be compressed, which leads to inefficient rollups. That's what this issue is about, although I also think show_chunks should enforce dimension slice order rather than rely on the IDs (given backfills, untiering a chunk, etc.).

@nikkhils
Contributor

@RobAtticus yeah, show_chunks is used in the compression policy logic.

yeah, maybe dimension_slice-based sorting is the way to go. We will also need documentation changes if we go this route.

@rodrigomideac

We migrated some data to our Timescale instance, and perhaps due to the way we did the ingestion, the chunk naming was not ordered. When we enabled the rollup functionality (which, by the way, is incredibly useful in our use case), the compression job was not able to achieve the full compress chunk interval. We set it to 7 days, but most chunks covered only 1 to 2 days.

We had to decompress the table and create a script to compress the chunks one by one, ordered by range_start (a sketch of the approach is below). It would be easier if the compression job could do this by itself.
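
In case it helps others, a hedged sketch of such a workaround script (the hypertable name `metrics` is a placeholder; it relies on the documented compress_chunk function and timescaledb_information.chunks view):

```sql
-- Sketch: compress each uncompressed chunk of a hypertable one by one,
-- oldest-first, so rollups can always merge into the earlier chunk.
DO $$
DECLARE
  ch record;
BEGIN
  FOR ch IN
    SELECT chunk_schema, chunk_name
    FROM timescaledb_information.chunks
    WHERE hypertable_name = 'metrics'   -- placeholder hypertable name
      AND NOT is_compressed
    ORDER BY range_start
  LOOP
    PERFORM compress_chunk(
      format('%I.%I', ch.chunk_schema, ch.chunk_name)::regclass);
  END LOOP;
END $$;
```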
