
[Enhancement]: Compression job should process chunks in order of range_start #6755

Open
RobAtticus opened this issue Mar 8, 2024 · 4 comments · May be fixed by #7148
Labels
enhancement, internal-team-ask

Comments

@RobAtticus
Member

What type of enhancement is this?

User experience

What subsystems and features will be improved?

Compression

What does the enhancement do?

The compression job should process chunks in order of their range_start so that the experimental rollup functionality is more effective. Without a defined order, chunks may be processed in a sequence that prevents full rollups: the job may start rolling up a chunk "later" in the timeline, then go back in the timeline, but by then the partially rolled-up chunk is too large to roll up into the one further back.
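
For illustration only, a minimal sketch of the oldest-first selection the job could apply, written against the documented timescaledb_information.chunks view (the hypertable name `metrics` is a placeholder, not from this issue):

```sql
-- Sketch: list uncompressed chunks oldest-first by range_start, so each
-- rollup merges forward in the timeline instead of skipping around.
SELECT chunk_schema, chunk_name, range_start
FROM timescaledb_information.chunks
WHERE hypertable_name = 'metrics'
  AND NOT is_compressed
ORDER BY range_start;
```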

Implementation challenges

No response

RobAtticus added the enhancement label on Mar 8, 2024
@nikkhils
Contributor

@RobAtticus the current show_chunks logic uses the hypertable_id and table_id numbering values to sort the returned chunks. For append-only data insertions, that ordering is typically in sync with the time ranges.

We could return the chunks in dimension slice order, though.
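
For context, a minimal show_chunks call (the hypertable name `metrics` is a placeholder); the ordering under discussion is the order of the chunk regclasses this returns:

```sql
-- show_chunks currently returns chunks sorted by internal id numbering,
-- which matches time order only for append-only ingestion.
SELECT show_chunks('metrics', older_than => INTERVAL '7 days');
```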

@RobAtticus
Member Author

Is show_chunks used as part of the compression policy job? Basically, what I've found is that the compression job sometimes skips around in the set of chunks to be compressed, which leads to inefficient rollups. That's what this issue is about, although I also think show_chunks should enforce dimension slice order rather than rely on the IDs (given backfills, untiering a chunk, etc.).

@nikkhils
Contributor

@RobAtticus yeah, show_chunks is used in the compression policy logic.

yeah, maybe dimension_slice-based sorting is the way to go. We will also need documentation changes if we go this route.

@rodrigomideac

We migrated some data to our Timescale instance, and perhaps due to the way we did the ingestion, the chunk naming was not ordered. When we enabled the rollup functionality (which, by the way, is incredibly useful in our use case), the compression job was not able to achieve the full compress chunk interval. We set it to 7 days, but most chunks covered only 1 to 2 days.

We had to decompress the table and create a script to compress the chunks one by one, ordered by range_start (a sketch of the approach is below). It would be easier if the compression job could do this by itself.
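
In case it helps others, a hedged sketch of such a workaround script (the hypertable name `metrics` is a placeholder; it relies on the documented compress_chunk function and timescaledb_information.chunks view):

```sql
-- Sketch: compress each uncompressed chunk of a hypertable one by one,
-- oldest-first, so rollups can always merge into the earlier chunk.
DO $$
DECLARE
  ch record;
BEGIN
  FOR ch IN
    SELECT chunk_schema, chunk_name
    FROM timescaledb_information.chunks
    WHERE hypertable_name = 'metrics'   -- placeholder hypertable name
      AND NOT is_compressed
    ORDER BY range_start
  LOOP
    PERFORM compress_chunk(
      format('%I.%I', ch.chunk_schema, ch.chunk_name)::regclass);
  END LOOP;
END $$;
```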
