-
Notifications
You must be signed in to change notification settings - Fork 894
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CAgg migration performance regression #7517
base: main
Are you sure you want to change the base?
Fix CAgg migration performance regression #7517
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7517 +/- ##
==========================================
+ Coverage 80.06% 82.22% +2.15%
==========================================
Files 190 231 +41
Lines 37181 43373 +6192
Branches 9450 10909 +1459
==========================================
+ Hits 29770 35663 +5893
- Misses 2997 3383 +386
+ Partials 4414 4327 -87 ☔ View full report in Codecov by Sentry. |
7035bac
to
ffedd0a
Compare
There is no reason this change should not go in the release notes, so we should add a changelog entry. In addition, this is a non-trivial change, so we should probably stick to two reviewers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
7ab9690
to
6da1365
Compare
When creating the migration plan we need to figure out the boundaries (min and max) of the primary dimension to split the COPY data into multiple transactions to don't exauste the instance resources. The problem is we're reading the boundaries direct from the CAgg view and when the realtime is enabled it lead to big performance penalty (this was one of the reasons for the new format). Also in the `COPY DATA` migration plan step we changed the batch size used that previously was based on the materialization hypertable partition range and now will be the Continuous Aggregate bucked size multiplied by 10 (ten). Fixed it by querying direct the materialization hypertable instead. Also in ef2cfe3 we introduced a problem that now we're not explicitly updating the watermark but instead warning the user to manually execute the refresh procedure to update it, but without update the watermark the query performance for realtime Continuous Aggregate can be awful so we introduced a small procedure to update the watermark for the new migrated cagg during the migration.
6da1365
to
04da963
Compare
When creating the migration plan we need to figure out the boundaries (min and max) of the primary dimension to split the COPY data into multiple transactions to don't exauste the instance resources.
The problem is we're reading the boundaries direct from the CAgg view and when the realtime is enabled it lead to big performance penalty (this was one of the reasons for the new format).
Also in the
COPY DATA
migration plan step we changed the batch size used that previously was based on the materialization hypertable partition range and now will be the Continuous Aggregate bucked size multiplied by 10 (ten).Fixed it by querying direct the materialization hypertable instead.
Also in ef2cfe3 we introduced a problem that now we're not explicitly updating the watermark but instead warning the user to manually execute the refresh procedure to update it, but without update the watermark the query performance for realtime Continuous Aggregate can be awful so we introduced a small procedure to update the watermark for the new migrated cagg during the migration.
Disable-check: force-changelog-file