Fix auto date histogram rounding assertion bug #17023
base: main
Conversation
❕ Gradle check result for 0ecdf31: UNSTABLE. Please review all flaky tests that succeeded after retry, and create an issue if one does not already exist to track the flaky failure.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files:

@@             Coverage Diff             @@
##               main   #17023      +/-  ##
============================================
+ Coverage     72.24%   72.41%   +0.16%
- Complexity    65421    65503      +82
============================================
  Files          5306     5306
  Lines        304215   304222       +7
  Branches      44118    44119       +1
============================================
+ Hits         219788   220294     +506
+ Misses        66414    65903     -511
- Partials      18013    18025      +12

☔ View full report in Codecov by Sentry.
@bowenlan-amzn can you take a look when you have a chance? Thank you!
I remember the optimization will not be applied if the aggregation defines a timezone, so this bug is kind of a surprise. Lines 67 to 68 in ef87b39

The timezone check is inside getInterval: OpenSearch/server/src/main/java/org/opensearch/common/Rounding.java, Lines 1385 to 1388 in 1e49aa8

The timezone is part of the Rounding object, so we need to have the Rounding first and then check the timezone. However, in auto date histogram, getting the rounding also updates the prepared rounding, which leads to this bug. I recommend we add a simple timezone check right before: Lines 67 to 68 in ef87b39

It purely takes in a Rounding (for auto date histogram, this can just be the first Rounding in RoundingInfos, since every Rounding has the same timezone information) and does the check: OpenSearch/server/src/main/java/org/opensearch/common/Rounding.java, Lines 1385 to 1388 in 1e49aa8

If the check doesn't pass, we don't even bother to go inside the getRounding method. The check you added here (do not shrink the rounding) is still meaningful; I agree it should be ever increasing, and not shrink depending on the next segment processed.
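The pre-check suggested above might look something like the following minimal sketch. The `Rounding` record and `canOptimize` helper here are hypothetical stand-ins for `org.opensearch.common.Rounding` and the filter rewrite's eligibility check, not the actual OpenSearch API:

```java
import java.time.ZoneId;
import java.time.ZoneOffset;

public class TimezoneGuard {
    // Hypothetical stand-in for org.opensearch.common.Rounding; only the
    // timezone field matters for this check.
    record Rounding(ZoneId timeZone) {}

    // Bail out of the optimization before ever calling getRounding() if the
    // aggregation's roundings use a non-UTC timezone. Every Rounding in an
    // auto_date_histogram shares the same timezone, so checking the first
    // entry is sufficient.
    static boolean canOptimize(Rounding[] roundingInfos) {
        return roundingInfos.length > 0 && roundingInfos[0].timeZone().equals(ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        Rounding[] utc = { new Rounding(ZoneOffset.UTC) };
        Rounding[] newYork = { new Rounding(ZoneId.of("America/New_York")) };
        System.out.println(canOptimize(utc));     // true: optimization may proceed
        System.out.println(canOptimize(newYork)); // false: fall back to doc collection
    }
}
```

Because the check never touches `getRounding`, it cannot mutate the prepared rounding as a side effect.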
@jainankitk can you take a look when you have a chance? Thanks!
Hard to describe my suggested change, so I created a commit; please take a look.
Another thing I'm wondering is why we cannot support timezones in this optimization. When a timezone is present, the aggregation's default method first converts the date values to local values, does the rounding, then converts back to UTC. To support that in our optimization, we would need to figure out the right ranges for the given timezone, represent them in UTC, then use these ranges to intersect the index structure of the date field, which stores UTC values.
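A small java.time illustration of that default path (convert to local time, round, convert back to UTC), independent of OpenSearch's Rounding class:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.temporal.ChronoUnit;

public class LocalDayRounding {
    // Round a UTC instant down to the start of its *local* day in the given
    // timezone, then express the result back in UTC.
    static Instant roundToLocalDay(Instant utc, ZoneId zone) {
        return utc.atZone(zone).truncatedTo(ChronoUnit.DAYS).toInstant();
    }

    public static void main(String[] args) {
        // 03:30 UTC on 2024-03-10 is still 22:30 on 2024-03-09 in New York
        // (EST, UTC-5), so the local-day rounding lands on 2024-03-09T05:00Z,
        // not 2024-03-10T00:00Z.
        Instant t = Instant.parse("2024-03-10T03:30:00Z");
        System.out.println(roundToLocalDay(t, ZoneId.of("America/New_York"))); // 2024-03-09T05:00:00Z
    }
}
```

The UTC representation of each local bucket boundary is what would need to intersect the BKD structure of the date field.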
I see what you mean; this is much cleaner. Cherry-picked this commit.
❌ Gradle check result for d4f5dc1: FAILURE. Please examine the workflow log, locate and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?
Signed-off-by: Finn Carroll <[email protected]>
…te histo assertion bug per opensearch-project#16932 Signed-off-by: Finn Carroll <[email protected]>
… preparedRounding of agg. Signed-off-by: Finn Carroll <[email protected]>
Signed-off-by: bowenlan-amzn <[email protected]>
Force-pushed d4f5dc1 to dc88032
@@ -489,7 +489,7 @@ public double roundingSize(long utcMillis, DateTimeUnit timeUnit) {
      *
      * @opensearch.internal
      */
-    static class TimeUnitRounding extends Rounding {
+    public static class TimeUnitRounding extends Rounding {
This is not needed
roundingIdx = Math.max(prevRoundingIdx, roundingIdx);
if (roundingIdx != prevRoundingIdx) {
    preparedRounding = prepareRounding(roundingIdx);
}
slightly more readable?
if (roundingIdx > prevRoundingIdx) {
    preparedRounding = prepareRounding(roundingIdx);
} else {
    roundingIdx = prevRoundingIdx;
}
/**
 * Quick explanation of the two conditions below:
 *
 * 1. [targetBuckets * roundingInfos[roundingIdx].getMaximumInnerInterval()]
 *    Represents the total bucket count possible before we exceed targetBuckets,
 *    even if we use the maximum inner interval of our current rounding. For example, consider the
 *    DAYS_OF_MONTH rounding, where the maximum inner interval is 7 days (i.e. 1-week buckets).
 *    targetBuckets * roundingInfos[roundingIdx].getMaximumInnerInterval() is then the number of
 *    1-day buckets possible such that if we re-bucket to 1-week buckets, we will have more 1-week
 *    buckets than our targetBuckets limit. If the current count of buckets exceeds this limit, we
 *    must update our rounding.
 *
 * 2. [targetBuckets * roundingInfos[roundingIdx].getMaximumRoughEstimateDurationMillis()]
 *    The total duration in ms covered by our current rounding. In the case of MINUTES_OF_HOUR
 *    rounding, getMaximumRoughEstimateDurationMillis is 60000. If our current total range in
 *    millis (max - min) exceeds this range, we must update our rounding.
 */
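As a hypothetical worked example of the two thresholds described in the comment, using targetBuckets = 10 and the example constants mentioned above (the real values live in the RoundingInfo instances):

```java
public class RoundingThresholds {
    // Condition 1: max number of current-interval buckets we can tolerate
    // before even the coarsest inner interval would exceed targetBuckets.
    static long bucketCountLimit(int targetBuckets, int maxInnerInterval) {
        return (long) targetBuckets * maxInnerInterval;
    }

    // Condition 2: widest millisecond range (max - min) the current rounding
    // can cover before we must move to a coarser one.
    static long durationLimitMillis(int targetBuckets, long roughEstimateDurationMillis) {
        return targetBuckets * roughEstimateDurationMillis;
    }

    public static void main(String[] args) {
        // DAYS_OF_MONTH: maximum inner interval of 7 (1-week buckets).
        System.out.println(bucketCountLimit(10, 7));          // 70 one-day buckets
        // MINUTES_OF_HOUR: rough estimate duration of 60000 ms.
        System.out.println(durationLimitMillis(10, 60_000L)); // 600000 ms
    }
}
```

Crossing either limit means the current rounding cannot fit the segment within targetBuckets, so a coarser rounding must be prepared.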
Thank you for adding detailed documentation!
I agree with @bowenlan-amzn. We should try to support the optimization with timezones as well for wider coverage. @bowenlan-amzn - can you please create an issue for the same?
Description
The auto date histogram agg contains an optimization that skips traditional doc collection and instead examines the "pre-aggregated" doc counts contained in the BKD tree of each segment. Several conditions need to be true before this optimization can execute for a given segment. One such condition is that we must be able to determine a set of ranges (or rounding) for the segment under consideration before optimizing.
Normally, ranges for the auto date histogram aggregation are updated as needed over the course of collecting documents, but the filter rewrite optimization updates the `preparedRounding` of the agg in accordance with the min & max values of the segment under consideration ahead of time, since it skips regular doc collection. As a result, it is possible for our `preparedRounding` to shrink, as the rounding built from the segment could easily be smaller than the rounding previously used by our shard level aggregator.

This usually does not pose a problem, as the `preparedRounding` will be updated accordingly when we collect our next document or reduce our shard level aggs into a single top level agg. The specific case where this becomes problematic is when our `preparedRounding` is delegating to a "bounded" structure. When we prepare a rounding, we do so for the min & max epoch time of our shard, since this allows us to optimize the structure we delegate rounding to. For some ranges of epoch time and time zones, rounding will be little more than a modulo operation. However, if our min & max epoch time crosses "transitions" such as daylight savings, we may want to delegate rounding to a linked list or array structure to quickly look up these transitions. This is why the specific occurrence of this bug linked in the initial issue only appears with `"time_zone":"America/New_York"`.

The combination of delegating rounding to these strictly bounded structures and the filter rewrite optimization "replaying" our previous bucket keys fails an assertion within our `preparedRounding`, as our previous bucket keys are not guaranteed to fit within the strict bounds of the rounding prepared for the current segment being collected. The changes in this PR resolve this by ensuring the filter rewrite optimization only ever increases the granularity of our `preparedRounding`.
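For context on why `America/New_York` in particular exercises the bounded lookup structures, plain java.time (not OpenSearch code) can show the daylight-savings transitions that a shard's min/max epoch range may cross:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.zone.ZoneOffsetTransition;

public class DstTransitions {
    public static void main(String[] args) {
        ZoneId newYork = ZoneId.of("America/New_York");
        // The first DST transition after Jan 1 2024: clocks jump from
        // 2:00 EST to 3:00 EDT on March 10, which is 07:00 UTC.
        ZoneOffsetTransition spring = newYork.getRules()
            .nextTransition(Instant.parse("2024-01-01T00:00:00Z"));
        System.out.println(spring.getInstant()); // 2024-03-10T07:00:00Z
        // UTC has no such transitions, which is why its rounding can stay a
        // simple modulo operation with no bounded lookup structure.
    }
}
```

When the prepared rounding is built only for transitions inside the current segment's bounds, replaying bucket keys from outside those bounds is what trips the assertion described above.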
Resolves #16932
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.