Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory for active series custom trackers #5853

Closed

Conversation

zenador
Copy link
Contributor

@zenador zenador commented Aug 29, 2023

What this PR does

Reduce memory used by active series custom trackers by using maps (sparse) instead of slices. Aim is to at least reduce it for native histograms, but experimenting with applying the same change to normal active series as well, benchmark results below.

Which issue(s) this PR fixes or relates to

Fixes #5812

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@zenador zenador requested a review from a team as a code owner August 29, 2023 08:09
@zenador zenador changed the title educe mem native histogram custom trackers Reduce mem native histogram custom trackers Aug 29, 2023
@zenador zenador marked this pull request as draft August 29, 2023 08:10
@zenador zenador changed the title Reduce mem native histogram custom trackers Reduce memory for native histogram active series/buckets custom trackers Aug 29, 2023
@zenador zenador changed the title Reduce memory for native histogram active series/buckets custom trackers Reduce memory for active series custom trackers Aug 29, 2023
activeNativeHistogramBuckets uint32 // Number of buckets in active native histogram entries in this stripe. Only decreased during purge or clear.
activeMatchingNativeHistogramBuckets []uint32 // Number of buckets in active native histogram entries in this stripe matching each matcher of the configured Matchers.
active uint32 // Number of active entries in this stripe. Only decreased during purge or clear.
activeMatching map[uint16]uint32 // Number of active entries in this stripe matching each matcher of the configured Matchers.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the past we didn't use a map because it would be significantly slower to access than slices, if there are few elements. I'm very curious to see the benchmarks for different scenarios (no custom trackers, few custom trackers, many custom trackers).

Copy link
Contributor

@colega colega Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually in the past we did use a map, and then we switched to a slice. The reason behind this is that usually a series only matches between zero and one tracker (the trackers are usually the integration where they come from) with a few more in some use cases, but never many.

Edit: oh, I see we're not talking about specific series here but about the stripes, in which case you're right, an array is much faster to access, while the gain is probably not that big. A customer with 100 custom matchers would use 4 * 100 * 128 = 50Kb for all stripes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation for this is #5318 (comment) and https://raintank-corp.slack.com/archives/C04AT2ANL90/p1690989928160289

Before @bboreham 's recent change in #5665 it was taking up a lot of memory, and even afterwards we are still tripling the amount needed as we are tracking native histogram series and buckets as well. Discussed with @krajorama and we do need to track those, and it makes sense to use the same set of custom matchers for float series and native histogram series, so we are exploring other ways to minimise the memory usage especially in cases where native histograms are not being used. So we could just implement the change for native histograms and leave the float active series tracking alone, but we thought we might as well explore the possibility of making them consistent.

Benchmark results:
compare_BenchmarkActiveSeries_UpdateSeries$.txt
compare_BenchmarkMatchesSeries.txt

Copy link
Contributor

@colega colega Aug 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note in your benchmarks that UpdateSeries is 5% slower in meaningful cases (when there are series, with or without matchers) while there's no memory improvement for the meaningful cases (there's only improvement when there are no series, saving 150Ki, same order of magnitude that I made above, but when there are 100K series the saving is 0.01% (because it's constant, doesn't depend on the series amount). There's also an increase in allocations, which will slow down the process more (as it generates more garbage to collect).

I wouldn't trade a 5% speed (which stretches with number of series) for 150Ki of constant memory.

@zenador zenador force-pushed the zenador/reduce-mem-native-histogram-custom-trackers branch from 2a852bb to 278add9 Compare August 29, 2023 17:46
@pracucci
Copy link
Collaborator

Closing for now due to inactivity, but feel free to re-open it anytime if you plan to get back to this. Thanks!.

@pracucci pracucci closed this Oct 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reduce memory use of tracking active series and native histograms
3 participants