Improve symbols partitioning heuristics #3262

Open
kolesnikovae opened this issue May 1, 2024 · 1 comment · May be fixed by #3820

kolesnikovae commented May 1, 2024

Pyroscope stores symbolic information and stack traces in a dedicated storage called symdb. The storage is partitioned by a key derived from the service_name label or, if it is present, from the main binary name.

However, this approach leads to "shared" partitions, in which the symbols of multiple services are stored together. In turn, this increases the read amplification factor: we often fetch data that is not referenced by the samples of a particular query. Analysis shows that more than 50% of the fetched symbols might be irrelevant to the query.
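To make the read amplification concrete, here is a minimal sketch of how the irrelevant share of a fetch could be measured. The function name and the symbol names are hypothetical and are not part of symdb.

```go
package main

import "fmt"

// irrelevantFraction returns the share of fetched symbols that are not
// referenced by the samples of a query. Hypothetical helper, for illustration.
func irrelevantFraction(fetched []string, referenced map[string]bool) float64 {
	if len(fetched) == 0 {
		return 0
	}
	unused := 0
	for _, sym := range fetched {
		if !referenced[sym] {
			unused++
		}
	}
	return float64(unused) / float64(len(fetched))
}

func main() {
	// A shared partition holds symbols of two services (made-up names).
	fetched := []string{"foo.Handle", "foo.Parse", "bar.Serve", "bar.Flush"}
	// The query only touches samples of service "foo".
	referenced := map[string]bool{"foo.Handle": true, "foo.Parse": true}
	fmt.Printf("irrelevant: %.0f%%\n", 100*irrelevantFraction(fetched, referenced))
}
```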

For example, consider the following series being ingested:

  • cpu nanoseconds service_name="foo" main_binary="main"
  • heap allocs service_name="foo" main_binary="main"
  • heap inuse service_name="foo" main_binary="main"
  • cpu nanoseconds service_name="bar" main_binary="main"

All of them will be stored in the same partition, but in 99.9% of cases only one of them is accessed at query time.
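The collision can be sketched as follows. This is a simplified model of the heuristic described above, not the actual symdb code: the `series` type, the function names, and the choice of FNV-1a as the hash are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// series is a hypothetical, simplified view of an ingested profile series.
type series struct {
	SampleType  string // e.g. "cpu nanoseconds"
	ServiceName string // value of the service_name label
	MainBinary  string // main binary name; empty if unknown
}

// currentPartitionKey models the current heuristic: use the main binary
// name if present, otherwise the service name. Sample type is ignored.
func currentPartitionKey(s series) string {
	if s.MainBinary != "" {
		return s.MainBinary
	}
	return s.ServiceName
}

// partitionID hashes the key into a numeric ID (the hash is an assumption).
func partitionID(key string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	return h.Sum64()
}

func main() {
	in := []series{
		{"cpu nanoseconds", "foo", "main"},
		{"heap allocs", "foo", "main"},
		{"heap inuse", "foo", "main"},
		{"cpu nanoseconds", "bar", "main"},
	}
	// All four series print the same partition ID.
	for _, s := range in {
		fmt.Printf("%-16s service=%s partition=%x\n",
			s.SampleType, s.ServiceName, partitionID(currentPartitionKey(s)))
	}
}
```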

Basically, there are two issues:

  • We do not take sample types into account. The problem is that different sample type categories often reference non-overlapping elements (especially stack traces).
  • We rely on the binary name. Ideally, there should be a way to distinguish binaries by build ID, but in most cases, we lack this information.

I propose changing the heuristic so that the partition key is built from the service name and the set of sample types the profile contains. For example, service="foo" cpu nanoseconds would no longer be stored together with heap allocs, but service="foo" heap allocs would still be stored together with heap inuse.
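A minimal sketch of such a key, assuming the sample types of a profile are available as plain strings. The function name, the `|` and `,` separators, and the sample type spellings are illustrative only and do not come from the Pyroscope code base.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// proposedPartitionKey builds a key from the service name and the set of
// sample types of a profile. The types are deduplicated and sorted so the
// key does not depend on their order in the profile.
func proposedPartitionKey(serviceName string, sampleTypes []string) string {
	seen := make(map[string]struct{}, len(sampleTypes))
	types := make([]string, 0, len(sampleTypes))
	for _, t := range sampleTypes {
		if _, ok := seen[t]; !ok {
			seen[t] = struct{}{}
			types = append(types, t)
		}
	}
	sort.Strings(types)
	return serviceName + "|" + strings.Join(types, ",")
}

func main() {
	cpu := []string{"cpu nanoseconds"}
	heap := []string{"heap allocs", "heap inuse"} // both come from one heap profile

	fmt.Println(proposedPartitionKey("foo", cpu))
	fmt.Println(proposedPartitionKey("foo", heap))
	fmt.Println(proposedPartitionKey("bar", cpu))
}
```

With this key, the heap sample types of one service share a partition because they arrive in the same profile, while CPU data and other services get partitions of their own.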

The caveat is that this will cause an increase in memory consumption in ingesters because some symbols will be duplicated among partitions. A notable case is when the same binary is run in multiple instances with different service_name label values (Pyroscope itself is an example). Either way, this duplication allows us to make the data more independent and decrease the read amplification factor.
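The duplication cost can be estimated with a sketch like the one below, which compares the number of symbol entries stored across partitions with the number of distinct symbols. The partition names and symbol names are made up, and the real memory cost depends on how symdb represents symbols.

```go
package main

import "fmt"

// storedSymbols returns the total number of symbol entries stored across
// all partitions and the number of distinct symbols among them. Symbols
// are assumed to be deduplicated within a partition, but not across them.
func storedSymbols(partitions map[string][]string) (total, distinct int) {
	all := make(map[string]struct{})
	for _, symbols := range partitions {
		inPartition := make(map[string]struct{})
		for _, sym := range symbols {
			inPartition[sym] = struct{}{}
			all[sym] = struct{}{}
		}
		total += len(inPartition)
	}
	return total, len(all)
}

func main() {
	// The same binary runs under two service_name values (hypothetical).
	split := map[string][]string{
		"ingester": {"runtime.mallocgc", "main.main"},
		"querier":  {"runtime.mallocgc", "main.main", "net/http.(*Server).Serve"},
	}
	total, distinct := storedSymbols(split)
	fmt.Printf("stored=%d distinct=%d\n", total, distinct)
}
```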

kolesnikovae added the enhancement, storage, and performance labels May 1, 2024
kolesnikovae self-assigned this May 1, 2024
kolesnikovae commented

There are two further issues:

  1. The main binary name (the name of the first mapping) might be malformed or non-deterministic.
  2. We determine the partition after the pprof split, which may result in profiles of the same service being split across multiple partitions.

kolesnikovae linked a pull request Jan 7, 2025 that will close this issue