-
Notifications
You must be signed in to change notification settings - Fork 291
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amend -off-cpu-threshold value #316
Conversation
@@ -554,23 +554,22 @@ func loadAllMaps(coll *cebpf.CollectionSpec, cfg *Config, | |||
// On modern systems /proc/sys/kernel/pid_max defaults to 4194304. | |||
// Try to fit this PID space scaled down with cfg.OffCPUThreshold into | |||
// this map. | |||
adaption["sched_times"] = (4194304 / support.OffCPUThresholdMax) * cfg.OffCPUThreshold | |||
adaption["sched_times"] = (4194304 * cfg.OffCPUThreshold) / support.OffCPUThresholdMax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multiply first to prevent rounding inaccuracies.
We should have better unit testing to avoid this kind of regressions |
We should definitely have better unit testing. The current test code coverage for this project is not impressive. But in the case mentioned in the PR description, the error came from a Cilium ebpf function, hard to trigger/catch in a unit tests. Btw, this PR adds a check for the off-cpu threshold value in |
Co-authored-by: Christos Kalkanis <[email protected]>
Sorry for the late comment on this. Wouldn't it be nicer to define this chance as a rate See https://pkg.go.dev/runtime#SetMutexProfileFraction for prior art that uses this approach. |
I agree that the current accepted values limit the user. Why not directly using a probability 1/X? (For example |
This amends the
-off-cpu-threshold
values toBefore, a value of 1000 disabled off-cpu profiling, thus enabling reporting of all events wasn't possible, which isn't what users might expect. The value of 0 was "valid" but caused a division-by-zero panic.
Additionally, when introducing new configuration variables, a default of non-zero (or empty string) may cause issues in other code bases. For example, our experimental OTEL collector/receiver stopped working, because the newly introduced off-cpu threshold value wasn't initialized with
support.OffCPUThresholdMax
(not easy to know when not closely following the PRs in this repo). A default of 0 for the off-cpu threshold naturally mitigates this.