cap live_bytes to zero in a few places where GC intervals are computed #170

d-netto · 2024-07-29T19:51:17Z

PR Description

Caps live_bytes to 0 when computing GC intervals.

Checklist

Requirements for merging:

I have opened an issue or PR upstream on JuliaLang/julia: N/A (patch to our fork).
I have removed the port-to-* labels that don't apply.
I have opened a PR on raicode to test these changes: https://github.com/RelationalAI/raicode/pull/20626.

d-netto · 2024-07-29T19:54:29Z

As usual, we should definitely benchmark this before merging. Bonus points if we can run this on one of the problematic workloads pointed out by Todd.

kpamnany

Approving, but let's test this in RAICode before merging here.

kpamnany · 2024-07-29T19:54:56Z

src/gc.c

+    // XXX: we've observed that the `live_bytes` was negative in a few cases
+    // which is not expected. We should investigate this further, but let's just
+    // cap it to 0 for now.
+    int64_t live_bytes_for_interval_computation = live_bytes < 0 ? 0 : live_bytes;


Do we also want to cap it in the other direction? It should not exceed maximum memory, right?

If I were to cap anything in the other direction (i.e. put an upper bound on anything), that would be the gc_num.interval, probably.

With this change, I believe it's already upper-bounded by max_total_memory, but we can possibly make that bound a bit tighter (i.e. max_total_memory / 2 or so).

If this is just a patch for v1.10.2+RAI and not intended for upstream, could we limit gc_num.interval to max_total_memory / log2(max_total_memory)?

What's the motivation for dividing by log?

To be more precise: why the choice of log here? And not some other function?

The OOMGuardian heuristic triggers a full GC when the heap size increases by 15% over the high-water mark in a rolling window (10 minutes). This avoids having the heap be more than 15% larger than it needs to be (resulting in shorter GC pauses and fewer TLB misses in pointer dereferences). It also means that if the reachable heap size is increasing monotonically, we don't do more than a logarithmic number of collections. In the scenario where we don't know the actual heap size because of the accounting problem, I'm going for a number of collections that is logarithmic in the physical memory size. The max_total_memory / log2(max_total_memory) calculation would give an interval of roughly 0.47 GiB for 16 GiB of RAM (34 collections until all of physical memory is allocated), and 6.7 GiB for 256 GiB of RAM (38 collections).
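To make the arithmetic in the comment above concrete, here is a minimal standalone sketch in C. It is not the code from src/gc.c: the helper names `floor_log2_u64`, `interval_upper_bound` and `clamp_live_bytes` are hypothetical, and the sketch assumes `max_total_memory` is a byte count and that log2 is taken as an integer floor.

```c
#include <stdint.h>

/* Floor of log2(x) for x > 0, computed with shifts to avoid floating point.
 * Returns 0 for x == 0 or x == 1. */
static unsigned floor_log2_u64(uint64_t x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Hypothetical helper (not from gc.c): upper bound on the GC interval,
 * computed as max_total_memory / log2(max_total_memory), in bytes.
 * Falls back to max_total_memory itself when the logarithm is 0. */
static uint64_t interval_upper_bound(uint64_t max_total_memory)
{
    unsigned lg = floor_log2_u64(max_total_memory);
    return lg == 0 ? max_total_memory : max_total_memory / lg;
}

/* Hypothetical helper mirroring the clamp in the diff: treat a negative
 * live_bytes as 0 before using it to compute an interval. */
static int64_t clamp_live_bytes(int64_t live_bytes)
{
    return live_bytes < 0 ? 0 : live_bytes;
}
```

Under these assumptions, 16 GiB is 2^34 bytes, so the bound is 2^34 / 34 bytes (about 0.47 GiB), and 256 GiB is 2^38 bytes, giving 2^38 / 38 bytes (about 6.7 GiB). In both cases the number of intervals needed to fill physical memory equals the logarithm itself.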

Implemented in latest commit.

src/gc.c

tveldhui · 2024-08-09T13:47:47Z

Can we bump up the urgency level of fixing this? It is causing major headaches for customers running XL instances in spcs (e.g. cashapp).

d-netto · 2024-08-09T14:03:38Z

Acked. Will be opening a PR to test this on raicode and try to merge this ASAP.

tveldhui

LGTM, thanks!

tveldhui · 2024-08-12T14:53:18Z

Are we close to merging this?

d-netto · 2024-08-12T15:19:30Z

Will meet with @kpamnany to discuss the performance results from https://github.com/RelationalAI/raicode/pull/20626.

If there are no significant regressions we expect to merge it today.

d-netto · 2024-08-12T18:46:10Z

Benchmarks look fine.


cap live_bytes to zero in a few places where GC intervals are computed

961f224

d-netto requested a review from kpamnany July 29, 2024 19:51

github-actions bot added the port-to-v1.10 and port-to-v1.12 labels Jul 29, 2024

d-netto removed the port-to-v1.12 label Jul 29, 2024

kpamnany approved these changes Jul 29, 2024


tveldhui reviewed Aug 8, 2024


mem / log(mem) for interval upper bound

caef7e2

tveldhui approved these changes Aug 10, 2024


d-netto merged commit 1c192fd into v1.10.2+RAI Aug 12, 2024
2 checks passed

d-netto deleted the dcn-cap-live-bytes-to-zero branch August 12, 2024 18:46

nickrobinson251 pushed a commit that referenced this pull request Feb 26, 2025

cap live_bytes to zero in a few places where GC intervals are computed (#170)

* cap live_bytes to zero in a few places where GC intervals are computed
* mem / log(mem) for interval upper bound

3afb01b
