Switch cache from theine to otter #405

Merged · 2 commits merged into master from switch_to_otter on Aug 10, 2024
Conversation

FZambia (Member) commented Aug 10, 2024

In my test setup this fixes a memory leak when using the CompressionPreparedMessageCacheSize option. The reason for the memory leak with theine is unclear to me.
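
For context, a rough sketch of what a TTL-bound otter cache looks like (this assumes otter's v1 builder API; the actual wiring inside centrifuge may differ):

```go
package main

import (
	"fmt"
	"time"

	"github.com/maypok86/otter"
)

func main() {
	// Build a size-bounded cache whose entries expire after one second,
	// similar in spirit to the prepared-message cache replaced in this PR.
	cache, err := otter.MustBuilder[string, []byte](10_000).
		WithTTL(time.Second).
		Build()
	if err != nil {
		panic(err)
	}
	defer cache.Close()

	cache.Set("frame-key", []byte("compressed prepared message"))
	if v, ok := cache.Get("frame-key"); ok {
		fmt.Printf("hit: %d bytes\n", len(v))
	}
}
```

The prepared-message cache discussed in this PR relies on the same combination of a capacity bound and a short TTL that comes up later in the thread.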


codecov bot commented Aug 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 81.45%. Comparing base (8316026) to head (126a1e0).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #405      +/-   ##
==========================================
+ Coverage   81.44%   81.45%   +0.01%     
==========================================
  Files          40       40              
  Lines        7200     7204       +4     
==========================================
+ Hits         5864     5868       +4     
  Misses        961      961              
  Partials      375      375              


FZambia merged commit bc57b9c into master on Aug 10, 2024
7 checks passed
FZambia deleted the switch_to_otter branch on August 10, 2024 at 13:18
@Yiling-J

@FZambia Hi, I’m the author of Theine. Could you please provide more details about the memory leak issue so I can debug it? Thanks!

FZambia (Member, Author) commented Aug 13, 2024

@Yiling-J hello, nice to see you here :)

A heap profile will probably give you more insight than it gave me:

heap.profile.zip

(This is a heap profile from a production Centrifugo node whose memory grew well above normal values.)
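
For anyone reproducing this analysis: a heap snapshot like this can be captured with the standard net/http/pprof handlers and inspected with go tool pprof. A minimal sketch, not specific to Centrifugo:

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
	// Run the profiling endpoint alongside the application under test.
	// A heap snapshot can then be grabbed and explored with:
	//
	//   go tool pprof -http=:8080 http://localhost:6060/debug/pprof/heap
	//
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```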

The setup I used to reproduce the memory leak locally involves running the modified Go example from this repo and connecting to it from a Node.js script. The resulting heap was very similar to the one above. Since I have not identified the root cause, I can't be 100% sure the issue is in theine-go, but after switching to another cache library the leak no longer reproduces and the heap profile shows a completely different pattern, without the growing (*PreparedMessage) frame. Please let me know if you need steps to reproduce the setup I mentioned and I'll prepare instructions, but maybe the profile will help without the additional debugging complexity.

I already tried to come up with a minimal leak reproducer, but all my attempts failed: in isolation it just works properly.

In general, my thinking is that the theine cache somehow keeps references to objects even after their TTL has expired, so the GC does not remove them. But it's just an assumption.

@Yiling-J

@FZambia I think I found the issue. Theine stores *Entry in the sharded map, where each entry contains a key/value pair and some metadata. To reduce allocations, there is a sync pool where removed entries are put back. I found that while I reset the value before putting it back, I did not reset the key. This doesn't affect correctness because when getting an entry from the pool on Set, new key/value pairs are assigned to the entry. However, from a memory perspective, if the key is large, it will still occupy space in the sync pool.

In the case of centrifuge, because the TTL is very short and the key is the data string, Theine deletes expired entries proactively and moves them to the sync pool very fast. This can result in a large number of entries in the pool, consuming significant memory.
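
A simplified illustration of that effect (illustrative only; the names and structure below are not theine's actual code): the value is cleared when the entry is returned to the pool, but the key keeps a large string reachable from the pool.

```go
package main

import "sync"

// entry mimics a cache entry stored in the sharded map and recycled via a pool.
type entry struct {
	key   string // in the centrifuge case this is the (potentially large) data string
	value []byte
	// ...expiration and policy metadata omitted
}

var entryPool = sync.Pool{
	New: func() any { return new(entry) },
}

// release returns an expired or removed entry to the pool.
func release(e *entry) {
	e.value = nil // the value is reset before pooling...
	// e.key = "" // ...but without also resetting the key, the big key string
	//            // stays referenced by the pooled entry and cannot be collected.
	entryPool.Put(e)
}
```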

I have created a syncpool branch to address this issue. If you have time, could you try this branch and see if the memory usage issue is still reproducible?

@Yiling-J

After thinking about it a bit and running some simple benchmarks, I think the sync pool might not be the cause. If entries in the sync pool are not used, they will be GCed after two GC cycles. 'Not used' means there are no new Set operations, so memory usage should remain stable. And if new Set operations do occur, they will reuse entries from the pool, still keeping memory usage stable.
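
A minimal sketch of that reasoning: since Go 1.13 an idle pooled object survives one GC in a victim cache and is dropped on the next, so an unused pool should not pin memory for long.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	allocs := 0
	pool := sync.Pool{
		New: func() any {
			allocs++
			return make([]byte, 1<<20) // 1 MiB buffer
		},
	}

	// Warm the pool up and then stop using it.
	pool.Put(pool.Get())
	fmt.Println("allocations after warm-up:", allocs) // 1

	// Two GC cycles later the idle object has been released by the runtime,
	// so the next Get has to allocate a fresh buffer.
	runtime.GC()
	runtime.GC()
	_ = pool.Get()
	fmt.Println("allocations after two GCs:", allocs) // 2
}
```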

@FZambia could you also share an Otter profile for comparison? From the profile, the in-use space is not large (0.23 GB). What would be the ideal memory usage if there were no leaks?

FZambia (Member, Author) commented Aug 14, 2024

I did experiments yesterday with the new commit, and yep, it does not help; the growth is still there. I'll run my reproducer for a longer time and share heap profiles.

FZambia (Member, Author) commented Aug 14, 2024

@Yiling-J hello,

Ran with theine for ~8 hours; the server RSS went up from its normal 40 MB to 422 MB. Heap profile at the end:

theine.heap.zip

Then tried with otter for half an hour and did not observe any growth (stable around 40 MB), so I stopped the process. Heap profile at the end:

otter.heap.zip

One more thing, by the way: the Centrifugo customer who reported the issue tried a new version in production where otter is used, and also confirmed that RSS has been stable for half a day at this point.
