Memory leak for large files #605

Open
lassepe opened this issue Sep 26, 2024 · 5 comments

lassepe commented Sep 26, 2024

The following file created with JLD2 0.4.53 causes memory leaks on my system (Ubuntu 22.04, Julia 1.11.0-rc3):

https://drive.google.com/file/d/1_mjdRDD-DhrEsLoVy31sDis5sGpRo-mW/view?usp=sharing

Specifically, if I load the contained file as foo = load_object("training_data.jld2") and then do foo = nothing; GC.gc(true), the memory is never freed again. Hence, after a few consecutive loads, my Julia session goes OOM.
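
For reference, a minimal sketch of the reproduction (the repeated-load loop is just to illustrate the growth; the file is assumed to be in the working directory):

```julia
using JLD2

# Load the file, drop the reference, and force a full collection.
# On my machine the resident memory is not returned to the OS afterwards,
# so repeating this a few times eventually runs the session out of memory.
for _ in 1:5
    foo = load_object("training_data.jld2")
    foo = nothing
    GC.gc(true)
end
```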

@JonasIsensee
Collaborator

What packages should I install to load this?
The JLD2 fallback type reconstruction makes fewer performance-optimizing assumptions, which makes loading very slow.
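
For context, a sketch of what I mean (`SomeDataPackage` is a placeholder for whatever package defines the stored types):

```julia
using JLD2
using SomeDataPackage  # hypothetical: the package defining the types stored in the file

# With the type definitions loaded, JLD2 can map the stored datatypes directly
# instead of reconstructing them, which avoids the slow fallback path.
foo = load_object("training_data.jld2")
```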

@JonasIsensee
Collaborator

After some initial tests, I see a significant increase in (remaining) memory usage after the first load, but it does not increase after that.
The problem could be related to the fact that the current implementation of the MmapIO backend requires the full file size to be available as a contiguous chunk in memory.
However, Julia will likely have a rather disorganized heap structure after the first loads (lots of intermediate allocations and also new method compilation, etc.).
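
One thing that might be worth trying to take the mmap path out of the picture (a sketch, assuming the `iotype` keyword of `jldopen` and the default dataset name used by `load_object`):

```julia
using JLD2

# Read through a plain IOStream instead of the MmapIO backend, so the file
# does not need to be mapped as one contiguous region of address space.
foo = jldopen("training_data.jld2", "r"; iotype=IOStream) do f
    f["single_stored_object"]  # the dataset name save_object/load_object use by default
end
```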


lassepe commented Sep 27, 2024

Sorry, yeah, the package that this data originates from is not public. I can try to see if the same problem occurs with “random” data. I wonder whether this is a Linux vs. Windows issue due to differences in handling mmapping? The machine that this occurs on has 30 GB RAM, which should be more than enough to find non-fragmented memory blocks for this. Another peculiarity of the machine this happened on is that it doesn’t have any swap.
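
A rough sketch of how I could generate a “random” stand-in of comparable size (the dictionary layout and sizes below are guesses, not the real data):

```julia
using JLD2

# Hypothetical stand-in: one large Dict of random arrays, saved the same way
# as the original file, to check whether the leak reproduces without my package.
dummy = Dict(string(k) => rand(Float32, 1_000_000) for k in 1:1_000)
save_object("random_data.jld2", dummy)
```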

@JonasIsensee
Collaborator

I also tested on Linux.
Hm, the lack of swap probably does change the flexibility of RAM usage, but I don't know.

@JonasIsensee
Collaborator

Hi @lassepe,
you seem to be encountering a rather rare problem...
I did some more googling on this but couldn't really find anything that would explain or reproduce your issue.

I saw that your file contained a single large dictionary. If you were to split the dict into smaller datasets,
you could load those individually. That should, in theory, allow you to access them sequentially, and if you don't need all of the data, it should lower both the runtime and the RAM usage.
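
For instance, something along these lines (a sketch with placeholder key names and toy data, not your actual layout):

```julia
using JLD2

# Toy stand-in for the original dictionary (placeholder keys and values).
training_data = Dict("run_1" => rand(Float32, 10), "run_2" => rand(Float32, 10))

# Write each entry as its own dataset instead of one big object ...
jldopen("training_data_split.jld2", "w") do f
    for (key, value) in training_data
        f[key] = value
    end
end

# ... and later read back only the entries you actually need.
foo = jldopen("training_data_split.jld2", "r") do f
    f["run_1"]
end
```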
