Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible memory leak in DataColumnWriter #436

Open
TapaniAalto opened this issue Nov 28, 2023 · 2 comments · May be fixed by #597
Open

Possible memory leak in DataColumnWriter #436

TapaniAalto opened this issue Nov 28, 2023 · 2 comments · May be fixed by #597

Comments

@TapaniAalto
Copy link

While investigating a potential memory leak in my Azure Function app, I noticed that the Managed Memory Tool in Visual Studio tells me that I have many objects in the .NET managed heap. When looking through those objects (byte[]), they seem to have Microsoft.IO.RecyclableMemoryStreamManager as their root.

Parquet.Net uses RecyclableMemoryStreamManager in DataColumnWriter.

In it's documentation it states that

Important!: If you do not set MaximumFreeLargePoolBytes and MaximumFreeSmallPoolBytes there is the possibility for unbounded memory growth!

And also that:

the RecyclableMemoryStreamManager will use the properties MaximumFreeSmallPoolBytes and MaximumFreeLargePoolBytes to determine whether to put those buffers back in the pool, or let them go (and thus be garbage collected). It is through these properties that you determine how large your pool can grow. If you set these to 0, you can have unbounded pool growth, which is essentially indistinguishable from a memory leak.

It seems that the DataColumnWriter does not set these properties, so I guess that might be the reason for my app's high memory usage.

Should those MaximumFreeSmallPoolBytes and MaximumFreeLargePoolBytes properties be somehow user configurable? Maybe via ParquetOptions?

@aloneguid
Copy link
Owner

Thanks for looking into this. Seems like a good idea to add an option and maybe also set some default limit.

@DrewMcArthur
Copy link
Contributor

hey @aloneguid ! we're also running into some weird memory issues in our app. we haven't done the same profiling as @TapaniAalto to be sure, but this seems like some low-hanging fruit that i'd be happy to tackle. How does it sound to set the defaults to this?

MaximumFreeSmallPoolBytes: 16MB,
MaximumFreeLargePoolBytes: 64MB,

I'd also expose this through ParquetOptions to the library's users (we might bump this higher in our personal usage to 64MB & 192MB respectively). Would you prefer to use the same name for clarity, or something simplified like SmallBufferPoolLimit?

I'll open up a draft PR shortly, let me know any thoughts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants