
Bug: cadCAD underutilizes CPU and opens too many filehandles #351

Open
zcstarr opened this issue Mar 15, 2024 · 6 comments · May be fixed by #352
Labels
bug Something isn't working

Comments

@zcstarr
Contributor

zcstarr commented Mar 15, 2024

Summary:
When cadCAD runs in parallel mode, it underutilizes CPUs: roughly 75% of available CPUs go unused. It's not that it doesn't try to use the CPU; it thrashes the CPU by creating a new process pool for every config it wants to run in parallel. This in turn causes the process manager to thrash, because memory is constantly allocated and then freed.

Motivation:
A cadCAD performance improvement that lets it use 100% of the CPU in a multi-core situation can save hours off of a large simulation. It will also prevent too many process file handles from being opened during execution of a simulation; I believe this might be related to #350.

Solution:
The solution is to refactor execution.py to use a single process pool, as intended by the creators of the package: create the pool once and reuse CPUs as they become available, instead of creating a new pool per configuration. I also suggest we include an option to write intermediate results to disk and read them back without loading the entire dataset into memory.
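
For illustration, here is a minimal sketch of the pool-reuse idea using plain multiprocessing (not cadCAD's actual execution.py; `run_config` and the config list are hypothetical stand-ins):

```python
from multiprocessing import Pool

def run_config(config):
    # Placeholder for executing one simulation configuration.
    return sum(range(config))

def run_all_thrashing(configs):
    # Current anti-pattern: a new pool per configuration, so workers are
    # constantly forked and torn down, which is what causes the thrash.
    results = []
    for config in configs:
        with Pool() as pool:
            results.append(pool.apply(run_config, (config,)))
    return results

def run_all_reused(configs):
    # Proposed shape: create the pool once and feed it every configuration;
    # idle workers pick up the next config as soon as they free up.
    with Pool() as pool:
        return pool.map(run_config, configs)

if __name__ == "__main__":
    print(run_all_reused([10_000, 20_000, 30_000]))
```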

I found that once I increased the parallelization, it was really easy to run out of memory for a large simulation, simply because all intermediate results were being held in memory. My temporary solution is to write these intermediate results to temporary files on disk. This prevents processes from running out of memory when running in a highly parallel environment (think 16 cores or more). The downside is that, when the simulation is complete, cadCAD currently requires you to load everything back into memory, which is memory intensive as well.

The solution here is to continue the refactor so that the final data loads into memory iteratively rather than all at once. My initial experiments to fix this problem were with cadCAD 0.4.28. I am hopeful that the datacopy enhancement would reduce the overall memory footprint, which might make this less likely to be an issue, but I think the real solution is to load it iteratively.
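
To make the spill-to-disk idea concrete, here is a hedged sketch: each run pickles its partial results to a temporary file, and a generator reads them back one file at a time. The file layout and helper names are illustrative only, not what the PR actually does:

```python
import pickle
import tempfile
from pathlib import Path

def spill_partial_result(records, tmp_dir: Path, run_id: int) -> Path:
    # Write one run's records to its own temporary file and return the path.
    path = tmp_dir / f"run_{run_id}.pkl"
    with path.open("wb") as fh:
        pickle.dump(records, fh)
    return path

def iter_results(paths):
    # Generator: load one run's results at a time, so the full dataset
    # never has to sit in memory all at once.
    for path in paths:
        with open(path, "rb") as fh:
            yield from pickle.load(fh)

if __name__ == "__main__":
    tmp_dir = Path(tempfile.mkdtemp())
    paths = [
        spill_partial_result([{"run": i, "x": i * 2}], tmp_dir, i)
        for i in range(3)
    ]
    for record in iter_results(paths):
        print(record)
```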

I'd be happy to go down the route of making this configurable. The PR I've written automatically writes to and reads from disk; this should be a config option. I wanted to know if this is a direction worth going; if so, I can make it production worthy.


linear bot commented Mar 15, 2024

zcstarr added a commit to zcstarr/cadCAD that referenced this issue Mar 15, 2024
Prior to this change the process pool was created and destroyed
for every configuration. This would cause the CPU/memory to
thrash and improperly allocate tasks to available CPUs, resulting
in sometimes 25% utilization of available CPU resources.

The change corrects this, and also tackles memory problems,
by writing temporary results to disk and then reading them back
at the end of the simulation.

This is not configurable in this commit, and can also result in
loading too much into memory, as it does not include the ability to
progressively or lazily load data into the final dataframe to
complete the simulation.

This commit is a WIP; fixes cadCAD-org#351
@zcstarr
Contributor Author

zcstarr commented Mar 15, 2024

@emanuellima1 @danlessa, I have a proposal to fix some of cadCAD's perf issues. I debugged this issue back in 0.4.28 while working on a client project that involved a lot of simulation data. The problem is essentially that the ProcessPool is being used incorrectly.

I think this will fix your issue @danlessa with #350. In the issue I have listed out what I think the problems are, and the PR is the fix I implemented.

I wanted to know if this is something you're interested in fixing in cadCAD, and whether it would make sense to go the whole way: fix the lazy loading of the dataframe at the end and make writing the intermediate files to disk optional?

Curious to know what you think.

@danlessa
Member

Hey @zcstarr that would be awesome! You could potentially use the additional_objs parameter for toggling it off or on. See PR #316 for an example.
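
For illustration, a toggle passed through additional_objs might look like this from the user's side. This is only a sketch following the PR #316 pattern; the "lazy_eval" key is a hypothetical name:

```python
from cadCAD import configs  # populated elsewhere by the experiment setup
from cadCAD.engine import ExecutionContext, ExecutionMode, Executor

exec_context = ExecutionContext(
    context=ExecutionMode().local_mode,
    # Hypothetical flag controlling disk spill / lazy loading of results.
    additional_objs={"lazy_eval": True},
)
raw_result, tensor_field, sessions = Executor(
    exec_context=exec_context, configs=configs
).execute()
```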

@danlessa
Member

@zcstarr going through the whole way is definitely worth the time. It would be nice to develop standardized benchmarks for overall simulation execution and memory usage (in terms of RAM and disk), too.
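
A minimal sketch of what such a benchmark harness could start from, assuming the memory_profiler package is installed; run_simulation here is a placeholder for whatever entry point the benchmark ends up wrapping (disk usage would need separate tracking):

```python
import time
from memory_profiler import memory_usage

def run_simulation():
    # Placeholder workload standing in for a cadCAD execution.
    return [x ** 2 for x in range(1_000_000)]

def benchmark(fn):
    # Sample RSS while fn runs and report wall-clock time plus peak memory.
    start = time.perf_counter()
    samples = memory_usage((fn, (), {}), interval=0.1)
    elapsed = time.perf_counter() - start
    return elapsed, max(samples)

if __name__ == "__main__":
    elapsed, peak_mib = benchmark(run_simulation)
    print(f"wall time: {elapsed:.2f}s, peak memory: {peak_mib:.1f} MiB")
```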

@zcstarr
Contributor Author

zcstarr commented Mar 19, 2024

@danlessa, thanks for looking at these and the quick response! Yeah, I think an option makes sense as well. I played around with a few things: I created a very small sim with 1 parameter and 1 state, then played with making that state large (like 1 MB or 100 KB), then made multiple runs to see if I could see a difference in memory performance using the memory profiler. There's a definite drop during the simulation, but reading the result back from disk is probably the biggest issue.

I was able to sketch out and get working a fully incremental/lazy load process for the simulation part, outside of returning the results. Looking further down the road and thinking about easy_run and the executor, I could see wanting a way to handle simulations whose data is too large. The larger the simulation state and params, the more problematic it is to process the run data.

It's easy to run out of RAM when running a large enough simulation. I'm trying to think through whether it would make sense to have a serialization option that allows users to lazily load/write large datasets. I'm not sure what that should be 🤔

I think in a semi-ideal world, you'd be able to write the results to disk in a way that can be lazily loaded or computed on demand, so you don't have to pull the entire dataset into memory. Thinking about https://github.com/vaexio/vaex or maybe https://docs.dask.org/en/stable/dataframe.html .
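
For example, a hedged sketch of the "compute on demand" direction using dask (one of the options above, assuming pyarrow is installed for parquet support); the file names and columns are made up:

```python
import pandas as pd
import dask.dataframe as dd

# Pretend each run's results were written to their own parquet file.
for run in range(3):
    pd.DataFrame(
        {"run": [run] * 2, "timestep": [0, 1], "x": [0.0, 0.5 * run]}
    ).to_parquet(f"results_run_{run}.parquet")

# Open every chunk lazily; nothing is read until .compute() is called,
# and only the columns/rows the computation needs are materialized.
df = dd.read_parquet("results_run_*.parquet")
print(df.groupby("run")["x"].mean().compute())
```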

Probably the way forward is to make this an additional_objs option for writing temp files to disk, and another for writing the results of the simulation to disk in some format.

zcstarr added a commit to zcstarr/cadCAD that referenced this issue Mar 21, 2024
Prior to this change the process pool was created and destroyed
for every configuration. This would cause the CPU/memory to
thrash and improperly allocate tasks to available CPUs, resulting
in sometimes 25% utilization of available CPU resources.

The change corrects this, and also tackles memory problems,
by writing temporary results to disk and then reading them back
at the end of the simulation.

This is not configurable in this commit, and can also result in
loading too much into memory, as it does not include the ability to
progressively or lazily load data into the final dataframe to
complete the simulation.

This commit is a WIP; fixes cadCAD-org#351
@zcstarr
Contributor Author

zcstarr commented Mar 21, 2024

@danlessa @emanuellima1, just tagging an update here. I refactored things to lazily evaluate and was able to save a lot of runtime memory.

The basic gist:

- [screenshot: memory profile before any changes, including the parallel processing change]
- [screenshot: memory profile with the parallel processing change]
- [screenshot: memory profile with the lazy evaluation change]

The simulation is in examples/documentation/headless_tools.py. I try simulating a 100-200 KB state + parameters simulation for a 10-year daily run over 2 years. Memory usage goes from 1.1 GB currently to 117 MB with lazy evaluation; when the results are loaded into a DataFrame it's 846 MB.

[screenshot: the resulting DataFrame]

There is a segment of code that didn't make much sense to me; I assumed maybe it was left over from versions ago. Given this is new code, I thought I'd just support most standard use cases, so feel free to let me know if those additional potential configurations or scenarios are necessary.

Just let me know what you think. I also made lazy eval switchable, and it can only be enabled for local parallel processing.
