
Multiprocessing runs 100X+ slower than single simulation with certain simulations #365

Open
SeanMcOwen opened this issue Nov 5, 2024 · 2 comments
Labels: bug (Something isn't working)

@SeanMcOwen (Contributor) commented:

This notebook shows a current simulation model that runs very slowly when moved to multiprocessing.

The results are:

No deepcopy, 1 Monte Carlo run: 0.12 seconds
No deepcopy, 5 Monte Carlo runs: 60.22 seconds / 5 = 12.04 seconds per run
Deepcopy, 1 Monte Carlo run: 2.79 seconds
Deepcopy, 5 Monte Carlo runs: 59 seconds / 5 = 11.80 seconds per run

In a table (seconds per run):

              Single Proc   Multi Proc
No deepcopy          0.12        12.04
Deepcopy             2.79        11.80

So we can see that for single simulations, turning off deepcopy is a large speedup, but under multiprocessing we run massively slower no matter what. Since toggling deepcopy has essentially no effect on the multiprocessing times, it looks like deepcopy isn't the bottleneck there; something else makes the multi-process path much slower.
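The pattern above — parallel runs costing far more per run than serial ones — is consistent with fixed per-process overhead (pool startup plus serializing data across the process boundary) dwarfing the simulation work itself. A minimal sketch (not cadCAD's actual code; the toy `simulate` function is a stand-in for a cadCAD run) showing how to measure that overhead:

```python
# Sketch: time the same toy workload serially vs. through a process pool.
# When the per-run work is small, pool startup + IPC dominate and the
# "parallel" version can be slower, mirroring the numbers in the table.
import time
from multiprocessing import Pool


def simulate(n_steps):
    # Toy "simulation": cheap arithmetic standing in for a real run.
    state = 0.0
    for t in range(n_steps):
        state += t * 0.5
    return state


if __name__ == "__main__":
    runs = [10_000] * 5

    t0 = time.perf_counter()
    serial = [simulate(n) for n in runs]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool(processes=5) as pool:
        parallel = pool.map(simulate, runs)
    t_parallel = time.perf_counter() - t0

    assert serial == parallel  # same results either way
    print(f"serial:   {t_serial:.4f}s")
    print(f"parallel: {t_parallel:.4f}s (includes pool startup + IPC)")
```

If the parallel timing stays large even as the pool is reused, per-run serialization of the arguments and results is the likely culprit rather than process startup.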

Further context from @danlessa is:

"Regarding multiprocessing, those two threads have some context
2023-12 on [client project]: https://blockscienceteam.slack.com/archives/C05LRRUMGQM/p1703034551508919?thread_ts=1703019991.737059&cid=C05LRRUMGQM
2020-12, on using multiprocessing alternatives: https://blockscienceteam.slack.com/archives/CCYHUBHJ7/p1609220349006200"

The important information from the Slack thread is:

"As for the single thread result: this is related to how multiprocessing in Python works. Processes cannot share memory directly, and they rely on IPC, which involves serialization of data. This is an expensive operation when dealing with objects generally. cadCAD uses pathos for parallelizing runs, which in turn depends on dill as a serializer. dill is particularly slow compared to pickle as a serializer, but it can handle pretty much any kind of object, while pickle cannot. This is a problem without an easy and universal way out. Most performance improvements require constraining use cases in some direction. If you're looking for 10-100x speed-ups, then investing in a non-deepcopy-compatible solution can definitely pay off, as it opens up the possibility of doing some clever hacks (like history erasure, which facilitates the serialization a lot)."

@emanuellima1 emanuellima1 self-assigned this Nov 8, 2024
@SeanMcOwen (Contributor, Author) commented:

Might be related to #351

@emanuellima1 emanuellima1 added the bug Something isn't working label Nov 15, 2024