Increasing Memory usage in time #43

Open
AntoinePhD opened this issue Dec 20, 2023 · 4 comments

@AntoinePhD

AntoinePhD commented Dec 20, 2023

Hello,

This is more of a comment than a bug report, but I ran into a problem with the Linux OOM killer (the kernel mechanism that kills processes when the system runs out of memory) killing subprocesses. This causes the pool to never exit, so all the events found in the run are lost.

It seems that some subprocesses keep increasing their memory usage over time, doubling or tripling the overall memory usage of the run (see the screenshot for reference: the black curve is the main script, and each other curve is a child subprocess running the associate function from utils.py).
[Screenshot from 2023-12-20 16-40-03: memory usage over time of the main script and of each child process]
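
To see which tasks drive the growth, one option is to have every task also report the PID of the worker that ran it and that worker's peak resident memory so far. A minimal sketch using the standard resource module (Unix only); peak_rss_mib, run_and_measure and measured_map are hypothetical helpers, not part of the project:

```python
import os
import resource
import sys
from functools import partial


def peak_rss_mib():
    """Hypothetical helper: peak resident set size of this process so far, in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    return peak / 1024 ** 2 if sys.platform == "darwin" else peak / 1024


def run_and_measure(func, item):
    """Hypothetical helper: run func(item), also return the worker PID and peak RSS."""
    result = func(item)
    return result, os.getpid(), peak_rss_mib()


def measured_map(pool, func, items):
    """Like pool.map(func, items), but returns (result, pid, peak_mib) tuples.

    functools.partial over module-level functions stays picklable, which a
    closure or lambda would not be.
    """
    return pool.map(partial(run_and_measure, func), items)
```

Because ru_maxrss is a high-water mark, a PID whose value keeps rising from one task to the next identifies a worker that keeps accumulating memory, which is the pattern described above.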

I tried adding a maxtasksperchild parameter to the pool to see if it would solve the problem (which is why there are so many child processes), but it did not work.
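
For reference, maxtasksperchild makes the pool retire a worker once it has completed that many tasks and start a fresh process in its place, so whatever memory the old worker held goes back to the OS. The sketch below (hypothetical helper names; the fork start method is chosen so it runs as a plain script on Linux) shows the effect: with maxtasksperchild=1 and chunksize=1, every item is handled by a new process. Recycling only happens between tasks, so it cannot cap memory that grows inside a single long task, which may be why it made no difference here.

```python
import multiprocessing as mp
import os


def worker_pid(_):
    """Hypothetical task that just reports which worker process ran it."""
    return os.getpid()


def pids_used(n_tasks, processes=2, maxtasksperchild=None):
    """Return the set of worker PIDs that handled n_tasks single-item tasks."""
    ctx = mp.get_context("fork")  # Linux-oriented choice for this sketch
    with ctx.Pool(processes, maxtasksperchild=maxtasksperchild) as pool:
        # chunksize=1 makes each item its own task, which is the unit
        # that maxtasksperchild counts.
        return set(pool.map(worker_pid, range(n_tasks), chunksize=1))
```

Without the parameter, the same two workers handle all six items; with maxtasksperchild=1, six distinct processes do.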

Splitting the picks file into several parts solves the problem, but I thought it was worth reporting.
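
The splitting workaround can also be done in code instead of by editing the file. A minimal sketch, assuming the picks are available as records carrying a datetime under a "timestamp" key (a hypothetical layout; adapt it to the real picks table); the windows would typically run from config['starttime'] to config['endtime'], and split_picks_by_time is a hypothetical name:

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def split_picks_by_time(picks, start, end, window, overlap=timedelta(0)):
    """Yield (t0, t1, picks_in_window) for consecutive windows [t0, t1).

    Each window is widened by `overlap` on both sides so that events near a
    boundary keep all of their picks. `picks` is assumed to be a list of
    dicts with a datetime under the hypothetical key "timestamp".
    """
    picks = sorted(picks, key=lambda p: p["timestamp"])
    times = [p["timestamp"] for p in picks]
    t0 = start
    while t0 < end:
        t1 = min(t0 + window, end)
        # Binary search on the sorted times for the widened window.
        i = bisect_left(times, t0 - overlap)
        j = bisect_left(times, t1 + overlap)
        yield t0, t1, picks[i:j]
        t0 = t1
```

Each chunk can then be associated separately. With no overlap, events whose picks straddle a window boundary can be missed; with a positive overlap they can be found twice and need de-duplicating afterwards.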

Best,
Antoine

@zhuwq0
Contributor

zhuwq0 commented Dec 20, 2023

Thank you for reporting this. I will debug the RAM issue.

@ziyixi
Contributor

ziyixi commented Dec 20, 2023

@AntoinePhD Thanks for letting us know about this. The thing with the OOM Killer and subprocesses sounds tricky. I remember facing something like this where some processes just wouldn't stop.

Could you share some example code and data? That would really help in figuring this out. Also, are you using the latest version of the software? Just want to make sure. Thanks!

@AntoinePhD
Author

I am using version 1.2.4 (and also tried version 1.1.11); the problem exists in both.

Yes, as you say, the main process never stops. This seems to be because the subprocess holding the lock gets killed, so the pool waits forever for the lock to be released. Apparently this is a known issue with the multiprocessing library. The only way I can see to solve this without splitting the input file is to make the pool asynchronous and then either check for dead processes (and raise an error) or add a timeout.

Something like this:

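        # pool._pool is private; the pool silently replaces dead workers, so keep
        # every worker seen (procs = set() before the loop; no maxtasksperchild).
        procs.update(pool._pool)
        if any(not p.is_alive() for p in procs):
            raise RuntimeError("A worker was killed")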
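
A fuller version of this idea, as a minimal sketch: run map_async, poll the workers while waiting, and raise on a dead worker or an overall timeout instead of blocking forever. It relies on the private Pool._pool attribute, assumes maxtasksperchild is not set (recycled workers would also look dead), uses the fork start method so it runs as a plain script on Linux, and simulates the OOM killer by having a worker SIGKILL itself. map_with_watchdog and the two task functions are hypothetical names.

```python
import multiprocessing as mp
import os
import signal
import time


def square(x):
    return x * x


def square_but_die_on_3(x):
    # Simulates the OOM killer: the worker that receives x == 3 is SIGKILLed.
    if x == 3:
        os.kill(os.getpid(), signal.SIGKILL)
    return x * x


def map_with_watchdog(func, items, processes=2, timeout=None, poll=0.1):
    """Hypothetical pool.map replacement that raises instead of hanging.

    Assumes maxtasksperchild is NOT set (so any dead worker means a crash)
    and relies on the private attribute Pool._pool.
    """
    ctx = mp.get_context("fork")  # Linux-oriented choice for this sketch
    with ctx.Pool(processes) as pool:  # Pool.__exit__ calls pool.terminate()
        seen = set(pool._pool)  # every worker process observed so far
        result = pool.map_async(func, items)
        deadline = None if timeout is None else time.monotonic() + timeout
        while not result.ready():
            # The pool replaces dead workers behind our back, so accumulate
            # workers instead of only inspecting the current list.
            seen.update(list(pool._pool))
            if any(not p.is_alive() for p in seen):
                raise RuntimeError("a pool worker was killed (e.g. by the OOM killer)")
            if deadline is not None and time.monotonic() > deadline:
                raise TimeoutError(f"pool did not finish within {timeout} s")
            result.wait(poll)
        return result.get()
```

Raising inside the with block makes the pool terminate its remaining workers on the way out. As an alternative, the standard library's concurrent.futures.ProcessPoolExecutor already raises BrokenProcessPool when one of its workers is terminated abruptly, rather than waiting forever.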

For the example with ever-increasing child memory usage, I used a modified version of your "example_phasenet.ipynb" with:

        from datetime import datetime

        str_year = 2003
        config = {
            'center': (87.0, 32.75),
            'xlim_degree': [80, 92],
            'ylim_degree': [28.5, 36.3],
            'z(km)': [0, 60],
            'degree2km': 111.19492474777779,
            'starttime': datetime(int(str_year), 1, 1, 0, 0),
            'endtime': datetime(int(str_year), 12, 31, 0, 0),
        }

I can send you the data by email if you want.

I ran the code on 16 CPUs with 32 GB of RAM.
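
With 16 workers on 32 GB, each worker gets roughly 2 GB if memory is shared evenly, and less once the main process is accounted for. One way to turn a runaway worker into a Python exception rather than an OOM kill is to cap each worker's address space with resource.setrlimit in a pool initializer: an allocation past the cap then raises MemoryError inside the worker, which pool.map re-raises in the parent. This is a swapped-in technique, not something the project does; the sketch is Linux-only and all helper names are hypothetical. RLIMIT_AS limits virtual memory, which is usually larger than resident memory, so the cap needs headroom, and it only avoids OOM kills if the caps together stay below physical memory.

```python
import multiprocessing as mp
import resource

GIB = 1024 ** 3


def per_worker_budget(total_ram, n_workers, reserve=2 * GIB):
    """Share RAM evenly between workers after setting aside `reserve` bytes
    for the main process (the 2 GiB default is an arbitrary assumption)."""
    return (total_ram - reserve) // n_workers


def limit_worker_memory(max_bytes):
    """Pool initializer: cap this worker's virtual address space (Linux only).

    Allocations beyond the cap fail with MemoryError inside the worker, and
    pool.map / pool.apply re-raise that exception in the parent.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)  # the soft limit may not exceed the hard one
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))


def as_soft_limit(_):
    """Helper task: report the address-space cap seen by the worker."""
    return resource.getrlimit(resource.RLIMIT_AS)[0]


def allocate(n_bytes):
    """Helper task: allocate n_bytes and report the size."""
    return len(bytearray(n_bytes))


def make_limited_pool(processes, max_bytes):
    """Pool whose workers each run limit_worker_memory(max_bytes) at startup."""
    ctx = mp.get_context("fork")  # Linux-oriented choice for this sketch
    return ctx.Pool(processes, initializer=limit_worker_memory, initargs=(max_bytes,))
```

For the machine above this would be make_limited_pool(16, per_worker_budget(32 * GIB, 16)), i.e. about 1.9 GiB per worker with the assumed 2 GiB reserve.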

@ziyixi
Contributor

ziyixi commented Dec 21, 2023

Thank you! It would be really helpful if you could send some test data to [email protected]. @zhuwq0 Hi Weiqiang, maybe I can also share the data with you so you can take a look.
