-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid leaked semaphores #214
Conversation
switch spawn -> fork and pandas -> pyarrow
I suspect it will be installed anyway as an astropy/pyvo/astroquery dependency, but not directly rely on it sounds good to me, it's always good to keep the dependency footprint to the minimum. |
4699568
to
736922b
Compare
If we can, it would be good to move onto 3.11 or 3.12, so no need to sort out older warnings. |
736922b
to
10713db
Compare
I've run several tests and haven't seen a leaked semaphore warning. Since the warning did not occur every time, I can't be positive that this fixes it. @jkrick can you run a test or two and see if you get the warning? If you don't, then I think this is ready to merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All things equal, any code changes with negative net lines are good, let alone when they also remove a dependency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran it with 1,000 and 10,000 sources on the SMCE and both ran successfully.
Avoid leaked semaphores ff3ff7f
Background
We have been encountering "leaked semaphore" warnings occasionally. It's difficult to track down exactly where this is coming from, but I believe we are experiencing issue python/cpython#90549 which reports a bug in python's
multiprocessing
stemming from the combined use of global resources and "spawning" new processes*. We have been using "spawn" (over the default "fork") because it is thread-safe, which is required forpd.read_parquet
, which was being used byZTF_get_lightcurve
. This PR replacespd.read_parquet
with a pyarrow method and switches multiprocessing to "fork".*Note: A fix has been backported to python>=3.10. I'm still getting the warning using 3.10, but I can't tell if I'm using a version that includes the backport.
Included Changes
mp.set_start_method("spawn")
.pd.read_parquet(..)
withpyarrow.parquet.read_table(..).to_pandas()
.s3fs
withpyarrow.fs
. This is not strictly necessary, but it's convenient to just use pyarrow for everything. Since no other modules depend ons3fs
, I have also removed it from the explicit requirements for light curves. Please let me know if you would prefer to keep it (maybe @bsipocz).