
dynesty/ultranest samplers run indefinitely #202

Open
bfhealy opened this issue Aug 11, 2023 · 8 comments
Labels
question Further information is requested


bfhealy commented Aug 11, 2023

Potentially related to default sampler parameters (#20): performing light_curve_analysis on an example candidate using the Bu2022Ye model and the dynesty/ultranest samplers appears to run indefinitely. I'm finding that the following calls to light_curve_analysis begin sampling but do not conclude:

dynesty:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_dynesty --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler dynesty

ultranest:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_ultranest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler ultranest

However, using pymultinest for this sampling finishes in a few minutes:

light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_pymultinest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler pymultinest

These runs were performed with the latest version of nmma and its requirements (including attempts with bilby-2.1.2 installed with pip and bilby-2.1.2.dev26+g9c1dda6c installed from source).

@tsunhopang tsunhopang self-assigned this Aug 12, 2023

tsunhopang commented Aug 12, 2023

For ultranest, could you try the following two independent approaches?

  1. Run with mpiexec and see how long it takes. (In my experience, it should take ~10 times longer than pymultinest.)
  2. Run it with the following extra command-line arguments: --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"


bfhealy commented Aug 15, 2023

Hi @tsunhopang, thanks for the suggestions! I tried another ultranest run using mpiexec and the new arguments, and it sampled for several hours before failing with this error:

astropy.cosmology.core.CosmologyError: Best guess z=5.6740878366520494e-09 is very close to the lower z limit 0.0.
Try re-running with a different zmin.

I also saw this warning several times throughout the run:

UserWarning: Sampling from region seems inefficient (0/40 accepted in iteration 2500). To improve efficiency, modify the transformation so that the current live points (stored for you in /var/folders/8_/ky643qs168ngjmhrpwcq1fdm0000gn/T/tmpsv0qi0se/extra/sampling-stuck-it%d.csv) are ellipsoidal, or use a stepsampler, or set frac_remain to a lower number (e.g., 0.5) to terminate earlier.

tsunhopang commented:

There seems to be a problem with the data. Could you link it here?

tsunhopang commented:

Also, please share the prior you used for the analysis.


bfhealy commented Aug 15, 2023

Hi @tsunhopang, I tried two different Bu2022Ye analysis runs using nmma demo data. The data were for this ZTF candidate and AT2017gfo. The priors are here, and the function calls are below:

mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label ZTF21abdpqpq_ultranest --data ./example_files/candidate_data/ZTF21abdpqpq.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --filters ztfg,ztfr --trigger-time 59361.0 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"
mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label AT2017gfo_ultranest --data ./example_files/lightcurves/AT2017gfo.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --trigger-time 57983.0 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"

tsunhopang commented:

Could you try the following:

  1. Use a tighter prior on the distance (e.g., for AT2017gfo, the distance is ~40 Mpc).
  2. The parameter should be named KNtimeshift rather than trigger_time.
  3. The prior on KNtimeshift can be fixed to zero if there is a clear trigger; otherwise, it is still better to use a tighter prior.
  4. The trigger time for AT2017gfo should be 57982.52852.
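Applied to a bilby-style prior file, suggestions 1–3 might look like the fragment below. The parameter name luminosity_distance and the 30–50 Mpc bounds are illustrative assumptions, not the shipped Bu2022Ye.prior contents; KNtimeshift is the name required above.

```
# Tighter distance prior around ~40 Mpc for AT2017gfo (illustrative bounds)
luminosity_distance = Uniform(minimum=30, maximum=50, name='luminosity_distance', latex_label='$d_L$', unit='Mpc')
# Must be named KNtimeshift (not trigger_time); fixed to zero here
# because AT2017gfo has a clear trigger
KNtimeshift = 0.0
```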


bfhealy commented Aug 15, 2023

Thanks! I've started a new sampling run with these changes.


bfhealy commented Aug 21, 2023

Hi @tsunhopang, I've tried the ultranest sampling a few times using the following call, but each time it runs until my computer restarts (presumably because of a memory-related problem).

mpiexec -n 8 light_curve_analysis --model Bu2022Ye --interpolation_type tensorflow --svd-path ./svdmodels --outdir ./outdir --label AT2017gfo_ultranest_new --data ./example_files/lightcurves/AT2017gfo.dat --prior ./priors/Bu2022Ye.prior --tmin 0 --tmax 14 --dt 0.1 --error-budget 1 --nlive 512 --Ebv-max 0 --trigger-time 57982.52852 --plot --sampler ultranest --reactive-sampling --sampler-kwargs "{'dlogz': 0.1}"

I also continue to see the inefficient-sampling warnings shared above. Perhaps different stopping criteria would help ultranest finish before running out of memory?

Changing the sampler to pymultinest and removing the --reactive-sampling and --sampler-kwargs arguments successfully produces light-curve/corner plots and other sampling results, although I need to interrupt the process in my terminal window in order to enter any more commands.

@bfhealy bfhealy added the question Further information is requested label Sep 8, 2023
@bfhealy bfhealy added this to the Analysis Tools milestone Sep 8, 2023