Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Move best_fit.json into *_result.json #159

Open
tylerbarna opened this issue Jul 21, 2023 · 16 comments
Open

Move best_fit.json into *_result.json #159

tylerbarna opened this issue Jul 21, 2023 · 16 comments
Assignees
Labels
enhancement New feature or request wontfix This will not be worked on

Comments

@tylerbarna
Copy link
Collaborator

Feature Summary
alter behaviour of the bestfit flag to write the best fit data points to the result.json file rather than a separate file

Usage / behavior
alteration to existing bestfit flag behaviour

Alternative Solutions
If possible, it would be good to "extract" the best fit lightcurve at an earlier point in the fitting rather than calling generate_lightcurve after the fitting has concluded, though I'm unclear as to where this could be accomplished

Implementation details
Make changes to nmma/em/analysis.py

Additional context
Tweaks to pr #147, related to issue #138

@tylerbarna tylerbarna added the enhancement New feature or request label Jul 21, 2023
@tylerbarna tylerbarna added this to the Analysis Tools milestone Jul 21, 2023
@tylerbarna tylerbarna self-assigned this Jul 21, 2023
@sahiljhawar
Copy link
Member

sahiljhawar commented Jul 21, 2023

I don't think that bestfit results should go to *_results.json, since it already contains a lot of data. And if someone write script for the first time to extract the data may have to look into the file. And the file size becomes huge for complex analyses.

@tylerbarna
Copy link
Collaborator Author

I don't think that bestfit results should go to *_results.json, since it already contains a lot of data.

I don't think that the bestfit lightcurve and associated times will meaningfully add to the size of the result json file

And if someone write script for the first time to extract the data may have to look into the file.

My rationale for adding it to the result json is that the result json file can essentially function as a single file that contains all the necessary information for most post-fitting analysis one would conduct.

Placing the best fit lightcurve inside of the result json wouldn't be all that problematic for new users provided we add a blurb in the documentation about accessing it, especially because up to this point users have had to regenerate the lightcurve if they wanted to do any additional analysis. Because of the structure of json files, the code required to extract a best fit lightcurve from a result json vs a separate file will be almost identical.

And the file size becomes huge for complex analyses.

I usually see mine top out at a few megabytes, but I only really touch the EM side of things. Would this have implications for joint analysis?

@tylerbarna
Copy link
Collaborator Author

Alternative Solutions
If possible, it would be good to "extract" the best fit lightcurve at an earlier point in the fitting rather than calling generate_lightcurve after the fitting has concluded, though I'm unclear as to where this could be accomplished

I was looking through the code as well as bilby documentation to figure out where/how result.json is created, though I wasn't able to pinpoint it

@mcoughlin
Copy link
Member

Or just put everything into an hdf5 file?

@sahiljhawar
Copy link
Member

sahiljhawar commented Jul 21, 2023

I usually see mine top out at a few megabytes, but I only really touch the EM side of things. Would this have implications for joint analysis?

Yes, I think so. I had few runs where the the parameter space was 54D, and the result.json was as big as as 100MB. Though I don't think so that any inference would be this consisting of this large parameters unless I add my thing. However, time would still be O(1) I guess while using the data during post processing. But I think it should be much easy to read the useful metrics if they are directly on the top, instead of searching through a 15+MB json file.

@sahiljhawar
Copy link
Member

I was looking through the code as well as bilby documentation to figure out where/how result.json is created, though I wasn't able to pinpoint it

I think check bilby/core/result.py L735:
def save_to_file(self, filename=None, overwrite=False, outdir=None, extension='json', gzip=False): . . .

@sahiljhawar
Copy link
Member

Yes, I think so. I had few runs where the the parameter space was 54D, and the result.json was as big as as 100MB. Though I don't think so that .....

Fetching from a large 50+MB JSON is painstakingly long.

@mcoughlin
Copy link
Member

Maybe open an issue with bilby if hdf5 files are not supported. It sounds like something our team should consider helping them implement as the error marginalization you are working on could lead to some large parameter spaces.

@sahiljhawar
Copy link
Member

Okay. I will look into it.

@sahiljhawar
Copy link
Member

@mcoughlin bilby supports both hdf5 and pickle.

@mcoughlin
Copy link
Member

@sahiljhawar @tsunhopang maybe one of you move us to an hdf5 setup then?

@sahiljhawar
Copy link
Member

@mcoughlin Trying a run with hdf5 to see the compression.

@tylerbarna
Copy link
Collaborator Author

if we do end up moving to an hdf5 setup, I think it would be good to retain an option to save results to a json

@sahiljhawar
Copy link
Member

sahiljhawar commented Jul 26, 2023

since @tsunhopang already pushed an update with sampler_kwargs. One can pass the following to save results as hdf5. --sampler-kwargs "{'save':'hdf5'}"

@sahiljhawar
Copy link
Member

if we do end up moving to an hdf5 setup, I think it would be good to retain an option to save results to a json

We don't need to, can be left upon the user to decide.

@sahiljhawar
Copy link
Member

hdf5 will not work if using MPI, and the pickle file is too large. Let's stick to the default JSON format. Additionally, add the bestfit parameters to a separate JSON file for easy access, just like in the previous commit. Perhaps consider marking this issue as wontfix, but keep it open.

@tylerbarna tylerbarna added the wontfix This will not be worked on label Aug 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request wontfix This will not be worked on
Projects
None yet
Development

No branches or pull requests

4 participants