always save the latest lognl in the demargin hdf #4900

Open · wants to merge 5 commits into master
38 changes: 32 additions & 6 deletions bin/inference/pycbc_inference_model_stats
@@ -17,8 +17,9 @@
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.

"""Recalculates log likelihood and prior for points in a given inference or
posterior file and writes them to a new file. Also records auxillary model
stats that may have been ignored by the sampler.
posterior file and writes them to a new file, overwriting all lognl values in
the samples group attrs. Also records auxiliary model stats that may have been
ignored by the sampler.
"""

import os
@@ -92,20 +93,34 @@ model = models.read_from_config(cp)
# from the variable parameter space
model.sampling_transforms = None

# create function for calling the model to get the stats
# create function for calling the model to get the stats, lognl, and rec
def callmodel(arg):
iteration, paramvals = arg
# calculate the logposterior to get all stats populated
model.update(**{p: paramvals[p] for p in model.variable_params})
_ = model.logposterior
stats = model.get_current_stats()
if hasattr(model, 'submodels'):
Member

@WuShichao Wouldn't it make more sense for hierarchical models to have get_current_stats include any submodels, rather than hardcoding knowledge about submodels into this script? It's best to keep information compartmentalized and use clearer interfaces to avoid technical debt in the future and preserve flexibility.

Similarly, why do we need to hardcode anything about lognl here? Maybe it should just be included in the stats when it is a fixed number like this, no?

Contributor Author

The lognl issue is not a hierarchical-model-only problem; whenever users run this script and lognl is much larger than loglr, the same problem occurs. So instead of modifying all the likelihood models, I only change it in one place. The reason I hardcode lognl is that I want to match the keys in the original PE file.

Member

@WuShichao (1) You are explicitly checking through submodels here. That really should be handled by the hierarchical model itself.

(2) lognl should probably just be a model stat if it is a constant. No modifications are then needed here.
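
For context, a minimal, hypothetical sketch of the alternative suggested in this thread follows; it is not part of this patch and does not claim to match the real PyCBC model interface. The idea is that the hierarchical model itself would fold the per-submodel lognl terms into its stats, reusing the detectors/det_lognl attributes seen in this diff, so callers would not need to check for submodels.

# Hypothetical sketch of the reviewer's suggestion; names and signatures are
# illustrative only and may not match the real PyCBC model classes.
class HierarchicalModelSketch:
    def __init__(self, submodels):
        # submodels: dict mapping a label to a single-event model
        self.submodels = submodels

    def get_current_stats(self):
        # fold the constant lognl terms into the stats so scripts like
        # pycbc_inference_model_stats need no submodel-specific code
        stats = {}
        total_lognl = 0.0
        for lbl, sub in self.submodels.items():
            sub_lognl = sum(sub.det_lognl(det) for det in sub.detectors)
            stats['{}__lognl'.format(lbl)] = sub_lognl
            total_lognl += sub_lognl
        stats['lognl'] = total_lognl
        # ... plus the usual loglikelihood/logprior stats ...
        return stats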

lognls = {}
lognls['lognl'] = 0
for lbl, submodel in model.submodels.items():
submodel_lognl = 0
for det in submodel.detectors:
lognls['{}__{}_lognl'.format(lbl, det)] =\
submodel.det_lognl(det)
submodel_lognl += submodel.det_lognl(det)
lognls['{}__lognl'.format(lbl)] = submodel_lognl
lognls['lognl'] += submodel_lognl
else:
lognls = {det: model.det_lognl(det) for det in model.detectors}
lognls['lognl'] = sum(lognls.values())

rec = {}
if opts.reconstruct_parameters:
model.update(**{p: paramvals[p] for p in model.variable_params})
# Ensure unique random seed for each reconstruction
rec = model.reconstruct(seed=iteration)
return stats, rec
return stats, lognls, rec
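
For reference, here is a small illustration (not part of the patch) of the key layout of the lognls dictionary that callmodel builds above, assuming hypothetical detectors H1/L1 and a submodel label event1; all numbers are made up.

# Illustrative only: shapes of the lognls dictionary returned by callmodel.

# Single (non-hierarchical) model: one entry per detector plus the total.
lognls_single = {
    'H1': -54321.0,       # model.det_lognl('H1'), made-up value
    'L1': -43210.0,       # model.det_lognl('L1'), made-up value
    'lognl': -97531.0,    # sum of the per-detector values
}

# Hierarchical model: per-detector, per-submodel, and total entries.
lognls_hierarchical = {
    'event1__H1_lognl': -54321.0,
    'event1__L1_lognl': -43210.0,
    'event1__lognl': -97531.0,    # sum over that submodel's detectors
    'lognl': -97531.0,            # sum over all submodels
}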

# these help for parallelization for MPI
models._global_instance = callmodel
@@ -132,19 +147,30 @@ logging.info("Calculating stats")
data = list(tqdm.tqdm(pool.imap(model_call, enumerate(samples)),
total=len(samples)))
stats = [x[0] for x in data]
rec = [x[1] for x in data]
lognls = [x[1] for x in data][0]
rec = [x[2] for x in data]

# write to the output file
logging.info("Copying input to output")
shutil.copy(opts.input_file, opts.output_file)

logging.info("Writing stats to output")
logging.info("Writing stats, lognls, and rec to output")
out = loadfile(opts.output_file, 'a')
idx = range(len(stats))
for pi, p in enumerate(model.default_stats):
vals = numpy.array([stats[ii][pi] for ii in idx]).reshape(shape)
out.write_data(p, vals, path=fp.samples_group, append=False)

# Overwrite all lognl values in the samples attrs. There is a precision issue
# in the lognl calculation: during the reconstruct process, the model gets
# slightly different lognl and loglr values (and hence loglikelihood). This
# becomes a problem when loglr is very small compared to lognl, because the
# numerical precision is then dominated by lognl. If the original lognl from
# before the reconstruct process were kept, pycbc_inference_plot_posterior
# would get wrong loglr/snr values for plotting.
for key in out['samples'].attrs:
if key in lognls:
out['samples'].attrs[key] = lognls[key]
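
To make the precision argument above concrete, here is a small, self-contained Python illustration with made-up numbers (not part of the patch): when loglikelihood is written using the lognl recomputed by this script but the attrs still hold the original lognl, the loglr recovered downstream as loglikelihood minus lognl absorbs the whole lognl mismatch, which can be comparable to loglr itself.

# Illustration only; all numbers are made up.
lognl_old = -123456789.123456   # lognl stored in the samples attrs by the original run
lognl_new = -123456789.101234   # slightly different value recomputed during reconstruction
loglr = 8.5                     # a small log likelihood ratio for one sample

# This script writes loglikelihood computed with the *new* lognl:
loglikelihood = lognl_new + loglr

# Recovering loglr downstream (e.g. in pycbc_inference_plot_posterior) as
# loglikelihood - lognl gives different answers depending on which lognl
# sits in the attrs:
loglr_with_old_attr = loglikelihood - lognl_old   # ~8.522, off by the lognl mismatch
loglr_with_new_attr = loglikelihood - lognl_new   # ~8.5, consistent
print(loglr_with_old_attr, loglr_with_new_attr)
# Any SNR derived from loglr inherits the same error.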

if opts.reconstruct_parameters:
logging.info("Writing reconstructed parameters")
for p in rec[0].keys():