Francesco Pannarale edited this page Jan 26, 2024 · 191 revisions

PyCBC multi-interferometer coherent followup

Note: this page is publicly viewable and should not be used to record sensitive or confidential information.

A project board is associated with this work.

Context

PyGRB is a coherent matched-filtering search that looks for compact binary coalescence (CBC) signals associated with external triggers, such as gamma-ray bursts (GRBs). The O1, O2, and O3 LVK analyses used a workflow that was a mixture of PyCBC and lalapps_coh_PTF code, as well as parts of PyLAL during the post-processing. A flowchart of the PyGRB workflow can be seen here (from p.85 of Iain Dorrington's thesis).

Objective

The final goal is the full integration of PyGRB within PyCBC. This will allow us to use optimised PyCBC code to speed up the analysis, to pick up any new features introduced in PyCBC, and to drop the maintenance of lalapps_cohPTF and PyLAL.

Development

There are five main parts to the development:

The filtering engine

Make a coherent matched-filtering executable. This is called pycbc_multi_inspiral.

  • develop an executable to calculate the coherent SNR for a single sky point at and around the time of an event (e.g., a GRB)
  • allow for face-on / face-away projection of signals (appropriate for beamed GRBs)
  • add time slides (see PR1 and PR2)
  • loop over a sky patch (see PR)
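As an illustration of what the filtering engine computes, the sketch below evaluates a projection-matrix coherent SNR for a single sky point and time sample, plus a face-on/face-away variant that projects onto the two circular polarisations. It is a minimal NumPy sketch, not the pycbc_multi_inspiral implementation: the function names are hypothetical, the single-detector SNRs are assumed to be already time-shifted for the chosen sky point, and at least two detectors with non-degenerate antenna patterns are assumed.

```python
import numpy as np


def coherent_snr(z, sigma, fp, fc):
    """Coherent SNR at one sky point and one time sample (sketch).

    z      : complex single-detector SNRs, assumed already time-shifted
             to a common geocentric time (one entry per detector)
    sigma  : template normalisation of each detector
    fp, fc : antenna-pattern factors F+ and Fx at the sky point
    """
    z = np.asarray(z, dtype=complex)
    sigma = np.asarray(sigma, dtype=float)
    # n_det x 2 matrix whose columns are the sigma-weighted F+ and Fx.
    w = np.column_stack([sigma * np.asarray(fp), sigma * np.asarray(fc)])
    # Real, symmetric projector onto the plane spanned by those columns.
    proj = w @ np.linalg.inv(w.T @ w) @ w.T
    # Project real and imaginary parts separately and add the powers.
    return float(np.sqrt(z.real @ proj @ z.real + z.imag @ proj @ z.imag))


def face_on_snr(z, sigma, fp, fc):
    """Coherent SNR restricted to circularly polarised (face-on/away) signals.

    Each circular polarisation is a single complex direction
    a = sigma * (F+ +/- i Fx); the SNR is the norm of z projected onto a.
    Which sign is called "left" or "right" is a convention not fixed here.
    """
    z = np.asarray(z, dtype=complex)
    sigma = np.asarray(sigma, dtype=float)
    best = 0.0
    for sign in (+1.0, -1.0):
        a = sigma * (np.asarray(fp) + sign * 1j * np.asarray(fc))
        # Rank-1 projection: |a^H z| / |a| (np.vdot conjugates a).
        best = max(best, abs(np.vdot(a, z)) / np.linalg.norm(a))
    return float(best)
```

With two detectors the projector is the identity, so the coherent SNR reduces to the quadrature sum of the single-detector SNRs. The face-on statistic can never exceed the unrestricted coherent SNR, since it searches a smaller signal space.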

Injections

An (outdated!) wikipage to coordinate this work is available.

Ranking statistic

  • Make (an equivalent of) the BestNR ranking statistic used by cohPTF available to pycbc_multi_inspiral.
  • Implement the ranking statistic based on machine learning proposed in this thesis.
  • Test one against the other to make an informed decision on whether to improve BestNR or adopt the machine-learning ranking statistic as the new standard.
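For orientation, the reweighting below has the same functional form as PyCBC's newSNR: triggers whose reduced chi-square exceeds 1 are down-weighted, the others keep their SNR. It is only a sketch; the function name is hypothetical, the exponents q and n are placeholders rather than tuned values, and any additional vetoes that the full BestNR applies are not reproduced here.

```python
import numpy as np


def reweighted_snr(snr, chisq, chisq_dof, q=6.0, n=2.0):
    """newSNR-style chi-square reweighting (sketch).

    Triggers with reduced chi-square <= 1 keep their SNR; the others are
    divided by [(1 + chisq_r**(q/n)) / 2]**(1/q).
    q and n are placeholder values, not the tuned BestNR values.
    """
    snr = np.asarray(snr, dtype=float)
    chisq_r = np.asarray(chisq, dtype=float) / np.asarray(chisq_dof, dtype=float)
    factor = np.where(chisq_r > 1.0,
                      (0.5 * (1.0 + chisq_r ** (q / n))) ** (1.0 / q),
                      1.0)
    return snr / factor
```

A machine-learning statistic could be compared against this baseline on the same set of triggers and injections.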

Post-processing

This area of development addresses the post-processing and the output webpages. The post-processing is very memory-consuming and will be rewritten to use

  • hdf5 files, rather than xml ones, and
  • PyCBC style output webpages.

As of October 2022, this work is in good shape (preview); see this link for details.
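One way hdf5 input can help with memory is that columns can be read in slices rather than all at once. As a hypothetical example, the function below finds the loudest trigger by reading a ranking-statistic column chunk by chunk; it works on anything that supports len() and slicing, such as a NumPy array or an h5py Dataset. The function name and chunk size are illustrative assumptions.

```python
import numpy as np


def loudest_trigger(stat, chunk_size=1_000_000):
    """Index and value of the loudest trigger, read chunk by chunk (sketch).

    `stat` can be any 1-D array-like supporting len() and slicing, such as
    an h5py Dataset or a NumPy array; only one chunk is held in memory.
    Returns (-1, -inf) for an empty input.
    """
    best_idx, best_val = -1, -np.inf
    for start in range(0, len(stat), chunk_size):
        chunk = np.asarray(stat[start:start + chunk_size])
        if chunk.size == 0:
            continue
        i = int(np.argmax(chunk))
        if chunk[i] > best_val:
            best_idx, best_val = start + i, float(chunk[i])
    return best_idx, best_val
```

For example, inside `with h5py.File("triggers.h5", "r") as f:` one could call `loudest_trigger(f["network/reweighted_snr"])`; both the file name and the dataset path are made up for illustration.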

The workflow generator(s)

In order to run the PyGRB pipeline one needs code that generates an analysis workflow, e.g., pycbc_make_offline_grb_workflow for offline GRB follow-ups. This code works out the relationships between the different elements of the pipeline (querying data servers, fetching template banks, gating data, filtering data, preparing injections, searching for injections, making plots, etc.).
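At its core, a workflow generator records which jobs depend on which outputs, so that they can be run in a valid order. The snippet below is a toy illustration using Python's standard-library graphlib; the job names are hypothetical stand-ins for the steps listed above, and the real PyCBC generators hand the resulting graph to a workflow management system (Pegasus) rather than running it themselves.

```python
from graphlib import TopologicalSorter

# Hypothetical job names; each maps to the set of jobs it depends on.
DEPENDENCIES = {
    "query_data": set(),
    "fetch_bank": set(),
    "make_injections": set(),
    "gate_data": {"query_data"},
    "filter_data": {"gate_data", "fetch_bank"},
    "filter_injections": {"gate_data", "fetch_bank", "make_injections"},
    "make_plots": {"filter_data", "filter_injections"},
    "results_page": {"make_plots"},
}


def run_order(deps):
    """Return an order in which every job comes after all of its inputs."""
    return list(TopologicalSorter(deps).static_order())
```

Jobs with no path between them (e.g. filtering the data and filtering the injections) can run in parallel; a real workflow manager exploits that, while this sketch only produces one valid serial order.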

The existing offline workflow generator is being progressively updated to work with the new executables to perform:

  • injections
  • filtering
  • post-processing and production of results webpage

A dedicated workflow generator for the post-processing only has been written but will be phased out.

#4288 is dedicated to the points above and allows a workflow to be generated, with timeslides and partial generation of the results webpage for a single sky point. An example of the output is available here.

Issue #4430 is dedicated to open points pertaining to the workflow generator(s). Links to an early issue and PR are also worth listing.

The end products of this activity are:

  • A complete workflow generator for offline analyses
  • A complete workflow generator for medium-latency analyses

Configuration files for testing must be maintained here, where installation and run scripts for LDG clusters are available.

pycbc_multi_inspiral Tasks (Update where possible!)

Issue | Description | Assigned to | Status
Time slides | Short (and maybe long) timeslides need to be implemented. This will require changes to the event manager so that it keeps track of which time slide we are on and treats events from different time slides as separate events. It also requires changes to trig cluster, trig combiner, inj cluster, and inj combiner. | Erin, Hannah, Apratim |
Find a new reweighted SNR | We are now using single-detector chisq tests rather than a coherent chisq, so we need a new way of calculating the reweighted SNR. | Jam/(Tom) | First attempt pushed to master. #3466
Make work with injections | Injections were originally too slow. The pycbc injfilterrejector functionality needs to be implemented to make them faster. | Iain |
Rewrite injection code | These should use hdf5 and generally be less ad hoc. We should get rid of the inspinj, em filter, jitter sky loc, and align spin codes and replace them with one executable that makes suitable injections for a GRB search, piggybacking on pycbc_create_injections and pycbc_hdf5_splitbank. | Sam/Prasia/Steve | Missing jittering stage. #3468
Search over sky grid | At the moment we can only search over a single sky point; pycbc_multi_inspiral will need to loop over a sky patch. | Stephanie | old wiki, #4380
Circularly polarised coherent SNR | This requires a new function to replace the old projection matrix function. It needs to find two matrices (left and right polarised), so it also requires changes later in the code to apply both of these matrices. | Jam/(Tom)/Andrew | #3599
Frame filelist must be a list of GWFs | Cache file functionality to be removed. Can run with cache. | -- | Not started
Remove precessing stuff from event manager | If we are sure we do not want precessing stuff, there is no point in the event manager storing these values, so we should remove them. Steve: yes, remove, as they are never used and unlikely to be used (new ideas about how to do a precessing search would require a rewrite). | -- | Not started
Add other chisq tests | At the moment we only use the power chisq test. pycbc_multi_inspiral is (in theory) able to calculate the other chisq values if given the correct options (untested). At the moment, however, the workflow generation script does not pass it these options. Steve: the new pycbc chi-sq calculation would no longer be a bottleneck, but it would be good to check whether pygrb could gain a speed-up by using these cheaper cuts. | Jacob | Issues with the chisq tests fixed. Attempting to use BestNR as a metric for whether the bank and auto chisq tests give useful information.
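The two requirements in the Time slides row, shifting detectors against each other and never clustering together events from different slides, can be sketched as follows. This is a simplified, hypothetical illustration with two detectors and a fixed slide step; the function names are made up, and it is not how pycbc_multi_inspiral or the trig/inj cluster executables are implemented.

```python
import numpy as np


def short_slide_offsets(n_slides, slide_step):
    """Time offsets (s) applied to the second detector for each slide.

    Slide 0 is the zero-lag (unshifted) analysis. Two detectors assumed.
    """
    return slide_step * np.arange(n_slides)


def cluster_per_slide(slide_id, time, snr, window):
    """Keep the loudest trigger within `window` seconds, separately per slide.

    Triggers from different slides are never compared with each other.
    Returns the sorted indices of the surviving triggers.
    """
    slide_id = np.asarray(slide_id)
    time = np.asarray(time, dtype=float)
    snr = np.asarray(snr, dtype=float)
    keep = []
    for sid in np.unique(slide_id):
        idx = np.flatnonzero(slide_id == sid)
        # Visit this slide's triggers from loudest to quietest.
        for i in idx[np.argsort(snr[idx])[::-1]]:
            # Keep i only if no louder kept trigger in the same slide is nearby.
            if all(abs(time[i] - time[j]) > window
                   for j in keep if slide_id[j] == sid):
                keep.append(i)
    return sorted(int(i) for i in keep)
```

In this sketch two triggers at the same time survive clustering if they belong to different slides, which is the behaviour the event manager needs to support.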

Post-processing and Results Webpage

The work undertaken focused on abandoning all PyLAL dependencies, on having one executable per plot/table-production task rather than a couple of scripts producing several outputs, and on using hdf5 as input instead of xml.

For the time being, all new post-processing executables are placed in pycbc/bin/pygrb and all utility functions in pycbc/results/pygrb_postprocessing_utils.py or pycbc/results/pygrb_plotting_utils.py, while pycbc/results/legacy_grb.py is destined to disappear (see #4288).

  • All PyLAL dependencies in PyGRB have been removed.
  • A list of todos and desiderata is maintained in issue #3660.
  • An issue dedicated to the items remaining for the full transition from xml to hdf5 is #4419

Credit for development so far: Cameron, Duncan, Francesco, Gino, Michael, Nathan, Viviana, Shamita.

Tasks

Issue | Description | Assigned to | Status
xml to hdf | Switch from xml to hdf in the trigger combination and clustering. | Duncan, with fixes from Erin and Francesco |
Add chisq cut | At the moment the power chisq is calculated and saved but no cut is made. This cut should be added to the post-processing so that we can make the cut plots on the webpage. Possibly sanity-check, as an order-of-magnitude estimate, that there is no speed/memory saving in making this cut in pycbc_multi_inspiral; post-processing with h5py should be fine. | -- | Not started
Abandon PyLAL | Rewrite the PyLAL plotting scripts and remove all PyLAL dependencies. | Francesco, Gino, Cam |
Make new plotting codes work for new filtering output | Conversion of xml handling to hdf5 handling. Additionally, the new output has different statistics; for example, the coherent chi2 tests are gone. | Francesco, Hannah, Viviana, Shamita | Done, #4034
Postprocessing and webpage production | Switch to PyCBC-style webpages and webpage generation. | Francesco, Michael, Nathan | ✅ (but regular updates will happen as new results capabilities are introduced)
Automatic follow-up of loudest triggers and missed injections | | Francesco, Michael | Done in xml, but needs to work with hdf5 output, #4419

Telecons

25 January 2024

Attendance: Jacob, Tito, Francesco, Marco, Erin, Prasia

  • Jacob illustrates the PRs he is working on and the solution of using template ids to overcome one of the points raised in the review of his PR for pycbc_pygrb_efficiency (see #4419). He plans to update the PR with this solution by next week.
  • Marco illustrates his fork for pycbc_pygrb_page_tables (see #4419). Suggestion to avoid duplicating functions and rely on, e.g., ranking.py for methods to compute the new SNR. Will clean up the fork and open a PR once Jacob's is through (there is some dependency).
  • Erin's work on pycbc_pygrb_plot_stats_distribution (see #4419) is not affected by the development above.
  • Francesco approves PR on Rachel's sky-grid code and Tito picks up issue #4610 to improve it.
  • Prasia needs to ping #pycbc-code to decide where to place the code that jitters injection distances. The options are:
    1. a method in the injection class, which could then be generalized/improved so that all of PyCBC benefits from it;
    2. a standalone executable used by the PyGRB workflows.
    Participants prefer the first option, but this must be discussed with the wider PyCBC crowd.
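Whichever home is chosen, the core of distance jittering is small. The hypothetical function below draws a fractional amplitude error per injection and rescales the distance accordingly. The Gaussian error model, its width, and the function name are assumptions for illustration only, and phase errors are not modelled.

```python
import numpy as np


def jitter_distances(distance, amp_error_sigma, rng=None):
    """Jitter injection distances with a random amplitude (calibration) error.

    Strain amplitude scales as 1/distance, so a fractional amplitude error e
    is equivalent to moving the source to distance / (1 + e).
    e ~ Normal(0, amp_error_sigma) is an assumed error model; it assumes
    amp_error_sigma << 1 so that 1 + e stays positive.
    """
    rng = np.random.default_rng() if rng is None else rng
    distance = np.asarray(distance, dtype=float)
    err = rng.normal(0.0, amp_error_sigma, size=distance.shape)
    return distance / (1.0 + err)
```

Passing an explicit `rng` (e.g. `np.random.default_rng(seed)`) makes the jittered injection set reproducible, which helps when comparing workflow runs.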

21 December 2023

Attendance: Jacob, Tito, Sebastian

Jacob: presented slides with his approved/in-progress PRs

  • Expanded pycbc_multi_inspiral and pygrb_postprocessing_utils to enable loading complete information on the postprocessing side.
    • PR4427 extended timeslides information.
    • PR4542 similar work with segments.
  • Bugfixes on the injection code, see PR4502
  • Currently working on pycbc_pygrb_efficiency PR 4562. Planning to finish it in the next few weeks.

Sebastian: presented some slides with all the performance results for pycbc_multi_inspiral vs lalapps_coh_PTF_inspiral

  • These results will be posted soon in issue 4434.
  • The development-friendly comments PR has been approved.
  • Discussed a typical error raised at the pycbc.strain.py level when the wrong input parameters are chosen. Rather than introducing assert statements in pycbc_multi_inspiral, Tito suggested handling this at the workflow-generation level via comments in the config files and/or opening a PR to make this error more explicit. See issue.
  • Shared some insights on possible bottlenecks when reading large frame files.

Tito:

  • Approved Sebastian's PR.
  • Commented on how much template information Jacob's PR should consider, based on what is currently done by the all-sky searches.
  • Asked Sebastian to share the performance test scripts, in order to confirm the possible bottleneck when reading frame files.
  • Gave maintainer rights to Jacob and Sebastian, since PR conflicts cannot easily be resolved without them.

7 December 2023

Attendance: Francesco, Prasia, Sebastian.

Francesco:

  • Reviewed one of Jacob's PRs.
  • Started looking into Tito and Stephanie's PR for the skygrid implementation.

Sebastian:

  • Discussed new performance plots. Found multi_inspiral(lalframe) to spend a lot of time reading 5000+ second .gwf files.
  • Prasia suggested looking at how pycbc_inspiral splits large .gwf files using cache files, to avoid this bottleneck.

Prasia:

  • Done with the script for computing injected distances from hdf files.
  • Will open an issue containing her fork.
  • The issue will host the discussion of whether her code should go in a new standalone script or inside PyCBC's existing injection code.

30 November 2023

Attendance: Erin, Hannah, Marco, Sebastian.

Marco:

  • Tables development ready on his fork.
  • Waiting for Sebastian's pycbc_multi_inspiral dev comments PR to be approved.
  • Started running the O4 workflow generation example.

Sebastian:

  • PR still waiting for approval.
  • Currently running performance tests on CIT pcdev6, matching Dorrington's old config on the Cardiff cluster.

Erin:

  • Back from holiday. Will continue to look into the dev workflow generation config files to compare the O3 and O4 versions and see how to make them production-ready.

Jacob:

  • Two PRs are ready for a re-review: PR1, PR2

23 November 2023

  • Attendance: Prasia, Marco, Francesco, Sebastian

  • Francesco: will soon look into Tito's and Jacob's PRs.

  • Prasia: working on a script that computes injected distances from hdf files.

    • Sam's notebook living in home/samuel.higginbotham/Injection_testing/pycbc_test_env/pycbc/jitter_skyloc/eff_distance_testing.ipynb was discussed to help further injected-distance developments.
  • Marco: Almost done with the new tables. Still having some issues with avoiding vetoes. The script is in his fork.

  • Sebastian: started to implement the onsource/offsource analysis in pycbc_multi_inspiral. Rerunning the timing scripts one last time to match Dorrington's runs on the Cardiff cluster /home/iain.dorrington/170817_pmi.

16 November 2023

  • Attendance: Marco, Erin, Prasia, Sebastian
  • Sebastian:
    • Waiting for Francesco to approve pull requests for comments and sanity checks within multi_inspiral.
    • Jacob's PR is still waiting.
    • Testing multi_inspiral timing for different template bank sizes vs short slides, increasing the block and segment durations. Created a 2D plot of the results; will post it in the GitHub issue.
  • Marco:
    • Had a question about chi_squared: multi_inspiral has power_chisq, autochisq, and bank_chisq, but bank_chisq returns nothing; Sebastian replied that vetoes have not been applied yet. Marco cannot reweight the SNR when these are None, so the only one he can work with is power_chisq.
    • Had a question about veto files being in XML instead of HDF. Was told to ignore vetoes for now, until it is decided whether we will use them.
  • Prasia:
    • Was checking injection distances and now has to test the whole workflow.
  • Erin:
  • Jacob:
    • Slowly working through PR reviews. Segment PR is done (until further review).

02 November 2023

  • Attendance: Francesco, Erin, Prasia, Hannah, Sebastian

  • Prasia:

    • Asked for an example injection hdf5 file computed after the Fisher distribution changes.
  • Francesco:

    • Merged Erin's PR and planning on reviewing Jacob's PR soon.
    • Next step: put together the workflow generation with Erin's and Jacob's changes to the postprocessing side.
    • Suggested that other developers check the following config files ahead of the upcoming round of test workflow runs: https://git.ligo.org/ligo-cbc/pycbc-config/-/tree/master/O4/pygrb/dev
  • Discussion: what to do with the Bank Veto xml files, and how not including them in the analysis would affect several computed statistics of PyGRB vs PyCBC all sky.

  • Sebastian:

    • Done with the dev comments PR https://github.com/gwastro/pycbc/pull/4513.
    • Done with the coh_PTF vs multi_insp comparison in the large block duration regime, will post the plots in the optimization issue soon.
  • Jacob:

26 October 2023

  • Attendance: Sebastian, Prasia, Erin, Hannah
  • Prasia:
    • Use newly computed injection distances and get rid of old dependencies (xml to hdf):
    • Testing, asked if anyone has an hdf inj trigger file generated by the workflow with the embright filter
    • Otherwise code is ready
  • Erin:
    • Waiting on Jacob's PR to go through
    • Will begin testing once post-processing loose ends are tied up
  • Sebastian:
    • Performance:
      • Found a regime in which multi-inspiral is faster than cohptf
      • Cohptf struggles when frame files are large. multi-inspiral handles that much better
      • Multi-inspiral doesn’t need cache files and does not crash for 2000 s frames, whereas cohptf does
      • Preparing analysis for this regime for next week
  • Jacob:
    • Update on Slack 10/24: "I just opened a new PR for the segment dictionary (https://github.com/gwastro/pycbc/pull/4542). I think I will have one or two more PRs after this for the remainder of the background stuff, and then the direct changes to pycbc_pygrb_efficiency."

19 October 2023

  • Attendance: Francesco, Tito, Erin, Marco, Hannah, Prasia, Sebastian

  • Tito discussed progress on the new skygrid code. It is now possible to do a uniform circular skygrid. See issue

  • Prasia working on phase and amplitude errors for the injected distance computation.

  • Marco working on page tables. Reported problems with construct_trials and load_segment_dict in /pycbc/results/pygrb_postprocessing_utils.py

  • Sebastian showed new coh_PTF vs multi_inspiral performance comparison results. See issue and fork containing the optimized code

  • Erin and Jacob are done with xml to hdf transition on the postprocessing side. Currently waiting for PR approvals to start running full workflow tests. See issue
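To illustrate what a uniform circular skygrid can look like, the function below covers a disc around the GRB position with concentric rings. It is a hypothetical sketch (a point at the centre, ring spacing equal to the point spacing along each ring); it is not the implementation discussed in the linked issue, whose details may differ.

```python
import numpy as np


def circular_sky_grid(ra, dec, radius, spacing):
    """Sky points covering a disc of angular `radius` around (ra, dec).

    Points sit on concentric rings `spacing` apart; a ring at angular radius r
    carries ceil(2*pi*sin(r)/spacing) points, so neighbours along a ring are
    at most `spacing` apart and the density is roughly uniform. Radians throughout.
    """
    theta, phi = [0.0], [0.0]  # the centre itself is the first point
    n_rings = int(np.floor(radius / spacing + 1e-9))  # tolerance for round-off
    for k in range(1, n_rings + 1):
        r = k * spacing
        n_pts = int(np.ceil(2.0 * np.pi * np.sin(r) / spacing))
        theta.extend([r] * n_pts)
        phi.extend(2.0 * np.pi * np.arange(n_pts) / n_pts)
    theta, phi = np.array(theta), np.array(phi)
    # Move the grid from "around the pole" to "around (ra, dec)" using a local
    # orthonormal basis: radial (c), east (e) and north (n).
    c = np.array([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])
    e = np.array([-np.sin(ra), np.cos(ra), 0.0])
    n = np.cross(c, e)
    vec = (np.cos(theta)[:, None] * c
           + np.sin(theta)[:, None] * (np.cos(phi)[:, None] * e
                                       + np.sin(phi)[:, None] * n))
    grid_ra = np.mod(np.arctan2(vec[:, 1], vec[:, 0]), 2.0 * np.pi)
    grid_dec = np.arcsin(np.clip(vec[:, 2], -1.0, 1.0))
    return grid_ra, grid_dec
```

Building the rings around the pole and rotating them afterwards keeps the spacing uniform at any declination, avoiding the distortion of a naive grid in RA and Dec near the poles.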

17 January 2022

  • Attendance: Andrew, Francesco, Erin, Hannah, Sam, Viviana, Steve
  • Call will move to 1 hour earlier from next week (2pm UTC)
  • Erin, Max, Andrew met to discuss time slides work, will be testing/developing together on branch this coming week
  • Francesco, Hannah, Viviana, Andrew to meet to plan work on outstanding post-processing tasks
  • Sam and Steve have discussed EM-bright injections work and Sam has started coding
  • We might have HEALPix sky grids coming in during O4 (TBC), so we may want to make use of that for injection placement as well as grids for filtering

13 September 2021

  • Attendance: Andrew, Francesco, Max, Jam
  • Andrew: Injection code has an open PR. With Sam will look at getting that merged. Then integrating the EM-bright cut is next on list.
  • ACTION: Finish off injection PR.
  • Jam: Talking with Steve. Issue with the matrix definition: an apparent inconsistency needs to be investigated a bit more. Will talk to Tom about the tuning of the detection statistic.
  • Max: Can start by looking at workflow generation to check everything still runs end-to-end. Also can test the pycbc_multi_inspiral executable to see how that runs and look at the output. This can feed into the post processing development tasks.
  • ACTION: Andrew to send Max example scripts and config files for these tests.
  • Francesco: Post processing tasks can move on once the code output in hdf5 format can be used.
  • ACTION: Andrew to email usual contributors to check for other updates and send poll on the best time to have meetings this term.

08 Mar 2021

  • Attendance: Andrew, Francesco, Jam, Michael, Ryan, Sam
  • Action item (Francesco): create an issue on gwastro/pycbc for the new PyGRB webpage.
  • Can start including missed injection/loudest offsource event follow-up work by Michael in the webpages.
  • Jam hits an error on CIT running /home/jam.sadiq/pygrbtest/pycbc/IainCodes/ExperimentswithInjections/runscript_pycbcMultiInspiral.sh. Discussed how to proceed.
  • Sam will have to check which reference frame the injection spins are defined in; if it is the one aligned with the total angular momentum, we should be fine and no extra layer of coding will be needed.

01 Mar 2021

  • Attendance: Cam, Francesco, Jam, Michael, Sam
  • Discussed use of injections in pycbc_multi_inspiral and dc-errs in pycbc_pygrb_efficiency

22 Feb 2021

  • Attendance: Cam, Francesco, Jam, Ryan, Sam
  • Francesco & Cam: the old pylal_cbc_cohptf_efficiency code has been split into 4 independent executables, each tackling a specific need (injection plots, statistics vs FAP distributions, html tables, efficiency calculations). Now we can clean these up independently, optimize them, etc.
  • Sam: managed to reproduce the original injection distributions with my tools for a single point, and I now have to work on the sky-jittering.
  • Jam: able to use pycbc_create_injections and learning about inclination angles. Problems with the options needed to then use these in pycbc_multi_inspiral.

The rest of the call is dedicated to debugging this and it appears a more recent installation of PyCBC is needed.
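Sky-jittering scatters each injection's sky position around the GRB position to mimic the localisation uncertainty. The function below is a hypothetical sketch using a Fisher (von Mises-Fisher) distribution on the sphere, the distribution named in the injection discussions elsewhere on this page. Mapping an error width sigma to the concentration kappa = 1/sigma^2 is an assumption valid for small errors, not necessarily the convention adopted in PyGRB.

```python
import numpy as np


def fisher_jitter(ra, dec, sigma, size, rng=None):
    """Scatter `size` sky positions around (ra, dec) with a Fisher distribution.

    All angles are in radians. The concentration is set to kappa = 1/sigma**2
    (an assumption), which makes sigma the approximate per-axis Gaussian width
    for small sigma.
    """
    rng = np.random.default_rng() if rng is None else rng
    kappa = 1.0 / sigma ** 2
    # u in (0, 1] so the logarithm below is always finite.
    u = 1.0 - rng.uniform(size=size)
    # Inverse CDF of cos(theta) for a density proportional to exp(kappa cos(theta)).
    cos_t = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    sin_t = np.sqrt(np.clip(1.0 - cos_t ** 2, 0.0, None))
    # The position angle around the centre is uniform.
    phi = rng.uniform(0.0, 2.0 * np.pi, size=size)
    # Orthonormal basis at the centre: radial (c), east (e) and north (n).
    c = np.array([np.cos(dec) * np.cos(ra), np.cos(dec) * np.sin(ra), np.sin(dec)])
    e = np.array([-np.sin(ra), np.cos(ra), 0.0])
    n = np.cross(c, e)
    vec = (cos_t[:, None] * c
           + sin_t[:, None] * (np.cos(phi)[:, None] * e + np.sin(phi)[:, None] * n))
    new_ra = np.mod(np.arctan2(vec[:, 1], vec[:, 0]), 2.0 * np.pi)
    new_dec = np.arcsin(np.clip(vec[:, 2], -1.0, 1.0))
    return new_ra, new_dec
```

Sampling cos(theta) through its closed-form inverse CDF avoids rejection sampling, and the construction stays well behaved near the celestial poles.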

15 Feb 2021

  • Attendance: Francesco, Jam, Ryan, Sam
  • Francesco: progress on simplifying pylal_cohptf_efficiency. All injection related scatter plots are now produced in a loop rather than serially. The loop will then be handled at workflow level and the plot production by a stand-alone executable. Once efficiency curve plots are also handled similarly, we will be able to abandon pylal_cohptf_efficiency, generate PyGRB output pages in PyCBC style, and move on to hdf5.
  • Jam: trying to experiment with new definitions of reweighted SNR. Help needed in generating injections. Will iterate with Sam on Slack.
  • Sam: learning more about hdf5 file creation and manipulation.
  • Ryan: starting summer student on PyGRB. Expected to work on chi-square cuts in June and July.

8 Feb 2021

  • Attendance: Andrew, Cam, Francesco, Jam, Ryan, Sam
  • Jam: expect to start making progress on new reweighted SNR starting next week
  • Sam: rudimentary ligolw_cbc_jitter_skyloc runs with hdf5 files I/O.
  • Jam: Circularly polarised coherent SNR function now works. Discussion follows on trigger timeseries with standard SNR and left-polarized SNR and how to compare them. Suggestion is to compare noise distributions and start with face-on injections. Then increase inclination and compare behaviours.

25 Jan 2021

  • Attendance: Andrew, Francesco, Jam, Michal, Michael
  • Summary of previous call.
  • Jam made good progress on circular polarization. His implementation runs without failures. Will check and understand all output and then submit for review in the coming 1-2 weeks.

18 Jan 2021

  • Updates on pylal_cbc_cohptf_efficiency rewrite and injection handling.
  • Sorted out usage of Gaussian prior distribution in setting up injection configuration files.

17 Dec 2020

10 Dec 2020

  • Please document your task(s), indicating git repo where devel work happens and code(s) that will be changed.
  • Injections work (Andrew/Sam/Steve): had a telecon about this, starting after Christmas break.
  • Iain’s thesis (linked above) formulated a few ideas for new reweighted SNR and he had carried out tests on raven/hawk. He tried to write a chi-square that focuses on a single det SNR. This was weighted by the detector SNR.
  • Jam (and Tom) on this task.
  • Jam (with Andrew and Tom’s guidance) also on circular polarized SNR but the reweighted SNR task has higher priority. This involves implementing the formula from Andrew’s paper and checking the paper-code conventions are consistent.
  • KAGRA colleagues can log on to clusters and run (short!) tests on the head nodes, but cannot submit to queues.
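To make the idea of a detector-SNR-weighted chi-square concrete, the sketch below combines single-detector reduced chi-squares into one network value. Weighting by the SNR to the first power follows the wording above; the exponent parameter, the function name, and the combination rule are assumptions, not the formula from Iain's thesis.

```python
import numpy as np


def snr_weighted_chisq(single_snr, single_chisq, single_dof, power=1.0):
    """Combine single-detector reduced chi-squares into one network value.

    Each detector's reduced chi-square is weighted by its SNR**power, so loud
    detectors dominate and a quiet detector's noisy chi-square matters less.
    The last axis runs over detectors.
    """
    snr = np.asarray(single_snr, dtype=float)
    chisq_r = np.asarray(single_chisq, dtype=float) / np.asarray(single_dof, dtype=float)
    weights = snr ** power
    return np.sum(weights * chisq_r, axis=-1) / np.sum(weights, axis=-1)
```

The combined value could then take the place of the coherent chi-square in a reweighted-SNR formula; setting `power=2.0` weights by SNR squared instead, which is another option worth testing.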

3 Dec 2020

  • Attendance: Francesco, Hideyuki, Jam, Patrick, Philippe, Ryan, Samuel, Tessa
  • We will host this call at 2 PM GMT until the end of the year. Francesco will set up a new poll for the first half of 2021.
  • Tessa: I set up repo for end-to-end test runs; contains a very reduced template bank and a few GRBs from O3a.
  • Timeslides and skyloc fine with Tessa and Cam, but not the rest: Tessa will update the tables in this wikipage accordingly.
  • Cameron has started work on pylal_inj_efficiency. Francesco committing to it too.
  • Sam taking over from where Cam and Tessa left off.
  • Next call: continue assigning tasks and outlining work to be done.
  • 2 weeks from now: add a wikipage for your task(s) and document work done and to do as much as possible. Point to commits, lines of code, etc.
  • Jam will inquire about picking up the two SNR-related items.

20 Sep 2020

  • Tasks were discussed and prioritized. Task specific comments were added to the descriptions above. Other general remarks below.
  • Steve: Overarching goal is to get a pipeline that runs and produces a webpage, maybe for a single sky point with just short slides.
  • Francesco: pycbc_multi_inspiral does not support xml.
  • Patrick: split execs into “do calculations” and “make plots” so that different versions of the code can be compared.
  • Francesco: Yes this is the plan.
  • Francesco: would be good to work on the efficiency stuff now so that, when we have the new executable, we can make a webpage.
  • Andrew: suggest development cycles with forks to implement a specific task and merge often.
  • The timeline is April 2021. Mid-November goal: pylal efficiency gone. End of November: time slides. First milestone by Christmas 2020: get a webpage with short timeslides working for one sky point.