Refactor for post-unblinding data taking #74

Open
wants to merge 55 commits into base: main

55 commits
3284d61
add ann tier
ggmarshall Oct 9, 2024
26d52f2
allow more jobs
ggmarshall Oct 20, 2024
7918e83
pc cleanup
ggmarshall Oct 21, 2024
e9561bd
bump pkg versions
ggmarshall Oct 21, 2024
a3c0dae
add ml packages
ggmarshall Oct 21, 2024
818511d
refactor for new metadata, clean up patterns and some naming
ggmarshall Nov 27, 2024
41c326b
update rules for pattern changes
ggmarshall Nov 27, 2024
1698eb1
add debug mode functionality
ggmarshall Nov 27, 2024
b840444
os to pathlib.Path
ggmarshall Nov 27, 2024
323dd09
debugging
ggmarshall Nov 28, 2024
bbf65e9
move info from readme to docs
ggmarshall Nov 29, 2024
9639200
add ability to specify different file selections and cleanup
ggmarshall Dec 3, 2024
0cb28b6
updates for new meta, switch to detector keying in configs
ggmarshall Dec 3, 2024
3112518
merge ann rules
ggmarshall Dec 3, 2024
4f7e405
debugging
ggmarshall Dec 4, 2024
a2f2d7e
style: pre-commit fixes
pre-commit-ci[bot] Dec 4, 2024
ce2ad85
add isotopes where lines are from
ggmarshall Dec 5, 2024
bd9d596
Merge branch 'meta_refactor' of github.com:legend-exp/legend-dataflow…
ggmarshall Dec 5, 2024
2deac35
choose ctc based on no_ctc energy instead
ggmarshall Dec 5, 2024
97a0f8e
Fix a bunch of docs things
gipert Dec 26, 2024
4c6dffc
update blinding cal to new hpgecal
ggmarshall Dec 26, 2024
08e20e7
Try fixing RTD build
gipert Dec 27, 2024
1b68941
Merge branch 'meta_refactor' of github.com:legend-exp/legend-dataflow…
gipert Dec 27, 2024
603f3ec
Bug fix
gipert Dec 27, 2024
9f4d1c2
Remove unneeded sphinx ext
gipert Dec 27, 2024
1152316
add snakefile to profile
ggmarshall Dec 28, 2024
24fb2ed
add table format to config
ggmarshall Dec 28, 2024
c89b634
update to cal_groupings file
ggmarshall Dec 28, 2024
c5104b9
Merge branch 'meta_refactor' of github.com:legend-exp/legend-dataflow…
ggmarshall Dec 28, 2024
83fc329
add pyproject file
ggmarshall Dec 28, 2024
7cd0273
add logging config and cleanup config loading
ggmarshall Dec 31, 2024
59e273b
add param info to svm rule
ggmarshall Dec 31, 2024
2cc1232
move logging to function
ggmarshall Jan 8, 2025
72140e2
fix svm rules
ggmarshall Jan 8, 2025
5139f18
add dbetto dependency to configs
ggmarshall Jan 8, 2025
4dea274
Fix bugs in complete_run.py
gipert Jan 17, 2025
0c43924
Support using specialized build_raw script depending on DAQ extension
gipert Jan 17, 2025
8eba704
Updates to build_raw Snakefile to support latest dataflow changes
gipert Jan 17, 2025
e565e59
extension="*" does not work as expected, needs to be fixed in some ot…
gipert Jan 17, 2025
0be642f
Renaming, JIT compile daq2lh5 onstart
gipert Jan 18, 2025
4dcd0d2
Several fixes to build_raw.py scripts
gipert Jan 20, 2025
3c2a166
allow filelist globbing for daq fcio/orca files
ggmarshall Jan 20, 2025
378b82d
merges
ggmarshall Jan 20, 2025
1dcd027
have par catalog build support multiple file extensions, split out bu…
ggmarshall Jan 20, 2025
0438539
fix par catalog write
ggmarshall Jan 20, 2025
25a6183
fix daq filelist
ggmarshall Jan 20, 2025
325c920
allow filelist globbing for daq fcio/orca files
ggmarshall Jan 20, 2025
8197a3f
have par catalog build support multiple file extensions, split out bu…
ggmarshall Jan 20, 2025
48b326d
A lot of fixes in complete_run.py
gipert Jan 20, 2025
0b558dd
fix weird filelist len bug by moving to script
ggmarshall Jan 20, 2025
95f1759
Merge pull request #78 from legend-exp/fcio
gipert Jan 20, 2025
a43a9eb
merges
ggmarshall Jan 20, 2025
689164b
fix log import
ggmarshall Jan 20, 2025
2ac84b0
split out filelist write to workaround smk behaviour, cleanup catalog…
ggmarshall Jan 20, 2025
2c47ca9
Remove leftover print statements
gipert Jan 21, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -77,7 +77,7 @@ instance/
.scrapy

# Sphinx documentation
/docs/build/
/docs/_build/
/docs/source/generated

# PyBuilder
@@ -113,3 +113,5 @@ venv.bak/

# mypy
.mypy_cache/

docs/source/api
22 changes: 22 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,22 @@
version: 2

build:
  os: "ubuntu-22.04"
  tools:
    python: "3.12"
  commands:
    # FIXME: dependencies should not be explicitly listed here!
    - asdf plugin add uv
    - asdf install uv latest
    - asdf global uv latest
    - uv venv
    - uv pip install .[docs]
    - rm -rf docs/source/api
    - .venv/bin/python -m sphinx.ext.apidoc
      --private
      --module-first
      --force
      --output-dir docs/source/api
      scripts
    - .venv/bin/python -m sphinx -T -b html -d docs/_build/doctrees -D
      language=en docs/source $READTHEDOCS_OUTPUT/html
2 changes: 1 addition & 1 deletion .ruff.toml
@@ -12,7 +12,7 @@ lint.select = [
"PIE", # flake8-pie
"PL", # pylint
"PT", # flake8-pytest-style
# "PTH", # flake8-use-pathlib
"PTH", # flake8-use-pathlib
"RET", # flake8-return
"RUF", # Ruff-specific
"SIM", # flake8-simplify
4 changes: 3 additions & 1 deletion LICENSE.md
@@ -1,9 +1,11 @@
The legend-dataflow-hades package is licensed under the MIT "Expat" License:
The legend-dataflow package is licensed under the MIT "Expat" License:

> Copyright (c) 2021:
>
> Matteo Agostini <[email protected]>
> Oliver Schulz <[email protected]>
> George Marshall <[email protected]>
> Luigi Pertoldi <[email protected]>
>
> Permission is hereby granted, free of charge, to any person obtaining a copy
> of this software and associated documentation files (the "Software"), to deal
112 changes: 0 additions & 112 deletions README.md
@@ -3,115 +3,3 @@
Implementation of an automatic data processing flow for L200
data, based on
[Snakemake](https://snakemake.readthedocs.io/).


## Configuration

Data processing resources are configured via a single site-dependent (and
possibly user-dependent) configuration file, referred to as `config.json` in
the following. An arbitrary file name may be chosen.

Use the included [templates/config.json](templates/config.json) as a template
and adjust the base data paths as necessary. Note that, when running Snakemake,
the default path to the config file is `./config.json`.
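A configuration file stored elsewhere can be passed explicitly on the command
line via Snakemake's standard `--configfile` option:
```shell
$ snakemake --configfile=/path/to/your/config.json [...]
```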


## Key-Lists

Data generation is based on key-lists, which are flat text files
(extension ".keylist") containing one entry of the form
`{experiment}-{period}-{run}-{datatype}-{timestamp}` per line.

Key-lists can be auto-generated based on the available DAQ files
using Snakemake targets of the form

* `all-{experiment}.keylist`
* `all-{experiment}-{period}.keylist`
* `all-{experiment}-{period}-{run}.keylist`
* `all-{experiment}-{period}-{run}-{datatype}.keylist`

which will generate the list of available file keys for all l200 files, for a
specific period, for a specific period and run, and so on.

For example:
```shell
$ snakemake all-l200-myper.keylist
```
will generate a key-list with all files belonging to period `myper`.


## File-Lists

File-lists are flat files listing output files that should be generated,
with one file per line. A file-list will typically be generated for a given
data tier from a key-list, using the Snakemake targets of the form
`{label}-{tier}.filelist` (generated from `{label}.keylist`).

For file lists based on auto-generated key-lists like
`all-{experiment}-{period}-{tier}.filelist`, the corresponding key-list
(`all-{experiment}-{period}.keylist` in this case) will be created
automatically, if it doesn't exist.

Example:
```shell
$ snakemake all-mydet-mymeas-tier2.filelist
```

File-lists may of course also be derived from custom keylists, generated
manually or by other means, e.g. `my-dataset-raw.filelist` will be
generated from `my-dataset.keylist`.
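For example, for the hypothetical dataset name used above:
```shell
$ snakemake my-dataset-raw.filelist
```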


## Main output generation

Usually, the main output will be determined by a file-list, or equivalently by
a key-list and a data tier. The special output target `{label}-{tier}.gen` is
used to generate all files listed in `{label}-{tier}.filelist`. After the files
are created, the empty file `{label}-{tier}.gen` is created to mark the
successful data production.

Snakemake targets like `all-{experiment}-{period}-{tier}.gen` may be used
to automatically generate key-lists and file-lists (if not already present)
and produce all possible output for the given data tier, based on available
tier0 files which match the target.

Example:
```shell
$ snakemake all-mydet-mymeas-tier2.gen
```
Targets like `my-dataset-raw.gen` (derived from a key-list
`my-dataset.keylist`) are of course allowed as well.


## Monitoring

Snakemake supports monitoring by connecting to a
[panoptes](https://github.com/panoptes-organization/panoptes) server.

Run (e.g.)
```shell
$ panoptes --port 5000
```
in the background to run a panoptes server instance, which comes with a
GUI that can be accessed with a web browser on the specified port.

Then use the Snakemake option `--wms-monitor` to instruct Snakemake to push
progress information to the panoptes server:
```shell
snakemake --wms-monitor http://127.0.0.1:5000 [...]
```

## Using software containers

This dataflow doesn't use Snakemake's internal Singularity support, but
instead supports Singularity containers via
[`venv`](https://github.com/oschulz/singularity-venv) environments
for greater control.

To use this, the path to `venv` and the name of the environment must be set
in `config.json`.

This is only relevant when running Snakemake *outside* of the software
container, e.g. when using a batch system (see below). If Snakemake
and the whole workflow are run inside of a container instance, no
container-related settings in `config.json` are required.
124 changes: 53 additions & 71 deletions Snakefile
124 changes: 53 additions & 71 deletions Snakefile
@@ -10,18 +10,17 @@ This includes:
- the same for partition level tiers
"""

import pathlib
from pathlib import Path
import os
import json
import sys
import glob
from datetime import datetime
from collections import OrderedDict
import logging

import scripts.util as ds
from scripts.util.pars_loading import pars_catalog
from scripts.util.patterns import get_pattern_tier_raw
from scripts.util.pars_loading import ParsCatalog
from scripts.util.patterns import get_pattern_tier
from scripts.util.utils import (
subst_vars_in_snakemake_config,
runcmd,
@@ -31,6 +30,7 @@ from scripts.util.utils import (
metadata_path,
tmp_log_path,
pars_path,
det_status_path,
)

# Set with `snakemake --configfile=/path/to/your/config.json`
@@ -43,8 +43,9 @@ setup = config["setups"]["l200"]
configs = config_path(setup)
chan_maps = chan_map_path(setup)
meta = metadata_path(setup)
det_status = det_status_path(setup)
swenv = runcmd(setup)
part = ds.dataset_file(setup, os.path.join(configs, "partitions.json"))
part = ds.CalGrouping(setup, Path(det_status) / "cal_groupings.yaml")
basedir = workflow.basedir


@@ -66,38 +67,13 @@ include: "rules/psp.smk"
include: "rules/hit.smk"
include: "rules/pht.smk"
include: "rules/pht_fast.smk"
include: "rules/ann.smk"
include: "rules/evt.smk"
include: "rules/skm.smk"
include: "rules/blinding_calibration.smk"
include: "rules/qc_phy.smk"


# Log parameter catalogs in validity.jsonl files
hit_par_cat_file = os.path.join(pars_path(setup), "hit", "validity.jsonl")
if os.path.isfile(hit_par_cat_file):
os.remove(os.path.join(pars_path(setup), "hit", "validity.jsonl"))
pathlib.Path(os.path.dirname(hit_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(hit_par_catalog, hit_par_cat_file)

pht_par_cat_file = os.path.join(pars_path(setup), "pht", "validity.jsonl")
if os.path.isfile(pht_par_cat_file):
os.remove(os.path.join(pars_path(setup), "pht", "validity.jsonl"))
pathlib.Path(os.path.dirname(pht_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(pht_par_catalog, pht_par_cat_file)

dsp_par_cat_file = os.path.join(pars_path(setup), "dsp", "validity.jsonl")
if os.path.isfile(dsp_par_cat_file):
os.remove(dsp_par_cat_file)
pathlib.Path(os.path.dirname(dsp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(dsp_par_catalog, dsp_par_cat_file)

psp_par_cat_file = os.path.join(pars_path(setup), "psp", "validity.jsonl")
if os.path.isfile(psp_par_cat_file):
os.remove(psp_par_cat_file)
pathlib.Path(os.path.dirname(psp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(psp_par_catalog, psp_par_cat_file)


localrules:
gen_filelist,
autogen_output,
@@ -111,36 +87,48 @@ onstart:
shell('{swenv} python3 -B -c "import ' + pkg + '"')

# Log parameter catalogs in validity.jsonl files
hit_par_cat_file = os.path.join(pars_path(setup), "hit", "validity.jsonl")
if os.path.isfile(hit_par_cat_file):
os.remove(os.path.join(pars_path(setup), "hit", "validity.jsonl"))
pathlib.Path(os.path.dirname(hit_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(hit_par_catalog, hit_par_cat_file)

pht_par_cat_file = os.path.join(pars_path(setup), "pht", "validity.jsonl")
if os.path.isfile(pht_par_cat_file):
os.remove(os.path.join(pars_path(setup), "pht", "validity.jsonl"))
pathlib.Path(os.path.dirname(pht_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(pht_par_catalog, pht_par_cat_file)

dsp_par_cat_file = os.path.join(pars_path(setup), "dsp", "validity.jsonl")
if os.path.isfile(dsp_par_cat_file):
os.remove(dsp_par_cat_file)
pathlib.Path(os.path.dirname(dsp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(dsp_par_catalog, dsp_par_cat_file)

psp_par_cat_file = os.path.join(pars_path(setup), "psp", "validity.jsonl")
if os.path.isfile(psp_par_cat_file):
os.remove(psp_par_cat_file)
pathlib.Path(os.path.dirname(psp_par_cat_file)).mkdir(parents=True, exist_ok=True)
ds.pars_key_resolve.write_to_jsonl(psp_par_catalog, psp_par_cat_file)
hit_par_cat_file = Path(pars_path(setup)) / "hit" / "validity.yaml"
if hit_par_cat_file.is_file():
hit_par_cat_file.unlink()
try:
Path(hit_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ParsKeyResolve.write_to_yaml(hit_par_catalog, hit_par_cat_file)
except NameError:
print("No hit parameter catalog found")

pht_par_cat_file = Path(pars_path(setup)) / "pht" / "validity.yaml"
if pht_par_cat_file.is_file():
pht_par_cat_file.unlink()
try:
Path(pht_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ParsKeyResolve.write_to_yaml(pht_par_catalog, pht_par_cat_file)
except NameError:
print("No pht parameter catalog found")

dsp_par_cat_file = Path(pars_path(setup)) / "dsp" / "validity.yaml"
if dsp_par_cat_file.is_file():
dsp_par_cat_file.unlink()
try:
Path(dsp_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ParsKeyResolve.write_to_yaml(dsp_par_catalog, dsp_par_cat_file)
except NameError:
print("No dsp parameter catalog found")

psp_par_cat_file = Path(pars_path(setup)) / "psp" / "validity.yaml"
if psp_par_cat_file.is_file():
psp_par_cat_file.unlink()
try:
Path(psp_par_cat_file).parent.mkdir(parents=True, exist_ok=True)
ParsKeyResolve.write_to_yaml(psp_par_catalog, psp_par_cat_file)
except NameError:
print("No psp parameter catalog found")


onsuccess:
from snakemake.report import auto_report

rep_dir = f"{log_path(setup)}/report-{datetime.strftime(datetime.utcnow(), '%Y%m%dT%H%M%SZ')}"
pathlib.Path(rep_dir).mkdir(parents=True, exist_ok=True)
Path(rep_dir).mkdir(parents=True, exist_ok=True)
# auto_report(workflow.persistence.dag, f"{rep_dir}/report.html")

with open(os.path.join(rep_dir, "dag.txt"), "w") as f:
@@ -181,26 +169,20 @@ onsuccess:
rule gen_filelist:
"""Generate file list.

It is a checkpoint so when it is run it will update the dag passed on the
files it finds as an output. It does this by taking in the search pattern,
using this to find all the files that match this pattern, deriving the keys
from the files found and generating the list of new files needed.
This rule is used as a "checkpoint", so when it is run it will update the
DAG based on the files it finds. It does this by taking in the search
pattern, using this to find all the files that match this pattern, deriving
the keys from the files found and generating the list of new files needed.
"""
input:
lambda wildcards: get_filelist(
wildcards,
setup,
get_pattern_tier_raw(setup),
ignore_keys_file=os.path.join(configs, "ignore_keys.keylist"),
analysis_runs_file=os.path.join(configs, "analysis_runs.json"),
get_search_pattern(wildcards.tier),
ignore_keys_file=Path(det_status) / "ignored_daq_cycles.yaml",
analysis_runs_file=Path(det_status) / "runlists.yaml",
),
output:
os.path.join(filelist_path(setup), "{label}-{tier}.filelist"),
run:
if len(input) == 0:
print(
"WARNING: No files found for the given pattern\nmake sure pattern follows the format: all-{experiment}-{period}-{run}-{datatype}-{timestamp}-{tier}.gen"
)
with open(output[0], "w") as f:
for fn in input:
f.write(f"{fn}\n")
temp(Path(filelist_path(setup)) / "{label}-{tier}.filelist"),
script:
"scripts/write_filelist.py"
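For context, here is a minimal sketch of what a filelist-writing helper like
the `scripts/write_filelist.py` referenced above could look like. This is a
hypothetical reconstruction based on the inline `run` block it replaces, not
the actual script from this PR:

```python
# Hypothetical sketch of scripts/write_filelist.py (not the actual PR content).
# Snakemake's `script:` directive injects a `snakemake` object carrying the
# rule's input and output; warn if no inputs matched the search pattern, then
# write one input path per line to the output filelist.
from pathlib import Path

inputs = snakemake.input  # noqa: F821 (provided by Snakemake at runtime)
output = Path(snakemake.output[0])  # noqa: F821

if len(inputs) == 0:
    print(
        "WARNING: No files found for the given pattern\n"
        "make sure pattern follows the format: "
        "all-{experiment}-{period}-{run}-{datatype}-{timestamp}-{tier}.gen"
    )

with output.open("w") as f:
    for fn in inputs:
        f.write(f"{fn}\n")
```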