pgsc_calc v1.3.0
This release is focused on improving scalability.
Features
- Variant matching is more efficient, using a split-apply-combine approach when the target data are split across chromosomes. This supports parallel PGS calculation for the largest traits in the PGS Catalog (e.g. cancer: 418 PGS, averaging 261,000 variants per score) on big datasets such as UK Biobank.
- Better support for running in offline environments:
  - Internet access is only required to download scores by ID. Scores can be pre-downloaded using the pgscatalog-utils package (https://pypi.org/project/pgscatalog-utils/).
  - Scoring file metadata is read from the file headers and displayed in the report, so the PGS Catalog API is no longer called during report generation.
- Implemented a flag (--efo_direct) to return only PGS tagged with the exact EFO term (i.e. PGS tagged only with child/descendant terms in the ontology are excluded).
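The split-apply-combine variant matching from the first feature can be illustrated with a minimal Python sketch. It is not pgsc_calc's implementation (the workflow distributes this work across separate jobs rather than threads); the record layout, the function names split_by_chrom, match_chrom and match_variants, and the simple "same position and effect allele present" matching rule are all hypothetical simplifications.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical record layouts (not pgsc_calc's real schema):
#   scoring-file variant: (chrom, pos, effect_allele, weight)
#   target-genome variant: (chrom, pos, ref, alt)
ScoreVariant = tuple[str, int, str, float]
TargetVariant = tuple[str, int, str, str]


def split_by_chrom(records):
    """Split: group records by their chromosome (first field)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[0]].append(rec)
    return groups


def match_chrom(score_vars: list[ScoreVariant],
                target_vars: list[TargetVariant]) -> list[ScoreVariant]:
    """Apply: match one chromosome's score variants against its target variants."""
    # Index target variants by position; only this chromosome is held in memory.
    index = defaultdict(list)
    for _chrom, pos, ref, alt in target_vars:
        index[pos].append((ref, alt))
    matches = []
    for chrom, pos, effect, weight in score_vars:
        # .get() avoids inserting empty entries into the defaultdict
        for ref, alt in index.get(pos, []):
            if effect in (ref, alt):
                matches.append((chrom, pos, effect, weight))
                break
    return matches


def match_variants(score, target):
    """Split by chromosome, match each chromosome in parallel, combine results."""
    score_groups = split_by_chrom(score)
    target_groups = split_by_chrom(target)
    # Only chromosomes present in both inputs can produce matches.
    chroms = sorted(score_groups.keys() & target_groups.keys())
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as chroms
        results = pool.map(
            lambda c: match_chrom(score_groups[c], target_groups[c]), chroms
        )
        # Combine: concatenate per-chromosome matches into one list.
        return [m for chrom_matches in results for m in chrom_matches]
```

In this sketch each chromosome is matched independently, so the apply step parallelises without coordination and the memory needed per task is bounded by the largest chromosome rather than the whole genome.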
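Reading scoring file metadata from headers, rather than calling the API, can be sketched as below. This assumes header lines start with "#" and carry metadata as key=value pairs (for example pgs_id or genome_build) before the column header row; the function names read_header and read_scorefile_header are hypothetical, and the real pgsc_calc parser may handle more cases.

```python
import gzip


def read_header(lines):
    """Collect key=value metadata from leading '#' lines of a scoring file."""
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break  # first non-comment line is the column header; stop here
        stripped = line.lstrip("#").strip()
        if "=" in stripped:  # section titles without '=' are skipped
            key, value = stripped.split("=", 1)
            meta[key] = value
    return meta


def read_scorefile_header(path):
    """Open a plain or gzipped scoring file and parse only its header."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        return read_header(fh)
```

Because only the leading comment lines are read, this works offline and avoids parsing the (potentially very large) variant table.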
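The difference made by --efo_direct can be shown with a small local simulation. In pgsc_calc the trait lookup is resolved against the PGS Catalog; here the ontology is a hypothetical parent-to-children map, the trait names and PGS identifiers are placeholders, and select_pgs and descendants are illustrative names only.

```python
def descendants(term, children):
    """Return all child/descendant terms of `term` via iterative depth-first search."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for child in children.get(current, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def select_pgs(term, pgs_traits, children, efo_direct=False):
    """Select PGS tagged with `term`; include descendant terms unless efo_direct is set."""
    wanted = {term} if efo_direct else {term} | descendants(term, children)
    return sorted(pgs for pgs, traits in pgs_traits.items() if wanted & set(traits))


# Hypothetical ontology and PGS-to-trait tags (placeholders, not real EFO data)
children = {"cancer": ["breast_carcinoma", "lung_carcinoma"],
            "breast_carcinoma": ["er_positive_breast_carcinoma"]}
pgs_traits = {"PGS_A": ["cancer"], "PGS_B": ["breast_carcinoma"],
              "PGS_C": ["er_positive_breast_carcinoma"], "PGS_D": ["height"]}
```

With these inputs, select_pgs("cancer", pgs_traits, children) returns PGS_A, PGS_B and PGS_C, whereas passing efo_direct=True returns only PGS_A, the score tagged with the exact term.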