Releases · shz9/magenpy · GitHub

03 Dec 08:55

shz9

v0.1.4 Latest

Changed

Changed the data type of the index pointer in the LDMatrix object to int64. With int32, very large
datasets with millions of variants cause overflow errors.
Updated the way we determine the pandas chunksize when converting from plink tables to zarr.
Simplified the way we compute the quantization scale in model_utils.
Fixed a major bug in how the LD window thresholds passed to plink1.9 are computed.
Fixed in-place fillna in from_plink_table in LDMatrix to conform to latest pandas API.
Updated run_shell_script to check for and capture errors.
Refactored code to slightly reduce import/load times.
Cleaned up load_data method of LDMatrix and subsumed functionality in load_rows.
Fixed bugs in match_snp_tables.
Fixed bugs and rewrote how the block LD estimator is computed using both the plink and xarray backends.
Updated from_plink_table method in LDMatrix to handle cases where boundaries are different from what
plink computes.
Fixed bug in the symmetrize_ut_csr_matrix utility function.
Changed default storage data type for LD matrices to int16.
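
The int64 change to the index pointer noted at the top of this list can be illustrated with plain NumPy. The sketch below is not magenpy code. The per-variant entry counts are made-up numbers, chosen only so that the total number of stored entries exceeds the int32 maximum (2,147,483,647).

```python
import numpy as np

# Hypothetical sizes: 3 million variants with 1,000 stored LD entries each,
# i.e. 3 billion entries in total.
row_counts = np.full(3_000_000, 1_000, dtype=np.int64)

# A CSR index pointer is the cumulative sum of the per-row entry counts,
# with a leading zero.
indptr64 = np.concatenate(([0], np.cumsum(row_counts, dtype=np.int64)))

# Casting to int32 silently wraps around once values pass 2**31 - 1.
indptr32 = indptr64.astype(np.int32)

int32_max = np.iinfo(np.int32).max
print(indptr64[-1] > int32_max)      # the total does not fit in int32
print(bool((indptr32 < 0).any()))    # the wrapped pointer contains negative offsets
```

Negative or non-monotone offsets make row slicing meaningless, which is why an index pointer for matrices of this size needs a 64-bit integer type.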

Added

Added extra validation checks in LDMatrix to ensure that the index pointer is formatted correctly.
Added LDLinearOperator class to allow for efficient linear algebra operations on the LD matrix without
representing the full symmetric matrix in memory.
Added utility methods to LDMatrix class to allow for computing eigenvalues, performing SVD, etc.
Added spectral properties to the attributes of LD matrices.
Added support for slicing/retrieving entries of the LD matrix by SNP rsID.
Added support for reading LD matrices from AWS S3 storage.
Added utility method to detect if a file contains header information.
Added utility method to generate overlapping windows over a sequence.
Added compute_extremal_eigenvalues to allow the user to compute extremal (minimum and maximum) eigenvalues
of LD matrices.
Added the utility function combine_ld_matrices to allow for combining LD matrices from different sources.
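
The idea behind a linear operator over upper-triangular storage, and behind extremal eigenvalue computation, can be sketched with SciPy alone. This is an illustration under stated assumptions, not magenpy's LDLinearOperator or compute_extremal_eigenvalues: the toy banded matrix, the unit diagonal, and the `matvec` helper are all made up for the example.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import LinearOperator, eigsh

# Toy banded "LD" matrix: unit diagonal, correlations decaying with distance.
n = 40
dense = np.eye(n)
for k in range(1, 4):
    band = np.full(n - k, 0.5 ** k)
    dense += np.diag(band, k) + np.diag(band, -k)

# Store only the strict upper triangle in CSR form (the diagonal is taken
# to be 1, as for a correlation matrix).
upper = csr_matrix(np.triu(dense, k=1))

def matvec(x):
    # Computes (U + U^T + I) @ x without forming the full symmetric matrix.
    return upper @ x + upper.T @ x + x

op = LinearOperator((n, n), matvec=matvec, dtype=np.float64)

# Largest and smallest (algebraic) eigenvalues via Lanczos iterations.
lam_max = eigsh(op, k=1, which="LA", return_eigenvectors=False)[0]
lam_min = eigsh(op, k=1, which="SA", return_eigenvectors=False)[0]
print(lam_min, lam_max)
```

Because the iterative solver only needs matrix-vector products, the symmetric matrix never has to be materialized, which is the memory saving the LDLinearOperator item describes.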

03 Jun 01:06

shz9

v0.1.3

Changed

Fixed the logic of detect_outliers in phenotype transforms so that it matches the function
name (previously, it returned True for inliers).
Updated quantize and dequantize to minimize data copying as much as possible.
Updated LDMatrix.load_rows() method to minimize data copying.
Fixed bug in LDMatrix.n_neighbors implementation.
Updated dask version in requirements.txt to avoid installing dask-expr.
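
For readers unfamiliar with the quantize and dequantize operations mentioned above, here is a minimal sketch of symmetric linear quantization of correlation values. The function names `quantize_sketch` and `dequantize_sketch` are hypothetical, and using the integer type's maximum as the scale is an assumption for illustration; magenpy's own implementation may differ, and the copy-minimizing aspect is not shown.

```python
import numpy as np

def quantize_sketch(values, int_dtype=np.int16):
    """Map floats in [-1, 1] to integers of the given dtype."""
    scale = np.iinfo(int_dtype).max          # 32767 for int16 (assumed scale)
    clipped = np.clip(values, -1.0, 1.0)
    return np.rint(clipped * scale).astype(int_dtype)

def dequantize_sketch(q, float_dtype=np.float32):
    """Map quantized integers back to floats in [-1, 1]."""
    scale = np.iinfo(q.dtype).max
    return q.astype(float_dtype) / float_dtype(scale)

r = np.array([-1.0, -0.25, 0.0, 0.1234567, 1.0])
q = quantize_sketch(r)
r_back = dequantize_sketch(q)

# The round-trip error stays at roughly half a quantization step (0.5 / 32767).
print(np.max(np.abs(r_back - r)))
```

Storing int16 instead of float32 halves the on-disk footprint at the cost of this small, bounded rounding error.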

Added

Added get_peak_memory_usage to system_utils to inspect peak memory usage of a process.
Placeholder method to perform QC on SumstatsTable objects (not yet implemented).
New attached dataset for long-range LD regions.
New method in SumstatsTable to impute rsID (if missing).
Preliminary support for matching with CHR+POS in SumstatsTable (still needs more work).
LDMatrix updates:
- New method to filter long-range LD regions.
- New method to prune LD matrix.
New algorithm for symmetrizing upper triangular and block diagonal LD matrices.
- Much faster and more memory efficient than using scipy.
- New LDMatrix class has efficient data loading in .load_data method.
- We still retain load_rows because it is useful for loading a subset of rows.
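
For context on the symmetrization item above, the straightforward SciPy baseline for turning an upper-triangular CSR matrix into a full symmetric one looks roughly like the sketch below. This is the generic approach that the note compares against, not magenpy's new algorithm; the 3x3 matrix is a made-up example.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Small upper-triangular matrix (diagonal included) in CSR form.
upper = csr_matrix(np.array([
    [1.0, 0.3, 0.0],
    [0.0, 1.0, 0.2],
    [0.0, 0.0, 1.0],
]))

# Build a sparse matrix holding only the diagonal of `upper`.
n = upper.shape[0]
idx = np.arange(n)
diag = csr_matrix((upper.diagonal(), (idx, idx)), shape=(n, n))

# Add the transpose, then subtract the diagonal once so it is not doubled.
full = (upper + upper.T - diag).tocsr()
print(full.toarray())
```

This baseline allocates several intermediate sparse matrices, which is the kind of overhead a dedicated symmetrization routine can avoid.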

25 Apr 18:52

shz9

v0.1.2

Changed

Fixed Manhattan plot implementation to support various new features.
Added a warning when the csr_matrix property of LDMatrix is accessed before the data has been
loaded.

Added

reset_mask method for magenpy LDMatrix.
Dockerfiles for both cli and jupyter modes.
A helper script to convert LD matrices from old format to new format.

12 Apr 15:50

shz9

v0.1.1

Fixed bugs in how covariates are processed in SampleTable.
Fixed bugs / issues in implementation of GWAS with xarray backend.
Streamlined implementation of the Manhattan plotting function.

05 Apr 00:47

shz9

v0.1.0

Small updates / bug fixes to workflow scripts.

04 Apr 23:12

shz9

v0.1.0

A large-scale restructuring of the code base to improve efficiency and usability.

Changed

Bug fixes across the entire code base.
Simulator classes have been renamed from GWASimulator to PhenotypeSimulator.
Moved plotting script to its own separate module.
Updated some method names / commandline flags to be consistent throughout.

Added

Basic integration testing with pytest and GitHub workflows.
Documentation for the entire package using mkdocs.
Integration testing / automated builds with GitHub workflows.
New implementation of the LD matrix that uses CSR matrix data structures.
- Quantization / float precision specification when storing LD matrices.
- Allow user to specify Compressor / Compressor options for Zarr storage.
New implementation of magenpy_simulate script.
- Allow users to set random seed.
- Now accepts --prop-causal instead of requiring the full mixing proportions.
Tried to incorporate genome_build into various data structures. This will be useful in the
future to ensure consistent genome builds across different data types.
Allow user to pass various metadata to magenpy_ld to save information about dataset
characteristics.
New sumstats parsers:
- Saige sumstats format.
- plink1.9 sumstats format.
- GWAS Catalog sumstats format.
Chained transform function for transforming phenotypes.
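
The idea of a chained transform can be sketched in a few lines of plain Python. The names `chain_sketch`, `center`, and `scale_to_unit_variance` are hypothetical and chosen for illustration only; they are not magenpy's API.

```python
from functools import reduce
import numpy as np

def chain_sketch(*transforms):
    """Return a function that applies the given transforms left to right."""
    def chained(values):
        return reduce(lambda acc, fn: fn(acc), transforms, values)
    return chained

def center(values):
    # Subtract the mean so the phenotype has mean zero.
    return values - values.mean()

def scale_to_unit_variance(values):
    # Divide by the (population) standard deviation.
    return values / values.std()

standardize = chain_sketch(center, scale_to_unit_variance)
phenotype = np.array([1.0, 2.0, 4.0, 7.0])
z = standardize(phenotype)
print(z)
```

Composing small single-purpose transforms this way keeps each step testable on its own while letting callers apply the whole pipeline with one call.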
