Releases: shz9/magenpy
v0.1.4
Changed
- Updated the data type for the index pointer in the `LDMatrix` object to be `int64`. `int32` does not work well for very large datasets with millions of variants and it causes overflow errors.
- Updated the way we determine the `pandas` chunksize when converting from `plink` tables to `zarr`.
- Simplified the way we compute the quantization scale in `model_utils`.
- Fixed major bug in how LD window thresholds that are passed to `plink1.9` are computed.
- Fixed in-place `fillna` in `from_plink_table` in `LDMatrix` to conform to latest `pandas` API.
- Update `run_shell_script` to check for and capture errors.
- Refactored code to slightly reduce import/load times.
- Cleaned up `load_data` method of `LDMatrix` and subsumed functionality in `load_rows`.
- Fixed bugs in `match_snp_tables`.
- Fixed bugs and re-wrote how the `block` LD estimator is computed using both the `plink` and `xarray` backends.
- Updated `from_plink_table` method in `LDMatrix` to handle cases where boundaries are different from what `plink` computes.
- Fixed bug in `symmetrize_ut_csr_matrix` utility functions.
- Changed default storage data type for LD matrices to `int16`.
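To illustrate why the index pointer entry above matters, here is a minimal sketch (not magenpy code) of how a CSR-style index pointer, which is a running total of stored entries per row, wraps around when accumulated in `int32`. The per-row counts are hypothetical and exaggerated so that three rows are enough to exceed the `int32` range.

```python
import numpy as np

# Hypothetical number of stored LD entries per row (exaggerated on purpose).
counts = np.array([1_000_000_000] * 3, dtype=np.int32)

# Index pointer accumulated in int32: the running total silently wraps
# once it passes np.iinfo(np.int32).max.
indptr32 = np.zeros(len(counts) + 1, dtype=np.int32)
indptr32[1:] = np.cumsum(counts, dtype=np.int32)

# Same running total accumulated in int64: no wrap-around.
indptr64 = np.zeros(len(counts) + 1, dtype=np.int64)
indptr64[1:] = np.cumsum(counts, dtype=np.int64)

print("int32 max:", np.iinfo(np.int32).max)
print("int32 index pointer:", indptr32)  # last entry is negative
print("int64 index pointer:", indptr64)
```

A negative or decreasing index pointer is also an easy symptom to check for when validating a stored matrix.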
Added
- Added extra validation checks in `LDMatrix` to ensure that the index pointer is formatted correctly.
- `LDLinearOperator` class to allow for efficient linear algebra operations on the LD matrix without representing the full symmetric matrix in memory.
- Added utility methods to `LDMatrix` class to allow for computing eigenvalues, performing SVD, etc.
- Added `Spectral properties` to the attributes of LD matrices.
- Added support to slice/retrieve entries of LD matrix by using SNP rsIDs.
- Added support for reading LD matrices from AWS S3 storage.
- Added utility method to detect if a file contains header information.
- Added utility method to generate overlapping windows over a sequence.
- Added `compute_extremal_eigenvalues` to allow the user to compute extremal (minimum and maximum) eigenvalues of LD matrices.
- Added the utility function `combine_ld_matrices` to allow for combining LD matrices from different sources.
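The idea behind operating on the LD matrix "without representing the full symmetric matrix in memory" can be sketched as follows. This is an illustrative NumPy implementation under stated assumptions, not the `LDLinearOperator` class itself: the function name `symmetric_matvec` is hypothetical, and the matrix is assumed to be stored as the upper triangle (diagonal included) in CSR arrays.

```python
import numpy as np


def symmetric_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a symmetric A stored as upper-triangular CSR.

    Only the stored upper triangle is read; the lower triangle is applied
    by mirroring each off-diagonal entry, so the full matrix is never built.
    """
    n = len(indptr) - 1
    x = np.asarray(x, dtype=np.float64)
    y = np.zeros(n, dtype=np.float64)
    for i in range(n):
        start, end = indptr[i], indptr[i + 1]
        cols = indices[start:end]
        vals = data[start:end]
        # Contribution of the stored row i (upper triangle and diagonal).
        y[i] += vals @ x[cols]
        # Mirrored contribution A[j, i] = A[i, j] for j > i.
        off_diag = cols != i
        y[cols[off_diag]] += vals[off_diag] * x[i]
    return y


# Small 3x3 example: upper triangle of a symmetric correlation-like matrix.
data = np.array([1.0, 0.5, 0.2, 1.0, 0.3, 1.0])
indices = np.array([0, 1, 2, 1, 2, 2])
indptr = np.array([0, 3, 5, 6])
x = np.array([1.0, 2.0, 3.0])

y = symmetric_matvec(data, indices, indptr, x)
```

A matrix-vector product of this kind is the only primitive that iterative eigenvalue and SVD solvers need, which is why a linear-operator interface pairs naturally with the spectral utilities listed above.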
v0.1.3
Changed
- Updated the logic for `detect_outliers` in phenotype transforms to actually reflect the function name (before it was returning true for inliers...).
- Updated `quantize` and `dequantize` to minimize data copying as much as possible.
- Updated `LDMatrix.load_rows()` method to minimize data copying.
- Fixed bug in `LDMatrix.n_neighbors` implementation.
- Updated `dask` version in `requirements.txt` to avoid installing `dask-expr`.
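For readers unfamiliar with the quantization mentioned in the `quantize` / `dequantize` entry, the following is a minimal sketch of storing correlations in [-1, 1] as integers. The function names and the choice of scale (the maximum of the integer type) are assumptions made for illustration; they are not taken from the magenpy source.

```python
import numpy as np


def quantize_to_int(values, int_dtype=np.int16):
    """Map floats in [-1, 1] to integers (hypothetical helper)."""
    scale = np.iinfo(int_dtype).max  # assumed scale: 32767 for int16
    scaled = np.asarray(values, dtype=np.float64) * scale
    return np.rint(scaled).astype(int_dtype)


def dequantize_to_float(codes, float_dtype=np.float32):
    """Recover approximate floats from the integer codes (hypothetical helper)."""
    scale = np.iinfo(codes.dtype).max
    return codes.astype(float_dtype) / scale


ld_values = np.array([-1.0, -0.25, 0.0, 0.1234, 1.0])
codes = quantize_to_int(ld_values)
restored = dequantize_to_float(codes)
max_error = float(np.max(np.abs(restored - ld_values)))
```

With `int16`, each entry takes two bytes, and the round-trip error stays on the order of one part in 65,000 of the correlation range.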
Added
- Added `get_peak_memory_usage` to `system_utils` to inspect peak memory usage of a process.
- Placeholder method to perform QC on `SumstatsTable` objects (needs to be implemented still).
- New attached dataset for long-range LD regions.
- New method in SumstatsTable to impute rsID (if missing).
- Preliminary support for matching with CHR+POS in SumstatsTable (still needs more work).
- LDMatrix updates:
  - New method to filter long-range LD regions.
  - New method to prune LD matrix.
  - New algorithm for symmetrizing upper triangular and block diagonal LD matrices.
    - Much faster and more memory efficient than using `scipy`.
  - New `LDMatrix` class has efficient data loading in `.load_data` method.
    - We still retain `load_rows` because it is useful for loading a subset of rows.
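As a rough illustration of what inspecting peak memory usage of a process involves, here is a sketch based on the standard-library `resource` module. It is an assumption that a helper like `get_peak_memory_usage` works along these lines; the function name `peak_memory_mb` is hypothetical, and the `resource` module is only available on Unix-like systems.

```python
import resource
import sys


def peak_memory_mb():
    """Return the peak resident set size of the current process in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kibibytes on Linux and in bytes on macOS.
    divisor = 1024 ** 2 if sys.platform == "darwin" else 1024
    return peak / divisor


# Allocate something noticeable, then read the high-water mark.
buffer = bytearray(50 * 1024 * 1024)
peak_mb = peak_memory_mb()
print(f"Peak memory so far: {peak_mb:.1f} MiB")
```

Because the value is a high-water mark, it does not decrease after the buffer is freed, which makes it convenient for comparing the memory cost of different loading strategies.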
v0.1.2
Changed
- Fixed `manhattan` plot implementation to support various new features.
- Added a warning when accessing `csr_matrix` property of `LDMatrix` when it hasn't been loaded previously.
Added
- `reset_mask` method for magenpy `LDMatrix`.
- `Dockerfile`s for both `cli` and `jupyter` modes.
- A helper script to convert LD matrices from old format to new format.
v0.1.1
v0.1.0
Small updates / bug fixes to workflow scripts.
v0.1.0
A large-scale restructuring of the code base to improve efficiency and usability.
Changed
- Bug fixes across the entire code base.
- Simulator classes have been renamed from `GWASimulator` to `PhenotypeSimulator`.
- Moved plotting script to its own separate module.
- Updated some method names / commandline flags to be consistent throughout.
Added
- Basic integration testing with `pytest` and GitHub workflows.
- Documentation for the entire package using `mkdocs`.
. - Integration testing / automating building with GitHub workflows.
- New implementation of the LD matrix that uses CSR matrix data structures.
- Quantization / float precision specification when storing LD matrices.
- Allow user to specify Compressor / Compressor options for Zarr storage.
- New implementation of `magenpy_simulate` script.
  - Allow users to set random seed.
  - Now accept `--prop-causal` instead of specifying full mixing proportions.
- Tried to incorporate `genome_build` into various data structures. This will be useful in the future to ensure consistent genome builds across different data types.
- Allow user to pass various metadata to `magenpy_ld` to save information about dataset characteristics.
- New sumstats parsers:
  - Saige sumstats format.
  - plink1.9 sumstats format.
  - GWAS Catalog sumstats format.
- Chained transform function for transforming phenotypes.
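The chained transform entry above describes composing several phenotype transforms into one. A minimal, standard-library sketch of that pattern is shown below; the names `chain_transforms`, `center`, and `scale_to_unit_variance` are hypothetical and are not the functions shipped with magenpy.

```python
import statistics
from functools import reduce


def chain_transforms(*transforms):
    """Return a function that applies the given transforms left to right."""
    def chained(values):
        return reduce(lambda current, transform: transform(current),
                      transforms, values)
    return chained


def center(values):
    """Subtract the mean from every value."""
    mean = statistics.fmean(values)
    return [v - mean for v in values]


def scale_to_unit_variance(values):
    """Divide every value by the population standard deviation."""
    sd = statistics.pstdev(values)
    return [v / sd for v in values]


# Compose two simple transforms into a single standardization step.
standardize = chain_transforms(center, scale_to_unit_variance)
standardized = standardize([1.0, 2.0, 3.0, 6.0])
```

Keeping each transform as a plain function of the phenotype values makes it straightforward to reorder steps or insert new ones, such as outlier removal, into the chain.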