Commit

Merge pull request #161 from mgalardini/wg_lineage
Fix elastic net with lineages
mgalardini authored Jun 25, 2021
2 parents 2e27979 + 44ea141 commit e30810c
Showing 7 changed files with 77 additions and 16 deletions.
2 changes: 1 addition & 1 deletion pyseer/__init__.py
@@ -2,4 +2,4 @@

 '''Python reimplementation of SEER for bacterial GWAS'''

-__version__ = '1.3.8'
+__version__ = '1.3.9'
9 changes: 5 additions & 4 deletions pyseer/__main__.py
@@ -383,6 +383,10 @@ def main():
 if options.lineage_clusters:
 lineage_clusters, lineage_dict = load_lineage(options.lineage_clusters, p)

+# Keep a copy, as these have a column removed below
+lineage_clusters_full = np.copy(lineage_clusters)
+lineage_dict_full = lineage_dict.copy()
+
 lineage_dict_full = lineage_dict
 if options.lineage:
 lineage_samples = p.index # this is ensured in load_lineage
@@ -404,10 +408,7 @@ def main():

 min_lineage = min(lineage_wald.items(), key=operator.itemgetter(1))[0]

-# Remove from objects, but keep a copy
-lineage_clusters_full = np.copy(lineage_clusters)
-lineage_dict_full = lineage_dict.copy()
-
+# Remove from objects (full copy kept above)
 min_index = lineage_dict.index(min_lineage)
 lineage_clusters = np.delete(lineage_clusters, min_index, 1)
 del lineage_dict[min_index]
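
The change in `pyseer/__main__.py` moves the two copy statements so that they run as soon as the lineage clusters are loaded, rather than just before the column is removed. The sketch below uses made-up data (the array values and lineage names are hypothetical, not taken from pyseer) to show why the copies must be taken before the removal: `np.delete` returns a new array, while `del` on a list mutates it in place. It also shows that a plain assignment, like the context line `lineage_dict_full = lineage_dict`, binds a second name to the same list rather than copying it.

```python
import numpy as np

# Hypothetical stand-ins for what load_lineage returns:
# one row per sample, one column per lineage.
lineage_clusters = np.array([[1, 0, 0],
                             [0, 1, 0],
                             [0, 0, 1],
                             [1, 0, 0]])
lineage_dict = ["BAPS_1", "BAPS_2", "BAPS_3"]

# Independent copies, taken before anything is removed.
lineage_clusters_full = np.copy(lineage_clusters)
lineage_dict_full = lineage_dict.copy()

# A plain assignment is only a second name for the same list object.
lineage_dict_alias = lineage_dict

# Drop one lineage column, as the hunk above does for min_lineage.
min_index = lineage_dict.index("BAPS_2")
lineage_clusters = np.delete(lineage_clusters, min_index, 1)  # new array
del lineage_dict[min_index]                                   # in-place

print(lineage_clusters.shape, lineage_clusters_full.shape)
print(lineage_dict_full)   # still has all three lineages
print(lineage_dict_alias)  # lost BAPS_2 along with lineage_dict
```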
40 changes: 34 additions & 6 deletions tests/baseline/35.err
@@ -1,8 +1,36 @@
 Read 50 phenotypes
 Detected binary phenotype
-Loaded projection with dimension (50, 35)
-Analysing 50 samples found in both phenotype and structure matrix
-3 loaded variants
-0 pre-filtered variants
-3 tested variants
-3 printed variants
+Reading all variants
+Analysing 50 samples found in both phenotype and loaded npy
+Applying correlation filtering
+ 0%| | 0/234 [00:00<?, ?variants/s] 60%|██████ | 141/234 [00:00<00:00, 1409.89variants/s]100%|██████████| 234/234 [00:00<00:00, 1433.03variants/s]
+Fitting elastic net to top 163 variants
+Best penalty (lambda) from cross-validation: 6.33E-02
+Best model deviance from cross-validation: 1.249 ± 1.37E-01
+Best R^2 from cross-validation: -0.136
+Predictions within each lineage
+Lineage Size R2 TP TN FP FN
+BAPS_1 3 1.000 2 1 0 0
+BAPS_10 3 -2.000 0 1 0 2
+BAPS_12 4 -1.000 2 0 2 0
+BAPS_14 4 -0.333 2 1 0 1
+BAPS_15 1 nan 1 0 0 0
+BAPS_16 1 nan 1 0 0 0
+BAPS_19 2 -1.000 1 0 1 0
+BAPS_2 3 nan 3 0 0 0
+BAPS_20 1 nan 0 1 0 0
+BAPS_22 1 nan 0 1 0 0
+BAPS_27 5 -0.667 2 1 2 0
+BAPS_28 2 nan 2 0 0 0
+BAPS_29 1 nan 0 1 0 0
+BAPS_3 2 -1.000 1 0 1 0
+BAPS_4 2 nan 0 2 0 0
+BAPS_5 2 1.000 1 1 0 0
+BAPS_6 2 1.000 1 1 0 0
+BAPS_7 5 -0.667 3 0 2 0
+BAPS_9 6 -1.000 3 0 3 0
+Finding and printing selected variants
+886 loaded variants
+723 pre-filtered variants
+163 tested variants
+22 printed variants
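
The per-lineage rows in this baseline are consistent with R2 being computed on binary predictions against binary phenotypes, giving `nan` when every sample in a lineage has the same phenotype. The sketch below is a reconstruction from the table only, not pyseer's code; the function name `lineage_summary` and the example label vectors are assumptions, chosen to match the TP/TN/FP/FN counts of two rows.

```python
import numpy as np

def lineage_summary(y_true, y_pred):
    """Return (R2, TP, TN, FP, FN) for binary labels and binary predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    # R2 = 1 - SS_res / SS_tot; undefined when the labels do not vary.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = float("nan") if ss_tot == 0 else float(1.0 - ss_res / ss_tot)
    return r2, tp, tn, fp, fn

# Labels chosen to match the BAPS_10 row (Size 3, TP 0, TN 1, FP 0, FN 2).
baps_10 = lineage_summary([1, 1, 0], [0, 0, 0])
# Labels chosen to match the BAPS_14 row (Size 4, TP 2, TN 1, FP 0, FN 1).
baps_14 = lineage_summary([1, 1, 0, 1], [1, 1, 0, 0])
# A single-sample lineage such as BAPS_15 (TP 1) has no label variance.
baps_15 = lineage_summary([1], [1])

for name, row in [("BAPS_10", baps_10), ("BAPS_14", baps_14), ("BAPS_15", baps_15)]:
    print(name, f"{row[0]:.3f}", *row[1:])
```

Under these assumptions the three rows come out with R2 values of about -2.000, -0.333 and `nan`, in line with the table above.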
27 changes: 23 additions & 4 deletions tests/baseline/35.log
@@ -1,4 +1,23 @@
-variant af filter-pvalue lrt-pvalue beta beta-std-err intercept PC1 PC2 PC3 notes
-CDS1 8.00E-02 8.01E-01 1.00E+00 -4.57E-01 9.97E-01 2.75E-01 -9.90E-01 -3.91E-01 4.33E-01 bad-chisq
-CDS2 1.40E-01 4.50E-01 1.00E+00 -8.47E-02 1.31E+00 2.51E-01 -9.13E-01 -3.30E-01 4.53E-01 bad-chisq
-CDS3 2.20E-01 4.25E-01 6.99E-01 -3.25E-01 8.40E-01 3.29E-01 -9.59E-01 -4.28E-01 5.52E-01
+variant af filter-pvalue lrt-pvalue beta notes
+FM211187_184_G_A 4.00E-02 2.01E-01 3.40E-01 NA bad-chisq
+FM211187_293_G_A 2.00E-02 2.54E-01 -6.69E-01 NA bad-chisq
+FM211187_869_C_T 1.00E-01 7.84E-03 -1.63E+00 NA bad-chisq
+FM211187_926_G_A 2.00E-02 2.54E-01 -8.91E-05 NA bad-chisq
+FM211187_1981_G_A 9.40E-01 1.13E-01 2.66E-02 NA bad-chisq
+FM211187_2032_C_A 2.00E-02 2.54E-01 -6.78E-01 NA bad-chisq
+FM211187_2865_C_T 2.00E-02 2.54E-01 -4.09E-05 NA bad-chisq
+FM211187_2943_T_C 1.20E-01 2.06E-02 1.44E+00 NA bad-chisq
+FM211187_3982_C_A 4.00E-02 2.01E-01 1.39E-01 NA bad-chisq
+FM211187_6054_C_T 6.00E-02 1.13E-01 4.74E-03 NA bad-chisq
+FM211187_6139_A_G 2.00E-02 2.54E-01 -4.40E-02 NA bad-chisq
+FM211187_7799_C_T 4.00E-02 2.01E-01 2.46E-04 NA bad-chisq
+FM211187_8872_A_G 2.40E-01 2.51E-01 -1.29E-02 NA
+FM211187_10838_C_T 5.20E-01 1.64E-01 -4.50E-01 NA
+FM211187_11527_T_C 4.80E-01 1.64E-01 -2.77E-01 NA
+FM211187_11559_T_G 4.00E-02 2.01E-01 8.47E-05 NA bad-chisq
+FM211187_11633_A_G 2.00E-02 2.54E-01 -2.81E-03 NA bad-chisq
+FM211187_11762_G_T 6.00E-02 1.13E-01 3.16E-03 NA bad-chisq
+FM211187_12304_C_CTTATA 2.00E-02 3.71E-01 7.90E-01 NA bad-chisq
+FM211187_13550_G_A 4.00E-02 2.01E-01 9.41E-06 NA bad-chisq
+FM211187_13781_C_T 2.00E-02 2.54E-01 -1.02E-04 NA bad-chisq
+FM211187_14044_G_A 3.00E-01 1.06E-01 5.63E-01 NA
8 changes: 8 additions & 0 deletions tests/baseline/36.err
@@ -0,0 +1,8 @@
+Read 50 phenotypes
+Detected binary phenotype
+Loaded projection with dimension (50, 35)
+Analysing 50 samples found in both phenotype and structure matrix
+3 loaded variants
+0 pre-filtered variants
+3 tested variants
+3 printed variants
4 changes: 4 additions & 0 deletions tests/baseline/36.log
@@ -0,0 +1,4 @@
+variant af filter-pvalue lrt-pvalue beta beta-std-err intercept PC1 PC2 PC3 notes
+CDS1 8.00E-02 8.01E-01 1.00E+00 -4.57E-01 9.97E-01 2.75E-01 -9.90E-01 -3.91E-01 4.33E-01 bad-chisq
+CDS2 1.40E-01 4.50E-01 1.00E+00 -8.47E-02 1.31E+00 2.51E-01 -9.13E-01 -3.30E-01 4.53E-01 bad-chisq
+CDS3 2.20E-01 4.25E-01 6.99E-01 -3.25E-01 8.40E-01 3.29E-01 -9.59E-01 -4.28E-01 5.52E-01
3 changes: 2 additions & 1 deletion tests/run_test.sh
@@ -53,7 +53,8 @@ python ../pyseer-runner.py --vcf variants.vcf.gz --phenotypes subset.pheno --sav
 python ../pyseer-runner.py --kmers kmers.gz --phenotypes subset.pheno --wg enet --alpha 1 --cor-filter 0.25 > 32.log 2> 32.err || die "Load Enet with kmers input"
 python ../pyseer-runner.py --pres presence_absence.Rtab --phenotypes subset.pheno --wg enet --alpha 1 --cor-filter 0.25 > 33.log 2> 33.err || die "Load Enet with roary/piggy input"
 python ../pyseer-runner.py --vcf variants.vcf.gz --phenotypes subset.pheno --load-vars enet_vcf --wg enet --save-model enet_vcf_model --alpha 1 --cor-filter 0.25 > 34.log 2> 34.err || die "Load Enet and save model"
-python ../pyseer-runner.py --vcf variants.vcf.gz --burden burden_regions_multiple.txt --phenotypes subset.pheno --load-m pop_struct.pkl --max-dimensions 3 > 35.log 2> 35.err || die "Multiple regions for burden testing"
+python ../pyseer-runner.py --vcf variants.vcf.gz --phenotypes subset.pheno --load-vars enet_vcf --wg enet --lineage-clusters lineage_clusters.txt --sequence-reweighting --alpha 1 --cor-filter 0.25 > 35.log 2> 35.err || die "Enet with lineages"
+python ../pyseer-runner.py --vcf variants.vcf.gz --burden burden_regions_multiple.txt --phenotypes subset.pheno --load-m pop_struct.pkl --max-dimensions 3 > 36.log 2> 36.err || die "Multiple regions for burden testing"

 # test other pyseer commands
 python ../scree_plot_pyseer-runner.py distances.tsv.gz --max-dimensions 20 > /dev/null 2> /dev/null || die "Scree plot"