Improve the algorithm of the main peeling function #169

Open
3 of 10 tasks
XingerTang opened this issue Jun 28, 2024 · 9 comments

Comments

@XingerTang
Contributor

XingerTang commented Jun 28, 2024

@gregorgorjanc @AprilYUZhang

The following are the steps we can take for the algorithm improvements:

  • Improvements regarding the errors of the main peeling function
    • Make a list of errors, which contains the following information:
      • specify the usage of the errors
      • create new names
      • suggest new default values or other modifications
    • Test with new default values, make modifications based on the test
  • Improvements regarding posterior update
    • understand and organize the posterior update process (why the update is kept separate from the calculation)
    • modify posterior update
  • Baum-Welch parameter updates
@AprilYUZhang

Done by @XingerTang:

  • Make a list of errors, which contains the following information:
    • specify the usage of the errors
    • create new names
    • suggest new default values or other modifications
| Line | Actual Function | New Name | New Default | Other Comments |
| --- | --- | --- | --- | --- |
| 19 | The first definition in the peel function | epsilon | 1e-10 | Avoid division by zero in calculations |
| 75 | Unclear error for Jointparents: $anterior_m(g_m) \, penetrance_m(g_m) \, posterior_{m,-f}(g_m) \, anterior_f(g_f) \, penetrance_f(g_f) \, posterior_{f,-m}(g_f)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 80 | Unclear error for proSire: $anterior_m(g_m) \, penetrance_m(g_m) \, posterior_{m,-f}(g_m)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 81 | Unclear error for proDam: $anterior_f(g_f) \, penetrance_f(g_f) \, posterior_{f,-m}(g_f)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 92 | Unclear error for childValues: $posterior_c(g_c) \, penetrance_c(g_c)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 189 | Unclear error for sirePosterior: $posterior_m(g_m)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 194 | Unclear error for damPosterior: $posterior_f(g_f)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 205 | Unclear error for childValues during peel down: $posterior_c(g_c) \, penetrance_c(g_c)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 225 | Unclear error for segregation during peel down: $p(seg_{i,j} = s)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 428-431, 444-447 | Error for segregation transmission / recombination rate: $\gamma$ | r | unsure | |

The "e" defined in line 19 might be there to avoid division by zero in calculations.

But the question is:

If the goal is only to avoid zeros, why do we need to multiply by 1 - e?
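
One possible reading of the question above, as a minimal sketch: assuming the error is applied to a normalised array as `(1 - e) * p + e / K` with `K` entries (this form is an assumption and should be checked against line 19 and its uses), the factor `1 - e` keeps the result summing to 1, whereas adding `e` alone does not. The function names and the value of `e` are illustrative only.

```python
import numpy as np

def add_floor(p, e):
    """Only add e: zeros disappear, but the values no longer sum to 1."""
    return p + e

def mix_with_uniform(p, e):
    """Shrink p by (1 - e), then spread e evenly: zeros disappear and the sum stays 1."""
    return (1.0 - e) * p + e / p.size

p = np.array([1.0, 0.0, 0.0, 0.0])  # a fully certain genotype distribution
e = 1e-3                            # illustrative value, not the default in the code

floored = add_floor(p, e)
mixed = mix_with_uniform(p, e)

print(floored.sum())  # slightly above 1
print(mixed.sum())    # 1 up to floating point error
```

If the only goal were to avoid zeros, either variant would do; the mixing form additionally avoids a renormalisation step.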

@AprilYUZhang

AprilYUZhang commented Jul 16, 2024

@XingerTang @gregorgorjanc I also summarised the correspondence between the formulas in the paper and the objects or variables in the code. I hope it helps with modifying the code.

Formula

genotype $g_i$

$\pi(g_i) \propto anterior_i(g_i) \cdot posterior_i(g_i) \cdot penetrance_i(g_i)$
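
As a minimal sketch of the proportionality above, assuming each term is stored as a 4 x nLoci array over the genotype states (the function name and the example numbers are hypothetical):

```python
import numpy as np

def genotype_probabilities(anterior, posterior, penetrance):
    """Multiply the three terms per genotype and locus, then normalise over genotypes."""
    unnormalised = anterior * posterior * penetrance
    return unnormalised / unnormalised.sum(axis=0, keepdims=True)

# One locus, four genotype states.
anterior = np.full((4, 1), 0.25)                        # no information from the parents
posterior = np.ones((4, 1))                             # no information from the progeny
penetrance = np.array([[0.99], [0.01], [0.01], [0.01]])  # own data favour the first state

probs = genotype_probabilities(anterior, posterior, penetrance)
```

With uninformative anterior and posterior terms, the result is just the normalised penetrance.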

$\epsilon$: genotyping error rate

$$ penetrance_i(g_i) = \begin{cases} 1 - \epsilon & \text{if } g_i \text{ is consistent with the genotype called by the SNP array} \\ \epsilon & \text{otherwise} \end{cases} $$

$\delta$: error rate of the sequence data

$$ penetrance_i \left( \begin{array}{cccc} aa \\ aA \\ Aa \\ AA \end{array} \right) \propto \left( \begin{array}{cccc} (1 - \delta)^{n_{\text{ref}}} \delta^{n_{\text{alt}}} \\ 0.5^{n_{\text{ref}} + n_{\text{alt}}} \\ 0.5^{n_{\text{ref}} + n_{\text{alt}}} \\ \delta^{n_{\text{ref}}} (1 - \delta)^{n_{\text{alt}}} \end{array} \right) $$
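
The two penetrance definitions above could be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the array genotype is given as the number of copies of A (0, 1 or 2), and a is treated as the reference allele, as the sequence formula implies.

```python
import numpy as np

# Genotype states in the order used above: aa, aA, Aa, AA.
DOSAGE_OF_STATE = np.array([0, 1, 1, 2])  # copies of A in each state

def array_penetrance(called_dosage, epsilon):
    """1 - epsilon for states consistent with the called genotype, epsilon otherwise."""
    return np.where(DOSAGE_OF_STATE == called_dosage, 1.0 - epsilon, epsilon)

def sequence_penetrance(n_ref, n_alt, delta):
    """Read-count likelihood per state, normalised to sum to 1."""
    n_reads = n_ref + n_alt
    values = np.array([
        (1.0 - delta) ** n_ref * delta ** n_alt,  # aa
        0.5 ** n_reads,                           # aA
        0.5 ** n_reads,                           # Aa
        delta ** n_ref * (1.0 - delta) ** n_alt,  # AA
    ])
    return values / values.sum()

het_call = array_penetrance(called_dosage=1, epsilon=0.01)
ref_reads = sequence_penetrance(n_ref=3, n_alt=0, delta=0.01)
```

The array penetrance is left unnormalised here because the formula defines it that way; the genotype probabilities are normalised later in any case.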

Anterior Probabilities

$anterior_i(g_i) = \sum_{g_m, g_f} tr(g_i \mid g_m, g_f) p_{-i}(g_m, g_f)$

$tr(g_i \mid g_m, g_f)$ is named childSgTensor in the code.
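
A minimal sketch of the anterior formula above, assuming the transmission term is a 4 x 4 x 4 tensor averaged over segregation and that a phased state is read as (allele from m, allele from f). The names below are hypothetical and do not mirror the actual code.

```python
import numpy as np

# States aa, aA, Aa, AA with a = 0 and A = 1, so the state index is
# 2 * allele_from_m + allele_from_f (an assumption made for this sketch).
# PASS_ON[allele, parent_state]: probability that a parent passes on that allele.
PASS_ON = np.array([[1.0, 0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5, 1.0]])

def transmission_tensor():
    """tr[g_i, g_m, g_f], averaged over segregation (no linkage information)."""
    return np.einsum("am,bf->abmf", PASS_ON, PASS_ON).reshape(4, 4, 4)

def anterior_from_parents(tr, parents_minus_child):
    """Sum tr(g_i | g_m, g_f) * p_{-i}(g_m, g_f) over both parental genotypes."""
    joint = parents_minus_child / parents_minus_child.sum()
    return np.einsum("imf,mf->i", tr, joint)

tr = transmission_tensor()
joint = np.zeros((4, 4))
joint[3, 0] = 1.0  # parent m is AA, parent f is aa
anterior = anterior_from_parents(tr, joint)
print(anterior)  # all mass on the third state (Aa)
```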

Parents Minus Childs

The joint probabilities of the parental genotypes are calculated by combining the anterior and posterior probabilities of both parents, excluding the information that pertains to individual i.

$$ \begin{aligned} p_{-i}(g_m, g_f) = {} & anterior_m(g_m) \, penetrance_m(g_m) \, posterior_{m,-f}(g_m) \\ & \times anterior_f(g_f) \, penetrance_f(g_f) \, posterior_{f,-m}(g_f) \\ & \times posterior_{m,f,-i}(g_m, g_f) \end{aligned} $$

$p_{-i}(g_m, g_f)$ is named ParentsMinusChilds in the code.

$anterior_m(g_m) \, penetrance_m(g_m) \, posterior_{m,-f}(g_m) \, anterior_f(g_f) \, penetrance_f(g_f) \, posterior_{f,-m}(g_f)$ refers to JointParents in the code.

Posterior

$posterior_{m,f}(g_m, g_f) = \prod_c \sum_{g_c} tr(g_c \mid g_m, g_f) posterior_c(g_c) penetrance_c(g_c)$

$posterior_{m,f}(g_m, g_f)$ is named ChildtoParents in the code.
$posterior_c(g_c) penetrance_c(g_c)$ is named childValue in the code.
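
The joint parental posterior above can be sketched as a contraction over the child genotype followed by a product over children. This is only an illustration with hypothetical names, assuming states aa, aA, Aa, AA read as (allele from m, allele from f) and a transmission tensor averaged over segregation.

```python
import numpy as np

# PASS_ON[allele, parent_state]: probability that a parent passes on a (row 0) or A (row 1).
PASS_ON = np.array([[1.0, 0.5, 0.5, 0.0],
                    [0.0, 0.5, 0.5, 1.0]])
# TR[g_c, g_m, g_f] with child state index 2 * allele_from_m + allele_from_f.
TR = np.einsum("am,bf->abmf", PASS_ON, PASS_ON).reshape(4, 4, 4)

def child_to_parents(child_values):
    """child_values: nChildren x 4 array of posterior_c(g_c) * penetrance_c(g_c).

    Returns a 4 x 4 array indexed by (g_m, g_f).
    """
    # Sum over the child genotype for every child k and parental pair (m, f).
    per_child = np.einsum("cmf,kc->kmf", TR, child_values)
    # Multiply the contributions of all children in the family.
    return per_child.prod(axis=0)

# One child known to be aa and one known to be AA.
child_values = np.array([[1.0, 0.0, 0.0, 0.0],
                         [0.0, 0.0, 0.0, 1.0]])
posterior_mf = child_to_parents(child_values)
```

In this example only pairs of heterozygous parents remain possible, which is why an error term is needed before such an array is used as a divisor. For large families a product of small numbers can underflow, so summing logarithms may be preferable in practice.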

  • Posterior for one parent

$posterior_m(g_m) = \prod_k \sum_{g_k} posterior_{m,k}(g_m, g_k) p(g_m, g_k)$

Segregation

$p(seg_{i,j} = s) = p(seg_{i,j} = s, seg_{i,j-1}) \ p(seg_{i,j} = s, g_i, g_f, g_m) \ p(seg_{i,j} = s, seg_{i,j+1})$

where,

$$ \begin{aligned} p(seg_{i,j}, g_i, g_f, g_m) = {} & tr(g_i \mid g_f, g_m, seg_{i,j}) \, penetrance_i(g_i) \, posterior_i(g_i) \\ & \times anterior_m(g_m) \, penetrance_m(g_m) \, posterior_{m,-f}(g_m) \\ & \times anterior_f(g_f) \, penetrance_f(g_f) \, posterior_{f,-m}(g_f) \\ & \times posterior_{m,f,-i}(g_m, g_f) \\ = {} & tr(g_i \mid g_f, g_m, seg_{i,j}) \, penetrance_i(g_i) \, posterior_i(g_i) \, p_{-i}(g_m, g_f) \end{aligned} $$

estimateSegregationWithNorm() estimates segregation probability from genotype

where,

$$ p(seg_{i,j} = s, seg_{i,j-1}) = \sum_{s'} p(seg_{i,j} = s \mid seg_{i,j-1} = s') p(seg_{i,j-1} = s',seg_{i,j-2}) $$

collapsePointSeg() essentially implements this step, which passes segregation information between neighbouring loci.
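
A minimal sketch of such a forward recursion, not the actual collapsePointSeg() implementation. It assumes four segregation states (two meioses, each with two possible origins), a single recombination rate `r` between neighbouring loci, and that the point estimate at each locus is multiplied into the message before it is passed on. All names are hypothetical.

```python
import numpy as np

def transition_matrix(r):
    """T[s_prev, s]: each of the two meioses switches origin with probability r."""
    one_meiosis = np.array([[1.0 - r, r],
                            [r, 1.0 - r]])
    return np.kron(one_meiosis, one_meiosis)

def forward_pass(point_seg, r):
    """point_seg: 4 x nLoci per-locus segregation estimates (float array).

    forward[:, j] summarises the information from the loci to the left of j.
    """
    n_loci = point_seg.shape[1]
    T = transition_matrix(r)
    forward = np.empty_like(point_seg)
    forward[:, 0] = 0.25  # nothing to the left of the first locus
    for j in range(1, n_loci):
        message = forward[:, j - 1] * point_seg[:, j - 1]
        message = message / message.sum()
        forward[:, j] = T.T @ message  # sum over the previous state s'
    return forward

# Three loci: the first is fully informative, the others carry no information.
point_seg = np.array([[1.0, 0.25, 0.25],
                      [0.0, 0.25, 0.25],
                      [0.0, 0.25, 0.25],
                      [0.0, 0.25, 0.25]])
forward = forward_pass(point_seg, r=0.1)
print(forward[:, 1])  # [0.81 0.09 0.09 0.01]
```

The backward term $p(seg_{i,j} = s, seg_{i,j+1})$ would follow the same pattern, running from the last locus to the first.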

Recombination rate

$\gamma = \frac{1}{n} \sum_{i} \sum_{seg_{i,j}} \sum_{seg_{i,j+1}} I(seg_{i,j} \neq seg_{i,j+1}) \ \times p(seg_{i,j} \mid seg_{i,j-1}) p(seg_{i,j}, g_i, g_f, g_m) p(seg_{i,j+1} \mid seg_{i,j+2})$

@XingerTang
Contributor Author

@AprilYUZhang @gregorgorjanc
For the posterior probability update process, even though it runs much less frequently than the anterior probability update, no further modification can be made.

In the algorithm, the posterior probabilities are updated with the following variables:

peelingInfo.posterior
peelingInfo.posteriorSire_new
peelingInfo.posteriorDam_new
peelingInfo.posteriorSire_minusFam
peelingInfo.posteriorDam_minusFam

The shape of peelingInfo.posterior is nInd x 4 x nLoci, which stores posterior probabilities for each individual for each genotype at each locus.

peelingInfo.posteriorSire_new and peelingInfo.posteriorDam_new both have shape nFam x 4 x nLoci and store the updated posterior probabilities for the sire or dam of each family, for each genotype at each locus. These probabilities only contain information from the current family: if the sire or dam has multiple families, they are incomplete because they ignore the progeny of the other families.

peelingInfo.posteriorSire_minusFam and peelingInfo.posteriorDam_minusFam also have shape nFam x 4 x nLoci and store the updated posterior probabilities for the sire or dam of each family, for each genotype at each locus. These probabilities contain the information from the progeny of all families except the current one. We need them because, when calculating the genotype probabilities, both the anterior and the posterior probabilities contain information from the current family, so one of them has to exclude it to avoid double counting.

The calculation of the posterior probabilities starts with peelingInfo.posteriorSire_new and peelingInfo.posteriorDam_new in the main peeling function. Then, after the peeling down operation for each generation, the complete posterior probabilities are calculated from the information of all families of the given sire or dam across the generation, along with peelingInfo.posteriorSire_minusFam and peelingInfo.posteriorDam_minusFam.

The anterior probabilities can be updated inside the main peeling function, which runs family by family, whereas the posterior probabilities can only be updated outside it, generation by generation. The reason is that each individual has only one sire and one dam, but each sire or dam can have multiple families, so gathering information upward requires calculations across the families in a generation.
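
The relationship between the per-family, complete, and minus-family posteriors described above can be sketched as follows, shown for sires only. This is not the actual update code: the function name and arguments are hypothetical, and it assumes all entries are strictly positive (which the error term would guarantee) so that logarithms can be used.

```python
import numpy as np

def combine_sire_posteriors(posterior_sire_new, sire_of_family, n_ind):
    """posterior_sire_new: nFam x 4 x nLoci; sire_of_family: sire index of each family.

    Returns (complete posterior, nInd x 4 x nLoci; minus-family posterior, nFam x 4 x nLoci).
    """
    n_fam, _, n_loci = posterior_sire_new.shape
    log_new = np.log(posterior_sire_new)

    # Complete posterior: product over all families of the same sire.
    log_full = np.zeros((n_ind, 4, n_loci))
    np.add.at(log_full, sire_of_family, log_new)

    # Minus-family posterior: the complete product with the current family removed.
    log_minus = log_full[sire_of_family] - log_new

    def normalise(log_p):
        p = np.exp(log_p - log_p.max(axis=1, keepdims=True))
        return p / p.sum(axis=1, keepdims=True)

    return normalise(log_full), normalise(log_minus)

# Sire 0 has families 0 and 1; sire 1 has family 2 only. One locus.
posterior_sire_new = np.array([[0.7, 0.1, 0.1, 0.1],
                               [0.4, 0.2, 0.2, 0.2],
                               [0.1, 0.1, 0.1, 0.7]])[:, :, None]
sire_of_family = np.array([0, 0, 1])
full, minus_fam = combine_sire_posteriors(posterior_sire_new, sire_of_family, n_ind=3)
```

Here `minus_fam[0]` only carries the information of family 1, and `minus_fam[2]` is uniform because sire 1 has no other family. Both results depend on every family of a sire, which is why they cannot be finalised inside a function that sees one family at a time.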

@XingerTang
Contributor Author

XingerTang commented Aug 15, 2024

@AprilYUZhang @gregorgorjanc
Without splitting the error terms, that is, keeping a single error term for all the variables, the best population-level individual accuracy of the segregation probabilities generated by the simple multi-locus peeling is approximately 0.714 with e = 0.00106, while the original value e = 0.000001 gives an accuracy of 0.705.

The detailed version of the accuracy:

  • For e = 0.00106:
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.714                nan                  0.086                0.934                0.930                0.886                multi                               

  • For e = 0.000001:
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.705                nan                  0.082                0.905                0.930                0.887                multi                               

The full reports:

  • For e = 0.00106:
============================================================================================= Marker Accuracy ==============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.679                0.517                0.866                0.872                0.869                0.714                single                              
2                    0.798                0.517                0.959                0.991                0.994                0.988                multi                               
3                    0.798                0.517                0.959                0.991                0.994                0.988                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.656                0.517                0.800                0.809                0.805                0.609                single                              
2                    0.794                0.517                0.930                0.989                0.991                0.984                multi                               
3                    0.794                0.517                0.930                0.989                0.991                0.984                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.083                0.036                0.302                0.283                0.249                0.140                single                              
2                    0.126                0.035                0.487                0.976                0.991                0.984                multi                               
3                    0.126                0.035                0.487                0.976                0.991                0.984                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
2                    0.934                0.689                0.965                0.992                0.995                0.991                multi                               
3                    0.934                0.689                0.965                0.992                0.995                0.991                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
2                    0.922                0.601                0.967                0.993                0.996                0.992                multi                               
3                    0.922                0.601                0.967                0.993                0.996                0.992                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.747                nan                  0.091                0.964                0.966                0.953                multi                               
=========================================================================================== Individual Accuracy ============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
2                    0.914                0.602                0.981                0.996                0.997                0.995                multi                               
3                    0.914                0.602                0.981                0.996                0.997                0.995                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.821                0.602                0.896                0.902                0.900                0.805                single                              
2                    0.910                0.602                0.966                0.995                0.996                0.993                multi                               
3                    0.910                0.602                0.966                0.995                0.996                0.993                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.337                0.148                0.472                0.454                0.385                0.226                single                              
2                    0.849                0.356                0.917                0.987                0.995                0.991                multi                               
3                    0.849                0.356                0.917                0.987                0.995                0.991                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
2                    0.908                0.603                0.960                0.991                0.995                0.990                multi                               
3                    0.908                0.603                0.960                0.991                0.995                0.990                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
2                    0.878                0.457                0.956                0.991                0.995                0.990                multi                               
3                    0.878                0.457                0.956                0.991                0.995                0.990                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.714                nan                  0.086                0.934                0.930                0.886                multi                               
=================================================================================== Marker Accuracy (Order by accuracy) ====================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.798                0.517                0.959                0.991                0.994                0.988                multi                               
3                    0.798                0.517                0.959                0.991                0.994                0.988                hybrid                              
1                    0.679                0.517                0.866                0.872                0.869                0.714                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.794                0.517                0.930                0.989                0.991                0.984                multi                               
3                    0.794                0.517                0.930                0.989                0.991                0.984                hybrid                              
1                    0.656                0.517                0.800                0.809                0.805                0.609                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.126                0.035                0.487                0.976                0.991                0.984                multi                               
3                    0.126                0.035                0.487                0.976                0.991                0.984                hybrid                              
1                    0.083                0.036                0.302                0.283                0.249                0.140                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.934                0.689                0.965                0.992                0.995                0.991                multi                               
3                    0.934                0.689                0.965                0.992                0.995                0.991                hybrid                              
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.922                0.601                0.967                0.993                0.996                0.992                multi                               
3                    0.922                0.601                0.967                0.993                0.996                0.992                hybrid                              
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.747                nan                  0.091                0.964                0.966                0.953                multi                               
================================================================================= Individual Accuracy (Order by accuracy) ==================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.914                0.602                0.981                0.996                0.997                0.995                multi                               
3                    0.914                0.602                0.981                0.996                0.997                0.995                hybrid                              
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.910                0.602                0.966                0.995                0.996                0.993                multi                               
3                    0.910                0.602                0.966                0.995                0.996                0.993                hybrid                              
1                    0.821                0.602                0.896                0.902                0.900                0.805                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.849                0.356                0.917                0.987                0.995                0.991                multi                               
3                    0.849                0.356                0.917                0.987                0.995                0.991                hybrid                              
1                    0.337                0.148                0.472                0.454                0.385                0.226                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.908                0.603                0.960                0.991                0.995                0.990                multi                               
3                    0.908                0.603                0.960                0.991                0.995                0.990                hybrid                              
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.878                0.457                0.956                0.991                0.995                0.990                multi                               
3                    0.878                0.457                0.956                0.991                0.995                0.990                hybrid                              
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.714                nan                  0.086                0.934                0.930                0.886                multi                               

------------------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------------------
Name (time in s)                              Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_accu[single-None-None-None-None]     15.6064 (1.0)      16.0104 (1.0)      15.7213 (1.0)      0.1640 (1.26)     15.6643 (1.0)      0.1281 (1.0)           1;1  0.0636 (1.0)           5           1
test_accu[multi-None-None-None-None]      17.5130 (1.12)     17.8208 (1.11)     17.6767 (1.12)     0.1306 (1.0)      17.7322 (1.13)     0.2148 (1.68)          2;0  0.0566 (0.89)          5           1
test_accu[hybrid-None-None-None-None]     17.8008 (1.14)     18.7914 (1.17)     18.0643 (1.15)     0.4107 (3.15)     17.9269 (1.14)     0.3190 (2.49)          1;1  0.0554 (0.87)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
====================================================================================== 3 passed in 380.57s (0:06:20) =======================================================================================
  • For e = 0.000001:
============================================================================================= Marker Accuracy ==============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.679                0.517                0.866                0.872                0.869                0.715                single                              
2                    0.792                0.515                0.940                0.976                0.994                0.989                multi                               
3                    0.792                0.515                0.940                0.976                0.994                0.989                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.616                0.517                0.784                0.783                0.792                0.523                single                              
2                    0.788                0.515                0.915                0.973                0.992                0.985                multi                               
3                    0.788                0.515                0.915                0.973                0.992                0.985                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.099                0.058                0.245                0.206                0.208                0.125                single                              
2                    0.125                0.035                0.474                0.960                0.992                0.985                multi                               
3                    0.125                0.035                0.474                0.960                0.992                0.985                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
2                    0.928                0.684                0.951                0.980                0.995                0.991                multi                               
3                    0.928                0.684                0.951                0.980                0.995                0.991                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
2                    0.916                0.593                0.954                0.982                0.996                0.992                multi                               
3                    0.916                0.593                0.954                0.982                0.996                0.992                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.745                nan                  0.099                0.941                0.966                0.953                multi                               
=========================================================================================== Individual Accuracy ============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
2                    0.911                0.600                0.974                0.989                0.997                0.995                multi                               
3                    0.911                0.600                0.974                0.989                0.997                0.995                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.824                0.602                0.896                0.901                0.904                0.816                single                              
2                    0.908                0.600                0.960                0.988                0.996                0.993                multi                               
3                    0.908                0.600                0.960                0.988                0.996                0.993                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.307                0.143                0.427                0.396                0.343                0.224                single                              
2                    0.845                0.354                0.909                0.978                0.995                0.991                multi                               
3                    0.845                0.354                0.909                0.978                0.995                0.991                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
2                    0.902                0.599                0.946                0.978                0.995                0.990                multi                               
3                    0.902                0.599                0.946                0.978                0.995                0.990                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
2                    0.870                0.448                0.940                0.978                0.994                0.990                multi                               
3                    0.870                0.448                0.940                0.978                0.994                0.990                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.705                nan                  0.082                0.905                0.930                0.887                multi                               
=================================================================================== Marker Accuracy (Order by accuracy) ====================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.792                0.515                0.940                0.976                0.994                0.989                multi                               
3                    0.792                0.515                0.940                0.976                0.994                0.989                hybrid                              
1                    0.679                0.517                0.866                0.872                0.869                0.715                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.788                0.515                0.915                0.973                0.992                0.985                multi                               
3                    0.788                0.515                0.915                0.973                0.992                0.985                hybrid                              
1                    0.616                0.517                0.784                0.783                0.792                0.523                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.125                0.035                0.474                0.960                0.992                0.985                multi                               
3                    0.125                0.035                0.474                0.960                0.992                0.985                hybrid                              
1                    0.099                0.058                0.245                0.206                0.208                0.125                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.928                0.684                0.951                0.980                0.995                0.991                multi                               
3                    0.928                0.684                0.951                0.980                0.995                0.991                hybrid                              
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.916                0.593                0.954                0.982                0.996                0.992                multi                               
3                    0.916                0.593                0.954                0.982                0.996                0.992                hybrid                              
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.745                nan                  0.099                0.941                0.966                0.953                multi                               
================================================================================= Individual Accuracy (Order by accuracy) ==================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.911                0.600                0.974                0.989                0.997                0.995                multi                               
3                    0.911                0.600                0.974                0.989                0.997                0.995                hybrid                              
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.908                0.600                0.960                0.988                0.996                0.993                multi                               
3                    0.908                0.600                0.960                0.988                0.996                0.993                hybrid                              
1                    0.824                0.602                0.896                0.901                0.904                0.816                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.845                0.354                0.909                0.978                0.995                0.991                multi                               
3                    0.845                0.354                0.909                0.978                0.995                0.991                hybrid                              
1                    0.307                0.143                0.427                0.396                0.343                0.224                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.902                0.599                0.946                0.978                0.995                0.990                multi                               
3                    0.902                0.599                0.946                0.978                0.995                0.990                hybrid                              
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.870                0.448                0.940                0.978                0.994                0.990                multi                               
3                    0.870                0.448                0.940                0.978                0.994                0.990                hybrid                              
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.705                nan                  0.082                0.905                0.930                0.887                multi                               

------------------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------------------
Name (time in s)                              Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_accu[single-None-None-None-None]     15.8125 (1.0)      16.4474 (1.0)      16.0725 (1.0)      0.2585 (2.90)     16.1196 (1.0)      0.3805 (3.16)          2;0  0.0622 (1.0)           5           1
test_accu[multi-None-None-None-None]      17.5431 (1.11)     17.7639 (1.08)     17.6237 (1.10)     0.0891 (1.0)      17.5833 (1.09)     0.1202 (1.0)           1;0  0.0567 (0.91)          5           1
test_accu[hybrid-None-None-None-None]     17.6173 (1.11)     17.8927 (1.09)     17.7307 (1.10)     0.1092 (1.23)     17.7096 (1.10)     0.1624 (1.35)          2;0  0.0564 (0.91)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
====================================================================================== 3 passed in 380.21s (0:06:20) =======================================================================================

@XingerTang
Contributor Author

@AprilYUZhang @gregorgorjanc
The best marker population accuracy of the segregation probabilities generated by the simple multi-locus peeling is achieved at a different error value, e = 0.00105, where the accuracy is 0.751. With the original error value, e = 0.000001, the accuracy is 0.745.

The detailed seg_prob accuracy for the two error values:

  • For e = 0.00105:
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.751                nan                  0.096                0.964                0.966                0.953                multi       
  • For e = 0.000001:
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.745                nan                  0.099                0.941                0.966                0.953                multi       
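To make the comparison between the two seg_prob rows above easier to read, here is a minimal sketch that computes the per-column difference (e = 0.00105 minus e = 0.000001). The helper `parse_row` and the column labels in `HEADER` are hypothetical names used only for illustration and are not part of AlphaPeel or its test suite; the two input rows are copied from the tables above.

```python
import math

# Column labels for the six accuracy values in each row (illustrative names).
HEADER = ["Population", "Gen1", "Gen2", "Gen3", "Gen4", "Gen5"]


def parse_row(row):
    """Parse one accuracy-table row into a {column: accuracy} dict.

    The first field is the test number and the last field is the test name;
    everything in between is an accuracy value (possibly 'nan').
    """
    fields = row.split()
    values = [float(x) for x in fields[1:-1]]
    return dict(zip(HEADER, values))


# seg_prob rows copied from the two reports above.
row_new = "2  0.751  nan  0.096  0.964  0.966  0.953  multi"  # e = 0.00105
row_old = "2  0.745  nan  0.099  0.941  0.966  0.953  multi"  # e = 0.000001

new = parse_row(row_new)
old = parse_row(row_old)

# Difference per column, skipping columns where either value is nan (Gen1).
diff = {
    col: round(new[col] - old[col], 3)
    for col in HEADER
    if not (math.isnan(new[col]) or math.isnan(old[col]))
}

for col, d in diff.items():
    print(f"{col}: {d:+.3f}")
```

On these two rows the gain is concentrated in Gen3 (0.941 to 0.964), Gen2 drops slightly (0.099 to 0.096), and Gen4 and Gen5 are unchanged.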

The full reports:

  • For e = 0.00105:
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.10.10, pytest-7.4.0, pluggy-1.0.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /Users/evie/Lab/Alpha/AlphaPeel
plugins: anyio-4.4.0, benchmark-4.0.0
collected 3 items                                                                                                                                                                                          

tests/accuracy_tests/run_accu_test.py ...                                                                                                                                                            [100%]

============================================================================================= Marker Accuracy ==============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.679                0.517                0.866                0.872                0.869                0.714                single                              
2                    0.798                0.517                0.959                0.991                0.994                0.988                multi                               
3                    0.798                0.517                0.959                0.991                0.994                0.988                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.656                0.517                0.800                0.809                0.805                0.609                single                              
2                    0.794                0.517                0.929                0.989                0.991                0.984                multi                               
3                    0.794                0.517                0.929                0.989                0.991                0.984                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.084                0.035                0.296                0.275                0.255                0.137                single                              
2                    0.126                0.035                0.486                0.976                0.991                0.984                multi                               
3                    0.126                0.035                0.486                0.976                0.991                0.984                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
2                    0.934                0.689                0.965                0.992                0.995                0.991                multi                               
3                    0.934                0.689                0.965                0.992                0.995                0.991                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
2                    0.922                0.601                0.967                0.993                0.996                0.992                multi                               
3                    0.922                0.601                0.967                0.993                0.996                0.992                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.751                nan                  0.096                0.964                0.966                0.953                multi                               
=========================================================================================== Individual Accuracy ============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
2                    0.914                0.602                0.981                0.996                0.997                0.995                multi                               
3                    0.914                0.602                0.981                0.996                0.997                0.995                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.821                0.602                0.896                0.902                0.900                0.805                single                              
2                    0.910                0.602                0.965                0.995                0.996                0.993                multi                               
3                    0.910                0.602                0.965                0.995                0.996                0.993                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.340                0.148                0.473                0.460                0.397                0.224                single                              
2                    0.849                0.355                0.917                0.987                0.995                0.991                multi                               
3                    0.849                0.355                0.917                0.987                0.995                0.991                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
2                    0.908                0.603                0.960                0.991                0.995                0.990                multi                               
3                    0.908                0.603                0.960                0.991                0.995                0.990                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
2                    0.878                0.457                0.956                0.991                0.995                0.990                multi                               
3                    0.878                0.457                0.956                0.991                0.995                0.990                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.713                nan                  0.081                0.934                0.930                0.886                multi                               
=================================================================================== Marker Accuracy (Order by accuracy) ====================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.798                0.517                0.959                0.991                0.994                0.988                multi                               
3                    0.798                0.517                0.959                0.991                0.994                0.988                hybrid                              
1                    0.679                0.517                0.866                0.872                0.869                0.714                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.794                0.517                0.929                0.989                0.991                0.984                multi                               
3                    0.794                0.517                0.929                0.989                0.991                0.984                hybrid                              
1                    0.656                0.517                0.800                0.809                0.805                0.609                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.126                0.035                0.486                0.976                0.991                0.984                multi                               
3                    0.126                0.035                0.486                0.976                0.991                0.984                hybrid                              
1                    0.084                0.035                0.296                0.275                0.255                0.137                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.934                0.689                0.965                0.992                0.995                0.991                multi                               
3                    0.934                0.689                0.965                0.992                0.995                0.991                hybrid                              
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.922                0.601                0.967                0.993                0.996                0.992                multi                               
3                    0.922                0.601                0.967                0.993                0.996                0.992                hybrid                              
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.751                nan                  0.096                0.964                0.966                0.953                multi                               
================================================================================= Individual Accuracy (Order by accuracy) ==================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.914                0.602                0.981                0.996                0.997                0.995                multi                               
3                    0.914                0.602                0.981                0.996                0.997                0.995                hybrid                              
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.910                0.602                0.965                0.995                0.996                0.993                multi                               
3                    0.910                0.602                0.965                0.995                0.996                0.993                hybrid                              
1                    0.821                0.602                0.896                0.902                0.900                0.805                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.849                0.355                0.917                0.987                0.995                0.991                multi                               
3                    0.849                0.355                0.917                0.987                0.995                0.991                hybrid                              
1                    0.340                0.148                0.473                0.460                0.397                0.224                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.908                0.603                0.960                0.991                0.995                0.990                multi                               
3                    0.908                0.603                0.960                0.991                0.995                0.990                hybrid                              
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.878                0.457                0.956                0.991                0.995                0.990                multi                               
3                    0.878                0.457                0.956                0.991                0.995                0.990                hybrid                              
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.713                nan                  0.081                0.934                0.930                0.886                multi                               

------------------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------------------
Name (time in s)                              Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_accu[single-None-None-None-None]     15.8193 (1.0)      16.1785 (1.0)      16.0066 (1.0)      0.1394 (3.48)     16.0426 (1.0)      0.2038 (2.75)          2;0  0.0625 (1.0)           5           1
test_accu[multi-None-None-None-None]      17.6604 (1.12)     17.7483 (1.10)     17.6981 (1.11)     0.0401 (1.0)      17.6894 (1.10)     0.0740 (1.0)           1;0  0.0565 (0.90)          5           1
test_accu[hybrid-None-None-None-None]     17.8129 (1.13)     18.1118 (1.12)     18.0061 (1.12)     0.1175 (2.93)     18.0324 (1.12)     0.1431 (1.93)          1;0  0.0555 (0.89)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
====================================================================================== 3 passed in 381.05s (0:06:21) =======================================================================================
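As a quick sanity check on the benchmark table above, the sketch below recomputes the OPS column and the relative slowdown shown in parentheses from the reported mean times. It follows the legend's definition (OPS = 1 / Mean). The dictionary and the helper name `ops` are only illustrative and are not part of the test suite.

```python
# Mean run times in seconds, copied from the benchmark table above.
mean_seconds = {
    "single": 16.0066,
    "multi": 17.6981,
    "hybrid": 18.0061,
}


def ops(mean):
    """Operations per second, as defined in the legend: 1 / Mean."""
    return 1.0 / mean


# The ratios in parentheses are relative to the fastest test (smallest mean).
fastest = min(mean_seconds.values())

summary = {
    name: (round(ops(mean), 4), round(mean / fastest, 2))
    for name, mean in mean_seconds.items()
}

for name, (ops_value, slowdown) in summary.items():
    print(f"{name:<8} OPS={ops_value:.4f}  mean relative to fastest={slowdown:.2f}")
```

By this calculation the multi and hybrid runs are roughly 11% and 12% slower than single on mean time, which matches the ratios printed in the Mean column.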

  • For e=0.000001:
============================================================================================= Marker Accuracy ==============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.679                0.517                0.866                0.872                0.869                0.715                single                              
2                    0.792                0.515                0.940                0.976                0.994                0.989                multi                               
3                    0.792                0.515                0.940                0.976                0.994                0.989                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.616                0.517                0.784                0.783                0.792                0.523                single                              
2                    0.788                0.515                0.915                0.973                0.992                0.985                multi                               
3                    0.788                0.515                0.915                0.973                0.992                0.985                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.099                0.058                0.245                0.206                0.208                0.125                single                              
2                    0.125                0.035                0.474                0.960                0.992                0.985                multi                               
3                    0.125                0.035                0.474                0.960                0.992                0.985                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
2                    0.928                0.684                0.951                0.980                0.995                0.991                multi                               
3                    0.928                0.684                0.951                0.980                0.995                0.991                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
2                    0.916                0.593                0.954                0.982                0.996                0.992                multi                               
3                    0.916                0.593                0.954                0.982                0.996                0.992                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.745                nan                  0.099                0.941                0.966                0.953                multi                               
=========================================================================================== Individual Accuracy ============================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
2                    0.911                0.600                0.974                0.989                0.997                0.995                multi                               
3                    0.911                0.600                0.974                0.989                0.997                0.995                hybrid                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.824                0.602                0.896                0.901                0.904                0.816                single                              
2                    0.908                0.600                0.960                0.988                0.996                0.993                multi                               
3                    0.908                0.600                0.960                0.988                0.996                0.993                hybrid                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.307                0.143                0.427                0.396                0.343                0.224                single                              
2                    0.845                0.354                0.909                0.978                0.995                0.991                multi                               
3                    0.845                0.354                0.909                0.978                0.995                0.991                hybrid                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
2                    0.902                0.599                0.946                0.978                0.995                0.990                multi                               
3                    0.902                0.599                0.946                0.978                0.995                0.990                hybrid                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
2                    0.870                0.448                0.940                0.978                0.994                0.990                multi                               
3                    0.870                0.448                0.940                0.978                0.994                0.990                hybrid                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.705                nan                  0.082                0.905                0.930                0.887                multi                               
=================================================================================== Marker Accuracy (Order by accuracy) ====================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.792                0.515                0.940                0.976                0.994                0.989                multi                               
3                    0.792                0.515                0.940                0.976                0.994                0.989                hybrid                              
1                    0.679                0.517                0.866                0.872                0.869                0.715                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.788                0.515                0.915                0.973                0.992                0.985                multi                               
3                    0.788                0.515                0.915                0.973                0.992                0.985                hybrid                              
1                    0.616                0.517                0.784                0.783                0.792                0.523                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.125                0.035                0.474                0.960                0.992                0.985                multi                               
3                    0.125                0.035                0.474                0.960                0.992                0.985                hybrid                              
1                    0.099                0.058                0.245                0.206                0.208                0.125                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.928                0.684                0.951                0.980                0.995                0.991                multi                               
3                    0.928                0.684                0.951                0.980                0.995                0.991                hybrid                              
1                    0.844                0.689                0.901                0.906                0.903                0.794                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.916                0.593                0.954                0.982                0.996                0.992                multi                               
3                    0.916                0.593                0.954                0.982                0.996                0.992                hybrid                              
1                    0.835                0.644                0.896                0.900                0.897                0.806                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.745                nan                  0.099                0.941                0.966                0.953                multi                               
================================================================================= Individual Accuracy (Order by accuracy) ==================================================================================
-------------------------------------------------------------------------------------------------- dosage --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.911                0.600                0.974                0.989                0.997                0.995                multi                               
3                    0.911                0.600                0.974                0.989                0.997                0.995                hybrid                              
1                    0.861                0.602                0.938                0.942                0.940                0.884                single                              
----------------------------------------------------------------------------------------- geno_0.3333333333333333 ------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.908                0.600                0.960                0.988                0.996                0.993                multi                               
3                    0.908                0.600                0.960                0.988                0.996                0.993                hybrid                              
1                    0.824                0.602                0.896                0.901                0.904                0.816                single                              
------------------------------------------------------------------------------------------------- hap_0.5 --------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.845                0.354                0.909                0.978                0.995                0.991                multi                               
3                    0.845                0.354                0.909                0.978                0.995                0.991                hybrid                              
1                    0.307                0.143                0.427                0.396                0.343                0.224                single                              
------------------------------------------------------------------------------------------------ geno_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.902                0.599                0.946                0.978                0.995                0.990                multi                               
3                    0.902                0.599                0.946                0.978                0.995                0.990                hybrid                              
1                    0.810                0.603                0.884                0.891                0.889                0.784                single                              
--------------------------------------------------------------------------------------------- phased_geno_prob ---------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.870                0.448                0.940                0.978                0.994                0.990                multi                               
3                    0.870                0.448                0.940                0.978                0.994                0.990                hybrid                              
1                    0.771                0.494                0.861                0.868                0.865                0.767                single                              
------------------------------------------------------------------------------------------------- seg_prob -------------------------------------------------------------------------------------------------
Test Num             Population Accu      Gen1 Accu            Gen2 Accu            Gen3 Accu            Gen4 Accu            Gen5 Accu            Test Name                           
2                    0.705                nan                  0.082                0.905                0.930                0.887                multi                               

------------------------------------------------------------------------------------------ benchmark: 3 tests ------------------------------------------------------------------------------------------
Name (time in s)                              Min                Max               Mean            StdDev             Median               IQR            Outliers     OPS            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_accu[single-None-None-None-None]     15.8125 (1.0)      16.4474 (1.0)      16.0725 (1.0)      0.2585 (2.90)     16.1196 (1.0)      0.3805 (3.16)          2;0  0.0622 (1.0)           5           1
test_accu[multi-None-None-None-None]      17.5431 (1.11)     17.7639 (1.08)     17.6237 (1.10)     0.0891 (1.0)      17.5833 (1.09)     0.1202 (1.0)           1;0  0.0567 (0.91)          5           1
test_accu[hybrid-None-None-None-None]     17.6173 (1.11)     17.8927 (1.09)     17.7307 (1.10)     0.1092 (1.23)     17.7096 (1.10)     0.1624 (1.35)          2;0  0.0564 (0.91)          5           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Legend:
  Outliers: 1 Standard Deviation from Mean; 1.5 IQR (InterQuartile Range) from 1st Quartile and 3rd Quartile.
  OPS: Operations Per Second, computed as 1 / Mean
====================================================================================== 3 passed in 380.21s (0:06:20) =======================================================================================

@XingerTang
Contributor Author

Have Done @XingerTang :

  • Make a list of errors, which contains the following information:

    • specify the usage of the errors
    • create new names
    • suggest new default values or other modifications

| Line | Actual Function | New Name | New Default | Other Comments |
| --- | --- | --- | --- | --- |
| 19 | The first define in peel function | epsilon | 1e-10 | Avoid division by zero in calculations |
| 75 | Unclear error for Jointparents: $$anterior_m(g_m) penetrance_m(g_m) posterior_{m,-f}(g_m) \ anterior_f(g_f) penetrance_f(g_f) posterior_{f,-m}(g_f)$$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 80 | Unclear error for proSire: $anterior_m(g_m) penetrance_m(g_m) posterior_{m,-f}(g_m)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 81 | Unclear error for proDam: $anterior_f(g_f) penetrance_f(g_f) posterior_{f,-m}(g_f)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 92 | Unclear error for childValues: $posterior_c(g_c), penetrance_c(g_c)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 189 | Unclear error for sirePosterior: $posterior_m(g_m)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 194 | Unclear error for damPosterior: $posterior_m(g_m)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 205 | Unclear error for childValues during peel down: $posterior_c(g_c), penetrance_c(g_c)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 225 | Unclear error for segregation during peel down: $p(seg_{i,j} \ s)$ | epsilon | 1e-10 | Avoid division by zero in calculations |
| 428-431, 444-447 | Error for segregation transmission / recombination rate: $\gamma$ | r | unsure | |
The "e" defined in line 19 might avoid division by zero in calculations.

But the question is:

If we just want to avoid zero, why do we need to multiply by 1 - e?
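
One possible reading of the `1 - e` factor, sketched here as an assumption and not checked against Peeling.py: if the error is mixed in as a uniform distribution, `(1 - e) * p + e / n`, the result stays normalized, whereas simply adding `e` to every entry does not. The function names `mix_with_uniform` and `add_constant` are hypothetical and used only for illustration.

```python
def mix_with_uniform(probs, e=1e-10):
    """Blend a normalized distribution with a uniform one, using weight e.

    Assumed form: (1 - e) * p + e / n. Every entry becomes strictly
    positive and the total stays equal to 1.
    """
    n = len(probs)
    return [(1.0 - e) * p + e / n for p in probs]


def add_constant(probs, e=1e-10):
    """Add e to every entry; avoids zeros but breaks normalization."""
    return [p + e for p in probs]


# A degenerate distribution with zeros, and an exaggerated e for visibility.
probs = [1.0, 0.0, 0.0, 0.0]
mixed = mix_with_uniform(probs, e=0.01)
shifted = add_constant(probs, e=0.01)

print(sum(mixed))    # stays (numerically) at 1
print(sum(shifted))  # exceeds 1 by n * e
```

Under this reading, the `1 - e` factor is what keeps the values interpretable as probabilities after the zeros have been removed.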

Thank you @AprilYUZhang for the summary! There are still a few errors in PeelingInfo.py that also need to be included. Could you add them to your table as well?

@XingerTang
Contributor Author

XingerTang commented Aug 15, 2024

@AprilYUZhang @gregorgorjanc
With a better choice of the value of e, the accuracy of the predicted recombination events also increases. This holds for all the methods of predicting recombination events and for all the metrics we have used. The improvements are most significant in the middle generations.

The old e:

old_seg_data

The new e:

new_seg_data

@XingerTang
Contributor Author

XingerTang commented Aug 16, 2024

@AprilYUZhang @gregorgorjanc

I tried to organize the errors of the peeling, building on Uni's work above.

The sources of the errors during the peeling could be:

| Source | Current location in AlphaPeel | Expected location | Current implemented error distribution | Expected error distribution |
| --- | --- | --- | --- | --- |
| uncertainty from missing data (U) | not clearly specified, probably included in the series of e in Peeling.py | during peeling | not clearly specified, probably evenly across genotypes | evenly distributed error across genotypes, higher error rate for individuals with more missing data |
| all trio members are heterozygous (H) | probably included in the e of the called ProbMath.generateSegregation functions in PeelingInfo.py, which are called before peeling | the same location as the current location + during peeling? | evenly across all segregation states of all loci of all individuals | evenly across segregation states but only for the loci that satisfy the condition |
| genotype error/sequence error (G) | called ProbMath.getGenotypeProbabilities function in PeelingInfo.py, which is called before peeling | the same location as the current location | see documentation | less error distributed for genotypes with 2 different haplotypes (e.g. aa and AA) |
| mutation (M) | not clearly specified, probably included in the series of e in Peeling.py | during peeling | not clearly specified, probably evenly across genotypes | less error distributed for genotypes with 2 different haplotypes (e.g. aa and AA) |
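
To make the difference between an even error distribution and one that puts less error on genotypes differing in both haplotypes (e.g. aa and AA) concrete, here is a minimal sketch. It assumes unphased genotypes coded 0 (aa), 1 (aA), 2 (AA) and a hypothetical `ratio` parameter that shrinks the error weight for each additional differing allele. None of these names come from AlphaPeel.

```python
def even_error(true_geno, e, n_geno=3):
    """Spread the error e evenly over all genotypes other than the true one."""
    return [1.0 - e if g == true_geno else e / (n_geno - 1)
            for g in range(n_geno)]


def distance_weighted_error(true_geno, e, ratio=0.1, n_geno=3):
    """Spread the error e so that genotypes further from the true one get less.

    A wrong genotype that differs by d alleles gets weight ratio ** (d - 1);
    the weights are rescaled so that the total error mass is still e.
    """
    weights = [0.0 if g == true_geno else ratio ** (abs(g - true_geno) - 1)
               for g in range(n_geno)]
    total = sum(weights)
    return [1.0 - e if g == true_geno else e * w / total
            for g, w in enumerate(weights)]


# True genotype aa (coded 0), with an exaggerated error rate for visibility.
even = even_error(0, e=0.01)
weighted = distance_weighted_error(0, e=0.01)
print(even)      # aA and AA receive the same error mass
print(weighted)  # AA receives less error mass than aA
```

Both functions keep the distribution normalized; only the split of the error mass between the wrong genotypes changes.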

For AlphaPeel, the current uses of the errors can be summarized as follows (excluding the recombination rate, as it is trivial):

| Errors in tinypeel | Possible sources | Current distribution | Possible improvements |
| --- | --- | --- | --- |
| errors applied during the first Baum-Welch, on: JointParents, probSire and probDam, childValues, and sirePosterior and damPosterior | U, M | evenly distributed across all geno probs | using separate variables for errors from U and M, and a better error distribution for each error source |
| errors for the generation of segregation in PeelingInfo.py | H and the uncertainties and errors from genotype probabilities which depend on U, M and G | evenly across all segregation states of all loci of all individuals | distinguish the error sources and improve the error distributions, maybe multiple error introductions during peeling |
| genotype and sequence errors, applied when the function readInPedigreeFromInputs is called in tinypeel.py and also on the loci generated by getHetMidpoint in PeelingInfo.py | G | for the genotype prob see documentation, for the HetMidpoint, the error is evenly distributed | refine the error distribution |

Note that the influence of the errors should decrease over the iterations as information propagates through the pedigree. Maybe we can also introduce a mechanism that reduces the size of the errors during the peeling process.
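
One way such a mechanism could look, purely as a sketch: a geometric decay of the error across peeling cycles with a lower bound, so the error never reaches zero. The function `error_schedule` and its parameters `e_start`, `e_min`, and `decay` are hypothetical and not part of AlphaPeel.

```python
def error_schedule(e_start, e_min, decay, n_cycles):
    """Return one error value per peeling cycle.

    The error starts at e_start, is multiplied by `decay` after each cycle,
    and is never allowed to fall below e_min.
    """
    return [max(e_min, e_start * decay ** i) for i in range(n_cycles)]


# Five cycles, halving the error each time, floored at 1e-6.
schedule = error_schedule(e_start=0.01, e_min=1e-6, decay=0.5, n_cycles=5)
for cycle, e in enumerate(schedule):
    print(cycle, e)
```

The floor `e_min` keeps the original purpose of the error (avoiding exact zeros) intact even in late cycles.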

@XingerTang
Contributor Author

The mutation rate will be an argument for AlphaPeel, with a default value set.
