Better GE rate simulation? #109
-
@jamonterotena which piece of code in AlphaSimR are you referring to?
-
@gregorgorjanc recombinations. Not sure of the name of the script, sorry.
-
@jamonterotena are you referring to a function? If so, which one? If you are referring to a script, which script? If this is more of a general discussion, then https://github.com/gaynorr/AlphaSimR/discussions might be a better place. If I am getting your question right, then it really depends on what you want to do. Setting the genotyping error rate to 6% and getting, say, 5 errors out of 100 genotypes is just how probability works. If you want exactly 6 errors, then you can use your solution, but note that in reality we always have some deviations/distribution of values ;)
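To make the distinction concrete, here is a minimal R sketch comparing the two readings of a 6% rate. It assumes a hypothetical set of 100 genotype calls; the variable names and the rounding rule for the fixed count are illustrative assumptions, not code taken from the script under discussion or from AlphaSimR.

```r
set.seed(1)                       # arbitrary seed so the sketch is reproducible
n_sites    <- 100                 # hypothetical number of genotype calls
error_rate <- 0.06                # input genotyping error (GE) rate

# Per-site reading: each site independently has probability error_rate of
# being wrong, so the realised count follows Binomial(n_sites, error_rate).
bernoulli_sites <- which(rbinom(n_sites, 1, error_rate) == 1)
length(bernoulli_sites)           # varies between runs; equals 6 only on average

# Fixed-count reading (assumption: round to the nearest whole error):
# draw exactly round(n_sites * error_rate) distinct sites without replacement.
n_errors    <- round(n_sites * error_rate)
fixed_sites <- sample.int(n_sites, n_errors)
length(fixed_sites)               # always 6 for these inputs

# Spread of the per-site count over many replicates
counts <- rbinom(10000, n_sites, error_rate)
mean(counts)                      # close to 6
sd(counts)                        # roughly sqrt(100 * 0.06 * 0.94), about 2.4
```

The fixed-count version may suit benchmarking at a precisely known error proportion, while the per-site version is probably closer to how errors arise in real data.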
-
@jamonterotena I changed this to a discussion topic, because I don't support genotyping error within AlphaSimR itself. However, a discussion about adding this to AlphaSimR makes sense. I've never added it to AlphaSimR for two reasons. First, I'm not sure there is a single method for representing genotyping error that makes sense for multiple use cases. The method I tend to use myself is to identify an individual and a locus with a genotyping error and then shift the genotype by a score of 1. In the diploid case, this means a homozygote gets reported as a heterozygote and a heterozygote randomly shifts to one of the homozygotes. The second reason I haven't implemented anything is that there isn't an easy way to store these errors within an AlphaSimR population.
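The shift-by-1 approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `add_shift_errors` is a hypothetical helper, not an AlphaSimR function, and it expects a matrix of diploid dosages coded 0, 1, 2 (the kind of matrix `pullSnpGeno()` returns for a diploid population). Because the errors cannot be stored in the population, the sketch returns their positions alongside the observed genotypes.

```r
# Hypothetical helper (not part of AlphaSimR): apply "shift by 1" genotyping
# errors to a matrix of diploid dosages (rows = individuals, cols = loci).
add_shift_errors <- function(geno, error_rate) {
  stopifnot(all(geno %in% 0:2))
  # Choose the individual-by-locus cells that receive an error
  hit <- which(rbinom(length(geno), 1, error_rate) == 1)
  old <- geno[hit]
  new <- old
  new[old == 0] <- 1                  # homozygote reported as heterozygote
  new[old == 2] <- 1
  het <- old == 1
  # Heterozygote shifts at random to one of the two homozygotes
  new[het] <- sample(c(0, 2), sum(het), replace = TRUE)
  observed <- geno
  observed[hit] <- new
  # Keep the error positions outside the population object
  list(geno = observed, errors = hit)
}

# Usage with simulated dosages (assumed data, 50 individuals x 20 loci)
set.seed(42)
true_geno <- matrix(sample(0:2, 50 * 20, replace = TRUE), nrow = 50)
res <- add_shift_errors(true_geno, error_rate = 0.06)
mean(res$geno != true_geno)           # realised error rate, near 0.06
```

Every selected cell changes by exactly 1, so the realised error rate equals the number of selected cells divided by the matrix size.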
-
This is the script I'm referring to, line 101. I'm using it to compare the precision of a crossover-calling program at different GE rates, so it works better for me if the input GE rate is the actual proportion of erroneous genotypes in the final data. I understand, though, that it is also reasonable to treat the GE rate as the probability that any given site is wrong.
-
Hello again,
I noticed that, when simulating genotyping errors, the realised genotyping error (GE) rate can differ from the input rate. This is because the input GE rate is treated as the probability that any given site suffers a GE:
markers = which(rbinom(length(masked_genotypes), 1, error_rate) == 1)
I guess this is a matter of how the genotyping error rate is defined rather than a bug, but I find it more intuitive for a GE rate of 6% to mean exactly 6 genotyping errors in 100 sites, and not sometimes 4, sometimes 7, and sometimes 6.
I suggest something like the following