Better GE rate simulation? #109

jamonterotena · 2023-01-31T12:22:36Z


Hello again,

I noticed that when simulating genotyping errors (GEs), the realised genotyping error rate can differ from the input rate. This is because the input GE rate is used as the probability that any given site suffers a GE:

markers = which(rbinom(length(masked_genotypes), 1, error_rate) == 1)

I guess this is a matter of how the genotyping error rate is defined rather than a bug, but I find it more intuitive that a GE rate of 6% means exactly 6 genotyping errors in 100 sites, and not maybe 4, 7, or 6.

I suggest something like the following

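total_GEs = round(error_rate * length(masked_genotypes)) ## Round so the number of GEs is a whole number (e.g. 0.29 * 100 is slightly below 29 in floating point)
markers = sample(seq_along(masked_genotypes), total_GEs, replace = FALSE) ## Randomly pick a proportion of the sites equal to error_rate for GEs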
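
To make the difference concrete, here is a minimal sketch comparing the two approaches over repeated draws. The values of `masked_genotypes` and `error_rate` are made-up stand-ins (in the real script they come from the simulation), and `round()` is added only to guarantee a whole-number count.

```r
set.seed(1)  # arbitrary seed, for reproducibility only

# Stand-in values: 100 sites and a 6% error rate (illustrative, not from the script)
masked_genotypes <- rep(1L, 100)
error_rate <- 0.06

# Per-site approach: each site is independently an error with probability
# error_rate, so the number of errors varies from run to run
n_errors_binomial <- replicate(
  1000,
  length(which(rbinom(length(masked_genotypes), 1, error_rate) == 1))
)

# Fixed-count approach: fix the number of errors, then pick which sites get them
total_GEs <- round(error_rate * length(masked_genotypes))
n_errors_fixed <- replicate(
  1000,
  length(sample(seq_along(masked_genotypes), total_GEs, replace = FALSE))
)

range(n_errors_binomial)  # spread around 6
range(n_errors_fixed)     # always exactly 6
```

The fixed-count version controls the realised rate exactly, at the cost of removing the run-to-run variation that the per-site version has.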

gregorgorjanc · 2023-01-31T13:09:08Z


@jamonterotena what/which piece of code in AlphaSimR are you referring to?


jamonterotena · 2023-01-31T13:13:54Z

Author

@gregorgorjanc recombinations. Not sure of the name of the script, sorry.


gregorgorjanc · 2023-01-31T13:37:32Z


@jamonterotena are you referring to a function, if so, which one? If you are referring to a script, which script are you referring to?

If this is more of a general discussion, then https://github.com/gaynorr/AlphaSimR/discussions might be a better place. If I am getting your question right, then it really depends on what you want to do. Specifying a genotyping error rate of 6% and getting, say, 5 errors out of 100 genotypes is just how probability works. If you want exactly 6 errors, then you can use your solution, but note that in reality we always have some deviation/distribution of values ;)
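
As a rough illustration of how much deviation to expect, the number of errors among 100 independent genotypes at a 6% rate follows a binomial distribution. The sketch below uses only the example figures from this comment; the variable names are arbitrary.

```r
n <- 100   # number of genotypes (the example figure used above)
p <- 0.06  # genotyping error rate

expected_errors <- n * p            # mean of Binomial(n, p), i.e. 6
sd_errors <- sqrt(n * p * (1 - p))  # standard deviation, about 2.4

# Probability of observing exactly 6 errors, and of 4 to 8 errors
prob_exactly_6 <- dbinom(6, size = n, prob = p)
prob_4_to_8 <- sum(dbinom(4:8, size = n, prob = p))
```

So with 100 genotypes, counts a couple of errors either side of 6 are entirely ordinary, and the relative spread shrinks as the number of genotypes grows.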


gaynorr · 2023-01-31T20:03:48Z

Maintainer

@jamonterotena I changed this to a discussion topic, because I don't support genotyping error within AlphaSimR itself. However, a discussion about adding this to AlphaSimR makes sense. I've never added it to AlphaSimR for two reasons.

First, I'm not sure there is a single method for representing genotyping error that makes sense for multiple use cases. The method that I tend to use myself is to identify an individual and a locus with a genotyping error and then shift the genotype score by 1. In the diploid case, this means a homozygote gets reported as a heterozygote and a heterozygote randomly shifts to one of the homozygotes.
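
A minimal sketch of that shift-by-one idea for the diploid case, assuming genotypes are held in a plain integer matrix of dosages (0, 1, 2) with individuals in rows and loci in columns. The function `add_shift_errors` and its arguments are hypothetical and not part of AlphaSimR.

```r
# Hypothetical helper, not an AlphaSimR function.
# geno: integer matrix of diploid dosages (0, 1, 2)
# error_rate: probability that any individual-by-locus genotype is wrong
add_shift_errors <- function(geno, error_rate) {
  # Pick the individual-by-locus cells that receive an error
  is_error <- matrix(
    rbinom(length(geno), 1, error_rate) == 1,
    nrow = nrow(geno)
  )
  old <- geno[is_error]
  # Heterozygotes (1) move to 0 or 2 at random; homozygotes (0 or 2) move to 1
  shift <- sample(c(-1L, 1L), length(old), replace = TRUE)
  geno[is_error] <- ifelse(old == 1L, old + shift, 1L)
  # Return the error mask alongside the genotypes so the errors can be tracked
  list(geno = geno, is_error = is_error)
}

set.seed(42)  # arbitrary seed
geno <- matrix(sample(0:2, 50, replace = TRUE), nrow = 5)
res <- add_shift_errors(geno, error_rate = 0.2)
```

Returning the logical mask separately is one way to keep a record of where errors were placed when the genotypes live outside a population object.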

The second reason I haven't implemented anything is that there isn't an easy way to store these errors within an AlphaSimR population.

2 replies

jamonterotena Feb 2, 2023
Author

The method you applied in the script linked below assumes a constant genotyping error rate across SNPs and individuals. This ignores the effect of sample contamination or 'bad' SNPs.

Just out of curiosity @gaynorr @gregorgorjanc, did you find or develop any way to account for this when adding errors?

gregorgorjanc Feb 2, 2023

@jamonterotena you could sample a rate from a beta distribution for each SNP and for each individual, which will give you the variation you might be after. You could then multiply or add the two rates to get a rate specific to each SNP in each individual.
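
A minimal sketch of this idea, under stated assumptions: the Beta parameters, the matrix sizes, and the choice to add (rather than multiply) the two rates are all illustrative, not recommendations.

```r
set.seed(7)   # arbitrary seed
n_ind <- 20   # illustrative number of individuals
n_snp <- 50   # illustrative number of SNPs

# Assumed Beta parameters, chosen only so the mean rate is 2 / (2 + 98) = 0.02
snp_rate <- rbeta(n_snp, shape1 = 2, shape2 = 98)  # 'bad' SNPs get higher rates
ind_rate <- rbeta(n_ind, shape1 = 2, shape2 = 98)  # e.g. contaminated samples

# Combine by addition (capped at 1); multiplying would give much smaller rates
rate <- pmin(outer(ind_rate, snp_rate, "+"), 1)

# Draw an error indicator for each individual-by-SNP cell using its own rate
is_error <- matrix(rbinom(n_ind * n_snp, 1, rate) == 1, nrow = n_ind)
```

The resulting `is_error` matrix (individuals in rows, SNPs in columns) can then be used to decide which genotypes to alter.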

jamonterotena · 2023-02-02T11:16:05Z

Author

This is the script I'm referring to, line 101:
https://github.com/gaynorr/AlphaSimR_Examples/blob/master/recombination_simulations/simulate_chromosome.r

I'm using it to compare the precision of a crossover-calling program at different GE rates, so it works better for me if the input GE rate is actually the proportion of erroneous genotypes in the final data. I understand it is also sensible to treat the GE rate as the probability that any given site is wrong.

