-
Hello, good morning. I hope you are well.
In my doctoral work I will use the R package AlphaSimR to run a simulation based on real genotype data from Holstein and Avileña-Negra Iberica cattle. For this reason, Dr. Gregor shared this link with me, which shows how to generate the base (founder) population from such genotype data in AlphaSimR (https://github.com/gaynorr/AlphaSimR_Examples/blob/master/misc/ASR_ImportExternalData.R).
Before using the cattle genotype data I wanted to run a test with rice data. This is because not all cattle in the pedigree are genotyped, and I have been asking how to indicate this in the AlphaSimR package.
Step 1 of the code mentions that a genetic map is needed, that is, a file giving the name of each marker, its chromosome and its position. I got this from the PLINK .map file using this code:
And this is the result:
Next, the example mentions that a haplotype file is needed. I built this from the .bim, .bed and .fam files using several R packages (genio and BEDMatrix) and the code below (I found much of this code on help sites such as Stack Overflow):
And this is the result (451 individuals, that is, 902 rows of haplotype alleles, and 1000 loci):
I wanted to ask whether you think the process I followed to obtain the "mapGen" and "haplo" objects is correct, and/or whether AlphaSimR has a function that does this more easily (for example, something like writePlink (https://rdrr.io/cran/AlphaSimR/man/writePlink.html), which goes from an AlphaSimR pop object to PLINK PED and MAP files).
Sorry for the long message. Thank you very much and have a nice day.
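
For readers following along without the original code and output, the object shapes described in the question can be sketched with toy data. This is not the poster's code: the marker names, positions and alleles are invented, and the three-column map layout (marker name, chromosome, position) is an assumption based on the description above.

```r
# Toy illustration only: all values are made up, not the poster's rice data.
set.seed(1)
n_ind  <- 3   # individuals (the question has 451)
n_loci <- 4   # markers (the question has 1000)

# Genetic map with one row per marker: name, chromosome, position.
mapGen <- data.frame(
  markerName = paste0("snp", seq_len(n_loci)),
  chromosome = c("1", "1", "2", "2"),
  position   = c(0.10, 0.35, 0.05, 0.60),   # invented positions
  stringsAsFactors = FALSE
)

# Haplotype matrix: two consecutive rows per diploid individual,
# one column per marker, alleles coded 0/1.
haplo <- matrix(
  sample(0:1, 2 * n_ind * n_loci, replace = TRUE),
  nrow = 2 * n_ind, ncol = n_loci,
  dimnames = list(
    paste0(rep(paste0("ind", seq_len(n_ind)), each = 2), c("_1", "_2")),
    mapGen$markerName
  )
)

dim(haplo)  # 2 * n_ind rows, n_loci columns
```

With 451 individuals and 1000 loci, the same layout gives the 902 x 1000 matrix mentioned in the question. The column names of the haplotype matrix match the marker names in the map, which makes it easy to check that both objects describe the same loci in the same order.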
-
I didn't go through your code at length, so I'll just make a couple of general comments. First, I'm not a user of Plink and my knowledge of that software is pretty limited. I've gone through some of their documentation of file standards and this is the basis of my comments.
For the genetic map, you'll want the map positions in Morgans for AlphaSimR. Genetic map positions in Plink are optional, so you may not have them. When they are present, they'll be listed in cM, so you'll need to divide by 100 when importing into AlphaSimR.
The one weakness of the Plink formats from the standpoint of simulations in AlphaSimR is that they don't contain haplotypic phase information. That is, they don't distinguish between 0|1 and 1|0. If you are working with inbred plant lines this is no big deal, because you expect heterozygotes to be rare, and you can generally ignore them. However, for cattle this is going to be a limitation because you'd expect quite a few heterozygotes. I'd recommend running your genotype data through phasing software before loading it into AlphaSimR.
Since I work on plants, I don't have much experience with phasing software and can't give a very good recommendation. My former group at Roslin offers one such program, but there may be a better option out there for your particular data set. I'm not sure whether it is best to use the standalone AlphaPhase or AlphaImpute, which includes AlphaPhase. @gregorgorjanc might be able to give you a better recommendation on how to proceed with phasing.
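
The two points above, the unit conversion and the missing phase, can be sketched in a few lines of base R. The data frame, its column names and the values are invented for illustration; only the division by 100 and the 0|1 versus 1|0 distinction come from the comment above.

```r
# Hypothetical PLINK-style map; genetic positions assumed to be in cM,
# as stated in the comment above. All values are made up.
plink_map <- data.frame(
  chromosome = c("1", "1", "2"),
  markerName = c("snp1", "snp2", "snp3"),
  pos_cM     = c(12.5, 48.0, 3.2),
  stringsAsFactors = FALSE
)

# A dummy value of 0 is commonly used when genetic positions are unknown,
# so an all-zero column means there is no usable genetic map to convert.
if (all(plink_map$pos_cM == 0)) {
  stop("No genetic positions in this map; a map from another source is needed.")
}

# Convert centimorgans to Morgans.
plink_map$pos_M <- plink_map$pos_cM / 100

# Phase: two different haplotype pairs at one locus ...
ind_a <- c(hap1 = 0, hap2 = 1)   # 0|1
ind_b <- c(hap1 = 1, hap2 = 0)   # 1|0

# ... collapse to the same unphased genotype (allele dosage).
dosage_a <- sum(ind_a)
dosage_b <- sum(ind_b)
identical(dosage_a, dosage_b)    # the dosage alone cannot tell them apart
```

For a heterozygous genotype, any rule that splits the dosage back into two haplotype rows has to pick one of the two orderings arbitrarily, which is why phasing the data first matters for outbred populations such as cattle.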