Different number of variants between single cell and bulk #5
Hi, I ran into the same bug at line 375 of the functions.R script. Have you solved the problem yet?
@red-t Unfortunately, I did not solve it. My research group decided to move forward without using LiRA.
Can you confirm that the input single-cell and bulk samples were run on the same multisample VCF file?
@KEYS248 All right, thank you.
Sorry, I'm not sure what you mean by "the same multisample VCF file". As a first try, I ran GATK separately on the single-cell and bulk samples (one sample of each kind), then ran LiRA step by step for each sample. Here are the configuration files for single-cell and bulk:
I see that the VCF field differs between the two configs, so I think the problem is likely what I suggested (it will be, unless the VCFs in both configs describe the same set of sites). "The same multisample VCF file" refers to a VCF file with multiple sample columns that describes the evidence for each listed variant across multiple BAM files. It can be created using GATK; see, e.g., https://gatk.broadinstitute.org/hc/en-us/articles/360035889971. Following the instructions there for calling on multiple input GVCFs (e.g. "If you have GVCFs from multiple samples...") produces the expected LiRA input, provided the samples include at least the bulk and the single-cell sample. Given that this has caused some confusion, we will update the README to clarify.
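A quick way to check this before re-running LiRA is to compare the VCF files named in the two configs: do they list the same sites, and which sample columns does each contain? The sketch below is a minimal, hypothetical Python illustration (none of these functions are part of LiRA). It assumes plain-text or gzip/bgzip-compressed VCFs and treats (CHROM, POS, REF, ALT) as the site key.

```python
import gzip
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical site key chosen for this sketch: (CHROM, POS, REF, ALT)
Site = Tuple[str, int, str, str]


def parse_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], Set[Site]]:
    """Return the sample column names and the set of sites in a VCF."""
    samples: List[str] = []
    sites: Set[Site] = set()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # columns after FORMAT are sample names
            continue
        sites.add((fields[0], int(fields[1]), fields[3], fields[4]))
    return samples, sites


def read_vcf(path: str) -> Tuple[List[str], Set[Site]]:
    """Read a plain or gzip/bgzip-compressed VCF from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return parse_vcf_lines(handle)


def compare_sites(sites_a: Set[Site], sites_b: Set[Site]) -> Dict[str, object]:
    """Report how many sites are shared and which appear in only one file."""
    return {
        "shared": len(sites_a & sites_b),
        "only_in_a": sorted(sites_a - sites_b),
        "only_in_b": sorted(sites_b - sites_a),
    }
```

Calling `read_vcf` on each config's VCF and passing the two site sets to `compare_sites` should give empty `only_in_a` and `only_in_b` lists when both files describe the same sites. A single joint-called multisample VCF would also report both the bulk and the single-cell sample names in its sample list.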
I think what @cbohrson is saying (correct me otherwise) is that the program may expect all samples, both single-cell and bulk, to come from a single multisample VCF rather than from two VCFs (one for single-cell, one for bulk). If that is the case, I don't believe I was doing this, so I wonder whether that caused my error.
OK, previously I had run GATK in single-sample mode. I'll run GATK again to create a multisample VCF file and then try LiRA again. Thanks a lot!
At line 375 of the functions.R script, I get the following error:

It seems that the `site.frame` variable gets populated by the `single.cell.vcf.info` variable, and the code then attempts to fill more columns of `site.frame` with the `bulk.vcf.info` variable. The error seems to occur because `bulk.vcf.info` has fewer rows than `single.cell.vcf.info`. Is this situation not supposed to occur? Did I make a mistake in a previous step?

If I kept only the rows of `bulk.vcf.info` that match `site.frame`, some rows of `site.frame` would still be empty, and I'm not sure how the rest of the program would handle that.