Different number of variants between single cell and bulk #5

Open
KEYS248 opened this issue May 25, 2020 · 8 comments

@KEYS248

KEYS248 commented May 25, 2020

At line 375 of the functions.R script, I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "id", value = c(".", ".", "rs371110082",  : 
  replacement has 218653 rows, data has 324935
Calls: compare -> $<- -> $<-.data.frame
Execution halted

It seems that the site.frame variable gets populated by the single.cell.vcf.info variable and then attempts to fill more columns of site.frame with the bulk.vcf.info variable. The error seems to occur because bulk.vcf.info has fewer rows than single.cell.vcf.info. Is this situation not supposed to occur? Did I make a mistake with some previous step?

If I kept only the rows of bulk.vcf.info that match site.frame, some rows of site.frame would still be empty, and I'm not sure how the rest of the program would handle that.
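
To illustrate the mismatch described above, here is a minimal Python sketch (not LiRA's code, which is in R) that counts which sites appear in only one of two VCFs. The function names and the toy records are hypothetical and exist only for illustration. When the two site lists differ in length, copying a column from one table into the other by position fails, which is what the R error reports.

```python
import gzip
import io


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF as text (helper for real files)."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def read_sites(handle):
    """Return a (chrom, pos, ref, alt) key for every record in an open VCF handle."""
    sites = []
    for line in handle:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # ignore malformed or empty lines
        chrom, pos, _id, ref, alt = fields[:5]
        sites.append((chrom, int(pos), ref, alt))
    return sites


def compare_sites(single_cell_sites, bulk_sites):
    """Count sites unique to each list and sites shared by both."""
    sc, bulk = set(single_cell_sites), set(bulk_sites)
    return {
        "single_cell_only": len(sc - bulk),
        "bulk_only": len(bulk - sc),
        "shared": len(sc & bulk),
    }


# Toy data, invented for illustration only.
HEADER = "##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
SC_TEXT = HEADER + (
    "21\t100\t.\tA\tG\t50\tPASS\t.\n"
    "21\t200\t.\tC\tT\t50\tPASS\t.\n"
    "22\t300\t.\tG\tA\t50\tPASS\t.\n"
)
BULK_TEXT = HEADER + (
    "21\t100\t.\tA\tG\t50\tPASS\t.\n"
    "22\t300\t.\tG\tA\t50\tPASS\t.\n"
)

summary = compare_sites(
    read_sites(io.StringIO(SC_TEXT)),
    read_sites(io.StringIO(BULK_TEXT)),
)
print(summary)  # {'single_cell_only': 1, 'bulk_only': 0, 'shared': 2}
```

For real files, `read_sites(open_vcf(path))` can replace the in-memory handles. Any nonzero "only" count means the two inputs do not describe the same set of sites.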

@red-t

red-t commented Sep 26, 2020

Hi, I hit the same bug at line 375 of the functions.R script. Have you solved it yet?

@KEYS248
Author

KEYS248 commented Sep 26, 2020

@red-t Unfortunately, I did not solve it. My research group decided to move forward without using LiRA.

@cbohrson

Can you confirm that the input single-cell and bulk samples were run on the same multisample VCF file?

@red-t

red-t commented Sep 27, 2020

@KEYS248 All right, thank you.

@red-t

red-t commented Sep 27, 2020

> Can you confirm that the input single-cell and bulk samples were run on the same multisample VCF file?

Sorry, I'm not sure what "the same multisample VCF file" means. As a first attempt, I ran GATK separately on the single-cell sample and the bulk sample (one sample of each kind), then ran LiRA step by step for each sample. Here are the config files for the single-cell and bulk samples:

single-cell:
name    SRR475137
analysis_path   SRR475137
reference_file   human_g1k_v37.fasta
bam   SRR475137.bam
vcf     SRR475137.vcf.gz
gender  female
sample  SC
bulk    F
reference_identifier    hg19
phasing_software        eagle
only_chromosomes        21,22

bulk:
name    SRR475185
analysis_path   SRR475185
reference_file   human_g1k_v37.fasta
bam   SRR475185.bam
vcf     SRR475185.vcf.gz
gender  female
sample  BULK
bulk    T
reference_identifier    hg19
phasing_software        eagle
only_chromosomes        21,22

@cbohrson

I see that the vcf field differs between the two configs, so I think what I suggested is likely the issue (it will be, unless the VCFs in both configs describe the same set of sites).
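
As a quick way to spot this before running anything, the following Python sketch parses two config files of the whitespace-separated "key value" form shown above and compares their vcf entries. It is only an illustration: `parse_config` and `same_vcf` are hypothetical helper names, not part of LiRA, and the check compares the file names as written, not the file contents.

```python
def parse_config(text):
    """Parse whitespace-separated 'key value' lines into a dict."""
    config = {}
    for line in text.splitlines():
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            config[parts[0]] = parts[1].strip()
    return config


def same_vcf(config_a, config_b):
    """True only if both configs name a vcf and the names are identical."""
    vcf_a = config_a.get("vcf")
    return vcf_a is not None and vcf_a == config_b.get("vcf")


# Abbreviated copies of the two configs posted in this thread.
SINGLE_CELL_CONFIG = """\
name    SRR475137
vcf     SRR475137.vcf.gz
sample  SC
bulk    F
"""
BULK_CONFIG = """\
name    SRR475185
vcf     SRR475185.vcf.gz
sample  BULK
bulk    T
"""

single_cell = parse_config(SINGLE_CELL_CONFIG)
bulk = parse_config(BULK_CONFIG)

if not same_vcf(single_cell, bulk):
    print("vcf differs:", single_cell["vcf"], "vs", bulk["vcf"])
```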

"The same multisample VCF file" refers to a VCF file with multiple sample columns describing evidence for the listed variants across multiple BAM files. It can be created using GATK. See, e.g.: https://gatk.broadinstitute.org/hc/en-us/articles/360035889971

The instructions there for calling on multiple input GVCFs (e.g. "If you have GVCFs from multiple samples...") produce the input LiRA expects, as long as the samples include at least the bulk and the single cell.
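
To check whether a given VCF is multisample in this sense, one option is to read the sample columns from its `#CHROM` header line, where sample names follow the FORMAT column. The Python sketch below does that on toy headers. The helper names are hypothetical, and the sample names "SC" and "BULK" are taken from the configs above on the assumption that they match the sample columns in the VCF.

```python
import io


def sample_columns(handle):
    """Return the sample names listed after FORMAT on the #CHROM header line."""
    for line in handle:
        if line.startswith("#CHROM"):
            # The first 9 columns are fixed: CHROM through FORMAT.
            return line.rstrip("\n").split("\t")[9:]
        if not line.startswith("#"):
            break  # reached the records without finding a header line
    return []


def missing_samples(handle, required):
    """List the required sample names that have no column in the VCF."""
    present = sample_columns(handle)
    return [name for name in required if name not in present]


FIXED = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
SINGLE_SAMPLE_VCF = "##fileformat=VCFv4.2\n" + FIXED + "\tSC\n"
MULTISAMPLE_VCF = "##fileformat=VCFv4.2\n" + FIXED + "\tSC\tBULK\n"

required = ["SC", "BULK"]
print(missing_samples(io.StringIO(SINGLE_SAMPLE_VCF), required))  # ['BULK']
print(missing_samples(io.StringIO(MULTISAMPLE_VCF), required))    # []
```

An empty result means both samples have a column in the same file, so every listed site carries evidence for both.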

Given this has caused some confusion, we will update the README to clarify.

@KEYS248
Author

KEYS248 commented Sep 27, 2020

I think what @cbohrson is saying (correct me if I'm wrong) is that the program expects all samples, both single cell and bulk, to come from a single multisample VCF instead of two VCFs (one for single cell, one for bulk). If so, I was not doing this, which may have been the cause of my error.

@red-t

red-t commented Sep 27, 2020

OK, I previously ran GATK in single-sample mode. I'll rerun GATK to create a multisample VCF file and then try LiRA again. Thanks a lot!
