Why does the number of variants from snp-sites differ from the snippy mapping results? #113

Open
Zjianglin opened this issue Jun 6, 2024 · 0 comments


Hi,

I have NGS reads and assemblies for a set of bacterial samples. I called variants with two methods:

  1. Mapping-based: I ran snippy on the clean reads of each sample against a reference genome to call variants, then used bcftools merge to combine the variants from the multiple samples.
  2. Assembly-based: I assembled each sample with shovill, annotated the assemblies with prokka, built the pan-genome with Panaroo, and generated a recombination-free core-gene alignment with ClonalFrameML. I then extracted the core-genome variants with snp-sites: snp-sites -v -o ours_core_variations.vcf ../PGout_panaroo/core_gene_alignment_filtered.aln

Here are the statistics for the two VCFs:

# Method 1: merged VCF from the per-sample snippy calls
$ bcftools stats ../Ours_vcf_merged.vcf.gz | grep SN
# SN, Summary numbers:
#   number of SNPs      .. number of rows with a SNP
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
#   counter. For example, a row with a SNP and an indel increments both the SNP and
# SN	[2]id	[3]key	[4]value
SN	0	number of samples:	196
SN	0	number of records:	2292
SN	0	number of no-ALTs:	0
SN	0	number of SNPs:	2053
SN	0	number of MNPs:	47
SN	0	number of indels:	180
SN	0	number of others:	13
SN	0	number of multiallelic sites:	13
SN	0	number of multiallelic SNP sites:	1

# Method 2: snp-sites results
$ bcftools stats ours_core_variations.vcf | grep SN
# SN, Summary numbers:
#   number of SNPs      .. number of rows with a SNP
#   number of multiallelic SNP sites .. number of rows with multiple alternate alleles, all SNPs
#   counter. For example, a row with a SNP and an indel increments both the SNP and
# SN	[2]id	[3]key	[4]value
SN	0	number of samples:	196
SN	0	number of records:	5907
SN	0	number of no-ALTs:	0
SN	0	number of SNPs:	5907
SN	0	number of MNPs:	0
SN	0	number of indels:	0
SN	0	number of others:	0
SN	0	number of multiallelic sites:	3408
SN	0	number of multiallelic SNP sites:	34

There is a huge difference between the two results, both in total variant records (2292 vs. 5907) and in SNPs (2053 vs. 5907). Did I do something wrong? Could you please help me figure out where the difference comes from?
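
In case it helps with diagnosis, here is a minimal Python sketch for tallying the ALT alleles in a VCF. The motivation is that the snp-sites VCF above reports 3408 multiallelic sites but only 34 multiallelic SNP sites, which may mean that many ALT alleles there are not plain A/C/G/T bases. That is an assumption, not something the stats output confirms. The helper names (open_vcf, tally_alt_alleles) and the tiny example records are made up for illustration.

```python
from collections import Counter
import gzip


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF file for reading as text."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def tally_alt_alleles(lines):
    """Return (Counter of ALT allele strings, number of multiallelic records)."""
    counts = Counter()
    multiallelic = 0
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        alts = fields[4].split(",")  # column 5 of a VCF record is ALT
        if len(alts) > 1:
            multiallelic += 1
        counts.update(alts)
    return counts, multiallelic


# Made-up records, only to show the shape of the output.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t10\t.\tA\tG\t.\t.\t.\n",
    "1\t25\t.\tC\tT,*\t.\t.\t.\n",
    "1\t40\t.\tG\tA,C\t.\t.\t.\n",
]
counts, multiallelic = tally_alt_alleles(example)

# ALT alleles containing anything other than A, C, G or T.
non_acgt = {a: n for a, n in counts.items() if set(a.upper()) - set("ACGT")}

print("multiallelic records:", multiallelic)
print("non-ACGT ALT alleles:", non_acgt)
```

To run it on a real file, pass an open handle instead of the example list, for instance `with open_vcf("ours_core_variations.vcf") as fh: counts, multiallelic = tally_alt_alleles(fh)`. Comparing the non-ACGT tally between the two VCFs should show whether part of the 5907 records involve symbols other than nucleotide substitutions.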
