Potential sequences that should be included in B.1.617 #49
Update to B.1.617 designation

Figure 1: ML phylogeny built using iqtree (command: iqtree -s sequences.aln.fasta -nt 4 -blmin 0.0000000001 -m GTR+G -bb 1000). Sequences in the alignment include all B.1.617 sequences on GISAID from 2021-04-20, plus the sequences flagged in this issue (currently assigned B.1.596). An outgroup A sequence (Wuhan/WH04/2020) and the basal B.1 sequence from Italy (EPI_ISL_420563) are also included in the tree for context.

Current lineage assignments found here (pangoLEARN 2021-04-14): lineage_report.csv

Proposed changes to designation from this tree: B.1.txt, B.1.617.sublineages.csv
Total number of changes:

Treefile found here: sequences.aln.fasta.tree.zip
Full csv of lineage designations including update: lineages.csv
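For anyone wanting to tally the changes between the pangolin report and the updated designations themselves, here is a minimal sketch. It assumes both files are CSVs with `taxon` and `lineage` columns; the function names and the toy rows are hypothetical and are not data from this issue.

```python
import csv
import io
from collections import Counter


def load_lineages(csv_text):
    """Map taxon -> lineage from CSV text with 'taxon' and 'lineage' columns (assumed layout)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["taxon"]: row["lineage"] for row in reader}


def count_changes(assigned, designated):
    """Count (old, new) lineage pairs for taxa whose designation differs from the assignment."""
    changes = Counter()
    for taxon, new in designated.items():
        old = assigned.get(taxon)
        if old is not None and old != new:
            changes[(old, new)] += 1
    return changes


# Toy rows for illustration only; not real sequences from this issue.
report = "taxon,lineage\nseq1,B.1.617\nseq2,B.1.617\nseq3,B.1.617\n"
update = "taxon,lineage\nseq1,B.1.617.1\nseq2,B.1\nseq3,B.1.617\n"

changes = count_changes(load_lineages(report), load_lineages(update))
for (old, new), n in sorted(changes.items()):
    print(f"{old} -> {new}: {n}")
```

With real files, the CSV text would be read from lineage_report.csv and lineages.csv instead of the inline strings.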
Hey @aineniamh, a public health officer contacted us last night about hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 being designated B.1.617 in the latest pango updates. This sample is in the B.1.txt file above, to be changed to B.1.617. However, it looks clearly to me to be B.1.575, which was the original designation. Spike mutations are S:S494P, S:D614G, S:P681H, S:T716I. This change is also affecting new samples we are ready to release that are also B.1.575. A similar finding of a non-E484Q/L452R sample was also noted by the NYPHL in the SPHERES slack.
Hi @oroak, I think you've misunderstood the files above. That sequence is being designated B.1, not B.1.617. Previously B.1.617 only had ~60 sequences designated, but on GISAID there were over 600 sequences assigned to that lineage by pangolin. I've taken the sequences that were being assigned B.1.617 by pangolin and given them designations. The sequences in the B.1.txt file are getting designated B.1. hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 would only have been assigned a lineage based on pangolin's 'best guess', and looking into these assignments made it clear it was not B.1.617.
If you have a list of the sequences you believe should be B.1.575, I'm happy to investigate and add them to the official designations!
@aineniamh Yes, it looks like I misinterpreted "and IDs above" to mean the files as well. We haven't yet released the other B.1.575 samples that are getting misassigned (by pangoLEARN 2021-04-14). This information is now part of mandatory public health reporting for Oregon, and we want to make sure things are "correct" before they are reported. Yes, I do believe hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 should be designated B.1.575 (and not B.1), based on the prior pangoLEARN from early April and the mutation information here: https://outbreak.info/situation-reports?pango=B.1.575. This is a screenshot from a build I ran just last week.
@aineniamh I do believe hCoV-19/USA/OR-OHSU-10702/2021|EPI_ISL_1541883|2021-03-17 should be designated B.1.575. I quickly spot-checked the first few US samples listed in the B.1.txt file. Based on the mutations listed, they appear to be B.1.575. Could the B.1 annotation have been broadly mistaken for this group?
Hey @aineniamh, I am from India, working on genome sequencing. For some of our samples, pangolin is assigning the B.1.617.1 or B.1.617 lineage even though they appear to be B.1.617.2: they do not have the E484Q mutation and do have the T478K mutation, which is not found in the B.1.617.1 lineage.
Hi @nehajha21, thanks for messaging and bringing this to our attention! Are your sequences on GISAID? If you could provide us with a list, I'll make sure to get this fixed!
@aineniamh The samples are not on GISAID and I can't share the sequences and data with you. But I have taken snapshots that I can share with you.
@aineniamh The amino acid mutation E484Q corresponds to the nucleotide mutation G23012C, and T478K corresponds to C22995A.
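The nucleotide-to-amino-acid correspondence above can be checked with a short sketch. It assumes the spike gene starts at position 21563 (1-based) in the Wuhan-Hu-1 reference (NC_045512.2) and that the reference codons are GAA for E484 and ACA for T478; the function name is hypothetical, and the reference codons are passed in by hand rather than read from a genome file.

```python
from itertools import product

# Assumption: first base of the spike (S) gene in Wuhan-Hu-1 (NC_045512.2), 1-based.
SPIKE_START = 21563

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def spike_aa_change(nt_mutation, ref_codon):
    """Translate a spike nucleotide change such as 'G23012C' into an amino acid change.

    ref_codon is the reference codon covering that position (supplied by the caller).
    """
    ref, pos, alt = nt_mutation[0], int(nt_mutation[1:-1]), nt_mutation[-1]
    offset = pos - SPIKE_START
    codon_number = offset // 3 + 1   # 1-based codon index within spike
    index_in_codon = offset % 3      # 0, 1 or 2
    if ref_codon[index_in_codon] != ref:
        raise ValueError("reference base does not match the supplied codon")
    alt_codon = ref_codon[:index_in_codon] + alt + ref_codon[index_in_codon + 1:]
    return f"{CODON_TABLE[ref_codon]}{codon_number}{CODON_TABLE[alt_codon]}"


print(spike_aa_change("G23012C", "GAA"))  # E484Q
print(spike_aa_change("C22995A", "ACA"))  # T478K
```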
Hi @nehajha21, sorry, I meant to ask whether the sequences were on GISAID; I've corrected the message above. As you can see from other issues on the repo, providing a list of the GISAID names allows us to designate the sequences, and those designations feed into the model from a GISAID download. An example of the information to supply is here: #1
@aineniamh These samples are not uploaded to GISAID, so I can't share them here. I work in a sequencing lab in India; we receive samples from all over India on a daily basis and have to report back to the senders as well. For the analysis, I use pangolin to find the lineage information for all the samples. So it's really confusing for us to call them B.1.617.1 in our reports even though they appear to be B.1.617.2.
pangolin can only assign based on the known diversity that we have access to when training the model. It's very difficult for us to do anything to fix this on our end if the sequences haven't been shared on GISAID. If they were uploaded, you wouldn't necessarily even need to supply any metadata with them, if that's an issue, but that's where we can access the sequences and input them into our assignment model. Alternatively, there is a tool, scorpio, developed by @rmcolq and @benjamincjackson, that assigns explicitly based on SNPs rather than the machine learning approach of pangolin. The tool is very new and can be found here: https://github.com/cov-lineages/scorpio. As an aside, though, if you know your sequences should be B.1.617.2, there is no reason not to call them B.1.617.2, even if pangolin assigns otherwise.
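To illustrate the general idea of assigning explicitly from mutations rather than from a trained model, here is a minimal sketch. It is not scorpio's implementation or API, and the rules are simplified, hypothetical stand-ins built only from the spike mutations mentioned in this thread; they are not official lineage definitions.

```python
# Hypothetical, simplified rules using only mutations mentioned in this thread.
# Each rule: (label, mutations that must be present, mutations that must be absent).
RULES = [
    ("B.1.617.2", {"T478K"}, {"E484Q"}),
    ("B.1.617.1", {"L452R", "E484Q"}, {"T478K"}),
    ("B.1.575", {"S494P", "P681H", "T716I"}, {"L452R", "E484Q"}),
]


def assign_by_mutations(spike_mutations):
    """Return the first rule label whose required mutations are all present
    and whose excluded mutations are all absent, or None if nothing matches."""
    observed = set(spike_mutations)
    for label, required, excluded in RULES:
        if required <= observed and not (excluded & observed):
            return label
    return None


print(assign_by_mutations({"S494P", "D614G", "P681H", "T716I"}))
print(assign_by_mutations({"L452R", "T478K", "D614G"}))
print(assign_by_mutations({"D614G"}))
```

A sample matching no rule returns None rather than a best guess, which is the main behavioural difference from a model-based assignment.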
Hey @aineniamh, I will send you the fasta sequence of the above-mentioned sample. Could you please check why pangolin is assigning B.1.617.1 to the sequence even though it appears to be B.1.617.2?
Potential need for inclusion in designation of B.1.617
Flagging this for follow up
From Bijaya Dhakal via Gunter Bach:
Viruses with a similar mutation profile were classified as B.1.617 (E484Q/L452R). However, the sequences below have the same double mutation but are still classified as B.1.596.