I looked into the feasibility of adding the 16 NSPs to the exported (Auspice) dataset. This will need Nextclade v3, since RdRp includes the slip site, so this is perhaps a good time to make some bigger changes too. (We've decided not to modify the ORF1a / ORF1b annotations; discussion on Slack.)
Nextclade does the translations, so we need to update the genemap.gff for Nextclade's 'sars-cov-2' dataset.
Our ancestral reconstruction of the translations (rule translate) is what creates the annotations block in the JSON. This currently uses defaults/reference_seq.gb for the annotations, and nothing else uses this file.
We can shift the reconstruction to augur ancestral, and either keep the script to generate the JSON annotations, or (preferred) just keep a JSON representation of the annotations block in the repo and use this. (We'll want to have more than just the coordinates in the JSON - we'll want to add some extra display names / colours / descriptions; the latter being important to explain why we use ORF1a + ORF1b!)
This will allow us to remove the GenBank file (defaults/reference_seq.gb).
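As a rough illustration of the "keep a JSON annotations block in the repo" option, here is a minimal Python sketch that loads such a block and sanity-checks it. Everything in it is an assumption for discussion: the field names `display_name`, `description` and `segments` are guesses at what we'd want (and what the Auspice schema accepts), the text values are placeholders, and the coordinates are my understanding of the Wuhan-Hu-1 reference and should be checked against the genemap before use. RdRp is written as two segments that overlap at the slip site.

```python
import json

GENOME_LENGTH = 29903  # assumed length of the Wuhan-Hu-1 reference

# Hypothetical annotations block; in practice this would live in a JSON file in the repo.
ANNOTATIONS_JSON = """
{
  "nuc": {"start": 1, "end": 29903, "strand": "+"},
  "ORF1a": {"start": 266, "end": 13468, "strand": "+",
            "display_name": "ORF1a (placeholder display name)"},
  "ORF1b": {"start": 13468, "end": 21555, "strand": "+",
            "description": "Placeholder: explain why we use ORF1a + ORF1b"},
  "RdRp": {"strand": "+",
           "segments": [{"start": 13442, "end": 13468},
                        {"start": 13468, "end": 16236}]}
}
"""


def cds_segments(entry):
    """Return the (start, end) pairs of a CDS, one-based and inclusive."""
    if "segments" in entry:
        return [(seg["start"], seg["end"]) for seg in entry["segments"]]
    return [(entry["start"], entry["end"])]


def check_annotations(annotations, genome_length):
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    for name, entry in annotations.items():
        if name == "nuc":  # the whole-genome entry is not a CDS
            continue
        total = 0
        for start, end in cds_segments(entry):
            if not (1 <= start <= end <= genome_length):
                problems.append(f"{name}: segment {start}..{end} out of bounds")
            total += end - start + 1
        if total % 3 != 0:
            problems.append(f"{name}: length {total} is not a multiple of 3")
    return problems


annotations = json.loads(ANNOTATIONS_JSON)
problems = check_annotations(annotations, GENOME_LENGTH)
print(problems or "annotations look consistent")
```

A check like this could run in CI, so that hand edits to the JSON (new NSPs, changed coordinates) get caught before export.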
Other things noticed / improvements we could make:
The workflow-config-file.rst has fallen out of date. This is seemingly inevitable with documentation, but this is a good chance to improve it.
We don't use any Nextclade datasets other than 'sars-cov-2'. I had assumed we'd use the 'sars-cov-2-21L' dataset for our 21L builds, and we have config settings to allow this, but I don't think we actually do.
rule align uses Nextalign, with a fasta + gff from the ncov repo. Why don't we replace the fasta+gff with the nextclade dataset we fetch later on in the process?
My understanding is that with Nextclade v3 we'll replace Nextalign with Nextclade in this step anyway.
rule build_mutation_summary and rule mutation_summary seem unused. If these can be removed, we could then remove defaults/reference.seq.fasta (alignment_reference) and defaults/annotation.gff (annotation). If the rules are still in use, we may want to use the Nextclade dataset files anyway.
The second of these rules (rule mutation_summary) is the only place we use the translations from rule align, so we may be able to avoid translating every genome.