Create two separate JSON files for each virus: one to hold ontology names for protein names and symbols, and one to hold ontology names for gene names and symbols. Include a link to something (figure out what) in the master JSON
Update functionalannotation.py to generate new DH template format - (generate files for SC2 + MPOX + push to repo) - column names are correct, but the mutation index is missing some nucleotide names, causing some misformatting; double check that the next run of the index fixes this
Check that the surveillance report generation script works, then change it so that 1 GVF = 1 TSV (MRI to do this part) and 1 TSV = 1 PDF (MZA)
Ask Damion about "Not Applicable" menu
Review template with Zohaib and Emma when Emma is back (early next week), then submit v1 for release
Changes to JSON
gene key comes from the GFF file; check if it's needed in the workflow; if not, remove it from the JSON and keep vcf_gene in the GVF alone
protein_alias comes from the manually curated key, virus_genomeAnnotation; update this key file to have list values, removing alias names that are the same as the gene name (?) [check with Ivan first to see if he needs protein_alias for the visualization]
change RdRp protein_alias to nsp12
add orf1ab to protein_alias list for orf1b entry
remove pokay_id key, as protein_alias list will contain the Pokay id already
in Pokay itself, rename proteins as needed (e.g. Plpro -> PL_pro) to match protein_alias entries. MRI: doing this will break the way functionalannotation.py separates the protein names from the functional category, so I'm temporarily going to hardcode this. In the future, Pokay won't use these filenames anymore, and this issue will be solved.
automatically add ontology names from ontology name files to JSON (see 'Top Priority', above)
Change GVF keys to match corresponding JSON keys product, gene (if still using), and protein_alias, and notify Ivan of changes that will impact the visualization
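The protein_alias cleanup described in the items above could look roughly like the sketch below. It is only an illustration: the entry layout, the function name `clean_protein_aliases`, and the case-insensitive comparison against the gene name are assumptions, not the structure of the actual JSON or of functionalannotation.py.

```python
def clean_protein_aliases(entry):
    """Return a copy of one (hypothetical) JSON entry with a tidy protein_alias list.

    Assumptions: the entry is a dict with optional "gene", "protein_alias"
    and "pokay_id" keys, and aliases are compared case-insensitively.
    """
    gene = entry.get("gene")
    aliases = entry.get("protein_alias") or []
    if isinstance(aliases, str):
        # Older entries may hold a single string; convert it to a list value.
        aliases = [aliases]

    seen = set()
    cleaned = []
    for alias in aliases:
        key = alias.lower()
        if gene is not None and key == gene.lower():
            continue  # alias is the same as the gene name
        if key in seen:
            continue  # duplicate alias
        seen.add(key)
        cleaned.append(alias)

    # Drop pokay_id, since the alias list is meant to carry the Pokay id.
    result = {k: v for k, v in entry.items() if k != "pokay_id"}
    result["protein_alias"] = cleaned
    return result


# Hypothetical entry, for illustration only.
example = {"gene": "S", "protein_alias": ["S", "spike", "Spike"], "pokay_id": "S"}
print(clean_protein_aliases(example))
```

Whether aliases identical to the gene name should really be dropped is still an open question in the list above, so that branch is the first thing to revisit after checking with Ivan.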
Future work
Implement one-to-many DataHarmonizer template functionality to deal with multiple mutations
Add HGVS format checks in the DH template itself (regex for basic format) and in addfunctions2gvf.py (e.g. print a log of unmatched names) to make sure mutation names entered by the user match those in our functional annotation database
Implement a way for a user to add their functional annotations on top of our already-annotated data, at the end of the workflow after a GVF has been created
Use GitHub Actions to auto-update functional annotation (generate files for SC2 + MPOX) - some code already written, using nonstandardized Pokay terms for now
Review new category names with Paul
Embed DataHarmonizer template in VIRUS-MVP website
Update Pokay to use new standardized names (after approval from Paul and everybody)
Get ‘organism’ and ‘reference accession’ and ‘reference database’ from the JSON, not the functional annotation
Remove trailing semicolons in GVF attributes
Figure out visualization of multiple functions per paper
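For the HGVS format check item above, a minimal sketch of a basic-format regex plus a log of unmatched names might look like this. The pattern is deliberately loose and covers only simple protein and nucleotide substitutions and single-residue deletions; it is not a full HGVS validator. The names `HGVS_BASIC` and `check_mutation_names` are hypothetical and do not come from addfunctions2gvf.py.

```python
import logging
import re

# Loose, assumed pattern: NOT the full HGVS grammar.
HGVS_BASIC = re.compile(
    r"(?:"
    r"p\.[A-Z][a-z]{0,2}\d+(?:[A-Z][a-z]{0,2}|\*|del|=)"  # e.g. p.D614G, p.Asp614Gly, p.H69del
    r"|[cgn]\.\d+[ACGT]>[ACGT]"                           # e.g. g.23403A>G
    r")"
)


def check_mutation_names(names, known_names):
    """Split user-entered names into malformed ones and well-formed ones
    that are absent from the known annotation names."""
    malformed = [n for n in names if not HGVS_BASIC.fullmatch(n)]
    unmatched = [
        n for n in names
        if HGVS_BASIC.fullmatch(n) and n not in known_names
    ]
    for name in malformed:
        logging.warning("Mutation name is not in basic HGVS format: %s", name)
    for name in unmatched:
        logging.warning("Mutation name not found in functional annotation: %s", name)
    return malformed, unmatched


# Illustrative input only.
malformed, unmatched = check_mutation_names(
    ["p.D614G", "D614G", "p.N501Y"], known_names={"p.D614G"}
)
print(malformed, unmatched)
```

The same pattern string could in principle be reused as the column regex in the DH template, so that the template and the script reject the same names.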
Slides from September 9 update are here.
Top priority, to do week of September 9