September 9, 2024 meeting notes #180

Open
13 of 23 tasks
miseminger opened this issue Sep 10, 2024 · 1 comment

miseminger commented Sep 10, 2024

Slides from September 9 update are here.

Top priority, to do week of September 9

  • Create two separate JSON files for each virus: one to hold ontology names for protein names and symbols, and one to hold ontology names for gene names and symbols. Include a link to something (what exactly is still to be figured out) in the master JSON
  • Update functionalannotation.py to generate the new DH template format (generate files for SC2 + MPOX and push to repo). Column names are correct, but the mutation index is missing some nucleotide names, which causes some misformatting; double-check that the next run of the index fixes this
  • Check that the surveillance report generation script works, and change it so that 1 GVF = 1 TSV (MRI to do this part) and 1 TSV = 1 PDF (MZA)
  • Ask Damion about "Not Applicable" menu
  • Review template with Zohaib and Emma when Emma is back (early next week), then submit v1 for release
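As a rough illustration of the first item above, the per-virus ontology name files could be written along these lines. This is only a sketch: the file naming pattern, the function names, and the shape of the dictionaries are assumptions made for illustration and are not taken from the workflow. How the master JSON links to these files is left out because it has not been decided yet.

```python
import json
from pathlib import Path


def ontology_file_paths(virus: str, directory: Path) -> dict:
    """Return the (hypothetical) paths of the two ontology name files for one virus."""
    return {
        "protein": directory / f"{virus}_protein_ontology.json",
        "gene": directory / f"{virus}_gene_ontology.json",
    }


def write_ontology_files(virus: str, directory: Path,
                         protein_terms: dict, gene_terms: dict) -> dict:
    """Write one JSON file for protein terms and one for gene terms.

    protein_terms and gene_terms are plain dictionaries; their layout
    is an assumption and would follow whatever the team settles on.
    """
    paths = ontology_file_paths(virus, directory)
    directory.mkdir(parents=True, exist_ok=True)
    for kind, terms in (("protein", protein_terms), ("gene", gene_terms)):
        with open(paths[kind], "w") as handle:
            json.dump(terms, handle, indent=2, sort_keys=True)
    return paths
```

Keeping proteins and genes in separate files means either one can be regenerated without touching the other.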

Changes to JSON

  • gene key comes from the GFF file; check whether it's needed in the workflow. If not, remove it from the JSON and keep only vcf_gene in the GVF
  • protein_alias comes from the manually curated key, virus_genomeAnnotation; update this key file to have list values, removing alias names that are the same as the gene name (?) [check with Ivan first to see if he needs protein_alias for the visualization]
  • change RdRp protein_alias to nsp12
  • add orf1ab to protein_alias list for orf1b entry
  • remove pokay_id key, as protein_alias list will contain the Pokay id already
  • in Pokay itself, rename proteins as needed (e.g. Plpro -> PL_pro) to match protein_alias entries. MRI: doing this will break the way functionalannotation.py separates the protein names from the functional category, so I'm going to hardcode this temporarily. In the future, Pokay won't use these filenames anymore, and this issue will be resolved.
  • automatically add ontology names from ontology name files to JSON (see 'Top Priority', above)
  • Change GVF keys to match the corresponding JSON keys (product, gene if still in use, and protein_alias), and notify Ivan of changes that will impact the visualization
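The protein_alias and pokay_id changes above might look something like the following for a single JSON entry. This is a minimal sketch under assumptions: the entry is taken to be a flat dictionary with gene, protein_alias, and pokay_id keys, the function name normalize_entry is hypothetical, and the Pokay id is folded into the alias list before the key is dropped in case it is not there already. Dropping aliases that equal the gene name is controlled by a flag, since that step is still marked as a question.

```python
def normalize_entry(entry: dict, drop_gene_name: bool = True) -> dict:
    """Return a copy of one annotation entry with list-valued protein_alias
    and no pokay_id key. The entry layout is an assumption for illustration."""
    updated = dict(entry)

    # Accept either a single string or a list for the existing alias value.
    aliases = updated.get("protein_alias") or []
    if isinstance(aliases, str):
        aliases = [aliases]

    # Remove pokay_id, keeping its value as an alias so nothing is lost.
    pokay_id = updated.pop("pokay_id", None)
    if pokay_id:
        aliases = list(aliases) + [pokay_id]

    # De-duplicate in order, optionally skipping aliases equal to the gene name.
    gene = updated.get("gene")
    cleaned = []
    for name in aliases:
        if drop_gene_name and name == gene:
            continue
        if name not in cleaned:
            cleaned.append(name)

    updated["protein_alias"] = cleaned
    return updated
```

The function returns a copy rather than editing in place, so the original key file can be compared against the result before anything is overwritten.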

Future work

  • Implement one-to-many DataHarmonizer template functionality to deal with multiple mutations
  • Add HGVS format checks in the DH template itself (regex for basic format) and in addfunctions2gvf.py (e.g. print a log of unmatched names) to make sure mutation names entered by the user match those in our functional annotation database
  • Implement a way for a user to add their functional annotations on top of our already-annotated data, at the end of the workflow after a GVF has been created
  • Use GitHub Actions to auto-update functional annotation (generate files for SC2 + MPOX) - some code already written, using nonstandardized Pokay terms for now
  • Review new category names with Paul
  • Embed DataHarmonizer template in VIRUS-MVP website
  • Update Pokay to use new standardized names (after approval from Paul and everybody)
  • Get ‘organism’ and ‘reference accession’ and ‘reference database’ from the JSON, not the functional annotation
  • Remove trailing semicolons in GVF attributes
  • Figure out visualization of multiple functions per paper
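For the HGVS format check listed above, a basic check could be sketched as follows. The pattern is deliberately simplified and is an assumption, not the full HGVS grammar: it only accepts simple nucleotide substitutions and one-letter protein substitutions, stops, or deletions. The names HGVS_BASIC and check_mutation_names are hypothetical and do not come from addfunctions2gvf.py.

```python
import re

# Simplified pattern (assumption): covers only a small subset of HGVS.
HGVS_BASIC = re.compile(
    r"^(?:"
    r"[cgn]\.\d+[ACGT]>[ACGT]"        # nucleotide substitution, e.g. g.23403A>G
    r"|p\.[A-Z]\d+(?:[A-Z]|\*|del)"   # one-letter protein change, e.g. p.D614G
    r")$"
)


def check_mutation_names(names, known_names):
    """Split user-entered names into malformed ones and well-formed ones
    that are absent from the set of known annotated names."""
    malformed = [n for n in names if not HGVS_BASIC.match(n)]
    unmatched = [
        n for n in names
        if HGVS_BASIC.match(n) and n not in known_names
    ]
    return malformed, unmatched
```

The two returned lists could then be printed as the log of unmatched names, keeping format errors separate from names that are well formed but missing from the annotation database.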
@miseminger miseminger added the enhancement New feature or request label Sep 10, 2024
@miseminger miseminger pinned this issue Sep 10, 2024
@miseminger
Collaborator Author

Sheet for figuring out new Pokay-to-ontology-term mapping strategy: https://docs.google.com/spreadsheets/d/1QM8l8D4IXu4gYEwO1YnN5BKYcJETQ0bM4KBCQE6OcbM/edit?usp=sharing
