
Latest alignment to Pathogen repo guide and add frequency panel #88

Open: j23414 wants to merge 8 commits into main from pathogen-repo-guide

j23414 (Contributor) commented Dec 18, 2024

Description of proposed changes

Since I was already in this repository to add the frequency panel, I decided to initiate an update to the workflow to align it with the latest version of the Pathogen Repo Guide.

Draft Frequencies Plot

Added frequencies plots from Actions run: https://github.com/nextstrain/dengue/actions/runs/12433607449

Full genome frequencies: [image]

E gene frequencies: [image]

Feedback welcome

Related issue(s)

Checklist

  • Checks pass

Since Auspice automatically detects "authors" and we prefer to display the abbreviated author
list, rename "abbr_authors" to "authors" and "authors" to "full_authors".
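Because the two renames overlap (the old "authors" column moves to "full_authors" while "abbr_authors" takes its name), they have to be applied together rather than one after the other. A minimal sketch, assuming the metadata is a plain TSV; `rename_columns` is a hypothetical helper for illustration, not the workflow's actual rule:

```python
import csv
import io

# Renames taken from the commit message. Both are applied in one pass:
# renaming "abbr_authors" -> "authors" first would otherwise collide with
# the existing "authors" column before it becomes "full_authors".
AUTHOR_RENAMES = {
    "abbr_authors": "authors",
    "authors": "full_authors",
}


def rename_columns(tsv_text: str, renames: dict) -> str:
    """Return TSV text whose header columns are renamed simultaneously."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    header = [renames.get(name, name) for name in rows[0]]
    # Guard against a mapping that would leave two columns with one name.
    if len(set(header)) != len(header):
        raise ValueError(f"renaming produced duplicate columns: {header}")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])
    return out.getvalue()


# Made-up example row; real metadata has many more columns.
example = "strain\tauthors\tabbr_authors\nstrain_1\tSmith J, Doe A, Lee K\tSmith et al\n"
print(rename_columns(example, AUTHOR_RENAMES), end="")
```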

This matches the pathogen-repo-guide.
To match the pathogen repo guide, change:

* `genbank_accession` to `accession`
* `genbank_accession_rev` to `accession_version`
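A small sketch of how the renamed pair relates, assuming `accession_version` values use the usual GenBank `ACCESSION.VERSION` form. `split_accession_version` is a hypothetical helper and the accession shown is a made-up placeholder:

```python
# Renames taken from the commit message.
ACCESSION_RENAMES = {
    "genbank_accession": "accession",
    "genbank_accession_rev": "accession_version",
}


def split_accession_version(accession_version: str) -> tuple[str, int]:
    """Split an "ACCESSION.VERSION" string, e.g. "XX000001.2" -> ("XX000001", 2)."""
    accession, sep, version = accession_version.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an ACCESSION.VERSION string: {accession_version!r}")
    return accession, int(version)


# Made-up placeholder record using the old column names.
record = {"genbank_accession": "XX000001", "genbank_accession_rev": "XX000001.2"}
renamed = {ACCESSION_RENAMES.get(k, k): v for k, v in record.items()}
accession, version = split_accession_version(renamed["accession_version"])
# The two renamed columns should agree on the base accession.
assert accession == renamed["accession"]
print(renamed, version)
```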

There should be a subsequent change in the phylogenetic workflow.
To be more consistent with the pathogen-repo-guide and the ingest workflow,
use the combination of `accession` and `url` fields in the phylogenetic build.
This change follows a similar change to the ingest workflow.
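One way the `accession` and `url` pair could fit together, sketched under the assumption that `url` points at the record's NCBI Nucleotide page. The ingest workflow may construct this field differently, and `with_url` is a hypothetical helper:

```python
# Assumption: `url` links to the record's NCBI Nucleotide page. The real
# ingest workflow may build or name this field differently.
NCBI_NUCCORE_URL = "https://www.ncbi.nlm.nih.gov/nuccore/"


def with_url(record: dict) -> dict:
    """Return a copy of a metadata record with a `url` derived from `accession`."""
    accession = record.get("accession", "").strip()
    url = NCBI_NUCCORE_URL + accession if accession else ""
    return {**record, "url": url}


# Made-up placeholder record.
print(with_url({"strain": "strain_1", "accession": "XX000001"}))
```

Keeping the bare `accession` alongside a ready-made `url` lets the build display the ID while still linking out, without hard-coding the link format in the phylogenetic workflow.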
j23414 force-pushed the pathogen-repo-guide branch from 073bcf2 to a56685a on December 18, 2024 17:38
j23414 changed the title from "WIP: Latest alignment to Pathogen repo guide" to "WIP: Latest alignment to Pathogen repo guide and add frequency panel" on Dec 20, 2024
j23414 force-pushed the pathogen-repo-guide branch from 98d7a19 to 65520e5 on December 20, 2024 20:58
j23414 changed the title from "WIP: Latest alignment to Pathogen repo guide and add frequency panel" to "Latest alignment to Pathogen repo guide and add frequency panel" on Dec 20, 2024
j23414 marked this pull request as ready for review on December 20, 2024 20:58
trvrb (Member) commented Dec 20, 2024

Something is off in the frequency calculations. Here's a view from https://nextstrain.org/staging/dengue/trials/2024bfreq/dengue/all/genome?d=tree,frequencies&f_region=South%20America that filters to South American sequences only. Mousing over the tree, I can clearly see a large number of samples from the last five years. However, none of these samples show up in the frequencies panel.

[Screenshot: dengue view filtered to South America]

You can see a similar issue even without filtering. It looks like a single sample is contributing to the KDE frequencies after 2020.

[Screenshot: unfiltered dengue view, 2024-12-20 1:17:41 PM]

trvrb (Member) commented Dec 20, 2024

Compare with the measles example here: without normalizing frequencies, everything should sum to 100% when no filters are in place:

[Screenshot: measles frequencies panel, 2024-12-20 1:24:51 PM]
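To make the 100% expectation concrete, here is a simplified Gaussian-kernel sketch of KDE frequencies. It is not augur's implementation; the function name and parameters are hypothetical. Each sample spreads a total weight of 1 across the pivots, and each pivot is then normalized across clades, so the unfiltered total is 100% wherever at least one sample's kernel reaches that pivot. In a sparsely sampled period, a single nearby sample can dominate.

```python
import math


def kde_frequencies(dates, clades, pivots, bandwidth):
    """Simplified Gaussian-KDE frequencies (an illustration, not augur's code).

    dates: decimal-year sample dates; clades: one label per sample;
    pivots: decimal-year time points; bandwidth: kernel width in years.
    """
    freqs = {clade: [0.0] * len(pivots) for clade in set(clades)}
    for date, clade in zip(dates, clades):
        # Gaussian kernel centered on the sample date, evaluated at each pivot.
        kernel = [math.exp(-0.5 * ((p - date) / bandwidth) ** 2) for p in pivots]
        total = sum(kernel)
        if total == 0:
            continue  # sample too far from every pivot: contributes nothing
        for i, k in enumerate(kernel):
            freqs[clade][i] += k / total  # each sample carries total weight 1
    # Normalize each pivot so clade frequencies sum to 1 where any sample
    # contributes; pivots with no contributions stay at 0 (an "empty region").
    for i in range(len(pivots)):
        column = sum(f[i] for f in freqs.values())
        if column > 0:
            for f in freqs.values():
                f[i] /= column
    return freqs


# Made-up dates and labels, purely for illustration.
pivots = [2019.0, 2019.5, 2020.0, 2020.5, 2021.0]
freqs = kde_frequencies([2019.0, 2019.1, 2020.9], ["DENV1", "DENV2", "DENV2"], pivots, 0.25)
for i, p in enumerate(pivots):
    print(p, round(sum(f[i] for f in freqs.values()), 6))
```

In this sketch a pivot only sums to less than 100% when no kernel reaches it at all (the weights underflow to zero). That is one possible way to get empty regions, not a confirmed diagnosis of the dengue build.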

j23414 (Contributor, Author) commented Dec 20, 2024

Thank you! I was also expecting the frequencies to sum to 100%. The code changes that add the frequencies panel are here, in case someone spots what's wrong faster than I can:

I'm not entirely certain what is causing the empty regions:

  • perhaps unassigned lineage strains?
  • perhaps the bandwidth parameters need fine-tuning? I'm looking at the augur frequencies documentation
  • perhaps drop the max-date 6M filter (this might be why recent samples are missing)

I mainly followed the frequency parameters for yellow-fever and measles, but I feel like I'm missing a nuance somewhere.
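The candidate causes listed above could be checked against the metadata before any bandwidth tuning. A hypothetical diagnostic sketch: it assumes metadata rows as dicts with a `date` column (`YYYY-MM-DD`, possibly ambiguous like `2024-XX-XX`) and a lineage column, named `lineage` here purely for illustration. The frequencies date window is passed as explicit `min_date`/`max_date` values rather than interpreting a relative `6M` setting.

```python
from collections import Counter
from datetime import date


def audit_samples(rows, min_date, max_date, lineage_column="lineage"):
    """Per-year counts of samples that a frequencies run could plausibly drop.

    lineage_column is a hypothetical column name; adjust to the real metadata.
    """
    per_year, missing_lineage, outside_window = Counter(), Counter(), Counter()
    for row in rows:
        raw = row.get("date", "")
        if not raw[:4].isdigit():
            continue  # no usable year
        year = int(raw[:4])
        per_year[year] += 1
        if not row.get(lineage_column, "").strip():
            missing_lineage[year] += 1
        try:
            sample_date = date.fromisoformat(raw)
        except ValueError:
            continue  # ambiguous date; skip the window check
        if not (min_date <= sample_date <= max_date):
            outside_window[year] += 1
    return per_year, missing_lineage, outside_window


# Made-up rows, purely for illustration.
rows = [
    {"date": "2019-05-01", "lineage": "lineage_A"},
    {"date": "2023-02-10", "lineage": ""},
    {"date": "2024-11-30", "lineage": "lineage_B"},
    {"date": "2024-XX-XX", "lineage": "lineage_B"},
]
print(audit_samples(rows, date(2015, 1, 1), date(2024, 6, 30)))
```

Comparing `per_year` against the other two counters for recent years would show whether missing lineages or the date window could account for the empty regions.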


2 participants