fix: Update to the latest version of nextclade. #1701

Merged
jgadling merged 5 commits into trunk from jgadling/upgrade-nextclade
Aug 16, 2024

Conversation

jgadling
Contributor

Summary:

  • What: Upgrade to the latest version of nextclade, so our strain results are more accurate
  • Ticket:
  • Env:

Demos:

Notes:

Checklist:

  • I merged latest <base branch>
  • I manually verified the change
  • I added labels to my PR
  • I tested in multiple browsers
  • I added relevant unit tests
  • I have notified others of changes they need to make locally (migrations, jobs, package updates, etc)

Collaborator

@danrlu danrlu left a comment

Thanks very much!

@@ -127,6 +128,13 @@ def cli(
# generalized case and we'll need to figure out how to handle that,
# but right now the workflow is hardcoded to always expecting dataset.
nextclade_dataset_name = target_pathogen.nextclade_dataset_name
# Nextclade 3.2.8 has new names for datasets vs the 2.1 names in the db.
new_nextclade_dataset_names = {

Contributor

I'm not sure if normalizing the new names back to whatever the old one was is the right choice. I don't remember how the dataset name gets used, but if there's no logic based on the value elsewhere and it's just being held so we know what reference was used, I think we shouldn't standardize to the old name.

Contributor

Oh, whoops, I see I misread the direction of the lookup var.

Contributor Author

Yeah, I think this is just the minimal change possible -- we only use this value to download the right dataset via the nextclade CLI, and nothing else changes anywhere in our system.
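
For readers following the thread, here is a minimal sketch of the lookup direction being discussed: the name stored in the db is translated to the new-style name just before it is handed to `nextclade dataset get`. The dataset names below are placeholders, not the real old or new names, and the helper names `resolve_dataset_name` and `dataset_get_command` are hypothetical; only the `--name` and `--output-dir` flags come from the nextclade CLI.

```python
from typing import Dict, List

# Placeholder mapping for illustration only; the actual old/new dataset
# names used by the PR are not reproduced here.
NEW_NAMES: Dict[str, str] = {"old-dataset-name": "example/new-dataset-name"}


def resolve_dataset_name(db_name: str, new_names: Dict[str, str] = NEW_NAMES) -> str:
    """Return the new-style name for a db value, or the value unchanged if unmapped."""
    return new_names.get(db_name, db_name)


def dataset_get_command(dataset_name: str, output_dir: str) -> List[str]:
    """Build the argv for `nextclade dataset get`; running it is left to the caller."""
    return [
        "nextclade", "dataset", "get",
        "--name", dataset_name,
        "--output-dir", output_dir,
    ]


# The db still holds the old name; the command receives the new one.
cmd = dataset_get_command(resolve_dataset_name("old-dataset-name"), "dataset_dir")
```

Falling back to the unchanged value for unmapped names means a db row that already holds a new-style name would pass through untouched.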

Contributor

I honestly don't know: is there a reason we can't modify the row for the pathogens table so the nextclade_dataset_name for MPX is the new value instead? If that's doable, it seems preferable to go that way, but I also don't know if the refresh logic would freak out if old MPX and new MPX referenced different things.

Contributor Author

I did a skim through the code and it looks like this is the one-and-only place where the nextclade dataset name value is used, so it should be safe to update the db values instead - I'll update the PR
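
The alternative agreed on here, updating the stored value rather than translating it at runtime, could look roughly like the sketch below. It uses an in-memory `sqlite3` database as a stand-in for the project's real database, and the table layout (a `slug` column next to `nextclade_dataset_name`), the dataset names, and the `rename_dataset` helper are all assumptions for illustration; the actual migration in the PR may differ.

```python
import sqlite3


def rename_dataset(conn: sqlite3.Connection, old_name: str, new_name: str) -> int:
    """Point every pathogen row using old_name at new_name; return rows changed."""
    cur = conn.execute(
        "UPDATE pathogens SET nextclade_dataset_name = ? "
        "WHERE nextclade_dataset_name = ?",
        (new_name, old_name),
    )
    conn.commit()
    return cur.rowcount


# Stand-in schema and data (assumed, not taken from the project).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pathogens (slug TEXT PRIMARY KEY, nextclade_dataset_name TEXT)"
)
conn.execute("INSERT INTO pathogens VALUES ('MPX', 'old-dataset-name')")

updated = rename_dataset(conn, "old-dataset-name", "example/new-dataset-name")
current = conn.execute(
    "SELECT nextclade_dataset_name FROM pathogens WHERE slug = 'MPX'"
).fetchone()[0]
```

Because the dataset name is read in only one place, changing the row removes the need for a translation dict in the workflow code.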

@jgadling jgadling merged commit 8b44c4b into trunk Aug 16, 2024
13 checks passed
@jgadling jgadling deleted the jgadling/upgrade-nextclade branch August 16, 2024 23:09
Collaborator

danrlu commented Aug 19, 2024

The update is in:
Prod (old version): [image]
Staging (new version and includes updated lineages): [image]

Also tested SC2 samples and nothing is broken. Thank you both!
