Filter cattle-outbreak using GenoFLU B3.13 #140
Merged
The previous approach relied on broad filtering (a minimum date of 2024 and a region of North America), a hardcoded exclude list, and a clock filter. As the diversity of sequences increased, the clock filter became less effective and ultimately dropped all of the desired strains. See #133 for more.
We now filter on GenoFLU constellations, which lets us relax the date and region filters. Relaxing them did not bring in any non-North-American samples, but it did add one B3.13 genome from 2023: 'A/Goose/USA/23-038138-001-original/2023'.
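As a rough illustration of a genotype-based filter (not the actual workflow code), the sketch below keeps only the metadata rows whose GenoFLU genotype is B3.13. The column names `strain` and `genoflu`, the helper `filter_by_genotype`, and the sample rows are assumptions made for this example; only the goose strain name comes from this PR.

```python
import csv
import io


def filter_by_genotype(rows, genotype, column="genoflu"):
    """Return only the rows whose genotype column equals `genotype`."""
    return [row for row in rows if row.get(column) == genotype]


# Hypothetical metadata table. The column names and the second row are
# made up for illustration; the real metadata may be laid out differently.
SAMPLE_TSV = (
    "strain\tdate\tregion\tgenoflu\n"
    "A/Goose/USA/23-038138-001-original/2023\t2023-XX-XX\tNorth America\tB3.13\n"
    "example/strain/2\t2024-05-01\tNorth America\tD1.1\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
kept = filter_by_genotype(rows, "B3.13")
print([row["strain"] for row in kept])
```

Because the genotype call identifies the outbreak lineage directly, no date or region cutoff is needed to keep the 2023 genome in this sketch.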
These filtering changes also apply to the D1.1 builds, but testing indicates no changes to those builds.
The segment-level builds are not addressed here, but they could be adjusted in a similar way to use GenoFLU matching at the segment level. Specifically, https://github.com/nextstrain/avian-flu/pull/138/files adds the segment-level annotations, and the expanded constellation is: B3.13 = PA:ea1, HA:ea1, PB1:am4, MP:ea1, NA:ea1, PB2:am2.2, NP:am8, NS:am1.1
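One possible shape for segment-level matching is sketched below. The constellation values are copied from the description above; the function names and the idea of passing per-segment calls as a plain dict are assumptions for illustration, not the annotation format added in #138.

```python
# Per-segment GenoFLU calls for B3.13, as listed in this PR description.
B3_13 = {
    "PA": "ea1", "HA": "ea1", "PB1": "am4", "MP": "ea1",
    "NA": "ea1", "PB2": "am2.2", "NP": "am8", "NS": "am1.1",
}


def segment_matches(segment, call, constellation=B3_13):
    """True if a single segment's call agrees with the constellation."""
    return constellation.get(segment) == call


def genome_matches(calls, constellation=B3_13):
    """True only if every segment in the constellation has the expected call."""
    return all(
        calls.get(segment) == lineage
        for segment, lineage in constellation.items()
    )


# Hypothetical usage: a reassortant that differs only in PB2.
reassortant = dict(B3_13, PB2="ea1")
print(segment_matches("HA", "ea1"))   # HA call agrees with B3.13
print(genome_matches(reassortant))    # whole-genome match fails on PB2
```

A per-segment check like `segment_matches` would let a segment build keep sequences whose segment agrees with B3.13 even when the whole genome does not; whether that is the desired behaviour is left open here.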
Closes #133