GenoFLU for fauna datasets #138

Merged
merged 3 commits into from
Feb 24, 2025

@jameshadfield jameshadfield commented Feb 23, 2025

Generates and exports both the per-segment GenoFLU results and the whole-genome genotype/constellation.

There are a few outstanding to-dos, to be done either here or elsewhere:

  • If a sample has fewer than 8 segments, genoflu doesn't report this in the results (or the log). We already have n_segments in the metadata, so we could use it to generate a "<8 segments sequenced" label if desired.
  • More frustrating for samples with fewer than 8 segments is that we don't get the per-segment calls for the segments that were sequenced. Presumably we could modify genoflu to report these.
  • For genomes where some of the segments are too diverged, my current parsing approach doesn't pull them out. While it was fun to do everything in csvtk, we should switch to a Python script to correctly parse annotations where (e.g.) the genotype list is "MP:ea3, HA:ea3, NA:ea5, PB1:ea3" and the genotype is "Not assigned: Only 4 segments >98.0% match found of total 8 segments in input file".
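
As a rough illustration of the last point, a Python parser for the genotype list could look something like the sketch below. This is not the script in this PR: the function name `parse_genotype_list`, the `SEGMENTS` ordering, and the choice of an empty string for missing calls are all assumptions made for the example.

```python
# Hypothetical sketch: split a GenoFLU genotype list into per-segment calls.
# Segment names are assumed to follow the labels seen in the example above
# (e.g. "MP", "HA", "NA", "PB1").
SEGMENTS = ("PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS")


def parse_genotype_list(genotype_list: str) -> dict[str, str]:
    """Return a call for every segment, using "" where no call was reported."""
    calls = {segment: "" for segment in SEGMENTS}
    for item in genotype_list.split(","):
        item = item.strip()
        if not item:
            continue
        # Each item is expected to look like "HA:ea3".
        segment, _, call = item.partition(":")
        segment = segment.strip()
        if segment in calls:
            calls[segment] = call.strip()
    return calls


calls = parse_genotype_list("MP:ea3, HA:ea3, NA:ea5, PB1:ea3")
```

Because every segment always gets a key, a genome with only four matched segments still yields a complete row, so it no longer has to be dropped for lacking all 8 results.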

Includes both the per-segment results and the whole-genome genotype/
constellation
The previous csvtk approach would drop "Not assigned" records because
they didn't have results for all 8 segments. We now report the genome
result as "Not assigned (too divergent)" and report the segment results
where available.
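
The genome-level relabelling described in this commit message could be sketched roughly as follows. The function name `summarise_genotype` and the prefix check are assumptions for illustration, not the code that was merged.

```python
# Hypothetical sketch: collapse GenoFLU's verbose "Not assigned: ..." message
# into the short label used for the genome-level result.
def summarise_genotype(genotype: str) -> str:
    genotype = genotype.strip()
    if genotype.startswith("Not assigned"):
        return "Not assigned (too divergent)"
    # Assigned genotypes are passed through unchanged.
    return genotype


label = summarise_genotype(
    "Not assigned: Only 4 segments >98.0% match found of total 8 segments in input file"
)
```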

jameshadfield commented Feb 24, 2025

I'm going to merge this now as it works locally and will enable us to improve the cattle-outbreak segment-level filtering discussed here. I'll run actions to re-ingest fauna, then update the datasets and update this comment accordingly.

@jameshadfield jameshadfield merged commit 6e8569d into master Feb 24, 2025
6 checks passed
@jameshadfield jameshadfield deleted the james/fauna-genoflu branch February 24, 2025 22:59