Commit

[ingest] Add GenoFLU for each segment
jameshadfield committed Feb 23, 2025
1 parent ba8d73c commit d2d08ad
Showing 1 changed file with 16 additions and 8 deletions.
24 changes: 16 additions & 8 deletions ingest/rules/genoflu.smk
@@ -43,19 +43,27 @@ rule run_genoflu:
         """
 
 
-rule subset_genoflu:
+rule parse_genoflu:
+    """
+    Parses the genoflu TSV to produce a TSV with 10 columns:
+    * strain - ID used for matching
+    * genoflu - the "genotype" or "constellation"
+    * genoflu_<SEGMENT> - the individual segment genoflu calls
+    """
     input:
         genoflu="{data_source}/data/genoflu/results/results.tsv"
     output:
         genotypes="{data_source}/data/genoflu/genoflu_genotypes.tsv",
     shell:
-        """
-        csvtk cut -t \
-        -f Strain,Genotype \
-        {input.genoflu} \
-        | csvtk rename -t \
-        -f Strain,Genotype \
-        -n strain,genoflu > {output.genotypes}
+        r"""
+        cat {input.genoflu} | \
+        csvtk cut -t -F -f Strain,Genotype,'Genotype List Used*' | \
+        csvtk grep -t -F -f 'Genotype List Used*' -r -p "^PA:.+HA:.+PB1:.+MP:.+NA:.+PB2:.+NP:.+NS:.+$" -N | \
+        csvtk sep -t -n genoflu_PA,genoflu_HA,genoflu_PB1,genoflu_MP,genoflu_NA,genoflu_PB2,genoflu_NP,genoflu_NS -f 3 -s ", " | \
+        csvtk replace -t -f 4-11 -p "^(.+):" | \
+        csvtk cut -t -f 1,2,4-11 | \
+        csvtk rename -t -f Strain,Genotype -n strain,genoflu \
+        > {output.genotypes}
         """
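
For readers less familiar with csvtk, the new shell pipeline can be read as a per-row transformation. The following is a minimal Python sketch of that logic, not the code in this commit. The names `parse_row`, `SEGMENTS` and `COMPLETE` are hypothetical, the sample values are made up for illustration, and the segment order is assumed from the regex in the rule. Unlike `csvtk sep`, the sketch does not check that the number of split items matches the number of column names.

```python
import re

# Segment order assumed from the regex used in the rule's `csvtk grep` step.
SEGMENTS = ["PA", "HA", "PB1", "MP", "NA", "PB2", "NP", "NS"]

# Same pattern as the rule: keep only rows with a call for all eight segments.
COMPLETE = re.compile(r"^PA:.+HA:.+PB1:.+MP:.+NA:.+PB2:.+NP:.+NS:.+$")


def parse_row(strain, genotype, genotype_list):
    """Return a 10-key dict for one results row, or None if the row is filtered out."""
    if not COMPLETE.search(genotype_list):
        return None  # mirrors `csvtk grep`: incomplete rows are dropped
    row = {"strain": strain, "genoflu": genotype}
    # Mirrors `csvtk sep -s ", "`: one item per segment.
    for segment, call in zip(SEGMENTS, genotype_list.split(", ")):
        # Mirrors `csvtk replace -p "^(.+):"`: strip the "SEGMENT:" prefix.
        row[f"genoflu_{segment}"] = re.sub(r"^(.+):", "", call)
    return row


if __name__ == "__main__":
    # Illustrative values only; not taken from real GenoFLU output.
    example = parse_row(
        "A/example/1/2024",
        "B3.13",
        "PA:ea1, HA:ea1, PB1:am4, MP:ea1, NA:ea1, PB2:am2.2, NP:am8, NS:am1.1",
    )
    print(example)
```

The resulting keys correspond to the 10 columns listed in the rule's docstring: `strain`, `genoflu`, and one `genoflu_<SEGMENT>` per segment.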

