Refactor post-processing script to be specific to zika strain name fixes
This commit refactors the generic post-processing script to better align
with its specific purpose in Zika ingest. The purpose of this script is
to fix Zika strain names based on historical modifications from the fauna
repo. In summary, this commit makes the following changes:

* Rename script to fix-zika-strain-names.py to match the purpose
* Add a docstring to the script
* Replace the accession argument with a strain field argument, which is
  the field that needs to be fixed
j23414 committed Jan 18, 2024
1 parent aefdec1 commit b734c78
Showing 2 changed files with 5 additions and 7 deletions.
@@ -8,10 +8,10 @@

 def parse_args():
     parser = argparse.ArgumentParser(
-        description="Reformat a NCBI Virus metadata.tsv file for a pathogen build."
+        description="Modify zika strain names by referencing historical modifications from the fauna repo."
     )
-    parser.add_argument("--accession-field", default='accession',
-        help="Field from the records to use as the sequence ID in the FASTA file.")
+    parser.add_argument("--strain-field", default='strain',
+        help="Field from the records to use as the strain name to be fixed.")

     return parser.parse_args()

@@ -48,8 +48,7 @@ def main():

     for index, record in enumerate(stdin):
         record = json.loads(record)
-        record["strain"] = _set_strain_name(record)
-        record["authors"] = record["abbr_authors"]
+        record[args.strain_field] = _set_strain_name(record)
         stdout.write(json.dumps(record) + "\n")

3 changes: 1 addition & 2 deletions ingest/rules/transform.smk
@@ -85,8 +85,7 @@ rule transform:
                 --abbr-authors-field {params.abbr_authors_field} \
             | ./vendored/apply-geolocation-rules \
                 --geolocation-rules {input.all_geolocation_rules} \
-            | ./bin/post_process_metadata.py \
-                --accession-field {params.id_field} \
+            | ./bin/fix-zika-strain-names.py \
             | ./vendored/merge-user-metadata \
                 --annotations {input.annotations} \
                 --id-field {params.annotations_id} \