Python script to create an ENA submission form for generating new species-level taxonomic IDs

SchistoDan/ena-taxid-creation

ena_taxonomy_request.py

Reads [sample2taxid].csv (see the sample-processing repo), keeps only rows where matched_rank != "Species" (rows not matched at species level), renames and reorders columns to match the ENA taxonomy request spreadsheet, and writes the result to a .tsv file to be emailed to ENA for taxid creation.
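A minimal sketch of the filtering step using only the standard library (the script itself may work differently). Only the matched_rank column comes from this README; the other column names in the example input are hypothetical. Because the README writes the value both as "Species" and 'species', the comparison here is case-insensitive, which is an assumption. split_by_rank is a hypothetical helper, not a function from ena_taxonomy_request.py.

```python
import csv
import io


def split_by_rank(csv_text):
    """Split sample2taxid rows into (needs_taxid, species_level) lists.

    Rows whose matched_rank is not species go to the ENA taxonomy request;
    rows already matched at species level go to species_output.csv.
    The case-insensitive comparison is an assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    needs_taxid, species_level = [], []
    for row in reader:
        rank = (row.get("matched_rank") or "").strip().lower()
        if rank == "species":
            species_level.append(row)
        else:
            needs_taxid.append(row)
    return needs_taxid, species_level


if __name__ == "__main__":
    # Hypothetical input: only the matched_rank column is taken from the README.
    example = (
        "Process ID,species,matched_rank\n"
        "P1,Apatania stylata,genus\n"
        "P2,Hypothetica exempla,Species\n"
    )
    to_request, already_species = split_by_rank(example)
    print(len(to_request), len(already_species))
```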

Requires pygbif to be installed in the conda environment; it is used to retrieve GBIF IDs from the GBIF Backbone Taxonomy via the GBIF API.
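A sketch of the GBIF lookup and of formatting the description column. It assumes a pygbif version whose species.name_backbone returns the classic backbone match fields (usageKey, status, confidence, matchType, ...) shown in the inconsistency example further down; check this against the installed version. It also assumes the whole "BGE: [Process ID] https://www.gbif.org/species/[GBIF ID]" string in the example output belongs to the description column, which the flattened example does not make certain. Both function names are hypothetical.

```python
def build_description(process_id, gbif_id):
    """Format the description field as in the example taxonomy_request.tsv.

    Assumes the "BGE: " prefix is part of the description column.
    """
    return f"BGE: {process_id} https://www.gbif.org/species/{gbif_id}"


def lookup_gbif_match(name):
    """Query the GBIF Backbone Taxonomy for a name via pygbif.

    Needs pygbif and network access; imported lazily so the rest of the
    module can be used without it.
    """
    from pygbif import species

    return species.name_backbone(name=name)
```

Usage (requires network): `match = lookup_gbif_match("Apatania stylata")`, then `build_description(process_id, match.get("usageKey"))`.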

usage: python ena_taxonomy_request.py [path/to/sample2taxid.csv] taxonomy_request.tsv species_output.csv

  • path/to/[sample2taxid].csv = path to the user-named output .csv file from the sample-processing repo.
  • taxonomy_request.tsv = .tsv file containing the fields ENA needs to create a new taxonomic ID. It can be given any name; example contents are shown below.
  • species_output.csv = .csv file containing the rows from sample2taxid.csv where matched_rank == 'species'.
proposed_name name_type host project_id description
177658 Apatania stylata BGE: [Process ID] https://www.gbif.org/species/[GBIF ID]
177627 Agapetus iridipennis BGE: [Process ID] https://www.gbif.org/species/[GBIF ID]
177860 Diplectrona meridionalis BGE: [Process ID] https://www.gbif.org/species/[GBIF ID]
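A sketch of writing taxonomy_request.tsv in the column order of the example header above (proposed_name, name_type, host, project_id, description), using csv.DictWriter. Filling missing fields with empty strings and ignoring extra keys are assumptions; write_request_tsv and REQUEST_COLUMNS are hypothetical names.

```python
import csv
import io

# Column order taken from the example taxonomy_request.tsv header.
REQUEST_COLUMNS = ["proposed_name", "name_type", "host", "project_id", "description"]


def write_request_tsv(rows, handle):
    """Write request rows as tab-separated values in ENA column order.

    Missing fields are written as empty strings; keys that are not in
    REQUEST_COLUMNS (e.g. matched_rank) are dropped.
    """
    writer = csv.DictWriter(
        handle,
        fieldnames=REQUEST_COLUMNS,
        delimiter="\t",
        restval="",
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    out = io.StringIO()
    write_request_tsv(
        [{
            "proposed_name": "Apatania stylata",
            "description": "BGE: [Process ID] https://www.gbif.org/species/[GBIF ID]",
            "matched_rank": "genus",  # extra key, ignored
        }],
        out,
    )
    print(out.getvalue())
```

To write to disk, open the file with `open("taxonomy_request.tsv", "w", newline="")` and pass the handle in place of the StringIO.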

Species with inconsistent GBIF IDs are written to gbif_inconsistent.tsv for review. A GBIF ID is flagged as inconsistent if any of the following apply:

  • Multiple synonymous GBIF IDs
  • Match confidence < 95%
  • Status is not 'ACCEPTED'
  • Class != Insecta
  • MatchType != EXACT
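The checks listed above can be sketched as a function over a single GBIF backbone match record, using the field names from the inconsistency example below (usageKey, acceptedUsageKey, status, confidence, matchType, class). Treating an acceptedUsageKey that differs from usageKey as "multiple synonymous GBIF IDs" is an assumption about how the script detects that case; inconsistency_reasons is a hypothetical name.

```python
def inconsistency_reasons(match, min_confidence=95):
    """Return the reasons a GBIF backbone match would be flagged for review.

    An empty list means the match passes every check. A differing
    acceptedUsageKey is assumed to mean "multiple synonymous GBIF IDs".
    """
    reasons = []
    accepted = match.get("acceptedUsageKey")
    if accepted is not None and accepted != match.get("usageKey"):
        reasons.append("multiple synonymous GBIF IDs")
    if int(match.get("confidence") or 0) < min_confidence:
        reasons.append(f"confidence < {min_confidence}")
    if match.get("status") != "ACCEPTED":
        reasons.append("status is not ACCEPTED")
    if match.get("class") != "Insecta":
        reasons.append("class is not Insecta")
    if match.get("matchType") != "EXACT":
        reasons.append("matchType is not EXACT")
    return reasons


if __name__ == "__main__":
    # Fields from the Erotesis melanella example record in this README.
    example = {
        "usageKey": 8753555, "acceptedUsageKey": 1436745, "status": "SYNONYM",
        "confidence": 98, "matchType": "EXACT", "class": "Insecta",
    }
    print(inconsistency_reasons(example))
```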

taxonomy_request.tsv is then emailed to ENA to request creation of species-level taxIDs.

TO DO

  • Figure out what to do when GBIF IDs are inconsistent.
  • Parse the new taxIDs created by ENA to a file. It is currently unclear how ENA will return new taxIDs after creation, and how to get them into the ENA sample registration form for sample accession number creation.

GBIF ID inconsistency example:

  usageKey: 8753555
  scientificName: Erotesis melanella McLachlan, 1884
  canonicalName: Erotesis melanella
  rank: SPECIES
  status: SYNONYM
  confidence: 98
  matchType: EXACT
  kingdom: Animalia
  phylum: Arthropoda
  order: Trichoptera
  family: Leptoceridae
  genus: Adicella
  species: Adicella melanella
  kingdomKey: 1
  phylumKey: 54
  classKey: 216
  orderKey: 1003
  familyKey: 4395
  genusKey: 1436670
  speciesKey: 1436745
  synonym: True
  class: Insecta
  index: 5
  acceptedUsageKey: 1436745
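The queried name has status SYNONYM, so two GBIF IDs refer to the same species:
  • Erotesis melanella McLachlan, 1884 == 8753555 (usageKey)
  • Adicella melanella (McLachlan, 1884) == 1436745 (acceptedUsageKey)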
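When reviewing gbif_inconsistent.tsv, the two IDs in the example above can be pulled out per row. This sketch assumes the file is tab-separated and uses the column names shown in the example record; values are compared as strings, so IDs written in a different numeric format (e.g. "1436745.0") would need normalising first. synonym_pairs is a hypothetical helper.

```python
import csv
import io


def synonym_pairs(tsv_text):
    """Return (scientificName, usageKey, species, acceptedUsageKey) tuples.

    Only rows whose acceptedUsageKey is present and differs from usageKey
    are returned, i.e. names that resolve to a different accepted taxon.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    pairs = []
    for row in reader:
        accepted = (row.get("acceptedUsageKey") or "").strip()
        if accepted and accepted != row["usageKey"]:
            pairs.append(
                (row["scientificName"], row["usageKey"], row.get("species", ""), accepted)
            )
    return pairs


if __name__ == "__main__":
    # Subset of the columns from the example record above.
    text = (
        "usageKey\tscientificName\tspecies\tacceptedUsageKey\n"
        "8753555\tErotesis melanella McLachlan, 1884\tAdicella melanella\t1436745\n"
    )
    for pair in synonym_pairs(text):
        print(pair)
```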
