NCBI’s geolocation field has the format country:subdivision. According to their documentation, it should be possible to further split subdivision into geoLocAdmin1 and geoLocAdmin2. However, for a large number of samples the division field does not follow this format. Currently we map country to our geoLocCountry field and the entire division field to geoLocAdmin1. We would like to be able to at least split this into geoLocAdmin1 and geoLocAdmin2.
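A minimal sketch of the intended split, for illustration only. It assumes that the admin levels inside the subdivision are separated by a comma and that the first one is the higher level; as noted above, many records do not follow this. The function name split_ncbi_geo_location and the example values are hypothetical and not taken from the ingest code.

```python
from typing import NamedTuple, Optional


class GeoLocation(NamedTuple):
    geoLocCountry: Optional[str]
    geoLocAdmin1: Optional[str]
    geoLocAdmin2: Optional[str]


def split_ncbi_geo_location(raw: str) -> GeoLocation:
    """Split 'country:subdivision' and, where possible, the subdivision itself."""
    # Everything before the first colon is the country; the rest is the division.
    country, _, division = raw.partition(":")

    # Assumption: admin levels inside the division are comma-separated,
    # with the higher level (admin1) listed first.
    parts = [part.strip() for part in division.split(",", 1)]
    admin1 = parts[0] or None
    admin2 = None
    if len(parts) > 1 and parts[1]:
        admin2 = parts[1]

    return GeoLocation(country.strip() or None, admin1, admin2)


if __name__ == "__main__":
    # Illustrative inputs only, not real records.
    for value in ["USA: California, San Diego", "Germany:Bavaria", "Kenya"]:
        print(value, "->", split_ncbi_geo_location(value))
```

Records whose division has no comma keep the current behaviour: the whole division ends up in geoLocAdmin1 and geoLocAdmin2 stays empty.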
anna-parker changed the title from "At the moment we just split ncbi's ncbiGeoLocation field into division = geoLocAdmin1 and country=geoLocCountry. The division could still be further split into geoLocAdmin1 and geoLocAdmin2, e.g." to "Curate NCBI geolocation metadata" on Oct 29, 2024
The most important long-term, high-level decision is what to normalize to. Ideally we normalize to an established gazetteer that unambiguously allows pulling things like coordinates, and potentially other metadata, for free. This means we'd possibly want to normalize to an ID instead of our current four location fields (these would then be derived semi-automatically from the entity the ID points to).
Another key decision is what we mean by admin level 1/2 and city for each country.
A target gazetteer should be open, allow dump downloads (to be free of rate limits), have a track record of stability, and be maintained.
GeoNames is one option, OpenStreetMap another.
The actual how-to should be an implementation detail. That isn't to devalue it, just to be careful not to let a convenient implementation drive high-level, permanent decisions.
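To make the "normalize to an ID" idea concrete, here is a rough sketch of how the location fields could be derived from a single gazetteer entity. Everything in it is an assumption for illustration: the GazetteerEntity shape, the IDs, the placeholder names and coordinates, and the geoLocCity field name are hypothetical and do not reflect the GeoNames or OpenStreetMap schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class GazetteerEntity:
    entity_id: str
    name: str
    level: str  # one of "country", "admin1", "admin2", "city"
    parent_id: Optional[str]
    latitude: float
    longitude: float


# Hypothetical mapping from gazetteer level to our location fields.
FIELD_FOR_LEVEL = {
    "country": "geoLocCountry",
    "admin1": "geoLocAdmin1",
    "admin2": "geoLocAdmin2",
    "city": "geoLocCity",  # assumed name for the fourth field
}


def derive_location_fields(
    entity_id: str, gazetteer: Dict[str, GazetteerEntity]
) -> Dict[str, str]:
    """Walk from the entity up through its parents and fill one field per level."""
    fields: Dict[str, str] = {}
    current = gazetteer.get(entity_id)
    while current is not None:
        fields[FIELD_FOR_LEVEL[current.level]] = current.name
        current = gazetteer.get(current.parent_id) if current.parent_id else None
    return fields


if __name__ == "__main__":
    # Placeholder entries, not real gazetteer data.
    gazetteer = {
        "ex:1": GazetteerEntity("ex:1", "Exampleland", "country", None, 0.0, 0.0),
        "ex:2": GazetteerEntity("ex:2", "North Province", "admin1", "ex:1", 0.0, 0.0),
        "ex:3": GazetteerEntity("ex:3", "River District", "admin2", "ex:2", 0.0, 0.0),
    }
    print(derive_location_fields("ex:3", gazetteer))
```

With this shape a record only stores the ID of the most specific entity it refers to, and the country and admin fields follow from the parent chain.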
This curation was also requested by users.
Discussion
Following discussion in the group, these changes should actually be made by a curator bot:
https://docs.google.com/document/d/1TQiE66Hk6WjgkMvvMMhu8uMSEhHFP2AwINMAIsyODCw/edit
Proof of concept of suggestions in ingest
#3015
#3026