API for making bulk attributions #3

Open
abubelinha opened this issue Mar 4, 2023 · 2 comments

Comments

@abubelinha

abubelinha commented Mar 4, 2023

I have just discovered this repository, and I gather this API is for retrieving data from Bionomia (which is great).

But I wonder whether there is any way to use an API for making attributions, instead of having to browse the bionomia.net website and attribute records manually.

I don't mean bulk attributions for a GBIF dataset curated by me, which I can already bulk-attribute by populating it with ORCID & wikidata IDs and then republishing to GBIF.

I mean that I know of many other GBIF records which I do not publish to GBIF myself, for example:

  • Duplicates of specimens from the collections I curate, stored in collections at other institutions.
  • Other GBIF specimens collected or identified by people whose ORCIDs I already know (mostly because these people also appear on labels in my institution's collections). If the data provider is not supplying people IDs, I could do it.

Is there any way that authenticated users can somehow upload structured data like this to bionomia.net?

gbifID       recordedById         identifiedById
3031150476   Q6117770,Q11703496
3829872920   Q11703496
1935887248   Q5707146             Q6117770
...          ...                  ...

I suppose it wouldn't do any harm to re-attribute data that have already been attributed by other people.
If that ever happened, you could simply keep the original attribution.
(Also, if the two attributions differ, you could take advantage of that to detect possible errors.)

Thanks

@dshorthouse
Member

dshorthouse commented Mar 5, 2023

Thanks @abubelinha for the idea. What you'd like is a write API, OR an interface to upload a crafted csv file structured similarly to the table you included here, and/or a textarea box to paste records into a web form. This is interesting. I'd not likely create a write API because this would require sophisticated authentication. Uploading a csv or pasting into a large textarea is considerably easier.

In any case, we'd first want to follow the recommendations for how to construct an entry for recordedByID or identifiedByID as URIs separated by pipes. I'd also have to resolve each of the entries on-the-fly because Bionomia might not know anything about them, and so would have to first create the person via calls to ORCID or wikidata. In principle, it's no different from what I already do prior to bi-monthly data refreshes from GBIF. In that case, I use a processing queue on my laptop with a csv export at the end, from which I then make bulk attributions. My structure is a pivot from yours because of this need to call wikidata and ORCID: I have a column of unique identifiers for people, with two other columns containing comma-separated values of gbifIDs for the collected and identified actions.
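
To illustrate that pivot, here's a rough sketch (Python; the file and column names are made up for this example, not Bionomia's actual schema or code):

```python
import csv
from collections import defaultdict

# Sketch only: pivot a gbifID-per-row table (with pipe-separated person
# identifiers in recordedById / identifiedById) into a person-per-row table
# holding comma-separated gbifIDs for the "recorded" and "identified" actions.
# "attributions.csv" and the column headers are illustrative.

recorded = defaultdict(set)    # person identifier -> gbifIDs they collected
identified = defaultdict(set)  # person identifier -> gbifIDs they identified

with open("attributions.csv", newline="") as f:
    for row in csv.DictReader(f):
        gbif_id = row["gbifID"].strip()
        for person in (row.get("recordedById") or "").split("|"):
            if person.strip():
                recorded[person.strip()].add(gbif_id)
        for person in (row.get("identifiedById") or "").split("|"):
            if person.strip():
                identified[person.strip()].add(gbif_id)

with open("pivoted.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["personId", "recorded_gbifIDs", "identified_gbifIDs"])
    for person in sorted(set(recorded) | set(identified)):
        writer.writerow([
            person,
            ",".join(sorted(recorded[person])),
            ",".join(sorted(identified[person])),
        ])
```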

In your case, you're likely to have heaps of identical entries for Q numbers or ORCID IDs in your recordedByID and identifiedByID columns and, if I were to attempt to atomize these rows in a processing queue, I'd likely hammer ORCID or wikidata if Bionomia knows nothing yet of those identifiers and must first create them. The likelihood of that has been attenuating to nil because Bionomia has become more and more thorough as time marches on.
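
If I did try it, the queue would at minimum deduplicate the identifiers and resolve each one exactly once, something like this sketch (lookup_person is a placeholder for "use the existing Bionomia person if known, otherwise create it via a single ORCID or wikidata call"):

```python
# Sketch: collect every distinct person identifier from the uploaded rows,
# then resolve each one only once, so ORCID/wikidata are never queried
# repeatedly for the same Q number or ORCID ID.

def unique_identifiers(rows):
    seen = set()
    for row in rows:
        for column in ("recordedById", "identifiedById"):
            for person in (row.get(column) or "").split("|"):
                if person.strip():
                    seen.add(person.strip())
    return seen

def resolve_all(rows, lookup_person):
    # lookup_person(identifier) stands in for the real resolution step
    return {identifier: lookup_person(identifier) for identifier in unique_identifiers(rows)}
```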

I'll ponder this and try a few things in development to see what might be feasible. At the very least we'd probably want some form of processing queue with real-time feedback on each row, somewhat like QuickStatements for wikidata, if you're familiar with that one. Or, there'd be a report at the completion of an upload/paste with sufficient information to describe what worked and what did not (for whatever reason).
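
For the completion report, the shape might be something like this sketch (validate_row and apply_attribution are placeholders, not existing Bionomia code):

```python
# Sketch: process each uploaded row and keep a per-row result, so the
# uploader gets a report of what was attributed and what was skipped (and why).

def process_upload(rows, validate_row, apply_attribution):
    report = []
    for line_number, row in enumerate(rows, start=1):
        problem = validate_row(row)  # e.g. unknown identifier, malformed gbifID
        if problem:
            report.append({"row": line_number, "status": "skipped", "reason": problem})
            continue
        apply_attribution(row)
        report.append({"row": line_number, "status": "attributed", "reason": ""})
    return report
```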

@abubelinha
Author

abubelinha commented Mar 5, 2023

> Uploading a csv or pasting into a large textarea is considerably easier.

I wonder if it would be even easier for us to provide our data as a remote csv file that Bionomia can regularly read and process.

You could add a new textbox (in the Bionomia user profile) where users can optionally enter a URL for such a file (next to that textbox you could also provide a link to an example file, so users can just replicate its structure).

Optionally, we could identify attributors (other than the logged-in Bionomia user) by simply providing their ORCID in an additional column on the right.

gbifID       recordedById           identifiedById   attributor_ORCID*
3031150476   Q6117770 | Q11703496                    0000-0001-7618-5230
3829872920   Q11703496                               0000-0001-7618-5230
1935887248   Q5707146               Q6117770         0000-0001-7618-5230
...          ...                    ...              ...

That last column* wouldn't be strictly necessary (in my case I will probably be the only attributor of my file's records).
I was just thinking that these attribution csv files might also be created as a collaborative task (e.g. several students during a class, or different staff members of the same museum), and it would be easier if a team leader could provide Bionomia with all the data in a single file.

Of course you could then process this file at any time, checking first for the people's IDs found in the recordedById and identifiedById columns.
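
To make the idea more concrete, here is roughly what I imagine Bionomia doing with such a remote file (just a sketch; the URL and column names are only examples):

```python
import csv
import io
import urllib.request

# Sketch of the remote-file idea: periodically fetch the csv from the URL a
# user saved in their profile and read its rows, including the optional
# attributor_ORCID column.

def fetch_attribution_rows(url):
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "gbifID": (row.get("gbifID") or "").strip(),
            "recordedById": (row.get("recordedById") or "").strip(),
            "identifiedById": (row.get("identifiedById") or "").strip(),
            "attributor_ORCID": (row.get("attributor_ORCID") or "").strip(),
        })
    return rows

# Example (hypothetical URL):
# rows = fetch_attribution_rows("https://example.org/my-bionomia-attributions.csv")
```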
