API for making bulk attributions #3

Open
abubelinha opened this issue Mar 4, 2023 · 2 comments

Comments

@abubelinha

abubelinha commented Mar 4, 2023

I have just discovered this repository, and I gather this API is for retrieving data from Bionomia (which is great).

But I wonder whether there is any way to use an API for making attributions, instead of having to browse the bionomia.net website and attribute records manually.

I don't mean bulk attributions for a GBIF dataset curated by me, which I can already bulk-attribute by populating it with ORCID & wikidata IDs and then republishing to GBIF.

I mean that I know of many other GBIF records which I do not publish to GBIF myself, for example:

  • Duplicates of specimens from the collections I curate, stored in collections at other institutions.
  • Other GBIF specimens collected or identified by people whose ORCIDs I already know (mostly because these people also appear on labels in my institution's collections). If the data provider is not supplying people IDs, I could do it.

Is there any way that authenticated users can somehow upload structured data like this to bionomia.net?

gbifID       recordedById         identifiedById
3031150476   Q6117770,Q11703496
3829872920   Q11703496
1935887248   Q5707146             Q6117770
...          ...                  ...

I suppose it wouldn't do any harm to re-attribute data that have already been attributed by other people.
If that ever happened, you could simply keep the original attribution.
(Also, if the two attributions differ, you could take advantage of that to detect possible errors.)

Thanks

@dshorthouse
Member

dshorthouse commented Mar 5, 2023

Thanks @abubelinha for the idea. What you'd like is a write API, OR an interface to upload a crafted csv file structured similarly to the table you included here, and/or a textarea box to paste records into a web form. This is interesting. I'd not likely create a write API because this would require sophisticated authentication. Uploading a csv or pasting into a large textarea is considerably easier.

In any case, we'd first want to follow the recommendations for how to construct an entry for recordedByID or identifiedByID as URIs separated by pipes. I'd also have to resolve each of the entries on-the-fly because Bionomia might not know anything about them, and so would have to first create the person via calls to ORCID or wikidata. In principle, it's no different from what I already do prior to bi-monthly data refreshes from GBIF. In that case, I use a processing queue on my laptop with a csv export at the end, from which I then make bulk attributions. My structure is a pivot from yours because of this need to call wikidata and ORCID: I have a column of unique identifiers for people, with two other columns containing comma-separated values of gbifIDs for the collected and identified actions.
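
To illustrate that pivot, here's a rough sketch (Python; the file and column names are made up for this example, not Bionomia's actual schema or code):

```python
import csv
from collections import defaultdict

# Sketch only: pivot a gbifID-per-row table (with pipe-separated person
# identifiers in recordedById / identifiedById) into a person-per-row table
# holding comma-separated gbifIDs for the "recorded" and "identified" actions.
# "attributions.csv" and the column headers are illustrative.

recorded = defaultdict(set)    # person identifier -> gbifIDs they collected
identified = defaultdict(set)  # person identifier -> gbifIDs they identified

with open("attributions.csv", newline="") as f:
    for row in csv.DictReader(f):
        gbif_id = row["gbifID"].strip()
        for person in (row.get("recordedById") or "").split("|"):
            if person.strip():
                recorded[person.strip()].add(gbif_id)
        for person in (row.get("identifiedById") or "").split("|"):
            if person.strip():
                identified[person.strip()].add(gbif_id)

with open("pivoted.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["personId", "recorded_gbifIDs", "identified_gbifIDs"])
    for person in sorted(set(recorded) | set(identified)):
        writer.writerow([
            person,
            ",".join(sorted(recorded[person])),
            ",".join(sorted(identified[person])),
        ])
```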

In your case, you're likely to have heaps of identical entries for Q numbers or ORCID IDs in your recordedByID and identifiedByID columns and, if I were to attempt to atomize these rows in a processing queue, I'd likely hammer ORCID or wikidata if Bionomia knows nothing yet of those identifiers and must first create them. The likelihood of that has been attenuating to nil because Bionomia has become more and more thorough as time marches on.
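
If I did try it, the queue would at minimum deduplicate the identifiers and resolve each one exactly once, something like this sketch (lookup_person is a placeholder for "use the existing Bionomia person if known, otherwise create it via a single ORCID or wikidata call"):

```python
# Sketch: collect every distinct person identifier from the uploaded rows,
# then resolve each one only once, so ORCID/wikidata are never queried
# repeatedly for the same Q number or ORCID ID.

def unique_identifiers(rows):
    seen = set()
    for row in rows:
        for column in ("recordedById", "identifiedById"):
            for person in (row.get(column) or "").split("|"):
                if person.strip():
                    seen.add(person.strip())
    return seen

def resolve_all(rows, lookup_person):
    # lookup_person(identifier) stands in for the real resolution step
    return {identifier: lookup_person(identifier) for identifier in unique_identifiers(rows)}
```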

I'll ponder this and try a few things in development to see what might be feasible. At the very least we'd probably want some form of processing queue with real-time feedback on each row, somewhat like QuickStatements for wikidata, if you're familiar with that one. Or, there'd be a report at the completion of an upload/paste with sufficient information to describe what worked and what did not (for whatever reason).
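
For the completion report, the shape might be something like this sketch (validate_row and apply_attribution are placeholders, not existing Bionomia code):

```python
# Sketch: process each uploaded row and keep a per-row result, so the
# uploader gets a report of what was attributed and what was skipped (and why).

def process_upload(rows, validate_row, apply_attribution):
    report = []
    for line_number, row in enumerate(rows, start=1):
        problem = validate_row(row)  # e.g. unknown identifier, malformed gbifID
        if problem:
            report.append({"row": line_number, "status": "skipped", "reason": problem})
            continue
        apply_attribution(row)
        report.append({"row": line_number, "status": "attributed", "reason": ""})
    return report
```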

@abubelinha
Author

abubelinha commented Mar 5, 2023

> Uploading a csv or pasting into a large textarea is considerably easier.

I wonder if it would be even easier for us to provide our data as a remote csv file that Bionomia can regularly read and process.

You could add a new textbox (in the Bionomia user profile) where users can optionally enter a URL for such a file (next to that textbox you could also provide a link to an example file, so users can just replicate its structure).

Optionally, we could identify attributors (other than the logged-in Bionomia user) by simply providing their ORCID in an additional column on the right.

gbifID       recordedById           identifiedById   attributor_ORCID*
3031150476   Q6117770 | Q11703496                    0000-0001-7618-5230
3829872920   Q11703496                               0000-0001-7618-5230
1935887248   Q5707146               Q6117770         0000-0001-7618-5230
...          ...                    ...              ...

That last column* wouldn't be strictly necessary (in my case I will probably be the only attributor of my file's records).
I was just thinking that these attribution csv files might also be created as a collaborative task (e.g. several students during a class, or different staff members of the same museum), and it would be easier if a team leader could provide Bionomia with all the data in a single file.

Of course you could then process this file at any time, checking first for the people's IDs found in the recordedById and identifiedById columns.
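
To make the idea more concrete, here is roughly what I imagine Bionomia doing with such a remote file (just a sketch; the URL and column names are only examples):

```python
import csv
import io
import urllib.request

# Sketch of the remote-file idea: periodically fetch the csv from the URL a
# user saved in their profile and read its rows, including the optional
# attributor_ORCID column.

def fetch_attribution_rows(url):
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "gbifID": (row.get("gbifID") or "").strip(),
            "recordedById": (row.get("recordedById") or "").strip(),
            "identifiedById": (row.get("identifiedById") or "").strip(),
            "attributor_ORCID": (row.get("attributor_ORCID") or "").strip(),
        })
    return rows

# Example (hypothetical URL):
# rows = fetch_attribution_rows("https://example.org/my-bionomia-attributions.csv")
```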
