Duplicate gbifIDs in BIONOMIA occurrence download #336
Comments
Related? #267
For reference, this one https://doi.org/10.15468/dl.8b63cr (using the same query as https://doi.org/10.15468/dl.emvv7z) has no duplicates.
Much appreciated @MattBlissett @timrobertson100 if you've any insight. It's a blocker for a scheduled refresh at my end.
In this case I don't think #267 is related. There is a discrepancy related to these datasets: https://registry.gbif.org/dataset/6aeebd1a-c3ad-4bc5-bdfe-24de0e2e9052 It looks like a migration intended to keep identifiers stable has instead made a mess, and we now have the same identifier used for occurrences in different datasets, although it's essentially the same occurrence. I'll add some additional monitoring, with a daily check. @ManonGros, could you work out what is supposed to have happened?
For giggles @ManonGros @MattBlissett, I tried again https://doi.org/10.15468/dl.zcyyzs but still see duplicate gbifIDs.
@MattBlissett The publishers sent a list of occurrenceIDs to be transferred to different datasets. I divided the list by destination dataset and ran the script. I am not sure exactly what happened, but I am happy to show you which files I used.
@dshorthouse this should be fixed now. Could you let us know if this seems ok to you? Thanks!
Thanks for the work on this! I triggered a new download. And, I've also found a work-around by making use of
Good to go! Thanks for the fixes.
A BIONOMIA download, https://doi.org/10.15468/dl.emvv7z (see https://github.com/gbif/occurrence/tree/master/occurrence-download/src/main/resources/download-workflow/bionomia) contains a heap of duplicate gbifIDs and I'm not sure how this was possible. I thought perhaps my logic in this relatively new request (for BIONOMIA) by sloughing occurrenceStatus == ABSENT was at fault so I also did https://doi.org/10.15468/dl.b7hqhu. However, it too has a heap of duplicate gbifIDs. And so, I'm at a loss. Is there something odd happening in the production of these downloads that explains the duplicate records & that can be repaired at your end?
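For anyone wanting to confirm the same symptom in their own copy of a download, here is a minimal sketch of a duplicate check. It assumes the download has first been exported to a tab-delimited text file with a `gbifID` column, which may not match the native BIONOMIA format; the function names and the sample rows are hypothetical and not taken from the GBIF codebase.

```python
import csv
import io
from collections import Counter


def find_duplicate_ids(rows, id_field="gbifID"):
    """Return {id: count} for every id that appears more than once."""
    counts = Counter(row[id_field] for row in rows)
    return {gbif_id: n for gbif_id, n in counts.items() if n > 1}


def find_duplicates_in_tsv(handle, id_field="gbifID"):
    """Read a tab-delimited file object and report repeated ids."""
    # QUOTE_NONE: treat quote characters in field values as literal text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return find_duplicate_ids(reader, id_field)


# Made-up sample: the same gbifID listed under two different datasets.
sample = (
    "gbifID\tdatasetKey\n"
    "1001\tdataset-a\n"
    "1002\tdataset-a\n"
    "1001\tdataset-b\n"
)
duplicates = find_duplicates_in_tsv(io.StringIO(sample))
print(duplicates)
```

For a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`. An empty result means no gbifID is repeated.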