Problem with References: only Author present, no Year, no Details #1081

Closed
yroskov opened this issue Dec 14, 2021 · 6 comments

@yroskov

yroskov commented Dec 14, 2021

Checking data on PREVIEW (CoL of 2021-12-13): https://preview.catalogueoflife.org/

Both World Ferns & World Plants have incomplete references. Only Author present, no Year, no Details.

CatalogueOfLife/testing#4 (comment)
CatalogueOfLife/testing#3 (comment)

@mdoering, Geoff said it is a backend problem: https://data.catalogueoflife.org/dataset/1141/verbatim?type=col%3AReference&col:ID=455468&termOp=OR


@mdoering
Member

Hm, was the original data always so sparse?
There is now just an author and a citation column:

col:citation = Benth. 1837. Enum. Pl. Hueg. 53.
col:author = Benth.

If atomised fields are given, they are trusted over the citation string, just as we do for scientific names.
So the resulting citation only has the author, because the rest is missing.
If nothing but the author can be extracted, I would say it is best not to include the author column at all and to use just the citation string.
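The fallback suggested above can be sketched roughly as follows. This is only an illustration of the idea, not the importer's actual code: the function name `display_citation`, the field list, and the placeholder join used for atomised fields are all assumptions.

```python
# Hypothetical field list; the names are the ones that come up in this thread.
ATOMISED_FIELDS = ("author", "title", "issued", "containerTitle",
                   "volume", "issue", "page")

def display_citation(record):
    """Pick the text to show for a reference record (a plain dict)."""
    atomised = {f: record[f] for f in ATOMISED_FIELDS if record.get(f)}
    citation = record.get("citation")
    # If nothing beyond the author was atomised, the full citation string
    # carries more information, so prefer it.
    if citation and set(atomised) <= {"author"}:
        return citation
    if atomised:
        # Placeholder join only, not a real citation style.
        return ". ".join(atomised[f] for f in ATOMISED_FIELDS if f in atomised)
    return citation or ""

record = {"citation": "Benth. 1837. Enum. Pl. Hueg. 53.", "author": "Benth."}
print(display_citation(record))  # Benth. 1837. Enum. Pl. Hueg. 53.
```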

Well, but this is weird. The Reference.csv file does contain other columns:
| ID | citation | author | title | year | source | doi | link |
|---|---|---|---|---|---|---|---|
| 455468 | Benth. 1837. Enum. Pl. Hueg. 53. | Benth. | | 1837 | In: Enum. Pl. Hueg. 53 | | |

The final ColDP format does not support year and source. Instead there are issued and containerTitle; the latter holds the journal title only (not the volume etc., so source does not strictly match it).

I will update the importer to also recognize year and source to be backwards compatible and map those columns to issued and containerTitle even though this is not strictly right.

@gdower it would be good to adjust your existing scripts to the final ColDP version.

We should really build a validator that primarily checks the syntactic structure of a ColDP archive and reports unrecognized columns etc.
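A very small piece of such a validator, the unrecognized-column check, might look like the sketch below. It is an assumption-laden illustration: `unrecognized_columns` is a hypothetical helper, and the set of known columns is only the subset of Reference columns named in this thread, not the full list from the ColDP README.

```python
import csv
import io

# Illustrative subset of Reference columns mentioned in this thread;
# the authoritative list in the ColDP README is longer.
KNOWN_REFERENCE_COLUMNS = {
    "ID", "citation", "author", "title", "issued",
    "containerTitle", "volume", "issue", "page", "doi", "link",
}

def unrecognized_columns(header_line, delimiter=","):
    """Return header names not in KNOWN_REFERENCE_COLUMNS, in file order."""
    reader = csv.reader(io.StringIO(header_line), delimiter=delimiter)
    header = next(reader, [])
    return [name for name in header
            if name.strip() not in KNOWN_REFERENCE_COLUMNS]

# The legacy, ACEF-based header quoted in this thread.
legacy_header = "ID,author,title,year,source,details,link"
print(unrecognized_columns(legacy_header))  # ['year', 'source', 'details']
```

A real validator would also need to handle the other ColDP files, term prefixes such as `col:`, and tab-delimited input.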

@mdoering
Member

@dhobern I found that your Lepidoptera archives also still use last year's reference format, which was strongly based on ACEF.
The previous columns were:

ID,author,title,year,source,details,link
Could you update to the latest version, which is more aligned with BibTeX and other reference standards?
https://github.com/CatalogueOfLife/coldp/blob/master/README.md#reference

Namely year, source and details are the 3 columns which have been replaced by others.

@dhobern

dhobern commented Dec 19, 2021

So, will I be correct if I keep it simple and change as follows?

  • year --> issued
  • source --> containerTitle
  • details --> page

I may be able to split details into issue and page, but I'll check whether the data is clean enough.
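The simple rename proposed above could be scripted along these lines. This is a minimal sketch, not anyone's actual conversion script: `rename_legacy_columns` and `LEGACY_TO_COLDP` are hypothetical names, and rows are assumed to be plain dicts such as those produced by `csv.DictReader`. Splitting details into issue and page is left out because it depends on how clean the data is.

```python
# Best-guess mapping from the legacy ACEF-style columns to current ColDP names.
LEGACY_TO_COLDP = {
    "year": "issued",
    "source": "containerTitle",
    "details": "page",
}

def rename_legacy_columns(row):
    """Return a copy of row with legacy keys renamed.

    If a row already has a non-empty value under the current name,
    that value is kept and the legacy value is dropped.
    """
    out = {k: v for k, v in row.items() if k not in LEGACY_TO_COLDP}
    for old, new in LEGACY_TO_COLDP.items():
        if old in row and not out.get(new):
            out[new] = row[old]
    return out

# Row values taken from the Reference.csv record quoted earlier in the thread.
row = {"ID": "455468", "author": "Benth.", "year": "1837",
       "source": "In: Enum. Pl. Hueg. 53"}
print(rename_legacy_columns(row))
```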

@mdoering
Member

Yes. That's what I have now instructed the importer to do as well, as a best guess when it encounters these old ACEF terms in ColDP.
Obviously "details" is a catch-all that can contain all sorts of things: page(s), volume, issue. You probably know your data best and can evaluate whether it can be split into several terms. volume and issue would be great for ultimately locating an article; they are key, if I understand Rod Page and others correctly.

issued is an ISO date that can be truncated, so all these are accepted: 1878, 1878-03, 1878-03-12
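A quick way to check values before publishing is a pattern for the three accepted shapes. This is a sketch under the assumption that only `YYYY`, `YYYY-MM` and `YYYY-MM-DD` are allowed; `is_valid_issued` is a hypothetical helper, and it does not verify that the day exists in the given month (1878-02-31 would pass).

```python
import re

# Year, optionally followed by -MM, optionally followed by -DD.
ISSUED_PATTERN = re.compile(
    r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?"
)

def is_valid_issued(value):
    """True if value is a full or truncated ISO date string."""
    return ISSUED_PATTERN.fullmatch(value) is not None

for candidate in ("1878", "1878-03", "1878-03-12", "1878-13", "March 1878"):
    print(candidate, is_valid_issued(candidate))
```

The first three candidates are accepted and the last two are rejected.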

@dhobern

dhobern commented Dec 21, 2021

Fixed as issued, containerTitle, volume, issue, page for the Alucitoidea and Pterophoroidea datasets.

@yroskov
Author

yroskov commented Jan 14, 2022

FIXED (preview 2022-01-12).

yroskov closed this as completed Jan 14, 2022