
Allow CITATION.cff as alternative to Authors field in dataset_description #901

Closed
Remi-Gau opened this issue Oct 15, 2021 · 24 comments · Fixed by #1525
Labels
discussion ongoing discussion

Comments

@Remi-Gau
Collaborator

CITATION.cff can be used for citing software or datasets.

Would it make sense to allow them officially in a BIDS dataset? What do you all think?

Its content would be partly redundant with dataset_description.json, so it might require validation for internal consistency.


Links

https://citation-file-format.github.io/

https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files

https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-citation-files#citing-a-dataset

Remi-Gau added the discussion label Oct 15, 2021
@tsalo
Member

tsalo commented Oct 19, 2021

Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

@CPernet
Collaborator

CPernet commented Oct 20, 2021

I thought that was used for software only? i.e., we should have one in our BIDS repo.

@Remi-Gau
Collaborator Author

> i thought that is used for software only? ie we should have one in our BIDS repo

created an example

https://github.com/Remi-Gau/cff_example_data

YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_data (Version 1.0.0) [Data set]. https://doi.org/10.5281/zenodo.1234

```bibtex
@misc{YOUR_NAME_HERE_cff_example_data_2021,
author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
doi = {10.5281/zenodo.1234},
month = {10},
title = {{cff_example_data}},
url = {https://github.com/Remi-Gau/cff_example_data},
year = {2021}
}
```
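GitHub derives entries like the one above from the CFF metadata. As a rough illustration only (this is not GitHub's actual converter), the Python sketch below renders a similar `@misc` entry from an already-parsed CITATION.cff mapping. The helper name `cff_to_bibtex`, reading the link from the `url` key, and the release day in the example input are assumptions made for this sketch.

```python
def cff_to_bibtex(cff: dict, key: str, entry_type: str = "misc") -> str:
    """Render a BibTeX entry from a parsed CITATION.cff mapping (hypothetical helper)."""
    # BibTeX writes each person as "Family, Given" and separates people with " and ".
    author = " and ".join(
        f"{person['family-names']}, {person['given-names']}"
        for person in cff["authors"]
    )
    # CFF stores the release date as an ISO string such as "2021-10-15".
    year, month, _day = cff["date-released"].split("-")
    fields = {
        "author": author,
        "doi": cff.get("doi"),
        "month": str(int(month)),  # int() drops a leading zero, e.g. "09" -> "9"
        "title": "{" + cff["title"] + "}",  # extra braces stop BibTeX from re-casing the title
        "url": cff.get("url"),
        "year": year,
    }
    body = ",\n".join(
        f"  {name} = {{{value}}}" for name, value in fields.items() if value is not None
    )
    return f"@{entry_type}{{{key},\n{body}\n}}"


# Example input mirroring the cff_example_data repository (the release day is made up).
example = {
    "title": "cff_example_data",
    "authors": [
        {"family-names": "YOUR_NAME_HERE", "given-names": "YOUR_NAME_HERE"},
        {"family-names": "Lisa", "given-names": "Mona"},
    ],
    "doi": "10.5281/zenodo.1234",
    "date-released": "2021-10-15",
    "url": "https://github.com/Remi-Gau/cff_example_data",
}
print(cff_to_bibtex(example, "YOUR_NAME_HERE_cff_example_data_2021"))
```

Running it prints an entry with the same fields and values as the one above, indented by two spaces.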

> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Testing things here

https://github.com/Remi-Gau/cff_example_software

YOUR_NAME_HERE, Y., & Lisa, M. (2021). cff_example_software (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.1234

```bibtex
@software{YOUR_NAME_HERE_cff_example_software_2021,
author = {YOUR_NAME_HERE, YOUR_NAME_HERE and Lisa, Mona},
doi = {10.5281/zenodo.1234},
month = {10},
title = {{cff_example_software}},
url = {https://github.com/Remi-Gau/cff_example_software},
version = {1.0.0},
year = {2021}
}
```

@Remi-Gau
Collaborator Author

> Do you know if CITATION.cff can include multiple citations? E.g., citing the versioned dataset and a data paper?

Updated the software example to use the preferred citation feature.
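To illustrate the use case of a versioned dataset plus a data paper, here is a hypothetical CITATION.cff content written as a Python dict, assuming CFF 1.2.0: the top-level keys describe the dataset and `preferred-citation` points to the paper. The titles, the journal, and the `works_to_cite` helper are made up for illustration.

```python
# Hypothetical CITATION.cff content (CFF 1.2.0) expressed as a Python dict.
cff = {
    "cff-version": "1.2.0",
    "message": "If you use this dataset, please cite the data paper.",
    "type": "dataset",
    "title": "cff_example_data",
    "version": "1.0.0",
    "authors": [{"family-names": "Lisa", "given-names": "Mona"}],
    # The work that citation tools are asked to cite instead of the dataset itself.
    "preferred-citation": {
        "type": "article",
        "title": "A hypothetical data paper",
        "journal": "Hypothetical Journal",
        "year": 2021,
        "authors": [{"family-names": "Lisa", "given-names": "Mona"}],
    },
}


def works_to_cite(cff: dict) -> list[str]:
    """Return the titles a reader could cite: the preferred work first, then the dataset."""
    works = []
    if "preferred-citation" in cff:
        works.append(cff["preferred-citation"]["title"])
    # The versioned dataset remains described by the top-level keys.
    works.append(cff["title"])
    return works


print(works_to_cite(cff))
```

A single `preferred-citation` covers one extra work; further related works can be listed under the `references` key.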

@CPernet
Collaborator

CPernet commented Oct 20, 2021

OK @Remi-Gau, smarty pants, you win :-)
So it's all possible. The questions are:

  • what is the advantage over the current solution (all in dataset_description, right?)
  • what is the technical support needed

@Remi-Gau
Collaborator Author

> ok @Remi-Gau smarty pants you win :-) so it's all possible - the questions are
>
> * what is the advantage over the current solution (all in dataset_description right?)

Their schema does offer a few things we don't have.
https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md#index

Could also allow a "division of labor": typical dataset info goes in CITATION.cff, BIDS-specific info goes in dataset_description.

This could also potentially integrate better with other non-BIDS tools and services (at the moment "only" GitHub, Zenodo, and Zotero).

FYI, I am not really convinced that this should be done. I just wanted to start a conversation to weigh the pros and cons (and advertise CFF files in case they could interest people for other things).

> * what is the technical support needed

There is a Python validator for those files, and there is already a JSON schema that could be used for other validations.

https://github.com/citation-file-format/citation-file-format/blob/main/README.md#validation-heavy_check_mark

From the BIDS perspective we would have to ensure consistency between dataset_description and those .cff files.

@sappelhoff
Member

My personal opinion on this is that we should wait and see how CITATION.cff develops over the next months / year / years and then revive the discussion. If we see that it becomes very important and widespread (which I hope it does), we should officially adopt it. Until then, users can add it and list it in .bidsignore, as is already done for many BIDS datasets on GIN with the datacite.yml file there. E.g., https://gin.g-node.org/sappelhoff/mpib_sp_eeg/

Until then, one could also write a dataset_description.json to CITATION.CFF converter. I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)
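A converter in that direction could start as small as the Python sketch below, which emits CITATION.cff text from a parsed dataset_description.json. It is a hypothetical helper, not the converter mentioned in this thread. It assumes the last word of each `Authors` entry is the family name (wrong for many names), writes the YAML by hand to stay within the standard library, and strips a leading `doi:` because CFF stores bare DOIs.

```python
import json


def quote(value: str) -> str:
    # A JSON string literal is also a valid double-quoted YAML scalar.
    return json.dumps(value, ensure_ascii=False)


def split_name(full_name: str) -> dict:
    # Naive assumption: the last whitespace-separated word is the family name.
    *given, family = full_name.split()
    person = {"family-names": family}
    if given:
        person["given-names"] = " ".join(given)
    return person


def description_to_cff(description: dict) -> str:
    """Build minimal CITATION.cff text from dataset_description.json fields."""
    lines = [
        "cff-version: 1.2.0",
        "message: " + quote("If you use this dataset, please cite it as below."),
        "type: dataset",
        "title: " + quote(description["Name"]),
        "authors:",
    ]
    for name in description.get("Authors", []):
        person = split_name(name)
        lines.append("  - family-names: " + quote(person["family-names"]))
        if "given-names" in person:
            lines.append("    given-names: " + quote(person["given-names"]))
    if "DatasetDOI" in description:
        # str.removeprefix needs Python 3.9+.
        lines.append("doi: " + quote(description["DatasetDOI"].removeprefix("doi:")))
    if "License" in description:
        # CFF expects an SPDX identifier such as "CC0-1.0".
        lines.append("license: " + quote(description["License"]))
    return "\n".join(lines) + "\n"


print(description_to_cff({
    "Name": "cff_example_data",
    "BIDSVersion": "1.8.0",
    "Authors": ["Mona Lisa"],
    "DatasetDOI": "doi:10.5281/zenodo.1234",
    "License": "CC0-1.0",
}))
```

A dataset without `Authors` would produce an empty `authors:` entry, which is not valid CFF, so a real converter would need to handle that case.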

@Remi-Gau
Collaborator Author

> My personal opinion on this is that we should wait how CITATION.CFF develops in the next months / year / years and then revive the discussion.

Yup, I think that sums up why this is not a hill I want to die on just yet.

@CPernet
Collaborator

CPernet commented Oct 21, 2021

But we could still use one inside https://github.com/bids-standard/bids-specification with all relevant publications :-) so it renders nicely on GitHub (i.e., we don't support it for datasets, but use it for the repo).

@sappelhoff
Member

> but we could still use one inside https://github.com/bids-standard/bids-specification with all relevant publications :-) so it renders nice on github (ie we don't support it for datasets, but use it for the repo)

Agreed!

@Remi-Gau
Collaborator Author

I suggest we revisit this for the BIDS repo after the steering group election, because we'll have updated our list of contributors by then. A .cff file could also help us do that, but it will have to take into account suggestions from #66 and #627.

@adswa
Contributor

adswa commented Oct 21, 2021

> I think I recently saw such a converter from BIDS to datacite.yml on Twitter. @adswa might know more about that :-)

@christian-monch wrote one during a hackathon, I believe the most recent state of it can be found here :-)

@Remi-Gau
Collaborator Author

> @christian-monch wrote one during a hackathon, I believe the most recent state of it can be found here :-)

I had forgotten about this WIP when I started creating a package to streamline the creation of datacite.yml files for BIDS datasets...

https://github.com/Remi-Gau/bids2cite

@ericearl
Collaborator

@Remi-Gau: Should BIDS support CITATION.cff files?

Yes.

@CPernet: what is the advantage over the current solution (all in dataset_description right?)

The Authors list is just a list of strings. There is a lot more nuance to authorship than just a name. Like a whole file format's worth! And GitHub, Zenodo, and Zotero support CITATION.cff. And there is a user-friendly tool to make CITATION.cff files.

@CPernet: what is the technical support needed

  1. A PR to the BIDS Specification to include language about using either a CITATION.cff or the Authors list, but not both.
  2. Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.

@Remi-Gau: FYI I am not really convinced that this should be done. Just wanted to start a conversation to weigh the pros and cons.

I think this should be done. The pros seem to outweigh the cons.

@sappelhoff commented on Oct 21, 2021
My personal opinion on this is that we should wait how CITATION.CFF develops in the next months / year / years and then revive the discussion.

It's been years and it looks good to me!

effigies changed the title from "Should BIDS support CITATION.cff files ?" to "Allow CITATION.cff as alternative to Authors field in dataset_description" Jun 20, 2023
@effigies
Collaborator

> 2. Work on the validator (I do not know how or what exactly) to say one or the other is allowed, but not both.

In the schema, we would write a rule like:

```yaml
SingleSourceAuthors:
  issue:
    code: AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE
    level: error
    message: |
      CITATION.cff file found. The "Authors" field of dataset_description.json
      should be removed to avoid inconsistency.
  selectors:
    - path == 'CITATION.cff'
  checks:
    - '!("Authors" in dataset_description)'
```

I would not be inclined to also implement this in the legacy validator.
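Outside the schema, the same rule is easy to express directly. The Python sketch below is a hypothetical stand-alone check (not part of any BIDS validator) that reuses the issue code from the YAML rule above and flags a dataset that has both a CITATION.cff file and an `Authors` field in dataset_description.json.

```python
import json
import tempfile
from pathlib import Path


def check_single_source_authors(dataset_root) -> list[str]:
    """Return issue codes if CITATION.cff and the "Authors" field coexist."""
    root = Path(dataset_root)
    description_path = root / "dataset_description.json"
    if not (root / "CITATION.cff").is_file() or not description_path.is_file():
        return []  # nothing to compare unless both files exist
    description = json.loads(description_path.read_text(encoding="utf-8"))
    if "Authors" in description:
        return ["AUTHORS_AND_CITATION_FILE_MUTUALLY_EXCLUSIVE"]
    return []


# Demo on a throwaway dataset that declares authors in both places.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "dataset_description.json").write_text(
        json.dumps({"Name": "demo", "BIDSVersion": "1.8.0", "Authors": ["Mona Lisa"]}),
        encoding="utf-8",
    )
    Path(tmp, "CITATION.cff").write_text("cff-version: 1.2.0\n", encoding="utf-8")
    print(check_single_source_authors(tmp))
```

The demo prints the single issue code; removing `Authors` from the description makes the check pass.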

Unfortunately, CFF does not have a JavaScript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas
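Before (or alongside) full validation against those JSON schemas, a cheap structural pre-check is possible. The sketch below assumes the file has already been parsed into a dict (e.g. with a YAML library) and checks only the four top-level keys that CFF 1.2.0 requires (`cff-version`, `message`, `title`, `authors`) plus a non-empty author list. It is not a substitute for schema validation, and the function name is hypothetical.

```python
# Top-level keys that CFF 1.2.0 marks as required.
REQUIRED_CFF_KEYS = ("cff-version", "message", "title", "authors")


def cff_precheck(cff: dict) -> list[str]:
    """Return human-readable problems found by a minimal CFF structure check."""
    problems = [f"missing required key: {key}" for key in REQUIRED_CFF_KEYS if key not in cff]
    authors = cff.get("authors")
    # Only complain about the shape of "authors" if the key is present at all.
    if "authors" in cff and not (isinstance(authors, list) and authors):
        problems.append("authors must be a non-empty list")
    return problems


print(cff_precheck({"cff-version": "1.2.0", "title": "cff_example_data", "authors": []}))
```

For this input the check reports the missing `message` key and the empty author list.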

@nellh
Member

nellh commented Jun 21, 2023

I agree this change would be very helpful for including more complete authorship information in BIDS datasets. It's an issue for OpenNeuro and a BIDS solution would let us add this to datasets in a way that allowed for reuse.

> Unfortunately, CFF does not have a JavaScript validator, just Python. They do share JSON schemas though, so it wouldn't be awful to validate ourselves: https://github.com/citation-file-format/cff-converter-python/tree/main/cffconvert/schemas

The CFF Initializer tool @ericearl mentioned has a simple JavaScript validator implementation. https://github.com/citation-file-format/cff-initializer-javascript/blob/main/src/store/validation.ts

@Remi-Gau
Collaborator Author

I had worked on a little package to help create citation files for BIDS datasets, because they can also be ingested by DataLad metadata tools.

Having the citation file take precedence, without having to sync it with the dataset description, would make things even easier.

https://github.com/Remi-Gau/bids2cite

@effigies
Collaborator

effigies commented Jun 24, 2023

Looking at https://github.com/citation-file-format/citation-file-format/blob/main/README.md, we have additional overlaps with dataset_description.json:

| BIDS | CFF |
|---|---|
| HowToAcknowledge | message / preferred-citation |
| Name | title |
| Authors | authors |
| Version | version |
| ReferencesAndLinks | references |
| DatasetDOI | doi |
| License | license |

We may want to make more than just authors mutually exclusive with CITATION.cff. For at least name and version, I think we should probably just duplicate the values and validate that they are identical.
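Duplicating a field and validating identity could look like the sketch below. It compares a hypothetical subset of the overlaps in the table above (`Name`/`title`, `DatasetDOI`/`doi`, `License`/`license`) between parsed dataset_description.json and CITATION.cff contents. It assumes `DatasetDOI` may carry a `doi:` or `https://doi.org/` prefix while CFF stores the bare DOI, and it compares DOIs case-insensitively because DOI names are case-insensitive.

```python
# Hypothetical subset of overlapping fields: BIDS key -> CFF key.
FIELD_MAP = {"Name": "title", "DatasetDOI": "doi", "License": "license"}
DOI_PREFIXES = ("doi:", "https://doi.org/", "http://doi.org/")


def normalize_doi(value: str) -> str:
    """Strip a URI prefix and lowercase, since DOI names are case-insensitive."""
    value = value.strip()
    for prefix in DOI_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    return value.lower()


def inconsistent_fields(description: dict, cff: dict) -> list[str]:
    """Return BIDS keys whose values disagree with the matching CFF keys."""
    mismatches = []
    for bids_key, cff_key in FIELD_MAP.items():
        if bids_key not in description or cff_key not in cff:
            continue  # only compare fields present in both files
        bids_value, cff_value = description[bids_key], cff[cff_key]
        if bids_key == "DatasetDOI":
            bids_value, cff_value = normalize_doi(bids_value), normalize_doi(cff_value)
        if bids_value != cff_value:
            mismatches.append(bids_key)
    return mismatches


print(inconsistent_fields(
    {"Name": "cff_example_data", "DatasetDOI": "doi:10.5281/zenodo.1234", "License": "CC0-1.0"},
    {"title": "cff_example_data", "doi": "10.5281/ZENODO.1234", "license": "CC-BY-4.0"},
))
```

Here only `License` is reported: the names match exactly and the DOIs match after normalization.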

Also, authors have no role at this point (citation-file-format/citation-file-format#112). While highly desirable, this is also not currently possible in BIDS, so CITATION.cff is still an upgrade.

@Remi-Gau
Collaborator Author

Wait... Technically we only have BIDSVersion in dataset_description and not a dataset version, right?
The only "trace" of the dataset version is in the changelog (CHANGES), if it is present. Or maybe I missed it somewhere else?
So in that sense, CITATION.cff would actually add a way to track this.
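If a CITATION.cff `version` were filled in automatically, the CHANGES file would be the natural source. BIDS asks for CHANGES to follow the CPAN Changelog convention, where each release starts at the beginning of a line with the version followed by a date. The sketch below additionally assumes the newest entry comes first; the function name and example text are illustrative.

```python
import re
from typing import Optional

# A release line starts at column 0 with a version, e.g. "1.0.1 2015-08-27".
RELEASE_LINE = re.compile(r"(\d+(?:\.\d+)*)(?:\s|$)")


def latest_version(changes_text: str) -> Optional[str]:
    """Return the version of the first (assumed newest) entry in a CHANGES file."""
    for line in changes_text.splitlines():
        match = RELEASE_LINE.match(line)  # match() anchors at the start of the line
        if match:
            return match.group(1)
    return None


changes = """1.0.1 2015-08-27
  - Fixed slice timing information.

1.0.0 2015-08-15
  - Initial release.
"""
print(latest_version(changes))
```

Indented detail lines never match, so only release headers are considered.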

@effigies
Collaborator

Ah, sorry, I didn't actually look it up. I guess I was thinking of it being part of DOIs in many cases.

@dmoracze

Contribution roles will be included in the next release!

citation-file-format/citation-file-format#112 (comment)

@Remi-Gau
Collaborator Author

> Contribution roles will be included in the next release!
>
> citation-file-format/citation-file-format#112 (comment)

I saw that and got all excited about it!

@effigies
Collaborator

Please see #1525 for proposed text and validation rules.

9 participants