This is a supplemental resource to Leipzig et al. "The Role of Metadata in Reproducible Computational Research" now published in Cell Patterns https://www.cell.com/patterns/fulltext/S2666-3899(21)00170-7
Contributions are welcome!
├───data/
│ ├───examples/ Examples of metadata standards
│ ├───lens/ Search exports for scimetric journal analysis
│ └───standards.tsv Raw standards table
├───src/
│ ├───cwl/tools/ CWL configuration to produce the timeline plot
│ ├───manuscript/ Manuscript revision document
│ ├───secrets/
│ │ └───api.template.py Replace this with api.py using your NCBI/NCBO keys
│ ├───ontologies/ Scimetric ontology popularity analysis
│ ├───repotutils/ Scripts for automating management of this repository
│ ├───scimetric/ Scimetric journal meta/rcr frequency analysis in a Jupyter Notebook
│ ├───timeline/ R Markdown document to produce the RCR case study timeline in the paper, incl. helper files for execution with CWL (wrapper script, Dockerfile)
│ ├───wget2jsonld.py Helper script to convert wget output to jsonld
│ └───wordcloud/ R script to produce word cloud from cited abstracts
├───LICENSE The LICENSE file
├───README.md What you are looking at
├───environment.osx.yaml OSX pinned Conda dependencies
├───environment.unpinned.yaml Unpinned Conda dependencies
├───ro-crate-metadata.jsonld RO Crate config
└───.binder Environment configuration files for usage with Binder (mybinder.org)
In this table we provide links to the authoritative publications and homepages for these metadata standards, as well as examples we have collected. Schema refers to the parent structure this standard conforms to, if any. Encoding refers to the markup format used. Note that for schemas such as OWL, which relies on RDF subject–predicate–object triples, the encoding could be one of at least seven serialization types (RDF/XML, RDF/JSON, JSON-LD, Turtle, N-Triples, N-Quads, N3), so the listed encoding is somewhat arbitrary. For other standards, such as DICOM, the encoding is a custom binary format, although there are numerous export formats and even attempts to serialize JSON within DICOM.
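To illustrate why the listed encoding is somewhat arbitrary for RDF-based standards, the following minimal sketch writes one subject–predicate–object triple in two of the serializations named above. The subject IRI `http://example.org/dataset/1` and the title value are hypothetical, and the hand-rolled serializers only handle this simple case (IRI subject, IRI predicate, plain string literal); a real project would more likely rely on an RDF library.

```python
import json

# Hypothetical triple: (subject IRI, predicate IRI, plain string literal).
TRIPLE = (
    "http://example.org/dataset/1",
    "http://purl.org/dc/terms/title",
    "Raw standards table",
)


def to_ntriples(subject, predicate, literal):
    """Serialize one triple with a string literal object as an N-Triples line."""
    # Escape backslashes and double quotes; other escapes (e.g. newlines)
    # are omitted in this sketch.
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject}> <{predicate}> "{escaped}" .'


def to_jsonld(subject, predicate, literal):
    """Serialize the same triple as a JSON-LD node object without a context."""
    return json.dumps({"@id": subject, predicate: literal}, indent=2)


print(to_ntriples(*TRIPLE))
print(to_jsonld(*TRIPLE))
```

Both outputs carry the same statement; only the markup differs, which is why a single value in the Encoding column cannot be definitive for such standards.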
📚 Publication 🏠 Homepage 📋 Example
Standard | Layer | Domain | Encoding | Schema | Description |
---|---|---|---|---|---|
CellML 📚 🏠 📋 | Input | Biology | XML | RDF | mathematical models for biology |
CIF2 📚 🏠 | Input | Crystallography | Custom | | atomic structure |
DATS 📚 🏠 | Input | Biomedical | JSON | | desc metadata (people, org, repo) for data pubs |
DICOM 📚 🏠 📋 | Input | Images | Custom | Key-Value | standard for all medical imaging |
EML 📚 🏠 | Input | Ecology | XML | | eco support for geo, species, pubs used in KNB |
FAANG 🏠 | Input | Specimens | Tabular | ||
GBIF 📚 🏠 | Input | Biodiversity | JSON | ||
GO 📚 🏠 | Input | Genes | XML | ||
ISO/TC 276 🏠 | Input | Biotechnology | |||
MIAME 📚 🏠 | Input | Microarrays | XML | ||
NetCDF 📚 🏠 | Input | Arrays | |||
OGC 🏠 | Input | Geospatial | |||
ThermoML 📚 🏠 | Input | Compounds | XML | ||
CRAN 🏠 | Tools | R packages | |||
Conda 🏠 | Tools | Dependencies | |||
pip setup.cfg 🏠 | Tools | Python modules | CFG | Key-Value | Python cfg files have headers and key-value pairs similar to Windows INI files |
EDAM 📚 🏠 | Tools | Bfx data | |||
CodeMeta 🏠 | Tools | Source code | |||
Biotoolsxsd 📚 🏠 | Tools | Bfx software | XML | ||
DOAP 🏠 | Tools | Software | XML | ||
ontosoft 🏠 | Tools | Geo software | |||
SWO 📚 🏠 | Tools | Bfx Software | |||
OBCS 📚 🏠 | Reports | Biostatistics | |||
STATO 🏠 | Reports | Statistics | |||
SDMX 🏠 | Reports | Statistics | JSON | ||
DDI 🏠 | Reports | Studies | XML | ||
MEX 📚 🏠 | Reports | ML | XML | ||
MLSchema 🏠 | Reports | ML | |||
MLFlow 🏠 | Reports | ML | |||
Rmd 🏠 | Reports | Docs | YAML | Key-Value | |
CWL 📚 🏠 | Tools, Pipelines | | YAML | Schema Salad | Common Workflow Language specifies how to invoke a command line tool or a pipeline of such tools |
CWLProv 📚 🏠 | Pipelines | | YAML, JSON, XML | | BagIt of Research Object folder containing manifest (JSON-LD), CWL (YAML), PROV (JSON, XML, RDF) |
RO-Crate 🏠 | Input, Pipelines, Publication | | JSON-LD | RDF, schema.org | RO-Crate is a profile of using schema.org to annotate any collections of research data and their real-life origins |
RO 🏠 | Pipelines | | Turtle, JSON-LD, XML | OWL | |
WICUS 🏠 | Pipelines | ||||
OPM 🏠 | Pipelines | ||||
PROV-O 🏠 | Pipelines | | | OWL | Several PROV serializations exist; PROV-O is in OWL, which in turn has many serializations, including the RDF syntaxes |
ReproZip 🏠 | Pipelines | ||||
ProvOne 🏠 | Pipelines | ||||
WES | Pipelines | ||||
BagIt 🏠 | Input, Pipelines | | Text | Key-Value | For long-term preservation and availability, BagIt specifies a fixed folder structure of payload files, their checksums, and other metadata tag files. Bags can be archived as zip, tar, etc., or remain folders |
BCO | Pipelines | ||||
ERC 📚 🏠 | Pipelines | Research Compendia | YAML | Key-Value | |
BEL | Publication | ||||
DC | Publication | ||||
JATS 🏠 | Publication | Articles | XML | Tags DTD | |
ONIX | Publication | ||||
MeSH | Publication | ||||
LCSH | Publication | ||||
MP 📚 | Publication | Micropublications | OWL | ||
Open PHACTS 📚 🏠 | Publication | Drugs | RDF | ||
SWAN 📚 | Publication | Neuromedicine | |||
SPAR 🏠 | Publication | Publishing | OWL | ||
PWO 📚 | Publication | Publishing | |||
PAV 📚 | Publication | Authorship | OWL | ||
Manubot 📋 | Publication | Publishing | YAML | ||
ReScience 📋 | Publication | Publishing | YAML | ||
PandocScholar 📋 | Publication | Publishing | YAML |
RDF vs OWL https://stackoverflow.com/questions/1740341/what-is-the-difference-between-rdf-and-owl
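As the `pip setup.cfg` row in the table notes, cfg files consist of section headers and key-value pairs, much like INI files. A minimal sketch using Python's standard `configparser` module shows that structure; the package name, version, and Python requirement below are made up for illustration and do not describe this repository.

```python
import configparser

# Hypothetical setup.cfg content; the values are illustrative only.
SETUP_CFG = """
[metadata]
name = example-package
version = 0.1.0

[options]
python_requires = >=3.8
"""

parser = configparser.ConfigParser()
parser.read_string(SETUP_CFG)

# Each section header maps to a set of key-value pairs.
metadata = dict(parser["metadata"])
print(parser.sections())  # the section headers
print(metadata)           # the key-value pairs under [metadata]
```

The Schema column lists this layout as Key-Value because nothing beyond sections, keys, and string values is imposed by the file format itself.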
Install cwltool
pip install cwltool
cwltool src/cwl/tools/timeline.cwl --reportfile timeline.html
Note that the tool requires Docker to run the computing environment; see the file timeline/Dockerfile for the definition of the image used in the .cwl file.
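Before launching the run, it can help to confirm that both prerequisites are available. The following is a minimal sketch, not part of the repository: the helper names `build_cwltool_command` and `missing_prerequisites` are hypothetical, and the check only looks for executables on `PATH` (it does not verify that the Docker daemon is running).

```python
import shutil

# Tool description path taken from the command shown above.
CWL_TOOL = "src/cwl/tools/timeline.cwl"


def build_cwltool_command(reportfile="timeline.html"):
    """Return the cwltool invocation as an argument list for subprocess.run."""
    return ["cwltool", CWL_TOOL, "--reportfile", reportfile]


def missing_prerequisites(executables=("cwltool", "docker")):
    """Return the required executables that are not found on PATH."""
    return [exe for exe in executables if shutil.which(exe) is None]


print("Missing:", missing_prerequisites())
print("Command:", " ".join(build_cwltool_command()))
```

Once nothing is reported missing, the list can be passed to `subprocess.run(build_cwltool_command(), check=True)` from the repository root to perform the actual run.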
MyBinder is a tool for creating executable computing environments based on standard and widely used dependency management files.
You can easily run important parts of the analysis for the manuscript by clicking on the badges below.
Binder will create a container using the environment configuration from the directory .binder/
and provide you with an interactive environment to execute notebooks or scripts.
- Scimetric journal frequency analysis of RCR and metadata terms (opens a Jupyter Notebook)
- Create Figure 2 from the paper (R Markdown notebook, open the file src/timeline/timeline.Rmd manually in RStudio)
- Create word cloud from cited abstracts (run R script src/wordcloud/wordcloud.R)
For development purposes, you can also run repo2docker
locally in the directory of the repository.
repo2docker --editable .