From 5ea6106cb32a7cf8f2de7cd22b469f6f77f6767f Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Fri, 28 Jun 2024 15:40:06 +0100 Subject: [PATCH 1/5] move developer guide and mapping info to dedicated files --- docs/developer_guide.md | 63 ++++++++ mapping-notes.md => docs/mapping-notes.md | 0 docs/mapping.md | 188 ++++++++++++++++++++++ 3 files changed, 251 insertions(+) create mode 100644 docs/developer_guide.md rename mapping-notes.md => docs/mapping-notes.md (100%) create mode 100644 docs/mapping.md diff --git a/docs/developer_guide.md b/docs/developer_guide.md new file mode 100644 index 0000000..1faaaa1 --- /dev/null +++ b/docs/developer_guide.md @@ -0,0 +1,63 @@ +# Developer Guide + +## Setup + +### Set up the environmental variables +1. copy and rename `.env.template` to `.env` in the same folder +1. open `.env` with a text editor and fill in your API key in the `INVENIORDM_API_KEY` variable +1. fill in the InvenioRDM base URL in the `INVENIORDM_BASE_URL` variable + - in case of Zenodo Sandbox: use `https://sandbox.zenodo.org/` + - in case of TU Wien test instance: use `https://test.researchdata.tuwien.ac.at/` +1. Run `source .env` to set the environment variables for the session + +If you prefer to set the environment variables `INVENIORDM_API_KEY` and `INVENIORDM_BASE_URL` another way (e.g. in `~/.bashrc`), you can do that instead. However, the `.env` file must also be configured as it is used by `pytest`. + +### Set up the Python environment + +If you do not already have `poetry` installed, install it following the [Poetry installation documentation](https://python-poetry.org/docs/#installation). + +Then install dependencies from `poetry.lock`: + +```bash +cd ro-crate-inveniordm +poetry install +``` + +Activate the virtual environment: +```bash +poetry shell +``` + +## Run tests + +Beware that tests can make Zenodo uploads using your access token. + +In the root directory: +```bash +pytest +``` + +## Publish a release + +1. 
Update the version in `pyproject.toml` +2. Make a git tag for the release and push it to GitHub +3. Run `poetry build` +4. Run `poetry publish -u -p ` +5. Create a release on GitHub, including the build artifacts + +## Project structure + +The project consists of the following structure: + +- `/src/rocrate_inveniordm/`: Source code for the package + - `mapping/`: Contains code for the mapping process + - `converter.py`: Python script used to map between RO-Crates and DataCite. Not to be called by the user. + - `mapping.json`: Encodes the mapping between RO-Crates and DataCite. See [Mapping](./mapping.md) for more. + - `condition_functions.py`: Defines functions used for the mapping. See [Condition Functions](./mapping.md#condition-functions) for more. + - `processing_functions.py`: Defines functions used for the mapping. See [Processing Functions](./mapping.md#processing-functions) for more. + - `upload/`: Contains code for the upload process + - `uploader.py`: Python script used to upload the files to the InvenioRDM. Not to be called by the user. + - `deposit.py`: Starting point. Used to map and upload the RO-Crate directory. +- `.env.template`: Template file for the environment variables. +- `/docs`: contains documentation +- `/test`: contains tests and test data diff --git a/mapping-notes.md b/docs/mapping-notes.md similarity index 100% rename from mapping-notes.md rename to docs/mapping-notes.md diff --git a/docs/mapping.md b/docs/mapping.md new file mode 100644 index 0000000..e5cd7f3 --- /dev/null +++ b/docs/mapping.md @@ -0,0 +1,188 @@ +# Mapping + +The project aims at decoupling the definition of the mapping between RO-Crates and DataCite from code. This means, that users can quickly change/add/remove mapping rules without code changes. + +Relative to the root folder of the package `src/rocrate_inveniordm/`, the mapping is implemented in `mapping/converter.py`. The mapping rules are defined in `mapping/mapping.json`. 
Processing functions and condition functions are defined in `mapping/processing_functions.py` and `condition_functions.py`, respectively. + +A textual description including shortcomings and assumptions of the mapping can be found in [mapping-notes.md](./mapping-notes.md). + +## Mapping format + +The mapping is defined in [/src/rocrate_inveniordm/mapping/mapping.json](../src/rocrate_inveniordm/mapping/mapping.json) and consists of **Mapping Collections** and **Mapping Rules**. + +### Mapping Collections + +A Mapping Collection bundles different mapping rules together, e.g. rules that define the mapping between `author` in RO-Crates and `creators` in DataCite. Each mapping collection contains the following keys: + +| Key | Description | Possible values | Mandatory? | +|---------------|-------------- | --------------- |-------------| +| `mappings` | contains the mapping rules | mapping rules | yes (unless `_ignore` is present) | +| `_ignore` | ignores the mapping rule if present | any | no | +| `ifNonePresent` | in case no mapping rule is applied, the value defined here is applied | see below | no | + +#### `ifNonePresent` + +`ifNonePresent` can be used to specify what happens if no Mapping Rule of the defined Mapping Rules in the current Mapping Collection is applied. The value of the field is an object of the following form: + +```json +{ + "": "" +} +``` + +In case no Mapping Rule is applied, the value specified in `` is applied to the field defined by `` in the DataCite file. + +### Mapping Rules + +A Mapping Rule defines which fields from RO-Crates are mapped to which fields in DataCite. + +Each rule may contain the following keys: + + +| Key | Description | Possible values | Mandatory? 
| +|---------------|-------------- | --------------- |-------------| +| `from` | defines the source in the RO-Crates file | query string (see below) | yes | +| `to` | defines the target in the DataCite file | query string (see below) | yes | +| `value` | allows value transformations | may be a string, array, or object | no | +| `processing` | uses a processing function | string starting with `$` and referencing an existing processing function | no | +| `onlyIf` | uses a condition function | string starting with `?` and referencing an existing condition function | no | +| `_ignore` | ignores the rule if present | any | no | + +### `from` and `to` querying + +To define the mapping between RO-Crates and DataCite, it is necessary to specify which field in RO-Crates is mapped to which field in DataCite. This is achieved by specifying the `from` and `to` fields in a Mapping Rule. + +**Example** + +Given the following RO-Crates metadata file: + +```json +{ + "@context": "https://w3id.org/ro/crate/1.1/context", + "@graph": [ + { + "@type": "CreativeWork", + "@id": "ro-crate-metadata.json", + "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}, + "about": {"@id": "./"} + }, + { + "@id": "./", + "@type": "Dataset", + "name": "Name", + "author": {"@id": "https://orcid.org/0000-0002-8367-6908"} + }, + { + "@id": "https://orcid.org/0000-0002-8367-6908", + "@type": "Person", + "name": "J. Xuan" + } + ] +} +``` + +Specifying the `title` field is achieved with `title`. In case the value of a key refers to another object, such as in the case of authors, querying is done using the `$` character. Referring to the `name` field of an `author` is done using `$author.name`. It is important to note that the `author` field may be an array. Therefore, it is necessary to mark this as a possible array. Referring to this value can be done by using the `[]` characters, i.e., `$author[].name`. + +Specifying the DataCite field is done in a similar fashion. 
+ + + +### Processing functions + +Processing functions are functions that are applied to the raw source value extracted from the RO-Crates metadata file. When a processing function wants to be applied to a mapping rule, the `processing` entry is assigned the value `$`. The function then needs to be implemented in `/mapping/processing_functions.py`. + +**Example** + +Given is the following mapping of the author type: + +```json +"person_or_org_type_mapping": { + "from": "$author.@type", + "to": "metadata.creators[].person_or_org.type", + "processing": "$authorProcessing" +} +``` + +The value `Person` in the RO-Crates metadata file should be mapped to the value `personal`. Also, the value `Organization` should be mapped to the value `organizational`. The function `authorProcessing` can now be implemented in `/mapping/processing_functions.py` to achieve this logic. Note that the value of the `processing` key in the mapping rule and the function name need to coincide: + +```py +def authorProcessing(value): + if value == "Person": + return "personal" + elif value == "Organization": + return "organizational" + else: + return "" +``` + + +### Condition functions + +Condition functions are similar to processing functions. Condition functions can be used to restrict when a mapping rule should be executed. The mapping is executed, if the function defined in the `onlyIf` key returns true. + +**Example** + +The mapping of DOI identifiers looks as follows: + +```json +"alternate_mapping": { + "from": "identifier", + "to": "metadata.identifiers[]", + "value": { + "scheme": "doi", + "identifier": "@@this" + }, + "processing": "$doi_processing", + "onlyIf": "?doi" +} +``` + +The mapping should only be executed, if the value in the `identifier` field in the RO-Crates metadata file is indeed a DOI identifier. This check can be achieved by defining the `doi` function in `/mapping/condition_functions.py`. 
Note that the value of the `onlyIf` key in the mapping rule and the function name need to coincide: + +```py +def doi(value): + return value.startswith("https://doi.org/") +``` + + +### Value formatting + +A value can also be formatted, e.g. as needed when a value in RO-Crate needs to be transformed to another value in DataCite. Although this can also be achieved using a processing function, value transformations provide an easier alternative. Every occurrence of `@@this` is replaced by the source value. + +**Example** + +Given the following mapping rule: +```json +"languages_mapping_direct": { + "from": "inLanguage", + "to": "metadata.languages[]", + "value": { + "id": "@@this" + } +} +``` + +The RO-Crate entry +```json +... +"inLanguage": "en" +... +``` + +is transformed into + +```json +"metadata": { + "languages": [ + { + "id": "en" + } + ] +} +``` + +### Flow + +This figure illustrates how the functions are applied in a mapping rule. + +![](./images/mapping_rule_flow.svg) From dd98e53433494054134110f97a6301518f0de494 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Fri, 28 Jun 2024 15:41:27 +0100 Subject: [PATCH 2/5] remove developer info from readme --- README.md | 224 +----------------------------------------------------- 1 file changed, 2 insertions(+), 222 deletions(-) diff --git a/README.md b/README.md index f47da4a..9935436 100644 --- a/README.md +++ b/README.md @@ -54,236 +54,16 @@ First, run the program with the `--no-upload` option, to create the DataCite fil After verifying and adjusting the DataCite file, use the `-d` option to tell the program to use this file for upload and skip the process of conversion: -`python3 deposit.py -d `. 
- - -### Further options - -``` -usage: deposit.py [-h] [-d DATACITE] [--no-upload] [-o] [-p] [-z] ro_crate_directory - -Takes a RO-Crate directory as input and uploads it to an InvenioRDM repository - -positional arguments: - ro_crate_directory Path to the RO-Crate directory to upload - -options: - -h, --help show this help message and exit - -d DATACITE, --datacite DATACITE - Path to a DataCite metadata file to use for the upload. Skips the conversion process from RO-Crate metadata to DataCite - --no-upload Stop before creating InvenioRDM record and do not upload files. Use this option to create a DataCite metadata file for manual - review - -o, --omit-roc-files Omit files named 'ro-crate-metadata.json' and directories/files containing 'ro-crate-preview' from the upload (not - recommended) - -p, --publish Publish the record after uploading - -z, --zip Instead of uploading all the files within the crate, create and upload a single zip file containing the whole crate -``` - -## File structure - -The project consists of the following structure: - -- `/mapping`: Contains code for the mapping process - - `converter.py`: Python script used to map between RO-Crates and DataCite. Not to be called by the user. - - `mapping.json`: Encodes the mapping between RO-Crates and DataCite. See [Mapping](#mapping) for more. - - `condition_functions.py`: Defines functions used for the mapping. See [Conditon Functions](#condition-functions) for more. - - `processing_functions.py`: Defines functions used for the mapping. See [Processing Functions](#processing-functions) for more. -- `/upload`: Contains code for the upload process - - `uploader.py`: Python script used to upload the files to the InvenioRDM. Not to be called by the user. -- `deposit.py`: Starting point. Used to map and upload the RO-Crate directory. -- `.env.template`: Template file for the environment variables. -- `/test`: contains tests and test data +`rocrate_inveniordm -d `. 
## Mapping The project aims at decoupling the definition of the mapping between RO-Crates and DataCite from code. This means, that users can quickly change/add/remove mapping rules without code changes. -The mapping is implemented in `/mapping/converter.py`. The mapping rules are defined in `/mapping/mapping.json`. Processing functions and condition functions are defined in `/mapping/processing_functions.py` and `condition_functions.py`, respectively. A textual description including shortcomings and assumptions of the mapping can be found in [mapping-notes.md](./mapping-notes.md). - -### Mapping format - -The mapping is defined in `/mapping/mapping.json` and consists of **Mapping Collections** and **Mapping Rules**. - -#### Mapping Collections - -A Mapping Collection bundles different mapping rules together, e.g. rules that define the mapping between `author` in RO-Crates and `creators` in DataCite. Each mapping collection contains the following keys: - -| Key | Description | Possible values | Mandatory? | -|---------------|-------------- | --------------- |-------------| -| `mappings` | contains the mapping rules | mapping rules | yes (unless `_ignore` is present) | -| `_ignore` | ignores the mapping rule if present | any | no | -| `ifNonePresent` | in case no mapping rule is applied, the value defined here is applied | see below | no - -##### `ifNonePresent` - -`ifNonePresent` can be used to specify what happens if no Mapping Rule of the defined Mapping Rules in the current Mapping Collection is applied. The value of the field is an array of the following form: - -```json -{ - "": "" -} -``` - -In case no Mapping Rule is applied, the value specified in `` is applied to the field defined by `` in the DataCite. - -#### Mapping Rules - -A Mapping Rule defines which fields from RO-Crates are mapped to which fields in DataCite. - -Each rule may contain the following keys: - - -| Key | Description | Possible values | Mandatory? 
| -|---------------|-------------- | --------------- |-------------| -| `from` | defines the source in the RO-Crates file | query string (see below) | yes | -| `to` | defines the target in the DataCite file | query string (see below) | yes | -| `value` | allows value transformations | may be a string, array, or object | no | -| `processing` | uses a processing function | string starting with `$` and referencing an existing processing function | no | -| `onlyIf` | uses a condition function | string starting with `?` and referencing an existing condition function | no | -| `_ignore` | ignores the rule if present | any | no | - -#### `from` and `to` querying - -To define the mapping between RO-Crates and DataCite, it is necessary to specify which field in RO-Crates is mapped to which field in DataCite. This is achieved by specifying the `from` and `to` fields in a Mapping Rule. - -**Example** - -Given the following RO-Crates metadata file: - -```json -{ - "@context": "https://w3id.org/ro/crate/1.1/context", - "@graph": [ - { - "@type": "CreativeWork", - "@id": "ro-crate-metadata.json", - "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"}, - "about": {"@id": "./"} - }, - { - "@id": "./", - "@type": "Dataset", - "name": "Name", - "author": {"@id": "https://orcid.org/0000-0002-8367-6908"} - }, - { - "@id": "https://orcid.org/0000-0002-8367-6908", - "@type": "Person", - "name": "J. Xuan" - } - ] -} -``` - -Speficifying the `title` field is achieved with `title`. In case the value of a key refers to another object, such as in the case of authors, querying is done using the `$` charater. Refering to the `name` field of an `author` is done using `$author.name`. It is important to note, that the `author` field may be an array. Therefore, it is necessary to mark this as a possible array. Refering to this value can be done by using the `[]` characters, i.e., `$author[].name`. - -Specifying the DataCite field is done in a similar fashion. 
- - - -#### Processing functions - -Processing functions are functions that are applied to the raw source value extracted from the RO-Crates metadata file. When a processing function wants to be applied to a mapping rule, the `processing` entry is assigned the value `$`. The function then needs to be implemented in `/mapping/processing_functions.py`. - -**Example** - -Given is the following mapping of the author type: - -```json -"person_or_org_type_mapping": { - "from": "$author.@type", - "to": "metadata.creators[].person_or_org.type", - "processing": "$authorProcessing" -} -``` - -The value `Person` in the RO-Crates metadata file should be mapped to the value `personal`. Also, the value `Organization` should be mapped to the value `organizational`. The function `authorProcessing` can now be implemented in `/mapping/processing_functions.py` to achieve this logic. Note that the value of the `processing` key in the mapping rule and the function name need to coincide: - -```py -def authorProcessing(value): - if value == "Person": - return "personal" - elif value == "Organization": - return "organizational" - else: - return "" -``` - - -#### Condition functions - -Condition functions are similar to processing functions. Condition functions can be used to restrict when a mapping rule should be executed. The mapping is executed, if the function defined in the `onlyIf` key returns true. - -**Example** - -The mapping of DOI identifiers looks as follows: - -```json -"alternate_mapping": { - "from": "identifier", - "to": "metadata.identifiers[]", - "value": { - "scheme": "doi", - "identifier": "@@this" - }, - "processing": "$doi_processing", - "onlyIf": "?doi" -} -``` - -The mapping should only be executed, if the value in the `identifier` field in the RO-Crates metadata file is indeed a DOI identifier. This check can be achieved by defining the `doi` function in `/mapping/condition_functions.py`. 
Note that the value of the `onlyIf` key in the mapping rule and the function name need to coincide: - -```py -def doi(value): - return value.startswith("https://doi.org/") -``` - - -#### Value formatting - -A value can also be formatted, e.g. as needed when a value in RO-Crate needs to be transformed to another value in DataCite. Although this can also be achived using a processing function, value transformations provide an easier alternative. Every occurence of `@@this` is replaced by the source value. - -**Example** - -Given the following mapping rule: -```json -"languages_mapping_direct": { - "from": "inLanguage", - "to": "metadata.languages[]", - "value": { - "id": "@@this" - } -} -``` - -The RO-Crate entry -```json -... -"inLanguage": "en" -... -``` - -is transferred into - -```json -"metadata": { - "languages": [ - { - "id": "en" - } - ] -} -``` - -#### Flow - -This figure illustrates how the functions that are applied in a mapping rule. - -![](./images/mapping_rule_flow.svg) +For more information, see [Mapping](docs/mapping.md). 
## Results - ### Minimal RO-Crate The result of uploading the minimal RO-Crate as shown on [https://www.researchobject.org/ro-crate/1.1/root-data-entity.html#minimal-example-of-ro-crate](https://www.researchobject.org/ro-crate/1.1/root-data-entity.html#minimal-example-of-ro-crate) ([`/test/minimal-ro-crate`](./test/minimal-ro-crate/)) leads to the following result: From eab5d7bc3c4f6590ce83cc9febb58d99d1f8ce57 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Fri, 28 Jun 2024 15:42:04 +0100 Subject: [PATCH 3/5] update CLI docs --- README.md | 51 +++++++++++++++++++++++++++++++++------------------ 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/README.md b/README.md index 9935436..e4cc0bc 100644 --- a/README.md +++ b/README.md @@ -1,12 +1,14 @@ -# RO-Crates Data Deposit +# RO-Crate InvenioRDM Deposit -[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8127644.svg)](https://doi.org/10.5281/zenodo.8127644) -Command line tool to deposit a [RO-Crate directory](https://www.researchobject.org/ro-crate/) to an [InvenioRDM](https://inveniordm.web.cern.ch/). + +Command line tool to deposit an [RO-Crate](https://www.researchobject.org/ro-crate/) to an [InvenioRDM](https://inveniordm.web.cern.ch/) repository. + +Originally developed as [`ro-crates-deposit`](https://github.com/beerphilipp/ro-crates-deposit) by Philipp Beer and Milan Szente. [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.8127644.svg)](https://doi.org/10.5281/zenodo.8127644) ## Requirements -- [`Python 3.x`](https://www.python.org/downloads/) +- [`Python 3.8.1`](https://www.python.org/downloads/) or higher ## Setup @@ -22,35 +24,48 @@ Command line tool to deposit a [RO-Crate directory](https://www.researchobject.o ![Screenshot of token creation page on TU Wien instance](./images/researchdata.png) ### Set up the environmental variables -1. copy and rename `.env.template` to `.env` in the same folder -1. 
open `.env` with a text editor and fill in your API key in the `INVENIORDM_API_KEY` variable -1. fill in the InvenioRDM base URL in the `INVENIORDM_BASE_URL` variable - - in case of Zenodo Sandbox: use `https://sandbox.zenodo.org/` - - in case of TU Wien test instance: use `https://test.researchdata.tuwien.ac.at/` -1. Run `source .env` to set the environment variables for the session -If you prefer to set the environment variables `INVENIORDM_API_KEY` and `INVENIORDM_BASE_URL` another way (e.g. in `~/.bashrc`), you can do that instead. +The package requires two environment variables to be set: +1. `INVENIORDM_BASE_URL` – the URL of your preferred InvenioRDM instance, e.g. `"https://sandbox.zenodo.org/"` +2. `INVENIORDM_API_KEY` – the API token you created in the section above. + + Run the following lines to set the environment variables: +```bash +export INVENIORDM_BASE_URL="your_preferred_instance_url" +export INVENIORDM_API_KEY="your_api_key" +``` + +You can also add the lines to your `~/.bashrc` file so that they are set whenever you start a shell, if you plan to use the package regularly. -### Set up the Python environment -Run `python3 -m pip install -r requirements.txt` +If you want to change your target InvenioRDM instance, you can set the environment variables again using the same code. ## Usage ### General usage -Run `python3 deposit.py ` with `` being the path to the RO-Crate directory. The record is saved as a draft and not published. +Run `rocrate_inveniordm ` with `` being the path to the RO-Crate directory. The record is saved as a draft and not published. + +You can publish the record through the web interface of your chosen instance, or you can instead run the same command with the `-p` option to publish the record. + +Additional options can be found by running `rocrate_inveniordm --help`: + +### Uploading as a zip file -Run the same command with the `-p` option to publish the record. 
+Some repositories use a "flat" structure for records, where all the files are stored in the root of the archive and there is no directory structure. To preserve a directory structure, which can be important for the RO-Crate metadata file to be accurate, you can upload the crate as a single zip file instead – this can be achieved with the `-z` option (you do not need to pre-zip your crate). For example, + +``` +rocrate_inveniordm -z test-ro-crate +``` -Run `python3 deposit.py -h` for help. +will result in an uploaded file called `test-ro-crate.zip`. ### Manually verifying DataCite conversion before upload -This tool is a *best-effort* approach. After converting the metadata file, the resulting DataCite file is stored as `datacite-out.json` in the root directory. Users can adjust the generated DataCite file as needed, and can run the program in two stages to facilitate this: +This tool is a *best-effort* approach. After converting the metadata file, the resulting DataCite file is stored as `datacite-out.json` in the root directory. You can adjust the generated DataCite file as needed, and can run the program in two stages to facilitate this: First, run the program with the `--no-upload` option, to create the DataCite file without uploading anything to InvenioRDM: -`python3 deposit.py --no-upload `. +`rocrate_inveniordm --no-upload `. 
After verifying and adjusting the DataCite file, use the `-d` option to tell the program to use this file for upload and skip the process of conversion: From 92500eb8943db5f061e5653aa83e597761aee70d Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Fri, 28 Jun 2024 15:57:10 +0100 Subject: [PATCH 4/5] rename mapping-notes and make its purpose clearer --- README.md | 4 +++- docs/{mapping-notes.md => all-mappings.md} | 8 ++++++-- docs/mapping.md | 8 ++++---- 3 files changed, 13 insertions(+), 7 deletions(-) rename docs/{mapping-notes.md => all-mappings.md} (87%) diff --git a/README.md b/README.md index e4cc0bc..cee61b6 100644 --- a/README.md +++ b/README.md @@ -75,7 +75,9 @@ After verifying and adjusting the DataCite file, use the `-d` option to tell the The project aims at decoupling the definition of the mapping between RO-Crates and DataCite from code. This means, that users can quickly change/add/remove mapping rules without code changes. -For more information, see [Mapping](docs/mapping.md). +To find out how each piece of RO-Crate metadata is converted to DataCite, see [How RO-Crate metadata is mapped to DataCite](docs/all-mappings.md). + +For technical details on the implementation, see [Mapping](docs/mapping.md). ## Results diff --git a/docs/mapping-notes.md b/docs/all-mappings.md similarity index 87% rename from docs/mapping-notes.md rename to docs/all-mappings.md index 51319d0..6f0e733 100644 --- a/docs/mapping-notes.md +++ b/docs/all-mappings.md @@ -1,4 +1,8 @@ -# Notes on Mapping +# How RO-Crate metadata is mapped to DataCite + +DataCite is the data format used by InvenioRDM when uploading a record through the API. This document describes how different parts of the RO-Crate metadata are converted into the DataCite format. + +Note that RO-Crate and DataCite each contain features that the other does not have, so it is difficult to create a fully accurate mapping and some information may be lost along the way. 
You should always check the outputs to ensure their accuracy before publishing your record. ## Mapping of resource type @@ -23,7 +27,7 @@ - in case `name` does not exist, it falls back to using the value of `@alternativeName` - in case neither of those exist, `title` is assigned `:unkn` -## Mapping of additional title +## Mapping of additional title - `@alternativeName` is mapped to `additional_titles` - a new array entry in `additional_titles` is added diff --git a/docs/mapping.md b/docs/mapping.md index e5cd7f3..c0f7381 100644 --- a/docs/mapping.md +++ b/docs/mapping.md @@ -4,7 +4,7 @@ The project aims at decoupling the definition of the mapping between RO-Crates a Relative to the root folder of the package `src/rocrate_inveniordm/`, the mapping is implemented in `mapping/converter.py`. The mapping rules are defined in `mapping/mapping.json`. Processing functions and condition functions are defined in `mapping/processing_functions.py` and `condition_functions.py`, respectively. -A textual description including shortcomings and assumptions of the mapping can be found in [mapping-notes.md](./mapping-notes.md). +A human-friendly description of all mappings implemented, including shortcomings and assumptions, can be found in [all-mappings.md](./all-mappings.md). ## Mapping format @@ -89,7 +89,7 @@ Specifying the DataCite field is done in a similar fashion. ### Processing functions -Processing functions are functions that are applied to the raw source value extracted from the RO-Crates metadata file. When a processing function wants to be applied to a mapping rule, the `processing` entry is assigned the value `$`. The function then needs to be implemented in `/mapping/processing_functions.py`. +Processing functions are functions that are applied to the raw source value extracted from the RO-Crates metadata file. When a processing function wants to be applied to a mapping rule, the `processing` entry is assigned the value `$`. 
The function then needs to be implemented in `mapping/processing_functions.py`. **Example** @@ -103,7 +103,7 @@ Given is the following mapping of the author type: } ``` -The value `Person` in the RO-Crates metadata file should be mapped to the value `personal`. Also, the value `Organization` should be mapped to the value `organizational`. The function `authorProcessing` can now be implemented in `/mapping/processing_functions.py` to achieve this logic. Note that the value of the `processing` key in the mapping rule and the function name need to coincide: +The value `Person` in the RO-Crates metadata file should be mapped to the value `personal`. Also, the value `Organization` should be mapped to the value `organizational`. The function `authorProcessing` can now be implemented in `mapping/processing_functions.py` to achieve this logic. Note that the value of the `processing` key in the mapping rule and the function name need to coincide: ```py def authorProcessing(value): @@ -137,7 +137,7 @@ The mapping of DOI identifiers looks as follows: } ``` -The mapping should only be executed, if the value in the `identifier` field in the RO-Crates metadata file is indeed a DOI identifier. This check can be achieved by defining the `doi` function in `/mapping/condition_functions.py`. Note that the value of the `onlyIf` key in the mapping rule and the function name need to coincide: +The mapping should only be executed, if the value in the `identifier` field in the RO-Crates metadata file is indeed a DOI identifier. This check can be achieved by defining the `doi` function in `mapping/condition_functions.py`. 
Note that the value of the `onlyIf` key in the mapping rule and the function name need to coincide: ```py def doi(value): From da08f2a8793f64cacb18bd76d453fb06f5a1a967 Mon Sep 17 00:00:00 2001 From: Eli Chadwick Date: Mon, 1 Jul 2024 13:16:37 +0100 Subject: [PATCH 5/5] clarify outputs --- README.md | 12 +++++++----- docs/developer_guide.md | 9 +++++++-- 2 files changed, 14 insertions(+), 7 deletions(-) diff --git a/README.md b/README.md index cee61b6..2854cd4 100644 --- a/README.md +++ b/README.md @@ -43,11 +43,9 @@ If you want to change your target InvenioRDM instance, you can set the environme ### General usage -Run `rocrate_inveniordm ` with `` being the path to the RO-Crate directory. The record is saved as a draft and not published. +Run `rocrate_inveniordm ` with `` being the path to the RO-Crate directory. This will upload the record (including files and metadata) as a draft to your chosen InvenioRDM instance, and save the generated DataCite metadata as `datacite-out.json` in the current directory. -You can publish the record through the web interface of your chosen instance, or you can instead run the same command with the `-p` option to publish the record. - -Additional options can be found by running `rocrate_inveniordm --help`: +By default, the record is not published after upload. You can publish the record through the web interface of your chosen instance, or you can instead run the same command with the `-p` option to re-upload the crate into a new record and publish it immediately. ### Uploading as a zip file @@ -63,7 +61,7 @@ will result in an uploaded file called `test-ro-crate.zip`. This tool is a *best-effort* approach. After converting the metadata file, the resulting DataCite file is stored as `datacite-out.json` in the root directory. 
You can adjust the generated DataCite file as needed, and can run the program in two stages to facilitate this: -First, run the program with the `--no-upload` option, to create the DataCite file without uploading anything to InvenioRDM: +First, run the program with the `--no-upload` option, to create the DataCite file in the current directory without uploading anything to InvenioRDM: `rocrate_inveniordm --no-upload `. @@ -71,6 +69,10 @@ After verifying and adjusting the DataCite file, use the `-d` option to tell the `rocrate_inveniordm -d `. +### Other options + +Additional options can be found by running `rocrate_inveniordm --help`. + ## Mapping The project aims at decoupling the definition of the mapping between RO-Crates and DataCite from code. This means, that users can quickly change/add/remove mapping rules without code changes. diff --git a/docs/developer_guide.md b/docs/developer_guide.md index 1faaaa1..b9aafb7 100644 --- a/docs/developer_guide.md +++ b/docs/developer_guide.md @@ -10,16 +10,21 @@ - in case of TU Wien test instance: use `https://test.researchdata.tuwien.ac.at/` 1. Run `source .env` to set the environment variables for the session -If you prefer to set the environment variables `INVENIORDM_API_KEY` and `INVENIORDM_BASE_URL` another way (e.g. in `~/.bashrc`), you can do that instead. However, the `.env` file must also be configured as it is used by `pytest`. +The `.env` file must always be configured as it is used by `pytest`. However, for the final step, if you prefer to set the environment variables `INVENIORDM_API_KEY` and `INVENIORDM_BASE_URL` another way (e.g. in `~/.bashrc`), you can do that instead. ### Set up the Python environment +Clone the repository: +``` +git clone git@github.com:ResearchObject/ro-crates-deposit.git +cd ro-crate-inveniordm +``` + If you do not already have `poetry` installed, install it following the [Poetry installation documentation](https://python-poetry.org/docs/#installation). 
Then install dependencies from `poetry.lock`: ```bash -cd ro-crate-inveniordm poetry install ```
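
Note on the mapping rule keys documented in `docs/mapping.md` (`onlyIf`, `processing`, `value`): the following is a minimal sketch of how they might combine for a single source value. It is not the code in `mapping/converter.py`. The helper names `apply_rule` and `fill_template`, the two lookup dictionaries, and the order of the steps (condition first, then processing, then `@@this` substitution) are assumptions made for illustration. `authorProcessing` and `doi` are copied from the examples in the patch, and the `$doi_processing` step of the DOI rule is left out because its implementation is not shown.

```python
# Copied from the examples in docs/mapping.md.
def authorProcessing(value):
    if value == "Person":
        return "personal"
    elif value == "Organization":
        return "organizational"
    else:
        return ""


def doi(value):
    return value.startswith("https://doi.org/")


# Hypothetical registries; the real package resolves names in its own modules.
PROCESSING_FUNCTIONS = {"authorProcessing": authorProcessing}
CONDITION_FUNCTIONS = {"doi": doi}


def fill_template(template, source):
    """Recursively replace every occurrence of '@@this' with the source value."""
    if isinstance(template, str):
        return template.replace("@@this", str(source))
    if isinstance(template, list):
        return [fill_template(item, source) for item in template]
    if isinstance(template, dict):
        return {key: fill_template(val, source) for key, val in template.items()}
    return template


def apply_rule(rule, source):
    """Return the target value for one source value, or None if the rule is skipped."""
    if "_ignore" in rule:
        return None
    condition = rule.get("onlyIf")  # e.g. "?doi"
    if condition and not CONDITION_FUNCTIONS[condition[1:]](source):
        return None
    processing = rule.get("processing")  # e.g. "$authorProcessing"
    if processing:
        source = PROCESSING_FUNCTIONS[processing[1:]](source)
    if "value" in rule:
        return fill_template(rule["value"], source)
    return source


language_rule = {
    "from": "inLanguage",
    "to": "metadata.languages[]",
    "value": {"id": "@@this"},
}
type_rule = {
    "from": "$author.@type",
    "to": "metadata.creators[].person_or_org.type",
    "processing": "$authorProcessing",
}
doi_rule = {
    "from": "identifier",
    "to": "metadata.identifiers[]",
    "value": {"scheme": "doi", "identifier": "@@this"},
    "onlyIf": "?doi",
}

print(apply_rule(language_rule, "en"))      # {'id': 'en'}
print(apply_rule(type_rule, "Person"))      # personal
print(apply_rule(doi_rule, "not-a-doi"))    # None
```

The sketch only covers the transformation of one value. Resolving the `from` query against the RO-Crate graph and writing the result to the `to` path are separate steps and are not modelled here.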