From 96c65c24f0f30c4f7c328175e65b7510c35b12bb Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Tue, 2 Jul 2024 17:23:08 +0200
Subject: [PATCH 01/14] correct allowed paths for tool description #110

---
 ARC specification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ARC specification.md b/ARC specification.md
index c7c0393..58324a3 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -186,7 +186,7 @@ Notes:

 Workflow execution and metadata MUST be described using the [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, in a file `workflow.cwl`, which MUST be placed in the subdirectory containing all files specific to this workflow under the top-level `workflows` subdirectory. This file MUST contain either of:

-- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the workflow subdirectory. All paths used within the tool description MUST be relative to itself.
+- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the ARC root directory. All paths used within the tool description MUST be relative to itself.

 - A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/22-nested-workflows/index.html), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced.

From e538379494212a9f2bfd1b84b8740ed661fd6a0c Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Tue, 2 Jul 2024 22:27:04 +0200
Subject: [PATCH 02/14] fix clw user guide links

---
 ARC specification.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 58324a3..89fe411 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -188,7 +188,7 @@ Workflow execution and metadata MUST be described using the [Common Workflow Lan

 - A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the ARC root directory. All paths used within the tool description MUST be relative to itself.

-- A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/22-nested-workflows/index.html), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced.
+- A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/topics/workflows.html#nested-workflows), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced.

 Notes:

@@ -196,11 +196,13 @@ Notes:

 - While workflows typically are (and should be) *generic*, i.e. a single workflow can be applied to different data of the same type, this is not a requirement. It is allowed to hard-code assay file paths and other parameters if workflow reusability is not a priority.

-- It is highly recommended that tool descriptions contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/07-containers/index.html) container description.
+- It is highly recommended that tool descriptions contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/topics/using-containers.html) container description.

 - It is expected that workflow and tool descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

-- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html).
+### Metadata
+
+- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as [CWL metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).

 ## Run Description

@@ -218,7 +220,7 @@ Notes:

 - It is expected that run descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

-- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html).
+- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).

 ## Additional Payload

From fc95c795679f8da326348f25ad226272f9be13d3 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Tue, 2 Jul 2024 22:43:47 +0200
Subject: [PATCH 03/14] add metadata section to run and workflow #110

---
 ARC specification.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 89fe411..db9b6bd 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -200,9 +200,15 @@ Notes:

 - It is expected that workflow and tool descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

-### Metadata
+### Workflow Metadata

-- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as [CWL metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).
+- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+
+- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as CWL metadata.
+
+  - The referenced authors and contributors must be the ones involved in the creation of the tool description or workflow description, not the person executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements).
+
+- It is encouraged, to add metadata relevant to the tool description or workflow description. This metadata must be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters must be added to the `run.yml` parameter file.

 ## Run Description

@@ -220,7 +226,15 @@ Notes:

 - It is expected that run descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

-- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).
+### Run Metadata
+
+- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+
+- It is strongly encouraged to include author and contributor metadata in `run.yml` parameter files as CWL metadata.
+
+  - The referenced authors and contributors must be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit.
+
+  - It is encouraged, to add metadata relevant to the `run.yml` parameter file. This metadata must be limited to only metadata that directly describes the run parameters. Metadata describing the processing unit must be added to the corresponding `.cwl` file.

 ## Additional Payload

From 205c0c2d03b647a4d1b32ca4744f6dfa2ba1fa8c Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Thu, 18 Jul 2024 11:11:12 +0200
Subject: [PATCH 04/14] update arc example structure (cwl files)

---
 ARC specification.md | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index db9b6bd..4190ce0 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -96,9 +96,7 @@ Note:

 ```
-| isa.investigation.xlsx
-| arc.cwl [optional]
-| arc.yml [optional]
+| isa.investigation.xlsx
 \--- studies
     \---
         | isa.study.xlsx
@@ -118,8 +116,8 @@ Note:
 \--- runs
     \---
         | [files;...] (different output files)
-        | run.cwl
-        | run.yml [optional]
+        | run.cwl
+        | run.yml
 ```

 ## ARC Representation

From 929a9509f6a8266e5cef0a9b84211cffce2f518b Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Thu, 18 Jul 2024 11:14:39 +0200
Subject: [PATCH 05/14] link to example arc structure for file locations

---
 ARC specification.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/ARC specification.md b/ARC specification.md
index 4190ce0..994e492 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -188,6 +188,8 @@ Workflow execution and metadata MUST be described using the [Common Workflow Lan

 - A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/topics/workflows.html#nested-workflows), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced.

+The file locations can be seen in the [Example ARC structure](#example-arc-structure).
+
 Notes:

 - There are no requirements on the structure or granularity of workflows. An ARC may contain no workflows at all if it contains no [run results](#run-description), or MAY utilize a single workflow to generate a single run result containing all computational output.

From b6f6e1ba078acbd572d3c9c61628267cfd0d8150 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Thu, 18 Jul 2024 11:27:29 +0200
Subject: [PATCH 06/14] replace root arc.cwl with run.cwl

---
 ARC specification.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 994e492..bdc3016 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -26,7 +26,7 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use
   - [Additional Payload](#additional-payload)
   - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description)
     - [Investigation and Study Metadata](#investigation-and-study-metadata)
-    - [Top-Level Run Description](#top-level-run-description)
+    - [Individual Run Description](#individual-run-description)
   - [Data Path Annotation](#data-path-annotation)
     - [Examples](#examples)
       - [General Pattern](#general-pattern)
@@ -84,7 +84,7 @@ Each ARC is a directory containing the following elements:

 - *Runs* capture data products (i.e., outputs of computational analyses) derived from assays, other runs, or study materials using workflows (located in the aforementioned *workflows* subdirectory). Each run is a collection of files, stored in the top-level `runs` subdirectory. It MUST be accompanied by a per-run CWL workflow description, stored in `.cwl` as further described [below](#run-description).

-- *Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the file `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present. Furthermore, top-level reproducibility information SHOULD be provided in the CWL `arc.cwl`.
+- *Top-level metadata and workflow description* tie together the elements of an ARC in the contexts of investigation and associated studies (in the ISA definition), captured in the file `isa.investigation.xlsx` in [ISA-XLSX format](#isa-xlsx-format), which MUST be present.

 All other files contained in an ARC (e.g., a `README.txt`, pre-print PDFs, additional annotation files) are referred to as *additional payload*, and MAY be located anywhere within the ARC structure. However, an ARC MUST be [reproducible](#reproducible-arcs) and [publishable](#shareable-and-publishable-arcs) even if these files are deleted. Further considerations on additional payload are described [below](#additional-payload).

@@ -251,7 +251,7 @@ Note:

 The `investigation` file MUST follow the [ISA-XLSX investigation file specification](ISA-XLSX.md#investigation-file).

-Furthermore, top-level reproducibility information SHOULD be provided in the CWL `arc.cwl`.
+Furthermore, run-level reproducibility information SHOULD be provided in the CWL `run.cwl` ([Individual Run Description](#individual-run-description)).

 ### Investigation and Study Metadata

@@ -260,11 +260,11 @@ The ARC root directory is identifiable by the presence of the `isa.investigation
 Multiple studies MUST be stored using one worksheet per study in `isa.studies.xlsx` in the root directory of the ARC. The study-level SHOULD define [ISA factors](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) of a study and MAY contain overlapping information also to be found in all assays grouped by the study.
 -->

-### Top-Level Run Description
+### Individual Run Description

-The file `arc.cwl` SHOULD exist at the root directory of each ARC. It describes which runs are executed (and specifically, their order) to (re)produce the computational outputs contained within the ARC.
+The file `run.cwl` MUST exist in the directory of each run. It describes the runs execution to (re)produce the computational outputs contained within the ARC.

-`arc.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths. An optional file `arc.yml` MAY be provided to specify input parameters.
+`run.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths.

 ## Data Path Annotation

From a0b039147800e87291cc2337bf33c070b98b0074 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Mon, 2 Sep 2024 13:23:53 +0200
Subject: [PATCH 07/14] add metadata specification

---
 ARC specification.md | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index bdc3016..61fff87 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -210,6 +210,11 @@ Notes:

 - It is encouraged, to add metadata relevant to the tool description or workflow description. This metadata must be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters must be added to the `run.yml` parameter file.

+- The properties of [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) and [Computational Workflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description)
+  should be used to describe workflow metadata.
+
+  - This is mainly done using [Property Values](https://schema.org/PropertyValue).
+
 ## Run Description

 **Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.
@@ -232,9 +237,14 @@ Notes:

 - It is strongly encouraged to include author and contributor metadata in `run.yml` parameter files as CWL metadata.

-  - The referenced authors and contributors must be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit.
+  - The referenced authors and contributors must be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit.
+
+- It is encouraged, to add metadata relevant to the `run.yml` parameter file. This metadata must be limited to only metadata that directly describes the run parameters. Metadata describing the processing unit must be added to the corresponding `.cwl` file.
+
+- The properties of [Lab Process](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) and [Create Action](https://schema.org/CreateAction) should be used to
+describe run metadata.

-  - It is encouraged, to add metadata relevant to the `run.yml` parameter file. This metadata must be limited to only metadata that directly describes the run parameters. Metadata describing the processing unit must be added to the corresponding `.cwl` file.
+  - This is mainly done using the processSequence (currently [about](https://schema.org/about)).

 ## Additional Payload

From 07973a02afa8cb941dadb7be1249a164c0fc3024 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Mon, 2 Sep 2024 16:37:57 +0200
Subject: [PATCH 08/14] adapt must/should/may usage for cwl section

---
 ARC specification.md | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 61fff87..227ffe6 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -192,26 +192,26 @@ The file locations can be seen in the [Example ARC structure](#example-arc-struc

 Notes:

-- There are no requirements on the structure or granularity of workflows. An ARC may contain no workflows at all if it contains no [run results](#run-description), or MAY utilize a single workflow to generate a single run result containing all computational output.
+- There are no requirements on the structure or granularity of workflows. An ARC MAY contain no workflows at all if it contains no [run results](#run-description), or MAY utilize a single workflow to generate a single run result containing all computational output.

-- While workflows typically are (and should be) *generic*, i.e. a single workflow can be applied to different data of the same type, this is not a requirement. It is allowed to hard-code assay file paths and other parameters if workflow reusability is not a priority.
+- While workflows typically are (and SHOULD be) *generic*, i.e. a single workflow can be applied to different data of the same type, this is not a requirement. It is allowed to hard-code assay file paths and other parameters if workflow reusability is not a priority.

-- It is highly recommended that tool descriptions contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/topics/using-containers.html) container description.
+- Tool descriptions SHOULD contain a reproducible execution environment description in the form of a [Docker](https://www.commonwl.org/user_guide/topics/using-containers.html) container description.

 - It is expected that workflow and tool descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

 ### Workflow Metadata

-- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+- For metadata annotation,namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)

-- It is strongly encouraged to include author and contributor metadata in tool descriptions and workflow descriptions as CWL metadata.
+- Author and contributor metadata SHOULD be included in tool descriptions and workflow descriptions as CWL metadata.

-  - The referenced authors and contributors must be the ones involved in the creation of the tool description or workflow description, not the person executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements).
+  - The referenced authors and contributors MUST be the ones involved in the creation of the tool description or workflow description, not the person executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements).

-- It is encouraged, to add metadata relevant to the tool description or workflow description. This metadata must be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters must be added to the `run.yml` parameter file.
+- Metadata relevant to the tool description or workflow description SHOULD be added. This metadata MUST be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters MUST be added to the `run.yml` parameter file.

 - The properties of [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) and [Computational Workflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description)
-  should be used to describe workflow metadata.
+  SHOULD be used to describe workflow metadata.

   - This is mainly done using [Property Values](https://schema.org/PropertyValue).

@@ -219,7 +219,7 @@ Notes:

 **Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.

-Each such subdirectory must contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.
+Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.

 `run.cwl` MAY (and sensibly, should) refer to assay data files, external data files, workflow descriptions, and files in other run results; such references MUST use relative paths. Furthermore, `run.cwl` MUST specify as outputs all result files. `run.cwl` MUST BE executable without referring to [additional payload files](#additional-auxiliary-payload) or files outside the ARC.

@@ -233,15 +233,15 @@ Notes:

 ### Run Metadata

-- For metadata annotation, it is encouraged to reference namespaces and schemas, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+- For metadata annotation, namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)

-- It is strongly encouraged to include author and contributor metadata in `run.yml` parameter files as CWL metadata.
+- Author and contributor metadata SHOULD be included in `run.yml` parameter files as CWL metadata.

-  - The referenced authors and contributors must be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit.
+  - The referenced authors and contributors MUST be the ones executing the [processing unit](https://www.commonwl.org/user_guide/introduction/basic-concepts.html#processes-and-requirements), not the person that created the processing unit.

-- It is encouraged, to add metadata relevant to the `run.yml` parameter file. This metadata must be limited to only metadata that directly describes the run parameters. Metadata describing the processing unit must be added to the corresponding `.cwl` file.
+- Metadata relevant to the `run.yml` parameter file SHOULD be added. This metadata MUST be limited to only metadata that directly describes the run parameters. Metadata describing the processing unit MUST be added to the corresponding `.cwl` file.

-- The properties of [Lab Process](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) and [Create Action](https://schema.org/CreateAction) should be used to
+- The properties of [Lab Process](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) and [Create Action](https://schema.org/CreateAction) SHOULD be used to
 describe run metadata.

   - This is mainly done using the processSequence (currently [about](https://schema.org/about)).

From 29f9ea37421781eaa08d5193a4f38b9027d31bf1 Mon Sep 17 00:00:00 2001
From: Caroline Ott <39764934+caroott@users.noreply.github.com>
Date: Tue, 3 Sep 2024 10:16:48 +0200
Subject: [PATCH 09/14] capitalize must

---
 ARC specification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ARC specification.md b/ARC specification.md
index 227ffe6..0b2d92a 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -184,7 +184,7 @@ Notes:

 Workflow execution and metadata MUST be described using the [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, in a file `workflow.cwl`, which MUST be placed in the subdirectory containing all files specific to this workflow under the top-level `workflows` subdirectory. This file MUST contain either of:

-- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions must be self-contained and not refer to any files outside the ARC root directory. All paths used within the tool description MUST be relative to itself.
+- A CWL [tool description](https://www.commonwl.org/v1.2/CommandLineTool.html). Tool descriptions MUST be self-contained and not refer to any files outside the ARC root directory. All paths used within the tool description MUST be relative to itself.

 - A CWL [workflow description](https://www.commonwl.org/v1.2/Workflow.html). Such descriptions MAY utilize other ARC workflows as [nested workflows](https://www.commonwl.org/user_guide/topics/workflows.html#nested-workflows), but MUST use relative paths in this case. Files outside the ARC root directory MUST NOT be referenced.

From 975f547769bda3691b224bc46d84b1df3d71c091 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Mon, 9 Sep 2024 13:58:57 +0200
Subject: [PATCH 10/14] move individual run description

---
 ARC specification.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 0b2d92a..de2258e 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -23,10 +23,10 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use
   - [Assay Data and Metadata](#assay-data-and-metadata)
   - [Workflow Description](#workflow-description)
   - [Run Description](#run-description)
+    - [Individual Run Description](#individual-run-description)
   - [Additional Payload](#additional-payload)
   - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description)
     - [Investigation and Study Metadata](#investigation-and-study-metadata)
-    - [Individual Run Description](#individual-run-description)
   - [Data Path Annotation](#data-path-annotation)
     - [Examples](#examples)
       - [General Pattern](#general-pattern)
@@ -246,6 +246,12 @@ describe run metadata.

   - This is mainly done using the processSequence (currently [about](https://schema.org/about)).

+### Individual Run Description
+
+The file `run.cwl` MUST exist in the directory of each run. It describes the runs execution to (re)produce the computational outputs contained within the ARC.
+
+`run.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths.
+
 ## Additional Payload

 ARCs can include additional payload according to user requirements, e.g. presentations, reading material, or manuscripts. While these files can be placed anywhere in the ARC, it is strongly advised to organize these in additional subdirectories.
@@ -270,12 +276,6 @@ The ARC root directory is identifiable by the presence of the `isa.investigation
 Multiple studies MUST be stored using one worksheet per study in `isa.studies.xlsx` in the root directory of the ARC. The study-level SHOULD define [ISA factors](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) of a study and MAY contain overlapping information also to be found in all assays grouped by the study.
 -->

-### Individual Run Description
-
-The file `run.cwl` MUST exist in the directory of each run. It describes the runs execution to (re)produce the computational outputs contained within the ARC.
-
-`run.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths.
-
 ## Data Path Annotation

 All metadata references to files or directories located inside the ARC MUST follow the following patterns:

From f7799e221803f8d17245feaea2c785865d494dcc Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Mon, 9 Sep 2024 14:06:32 +0200
Subject: [PATCH 11/14] add relative path clarification

---
 ARC specification.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ARC specification.md b/ARC specification.md
index de2258e..1e7f7f2 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -221,7 +221,7 @@ Notes:

 Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.

-`run.cwl` MAY (and sensibly, should) refer to assay data files, external data files, workflow descriptions, and files in other run results; such references MUST use relative paths. Furthermore, `run.cwl` MUST specify as outputs all result files. `run.cwl` MUST BE executable without referring to [additional payload files](#additional-auxiliary-payload) or files outside the ARC.
+`run.cwl` MAY (and sensibly, should) refer to assay data files, external data files, workflow descriptions, and files in other run results; such references MUST use relative paths, which can be given in the corresponding `run.yml`. The paths MUST be relative to the location of the `run.yml` file. Furthermore, `run.cwl` MUST specify as outputs all result files. `run.cwl` MUST BE executable without referring to [additional payload files](#additional-auxiliary-payload) or files outside the ARC.

 Notes:

From 2b1c6842d825dce888e394a57e1a572cd8816969 Mon Sep 17 00:00:00 2001
From: Caroline Ott
Date: Mon, 9 Sep 2024 16:31:11 +0200
Subject: [PATCH 12/14] clarify metadata annotation and type mapping

---
 ARC specification.md | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/ARC specification.md b/ARC specification.md
index 1e7f7f2..2d0b9a2 100644
--- a/ARC specification.md
+++ b/ARC specification.md
@@ -202,7 +202,9 @@ Notes:

 ### Workflow Metadata

-- For metadata annotation,namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+- Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).
+
+- Namespaces and schemas SHOULD be referenced (e.g. [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol).

 - Author and contributor metadata SHOULD be included in tool descriptions and workflow descriptions as CWL metadata.

@@ -233,7 +235,9 @@ Notes:

 ### Run Metadata

-- For metadata annotation, namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)
+- Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).
+
+- Namespaces and schemas SHOULD be referenced (e.g. [LabProcess](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate.md#labprocess).

 - Author and contributor metadata SHOULD be included in `run.yml` parameter files as CWL metadata.

@@ -244,7 +248,7 @@ Notes:
 - The properties of [Lab Process](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) and [Create Action](https://schema.org/CreateAction) SHOULD be used to
 describe run metadata.
- - This is mainly done using the processSequence (currently [about](https://schema.org/about)). + - This is mainly done using the processSequence (which currently maps to the [about](https://schema.org/about) type of LabProcess, see [here](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate_mapping.md)). ### Individual Run Description From d75f18d60479822713d184afa9ef35008e3a8002 Mon Sep 17 00:00:00 2001 From: Caroline Ott Date: Tue, 10 Sep 2024 12:12:06 +0200 Subject: [PATCH 13/14] remove redundant individual runs section --- ARC specification.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/ARC specification.md b/ARC specification.md index 2d0b9a2..2940820 100644 --- a/ARC specification.md +++ b/ARC specification.md @@ -23,7 +23,6 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use - [Assay Data and Metadata](#assay-data-and-metadata) - [Workflow Description](#workflow-description) - [Run Description](#run-description) - - [Individual Run Description](#individual-run-description) - [Additional Payload](#additional-payload) - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description) - [Investigation and Study Metadata](#investigation-and-study-metadata) @@ -250,12 +249,6 @@ describe run metadata. - This is mainly done using the processSequence (which currently maps to the [about](https://schema.org/about) type of LabProcess, see [here](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate_mapping.md)). -### Individual Run Description - -The file `run.cwl` MUST exist in the directory of each run. It describes the runs execution to (re)produce the computational outputs contained within the ARC. - -`run.cwl` MUST be a CWL v1.2 workflow description and adhere to the same requirements as [run descriptions](#run-description). In particular, references to study or assay data files, nested workflows MUST use relative paths. 
- ## Additional Payload ARCs can include additional payload according to user requirements, e.g. presentations, reading material, or manuscripts. While these files can be placed anywhere in the ARC, it is strongly advised to organize these in additional subdirectories. From a25fecab365be596c4b666c3fe0b28501c7a2244 Mon Sep 17 00:00:00 2001 From: Caroline Ott Date: Tue, 10 Sep 2024 15:12:01 +0200 Subject: [PATCH 14/14] add clarifying sentence about metadata annotation in cwl --- ARC specification.md | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/ARC specification.md b/ARC specification.md index 2940820..c8b4c25 100644 --- a/ARC specification.md +++ b/ARC specification.md @@ -203,7 +203,7 @@ Notes: - Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). -- Namespaces and schemas SHOULD be referenced (e.g. [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol). +- Namespaces and schemas SHOULD be referenced (e.g. [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol)). - Author and contributor metadata SHOULD be included in tool descriptions and workflow descriptions as CWL metadata. @@ -214,6 +214,8 @@ Notes: - The properties of [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) and [Computational Workflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description) SHOULD be used to describe workflow metadata. + - The types MUST be used in the [CWL syntax](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html) and SHOULD be referenced as described above. + - This is mainly done using [Property Values](https://schema.org/PropertyValue). 
## Run Description @@ -236,7 +238,7 @@ Notes: - Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). -- Namespaces and schemas SHOULD be referenced (e.g. [LabProcess](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate.md#labprocess). +- Namespaces and schemas SHOULD be referenced (e.g. [LabProcess](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate.md#labprocess)). - Author and contributor metadata SHOULD be included in `run.yml` parameter files as CWL metadata. @@ -247,6 +249,8 @@ Notes: - The properties of [Lab Process](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprocess) and [Create Action](https://schema.org/CreateAction) SHOULD be used to describe run metadata. + - The types MUST be used in the [CWL syntax](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html) and SHOULD be referenced as described above. + - This is mainly done using the processSequence (which currently maps to the [about](https://schema.org/about) type of LabProcess, see [here](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate_mapping.md)). ## Additional Payload
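
Illustration for PATCH 12/14 and 14/14 (not part of the patch series): the bullets added above ask for namespaces and schemas to be referenced and for types to be written in CWL syntax. A minimal sketch of what that could look like in a tool description follows, using the `$namespaces` / `$schemas` convention from the linked CWL metadata user guide. The tool itself, the author name, and the ORCID are hypothetical placeholders, and only the schema.org namespace is shown; the namespace to use for the ISA RO-Crate profile types (Lab Protocol, Lab Process) is not stated in these patches and is deliberately left out.

```yaml
cwlVersion: v1.2
class: CommandLineTool

# Hypothetical tool body, present only to make the document complete.
baseCommand: echo
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []

# Author metadata written in CWL syntax, using the "s" prefix declared below.
s:author:
  - class: s:Person
    s:name: Jane Doe                                      # placeholder
    s:identifier: https://orcid.org/0000-0000-0000-0000   # placeholder

# Namespace prefix and the schema it refers to, so the "s:" terms resolve.
$namespaces:
  s: https://schema.org/

$schemas:
  - https://schema.org/version/latest/schemaorg-current-https.rdf
```

Keeping the prefix declaration and the schema reference in the same file means the annotations stay resolvable when the workflow directory is moved together with the ARC.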