Rework/Add CWL Part #111

caroott · 2024-07-03T08:33:47Z

This PR adds specifications for the metadata associated with workflows and runs. It also reworks the existing CWL part. #110

ARC specification.md

caroott

As per this discussion, the root arc.cwl was replaced by individual run.cwl files.

kMutagene · 2024-09-02T06:02:15Z

@caroott what is missing to get this merged?

HLWeil · 2024-09-02T13:30:15Z

Please, as @kMutagene already suggested, stick to the MUST/SHOULD/MAY specification conventions used throughout other sections of the specification.

caroott · 2024-09-02T14:12:12Z

Are you referring to the capitalization of MUST/SHOULD/MAY or the usage? I ask because the part @kMutagene highlighted is a part of the specification that has not been changed by this PR.

HLWeil · 2024-09-02T14:15:42Z

Both new and existing parts of text. Maybe you could go through the sections you edited?

ARC specification.md

HLWeil

I've just read through it again and it already looks pretty good. But I would still suggest fixing a few points (which I commented on) before merging.

One thing in particular needs some clarification, I think. If I understood it correctly, the Run Metadata and Workflow Metadata sections refer to the additional metadata in YAML format that can extend all files defined in the CWL specification, right?
This does not really come through here. I would like an explicit clarification that states this.

HLWeil · 2024-09-03T10:14:01Z

ARC specification.md

@@ -26,7 +26,7 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use
  - [Additional Payload](#additional-payload)
  - [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description)
    - [Investigation and Study Metadata](#investigation-and-study-metadata)
-    - [Top-Level Run Description](#top-level-run-description)
+    - [Individual Run Description](#individual-run-description)


This does not fit under "Top-level Metadata" anymore if it's about run-specific metadata.

It is not the individual workflow description, but the description of which workflow was executed with which job file. It also replaces the arc.cwl, which was intended to do this for the whole ARC. Since it is then the highest-level description of the run, it would still be top-level in my opinion.

I'd say the top-level does not refer to top-level per metadata kind, but rather top-level on the ARC in general.

HLWeil · 2024-09-03T11:57:32Z

ARC specification.md


 ## Run Description

 **Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.

-Each such subdirectory must contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.
+Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.


Is an article (e.g. "the") needed before `run.cwl`?

For me, `run.cwl` stands in for the actual file in the specs, and I would leave it without an article. "The run CWL file" would be the other case, where I would use one. But I'm fine with either.

HLWeil · 2024-09-03T12:13:31Z

ARC specification.md


 ## Run Description

 **Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory.

-Each such subdirectory must contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.
+Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters.

 `run.cwl` MAY (and sensibly, should) refer to assay data files, external data files, workflow descriptions, and files in other run results; such references MUST use relative paths. Furthermore, `run.cwl` MUST specify as outputs all result files. `run.cwl` MUST BE executable without referring to [additional payload files](#additional-auxiliary-payload) or files outside the ARC.


Should we add clarifications for the relative paths?

We have an example for relative paths in General Patterns, but I think it would make sense here, since the paths are not relative to the ARC root, but rather relative to the job file.

Yeah, maybe just add a helping sentence or use some phrasing like "relative to the CWL file".
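To make the point about relative paths concrete, here is a minimal sketch of a job file. It assumes the usual CWL behaviour that relative paths in a job file are resolved against the location of that job file. The run name, assay name, file name, and input parameter name are all hypothetical and not taken from the specification.

```yaml
# Hypothetical job file located at runs/myRun/run.yml.
# The path below is interpreted relative to this file, not relative to the ARC root,
# so two "../" steps lead from runs/myRun/ back up to the ARC root.
inputData:                 # hypothetical input parameter that run.cwl would declare
  class: File
  path: ../../assays/myAssay/dataset/measurements.csv
```

Written relative to the ARC root, the same file would be `assays/myAssay/dataset/measurements.csv`, which would not resolve correctly from inside `runs/myRun/`.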

HLWeil · 2024-09-03T12:14:12Z

ARC specification.md

- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html).
+### Run Metadata
+
+- For metadata annotation, namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html)


. missing at the end

HLWeil · 2024-09-03T12:16:12Z

ARC specification.md

@@ -218,7 +231,20 @@ Notes:

 - It is expected that run descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool.

- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html).
+### Run Metadata


Should we add a link to the profile here? Especially the last bullet point,

"This is mainly done using the processSequence (currently about).",

seems off without some more context.

I agree, this would help

HLWeil · 2024-09-03T12:17:49Z

ARC specification.md

@@ -244,11 +270,11 @@ The ARC root directory is identifiable by the presence of the `isa.investigation
 Multiple studies MUST be stored using one worksheet per study in `isa.studies.xlsx` in the root directory of the ARC. 
 The study-level SHOULD define [ISA factors](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) of a study and MAY contain overlapping information also to be found in all assays grouped by the study. -->

-### Top-Level Run Description
+### Individual Run Description


Again, this section does not fit into the supersection anymore.

As per my comment above, we could return the name to Top-Level, I think. If not, I would agree that it should be moved.

HLWeil

Just one change, otherwise lgtm 👍

HLWeil · 2024-09-10T08:31:12Z

ARC specification.md

+
+  - This is mainly done using the processSequence (which currently maps to the [about](https://schema.org/about) type of LabProcess, see [here](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate_mapping.md)).
+
+### Individual Run Description


I would cut out this section, as it is just duplicated information now.

HLWeil

lgtm, @kMutagene?

kMutagene · 2024-09-10T10:47:55Z

ARC specification.md

+
+- Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html).
+
+- Namespaces and schemas SHOULD be referenced (e.g. [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol).


missing closing parenthesis

kMutagene · 2024-09-10T10:50:45Z

ARC specification.md

+
+- Metadata relevant to the tool description or workflow description SHOULD be added. This metadata MUST be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters MUST be added to the `run.yml` parameter file.
+
+- The properties of [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) and [Computational Workflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description)


TBH I do not understand how this relates to the RO-Crate/schema.org types you mention here, as the way to annotate metadata in CWL files (mentioned in the first bullet point in this section) links to a completely different format than that. Does this mean that the property names of those types should be used in the CWL metadata format?

Also, where possible, stick to one line per sentence in markdown files to improve reviewability.

Yes, that is what it means. You can reference the namespaces and use them in the CWL format to annotate your data.

"You can reference the namespaces and use them in the cwl format to annotate your data"

Then I'd suggest including that explanation here. It might be findable in the CWL metadata specification, but this is so integral that it should be explicitly explained here IMO.

I added a bullet point which hopefully clarifies it.
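As an illustration of what "reference the namespaces and use them in the CWL format" can look like, here is a minimal sketch that follows the pattern from the CWL metadata user guide linked in the diff. The tool itself, the author name, the license, and the choice of schema.org properties are assumptions made for this example; they are not prescribed by the ARC specification.

```yaml
cwlVersion: v1.2
class: CommandLineTool
baseCommand: echo          # placeholder tool, chosen only to keep the sketch short
inputs:
  message:
    type: string
    inputBinding:
      position: 1
outputs: []

# Metadata fields reuse property names from the referenced namespace via the "s:" prefix.
s:author:
  - class: s:Person
    s:name: Jane Doe       # hypothetical author
s:license: https://spdx.org/licenses/CC-BY-4.0

# Map the prefix to its namespace and list the schema documents that define the terms.
$namespaces:
  s: https://schema.org/

$schemas:
  - https://schema.org/version/latest/schemaorg-current-https.rdf
```

Properties from other vocabularies, such as the profiles mentioned in the diff, would presumably be added the same way: declare a prefix under `$namespaces` and use it on the property names.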

caroott added 3 commits July 2, 2024 17:23

correct allowed paths for tool description nfdi4plants#110

96c65c2

fix clw user guide links

e538379

add metadata section to run and workflow nfdi4plants#110

fc95c79

kMutagene linked an issue Jul 9, 2024 that may be closed by this pull request

Add/improve explicit CWL-related section(s) #110

Closed

kMutagene mentioned this pull request Jul 18, 2024

[Feature Request] Add ARC CWL Data Model nfdi4plants/ARCtrl#420

Closed

kMutagene requested changes Jul 18, 2024

caroott added 3 commits July 18, 2024 11:11

update arc example structure (cwl files)

205c0c2

link to example arc structure for file locations

929a950

replace root arc.cwl with run.cwl

b6f6e1b

caroott commented Jul 18, 2024

add metadata specification

a0b0391

caroott marked this pull request as ready for review September 2, 2024 11:24

caroott requested a review from kMutagene September 2, 2024 11:24

adapt must/should/may usage for cwl section

07973a0

kMutagene reviewed Sep 3, 2024

capitalize must

29f9ea3

HLWeil requested changes Sep 3, 2024

kMutagene changed the base branch from main to dev September 5, 2024 14:31

caroott added 3 commits September 9, 2024 13:58

move individual run description

975f547

add relative path clarification

f7799e2

clarify metadata annotation and type mapping

2b1c684

caroott requested a review from HLWeil September 9, 2024 14:41

HLWeil requested changes Sep 10, 2024

remove redundant individual runs section

d75f18d

HLWeil approved these changes Sep 10, 2024

kMutagene requested changes Sep 10, 2024

add clarifying sentence about metadata annotation in cwl

a25feca

kMutagene approved these changes Sep 10, 2024

kMutagene merged commit 7bbc1e0 into nfdi4plants:dev Sep 10, 2024

caroott mentioned this pull request Sep 11, 2024

Add/improve explicit CWL-related section(s) #110

Closed



