-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework/Add CWL Part #111
Rework/Add CWL Part #111
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per this discussion, the root arc.cwl was replaced by individual run.cwl files
@caroott what is missing to get this merged? |
Please, as @kMutagene already suggested, stick to the MUST/SHOULD/MAY specification conventions used throughout other sections of the specification. |
Are you referring to the capitalization of MUST/SHOULD/MAY or the usage? Because the part @kMutagene highlighted is a part of the specification that has not been changed by this PR |
Both new and existing parts of text. Maybe you could go through the sections you edited? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've just read through it again and it already looks pretty good. But I would still suggest to fix a few points (which I commented) before merging.
Especially one thing I think needs some clarification. If I understoof it correctly, the Run Metadata
and Workflow Metadata
section refer to the additional metadata in yaml
format that can extend all file defined in the cwl specification, right?
This does not really come through here. I would wish for an explicit clafirication which states this.
ARC specification.md
Outdated
@@ -26,7 +26,7 @@ Licensed under the Creative Commons License CC BY, Version 4.0; you may not use | |||
- [Additional Payload](#additional-payload) | |||
- [Top-level Metadata and Workflow Description](#top-level-metadata-and-workflow-description) | |||
- [Investigation and Study Metadata](#investigation-and-study-metadata) | |||
- [Top-Level Run Description](#top-level-run-description) | |||
- [Individual Run Description](#individual-run-description) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not fit under "Top-level Metadata" anymore if it's about run-specific metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not the individual workflow description, but the description which workflow was executed with which job file. It also replaces the arc.cwl, which intended to do this for the whole arc. Since it is then the highest level description of the run, it would still be top level in my opinion
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the top-level does not refer to top-level per metadata kind, but rather top-level on the ARC in general.
|
||
## Run Description | ||
|
||
**Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory. | ||
|
||
Each such subdirectory must contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters. | ||
Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an article, (e.g. the
) needed before the run.cwl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for me, run.cwl
stands as a replacement for the actual file in the specs and i would leave it without an article. The run CWL file would be the other case, where i would use it. But I'm fine with either one
ARC specification.md
Outdated
|
||
## Run Description | ||
|
||
**Runs** in an ARC represent all artefacts that result from some computation on the data within the ARC, i.e. [assays](#assay-data-and-metadata) and [external data](#external-data). These results (e.g. plots, tables, data files, etc. ) MUST reside inside one or more subdirectory of the top-level `runs` directory. | ||
|
||
Each such subdirectory must contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters. | ||
Each such subdirectory MUST contain a workflow description `run.cwl`, given in [Common Workflow Language](https://www.commonwl.org/) (CWL), [v1.2](https://www.commonwl.org/v1.2/) or higher, that describes how the files contained with the run are derived from assay or external data, or other runs. `run.cwl` MUST be placed in the subdirectory under the top-level `runs` directory. A parameter file `run.yml` MAY be given to specify run-specific input parameters. | ||
|
||
`run.cwl` MAY (and sensibly, should) refer to assay data files, external data files, workflow descriptions, and files in other run results; such references MUST use relative paths. Furthermore, `run.cwl` MUST specify as outputs all result files. `run.cwl` MUST BE executable without referring to [additional payload files](#additional-auxiliary-payload) or files outside the ARC. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add clarifications for the relative paths
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have an example for relative paths in General Patterns, but i think it would make sense here, since the paths are not relative to the arc root, but rather relative to the job file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah maybe just add a helping sentence or use some phrasing like relative to the cwl file
ARC specification.md
Outdated
- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html). | ||
### Run Metadata | ||
|
||
- For metadata annotation, namespaces and schemas SHOULD be referenced, as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.
missing at the end
@@ -218,7 +231,20 @@ Notes: | |||
|
|||
- It is expected that run descriptions are authored semi-automatically, e.g. using the [arcCommander](https://github.com/nfdi4plants/arcCommander) tool. | |||
|
|||
- It is strongly encouraged to include author and contributor metadata in run descriptions as [CWL metadata](https://www.commonwl.org/user_guide/17-metadata/index.html). | |||
### Run Metadata |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add a link to the profile here? Especially the last bullet point
This is mainly done using the processSequence (currently about).
seems off without some more context
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this would help
ARC specification.md
Outdated
@@ -244,11 +270,11 @@ The ARC root directory is identifiable by the presence of the `isa.investigation | |||
Multiple studies MUST be stored using one worksheet per study in `isa.studies.xlsx` in the root directory of the ARC. | |||
The study-level SHOULD define [ISA factors](https://isa-specs.readthedocs.io/en/latest/isamodel.html#study) of a study and MAY contain overlapping information also to be found in all assays grouped by the study. --> | |||
|
|||
### Top-Level Run Description | |||
### Individual Run Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, this section does not fit into the supersection anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my comment above, we could return the name to Top-Level i think. If not i would agree that it should be moved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just one change, otherwise lgtm 👍
ARC specification.md
Outdated
|
||
- This is mainly done using the processSequence (which currently maps to the [about](https://schema.org/about) type of LabProcess, see [here](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/release/profile/isa_ro_crate_mapping.md)). | ||
|
||
### Individual Run Description |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would cut out this section, as it is just duplicated information now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, @kMutagene?
ARC specification.md
Outdated
|
||
- Add metadata annotation as shown in the [CWL metadata user guide](https://www.commonwl.org/user_guide/topics/metadata-and-authorship.html). | ||
|
||
- Namespaces and schemas SHOULD be referenced (e.g. [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing closing parenthesis
|
||
- Metadata relevant to the tool description or workflow description SHOULD be added. This metadata MUST be limited to only metadata that directly describes the processing unit. Metadata describing the run parameters MUST be added to the `run.yml` parameter file. | ||
|
||
- The properties of [Lab Protocol](https://github.com/nfdi4plants/isa-ro-crate-profile/blob/main/profile/isa_ro_crate.md#labprotocol) and [Computational Workflow](https://bioschemas.org/profiles/ComputationalWorkflow/1.0-RELEASE#nav-description) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH i do not understand how this relates to the ROCrate/schema.org types you mention here, as the way to annotate metadata in cwl files (mentioned in the first bullet point in this section) links to a completely different format than that. Does this mean that the property names of those types should be used in the CWL metadata format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also where possible, stick to one line per sentence in markdown files to improve reviewability
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is what it means. You can reference the namespaces and use them in the cwl format to annotate your data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can reference the namespaces and use them in the cwl format to annotate your data
Then i'd suggest to include that explanation here. It might be findable in the CWL metadata specification, but this is so integral that it should be explicitly explained here IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i added a bullet point which hopefully clarifies it
This PR adds specifications for the metadata associated with workflows and runs. It also reworks the existing CWL part. #110