This repository has been archived by the owner on Aug 20, 2024. It is now read-only.

Merge pull request #141 from nvnieuwk/move-json-validator
Update the JSON schema validator library + major refactor
nvnieuwk committed Feb 19, 2024
2 parents 0a0ba1b + cf0da97 commit 4113068
Showing 83 changed files with 2,179 additions and 1,610 deletions.
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,31 @@
# nextflow-io/nf-validation: Changelog

# Version 2.0.0dev

:warning: This version contains a number of breaking changes. Please read the changelog carefully before upgrading. :warning:

To migrate your schemas, please follow the [migration guide](https://nextflow-io.github.io/nf-validation/latest/migration_guide/).

## New features

- Added the `uniqueEntries` keyword. This keyword takes a list of strings naming the fields whose values must form a unique combination. For example, `uniqueEntries: ['sample', 'replicate']` ensures that the combination of the `sample` and `replicate` fields is unique. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))

## Changes

- Changed the JSON schema draft from `draft-07` to `draft-2020-12`. See the [2019-09](https://json-schema.org/draft/2019-09/release-notes) and [2020-12](https://json-schema.org/draft/2020-12/release-notes) release notes for all changes ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed all validation code from the `.fromSamplesheet()` channel factory. The validation is now solely done in the `validateParameters()` function. A custom error message will now be displayed if any error has been encountered during the conversion ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `unique` keyword from the samplesheet schema. You should now use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or `uniqueEntries` instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Removed the `skip_duplicate_check` option from the `fromSamplesheet()` channel factory and the `--validationSkipDuplicateCheck` parameter. You should now use the `uniqueEntries` or [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) keywords in the schema instead ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` now does dynamic typecasting instead of using the `type` fields in the JSON schema, because of the complexity of `draft-2020-12` JSON schemas. The impact should be limited, but keep in mind that some values may get a different type than in earlier versions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` now sets all missing values to `[]` instead of the type-specific defaults (a consequence of the previous point). This should have little impact, since `[]` also evaluates to `false` when used in conditions ([#141](https://github.com/nextflow-io/nf-validation/pull/141))

## Improvements

- Setting the `exists` keyword to `false` will now check if the path does not exist ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- The `schema` keyword will now work in all schemas. ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- Improved the error messages ([#141](https://github.com/nextflow-io/nf-validation/pull/141))
- `.fromSamplesheet()` now supports deeply nested samplesheets ([#141](https://github.com/nextflow-io/nf-validation/pull/141))

# Version 1.1.3 - Asahikawa

## Improvements
7 changes: 4 additions & 3 deletions README.md
@@ -14,7 +14,7 @@ This [Nextflow plugin](https://www.nextflow.io/docs/latest/plugins.html#plugins)
- 📋 Validate the contents of supplied sample sheet files
- 🛠️ Create a Nextflow channel with a parsed sample sheet

Supported sample sheet formats are CSV, TSV and YAML (simple).
Supported sample sheet formats are CSV, TSV, JSON and YAML.

## Quick Start

@@ -31,7 +31,7 @@ This is all that is needed - Nextflow will automatically fetch the plugin code a
> [!NOTE]
> The snippet above will always try to install the latest version, good to make sure
> that the latest bug fixes are included! However, this can cause difficulties if running
> offline. You can pin a specific release using the syntax `nf-validation@0.3.2`
> offline. You can pin a specific release using the syntax `nf-validation@2.0.0`
You can now include the plugin helper functions into your Nextflow pipeline:

@@ -58,7 +58,7 @@ ch_input = Channel.fromSamplesheet("input")
## Dependencies

- Java 11 or later
- <https://github.com/everit-org/json-schema>
- <https://github.com/harrel56/json-schema>

## Slack channel

@@ -75,3 +75,4 @@ We would like to thank the key contributors who include (but are not limited to)
- Nicolas Vannieuwkerke ([@nvnieuwk](https://github.com/nvnieuwk))
- Kevin Menden ([@KevinMenden](https://github.com/KevinMenden))
- Phil Ewels ([@ewels](https://github.com/ewels))
- Arthur ([@awgymer](https://github.com/awgymer))
68 changes: 68 additions & 0 deletions docs/migration_guide.md
@@ -0,0 +1,68 @@
---
title: Migration guide
description: Guide to migrate pipelines using nf-validation pre v2.0.0 to after v2.0.0
hide:
- toc
---

# Migration guide

This guide is intended to help you migrate your pipeline from older versions of the plugin to version 2.0.0 and later.

## Major changes in the plugin

The following list shows the major breaking changes introduced in version 2.0.0:

1. The JSON schema draft has been updated from `draft-07` to `draft-2020-12`. See [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.
2. The `unique` keyword for samplesheet schemas has been removed. Please use [`uniqueItems`](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems) or [`uniqueEntries`](nextflow_schema/nextflow_schema_specification.md#uniqueentries) instead.
3. The `dependentRequired` keyword now behaves as defined in the JSON schema specification. See [`dependentRequired`](https://json-schema.org/understanding-json-schema/reference/conditionals#dependentRequired) for more information.

A full list of changes can be found in the [changelog](../CHANGELOG.md).

## Updating your pipeline

If you aren't using any special features in your schemas, you can simply update your `nextflow_schema.json` file using the following command:

```bash
sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' nextflow_schema.json
```

This replaces the old schema draft specification (`draft-07`) with the new one (`2020-12`), and the old keyword `definitions` with the new notation `defs`.
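To make the effect of the two `sed` expressions concrete, here is a rough Python equivalent applied to a schema held in a string. It is an illustrative sketch, not part of nf-validation, and the function name `migrate_schema_text` is hypothetical. Like the `sed` command, it is a blind text replacement, so the word `definitions` is also rewritten inside descriptions and `$ref` pointers; it treats the draft URI literally rather than as a regular expression.

```python
OLD_DRAFT = "http://json-schema.org/draft-07/schema"
NEW_DRAFT = "https://json-schema.org/draft/2020-12/schema"


def migrate_schema_text(text: str) -> str:
    """Apply the same two global substitutions as the sed command (hypothetical helper)."""
    # First expression: swap the draft-07 URI for the 2020-12 URI
    text = text.replace(OLD_DRAFT, NEW_DRAFT)
    # Second expression: rename every "definitions" to "defs", including
    # inside "$ref" pointers such as "#/definitions/..."
    return text.replace("definitions", "defs")


before = (
    '{"$schema": "http://json-schema.org/draft-07/schema", '
    '"definitions": {}, '
    '"allOf": [{"$ref": "#/definitions/g"}]}'
)
print(migrate_schema_text(before))
```

Because the `$ref` pointers are renamed together with the `definitions` key, the references keep resolving after the rewrite.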

!!! note

    Repeat this command for every JSON schema you use in your pipeline, e.g. for the default samplesheet schema:

    ```bash
    sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' assets/schema_input.json
    ```

If you are using any special features in your schemas, you will need to update your schemas manually. Please refer to the [JSON Schema draft 2020-12 release notes](https://json-schema.org/draft/2020-12/release-notes) and [JSON schema draft 2019-09 release notes](https://json-schema.org/draft/2019-09/release-notes) for more information.

However, here are some guides to the more common migration patterns:

### Updating `unique` keyword

When you use `unique` in your schemas, you should update it to use `uniqueItems` or `uniqueEntries` instead.

If you used the `unique: true` field, you should update it to use `uniqueItems` like this:

=== "Before v2.0"
    ```json hl_lines="9"
    {
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "sample": {
                    "type": "string",
                    "unique": true
                }
            }
        }
    }
    ```

=== "After v2.0"
    ```json hl_lines="12"
    {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "sample": {
                    "type": "string"
                }
            }
        },
        "uniqueItems": true
    }
    ```

If you used the `unique: ["field1", "field2"]` field, you should update it to use `uniqueEntries` like this:

=== "Before v2.0"
    ```json hl_lines="9"
    {
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "sample": {
                    "type": "string",
                    "unique": ["sample"]
                }
            }
        }
    }
    ```

=== "After v2.0"
    ```json hl_lines="12"
    {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "sample": {
                    "type": "string"
                }
            }
        },
        "uniqueEntries": ["sample"]
    }
    ```

### Updating `dependentRequired` keyword

When you use `dependentRequired` in your schemas, you should update it like this:

=== "Before v2.0"
    ```json hl_lines="12"
    {
        "$schema": "http://json-schema.org/draft-07/schema",
        "type": "object",
        "properties": {
            "fastq_1": {
                "type": "string",
                "format": "file-path"
            },
            "fastq_2": {
                "type": "string",
                "format": "file-path",
                "dependentRequired": ["fastq_1"]
            }
        }
    }
    ```

=== "After v2.0"
    ```json hl_lines="14 15 16"
    {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
            "fastq_1": {
                "type": "string",
                "format": "file-path"
            },
            "fastq_2": {
                "type": "string",
                "format": "file-path"
            }
        },
        "dependentRequired": {
            "fastq_2": ["fastq_1"]
        }
    }
    ```
9 changes: 9 additions & 0 deletions docs/nextflow_schema/create_schema.md
@@ -46,6 +46,15 @@ go to the pipeline root and run the following:
nf-core schema build
```

!!! warning

The current version of `nf-core` tools (v2.12.1) does not support the new schema draft used in `nf-validation`. Running this command after building the schema will convert the schema to the right draft:

```bash
sed -i -e 's/http:\/\/json-schema.org\/draft-07\/schema/https:\/\/json-schema.org\/draft\/2020-12\/schema/g' -e 's/definitions/defs/g' nextflow_schema.json
```
A new version of the nf-core schema builder will be available soon. Keep an eye out!

The tool will run the `nextflow config` command to extract your pipeline's configuration
and compare the output to your `nextflow_schema.json` file (if it exists).
It will prompt you to update the schema file with any changes, then it will ask if you
@@ -30,24 +30,24 @@ You can find more information about JSON Schema here:

## Definitions

A slightly strange use of a JSON schema standard that we use for Nextflow schema is `definitions`.
A slightly strange use of a JSON schema standard that we use for Nextflow schema is `defs`.

JSON schema can group variables together in an `object`, but then the validation expects this structure to exist in the data that it is validating.
In reality, we have a very long "flat" list of parameters, all at the top level of `params.foo`.

In order to give some structure to log outputs, documentation and so on, we group parameters into `definitions`.
Each `definition` is an object with a title, description and so on.
However, as they are under `definitions` scope they are effectively ignored by the validation and so their nested nature is not a problem.
In order to give some structure to log outputs, documentation and so on, we group parameters into `defs`.
Each `def` is an object with a title, description and so on.
However, as they are under `defs` scope they are effectively ignored by the validation and so their nested nature is not a problem.
We then bring the contents of each definition object back to the "flat" top level for validation using a series of `allOf` statements at the end of the schema,
which reference the specific definition keys.

<!-- prettier-ignore-start -->
```json
{
"$schema": "http://json-schema.org/draft-07/schema",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"type": "object",
// Definition groups
"definitions": { // (1)!
"defs": { // (1)!
"my_group_of_params": { // (2)!
"title": "A virtual grouping used for docs and pretty-printing",
"type": "object",
@@ -64,7 +64,7 @@ which reference the specific definition keys.
},
// Contents of each definition group brought into main schema for validation
"allOf": [
{ "$ref": "#/definitions/my_group_of_params" } // (6)!
{ "$ref": "#/defs/my_group_of_params" } // (6)!
]
}
```
@@ -77,7 +77,7 @@ which reference the specific definition keys.
5. Shortened here for the example, see below for full parameter specification.
6. A `$ref` line like this needs to be added for every definition group

Parameters can be described outside of the `definitions` scope, in the regular JSON Schema top-level `properties` scope.
Parameters can be described outside of the `defs` scope, in the regular JSON Schema top-level `properties` scope.
However, they will be displayed as ungrouped in tools working off the schema.
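The grouping trick relies on `allOf` entries holding local `$ref` pointers. The Python sketch below resolves each `#/defs/...` pointer by hand and merges the referenced `properties` with the top-level ones, giving roughly the flat view of parameters that validation ends up checking. It handles only simple local JSON pointers (real validators resolve `$ref` far more generally), and the helper names are hypothetical.

```python
def resolve_local_ref(schema: dict, ref: str) -> dict:
    """Follow a local JSON pointer such as '#/defs/my_group_of_params'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local refs are handled in this sketch: {ref}")
    node = schema
    for part in ref[2:].split("/"):
        # JSON pointer escapes: '~1' means '/', '~0' means '~' (decode in this order)
        node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def flat_parameters(schema: dict) -> dict:
    """Merge top-level properties with those of every group pulled in via allOf."""
    merged = dict(schema.get("properties", {}))
    for entry in schema.get("allOf", []):
        group = resolve_local_ref(schema, entry["$ref"])
        merged.update(group.get("properties", {}))
    return merged


schema = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "defs": {
        "my_group_of_params": {
            "title": "A virtual grouping used for docs and pretty-printing",
            "type": "object",
            "properties": {"foo": {"type": "string"}},
        }
    },
    "properties": {"ungrouped": {"type": "boolean"}},
    "allOf": [{"$ref": "#/defs/my_group_of_params"}],
}
print(sorted(flat_parameters(schema)))  # ['foo', 'ungrouped']
```

Here `foo` and `ungrouped` are placeholder parameter names; the point is that both end up at the same "flat" level even though only one is declared inside `defs`.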

## Nested parameters
@@ -115,8 +115,7 @@ Any parameters that _must_ be specified should be set as `required` in the schem

!!! tip

Make sure you do not set a default value for the parameter, as then it will have
a value even if not supplied by the pipeline user and the required property will have no effect.
Make sure you do set `null` as a default value for the parameter, otherwise it will have a value even if not supplied by the pipeline user and the required property will have no effect.

This is not done with a property key like other things described below, but rather by naming
the parameter in the `required` array in the definition object / top-level object.
@@ -164,13 +163,13 @@ Variable type, taken from the [JSON schema keyword vocabulary](https://json-sche
- `number` (float)
- `integer`
- `boolean` (true / false)
- `object` (currently only supported for file validation, see [Nested paramters](#nested-parameters))
- `array` (currently only supported for file validation, see [Nested paramters](#nested-parameters))

Validation checks that the supplied parameter matches the expected type, and will fail with an error if not.

These JSON schema types are _not_ supported (see [Nested paramters](#nested-parameters)):
This JSON schema type is _not_ supported:

- `object`
- `array`
- `null`

### `default`
@@ -223,7 +222,7 @@ If validation fails, this `errorMessage` is printed instead, and the raw JSON sc
For example, instead of printing:

```
ERROR ~ * --input: string [samples.yml] does not match pattern ^\S+\.csv$ (samples.yml)
* --input (samples.yml): "samples.yml" does not match regular expression [^\S+\.csv$]
```

We can set
@@ -239,9 +238,21 @@ We can set
and get:

```
ERROR ~ * --input: File name must end in '.csv' cannot contain spaces (samples.yml)
* --input (samples.yml): File name must end in '.csv' cannot contain spaces
```

### `deprecated`

!!! example "Extended key"

A boolean JSON flag that instructs anything using the schema that this parameter/field is deprecated and should not be used. This can be useful to generate messages telling the user that a parameter has changed between versions.

JSON schema states that this is an informative key only, but in `nf-validation` this will cause a validation error if the parameter/field is used.

!!! tip

Using the [`errorMessage`](#errormessage) keyword can be useful to provide more information about the deprecation and what to use instead.
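As a rough model of this behaviour, the Python sketch below reports an error for every supplied parameter whose schema entry has `"deprecated": true`, preferring that entry's `errorMessage` when one is set. The function name, the fallback text and the parameters `old_flag` / `new_flag` are all hypothetical, and the message layout only loosely imitates the `--input (samples.yml): ...` errors shown earlier on this page.

```python
def deprecated_errors(params: dict, properties: dict) -> list:
    """Return one error per supplied parameter that the schema marks as deprecated."""
    errors = []
    for name, value in params.items():
        spec = properties.get(name, {})
        if spec.get("deprecated") is True:
            # Fall back to a generic message when no errorMessage is provided
            message = spec.get("errorMessage", "this parameter is deprecated")
            errors.append(f"--{name} ({value}): {message}")
    return errors


properties = {
    "old_flag": {  # hypothetical deprecated parameter
        "type": "boolean",
        "deprecated": True,
        "errorMessage": "--old_flag was removed, use --new_flag instead",
    },
    "new_flag": {"type": "boolean"},  # hypothetical replacement
}
print(deprecated_errors({"new_flag": True}, properties))  # []
print(deprecated_errors({"old_flag": True}, properties))
```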

### `enum`

An array of enumerated values: the parameter must match one of these values exactly to pass validation.
@@ -325,11 +336,6 @@ Formats can be used to give additional validation checks against `string` values
The `format` key is a [standard JSON schema key](https://json-schema.org/understanding-json-schema/reference/string.html#format),
however we primarily use it for validating file / directory path operations with non-standard schema values.

!!! note

In addition to _validating_ the strings as the provided format type, nf-validation also _coerces_ the parameter variable type.
That is: if the schema defines `params.input` as a `file-path`, nf-validation will convert the parameter from a `String` into a `Nextflow.File`.

Example usage is as follows:

```json
@@ -342,7 +348,7 @@ Example usage is as follows:
The available `format` types are below:

`file-path`
: States that the provided value is a file. Does not check its existence, but it does check that the path is not a directory.
: States that the provided value is a file. Does not check its existence, but it does check if the path is not a directory.

`directory-path`
: States that the provided value is a directory. Does not check its existence, but if it exists, it does check that the path is not a file.
@@ -351,11 +357,11 @@ The available `format` types are below:
: States that the provided value is a path (file or directory). Does not check its existence.

`file-path-pattern`
: States that the provided value is a globbing pattern that will be used to fetch files. Checks that the pattern is valid and that at least one file is found.
: States that the provided value is a glob pattern that will be used to fetch files. Checks that the pattern is valid and that at least one file is found.
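The four formats can be approximated in Python with `pathlib` and `glob`. This is an illustrative sketch assuming local paths only (no S3 or other remote schemes); it is not nf-validation's implementation, the helper name `check_format` is hypothetical, and for `file-path-pattern` it simply counts matching regular files instead of separately validating the pattern syntax.

```python
import glob
import tempfile
from pathlib import Path


def check_format(value: str, fmt: str) -> bool:
    """Approximate the four path formats for a local path string (sketch)."""
    p = Path(value)
    if fmt == "file-path":
        # Existence is not required, but an existing directory is rejected
        return not p.is_dir()
    if fmt == "directory-path":
        # Existence is not required, but an existing file is rejected
        return not p.is_file()
    if fmt == "path":
        # Any file or directory path is accepted; existence is not checked
        return True
    if fmt == "file-path-pattern":
        # The glob must match at least one regular file
        return any(Path(match).is_file() for match in glob.glob(value))
    raise ValueError(f"unknown format: {fmt}")


with tempfile.TemporaryDirectory() as tmp:
    sheet = Path(tmp, "samples.csv")
    sheet.write_text("sample,fastq_1\n")
    print(check_format(str(sheet), "file-path"))  # True
    print(check_format(tmp, "file-path"))  # False, tmp is a directory
    print(check_format(str(Path(tmp, "*.csv")), "file-path-pattern"))  # True
    print(check_format(str(Path(tmp, "*.tsv")), "file-path-pattern"))  # False
```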

### `exists`

When a format is specified for a value, you can provide the key `exists` set to true in order to validate that the provided path exists.
When a format is specified for a value, you can provide the key `exists` set to true in order to validate that the provided path exists. Set this to `false` to validate that the path does not exist.

Example usage is as follows:

@@ -367,18 +373,9 @@ Example usage is as follows:
}
```

!!! note

If `exists` is set to `false`, this validation is ignored. Does not check if the path exists.

!!! note

If the parameter is set to `null`, `false` or an empty string, this validation is ignored. It does not check if the path exists.

!!! note

If the parameter is an S3 URL path, this validation is ignored.
Use `--validationS3PathCheck` or set `params.validationS3PathCheck = true` to validate them.
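A minimal Python model of `exists` as described in this section: `true` requires the path to exist and `false` requires it to be absent. It assumes a local filesystem, skips unset or empty values (an assumption borrowed from the note about `null` and empty strings), and ignores remote paths entirely; the function name `check_exists` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Optional


def check_exists(value: Optional[str], exists: bool) -> bool:
    """Return True when the path's presence on disk matches the schema's `exists` value."""
    if not value:
        # Unset or empty values are skipped in this sketch
        return True
    return Path(value).exists() == exists


with tempfile.TemporaryDirectory() as tmp:
    outdir = Path(tmp, "results")
    print(check_exists(str(outdir), False))  # True, nothing there yet
    outdir.mkdir()
    print(check_exists(str(outdir), False))  # False, the path now exists
    print(check_exists(str(outdir), True))  # True
```

`exists: false` is handy for output locations that must not be overwritten, which is the case the example mimics with a hypothetical `results` directory.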

### `mimetype`

@@ -404,8 +401,7 @@ Should only be set when `format` is `file-path`.

!!! tip

Setting this field is key to working with sample sheet validation and channel generation,
as described in the next section of the nf-validation docs.
Setting this field is key to working with sample sheet validation and channel generation, as described in the next section of the nf-validation docs.

These schema files are typically stored in the pipeline `assets` directory, but can be anywhere.

@@ -448,3 +444,41 @@ Specify a minimum / maximum value for an integer or float number length with `mi
The JSON schema docs also mention `exclusiveMinimum`, `exclusiveMaximum` and `multipleOf` keys.
Because nf-validation uses stock JSON schema validation libraries, these _should_ work for validating keys.
However, they are not officially supported within the Nextflow schema ecosystem and so some interfaces may not recognise them.

## Array-specific keys

### `uniqueItems`

All items in the array should be unique.

- See the [JSON schema docs](https://json-schema.org/understanding-json-schema/reference/array#uniqueItems)
for details.

```json
{
"type": "array",
"uniqueItems": true
}
```
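`uniqueItems` compares whole items, so two rows that differ in any field count as distinct. The Python sketch below canonicalises each item with `json.dumps(..., sort_keys=True)` so that object key order does not matter. It is a simplification: JSON Schema treats `1` and `1.0` as equal, which this version does not, and the function name `has_unique_items` is hypothetical.

```python
import json


def has_unique_items(items: list) -> bool:
    """Check that no two array items are equal, ignoring object key order (sketch)."""
    seen = set()
    for item in items:
        key = json.dumps(item, sort_keys=True)
        if key in seen:
            return False
        seen.add(key)
    return True


rows = [
    {"sample": "A", "replicate": 1},
    {"replicate": 1, "sample": "A"},  # same content, different key order
]
print(has_unique_items(rows))  # False
print(has_unique_items([{"sample": "A", "replicate": 1}, {"sample": "A", "replicate": 2}]))  # True
```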

### `uniqueEntries`

!!! example "Non-standard key"

The combination of all values in the given keys should be unique. For this key to work, make sure the array items are of type `object` and contain the keys listed in `uniqueEntries`.

```json
{
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "foo": { "type": "string" },
            "bar": { "type": "string" }
        }
    },
    "uniqueEntries": ["foo", "bar"]
}
```

This schema tells `nf-validation` that the combination of `foo` and `bar` should be unique across all objects in the array.
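To show how this differs from `uniqueItems`, the Python sketch below builds each row's combination from the listed keys only, so rows that differ solely in other fields are still reported as duplicates. How nf-validation treats a row that lacks one of the keys is not stated here; treating a missing key as `None` is an assumption of this sketch, and the function name `duplicate_entries` is hypothetical.

```python
import json


def duplicate_entries(rows: list, unique_entries: list) -> list:
    """Return indexes of rows whose combination of the listed keys already appeared."""
    seen = set()
    duplicates = []
    for index, row in enumerate(rows):
        # Only the listed keys form the combination; a missing key is treated as None
        combo = json.dumps([row.get(key) for key in unique_entries], sort_keys=True)
        if combo in seen:
            duplicates.append(index)
        else:
            seen.add(combo)
    return duplicates


rows = [
    {"foo": "a", "bar": "x", "baz": 1},
    {"foo": "a", "bar": "y", "baz": 1},
    {"foo": "a", "bar": "x", "baz": 2},  # differs only in "baz", which is not checked
]
print(duplicate_entries(rows, ["foo", "bar"]))  # [2]
```

Serialising the combination with `json.dumps` keeps the sketch working even when a value is a list or object, which a plain tuple key would not allow.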