Skip to content

Commit

Permalink
Update README.md
Browse files Browse the repository at this point in the history
Co-authored-by: Andrew Thrasher <[email protected]>
  • Loading branch information
a-frantz and adthrasher authored Feb 19, 2025
1 parent 1eccb4a commit 1324df8
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ The repository is laid out as follows:

## Expected FASTQ file name conventions

The tasks and worfklows in this repository which have one or more FASTQ files as an input will also have a `prefix` input which will determine the filenames for any output files. The `prefix` input can be specified manually, or it can be left at the default value. The default value will attempt to strip common file suffixes from one of the input FASTQs and calculate an appropriate basename to be used by all output files. That calculation is performed using the POSIX ERE Regular Expression `(([_.][rR](?:ead)?[12])((?:[_.-][^_.-]*?)*?))?\.(fastq|fq)(\.gz)?$`. In plain english, this REGEX will at a minimum search for and remove the file extensions `.fastq` and `.fq` with or without a `.gz` GZIP extension. Additionally, if the FASTQ filename contains a "read number" signifier (`R1`/`R2`/`r1`/`r2`/`read1`/`read2`) somewhere before the FASTQ extension, that will be truncated off the basename. This means that _everything_ after the read indicator will be removed. If there is important information encoded in your filenames _between the read number and the final extension,_ we recommend you manually specify an appropriate `prefix` value.
The tasks and worfklows in this repository which have one or more FASTQ files as an input will also have a `prefix` input which will determine the filenames for any output files. The `prefix` input can be specified manually, or it can be left at the default value. The default value will attempt to strip common file suffixes from one of the input FASTQs and determine an appropriate basename to be used by all output files. That evaluation is performed using the POSIX ERE Regular Expression `(([_.][rR](?:ead)?[12])((?:[_.-][^_.-]*?)*?))?\.(fastq|fq)(\.gz)?$`. In plain english, this REGEX will at a minimum search for and remove the file extensions `.fastq` and `.fq` with or without a `.gz` GZIP extension. Additionally, if the FASTQ filename contains a "read number" signifier (`R1`/`R2`/`r1`/`r2`/`read1`/`read2`) somewhere before the FASTQ extension, that will be truncated off the basename. This means that _everything_ after the read indicator will be removed. If there is important information encoded in your filenames _between the read number and the final extension,_ we recommend you manually specify an appropriate `prefix` value.

### Examples

Expand Down

0 comments on commit 1324df8

Please sign in to comment.