merge_fastq is a Nextflow pipeline that merges FastQ files from different lanes.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Given a list of FastQ files, it merges all the files from different lanes for each sample, so that at the end there is only one R1 file and one R2 file per sample.
This pipeline is based on the merge_and_rename_NGI_fastq_files.py standalone script from SciLifeLab. FastQ files are expected to follow the typical Illumina naming convention (e.g. SampleName_S1_L001_R1_001.fastq.gz) so that lanes are merged correctly.
Specifically, filenames have to match the following regular expression:
^(.+)_S[0-9]+(_.+)*_R([1-2])_
Example:
fastq_files/E3387-3t_S10_L001_R1_001.fastq.gz
fastq_files/E3387-3t_S10_L003_R1_001.fastq.gz
fastq_files/E3387-3t_S11_L001_R1_001.fastq.gz
fastq_files/E3387-3t_S11_L002_R1_001.fastq.gz
are merged as ./E3387-3t_R1.fastq.gz
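The grouping implied by the regular expression can be illustrated with a short Python sketch. This is not the pipeline's own code: the function name `group_by_sample_and_read` is hypothetical, and raising an error on a non-matching filename is an assumption about how such files might be handled. It only shows how capture group 1 (sample name) and capture group 3 (read number) decide which files end up in the same output.

```python
import re
from collections import defaultdict
from pathlib import Path

# Pattern quoted above: group 1 is the sample name, group 3 the read number (1 or 2).
ILLUMINA_RE = re.compile(r"^(.+)_S[0-9]+(_.+)*_R([1-2])_")


def group_by_sample_and_read(paths):
    """Map each output name (<sample>_R<read>.fastq.gz) to its sorted input files.

    Hypothetical helper for illustration only.
    """
    groups = defaultdict(list)
    for path in paths:
        match = ILLUMINA_RE.match(Path(path).name)
        if match is None:
            # Assumption: a filename that does not match is treated as an error.
            raise ValueError(f"unexpected FastQ filename: {path}")
        sample, read = match.group(1), match.group(3)
        groups[f"{sample}_R{read}.fastq.gz"].append(str(path))
    return {name: sorted(files) for name, files in groups.items()}


files = [
    "fastq_files/E3387-3t_S10_L001_R1_001.fastq.gz",
    "fastq_files/E3387-3t_S10_L003_R1_001.fastq.gz",
    "fastq_files/E3387-3t_S11_L001_R1_001.fastq.gz",
    "fastq_files/E3387-3t_S11_L002_R1_001.fastq.gz",
]

merge_plan = group_by_sample_and_read(files)
for output_name, inputs in merge_plan.items():
    print(output_name, "<-", len(inputs), "files")
```

Because the `_S[0-9]+` and lane parts of the name fall outside the two groups used here, all four example files map to the single output name E3387-3t_R1.fastq.gz.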
- module load blic-modules
- module load nf/merge_fastq
- Start running your own analysis!
merge_fastq --inputdir <INPUTDIR> [--outdir <OUTDIR>]
merge_fastq was developed by LC and is built around the merge_and_rename_NGI_fastq_files.py script from SciLifeLab.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.