ChIP-nf

A Nextflow pipeline for processing ChIP-seq data.

Installing Nextflow

Nextflow can be installed by using the following command:

curl -fsSL get.nextflow.io | bash

Running the pipeline

First you need to pull the pipeline using Nextflow:

$ nextflow pull guigolab/chip-nf
Checking guigolab/chip-nf ...
 downloaded from https://github.com/guigolab/chip-nf.git

You can get the pipeline help with the following command:

$ nextflow run chip-nf --help

N E X T F L O W  ~  version 0.24.1
Launching `guigolab/chip-nf` [nostalgic_franklin] - revision: 974a45c356 [master]

C H I P - N F ~ ChIP-seq Pipeline
---------------------------------
Run ChIP-seq analyses on a set of data.

Usage:
    chipseq-pipeline.nf --index TSV_FILE --genome GENOME_FILE [OPTION]...

Options:
    --help                              Show this message and exit.
    --index TSV_FILE                    Tab-separated file containing information about the data.
    --genome GENOME_FILE                Reference genome file.
    --genome-index GENOME_INDEX_FILE    Reference genome index file.
    --genome-size GENOME_SIZE           Reference genome size for MACS2 callpeaks. Must be one of
                                        MACS2 precomputed sizes: hs, mm, dm, ce. (Default: hs)
    --mismatches MISMATCHES             Sets the maximum number/percentage of mismatches allowed for a read (Default: 2).
    --multimaps MULTIMAPS               Sets the maximum number of mappings allowed for a read (Default: 10).
    --min-matched-bases BASES           Sets the minimum number/percentage of bases that have to match with the reference (Default: 0.80).
    --quality-threshold THRESHOLD       Sets the sequence quality threshold for a base to be considered as low-quality (Default: 26).
    --fragment-length LENGTH            Sets the fragment length globally for all samples (Default: 200).
    --remove-duplicates                 Remove duplicate alignments instead of just flagging them (Default: false).
    --rescale                           Rescale peak scores to conform to the format supported by the
                                        UCSC genome browser (score must be <1000) (Default: false).
    --shift                             Move fragments ends and apply global extsize in peak calling. (Default: false).
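
When launching the pipeline from a script, the options above can be assembled programmatically. The following is a minimal sketch, not part of the pipeline itself: the helper `build_chip_nf_command` and the file names `index.tsv` and `genome.fa` are hypothetical, and only option names listed in the help text are used. It assumes that a flag passed without a value (such as `--remove-duplicates`) is treated as enabled.

```python
import shlex


def build_chip_nf_command(index, genome, **options):
    """Assemble a `nextflow run` argument list for chip-nf (hypothetical helper).

    Keyword names use underscores (e.g. genome_size) and are mapped to the
    dashed option names shown in the help text (e.g. --genome-size).
    """
    cmd = ["nextflow", "run", "guigolab/chip-nf",
           "--index", str(index), "--genome", str(genome)]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            # Boolean switches are passed as a bare flag.
            cmd.append(flag)
        elif value is False or value is None:
            # Disabled or unset options are simply left out.
            continue
        else:
            cmd.extend([flag, str(value)])
    return cmd


# Hypothetical input files; print the command instead of running it.
cmd = build_chip_nf_command("index.tsv", "genome.fa",
                            genome_size="mm", remove_duplicates=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run` once the printed command looks right.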

Input

Specify the input data and metadata in a tab-separated file and pass it to the pipeline with the --index option. Here is an example of the file format:

sample1    sample1_run1     /path/to/sample1_run1.fastq.gz    -           H3
sample1    sample1_run2     /path/to/sample1_run2.fastq.gz    -           H3
sample1    sample1_run3     /path/to/sample1_run3.fastq.gz    -           H3
sample1    sample1_run4     /path/to/sample1_run4.fastq.gz    -           H3
sample2    sample2_run1     /path/to/sample2_run1.fastq.gz    control1    H3K4me2
control1   control1_run1    /path/to/control1_run1.fastq.gz   control1    input

The fields in the file correspond to:

  1. identifier used for merging the BAM files

  2. single run identifier

  3. path to the fastq file to be processed

  4. identifier of the control (input) sample, or - if no control is used

  5. mark/histone, or input if the line refers to a control

  6. optional sample fragment length. If not specified, the fragment length is estimated using SPP.
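
It can help to check an index file before starting a run. The following is a minimal sketch of such a check, not something the pipeline provides: the function names `read_index` and `check_controls` are hypothetical. It assumes that a control identifier in field 4 should match field 1 of a line whose field 5 is `input`, as in the example above, and that the optional fragment length is an integer.

```python
import csv
import io


def read_index(handle):
    """Parse an index file into a list of dicts, one per run (hypothetical helper)."""
    rows = []
    for line_no, fields in enumerate(csv.reader(handle, delimiter="\t"), start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (5, 6):
            raise ValueError(
                f"line {line_no}: expected 5 or 6 fields, got {len(fields)}")
        rows.append({
            "sample": fields[0],
            "run": fields[1],
            "fastq": fields[2],
            "control": None if fields[3] == "-" else fields[3],
            "mark": fields[4],
            # Assumption: the optional sixth field is an integer length.
            "fragment_length": int(fields[5]) if len(fields) == 6 else None,
        })
    return rows


def check_controls(rows):
    """Return control identifiers that are referenced but have no input line."""
    defined = {r["sample"] for r in rows if r["mark"] == "input"}
    referenced = {r["control"] for r in rows if r["control"] is not None}
    return sorted(referenced - defined)


# Small in-memory example mirroring the format described above.
example = "\n".join("\t".join(fields) for fields in [
    ("sample1", "sample1_run1", "/path/to/sample1_run1.fastq.gz", "-", "H3"),
    ("sample2", "sample2_run1", "/path/to/sample2_run1.fastq.gz", "control1", "H3K4me2"),
    ("control1", "control1_run1", "/path/to/control1_run1.fastq.gz", "control1", "input"),
])
rows = read_index(io.StringIO(example))
print(len(rows), "runs; missing controls:", check_controls(rows))
```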

Output

The pipeline will produce the following output data:

  • Alignments

  • pileupSignal, pileup signal tracks

  • fcSignal, fold enrichment signal tracks

  • pvalueSignal, -log_10(P) signal tracks

  • narrowPeak, peak locations with peak summit, pvalue and qvalue (BED6+4)

  • broadPeak, similar to narrowPeak (BED6+3)

  • gappedPeak, both narrow and broad peaks (BED12+3)

See the MACS2 documentation on output files for details.

Information about the output data is written to a file called chipseq-pipeline.db, created in the folder from which the pipeline is run. Here is an example of the db file:

sample1   /path/to/results/peakOut/sample1.pileup_signal.bw    H3         255     pileupSignal    0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.narrowPeak    H3         255     narrowPeak      0.9960   0.4393
sample1   /path/to/results/sample1.bam                         H3         255     Alignments      0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.gappedPeak    H3         255     gappedPeak      0.9960   0.4393
sample1   /path/to/results/peakOut/sample1_peaks.broadPeak     H3         255     broadPeak       0.9960   0.4393
sample2   /path/to/results/peakOut/sample2_peaks.gappedPeak    H3K4me2    200     gappedPeak      0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.fc_signal.bw        H3K4me2    200     fcSignal        0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.pval_signal.bw      H3K4me2    200     pvalueSignal    0.9995   0.7216
sample2   /path/to/results/peakOut/sample2_peaks.broadPeak     H3K4me2    200     broadPeak       0.9995   0.7216
sample2   /path/to/results/peakOut/sample2.pileup_signal.bw    H3K4me2    200     pileupSignal    0.9995   0.7216
sample2   /path/to/results/peakOut/sample2_peaks.narrowPeak    H3K4me2    200     narrowPeak      0.9995   0.7216
sample2   /path/to/results/sample2_GCCAAT_primary.bam          H3K4me2    200     Alignments      0.9995   0.7216

The fields in the file correspond to:

  1. merge identifier

  2. path

  3. mark/histone

  4. (estimated) fragment length

  5. data type

  6. NRF (Non-Redundant Fraction)

  7. FRiP (Fraction of Reads in Peaks)
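
The db file can be loaded for downstream scripting. The following is a minimal sketch under stated assumptions: it assumes the columns are tab-separated (the example above only shows aligned columns) and that the fragment length is an integer. The names `DbRecord`, `read_db` and `paths_by_type` are hypothetical and not part of the pipeline.

```python
import csv
import io
from typing import NamedTuple


class DbRecord(NamedTuple):
    """One line of chipseq-pipeline.db, following the field list above."""
    sample: str
    path: str
    mark: str
    fragment_length: int
    data_type: str
    nrf: float
    frip: float


def read_db(handle):
    """Parse the db file into DbRecord tuples (assumes tab-separated columns)."""
    records = []
    for fields in csv.reader(handle, delimiter="\t"):
        if not fields:
            continue  # skip blank lines
        sample, path, mark, frag, data_type, nrf, frip = fields
        records.append(DbRecord(sample, path, mark, int(frag),
                                data_type, float(nrf), float(frip)))
    return records


def paths_by_type(records, data_type):
    """Map each merge identifier to the path of the requested data type."""
    return {r.sample: r.path for r in records if r.data_type == data_type}


# In-memory example using two lines from the db listing above.
example = "\n".join("\t".join(fields) for fields in [
    ("sample1", "/path/to/results/peakOut/sample1_peaks.narrowPeak",
     "H3", "255", "narrowPeak", "0.9960", "0.4393"),
    ("sample2", "/path/to/results/peakOut/sample2_peaks.narrowPeak",
     "H3K4me2", "200", "narrowPeak", "0.9995", "0.7216"),
])
records = read_db(io.StringIO(example))
print(paths_by_type(records, "narrowPeak"))
```

To read a real run's output, open `chipseq-pipeline.db` and pass the file handle to `read_db` in place of the in-memory example.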