Merge pull request #99 from ENCODE-DCC/dev
v1.3.2
leepc12 authored Oct 23, 2019
2 parents c8ac1b2 + ec67658 commit 94e0237
Showing 137 changed files with 478 additions and 329 deletions.
27 changes: 14 additions & 13 deletions README.md
@@ -13,27 +13,28 @@ This ChIP-Seq pipeline is based off the ENCODE (phase-3) transcription factor an

## Installation

1) [Install Caper](https://github.com/ENCODE-DCC/caper#installation). Caper is a python wrapper for [Cromwell](https://github.com/broadinstitute/cromwell).

> **IMPORTANT**: Make sure that you have python3(> 3.4.1) installed on your system.
1) Git clone this pipeline.
> **IMPORTANT**: use `~/chip-seq-pipeline2/chip.wdl` as `[WDL]` in Caper's documentation.

```bash
$ pip install caper # use pip3 if it doesn't work
$ cd
$ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
```

2) Follow [Caper's README](https://github.com/ENCODE-DCC/caper) carefully. Find an instruction for your platform.
> **IMPORTANT**: Configure your Caper configuration file `~/.caper/default.conf` correctly for your platform.
2) Install pipeline's [Conda environment](docs/install_conda.md) if you want to use Conda instead of Docker/Singularity. Conda is recommended on local computers and HPCs (e.g. Stanford Sherlock/SCG).
> **IMPORTANT**: use `encode-chip-seq-pipeline` as `[PIPELINE_CONDA_ENV]` in Caper's documentation.

3) Git clone this pipeline.
> **IMPORTANT**: use `~/chip-seq-pipeline2/chip.wdl` as `[WDL]` in Caper's documentation.
3) **Skip this step if you have installed pipeline's Conda environment**. Caper is already included in the Conda environment. [Install Caper](https://github.com/ENCODE-DCC/caper#installation). Caper is a python wrapper for [Cromwell](https://github.com/broadinstitute/cromwell).

> **IMPORTANT**: Make sure that you have python3(> 3.4.1) installed on your system.

```bash
$ cd
$ git clone https://github.com/ENCODE-DCC/chip-seq-pipeline2
$ pip install caper # use pip3 if it doesn't work
```

4) Install pipeline's [Conda environment](docs/install_conda.md) if you want to use Conda instead of Docker/Singularity. Conda is recommended on local computers and HPCs (e.g. Stanford Sherlock/SCG).
> **IMPORTANT**: use `encode-chip-seq-pipeline` as `[PIPELINE_CONDA_ENV]` in Caper's documentation.
4) Follow [Caper's README](https://github.com/ENCODE-DCC/caper) carefully. Find an instruction for your platform.
> **IMPORTANT**: Configure your Caper configuration file `~/.caper/default.conf` correctly for your platform.


## Test input JSON file

@@ -60,7 +61,7 @@ You can also run this pipeline on DNAnexus without using Caper or Cromwell. Ther

## How to organize outputs

Install [Croo](https://github.com/ENCODE-DCC/croo#installation). Make sure that you have python3(> 3.4.1) installed on your system. Find a `metadata.json` on Caper's output directory.
Install [Croo](https://github.com/ENCODE-DCC/croo#installation). **You can skip this installation if you have installed pipeline's Conda environment and activated it**. Make sure that you have python3(> 3.4.1) installed on your system. Find a `metadata.json` on Caper's output directory.

```bash
$ pip install croo
66 changes: 34 additions & 32 deletions chip.wdl
@@ -1,12 +1,12 @@
# ENCODE TF/Histone ChIP-Seq pipeline
# Author: Jin Lee ([email protected])
#CAPER docker quay.io/encode-dcc/chip-seq-pipeline:v1.3.1
#CAPER singularity docker://quay.io/encode-dcc/chip-seq-pipeline:v1.3.1
#CAPER docker quay.io/encode-dcc/chip-seq-pipeline:v1.3.2
#CAPER singularity docker://quay.io/encode-dcc/chip-seq-pipeline:v1.3.2
#CROO out_def https://storage.googleapis.com/encode-pipeline-output-definition/chip.croo.json
workflow chip {
String pipeline_ver = 'v1.3.1'
String pipeline_ver = 'v1.3.2'
### sample name, description
String title = 'Untitled'
String description = 'No description'
@@ -31,6 +31,7 @@ workflow chip {
File? blacklist # blacklist BED (peaks overlapping will be filtered out)
File? blacklist2 # 2nd blacklist (will be merged with 1st one)
String? mito_chr_name
String? regex_bfilt_peak_chr_name
String? gensz # genome sizes (hs for human, mm for mouse or sum of 2nd col in chrsz)
File? tss # TSS BED file
File? dnase # open chromatin region BED file
@@ -88,11 +89,7 @@ workflow chip {
Int cap_num_peak_macs2 = 500000 # cap number of raw peaks called from MACS2
Float pval_thresh = 0.01 # p.value threshold
Float idr_thresh = 0.05 # IDR threshold
Boolean keep_irregular_chr_in_bfilt_peak = false
# peaks with irregular chr name will not be filtered out
# in bfilt_peak (blacklist filtered peak) file
# (e.g. chr1_AABBCC, AABR07024382.1, ...)
# reg-ex pattern for 'regular' chr name is chr[\dXY]+\b
### resources
Int align_cpu = 4
Int align_mem_mb = 20000
@@ -233,6 +230,8 @@ workflow chip {
else blacklist2_
String? mito_chr_name_ = if defined(mito_chr_name) then mito_chr_name
else read_genome_tsv.mito_chr_name
String? regex_bfilt_peak_chr_name_ = if defined(regex_bfilt_peak_chr_name) then regex_bfilt_peak_chr_name
else read_genome_tsv.regex_bfilt_peak_chr_name
String? genome_name_ = if defined(genome_name) then genome_name
else if defined(read_genome_tsv.genome_name) then read_genome_tsv.genome_name
else basename(select_first([genome_tsv, ref_fa_, chrsz_, 'None']))
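The fallback chain for `regex_bfilt_peak_chr_name_` can be read as a three-level precedence: a value given directly as a workflow input wins, then a non-empty entry from the genome TSV, then the default `chr[\dXY]+` that the `read_genome_tsv` task in this diff supplies when the entry is empty. A minimal Python sketch of that precedence follows; `resolve_regex` and its parameter names are hypothetical and exist only for illustration, they are not part of the pipeline.

```python
from typing import Optional

# Default copied from the read_genome_tsv task in this diff
# (the WDL literal 'chr[\\dXY]+' unescapes to this pattern).
DEFAULT_REGEX = r"chr[\dXY]+"


def resolve_regex(input_json_value: Optional[str],
                  genome_tsv_value: Optional[str]) -> str:
    """Hypothetical helper mirroring the WDL fallback chain."""
    if input_json_value is not None:   # defined(regex_bfilt_peak_chr_name)
        return input_json_value
    if genome_tsv_value:               # non-empty entry read from the genome TSV
        return genome_tsv_value
    return DEFAULT_REGEX               # empty or missing entry falls back to the default


print(resolve_regex(None, None))
print(resolve_regex(None, r"chr[\dXYM]+"))
print(resolve_regex(r"chr[\dIVX]+", r"chr[\dXYM]+"))
```

The same shape applies to `mito_chr_name_` in the context lines above, except that no default is shown for it in this diff.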
@@ -756,7 +755,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_tmp[i],
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -797,7 +796,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_tmp[i],
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -823,7 +822,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_tmp[i],
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -865,7 +864,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_mean.rounded_mean,
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -906,7 +905,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_mean.rounded_mean,
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -932,7 +931,7 @@ workflow chip {
pval_thresh = pval_thresh,
fraglen = fraglen_mean.rounded_mean,
blacklist = blacklist_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
cpu = call_peak_cpu,
mem_mb = call_peak_mem_mb,
@@ -966,7 +965,7 @@ workflow chip {
peak_type = peak_type_,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = pool_ta.ta_pooled,
}
}
@@ -988,7 +987,7 @@ workflow chip {
rank = idr_rank_,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = pool_ta.ta_pooled,
}
}
@@ -1006,7 +1005,7 @@ workflow chip {
peak_type = peak_type_,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = ta_[i],
}
}
@@ -1026,7 +1025,7 @@ workflow chip {
rank = idr_rank_,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = ta_[i],
}
}
@@ -1043,7 +1042,7 @@ workflow chip {
fraglen = fraglen_mean.rounded_mean,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = pool_ta.ta_pooled,
}
}
@@ -1061,7 +1060,7 @@ workflow chip {
rank = idr_rank_,
blacklist = blacklist_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
regex_bfilt_peak_chr_name = regex_bfilt_peak_chr_name_,
ta = pool_ta.ta_pooled,
}
}
@@ -1076,7 +1075,6 @@ workflow chip {
peak_ppr = overlap_ppr.bfilt_overlap_peak,
peak_type = peak_type_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
}
}
@@ -1089,7 +1087,6 @@ workflow chip {
peak_ppr = idr_ppr.bfilt_idr_peak,
peak_type = peak_type_,
chrsz = chrsz_,
keep_irregular_chr_in_bfilt_peak = keep_irregular_chr_in_bfilt_peak,
}
}
@@ -1189,6 +1186,8 @@ task align {
Array[Array[File]] tmp_fastqs = if paired_end then transpose([fastqs_R1, fastqs_R2])
else transpose([fastqs_R1])
command {
set -e

# check if pipeline dependencies can be found
if [[ -z "$(which encode_task_merge_fastq.py 2> /dev/null || true)" ]]
then
@@ -1198,7 +1197,7 @@ task align {
echo 'GCP/AWS/Docker users: Did you add --docker flag to Caper command line arg?' 1>&2
echo 'Singularity users: Did you add --singularity flag to Caper command line arg?' 1>&2
echo -e "\n" 1>&2
EXCEPTION_RAISED
exit 3
fi
python3 $(which encode_task_merge_fastq.py) \
${write_tsv(tmp_fastqs)} \
@@ -1518,14 +1517,16 @@ task call_peak {
Int cap_num_peak # cap number of raw peaks called from MACS2
Float pval_thresh # p.value threshold
File? blacklist # blacklist BED to filter raw peaks
Boolean keep_irregular_chr_in_bfilt_peak
String? regex_bfilt_peak_chr_name

Int cpu
Int mem_mb
Int time_hr
String disks

command {
set -e

if [ '${peak_caller}' == 'macs2' ]; then
python3 $(which encode_task_macs2_chip.py) \
${sep=' ' tas} \
@@ -1556,7 +1557,7 @@ task call_peak {
python3 $(which encode_task_post_call_peak_chip.py) \
$(ls *Peak.gz) \
${'--ta ' + tas[0]} \
${if keep_irregular_chr_in_bfilt_peak then '--keep-irregular-chr' else ''} \
${'--regex-bfilt-peak-chr-name "' + regex_bfilt_peak_chr_name + '"'} \
${'--chrsz ' + chrsz} \
${'--fraglen ' + fraglen} \
${'--peak-type ' + peak_type} \
@@ -1622,7 +1623,7 @@ task idr {
File peak_pooled
Float idr_thresh
File? blacklist # blacklist BED to filter raw peaks
Boolean keep_irregular_chr_in_bfilt_peak
String regex_bfilt_peak_chr_name
# parameters to compute FRiP
File? ta # to calculate FRiP
Int fraglen # fragment length from xcor
@@ -1642,7 +1643,7 @@ task idr {
${'--fraglen ' + fraglen} \
${'--chrsz ' + chrsz} \
${'--blacklist '+ blacklist} \
${if keep_irregular_chr_in_bfilt_peak then '--keep-irregular-chr' else ''} \
${'--regex-bfilt-peak-chr-name "' + regex_bfilt_peak_chr_name + '"'} \
${'--ta ' + ta}
}
output {
@@ -1670,7 +1671,7 @@ task overlap {
File peak2
File peak_pooled
File? blacklist # blacklist BED to filter raw peaks
Boolean keep_irregular_chr_in_bfilt_peak
String regex_bfilt_peak_chr_name
# parameters to compute FRiP
File? ta # to calculate FRiP
Int fraglen # fragment length from xcor (for FRIP)
@@ -1688,7 +1689,7 @@ task overlap {
${'--chrsz ' + chrsz} \
${'--blacklist '+ blacklist} \
--nonamecheck \
${if keep_irregular_chr_in_bfilt_peak then '--keep-irregular-chr' else ''} \
${'--regex-bfilt-peak-chr-name "' + regex_bfilt_peak_chr_name + '"'} \
${'--ta ' + ta}
}
output {
@@ -1717,7 +1718,6 @@ task reproducibility {
File? peak_ppr # Peak file from pooled pseudo replicate.
String peak_type
File chrsz # 2-col chromosome sizes file
Boolean keep_irregular_chr_in_bfilt_peak
command {
python3 $(which encode_task_reproducibility.py) \
@@ -1726,7 +1726,6 @@ task reproducibility {
${'--peak-ppr '+ peak_ppr} \
--prefix ${prefix} \
${'--peak-type ' + peak_type} \
${if keep_irregular_chr_in_bfilt_peak then '--keep-irregular-chr' else ''} \
${'--chrsz ' + chrsz}
}
output {
@@ -1927,6 +1926,7 @@ task read_genome_tsv {
touch tss tss_enrich # for backward compatibility
touch dnase prom enh reg2map reg2map_bed roadmap_meta
touch mito_chr_name
touch regex_bfilt_peak_chr_name

python <<CODE
import os
@@ -1950,6 +1950,8 @@ task read_genome_tsv {
String? blacklist = if size('blacklist')==0 then null_s else read_string('blacklist')
String? blacklist2 = if size('blacklist2')==0 then null_s else read_string('blacklist2')
String? mito_chr_name = if size('mito_chr_name')==0 then null_s else read_string('mito_chr_name')
String? regex_bfilt_peak_chr_name = if size('regex_bfilt_peak_chr_name')==0 then 'chr[\\dXY]+'
else read_string('regex_bfilt_peak_chr_name')
# optional data
String? tss = if size('tss')!=0 then read_string('tss')
else if size('tss_enrich')!=0 then read_string('tss_enrich') else null_s
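To make the effect of the default `chr[\dXY]+` concrete, here is a small Python sketch of filtering peak lines by chromosome name. It assumes the whole name must match the pattern (`re.fullmatch`); the diff does not show how `encode_task_post_call_peak_chip.py` actually anchors the expression, so treat that as an assumption. The function `filter_peaks_by_chr_name` and the sample peaks are hypothetical; the irregular names come from the comment removed in this commit.

```python
import re

# Default taken from read_genome_tsv in this diff.
DEFAULT_REGEX = r"chr[\dXY]+"


def filter_peaks_by_chr_name(peak_lines, regex=DEFAULT_REGEX):
    """Keep tab-separated peak lines whose first column matches `regex`.

    Assumption: the entire chromosome name must match (re.fullmatch).
    The real pipeline script may apply the pattern differently.
    """
    pattern = re.compile(regex)
    kept = []
    for line in peak_lines:
        chrom = line.split("\t", 1)[0]
        if pattern.fullmatch(chrom):
            kept.append(line)
    return kept


peaks = [
    "chr1\t100\t200",
    "chrX\t300\t400",
    "chr1_AABBCC\t500\t600",      # irregular name from the removed comment
    "AABR07024382.1\t700\t800",   # irregular name from the removed comment
    "chrM\t900\t1000",
]
kept = filter_peaks_by_chr_name(peaks)
print(kept)
```

Under this assumption only the `chr1` and `chrX` peaks survive. Passing a wider pattern such as `chr[\dXYM]+` would also keep `chrM`, which is the kind of per-genome adjustment the new string parameter allows and the old boolean `keep_irregular_chr_in_bfilt_peak` did not.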
@@ -1998,7 +2000,7 @@ task raise_exception {
String msg
command {
echo -e "\n* Error: ${msg}\n" >&2
EXCEPTION_RAISED
exit 2
}
output {
String error_msg = '${msg}'
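The switch from the placeholder command `EXCEPTION_RAISED` to an explicit `exit`, together with the added `set -e`, changes how a failing command block terminates. The sketch below illustrates the general shell behavior by running three tiny scripts through POSIX `sh` from Python. It assumes `sh` is on `PATH`, and `run_sh` is a hypothetical helper. It says nothing about how Cromwell or Caper interpret the resulting codes, which this diff does not show.

```python
import subprocess


def run_sh(script):
    """Run `script` with POSIX sh and return (exit_code, stdout)."""
    proc = subprocess.run(["sh", "-c", script], capture_output=True, text=True)
    return proc.returncode, proc.stdout


# Old style: an undefined command fails with "not found", but without
# `set -e` the shell carries on with the next line.
old_code, old_out = run_sh("EXCEPTION_RAISED\necho continued")

# New style: an explicit exit stops the script at once with a chosen code.
new_code, new_out = run_sh("exit 3\necho continued")

# `set -e` makes any failing command abort the script.
sete_code, sete_out = run_sh("set -e\nfalse\necho continued")

print(old_code, repr(old_out))
print(new_code, repr(new_out))
print(sete_code, repr(sete_out))
```

In the `align` task the placeholder sat inside an `if` block followed by further commands, so an explicit `exit 3` is what guarantees that the dependency check actually halts the task.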
6 changes: 3 additions & 3 deletions dev/dev.md
@@ -2,8 +2,8 @@

## Command line for version change
```bash
PREV_VER=v1.3.1
NEW_VER=v1.3.1
PREV_VER=v1.3.2
NEW_VER=v1.3.2
for f in $(grep -rl ${PREV_VER} --include=*.{wdl,md,sh})
do
sed -i "s/${PREV_VER}/${NEW_VER}/g" ${f}
@@ -24,7 +24,7 @@ Run the following command line locally to build out DX workflows for this pipeli

```bash
# version
VER=v1.3.1
VER=v1.3.2
DOCKER=quay.io/encode-dcc/chip-seq-pipeline:$VER

# general
2 changes: 1 addition & 1 deletion dev/examples/caper/ENCSR936XTK_subsampled_chr19_only.json
@@ -1,6 +1,6 @@
{
"chip.pipeline_type" : "tf",
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/hg38_chr19_chrM_caper.tsv",
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_caper.tsv",
"chip.fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/fastq_subsampled/rep1-R1.subsampled.67.fastq.gz"
],
"chip.fastqs_rep1_R2" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/fastq_subsampled/rep1-R2.subsampled.67.fastq.gz"
@@ -1,6 +1,6 @@
{
"chip.pipeline_type" : "tf",
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/hg38_chr19_chrM_caper.tsv",
"chip.genome_tsv" : "https://storage.googleapis.com/encode-pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_caper.tsv",
"chip.fastqs_rep1_R1" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/fastq_subsampled/rep1-R1.subsampled.67.fastq.gz"
],
"chip.fastqs_rep1_R2" : ["https://storage.googleapis.com/encode-pipeline-test-samples/encode-chip-seq-pipeline/ENCSR936XTK/fastq_subsampled/rep1-R2.subsampled.67.fastq.gz"
2 changes: 1 addition & 1 deletion dev/examples/dx/ENCSR000DYI_dx.json
@@ -1,6 +1,6 @@
{
"chip.pipeline_type" : "tf",
"chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-genome-data/hg38_dx.tsv",
"chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-genome-data/genome_tsv/v1/hg38_dx.tsv",
"chip.fastqs_rep1_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq/rep1.fastq.gz"
],
"chip.fastqs_rep2_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq/rep2.fastq.gz"
2 changes: 1 addition & 1 deletion dev/examples/dx/ENCSR000DYI_subsampled_chr19_only_dx.json
@@ -1,6 +1,6 @@
{
"chip.pipeline_type" : "tf",
"chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-genome-data/hg38_chr19_chrM_dx.tsv",
"chip.genome_tsv" : "dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-genome-data/genome_tsv/v1/hg38_chr19_chrM_dx.tsv",
"chip.fastqs_rep1_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep1.subsampled.25.fastq.gz"
],
"chip.fastqs_rep2_R1" : ["dx://project-BKpvFg00VBPV975PgJ6Q03v6:pipeline-test-samples/encode-chip-seq-pipeline/ENCSR000DYI/fastq_subsampled/rep2.subsampled.20.fastq.gz"