Recent Issues (#49)
* Remove old sample output. Update sample output. Add Docker hints. Remove ENTRYPOINT.

* fix: Update md5 for tie bam

* bump star version. improve star script.

* docs: add WDL. Add singularity. general updates.

* bump bwa to match workflows
adthrasher authored Nov 28, 2023
1 parent c3f1520 commit e1c7539
Showing 49 changed files with 303 additions and 164 deletions.
20 changes: 9 additions & 11 deletions Dockerfile
@@ -19,18 +19,18 @@ RUN pip3 install --ignore-installed \
html5lib

RUN cd /tmp \
-&& wget https://github.com/lh3/bwa/releases/download/v0.7.13/bwa-0.7.13.tar.bz2 \
-&& echo "559b3c63266e5d5351f7665268263dbb9592f3c1c4569e7a4a75a15f17f0aedc *bwa-0.7.13.tar.bz2" | sha256sum --check \
-&& tar xf bwa-0.7.13.tar.bz2 \
-&& cd bwa-0.7.13 \
+&& wget https://github.com/lh3/bwa/releases/download/v0.7.17/bwa-0.7.17.tar.bz2 \
+&& echo "de1b4d4e745c0b7fc3e107b5155a51ac063011d33a5d82696331ecf4bed8d0fd *bwa-0.7.17.tar.bz2" | sha256sum --check \
+&& tar xf bwa-0.7.17.tar.bz2 \
+&& cd bwa-0.7.17 \
&& make -j$(nproc) \
&& mv bwa /usr/local/bin

RUN cd /tmp \
-&& wget https://github.com/alexdobin/STAR/archive/2.7.1a.tar.gz \
-&& echo "9a35bf4e8a12bec505e11132bc53f94671f596584a6a0dd8f237120dd0df740e *2.7.1a.tar.gz" | sha256sum --check \
-&& tar xf 2.7.1a.tar.gz \
-&& mv STAR-2.7.1a/bin/Linux_x86_64_static/STAR /usr/local/bin
+&& wget https://github.com/alexdobin/STAR/archive/refs/tags/2.7.10a.tar.gz \
+&& echo "af0df8fdc0e7a539b3ec6665dce9ac55c33598dfbc74d24df9dae7a309b0426a *2.7.10a.tar.gz" | sha256sum --check \
+&& tar xf 2.7.10a.tar.gz \
+&& mv STAR-2.7.10a/bin/Linux_x86_64_static/STAR /usr/local/bin

# bz2 and lzma support is for CRAM files. curses is for `samtools tview`.
RUN cd /tmp \
@@ -101,6 +101,4 @@ COPY --chmod=755 --from=builder /opt/picard /opt/picard
COPY --chmod=755 --from=builder /opt/xenocp /opt/xenocp
COPY --chmod=755 --from=builder /opt/xenocp/bin/* /usr/local/bin/

-COPY --chmod=755 cwl /opt/xenocp/cwl
-
-ENTRYPOINT ["cwl-runner", "--parallel", "--outdir", "results", "--no-container", "/opt/xenocp/cwl/xenocp.cwl"]
+COPY --chmod=755 cwl /opt/xenocp/cwl
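
The `RUN` steps in this Dockerfile pin each downloaded tarball to a SHA-256 digest before unpacking it. The following is a minimal sketch of that verification pattern outside a Dockerfile, assuming GNU coreutils `sha256sum` is available. The helper name `verify_sha256` and the throwaway file are illustrative only and are not part of this commit.

```shell
#!/usr/bin/env bash
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if FILE hashes to EXPECTED_HEX (hypothetical helper).
verify_sha256() {
  local file="$1" expected="$2"
  # sha256sum --check reads "<digest> *<name>" lines; --status suppresses output
  # and reports the result through the exit code only.
  echo "${expected} *${file}" | sha256sum --check --status
}

# Demonstration on a throwaway file rather than a real release tarball.
tmp="$(mktemp)"
printf 'hello\n' > "$tmp"
actual="$(sha256sum "$tmp" | cut -d' ' -f1)"
verify_sha256 "$tmp" "$actual" && echo "checksum ok"
rm -f "$tmp"
```

Because the digest check sits in the same `&&` chain as the download, a mismatch stops the build before `tar` or `make` ever touches the file.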
94 changes: 85 additions & 9 deletions README.md
@@ -1,5 +1,30 @@
# XenoCP

+- [XenoCP](#xenocp)
+- [Quick Start](#quick-start)
+- [Introduction to XenoCP](#introduction-to-xenocp)
+- [Reference Files](#reference-files)
+- [BWA for DNA Reads](#bwa-for-dna-reads)
+- [STAR for RNA Reads](#star-for-rna-reads)
+- [Local Usage without Docker](#local-usage-without-docker)
+- [Prerequisites](#prerequisites)
+- [Obtain and Build XenoCP](#obtain-and-build-xenocp)
+- [Inputs](#inputs)
+- [Run](#run)
+- [Local Usage with Docker](#local-usage-with-docker)
+- [Build Docker image](#build-docker-image)
+- [Run](#run-1)
+- [Singularity as a Docker alternative](#singularity-as-a-docker-alternative)
+- [WDL workflow](#wdl-workflow)
+- [WDL reference files](#wdl-reference-files)
+- [Running WDL](#running-wdl)
+- [Evaluate test data results](#evaluate-test-data-results)
+- [St. Jude Cloud](#st-jude-cloud)
+- [Availability](#availability)
+- [Seeking help](#seeking-help)
+- [Citing XenoCP](#citing-xenocp)
+- [Common Issues](#common-issues)
+
XenoCP is a tool for cleansing mouse reads in xenograft BAMs.
XenoCP can be easily incorporated into any workflow, as it takes a BAM file
as input and efficiently cleans up the mouse contamination. The output is a clean
@@ -141,8 +166,8 @@ aligner: "bwa aln"
For example, a prefix of `MGSCv37.fa` would assume for bwa alignment that
the following files in the same directory exist:
`MGSCv37.fa.amb`, `MGSCv37.fa.ann`, `MGSCv37.fa.bwt`,
-`MGSCv37.fa.pac`, and `MGSCv37.fa.sa`.
-For STAR alignment, `ref_db_prefix` should be a directory and
+`MGSCv37.fa.pac`, and `MGSCv37.fa.sa`. `index` should be the path to that folder.
+For STAR alignment, `index` should be a directory and
it would assume the following files exist in the directory:
`chrLength.txt`, `chrNameLength.txt`, `chrName.txt`, `chrStart.txt`,
`exonGeTrInfo.tab`, `exonInfo.tab`, `geneInfo.tab`, `Genome`,
@@ -195,10 +220,10 @@ $ docker build --tag xenocp .

### Run

-The Docker image uses `cwl-runner cwl/xenocp.cwl` as its entrypoint.
+The Docker image does not provide an entrypoint.

-The image assumes three working directories: `/data` for inputs, `/references` for
-reference files, and `/results` for outputs. `/data` and `/references` can be
+The image assumes three working directories: `/data` for inputs, `/reference` for
+reference files, and `/results` for outputs. `/data` and `/reference` can be
read-only, whereas `/results` needs write access.

The paths given in the input parameters file must be from inside the
@@ -208,13 +233,16 @@ container, not the host, e.g.,
bam:
class: File
path: /data/sample.bam
-ref_db_prefix: /reference/ref.fa
+ref_db_prefix: ref.fa
+index:
+class: Directory
+path: /reference
aligner: "bwa aln"
```
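
With this layout, the BWA tools receive `index` and `ref_db_prefix` joined by a slash (the CWL expression in this commit returns `inputs.index.path + "/" + self`). A pre-flight check along the following lines may catch a missing index file before a run. It is only a sketch: `check_bwa_index` is a hypothetical helper, and the extension list is the one the README gives for bwa.

```shell
#!/usr/bin/env bash
# check_bwa_index INDEX_DIR PREFIX
# Reports any of the five BWA index files that are missing and returns
# non-zero if at least one is absent (hypothetical helper, not part of XenoCP).
check_bwa_index() {
  local index_dir="$1" prefix="$2" ext missing=0
  for ext in amb ann bwt pac sa; do
    if [ ! -f "${index_dir}/${prefix}.${ext}" ]; then
      echo "missing: ${index_dir}/${prefix}.${ext}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example with the container-side values from the parameters file:
# check_bwa_index /reference ref.fa
```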

-The following is an example `run` command where files are stored in `test/{data,reference}`. Outputs are saved in `test/results`.
+The following is an example `run` command where the data files are stored in the current directory under `sample_data/input_data`. Outputs are saved in `results` in the current directory. The path to the reference files on the host machine needs to be provided.

-This example assumes you are running against Mus musculus (genome build MGSCv37). Set the path to the folder containing your reference data
+This example assumes you are running against *Mus musculus* (genome build MGSCv37). Set the path to the folder containing your reference data
and run the following command to produce output from the included sample data. Test output for comparison is located at `sample_data/output_data`.

```
@@ -223,7 +251,33 @@ $ docker run \
--mount type=bind,source=$(pwd)/sample_data/input_data,target=/data,readonly \
--mount type=bind,source=/path/to/reference,target=/reference,readonly \
--mount type=bind,source=$(pwd)/results,target=/results \
-xenocp \
+ghcr.io/stjude/xenocp:latest \
+cwl-runner \
+--parallel \
+--outdir results \
+--no-container \
+/opt/xenocp/cwl/xenocp.cwl \
/data/inputs.yml
```

+### Singularity as a Docker alternative
+
+Singularity is an experimental container solution that is an HPC-friendly alternative to Docker. For many reasons, `singularity` is not a drop-in replacement for Docker. Many applications require modification to fully run with `singularity`. This alternative is provided on a best-effort basis. If issues are encountered, please open an issue on this repository with details and the maintainers will try to provide support as possible.
+
+```
+$ mkdir $(pwd)/results
+$ singularity run \
+--containall \ # Isolate container from host
+-W /path/to/directory \ # Provide a directory with sufficient space to use for working directory
+-B $(pwd)/sample_data/input_data:/data \
+-B /path/to/reference:/reference \
+-B $(pwd)/results:/results \
+docker://ghcr.io/stjude/xenocp:latest \
+cwl-runner \
+--parallel \
+--outdir results \
+--no-container \
+/opt/xenocp/cwl/xenocp.cwl \
+/data/inputs.yml
+```

@@ -232,8 +286,30 @@ default temporary file location, /tmp, is small. To solve this, include
`-W <dir>` when executing via Singularity to redirect temp files to a
larger directory `<dir>`.

+Note: By default, `singularity` makes many host resources available inside the container. This is in contrast with Docker's native isolation. This also tends to cause conflicts and errors when running Docker-based workflows. Therefore we recommend always using the `--containall` option to Singularity.
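
One caveat when copying the Singularity command from this README: in a POSIX shell, a backslash continues a line only when it immediately precedes the newline, so a comment placed after `\` ends the command on that line. The following sketch shows one way to keep per-option comments, assuming `bash`, by building the arguments in an array. The function name `build_singularity_args` and the values `/scratch/work` and `/path/to/reference` are placeholders, not part of this commit.

```shell
#!/usr/bin/env bash
# Print one argument per line for a `singularity` invocation.
# Comments are allowed between elements of a bash array literal, which
# avoids putting text after a line-continuation backslash.
build_singularity_args() {
  local workdir="$1" reference="$2"
  local -a args
  args=(
    run
    --containall                               # isolate the container from the host
    -W "$workdir"                              # working directory with sufficient space
    -B "$(pwd)/sample_data/input_data:/data"
    -B "${reference}:/reference"
    -B "$(pwd)/results:/results"
    docker://ghcr.io/stjude/xenocp:latest
    cwl-runner --parallel --outdir results --no-container
    /opt/xenocp/cwl/xenocp.cwl
    /data/inputs.yml
  )
  printf '%s\n' "${args[@]}"
}

# Inspect the argument list before running anything.
build_singularity_args /scratch/work /path/to/reference
```

To execute rather than inspect, the `printf` line could be replaced with `singularity "${args[@]}"`.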

[Dockerfile]: ./Dockerfile

+## WDL workflow
+
+XenoCP includes a [WDL](https://github.com/openwdl/wdl) workflow implementation. This can be run locally or on a supported HPC system. It can also use Docker or Singularity for containerization.
+
+### WDL reference files
+
+As of v1.2, WDL does not support directory inputs. Therefore the reference files provided to the WDL workflow must be compressed (`.tar.gz`) before running. The compressed reference files can be downloaded from [Zenodo](https://zenodo.org/uploads/10162103).
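
For a locally built index, packaging might look like the sketch below, assuming a `tar` with gzip support. Whether the workflow expects the index files at the top level of the archive is an assumption here, as are the helper name `pack_reference` and the example paths. The archives published on Zenodo remain the authoritative layout.

```shell
#!/usr/bin/env bash
# pack_reference DIR OUT.tar.gz
# Archives the contents of DIR, not DIR itself (hypothetical helper).
pack_reference() {
  local ref_dir="$1" out="$2"
  # -C switches into DIR first so member names are relative to it.
  tar -czf "$out" -C "$ref_dir" .
}

# Example (paths are placeholders):
# pack_reference /path/to/reference MGSCv37_bwa.tar.gz
# tar -tzf MGSCv37_bwa.tar.gz   # list members to confirm the layout
```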

+### Running WDL
+
+To run the WDL workflow, you will need a WDL engine. We suggest [miniwdl](https://github.com/chanzuckerberg/miniwdl), though the [Cromwell](https://github.com/broadinstitute/cromwell/) engine should work, but is untested with XenoCP.
+
+After acquiring the reference files for your chosen aligner, you can run the sample data through the WDL workflow with the following command.
+
+```
+miniwdl run https://raw.githubusercontent.com/stjude/XenoCP/main/wdl/workflows/xenocp.wdl input_bam=https://github.com/stjude/XenoCP/raw/main/sample_data/input_data/SJRB001_X.subset.bam input_bai=https://github.com/stjude/XenoCP/raw/main/sample_data/input_data/SJRB001_X.subset.bam.bai reference_tar_gz=MGSCv37_bwa.tar.gz aligner='bwa aln'
+```
+
+This will run all of the steps on the local machine with Docker. The WDL runner `miniwdl` supports alternative execution modes, such as the [Singularity](https://miniwdl.readthedocs.io/en/latest/runner_backends.html#singularity-beta) container engine, [Slurm](https://github.com/miniwdl-ext/miniwdl-slurm) for batch systems, and [LSF](https://github.com/adthrasher/miniwdl-lsf) for batch systems. Alternative execution modes can be specified using `miniwdl`'s [configuration system](https://miniwdl.readthedocs.io/en/latest/runner_reference.html#configuration).
+
## Evaluate test data results

If you have [bcftools] and a [GRCh37-lite] reference file, the following will show two variants in the input file.
8 changes: 8 additions & 0 deletions cwl/bwa_alignse_onlymapped.cwl
@@ -11,12 +11,18 @@ hints:
specs: ["bwa aln", "bwa samse"]
tweak_sam:
specs: ["java.sh org.stjude.compbio.sam.TweakSam"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
ref_db_prefix:
type: string
inputBinding:
position: 1
+valueFrom: |
+${
+return inputs.index.path + "/" + self;
+}
input_fastq:
type: File
inputBinding:
@@ -26,6 +32,8 @@ inputs:
label: Must be an output bam file name, not an absolute path
inputBinding:
position: 3
+index:
+type: Directory

outputs:
bam:
8 changes: 8 additions & 0 deletions cwl/bwa_mem_onlymapped.cwl
@@ -11,6 +11,8 @@ hints:
specs: ["bwa mem"]
tweak_sam:
specs: ["java.sh org.stjude.compbio.sam.TweakSam"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

requirements:
ResourceRequirement:
@@ -22,6 +24,10 @@ inputs:
type: string
inputBinding:
position: 1
+valueFrom: |
+${
+return inputs.index.path + "/" + self;
+}
input_fastq:
type: File
inputBinding:
@@ -31,6 +37,8 @@ inputs:
label: Must be an output bam file name, not an absolute path
inputBinding:
position: 3
+index:
+type: Directory

outputs:
bam:
6 changes: 5 additions & 1 deletion cwl/cat.cwl
@@ -6,7 +6,11 @@ doc: |
Merge a set of files into file using the cat utility.

requirements:
-- class: InlineJavascriptRequirement
+InlineJavascriptRequirement: {}
+
+hints:
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

baseCommand: cat

2 changes: 2 additions & 0 deletions cwl/create_contam_lists.cwl
@@ -9,6 +9,8 @@ hints:
packages:
create_contam_list:
specs: [ "java.sh org.stjude.compbio.xenocp.CreateContamLists" ]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
input_bam:
9 changes: 9 additions & 0 deletions cwl/extract.cwl
@@ -37,6 +37,9 @@ steps:
out: [out_bam]
run:
class: CommandLineTool
+hints:
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"
stdout: other.bam
inputs:
bam:
@@ -62,6 +65,9 @@ steps:
scatter: chroms
run:
class: CommandLineTool
+hints:
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"
inputs:
chroms:
type: string
@@ -88,6 +94,9 @@ steps:
out: [unmapped_bam]
run:
class: CommandLineTool
+hints:
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"
stdout: unmapped.bam
inputs:
bam:
2 changes: 2 additions & 0 deletions cwl/get_chroms.cwl
@@ -7,6 +7,8 @@ hints:
packages:
bam_to_chr:
specs: ["bam_to_chrs.sh"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

baseCommand: bam_to_chrs.sh

4 changes: 4 additions & 0 deletions cwl/merge_markdup_index.cwl
@@ -10,6 +10,10 @@ doc: |
requirements:
- class: InlineJavascriptRequirement

+hints:
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"
+
baseCommand: merge_markdup_index.sh

inputs:
2 changes: 2 additions & 0 deletions cwl/qc_bam.cwl
@@ -10,6 +10,8 @@ hints:
packages:
qclib:
specs: ["qclib.sh"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
bam:
2 changes: 2 additions & 0 deletions cwl/sort_flagstat.cwl
@@ -18,6 +18,8 @@ hints:
samtools:
specs: ["samtools flagstat"]
version: ["1.3.1"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
input_bam:
2 changes: 2 additions & 0 deletions cwl/split_sam.cwl
@@ -9,6 +9,8 @@ hints:
packages:
SplitSam:
specs: [ "SplitSam.java" ]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
suffix_length:
8 changes: 7 additions & 1 deletion cwl/star_onlymapped.cwl
@@ -16,12 +16,14 @@ hints:
specs: ["STAR"]
tweak_sam:
specs: ["java.sh org.stjude.compbio.sam.TweakSam"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
ref_db_prefix:
type: string
inputBinding:
-position: 1
+position: 4
input_fastq:
type: File
inputBinding:
@@ -31,6 +33,10 @@ inputs:
label: Must be an output bam file name, not an absolute path
inputBinding:
position: 3
+index:
+type: Directory
+inputBinding:
+position: 1

outputs:
bam:
2 changes: 2 additions & 0 deletions cwl/tweak_sam.cwl
@@ -9,6 +9,8 @@ hints:
packages:
create_contam_list:
specs: [ "java.sh org.stjude.compbio.sam.TweakSam" ]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
input_bam:
2 changes: 2 additions & 0 deletions cwl/view_awk_picard.cwl
@@ -15,6 +15,8 @@ hints:
samtools:
specs: ["samtools view"]
version: ["1.3.1"]
+DockerRequirement:
+dockerPull: "ghcr.io/stjude/xenocp:latest"

inputs:
input_bam: