diff --git a/README.md b/README.md
index fe8a20f..255debe 100644
--- a/README.md
+++ b/README.md
@@ -6,7 +6,7 @@
# Clair3 - Integrating pileup and full-alignment for high-performance long-read variant calling
-[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)
+[![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause)[![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg?style=flat)](http://bioconda.github.io/recipes/clair3/README.html)
Contact: Ruibang Luo
@@ -26,7 +26,9 @@ Clair3 is the 3rd generation of [Clair](https://github.com/HKU-BAL/Cl
We are actively fixing bugs and issues in Clair3 reported by users.
-*v0.1-r3 (Jun 9)* : 1. added `ulimit -u` (max user processes) check (lowers the `THREADS` if the resource is insufficient) and automatic retries on failed jobs ([#20](https://github.com/HKU-BAL/Clair3/issues/20), [#23](https://github.com/HKU-BAL/Clair3/issues/23), [#24](https://github.com/HKU-BAL/Clair3/issues/24)). 2. Added an ONT Guppy5 model to the images (`ont_guppy5`). Click [here](docs/guppy5.md) for more benchmarks on the Guppy5 model and data.
+*v0.1-r4 (Jun 28)* : 1. Install via [bioconda](#option-3--bioconda). 2. Added an ONT Guppy2 model to the images (`ont_guppy2`). Click [here](docs/guppy2.md) for more benchmarking results. The results show that the Guppy2 model should be used for Guppy2 or earlier data. 3. Added [Google Colab notebooks](colab) for quick demo. 4. Fixed a bug when there are too few variant candidates ([#28](https://github.com/HKU-BAL/Clair3/issues/28)).
+
+*v0.1-r3 (Jun 9)* : 1. Added `ulimit -u` (max user processes) check (lowers the `THREADS` if the resource is insufficient) and automatic retries on failed jobs ([#20](https://github.com/HKU-BAL/Clair3/issues/20), [#23](https://github.com/HKU-BAL/Clair3/issues/23), [#24](https://github.com/HKU-BAL/Clair3/issues/24)). 2. Added an ONT Guppy5 model to the images (`ont_guppy5`). Click [here](docs/guppy5.md) for more benchmarks on the Guppy5 model and data.
*v0.1-r2 (May 23)* : 1. Fixed BED file out of range error ([#12](https://github.com/HKU-BAL/Clair3/issues/12)). 2. Added support for both `.bam.bai` and `.bai` BAM index filename ([#10](https://github.com/HKU-BAL/Clair3/issues/10)). 3. Added some boundary checks on inputs. 4. Added version checks on required packages and utilities. 5. Increased pipeline robustness.
@@ -39,7 +41,6 @@ We are actively fixing bugs and issues in Clair3 reported by users.
## We are working on ...
* A paper on detailed methods and benchmarks.
-* A model trained with Guppy2 data. The available ONT models are tested and work well with Guppy3 and Guppy4 data, but perform even worse than Clair on Guppy2 data.
---
@@ -51,11 +52,16 @@ We are actively fixing bugs and issues in Clair3 reported by users.
* [Installation](#installation)
+ [Option 1. Docker pre-built image](#option-1--docker-pre-built-image)
+ [Option 2. Singularity](#option-2-singularity)
- + [Option 3. Build an anaconda virtual environment](#option-3-build-an-anaconda-virtual-environment)
- + [Option 4. Docker Dockerfile](#option-4-docker-dockerfile)
+ + [Option 3. Bioconda](#option-3--bioconda)
+ + [Option 4. Build an anaconda virtual environment](#option-4-build-an-anaconda-virtual-environment)
+ + [Option 5. Docker Dockerfile](#option-5-docker-dockerfile)
* [Quick Demo](#quick-demo)
* [Usage](#usage)
* [Folder Structure and Submodule Descriptions](#folder-structure-and-submodule-descriptions)
+* [Pre-trained Models](#pre-trained-models)
+ * [Guppy5 Model](docs/guppy5.md)
+ * [Guppy3-4 Model](#pre-trained-models)
+ * [Guppy2 Model](docs/guppy2.md)
* [Training Data](#training-data)
* [VCF/GVCF Output Formats](#vcfgvcf-output-formats)
* [Pileup Model Training](docs/pileup_training.md)
@@ -116,7 +122,7 @@ A pre-built docker image is available [here](https://hub.docker.com/r/hkubal/cla
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
@@ -124,7 +130,7 @@ docker run -it \
hkubal/clair3:"${BIN_VERSION}" \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
@@ -141,27 +147,53 @@ Check [Usage](#Usage) for more options.
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
conda config --add channels defaults
conda create -n singularity-env -c conda-forge singularity -y
conda activate singularity-env
# singularity pull docker pre-built image
-singularity pull docker://hkubal/clair3:v0.1-r3
+singularity pull docker://hkubal/clair3:v0.1-r4
# run clair3 like this afterward
singularity exec clair3_"${BIN_VERSION}".sif \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
--output=${OUTPUT_DIR} ## absolute output path prefix
```
-### Option 3. Build an anaconda virtual environment
+### Option 3. Bioconda
+
+*For using Clair3 with Illumina data, additional installation steps are needed. Please follow this [guide](docs/quick_demo/illumina_quick_demo.md#step-2-install-boost-graph-library-for-illumina-realignment-process) for the additional steps.*
+
+```bash
+# make sure channels are added in conda
+conda config --add channels defaults
+conda config --add channels bioconda
+conda config --add channels conda-forge
+
+# create conda environment named "clair3"
+conda create -n clair3 -c bioconda clair3 python=3.6.10 -y
+conda activate clair3
+
+# run clair3 like this afterward
+run_clair3.sh \
+ --bam_fn=input.bam \ ## change your bam file name here
+ --ref_fn=ref.fa \ ## change your reference file name here
+ --threads=${THREADS} \ ## maximum threads to be used
+ --platform="ont" \ ## options: {ont,hifi,ilmn}
+ --model_path="${CONDA_PREFIX}/bin/models/ont" \
+ --output=${OUTPUT_DIR} ## output path prefix
+```
+
+Check [Usage](#Usage) for more options. [Pre-trained models](#pre-trained-models) are already included in the bioconda package.
+
+### Option 4. Build an anaconda virtual environment
**Anaconda install**:
@@ -175,7 +207,7 @@ chmod +x ./Miniconda3-latest-Linux-x86_64.sh
**Install Clair3 using anaconda step by step:**
-*For using Clair3 on Illumina data, additional installation steps after the following steps are mandatory. Please follow this [guide](docs/quick_demo/illumina_quick_demo.md#step-2-install-boost-graph-library-for-illumina-realignment-process) for the additional steps.*
+*For using Clair3 on Illumina data, additional installation steps after the following steps are mandatory. Please follow this [guide](https://github.com/HKU-BAL/Clair3/blob/main/docs/quick_demo/illumina_quick_demo.md#step-2-install-boost-graph-library-for-illumina-realignment-process) for the additional steps.*
```bash
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. ./input
@@ -189,12 +221,11 @@ source activate clair3
# install pypy and packages in the environment
conda install -c conda-forge pypy3.6 -y
pypy3 -m ensurepip
-pypy3 -m pip install intervaltree==3.0.2
pypy3 -m pip install mpmath==1.2.1
# install python packages in environment
pip3 install tensorflow==2.2.0
-pip3 install intervaltree==3.0.2 tensorflow-addons==0.11.2 tables==3.6.1
+pip3 install tensorflow-addons==0.11.2 tables==3.6.1
conda install -c anaconda pigz==2.4 -y
conda install -c conda-forge parallel=20191122 zstd=1.4.4 -y
conda install -c conda-forge -c bioconda samtools=1.10 -y
@@ -212,25 +243,25 @@ tar -zxvf clair3_models.tar.gz -C ./models
# run clair3
./run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path=`pwd`"/models/ont" \ ## model path prefix, change platform accordingly
--output=${OUTPUT_DIR} ## output path prefix
```
-### Option 4. Docker Dockerfile
+### Option 5. Docker Dockerfile
This is the same as option 1 except that you are building a docker image yourself. Please refer to option 1 for usage.
```bash
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
# clone Clair3
git clone https://github.com/hku-bal/Clair3.git
cd Clair3
-# build a docker image named hkubal/clair3:v0.1-r3
+# build a docker image named hkubal/clair3:v0.1-r4
# might require docker authentication to build docker image
docker build -f ./Dockerfile -t hkubal/clair3:"${BIN_VERSION}" .
@@ -311,7 +342,7 @@ CONTIGS_LIST="[YOUR_CONTIGS_LIST]" # e.g "chr21" or "chr21,chr22"
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
@@ -319,7 +350,7 @@ docker run -it \
hkubal/clair3:"${BIN_VERSION}" \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
@@ -334,7 +365,7 @@ KNOWN_VARIANTS_VCF="[YOUR_VCF_PATH]" # e.g. /home/user1/known_variants.vcf.gz
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
@@ -342,7 +373,7 @@ docker run -it \
hkubal/clair3:"${BIN_VERSION}" \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
@@ -369,7 +400,7 @@ BED_FILE_PATH="[YOUR_BED_FILE]" # e.g. /home/user1/tmp.bed (absolute path
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
@@ -377,7 +408,7 @@ docker run -it \
hkubal/clair3:"${BIN_VERSION}" \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
@@ -391,7 +422,7 @@ docker run -it \
INPUT_DIR="[YOUR_INPUT_FOLDER]" # e.g. /home/user1/input (absolute path needed)
OUTPUT_DIR="[YOUR_OUTPUT_FOLDER]" # e.g. /home/user1/output (absolute path needed)
THREADS="[MAXIMUM_THREADS]" # e.g. 8
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
docker run -it \
-v ${INPUT_DIR}:${INPUT_DIR} \
@@ -399,7 +430,7 @@ docker run -it \
hkubal/clair3:"${BIN_VERSION}" \
/opt/bin/run_clair3.sh \
--bam_fn=${INPUT_DIR}/input.bam \ ## change your bam file name here
- --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference name here
+ --ref_fn=${INPUT_DIR}/ref.fa \ ## change your reference file name here
--threads=${THREADS} \ ## maximum threads to be used
--platform="ont" \ ## options: {ont,hifi,ilmn}
--model_path="/opt/models/ont" \ ## absolute model path prefix, change platform accordingly
@@ -459,13 +490,14 @@ Please find more details about the training data and links at [Training Data](do
Download models from [here](http://www.bio8.cs.hku.hk/clair3/clair3_models/) or click on the links below.
-| File | Platform | Training samples | Included in the docker image | Release | Date | Basecaller | Link |
-| :---------------: | :---------: | :----------------------------------------------------------: | :--------------------------: | :-----: | :------: | :--------: | :----------------------------------------------------------: |
-| ont.tar.gz | ONT | HG001,2,4,5 | Yes | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont.tar.gz) |
-| ont_1235.tar.gz | ONT | HG001,2,3,5 | | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_1235.tar.gz) |
-| ont_guppy5.tar.gz | ONT | Base model: HG001,2,4,5 (Guppy3,4)<br>Fine-tuning data: HG002 (Guppy5_sup) | Yes | 1 | 20210609 | Guppy5 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_guppy5.tar.gz) |
-| hifi.tar.gz | PacBio HiFi | HG001,2,4,5 | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/hifi.tar.gz) |
-| ilmn.tar.gz | Illumina | HG001,2,4,5 | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ilmn.tar.gz) |
+| File | Platform | Training samples | Included in the bioconda package | Included in the docker image | Release | Date | Basecaller | Link |
+| :---------------: | :---------: | :----------------------------------------------------------: | -------------------------------- | :--------------------------: | :-----: | :------: | :--------: | :----------------------------------------------------------: |
+| ont.tar.gz | ONT | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont.tar.gz) |
+| ont_1235.tar.gz | ONT | HG001,2,3,5 | | | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_1235.tar.gz) |
+| ont_guppy5.tar.gz | ONT | Base model: HG001,2,4,5 (Guppy3,4)<br>Fine-tuning data: HG002 (Guppy5_sup) | Yes | Yes | 1 | 20210609 | Guppy5 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_guppy5.tar.gz) |
+| ont_guppy2.tar.gz | ONT | HG001,2,3,4 | | Yes | 1 | 20210627 | Guppy2 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_guppy2.tar.gz) |
+| hifi.tar.gz | PacBio HiFi | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/hifi.tar.gz) |
+| ilmn.tar.gz | Illumina | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ilmn.tar.gz) |
----
@@ -473,4 +505,4 @@ Download models from [here](http://www.bio8.cs.hku.hk/clair3/clair3_models/) or
Clair3 supports both VCF and GVCF output formats. Clair3 uses VCF version 4.2 specifications. Specifically, Clair3 adds a `P` INFO tag to the results called using a pileup model, and a `F` INFO tag to the results called using a full-alignment model.
-Clair3 outputs a GATK-compatible GVCF format that passes GATK's `ValidateVariants` module. Different from DeepVariant that uses `<*>` to represent any possible alternative allele, Clair3 uses `<NON_REF>`, the same as GATK.
+Clair3 outputs a GATK-compatible GVCF format that passes GATK's `ValidateVariants` module. Different from DeepVariant that uses `<*>` to represent any possible alternative allele, Clair3 uses `<NON_REF>`, the same as GATK.
\ No newline at end of file
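As a rough illustration of the `P` and `F` INFO tags described in the VCF/GVCF paragraph above, the sketch below counts how many records carry a given tag. It assumes an uncompressed VCF on standard input in which the tag appears as a bare flag in the INFO column (column 8); the function name `count_info_flag` is hypothetical and not part of Clair3.

```bash
# Hypothetical helper (not part of Clair3): count VCF records whose INFO
# column (column 8) contains the given flag, e.g. P (pileup) or F (full-alignment).
count_info_flag() {
    local flag="$1"
    awk -F'\t' -v flag="$flag" '
        /^#/ { next }                       # skip header lines
        {
            n = split($8, fields, ";")      # INFO entries are separated by ";"
            for (i = 1; i <= n; i++) {
                if (fields[i] == flag) { count++; break }
            }
        }
        END { print count + 0 }             # print 0 when nothing matched
    '
}
```

A compressed output such as `merge_output.vcf.gz` could be inspected with `zcat merge_output.vcf.gz | count_info_flag P`.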
diff --git a/docs/quick_demo/illumina_quick_demo.md b/docs/quick_demo/illumina_quick_demo.md
index b557376..a918d8d 100644
--- a/docs/quick_demo/illumina_quick_demo.md
+++ b/docs/quick_demo/illumina_quick_demo.md
@@ -49,7 +49,7 @@ echo -e "${CONTIGS}\t${START_POS}\t${END_POS}" > ${INPUT_DIR}/quick_demo.bed
### Option 1. Docker pre-built image
```bash
-BIN_VERSION='v0.1-r3'
+BIN_VERSION='v0.1-r4'
THREADS=4
cd ${OUTPUT_DIR}
@@ -124,6 +124,8 @@ conda install -c conda-forge boost=1.67.0 -y
echo "Environment:" ${CONDA_PREFIX}
# Make sure in Clair3 directory
cd Clair3
+# cd ${CONDA_PREFIX}/bin if installing Clair3 using bioconda
+
cd preprocess/realign
g++ -std=c++14 -O1 -shared -fPIC -o realigner ssw_cpp.cpp ssw.c realigner.cpp
g++ -std=c++11 -shared -fPIC -o debruijn_graph -O3 debruijn_graph.cpp -I ${CONDA_PREFIX}/include -L ${CONDA_PREFIX}/lib
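The added comment implies that the `preprocess/realign` sources live in a different place depending on how Clair3 was installed. A minimal sketch of picking the directory automatically follows; the function name `realign_dir` is hypothetical, and the layout `${CONDA_PREFIX}/bin/preprocess/realign` is inferred from the comment above rather than verified against the bioconda package.

```bash
# Hypothetical helper: print the directory holding the realign sources.
# Prefers a source checkout (default: ./Clair3), then falls back to the
# bioconda layout suggested by the comment above (${CONDA_PREFIX}/bin).
realign_dir() {
    local checkout="${1:-Clair3}"
    if [ -d "${checkout}/preprocess/realign" ]; then
        echo "${checkout}/preprocess/realign"
    elif [ -n "${CONDA_PREFIX:-}" ] && [ -d "${CONDA_PREFIX}/bin/preprocess/realign" ]; then
        echo "${CONDA_PREFIX}/bin/preprocess/realign"
    else
        echo "realign sources not found" >&2
        return 1
    fi
}
```

With this in place, `cd "$(realign_dir)"` could stand in for the two `cd` steps before the `g++` commands.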
diff --git a/docs/quick_demo/ont_quick_demo.md b/docs/quick_demo/ont_quick_demo.md
index 6609677..05d64ab 100644
--- a/docs/quick_demo/ont_quick_demo.md
+++ b/docs/quick_demo/ont_quick_demo.md
@@ -49,7 +49,7 @@ echo -e "${CONTIGS}\t${START_POS}\t${END_POS}" > ${INPUT_DIR}/quick_demo.bed
```bash
THREADS=4
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
cd ${OUTPUT_DIR}
# Run Clair3 using one command
diff --git a/docs/quick_demo/pacbio_hifi_quick_demo.md b/docs/quick_demo/pacbio_hifi_quick_demo.md
index 491349b..e9961aa 100644
--- a/docs/quick_demo/pacbio_hifi_quick_demo.md
+++ b/docs/quick_demo/pacbio_hifi_quick_demo.md
@@ -50,7 +50,7 @@ echo -e "${CONTIGS}\t${START_POS}\t${END_POS}" > ${INPUT_DIR}/quick_demo.bed
```bash
THREADS=4
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
cd ${OUTPUT_DIR}
# Run Clair3 using one command
diff --git a/docs/training_data.md b/docs/training_data.md
index 00cf399..8a31ec0 100644
--- a/docs/training_data.md
+++ b/docs/training_data.md
@@ -54,9 +54,11 @@
Download models from [here](http://www.bio8.cs.hku.hk/clair3/clair3_models/) or click on the links below.
-| File | Platform | Training Samples | In the docker image by default | Link |
-| :-------------: | :---------: | :--------------: | :----------------------------: | :----------------------------------------------------------: |
-| ont.tar.gz | ONT | HG001,2,4,5 | Yes | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont.tar.gz) |
-| ont_1235.tar.gz | ONT | HG001,2,3,5 | | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_1235.tar.gz) |
-| hifi.tar.gz | PacBio HiFi | HG001,2,4,5 | Yes | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/hifi.tar.gz) |
-| ilmn.tar.gz | Illumina | HG001,2,4,5 | Yes | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ilmn.tar.gz) |
\ No newline at end of file
+| File | Platform | Training samples | Included in the bioconda package | Included in the docker image | Release | Date | Basecaller | Link |
+| :---------------: | :---------: | :----------------------------------------------------------: | -------------------------------- | :--------------------------: | :-----: | :------: | :--------: | :----------------------------------------------------------: |
+| ont.tar.gz | ONT | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont.tar.gz) |
+| ont_1235.tar.gz | ONT | HG001,2,3,5 | | | 1 | 20210517 | Guppy3,4 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_1235.tar.gz) |
+| ont_guppy5.tar.gz | ONT | Base model: HG001,2,4,5 (Guppy3,4)<br>Fine-tuning data: HG002 (Guppy5_sup) | Yes | Yes | 1 | 20210609 | Guppy5 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_guppy5.tar.gz) |
+| ont_guppy2.tar.gz | ONT | HG001,2,3,4 | | Yes | 1 | 20210627 | Guppy2 | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ont_guppy2.tar.gz) |
+| hifi.tar.gz | PacBio HiFi | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/hifi.tar.gz) |
+| ilmn.tar.gz | Illumina | HG001,2,4,5 | Yes | Yes | 1 | 20210517 | NA | [Download](http://www.bio8.cs.hku.hk/clair3/clair3_models/ilmn.tar.gz) |
diff --git a/scripts/clair3.sh b/scripts/clair3.sh
index 6a63bdd..1ff6fa6 100755
--- a/scripts/clair3.sh
+++ b/scripts/clair3.sh
@@ -49,7 +49,7 @@ while true; do
-- ) shift; break; ;;
-h|--help ) print_help_messages; break ;;
- * ) print_help_messages; exit 1 ;;
+ * ) print_help_messages; exit 0 ;;
esac
done
@@ -133,14 +133,20 @@ time ${PARALLEL} --retries ${RETRIES} -C ' ' --joblog ${LOG_PATH}/parallel_1_cal
echo "[INFO] Merge chunked contigs vcf files"
-cat ${PILEUP_VCF_PATH}/pileup_*.vcf | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_FOLDER}/pileup.vcf
+${PYPY} ${CLAIR3} SortVcf \
+ --input_dir ${PILEUP_VCF_PATH} \
+ --vcf_fn_prefix "pileup" \
+ --output_fn ${OUTPUT_FOLDER}/pileup.vcf \
+ --sampleName ${SAMPLE} \
+ --ref_fn ${REFERENCE_FILE_PATH}
+
bgzip -f ${OUTPUT_FOLDER}/pileup.vcf
tabix -f -p vcf ${OUTPUT_FOLDER}/pileup.vcf.gz
if [ ${PILEUP_ONLY} == True ]; then
echo "[INFO] Only call pileup output with --pileup_only, output file: ${OUTPUT_FOLDER}/pileup.vcf.gz"
echo "[INFO] Finish calling!"
- exit 1;
+ exit 0;
fi
# WhatsHap phasing and haplotagging
@@ -221,7 +227,13 @@ time ${PARALLEL} --retries ${RETRIES} --joblog ${LOG_PATH}/parallel_6_call_var_b
##Merge pileup and full alignment vcf
##-----------------------------------------------------------------------------------------------------------------------
-cat ${FULL_ALIGNMENT_OUTPUT_PATH}/full_alignment_*.vcf | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_FOLDER}/full_alignment.vcf
+${PYPY} ${CLAIR3} SortVcf \
+ --input_dir ${FULL_ALIGNMENT_OUTPUT_PATH} \
+ --vcf_fn_prefix "full_alignment" \
+ --output_fn ${OUTPUT_FOLDER}/full_alignment.vcf \
+ --sampleName ${SAMPLE} \
+ --ref_fn ${REFERENCE_FILE_PATH}
+
cat ${CANDIDATE_BED_PATH}/*.* > ${CANDIDATE_BED_PATH}/full_aln_regions
bgzip -f ${OUTPUT_FOLDER}/full_alignment.vcf
tabix -f -p vcf ${OUTPUT_FOLDER}/full_alignment.vcf.gz
@@ -245,7 +257,13 @@ time ${PARALLEL} --retries ${RETRIES} --joblog ${LOG_PATH}/parallel_7_merge_vcf.
--ref_fn ${REFERENCE_FILE_PATH} \
--ctgName {1}" ::: ${CHR[@]} |& tee ${LOG_PATH}/7_merge_vcf.log
-cat ${TMP_FILE_PATH}/merge_output/merge_*.vcf | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_FOLDER}/merge_output.vcf
+${PYPY} ${CLAIR3} SortVcf \
+ --input_dir ${TMP_FILE_PATH}/merge_output \
+ --vcf_fn_prefix "merge" \
+ --output_fn ${OUTPUT_FOLDER}/merge_output.vcf \
+ --sampleName ${SAMPLE} \
+ --ref_fn ${REFERENCE_FILE_PATH}
+
if [ ${GVCF} == True ]; then cat ${TMP_FILE_PATH}/merge_output/merge_*.gvcf | ${PYPY} ${CLAIR3} SortVcf --output_fn ${OUTPUT_FOLDER}/merge_output.gvcf; fi
bgzip -f ${OUTPUT_FOLDER}/merge_output.vcf
tabix -f -p vcf ${OUTPUT_FOLDER}/merge_output.vcf.gz
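The hunks above replace `cat <dir>/<prefix>_*.vcf | ... SortVcf` with the `--input_dir` and `--vcf_fn_prefix` arguments. One general shell reason to avoid the glob form is that an unmatched pattern is passed to `cat` literally and makes it fail; whether that is the motivation for this change is an assumption, not something the diff states. A minimal bash sketch of collecting chunk files safely, using a hypothetical helper `list_chunks` that is not part of Clair3:

```bash
# Hypothetical helper: print the chunked VCF files in a directory that start
# with "<prefix>_", one per line, and print nothing (exit 0) when none exist.
list_chunks() {
    local dir="$1" prefix="$2" f
    for f in "${dir}/${prefix}"_*.vcf; do
        # Without nullglob an unmatched pattern stays literal, so test each entry.
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, `list_chunks "${PILEUP_VCF_PATH}" pileup` prints nothing and returns 0 for an empty directory, whereas `cat "${PILEUP_VCF_PATH}"/pileup_*.vcf` would report a missing file.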
diff --git a/scripts/clair3_hifi_quick_demo.sh b/scripts/clair3_hifi_quick_demo.sh
old mode 100644
new mode 100755
index 94835e7..5e0c286
--- a/scripts/clair3_hifi_quick_demo.sh
+++ b/scripts/clair3_hifi_quick_demo.sh
@@ -2,7 +2,7 @@ PLATFORM='hifi'
INPUT_DIR="${HOME}/clair3_pacbio_hifi_quickDemo"
OUTPUT_DIR="${INPUT_DIR}/output"
THREADS=4
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
## Create local directory structure
mkdir -p ${INPUT_DIR}
diff --git a/scripts/clair3_ilmn_quick_demo.sh b/scripts/clair3_ilmn_quick_demo.sh
old mode 100644
new mode 100755
index 908f72b..54d1bcf
--- a/scripts/clair3_ilmn_quick_demo.sh
+++ b/scripts/clair3_ilmn_quick_demo.sh
@@ -3,7 +3,7 @@ PLATFORM='ilmn'
INPUT_DIR="${HOME}/clair3_illumina_quickDemo"
OUTPUT_DIR="${INPUT_DIR}/output"
THREADS=4
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
## Create local directory structure
mkdir -p ${INPUT_DIR}
diff --git a/scripts/clair3_ont_quick_demo.sh b/scripts/clair3_ont_quick_demo.sh
old mode 100644
new mode 100755
index 33ae4dd..3647d37
--- a/scripts/clair3_ont_quick_demo.sh
+++ b/scripts/clair3_ont_quick_demo.sh
@@ -3,7 +3,7 @@ PLATFORM='ont'
INPUT_DIR="${HOME}/clair3_ont_quickDemo"
OUTPUT_DIR="${INPUT_DIR}/output"
THREADS=4
-BIN_VERSION="v0.1-r3"
+BIN_VERSION="v0.1-r4"
## Create local directory structure
mkdir -p ${INPUT_DIR}