Unsupervised Contrastive PeakCaller

Prerequisites

For input preprocessing steps, the following tools and R libraries are required:

samtools (>= 1.10)
bedtools2 (>= 2.27.1)
parallel (>= 20170322)
R (>= 4.0.2)
bedops (>= 2.4.35)

R library dplyr (>= 1.0.7)
R library bedr (>= 1.0.7)
R library doParallel (>= 1.0.16)
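
Before running the preprocessing script, it can help to confirm that the command-line tools listed above are on your PATH. The following is a minimal sketch, not part of the repository. The executable names are assumptions (bedtools2 is assumed to install a binary named `bedtools`, and R is assumed to be invoked through `Rscript`), and the sketch checks presence only, not versions.

```python
import shutil

# Executable names are assumptions: bedtools2 usually installs a binary
# called "bedtools", and R scripts are usually run through "Rscript".
REQUIRED_TOOLS = ["samtools", "bedtools", "parallel", "Rscript", "bedops"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All preprocessing tools were found on PATH.")
```

On clusters that use environment modules, run the check after the `module load` commands so that PATH reflects the loaded software.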

The deep learning step requires a GPU and the following packages:

Python (>=3.7.10)
PyTorch Lightning (>=1.5.1)
PyTorch (>=1.10.0)
numpy (>=1.21.5)
pandas (>=1.3.5)
argparse (>=1.1)
scikit-learn (>=1.0.1)
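
The minimum versions above can be compared against an existing environment with a short script. This is a hedged sketch rather than anything shipped with the tool: the dictionary keys are assumed PyPI distribution names, `version_tuple`, `meets_minimum`, and `report` are hypothetical helpers, and the lookup relies on `importlib.metadata`, which is only available from Python 3.8. It does not check for a GPU.

```python
import re

# Minimum versions copied from the list above. The keys are assumed PyPI
# distribution names; the list itself only gives display names.
MINIMUMS = {
    "pytorch-lightning": "1.5.1",
    "torch": "1.10.0",
    "numpy": "1.21.5",
    "pandas": "1.3.5",
    "scikit-learn": "1.0.1",
}


def version_tuple(version):
    """Turn '1.10.0+cu113' into (1, 10, 0), ignoring non-numeric suffixes."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report():
    """Print the installed version of each package and whether it is new enough."""
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:  # Python 3.7 has no importlib.metadata
        print("importlib.metadata is unavailable; check versions manually.")
        return
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else f"need >= {minimum}"
        print(f"{name}: {installed} ({status})")


if __name__ == "__main__":
    report()
```

The comparison is numeric per component, so 1.9.1 is correctly treated as older than 1.10.0.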

Installation

git clone https://github.com/Tuteja-Lab/UnsupervisedPeakCaller.git

Preprocessing

Usage: preprocessing.bash -p "program directory" -i "input directory" -o "output directory" -g hg -c 2 -m "merged.bam" -b "indi1.bam indi2.bam" -t 12 -n test -L 1000
        -p Absolute path to the directory where the program is installed.
        -i Absolute path to the directory containing the input files.
        -o Absolute path to the directory for the output files.
        -g Genome that the data is aligned to. Currently supports mm10 (Ensembl) or hg38 (Ensembl).
        -c Cutoff for prefiltering. Either "median" or a specific number.
        -m BAM file merged from the individual replicates. Used only for preprocessing, not for calling peaks. Must be sorted and indexed.
        -b Individual BAM files, one per replicate. Must be sorted and indexed.
        -t Number of threads to use.
        -n File name prefix.
        -L Length of input segments.

This step assumes that your data has been aligned to the Ensembl assembly of the mouse or human genome.
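
Because the merged and per-replicate BAM files must be indexed, a quick check before launching the script can save a failed run. The sketch below is illustrative and not part of the repository. `find_bam_problems` is a hypothetical helper, and it assumes the index sits next to the BAM file as either `<name>.bam.bai` or `<name>.bai`. It does not verify that the files are sorted.

```python
from pathlib import Path


def find_bam_problems(input_dir, bam_names):
    """Return one message per BAM file that is missing or has no .bai index."""
    problems = []
    for name in bam_names:
        bam = Path(input_dir) / name
        if not bam.is_file():
            problems.append(f"{name}: file not found")
            continue
        # Assumed index locations: "sample.bam.bai" or "sample.bai".
        candidates = [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]
        if not any(index.is_file() for index in candidates):
            problems.append(f"{name}: no .bai index found")
    return problems


if __name__ == "__main__":
    # File names taken from the example in this README; the directory is a
    # placeholder to replace with your own input directory.
    names = ["MCF7_chr10_merged.bam", "MCF7_chr10_rep1.bam", "MCF7_chr10_rep2.bam"]
    for problem in find_bam_problems("example", names):
        print(problem)
```

An empty result means every file and its index were found.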

Example

module load samtools
module load bedtools2
module load parallel
module load bedops/2.4.35-gl7y6z6
module load gcc/7.3.0-xegsmw4
module load r/4.0.2-py3-icvulwq
module load gsl/2.5-fpqcpxf
module load udunits/2.2.24-yldmp4h
module load gdal/2.4.4-nw2drgf
module load geos/3.8.1-2m7gav4

bash /work/LAS/geetu-lab-collab/UnsupervisedPeakCaller/preprocessing.bash -p "/work/LAS/geetu-lab-collab/UnsupervisedPeakCaller" -i "/work/LAS/geetu-lab-collab/UnsupervisedPeakCaller/example" -o "/work/LAS/geetu-lab-collab/UnsupervisedPeakCaller/example" -g "hg" -c "median" -m "MCF7_chr10_merged.bam" -b "MCF7_chr10_rep1.bam MCF7_chr10_rep2.bam" -t 12 -n "test" -L 1000

Peak Calling

Example

Train the model and obtain the predictions.

bash run_rcl.sh -p example -f "rep1 rep2"

Command-Line Options

Input (required):
    --p
        Path to the preprocessed data.
    --f
        Names of the individual BAM files (without suffix). For example, if your BAM files are rep1.bam and rep2.bam, use "rep1 rep2".

Parameters (optional):
    --e
        Number of training epochs. Default: 25.
    --b
        Batch size. Default: 256.

Output

The trained model is saved as rcl.ckpt and the results are written to rcl.bed. Each line of the output contains the chromosome name, peak start position, peak end position, peak name, peak score, training region start position, and training region end position. For example:

10      49829   50258   10segment1      0.18526842      49543   50543
10      73663   74515   10segment2      0.8270205       73589   74589
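
The seven columns can be read into named fields for downstream filtering. The following is a minimal sketch using only the standard library. The field names and the helpers `parse_peak_line` and `read_peaks` are hypothetical, the columns are assumed to be whitespace-separated as in the example, and the 0.5 threshold is an arbitrary value chosen for illustration, not a recommended cutoff.

```python
def parse_peak_line(line):
    """Split one rcl.bed line into a dict with typed fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    chrom, peak_start, peak_end, name, score, region_start, region_end = fields
    return {
        "chrom": chrom,
        "peak_start": int(peak_start),
        "peak_end": int(peak_end),
        "name": name,
        "score": float(score),
        "region_start": int(region_start),
        "region_end": int(region_end),
    }


def read_peaks(lines, min_score=0.0):
    """Parse non-empty lines and keep peaks whose score is at least min_score."""
    peaks = [parse_peak_line(line) for line in lines if line.strip()]
    return [peak for peak in peaks if peak["score"] >= min_score]


# The two example lines shown above.
EXAMPLE = [
    "10      49829   50258   10segment1      0.18526842      49543   50543",
    "10      73663   74515   10segment2      0.8270205       73589   74589",
]

if __name__ == "__main__":
    # 0.5 is an arbitrary illustrative threshold.
    for peak in read_peaks(EXAMPLE, min_score=0.5):
        width = peak["peak_end"] - peak["peak_start"]
        print(peak["name"], peak["score"], width)
```

To read a real result file, pass an open file handle in place of `EXAMPLE`, for example `read_peaks(open("rcl.bed"))`.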

How to Cite

Preprint https://www.biorxiv.org/content/10.1101/2023.01.07.523108v1

Contact

Yudi Zhang ([email protected]), Ha Vu ([email protected])
