Skip to content

songbowang125/SVision-pro

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

14 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SVision-pro

SVision-pro detects germline, de novo and somatic structural variants (SV), including simple SV (SSV) and complex SV (CSV). SVision-pro implements genome-to-genome representation and comparative SV detection, genotyping and differentiating using a neural network based image instance segmentation framework.

SVision-pro comprises two modules: a genome-to-genome representation module encoding genomic features from two samples to an image, from which a neural-network recognition module comparatively recognizes SVs as well as their inter-genome differences. In particular, SVision-pro formulates SV detection, genotyping and differentiating as a neural-network-based image instance segmentation task, facilitating the discovery of both de novo and somatic SV and CSV.

Citation

Please cite our paper "Songbo Wang et al. De novo and somatic structural variant discovery with SVision-pro. Nature Biotechnology (2024)"

License

SVision-pro is free for non-commercial use by academic, government, and non-profit/not-for-profit institutions. A commercial version of the software is available and licensed through Xi’an Jiaotong University. For more information, please contact with Songbo Wang ([email protected]) or Kai Ye ([email protected]).

Installation

Operation systems

Linux-based systems required, including but are not limited to:

  • MacOS, Big Sur
  • Ubuntu (including Windows Subsystem)
  • CentOS

Install from source

## Get the source code
git clone https://github.com/songbowang125/SVision-pro.git
cd SVision-pro

## Create a conda environment for SVision-pro
conda env create -f ./environment.yml 

## Install from source
conda activate svision-pro-env
python setup.py install

Note: Please make sure you have installed the same version of dependencies as in ./environment.yml. 
      We recommand that you create a new conda env and install by the command lines above. 

Usage

The input of SVision-pro requires one target genome (aka case genome), N base genomes (aka control genomes, N>=0), the reference genome, the neural-network model (in ./src/pre_process), the sample name and the output path

There are four detection mode in SVision-pro:

Command line format for 'germline' mode (N=0):

## run SVision-pro
SVision-pro --target_path /path/to/target.bam  --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode germline

Command line format for 'somatic' mode (N=1):

## run SVision-pro
SVision-pro --target_path /path/to/tumor.bam  --base_path /path/to/normal.bam --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode somatic

## extract somatic calls
python extract_op.py --input_vcf /path/to/svision_pro.vcf --extract somatic

Command line format for 'denovo' mode (N=2):

## run SVision-pro
SVision-pro --target_path /path/to/child.bam  --base_path /path/to/father.bam /path/to/mother.bam --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode denovo

## extract denovo calls
python extract_op.py --input_vcf /path/to/svision_pro.vcf --extract denovo

Command line format for 'genotype' mode (N=N):

## run SVision-pro
SVision-pro --target_path /path/to/target.bam  --base_path /path/to/base1.bam /path/to/base2.bam /path/to/base3.bam... --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode genotype --region chr1:1000-2000

Model (or image size) parameter:

## For now, SVision-pro provides three different LiteUnet models (256/512/1024, in src/pre_process/model_liteunet_xxx_8_16_32_32_32.pth).
The three models correspond to three differenct image size settings (256, 512 and 1024), which determine the minimum detectable SV frequencies (0.04, 0.02 and 0.01). The description about this setting could be found in the Method section of our paper. 
   
In common, image size 256 (default) is capable for most conditions. 
So, you can directly use the 256 model (src/pre_process/model_liteunet_256_8_16_32_32_32.pth). 

While if you use other models/image sizes, please use this two parameters simultaneously:
SVision-pro ... ... --img_size 512 --model_path model_liteunet_512_8_16_32_32_32.pth, or
SVision-pro ... ... --img_size 1024 --model_path model_liteunet_1024_8_16_32_32_32.pth

Recommended parameters:

## We have provided access BED files for human genomes (src/pre_process), which exclude centromere and heterochromatin regions.
--access_path   Absolute path to access BED file that contains accessable regions of the reference genome

Other optional parameters:

Input parameters:
--process_num   Thread numbers
--preset        Sequence type, including hifi, error-prone and asm (for assembly-based calling)

Filter paramters:
--min_mapq      Minimum read map quality (defalut: 20)
--min_supp      Minimum supporting read number for considering a SV (defalut: 5)
--min_sv_size   Minimum SV size/length for considering a SV (defalut: 50)
--max_sv_size   Maximum SV size/length for considering a SV (defalut: 50000)
--interval_size The sliding genome region size for searching SVs (defalut: 10000000)
--max_coverage  Maximum read coverage of the searched genome regions (defalut: 500)
--skip_coverage_filter  SKip filtering genome regions by 'max_coverage'

Neural network parameters:
--img_size      The representation image sizes (default: 256, optional: 512, 1024)
--batch_size    The batch size for neural network prediction (default: 1)
--device        Using CPU or GPU for prediction (defalut: 'cpu')
--net_cpu_num   When using CPU, specific the CPU core number for each prediction sub-process (defalut: 2)
--gpu_id        When using GPU, specific the GPU id in your device (defalut: '0')

Run demo data

The demo data located at './src/pre_process/', which contains a CSV deletion-inversion and is extracted from HiFi-sequenced AshkenazimTrio from Genome-In-A-Bottle.

  • demo.HG002.child.bam
  • demo.HG003.father.bam
  • demo.HG004.mother.bam
  1. Download reference genome GRCh38

  2. Checking SVision-pro

SVision-pro -h
  1. Run SVision-pro 'germline' mode
SVision-pro --target_path ./src/pre_process/demo.HG002.child.bam  --genome_path /path/to/downloaded.GRCh38.fa --model_path ./src/pre_process/model_liteunet_256_8_16_32_32_32.path --access_path ./src/pre_process/hg38.access.10M.bed --sample test --out_path ./test_out/ --detect_mode germline --process 8
  1. Run SVision-pro 'denovo' mode
SVision-pro --target_path ./src/pre_process/demo.HG002.child.bam  --base_path ./src/pre_process/demo.HG003.father.bam ./src/pre_process/demo.HG004.mother.bam --genome_path /path/to/downloaded.GRCh38.fa --model_path ./src/pre_process/model_liteunet_256_8_16_32_32_32.pth --access_path ./src/pre_process/hg38.access.10M.bed --sample test --out_path ./test_out/ --detect_mode denovo --process 8
  1. Output files
  • *.vcf The standard VCF output with genome-to-genome comparison info columns.

Contact

If you have any questions, please feel free to contact: [email protected]

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages