SVision-pro detects germline, de novo and somatic structural variants (SV), including simple SV (SSV) and complex SV (CSV). SVision-pro implements genome-to-genome representation and comparative SV detection, genotyping and differentiating using a neural network based image instance segmentation framework.
SVision-pro comprises two modules: a genome-to-genome representation module encoding genomic features from two samples to an image, from which a neural-network recognition module comparatively recognizes SVs as well as their inter-genome differences. In particular, SVision-pro formulates SV detection, genotyping and differentiating as a neural-network-based image instance segmentation task, facilitating the discovery of both de novo and somatic SV and CSV.
Please cite our paper "Songbo Wang et al. De novo and somatic structural variant discovery with SVision-pro. Nature Biotechnology (2024)"
SVision-pro is free for non-commercial use by academic, government, and non-profit/not-for-profit institutions. A commercial version of the software is available and licensed through Xi’an Jiaotong University. For more information, please contact with Songbo Wang ([email protected]) or Kai Ye ([email protected]).
Linux-based systems required, including but are not limited to:
- MacOS, Big Sur
- Ubuntu (including Windows Subsystem)
- CentOS
## Get the source code
git clone https://github.com/songbowang125/SVision-pro.git
cd SVision-pro
## Create a conda environment for SVision-pro
conda env create -f ./environment.yml
## Install from source
conda activate svision-pro-env
python setup.py install
Note: Please make sure you have installed the same version of dependencies as in ./environment.yml.
We recommand that you create a new conda env and install by the command lines above.
The input of SVision-pro requires one target genome (aka case genome), N base genomes (aka control genomes, N>=0), the reference genome, the neural-network model (in ./src/pre_process), the sample name and the output path
There are four detection mode in SVision-pro:
Command line format for 'germline' mode (N=0):
## run SVision-pro
SVision-pro --target_path /path/to/target.bam --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode germline
Command line format for 'somatic' mode (N=1):
## run SVision-pro
SVision-pro --target_path /path/to/tumor.bam --base_path /path/to/normal.bam --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode somatic
## extract somatic calls
python extract_op.py --input_vcf /path/to/svision_pro.vcf --extract somatic
Command line format for 'denovo' mode (N=2):
## run SVision-pro
SVision-pro --target_path /path/to/child.bam --base_path /path/to/father.bam /path/to/mother.bam --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode denovo
## extract denovo calls
python extract_op.py --input_vcf /path/to/svision_pro.vcf --extract denovo
Command line format for 'genotype' mode (N=N):
## run SVision-pro
SVision-pro --target_path /path/to/target.bam --base_path /path/to/base1.bam /path/to/base2.bam /path/to/base3.bam... --genome_path /path/to/reference.fasta --model_path /path/to/model.pth --out_path /path/to/output/ --sample_name sample1 --detect_mode genotype --region chr1:1000-2000
Model (or image size) parameter:
## For now, SVision-pro provides three different LiteUnet models (256/512/1024, in src/pre_process/model_liteunet_xxx_8_16_32_32_32.pth).
The three models correspond to three differenct image size settings (256, 512 and 1024), which determine the minimum detectable SV frequencies (0.04, 0.02 and 0.01). The description about this setting could be found in the Method section of our paper.
In common, image size 256 (default) is capable for most conditions.
So, you can directly use the 256 model (src/pre_process/model_liteunet_256_8_16_32_32_32.pth).
While if you use other models/image sizes, please use this two parameters simultaneously:
SVision-pro ... ... --img_size 512 --model_path model_liteunet_512_8_16_32_32_32.pth, or
SVision-pro ... ... --img_size 1024 --model_path model_liteunet_1024_8_16_32_32_32.pth
Recommended parameters:
## We have provided access BED files for human genomes (src/pre_process), which exclude centromere and heterochromatin regions.
--access_path Absolute path to access BED file that contains accessable regions of the reference genome
Other optional parameters:
Input parameters:
--process_num Thread numbers
--preset Sequence type, including hifi, error-prone and asm (for assembly-based calling)
Filter paramters:
--min_mapq Minimum read map quality (defalut: 20)
--min_supp Minimum supporting read number for considering a SV (defalut: 5)
--min_sv_size Minimum SV size/length for considering a SV (defalut: 50)
--max_sv_size Maximum SV size/length for considering a SV (defalut: 50000)
--interval_size The sliding genome region size for searching SVs (defalut: 10000000)
--max_coverage Maximum read coverage of the searched genome regions (defalut: 500)
--skip_coverage_filter SKip filtering genome regions by 'max_coverage'
Neural network parameters:
--img_size The representation image sizes (default: 256, optional: 512, 1024)
--batch_size The batch size for neural network prediction (default: 1)
--device Using CPU or GPU for prediction (defalut: 'cpu')
--net_cpu_num When using CPU, specific the CPU core number for each prediction sub-process (defalut: 2)
--gpu_id When using GPU, specific the GPU id in your device (defalut: '0')
The demo data located at './src/pre_process/', which contains a CSV deletion-inversion and is extracted from HiFi-sequenced AshkenazimTrio from Genome-In-A-Bottle.
demo.HG002.child.bam
demo.HG003.father.bam
demo.HG004.mother.bam
-
Download reference genome GRCh38
-
Checking SVision-pro
SVision-pro -h
- Run SVision-pro 'germline' mode
SVision-pro --target_path ./src/pre_process/demo.HG002.child.bam --genome_path /path/to/downloaded.GRCh38.fa --model_path ./src/pre_process/model_liteunet_256_8_16_32_32_32.path --access_path ./src/pre_process/hg38.access.10M.bed --sample test --out_path ./test_out/ --detect_mode germline --process 8
- Run SVision-pro 'denovo' mode
SVision-pro --target_path ./src/pre_process/demo.HG002.child.bam --base_path ./src/pre_process/demo.HG003.father.bam ./src/pre_process/demo.HG004.mother.bam --genome_path /path/to/downloaded.GRCh38.fa --model_path ./src/pre_process/model_liteunet_256_8_16_32_32_32.pth --access_path ./src/pre_process/hg38.access.10M.bed --sample test --out_path ./test_out/ --detect_mode denovo --process 8
- Output files
*.vcf
The standard VCF output with genome-to-genome comparison info columns.
If you have any questions, please feel free to contact: [email protected]