Running
- Running Visualization Server
- Running Calculations
- General
- Input Data
- Run
- Automatically
- Manually
Copy config.template to your data directory:
cp config.template [PATH TO DATA FOLDER]/config.py
Edit config.py to configure:
- whether user control is enabled or not (Default: False)
- the server's port (Default: 10000)
- pages which can be seen without login (to disable, comment line with #)
- enable/disable debug of server
#decide whether to have user control or not
HAS_LOGIN = False
#define port to server webpage
SERVER_PORT = 10000
# pages which can be seen without login
librepaths = [
'/api',
'/favicon.ico'
]
DEBUG = False
Initialize iBrowser by running:
./ibrowser.py [PATH TO DATA FOLDER] init
This will
- Create session secret
- Create RSA key
- Create SSL certificate
- Create default user (admin:admin)
ONLY IF YOU WANT ACCESS CONTROL
- Edit config.py and set HAS_LOGIN to True
- (ADVISED) Change the admin password (otherwise the default admin:admin account remains in place) by running:
./ibrowser.py [PATH TO DATA FOLDER] deluser admin
./ibrowser.py [PATH TO DATA FOLDER] adduser admin [DESIRED PASSWORD]
Optional
- (Default: 2048) Change the RSA key size by editing [PATH TO DATA FOLDER]/config.keylen
echo 2048 > [PATH TO DATA FOLDER]/config.keylen
- Clean all config by running:
./ibrowser.py [PATH TO DATA FOLDER] clean
- Create users manually by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] adduser [USER] [DESIRED PASSWORD]
- Delete users manually by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] deluser [USER]
- List users by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] listusers
Run ibrowser.py
./ibrowser.py [PATH TO DATA FOLDER]
This set of scripts takes as input a series of Variant Call Format (VCF) files of species mapped against a single reference. After a series of conversions, all homozygous Single Nucleotide Polymorphisms (SNPs) are extracted, while heterozygous SNPs (hetSNPs), Multiple Nucleotide Polymorphisms (MNPs) and Insertion/Deletion events (InDels) are ignored. Each individual is assigned the reference nucleotide unless it presents a SNP. If any individual has an MNP, hetSNP or InDel at a given position, that position is skipped entirely.
A General Feature Format (GFF) file describing coordinates is used to split the genome into segments. These segments can be genes, evenly sized fragments (10kb, 50kb, etc.) or particular segments of interest, as long as the coordinates match those of the VCF files. An auxiliary script is provided to generate evenly sized segments. For each selected segment, a FASTA file is generated, and FastTree creates a distance matrix and a Newick tree.
After all data has been processed, the three files (FASTA, matrix and Newick) are read and converted to a database. The webserver scripts read and serve the data to a web browser. There are three scripts: a main script that serves the data, and two auxiliary servers that perform on-the-fly clustering and image conversion (from SVG to PNG).
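The site-selection rule described above can be illustrated with a small sketch. This is not the code used by the scripts; the call labels ("snp", "het", "mnp", "indel") and the data layout are assumptions made only for this example.

```python
# Toy illustration of the site-selection rule (hypothetical data layout).
# A position is dropped if ANY sample has a het SNP, MNP or InDel there;
# otherwise each sample gets its SNP allele or the reference base.

SKIP_KINDS = {"het", "mnp", "indel"}


def build_columns(reference, calls):
    """reference: dict position -> reference base.
    calls: dict sample -> dict position -> (kind, allele).
    Returns (kept positions, dict sample -> sequence string)."""
    samples = sorted(calls)
    variable = sorted({pos for per_sample in calls.values() for pos in per_sample})
    kept = []
    columns = {sample: [] for sample in samples}
    for pos in variable:
        kinds = [calls[s][pos][0] for s in samples if pos in calls[s]]
        if any(kind in SKIP_KINDS for kind in kinds):
            continue  # skip the whole position for every sample
        kept.append(pos)
        for sample in samples:
            if pos in calls[sample]:
                columns[sample].append(calls[sample][pos][1])
            else:
                columns[sample].append(reference[pos])
    return kept, {s: "".join(bases) for s, bases in columns.items()}


reference = {391: "C", 416: "T", 424: "C"}
calls = {
    "sample_a": {391: ("snp", "T"), 424: ("het", "Y")},
    "sample_b": {416: ("snp", "A"), 424: ("snp", "T")},
}
kept, seqs = build_columns(reference, calls)
print(kept, seqs)
```

Position 424 is dropped for both samples because one of them is heterozygous there, even though the other carries a clean SNP.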
Enter the introgression browser folder
cd ~/introgressionbrowser/
Add vcfmerger folder to PATH:
export PATH=$PWD/vcfmerger:$PATH
If in a VM, check that your files were correctly shared by VirtualBox:
ls data
If you don't see your files, there is a mistake in the VM configuration. If you see your data, you can proceed. Enter the data folder. The folder structure should be as follows:
~/introgressionbrowser/
~/introgressionbrowser/project_name/
~/introgressionbrowser/project_name/analysis_name/
~/introgressionbrowser/project_name/analysis_name/input/
Add your reference fasta file inside the analysis_name folder.
IF YOU HAVE MULTIPLE, SINGLE SAMPLE, VCF FILES: add all your VCF files inside the input folder. Add your reference fasta file in the base (analysis_name) folder. Create a TAB delimited file containing the names of your input files (<tab> stands for TAB):
1<tab>input/file1.vcf.gz<tab>species 1
1<tab>input/file2.vcf.gz<tab>species 2
1<tab>input/file3.vcf.gz<tab>species 3
the folder structure should resemble:
~/introgressionbrowser/project_name/analysis_name/input/file1.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file2.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file3.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/analysis.csv
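If you have many input files, the TAB delimited list can be written by a few lines of Python instead of by hand. This is only a sketch: the function name is made up for the example, and the leading 1 in each row is copied from the example above without knowing what it controls.

```python
import csv
import io


def write_list_file(handle, entries):
    """Write one '1<TAB>path<TAB>name' row per input VCF file."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for path, name in entries:
        # First column copied from the README example (meaning not documented here).
        writer.writerow(["1", path, name])


entries = [
    ("input/file1.vcf.gz", "species 1"),
    ("input/file2.vcf.gz", "species 2"),
    ("input/file3.vcf.gz", "species 3"),
]

# An in-memory buffer is used here; pass open("analysis.csv", "w", newline="")
# instead to write the real file.
buffer = io.StringIO()
write_list_file(buffer, entries)
list_text = buffer.getvalue()
print(list_text, end="")
```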
IF YOU HAVE A MULTI-COLUMN VCF FILE AND SINGLE SAMPLE VCF FILES. RUN split_multicolumn_vcf.py ON YOUR FILE:
split_multicolumn_vcf.py multi_VCF.vcf.gz
it will create a VCF file for each of your samples:
multi_VCFvcf_1_sample1.vcf
multi_VCFvcf_1_sample2.vcf
multi_VCFvcf_1_sample3.vcf
multi_VCFvcf_1_sample4.vcf
And it will automatically create a list file which you can use directly with iBrowser:
$ cat batch_1.vcf.lst
1<tab>multi_VCFvcf_1_sample1.vcf<tab>sample1
1<tab>multi_VCFvcf_1_sample2.vcf<tab>sample2
1<tab>multi_VCFvcf_1_sample3.vcf<tab>sample3
1<tab>multi_VCFvcf_1_sample4.vcf<tab>sample4
the folder structure should resemble:
~/introgressionbrowser/project_name/analysis_name/input/file1.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file2.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file3.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/batch_1.vcf.lst
IF YOU HAVE ONLY A SINGLE MULTI-COLUMN VCF FILE, CREATE A SAMPLE NAMES FILE AND RUN vcfmerger_multicolumn.py ON YOUR FILE:
for multi_VCF.vcf.gz
##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1.bam.vcf.gz sample2.bam.vcf.gz
create a sample_names.csv
#sample,name
sample1.bam.vcf.gz,sample 1
sample2.bam.vcf.gz,sample 2
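A starting point for sample_names.csv can be derived from the VCF header itself, since the sample columns follow the nine fixed VCF columns. The sketch below is an illustration only: the function names are hypothetical and the display name is a rough guess (text before the first dot) that you will probably want to edit by hand.

```python
def sample_columns(vcf_lines):
    """Return the sample column names from the #CHROM header line of a VCF."""
    for line in vcf_lines:
        if line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM .. FORMAT); samples follow.
            return line.rstrip("\n").split("\t")[9:]
    raise ValueError("no #CHROM header line found")


def sample_names_csv(samples):
    """Build the CSV text; the second column is a placeholder display name."""
    rows = ["#sample,name"]
    for column in samples:
        guess = column.split(".")[0]  # assumption: name is the part before the first dot
        rows.append(f"{column},{guess}")
    return "\n".join(rows) + "\n"


# Header lines as they would be read from the file; for a .vcf.gz use
# gzip.open(path, "rt") to get text lines.
header = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1.bam.vcf.gz\tsample2.bam.vcf.gz\n",
]
csv_text = sample_names_csv(sample_columns(header))
print(csv_text, end="")
```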
run vcfmerger_multicolumn.py
usage: vcfmerger_multicolumn.py [-h] -i [INPUT] [-o [OUTPUT]] [-t [TABLE]]
[-k [KEYS]] [-v [TABLE_VS]] [-c [TRANSLATION]]
[-s [SAMPLES]] [-n] [-e]
Simplify merged VCF file.
optional arguments:
-h, --help show this help message and exit
-i [INPUT], --input [INPUT]
Input file
-o [OUTPUT], --output [OUTPUT]
Output file
-t [TABLE], --table [TABLE]
Input table
-k [KEYS], --keys [KEYS]
Input keys
-v [TABLE_VS], --table-values [TABLE_VS]
Input table values
-c [TRANSLATION], --chromosome-translation [TRANSLATION]
Translation table to chromosome names [e.g.:
1:Chr1;2:Chr2
-s [SAMPLES], --samples [SAMPLES]
Samples (Columns) to keep [e.g.: Spp1;Spp3;Spp5
-n, --keep-no-coverage
Keep rows containing no coverage
-e, --keep-heterozygous
Keep rows hoterozygosity
It will automatically create a CSV file with all sample names and directly create a merged VCF file.
e.g.:
vcfmerger_multicolumn.py --input multi_VCF.vcf.gz --table sample_names.csv --keep-no-coverage
the folder structure should resemble:
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz.simplified.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz.simplified.vcf.gz.filtered.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/sample_names.csv
Now you can generate the makefile:
gen_makefile.py
usage: gen_makefile.py [-h] [-i [INLIST]] [-f [INFASTA]] [-s [SIZE]]
[-p [PROJECT]] [-o [OUTFILE]] [-ec EXCLUDED_CHROMS]
[-ic INCLUDED_CHROMS] [-n] [-m] [-np]
[-t [SUB_THREADS]] [-St [SMART_THREADS]] [-SH] [-SI]
[-SS] [-So [SIMPLIFY_OUTPUT]]
[-Coc [CONCAT_CHROMOSOME]]
[-CoI [CONCAT_IGNORE [CONCAT_IGNORE ...]]]
[-Cos [CONCAT_START]] [-Coe [CONCAT_END]]
[-Cot [CONCAT_THREADS]] [-Cor] [-Con [CONCAT_REFNAME]]
[-CoR] [-CoRm [CONCAT_RILMADS]]
[-CoRs [CONCAT_RILMINSIM]] [-CoRg] [-CoRd]
[-Ftt [FASTTREE_THREADS]] [-Ftb [FASTTREE_BOOTSTRAP]]
[-Cle [CLUSTER_EXTENSION]] [-Clt [CLUSTER_THREADS]]
[-Clp] [-Cls] [-Cln] [-Clr] [-Clc]
[-Fic [FILTER_CHROMOSOME]] [-Fig [FILTER_GFF]]
[-FiI [FILTER_IGNORE [FILTER_IGNORE ...]]]
[-Fis [FILTER_START]] [-Fie [FILTER_END]] [-Fik] [-Fin]
[-Fiv] [-Fip FILTER_PROTEIN] [-Dbt DB_READ_THREADS]
Create makefile to convert files.
optional arguments:
-h, --help show this help message and exit
-i [INLIST], --input [INLIST], --inlist [INLIST]
input tab separated file
-f [INFASTA], --fasta [INFASTA], --infasta [INFASTA]
input reference fasta. requires split size
-s [SIZE], --size [SIZE]
split size
-p [PROJECT], --proj [PROJECT], --project [PROJECT]
project name
-o [OUTFILE], --out [OUTFILE], --outfile [OUTFILE]
output name [default: makefile]
-ec EXCLUDED_CHROMS, --excluded-chrom EXCLUDED_CHROMS
Do not use the following chromosomes
-ic INCLUDED_CHROMS, --included-chrom INCLUDED_CHROMS
Use EXCLUSIVELY these chromosomes
-n, --dry, --dry-run dry-run
-m, --merge, --cluster_merge
do merged clustering (resource intensive) [default:
no]
-np, --no-pickle do not generate pickle database [default: no]
-t [SUB_THREADS], --sub_threads [SUB_THREADS]
threads of submake to tree building [default: 5]
-St [SMART_THREADS], --smart_threads [SMART_THREADS]
threads of submake to tree building [default: 5]
-SH, --simplify-include-hetero
Do not simplify heterozygous SNPS
-SI, --simplify-include-indel
Do not simplify indel SNPS
-SS, --simplify-include-singleton
Do not simplify single SNPS
-So [SIMPLIFY_OUTPUT], --simplify-output [SIMPLIFY_OUTPUT]
Simplify output file
-Coc [CONCAT_CHROMOSOME], --concat-chrom [CONCAT_CHROMOSOME], --concat-chromosome [CONCAT_CHROMOSOME]
Concat - Chromosome to filter [all]
-CoI [CONCAT_IGNORE [CONCAT_IGNORE ...]], --concat-ignore [CONCAT_IGNORE [CONCAT_IGNORE ...]], --concat-skip [CONCAT_IGNORE [CONCAT_IGNORE ...]]
Concat - Chromosomes to skip
-Cos [CONCAT_START], --concat-start [CONCAT_START]
Concat - Chromosome start position to filter [0]
-Coe [CONCAT_END], --concat-end [CONCAT_END]
Concat - Chromosome end position to filter [-1]
-Cot [CONCAT_THREADS], --concat-threads [CONCAT_THREADS]
Concat - Number of threads [num chromosomes]
-Cor, --concat-noref Concat - Do not print reference [default: true]
-Con [CONCAT_REFNAME], --concat-ref-name [CONCAT_REFNAME]
Concat - Reference name [default: ref]
-CoR, --concat-RIL Concat - RIL mode: false]
-CoRm [CONCAT_RILMADS], --concat-RIL-mads [CONCAT_RILMADS]
Concat - RIL percentage of Median Absolute Deviation
to use (smaller = more restrictive): 0.25]
-CoRs [CONCAT_RILMINSIM], --concat-RIL-minsim [CONCAT_RILMINSIM]
Concat - RIL percentage of nucleotides identical to
reference to classify as reference: 0.75]
-CoRg, --concat-RIL-greedy
Concat - RIL greedy convert nucleotides to either the
reference sequence or the alternative sequence: false]
-CoRd, --concat-RIL-delete
Concat - RIL delete invalid sequences: false]
-Ftt [FASTTREE_THREADS], --fasttree_threads [FASTTREE_THREADS]
FastTree - number of threads for fasttree
-Ftb [FASTTREE_BOOTSTRAP], --fasttree_bootstrap [FASTTREE_BOOTSTRAP]
FastTree - fasttree bootstrap
-Cle [CLUSTER_EXTENSION], --cluster-ext [CLUSTER_EXTENSION], --cluster-extension [CLUSTER_EXTENSION]
Cluster - [optional] extension to search. [default:
.matrix]
-Clt [CLUSTER_THREADS], --cluster-threads [CLUSTER_THREADS]
Cluster - threads for clustering [default: 5]
-Clp, --cluster-no-png
Cluster - do not export cluster png
-Cls, --cluster-no-svg
Cluster - do not export cluster svg
-Cln, --cluster-no-tree
Cluster - do not export cluster tree. precludes no png
and no svg
-Clr, --cluster-no-rows
Cluster - no rows clustering
-Clc, --cluster-no-cols
Cluster - no column clustering
-Fic [FILTER_CHROMOSOME], --filter-chrom [FILTER_CHROMOSOME], --filter-chromosome [FILTER_CHROMOSOME]
Filter - Chromosome to filter [all]
-Fig [FILTER_GFF], --filter-gff [FILTER_GFF]
Filter - Gff Coordinate file
-FiI [FILTER_IGNORE [FILTER_IGNORE ...]], --filter-ignore [FILTER_IGNORE [FILTER_IGNORE ...]], --filter-skip [FILTER_IGNORE [FILTER_IGNORE ...]]
Filter - Chromosomes to skip
-Fis [FILTER_START], --filter-start [FILTER_START]
Filter - Chromosome start position to filter [0]
-Fie [FILTER_END], --filter-end [FILTER_END]
Filter - Chromosome end position to filter [-1]
-Fik, --filter-knife Filter - Export to separate files
-Fin, --filter-negative
Filter - Invert gff
-Fiv, --filter-verbose
Filter - Verbose
-Fip FILTER_PROTEIN, --filter-prot FILTER_PROTEIN, --filter-protein FILTER_PROTEIN
Filter - Input Fasta File to convert to Protein
-Dbt DB_READ_THREADS, --db-threads DB_READ_THREADS
Db - Number of threads to read raw files
This will generate a makefile for your project (follow one of the examples in the manual). To run the analysis,
e.g. for a 50kb fragmentation with 20 threads:
gen_makefile.py -i multi_VCF.vcf.gz.list.csv --project run_name --smart_threads 20 --fasttree_threads 20 --merge --fasta reference.fasta --size 50000 --no-pickle --cluster-no-png --cluster-no-svg
then run make to create the database:
make
It will generate a database output:
~/introgressionbrowser/project_name/analysis_name/run_name.sqlite
Now create a link to the data folder:
cd ~/introgressionbrowser/data
ln -s analysis/run_name.sqlite .
restart ibrowser:
If inside a VM, you can restart ibrowser by:
~/introgressionbrowser/restart.sh
Or restart the VM:
sudo shutdown -r now
If not inside a VM, you can restart ibrowser:
cd ~/introgressionbrowser/
pgrep -f ibrowser.py | xargs kill
python ibrowser.py data/
Run gen_makefile.py to create a makefile for your project
gen_makefile.py -h
Run make:
make -f makefile_[project name]
Copy [project name].sqlite to iBrowser/data folder
cp [project name].sqlite ..
Create [project name].sqlite.nfo with the information about the database:
#title as shall be shown in the UI
title=Tomato 60 RIL - 50k
#custom orders are optional.
#more than one can be given in separate lines
custom_order=RIL.customorder
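The .nfo layout above is a plain key=value file with # comments, where custom_order may appear more than once. A minimal reader could look like the sketch below; it is not the parser used by iBrowser, and the function name is made up for this example.

```python
def parse_nfo(text):
    """Parse key=value lines, skipping blank lines and # comments.
    custom_order may repeat, so it is collected into a list."""
    info = {"title": None, "custom_order": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        key, value = key.strip(), value.strip()
        if key == "custom_order":
            info["custom_order"].append(value)
        else:
            info[key] = value
    return info


example = """#title as shall be shown in the UI
title=Tomato 60 RIL - 50k
#custom orders are optional.
#more than one can be given in separate lines
custom_order=RIL.customorder
"""
nfo = parse_nfo(example)
print(nfo)
```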
(OPTIONAL) create custom order files:
#NAME=RIL Single
##NAME is the name of this particular ordering as it will appear in the UI
##
#ROWNUM=1
##ROWNUM is the column to read 1 in the "row order" section
##
##CHROMOSOME=
##CHROMOSOME can either be __global__/empty for ordering all chromosomes, or a chromosome name for ordering a particular chromosome
##
##row order
ref
S lycopersicum cv MoneyMaker LYC1365
615
634
667
688
710
618
694
678
693
685
651
669
674
676
Reload iBrowser
Examples of how to run gen_makefile.py can be found in vcfmerger/gen_makefile.py.examples:
gen_makefile.py --input arabidopsis.csv --infasta TAIR10.fasta --size 50000 --project arabidopsis_50k --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --excluded-chrom chloroplast --excluded-chrom mitochondria --cluster-no-cols
make -f makefile_arabidopsis_50k
gen_makefile.py --input short2.lst --infasta S_lycopersicum_chromosomes.2.40.fa --size 10000 --project tom84_10k --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_10k
gen_makefile.py --input short2.lst --infasta S_lycopersicum_chromosomes.2.40.fa --size 50000 --project tom84_50k --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_50k
gen_makefile.py --input short2.lst --filter-gff ITAG2.3_gene_models.gff3.gene.gff3 --project tom84_genes --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_genes
gen_makefile.py --input short2.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_10000_introgression.gff --project tom84_10k_introgression --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_10k_introgression
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff --project RIL_50k --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_RIL_50k
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff --project RIL_50k_mode_ril --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --cluster-no-cols
make -f makefile_RIL_50k_mode_ril
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff --project RIL_50k_mode_ril_greedy --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-greedy --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_greedy
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff --project RIL_50k_mode_ril_delete --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-delete --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_delete
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff --project RIL_50k_mode_ril_delete_greedy --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-greedy --concat-RIL-delete --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_delete_greedy
Merge VCF files:
vcfmerger.py short.lst
OUTPUT: short.lst.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FILENAMES
SL2.40ch00 280 . A C . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S cheesemaniae (055)
SL2.40ch00 284 . A G . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S cheesemaniae (054)
SL2.40ch00 316 . C T . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S arcanum (059)
SL2.40ch00 323 . C T . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S arcanum (059)
SL2.40ch00 332 . A T . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S pimpinellifolium (047)
SL2.40ch00 362 . G T . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S galapagense (104)
SL2.40ch00 385 . A C . PASS NV=1;NW=1;NS=1;NT=1;NU=1 FI S neorickii (056)
SL2.40ch00 391 . C T . PASS NV=1;NW=1;NS=6;NT=6;NU=6 FI S chiemliewskii (052),S neorickii (056),S arcanum (059),S habrochaites glabratum (066),S habrochaites glabratum (067),S habrochaites (072)
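The idea behind the merged output, one row per variant with the names of all samples carrying it in the last column, can be sketched as follows. This is a simplified illustration, not vcfmerger.py itself: the INFO counters (NV, NW, NS, NT, NU) are left out because their definitions are not given here, and the function name is hypothetical.

```python
from collections import defaultdict


def merge_snps(per_sample):
    """per_sample: dict sample name -> iterable of (chrom, pos, ref, alt).
    Returns rows (chrom, pos, ref, alt, comma-joined sample names),
    sorted by chromosome and position."""
    merged = defaultdict(list)
    for sample, records in per_sample.items():
        for chrom, pos, ref, alt in records:
            merged[(chrom, pos, ref, alt)].append(sample)
    rows = []
    for key in sorted(merged):
        chrom, pos, ref, alt = key
        rows.append((chrom, pos, ref, alt, ",".join(merged[key])))
    return rows


per_sample = {
    "S cheesemaniae (055)": [("SL2.40ch00", 280, "A", "C")],
    "S neorickii (056)": [("SL2.40ch00", 385, "A", "C"), ("SL2.40ch00", 391, "C", "T")],
    "S arcanum (059)": [("SL2.40ch00", 316, "C", "T"), ("SL2.40ch00", 391, "C", "T")],
}
rows = merge_snps(per_sample)
for row in rows:
    print(*row, sep="\t")
```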
Simplify merged VCF deleting hetSNP, MNP and InDels:
vcfsimplify.py short.lst.vcf.gz
OUTPUT: short.lst.vcf.gz.simplified.vcf.gz
SL2.40ch00 391 . C T . PASS NV=1;NW=1;NS=6;NT=6;NU=6 FI S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)
SL2.40ch00 416 . T A . PASS NV=1;NW=1;NS=6;NT=6;NU=6 FI S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)
SL2.40ch00 424 . C T . PASS NV=1;NW=1;NS=5;NT=5;NU=5 FI LA0113 (039),S cheesemaniae (054),S pimpinellifolium (044),S pimpinellifolium unc (045),S pimpinellifolium (047)
Generate even sized fragments (if needed):
fasta_spacer.py GENOME.fa 50000
OUTPUT: GENOME.fa_50000.gff (the sample lines below come from a run with a fragment size of 10000)
SL2.40ch00 . fragment_10000 1 10000 . . . Alias=Frag_SL2.40ch00g10000_1;ID=fragment:Frag_SL2.40ch00g10000_1;Name=Frag_SL2.40ch00g10000_1;length=10000;csize=21805821
SL2.40ch00 . fragment_10000 10001 20000 . . . Alias=Frag_SL2.40ch00g10000_2;ID=fragment:Frag_SL2.40ch00g10000_2;Name=Frag_SL2.40ch00g10000_2;length=10000;csize=21805821
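The fragment layout shown above can be reproduced with a short sketch. This is not fasta_spacer.py; it only mimics the attribute format of the sample lines, writes tab-separated GFF columns, and assumes that the last fragment of a chromosome is simply shorter than the others.

```python
def fragment_gff_lines(chrom, chrom_size, frag_size):
    """Return GFF lines splitting one chromosome into fragments of frag_size bp.
    Coordinates are 1-based and inclusive, as in GFF."""
    lines = []
    start = 1
    index = 1
    while start <= chrom_size:
        end = min(start + frag_size - 1, chrom_size)  # assumption: last fragment is truncated
        name = f"Frag_{chrom}g{frag_size}_{index}"
        attrs = (
            f"Alias={name};ID=fragment:{name};Name={name};"
            f"length={end - start + 1};csize={chrom_size}"
        )
        fields = [chrom, ".", f"fragment_{frag_size}", str(start), str(end), ".", ".", ".", attrs]
        lines.append("\t".join(fields))
        start = end + 1
        index += 1
    return lines


gff_lines = fragment_gff_lines("SL2.40ch00", 21805821, 10000)
print(gff_lines[0])
print(gff_lines[1])
```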
Filter with gff:
vcffiltergff.py -k -f PROJNAME -g GENOME.fa_50000.gff -i short2.lst.vcf.gz.simplified.vcf.gz 2>&1 | tee short2.lst.vcf.gz.simplified.vcf.gz.log
OUTPUT:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT FILENAMES
SL2.40ch00 391 . C T . PASS NV=1;NW=1;NS=6;NT=6;NU=6 FI S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)
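Conceptually, filtering with a GFF means assigning each SNP position to the segment that contains it. The sketch below shows one way to do that lookup with a binary search. It is an illustration under assumptions (tab-separated GFF, non-overlapping segments, hypothetical function names), not the logic of vcffiltergff.py.

```python
import bisect


def load_intervals(gff_lines):
    """Return dict chrom -> sorted list of (start, end, attributes).
    GFF coordinates are 1-based and inclusive."""
    intervals = {}
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        intervals.setdefault(chrom, []).append((start, end, fields[8]))
    for chrom in intervals:
        intervals[chrom].sort()
    return intervals


def find_segment(intervals, chrom, pos):
    """Return the attributes of the segment containing pos, or None.
    Assumes the segments of a chromosome do not overlap."""
    segments = intervals.get(chrom, [])
    # Index of the last segment whose start is <= pos.
    i = bisect.bisect_right(segments, (pos, float("inf"), "")) - 1
    if i >= 0 and segments[i][0] <= pos <= segments[i][1]:
        return segments[i][2]
    return None


gff = [
    "SL2.40ch00\t.\tfragment_10000\t1\t10000\t.\t.\t.\tName=Frag_SL2.40ch00g10000_1",
    "SL2.40ch00\t.\tfragment_10000\t10001\t20000\t.\t.\t.\tName=Frag_SL2.40ch00g10000_2",
]
intervals = load_intervals(gff)
print(find_segment(intervals, "SL2.40ch00", 391))
```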
Concatenate the SNPs of each fragment into FASTA:
find PROJNAME -name '*.vcf.gz' | xargs -I{} -P50 bash -c 'vcfconcat.py -f -i {} 2>&1 | tee {}.concat.log'
OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta
>Moneymaker_001
ATAATCTAGCTGGAACCCTTGTTTTTCTCGCGATTGGGGTTCAAGTGCACACCACATGTC
AGGGA
>Alisa_Craig_002
ATAATCTAGCTGGAACCCTTGTTTTTCTTGCGATTGGGGTTCAAGTGCGCGCTGCGTGAC
AGGAA
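The FASTA sample above wraps sequences at 60 characters per line. A minimal writer producing the same layout could look like this; it is a sketch with a hypothetical function name, not the code of vcfconcat.py.

```python
def format_fasta(sequences, width=60):
    """sequences: dict name -> concatenated SNP string.
    Returns FASTA text with sequence lines wrapped at `width` characters."""
    out = []
    for name, seq in sequences.items():
        out.append(f">{name}")
        for i in range(0, len(seq), width):
            out.append(seq[i:i + width])
    return "\n".join(out) + "\n"


sequences = {
    "Moneymaker_001": (
        "ATAATCTAGCTGGAACCCTTGTTTTTCTCGCGATTGGGGTTCAAGTGCACACCACATGTC"
        "AGGGA"
    ),
}
fasta_text = format_fasta(sequences)
print(fasta_text, end="")
```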
Run FastTree in each of the FASTA files:
export OMP_NUM_THREADS=3
find PROJNAME -name '*.fasta' | sort | xargs -I{} -P30 bash -c 'FastTreeMP -fastest -gamma -nt -bionj -boot 100 -log {}.tree.log -out {}.tree {}'
OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta.tree
((((Dana_018:0.0,Belmonte_033:0.0):0.00054,((TR00026_102:0.01587,(PI272654_023:0.03426,(((S_huaylasense_063:0.00054,((Lycopersicon_sp_025:0.0,S_chilense_065:0.0):0.00054,S_chilense_064:0.01555)0.780:0.01548)0.860:0.01547,((S_peruvianum_new_049:0.0,S_chiemliewskii_051:0.0,S_chiemliewskii_052:0.0,S_cheesemaniae_053:0.0,S_cheesemaniae_054:0.0,S_neorickii_056:0.0,S_neorickii_057:0.0,S_peruvianum_060:0.0,S_habrochaites_glabratum_066:0.0,S_habrochaites_glabratum_068:0.0,S_habrochaites_070:0.0,S_habrochaites_071:0.0,S_habrochaites_072:0.0,S_pennellii_073:0.0,S_pennellii_074:0.0,TR00028_LA1479_105:0.0,ref:0.0):0.00054,((S_arcanum_058:0.01482,(S_huaylasense_062:0.08258,S._arcanum_new_075:0.00054)0.880:0.03260)0.960:0.04917,(((Gardeners_Delight_003:0.00054,(Katinka_Cherry_007:0.0,Trote_Beere_016:0.0,Winter_Tipe_031:0.0):0.01559)0.900:0.03206,(PI129097_022:0.00054,(S_galapagense_104:0.04782,(LA0113_039:0.01223,((S_pimpinellifolium_047:0.01628,(S_arcanum_059:0.00055,(S_habrochaites_glabratum_067:0.01562,S_habrochaites_glabratum_069:0.01562)1.000:0.08287)0.920:0.04857)0.670:0.01186,S_habrochaites_042:0.03551)0.990:0.12956)0.960:0.06961)0.710:0.00054)0.800:0.01578)0.760:0.01558,(T1039_017:0.08246,S_pimpinellifolium_044:0.00054)0.980:0.08153)0.230:0.00053)0.910:0.00055)0.910:0.00054)0.830:0.01549,S_pimpinellifolium_046:0.00054)0.980:0.08610)0.660:0.01369)0.530:0.04644,(TR00027_103:0.00054,(PI365925_037:0.04936,S_cheesemaniae_055:0.03179)0.650:0.08462)1.000:0.41706)0.650:0.00296)0.940:0.01555,(The_Dutchman_028:0.00053,(((Polish_Joe_026:0.0,Brandywine_089:0.0):0.00054,((((Porter_078:0.01608,Kentucky_Beefsteak_093:0.01542)0.880:0.03271,(Thessaloniki_096:0.08543,Bloodt_Butcher_088:0.03267)0.700:0.01564)0.800:0.01585,(Giant_Belgium_091:0.01562,(Moneymaker_001:0.00054,(Dixy_Golden_Giant_090:0.01579,(Large_Red_Cherry_077:0.03276,Momatero_015:0.04969)0.720:0.01528)0.870:0.01570)0.850:0.01556)0.480:0.00055)0.930:0.03157,Marmande_VFA_094:0.03158)0.970:0.00053)0.880:0.00053,Watermelon_Beefsteak_097:0.01555)0.890:0.01559)0.970:0.03159)0.950:0.00054,PI169588_041:0.00054,((Sonato_012:0.11798,(((All_Round_011:0.01555,Chih-Mu-Tao-Se_038:0.00054)0.180:0.00054,(((Jersey_Devil_024:0.0,Chag_Li_Lycopersicon_esculentum_032:0.0,S_pimpinellifolium_unc_043:0.0):0.00054,(((PI311117_036:0.04839,((Taxi_006:0.0,Tiffen_Mennonite_034:0.0):0.00054,(Cal_J_TM_VF_027:0.00053,(Lycopersicon_esculentum_828_021:0.00054,(Black_Cherry_029:0.03245,(Galina_005:0.00054,S_pimpinellifolium_unc_045:0.01559)0.880:0.03248)0.770:0.01547)0.950:0.03179)0.160:0.01560)0.840:0.01563)0.420:0.00054,Lycopersicon_esculentum_825_020:0.00054)0.860:0.01556,((Cross_Country_013:0.0,ES_58_Heinz_040:0.0):0.00054,(Rutgers_004:0.01554,Lidi_014:0.04758)0.900:0.00054)0.880:0.00054)0.860:0.01558)0.080:0.01560,(Alisa_Craig_002:0.01560,John_s_big_orange_008:0.00054)1.000:0.00054)0.840:0.01558)0.800:0.01566,(Large_Pink_019:0.01555,Anto_030:0.00054)0.140:0.00054)0.920:0.01555)0.680:0.00054,Wheatley_s_Frost_Resistant_035:0.03155)0.950:0.00054);
find PROJNAME -name '*.fasta' | sort | xargs -I{} -P30 bash -c 'FastTreeMP -nt -makematrix {} > {}.matrix'
OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta.matrix
Moneymaker_001 0.000000 0.134437 0.345611 0.134437 0.321609
Alisa_Craig_002 0.134437 0.000000 0.211925 0.064210
Gardeners_Delight_003 0.345611 0.211925 0.000000 0.211925
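To get a feeling for the numbers in the matrix, the distance between two sequences can be recomputed by hand. The sketch below assumes the values are Jukes-Cantor corrected distances (our understanding of FastTree's default for nucleotide input; treat it as an assumption) and ignores gaps and ambiguous bases. Applied to the Moneymaker_001 and Alisa_Craig_002 sequences from the FASTA sample above (8 differences over 65 sites), it gives a value close to the 0.134437 shown in the matrix.

```python
import math


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4/3 * p),
    where p is the fraction of differing sites.
    Raises ValueError (from math.log) when p >= 0.75."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = diffs / len(seq_a)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


moneymaker = (
    "ATAATCTAGCTGGAACCCTTGTTTTTCTCGCGATTGGGGTTCAAGTGCACACCACATGTC"
    "AGGGA"
)
alisa_craig = (
    "ATAATCTAGCTGGAACCCTTGTTTTTCTTGCGATTGGGGTTCAAGTGCGCGCTGCGTGAC"
    "AGGAA"
)
distance = jukes_cantor(moneymaker, alisa_craig)
print(f"{distance:.6f}")
```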
Process the data into a memory-dump database (pickle):
vcf_walk_ram.py --pickle PROJNAME
OUTPUT:
walk_out_10k.db
walk_out_10k_SL2.40ch00.db
walk_out_10k_SL2.40ch01.db
walk_out_10k_SL2.40ch02.db
walk_out_10k_SL2.40ch03.db
walk_out_10k_SL2.40ch04.db
walk_out_10k_SL2.40ch05.db
walk_out_10k_SL2.40ch06.db
walk_out_10k_SL2.40ch07.db
walk_out_10k_SL2.40ch08.db
walk_out_10k_SL2.40ch09.db
walk_out_10k_SL2.40ch10.db
walk_out_10k_SL2.40ch11.db
walk_out_10k_SL2.40ch12.db
Convert (pickle) database to SQLite (if dependencies installed):
vcf_walk_sql.py PROJNAME
OUTPUT:
walk_out_10k.sqlite
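A quick way to check that the SQLite file was written is to list its tables with Python's built-in sqlite3 module. The table names created by vcf_walk_sql.py are not documented here, so the sketch below builds a throwaway database with a made-up table just to show the call; point list_tables at walk_out_10k.sqlite to inspect the real file.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the names of all tables in an SQLite file.
    Note: connecting to a path that does not exist creates an empty file."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [row[0] for row in rows]


# Throwaway database with a hypothetical table, only to demonstrate the function.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
with closing(sqlite3.connect(demo_path)) as conn:
    conn.execute("CREATE TABLE demo (x INTEGER)")
    conn.commit()

tables = list_tables(demo_path)
print(tables)
```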
This project is maintained by Saulo Aflitos ( GitHub and LinkedIn ) with support from Applied Bioinformatics and WageningenUR