
Running

Saulo edited this page May 20, 2016 · 7 revisions

Running Visualization Server

Copy config.template to your data directory:

cp config.template [PATH TO DATA FOLDER]/config.py

Edit config.py to configure:

  • whether user control is enabled or not (Default: False)
  • the server's port (Default: 10000)
  • pages which can be seen without login (to disable a page, comment out its line with #)
  • enable/disable server debugging
#decide whether to have user control or not
HAS_LOGIN    = False

#define port to serve webpage
SERVER_PORT               = 10000

# pages which can be seen without login
librepaths = [
    '/api',
    '/favicon.ico'
]

DEBUG                     = False

Initialize iBrowser by running:

./ibrowser.py [PATH TO DATA FOLDER] init

This will:

  • Create a session secret
  • Create an RSA key
  • Create an SSL certificate
  • Create the default user (admin:admin)

ONLY IN CASE YOU WANT ACCESS CONTROL

  • Edit config.py and set HAS_LOGIN to True
  • (ADVISED) Change the admin password (otherwise the default admin:admin remains) by running:
./ibrowser.py [PATH TO DATA FOLDER] deluser admin
./ibrowser.py [PATH TO DATA FOLDER] adduser admin [DESIRED PASSWORD]
  • Optional

  • (Default: 2048) Change RSA key size by editing [PATH TO DATA FOLDER]/config.keylen

echo 2048 > [PATH TO DATA FOLDER]/config.keylen
  • Clean all config by running:
./ibrowser.py [PATH TO DATA FOLDER] clean
  • Create users manually by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] adduser [USER] [DESIRED PASSWORD]
  • Delete users manually by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] deluser [USER]
  • List users by running (can also be performed in the UI):
./ibrowser.py [PATH TO DATA FOLDER] listusers

Run ibrowser.py

./ibrowser.py [PATH TO DATA FOLDER]

Running Calculations

General

This set of scripts takes as input a series of Variant Call Format (VCF) files of species mapped against a single reference. After a series of conversions, all homozygous Single Nucleotide Polymorphisms (SNPs) are extracted, while heterozygous SNPs (hetSNPs), Multiple Nucleotide Polymorphisms (MNPs) and Insertion/Deletion events (InDels) are ignored. For each individual, the reference's nucleotide is assigned unless a SNP is present. If any individual has a MNP, hetSNP or InDel at a given position, that position is skipped entirely. A General Feature Format (GFF) file describing coordinates is used to split the genome into segments. Those segments can be genes, evenly sized fragments (10 kb, 50 kb, etc.) or particular segments of interest, as long as the coordinates match the VCF files. An auxiliary script is provided to generate evenly sized segments. For each selected segment a FASTA file is generated, and FastTree creates a distance matrix and a Newick tree. After all data has been processed, the three files (FASTA, matrix and Newick) are read and converted to a database. The webserver scripts read and serve the data to a web browser. There are three scripts: a main script serves the data, and two auxiliary servers perform on-the-fly clustering and image conversion (from SVG to PNG).
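The per-position rule above can be sketched in a few lines of Python. This is a simplified illustration, not the actual vcfmerger code; the record layout (a dict of individual to ALT allele, with heterozygous calls written as "C/T") is a hypothetical stand-in:

```python
# Simplified sketch of the per-position rule: any MNP, hetSNP or InDel in any
# individual skips the position; otherwise each individual receives its SNP
# allele, falling back to the reference nucleotide.

def call_position(ref, calls):
    """ref: reference base; calls: dict of individual -> ALT allele (None = no SNP).

    Returns a dict of individual -> nucleotide, or None when the position
    must be skipped entirely.
    """
    for alt in calls.values():
        if alt is None:
            continue
        if "/" in alt:                       # heterozygous call (hetSNP)
            return None
        if len(alt) != 1 or len(ref) != 1:   # MNP or InDel
            return None
    return {ind: (alt or ref) for ind, alt in calls.items()}

# A clean homozygous SNP in one individual:
print(call_position("A", {"spp1": "C", "spp2": None}))  # {'spp1': 'C', 'spp2': 'A'}
# An InDel in any individual discards the whole position:
print(call_position("A", {"spp1": "AT", "spp2": None}))  # None
```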

Input Data

Enter the introgression browser folder

cd ~/introgressionbrowser/

Add vcfmerger folder to PATH:

export PATH=$PWD/vcfmerger:$PATH

If in a VM, check whether your files were correctly shared by VirtualBox.

ls data

If you don't see your files, there is a mistake in the VM configuration. If you see your data, you can proceed. Enter the data folder. The folder structure should be as follows:

~/introgressionbrowser/
~/introgressionbrowser/project_name/
~/introgressionbrowser/project_name/analysis_name/
~/introgressionbrowser/project_name/analysis_name/input/

Add your reference fasta file inside the analysis_name folder.

IF YOU HAVE MULTIPLE SINGLE-SAMPLE VCF FILES: add all your VCF files inside the input folder, add your reference fasta file in the base folder, and create a TAB-delimited file listing your input files (<tab> stands for a TAB character):

1<tab>input/file1.vcf.gz<tab>species 1
1<tab>input/file2.vcf.gz<tab>species 2
1<tab>input/file3.vcf.gz<tab>species 3

The folder structure should resemble:

~/introgressionbrowser/project_name/analysis_name/input/file1.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file2.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file3.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/analysis.csv
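Typing the three-column list by hand scales poorly for many samples. A small helper such as the following can generate it from the input folder (a hypothetical script, not shipped with iBrowser; the `write_vcf_list` name and the "species N" labels are placeholders):

```python
import os

def write_vcf_list(input_dir, out_path):
    """Write a TAB-delimited iBrowser list file: group, path, display name."""
    vcfs = sorted(f for f in os.listdir(input_dir) if f.endswith(".vcf.gz"))
    with open(out_path, "w") as out:
        for i, name in enumerate(vcfs, start=1):
            # group 1, path relative to the analysis folder, display name
            out.write("1\tinput/%s\tspecies %d\n" % (name, i))
```

Replace the generated "species N" labels with real names before running the pipeline.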

IF YOU HAVE A MULTI-COLUMN VCF FILE AND WANT SINGLE-SAMPLE VCF FILES, RUN split_multicolumn_vcf.py ON YOUR FILE:

split_multicolumn_vcf.py multi_VCF.vcf.gz

It will create a VCF file for each of your samples:

multi_VCFvcf_1_sample1.vcf
multi_VCFvcf_1_sample2.vcf
multi_VCFvcf_1_sample3.vcf
multi_VCFvcf_1_sample4.vcf

And it will automatically create a list file which you can use directly with iBrowser:

$ cat batch_1.vcf.lst
1<tab>multi_VCFvcf_1_sample1.vcf<tab>sample1
1<tab>multi_VCFvcf_1_sample2.vcf<tab>sample2
1<tab>multi_VCFvcf_1_sample3.vcf<tab>sample3
1<tab>multi_VCFvcf_1_sample4.vcf<tab>sample4

The folder structure should resemble:

~/introgressionbrowser/project_name/analysis_name/input/file1.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file2.vcf.gz
~/introgressionbrowser/project_name/analysis_name/input/file3.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/batch_1.vcf.lst

IF YOU HAVE ONLY A SINGLE MULTI-COLUMN VCF FILE, CREATE A SAMPLE NAMES FILE AND RUN vcfmerger_multicolumn.py ON YOUR FILE:

For multi_VCF.vcf.gz:

##fileformat=VCFv4.2
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample1.bam.vcf.gz sample2.bam.vcf.gz

Create a sample_names.csv:

#sample,name
sample1.bam.vcf.gz,sample 1
sample2.bam.vcf.gz,sample 2

Run vcfmerger_multicolumn.py:

usage: vcfmerger_multicolumn.py [-h] -i [INPUT] [-o [OUTPUT]] [-t [TABLE]]
                                [-k [KEYS]] [-v [TABLE_VS]] [-c [TRANSLATION]]
                                [-s [SAMPLES]] [-n] [-e]

Simplify merged VCF file.

optional arguments:
  -h, --help            show this help message and exit
  -i [INPUT], --input [INPUT]
                        Input file
  -o [OUTPUT], --output [OUTPUT]
                        Output file
  -t [TABLE], --table [TABLE]
                        Input table
  -k [KEYS], --keys [KEYS]
                        Input keys
  -v [TABLE_VS], --table-values [TABLE_VS]
                        Input table values
  -c [TRANSLATION], --chromosome-translation [TRANSLATION]
                        Translation table to chromosome names [e.g.:
                        1:Chr1;2:Chr2
  -s [SAMPLES], --samples [SAMPLES]
                        Samples (Columns) to keep [e.g.: Spp1;Spp3;Spp5
  -n, --keep-no-coverage
                        Keep rows containing no coverage
  -e, --keep-heterozygous
                        Keep heterozygous rows

It will automatically create a CSV file with all sample names and directly create a merged VCF file.

e.g.:

vcfmerger_multicolumn.py --input multi_VCF.vcf.gz --table sample_names.csv --keep-no-coverage

The folder structure should resemble:

~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz.simplified.vcf.gz
~/introgressionbrowser/project_name/analysis_name/multi_VCF.vcf.gz.list.csv.vcf.gz.simplified.vcf.gz.filtered.vcf.gz
~/introgressionbrowser/project_name/analysis_name/reference.fasta
~/introgressionbrowser/project_name/analysis_name/sample_names.csv

Now you can generate the makefile:

gen_makefile.py 

usage: gen_makefile.py [-h] [-i [INLIST]] [-f [INFASTA]] [-s [SIZE]]
                       [-p [PROJECT]] [-o [OUTFILE]] [-ec EXCLUDED_CHROMS]
                       [-ic INCLUDED_CHROMS] [-n] [-m] [-np]
                       [-t [SUB_THREADS]] [-St [SMART_THREADS]] [-SH] [-SI]
                       [-SS] [-So [SIMPLIFY_OUTPUT]]
                       [-Coc [CONCAT_CHROMOSOME]]
                       [-CoI [CONCAT_IGNORE [CONCAT_IGNORE ...]]]
                       [-Cos [CONCAT_START]] [-Coe [CONCAT_END]]
                       [-Cot [CONCAT_THREADS]] [-Cor] [-Con [CONCAT_REFNAME]]
                       [-CoR] [-CoRm [CONCAT_RILMADS]]
                       [-CoRs [CONCAT_RILMINSIM]] [-CoRg] [-CoRd]
                       [-Ftt [FASTTREE_THREADS]] [-Ftb [FASTTREE_BOOTSTRAP]]
                       [-Cle [CLUSTER_EXTENSION]] [-Clt [CLUSTER_THREADS]]
                       [-Clp] [-Cls] [-Cln] [-Clr] [-Clc]
                       [-Fic [FILTER_CHROMOSOME]] [-Fig [FILTER_GFF]]
                       [-FiI [FILTER_IGNORE [FILTER_IGNORE ...]]]
                       [-Fis [FILTER_START]] [-Fie [FILTER_END]] [-Fik] [-Fin]
                       [-Fiv] [-Fip FILTER_PROTEIN] [-Dbt DB_READ_THREADS]

Create makefile to convert files.

optional arguments:
  -h, --help            show this help message and exit
  -i [INLIST], --input [INLIST], --inlist [INLIST]
                        input tab separated file
  -f [INFASTA], --fasta [INFASTA], --infasta [INFASTA]
                        input reference fasta. requires split size
  -s [SIZE], --size [SIZE]
                        split size
  -p [PROJECT], --proj [PROJECT], --project [PROJECT]
                        project name
  -o [OUTFILE], --out [OUTFILE], --outfile [OUTFILE]
                        output name [default: makefile]
  -ec EXCLUDED_CHROMS, --excluded-chrom EXCLUDED_CHROMS
                        Do not use the following chromosomes
  -ic INCLUDED_CHROMS, --included-chrom INCLUDED_CHROMS
                        Use EXCLUSIVELY these chromosomes
  -n, --dry, --dry-run  dry-run
  -m, --merge, --cluster_merge
                        do merged clustering (resource intensive) [default:
                        no]
  -np, --no-pickle      do not generate pickle database [default: no]
  -t [SUB_THREADS], --sub_threads [SUB_THREADS]
                        threads of submake to tree building [default: 5]
  -St [SMART_THREADS], --smart_threads [SMART_THREADS]
                        threads of submake to tree building [default: 5]
  -SH, --simplify-include-hetero
                        Do not simplify heterozygous SNPS
  -SI, --simplify-include-indel
                        Do not simplify indel SNPS
  -SS, --simplify-include-singleton
                        Do not simplify single SNPS
  -So [SIMPLIFY_OUTPUT], --simplify-output [SIMPLIFY_OUTPUT]
                        Simplify output file
  -Coc [CONCAT_CHROMOSOME], --concat-chrom [CONCAT_CHROMOSOME], --concat-chromosome [CONCAT_CHROMOSOME]
                        Concat - Chromosome to filter [all]
  -CoI [CONCAT_IGNORE [CONCAT_IGNORE ...]], --concat-ignore [CONCAT_IGNORE [CONCAT_IGNORE ...]], --concat-skip [CONCAT_IGNORE [CONCAT_IGNORE ...]]
                        Concat - Chromosomes to skip
  -Cos [CONCAT_START], --concat-start [CONCAT_START]
                        Concat - Chromosome start position to filter [0]
  -Coe [CONCAT_END], --concat-end [CONCAT_END]
                        Concat - Chromosome end position to filter [-1]
  -Cot [CONCAT_THREADS], --concat-threads [CONCAT_THREADS]
                        Concat - Number of threads [num chromosomes]
  -Cor, --concat-noref  Concat - Do not print reference [default: true]
  -Con [CONCAT_REFNAME], --concat-ref-name [CONCAT_REFNAME]
                        Concat - Reference name [default: ref]
  -CoR, --concat-RIL    Concat - RIL mode: false]
  -CoRm [CONCAT_RILMADS], --concat-RIL-mads [CONCAT_RILMADS]
                        Concat - RIL percentage of Median Absolute Deviation
                        to use (smaller = more restrictive): 0.25]
  -CoRs [CONCAT_RILMINSIM], --concat-RIL-minsim [CONCAT_RILMINSIM]
                        Concat - RIL percentage of nucleotides identical to
                        reference to classify as reference: 0.75]
  -CoRg, --concat-RIL-greedy
                        Concat - RIL greedy convert nucleotides to either the
                        reference sequence or the alternative sequence: false]
  -CoRd, --concat-RIL-delete
                        Concat - RIL delete invalid sequences: false]
  -Ftt [FASTTREE_THREADS], --fasttree_threads [FASTTREE_THREADS]
                        FastTree - number of threads for fasttree
  -Ftb [FASTTREE_BOOTSTRAP], --fasttree_bootstrap [FASTTREE_BOOTSTRAP]
                        FastTree - fasttree bootstrap
  -Cle [CLUSTER_EXTENSION], --cluster-ext [CLUSTER_EXTENSION], --cluster-extension [CLUSTER_EXTENSION]
                        Cluster - [optional] extension to search. [default:
                        .matrix]
  -Clt [CLUSTER_THREADS], --cluster-threads [CLUSTER_THREADS]
                        Cluster - threads for clustering [default: 5]
  -Clp, --cluster-no-png
                        Cluster - do not export cluster png
  -Cls, --cluster-no-svg
                        Cluster - do not export cluster svg
  -Cln, --cluster-no-tree
                        Cluster - do not export cluster tree. precludes no png
                        and no svg
  -Clr, --cluster-no-rows
                        Cluster - no rows clustering
  -Clc, --cluster-no-cols
                        Cluster - no column clustering
  -Fic [FILTER_CHROMOSOME], --filter-chrom [FILTER_CHROMOSOME], --filter-chromosome [FILTER_CHROMOSOME]
                        Filter - Chromosome to filter [all]
  -Fig [FILTER_GFF], --filter-gff [FILTER_GFF]
                        Filter - Gff Coordinate file
  -FiI [FILTER_IGNORE [FILTER_IGNORE ...]], --filter-ignore [FILTER_IGNORE [FILTER_IGNORE ...]], --filter-skip [FILTER_IGNORE [FILTER_IGNORE ...]]
                        Filter - Chromosomes to skip
  -Fis [FILTER_START], --filter-start [FILTER_START]
                        Filter - Chromosome start position to filter [0]
  -Fie [FILTER_END], --filter-end [FILTER_END]
                        Filter - Chromosome end position to filter [-1]
  -Fik, --filter-knife  Filter - Export to separate files
  -Fin, --filter-negative
                        Filter - Invert gff
  -Fiv, --filter-verbose
                        Filter - Verbose
  -Fip FILTER_PROTEIN, --filter-prot FILTER_PROTEIN, --filter-protein FILTER_PROTEIN
                        Filter - Input Fasta File to convert to Protein
  -Dbt DB_READ_THREADS, --db-threads DB_READ_THREADS
                        Db - Number of threads to read raw files

This will generate a makefile for your project (follow one of the examples in the manual). To run the analysis:

e.g. For a 50kb fragmentation with 20 threads:

gen_makefile.py -i multi_VCF.vcf.gz.list.csv --project run_name --smart_threads 20 --fasttree_threads 20 --merge --fasta reference.fasta --size 50000 --no-pickle --cluster-no-png --cluster-no-svg

Then run make to create the database:

make

It will generate a database output:

~/introgressionbrowser/project_name/analysis_name/run_name.sqlite

Now create a link to the data folder:

cd ~/introgressionbrowser/data
ln -s ../project_name/analysis_name/run_name.sqlite .

Restart iBrowser:

If inside a VM, you can restart iBrowser by running:

~/introgressionbrowser/restart.sh

Or restart the VM:

sudo shutdown -r now

If not inside a VM, you can restart iBrowser manually:

cd ~/introgressionbrowser/
pgrep -f ibrowser.py | xargs kill
python ibrowser.py data/

Run

Automatically

    Run gen_makefile.py to create a makefile for your project
      gen_makefile.py -h
      (the full usage output is shown above)
Run make:
      make -f makefile_[project name]
Copy [project name].sqlite to iBrowser/data folder
      cp [project name].sqlite ..
Create [project name].sqlite.nfo with the information about the database:
      #title as shall be shown in the UI
      title=Tomato 60 RIL - 50k
      #custom orders are optional.
      #more than one can be given in separate lines
      custom_order=RIL.customorder
(OPTIONAL) Create custom order files:
      #NAME=RIL Single
      ##NAME is the name of this particular ordering as it will appear in the UI
      ##
      #ROWNUM=1
      ##ROWNUM is the column (here 1) to read in the "row order" section
      ##
      ##CHROMOSOME=
      ##CHROMOSOME can either be __global__/empty to order all chromosomes, or a chromosome name to order a particular chromosome
      ##
      ##row order
      ref
      S lycopersicum cv MoneyMaker LYC1365
      615
      634
      667
      688
      710
      618
      694
      678
      693
      685
      651
      669
      674
      676
Reload iBrowser

Examples

Examples of how to run gen_makefile.py can be found in vcfmerger/gen_makefile.py.examples

Arabidopsis 50Kbp
gen_makefile.py --input arabidopsis.csv         --infasta TAIR10.fasta                                     --size 50000 --project arabidopsis_50k              --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --excluded-chrom chloroplast --excluded-chrom mitochondria --cluster-no-cols
make -f makefile_arabidopsis_50k
Tomato 10Kbp
gen_makefile.py --input short2.lst --infasta S_lycopersicum_chromosomes.2.40.fa --size 10000               --project tom84_10k               --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_10k
Tomato 50Kbp
gen_makefile.py --input short2.lst --infasta S_lycopersicum_chromosomes.2.40.fa --size 50000               --project tom84_50k               --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_50k
Tomato Genes given a gff file containing only gene coordinates
gen_makefile.py --input short2.lst --filter-gff ITAG2.3_gene_models.gff3.gene.gff3                         --project tom84_genes             --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_genes
Tomato 10Kbp - Introgressed fragment given a gff file containing only the desired coordinates
gen_makefile.py --input short2.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_10000_introgression.gff --project tom84_10k_introgression --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_tom84_10k_introgression
RIL 50kbp
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff   --project RIL_50k                        --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --cluster-no-cols
make -f makefile_RIL_50k
RIL 50kbp with RIL mode activated
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff   --project RIL_50k_mode_ril               --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --cluster-no-cols
make -f makefile_RIL_50k_mode_ril
RIL 50kbp with RIL mode activated and greedy correction
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff   --project RIL_50k_mode_ril_greedy        --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-greedy --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_greedy
RIL 50kbp with RIL mode activated and deletion of bad regions
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff   --project RIL_50k_mode_ril_delete        --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-delete --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_delete
RIL 50kbp with RIL mode activated, greedy correction and deletion of bad regions
gen_makefile.py --input RIL.lst --filter-gff S_lycopersicum_chromosomes.2.40.fa_50000.gff   --project RIL_50k_mode_ril_delete_greedy --no-pickle --cluster-no-svg --smart_threads 25 --cluster-threads 5 --concat-RIL --concat-RIL-greedy --concat-RIL-delete --cluster-no-cols
make -f makefile_RIL_50k_mode_ril_delete_greedy

Manually

    Merge VCF files:
        vcfmerger.py short.lst
            OUTPUT: short.lst.vcf.gz
                #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FILENAMES
                SL2.40ch00      280     .       A       C       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S cheesemaniae (055)
                SL2.40ch00      284     .       A       G       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S cheesemaniae (054)
                SL2.40ch00      316     .       C       T       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S arcanum (059)
                SL2.40ch00      323     .       C       T       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S arcanum (059)
                SL2.40ch00      332     .       A       T       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S pimpinellifolium (047)
                SL2.40ch00      362     .       G       T       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S galapagense (104)
                SL2.40ch00      385     .       A       C       .       PASS    NV=1;NW=1;NS=1;NT=1;NU=1        FI      S neorickii (056)
                SL2.40ch00      391     .       C       T       .       PASS    NV=1;NW=1;NS=6;NT=6;NU=6        FI      S chiemliewskii (052),S neorickii (056),S arcanum (059),S habrochaites glabratum (066),S habrochaites glabratum (067),S habrochaites (072)

    Simplify merged VCF deleting hetSNP, MNP and InDels:
        vcfsimplify.py short.lst.vcf.gz
            OUTPUT: short.lst.vcf.gz.filtered.vcf.gz
                SL2.40ch00      391     .       C       T       .       PASS    NV=1;NW=1;NS=6;NT=6;NU=6        FI      S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)
                SL2.40ch00      416     .       T       A       .       PASS    NV=1;NW=1;NS=6;NT=6;NU=6        FI      S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)
                SL2.40ch00      424     .       C       T       .       PASS    NV=1;NW=1;NS=5;NT=5;NU=5        FI      LA0113 (039),S cheesemaniae (054),S pimpinellifolium (044),S pimpinellifolium unc (045),S pimpinellifolium (047)

    Generate even sized fragments (if needed):
        fasta_spacer.py GENOME.fa 50000
            OUTPUT: GENOME.fa.50000.gff
                SL2.40ch00      .       fragment_10000  1       10000   .       .       .       Alias=Frag_SL2.40ch00g10000_1;ID=fragment:Frag_SL2.40ch00g10000_1;Name=Frag_SL2.40ch00g10000_1;length=10000;csize=21805821
                SL2.40ch00      .       fragment_10000  10001   20000   .       .       .       Alias=Frag_SL2.40ch00g10000_2;ID=fragment:Frag_SL2.40ch00g10000_2;Name=Frag_SL2.40ch00g10000_2;length=10000;csize=21805821
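The fragment coordinates fasta_spacer.py emits can be reproduced with a short sketch (illustrative only, not the shipped script; the attribute layout follows the example output above, and the chromosome length passed in is an assumption):

```python
def fragment_gff(chrom, chrom_len, size):
    """Return GFF lines for consecutive `size`-bp fragments (1-based, inclusive)."""
    lines = []
    for n, start in enumerate(range(1, chrom_len + 1, size), start=1):
        end = min(start + size - 1, chrom_len)
        name = "Frag_%sg%d_%d" % (chrom, size, n)
        attrs = "Alias=%s;ID=fragment:%s;Name=%s;length=%d;csize=%d" % (
            name, name, name, end - start + 1, chrom_len)
        lines.append("\t".join(
            [chrom, ".", "fragment_%d" % size, str(start), str(end),
             ".", ".", ".", attrs]))
    return lines

# 25 kb of a chromosome in 10 kb fragments: the last fragment is shorter.
for line in fragment_gff("SL2.40ch00", 25000, 10000):
    print(line)
```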

    Filter with gff:
        vcffiltergff.py -k -f PROJNAME -g GENOME.fa_50000.gff -i short2.lst.vcf.gz.simplified.vcf.gz 2>&1 | tee short2.lst.vcf.gz.simplified.vcf.gz.log
            OUTPUT:
                #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  FILENAMES
                SL2.40ch00      391     .       C       T       .       PASS    NV=1;NW=1;NS=6;NT=6;NU=6        FI      S arcanum (059),S chiemliewskii (052),S habrochaites (072),S habrochaites glabratum (066),S habrochaites glabratum (067),S neorickii (056)

    Concatenate the SNPs of each fragment into FASTA:
        find PROJNAME -name '*.vcf.gz' | xargs -I{} -P50 bash -c 'vcfconcat.py -f -i {} 2>&1 | tee {}.concat.log'
            OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta
                >Moneymaker_001
                ATAATCTAGCTGGAACCCTTGTTTTTCTCGCGATTGGGGTTCAAGTGCACACCACATGTC
                AGGGA
                >Alisa_Craig_002
                ATAATCTAGCTGGAACCCTTGTTTTTCTTGCGATTGGGGTTCAAGTGCGCGCTGCGTGAC
                AGGAA

    Run FastTree in each of the FASTA files:
        export OMP_NUM_THREADS=3
        find PROJNAME -name '*.fasta' | sort | xargs -I{} -P30 bash -c 'FastTreeMP -fastest -gamma -nt -bionj -boot 100 -log {}.tree.log -out {}.tree {}'
            OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta.tree
                ((((Dana_018:0.0,Belmonte_033:0.0):0.00054,((TR00026_102:0.01587,(PI272654_023:0.03426,(((S_huaylasense_063:0.00054,((Lycopersicon_sp_025:0.0,S_chilense_065:0.0):0.00054,S_chilense_064:0.01555)0.780:0.01548)0.860:0.01547,((S_peruvianum_new_049:0.0,S_chiemliewskii_051:0.0,S_chiemliewskii_052:0.0,S_cheesemaniae_053:0.0,S_cheesemaniae_054:0.0,S_neorickii_056:0.0,S_neorickii_057:0.0,S_peruvianum_060:0.0,S_habrochaites_glabratum_066:0.0,S_habrochaites_glabratum_068:0.0,S_habrochaites_070:0.0,S_habrochaites_071:0.0,S_habrochaites_072:0.0,S_pennellii_073:0.0,S_pennellii_074:0.0,TR00028_LA1479_105:0.0,ref:0.0):0.00054,((S_arcanum_058:0.01482,(S_huaylasense_062:0.08258,S._arcanum_new_075:0.00054)0.880:0.03260)0.960:0.04917,(((Gardeners_Delight_003:0.00054,(Katinka_Cherry_007:0.0,Trote_Beere_016:0.0,Winter_Tipe_031:0.0):0.01559)0.900:0.03206,(PI129097_022:0.00054,(S_galapagense_104:0.04782,(LA0113_039:0.01223,((S_pimpinellifolium_047:0.01628,(S_arcanum_059:0.00055,(S_habrochaites_glabratum_067:0.01562,S_habrochaites_glabratum_069:0.01562)1.000:0.08287)0.920:0.04857)0.670:0.01186,S_habrochaites_042:0.03551)0.990:0.12956)0.960:0.06961)0.710:0.00054)0.800:0.01578)0.760:0.01558,(T1039_017:0.08246,S_pimpinellifolium_044:0.00054)0.980:0.08153)0.230:0.00053)0.910:0.00055)0.910:0.00054)0.830:0.01549,S_pimpinellifolium_046:0.00054)0.980:0.08610)0.660:0.01369)0.530:0.04644,(TR00027_103:0.00054,(PI365925_037:0.04936,S_cheesemaniae_055:0.03179)0.650:0.08462)1.000:0.41706)0.650:0.00296)0.940:0.01555,(The_Dutchman_028:0.00053,(((Polish_Joe_026:0.0,Brandywine_089:0.0):0.00054,((((Porter_078:0.01608,Kentucky_Beefsteak_093:0.01542)0.880:0.03271,(Thessaloniki_096:0.08543,Bloodt_Butcher_088:0.03267)0.700:0.01564)0.800:0.01585,(Giant_Belgium_091:0.01562,(Moneymaker_001:0.00054,(Dixy_Golden_Giant_090:0.01579,(Large_Red_Cherry_077:0.03276,Momatero_015:0.04969)0.720:0.01528)0.870:0.01570)0.850:0.01556)0.480:0.00055)0.930:0.03157,Marmande_VFA_094:0.03158)0.970:0.00053)0.880:0.00053,Watermelon_Beefsteak_097:0.01555)0.890:0.01559)0.970:0.03159)0.950:0.00054,PI169588_041:0.00054,((Sonato_012:0.11798,(((All_Round_011:0.01555,Chih-Mu-Tao-Se_038:0.00054)0.180:0.00054,(((Jersey_Devil_024:0.0,Chag_Li_Lycopersicon_esculentum_032:0.0,S_pimpinellifolium_unc_043:0.0):0.00054,(((PI311117_036:0.04839,((Taxi_006:0.0,Tiffen_Mennonite_034:0.0):0.00054,(Cal_J_TM_VF_027:0.00053,(Lycopersicon_esculentum_828_021:0.00054,(Black_Cherry_029:0.03245,(Galina_005:0.00054,S_pimpinellifolium_unc_045:0.01559)0.880:0.03248)0.770:0.01547)0.950:0.03179)0.160:0.01560)0.840:0.01563)0.420:0.00054,Lycopersicon_esculentum_825_020:0.00054)0.860:0.01556,((Cross_Country_013:0.0,ES_58_Heinz_040:0.0):0.00054,(Rutgers_004:0.01554,Lidi_014:0.04758)0.900:0.00054)0.880:0.00054)0.860:0.01558)0.080:0.01560,(Alisa_Craig_002:0.01560,John_s_big_orange_008:0.00054)1.000:0.00054)0.840:0.01558)0.800:0.01566,(Large_Pink_019:0.01555,Anto_030:0.00054)0.140:0.00054)0.920:0.01555)0.680:0.00054,Wheatley_s_Frost_Resistant_035:0.03155)0.950:0.00054);

        find PROJNAME -name '*.fasta' | sort | xargs -I{} -P30 bash -c 'FastTreeMP -nt -makematrix {} > {}.matrix'
            OUTPUT: PROJNAME/CHROMOSOME/short2.lst.vcf.gz.simplified.vcf.gz.filtered.vcf.gz.SL2.40ch01.000090300001-000090310000.Frag_SL2.40ch01g10000_9031.vcf.gz.SL2.40ch01.fasta.matrix
                Moneymaker_001 0.000000 0.134437 0.345611 0.134437  0.321609
                Alisa_Craig_002 0.134437 0.000000 0.211925 0.064210
                Gardeners_Delight_003 0.345611 0.211925 0.000000 0.211925
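One of these .matrix files can be read back into a lookup table with a short sketch (an assumption about the layout: FastTree's PHYLIP-style format with an optional leading taxon count, then `name d1 d2 ...` per row; the excerpt above is truncated, a full matrix is square):

```python
def read_matrix(lines):
    """Parse a square distance matrix into {name: {name: distance}}."""
    rows = []
    for line in lines:
        parts = line.split()
        if len(parts) == 1 and parts[0].isdigit():
            continue  # leading PHYLIP header: number of taxa
        if parts:
            rows.append((parts[0], [float(x) for x in parts[1:]]))
    names = [name for name, _ in rows]
    return {name: dict(zip(names, dists)) for name, dists in rows}

text = """3
A 0.000000 0.134437 0.345611
B 0.134437 0.000000 0.211925
C 0.345611 0.211925 0.000000"""
dm = read_matrix(text.splitlines())
print(dm["A"]["C"])  # 0.345611
```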

    Process the data into a memory dump database (pickle):
        vcf_walk_ram.py --pickle PROJNAME
            OUTPUT:
                walk_out_10k.db
                walk_out_10k_SL2.40ch00.db
                walk_out_10k_SL2.40ch01.db
                walk_out_10k_SL2.40ch02.db
                walk_out_10k_SL2.40ch03.db
                walk_out_10k_SL2.40ch04.db
                walk_out_10k_SL2.40ch05.db
                walk_out_10k_SL2.40ch06.db
                walk_out_10k_SL2.40ch07.db
                walk_out_10k_SL2.40ch08.db
                walk_out_10k_SL2.40ch09.db
                walk_out_10k_SL2.40ch10.db
                walk_out_10k_SL2.40ch11.db
                walk_out_10k_SL2.40ch12.db

    Convert (pickle) database to SQLite (if dependencies installed):
        vcf_walk_sql.py PROJNAME
            OUTPUT:
                walk_out_10k.sqlite
