DESCHRAMBLER
jkimlab authored and jkimlab committed Jan 23, 2017
0 parents commit f883318
Showing 1,301 changed files with 7,972,982 additions and 0 deletions.
81 changes: 81 additions & 0 deletions DESCHRAMBLER.pl
@@ -0,0 +1,81 @@
#!/usr/bin/perl

use strict;
use warnings;
use FindBin qw($Bin);
use Cwd;
use Cwd 'abs_path';

# check the number of arguments
if (@ARGV != 1) {
    print STDERR "Usage: ./DESCHRAMBLER.pl <parameter file>\n";
    exit(1);
}

my $params_f = $ARGV[0];

# parse parameter file
my %params = ();
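open(my $fh, '<', $params_f) or die "Cannot open parameter file $params_f: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    $line = trim($line);
    # skip comments and blank lines
    if ($line =~ /^#/ || $line eq "") { next; }
    # split the trimmed line on the first '=' only, so values may contain '='
    my ($name, $value) = split(/=/, $line, 2);
    if (!defined($value)) {
        print STDERR "Skipping malformed line (no '='): $line\n";
        next;
    }
    $name = trim($name);
    $value = trim($value);
    # store existing files and directories as absolute paths
    if (-f $value || -d $value) {
        $params{$name} = abs_path($value);
    } else {
        $params{$name} = $value;
    }
}
close($fh);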

check_parameters(\%params);

my $sf_dir = $params{"OUTPUTDIR"}."/SFs";
`mkdir -p $params{"OUTPUTDIR"}`;

# make blocks
print STDERR "\n## Constructing syntenic fragments ##\n";
my $cwd = getcwd();
`mkdir -p $sf_dir`;
`sed -e 's:<resolutionwillbechanged>:$params{"RESOLUTION"}:' $params{"CONFIGSFSFILE"} > $sf_dir/config.file`;
`sed -e 's:<willbechanged>:$Bin/code/makeBlocks:;s:<treewillbechanged>:$params{"TREEFILE"}:' $params{"MAKESFSFILE"} > $sf_dir/Makefile`;
chdir($sf_dir) or die "Cannot change to directory $sf_dir: $!\n";
`make all`;
if ($? != 0) { die "make failed in $sf_dir\n"; }

chdir($cwd);
`$Bin/script/create_blocklist.pl $params{"REFSPC"} $sf_dir`;

# reconstruct APCFs
`$Bin/script/wrap_recon_apcf.pl $params{"TREEFILE"} $params{"RESOLUTION"} $params{"REFSPC"} $params{"MINADJSCR"} $sf_dir $params{"OUTPUTDIR"}`;

###############################################################
sub check_parameters {
    my $rparams = shift;
    my $flag = 0;
    my $out = "";
    my @parnames = ("REFSPC","OUTPUTDIR","RESOLUTION","TREEFILE","CONFIGSFSFILE","MAKESFSFILE","MINADJSCR");

    foreach my $pname (@parnames) {
        if (!defined($rparams->{$pname})) {
            $out .= "$pname ";
            $flag = 1;
        }
    }

    if ($flag == 1) {
        print STDERR "missing parameters: $out\n";
        exit(1);
    }
}

sub trim {
    my $str = shift;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}
13 changes: 13 additions & 0 deletions Makefile
@@ -0,0 +1,13 @@
BIN ?= $(shell pwd)

all:
	sed -e 's:<pathtochainNet>:$(BIN)/examples/chainNet:g' examples/config.SFs.tmp > examples/config.SFs
	cd lib/kent/src/lib && ${MAKE}
	cd code/makeBlocks && ${MAKE}
	cd code && ${MAKE}

clean:
	cd lib/kent/src/lib && ${MAKE} clean
	cd code/makeBlocks && ${MAKE} clean
	cd code && ${MAKE} clean
	cd examples && rm -rf APCFs.300K config.SFs
185 changes: 185 additions & 0 deletions README
@@ -0,0 +1,185 @@

DESCHRAMBLER
======================================

1. How to compile?
------------------

1.1. Type make

Simply type make to compile the DESCHRAMBLER package.


2. How to run?
--------------

2.1. Mandatory input files

DESCHRAMBLER requires the following four files as input. Example files are in
the "examples" directory.

- config.SFs: a configuration file for chain/net directories and species
- Makefile.SFs: a makefile for generating syntenic fragments
- tree.txt: a Newick tree of the species used and the target ancestor
- params.txt: a parameter file

The above four files need to be placed in the same directory.

2.1.1. config.SFs

In the file, the following three values need to be specified. Please refer to the example
file in the "examples" directory.

- >netdir: a path to the directory that contains net files
This directory needs to have a sub-directory structure as below.

reference/descendant1/chain/separate chain files for each chromosome (or scaffold)
reference/descendant1/net/separate net files for each chromosome (or scaffold)
reference/descendant2/chain/separate chain files for each chromosome (or scaffold)
reference/descendant2/net/separate net files for each chromosome (or scaffold)
reference/descendant3/chain/separate chain files for each chromosome (or scaffold)
reference/descendant3/net/separate net files for each chromosome (or scaffold)

An example directory, called "chainNet", is in the "examples" directory.

- >chaindir: a path to the directory that contains chain files
Usually, this directory is the same as the one used in >netdir above.

The chain/net files can be downloaded from the UCSC genome browser (http://genome.ucsc.edu/),
or generated by using tools provided by the UCSC genome browser. Please refer to
http://genomewiki.ucsc.edu/index.php/Whole_genome_alignment_howto.

- >species: the list of reference, ingroup and outgroup species
The species information is listed under ">species". Each line has the following
three values delimited by a space.

- Species name: this name must match the directory name in the chain/net
directory and the name in the Newick tree in the "tree.txt" file.
- Flag indicating whether the species is a reference (0), descendant (1),
or outgroup (2) species
- Flag indicating whether the assembly of the species consists of chromosomes (1)
or not (0)

Please do not change the ">resolution" section.
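
The three values on each ">species" line can be checked before a run. The following is a
minimal Python sketch, not part of DESCHRAMBLER; the species names, the flag values and the
parse_species helper are hypothetical, and the layout is assumed to be exactly the
space-delimited one described above.

```python
# Hypothetical ">species" section: names and flags are placeholders.
SPECIES_SECTION = """\
>species
speciesA 0 1
speciesB 1 1
speciesC 2 0
"""

# Meaning of the second value on each line, as described in the README.
ROLES = {"0": "reference", "1": "descendant", "2": "outgroup"}


def parse_species(text):
    """Return (name, role, is_chromosome_level) for each line under >species."""
    entries = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            # Any other ">" header ends the species section.
            in_section = line == ">species"
            continue
        if not in_section or not line:
            continue
        fields = line.split()
        if len(fields) != 3 or fields[1] not in ROLES or fields[2] not in ("0", "1"):
            raise ValueError(f"malformed species line: {line!r}")
        entries.append((fields[0], ROLES[fields[1]], fields[2] == "1"))
    return entries


for entry in parse_species(SPECIES_SECTION):
    print(entry)
```

A malformed line raises ValueError, so a typo in a flag is reported before any
syntenic fragments are built.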

2.1.2. Makefile.SFs

You don't need to edit this file. Just copy this file to your working directory. Please use
the example file in the "examples" directory.

2.1.3. tree.txt

This file contains the Newick tree for the species listed in the config.SFs file.
The target ancestor must be marked with the "@" symbol. Please refer to the example tree
file in the "examples" directory.
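
A quick consistency check of the tree against the species list can catch naming mismatches
early. The following is a minimal Python sketch under stated assumptions: the tree string, the
species names and the position of "@" after an internal node are hypothetical (the example
tree file shows the real layout), and check_tree is an illustrative helper, not part of
DESCHRAMBLER.

```python
import re

# Hypothetical tree: names, branch lengths and the "@" position are assumptions.
TREE = "((speciesA:0.1,speciesB:0.2)@:0.05,speciesC:0.3);"


def check_tree(newick, species):
    """Return a list of problems found in the tree string (empty if none)."""
    problems = []
    if newick.count("@") != 1:
        problems.append('expected exactly one "@" marker')
    if newick.count("(") != newick.count(")"):
        problems.append("unbalanced parentheses")
    # A leaf name is taken to be the token that directly follows "(" or ",".
    leaves = set(re.findall(r"[(,]\s*([^():,;@\s]+)", newick))
    for name in species:
        if name not in leaves:
            problems.append(f"species missing from tree: {name}")
    return problems


print(check_tree(TREE, ["speciesA", "speciesB", "speciesC"]))
```

The check is deliberately shallow: it counts markers and compares names, and does not
validate the full Newick grammar.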

2.1.4. params.txt

The following parameters need to be specified.

- REFSPC: the name of a reference species
This name must match the name used in the chain/net directory, the species list in
the config.SFs file, and the tree.txt file.
- OUTPUTDIR: the output directory
- RESOLUTION: block resolution
- TREEFILE: a path to the Newick tree file
- MINADJSCR: the minimum adjacency score required for a pair of syntenic fragments to be used in reconstruction
- CONFIGSFSFILE: a path to the config.SFs file
- MAKESFSFILE: a path to the Makefile.SFs file
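
The parameter file uses "name=value" lines; lines starting with "#" and blank lines are
skipped, and DESCHRAMBLER.pl stops if any of the seven names checked in its
check_parameters routine is missing. The following is a minimal Python sketch of that
check, not the wrapper itself; all values in the sample are placeholders, and
parse_params and missing_parameters are hypothetical helper names.

```python
# Parameter names checked by check_parameters in DESCHRAMBLER.pl.
REQUIRED = ["REFSPC", "OUTPUTDIR", "RESOLUTION", "TREEFILE",
            "CONFIGSFSFILE", "MAKESFSFILE", "MINADJSCR"]

# Hypothetical parameter file: every value below is a placeholder.
PARAMS_TEXT = """\
# comment lines and blank lines are skipped
REFSPC=speciesA
OUTPUTDIR=out
RESOLUTION=300000
TREEFILE=tree.txt
MINADJSCR=0.0001
CONFIGSFSFILE=config.SFs
MAKESFSFILE=Makefile.SFs
"""


def parse_params(text):
    """Parse name=value lines into a dict, trimming surrounding whitespace."""
    params = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep:
            continue  # no "=" on the line; ignored in this sketch
        params[name.strip()] = value.strip()
    return params


def missing_parameters(params):
    """Return the required names that are absent, in the order listed above."""
    return [name for name in REQUIRED if name not in params]


params = parse_params(PARAMS_TEXT)
print(missing_parameters(params))
```

Running such a check first gives the same "missing parameters" information as the
wrapper without starting a reconstruction.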


2.2. Run DESCHRAMBLER

There is a wrapper Perl script, 'DESCHRAMBLER.pl'. To run DESCHRAMBLER, type:

<path to DESCHRAMBLER>/DESCHRAMBLER.pl <path to the parameter file: params.txt>


3. What is produced?
--------------------

Many files are generated in the output directory specified in the parameter file.
Among them, the following files are the most important and useful.

3.1. block_list.txt (in the "SFs" subdirectory)

This file contains the coordinates and identifiers of syntenic fragments (SFs) used in
reconstruction.

Column 1: chromosome or scaffold
Column 2: start position (0-based); add 1 to obtain the actual start position
Column 3: end position (1-based)
Column 4: orientation
Column 5: identifier
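
Because the start is 0-based and the end is 1-based, only the start needs to change when
1-based coordinates are wanted. The following is a minimal Python sketch; the sample
lines, the whitespace delimiter and the "+"/"-" orientation symbols are assumptions, and
to_one_based is a hypothetical helper.

```python
# Hypothetical block_list.txt lines; whitespace-delimited columns are assumed.
LINES = [
    "chr1 0 300000 + 1",
    "chr1 450000 900000 - 2",
]


def to_one_based(line):
    """Convert one block_list.txt line to 1-based, inclusive coordinates."""
    chrom, start, end, orientation, sf_id = line.split()
    # Only the start is shifted; the end position is already 1-based.
    return chrom, int(start) + 1, int(end), orientation, sf_id


for line in LINES:
    print(to_one_based(line))
```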

3.2. Conserved.Segments (in the "SFs" subdirectory)

This file contains the coordinates of genomic regions of each species belonging to each SF.

Lines starting with ">": ><identifier of an SF>
Other lines: the coordinates of genomic regions of each species belonging to the SF. The
format is as follows.
<species>.<chromosome or scaffold>:<start position>-<end position> <orientation>

Here, the start position is 0-based, and the end position is 1-based. You need to
add 1 to the start position to obtain the actual start position.
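
The record layout above can be read with a small parser. The following is a minimal Python
sketch under stated assumptions: the sample content is made up, species names are assumed
to contain no ".", orientation symbols are assumed to be "+" and "-", and parse_segments is
a hypothetical helper, not part of DESCHRAMBLER.

```python
import re

# Hypothetical Conserved.Segments content; all names and coordinates are placeholders.
SEGMENTS_TEXT = """\
>1
speciesA.chr1:0-300000 +
speciesB.scaffold_12:1500-298000 -
>2
speciesA.chr1:450000-900000 +
"""

# <species>.<chromosome or scaffold>:<start>-<end> <orientation>
REGION = re.compile(r"^([^.\s]+)\.(\S+):(\d+)-(\d+)\s+(\S+)$")


def parse_segments(text):
    """Map each SF identifier to its regions, with starts converted to 1-based."""
    segments = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].strip()
            segments[current] = []
            continue
        match = REGION.match(line)
        if match is None or current is None:
            raise ValueError(f"unexpected line: {line!r}")
        species, seq, start, end, orientation = match.groups()
        segments[current].append((species, seq, int(start) + 1, int(end), orientation))
    return segments


for sf_id, regions in parse_segments(SEGMENTS_TEXT).items():
    print(sf_id, regions)
```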

3.3. Ancestor.APCF

This file contains the list of ancestral predicted chromosome fragments (APCFs), and the order
and orientation of SFs in each APCF.

Line 1: >ANCESTOR <total number of SFs>
Lines starting with "#": # APCF <identifier of APCF>
Other lines: the order and orientation of the SFs belonging to the APCF named in the preceding "#" line.
SFs are shown by using their identifiers (refer to the block_list.txt file).

3.4. Ancestor.ADJS

This file contains the predicted adjacencies of pairs of SFs and their scores used in the
Ancestor.APCF file. SFs are shown by using their identifiers (refer to the block_list.txt file).

Column 1: the first SF identifier
Column 2: the second SF identifier
Column 3: adjacency score of the first and the second identifiers

In the first and second columns, 0 marks the end of an APCF. For example, "0 64" means that SF 64
is placed at the end of the current APCF.
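
The SFs that sit at APCF ends can be listed by looking for a 0 in either of the first two
columns. The following is a minimal Python sketch; the sample lines, the scores and the
whitespace delimiter are assumptions, identifiers are kept as plain text because their exact
form is not described here, and apcf_end_sfs is a hypothetical helper.

```python
# Hypothetical Ancestor.ADJS lines: <first SF> <second SF> <score>.
ADJS_LINES = [
    "0 64 0.91",
    "64 65 0.87",
    "65 0 0.90",
]


def apcf_end_sfs(lines):
    """Return the SF identifiers that are paired with 0, i.e. placed at an APCF end."""
    ends = []
    for line in lines:
        first, second, _score = line.split()
        if first == "0" and second != "0":
            ends.append(second)
        elif second == "0" and first != "0":
            ends.append(first)
    return ends


print(apcf_end_sfs(ADJS_LINES))
```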

3.5. APCF_size.txt

This file contains the total length of APCFs.

Column 1: APCF identifier
Column 2: total length

The last line shows the sum of total APCF lengths.

3.6. APCFs

This file contains the coordinates of the genomic regions of each species that map to each APCF.

Lines starting with "#": #<identifier of APCF>
The coordinates of genomic regions of each species (grouped by species) follow this line.

3.7. APCF_<SPC>.merged.map

This file contains the mapping between genomic regions of the species SPC and the APCFs. Each mapping
is shown by using the following three lines.

><identifier of mapping>
APCF.<identifier of APCF>:<start position in APCF>-<end position in APCF> <orientation>
<SPC>.<chromosome or scaffold>:<start position in SPC>-<end position in SPC> <orientation>

The start position is 0-based, and the end position is 1-based. You need to add 1 to the start position to
obtain the actual start position.
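
With a 0-based start and a 1-based end, the length of a region is simply end minus start.
The following is a minimal Python sketch that reads three-line records and reports both
lengths; the sample record is made up, records are assumed to follow each other with at most
blank lines in between, and parse_region and parse_map are hypothetical helpers.

```python
import re

# Hypothetical APCF_<SPC>.merged.map content for a species called "speciesB".
MAP_TEXT = """\
>1
APCF.1:0-300000 +
speciesB.scaffold_12:1500-301500 -
"""

# <name>.<sequence>:<start>-<end> <orientation>, where <name> is "APCF" or the species.
REGION = re.compile(r"^([^.\s]+)\.(\S+):(\d+)-(\d+)\s+(\S+)$")


def parse_region(line):
    """Split one coordinate line into its parts, keeping the original coordinates."""
    match = REGION.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected coordinate line: {line!r}")
    name, seq, start, end, orientation = match.groups()
    return {"name": name, "seq": seq, "start": int(start),
            "end": int(end), "orientation": orientation}


def parse_map(text):
    """Return (mapping id, APCF region, species region) for each three-line record."""
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected records of exactly three lines")
    records = []
    for i in range(0, len(lines), 3):
        header, apcf_line, spc_line = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"expected a '>' header, got: {header!r}")
        records.append((header[1:], parse_region(apcf_line), parse_region(spc_line)))
    return records


for mapping_id, apcf, spc in parse_map(MAP_TEXT):
    # Length of a 0-based start / 1-based end interval is end - start.
    print(mapping_id, apcf["end"] - apcf["start"], spc["end"] - spc["start"])
```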


4. Supplementary data
---------------------

The chain/net files and reconstruction results are available at our supplementary website
(http://bioinfo.konkuk.ac.kr/DESCHRAMBLER).



30 changes: 30 additions & 0 deletions code/Makefile
@@ -0,0 +1,30 @@
CC = gcc
GCC = g++
CDEBUG = -g#-ggdb -g -pg
OPTM =-O3
WARN = -W
KLIB = ../lib/kent/src/lib/
KINC = ../lib/kent/src/inc
CFLAGS = $(WARN) $(OPTM) -I. -I$(KINC)
CLIB = $(KLIB)/jkweb.a -lm

RM = rm -rf

ALLSRC = inferAdjProb deschrambler

all: $(ALLSRC)

%: %.c
	$(CC) $(CDEBUG) $(CFLAGS) $+ $(CLIB) -o $@

%: %.cpp
	$(GCC) $+ -o $@

.PHONY: tags
tags:
	ctags *.[hc] lib/*.[hc] inc/*.h

.PHONY: clean
clean:
	$(RM) $(ALLSRC) *.o *.dSYM
