Skip to content

Commit

Permalink
VSEARCH 1.0.10: Fix bug with ignored sizein, update copyright years, …
Browse files Browse the repository at this point in the history
…README.md
  • Loading branch information
torognes committed Jan 23, 2015
1 parent f1f8c02 commit 727f65c
Show file tree
Hide file tree
Showing 24 changed files with 56 additions and 107 deletions.
22 changes: 17 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -30,22 +30,22 @@ If you can't find an answer in the VSEARCH documentation, please visit the [VSEA

In the example below, VSEARCH will identify sequences in the file database.fsa that are at least 90% identical on the plus strand to the query sequences in the file queries.fsa and write the results to the file alnout.txt.

`./vsearch-1.0.9-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`
`./vsearch-1.0.10-linux-x86_64 --usearch_global queries.fsa --db database.fsa --id 0.9 --alnout alnout.txt`

## Download and install

The latest releases of VSEARCH are available [here](https://github.com/torognes/vsearch/releases).

Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.9-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.9-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).
Binary executables of VSEARCH are available in the `bin` folder for [GNU/Linux on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.10-linux-x86_64) and [Apple Mac OS X on x86-64 systems](https://github.com/torognes/vsearch/blob/master/bin/vsearch-1.0.10-osx-x86_64). These executables include support for input files compressed by zlib and bzip2 (with files usually ending in .gz or .bz2).

Download the appropriate executable and make a symbolic link in a folder included in your `$PATH` from `vsearch` to the appropriate binary. You may use the following commands (assuming `~/bin` is in your `$PATH`):

```sh
cd ~
mkdir -p bin
cd bin
wget https://github.com/torognes/vsearch/releases/download/v1.0.9/vsearch-1.0.9-linux-x86_64
ln -s vsearch-1.0.9-linux-x86_64 vsearch
wget https://github.com/torognes/vsearch/releases/download/v1.0.10/vsearch-1.0.10-linux-x86_64
ln -s vsearch-1.0.10-linux-x86_64 vsearch
```

Substitute `linux` with `osx` in those lines if you're on a Mac.
Expand All @@ -68,7 +68,7 @@ We have chosen to select all unique kmers from the query. At least 6 of these km

It appears that there are differences in usearch between the searches performed by the `--usearch_global` command and the clustering commands. Notably, it appears like `--usearch_global` simply ignores the options `--wordlength`, `--slots` and `--pattern`, while the clustering commands takes them into account. VSEARCH supports the `--wordlength` option for kmer lengths from 3 to 15, but the options `--slots` and `--pattern` are ignored.

**Alignment:** VSEARCH uses a 8-way 16-bit SIMD vectorized implementation of the full dynamic programming algorithm (Needleman-Wunsch) for global sequence alignment. It is an adaptation of the method described by Rognes (2011). USEARCH by default uses a heuristic procedure involving seeding, extension and banded dynamic programming. If the `--fulldp` option is specified to USEARCH it will also use a full dynamic programming approach, but USEARCH is then considerably slower.
**Alignment:** VSEARCH uses a 8-way 16-bit SIMD vectorized implementation of the full dynamic programming algorithm (Needleman-Wunsch) for global sequence alignment. It is an adaptation of the method described by Rognes (2011). Due to the extreme memory requirements of this method when aligning two long sequences (e.g. more than 5000bp long), an alternative algorithm described by Hirschberg (1975) and Myers and Miller (1988) is used when aligning a pair of long sequences. This alternative algorithm uses only a linear amount of memory but is much slower. USEARCH by default uses a heuristic procedure involving seeding, extension and banded dynamic programming. If the `--fulldp` option is specified to USEARCH it will also use a full dynamic programming approach, but USEARCH is then considerably slower.

**Search Accuracy:** The accuracy of VSEARCH searches has been assessed and compared to USEARCH version 7.0.1090. The Rfam 11.0 database was used for the assessment, as described on the [USEARCH website](http://drive5.com/usearch/benchmark_rfam.html). A similar procedure was described in the USEARCH paper using the Rfam 9.1 database.

Expand Down Expand Up @@ -365,25 +365,37 @@ the [Protist Ribosomal Database](http://ssu-rrna.org/) (Guillou et al. 2012).
**Search and clustering orders of magnitude faster than BLAST.**
*Bioinformatics*, 26 (19): 2460-2461.
doi:[10.1093/bioinformatics/btq461](http://dx.doi.org/10.1093/bioinformatics/btq461)

* Edgar RC, Haas BJ, Clemente JC, Quince C, Knight R (2011)
**UCHIME improves sensitivity and speed of chimera detection.**
*Bioinformatics*, 27 (16): 2194-2200.
doi:[10.1093/bioinformatics/btr381](http://dx.doi.org/10.1093/bioinformatics/btr381)

* Farrar M (2007)
**Striped Smith-Waterman speeds database searches six times over other SIMD implementations.**
*Bioinformatics* (2007) 23 (2): 156-161.
doi:[10.1093/bioinformatics/btl582](http://dx.doi.org/10.1093/bioinformatics/btl582)

* Guillou L., Bachar D., Audic S., Bass D., Berney C., Bittner L., Boutte C., Burgaud G., de Vargas C., Decelle J., del Campo J., Dolan J., Dunthorn M., Edvardsen B., Holzmann M., Kooistra W., Lara E., Lebescot N., Logares R., Mahé F., Massana R., Montresor M., Morard R., Not F., Pawlowski J., Probert I., Sauvadet A.-L., Siano R., Stoeck T., Vaulot D., Zimmermann P. & Christen R. (2013)
**The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote Small Sub-Unit rRNA sequences with curated taxonomy.**
*Nucleic Acids Research*, 41 (D1), D597-D604.
doi:[10.1093/nar/gks1160](http://dx.doi.org/10.1093/nar/gks1160)

* Hirschberg D.S (1975) A linear space algorithm for computing maximal common subsequences. *Comm ACM*, 18(6), 341-343. doi:[10.1145/360825.360861](http://dx.doi.org/10.1145/360825.360861)

* Karsenti E., González Acinas S., Bork P., Bowler C., de Vargas C., Raes J., Sullivan M. B., Arendt D., Benzoni F., Claverie J.-M., Follows M., Jaillon O., Gorsky G., Hingamp P., Iudicone D., Kandels-Lewis S., Krzic U., Not F., Ogata H., Pesant S., Reynaud E. G., Sardet C., Sieracki M. E., Speich S., Velayoudon D., Weissenbach J., Wincker P. & the Tara Oceans Consortium (2011)
**A holistic approach to marine eco-systems biology.**
*PLoS Biology*, 9(10), e1001177.
doi:[10.1371/journal.pbio.1001177](http://dx.doi.org/10.1371/journal.pbio.1001177)

* Logares R., Audic S., Bass D., Bittner L., Boutte C., Christen R., Claverie J.-M., Decelle J., Dolan J. R., Dunthorn M., Edvardsen B., Gobet A., Kooistra W. H. C. F., Mahé F., Not F., Ogata H., Pawlowski J., Pernice M. C., Romac S., Shalchian-Tabrizi K., Simon N., Stoeck T., Santini S., Siano R., Wincker P., Zingone A., Richards T., de Vargas C. & Massana R. (2014) The patterning of rare and abundant community assemblages in coastal marine-planktonic microbial eukaryotes.
*Current Biology*, 24(8), 813-821.
doi:[10.1016/j.cub.2014.02.050](http://dx.doi.org/10.1016/j.cub.2014.02.050)

* Myers E.W., & Miller W. (1988) Optimal alignments in linear space.
*Comput Appl Biosci*, 4(1), 11-17.
doi:[10.1093/bioinformatics/4.1.11](http://dx.doi.org/10.1093/bioinformatics/4.1.11)

* Rognes T (2011)
**Faster Smith-Waterman database searches by inter-sequence SIMD parallelisation.**
*BMC Bioinformatics*, 12: 221.
Expand Down
2 changes: 1 addition & 1 deletion src/Makefile
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2014 Torbjorn Rognes
# Copyright (C) 2014-2015 Torbjorn Rognes
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/Makefile.BZLIB
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2014 Torbjorn Rognes & Tomas Fluori
# Copyright (C) 2014-2015 Torbjorn Rognes & Tomas Fluori
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/Makefile.ZLIB
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2014 Torbjorn Rognes & Tomas Flouri
# Copyright (C) 2014-2015 Torbjorn Rognes & Tomas Flouri
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/Makefile.static
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Copyright (C) 2014 Torbjorn Rognes & Tomas Flouri
# Copyright (C) 2014-2015 Torbjorn Rognes & Tomas Flouri
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/align.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/align.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/align_simd.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2012-2014 Torbjorn Rognes & Frederic Mahe
Copyright (C) 2012-2015 Torbjorn Rognes & Frederic Mahe
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/allpairs.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/chimera.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
6 changes: 3 additions & 3 deletions src/cluster.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down Expand Up @@ -996,7 +996,7 @@ void cluster(char * dbname,
db_getsequencelen(seqno),
opt_fasta_width);
}
abundance += db_getabundance(seqno);
abundance += opt_sizein ? db_getabundance(seqno) : 1;
size++;
}
else
Expand Down Expand Up @@ -1044,7 +1044,7 @@ void cluster(char * dbname,
}

centroid = clusterinfo[i].seqno;
abundance = db_getabundance(seqno);
abundance = opt_sizein ? db_getabundance(seqno) : 1;
size = 1;
lastcluster = clusterno;
}
Expand Down
2 changes: 1 addition & 1 deletion src/cluster.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/db.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes & Tomas Flouri
Copyright (C) 2014-2015 Torbjorn Rognes & Tomas Flouri
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/derep.cc
Original file line number Diff line number Diff line change
Expand Up @@ -248,7 +248,7 @@ void derep_fulllength()
}
}

long ab = db_getabundance(i);
long ab = opt_sizein ? db_getabundance(i) : 1;
sumsize += ab;

if (bp->size)
Expand Down
79 changes: 8 additions & 71 deletions src/linmemalign.cc
Original file line number Diff line number Diff line change
Expand Up @@ -28,9 +28,11 @@
These functions are based on the following articles:
- Hirschberg (1975) Comm ACM 18:341-343
- Huang, Hardison & Miller (1990) CABIOS 6:373-381
- Myers & Miller (1988) CABIOS 4:11-17
The method has been adapted for the use of different
gap penalties for query/target/left/interior/right gaps.
scorematrix consists of 16x16 long integers
Sequences and alignment matrix:
Expand All @@ -42,8 +44,6 @@
*/

//#define DEBUG

LinearMemoryAligner::LinearMemoryAligner()
{
scorematrix = 0;
Expand Down Expand Up @@ -183,13 +183,6 @@ void LinearMemoryAligner::diff(long a_start,
bool b_left, /* includes left end of b */
bool b_right) /* includes right end of b */
{
#ifdef DEBUG
printf("diff(a_start=%ld, b_start=%ld, a_len=%ld, b_len=%ld, gap_b_left=%d, gap_b_right=%d, a_left=%d, a_right=%d, b_left=%d, b_right=%d)\n",
a_start, b_start, a_len, b_len,
gap_b_left, gap_b_right,
a_left, a_right, b_left, b_right);
#endif

long MaxScore = 0;

if (b_len == 0)
Expand Down Expand Up @@ -299,10 +292,6 @@ void LinearMemoryAligner::diff(long a_start,
}
}

#ifdef DEBUG
printf("MaxScore=%ld best=%ld\n", MaxScore, best);
#endif

if (best == -1)
{
cigar_add('D', 1);
Expand Down Expand Up @@ -347,10 +336,6 @@ void LinearMemoryAligner::diff(long a_start,

/* compute matrix */

#ifdef DEBUG
printf("Forward:\n");
#endif

for (i = 1; i <= I; i++)
{
long p = HH[0];
Expand All @@ -362,10 +347,6 @@ void LinearMemoryAligner::diff(long a_start,
HH[0] = h;
long f = LONG_MIN;

#ifdef DEBUG
printf("%4ld", h);
#endif

for (j = 1; j <= b_len; j++)
{
f = MAX(f, h - go_q_i) - ge_q_i;
Expand All @@ -382,17 +363,9 @@ void LinearMemoryAligner::diff(long a_start,
h = EE[j];
p = HH[j];
HH[j] = h;

#ifdef DEBUG
printf(" %4ld", h);
#endif

}
#ifdef DEBUG
printf("\n");
#endif

}

EE[0] = HH[0];

// Compute XX & YY in reverse phase
Expand All @@ -411,10 +384,7 @@ void LinearMemoryAligner::diff(long a_start,

/* compute matrix */

#ifdef DEBUG
printf("Reverse:\n");
#endif
for (i = 1; i <= a_len - I; i++)
for (i = 1; i <= a_len - I; i++)
{
long p = XX[0];

Expand All @@ -424,10 +394,6 @@ for (i = 1; i <= a_len - I; i++)
XX[0] = h;
long f = LONG_MIN;

#ifdef DEBUG
printf("%4ld", h);
#endif

for (j = 1; j <= b_len; j++)
{
f = MAX(f, h - go_q_i) - ge_q_i;
Expand All @@ -444,15 +410,9 @@ for (i = 1; i <= a_len - I; i++)
h = YY[j];
p = XX[j];
XX[j] = h;

#ifdef DEBUG
printf(" %4ld", h);
#endif
}
#ifdef DEBUG
printf("\n");
#endif
}

YY[0] = XX[0];


Expand All @@ -463,38 +423,24 @@ for (i = 1; i <= a_len - I; i++)

/* solutions with diagonal at break */

#ifdef DEBUG
printf("Forward scores:");
#endif

for (j=0; j <= b_len; j++)
{
long Score = HH[j] + XX[b_len - j];
#ifdef DEBUG
printf(" %ld+%ld=%ld", HH[j], XX[b_len-j], Score);
#endif

if (Score > MaxScore0)
{
MaxScore0 = Score;
best0 = j;
}
}
#ifdef DEBUG
printf("\n");
#endif

long MaxScore1 = LONG_MIN;
long best1 = -1;

/* solutions that end with a gap in b from both ends at break */

#ifdef DEBUG
printf("Reverse scores:");
#endif

for (j=0; j <= b_len; j++)
{

long g;
if (b_left && (j==0))
g = go_t_l;
Expand All @@ -504,18 +450,13 @@ for (i = 1; i <= a_len - I; i++)
g = go_t_i;

long Score = EE[j] + YY[b_len - j] + g;
#ifdef DEBUG
printf(" %ld+%ld=%ld", EE[j], YY[b_len-j], Score);
#endif

if (Score > MaxScore1)
{
MaxScore1 = Score;
best1 = j;
}
}
#ifdef DEBUG
printf("\n");
#endif

long P;
long best;
Expand Down Expand Up @@ -545,10 +486,6 @@ for (i = 1; i <= a_len - I; i++)
}
}

#ifdef DEBUG
printf("Selection: P=%ld best=%ld, MaxScore=%ld\n", P, best, MaxScore);
#endif

/* recursively compute upper left and lower right parts */

if (P == 0)
Expand Down
2 changes: 1 addition & 1 deletion src/results.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
2 changes: 1 addition & 1 deletion src/searchcore.cc
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/*
Copyright (C) 2014 Torbjorn Rognes
Copyright (C) 2014-2015 Torbjorn Rognes
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as
Expand Down
Loading

0 comments on commit 727f65c

Please sign in to comment.