Fast sketching for DNA or general sequence data (MinHash, WeightedMinHash, OrderMinHash, and HyperLogLog)

RabbitBio/RabbitSketch

RabbitSketch

Getting Started

A Linux system on a recent x86_64 CPU is required.

Installing (C++ interface)

cd RabbitSketch
mkdir build
cd build
cmake -DCXXAPI=ON .. -DCMAKE_INSTALL_PREFIX=.
make
make install
export LD_LIBRARY_PATH=`pwd`/lib:$LD_LIBRARY_PATH

Testing (C++)

cd ../examples/
#default install dir: ../build/
make 
./exe_main genome.fna

This prints the Jaccard index and the distance.
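
To illustrate what a Jaccard index and a distance mean for sketches, here is a minimal pure-Python sketch of bottom-k MinHash. It is not RabbitSketch's implementation or API: the function names, the BLAKE2b hash, the default k-mer and sketch sizes, and the use of the Mash distance formula are all assumptions made for illustration.

```python
import hashlib
import math


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def hash64(kmer):
    """Deterministic 64-bit hash of a k-mer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def bottom_k_sketch(seq, k=21, size=1000):
    """Return the `size` smallest distinct k-mer hashes, in ascending order."""
    return sorted({hash64(km) for km in kmers(seq, k)})[:size]


def jaccard_estimate(a, b, size=1000):
    """Estimate the Jaccard index from two bottom-k sketches.

    Take the `size` smallest hashes of the merged sketches and count
    how many of them occur in both inputs.
    """
    set_a, set_b = set(a), set(b)
    merged = sorted(set_a | set_b)[:size]
    if not merged:
        return 0.0
    shared = sum(1 for h in merged if h in set_a and h in set_b)
    return shared / len(merged)


def mash_distance(jaccard, k=21):
    """Convert a Jaccard index to a Mash-style distance, clamped to [0, 1]."""
    if jaccard <= 0.0:
        return 1.0
    d = -math.log(2.0 * jaccard / (1.0 + jaccard)) / k
    return min(1.0, max(0.0, d))
```

When a sequence has fewer distinct k-mers than `size`, the sketch holds all of them and the estimate equals the exact Jaccard index of the two k-mer sets.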

or:

./exe_SKETCH_ALGORITHM FILE_PATH threshold(0.05) thread_num 

This reports the distances among large-scale genome sequences.
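
As a rough picture of an all-pairs comparison with a distance threshold, the following sketch compares every pair of sequences and keeps the pairs at or below the threshold. It assumes the threshold is an upper bound on the reported distance, which the text above does not confirm, and it uses exact k-mer sets with the Jaccard distance rather than any RabbitSketch algorithm. All names here are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of distinct k-mers in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard_distance(a, b):
    """Return 1 - |a & b| / |a | b|, or 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def close_pairs(genomes, k=21, threshold=0.05):
    """Return (name_x, name_y, distance) for each pair with distance <= threshold.

    `genomes` maps a name to its sequence string. Treating the threshold as
    an upper bound is an assumption made for this illustration.
    """
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    hits = []
    for x, y in combinations(sorted(sets), 2):
        d = jaccard_distance(sets[x], sets[y])
        if d <= threshold:
            hits.append((x, y, d))
    return hits
```

The number of pairs grows quadratically with the number of sequences, which is why compact sketches and multiple threads matter at large scale.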

Python bindings

pip install:

cd RabbitSketch
pip3 install . --user

or

# A PyPI package is available, but it is not up to date:
#pip3 install rabbitsketch --user

cmake install:

cd RabbitSketch
mkdir build
cd build
cmake .. # pybind support is enabled by default
make

Test using bpython or python:

cd examples
python pysketch.py # requires fastx

This computes the Jaccard index among large-scale genome sequences with the Python API. To change the algorithm, simply modify sketch.SKETCH_ALGORITHM.

Case study: multi-threaded sketch building with the Python API

python multi_minhash.py # requires pymp
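
The idea of building one sketch per sequence across several workers can be outlined with the Python standard library alone. This is not the contents of multi_minhash.py and it does not use pymp or the RabbitSketch API: the function names and the BLAKE2b-based bottom-k sketch are assumptions for illustration. Because the hashing here is pure Python, the threads mostly show how the work is partitioned rather than a real speedup.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def build_sketch(seq, k=21, size=1000):
    """Return the `size` smallest distinct 64-bit k-mer hashes of seq."""
    hashes = set()
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return sorted(hashes)[:size]


def build_sketches_parallel(seqs, threads=4, k=21, size=1000):
    """Build one sketch per sequence using a pool of worker threads.

    Executor.map returns results in input order, so sketch i
    always belongs to seqs[i].
    """
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda s: build_sketch(s, k, size), seqs))
```

Each sequence is sketched independently, so no locking is needed; only the final list of sketches is shared.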
