Some Python wrappers for a little bit of Sean Eddy's excellent Easel library for sequence manipulation.
At present, it's just a Python API to the Simple Sequence Index (SSI) format for rapid sequence retrieval from large files.
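An SSI index records where each sequence starts in the file, keyed by sequence name. A lookup can then seek directly to that record instead of scanning the whole file.

The following is a minimal pure-Python sketch of that idea. It assumes an uncompressed FASTA file whose header lines begin with ">". The helpers build_offset_index and fetch_record are hypothetical and not part of peasel. The offsets are kept in an in-memory dict rather than in the SSI on-disk format.

```python
def build_offset_index(fasta_path):
    """Map each record name to the byte offset of its '>' header line.

    Hypothetical toy helper: it does not read or write the real SSI format.
    """
    offsets = {}
    offset = 0
    with open(fasta_path, 'rb') as handle:  # binary mode, so len() counts bytes
        for line in handle:
            if line.startswith(b'>'):
                fields = line[1:].split()
                if fields:  # name is the first word after '>'
                    offsets[fields[0].decode('ascii')] = offset
            offset += len(line)
    return offsets


def fetch_record(fasta_path, offsets, name):
    """Seek straight to one record and return its sequence as a string.

    Hypothetical toy helper; raises KeyError if name was not indexed.
    """
    with open(fasta_path, 'rb') as handle:
        handle.seek(offsets[name])  # jump directly to the header line
        next(handle)  # skip the header itself
        chunks = []
        for line in handle:
            if line.startswith(b'>'):
                break  # reached the start of the next record
            chunks.append(line.strip())
    return b''.join(chunks).decode('ascii')
```

peasel's create_ssi works differently: it writes its index to a .ssi file next to the sequence file, so the index is built once and reopened later with open_ssi.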
peasel requires Python 2.7, either setuptools or distribute, and a working C compiler.
Development requires Cython, tested with version 0.17.
To install:
pip install peasel
Or, for the cutting-edge version:
pip install https://github.com/cmccoy/peasel/archive/master.tar.gz
To run the unit tests:
python setup.py test
Use peasel.create_ssi to build a sequence index:
>>> import peasel
>>> peasel.create_ssi('my_big_sequence_file.fasta') # creates my_big_sequence_file.fasta.ssi
2 # Number of sequences indexed
Sequence indexes support dict-like behavior:
>>> import peasel
>>> # Open the index
>>> index = peasel.open_ssi('my_big_sequence_file.fasta')
>>> index['sequence1']
<EaselSequence 0x7f38735b80f0 [name="sequence1";description="";length=5]>
>>> index.get('sequence1')
<EaselSequence 0x7f38735b8108 [name="sequence1";description="";length=5]>
>>> print index.get('missing_sequence')
None
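Because get returns None for names that are not in the index, fetching a batch of names while tolerating missing ones needs only that method.

Below is a minimal sketch. fetch_many is a hypothetical helper, not part of peasel. It relies only on the dict-like get shown above, so it can also be tried with a plain dict.

```python
def fetch_many(index, names):
    """Look up each name with index.get and return (found, missing).

    Hypothetical helper, not part of peasel. Works with any mapping that has
    a get() method returning None for absent keys.
    """
    found = {}
    missing = []
    for name in names:
        seq = index.get(name)  # None when the name is not indexed
        if seq is None:
            missing.append(name)
        else:
            found[name] = seq
    return found, missing


# Possible usage with a peasel index (file name taken from the examples above):
#   index = peasel.open_ssi('my_big_sequence_file.fasta')
#   found, missing = fetch_many(index, ['sequence1', 'missing_sequence'])
```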
If you'd prefer not to litter the filesystem with .ssi files, use the temp_ssi context manager:
>>> import peasel
>>> with peasel.temp_ssi('my_big_sequence_file.fasta') as index:
...     index['sequence1']
...
<EaselSequence 0x7ff15065a0f0 [name="sequence1";description="";length=5]>
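The index created by temp_ssi exists only for the duration of the with block.

The general pattern can be sketched with the standard library's contextlib and tempfile modules:

1. Build the artifact in a scratch directory.
2. Yield it to the caller.
3. Delete the directory in a finally clause, so cleanup also happens when the body raises.

This only illustrates the pattern and is not peasel's implementation. The names temp_artifact and build are hypothetical.

```python
import contextlib
import shutil
import tempfile


@contextlib.contextmanager
def temp_artifact(build):
    """Yield build(scratch_dir)'s result, then delete scratch_dir.

    Hypothetical helper illustrating the pattern only; it is not how peasel
    implements temp_ssi.
    """
    scratch_dir = tempfile.mkdtemp()
    try:
        yield build(scratch_dir)
    finally:
        # Runs on normal exit and when the with-body raises an exception.
        shutil.rmtree(scratch_dir, ignore_errors=True)
```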
Distributed under the GPLv3. Easel source code is distributed under the Janelia Farm License, included in the easel-src subdirectory.