Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog and this project adheres to Semantic Versioning.

v0.10.15 - 2024-10-08

Added

  • query property to TopHits, referencing the original object used to create the TopHits (#76).

Changed

  • Require the query object to create a TopHits object.
  • Make TopHits generic over its query property.
  • Deprecate old query properties of TopHits (query_name, query_length, query_accession).

Removed

  • Detection of SSE flush from setup.py (#71).

v0.10.14 - 2024-07-16

Added

  • Detection of SSE flush modes to setup.py for possible performance gains on x86 platforms.

Changed

  • Migrate documentation to pydata-sphinx-theme.

Fixed

  • Documentation examples not using permanent resource links.

v0.10.13 - 2024-06-19

Changed

  • Allow AlphabetMismatch errors to carry an unknown actual alphabet.
  • Make HMMFile and HMMPressedFile raise AlphabetMismatch on files with mixed alphabets.

Fixed

  • Avoid calling fclose with null pointers in Sequence.write and MSA.write.

v0.10.12 - 2024-04-25

Fixed

  • HMM.__setstate__ not properly extracting the cutoff from pickle state for some HMMs (#67).

Changed

  • Update and remove some test files to reduce size of distributed package data.

v0.10.11 - 2024-03-27

Fixed

  • Compilation of Easel and HMMER code not using SSE4.1 extensions.

v0.10.10 - 2024-03-18 - YANKED

Fixed

  • Implement write function for fopencookie with off_t instead of off64_t for compatibility.
  • Fix handling of NULL buffers passed to read and write methods of fopencookie.

v0.10.9 - 2024-03-12 - YANKED

Fixed

  • Reallocation issue causing segmentation faults in nhmmer with more than 64 sequences (#62).

v0.10.8 - 2024-03-06 - YANKED

Added

  • Getter to access the strand of a Domain produced by a LongTargetsPipeline.

Changed

  • Display model and cutoff names in MissingCutoffs error message, if any.
  • Allow LongTargetsPipeline to be configured with window length and beta parameters.
  • Make nhmmer use the window length and beta from the options when creating a Builder.

Fixed

  • nhmmer not computing E-values for non-default window lengths (moshi4/pybarrnap#2).
  • SequenceFile and MSAFile crashing with a segmentation fault when given the path to a folder rather than a file.

v0.10.7 - 2024-03-04 - YANKED

Added

  • Pre-compiled wheels for PyPy 3.10.

Fixed

  • Invalid pointer cast in __getbuffer__ method of Matrix and Vector objects.
  • Remaining tests failing to run on missing importlib-resources.
  • pyhmmer.hmmer dispatchers possibly dead-locking on background thread errors (#60).

v0.10.6 - 2024-02-20 - YANKED

Added

  • armv7 and aarch64 to the PKGBUILD architectures.

Changed

  • SSIReader and SSIWriter constructors now accept path-like objects.
  • Skip tests depending on importlib.resources.files when it is not available on the host machine.

Fixed

  • Memory leak caused by alphabet allocation in Pipeline._scan_loop_file.

v0.10.5 - 2024-02-16 - YANKED

Added

  • Alignment properties to get the original lengths of the sequence and HMM being stored.
  • Hit.length property storing the length of the hit sequence (or HMM).
  • TopHits.query_length storing the length of the hit HMM (or query).
  • Alignment.posterior_probabilities property showing an encoded representation of posteriors (#59, by @arajkovic).
  • Trace.score method to compute a trace score from a given profile and sequence.
  • Alignment.__sizeof__ implementation leveraging p7_alidisplay_SizeOf.

Fixed

  • Cutoffs proxy objects not recording their owner to prevent deallocation.
  • Avoid GIL re-acquisition in GeneticCode.translate.
  • Query metadata not being recorded in Hits obtained from daemon.Client.
  • Empty MatrixU8 creation attempting zero-allocation.
  • VectorU8.zeros allocating 4x more memory than required.
  • Memory leak caused by string duplication in __getbuffer__ methods of Matrix and Vector types.

v0.10.4 - 2023-10-29 - YANKED

Added

  • residue_markups argument to TextSequence and DigitalSequence constructors.
  • __reduce__ implementation to TextSequence, DigitalSequence, TextSequenceBlock and DigitalSequenceBlock.

Changed

  • Handling of easel I/O methods to avoid implicit GIL acquisition for error checking.

Fixed

  • Syntax errors in type annotation files.

v0.10.3 - 2023-10-22 - YANKED

Added

  • Out-of-band pickle serialization of Bitfield objects.
  • Getters for float attributes and forward/backward parameters of OptimizedProfile.
  • InvalidHMM error raised by HMM.validate.

Changed

  • Mark HMM.zero method as noexcept.
  • Increase size of buffer for the query queue in the hmmer dispatcher.

Fixed

  • Unneeded semaphore in pyhmmer.hmmer message passing implementation.
  • Broken assertion in Bitfield._from_raw_bytes.
  • Relax tolerance of HMM validation in TraceAligner.align_traces.

v0.10.2 - 2023-08-20 - YANKED

Fixed

  • Invalid buffer write in DigitalSequenceBlock.translate (#50).

v0.10.1 - 2023-08-17 - YANKED

Added

  • HMM.set_consensus method to set the consensus for a model or compute it from the emission probabilities.

Fixed

  • Platform detection for MacOS and Armv7 platforms in setup.py.
  • pyhmmer.plan7.HMM constructor setting a consensus string forcefully.

v0.10.0 - 2023-08-16 - YANKED

Added

  • Support for compiling wheels for Aarch64 and NEON-enabled Arm platforms.

Changed

Fixed

  • Patch missing PyInterpreterState_GetID preventing the package from working on PyPy 3.9.

v0.9.0 - 2023-08-03

Added

  • TopHits.mode property showing from which pipeline mode (search or scan) the hits were obtained.

Changed

  • Updated the code for Cython v3.0.

Fixed

  • TopHits.merge not properly handling inclusion and reporting for domains (#46, #47, by @zdk123).

v0.8.2 - 2023-06-07

Added

  • Bracket-style repr implementation to HMM, Profile and OptimizedProfile showing model alphabet, length and name.
  • MissingCutoffs and InvalidParameter exceptions inheriting ValueError.

Changed

  • Replace pthread locks with PyThread API for synchronizing models in OptimizedProfileBlock.

Fixed

  • Sequence length extraction in LongTargetsPipeline.search_hmm (#42).
  • LongTargetsPipeline.search_msa not building a HMM with Builder.build_msa.

v0.8.1 - 2023-05-19

Added

  • HMM.validate method to ensure a HMM holds HMMER structural constraints.
  • plan7.Transitions enum with transition names for indexing HMM.transition_probabilities.

v0.8.0 - 2023-05-01

PyHMMER has been accepted for publication in Bioinformatics. The paper is available at doi:10.1093/bioinformatics/btad214.

Added

  • pyhmmer.hmmer.jackhmmer function to run several JackHMMER iterative searches in parallel using multithreading (#35, by @zdk123).
  • HMM.to_profile shortcut method to allocate and configure a new Profile object.

Fixed

  • Type annotations of Pipeline.iterate_seq and Pipeline.iterate_hmm.
  • Potential memory leak on exceptions raised by HMMPressedFile.read.
  • Offsets.profile not recording offsets properly, causing pyhmmer.hmmer.hmmpress to produce invalid pressed files (#37).

Changed

  • HMM.__init__ and HMM.sample now take the Alphabet as the first argument, for consistency with the rest of the API.
  • HMM now requires a name argument.

Removed

  • Deprecated ignore_gaps argument in SequenceFile.__init__.
  • Deprecated Sequence.taxonomy_id property.

v0.7.4 - 2023-04-14

Added

  • Recipes page to the documentation with code example for loading multiple HMM files (#24, by @zdk123).

Fixed

  • TraceAligner methods causing a segfault when passed an uninitialized HMM (#36).

Changed

  • HMM default constructor now always creates a valid HMM (with respect to probability arrays).
  • TraceAligner now validates the input HMM before calling the HMMER code.
  • Use stack allocation for all error buffers instead of creating empty bytearray objects where applicable.

v0.7.3 - 2023-03-24

Fixed

  • Wrong argument type in IterativeSearch.iterate_hmm method (#34, by @zdk123).

v0.7.2 - 2023-02-17

Added

  • easel.GeneticCode class wrapping an ESL_GENCODE struct for configuring translation.
  • DigitalSequence.translate method to translate a nucleotide sequence to a protein sequence. Metadata is copied from the source sequence to its translation (#31, by @valentynbez).
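
As a rough illustration of what translating a nucleotide sequence involves, here is a minimal pure-Python sketch using the standard genetic code (NCBI translation table 1). The `translate` function and the `CODON_TABLE` name are hypothetical and are not PyHMMER's API; the actual method works on digital sequences and is configured through a GeneticCode object.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons
# enumerated in TCAG order for each of the three positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon (hypothetical helper)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


print(translate("ATGGCCTAA"))  # MA*
```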

Deprecated

  • Sequence.taxonomy_id property, as it is not used by Easel and implementation is not consistent (see EddyRivasLab/easel#68).

v0.7.1 - 2022-12-15

Added

  • Missing __reduce__ method to TopHits.

Fixed

  • Build detection of available platform functions in setup.py.

v0.7.0 - 2022-12-04

Added

  • Bitfield.zeros and Bitfield.ones classmethods for constructing an empty bitfield of known size.
  • Bitfield.copy method to copy a bitfield object.
  • SequenceBlock and OptimizedProfileBlock classes to store Python objects next to a contiguous array of pointers for iterating with the GIL released.
  • SequenceFile.read_block method to read a whole sequence block from a file.
  • HMM.sample class method to generate a HMM at random given a Randomness source.
  • hmmscan function to scan a profile database with sequence queries.
  • deepcopy implementations to HMM, Profile and OptimizedProfile classes of plan7.
  • rewind method to HMMFile, HMMPressedFile and SequenceFile to reset a file back to its initial position.
  • name attribute to HMMFile, HMMPressedFile, MSAFile and SequenceFile to expose the path of a file (when it was created from path).
  • local property to Profile and OptimizedProfile, indicating whether a profile is in local or global mode.
  • multihit property to Profile and OptimizedProfile, indicating whether a profile is in unihit or multihit mode, with a setter taking care of the reconfiguration.
  • Domain.included and Domain.reported settable properties to report the inclusion and reporting status of a single domain.
  • TopHits.included and TopHits.reported sized iterators to iterate only on included and reported hits.
  • Domains.included and Domains.reported sized iterators to iterate only on included and reported domains.

Changed

  • Bitfield, Vector and Matrix can now be created from an iterable.
  • Pipeline search methods now expect a DigitalSequenceBlock or a SequenceFile for the target sequence database.
  • Pipeline scan methods now expect an OptimizedProfileBlock or a HMMPressedFile for the target profile database.
  • TraceAligner now expects a DigitalSequenceBlock for the sequences to align to the HMM.
  • Profile.configure now uses a default value of 400 for the L argument.
  • hmmsearch, nhmmer and phmmer support being given a single query instead of requiring an iterable.
  • HMMPressedFile can now be created, closed and used as a context manager directly without having to manage the source HMMFile.
  • Renamed Profile.optimized method to Profile.to_optimized.
  • Replaced Randomness.is_fast method with the Randomness.fast property.
  • Rewrite handling of Hit flags using settable properties (Hit.included, Hit.reported, Hit.new, Hit.dropped, Hit.duplicate) instead of methods.

Fixed

  • Memory leak in the LongTargetsPipeline search loop.
  • PyPy behaviour change of readinto methods now expecting unsigned char* instead of char* memoryview.
  • NULL-pointer dereference in Pipeline.search_hmm when given a query without name.
  • LongTargetsPipeline not recording the query name and accession.
  • Memory leak caused by using a non-default prior scheme when constructing a Builder.

Removed

  • PipelineSearchTargets, replaced in functionality with easel.DigitalSequenceBlock.
  • is_local and is_multihit methods of Profile and OptimizedProfile, replaced with equivalent properties.
  • Hit.manually_drop and Hit.manually_include methods, replaced with the different Hit properties.

v0.6.3 - 2022-09-09

Fixed

  • Error not being raised on alphabet detection failure in SequenceFile or MSAFile.
  • Add check in DigitalSequence constructor to make sure encoded characters are in valid range (#25).

Added

  • SequenceFile.guess_alphabet and MSAFile.guess_alphabet to guess the alphabet from an open file.
  • Alphabet.encode and Alphabet.decode to convert raw sequences between digital and text format.
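
The conversion between text and digital format mentioned above can be pictured with a toy alphabet. This is only a sketch under simplifying assumptions: the `encode` and `decode` functions below are hypothetical stand-ins rather than the Alphabet methods themselves, and the four-letter alphabet leaves out the gap, degenerate and missing-data symbols that Easel alphabets also define.

```python
# Toy 4-letter alphabet (an assumption made to keep the sketch short).
SYMBOLS = "ACGT"
INDEX = {symbol: code for code, symbol in enumerate(SYMBOLS)}


def encode(text: str) -> bytes:
    """Convert a text sequence to digital codes (hypothetical helper)."""
    try:
        codes = [INDEX[symbol] for symbol in text]
    except KeyError as err:
        raise ValueError(f"invalid symbol: {err.args[0]!r}") from None
    return bytes(codes)


def decode(digital: bytes) -> str:
    """Convert digital codes back to text (hypothetical helper)."""
    # Reject codes that fall outside the alphabet instead of reading garbage.
    if any(code >= len(SYMBOLS) for code in digital):
        raise ValueError("digital code out of range for this alphabet")
    return "".join(SYMBOLS[code] for code in digital)


digital = encode("GATTACA")
print(list(digital))    # [2, 0, 3, 3, 0, 1, 0]
print(decode(digital))  # GATTACA
```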

v0.6.2 - 2022-08-12

Changed

  • hmmsearch, phmmer and nhmmer functions will reduce the requested number of threads to the number of queries, if it can be detected using operator.length_hint.
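
A minimal sketch of how such a thread-count reduction could be written with operator.length_hint. The `effective_threads` helper is hypothetical and is not PyHMMER's actual code; it only shows how a length hint is used when available and ignored otherwise.

```python
import operator


def effective_threads(queries, requested: int) -> int:
    """Cap the thread count at the number of queries, when it is known.

    Hypothetical helper for illustration only.
    """
    # length_hint returns len(queries) when available, else the result of
    # __length_hint__, else the default (0 here, meaning "unknown").
    hint = operator.length_hint(queries, 0)
    return min(requested, hint) if hint > 0 else requested


print(effective_threads(["q1", "q2", "q3"], 8))          # 3
print(effective_threads((q for q in ["q1", "q2"]), 8))   # 8
```

A generator exposes no length hint, so the requested thread count is kept unchanged in the second call.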

Added

  • Documentation for loading all HMMs from an HMMFile object at once (#23).
  • List of projects depending on PyHMMER to the Examples page of the documentation.

v0.6.1 - 2022-06-28

Added

  • pickle protocol support for TopHits objects, using the HMMER network serialization.
  • TopHits.write method to write hits to a file in tabular format.
  • query_name and query_accession properties to TopHits objects to access the name and accession of the query that produced the hits.

Fixed

  • Extraction of filename from file-like objects in the HMMFile constructor.
  • Use os.cpu_count instead of multiprocessing.cpu_count where applicable to preserve OS scheduling.
  • Wrong return type in docstring of HMM.insert_emissions.
  • TopHits.searched_nodes returning the searched number of residues instead of the searched number of model nodes.
  • Unsound decoding of pickled MatrixF or VectorF when data comes from a source of different endianness.
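
To illustrate the endianness issue in the last item, here is a sketch of byte-order-aware float serialization using the standard struct module. The `pack_floats` and `unpack_floats` helpers are hypothetical and do not reflect how MatrixF or VectorF store their pickle state; the point is only that the reader must decode with the byte order recorded by the writer, not with its own native order.

```python
import struct
import sys


def _prefix(byteorder: str) -> str:
    """Map a byte order name to the matching struct format prefix."""
    return "<" if byteorder == "little" else ">"


def pack_floats(values, byteorder=sys.byteorder):
    """Serialize 32-bit floats, returning the byte order alongside the data."""
    raw = struct.pack(f"{_prefix(byteorder)}{len(values)}f", *values)
    return byteorder, raw


def unpack_floats(byteorder, raw):
    """Decode 32-bit floats using the byte order recorded by the writer."""
    count = len(raw) // 4
    return list(struct.unpack(f"{_prefix(byteorder)}{count}f", raw))


state = pack_floats([1.0, 2.5, -0.25], byteorder="big")
print(unpack_floats(*state))  # [1.0, 2.5, -0.25]
```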

Changed

  • Rewrite pyhmmer.hmmer threading code using collections.deque instead of queue.Queue to store the queries and results.
  • Reduce memory consumption of pyhmmer.hmmer by reducing the number of semaphores and event flags used concurrently.
  • Make pyhmmer.hmmer main threads block on query insertion rather than result retrieval to make sure worker threads are never idling.

v0.6.0 - 2022-05-01

Added

  • pyhmmer.daemon module with a client implementation to communicate with a hmmpgmd server.
  • Pipeline.arguments methods to get a list of CLI arguments from the parameters used to initialize the Pipeline.
  • Setters for name, accession and description properties of plan7.Hit.
  • Constructor for individual plan7.Trace objects outside a plan7.Traces list.
  • plan7.Trace.from_sequence constructor to create a faux trace from a single sequence.
  • manually_include and manually_drop methods to plan7.Hit for manually selecting the inclusion status of a Hit in a TopHits instance.
  • compare_ranking method to plan7.TopHits for comparing the order of the hits compared to a previous run on the same targets stored in an easel.KeyHash object.
  • Pipeline.iterate_seq and Pipeline.iterate_hmm to run iterative queries like JackHMMER.
  • repr implementations for easel.MSAFile, easel.SequenceFile and easel.HMMFile showing the path or file object they were created from.
  • repr implementation for easel.Randomness showing the seed and the RNG algorithm in use.
  • str implementation for plan7.Alignment using HMMER original code to display a domain alignment like in search/scan results.

Changed

  • plan7.Trace.posterior_probabilities property may now be None in case no memory is allocated for the posteriors in the P7_TRACE struct.
  • TopHits.to_msa can now add additional sequences passed as arguments to the alignment.
  • plan7.HMMPressedFile now raises an exception on attempts to create a new instance manually.
  • ignore_gaps argument of easel.SequenceFile is now deprecated.
  • repr implementations for easel types now use the fully qualified class name.

Fixed

  • easel.SequenceFile.readinto docstring not rendering properly in documentation.
  • Type annotations of hits_included and hits_reported of plan7.TopHits marking these properties as bool instead of int.
  • Setters of name, accession, description and author properties of easel.MSA crashing when given None values.
  • Exception value raised from Easel code not being properly extracted.
  • Plain strings being used in example for easel.TextSequence and easel.TextMSA constructors where byte strings are expected (#20).

v0.5.0 - 2022-03-14

Added

  • plan7.PipelineSearchTargets to reduce the overhead when searching the same sequences several times with different query profiles.
  • TopHits.copy method to duplicate a TopHits instance.
  • TopHits.merge method to merge hits obtained with the same query on different targets.
  • Buffer protocol implementation for pyhmmer.easel.Bitfield.

Changed

  • Renamed TopHits.included and TopHits.reported properties to TopHits.hits_included and TopHits.hits_reported.
  • MSAFile and SequenceFile are now directly in digital mode if they are instantiated with digital=True.
  • SequenceFile.parse can now return a sequence in digital mode.
  • Reorganized tests to make them runnable from a site install.

Fixed

  • Usage of memcpy in contexts where it may have had undefined behaviour.
  • VectorF.__eq__ crashing when comparing two empty objects.
  • SequenceFile and MSAFile not closing file handles when raising an error in __init__.

v0.4.11 - 2021-12-15

Added

  • plan7.HMMFile.read method to read a single plan7.HMM from a plan7.HMMFile (instead of using next).
  • closed property on easel.SequenceFile, easel.MSAFile and plan7.HMMFile to mark whether a file object is closed.
  • plan7.HMMFile.is_pressed method to check whether a HMM file has associated pressed data.
  • plan7.HMMFile.optimized_profiles method to read the plan7.OptimizedProfile entries in a plan7.HMMFile if associated pressed data is available.
  • Getters for the name, accession, description, consensus, consensus_structure, evalue_parameters and cutoffs properties of a plan7.OptimizedProfile.
  • plan7.OptimizedProfile.__eq__ implementation to compare two optimized profiles.
  • __sizeof__ implementations for plan7.OptimizedProfile and plan7.Profile to get the allocated size of a profile.

Fixed

  • Double-free caused by the Cython cycle breaking feature on several view types (easel.Randomness, easel.Vector, easel.Matrix, plan7.Cutoffs, plan7.EvalueParameters, plan7.Offsets, plan7.Trace).
  • plan7.Hit.description using the pointer to the accession string erroneously, causing occasional NULL dereference.
  • plan7.OptimizedProfile.copy performing a shallow copy instead of a deep copy as expected.

Changed

  • pyhmmer.hmmer type annotations now explicitly support plan7.Profile or plan7.OptimizedProfile inputs where applicable.

v0.4.10 - 2021-12-06

Added

  • entropy and relative_entropy methods to easel.VectorF to compute the Shannon entropy of a vector and the Kullback-Leibler divergence of two vectors.
  • mean_match_entropy, mean_match_information and mean_match_relative_entropy methods to plan7.HMM to get information statistics of an HMM model.
  • match_occupancy method to plan7.HMM to compute the occupancy for each match state as an easel.VectorF.
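
For reference, the two vector statistics named above can be sketched in pure Python. This assumes base-2 logarithms (results in bits) and plain lists of probabilities; the function names mirror the changelog entry but are standalone helpers, not the easel.VectorF methods.

```python
import math


def entropy(p):
    """Shannon entropy of a probability vector, in bits."""
    return -sum(x * math.log2(x) for x in p if x > 0.0)


def relative_entropy(p, q):
    """Kullback-Leibler divergence D(p || q), in bits.

    Assumes q is non-zero wherever p is non-zero.
    """
    return sum(x * math.log2(x / y) for x, y in zip(p, q) if x > 0.0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.5, 0.25, 0.125, 0.125]
print(entropy(uniform))                   # 2.0
print(entropy(skewed))                    # 1.75
print(relative_entropy(skewed, uniform))  # 0.25
```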

Fixed

  • plan7.Builder.build_msa using the gap-open and gap-extend probabilities instead of the MSA itself to compute the transition probabilities for the new HMM.

Changed

  • plan7.Builder.build will now only load the score system once and reuse it unless a different score system is requested between calls.

v0.4.9 - 2021-11-11

Added

  • plan7.ScoreData class to store the substitution scores and maximal extensions for a long target search.
  • plan7.LongTargetsPipeline to run searches on targets longer than 100,000 residues.
  • Alphabet methods to check whether an Alphabet object is a DNA, RNA, nucleotide or protein alphabet.
  • window_length and window_beta arguments to plan7.Builder to set the max length of nucleotide HMM created by builder objects.

Changed

  • pyhmmer.hmmer.nhmmer now uses a LongTargetsPipeline instead of a Pipeline to search the target sequences.
  • pyhmmer.hmmer.nhmmer now supports HMM queries in addition to DigitalSequence and DigitalMSA queries.
  • pyhmmer.hmmer.phmmer now always assumes protein queries.
  • Z and domZ attributes of plan7.TopHits objects are now read-only.

Fixed

  • nhmmer now uses DNA as the default alphabet instead of amino acid alphabet like it did before (#12).

v0.4.8 - 2021-10-27

Added

  • Constructor arguments and properties to plan7.Pipeline to support filtering top hits with bit score thresholds instead.
  • Support for creating a SequenceFile and an MSAFile using a Python file-like object instead of only supporting filenames.
  • Support for reading individual sequences from an MSA file with SequenceFile.
  • TextMSA.alignment to access the actual alignment as a tuple of strings.
  • Subtraction and division support for easel.Vector subclasses.

Changed

  • plan7.Cutoffs now support setting the bit score cutoffs, but requires both to be set or cleared at the same time.
  • easel.Vector will always allocate some memory when created manually to avoid having a special empty case in every vector method.
  • pyhmmer.easel.AllocationError now stores the size it failed to allocate, and the number of elements when allocating an array.

Fixed

  • TextSequence.digitize will now raise a ValueError when the sequence contains invalid characters for the alphabet (previously an UnexpectedError).

v0.4.7 - 2021-09-28

Added

  • TraceAligner, Trace and Traces classes to pyhmmer.plan7 to get tracebacks after aligning several sequences against an HMM.
  • pyhmmer.hmmalign function with the same features as the hmmalign binary from HMMER3.
  • Support for out-of-band pickling in easel.Vector and easel.Matrix.

Changed

  • Allow creating an empty Vector or Matrix by calling their constructor without arguments.

Fixed

  • Potential unreported exceptions in plan7.OptimizedProfile.write and several plan7.SSIWriter methods.

v0.4.6 - 2021-09-10

Added

  • pickle protocol for easel.Alphabet, easel.Bitfield, easel.KeyHash, easel.Vector, easel.Matrix and plan7.HMM.
  • taxonomy_id and residue_markups properties to easel.Sequence.
  • sum_score property to plan7.Hit.
  • plan7.EvalueParameters class to expose the e-value parameters of a plan7.HMM or a plan7.Profile.
  • Equality checks and slicing for easel.Matrix and easel.Vector.
  • Support for creating and manipulating zero-sized easel matrices and vectors.
  • plan7.Cutoffs class to expose the Pfam score cutoffs of a plan7.HMM or a plan7.Profile.
  • Keyword arguments to configure E-value thresholds when creating a plan7.Pipeline object.
  • Support for using model-specific thresholding options in plan7.Pipeline.

Changed

  • Use the replace error handler when decoding error messages to skip potential decoding issues when already building an exception.
  • Improve pyhmmer.hmmer to ensure background threads exit on a KeyboardInterrupt.
  • easel.VectorU8.__eq__ accepts any object implementing the buffer protocol.
  • plan7.HMM.creation_time now takes and returns a datetime.datetime object, assuming the field is only ever set with asctime.
  • Refactor easel.Vector and easel.Matrix and mark exposed memory as C-contiguous.
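
The creation_time change above relies on the asctime layout. As a sketch of that round trip with the standard library only (the helper names are hypothetical, and parsing the day and month names assumes the C or an English locale):

```python
import time
from datetime import datetime

# Layout produced by the C asctime function, e.g. "Sun Jun 20 23:21:05 1993".
ASCTIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_creation_time(text: str) -> datetime:
    """Parse an asctime-formatted string into a datetime (hypothetical)."""
    return datetime.strptime(text.strip(), ASCTIME_FORMAT)


def format_creation_time(moment: datetime) -> str:
    """Format a datetime back into the asctime layout (hypothetical)."""
    return time.asctime(moment.timetuple())


stamp = "Sun Jun 20 23:21:05 1993"
moment = parse_creation_time(stamp)
print(moment)                                 # 1993-06-20 23:21:05
print(format_creation_time(moment) == stamp)  # True
```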

Fixed

  • easel.Alphabet not reporting potential allocation errors.
  • Potential buffer overflow in easel.Matrix and easel.Vector when calling __init__ more than once.

v0.4.5 - 2021-07-19

Added

  • OptimizedProfile.convert method to configure an optimized profile from a Profile without reallocating a new P7_OPROFILE struct.

Changed

  • Rewrite the plan7.Pipeline search loop to avoid reacquiring the GIL between reference sequences.
  • Require the reference sequences to be stored in a collection (instead of an iterable) when passing them to the search_hmm, search_msa and search_seq methods of plan7.Pipeline.
  • Avoid reallocating a new OptimizedProfile every time a new HMM is passed to Pipeline.search_hmm.
  • Release the GIL while sorting and thresholding TopHits in Pipeline search methods.

v0.4.4 - 2021-07-07

Added

  • ignore_gaps parameter to pyhmmer.easel.SequenceFile, allowing gap characters to be skipped when reading a sequence from an ungapped format.
  • __sizeof__ implementation for some
  • Dedicated check for sequence length before running the platform-specific code in pyhmmer.plan7.Pipeline.

Fixed

  • Score system not being set in pyhmmer.plan7.Builder.build_msa.
  • Alphabet not being checked after the first sequence in Pipeline search and scan methods.

v0.4.3 - 2021-07-03

Fixed

  • File object wrappers not reporting exceptions raised when seeking on OSX/BSD platforms.

v0.4.2 - 2021-06-20

Added

  • pyhmmer.easel.Randomness class exposing a deterministic random number generator.
  • pyhmmer.plan7.Builder.randomness and pyhmmer.plan7.Pipeline.randomness attributes exposing the internal random number generator used by each object.
  • pyhmmer.plan7.Hit.best_domain property mapping to the highest scoring domain of a hit.
  • pyhmmer.plan7.OptimizedProfile.rbv property exposing match scores.
  • pyhmmer.plan7.Domain.pvalue and pyhmmer.plan7.Hit.pvalue reporting the p-value for a domain or hit bitscore.

Fixed

  • Dimensions of the pyhmmer.plan7.OptimizedProfile.sbv matrix not being properly set.

v0.4.1 - 2021-06-06

Fixed

  • Main buffer not being freed in MatrixF.__dealloc__ and MatrixU8.__dealloc__ when created without owner.

Added

  • Additional configuration values for pyhmmer.plan7.Pipeline as both constructor arguments and mutable properties.
  • consensus, consensus_structure and offsets properties to pyhmmer.plan7.Profile objects.

Changed

  • Make OptimizedProfile.ssv_filter check the alphabet of the given sequence.

v0.4.0 - 2021-06-05 - YANKED

Added

  • Linear algebra primitives to expose 1D (Vector) and 2D (Matrix) contiguous buffers containing numerical values to pyhmmer.easel.
  • Documentation for the Z and domZ parameters of the pyhmmer.plan7.Pipeline constructor.
  • pyhmmer.errors.AlphabetMismatch exception deriving from ValueError to specifically report mismatching Easel alphabets where applicable.
  • scale and normalize methods to pyhmmer.plan7.HMM objects.
  • Property to access pyhmmer.plan7.Background residue frequencies as a VectorF object.
  • Property to access pyhmmer.plan7.HMM mean residue composition as a VectorF object.
  • Property to access pyhmmer.plan7.HMM probabilities and emissions as MatrixF objects.
  • ssv_filter methods to pyhmmer.plan7.OptimizedProfile to get the SSV filter score of the profile for a given sequence.
  • Several additional properties to access the pyhmmer.plan7.OptimizedProfile internals.

Removed

  • Unused report_e parameter of pyhmmer.plan7.Pipeline constructor.
  • pyhmmer.plan7.TopHits.clear method which could lead to segfault if it was called while a Hit is being held.

Changed

  • Multithreaded loop in pyhmmer.hmmer to reduce memory consumption while still yielding hits in order.
  • pyhmmer.easel.DigitalSequence.sequence property is now a VectorU8.

Fixed

  • Type annotations in pyhmmer.hmmer.
  • Potential double free in pyhmmer.plan7.HMM.command_line property setter.
  • Minor floating-point precision issues in pyhmmer.plan7.Builder constructor.
  • Segfault in TextMSA.digitize caused by esl_msa_Copy not digitizing on-the-fly like esl_sq_Copy.
  • Exceptions not being raised in some methods of pyhmmer.plan7.Profile and pyhmmer.plan7.TopHits.

v0.3.1 - 2021-05-08

Added

  • Pipeline.scan_seq method to query a database of profiles with one or more sequences.
  • transition_probabilities, match_emissions, insert_emissions properties to the HMM class, providing access to the numerical parameters of the HMM.
  • consensus_structure and consensus_accessibility properties to the HMM class to get consensus lines from the source alignment if the HMM was created from a MSA.
  • nseq and nseq_effective properties to the HMM class to get the number of training sequences and effective sequences used to build the HMM.

Changed

  • HMM.checksum is now None if the p7H_CHKSUM flag is not set.
  • Builder methods will now record sys.argv when creating a HMM.

Fixed

  • HMM.write(..., binary=False) crashing on HMMs without a consensus line (#5). Fixed upstream in EddyRivasLab/HMMER#236.
  • Pipeline.reset mishandling the Z and domZ values if those were detected from the number of targets.
  • pyhmmer.hmmer functions will not block until all results have been collected anymore when run in multithreaded mode.

v0.3.0 - 2021-03-11

Added

  • easel.MSAFile to read from a file containing
  • accession, author, name and description properties to easel.MSA objects.
  • plan7.Builder.build_msa to build a pHMM from a sequence alignment.
  • Additional methods to easel.KeyHash, allowing it to be used as a dict/set hybrid.
  • Sequence.write and MSA.write methods to format a sequence or an alignment to a file handle.
  • plan7.TopHits.to_msa method to convert all the top hits of a query against a database into a multiple sequence alignment.
  • easel.MSA.sequences attribute to access individual sequences of an alignment using the collections.abc.Sequence interface.
  • easel.DigitalMSA.textize method to convert a multiple sequence alignment in digital mode to its text-mode counterpart.
  • Read-only name, accession and description properties to plan7.Profile showing attributes inherited from the HMM it was configured with.
  • plan7.HMM.consensus property to access the consensus sequence of a pHMM.
  • plan7.HMM equality implementation, using zero tolerance.
  • plan7.Pipeline.search_msa to query a MSA against a sequence database.
  • easel.Sequence.reverse_complement method to reverse-complement a sequence in place or to build a reverse-complemented copy.
  • errors.AlphabetMismatch exception for use in cases where an alphabet is expected but not matched by the input.
  • hmmer.nhmmer function with the same behaviour as hmmer.phmmer, except it expects inputs with a DNA alphabet.
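
For readers unfamiliar with the operation behind reverse_complement, here is a minimal text-mode sketch. It is a standalone helper rather than the easel.Sequence method, and it assumes unambiguous DNA symbols only.

```python
# Complement table for unambiguous DNA symbols only (an assumption made to
# keep the sketch short; degenerate symbols are not handled).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string as a new string."""
    return dna.translate(COMPLEMENT)[::-1]


print(reverse_complement("AAGT"))  # ACTT
```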

Fixed

  • plan7.Builder.copy not copying some parameters correctly, causing pyhmmer.hmmer.phmmer to give inconsistent results in multithreaded mode.
  • easel.Bitfield not properly handling index overflows.
  • Documentation not rendering for the __init__ method of all classes.

Changed

  • plan7.Builder gap-open and gap-extend probabilities are now set on instantiation and depend on the alphabet type.
  • Constructors for easel.TextMSA and easel.DigitalMSA, which can now be given an iterable of easel.Sequence objects to store in the alignment.

Removed

  • Unimplemented easel.SequenceFile.fetch and easel.SequenceFile.fetchinto methods.

v0.2.2 - 2021-03-04

Fixed

  • Linking issues on OSX caused by aggressive stripping of intermediate libraries.
  • plan7.Builder RNG not reseeding between different HMMs.

v0.2.1 - 2021-01-29

Added

  • pyhmmer.plan7.HMM.checksum property to get the 32-bit checksum of an HMM.

v0.2.0 - 2021-01-21

Added

  • pyhmmer.plan7.Builder class to handle building a HMM from a sequence.
  • Pipeline.search_seq to query a sequence against a sequence database.
  • psutil dependency to detect the most efficient thread count for hmmsearch based on the number of physical CPUs.
  • pyhmmer.hmmer.phmmer function to run a search of query sequences against a sequence database.

Changed

  • Pipeline.search was renamed to Pipeline.search_hmm for disambiguation.
  • libeasel.random sequences do not require the GIL anymore.
  • Public API now has proper signature annotations.

Fixed

  • Inaccurate exception messages in Pipeline.search_hmm.
  • Unneeded RNG reallocation, replaced with re-initialisation where possible.
  • SequenceFile.__next__ not working after being set in digital mode.
  • sequences argument of hmmsearch now only requires a typing.Collection[DigitalSequence] instead of a typing.Sequence[DigitalSequence] (no more __getitem__ needed).

Removed

  • hits argument to Pipeline.search_hmm to reduce risk of issues with TopHits reuse.
  • Broken alignment coordinates on Domain classes.

v0.1.4 - 2021-01-15

Added

  • DigitalSequence.textize to convert a digital sequence to a text sequence.
  • DigitalSequence.__init__ method allowing the creation of a digital sequence from any object implementing the buffer protocol.
  • Alignment.hmm_accession property to retrieve the accession of the HMM in an alignment.

v0.1.3 - 2021-01-08

Fixed

  • Compilation issues in OSX-specific Cython code.

v0.1.2 - 2021-01-07

Fixed

  • Required Cython files not being included in source distribution.

v0.1.1 - 2020-12-02

Fixed

  • HMMFile calling file.peek without arguments, causing it to crash when passed some types, e.g. gzip.GzipFile.
  • HMMFile failing to work with PyPy file objects because of a bug with their implementation of readinto.
  • C/Python file object implementation using strcpy instead of memcpy, causing issues when null bytes were read.

v0.1.0 - 2020-12-01

Initial beta release.

Fixed

  • TextSequence uses the sequence argument it's given on instantiation.
  • Segmentation fault in Sequence.__eq__ caused by implicit type conversion.
  • Segmentation fault on SequenceFile.read failure.
  • Missing type annotations for the pyhmmer.easel module.

v0.1.0-a5 - 2020-11-28

Added

  • Sequence.__len__ magic method so that len(seq) returns the number of letters in seq.
  • Python file-handle support when opening a pyhmmer.plan7.HMMFile.
  • Context manager protocol to pyhmmer.easel.SSIWriter.
  • Type annotations for pyhmmer.easel.SSIWriter.
  • add_alias to pyhmmer.easel.SSIWriter.
  • write method to pyhmmer.plan7.OptimizedProfile to write an optimized profile in binary format.
  • offsets property to interact with the disk offsets of a pyhmmer.plan7.OptimizedProfile instance.
  • pyhmmer.hmmer.hmmpress emulating the hmmpress binary from HMMER.
  • M property to pyhmmer.plan7.HMM exposing the number of nodes in the model.

Changed

  • Bumped vendored Easel to v0.48.
  • Bumped vendored HMMER to v3.3.2.
  • pyhmmer.plan7.HMMFile will raise an EOFError when given an empty file.
  • Renamed length property to L in pyhmmer.plan7.Background.

Fixed

  • Segmentation fault when close method of pyhmmer.easel.SSIWriter was called more than once.
  • close method of pyhmmer.easel.SSIWriter not writing the index contents.

v0.1.0-a4 - 2020-11-24

Added

  • MSA, TextMSA and DigitalMSA classes representing a multiple sequence alignment to pyhmmer.easel.
  • Methods and protocol to copy a Sequence and a MSA.
  • pyhmmer.plan7.OptimizedProfile wrapping a platform-specific optimized profile.
  • SSIReader and SSIWriter classes interacting with sequence/subsequence indices to pyhmmer.easel.
  • Exception handler using Python exceptions to report Easel errors.

Changed

  • pyhmmer.hmmsearch returns an iterator of TopHits, with one instance per HMM in the input.
  • pyhmmer.hmmsearch properly raises errors happening in the background threads without deadlock.
  • pyhmmer.plan7.Pipeline recycles memory between Pipeline.search calls.

Fixed

  • Missing type annotations for the pyhmmer.errors module.

Removed

  • Unneeded or private methods from pyhmmer.plan7.

v0.1.0-a3 - 2020-11-19

Added

  • TextSequence and DigitalSequence representing a Sequence in a given mode.
  • E-value properties to Hit and Domain.
  • TopHits now stores a reference to the pipeline it was obtained from.
  • Pipeline.Z and Pipeline.domZ properties.
  • Experimental pickling support to Alphabet.
  • Experimental freelist to Sequence class to avoid allocation bottlenecks when iterating on a SequenceFile without recycling sequence buffers.

Changed

  • Made Sequence an abstract base class.
  • Additional Pipeline parameters can be passed as keyword arguments to pyhmmer.hmmsearch.
  • SequenceFile.read can now be configured to skip reading the metadata or the content of a sequence.

Removed

  • Redundant SequenceFile methods.

Fixed

  • doctest loader crashing on Python 3.5.
  • TopHits.threshold segfaulting when called without a prior TopHits.sort call.
  • Unknown format argument to SequenceFile constructor not raising the right error.

v0.1.0-a2 - 2020-11-12

Added

  • Support for compilation on PowerPC big-endian platforms.
  • Type annotations and stub files for Cython modules.

Changed

  • distutils is now used to compile the package, instead of calling autotools and letting HMMER configure itself.
  • Bitfield.count now allows passing an argument (for compatibility with collections.abc.Sequence).

v0.1.0-a1 - 2020-11-10

Initial alpha release (test deployment to PyPI).