BasicBWT API

This page contains part of the pydoc output generated for the BasicBWT class.

CLASSES
__builtin__.object
    BasicBWT

class BasicBWT(__builtin__.object)
 |  This class is the root class for ANY msbwt created by this code, regardless of whether it is compressed or not.
 |  Shared Functions:
 |  __init__
 |  constructIndexing
 |  countOccurrencesOfSeq
 |  findIndicesOfStr
 |  getSequenceDollarID
 |  recoverString
 |  
 |  Override functions:
 |  loadMsbwt
 |  constructTotalCounts
 |  constructFMIndex
 |  getCharAtIndex
 |  getOccurrenceOfCharAtIndex
 |  getBWTRange
 |  getFullFMAtIndex
 |  iterInit
 |  iterNext
 |  iterNext_cython
 |  
 |  Methods defined here:
 |  
 |  __init__(...)
 |      Constructor
 |      Nothing special, use this for all at the start
 |  
 |  countOccurrencesOfSeq(...)
 |      This function counts the number of occurrences of the given sequence
 |      @param seq - the sequence to search for
 |      @param givenRange - the range to start from (if a partial search has already been run), default=whole range
 |      @return - an integer count of the number of times seq occurred in this BWT
 |  
 |  countPileup(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array. Automatically includes the reverse complement.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts
 |  
 |  countSeqMatches(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts
 |  
 |  countStrandedSeqMatches(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts, and the other choice also
 |  
 |  countStrandedSeqMatchesNoOther(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts
 |  
 |  findIndicesOfRegex(...)
 |      This function will search for a string and find the location of that string OR the last index less than it. It
 |      can also start its search within a given range instead of the whole structure. Note that a short tail string
 |      can lead to rapid exponential blowup of the solution space.
 |      @param seq - the sequence to search for with valid symbols [$, A, C, G, N, T, *, ?]
 |          $, A, C, G, N, T - exact match of specific symbol
 |          * - matches 0 or more of any non-$ symbols (may be different symbols)
 |          ? - matches exactly one of any non-$ symbol
 |      @param givenRange - the range to search for, whole range by default
 |      @return - a python list of ranges representing the start and end of the sequence in the bwt
 |  
 |  findIndicesOfStr(...)
 |      This function will search for a string and find the location of that string OR the last index less than it. It
 |      can also start its search within a given range instead of the whole structure.
 |      @param seq - the sequence to search for
 |      @param givenRange - the range to search for, whole range by default
 |      @return - a python range representing the start and end of the sequence in the bwt
 |  
 |  findKTOtherStranded(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @param isStranded - if True, it ONLY counts the forward strand (aka, exactly matches "seq")
 |                          if False, it counts forward strand and reverse-complement strand and adds them together
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts
 |  
 |  findKmerThreshold(...)
 |      This function takes an input sequence "seq", counts the number of occurrences of every k-mer of size
 |      "kmerSize" in that sequence, and returns the counts in an array.
 |      @param seq - the seq to scan
 |      @param kmerSize - the size of the k-mer to count
 |      @param isStranded - if True, it ONLY counts the forward strand (aka, exactly matches "seq")
 |                          if False, it counts forward strand and reverse-complement strand and adds them together
 |      @return - a numpy array of size (len(seq)-kmerSize+1) containing the counts
 |  
 |  findKmerWithError(...)
 |      This function takes a k-mer input and finds all k-mers with an edit distance of 1 that occur at least
 |      "minThresh" times in the dataset.  Indels at the beginning/end of the 'seq' are not considered.
 |      @param seq - the k-mer sequence we want to match
 |      @param minThresh - the minimum number of times any inexact matching k-mer must occur to be returned
 |      @return - a list of ranges AND the change made to the k-mer to get that range stored in tuples
 |          tuple: (start range, end range, k-mer)
 |          start range - the start index in the bwt
 |          end range - the end index in the bwt
 |          k-mer - the k-mer associated with this range
 |  
 |  findKmerWithErrors(...)
 |      This function takes a k-mer input and finds all k-mers within an edit distance of "editDistance" that occur
 |      at least "minThresh" times in the dataset.  Indels at the beginning/end of the 'seq' are not considered.
 |      @param seq - the k-mer sequence we want to match
 |      @param editDistance - the maximum edit distance to match
 |      @param minThresh - the minimum number of times any inexact matching k-mer must occur to be returned
 |      @return - a list of ranges AND the change made to the k-mer to get that range stored in tuples
 |          tuple: (start range, end range, k-mer)
 |          start range - the start index in the bwt
 |          end range - the end index in the bwt
 |          k-mer - the k-mer associated with this range
 |  
 |  findPatternWithError(...)
 |      This function will search the BWT for strings which match the given sequence allowing for one error.
 |      In this function, "seq" must be close to the length of the read or else the ends of the reads will be counted
 |      as long insertions leading to no matches in the data.
 |      @param seq - the sequence to search for with valid symbols [A, C, G, N, T]
 |      @param bonusStr - in the case of a deletion in the search, this is an extra character that must match at the front
 |                        of seq, aka it must match (bonusStr+seq) with one symbol deleted
 |      @return - a python list of ranges representing the start and end of the sequence in the bwt, these ranges will be
 |                in the '$' indices, so they will correspond to a specific read
 |                NOTE: these results may overlap, user expected to check for overlaps if important
 |  
 |  findReadsMatchingSeq(...)
 |      REQUIRES LCP 
 |      This function takes a sequence and finds all strings of length "strLen" which exactly match the sequence
 |      @param seq - the sequence we want to match, assumed to be buffered on both ends with 'N' symbols
 |      @param strLen - the length of the strings we are trying to extract
 |      @return - a list of dollar IDs corresponding to strings that exactly match the seq somewhere
 |  
 |  findStrWithError(...)
 |      This function will search the BWT for strings which match the given sequence allowing for one error.
 |      In this function, "seq" must be close to the length of the read or else the ends of the reads will be counted
 |      as long insertions leading to no matches in the data.
 |      @param seq - the sequence to search for with valid symbols [A, C, G, N, T], NOTE: we assume the string is implicitly
 |                   flanked by '$', so do NOT pass the '$' in the string or no result will be returned
 |      @param bonusStr - in the case of a deletion in the search, this is an extra character that must match at the front
 |                        of seq, aka it must match (bonusStr+seq) with one symbol deleted
 |      @return - a python list of ranges representing the start and end of the sequence in the bwt, these ranges will be
 |                in the '$' indices, so they will correspond to a specific read
 |                NOTE: these results may overlap, user expected to check for overlaps if important
 |  
 |  getBinBits(...)
 |      @return - the number of bits in a bin
 |  
 |  getCharAtIndex(...)
 |      dummy function, shouldn't be called
 |  
 |  getOccurrenceOfCharAtIndex(...)
 |      dummy function, shouldn't be called
 |  
 |  getSequenceDollarID(...)
 |      This will take a given index and work backwards until it encounters a '$' indicating which dollar ID is
 |      associated with this read
 |      @param strIndex - the index of the character to start with
 |      @return - an integer indicating the dollar ID of the string the given character belongs to
 |  
 |  getSymbolCount(...)
 |      @param symbol - this is an integer from [0, 6)
 |      @return - the total count for the passed in symbol
 |  
 |  getTotalSize(...)
 |      @return - the total number of symbols in the BWT
 |  
 |  iterInit(...)
 |      this function must be called to reset the iterator to the beginning, used for both normal and
 |      compressed data structures since it's so simple
 |  
 |  iterNext(...)
 |  
 |  recoverString(...)
 |      This will return the string that starts at the given index
 |      @param strIndex - the index of the string we want to recover
 |      @return - string that we found starting at the specified '$' index
 |
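
The range-based search described for findIndicesOfStr and countOccurrencesOfSeq can be illustrated with a small, self-contained sketch. This is not the library's code: the helper names (`build_toy_msbwt`, `find_indices_of_str`, `count_occurrences_of_seq`) are hypothetical, the BWT is built naively by sorting suffixes, and the handling of `given_range` (treated as the range of an already-searched suffix that is then extended to the left) is an assumption based on the docstrings above.

```python
def build_toy_msbwt(reads):
    """Naively build a multi-string BWT by sorting every suffix of every read + '$'.

    Toy construction for illustration only; real builders avoid materializing suffixes.
    """
    suffixes = []
    for read_id, read in enumerate(reads):
        text = read + "$"
        for i in range(len(text)):
            # The BWT symbol is the character that precedes the suffix
            # (for i == 0 this wraps around to the '$' terminator).
            suffixes.append((text[i:], read_id, text[i - 1]))
    # Ties between identical suffixes are broken by read order.
    suffixes.sort(key=lambda s: (s[0], s[1]))
    return "".join(prev for _, _, prev in suffixes)


def find_indices_of_str(bwt, seq, given_range=None):
    """Backward search: return the half-open (low, high) range of suffixes starting with seq."""
    # first[c] = number of symbols in the BWT that sort strictly before c
    first, total = {}, 0
    for c in sorted(set(bwt)):
        first[c] = total
        total += bwt.count(c)

    low, high = given_range if given_range is not None else (0, len(bwt))
    for c in reversed(seq):
        if c not in first:
            return (0, 0)  # symbol never occurs, so the range is empty
        # Rank queries done naively with str.count; an FM-index precomputes these.
        low = first[c] + bwt.count(c, 0, low)
        high = first[c] + bwt.count(c, 0, high)
    return (low, high)


def count_occurrences_of_seq(bwt, seq, given_range=None):
    """Number of occurrences of seq is the width of its search range."""
    low, high = find_indices_of_str(bwt, seq, given_range)
    return high - low


reads = ["ACGT", "CGTA", "ACGA"]
bwt = build_toy_msbwt(reads)
print(bwt)                                   # TAATG$$AA$CCCGG
print(find_indices_of_str(bwt, "CG"))        # (7, 10)
print(count_occurrences_of_seq(bwt, "ACG"))  # 2
```

Under these assumptions, a partial search can be resumed: searching for "A" with `given_range=(7, 10)` (the range of "CG") yields the same range as searching for "ACG" from scratch.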

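The k-mer counting functions (countSeqMatches, countPileup, and related methods) all return one count per k-mer position, giving an array of size len(seq)-kmerSize+1. The sketch below shows that output shape using naive substring counting over a plain list of reads instead of a BWT. The function names and the `include_revcomp` flag are hypothetical, a Python list stands in for the numpy array, and simply adding forward and reverse-complement counts (so a palindromic k-mer is counted twice) is an assumption, not confirmed library behavior.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence over A, C, G, T, N."""
    return "".join(COMPLEMENT[c] for c in reversed(seq))


def count_overlapping(read, kmer):
    """Count occurrences of kmer in read, allowing overlapping matches."""
    count, start = 0, read.find(kmer)
    while start != -1:
        count += 1
        start = read.find(kmer, start + 1)
    return count


def count_seq_matches(reads, seq, kmer_size, include_revcomp=False):
    """Return one count per k-mer of seq: len(seq) - kmer_size + 1 entries."""
    counts = []
    for i in range(len(seq) - kmer_size + 1):
        kmer = seq[i:i + kmer_size]
        n = sum(count_overlapping(r, kmer) for r in reads)
        if include_revcomp:
            # Assumption: forward and reverse-complement counts are added together.
            rc = reverse_complement(kmer)
            n += sum(count_overlapping(r, rc) for r in reads)
        counts.append(n)
    return counts


reads = ["ACGT", "CGTA", "ACGA"]
forward = count_seq_matches(reads, "ACGTA", 3)
both = count_seq_matches(reads, "ACGTA", 3, include_revcomp=True)
print(forward)  # [2, 2, 1]
print(both)     # [4, 4, 1]
```

Here the query "ACGTA" has three 3-mers (ACG, CGT, GTA), so each result has three entries. ACG and CGT are reverse complements of each other, which is why their combined counts double.
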
References

Holt, J., & McMillan, L. (2014). Merging of multi-string BWTs with applications. Bioinformatics, btu584.

Holt, J., & McMillan, L. (2014, September). Constructing Burrows-Wheeler transforms of large string collections via merging. In Proceedings of the 5th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (pp. 464-471). ACM.

Contact Us

James Holt - [email protected]
