Updated HMMs
thapasz authored Mar 8, 2022
1 parent 5689deb commit 532e567
Showing 1 changed file with 4 additions and 14 deletions.
18 changes: 4 additions & 14 deletions vocabulary/index.md
@@ -219,23 +219,13 @@ The number of SCGs will decrease with decreasing resolutions of taxonomy. For in

### Hidden Markov Models (HMMs)

Let’s say, **weather_states = [rainy, sunny]** and
A [Markov model](https://web.stanford.edu/~jurafsky/slp3/A.pdf) allows us to predict or describe a future state, given knowledge of the [current state](https://web.stanford.edu/~jurafsky/slp3/A.pdf) **(observation)** in the sequence. Past states are not needed to predict the future outcome. To summarize, the system state at a given time point [“t+1”](https://reader.elsevier.com/reader/sd/pii/S000437029800023X?token=79509CC161F6A21DD71D5B2C02D3E7A3C6D2AC8EBB6D10B37EC4E063A4F21931F2C8F3204F4EFF5E89610ED5280FAF64&originRegion=us-east-1&originCreation=20220306193035) depends only on the state at time point “t”. [Hidden Markov Models](https://en.wikipedia.org/wiki/Hidden_Markov_model) (HMMs) are Markov models in which the **[states are hidden, i.e. not directly observable](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/pdf/CG-10-402.pdf)**.
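
The dependence of time point “t+1” on time point “t” alone can be sketched with a two-state weather chain. This is a minimal illustration, not part of the original text: the transition probabilities and the helper names `next_state`, `simulate` and `sequence_probability` are assumptions chosen for the example.

```python
import random

# Hypothetical transition probabilities, chosen only for illustration.
# TRANSITIONS[a][b] is the probability of moving from state a to state b.
TRANSITIONS = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}


def next_state(current, rng):
    """Sample the state at t+1 using only the state at t."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(start, steps, seed=0):
    """Generate a state path of length steps + 1, beginning at `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        path.append(next_state(path[-1], rng))
    return path


def sequence_probability(path):
    """Probability of a state path, given its first state."""
    prob = 1.0
    for current, following in zip(path, path[1:]):
        prob *= TRANSITIONS[current][following]
    return prob


print(simulate("sunny", 5))
# P(sunny -> sunny -> rainy | start = sunny) = 0.6 * 0.4
print(sequence_probability(["sunny", "sunny", "rainy"]))
```

Each step looks only at `path[-1]`, which is the Markov property in code: earlier entries of the path never influence the next draw.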

**population_mood = [sad, happy]**
HMMs have been widely used in bioinformatics for [sequence analysis](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/pdf/CG-10-402.pdf) tasks such as database searches, gene prediction, and pairwise and multiple sequence alignment. Many problems in biological sequence analysis share a pattern: an initial sequence of symbols (nucleotides, amino acids) is available, and the task is to predict which protein or phylogenetic family it belongs to, also known as “sequence-based homology detection”.

A [Markov model](https://web.stanford.edu/~jurafsky/slp3/A.pdf) allows us to predict or describe a future state, given knowledge of the [current state](https://web.stanford.edu/~jurafsky/slp3/A.pdf) **(observation)** in the sequence. Past states are not needed to predict the future outcome. To summarize, the system state at a given time point [“t+1”](https://reader.elsevier.com/reader/sd/pii/S000437029800023X?token=79509CC161F6A21DD71D5B2C02D3E7A3C6D2AC8EBB6D10B37EC4E063A4F21931F2C8F3204F4EFF5E89610ED5280FAF64&originRegion=us-east-1&originCreation=20220306193035) depends only on the state at time point “t”. The idea is analogous to predicting tomorrow’s weather from your knowledge of today’s weather, while yesterday’s weather adds little. If you want to think in terms of networks, the states are the nodes and the probabilities of moving between states, known as “transition probabilities”, are the edges.
HMMs trained and built on closely related species can exhibit improved sensitivity in sequence searches for [remote homology](https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1002195&type=printable). In contrast to BLAST techniques used for sequence alignment, i) HMMs can apply position-specific gap penalties, which better captures the changes occurring in a [conserved vs variable region](https://reader.elsevier.com/reader/sd/pii/S0022283684711041?token=DB01FA515414FC42BC5DCA4555C53A6D84434346ED3F29CC8A2FDE4DD008FFCAE2A9783792E15F30C072F5CB6BE63457&originRegion=us-east-1&originCreation=20220308161623). Moreover, ii) the overall score reflects not just the single best-scoring alignment, but a consensus over all possible alignments. This assists in [effective prediction](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2447419/pdf/CFG-04-250.pdf) of the true homologs.
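
The difference between scoring one best path and scoring all paths can be sketched on a toy HMM. This is only an illustration under stated assumptions: the two-state weather/mood model and all of its probabilities are hypothetical, and the profile HMMs used for real sequence searches are structured differently (for example, with match, insert and delete states). `forward_probability` sums over every hidden path, while `best_path_probability` keeps only the single most probable one.

```python
from itertools import product

# Hypothetical toy HMM; every number below is an assumption for illustration.
STATES = ("rainy", "sunny")
START = {"rainy": 0.5, "sunny": 0.5}
TRANS = {
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
EMIT = {
    "rainy": {"sad": 0.8, "happy": 0.2},
    "sunny": {"sad": 0.3, "happy": 0.7},
}


def forward_probability(observations):
    """Probability of the observations, summed over all hidden paths."""
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: EMIT[s][obs] * sum(alpha[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return sum(alpha.values())


def best_path_probability(observations):
    """Joint probability of the observations and the single best hidden path."""
    best = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        best = {
            s: EMIT[s][obs] * max(best[p] * TRANS[p][s] for p in STATES)
            for s in STATES
        }
    return max(best.values())


def brute_force_probability(observations):
    """Same quantity as forward_probability, by enumerating every path."""
    total = 0.0
    for path in product(STATES, repeat=len(observations)):
        prob = START[path[0]] * EMIT[path[0]][observations[0]]
        for prev, cur, obs in zip(path, path[1:], observations[1:]):
            prob *= TRANS[prev][cur] * EMIT[cur][obs]
        total += prob
    return total


moods = ["sad", "sad", "happy"]
print(forward_probability(moods), best_path_probability(moods))
```

The all-paths score is never smaller than the best-path score, because the best path is one of the terms in the sum.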

**So, what is a Hidden Markov Model?**

Simply put, a Hidden Markov Model (HMM) is a Markov model in which the **[states are hidden, i.e. not directly observable](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/pdf/CG-10-402.pdf)**. Let’s say, in an unusual scenario in an unusual town, you have access to the population’s mood but no access to the weather data (hidden). With an HMM, you would use the sequence of population moods (sad, happy) to predict the weather sequence (rainy, sunny) without ever setting foot in that town. This idea (apologies for this level of oversimplification) is widely used in fields such as [speech recognition](https://web.ece.ucsb.edu/Faculty/Rabiner/ece259/Reprints/tutorial%20on%20hmm%20and%20applications.pdf), [bioinformatics](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2766791/pdf/CG-10-402.pdf) and artificial intelligence.
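
The town scenario can be sketched with the Viterbi algorithm, a standard way to recover the most probable hidden state sequence from a sequence of observations. This is a minimal sketch rather than a definitive implementation: the start, transition and emission probabilities are hypothetical values picked for the example, and the function name `viterbi` is just a label used here.

```python
# Hypothetical toy HMM; every number below is an assumption for illustration.
STATES = ("rainy", "sunny")                     # hidden weather states
START = {"rainy": 0.5, "sunny": 0.5}            # P(first state)
TRANS = {                                       # P(next state | current state)
    "rainy": {"rainy": 0.7, "sunny": 0.3},
    "sunny": {"rainy": 0.4, "sunny": 0.6},
}
EMIT = {                                        # P(observed mood | weather)
    "rainy": {"sad": 0.8, "happy": 0.2},
    "sunny": {"sad": 0.3, "happy": 0.7},
}


def viterbi(observations):
    """Return the most probable hidden weather path for the observed moods."""
    # Best score of any path ending in each state after the first observation.
    scores = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor for arriving in state s at this time point.
            prev = max(STATES, key=lambda p: scores[p] * TRANS[p][s])
            new_scores[s] = scores[prev] * TRANS[prev][s] * EMIT[s][obs]
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state to recover the full path.
    path = [max(STATES, key=lambda s: scores[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


print(viterbi(["sad", "sad", "happy"]))
```

With these assumed numbers, two sad days followed by a happy one decode to rainy, rainy, sunny: the moods are observed, and the weather is inferred.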

**So, what does it have to do with Biology then?**

HMMs can be applied to protein structure recognition, gene finding and multiple sequence alignment. These models can range in [terms of complexity and a priori biological knowledge](http://www2.cs.uh.edu/~ceick/ML/HMM_in_BI.pdf), but they play a solid role in deciphering the unknown from large swaths of sequence data.

**So, what does [Anvi’o](https://pubmed.ncbi.nlm.nih.gov/26500826/) have to do with Hidden Markov Models?**

Running ***“anvi-run-hmms”*** on your contig database allows you to estimate single-copy core genes (SCGs) in your contig database. Since SCGs are phylogenetically conserved, they are good candidates to measure the [completeness of genomes](https://pubmed.ncbi.nlm.nih.gov/26500826/). Anvi’o comes with HMMs for Bacteria, Archaea, as well as ribosomal RNA genes.
Within the Anvi’o environment, running ***“anvi-run-hmms”*** on your contig database allows you to estimate the single-copy core genes (SCGs) present in the contig database. Since SCGs are phylogenetically conserved, they are good candidates to measure the [completeness of genomes](https://pubmed.ncbi.nlm.nih.gov/26500826/). Anvi’o comes with HMMs for Bacteria, Archaea, as well as ribosomal RNA genes.

### Completion
{:data-tags="completion,completeness"}