RNA Secondary Structure Predictor

App

FLASK (faster) RNA Secondary Structure App

Interactive tool for the prediction and visualization of RNA secondary structures, using dynamic programming. It predicts base pairing and generates a visual representation of the structure based on the input RNA sequence. The algorithm I implemented in this code is related to the Nussinov’s algorithm.

Objective

Optimal secondary structure prediction of RNA sequence by maximizing the number of base pairs under specific constraints.

Definitions

RNA Sequence:
- A string of nucleotides: adenine (A), uracil (U), guanine (G), and cytosine (C).
- Valid base-pair interactions:
  - Canonical: A-U, U-A, G-C, C-G
  - Wobble: G-U, U-G
Dot-Bracket Notation:
- The predicted structure is represented as a string of (, ), and .:
  - .: Unpaired base.
  - (, ): Paired bases, indicating a bond.
Dynamic Programming Approach:
- DP table dp[i][j] where:
  - i and j represent indices of the RNA sequence.
  - dp[i][j] stores the maximum number of base pairs in the subsequence from i to j.

Steps

DP Table initialization:
- A 2D matrix dp of size (n \times n) (where (n) is the sequence length) is initialized to 0.
- A separate traceback matrix is used to store the decisions for reconstructing the structure later.
DP Table iterative filling:
- Start with short subsequences and expand to larger ones.
- For each pair of indices (i, j) (where (j > i)):
  - Case 1: Unpaired: The nucleotide at position j is left unpaired: [ dp[i][j] = dp[i][j-1] ]
  - Case 2: Paired: If sequence[i] can pair with sequence[j]:
    - Add 1 (for the new pair) to the solution of the inner subsequence ((i+1, j-1)): [ dp[i][j] = \max(dp[i][j], 1 + dp[i+1][j-1]) ]
  - Case 3: Bifurcation: Split the subsequence into two parts:
    - Combine solutions from two non-overlapping subsequences ((i, k)) and ((k+1, j)): [ dp[i][j] = \max(dp[i][j], dp[i][k] + dp[k+1][j]) \quad \text{for } k \in (i + \text{MIN_LOOP}, j - \text{MIN_LOOP}) ]
Constraints:
- Min-loop-size: Ensures there are at least MIN_LOOP unpaired bases between paired bases (i) and (j).
- Valid base pairing: Ensures following of biological rules.
Traceback:
- For reconstructing the optimal solution.
- Starting from the full sequence (0, n-1), decisions stored in traceback[i][j] determine:
  - Whether i and j are paired.
  - If the subsequence bifurcates into two smaller problems.
Output:
- Dot-bracket notation for the secondary structure.
- A list of base pairs (i.e., indices of paired bases).

Mathematical Recurrence

The algorithm can be described using the recurrence relation:

dp[i][j] = max {

dp[i][j-1] (Unpaired)
dp[i+1][j-1] + 1 (Paired, if can_pair(sequence[i], sequence[j]))
max_{k=i+MIN_LOOP}^j-MIN_LOOP(dp[i][k] + dp[k+1][j]) (Bifurcation) }

Time Complexity

Table Filling:
- The outer loop iterates over subseq. lengths (L) from MIN_LOOP + 2 to (n), and inner loops iterate over (i) and (j).
- For each cell (i, j), there is an additional loop over possible bifurcation points (k).
- Complexity: (O(n^3)).
Traceback:
- Linear in the number of base pairs.
- Complexity: (O(n)).

Overall time complexity is (O(n^3)).

Biological Significance

Simplified model for RNA secondary structure prediction:

It maximizes base pairing without considering thermodynamic stability or pseudoknots.
Extensions like Zuker’s algorithm can incorporate free-energy minimization for more accurate predictions.

Example

For the sequence AUGCGUA:

dp[i][j] will compute the maximum number of pairs.
Traceback reconstructs the structure:
- Dot-bracket: .(())..
- Base pairs: [(1, 6), (2, 5)]

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
.devcontainer		.devcontainer
FLASK_APP_faster		FLASK_APP_faster
README.md		README.md
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

RNA Secondary Structure Predictor

App

Objective

Definitions

Steps

Mathematical Recurrence

Time Complexity

Biological Significance

Example

About

Releases

Packages

Languages

kris96tian/RNA_App

Folders and files

Latest commit

History

Repository files navigation

RNA Secondary Structure Predictor

App

Objective

Definitions

Steps

Mathematical Recurrence

Time Complexity

Biological Significance

Example

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages