Commit

Text edits for clarity
clzirbel authored Oct 15, 2024
1 parent 926bb57 commit 5b89d9c
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions help.md
@@ -13,7 +13,7 @@ nav_order: 2
</details>

## Introduction
R3DMCS (pronounced like /redmax/) is a web service which maps an input set of RNA nucleotides in one 3D structure from the Protein Data Bank to an output page showing corresponding instances of the motif across 3D structures of the same molecule from the same organism, or across 3D structures of the same molecule from other organisms but in the same Rfam family. Target structures can be filtered by resolution, redundancy, and experimental method. The output page lists the corresponding instances in a table as a multiple alignment, with each row listing nearby chains in the structure, basepairs and other pairwise interactions, and structure resolution. The output page shows an all-against-all comparison of the instances of the motif in the form of a heatmap; instances are ordered to put geometrically similar instances near each other. Finally, the output page has a coordinate viewer to visualize one instance at a time, or to superimpose instances for easier comparison.
R3DMCS (pronounced like /redmax/) is a web service which maps an input set of RNA nucleotides in one 3D structure from the Protein Data Bank to an output page showing corresponding instances of the motif across 3D structures of the same molecule from the same organism, or across 3D structures of the same molecule from other organisms in the same Rfam family. Target structures can be filtered by resolution, redundancy, and experimental method. The output page lists the corresponding instances in a table as a multiple alignment, with each row listing nearby chains in the structure, basepairs and other pairwise interactions, and structure resolution. The output page shows an all-against-all comparison of the instances of the motif in the form of a heatmap; instances are ordered to put geometrically similar instances near each other. Finally, the output page has a coordinate viewer to visualize one instance at a time, or to superimpose instances for easier comparison.

## Input page {#input_page}
This section provides an overview on how to fill up the input page.
@@ -25,7 +25,7 @@ First specify the nucleotides in one 3D structure that you would like to work wi
For the most part, each residue in a 3D structure file can be identified by the PDB identifier (like 5J7L), the author-assigned chain identifier (like AA), and residue numbers, also called nucleotide numbers (like 1405 and 1496). Note that the chain identifier is case sensitive, but the PDB identifier is not.

##### Individual residues
To retrieve the specific nucleotides mentioned above, one would type 1405,1496 in the Selection box, 5J7L in the PDB ID box, and AA in the Chain ID box. The general format for entering individual nucleotide numbers is “**number1,number2,number3**" and repeat for as many individual positions as are needed. Individual residue numbers are separated by commas. These nucleotides are one of the closing basepairs in the decoding loop in the SSU of E. coli. [Link to example of individual residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405,1496&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True). Note that in that example, some 3D structures model C1496 in syn, creating a separate cluster in the heat map.
To retrieve the specific nucleotides mentioned above, one would type 1405,1496 in the Selection box, 5J7L in the PDB ID box, and AA in the Chain ID box. The general format for entering individual nucleotide numbers is “**number1,number2,number3**" and repeat for as many individual positions as are needed. Individual residue numbers are separated by commas. Nucleotides 1405 and 1496 are one of the closing basepairs in the decoding loop in the SSU of E. coli. [Link to example of individual residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405,1496&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True). Note that in that example, some 3D structures model C1496 in syn, creating a separate cluster in the heat map.

##### Single range of residues
A range of residue numbers can be provided, separating the lower and upper number with a colon character. The format for entering a single range of nucleotide numbers is “**start_position:end_position**”. For example, to specify the lower-numbered strand of the decoding loop in the E. coli SSU from 5J7L chain AA, one would type 1405:1409 in the Selection box. [Link to example of range of residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405:1409&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True).
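The two Selection formats described so far (comma-separated numbers and a colon-separated range) can be sketched as a small parser. This is only an illustration, not R3DMCS code: the function name `parse_selection` is hypothetical, residue numbers are assumed to be plain integers, and a range is assumed to cover every integer between its endpoints, which may not hold for structures with gaps or insertion codes.

```python
def parse_selection(selection: str) -> list[int]:
    """Expand a Selection string into a list of residue numbers.

    Hypothetical helper: assumes integer residue numbers and that a
    range "start:end" includes both endpoints.
    """
    residues: list[int] = []
    for token in selection.split(","):
        token = token.strip()
        if ":" in token:
            # A range such as "1405:1409"; more than one colon is an error.
            start_text, end_text = token.split(":")
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"range start exceeds end: {token!r}")
            residues.extend(range(start, end + 1))
        else:
            # A single residue number such as "1405".
            residues.append(int(token))
    return residues


print(parse_selection("1405,1496"))
print(parse_selection("1405:1409"))
```

With the examples from the text, the first call yields the two individual positions and the second yields the five positions of the lower-numbered strand.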
@@ -37,9 +37,9 @@ This option consists of entering multiple single ranges of nucleotide numbers se
Each week, the BGSU data processing pipeline extracts hairpin, internal, and junction loops from RNA-containing 3D structure files using the FR3D software. Once the loops are extracted, we label them with unique and stable identifiers. These “loop ids" contain the following three fields, separated by underscores:
- Field 1: Loop type prefix: “HL” for hairpin loops, “IL” for internal loops, “J3” for three-way junctions
- Field 2: PDB ID
- Field 3: A sequentially assigned, three digit character
- Field 3: A sequentially assigned, three digit number
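The three-field layout above can be checked with a short regular expression. The sketch below is an assumption-laden illustration rather than the pipeline's own code: the names `LoopId` and `parse_loop_id` are hypothetical, only the three prefixes listed above are accepted, and the PDB ID is assumed to be four alphanumeric characters.

```python
import re
from typing import NamedTuple


class LoopId(NamedTuple):
    loop_type: str  # "HL", "IL", or "J3" (only the prefixes listed above)
    pdb_id: str     # assumed to be four alphanumeric characters
    serial: str     # three digits, kept as text to preserve leading zeros


# Hypothetical pattern: prefix, PDB ID, and three-digit number joined by underscores.
LOOP_ID_PATTERN = re.compile(r"^(HL|IL|J3)_([0-9A-Za-z]{4})_(\d{3})$")


def parse_loop_id(text: str) -> LoopId:
    """Split a loop id such as "HL_5TBW_007" into its three fields."""
    match = LOOP_ID_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a recognized loop id: {text!r}")
    return LoopId(*match.groups())


print(parse_loop_id("HL_5TBW_007"))
```

Keeping the third field as a string avoids losing the leading zeros, so the original loop id can be rebuilt by joining the fields with underscores.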

Users can view the loop_ids for a particular RNA structure by exploring the RNA Structure Atlas pages. See, for example, the [page for 5TBW](http://rna.bgsu.edu/rna3dhub/pdb/5TBW), hairpin loop [HL_5TBW_007](http://rna.bgsu.edu/rna3dhub/loops/view/HL_5TBW_007), internal loop [IL_5TBW_019](http://rna.bgsu.edu/rna3dhub/loops/view/IL_5TBW_019), and 3-way junction loop [J3_5TBW_003](http://rna.bgsu.edu/rna3dhub/loops/view/J3_5TBW_003). Loop ids are also used in the [RNA 3D Motif Atlas](http://rna.bgsu.edu/rna3dhub/motifs). For example, motif group [IL_29549.7](http://rna.bgsu.edu/rna3dhub/motif/view/IL_29549.7) contains 35 instances of the kink-turn internal loop motif. Each of the 35 loop ids there could be used as the starting point to explore variation in the kink turn geometry across different experimental structures.
Users can view the loop_ids for a particular RNA structure by exploring the RNA Structure Atlas pages. See, for example, the [loop page for 5TBW](http://rna.bgsu.edu/rna3dhub/pdb/5TBW/motifs), hairpin loop [HL_5TBW_007](http://rna.bgsu.edu/rna3dhub/loops/view/HL_5TBW_007), internal loop [IL_5TBW_019](http://rna.bgsu.edu/rna3dhub/loops/view/IL_5TBW_019), and 3-way junction loop [J3_5TBW_003](http://rna.bgsu.edu/rna3dhub/loops/view/J3_5TBW_003). Loop ids are also used in the [RNA 3D Motif Atlas](http://rna.bgsu.edu/rna3dhub/motifs). For example, motif group [IL_29549.7](http://rna.bgsu.edu/rna3dhub/motif/view/IL_29549.7) contains 35 instances of the kink-turn internal loop motif. Each of the 35 loop ids there could be used as the starting point to explore variation in the kink turn geometry across different experimental structures.

To specify a R3DMCS query using a loop id, type the loop id in the Selection box, and leave the PDB id and Chain id boxes empty. [Link to example of using loop id to specify a query](http://rna.bgsu.edu/correspondence/comparison?selection=J3_5TBW_003&exp_method=all&resolution=3.0&scope=EC&input_form=True).
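The example links in this section all share one URL pattern, so a query link can be assembled from its parts. The following is a minimal sketch that only reproduces the parameter names and values visible in those links (`selection`, `pdb`, `chain`, `exp_method`, `resolution`, `scope`, `input_form`); the function name `build_query_url` is hypothetical, and the set of values the server accepts for each parameter is not documented here.

```python
from urllib.parse import urlencode

# Base address taken from the example links in this section.
BASE_URL = "http://rna.bgsu.edu/correspondence/comparison"


def build_query_url(selection, pdb=None, chain=None,
                    exp_method="all", resolution="3.0", scope="EC"):
    """Assemble a comparison URL in the same shape as the example links.

    Defaults copy the values used in those links; whether the server
    accepts other values is an assumption to verify on the input page.
    """
    params = {"selection": selection}
    # For a loop id query, the PDB id and chain are left out entirely.
    if pdb:
        params["pdb"] = pdb
    if chain:
        params["chain"] = chain
    params["exp_method"] = exp_method
    params["resolution"] = resolution
    params["scope"] = scope
    params["input_form"] = "True"
    # Keep commas and colons unescaped so the selection stays readable.
    return BASE_URL + "?" + urlencode(params, safe=",:")


print(build_query_url("1405,1496", pdb="5J7L", chain="AA"))
print(build_query_url("J3_5TBW_003"))
```

The first call rebuilds the individual-residue example link, and the second rebuilds the loop id example link, which omits the PDB id and chain.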

@@ -246,14 +246,14 @@ We have worked hard to identify and eliminate situations in which the page does
### Nucleotides must be from the same chain
The nucleotides in a query must be from the same chain. For example, some internal loops in the eukaryotic ribosomal large subunit (LSU) have one strand in the 5.8S rRNA and one in the long LSU chain, see for example the large internal loop [IL_8GLP_021](http://rna.bgsu.edu/rna3dhub/loops/view/IL_8GLP_021) from Homo sapiens. In principle, it would be possible to retrieve aligned nucleotides across multiple chains, but in practice there are many edge cases that are difficult to cover.

### Alignments across Rfam families are not available
### Alignments across different Rfam families are not available
With the ribosomal small subunit (SSU) and large subunit (LSU), Rfam provides separate families for archaea, bacteria, and eukarya. R3DMCS can retrieve and compare motifs within each family, but at the moment does not provide alignments across those different domains. We will remove this limitation when we can provide sufficiently accurate cross-domain alignments.

### Poor alignment quality in some regions of some Rfam alignments
The alignments produced across PDB chains in an Rfam family are sometimes inaccurate, especially in regions where the secondary structure is variable between organisms. This will generally show itself with two or more clearly separated clusters in the heat map, and visual inspection will show that the sets of nucleotides in the two sets bear no resemblance to each other. tRNA alignments are particularly susceptible to this problem, partly because Rfam has a single family for all tRNAs from all domains, and they don't all align perfectly well. Alignment in variable regions is difficult, and perhaps not meaningful because different species simply have different 3D structures. R3DMCS can make it clear that the 2D or 3D structures differ enough in that region to require further study.

### Long computation time on large comparisons
R3DMCS can retrieve hundreds of instances, but the all-against-all geometric comparison scales as the square of the number of instances, and so that can take a few minutes in some cases. The amount of time it took to create the output is shown on the bottom of the output page. It is a good idea to start with a low resolution threshold or with a low equivalence class depth at first. For reference, R3DMCS produced results for a 92-nucleotide query with 149 matching instances in 87 seconds. That said, it is primarily designed for motifs up to about 30 nucleotides.
R3DMCS can retrieve hundreds of instances, but the all-against-all geometric comparison scales as the square of the number of instances, which can take a few minutes in some cases. The amount of time it took to create the output is shown on the bottom of the output page. It is a good idea to start with a strict resolution threshold or with a low equivalence class depth at first. For reference, R3DMCS produced results for a 92-nucleotide query with 149 matching instances in 87 seconds. R3DMCS is primarily designed for motifs up to about 30 nucleotides.
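To make the quadratic growth concrete, the sketch below counts the unordered pairs in an all-against-all comparison. It assumes each pair of instances is compared exactly once, which the text does not state; the function name `pairwise_comparisons` is hypothetical.

```python
def pairwise_comparisons(n_instances: int) -> int:
    """Number of unordered pairs among n instances: n * (n - 1) / 2."""
    return n_instances * (n_instances - 1) // 2


# The 149-instance reference query, then twice as many instances.
print(pairwise_comparisons(149))
print(pairwise_comparisons(298))
```

Under that assumption, the 149-instance reference query involves 11,026 pairs, and doubling the number of instances roughly quadruples the count.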

### No discrepancy calculated when an instance is missing atoms
Some 3D structures have nucleotides with missing atoms, for example, missing base atoms. As of April 2024, no discrepancy is calculated with those instances, but the nucleotides are shown in the table and in the heat map.