From 5b89d9ce54422c0817d1febce0a6546cd9899651 Mon Sep 17 00:00:00 2001
From: "Craig L. Zirbel"
Date: Tue, 15 Oct 2024 13:14:15 -0700
Subject: [PATCH] Text edits for clarity

---
 help.md | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/help.md b/help.md
index 215da2c..dbacf71 100644
--- a/help.md
+++ b/help.md
@@ -13,7 +13,7 @@ nav_order: 2
 ## Introduction
-R3DMCS (pronounced like /redmax/) is a web service which maps an input set of RNA nucleotides in one 3D structure from the Protein Data Bank to an output page showing corresponding instances of the motif across 3D structures of the same molecule from the same organism, or across 3D structures of the same molecule from other organisms but in the same Rfam family. Target structures can be filtered by resolution, redundancy, and experimental method. The output page lists the corresponding instances in a table as a multiple alignment, with each row listing nearby chains in the structure, basepairs and other pairwise interactions, and structure resolution. The output page shows an all-against-all comparison of the instances of the motif in the form of a heatmap; instances are ordered to put geometrically similar instances near each other. Finally, the output page has a coordinate viewer to visualize one instance at a time, or to superimpose instances for easier comparison.
+R3DMCS (pronounced like /redmax/) is a web service which maps an input set of RNA nucleotides in one 3D structure from the Protein Data Bank to an output page showing corresponding instances of the motif across 3D structures of the same molecule from the same organism, or across 3D structures of the same molecule from other organisms in the same Rfam family. Target structures can be filtered by resolution, redundancy, and experimental method.
The output page lists the corresponding instances in a table as a multiple alignment, with each row listing nearby chains in the structure, basepairs and other pairwise interactions, and structure resolution. The output page shows an all-against-all comparison of the instances of the motif in the form of a heatmap; instances are ordered to put geometrically similar instances near each other. Finally, the output page has a coordinate viewer to visualize one instance at a time, or to superimpose instances for easier comparison.

 ## Input page {#input_page}
 This section provides an overview on how to fill up the input page.
@@ -25,7 +25,7 @@ First specify the nucleotides in one 3D structure that you would like to work wi
 For the most part, each residue in a 3D structure file can be identified by the PDB identifier (like 5J7L), the author-assigned chain identifier (like AA), and residue numbers, also called nucleotide numbers (like 1405 and 1496). Note that the chain identifier is case sensitive, but the PDB identifier is not.

 ##### Individual residues
-To retrieve the specific nucleotides mentioned above, one would type 1405,1496 in the Selection box, 5J7L in the PDB ID box, and AA in the Chain ID box. The general format for entering individual nucleotide numbers is “**number1,number2,number3**" and repeat for as many individual positions as are needed. Individual residue numbers are separated by commas. These nucleotides are one of the closing basepairs in the decoding loop in the SSU of E. coli. [Link to example of individual residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405,1496&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True). Note that in that example, some 3D structures model C1496 in syn, creating a separate cluster in the heat map.
+To retrieve the specific nucleotides mentioned above, one would type 1405,1496 in the Selection box, 5J7L in the PDB ID box, and AA in the Chain ID box.
The general format for entering individual nucleotide numbers is “**number1,number2,number3**" and repeat for as many individual positions as are needed. Individual residue numbers are separated by commas. Nucleotides 1405 and 1496 are one of the closing basepairs in the decoding loop in the SSU of E. coli. [Link to example of individual residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405,1496&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True). Note that in that example, some 3D structures model C1496 in syn, creating a separate cluster in the heat map.

 ##### Single range of residues
 A range of residue numbers can be provided, separating the lower and upper number with a colon character. The format for entering a single range of nucleotide numbers is “**start_position:end_position**”. For example, to specify the lower-numbered strand of the decoding loop in the E. coli SSU from 5J7L chain AA, one would type 1405:1409 in the Selection box. [Link to example of range of residue numbers](http://rna.bgsu.edu/correspondence/comparison?selection=1405:1409&pdb=5J7L&chain=AA&exp_method=all&resolution=3.0&scope=EC&input_form=True).
@@ -37,9 +37,9 @@ This option consists of entering multiple single ranges of nucleotide numbers se
 Each week, the BGSU data processing pipeline extracts hairpin, internal, and junction loops from RNA-containing 3D structure files using the FR3D software. Once the loops are extracted, we label them with unique and stable identifiers. These “loop ids" contain the following three fields, separated by underscores:
 - Field 1: Loop type prefix: “HL” for hairpin loops, “IL” for internal loops, “J3” for three-way junctions
 - Field 2: PDB ID
-- Field 3: A sequentially assigned, three digit character
+- Field 3: A sequentially assigned, three digit number

-Users can view the loop_ids for a particular RNA structure by exploring the RNA Structure Atlas pages.
See, for example, the [page for 5TBW](http://rna.bgsu.edu/rna3dhub/pdb/5TBW), hairpin loop [HL_5TBW_007](http://rna.bgsu.edu/rna3dhub/loops/view/HL_5TBW_007), internal loop [IL_5TBW_019](http://rna.bgsu.edu/rna3dhub/loops/view/IL_5TBW_019), and 3-way junction loop [J3_5TBW_003](http://rna.bgsu.edu/rna3dhub/loops/view/J3_5TBW_003). Loop ids are also used in the [RNA 3D Motif Atlas](http://rna.bgsu.edu/rna3dhub/motifs). For example, motif group [IL_29549.7](http://rna.bgsu.edu/rna3dhub/motif/view/IL_29549.7) contains 35 instances of the kink-turn internal loop motif. Each of the 35 loop ids there could be used as the starting point to explore variation in the kink turn geometry across different experimental structures.
+Users can view the loop_ids for a particular RNA structure by exploring the RNA Structure Atlas pages. See, for example, the [loop page for 5TBW](http://rna.bgsu.edu/rna3dhub/pdb/5TBW/motifs), hairpin loop [HL_5TBW_007](http://rna.bgsu.edu/rna3dhub/loops/view/HL_5TBW_007), internal loop [IL_5TBW_019](http://rna.bgsu.edu/rna3dhub/loops/view/IL_5TBW_019), and 3-way junction loop [J3_5TBW_003](http://rna.bgsu.edu/rna3dhub/loops/view/J3_5TBW_003). Loop ids are also used in the [RNA 3D Motif Atlas](http://rna.bgsu.edu/rna3dhub/motifs). For example, motif group [IL_29549.7](http://rna.bgsu.edu/rna3dhub/motif/view/IL_29549.7) contains 35 instances of the kink-turn internal loop motif. Each of the 35 loop ids there could be used as the starting point to explore variation in the kink turn geometry across different experimental structures.

 To specify a R3DMCS query using a loop id, type the loop id in the Selection box, and leave the PDB id and Chain id boxes empty. [Link to example of using loop id to specify a query](http://rna.bgsu.edu/correspondence/comparison?selection=J3_5TBW_003&exp_method=all&resolution=3.0&scope=EC&input_form=True).
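The example links in the help text above all share one query-string pattern, so the same queries could plausibly be assembled in a script. The following is a minimal sketch, not part of the patch and not an official client: the helper names `build_query_url` and `parse_loop_id` are hypothetical, and the parameter names (`selection`, `pdb`, `chain`, `exp_method`, `resolution`, `scope`, `input_form`) and their default values are simply copied from the example URLs rather than from any documented API.

```python
from urllib.parse import urlencode

# Base address taken from the example links in the help text.
BASE_URL = "http://rna.bgsu.edu/correspondence/comparison"


def build_query_url(selection, pdb=None, chain=None,
                    exp_method="all", resolution="3.0", scope="EC"):
    """Assemble a query URL shaped like the example links (hypothetical helper).

    `selection` is a string such as "1405,1496", "1405:1409", or a loop id.
    For a loop id, leave `pdb` and `chain` as None, mirroring the instruction
    to leave the PDB id and Chain id boxes empty.
    """
    params = {"selection": selection}
    if pdb is not None:
        params["pdb"] = pdb
    if chain is not None:
        params["chain"] = chain
    params["exp_method"] = exp_method
    params["resolution"] = resolution
    params["scope"] = scope
    params["input_form"] = "True"
    # Keep commas and colons literal, as they appear in the example links.
    return BASE_URL + "?" + urlencode(params, safe=",:")


def parse_loop_id(loop_id):
    """Split a loop id into its three underscore-separated fields."""
    loop_type, pdb_id, number = loop_id.split("_")
    return {"loop_type": loop_type, "pdb_id": pdb_id, "number": number}


if __name__ == "__main__":
    print(build_query_url("1405,1496", pdb="5J7L", chain="AA"))
    print(build_query_url("J3_5TBW_003"))
    print(parse_loop_id("HL_5TBW_007"))
```

Because the chain identifier is case sensitive, a caller would pass it through unchanged; the sketch deliberately does no normalization of any field.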
@@ -246,14 +246,14 @@ We have worked hard to identify and eliminate situations in which the page does
 ### Nucleotides must be from the same chain
 The nucleotides in a query must be from the same chain. For example, some internal loops in the eukaryotic ribosomal large subunit (LSU) have one strand in the 5.8S rRNA and one in the long LSU chain, see for example the large internal loop [IL_8GLP_021](http://rna.bgsu.edu/rna3dhub/loops/view/IL_8GLP_021) from Homo sapiens. In principle, it would be possible to retrieve aligned nucleotides across multiple chains, but in practice there are many edge cases that are difficult to cover.

-### Alignments across Rfam families are not available
+### Alignments across different Rfam families are not available
 With the ribosomal small subunit (SSU) and large subunit (LSU), Rfam provides separate families for archaea, bacteria, and eukarya. R3DMCS can retrieve and compare motifs within each family, but at the moment does not provide alignments across those different domains. We will remove this limitation when we can provide sufficiently accurate cross-domain alignments.

 ### Poor alignment quality in some regions of some Rfam alignments
 The alignments produced across PDB chains in an Rfam family are sometimes inaccurate, especially in regions where the secondary structure is variable between organisms. This will generally show itself with two or more clearly separated clusters in the heat map, and visual inspection will show that the sets of nucleotides in the two sets bear no resemblance to each other. tRNA alignments are particularly susceptible to this problem, partly because Rfam has a single family for all tRNAs from all domains, and they don't all align perfectly well. Alignment in variable regions is difficult, and perhaps not meaningful because different species simply have different 3D structures. R3DMCS can make it clear that the 2D or 3D structures differ enough in that region to require further study.

 ### Long computation time on large comparisons
-R3DMCS can retrieve hundreds of instances, but the all-against-all geometric comparison scales as the square of the number of instances, and so that can take a few minutes in some cases. The amount of time it took to create the output is shown on the bottom of the output page. It is a good idea to start with a low resolution threshold or with a low equivalence class depth at first. For reference, R3DMCS produced results for a 92-nucleotide query with 149 matching instances in 87 seconds. That said, it is primarily designed for motifs up to about 30 nucleotides.
+R3DMCS can retrieve hundreds of instances, but the all-against-all geometric comparison scales as the square of the number of instances, which can take a few minutes in some cases. The amount of time it took to create the output is shown on the bottom of the output page. It is a good idea to start with a strict resolution threshold or with a low equivalence class depth at first. For reference, R3DMCS produced results for a 92-nucleotide query with 149 matching instances in 87 seconds. R3DMCS is primarily designed for motifs up to about 30 nucleotides.

 ### No discrepancy calculated when an instance is missing atoms
 Some 3D structures have nucleotides with missing atoms, for example, missing base atoms. As of April 2024, no discrepancy is calculated with those instances, but the nucleotides are shown in the table and in the heat map.
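
To make the quadratic scaling mentioned under "Long computation time on large comparisons" concrete, the short sketch below counts the unordered pairs in an all-against-all comparison. It assumes each pair of instances is compared once; whether R3DMCS actually computes each pair once or fills the full square matrix is not stated in the help text, and the function name `pairwise_comparisons` is hypothetical.

```python
from math import comb


def pairwise_comparisons(n_instances):
    """Number of unordered instance pairs, n * (n - 1) / 2."""
    return comb(n_instances, 2)


if __name__ == "__main__":
    # 149 is the instance count from the timing example in the help text.
    for n in (35, 149, 300):
        print(n, "instances ->", pairwise_comparisons(n), "pairwise comparisons")
```

Under that assumption, doubling the number of instances roughly quadruples the number of comparisons, which is why tightening the resolution threshold or lowering the equivalence class depth first keeps initial queries fast.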