A small R program designed to take in Recombination Breakpoint Distribution Plot data output by RDP5 and make attractive figures.
The program takes RDP5 Recombination Breakpoint Distribution Plot data and converts it into more legible graphs, fit for use in journal article figures, using the 99% (or 95%) confidence intervals.
- Clone/Download the code.
- Open `MainFile.R` in RStudio (or a similar IDE).
- Allow RStudio (recommended) to install the required libraries and packages (yellow popup).
- RDP5 Beta version 5.16 or higher is required. Download an older RDP release here: http://web.cbio.uct.ac.za/~darren/rdp.html, or get the hidden, most up-to-date version here: http://web.cbio.uct.ac.za/~darren/mysetup.exe
- In RDP5 (version 5.16 or higher), create a Breakpoint Distribution Plot and export the data by right-clicking the graph and choosing "Save CSV".
- RDBP_Grapher uses the following files, which must be moved into the root directory of RDBP_Grapher: Breakpoint Distribution Data, ORFCoords and BreakpointPositions.
Some of the data requires minor modification. The steps below guide you through it:
- Open your Breakpoint Distribution Data (this will be the one without either csvORFCoords or csvBreakpointPositions appended).
- Add a new column in G, titled `Bottom`.
- Use the following formula, applied to all cells with contents to the left: `=MIN(B2:F2)`.
- Set every value in the first and the last position of the dataset to 0, except the "Position in alignment" field (like so, for both the first and last positions: https://i.imgur.com/CN1JOtl.png).
- Save and exit.
- Open your ORFCoords file.
- Check the Gene Symbol column and remove duplicates. For each unique Gene Symbol, keep the smallest Start position and the largest Stop position. Keep the formatting (leave it gapless).
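If you would rather make these edits in R than in a spreadsheet, the steps above can be sketched roughly as follows. This is a minimal sketch and not part of RDBP_Grapher: the function names are hypothetical, the breakpoint data is assumed to have "Position in alignment" in its first column and the five value columns in positions 2 to 6 (matching `=MIN(B2:F2)`), and the ORF column names are passed in because the exact headers depend on your export.

```r
# Sketch only: column positions and names are assumptions, not taken from RDP5.

# Add the `Bottom` column and zero the first and last rows.
# `value_cols` are the columns that the spreadsheet formula =MIN(B2:F2) covers.
prepare_breakpoint_data <- function(bp, value_cols = 2:6) {
  # Row-wise minimum; na.rm = TRUE mirrors how MIN() skips empty cells.
  bp$Bottom <- do.call(pmin, c(unname(as.list(bp[value_cols])), list(na.rm = TRUE)))
  # Zero everything except the first column ("Position in alignment").
  bp[c(1, nrow(bp)), -1] <- 0
  bp
}

# Collapse duplicate gene symbols to one row each:
# the smallest Start and the largest Stop per gene.
collapse_orfs <- function(orfs, gene_col, start_col, stop_col) {
  starts <- tapply(orfs[[start_col]], orfs[[gene_col]], min)
  stops  <- tapply(orfs[[stop_col]],  orfs[[gene_col]], max)
  out <- data.frame(names(starts), as.vector(starts),
                    as.vector(stops[names(starts)]),
                    stringsAsFactors = FALSE)
  names(out) <- c(gene_col, start_col, stop_col)
  # Order genes by their start position and drop the old row names.
  out <- out[order(out[[start_col]]), ]
  rownames(out) <- NULL
  out
}
```

A typical use would be to load each file with `read.csv(..., check.names = FALSE)`, pass it through the matching function, and write it back with `write.csv(..., row.names = FALSE)`. Compare the result against a manually edited file before relying on it.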
There are just a few more things we need to modify or tweak, depending on the dataset.
- In `MainFile.R`, look under the `#Importing the data` comment.
- Modify the name of each of the 3 CSV files according to your own data. The example uses Sarbecovirus or Nobecovirus (change the green text).
- Double check that you changed breakpointData, geneMap and breakpointDotPos to now read your own data.
- virusName: Output title used both on top of the graph, and underneath on the X axis.
- taxID: TaxID according to NCBI (https://www.ncbi.nlm.nih.gov/taxonomy).
- fontSizeForGeneMap: How big must the text be in the gene map? Remember that when exporting, the text does not scale linearly compared to the preview image.
- fontSizeMultiplier: How much would you like to multiply the font size by?
- breakPointLineLength: How long should the breakpoint lines be (minimum of 1)?
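As a rough illustration of what the edited settings might look like, here is a minimal sketch. The variable names come from the list above, but every value and file name is a placeholder, and `missing_inputs` is a hypothetical helper that is not part of `MainFile.R`. It only reports which input files cannot be found, which is handy to check before running the whole script.

```r
# Placeholder file names: replace them with your own exported CSV files.
input_files <- c(
  breakpointData   = "MyVirus.csv",
  geneMap          = "MyVirus_ORFCoords.csv",
  breakpointDotPos = "MyVirus_BreakpointPositions.csv"
)

# Hypothetical helper: names of the inputs that do not exist on disk.
missing_inputs <- function(files) {
  names(files)[!file.exists(files)]
}

# Placeholder values: tune these to your own dataset.
virusName            <- "MyVirus"  # title above the graph and X axis label
fontSizeForGeneMap   <- 4          # text size inside the gene map
fontSizeMultiplier   <- 1          # scales the font sizes
breakPointLineLength <- 1          # must be at least 1

# Guard against a value below the documented minimum.
stopifnot(breakPointLineLength >= 1)
```

Calling `missing_inputs(input_files)` returns an empty character vector when all three files are in place.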
There are many optional modifiers, but they require a bit more digging through the code. If you want something gone, simply comment it out and re-run. A few notable optional modifiers include:
- The ability to remove titles (and spacing for them)
- Changing the height of the gene
- Individual font size modification
- Changing to 95% upper and lower confidence intervals
- Color changes
- Theme modding
- Press `CTRL + A` (to select all).
- Press `CTRL + ENTER` (to run the selected code).
- To save the graph, click on Export -> Save as PDF.
- If you save as a PNG, the image will likely be jagged.

Settings:
- PDF Size: 30 x 8 (but will depend on your own use case, so mess around with it)
- Portrait
- Pick a directory
- Pick a File Name
- Save
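The export can also be scripted instead of going through the dialog. The sketch below uses the base R `pdf()` device and assumes that the 30 x 8 size is in inches, which is the unit `pdf()` expects. `save_plot_pdf` is a hypothetical helper, and the base `plot()` call is only a stand-in for the real figure.

```r
# Sketch: write a plot to a PDF file at a fixed size (in inches).
# `draw` is a function that produces the plot when called.
save_plot_pdf <- function(draw, file, width = 30, height = 8) {
  grDevices::pdf(file, width = width, height = height)
  # Close the device even if drawing fails.
  on.exit(grDevices::dev.off(), add = TRUE)
  draw()
  invisible(file)
}

# Stand-in example: a simple line plot written to a temporary directory.
out <- save_plot_pdf(function() plot(1:10, type = "l"),
                     file.path(tempdir(), "example_figure.pdf"))
```

For a plot object stored in a variable, say a hypothetical `p`, pass `function() print(p)` as `draw`. PDF is a vector format, so it avoids the jagged edges you may get with PNG.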
- Better randomisation to avoid overlapping gene maps (currently, you may need to generate a figure a couple of times to get the output you want where no gene maps overlap).
This project was created as part of my Master's thesis in Bioinformatics at the University of Cape Town, specifically to improve the quality of my figures.
Martin DP, Murrell B, Golden M, Khoosal A, & Muhire B (2015) RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1: vev003 doi: 10.1093/ve/vev003
Thanks to Darren Martin for his continued support in maintaining RDP5 and for his general guidance.
Thanks to Rentia Lourens for her contribution to the layering of geom_polygons.
Thanks to Steyn de Klerk (@staindk) for his contribution to geneMapHeight automation.