LDO project Repo

The Python scripts contained in this repo were used to test for significant differences in branch length between paralogs and to identify asymmetric evolution. This method was applied to all gene families across the Tree of Life in the PANTHER database. The repo also includes the code used to analyse gene structures and gene expression profiles. Each notebook begins with an explanatory markdown cell and has comments in the code to help users replicate the analysis.

step1_expected_branches

Used to compute the expected branch lengths under a simple evolutionary model. This gives us a way to identify unexpectedly long branches, so that we can test the Least Diverged Orthologue (LDO) hypothesis.

The input data was downloaded from the PANTHER database:

data/panther-18.0/trees/ -> contains all trees from PANTHER

download: wget http://data.pantherdb.org/ftp/panther_library/18.0/PANTHER18.0_hmmscoring.tgz
extract only the tree files: tar -zxvf PANTHER18.0_hmmscoring.tgz target/famlib/rel/PANTHER18.0_altVersion/hmmscoring/PANTHER18.0/books/PTHR*/tree.tree
rename to trees/PTHR*.tree

data/panther-18.0/species_tree.nhx -> species tree

downloaded using the PANTHER API with: scripts/panther_species_tree.py
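
For users who prefer to do the tree extraction and renaming from Python rather than with tar, the following is a minimal sketch. It assumes the archive has already been downloaded and that tree files sit under a books/PTHR*/tree.tree path inside it, as in the tar command above. The function name `extract_trees` and the regular expression are illustrative and are not part of this repo.

```python
import re
import tarfile
from pathlib import Path

# Matches archive members such as .../books/PTHR10000/tree.tree
# (pattern is an assumption based on the tar command in this README).
TREE_MEMBER = re.compile(r"/books/(PTHR[^/]*)/tree\.tree$")


def extract_trees(archive_path, out_dir):
    """Write every books/PTHR*/tree.tree member to out_dir/PTHR*.tree.

    Returns the sorted list of file names that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            match = TREE_MEMBER.search(member.name)
            if match is None or not member.isfile():
                continue  # skip everything that is not a family tree file
            target = out_dir / f"{match.group(1)}.tree"
            with archive.extractfile(member) as src:
                target.write_bytes(src.read())
            written.append(target.name)
    return sorted(written)
```

For example, `extract_trees("PANTHER18.0_hmmscoring.tgz", "data/panther-18.0/trees")` would populate the trees folder in one pass without unpacking the other files in the archive.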

step2_expectedness_of_duplications

Used to filter the branches to keep only duplication events. For each branch, the difference from the expected branch length is computed and converted into a z-score. Each pair of branches descending from a duplication can then be classified into one of six categories (p < 0.05):

i) normal-normal: both branches are not significantly different

ii) short-short: both branches are significantly shorter than expected

iii) long-long: both branches are significantly longer than expected

iv) normal-long: only one branch is significantly longer than expected

v) short-normal: only one branch is significantly shorter than expected

vi) short-long: one branch is significantly shorter than expected and the other is significantly longer than expected.
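
The six categories above can be sketched as a small lookup on the two z-scores. This is an illustration only: it assumes a two-sided test at p < 0.05 on a standard normal, which may differ from the exact test used in the notebook, and the names `label_branch` and `classify_pair` are hypothetical.

```python
from statistics import NormalDist

# Ordering used to print the shorter label first, e.g. "short-normal".
ORDER = {"short": 0, "normal": 1, "long": 2}


def label_branch(z, alpha=0.05):
    """Label one branch from its z-score (assumed two-sided threshold)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha=0.05
    if z > z_crit:
        return "long"
    if z < -z_crit:
        return "short"
    return "normal"


def classify_pair(z_a, z_b, alpha=0.05):
    """Return the category name for the two branches after a duplication."""
    labels = sorted(
        (label_branch(z_a, alpha), label_branch(z_b, alpha)),
        key=ORDER.get,
    )
    return "-".join(labels)
```

The order of the two arguments does not matter, so `classify_pair(2.5, 0.0)` and `classify_pair(0.0, 2.5)` both give the normal-long category.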

Also generates Fig 1, Supp Fig 1, Supp Fig 2

Additionally, identifies the genes for the outgroup test.

step3_structure_data

Used to download the structure data and compute the structural alignment.

step4_loading_expression_data-rna_seq

This notebook contains the code that downloads the available expression data from the Bgee database (https://www.bgee.org/) and reformats all the expression data used.

The plant expression data was downloaded from: https://expression.plant.tools/

step5_genome_mapping

Generates genome mapping tables for each of the species. This is necessary because PANTHER genomes are imported from UniProt Reference Proteomes (RPs), whereas Bgee uses Ensembl data directly.

step6_pairwise

This notebook contains the code to run the inparalogue pairwise Pearson's correlation and tissue specificity ($\tau$) tests.
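
As a reminder of what the correlation part of this test computes, here is a minimal pure-Python sketch of Pearson's correlation between the expression profiles of two paralogues. It assumes each profile is a list of values over the same tissues in the same order. The function name `pearson` is illustrative, and the notebook may use a library routine instead.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return math.nan  # undefined for a flat profile
    return cov / math.sqrt(var_x * var_y)
```

Returning NaN for a flat profile is a design choice of this sketch, so that genes with constant expression can be filtered out explicitly rather than raising an error.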

step7_tissue_specificity

This notebook contains the code to compute tissue specificity scores ($\tau$). To do this, the TPM data is first transformed using the arcsinh function, $\textrm{arcsinh}(x) := \ln (x + \sqrt{x^2 + 1})$, before taking the mean value of any replicates.
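
The transformation and score can be sketched as follows. This assumes the commonly used definition $\tau = \sum_i (1 - x_i / x_{\max}) / (n - 1)$ over $n$ tissues, which is not spelled out in this README, so treat it as an assumption. The function names `mean_arcsinh` and `tau` are illustrative.

```python
import math


def mean_arcsinh(replicates):
    """arcsinh-transform each TPM replicate, then average them."""
    return sum(math.asinh(v) for v in replicates) / len(replicates)


def tau(profile):
    """Tissue specificity of one gene from its per-tissue expression.

    profile: transformed expression values, one per tissue (n >= 2).
    Returns 0 for uniform expression and 1 for a single-tissue gene.
    """
    n = len(profile)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    x_max = max(profile)
    if x_max == 0:
        return math.nan  # gene not expressed in any tissue
    return sum(1 - x / x_max for x in profile) / (n - 1)
```

A typical call would first build the per-tissue profile, for example `tau([mean_arcsinh(reps) for reps in tpm_by_tissue.values()])`, where `tpm_by_tissue` is a hypothetical dict mapping each tissue to its list of replicate TPM values.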

step8_outgroup

This notebook identifies, for each branch in the species tree, the relevant species to use as outgroup species.

The LDO / MDO are then compared to the outgroup gene using both a PCC and a $\tau$ analysis.

step9_plots

This notebook contains all the code used to analyze the data and generate the plots presented in the paper.

lib/

This folder contains the modules used to parse the PANTHER trees in step1_expected_branches.

general_scripts/

Scripts used to analyse the data and download specific datasets.

structure_scripts/

Scripts used in step3_structure_data.

A note on software and system requirements

The scripts used for this project have only been tested in an Ubuntu 24.04 environment.

At various steps, these scripts and notebooks call several external tools.

Installation instructions and dependencies for the software used in this project can be found at the following locations:

AlphaFold: [https://alphafold.ebi.ac.uk/] (v4)

foldseek: [https://github.com/steineggerlab/foldseek] (v8.ef4e960)

