Elli-ellgard/SatuTe-example-analyses

SatuTe Example Analyses

SatuTe (Saturation Test) is a Python-based tool designed to evaluate the presence of phylogenetic information in phylogenetic analyses. Saturation occurs when multiple substitutions obscure true genetic distances, potentially leading to artifacts and errors in phylogenetic inference. SatuTe introduces a new measure that extends the concept of saturation between two sequences to a theory of saturation between subtrees. The test implemented in SatuTe quantifies whether a given alignment provides sufficient phylogenetic information shared between two subtrees connected by a branch in a phylogeny.

Using the output from SatuTe, you can perform various downstream analyses to gain deeper insights into the phylogenetic signal by addressing different questions.

1. Repository Structure

The repository is organized as follows:

  • /example/: Contains small example datasets and the output generated by SatuTe. This folder allows you to follow along with the different types of analyses.

  • /scripts/: Includes all scripts required for performing the various analyses, such as per-category, sliding-window, and per-alignment-region analyses. Each type of analysis has its own subfolder, and the scripts can be run to generate examples within the /example/ folder. Note that installation of SatuTe and IQ-TREE is not necessary to run these examples, as the required outputs are already provided.

  • /tree_of_life/: Contains the data and scripts used to generate the outputs presented in the associated paper. This includes detailed instructions and resources for replicating the findings.

2. Prerequisites

Before running any scripts, ensure you have:

  • Python 3.10.12 or higher: Check your Python version with:

    python3 --version
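The same version check can be done from inside Python, which is handy at the top of an analysis script. This is a minimal sketch; the function name `python_is_supported` is illustrative and not part of this repository.

```python
import sys

# Minimum version stated in the prerequisites above.
MIN_VERSION = (3, 10, 12)


def python_is_supported(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor, micro) version is at least `minimum`."""
    return tuple(version_info[:3]) >= tuple(minimum)


if not python_is_supported():
    found = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Python {found} is older than the required 3.10.12")
```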

Installation

  1. (Optional) Create a Virtual Environment:

    python3 -m venv env
    source env/bin/activate
  2. Install Required Packages:

    pip install -r requirements.txt

You're now ready to run the scripts!

Additional Tools for Tree of Life Analysis

If you are planning to run the Tree of Life analysis, you'll need these additional tools:

  • SatuTe
  • IQ-TREE 2

Install SatuTe using pipx

  1. Install pipx: If you don't have pipx installed, you can install it using pip:

    pip install pipx
  2. Ensure pipx is set up correctly:

    pipx ensurepath
  3. Install SatuTe using pipx: Once pipx is installed, you can use it to install SatuTe:

    pipx install satute
  4. Test the installation: After installation, verify that SatuTe is installed correctly by checking its version:

    satute --version

For more detailed instructions and information about pipx, refer to the official pipx documentation.

IQ-TREE 2 Installation

  1. Download IQ-TREE 2 from the official website.

  2. Follow the installation instructions provided on the website for your operating system.

  3. Test the installation: After installing IQ-TREE 2, verify that it works by checking its version:

    iqtree2 --version
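Before starting the Tree of Life analysis, it can help to confirm that both executables are on your `PATH`. The sketch below uses only the standard library and assumes the commands are named `satute` and `iqtree2`, as in the version checks above; the helper name `find_missing_tools` is illustrative.

```python
import shutil


def find_missing_tools(tools=("satute", "iqtree2")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("SatuTe and IQ-TREE 2 are both available.")
```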

3. Types of Analyses

Using the output from SatuTe, you can perform various downstream analyses:

3.1. Per-Category Analysis

If an evolutionary model with rate heterogeneity is used, each site is assigned to the rate category with the highest posterior probability. For each category $c$, SatuTe applies the test for phylogenetic information on the rescaled phylogenetic tree and the subalignment of the considered category. During this process, SatuTe calculates the variance estimator $\hat{\sigma}^2_{1,c}$ for each category $c$, enabling you to compare the phylogenetic signals present in each rate category.
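The site-to-category assignment described above can be sketched as follows. This is an illustration only: the input layout (one list of posterior probabilities per site) and the function name are assumptions, not SatuTe's internal data structures, and the numbers are made up.

```python
from collections import defaultdict


def assign_sites_to_categories(posteriors):
    """Group site indices by the rate category with the highest posterior.

    `posteriors` holds one sequence of probabilities per site, with one
    entry per rate category. Ties go to the lowest category index.
    """
    groups = defaultdict(list)
    for site, probs in enumerate(posteriors):
        best_category = max(range(len(probs)), key=probs.__getitem__)
        groups[best_category].append(site)
    return dict(groups)


# Made-up posteriors for four sites and three rate categories.
example_posteriors = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
    [0.6, 0.3, 0.1],
]
print(assign_sites_to_categories(example_posteriors))
```

Each resulting group of sites corresponds to one subalignment on which the test would be run.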

3.2. Branch-Specific Sliding-Window Analysis

SatuTe also supports branch-specific analyses. To gain a more detailed understanding of changes in phylogenetic information, you can perform a sliding-window analysis with a specific window size. This approach is effective in detecting a minority of sites affected by saturation.
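The windowing itself can be sketched in a few lines. The example assumes you already have one numeric value per alignment site for the branch of interest (a hypothetical input) and summarises each window with a plain mean; this summary is an illustrative choice, not SatuTe's test statistic.

```python
def sliding_window_means(values, window_size, step=1):
    """Return (start_index, mean) for each full window over `values`."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    results = []
    # Only full windows are reported; a trailing partial window is skipped.
    for start in range(0, len(values) - window_size + 1, step):
        window = values[start:start + window_size]
        results.append((start, sum(window) / window_size))
    return results


# Made-up per-site values, window of 3 sites moved one site at a time.
print(sliding_window_means([1, 2, 3, 4, 5], window_size=3))
```

A smaller window localises affected sites more precisely but leaves fewer sites per window, so the window size is a trade-off you choose for your data.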

3.3. Per-Alignment Region Analysis

When the alignment is composite (for example, a concatenation of different genes, proteins, or other partitions), a key question is whether the selected alignment regions are phylogenetically informative within the reconstructed tree topology. A per-alignment-region analysis can help address this question.
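Splitting a concatenated alignment into its regions might look like the sketch below. It assumes the alignment is held as a dictionary of taxon name to sequence string and that region boundaries are given as 0-based, half-open column ranges; both the layout and the function name are hypothetical, not the format used by the scripts in this repository.

```python
def split_alignment_by_region(alignment, regions):
    """Return one sub-alignment per region.

    `alignment` maps taxon name to an aligned sequence string.
    `regions` maps region name to a (start, end) column range,
    0-based and half-open.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same length")
    total = lengths.pop() if lengths else 0
    sub_alignments = {}
    for name, (start, end) in regions.items():
        if not 0 <= start < end <= total:
            raise ValueError(f"region {name!r} is outside the alignment")
        sub_alignments[name] = {
            taxon: seq[start:end] for taxon, seq in alignment.items()
        }
    return sub_alignments


# Toy alignment with two regions.
toy_alignment = {"A": "ACGTAC", "B": "ACGTTT"}
toy_regions = {"gene1": (0, 4), "gene2": (4, 6)}
print(split_alignment_by_region(toy_alignment, toy_regions))
```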

3.4. Z-Score Differences

By comparing the z-scores obtained from different branches, you can identify potential information loss and examine the differences. For instance, you might explore per-region z-score differences between an external branch and an internal branch. Beyond branch comparison, z-score differences can also help determine whether each region supports one of two given topologies.
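As a sketch of the comparison, the per-region difference between two branches is a simple subtraction over the regions both branches share. The dictionaries and z-score values below are invented for illustration and are not results from the paper or from SatuTe.

```python
def zscore_differences(z_branch_a, z_branch_b):
    """Return z_a - z_b for every region present in both inputs."""
    shared_regions = z_branch_a.keys() & z_branch_b.keys()
    return {
        region: z_branch_a[region] - z_branch_b[region]
        for region in sorted(shared_regions)
    }


# Invented per-region z-scores for an external and an internal branch.
external_branch = {"gene1": 5.0, "gene2": 1.5}
internal_branch = {"gene1": 3.0, "gene2": 2.0, "gene3": 0.1}
print(zscore_differences(external_branch, internal_branch))
```

A positive difference means the first branch has the larger z-score in that region; regions missing from either input are left out rather than treated as zero.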
