
Data processing and analysis code for transposon mutagenesis sequencing of the Lenski Long-Term Evolution Experiment (LTEE)


baymlab/2022_Limdi-TnSeq-LTEE


2022_Limdi-TnSeq-LTEE

About

Code corresponding to the following paper:

Parallel changes in gene essentiality over 50,000 generations of evolution

Anurag Limdi, Sian V. Owen, Cristina Herren, Richard E. Lenski and Michael Baym (https://doi.org/10.1101/2022.05.17.492023)

In this project, we performed transposon sequencing (TnSeq) of the ancestors and of clones evolved for 50,000 generations to identify how the distribution of fitness effects and gene essentiality change over evolution.

Linked Datasets

Organization

This repository is organized as follows:

Data

This folder is empty: please download the data from https://doi.org/10.5281/zenodo.6547536. These processed data are generated by the scripts in Part 1: Data to Trajectories and are required for generating the final figures and running the analyses.

The data are contained in a single file, ltee-tnseq-processed-data.zip. Please unzip this file and move the Mutant_Trajectories and WGS_Data directories into this folder.
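The unzip-and-move step can also be scripted. The Python sketch below is hypothetical (install_processed_data is not part of this repository). It assumes the archive has already been downloaded from Zenodo and that the Mutant_Trajectories and WGS_Data directories appear somewhere inside it. Because their exact nesting in the archive is not stated here, it searches for them rather than hard-coding paths.

```python
"""Hypothetical helper: unpack ltee-tnseq-processed-data.zip into the Data folder.

Assumes the archive is already downloaded and that the Mutant_Trajectories and
WGS_Data directories appear somewhere inside it (nesting is not assumed).
"""
import shutil
import tempfile
import zipfile
from pathlib import Path

WANTED = ("Mutant_Trajectories", "WGS_Data")


def install_processed_data(zip_path, data_dir):
    """Extract the archive to a temp dir and move the wanted folders into data_dir."""
    data_dir = Path(data_dir)
    data_dir.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tmp)
        for name in WANTED:
            # Find every directory with the expected name, at any depth.
            matches = [p for p in Path(tmp).rglob(name) if p.is_dir()]
            if not matches:
                raise FileNotFoundError(f"{name} not found in {zip_path}")
            # Prefer the shallowest match in case the name is repeated deeper down.
            source = min(matches, key=lambda p: len(p.parts))
            target = data_dir / name
            if target.exists():
                raise FileExistsError(f"{target} already exists; not overwriting")
            shutil.move(str(source), str(target))
    return [data_dir / name for name in WANTED]
```

From the repository root, a call such as install_processed_data("ltee-tnseq-processed-data.zip", "Data") would populate the Data folder. The helper refuses to overwrite existing directories, so a second run cannot clobber data that is already in place.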

Metadata

This folder contains the relevant metadata for analysis, including gene names, locations, reference genomes, etc.

It contains the following datasets from previously published papers:

Analysis

This section contains all the scripts required to go from the raw .fastq sequencing data to the final figures and analyses in the paper. It is subdivided into three parts:

  • Part 1: Data to Trajectories. Here, I process the raw sequencing reads, map them to the reference genome, and compile a final table containing the number of reads mapping to each insertion at every timepoint in the fitness assay.
  • Part 2: Whole Genome Sequencing Analysis. I analyse the raw WGS data using breseq and samtools and identify large duplications and deletions.
  • Part 3: TnSeq Analysis. I calculate the fitness effects of insertion mutations, infer gene essentiality from mutant trajectories, and carry out additional exploratory analyses of how gene essentiality relates to gene expression levels and to the presence or absence of homologs. In this section, I also create the final figures and analyses that go into the manuscript.
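The final step of Part 1, compiling the number of reads mapping to each insertion at every timepoint, is essentially a group-and-count. The sketch below is illustrative only. It assumes each mapped read has already been reduced to a hypothetical (timepoint, position, strand) record, which may not match the formats the Part 1 scripts actually use.

```python
"""Hypothetical sketch of the last Part 1 step: tallying mapped reads per
insertion site at each fitness-assay timepoint.

Input records are assumed to be (timepoint, position, strand) tuples, one per
mapped read; the real pipeline's file formats may differ.
"""
from collections import Counter


def build_count_table(records, timepoints):
    """Return {(position, strand): [read count at t for t in timepoints]}."""
    counts = Counter()
    sites = set()
    for timepoint, position, strand in records:
        site = (position, strand)
        counts[(site, timepoint)] += 1
        sites.add(site)
    # A site with no reads at a timepoint gets an explicit zero, so every
    # trajectory has one entry per timepoint (Counter returns 0 for missing keys).
    return {site: [counts[(site, t)] for t in timepoints] for site in sorted(sites)}
```

For example, the reads [(0, 100, "+"), (0, 100, "+"), (0, 250, "-"), (1, 100, "+"), (2, 250, "-")] with timepoints [0, 1, 2] give {(100, "+"): [2, 1, 0], (250, "-"): [1, 0, 1]}. Explicit zeros matter because a strongly deleterious insertion can disappear from the population partway through the assay.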

Note: each of these folders contains a README.md describing in more detail what each script or Jupyter notebook does, plus any other relevant information.
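One common way to turn a mutant trajectory into a fitness effect is to regress the log of the mutant's frequency on time. If the mutant is rare and most of the library is close to neutral, the slope approximates its selection coefficient per generation. This is a generic sketch and not necessarily the estimator used in Part 3. The function name, the pseudocount of 0.5 (which avoids log(0) when a mutant drops out), and the use of generations as the time unit are all assumptions.

```python
"""Hypothetical sketch: fitness effect of one insertion mutant as the
least-squares slope of ln(frequency) against time in generations.

Not necessarily the estimator used in Part 3; the pseudocount and time
units are illustrative assumptions.
"""
import math


def fitness_slope(counts, totals, generations, pseudocount=0.5):
    """Return the OLS slope of ln((count + pseudocount) / total) vs generations."""
    if not (len(counts) == len(totals) == len(generations)):
        raise ValueError("counts, totals and generations must have equal length")
    if len(counts) < 2:
        raise ValueError("need at least two timepoints")
    # Log frequency of the mutant at each timepoint; the pseudocount keeps
    # the log finite when the mutant has zero reads.
    ys = [math.log((c + pseudocount) / total) for c, total in zip(counts, totals)]
    xs = list(generations)
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("timepoints must not all be identical")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    return sxy / sxx
```

A negative slope means the insertion mutant declines relative to the rest of the library. A mutant that doubles in frequency every generation, such as counts [100, 200, 400] out of 1000 reads at generations [0, 1, 2] with no pseudocount, gives a slope of ln 2.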

Other directories:

  • Plots_for_Paper: Contains all the figure panels that I assemble and annotate in Illustrator to get the final figures
  • Supplementary Tables: Contains supporting information for the analyses done in the paper
