License: GPL v3

Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq

The blueprints to development, response to the environment, and cellular function are largely the manifestation of distinct gene expression programs controlled by the spatiotemporal activity of cis-regulatory elements. Although biochemical methods for identifying accessible chromatin – a hallmark of cis-regulatory elements – have been developed, approaches capable of measuring and quantifying cis-regulatory activity are only beginning to be realized. Massively Parallel Reporter Assays coupled to chromatin accessibility profiling present a high-throughput solution for testing the transcription-activating capacity of millions of putatively regulatory DNA sequences. However, clear computational pipelines for analyzing these high-throughput sequencing-based reporter assays are lacking. In this protocol, I lay out and rationalize a computational framework for the processing and analysis of Assay for Transposase Accessible Chromatin profiling followed by Self-Transcribed Active Regulatory Region sequencing (ATAC-STARR-seq) data from a recent study in Zea mays. The approach described herein can be adapted to other sequencing-based reporter assays and is largely agnostic to the model organism.

Software dependencies

BWA MEM (see install instructions here)
SAMtools
BEDtools
SRA-toolkit
fastp
MACS2
UCSC binaries
tabix
IGV
MEME
CrossMap
DeepTools

Input data

The computational pipeline uses paired-end sequencing data from an ATAC-STARR-seq experiment performed on maize protoplasts (Ricci et al., 2019). The ATAC-STARR-seq experiment consisted of a DNA input (ATAC-seq library) and an mRNA readout (self-transcribed regulatory regions) to identify genomic regions exhibiting transcription-activating regulatory activity.

  1. Transfected ATAC-seq DNA-input FASTQ
  2. Transcribed ATAC-seq mRNA FASTQ

Procedure

Additional details can be found in the paper.

  1. Download data
# download FASTQ files from SRA (requires SRA-toolkit)
mkdir FASTQ_files
cd FASTQ_files
fasterq-dump -o B73_maize_DNA_input.fastq SRR10964904
fasterq-dump -o B73_maize_mRNA_output.fastq SRR10964905

# compress fastq files
pigz *.fastq

# NOT RUN
# Tip: gzip can be used as an alternative to pigz (parallel gzip)
# gzip *.fastq

# download reference data
cd ../
mkdir Genome_Reference
cd Genome_Reference
wget https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0.fa.gz
wget https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3.gz

# create indices for reference genome FASTA
gunzip Zm-B73-REFERENCE-NAM-5.0.fa.gz
samtools faidx Zm-B73-REFERENCE-NAM-5.0.fa
bwa index Zm-B73-REFERENCE-NAM-5.0.fa
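
Before moving on, it can help to confirm that both indexing commands finished, since an interrupted `bwa index` leaves a partial set of files. The helper below is a minimal sketch and not part of the repository: the function name `check_reference_index` is hypothetical, and it only checks that the `.fai` file from `samtools faidx` and the five files written by `bwa index` exist and are non-empty.

```shell
# Hypothetical helper (not part of this repository): report missing index files.
check_reference_index() {
    local ref="$1" missing=0 ext
    # .fai is written by `samtools faidx`; the other five by `bwa index`
    for ext in fai amb ann bwt pac sa; do
        if [ ! -s "${ref}.${ext}" ]; then
            echo "missing: ${ref}.${ext}" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (run inside Genome_Reference/):
# check_reference_index Zm-B73-REFERENCE-NAM-5.0.fa && echo "reference is indexed"
```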
  1. Trim adapters and remove low quality reads
# run step 1
sbatch step01_trim_raw_reads.sh
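
The contents of `step01_trim_raw_reads.sh` are not reproduced here. As a rough illustration of what a trimming step with fastp can look like, the sketch below wraps one paired-end sample in a function. Everything about it is an assumption: the function name `trim_pair`, the `Trimmed/` output directory, and the `_1`/`_2` file suffixes (which presume `fasterq-dump` split the paired reads into two files). Setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical sketch; paths and file suffixes are assumptions, not the repository's layout.
trim_pair() {
    local sample="$1" threads="${2:-4}"
    # create the output directory unless we are only printing the command
    [ -n "${DRY_RUN:-}" ] || mkdir -p Trimmed
    ${DRY_RUN:+echo} fastp \
        --in1 "FASTQ_files/${sample}_1.fastq.gz" \
        --in2 "FASTQ_files/${sample}_2.fastq.gz" \
        --out1 "Trimmed/${sample}_1.trim.fastq.gz" \
        --out2 "Trimmed/${sample}_2.trim.fastq.gz" \
        --detect_adapter_for_pe \
        --thread "$threads" \
        --json "Trimmed/${sample}.fastp.json" \
        --html "Trimmed/${sample}.fastp.html"
}

# Usage:
# DRY_RUN=1 trim_pair B73_maize_DNA_input 8   # print the command
# trim_pair B73_maize_mRNA_output 8           # run it
```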
  1. Align and process sequenced reads
# run step 2
sbatch step02_align_STARR_data.sh
  1. Extract fragments
# run step 3
sbatch step03_extract_fragments.sh
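
For readers who want to see what "extracting fragments" means in practice, here is a minimal sketch that turns properly paired alignments into one BED interval per sequenced fragment. It is an assumption about the general technique, not a copy of `step03_extract_fragments.sh`: the function name `sam_to_fragments` and the default MAPQ cutoff of 20 are hypothetical, and the MAPQ filter is applied only to the leftmost mate of each pair.

```shell
# Hypothetical helper: read SAM records on stdin, write BED fragments on stdout.
sam_to_fragments() {
    local min_mapq="${1:-20}"
    awk -v q="$min_mapq" 'BEGIN { FS = OFS = "\t" }
        /^@/ { next }   # skip SAM header lines
        # keep proper pairs (flag bit 0x2), MAPQ >= q, leftmost mate only (TLEN > 0)
        int($2 / 2) % 2 == 1 && $5 >= q && $9 > 0 {
            start = $4 - 1                  # SAM POS is 1-based, BED start is 0-based
            print $3, start, start + $9     # fragment spans TLEN bases from the left mate
        }'
}

# Usage (file names are placeholders):
# samtools view -h sample.bam | sam_to_fragments 20 | sort -k1,1 -k2,2n > sample.fragments.bed
```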
  1. Call peaks
# run step 4
sbatch step04_call_peaks.sh
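
The exact peak-calling settings live in `step04_call_peaks.sh`. Purely as an illustration, a paired-end MACS2 call could be wrapped as below. The function name `call_peaks`, the `BAM_files/` input directory, the q-value cutoff, and the genome size of `2.1e9` are all placeholder assumptions (an effective genome size for maize should be chosen deliberately). Setting `DRY_RUN=1` prints the command instead of running it.

```shell
# Hypothetical sketch; input path, genome size, and cutoff are placeholders.
call_peaks() {
    local sample="$1" genome_size="${2:-2.1e9}"
    ${DRY_RUN:+echo} macs2 callpeak \
        -t "BAM_files/${sample}.bam" \
        -f BAMPE \
        -g "$genome_size" \
        -n "$sample" \
        --outdir Peak_data \
        --keep-dup all \
        -q 0.05
}

# Usage:
# DRY_RUN=1 call_peaks B73_maize_DNA_input
```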
  1. Estimate enhancer activity
# run step 5
sbatch step05_estimate_enhancer_activity.sh

# estimate enhancer activity
cd BED_files/
Rscript Estimate_Enhancer_Activity.R
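
`Estimate_Enhancer_Activity.R` is not shown here, so the following is only a sketch of one common way to express STARR-seq activity: the log2 ratio of RNA output to DNA input after scaling each to counts per million. The function name `activity_bedgraph`, the five-column input layout, and the pseudocount of 1 are assumptions made for illustration.

```shell
# Hypothetical helper.
# stdin:  chrom, start, end, DNA fragment count, RNA fragment count (tab-separated)
# args:   total DNA fragments, total RNA fragments, pseudocount (default 1)
# stdout: bedGraph with log2(RNA CPM / DNA CPM) in column 4
activity_bedgraph() {
    local dna_total="$1" rna_total="$2" pseudo="${3:-1}"
    awk -v nd="$dna_total" -v nr="$rna_total" -v pc="$pseudo" 'BEGIN { FS = OFS = "\t" }
        {
            dna = ($4 + pc) / nd * 1e6     # DNA input, counts per million
            rna = ($5 + pc) / nr * 1e6     # RNA output, counts per million
            print $1, $2, $3, sprintf("%.4f", log(rna / dna) / log(2))
        }'
}

# Usage (file name and totals are placeholders):
# activity_bedgraph 25000000 18000000 < counts.tsv > activity.bdg
```

The pseudocount keeps the ratio defined in windows with no RNA or no DNA coverage, at the cost of shrinking estimates toward zero in low-coverage windows.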
  1. Filter noisy STARR peaks using empirical FDR
# run step 6
sbatch step06_create_control_regions.sh

# create directory to contain analysis
cd ../
mkdir 01_Peak_Analysis
cd 01_Peak_Analysis

# map maximum enhancer activity to putative regulatory regions (wdups)
bedtools map -a ../Peak_data/STARR_merged_peaks.bed -b ../BED_files/B73_maize.enhancer_activity.bdg -o max -c 4 > STARR_merged_peaks.enhancer_activity.bed

# map maximum enhancer activity to control 
bedtools map -a ../Peak_data/STARR_CONTROL.bed -b ../BED_files/B73_maize.enhancer_activity.bdg -o max -c 4 > STARR_CONTROL.enhancer_activity.bed

# run eFDR filter
Rscript eFDR_Filter_STARR_Peaks.R
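
The idea behind an empirical FDR filter can be sketched in a few lines: take the activity values of the control regions, find the value that only a fraction `alpha` of controls exceed, and keep the peaks above it. The code below is an illustrative assumption and may differ from what `eFDR_Filter_STARR_Peaks.R` does. The function names `efdr_threshold` and `filter_peaks` are hypothetical, and both expect the activity score in the last column, as written by `bedtools map` (which reports `.` for regions with no overlap).

```shell
# Hypothetical helpers; activity is read from the last column of each BED file.
efdr_threshold() {   # args: control BED, FDR level alpha (default 0.05)
    local control="$1" alpha="${2:-0.05}"
    awk '$NF != "." { print $NF }' "$control" | LC_ALL=C sort -g |
        awk -v a="$alpha" '{ v[NR] = $1 }
            END {
                if (NR == 0) exit 1
                x = NR * (1 - a)
                k = int(x); if (k < x) k++      # ceiling, so at most alpha of controls exceed v[k]
                if (k < 1) k = 1
                print v[k]
            }'
}

filter_peaks() {     # args: peak BED, activity threshold
    awk -v t="$2" '$NF != "." && $NF + 0 > t + 0' "$1"
}

# Usage:
# thr=$(efdr_threshold STARR_CONTROL.enhancer_activity.bed 0.05)
# filter_peaks STARR_merged_peaks.enhancer_activity.bed "$thr" > STARR_peaks.eFDR05.bed
```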

  1. Plot heatmaps
# run step 7
sbatch step07_plot_enhancer_activity.sh
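
As a hedged sketch of how such a heatmap can be produced with the UCSC binaries and DeepTools from the dependency list: convert a sorted bedGraph to bigWig, build a matrix centered on each peak, then plot it. The function name `plot_activity_heatmap`, the 1 kb flanks, and the output names are assumptions, and `step07_plot_enhancer_activity.sh` may do this differently. The chromosome sizes file can be derived from the FASTA index with `cut -f1,2 Zm-B73-REFERENCE-NAM-5.0.fa.fai`. Setting `DRY_RUN=1` prints the commands instead of running them.

```shell
# Hypothetical sketch; flank sizes and output names are placeholders.
plot_activity_heatmap() {
    local bdg="$1" peaks="$2" chrom_sizes="$3" prefix="$4"
    local run="${DRY_RUN:+echo}"
    # bedGraphToBigWig expects a coordinate-sorted bedGraph
    $run bedGraphToBigWig "$bdg" "$chrom_sizes" "${prefix}.bw"
    # signal in a +/- 1 kb window around the center of each region
    $run computeMatrix reference-point --referencePoint center \
        -S "${prefix}.bw" -R "$peaks" -b 1000 -a 1000 -o "${prefix}.matrix.gz"
    $run plotHeatmap -m "${prefix}.matrix.gz" -o "${prefix}.heatmap.pdf"
}

# Usage:
# DRY_RUN=1 plot_activity_heatmap activity.bdg peaks.bed chrom.sizes enhancer_activity
```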

Downstream analysis and expected results

See the paper for downstream analyses and expected results.