Computational Analysis of Maize Enhancer Regulatory Elements Using ATAC-STARR-seq

The blueprints of development, environmental responses, and cellular function are largely the manifestation of distinct gene expression programs controlled by the spatiotemporal activity of cis-regulatory elements. Biochemical methods for identifying accessible chromatin, a hallmark of cis-regulatory elements, are well established. Approaches that can measure and quantify cis-regulatory activity, however, are only beginning to be realized. Massively Parallel Reporter Assays (MPRAs) coupled to chromatin accessibility profiling offer a high-throughput way to test the transcription-activating capacity of millions of putative regulatory DNA sequences. However, clear computational pipelines for analyzing these sequencing-based reporter assays are lacking. In this protocol, I lay out and rationalize a computational framework for processing and analyzing data from a recent study in Zea mays. That study used Assay for Transposase-Accessible Chromatin profiling followed by Self-Transcribed Active Regulatory Region sequencing (ATAC-STARR-seq). The approach described here can be adapted to other sequencing-based reporter assays and is largely agnostic to the model organism.

Software dependencies

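BWA (BWA-MEM algorithm)
SAMtools
BEDtools
SRA Toolkit
fastp
MACS2
UCSC binaries
tabix
IGV
MEME
CrossMap
deepTools
pigz (optional; gzip works as an alternative)
R (for the Rscript steps)
Slurm (the step scripts are submitted with sbatch)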

Input data

The computational pipeline uses paired-end sequencing data from an ATAC-STARR-seq experiment performed on maize protoplasts (Ricci et al., 2019). The experiment consisted of two libraries:

- a DNA input (the transfected ATAC-seq library);
- an mRNA readout (self-transcribed regulatory regions).

Comparing the two identifies genomic regions with transcription-activating regulatory activity.

Transfected ATAC-seq DNA input FASTQ (SRA accession SRR10964904)
Transcribed ATAC-seq mRNA FASTQ (SRA accession SRR10964905)

Procedure

Additional details can be found in the paper.

Download data

# create a directory and download the paired-end FASTQ files from SRA
mkdir FASTQ_files
cd FASTQ_files
fasterq-dump -o B73_maize_DNA_input.fastq SRR10964904
fasterq-dump -o B73_maize_mRNA_output.fastq SRR10964905

# compress fastq files
pigz *.fastq

# NOT RUN
# Tip: gzip can be used as an alternative to pigz (parallel gzip)
# gzip *.fastq

# download reference data
cd ../
mkdir Genome_Reference
cd Genome_Reference
wget https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0.fa.gz
wget https://download.maizegdb.org/Zm-B73-REFERENCE-NAM-5.0/Zm-B73-REFERENCE-NAM-5.0_Zm00001eb.1.gff3.gz

# create indices for reference genome FASTA
gunzip Zm-B73-REFERENCE-NAM-5.0.fa.gz
samtools faidx Zm-B73-REFERENCE-NAM-5.0.fa
bwa index Zm-B73-REFERENCE-NAM-5.0.fa

Trim adapters and remove low quality reads

# run step 1
sbatch step01_trim_raw_reads.sh
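The contents of step01_trim_raw_reads.sh are not reproduced here. The following is a minimal sketch of a fastp-based trimming step, not the author's script. It rests on these assumptions:

- fasterq-dump split each run into <prefix>_1.fastq and <prefix>_2.fastq, which pigz then compressed;
- the output names and thresholds (-q 20, -l 25) are illustrative;
- the DRY_RUN switch and the helper names (run, trim_sample) are hypothetical conveniences for printing commands before running them.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a command, or only print it when DRY_RUN=1 (useful for checking paths).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Trim adapters and drop low-quality or short reads for one paired-end sample.
# Assumed inputs: <prefix>_1.fastq.gz and <prefix>_2.fastq.gz.
# Output names and thresholds are illustrative, not the published settings.
trim_sample() {
  local prefix=$1 threads=${2:-8}
  run fastp \
    -i "${prefix}_1.fastq.gz" -I "${prefix}_2.fastq.gz" \
    -o "${prefix}_1.trim.fastq.gz" -O "${prefix}_2.trim.fastq.gz" \
    --detect_adapter_for_pe \
    -q 20 -l 25 \
    -w "$threads" \
    -h "${prefix}.fastp.html" -j "${prefix}.fastp.json"
}

# Example (from the project root):
#   for s in B73_maize_DNA_input B73_maize_mRNA_output; do
#     trim_sample "FASTQ_files/${s}" 8
#   done
```

Running with DRY_RUN=1 first prints the exact fastp command line for each sample, so file names can be checked before the job is submitted.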

Align and process sequenced reads

# run step 2
sbatch step02_align_STARR_data.sh
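The step02_align_STARR_data.sh script is not shown. A minimal sketch of a typical BWA-MEM plus SAMtools alignment step follows, under these assumptions:

- the trimmed files follow the hypothetical *_1.trim.fastq.gz / *_2.trim.fastq.gz naming;
- only properly paired reads with MAPQ >= 10 are kept, which is an illustrative cutoff;
- duplicates are not removed, assuming the "(wdups)" label used later in the eFDR step means the analysis keeps duplicates.

The run_pipeline and align_sample helpers are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a shell pipeline given as one string, or only print it when DRY_RUN=1.
run_pipeline() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# Align trimmed pairs with BWA-MEM, coordinate-sort, keep properly paired
# reads (-f 2) with MAPQ >= 10 (assumed cutoff), and index the result.
# Duplicates are deliberately kept in this sketch (assumption).
align_sample() {
  local ref=$1 prefix=$2 threads=${3:-8}
  run_pipeline "bwa mem -t ${threads} ${ref} ${prefix}_1.trim.fastq.gz ${prefix}_2.trim.fastq.gz | samtools sort -@ ${threads} -o ${prefix}.sorted.bam -"
  run_pipeline "samtools view -b -f 2 -q 10 -o ${prefix}.filtered.bam ${prefix}.sorted.bam"
  run_pipeline "samtools index ${prefix}.filtered.bam"
}

# Example:
#   align_sample Genome_Reference/Zm-B73-REFERENCE-NAM-5.0.fa \
#     FASTQ_files/B73_maize_DNA_input 8
```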

Extract fragments

# run step 3
sbatch step03_extract_fragments.sh
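For paired-end ATAC-style data, "extracting fragments" usually means collapsing each read pair into a single interval that spans both mates. The following is a minimal sketch of that idea, not the contents of step03_extract_fragments.sh. It assumes the BAM is first name-sorted, since bedtools bamtobed -bedpe requires name-sorted input, and that its BEDPE output is piped through the hypothetical bedpe_to_fragments function. Pairs whose mates map to different chromosomes are skipped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Convert BEDPE records (chrom1 start1 end1 chrom2 start2 end2 ...) into one
# BED3 interval per fragment, spanning from the leftmost start to the
# rightmost end of the two mates. Inter-chromosomal or unmapped pairs are dropped.
bedpe_to_fragments() {
  awk 'BEGIN { OFS = "\t" }
       $1 == $4 && $1 != "." {
         start = ($2 < $5) ? $2 : $5
         end   = ($3 > $6) ? $3 : $6
         print $1, start, end
       }'
}

# Example (hypothetical file names):
#   samtools sort -n -o sample.nsort.bam sample.filtered.bam
#   bedtools bamtobed -bedpe -i sample.nsort.bam | bedpe_to_fragments \
#     | sort -k1,1 -k2,2n > sample.fragments.bed
```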

Call peaks

# run step 4
sbatch step04_call_peaks.sh
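The step04_call_peaks.sh script is not shown. The eFDR step later reads Peak_data/STARR_merged_peaks.bed, which suggests that peaks from the individual libraries are merged. The following sketch assumes that is how that file is built: MACS2 is run per library on the fragment BED files, and the resulting narrowPeak files are merged with bedtools. Several parameters are assumptions:

- the maize effective genome size (1.6e9);
- the q-value cutoff (0.05);
- --keep-dup all, to match the duplicate-keeping assumption.

The helper names are hypothetical. With -f BEDPE, MACS2 treats the first three columns as the fragment coordinates.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Assumed effective genome size for maize; override via the environment.
GENOME_SIZE=${GENOME_SIZE:-1.6e9}

# Run a shell pipeline given as one string, or only print it when DRY_RUN=1.
run_pipeline() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# Call peaks on fragment intervals (columns 1-3) for one library.
call_peaks() {
  local frags=$1 name=$2 outdir=${3:-Peak_data}
  run_pipeline "macs2 callpeak -t ${frags} -f BEDPE -g ${GENOME_SIZE} -n ${name} --keep-dup all -q 0.05 --outdir ${outdir}"
}

# Merge overlapping peaks from several narrowPeak files into one BED3 file.
merge_peaks() {
  local out=$1
  shift
  run_pipeline "cat $* | cut -f 1-3 | sort -k1,1 -k2,2n | bedtools merge -i stdin > ${out}"
}

# Example:
#   call_peaks B73_maize_DNA_input.fragments.bed DNA_input
#   call_peaks B73_maize_mRNA_output.fragments.bed mRNA_output
#   merge_peaks Peak_data/STARR_merged_peaks.bed \
#     Peak_data/DNA_input_peaks.narrowPeak Peak_data/mRNA_output_peaks.narrowPeak
```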

Estimate enhancer activity

# run step 5
sbatch step05_estimate_enhancer_activity.sh

# estimate enhancer activity
cd BED_files/
Rscript Estimate_Enhancer_Activity.R
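Estimate_Enhancer_Activity.R is not reproduced here. A common way to express STARR-seq enhancer activity is the log2 ratio of depth-normalized RNA signal to DNA signal. In each bin, counts are scaled to counts per million (CPM) and a pseudocount pc is added:

activity = log2((RNA_CPM + pc) / (DNA_CPM + pc))

The following is a minimal sketch of that formulation, not necessarily what the R script computes. It assumes two hypothetical 4-column inputs (chrom, start, end, fragment count) covering the same bins in the same order, and the enhancer_activity function name is also hypothetical. The output is a bedGraph-like table with activity in column 4, the column that the bedtools map commands below aggregate with -c 4.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Per-bin activity = log2((RNA CPM + pc) / (DNA CPM + pc)).
# Bins are taken from the DNA file; bins missing from the RNA file count as 0.
enhancer_activity() {
  local dna=$1 rna=$2 pc=${3:-1}
  awk -v pc="$pc" 'BEGIN { OFS = "\t" }
    FNR == NR { key = $1 OFS $2 OFS $3; dna[key] = $4; dna_total += $4; order[++n] = key; next }
    { key = $1 OFS $2 OFS $3; rna[key] = $4; rna_total += $4 }
    END {
      for (i = 1; i <= n; i++) {
        key = order[i]
        dna_cpm = dna[key] / dna_total * 1e6
        rna_cpm = ((key in rna) ? rna[key] : 0) / rna_total * 1e6
        print key, log((rna_cpm + pc) / (dna_cpm + pc)) / log(2)
      }
    }' "$dna" "$rna"
}

# Example (hypothetical file names):
#   enhancer_activity DNA_input.bin_counts.bdg mRNA_output.bin_counts.bdg 1 \
#     > B73_maize.enhancer_activity.bdg
```

The pseudocount keeps bins with no RNA signal finite. Its value shifts low-count bins strongly, so it is worth matching whatever the original R script uses.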

Filter noisy STARR peaks using empirical FDR

# run step 6
sbatch step06_create_control_regions.sh

# create directory to contain analysis
cd ../
mkdir 01_Peak_Analysis
cd 01_Peak_Analysis

# map maximum enhancer activity to putative regulatory regions (wdups)
bedtools map -a ../Peak_data/STARR_merged_peaks.bed -b ../BED_files/B73_maize.enhancer_activity.bdg -o max -c 4 > STARR_merged_peaks.enhancer_activity.bed

# map maximum enhancer activity to control regions
bedtools map -a ../Peak_data/STARR_CONTROL.bed -b ../BED_files/B73_maize.enhancer_activity.bdg -o max -c 4 > STARR_CONTROL.enhancer_activity.bed

# run eFDR filter
Rscript eFDR_Filter_STARR_Peaks.R
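eFDR_Filter_STARR_Peaks.R is not reproduced here. One common definition of an empirical FDR is assumed here rather than taken from that script. For an activity threshold t:

eFDR(t) = (fraction of control regions with activity >= t) / (fraction of peaks with activity >= t)

The retained threshold is the smallest t with eFDR(t) <= alpha.

The sketch below reads the two bedtools map outputs above, where the mapped maximum activity is the last column. It ignores rows reported as "." (no overlapping signal). The function names are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print the smallest activity threshold t with eFDR(t) <= alpha, or NA.
efdr_threshold() {
  local peaks=$1 control=$2 alpha=${3:-0.05}
  local n_peak n_ctrl
  n_peak=$(awk '$NF != "."' "$peaks" | wc -l)
  n_ctrl=$(awk '$NF != "."' "$control" | wc -l)
  {
    awk '$NF != "." { print $NF "\tpeak" }' "$peaks"
    awk '$NF != "." { print $NF "\tctrl" }' "$control"
  } | sort -k1,1gr | awk -v np="$n_peak" -v nc="$n_ctrl" -v alpha="$alpha" '
    # Evaluated once every value >= prev has been counted, so ties are grouped.
    function check() {
      if (seen_p > 0 && nc > 0 && (seen_c / nc) / (seen_p / np) <= alpha) thr = prev
    }
    NR > 1 && $1 != prev { check() }
    { if ($2 == "peak") seen_p++; else seen_c++; prev = $1 }
    END { check(); print (thr == "" ? "NA" : thr) }'
}

# Keep peaks whose mapped activity is at or above the threshold.
filter_peaks() {
  local peaks=$1 thr=$2
  [ "$thr" = "NA" ] && return 0
  awk -v t="$thr" '$NF != "." && $NF + 0 >= t + 0' "$peaks"
}

# Example (inside 01_Peak_Analysis; output name is hypothetical):
#   thr=$(efdr_threshold STARR_merged_peaks.enhancer_activity.bed \
#         STARR_CONTROL.enhancer_activity.bed 0.05)
#   filter_peaks STARR_merged_peaks.enhancer_activity.bed "$thr" \
#     > STARR_merged_peaks.eFDR_filtered.bed
```

eFDR(t) is not guaranteed to be monotonic in t. The scan therefore checks every distinct value and keeps the lowest threshold that passes, rather than stopping at the first failure.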

Plot heatmaps

# run step 7
sbatch step07_plot_enhancer_activity.sh
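The step07_plot_enhancer_activity.sh script is not shown. Given the listed dependencies (UCSC binaries, deepTools), one plausible sketch runs in three stages:

- convert the enhancer-activity bedGraph to bigWig with bedGraphToBigWig, which needs a chrom.sizes file and a bedGraph sorted with LC_COLLATE=C;
- build a matrix centred on each filtered peak with computeMatrix reference-point;
- draw it with plotHeatmap.

The flank size (2000 bp), file names, and helper names are hypothetical, and the published figures may use different settings.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run a shell pipeline given as one string, or only print it when DRY_RUN=1.
run_pipeline() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$1"
  else
    bash -o pipefail -c "$1"
  fi
}

# bdg: enhancer-activity bedGraph; fai: samtools faidx index of the genome;
# peaks: BED of regions to centre on; prefix: output prefix; flank: bp each side.
plot_activity() {
  local bdg=$1 fai=$2 peaks=$3 prefix=$4 flank=${5:-2000}
  run_pipeline "cut -f 1,2 ${fai} > ${prefix}.chrom.sizes"
  run_pipeline "LC_COLLATE=C sort -k1,1 -k2,2n ${bdg} > ${prefix}.sorted.bdg"
  run_pipeline "bedGraphToBigWig ${prefix}.sorted.bdg ${prefix}.chrom.sizes ${prefix}.bw"
  run_pipeline "computeMatrix reference-point --referencePoint center -S ${prefix}.bw -R ${peaks} -b ${flank} -a ${flank} -o ${prefix}.matrix.gz"
  run_pipeline "plotHeatmap -m ${prefix}.matrix.gz --outFileName ${prefix}.heatmap.pdf"
}

# Example (hypothetical peak file name):
#   plot_activity BED_files/B73_maize.enhancer_activity.bdg \
#     Genome_Reference/Zm-B73-REFERENCE-NAM-5.0.fa.fai \
#     01_Peak_Analysis/STARR_merged_peaks.eFDR_filtered.bed B73_maize_activity
```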

Downstream analysis and expected results

See the paper for downstream analyses and expected results.
