Uses the seq-seq-pan software (https://gitlab.com/rki_bioinformatics/seq-seq-pan) to generate an .xmfa multiple genome alignment file. Genomes and gff files are available through lepbase.org.
Run_seq-seq-pan.sh
Python script to create .bed files with the sequence blocks present in the pan genome for each genome.
python seq-seq-pan_blocks_intervals.py -I SeqSeqPan_erato_melp_noNewline.xmfa -g 1,2
Bash commands for intersecting block .bed with bedtools.
Intersect_bed.sh
Transform .xmfa file to fasta.
python seq-seq-pan_toFasta.py -I SeqSeqPan_erato_melp_noNewline.xmfa -g 1,2
Note that this bash code is written for a Slurm array job; the Slurm job submission parameters still need to be added.
Clean reads using Trimmomatic.
Run_ATAC_trimmomatic.sh
Read mapping.
Run_ATAC_mapping.sh
Peak calling with MACS2.
Run_ATAC_MACS2.sh
Combine MACS2 peaks to create a reference peak set.
# Retain peaks present in 2 samples with at least 1 bp overlap
Run_ATAC_combine_peaks_2samples_1bp_per_tissue_stage.sh
# Retain peaks present in all samples with at least 1 bp overlap
Run_ATAC_combine_peaks_ALLsamples_1bp_per_tissue_stage.sh
# Retain peaks present in 2 samples with at least 50% reciprocal overlap
Run_ATAC_combine_peaks_2samples_50perc_per_tissue_stage.sh
# Retain peaks present in all samples with at least 50% reciprocal overlap
Run_ATAC_combine_peaks_ALLsamples_50perc_per_tissue_stage.sh
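The two overlap criteria named above (at least 1 bp overlap, or at least 50% reciprocal overlap) can be illustrated with a minimal Python sketch. This is not taken from the scripts in this repository; the function names and the half-open (start, end) interval convention are assumptions made for illustration only.

```python
def overlap_bp(a, b):
    """Number of overlapping base pairs between two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def passes_overlap(a, b, min_bp=1, min_reciprocal=None):
    """Return True if peaks a and b satisfy the chosen overlap criterion.

    min_bp: minimum overlap in base pairs (used when min_reciprocal is None).
    min_reciprocal: if set (e.g. 0.5), the overlap must cover at least this
    fraction of BOTH peaks, which is what "reciprocal" means here.
    """
    ov = overlap_bp(a, b)
    if min_reciprocal is None:
        return ov >= min_bp
    len_a = a[1] - a[0]
    len_b = b[1] - b[0]
    if len_a <= 0 or len_b <= 0:
        return False
    return ov >= min_reciprocal * len_a and ov >= min_reciprocal * len_b


# Hypothetical peaks: they share only 1 bp, so they pass the 1 bp rule
# but fail the 50% reciprocal rule.
peak_a = (100, 200)
peak_b = (199, 400)
print(passes_overlap(peak_a, peak_b))                      # 1 bp criterion
print(passes_overlap(peak_a, peak_b, min_reciprocal=0.5))  # 50% reciprocal criterion
```

The reciprocal criterion is the stricter of the two, since a small peak nested in a much larger one covers the small peak fully but only a small fraction of the large one.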
Count number of reads in .bam file.
samtools view -c SAMPLE.bam
Calculate fraction of reads in peaks (FRIP).
Run_ATAC_Fraction_reads_in_peaks.sh
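As a rough illustration of what this metric measures, the fraction of reads in peaks is the number of reads overlapping at least one peak divided by the total number of reads. The Python sketch below is a toy version under stated assumptions, not the code in Run_ATAC_Fraction_reads_in_peaks.sh: the function name, the (chrom, start, end) tuples and the example coordinates are all hypothetical, and the simple nested loop would be too slow for real .bam files.

```python
def fraction_reads_in_peaks(reads, peaks):
    """Fraction of reads that overlap at least one peak.

    reads, peaks: lists of (chrom, start, end) tuples with half-open coordinates.
    """
    if not reads:
        raise ValueError("no reads given")
    in_peaks = 0
    for chrom, start, end in reads:
        # A read counts once, however many peaks it overlaps.
        if any(pc == chrom and start < pe and ps < end for pc, ps, pe in peaks):
            in_peaks += 1
    return in_peaks / len(reads)


# Hypothetical toy data: two of the four reads overlap the single peak.
toy_reads = [("chr1", 10, 60), ("chr1", 500, 550), ("chr2", 10, 60), ("chr1", 90, 140)]
toy_peaks = [("chr1", 50, 100)]
print(fraction_reads_in_peaks(toy_reads, toy_peaks))
```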
Calculate Transcription Start Site enrichment score.
TSS_enrichment.R
Counting.
# Count read number in reference peak set in H. erato
Run_ATAC_count_H_erato.sh
# Count read number in reference peak set in H. melpomene
Run_ATAC_count_H_melpomene.sh
Map ATAC-seq peak intervals to PAN genome.
map_peak_intervals_to_pan.sh
Python code to transform scaffold positions to genome positions for mapping with seq-seq-pan map.
seq-seq-pan_bedgraph_chrompos.py
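The idea behind such a transformation can be sketched as follows, assuming that a "genome position" is the position in the concatenation of all scaffolds in the order they appear in the genome FASTA. That assumption, the function names and the scaffold names below are illustrative and are not taken from seq-seq-pan_bedgraph_chrompos.py; whether positions are 0- or 1-based should be checked against the actual script.

```python
def scaffold_offsets(scaffold_lengths):
    """Cumulative start offset of each scaffold in the concatenated genome.

    scaffold_lengths: list of (name, length) in genome FASTA order.
    """
    offsets = {}
    running = 0
    for name, length in scaffold_lengths:
        offsets[name] = running
        running += length
    return offsets


def to_genome_position(scaffold, position, offsets):
    """Shift a within-scaffold position by the scaffold's cumulative offset."""
    return offsets[scaffold] + position


# Hypothetical scaffolds and lengths.
lengths = [("scaf1", 1000), ("scaf2", 500), ("scaf3", 200)]
offsets = scaffold_offsets(lengths)
print(to_genome_position("scaf2", 10, offsets))
```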
Map MACS2 peak sets to PAN genome coordinates and intersect.
Run_ATAC_map_peaks_to_PAN.sh
Create bedgraph files from .bam files.
Run_bedgraphs.sh
Scale and average bedgraphs for tissue/time.
Run_bedgraphs_scale.sh
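A minimal sketch of scaling and averaging, assuming that each sample's coverage is divided by its size factor before the replicates of a tissue/time group are averaged. Dividing (rather than multiplying) by the size factor is an assumption here, as are the function name and the dictionary layout; Run_bedgraphs_scale.sh may do this differently.

```python
def scale_and_average(samples, size_factors):
    """Average size-factor-scaled coverage across samples.

    samples: {sample_name: {position: coverage}}
    size_factors: {sample_name: size_factor}
    Positions missing from a sample are treated as zero coverage.
    """
    positions = set()
    for coverage in samples.values():
        positions.update(coverage)
    averaged = {}
    for pos in sorted(positions):
        scaled = [samples[s].get(pos, 0.0) / size_factors[s] for s in samples]
        averaged[pos] = sum(scaled) / len(scaled)
    return averaged


# Hypothetical replicates and size factors.
toy_samples = {"rep1": {1: 10.0, 2: 20.0}, "rep2": {1: 4.0}}
toy_factors = {"rep1": 2.0, "rep2": 0.5}
print(scale_and_average(toy_samples, toy_factors))
```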
Map bedgraphs to PAN genome assembly.
# Extract positions from bedgraph files for mapping with seq-seq-pan map
Run_bedgraphs_map_to_PAN_preprocess.sh
# Map positions to the pan genome
Run_bedgraphs_map_to_PAN_seqseqpan_mapping.sh
# Combine start and end positions of intervals mapped to pan genome
Run_bedgraphs_map_to_PAN_postprocessing.sh
Calculate size factors (used when scaling and combining bedgraphs).
ATAC_sizefactor.R
R code for differential accessibility (DA) analyses. Includes:
- Code to match H. erato and H. melpomene peak counts
- Code for DA between developmental time points, wings and sections
- Code for PCA
- Merging of DA peaks with DNA sequence conservation
- Foldchange correlation analysis
- Code to output DA peak sets
ATAC_DA_erato_melp_development.R
ATAC_DA_erato_melp_FWHW.R
ATAC_DA_erato_melp_sections.R
Download data.
Download_RNAseq_data.sh
Map RNA-seq reads and count gene expression.
Run_mapping.sh
Run_count.sh
Create counts table from individual sample mappings.
Create_counts_tables.R
Differential expression forewing versus hindwing.
diff_expression_analysis_FWHW.R
Volcano plots with highlighted genes near DA peaks.
diff_expression_volcano_FWHW.R
Identify genes with shared expression patterns forewing versus hindwing.
shared_genes_with_peaks.R
Differential expression forewing sections.
diff_expression_analysis_sections.R
Identify genes close to DA ATAC-seq peaks.
Run_homer_development.sh
Run_homer_FWHW.sh
Run_homer_sections.sh
Correlate ATAC-seq accessibility with gene expression.
# Over development
shared_unique_development_expression.R
# Between forewing and hindwing
shared_unique_FWHW_expression.R
Python script to calculate conservation of intervals between a pair of genomes.
seq-seq-pan_bedfile_conservation.py
Calculate conservation for different interval sets.
Run_interval_conservation.sh
Summaries and visualisation in R.
calculate_IDY_mean.R
Run meme-chip to find TF enrichment patterns between tissues and time points.
Run_meme_development.sh
Run_meme_FWHW.sh
Run_meme_sections.sh
Parse meme-chip html outputs and extract TFs and enrichment values.
parse_MEME.py
R script to extract color and compare with wild-type phenotypes.
patternize_optix4.R
Ubx
Plot_PAN_ATAC_Ubx.R
optix
Plot_PAN_ATAC_optix.R