Oleg Shpynov edited this page Nov 26, 2024 · 5 revisions

Semi-supervised peak calling example

This document describes the semi-supervised peak calling method and the test dataset.

Please install all necessary tools: SPAN Peak Caller and JBR Genome Browser.

The semi-supervised algorithm consists of the following steps:

  1. Fit the unsupervised SPAN 2.0 statistical model and call peaks.
  2. Upload the treatment alignment visualization files and peak files to JBR.
    In most cases the peak-calling results are already good, and the semi-supervised procedure is not needed.
  3. Create manual markup.
  4. Upload the SPAN model directly into the JBR session.
  5. Perform the supervised learning procedure.

Required files:

  1. Monocyte_H3K4me1_hg38_ENCFF396RZF.bam Histone H3K4me1 ChIP-Seq of Human Monocytes
  2. Monocyte_Control_hg38_ENCFF328STD.bam ChIP-Seq Control of Human Monocytes
  3. hg38.chrom.sizes UCSC genome file for hg38.
  4. Monocyte_H3K4me1_hg38_ENCFF396RZF.bw treatment alignment visualization file

All BAM files were aligned to the hg38 human genome reference.

Additional files:

  1. Monocyte_H3K4me1_hg38_ENCFF396RZF_q0.05_peaks.narrowPeak peaks produced by MACS2
  2. Monocyte_H3K4me1_hg38_ENCFF396RZF_broad0.1_peaks.broadPeak peaks produced by MACS2 --broad
  3. Monocyte_H3K4me1_hg38_ENCFF396RZF-W200-G600-islands-summary-FDR0.01.bed peaks produced by SICER
    Peaks obtained by MACS2 and SICER are given for comparison purposes.
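
One simple way to compare two peak sets is to measure what fraction of peaks in one set overlaps any peak in the other. The sketch below is only an illustration and is not part of SPAN or JBR. The function name `overlap_fraction` is hypothetical, and it assumes peaks have already been read into (chrom, start, end) tuples from the first three columns of each file.

```python
def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in peaks_a that overlap at least one peak in peaks_b.

    Peaks are (chrom, start, end) tuples with half-open coordinates,
    as in BED-like files. Quadratic per chromosome, so suited to
    small examples rather than genome-wide production use.
    """
    by_chrom = {}
    for chrom, start, end in peaks_b:
        by_chrom.setdefault(chrom, []).append((start, end))

    hits = 0
    for chrom, start, end in peaks_a:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(s < end and e > start for s, e in by_chrom.get(chrom, [])):
            hits += 1
    return hits / len(peaks_a) if peaks_a else 0.0


# Toy coordinates for illustration only, not taken from the example dataset.
set_a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
set_b = [("chr1", 150, 550), ("chr2", 100, 200)]
fraction_a_in_b = overlap_fraction(set_a, set_b)
fraction_b_in_a = overlap_fraction(set_b, set_a)
```

The measure is asymmetric, so it is worth computing in both directions: one broad peak may cover several narrow ones.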

Fit unsupervised SPAN statistical data model

Launch SPAN with the following command line.

java -Xmx8G -jar span.jar analyze -t Monocyte_H3K4me1_hg38_ENCFF396RZF.bam -c Monocyte_Control_hg38_ENCFF328STD.bam --chrom.sizes hg38.chrom.sizes --threads 4 --model Monocyte_H3K4me1_hg38_ENCFF396RZF.span --peaks Monocyte_H3K4me1_hg38_ENCFF396RZF.peak

The model file Monocyte_H3K4me1_hg38_ENCFF396RZF.span can later be used for the semi-supervised peak calling procedure.
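
When the same command has to be run for several samples, it can help to assemble it programmatically. The following is a minimal sketch that uses only the options shown in the command above. The helper name `build_span_command` and its defaults are hypothetical, and it assumes the model and peaks files are named after the treatment file, as in this example.

```python
from pathlib import Path


def build_span_command(treatment, control, chrom_sizes,
                       threads=4, memory="8G", jar="span.jar"):
    """Assemble the `span.jar analyze` command as an argument list."""
    # Output names follow the treatment file name (assumption based on the example).
    stem = Path(treatment).stem
    return [
        "java", f"-Xmx{memory}", "-jar", jar, "analyze",
        "-t", treatment,
        "-c", control,
        "--chrom.sizes", chrom_sizes,
        "--threads", str(threads),
        "--model", f"{stem}.span",
        "--peaks", f"{stem}.peak",
    ]


cmd = build_span_command(
    "Monocyte_H3K4me1_hg38_ENCFF396RZF.bam",
    "Monocyte_Control_hg38_ENCFF328STD.bam",
    "hg38.chrom.sizes",
)
print(" ".join(cmd))
# To actually launch SPAN, pass the list to subprocess.run(cmd, check=True).
```

Keeping the command as a list rather than a single string avoids shell quoting problems when file paths contain spaces.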

Upload signal visualization files to JBR

  1. Launch JBR Genome Browser. If your current session isn't hg38, create a new hg38 session using File | New Session...

  2. Upload the treatment and control files: Monocyte_H3K4me1_hg38_ENCFF396RZF.bw and Monocyte_Control_hg38_ENCFF328STD.bw.
    Open them as remote URLs with the File | Load URL(s)... menu action, or as local files with File | Load BED file... or File | Load BigWig / Wig / BigBed file....
    See the JBR Genome Browser documentation for more details.

  3. Alternatively, open the preconfigured session file example.yaml, either as a remote file with File | Load URL(s)... or as a local file with File | Open Session.
    The session contains the ChIP-seq signal visualization, the semi-supervised markup data, and all the peak files mentioned above.

Create manual markup

  1. Select a zoom level at which individual peaks are clearly distinguishable from the background.
    Recommended zoom levels:

    • 5-10 kbp for TFs and narrow histone modifications
    • 50-100 kbp for broad histone modifications
  2. Four supervised annotation types are supported:

    • peaks : there is at least one peak in the labeled area
    • noPeaks : there are no peaks in the labeled area
    • peakStart : exactly one peak starts in the labeled area
    • peakEnd : exactly one peak ends in the labeled area

    peakStart and peakEnd are not paired and may refer to different peaks.

  3. Enable annotation mode in JBR Genome Browser using the menu action SPAN | Peaks Annotation Mode.
    Repeat the following procedure:

    • Press and hold SHIFT+ALT and drag with the mouse to select a genome region; press ESC to clear the selection
    • Set the annotation type by clicking one of the four label buttons or by pressing the N, P, S, or E key on the keyboard
  4. You can import and export the markup as BED files using the Import and Export buttons on the annotator panel.

For details on creating annotations, please refer to the JBR documentation.
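
The four annotation types described above can be read as constraints that a peak set either satisfies or violates. The sketch below illustrates that reading and is not the SPAN or JBR implementation. The function name `label_satisfied` is hypothetical, and the half-open coordinate convention and the overlap rule are assumptions.

```python
def label_satisfied(label, start, end, peaks):
    """Check one markup label against the peaks on the same chromosome.

    `start` and `end` delimit the labeled area; `peaks` is a list of
    (peak_start, peak_end) pairs. All intervals are treated as half-open
    [start, end), which is an assumption of this sketch.
    """
    if label == "peaks":
        # At least one peak overlaps the labeled area.
        return any(ps < end and pe > start for ps, pe in peaks)
    if label == "noPeaks":
        # No peak overlaps the labeled area.
        return not any(ps < end and pe > start for ps, pe in peaks)
    if label == "peakStart":
        # Exactly one peak starts inside the labeled area.
        return sum(start <= ps < end for ps, _ in peaks) == 1
    if label == "peakEnd":
        # Exactly one peak ends inside the labeled area.
        return sum(start < pe <= end for _, pe in peaks) == 1
    raise ValueError(f"unknown label type: {label}")


# Toy coordinates for illustration only.
example_peaks = [(100, 200), (500, 900)]
has_peaks = label_satisfied("peaks", 150, 600, example_peaks)
is_empty = label_satisfied("noPeaks", 200, 500, example_peaks)
one_start = label_satisfied("peakStart", 450, 550, example_peaks)
one_end = label_satisfied("peakEnd", 150, 950, example_peaks)
```

In the last call two peaks end inside the labeled area, so a peakEnd label there would count as violated. Counting violated labels for a candidate peak set is one plausible way to think about what tuning against the markup tries to minimize.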

Upload SPAN model into JBR session

Once the SPAN model is fitted, load it into JBR using the menu action File | Load SPAN model...

Perform supervised learning procedure

  1. To run the learning procedure for all SPAN models loaded in JBR at once, use the SPAN | Tune SPAN models... action from the main menu.
    Alternatively, to tune only the selected SPAN model track(s), choose Tune SPAN model from the context menu.

  2. Export peaks to a BED file. To export peaks for all loaded SPAN model tracks, select the SPAN | Export SPAN Peaks... action from the main menu and choose the target directory.
    Alternatively, to export peaks for the selected track(s) only, choose Export SPAN model peaks... from the context menu.

Final step

After the semi-supervised tuning procedure is complete, you can view statistics for the resulting peaks file, including the number of peaks and their average length, by selecting the track(s) and using the About Tracks action from the context menu.
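
The same basic statistics can also be computed outside JBR from an exported BED file. This is a minimal sketch that assumes a whitespace-separated file whose second and third columns hold the peak start and end. The function name `peak_stats` is hypothetical, and it does not reproduce the full About Tracks report.

```python
def peak_stats(bed_lines):
    """Return (number_of_peaks, mean_peak_length) for BED-formatted lines."""
    lengths = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        lengths.append(int(fields[2]) - int(fields[1]))
    count = len(lengths)
    mean_length = sum(lengths) / count if count else 0.0
    return count, mean_length


# Toy records for illustration only; for a real file use
#   with open(path) as handle: peak_stats(handle)
sample = ["chr1\t100\t300", "chr1\t1000\t1600", "chr2\t50\t150"]
count, mean_length = peak_stats(sample)
print(count, mean_length)
```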

Should you have any further questions, please do not hesitate to contact us anytime.