
Generate prediction bigwigs

Anusri Pampari edited this page Jan 5, 2024 · 11 revisions

The command chrombpnet pred_bw generates bigwigs containing model predictions over a set of input regions.

Usage

chrombpnet pred_bw [-h] [-bm BIAS_MODEL] [-cm CHROMBPNET_MODEL] [-cmb CHROMBPNET_MODEL_NB] -r REGIONS -g GENOME -c CHROM_SIZES -op OUTPUT_PREFIX [-os OUTPUT_PREFIX_STATS] [-bs BATCH_SIZE] [-t TQDM] [-d DEBUG_CHR [DEBUG_CHR ...]] [-bw BIGWIG]

Input Format

required arguments:
  -bm BIAS_MODEL, --bias-model BIAS_MODEL
                        Path to bias model .h5 file (at least one of -bm, -cm, -cmb is required)
  -cm CHROMBPNET_MODEL, --chrombpnet-model CHROMBPNET_MODEL
                        Path to chrombpnet model .h5 file (at least one of -bm, -cm, -cmb is required)
  -cmb CHROMBPNET_MODEL_NB, --chrombpnet-model-nb CHROMBPNET_MODEL_NB
                        Path to chrombpnet no-bias model .h5 file (at least one of -bm, -cm, -cmb is required)
  -r REGIONS, --regions REGIONS
                        10-column BED file (narrowPeak-style, summit offset in column 10) of regions for prediction
  -g GENOME, --genome GENOME
                        Genome FASTA file
  -c CHROM_SIZES, --chrom-sizes CHROM_SIZES
                        Two-column tab-separated chromosome sizes file (chromosome name, length)
  -op OUTPUT_PREFIX, --output-prefix OUTPUT_PREFIX
                        Output prefix for bigwig files

optional arguments:
  -os OUTPUT_PREFIX_STATS, --output-prefix-stats OUTPUT_PREFIX_STATS
                        Output prefix for stats on the bigwig
  -bs BATCH_SIZE, --batch-size BATCH_SIZE
                        Batch size to use for prediction
  -t TQDM, --tqdm TQDM  Use tqdm to display a progress bar. If enabled, tqdm must be installed.
  -d DEBUG_CHR [DEBUG_CHR ...], --debug-chr DEBUG_CHR [DEBUG_CHR ...]
                        Run for specific chromosomes only (e.g. chr1 chr2) for debugging
  -bw BIGWIG, --bigwig BIGWIG
                        Ground-truth bigwig. If provided, an .h5 file with predictions is written along with metrics computed using this bigwig as ground truth.
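For scripted runs, the arguments above can be assembled into a command line. The sketch below uses only the long options listed above and enforces the rule that at least one of the three models must be given. The function name build_pred_bw_cmd and all file paths are hypothetical placeholders, not part of chrombpnet.

```python
import shlex


def build_pred_bw_cmd(regions, genome, chrom_sizes, output_prefix,
                      bias_model=None, chrombpnet_model=None,
                      chrombpnet_model_nb=None, batch_size=None):
    """Assemble a chrombpnet pred_bw argument list from the documented long options.

    Hypothetical helper: it only builds the list, it does not run anything.
    """
    models = {
        "--bias-model": bias_model,
        "--chrombpnet-model": chrombpnet_model,
        "--chrombpnet-model-nb": chrombpnet_model_nb,
    }
    # At least one of -bm, -cm, -cmb must be supplied.
    if all(path is None for path in models.values()):
        raise ValueError("at least one of -bm, -cm, -cmb is required")

    cmd = ["chrombpnet", "pred_bw"]
    for flag, path in models.items():
        if path is not None:
            cmd += [flag, path]
    cmd += ["--regions", regions, "--genome", genome,
            "--chrom-sizes", chrom_sizes, "--output-prefix", output_prefix]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    return cmd


# Hypothetical paths; only the no-bias model is passed here.
cmd = build_pred_bw_cmd("peaks.bed", "hg38.fa", "hg38.chrom.sizes", "out/sample",
                        chrombpnet_model_nb="models/chrombpnet_nobias.h5")
print(shlex.join(cmd))
# chrombpnet pred_bw --chrombpnet-model-nb models/chrombpnet_nobias.h5 --regions peaks.bed --genome hg38.fa --chrom-sizes hg38.chrom.sizes --output-prefix out/sample
```

Passing the resulting list to subprocess.run(cmd, check=True) would launch the tool, assuming chrombpnet is installed and on PATH and the out/ directory already exists.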

Output Format

The following files are created, each named with output_prefix as its prefix.

Note: output_prefix can include a directory path as well as a file-name prefix; make sure the directory already exists. Also make sure that every region in the input BED file can be expanded to inputlen (default 2114 bp), centered on its summit, without extending past either end of its chromosome. In versions <= v0.1.3, the program exits with an error if this condition is not met. In the latest version such regions are filtered out, and the outputs include a BED file listing the regions for which predictions were generated.
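The summit-centered bounds check described in the note can be sketched as follows. This is a minimal illustration, not chrombpnet's own code. It assumes the regions file follows the narrowPeak convention (column 2 is the 0-based start, column 10 is the summit offset from that start), that the window starts at summit - inputlen // 2 and spans inputlen bp, and that regions on chromosomes missing from the sizes file are dropped. The helper names are hypothetical, and chrombpnet's exact off-by-one handling may differ.

```python
def read_chrom_sizes(lines):
    """Parse two-column tab-separated chrom sizes lines into {chrom: length}."""
    sizes = {}
    for line in lines:
        if line.strip():
            chrom, length = line.split("\t")[:2]
            sizes[chrom] = int(length)
    return sizes


def summit_position(bed_line):
    """Return (chrom, absolute summit) for a 10-column narrowPeak-style line."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    # Assumed narrowPeak layout: start in column 2, summit offset in column 10.
    return fields[0], int(fields[1]) + int(fields[9])


def fits_in_chrom(summit, chrom_len, inputlen=2114):
    """True if an inputlen-bp window centered on the summit lies inside [0, chrom_len]."""
    start = summit - inputlen // 2
    return start >= 0 and start + inputlen <= chrom_len


def filter_regions(bed_lines, chrom_sizes, inputlen=2114):
    """Keep BED lines whose summit-centered window fits on its chromosome."""
    kept = []
    for line in bed_lines:
        chrom, summit = summit_position(line)
        if chrom in chrom_sizes and fits_in_chrom(summit, chrom_sizes[chrom], inputlen):
            kept.append(line)
    return kept


sizes = read_chrom_sizes(["chr1\t100000"])  # toy chromosome length
peaks = [
    "chr1\t0\t1000\t.\t0\t.\t5.0\t3.0\t2.0\t500",        # summit 500: too close to the start
    "chr1\t49500\t50500\t.\t0\t.\t5.0\t3.0\t2.0\t500",  # summit 50000: fits
    "chr1\t99000\t100000\t.\t0\t.\t5.0\t3.0\t2.0\t500",  # summit 99500: runs past the end
]
print(len(filter_regions(peaks, sizes)))  # 1
```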

  • output_prefix_bias.bw: Predictions from bias.h5 model in the input regions
  • output_prefix_bias_preds.bed: Bed file containing filtered regions for which predictions were generated using bias.h5
  • output_prefix_chrombpnet.bw: Predictions from chrombpnet.h5 model in the input regions (bias-uncorrected bigwig; predictions are comparable to the observed signal)
  • output_prefix_chrombpnet_preds.bed: Bed file containing filtered regions for which predictions were generated using chrombpnet.h5
  • output_prefix_chrombpnet_nobias.bw: Predictions from chrombpnet_nobias.h5 model in the input regions (bias-corrected bigwig; predictions reflect TF footprints)
  • output_prefix_chrombpnet_nobias_preds.bed: Bed file containing filtered regions for which predictions were generated using chrombpnet_nobias.h5
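After a run, one way to confirm that the expected files were written is to derive their names from output_prefix. This sketch assumes each name is output_prefix followed by the suffix shown in the list above, that files are written only for the models that were passed, and that the _preds.bed files exist only in versions that filter regions (set with_bed=False otherwise). The function names expected_outputs and missing_outputs are hypothetical.

```python
import os


def expected_outputs(output_prefix, bias=False, chrombpnet=False, nobias=False,
                     with_bed=True):
    """Names of the bigwig (and optional BED) files expected for the given models."""
    tags = [tag for tag, used in (("bias", bias),
                                  ("chrombpnet", chrombpnet),
                                  ("chrombpnet_nobias", nobias)) if used]
    files = []
    for tag in tags:
        files.append(f"{output_prefix}_{tag}.bw")
        if with_bed:
            files.append(f"{output_prefix}_{tag}_preds.bed")
    return files


def missing_outputs(output_prefix, **models):
    """Return the expected files that do not exist on disk."""
    return [f for f in expected_outputs(output_prefix, **models) if not os.path.exists(f)]


print(expected_outputs("out/sample", nobias=True))
# ['out/sample_chrombpnet_nobias.bw', 'out/sample_chrombpnet_nobias_preds.bed']
```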

If a ground-truth bigwig is provided with -bw, the predictions for each input region are written to an .h5 file, along with metrics comparing them to the ground truth: Pearson correlation (pearsonr) on the count head and JSD (Jensen-Shannon distance) on the profile head.
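The two metrics can be illustrated with a small pure-Python sketch. It assumes pearsonr is computed across regions on one count value per region, and that the profile metric is the base-2 Jensen-Shannon distance between the normalized predicted and observed profiles of a region. Whether chrombpnet log-transforms counts first, which log base it uses, and how it aggregates per-region JSD are not stated on this page. For real arrays, scipy.stats.pearsonr and scipy.spatial.distance.jensenshannon(p, q, base=2) compute the same quantities.

```python
import math


def pearsonr(x, y):
    """Pearson correlation between two equal-length sequences (e.g. per-region counts)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def _normalize(v):
    """Scale a non-negative profile so it sums to 1."""
    total = sum(v)
    return [a / total for a in v]


def js_distance(p, q):
    """Base-2 Jensen-Shannon distance between two non-negative profiles."""
    p, q = _normalize(p), _normalize(q)
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i == 0 contribute 0; b is the mixture m, so b_i > 0 whenever a_i > 0.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return math.sqrt((kl(p, m) + kl(q, m)) / 2)


pred_counts, obs_counts = [10.0, 20.0, 30.0], [12.0, 19.0, 33.0]  # toy values
print(round(pearsonr(pred_counts, obs_counts), 3))
print(js_distance([1, 0], [0, 1]))  # 1.0 (disjoint profiles)
print(js_distance([1, 1], [2, 2]))  # 0.0 (same shape after normalization)
```

The distance is 0 for profiles with identical shape regardless of total counts, which is why the count head is scored separately with pearsonr.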