add option --cutoff-analysis-max for hmmratac
taoliu committed Apr 26, 2024
1 parent 96ba7e1 commit 758d909
Showing 3 changed files with 38 additions and 19 deletions.
6 changes: 3 additions & 3 deletions MACS3/Commands/hmmratac_cmd.py
@@ -1,4 +1,4 @@
-# Time-stamp: <2024-04-26 15:38:54 Tao Liu>
+# Time-stamp: <2024-04-26 15:46:03 Tao Liu>

"""Description: Main HMMR command
@@ -192,7 +192,7 @@ def run( args ):

# Let MACS3 do the cutoff analysis to help decide the lower and upper cutoffs
with open(cutoffanalysis_file, "w") as ofhd_cutoff:
-ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = 100, steps=options.cutoff_analysis_steps ) )
+ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = options.cutoff_analysis_max, steps=options.cutoff_analysis_steps ) )
#raise Exception("Cutoff analysis only.")
sys.exit(1)

@@ -238,7 +238,7 @@ def run( args ):

# Let MACS3 do the cutoff analysis to help decide the lower and upper cutoffs
with open(cutoffanalysis_file, "w") as ofhd_cutoff:
-ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = 100 ) )
+ofhd_cutoff.write( fc_bdg.cutoff_analysis( min_length=minlen, max_gap=options.hmm_training_flanking, max_score = options.cutoff_analysis_max ) )

# we will check if anything left after filtering
if peaks.total > options.hmm_maxTrain:
9 changes: 6 additions & 3 deletions bin/macs3
@@ -1,5 +1,5 @@
#!/usr/bin/env python
-# Time-stamp: <2024-04-10 01:08:20 Tao Liu>
+# Time-stamp: <2024-04-26 15:55:13 Tao Liu>

"""Description: MACS v3 main executable.
@@ -906,10 +906,13 @@ plus an extra option for the HMM model file like `macs3 hmmratac
help = "Name for this experiment, which will be used as a prefix to generate output file names. DEFAULT: \"NA\"",
default = "NA" )
group_output.add_argument( "--cutoff-analysis-only", dest = "cutoff_analysis_only", action = "store_true",
-help = "Only run the cutoff analysis and output a report. After generating the report, the process will stop. The report will help user decide the three crucial parameters for `-l`, `-u`, and `-c`. So it's highly recommanded to run this first! Please read the report and instructions in `Choices of cutoff values` on how to decide the three crucial parameters",
+help = "Only run the cutoff analysis and output a report. After generating the report, the process will stop. By default, the cutoff analysis will be included in the whole process, but won't quit after the report is generated. The report will help user decide the three crucial parameters for `-l`, `-u`, and `-c`. So it's highly recommanded to run this first! Please read the report and instructions in `Choices of cutoff values` on how to decide the three crucial parameters. The resolution of cutoff analysis can be controlled by --cutoff-analysis-max and --cutoff-analysis-steps options.",
default = False )
+group_output.add_argument( "--cutoff-analysis-max", dest="cutoff_analysis_max", type = int,
+help = "The maximum cutoff score for performing cutoff analysis. Together with --cutoff-analysis-steps, the resolution in the final report can be controlled. Please check the description in --cutoff-analysis-steps for detail. DEFAULT: 100",
+default = 100 )
group_output.add_argument( "--cutoff-analysis-steps", dest="cutoff_analysis_steps", type = int,
-help = "Steps for performing cutoff analysis. It will be used to decide which cutoff value should be included in the final report. Larger the value, higher resolution the cutoff analysis can be. The cutoff analysis function will first find the smallest (at least 0) and the largest (at most 1,000) foldchange score in the data, then break the range of foldchange score into `CUTOFF_ANALYSIS_STEPS` intervals. It will then use each foldchange score as cutoff to call peaks and calculate the total number of candidate peaks, the total basepairs of peaks, and the average length of peak in basepair. Please note that the final report ideally should include `CUTOFF_ANALYSIS_STEPS` rows, but in practice, if the foldchange cutoff yield zero peak, the row for that foldchange value won't be included. DEFAULT: 100",
+help = "Steps for performing cutoff analysis. It will be used to decide which cutoff value should be included in the final report. Larger the value, higher resolution the cutoff analysis can be. The cutoff analysis function will first find the smallest (at least 0) and the largest (at most 100, and controlled by --cutoff-analysis-max) foldchange score in the data, then break the range of foldchange score into `CUTOFF_ANALYSIS_STEPS` intervals. It will then use each foldchange score as cutoff to call peaks and calculate the total number of candidate peaks, the total basepairs of peaks, and the average length of peak in basepair. Please note that the final report ideally should include `CUTOFF_ANALYSIS_STEPS` rows, but in practice, if the foldchange cutoff yield zero peak, the row for that foldchange value won't be included. DEFAULT: 100",
default = 100 )
group_output.add_argument( "--save-digested", dest = "save_digested", action = "store_true",
help = "Save the digested ATAC signals of short-, mono-, di-, and tri- signals in three BedGraph files with the names NAME_short.bdg, NAME_mono.bdg, NAME_di.bdg, and NAME_tri.bdg. DEFAULT: False",
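The two new options work together: per the help text, the range between the smallest (at least 0) and largest (at most `--cutoff-analysis-max`) fold-change score is split into `--cutoff-analysis-steps` intervals, so the spacing between reported cutoffs is roughly (max - min) / steps. The sketch below mirrors the two `add_argument` calls from bin/macs3 (same `dest` names, types, and defaults) and adds a hypothetical `cutoff_grid` helper, which is not part of MACS3, that assumes evenly spaced cutoffs starting at the lower bound.

```python
import argparse

# Mirrors the two options added to bin/macs3 in this commit (same dest
# names, types, and defaults); help strings are omitted for brevity.
parser = argparse.ArgumentParser(prog="hmmratac-sketch")
parser.add_argument("--cutoff-analysis-max", dest="cutoff_analysis_max",
                    type=int, default=100)
parser.add_argument("--cutoff-analysis-steps", dest="cutoff_analysis_steps",
                    type=int, default=100)


def cutoff_grid(data_min: float, data_max: float,
                max_cutoff: float, steps: int) -> list:
    """Hypothetical helper (not MACS3 API): clamp the observed fold-change
    range to [0, max_cutoff], split it into `steps` equal intervals, and
    return the lower edge of each interval as a candidate cutoff."""
    lo = max(0.0, data_min)
    hi = min(float(max_cutoff), data_max)
    width = (hi - lo) / steps
    return [lo + i * width for i in range(steps)]


if __name__ == "__main__":
    # Parse an explicit argument list instead of sys.argv for the demo.
    args = parser.parse_args(["--cutoff-analysis-max", "20",
                              "--cutoff-analysis-steps", "40"])
    grid = cutoff_grid(0.0, 250.0, args.cutoff_analysis_max,
                       args.cutoff_analysis_steps)
    print(len(grid), grid[1] - grid[0])  # prints: 40 0.5
```

In this sketch, the defaults (100 and 100) on data whose fold change reaches 100 or more give cutoffs 1.0 apart. Lowering `--cutoff-analysis-max` to 20 with 40 steps gives a 0.5 spacing concentrated in the low fold-change range.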
42 changes: 29 additions & 13 deletions docs/source/docs/hmmratac.md
@@ -22,18 +22,25 @@ Here's an example of how to run the `hmmratac` command:
$ macs3 hmmratac -i yeast.bam -n yeast
```

-or with the BEDPE format
+or with the BEDPE format of a much smaller size:

```
$ macs3 hmmratac -i yeast.bedpe.gz -f BEDPE -n yeast
```

-Note: you can convert BAMPE to BEDPE by using
+You can convert BAMPE to BEDPE by using

```
$ macs3 filterdup --keep-dup all -f BAMPE -i yeast.bam -o yeast.bedpe
```

+Please note that in order to save memory usage and fasten the process,
+`hmmratac` will save intermediate temporary file to the disk. The file
+size can range from megabytes to gigabytes, depending on how many
+candidate regions `hmmratac` needs to decode. The temporary file will
+be removed after the job is done. So please make sure there is enough
+space in the 'tmp' directory of your system.

Please use `macs3 hmmratac --help` to see all the options. Here we
list the essential ones.

@@ -98,28 +105,37 @@ output file names. DEFAULT: "NA"
fragments, make this value smaller. Default = 0.001

### `--cutoff-analysis-only`

-Only run the cutoff analysis and output a report. After generating
-the report, the process will stop. The report will help user decide
-the three crucial parameters for `-l`, `-u`, and `-c`. So it's highly
-recommanded to run this first! Please read the report and
-instructions in [Choices of cutoff values](#choices-of-cutoff-values)
-on how to decide the three crucial parameters.
+Only run the cutoff analysis and output a report. After generating the
+report, the whole process will stop. By default, the cutoff analysis
+will be included in the whole process, but won't quit after the report
+is generated. The report will help user decide the three crucial
+parameters for `-l`, `-u`, and `-c`. So it's highly recommanded to run
+this first! Please read the report and instructions in [Choices of
+cutoff values](#choices-of-cutoff-values) on how to decide the three
+crucial parameters. The resolution of cutoff analysis can be
+controlled by `--cutoff-analysis-max` and `--cutoff-analysis-steps`
+options.

+### `--cutoff-analysis-max`
+The maximum cutoff score for performing cutoff analysis. Together with
+`--cutoff-analysis-steps`, the resolution in the final report can be
+controlled. Please check the description in `--cutoff-analysis-steps`
+for detail. The default value is 100.

### `--cutoff-analysis-steps`

Steps for performing cutoff analysis. It will be used to decide which
cutoff value should be included in the final report. Larger the value,
higher resolution the cutoff analysis can be. The cutoff analysis
-function will first find the smallest and the largest foldchange score
-in the data, then break the range of foldchange score into
+function will first find the smallest (at least 0) and the largest (at
+most 100, and controlled by --cutoff-analysis-max) foldchange score in
+the data, then break the range of foldchange score into
`CUTOFF_ANALYSIS_STEPS` intervals. It will then use each foldchange
score as cutoff to call peaks and calculate the total number of
candidate peaks, the total basepairs of peaks, and the average length
of peak in basepair. Please note that the final report ideally should
include `CUTOFF_ANALYSIS_STEPS` rows, but in practice, if the
foldchange cutoff yield zero peak, the row for that foldchange value
-won't be included. DEFAULT: 100
+won't be included. The default is 100.

### `--hmm-type`

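The procedure described under `--cutoff-analysis-steps` can be sketched in plain Python. This is a minimal illustration under stated assumptions, not MACS3's `cutoff_analysis` method. The input is assumed to be a sorted list of hypothetical `(chrom, start, end, score)` bedGraph records. A record counts as above the cutoff when its score is >= the cutoff, adjacent above-cutoff records are joined when the gap between them is at most `max_gap`, merged regions shorter than `min_length` are discarded, and cutoffs are evenly spaced. The parameter names follow the `cutoff_analysis( min_length=..., max_gap=..., max_score=..., steps=... )` call in hmmratac_cmd.py; the behaviour assigned to them here is inferred from the help text.

```python
from typing import List, Optional, Tuple

# Hypothetical bedGraph record: (chrom, start, end, foldchange score).
Interval = Tuple[str, int, int, float]
Peak = Tuple[str, int, int]


def call_peaks(intervals: List[Interval], cutoff: float,
               min_length: int, max_gap: int) -> List[Peak]:
    """Merge records scoring >= cutoff when the gap between them is
    <= max_gap, then keep merged regions at least min_length bp long.
    Assumes `intervals` is sorted by chrom, then start."""
    peaks: List[Peak] = []
    current: Optional[list] = None  # [chrom, start, end] being extended

    def flush() -> None:
        # Emit the region being built if it is long enough.
        if current is not None and current[2] - current[1] >= min_length:
            peaks.append((current[0], current[1], current[2]))

    for chrom, start, end, score in intervals:
        if score < cutoff:
            continue
        if (current is not None and current[0] == chrom
                and start - current[2] <= max_gap):
            current[2] = end  # gap is small enough: extend the region
        else:
            flush()
            current = [chrom, start, end]
    flush()
    return peaks


def cutoff_analysis(intervals: List[Interval], min_length: int, max_gap: int,
                    max_score: float = 100, steps: int = 100) -> list:
    """Return (cutoff, n_peaks, total_bp, mean_length) rows."""
    scores = [iv[3] for iv in intervals]
    lo = max(0.0, min(scores))               # smallest score, at least 0
    hi = min(float(max_score), max(scores))  # largest score, capped by max_score
    width = (hi - lo) / steps
    rows = []
    for i in range(steps):
        cutoff = lo + i * width
        peaks = call_peaks(intervals, cutoff, min_length, max_gap)
        if not peaks:
            continue  # a cutoff yielding zero peaks gets no row
        total_bp = sum(end - start for _, start, end in peaks)
        rows.append((cutoff, len(peaks), total_bp, total_bp / len(peaks)))
    return rows


if __name__ == "__main__":
    demo = [("chr1", 0, 100, 1.0), ("chr1", 100, 200, 5.0),
            ("chr1", 200, 300, 9.0), ("chr1", 350, 450, 9.0),
            ("chr1", 450, 600, 2.0)]
    for row in cutoff_analysis(demo, min_length=300, max_gap=60,
                               max_score=10, steps=5):
        print(row)
```

On the five demo records, `min_length=300` removes every merged region once the cutoff rises above 5.0, so only three of the five cutoffs produce a row. This mirrors the help text's note that a cutoff yielding zero peaks is left out of the report.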
