Commit a5a7921

Merge pull request #311 from AdamaJava/qbasepileup.doc.mar
update document on qbasepileup
2 parents 5133610 + 09b7606 commit a5a7921

4 files changed: +182 -177 lines

docs/qbasepileup/index.md

+103 -86
@@ -1,4 +1,8 @@
 # qbasepileup
+```diff
+-This tool is dated because some of its input file formats (dccq, dcc1) are no longer available!
+```
+

 ## Introduction

@@ -8,39 +12,28 @@ metrics on the reads at the positions of interest.

 ## Installation

-qbasepileup requires java 7 and (ideally) a multi-core machine (5 threads
+qbasepileup requires Java 8 and (ideally) a multi-core machine (5 threads
 are run concurrently) with at least 20GB of RAM.
-Download the qbasepileup tar file
-Untar the tar file into a directory of your choice
-You should see jar files for qbasepileup and its dependencies:

-~~~~{.text}
-$ tar xjvf qbasepileup.tar.bz2
-x antlr-3.2.jar
-x ini4j-0.5.2-SNAPSHOT.jar
-x jopt-simple-3.2.jar
-x picard-1.110.jar
-x qbamfilter-1.0pre.jar
-x qcommon-0.1pre.jar
-x qio-0.1pre.jar
-x qpicard-0.1pre.jar
-x qpileup-0.1pre.jar
-x sam-1.110.jar
-x jhdfobj.jar
-x jhdf5obj.jar
-x jhdf5.jar
-x jhdf.jar
-~~~~
+* **To build qbasepileup, first clone the adamajava repository using `git clone`:**
+```
+git clone https://github.com/AdamaJava/adamajava
+```

-## Usage
+Then move into the adamajava folder:
+```
+cd adamajava
+```
+Run Gradle to build qbasepileup and its dependent jar files:
+```
+./gradlew :qbasepileup:build
+```
+This creates the qbasepileup jar file, along with the jars it depends on, in the `qbasepileup/build/flat` folder.

-A general invocation of qbasepileup looks like:

-~~~~{.text}
-java -jar qbasepileup.jar [OPTIONS]
-~~~~
+## Usage

-A typical invocation might look like:
+A general invocation of qbasepileup looks like:

 ~~~~{.text}
 java -jar qbasepileup.jar -m mode \
@@ -50,57 +43,56 @@ java -jar qbasepileup.jar -m mode \
 ## Options

 ~~~~{.text}
---help, -h Show help message.
---version Print version.
--b Path to tab delimited file with list of bams.
---bq Minimum base quality score for accepting a read.
---dup Include duplicates
--f Format of SNPs file [ddc1,maf,db,dccq,vcf,torrent,RNA,DNA], Def=dcc1
---filter Query string for qbamfilter
---gatk Adjust insertion position to conform to GATK format
---hdf HDF file to read list of bam files from
---hp window around indel to check for homopolymers [default:10]
--i Path to bam file.
---ig Path to somatic indel file
---in Path to normal bam file
---ind Include reads with indels. (y or n, default y).
---intron Include reads mapping across introns. (y or n, default y).
---is Path to somatic indel file
---it Path to tumour bam file
---log Req, Log file.
---loglevel Logging level required, e.g. INFO, DEBUG. Default INFO.
--m snp,indel,coverage or compoundsnp
---maxcov Report reads that are less than the mininmum coverage option. Integer
---mincov Report reads that are less than the
---mq Minimum mapping quality score for accepting a read.
--n Bases around indel to check for other indels., Def=3.
---novelstarts Report novelstarts rather than read count [Y,N], Def=Y.
--o Output file.
---of Output file format [rows,columns].
---og Output file for germline indels.
---os Output file for somatic indels.
--p Pileup profile type [ddc1,maf,db,dccq,vcf,torrent,RNA,DNA] Def=dcc1.
---pd <pindel_deletions> Path to normal bam file
---pindel adjust insertion position to conform to pindel format
--r Path to reference genome fasta file.
--s Path to tab delimited file containing snps. Formats: dcc1,dccq,vcf,maf,txt
---sc <soft_clip_window> number of bases around indel to check for softclipped bases [default:13]
---strand Separate coverage by strand. (y or n, default y)
---strelka adjust insertion position to conform to strelka format (same as pindel)
--t Thread number. Total = number supplied + 2.
+Option                   Description
+------                   -----------
+-V, -v, --version        Print version info.
+-b <txt file>            Opt (coverage and snp mode), path to a tab-delimited file with a list of BAM files.
+--bq <Integer>           Opt (snp related mode), minimum base quality score for accepting a read. Def=null.
+--dup                    Opt (indel and snp mode), a flag to include duplicate reads.
+-f [format]              Opt (snp mode), snp file format: [dcc1, dccq, vcf, tab, maf]. Def=dcc1. Or
+                         Opt (coverage mode), snp file format: [dcc1, dccq, vcf, tab, maf, gff3, gtf]. Def=dcc1.
+--filter <query>         Opt, a qbamfilter query to filter out BAM records. Def=null.
+--gatk                   Opt (indel mode), a flag to conform to GATK format; currently has no effect.
+-h, --help               Shows this help message.
+--hdf <hdf file>         Opt (snp mode), path to an HDF file whose header contains a list of BAM files.
+--hp <Integer>           Opt (indel mode), bases around indel to check for homopolymers. Def=10.
+-i <bam file>            Opt (coverage and snp mode), a single SAM/BAM file.
+--ig <dcc1>              Req (indel mode), path to germline indel file in dcc1 format.
+--in <bam file>          Req (indel mode), path to normal BAM file.
+--ind <y|n>              Opt (snp related mode), include reads with indels [y,n]. Def=y.
+--intron <y|n>           Opt (snp related mode), include reads mapping across introns [y,n]. Def=y.
+--is <dcc1>              Req (indel mode), path to somatic indel file in dcc1 format.
+--it <bam file>          Req (indel mode), path to tumour BAM file.
+--log                    Req, log file.
+--loglevel               Opt, logging level required, e.g. INFO, DEBUG. Def=INFO.
+-m <mode>                Opt, mode [snp, compoundsnp, snpcheck, indel, coverage]. Def=snp.
+--mincov <Integer>       Opt, report reads that are less than the minimum coverage option.
+--mq <Integer>           Opt (snp related mode), minimum mapping quality score for accepting a read. Def=null.
+-n <Integer>             Opt (indel mode), bases around indel to check for other indels. Def=3.
+--novelstarts <y|n>      Opt (snp related mode), report novel starts rather than read count [y,n]. Def=y.
+-o <txt file>            Req (coverage and snp related mode), the output file path.
+--of <format>            Opt (snp mode only), output file format [columns]. This option only works when the input snp
+                         file format is tab; otherwise it is ignored.
+--og <dcc1>              Req (indel mode), output file for germline indels.
+--os <dcc1>              Req (indel mode), output file for somatic indels.
+-p [profile]             Opt (snp mode), pileup profile type [torrent, RNA, DNA, standard]. Def="standard".
+--pd <pindel_deletions>  Req (indel mode), path to normal BAM file.
+--pindel                 Opt (indel mode), a flag to conform to pindel format; currently has no effect.
+-r <fasta file>          Req (indel and snp mode), path to reference genome FASTA file.
+-s <txt file>            Req (coverage and snp mode), path to a tab-delimited file containing SNPs.
+--sc <Integer>           Opt (indel mode), bases around indel to check for soft clipping. Def=13.
+--strand <y|n>           Opt (snp related mode), separate coverage by strand [y,n]. Def=y.
+--strelka                Opt (indel mode), a flag to conform to strelka format; currently has no effect.
+-t [Integer]             Opt, number of worker threads (yields n+2 total threads). Def=1.
 ~~~~

+## Modes
+### coverage

-### `-n`
-
-This integer is the number of bases around and indel to check for nearby
-indel. Default=3.
-
-### `--pd`
+Reads one or more BAM files and a file containing reference ranges, and
+piles up the reads over each range to count the number of reads covering
+each position in the range.

-<pindel_deletions> Path to normal bam file
-
-## Modes

 ### [snp](qbasepileup_snp_mode)

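Coverage mode, as described above, counts how many reads cover each position of a reference range, optionally split by strand (`--strand`). The following is a minimal Python sketch of that counting, not qbasepileup's Java implementation. It assumes each read has already been reduced to a hypothetical `Read(start, end, is_reverse)` tuple with 1-based inclusive reference coordinates, and it treats every read as one contiguous span, so deletions and spliced gaps are counted as covered.

```python
from collections import namedtuple

# Hypothetical, simplified read: 1-based inclusive reference span plus strand.
Read = namedtuple("Read", ["start", "end", "is_reverse"])


def range_coverage(reads, range_start, range_end, by_strand=True):
    """Count reads covering each position in [range_start, range_end].

    Returns {position: (forward, reverse)} when by_strand is True,
    otherwise {position: total}.
    """
    positions = range(range_start, range_end + 1)
    forward = {pos: 0 for pos in positions}
    reverse = {pos: 0 for pos in positions}
    for read in reads:
        # Clip the read span to the requested range; no overlap means an empty loop.
        lo = max(read.start, range_start)
        hi = min(read.end, range_end)
        counts = reverse if read.is_reverse else forward
        for pos in range(lo, hi + 1):
            counts[pos] += 1
    if by_strand:
        return {pos: (forward[pos], reverse[pos]) for pos in positions}
    return {pos: forward[pos] + reverse[pos] for pos in positions}


reads = [Read(100, 149, False), Read(120, 169, True), Read(140, 189, False)]
print(range_coverage(reads, 145, 150, by_strand=False))
```

With these three reads, positions 145 to 149 are covered by all three, while position 150 is covered by only two because the first read ends at 149.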
@@ -111,24 +103,31 @@ region. Coverage per nucleotide is reported and the total coverage at that
 position is reported. By default, duplicates and unmapped reads are excluded.

 ### [compoundsnp](qbasepileup_compound_snp_mode)
+```diff
+-This mode is deprecated because its input SNP position file format (dcc1) is no longer available!
+```

-Reads one or more BAM files, a reference genome, and a file containing
-positions of compound SNPs (SNPs that sit next to each other). It finds the
-reference genome base at the compound SNP positions as well as the bases
-found at that position in all reads aligned to that region. Coverage per
-nucleotide is reported and the total coverage at that position is reported.

-By default, the `--filter` qbamfilter query string is:
+In this mode, qbasepileup reads one or more BAM files, a reference genome, and a file containing positions of compound SNPs (SNPs that sit next to each other). It finds the reference genome bases at the compound SNP positions as well as the bases found at those positions in all reads aligned to that region. Coverage per nucleotide and total coverage at each position are reported. By default, the filter is:

 ~~~~{.text}
 and( Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch <= 3, option_SM > 10)
 ~~~~

 For a more detailed description of qbamfilter and how it works to filter
-reads in and out of a particular analysis, see
-[qbamfilter](../qbamfilter/).
+reads in and out of a particular analysis, see [qbamfilter](../qbamfilter/).
+
+
+This mode is for compound SNPs and is very similar to [snp mode](qbasepileup_snp_mode.md), except:
+
+* Only dcc1 format (`-f`) is currently accepted.
+* Default filter is: `and(Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch<=3, option_SM>10)`
+

 ### [indel](qbasepileup_indel_mode)
+```diff
+-This mode is deprecated because the indel file format (dcc1) is no longer available!
+```

 Reads tumour and normal BAM files, a reference genome, and somatic and/or
 germline files containing positions of indels and piles up the reads around
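The compoundsnp description above amounts to reading the multi-base allele at adjacent positions from every read that passes the default qbamfilter query. The sketch below shows that logic in Python under stated assumptions, and it is not qbamfilter's or qbasepileup's actual code: `SimpleRead` and its fields are hypothetical stand-ins for BAM records; reads are assumed ungapped, so the base at reference position `pos` is `seq[pos - start]`; `cigar_m` is assumed to be the total length of CIGAR `M` operations, `md_mismatch` the number of mismatches recorded in the MD tag, and `sm` the value of the `SM` optional tag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SimpleRead:
    """Hypothetical read record; the field names are illustrative only."""
    start: int          # 1-based reference position of the first aligned base
    seq: str            # read bases, assumed ungapped against the reference
    is_duplicate: bool  # stands in for Flag_DuplicateRead
    cigar_m: int        # assumed: total length of CIGAR M operations
    md_mismatch: int    # assumed: number of mismatches in the MD tag
    sm: int             # assumed: value of the SM optional tag


def passes_default_filter(read):
    """Mirror of and(Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch<=3, option_SM>10)."""
    return (not read.is_duplicate
            and read.cigar_m > 34
            and read.md_mismatch <= 3
            and read.sm > 10)


def compound_snp_pileup(reads, start, length):
    """Count the alleles observed across reference positions start..start+length-1."""
    alleles = Counter()
    for read in reads:
        if not passes_default_filter(read):
            continue
        offset = start - read.start
        # Only reads spanning every base of the compound SNP contribute an allele.
        if offset < 0 or offset + length > len(read.seq):
            continue
        alleles[read.seq[offset:offset + length]] += 1
    return alleles


flank, tail = "A" * 10, "A" * 38  # 10 bases before the SNPs, 38 after: 50 bp reads
reads = [
    SimpleRead(1000, flank + "GT" + tail, False, 50, 2, 37),
    SimpleRead(1000, flank + "GT" + tail, False, 50, 1, 37),
    SimpleRead(1000, flank + "CA" + tail, False, 50, 0, 37),
    SimpleRead(1000, flank + "GT" + tail, True, 50, 0, 37),   # duplicate: filtered out
    SimpleRead(1000, flank + "GT" + tail, False, 50, 0, 5),   # SM <= 10: filtered out
]
alleles = compound_snp_pileup(reads, 1010, 2)
print(alleles, "total coverage:", sum(alleles.values()))
```

Here two reads support `GT` and one supports `CA`; the duplicate read and the read with an `SM` value of 5 are rejected by the filter, so the total coverage across the compound SNP is 3.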
@@ -141,9 +140,27 @@ the indel to count a number of metrics. Metrics include:
 * number of reads with nearby soft clipping
 * number of reads with nearby indels

-### coverage
+#### Examples

-Reads one or more BAM files, and a file containing reference ranges and
-piles up the reads around the indel to count the number of reads covering
-each position in the range.
+* Somatic and germline input files in pindel format
+~~~~{.text}
+qbasepileup -t 2 -m indel -r reference.fa --pindel \
+--it tumour.bam --in normal.bam \
+--is somatic.input.dcc1 --ig germline.input.dcc1 \
+--os somatic.output.dcc1 --og germline.output.dcc1 \
+--log basepileup.log
+~~~~
+
+* Somatic input file in GATK format
+~~~~{.text}
+qbasepileup -t 2 -m indel --it tumour.bam --in normal.bam \
+--is somatic.input.dcc1 --os somatic.output.dcc1 \
+--log basepileup.log -r reference.fa --gatk
+~~~~

+* Germline input file in pindel format
+~~~~{.text}
+qbasepileup -t 2 -m indel --it tumour.bam --in normal.bam \
+--ig germline.input.dcc1 --og germline.output.dcc1 \
+--log basepileup.log -r reference.fa --pindel
+~~~~

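The indel metrics listed in the last hunk (reads with nearby soft clipping and reads with nearby indels) reduce to window checks around the indel position, with window sizes set by `--sc` (default 13) and `-n` (default 3). A minimal Python sketch, assuming each read is reduced to a hypothetical `ReadEvents` record listing the 1-based reference positions where it is soft-clipped and where it carries indels, and assuming "nearby" means within the window on either side, inclusive. qbasepileup's exact definition may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ReadEvents:
    """Hypothetical per-read summary in 1-based reference coordinates."""
    softclip_positions: list = field(default_factory=list)
    indel_positions: list = field(default_factory=list)


def nearby_metrics(reads, indel_pos, n=3, sc=13):
    """Return (reads with soft clipping within sc bases of indel_pos,
    reads with another indel within n bases of indel_pos).

    The defaults mirror the documented -n (3) and --sc (13) defaults.
    """
    near_softclip = 0
    near_indel = 0
    for read in reads:
        if any(abs(pos - indel_pos) <= sc for pos in read.softclip_positions):
            near_softclip += 1
        # The indel being scored is excluded; only other indels count.
        if any(pos != indel_pos and abs(pos - indel_pos) <= n
               for pos in read.indel_positions):
            near_indel += 1
    return near_softclip, near_indel


reads = [
    ReadEvents(softclip_positions=[5010]),     # clipped 10 bp away: inside --sc 13
    ReadEvents(softclip_positions=[5020]),     # clipped 20 bp away: outside the window
    ReadEvents(indel_positions=[5000, 5002]),  # the scored indel plus one 2 bp away
    ReadEvents(indel_positions=[5000]),        # only the scored indel
]
print(nearby_metrics(reads, 5000))
```

The first read counts towards soft clipping and the second does not; only the third read carries a second indel within 3 bp, since the fourth carries just the indel being scored.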
docs/qbasepileup/qbasepileup_compound_snp_mode.md

-22
This file was deleted.
