# qbasepileup
+ ``` diff
+ - This tool is outdated because some input file formats (dccq, dcc1) are no longer available!
+ ```
+

## Introduction

@@ -8,39 +12,28 @@ metrics on the reads at the positions of interest.

## Installation

- qbasepileup requires java 7 and (ideally) a multi-core machine (5 threads
+ qbasepileup requires Java 8 and (ideally) a multi-core machine (5 threads
are run concurrently) with at least 20GB of RAM.
- Download the qbasepileup tar file
- Untar the tar file into a directory of your choice
- You should see jar files for qbasepileup and its dependencies:

- ~~~~ {.text}
- $ tar xjvf qbasepileup.tar.bz2
- x antlr-3.2.jar
- x ini4j-0.5.2-SNAPSHOT.jar
- x jopt-simple-3.2.jar
- x picard-1.110.jar
- x qbamfilter-1.0pre.jar
- x qcommon-0.1pre.jar
- x qio-0.1pre.jar
- x qpicard-0.1pre.jar
- x qpileup-0.1pre.jar
- x sam-1.110.jar
- x jhdfobj.jar
- x jhdf5obj.jar
- x jhdf5.jar
- x jhdf.jar
- ~~~~
+ * **To build qbasepileup, first clone the adamajava repository using `git clone`:**
+ ```
+ git clone https://github.com/AdamaJava/adamajava
+ ```

- ## Usage
+ Then move into the adamajava folder:
+ ```
+ cd adamajava
+ ```
+ Run Gradle to build qbasepileup and its dependent jar files:
+ ```
+ ./gradlew :qbasepileup:build
+ ```
+ This creates the qbasepileup jar file, along with its dependent jars, in the `qbasepileup/build/flat` folder.

- A general invocation of qbasepileup looks like:

- ~~~~ {.text}
- java -jar qbasepileup.jar [OPTIONS]
- ~~~~
+ ## Usage

- A typical invocation might look like:
+ A general invocation of qbasepileup looks like:

~~~~ {.text}
java -jar qbasepileup.jar -m mode \
@@ -50,57 +43,56 @@ java -jar qbasepileup.jar -m mode \
## Options

~~~~ {.text}
- --help, -h Show help message.
- --version Print version.
- -b Path to tab delimited file with list of bams.
- --bq Minimum base quality score for accepting a read.
- --dup Include duplicates
- -f Format of SNPs file [ddc1,maf,db,dccq,vcf,torrent,RNA,DNA], Def=dcc1
- --filter Query string for qbamfilter
- --gatk Adjust insertion position to conform to GATK format
- --hdf HDF file to read list of bam files from
- --hp window around indel to check for homopolymers [default:10]
- -i Path to bam file.
- --ig Path to somatic indel file
- --in Path to normal bam file
- --ind Include reads with indels. (y or n, default y).
- --intron Include reads mapping across introns. (y or n, default y).
- --is Path to somatic indel file
- --it Path to tumour bam file
- --log Req, Log file.
- --loglevel Logging level required, e.g. INFO, DEBUG. Default INFO.
- -m snp,indel,coverage or compoundsnp
- --maxcov Report reads that are less than the mininmum coverage option. Integer
- --mincov Report reads that are less than the
- --mq Minimum mapping quality score for accepting a read.
- -n Bases around indel to check for other indels., Def=3.
- --novelstarts Report novelstarts rather than read count [Y,N], Def=Y.
- -o Output file.
- --of Output file format [rows,columns].
- --og Output file for germline indels.
- --os Output file for somatic indels.
- -p Pileup profile type [ddc1,maf,db,dccq,vcf,torrent,RNA,DNA] Def=dcc1.
- --pd <pindel_deletions> Path to normal bam file
- --pindel adjust insertion position to conform to pindel format
- -r Path to reference genome fasta file.
- -s Path to tab delimited file containing snps. Formats: dcc1,dccq,vcf,maf,txt
- --sc <soft_clip_window> number of bases around indel to check for softclipped bases [default:13]
- --strand Separate coverage by strand. (y or n, default y)
- --strelka adjust insertion position to conform to strelka format (same as pindel)
- -t Thread number. Total = number supplied + 2.
+ Option                    Description
+ ------                    -----------
+ -V, -v, --version         Print version info.
+ -b <txt file>             Opt (coverage and snp mode), path to tab delimited file with list of bams.
+ --bq <Integer>            Opt (snp related mode), minimum base quality score for accepting a read. Def=null.
+ --dup                     Opt (indel and snp mode), a flag to include duplicate reads.
+ -f [format]               Opt (snp mode), snp file format: [dcc1, dccq, vcf, tab, maf]. Def=dcc1.
+                             Opt (coverage mode), snp file format: [dcc1, dccq, vcf, tab, maf, gff3, gtf]. Def=dcc1.
+ --filter <query>          Opt, a qbamfilter query to filter out BAM records. Def=null.
+ --gatk                    Opt (indel mode), a flag to conform to GATK format; currently does nothing.
+ -h, --help                Shows this help message.
+ --hdf <hdf file>          Opt (snp mode), path to an HDF file whose header contains a list of bams.
+ --hp <Integer>            Opt (indel mode), bases around indel to check for homopolymers. Def=10.
+ -i <bam file>             Opt (coverage and snp mode), specify a single SAM/BAM file here.
+ --ig <dcc1>               Req (indel mode), path to germline indel file in dcc1 format.
+ --in <bam file>           Req (indel mode), path to normal bam file.
+ --ind <y|n>               Opt (snp related mode), include reads with indels [y,n]. Def=y.
+ --intron <y|n>            Opt (snp related mode), include reads mapping across introns [y,n]. Def=y.
+ --is <dcc1>               Req (indel mode), path to somatic indel file in dcc1 format.
+ --it <bam file>           Req (indel mode), path to tumour bam file.
+ --log                     Req, log file.
+ --loglevel                Opt, logging level required, e.g. INFO, DEBUG. Def=INFO.
+ -m <mode>                 Opt, mode [snp, compoundsnp, snpcheck, indel, coverage]. Def=snp.
+ --mincov <Integer>        Opt, report reads that are less than the minimum coverage option.
+ --mq <Integer>            Opt (snp related mode), minimum mapping quality score for accepting a read. Def=null.
+ -n <Integer>              Opt (indel mode), bases around indel to check for other indels. Def=3.
+ --novelstarts <y|n>       Opt (snp related mode), report novelstarts rather than read count [y,n]. Def=y.
+ -o <txt file>             Req (coverage and snp related mode), the output file path.
+ --of <format>             Opt (snp mode only), output file format [columns]. This option only works when
+                             the input snp file format is tab; otherwise it is ignored.
+ --og <dcc1>               Req (indel mode), output file for germline indels.
+ --os <dcc1>               Req (indel mode), output file for somatic indels.
+ -p [profile]              Opt (snp mode), pileup profile type [torrent, RNA, DNA, standard]. Def=standard.
+ --pd <pindel_deletions>   Req (indel mode), path to normal bam file
+ --pindel                  Opt (indel mode), a flag to conform to pindel format; currently does nothing.
+ -r <fasta file>           Req (indel and snp mode), path to reference genome fasta file.
+ -s <txt file>             Req (coverage and snp mode), path to tab delimited file containing snps.
+ --sc <Integer>            Opt (indel mode), bases around indel to check for soft clipping. Def=13.
+ --strand <y|n>            Opt (snp related mode), separate coverage by strand [y,n]. Def=y.
+ --strelka                 Opt (indel mode), a flag to conform to strelka format; currently does nothing.
+ -t [Integer]              Opt, number of worker threads (yields n+2 total threads). Def=1.
~~~~

+ ## Modes
+ ### coverage

- ### `-n`
-
- This integer is the number of bases around and indel to check for nearby
- indel. Default=3.
-
- ### `--pd`
+ Reads one or more BAM files and a file containing reference ranges, and
+ piles up the reads to count the number of reads covering each position
+ in the range.
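
As an illustration, a coverage-mode run over a single BAM file might look like the following sketch. The file names are placeholders, and the flag combination is assembled from the options table (`-s` and `-f` for the ranges file and its format, `-o` and `--log` for the outputs) rather than copied from a documented run, so it is worth checking against `java -jar qbasepileup.jar --help` first.

~~~~ {.text}
java -jar qbasepileup.jar -m coverage \
    -i sample.bam \
    -s ranges.gff3 -f gff3 \
    -o coverage.output.txt \
    --log coverage.log
~~~~

According to the options table, several BAM files can instead be supplied as a list with `-b`.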

- <pindel_deletions> Path to normal bam file
-
- ## Modes

### [snp](qbasepileup_snp_mode)

@@ -111,24 +103,31 @@ region. Coverage per nucleotide is reported and the total coverage at that
position is reported. By default, duplicates and unmapped reads are excluded.
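
A minimal snp-mode run might look like the sketch below. The BAM, reference, VCF, and output names are placeholders; the flags are the ones the options table marks as required for snp mode (`-r`, `-s`, `-o`, `--log`), plus `-i` for a single BAM file and `-f vcf` to declare the SNP file format.

~~~~ {.text}
java -jar qbasepileup.jar -m snp \
    -i sample.bam \
    -r reference.fa \
    -s snps.vcf -f vcf \
    -o snp.output.txt \
    --log snp.log
~~~~

Adding `-t 3` would request three worker threads, which by the `-t` description gives five threads in total, matching the five concurrent threads mentioned under Installation.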

### [compoundsnp](qbasepileup_compound_snp_mode)
+ ``` diff
+ - This mode is deprecated because the input SNP position file format (dcc1) is no longer available!
+ ```

- Reads one or more BAM files, a reference genome, and a file containing
- positions of compound SNPs (SNPs that sit next to each other). It finds the
- reference genome base at the compound SNP positions as well as the bases
- found at that position in all reads aligned to that region. Coverage per
- nucleotide is reported and the total coverage at that position is reported.

- By default, the `--filter` qbamfilter query string is:
+ In this mode, qbasepileup reads one or more BAM files, a reference genome, and a file containing positions of compound SNPs (SNPs that sit next to each other). It finds the reference genome base at the compound SNP positions as well as the bases found at those positions in all reads aligned to that region. Coverage per nucleotide and the total coverage at each position are reported. By default, the filter is:

~~~~ {.text}
and( Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch <= 3, option_SM > 10)
~~~~

For a more detailed description of qbamfilter and how it works to filter
- reads in and out of a particular analysis, see
- [qbamfilter](../qbamfilter/).
+ reads in and out of a particular analysis, see [qbamfilter](../qbamfilter/).
+
+
+ This mode, for compound SNPs, is very similar to [snp mode](qbasepileup_snp_mode.md) except:
+
+ * Only dcc1 format (`-f`) is currently accepted
+ * Default filter is: `and(Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch<=3, option_SM>10)`
+
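
The default filter can be replaced with `--filter`, which takes a qbamfilter query. The sketch below is illustrative only: it uses snp mode because compoundsnp is deprecated, the file names are placeholders, and the query simply keeps two conditions from the default filter above, assuming qbamfilter accepts `and(...)` with two conditions. Quoting the query stops the shell from interpreting the parentheses and the `<` character.

~~~~ {.text}
java -jar qbasepileup.jar -m snp \
    -i sample.bam -r reference.fa \
    -s snps.vcf -f vcf \
    -o snp.output.txt --log snp.log \
    --filter "and(Flag_DuplicateRead==false, MD_mismatch<=3)"
~~~~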

### [indel](qbasepileup_indel_mode)
+ ``` diff
+ - This mode is deprecated because the indel file format (dcc1) is no longer available!
+ ```

Reads tumour and normal BAM files, a reference genome, and somatic and/or
germline files containing positions of indels and piles up the reads around
@@ -141,9 +140,27 @@ the indel to count a number of metrics. Metrics include:
* number of reads with nearby soft clipping
* number of reads with nearby indels

- ### coverage
+ #### Examples

- Reads one or more BAM files, and a file containing reference ranges and
- piles up the reads around the indel to count the number of reads covering
- each position in the range.
+ * Somatic and germline input files in pindel format
+ ~~~~ {.text}
+ qbasepileup -t 2 -m indel -r reference.fa --pindel \
+     --it tumour.bam --in normal.bam \
+     --is somatic.input.dcc1 --ig germline.input.dcc1 \
+     --os somatic.output.dcc1 --og germline.output.dcc1 \
+     --log basepileup.log
+ ~~~~
+
+ * Somatic input file in GATK format
+ ~~~~ {.text}
+ qbasepileup -t 2 -m indel --it tumour.bam --in normal.bam \
+     --is somatic.input.dcc1 --os somatic.output.dcc1 \
+     --log basepileup.log -r reference.fa --gatk
+ ~~~~

+ * Germline input file in pindel format
+ ~~~~ {.text}
+ qbasepileup -t 2 -m indel --it tumour.bam --in normal.bam \
+     --ig germline.input.dcc1 --og germline.output.dcc1 \
+     --log basepileup.log -r reference.fa --pindel
+ ~~~~