<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta charset="utf-8" />
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="pandoc" />
<meta name="author" content="Stephen Kelly" />
<meta name="date" content="2019-11-03" />
<title>Detailed Description of the Bioinformatic Tools</title>
<style type="text/css">code{white-space: pre;}</style>
<style type="text/css">
pre:not([class]) {
background-color: white;
}
</style>
<script type="text/javascript">
if (window.hljs) {
hljs.configure({languages: []});
hljs.initHighlightingOnLoad();
if (document.readyState && document.readyState === "complete") {
window.setTimeout(function() { hljs.initHighlighting(); }, 0);
}
}
</script>
<style type="text/css">
h1 {
font-size: 34px;
}
h1.title {
font-size: 38px;
}
h2 {
font-size: 30px;
}
h3 {
font-size: 24px;
}
h4 {
font-size: 18px;
}
h5 {
font-size: 16px;
}
h6 {
font-size: 12px;
}
.table th:not([align]) {
text-align: left;
}
</style>
<style type="text/css">body {
font-family: sans-serif;
}
pre code {
background-color: #eee;
border: 1px solid #999;
display: block;
padding: 20px;
overflow: scroll;
}
.section-rule {
color: lightgray;
}
</style>
<style type="text/css">
.main-container {
max-width: 940px;
margin-left: auto;
margin-right: auto;
}
code {
color: inherit;
background-color: rgba(0, 0, 0, 0.04);
}
img {
max-width:100%;
height: auto;
}
.tabbed-pane {
padding-top: 12px;
}
.html-widget {
margin-bottom: 20px;
}
button.code-folding-btn:focus {
outline: none;
}
summary {
display: list-item;
}
</style>
<!-- tabsets -->
<style type="text/css">
.tabset-dropdown > .nav-tabs {
display: inline-table;
max-height: 500px;
min-height: 44px;
overflow-y: auto;
background: white;
border: 1px solid #ddd;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs > li.active:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li.active:before {
content: "";
border: none;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open:before {
content: "";
font-family: 'Glyphicons Halflings';
display: inline-block;
padding: 10px;
border-right: 1px solid #ddd;
}
.tabset-dropdown > .nav-tabs > li.active {
display: block;
}
.tabset-dropdown > .nav-tabs > li > a,
.tabset-dropdown > .nav-tabs > li > a:focus,
.tabset-dropdown > .nav-tabs > li > a:hover {
border: none;
display: inline-block;
border-radius: 4px;
}
.tabset-dropdown > .nav-tabs.nav-tabs-open > li {
display: block;
float: none;
}
.tabset-dropdown > .nav-tabs > li {
display: none;
}
</style>
<script>
$(document).ready(function () {
window.buildTabsets("TOC");
});
$(document).ready(function () {
$('.tabset-dropdown > .nav-tabs > li').click(function () {
$(this).parent().toggleClass('nav-tabs-open')
});
});
</script>
<!-- code folding -->
</head>
<body>
<div class="container-fluid main-container">
<div class="fluid-row" id="header">
<h1 class="title toc-ignore">Detailed Description of the Bioinformatic Tools</h1>
<h4 class="author">Stephen Kelly</h4>
<h4 class="date">November 3, 2019</h4>
</div>
<div id="TOC">
<ul>
<li><a href="#trimmomatic-version-0.36"><span class="toc-section-number">1</span> Trimmomatic Version 0.36</a><ul>
<li><a href="#settings-used"><span class="toc-section-number">1.1</span> Settings Used</a></li>
</ul></li>
<li><a href="#bwa-version-0.7.17"><span class="toc-section-number">2</span> BWA Version: 0.7.17</a><ul>
<li><a href="#settings-used-1"><span class="toc-section-number">2.1</span> Settings Used</a></li>
</ul></li>
<li><a href="#sambamba-v0.6.8"><span class="toc-section-number">3</span> Sambamba v0.6.8</a><ul>
<li><a href="#settings-used-2"><span class="toc-section-number">3.1</span> Settings Used</a></li>
</ul></li>
<li><a href="#gatk-version-3.8"><span class="toc-section-number">4</span> GATK version 3.8</a><ul>
<li><a href="#settings-used-3"><span class="toc-section-number">4.1</span> Settings Used</a><ul>
<li><a href="#realignertargetcreator"><span class="toc-section-number">4.1.1</span> RealignerTargetCreator</a></li>
<li><a href="#indelrealigner"><span class="toc-section-number">4.1.2</span> IndelRealigner</a></li>
<li><a href="#baserecalibrator"><span class="toc-section-number">4.1.3</span> BaseRecalibrator</a></li>
<li><a href="#depthofcoverage"><span class="toc-section-number">4.1.4</span> DepthOfCoverage</a></li>
<li><a href="#mutect2"><span class="toc-section-number">4.1.5</span> MuTect2</a></li>
</ul></li>
</ul></li>
<li><a href="#lofreq-version-2.1.2"><span class="toc-section-number">5</span> LoFreq version 2.1.2</a><ul>
<li><a href="#settings-used-4"><span class="toc-section-number">5.1</span> Settings Used</a></li>
</ul></li>
<li><a href="#bcftools-version-1.3"><span class="toc-section-number">6</span> bcftools version 1.3</a><ul>
<li><a href="#settings-used-5"><span class="toc-section-number">6.1</span> Settings Used</a></li>
</ul></li>
<li><a href="#annovar-revision-150617"><span class="toc-section-number">7</span> ANNOVAR revision 150617</a><ul>
<li><a href="#settings-used-6"><span class="toc-section-number">7.1</span> Settings Used</a></li>
</ul></li>
</ul>
</div>
<p>Several software packages are used in the targeted exome sequencing and variant calling analysis for the 580 gene panel. Most of these tools are scripted in the <a href="https://github.com/NYU-Molecular-Pathology/NGS580-nf"><code>NGS580-nf</code></a> pipeline (<a href="https://github.com/NYU-Molecular-Pathology/NGS580-nf" class="uri">https://github.com/NYU-Molecular-Pathology/NGS580-nf</a>).</p>
<section id="trimmomatic-version-0.36" class="level1">
<h1><span class="header-section-number">1</span> Trimmomatic Version 0.36</h1>
<p><a href="https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/" class="uri">https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4103590/</a></p>
<p><a href="http://www.usadellab.org/cms/?page=trimmomatic" class="uri">http://www.usadellab.org/cms/?page=trimmomatic</a></p>
<p>A flexible read trimming tool for Illumina NGS data. Trimmomatic performs a variety of useful trimming tasks for Illumina paired-end and single-end data.</p>
<section id="settings-used" class="level2">
<h2><span class="header-section-number">1.1</span> Settings Used</h2>
<ul>
<li><p><code>PE</code>: Paired End</p></li>
<li><p><code>ILLUMINACLIP:/ref/contaminants/trimmomatic.fa:2:30:10:1:true</code>: Cut adapter and other Illumina-specific sequences from the read.</p></li>
<li><p><code>TRAILING:5</code>: Cut bases off the end of a read if their quality is below 5</p></li>
<li><p><code>SLIDINGWINDOW:4:15</code>: Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15</p></li>
<li><p><code>MINLEN:35</code>: Drop the read if it is shorter than 35 bases</p></li>
</ul>
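<p>The quality-based steps above can be illustrated with a simplified sketch. It is not Trimmomatic's implementation: the function names are hypothetical, reads are reduced to lists of per-base quality scores, adapter clipping is left out, and the exact position where the real tool cuts may differ in detail.</p>

```python
from typing import List


def trim_trailing(quals: List[int], threshold: int) -> List[int]:
    """TRAILING-style step: drop 3' bases while their quality is below threshold."""
    end = len(quals)
    while end > 0 and threshold > quals[end - 1]:
        end -= 1
    return quals[:end]


def trim_sliding_window(quals: List[int], width: int, min_mean: float) -> List[int]:
    """SLIDINGWINDOW-style step: cut at the first window whose mean quality is below min_mean."""
    for start in range(len(quals) - width + 1):
        window = quals[start:start + width]
        if min_mean > sum(window) / width:
            return quals[:start]
    return quals


def passes_min_length(quals: List[int], min_len: int) -> bool:
    """MINLEN-style step: keep the read only if enough bases survive trimming."""
    return len(quals) >= min_len


# Toy read: 40 good bases, a low-quality stretch, then two very poor bases.
read_quals = [30] * 40 + [10] * 4 + [2] * 2
trimmed = trim_sliding_window(trim_trailing(read_quals, 5), 4, 15)
print(len(trimmed), passes_min_length(trimmed, 35))  # 40 True
```

<p>In this toy read, the trailing step removes the two bases with quality 2, the window scan cuts where the mean of four bases first falls below 15, and the remaining 40 bases satisfy the minimum length of 35.</p>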
</section>
</section>
<section id="bwa-version-0.7.17" class="level1">
<h1><span class="header-section-number">2</span> BWA Version: 0.7.17</h1>
<p><a href="https://github.com/lh3/bwa" class="uri">https://github.com/lh3/bwa</a></p>
<p><a href="http://bio-bwa.sourceforge.net/bwa.shtml" class="uri">http://bio-bwa.sourceforge.net/bwa.shtml</a></p>
<p>BWA is a software package for mapping DNA sequences against a large reference genome, such as the human genome.</p>
<section id="settings-used-1" class="level2">
<h2><span class="header-section-number">2.1</span> Settings Used</h2>
<ul>
<li><p><code>bwa mem</code>: Align 70bp-1Mbp query sequences with the BWA-MEM algorithm.</p></li>
<li><p><code>-M</code>: Mark shorter split hits as secondary</p></li>
<li><p><code>-v 1</code>: Set the verbosity level of the output; 1 prints errors only</p></li>
</ul>
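<p>As a rough sketch, the settings above could be combined into a paired-end command line as follows. The file names are placeholders rather than the pipeline's actual paths, and the helper function is hypothetical; it only builds the argument list and does not run <code>bwa</code>.</p>

```python
import shlex
from typing import List


def bwa_mem_command(reference: str, reads_1: str, reads_2: str) -> List[str]:
    """Build (but do not run) a paired-end bwa mem command with the settings above."""
    return ["bwa", "mem", "-M", "-v", "1", reference, reads_1, reads_2]


# Hypothetical file names, for illustration only.
command = bwa_mem_command("genome.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(command))
```

<p>Keeping the command as a list of arguments avoids shell quoting problems if it is later passed to <code>subprocess.run</code>.</p>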
</section>
</section>
<section id="sambamba-v0.6.8" class="level1">
<h1><span class="header-section-number">3</span> Sambamba v0.6.8</h1>
<p><a href="https://github.com/lomereiter/sambamba" class="uri">https://github.com/lomereiter/sambamba</a></p>
<p><a href="http://lomereiter.github.io/sambamba/" class="uri">http://lomereiter.github.io/sambamba/</a></p>
<p><a href="http://lomereiter.github.io/sambamba/docs/sambamba-view.html" class="uri">http://lomereiter.github.io/sambamba/docs/sambamba-view.html</a></p>
<p>Tools for working with SAM/BAM data.</p>
<p>Sambamba is a fast, robust tool (and library), written in the D programming language, for working with SAM and BAM files. Its current functionality covers an important subset of samtools functionality, including view, index, sort, markdup, and depth.</p>
<section id="settings-used-2" class="level2">
<h2><span class="header-section-number">3.1</span> Settings Used</h2>
<ul>
<li><code>--filter='mapping_quality>=10'</code>: Set a custom filter for alignments; here, keep only alignments with a mapping quality of at least 10.</li>
</ul>
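<p>The effect of this filter expression can be sketched in plain Python. The records below are made-up dictionaries, not Sambamba's data structures, and the helper name is hypothetical; the point is only that alignments with a mapping quality of 10 or more are kept and the rest are dropped.</p>

```python
from typing import Dict, List


def filter_by_mapping_quality(alignments: List[Dict], min_mapq: int = 10) -> List[Dict]:
    """Keep alignments whose mapping quality is at least min_mapq."""
    return [aln for aln in alignments if aln["mapping_quality"] >= min_mapq]


# Made-up alignment records, for illustration only.
alignments = [
    {"read_name": "read_a", "mapping_quality": 60},
    {"read_name": "read_b", "mapping_quality": 10},
    {"read_name": "read_c", "mapping_quality": 3},
]
kept = filter_by_mapping_quality(alignments)
print([aln["read_name"] for aln in kept])  # ['read_a', 'read_b']
```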
</section>
</section>
<section id="gatk-version-3.8" class="level1">
<h1><span class="header-section-number">4</span> GATK version 3.8</h1>
<p><a href="https://software.broadinstitute.org/gatk/" class="uri">https://software.broadinstitute.org/gatk/</a></p>
<p><a href="https://software.broadinstitute.org/gatk/documentation/" class="uri">https://software.broadinstitute.org/gatk/documentation/</a></p>
<p><a href="https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/" class="uri">https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/</a></p>
<p>Genome Analysis Toolkit for variant discovery in high-throughput sequencing data. Developed in the Data Sciences Platform at the Broad Institute, the toolkit offers a wide variety of tools with a primary focus on variant discovery and genotyping.</p>
<section id="settings-used-3" class="level2">
<h2><span class="header-section-number">4.1</span> Settings Used</h2>
<section id="realignertargetcreator" class="level3">
<h3><span class="header-section-number">4.1.1</span> RealignerTargetCreator</h3>
<ul>
<li><p><code>-dt NONE</code>: Type of read downsampling to employ at a given locus; <code>NONE</code> disables downsampling</p></li>
<li><p><code>--interval_padding 10</code>: Amount of padding (in bp) to add to each interval</p></li>
</ul>
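<p>Interval padding can be pictured as widening each target region on both sides. The following is a minimal sketch with a hypothetical helper, assuming 1-based inclusive coordinates; it clamps only at position 1 and does not model clamping at the end of a contig.</p>

```python
from typing import Tuple


def pad_interval(start: int, end: int, padding: int = 10) -> Tuple[int, int]:
    """Extend a 1-based inclusive interval by `padding` bases on each side.

    The start is not allowed to go below position 1.
    """
    return max(1, start - padding), end + padding


print(pad_interval(1000, 1200))  # (990, 1210)
print(pad_interval(5, 80))       # (1, 90)
```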
</section>
<section id="indelrealigner" class="level3">
<h3><span class="header-section-number">4.1.2</span> IndelRealigner</h3>
<ul>
<li><p><code>-dt NONE</code>: Type of read downsampling to employ at a given locus; <code>NONE</code> disables downsampling</p></li>
<li><p><code>--maxReadsForRealignment 50000</code>: Max reads allowed at an interval for realignment</p></li>
</ul>
</section>
<section id="baserecalibrator" class="level3">
<h3><span class="header-section-number">4.1.3</span> BaseRecalibrator</h3>
<ul>
<li><p><code>-rf BadCigar</code>: Read filter to apply before analysis; <code>BadCigar</code> removes reads with malformed CIGAR strings</p></li>
<li><p><code>--interval_padding 10</code>: Amount of padding (in bp) to add to each interval</p></li>
<li><p><code>-BQSR</code>: Input covariates table file for on-the-fly base quality score recalibration</p></li>
</ul>
</section>
<section id="depthofcoverage" class="level3">
<h3><span class="header-section-number">4.1.4</span> DepthOfCoverage</h3>
<ul>
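<li><p><code>-dt NONE</code>: Type of read downsampling to employ at a given locus; <code>NONE</code> disables downsampling</p></li>
<li><p><code>-rf BadCigar</code>: Read filter to apply before analysis; <code>BadCigar</code> removes reads with malformed CIGAR strings</p></li>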
<li><p><code>--omitIntervalStatistics</code>: Do not calculate per-interval statistics</p></li>
<li><p><code>--omitLocusTable</code>: Do not calculate per-sample per-depth counts of loci</p></li>
<li><p><code>--omitDepthOutputAtEachBase</code>: Do not output depth of coverage at each base</p></li>
<li><p><code>-ct [10, 50, 100, 200, 300, 400, 500]</code>: Coverage thresholds for summarizing statistics; the summary reports the percentage of bases covered at or above each threshold</p></li>
<li><p><code>-mbq 20</code>: Minimum quality of bases to count towards depth</p></li>
<li><p><code>-mmq 20</code>: Minimum mapping quality of reads to count towards depth</p></li>
</ul>
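<p>The coverage thresholds given with <code>-ct</code> can be read as "what percentage of bases reach at least this depth". A minimal sketch of that summary, using made-up per-base depths and a hypothetical helper, is shown here; it does not model the <code>-mbq</code> and <code>-mmq</code> filtering that decides which bases and reads count towards depth in the first place.</p>

```python
from typing import Dict, List

THRESHOLDS = [10, 50, 100, 200, 300, 400, 500]


def percent_bases_at_or_above(depths: List[int], thresholds: List[int]) -> Dict[int, float]:
    """For each threshold, return the percentage of positions with depth at or above it."""
    total = len(depths)
    return {t: 100.0 * sum(1 for d in depths if d >= t) / total for t in thresholds}


# Made-up per-base depths for ten positions, for illustration only.
depths = [5, 12, 60, 150, 250, 600, 600, 450, 90, 10]
summary = percent_bases_at_or_above(depths, THRESHOLDS)
for threshold, percent in summary.items():
    print(f"bases at or above {threshold}x: {percent:.1f}%")
```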
</section>
<section id="mutect2" class="level3">
<h3><span class="header-section-number">4.1.5</span> MuTect2</h3>
<ul>
<li><p><code>-dt NONE</code>: Type of read downsampling to employ at a given locus; <code>NONE</code> disables downsampling</p></li>
<li><p><code>--standard_min_confidence_threshold_for_calling 30</code>: The minimum phred-scaled confidence threshold at which variants should be called</p></li>
<li><p><code>--max_alt_alleles_in_normal_count 10</code>: Threshold for maximum alternate allele counts in normal</p></li>
<li><p><code>--max_alt_allele_in_normal_fraction 0.05</code>: Threshold for maximum alternate allele fraction in normal</p></li>
<li><p><code>--max_alt_alleles_in_normal_qscore_sum 40</code>: Threshold for maximum alternate allele quality score sum in normal</p></li>
<li><p><code>--interval_padding 10</code>: Amount of padding (in bp) to add to each interval</p></li>
</ul>
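<p>Several of the GATK settings above are Phred-scaled, for example the calling confidence of 30 and the base and mapping quality minimums of 20. The standard conversion from a Phred score Q to an error probability is 10<sup>-Q/10</sup>, which the short sketch below works through; the function name is only illustrative.</p>

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred-scaled score Q into an error probability, 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


for q in (20, 30):
    print(f"Q{q}: error probability of about {phred_to_error_probability(q):.3f}")
```

<p>A score of 20 therefore corresponds to roughly a 1 in 100 chance of error, and a score of 30 to roughly 1 in 1000.</p>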
</section>
</section>
</section>
<section id="lofreq-version-2.1.2" class="level1">
<h1><span class="header-section-number">5</span> LoFreq version 2.1.2</h1>
<p><a href="http://csb5.github.io/lofreq/" class="uri">http://csb5.github.io/lofreq/</a></p>
<p><a href="https://github.com/CSB5/lofreq" class="uri">https://github.com/CSB5/lofreq</a></p>
<p><a href="https://www.ncbi.nlm.nih.gov/pubmed/23066108" class="uri">https://www.ncbi.nlm.nih.gov/pubmed/23066108</a></p>
<p>LoFreq is a fast and sensitive variant-caller for inferring SNVs and indels from next-generation sequencing data. It makes full use of base-call qualities and other sources of errors inherent in sequencing (e.g. mapping or base/indel alignment uncertainty), which are usually ignored by other methods or only used for filtering.</p>
<section id="settings-used-4" class="level2">
<h2><span class="header-section-number">5.1</span> Settings Used</h2>
<ul>
<li><code>--call-indels</code>: include indels in variant call output</li>
</ul>
</section>
</section>
<section id="bcftools-version-1.3" class="level1">
<h1><span class="header-section-number">6</span> bcftools version 1.3</h1>
<p><a href="https://samtools.github.io/bcftools/" class="uri">https://samtools.github.io/bcftools/</a></p>
<p><a href="https://samtools.github.io/bcftools/bcftools.html" class="uri">https://samtools.github.io/bcftools/bcftools.html</a></p>
<p><a href="http://www.htslib.org/download/" class="uri">http://www.htslib.org/download/</a></p>
<p>BCFtools is a set of utilities that manipulate variant calls in the Variant Call Format (VCF) and its binary counterpart BCF. All commands work transparently with both VCFs and BCFs, both uncompressed and BGZF-compressed.</p>
<section id="settings-used-5" class="level2">
<h2><span class="header-section-number">6.1</span> Settings Used</h2>
<ul>
<li><p><code>index</code>: index VCF/BCF</p></li>
<li><p><code>norm</code>: normalize indels</p></li>
<li><p><code>view</code>: subset, filter and convert VCF and BCF files</p></li>
<li><p><code>--multiallelics</code>: split multiallelic sites into biallelic records (-) or join biallelic sites into multiallelic records (+)</p></li>
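<li><p><code>-both</code>: value given to <code>--multiallelics</code>; the leading <code>-</code> selects splitting and <code>both</code> applies it to SNPs and indels as separate records. In the bcftools documentation, <code>both</code> is also described as an abbreviation of “-c indels -c snps”</p></li>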
<li><p><code>-c snps</code>: any SNP records are compatible, regardless of whether the ALT alleles match or not. For duplicate positions, only the first SNP record will be considered and appear on output.</p></li>
<li><p><code>-c indels</code>: all indel records are compatible, regardless of whether the REF and ALT alleles match or not. For duplicate positions, only the first indel record will be considered and appear on output.</p></li>
<li><p><code>--output-type v</code>: Output compressed BCF (b), uncompressed BCF (u), compressed VCF (z), uncompressed VCF (v). Use the -Ou option when piping between bcftools subcommands to speed up performance by removing unnecessary compression/decompression and VCF←→BCF conversion.</p></li>
<li><p><code>--fasta-ref</code>: reference sequence in fasta format</p></li>
</ul>
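<p>Splitting a multiallelic site into biallelic records can be sketched as follows. This is a deliberately simplified illustration with a hypothetical helper that only handles the CHROM, POS, REF and ALT columns; the real <code>bcftools norm</code> also has to rewrite per-allele INFO and genotype fields, and uses the reference given with <code>--fasta-ref</code> when normalizing indels.</p>

```python
from typing import List, Tuple

Record = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def split_multiallelic(chrom: str, pos: int, ref: str, alt: str) -> List[Record]:
    """Turn one record with a comma-separated ALT field into one record per ALT allele."""
    return [(chrom, pos, ref, allele) for allele in alt.split(",")]


# Made-up site with two alternate alleles, for illustration only.
print(split_multiallelic("chr1", 12345, "G", "A,T"))
```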
</section>
</section>
<section id="annovar-revision-150617" class="level1">
<h1><span class="header-section-number">7</span> ANNOVAR revision 150617</h1>
<p><a href="http://annovar.openbioinformatics.org/en/latest/" class="uri">http://annovar.openbioinformatics.org/en/latest/</a></p>
<p>ANNOVAR is an efficient software tool that uses up-to-date information to functionally annotate genetic variants detected from diverse genomes (including human genome hg18, hg19, hg38, as well as mouse, worm, fly, yeast and many others).</p>
<section id="settings-used-6" class="level2">
<h2><span class="header-section-number">7.1</span> Settings Used</h2>
<ul>
<li><p><code>--buildver hg19</code>: genome build version</p></li>
<li><p><code>--protocol refGene,clinvar_20170905,cosmic70,1000g2015aug_all,avsnp150,exac03,snp138</code>: comma-delimited string specifying database protocol</p></li>
<li><p><code>--operation g,f,f,f,f,f,f</code>: comma-delimited string specifying type of operation</p></li>
</ul>
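<p>The <code>--protocol</code> and <code>--operation</code> strings are matched by position, so each database is paired with one operation code (<code>g</code> for gene-based, <code>r</code> for region-based, <code>f</code> for filter-based annotation). The sketch below pairs the two strings listed above and checks that they have the same number of entries; the helper function is hypothetical and not part of ANNOVAR.</p>

```python
from typing import List, Tuple

PROTOCOL = "refGene,clinvar_20170905,cosmic70,1000g2015aug_all,avsnp150,exac03,snp138"
OPERATION = "g,f,f,f,f,f,f"
OPERATION_NAMES = {"g": "gene-based", "r": "region-based", "f": "filter-based"}


def pair_protocols(protocol: str, operation: str) -> List[Tuple[str, str]]:
    """Pair each database in the protocol string with its operation type, by position."""
    names = protocol.split(",")
    ops = operation.split(",")
    if len(names) != len(ops):
        raise ValueError("protocol and operation must list the same number of entries")
    return [(name, OPERATION_NAMES[op]) for name, op in zip(names, ops)]


for database, kind in pair_protocols(PROTOCOL, OPERATION):
    print(f"{database}: {kind}")
```

<p>With the settings above, <code>refGene</code> is the only gene-based annotation and the remaining six databases are applied as filter-based annotations.</p>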
</section>
</section>
</div>
<script>
// add bootstrap table styles to pandoc tables
function bootstrapStylePandocTables() {
$('tr.header').parent('thead').parent('table').addClass('table table-condensed');
}
$(document).ready(function () {
bootstrapStylePandocTables();
});
</script>
</body>
</html>