<html lang="en">
<head>
<meta charset="utf-8" />
<meta
name="viewport"
content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=no"
/>
<title>UBC iGEM: Software Slides</title>
<link rel="stylesheet" href="dist/reset.css" />
<link rel="stylesheet" href="dist/reveal.css" />
<link rel="stylesheet" href="dist/theme/igem.css" />
<link rel="stylesheet" href="plugin/highlight/monokai.css" />
</head>
<body>
<div class="reveal">
<div class="slides">
<section
data-auto-animate
class="left-align"
data-background="public/backgrounds/bits_background.svg"
>
<div class="row">
<div class="col">
<h2>Enhancing DNA Storage with Synthetic Biology: Software</h2>
<h3 class="highlight"><b>UBC iGEM</b></h3>
</div>
</div>
</section>
<section
data-auto-animate
class="left-align"
data-background="public/backgrounds/info_background.svg"
>
<h2>Project Description</h2>
<p class="r-fit-text">
Our project aims to tackle the growing need for a better, more
energy-efficient data storage medium compared to current magnetic
and optical data storage options by means of synthetic biology.
Currently, we aim to achieve this through 2 separate tracks:
</p>
<p class="r-fit-text fragment">
Developing an <b>enzymatic </b>DNA synthesis platform that can
elongate a <b>single-stranded </b> DNA (ssDNA) in a
<b>template-independent </b> manner. The synthesized ssDNA strand
will then be converted to a more stable, double-stranded DNA (dsDNA)
and inserted into a plasmid for long-term data storage.
</p>
<p class="r-fit-text fragment">
Developing a <b>data encoding/decoding pipeline</b> that allows
binary files (used by computers) to be stored in a ternary format
compatible with our DNA synthesis platform, retrieved, and converted
back into binary.
</p>
</section>
<section
data-auto-animate
class="left-align"
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Goals</h2>
<div class="r-fit-text fragment">
<h3>In Silico:</h3>
Demonstrate the ability to encode and decode information someone might
keep in long-term storage, as sequences thousands of nucleotides long.
</div>
<div class="r-fit-text fragment">
<h3>With wet lab:</h3>
Demonstrate the ability to encode and decode a 100-nucleotide sequence
with a 30% error rate.
</div>
</section>
<section
data-auto-animate
class="left-align"
data-background="public/backgrounds/info_background.svg"
>
<h2>Plan</h2>
<div class="fragment r-fit-text">
<h3>DBTL 1: March to April</h3>
<p>
Implement a barebones pipeline, and see how much error can be
tolerated in 100 nucleotide long DNA sequences with in silico
testing.
</p>
</div>
<div class="fragment r-fit-text">
<h3>DBTL 2: April to May</h3>
<p>
Refine algorithms to tolerate up to 30% error in 100-nucleotide-long
DNA sequences, with in silico testing.
</p>
</div>
</section>
<section
data-auto-animate
class="left-align"
data-background="public/backgrounds/molecules_background.svg"
>
<h2>DBTL 1: Proof of Concept</h2>
<ul class="fragment">
<li>Encoding</li>
<li>Error Correction</li>
<li>Decoding</li>
<li>ChaosDNA</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/crab_background.svg"
>
<h2>What programming language are we using?</h2>
<h3 class="fragment">Rust</h3>
<ul>
<li class="fragment">
systems programming language that is fast, memory-efficient and
memory-safe
</li>
<li class="fragment">
expressive type system, great documentation, robust tooling
</li>
<li class="fragment">most admired language among developers</li>
</ul>
<cite
><a
href="https://github.blog/2023-08-30-why-rust-is-the-most-admired-language-among-developers/"
>https://github.blog/2023-08-30-why-rust-is-the-most-admired-language-among-developers/</a
></cite
>
</section>
<section
data-auto-animate
data-background="public/backgrounds/encoding.svg"
>
<h2>Encoding</h2>
<ul>
<li>Primer Generation</li>
<li>Sequence Generation</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/encoding.svg"
>
<h2>Primer Generation</h2>
</section>
<section
data-auto-animate
data-background="public/backgrounds/default_background.svg"
>
<h2>Why generate our own primers?</h2>
<ul>
<li>act as unique identifiers for information</li>
<li>required for PCR amplification</li>
<li>specify requirements for TdT enzyme</li>
</ul>
<p class="fragment">How? Using a genetic algorithm.</p>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Why use a genetic algorithm?</h2>
<ul>
<li>type of optimization algorithm</li>
<li>
scores candidates against a set of constraints and uses those scores
to pick the best parents, which then produce the next generation of children
</li>
</ul>
<p class="fragment">Primer design fits this description!</p>
</section>
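<section data-markdown data-background="public/backgrounds/info_background.svg">
<textarea data-template>
## Sketch: one generation step

A minimal sketch of how constraints can drive parent selection, assuming two illustrative constraints (GC content near 50% and no long homopolymer runs). The function names, the fitness weights and the single-point crossover are hypothetical choices for illustration, not our final primer-design rules.

```rust
/// Fraction of G/C bases in a primer (0.0 for an empty primer).
fn gc_content(primer: &str) -> f64 {
    if primer.is_empty() {
        return 0.0;
    }
    let gc = primer.chars().filter(|c| *c == 'G' || *c == 'C').count();
    gc as f64 / primer.len() as f64
}

/// Length of the longest run of one repeated base.
fn longest_run(primer: &str) -> usize {
    let (mut best, mut run) = (0usize, 0usize);
    let mut prev: Option<char> = None;
    for c in primer.chars() {
        run = if Some(c) == prev { run + 1 } else { 1 };
        best = best.max(run);
        prev = Some(c);
    }
    best
}

/// Hypothetical fitness: penalise distance from 50% GC and runs longer than 3.
fn fitness(primer: &str) -> f64 {
    let gc_penalty = (gc_content(primer) - 0.5).abs();
    let run_penalty = longest_run(primer).saturating_sub(3) as f64;
    -(gc_penalty + run_penalty)
}

/// Pick the two fittest candidates as parents (needs at least two candidates).
fn select_parents(pop: &[String]) -> (String, String) {
    let mut sorted = pop.to_vec();
    sorted.sort_by(|a, b| fitness(b).partial_cmp(&fitness(a)).unwrap());
    (sorted[0].clone(), sorted[1].clone())
}

/// Single-point crossover: prefix of `a` joined to the suffix of `b`.
fn crossover(a: &str, b: &str, point: usize) -> String {
    format!("{}{}", &a[..point], &b[point..])
}
```

For example, `crossover("AAAA", "TTTT", 2)` yields `"AATT"`; a full run would repeat selection, crossover and mutation over many generations.
</textarea>
</section>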
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Schematic</h2>
<img class="stretch" src="public/images/schematic-primer-gen.png" />
</section>
<section
data-auto-animate
data-background="public/backgrounds/default_background.svg"
>
<h2>Example: two chosen parents</h2>
<img src="public/images/primer_gen.png" />
</section>
<section
data-auto-animate
data-background="public/backgrounds/default_background.svg"
>
<h2>Sequence Generation</h2>
</section>
<section
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Binary to Nucleotides</h2>
<p>Many encoding formats exist:</p>
<ul>
<li class="fragment">
Base 4 Encoding: 0 → A, 1 → T, 2 → G, 3 → C
</li>
<li class="fragment">Church Encoding: 0 → A or C, 1 → T or G</li>
<li class="fragment">
Base 2 Encoding: 00 → A, 11 → T, 01 → G, 10 → C
</li>
<li class="fragment">
HEDGES (key-autokey cipher) ECC: (hash(input) + bit) mod 4 → Base 4
Encoding
</li>
</ul>
<p class="fragment">
These encoding methods will ultimately be tested in silico...
</p>
</section>
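<section data-markdown data-background="public/backgrounds/dna_right_corner_backgroud.svg">
<textarea data-template>
## Sketch: two bits per base

A minimal sketch of the simplest mapping listed above (00 → A, 01 → G, 10 → C, 11 → T), assuming bytes are read most significant bit pair first. The function names and the bit order are illustrative assumptions, and no error correction is applied here.

```rust
/// Map a 2-bit value to a base: 00 -> A, 01 -> G, 10 -> C, 11 -> T.
fn bits_to_base(pair: u8) -> char {
    match pair & 0b11 {
        0b00 => 'A',
        0b01 => 'G',
        0b10 => 'C',
        _ => 'T',
    }
}

/// Inverse mapping; None for anything other than A, C, G or T.
fn base_to_bits(base: char) -> Option<u8> {
    match base {
        'A' => Some(0b00),
        'G' => Some(0b01),
        'C' => Some(0b10),
        'T' => Some(0b11),
        _ => None,
    }
}

/// Encode bytes as nucleotides, four bases per byte.
fn encode(data: &[u8]) -> String {
    let mut out = String::with_capacity(data.len() * 4);
    for &byte in data {
        for shift in [6u32, 4, 2, 0] {
            out.push(bits_to_base(byte >> shift));
        }
    }
    out
}

/// Decode nucleotides back to bytes; None on an invalid base or a
/// length that is not a multiple of four.
fn decode(seq: &str) -> Option<Vec<u8>> {
    if seq.len() % 4 != 0 {
        return None;
    }
    let bases: Vec<char> = seq.chars().collect();
    let mut out = Vec::new();
    for chunk in bases.chunks(4) {
        let mut byte = 0u8;
        for &b in chunk {
            byte = (byte << 2) | base_to_bits(b)?;
        }
        out.push(byte);
    }
    Some(out)
}
```

The byte `0x1B` (bits `00 01 10 11`) encodes to `AGCT`, and decoding `AGCT` returns that byte again.
</textarea>
</section>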
<section
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Binary to Nucleotides</h2>
<p>Some limitations...</p>
<ul>
<li class="fragment">short strands (100 nt)</li>
<li class="fragment">high rate of deletion errors (30%)</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Short strands?</h2>
<p>Blocking data</p>
<img class="stretch" src="public/images/block.png" />
<ul class="r-fit-text fragment">
<li class="fragment">
requires we generate unique primers per strand
</li>
<li class="fragment">
allows for parallel synthesis of DNA strands
</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Short strands?</h2>
<p class="r-fit-text fragment">
To encode the UBC iGEM sponsorship package...
</p>
<div class="row fragment">
<div style="text-wrap: wrap" class="left-align">
<ul>
<li>
1.8 MB → 1.8 × 10<sup>6</sup> bytes → 1.44 × 10<sup>7</sup> bits
→ 180,000 strands of DNA
</li>
<li>
assuming each DNA strand's information portion is 80 unique
bases and we are using HEDGES Encoding
</li>
<li class="fragment">we will require compression...</li>
</ul>
</div>
<img class="side-img stretch" src="public/images/example.png" />
</div>
</section>
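<section data-markdown data-background="public/backgrounds/info_background.svg">
<textarea data-template>
## Sketch: counting strands

A small worked version of the arithmetic above, plus a hypothetical blocking helper. The figure of 80 payload bits per strand is an inference from the slide's own numbers (1.44 × 10^7 bits over 180,000 strands of 80 bases, i.e. one bit per base); the function names and the index tag are assumptions for illustration.

```rust
/// Number of strands needed for `bytes` of data when each strand
/// carries `payload_bits` bits (rounded up; `payload_bits` must be > 0).
fn strands_needed(bytes: u64, payload_bits: u64) -> u64 {
    let total_bits = bytes * 8;
    (total_bits + payload_bits - 1) / payload_bits
}

/// Split data into fixed-size blocks, each tagged with its index so the
/// blocks can be synthesized in parallel and put back in order later
/// (`block_bytes` must be > 0).
fn block_data(data: &[u8], block_bytes: usize) -> Vec<(usize, Vec<u8>)> {
    data.chunks(block_bytes)
        .enumerate()
        .map(|(i, chunk)| (i, chunk.to_vec()))
        .collect()
}
```

With these assumptions, `strands_needed(1_800_000, 80)` gives 180,000, matching the estimate for the sponsorship package.
</textarea>
</section>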
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Short strands?</h2>
<img class="stretch" src="public/images/ts_zip-compression.png" />
</section>
<!-- <section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Semi-specific synthesis?</h2>
<p>Rotation based cipher</p>
<img class="stretch" src="public/images/rbc.png" />
</section>
-->
<!-- <section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2 class="r-fit-text">Rotation based cipher: One char</h2>
<img class="stretch" src="public/images/one_char.png" />
</section>
-->
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>High rate of deletion errors?</h2>
<p class="r-fit-text left-align">
For the first iteration, we want to see what percentage of deletion
errors we can correct for with minimal error correction. Ways of
reducing or preventing deletion errors that we will explore in later
DBTLs include:
</p>
<ul class="r-fit-text">
<li class="fragment">
<b
>synthesizing shorter sequences (more blocks of shorter
length)</b
>
</li>
<li class="fragment">more complex encoding strategies</li>
<li class="fragment">
<b>
more complex error correction methods (inner and outer codes)
</b>
</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>High rate of deletion errors?</h2>
<p>HEDGES error-correcting codes: Why?</p>
<ul>
<li>
corrects for mutations and deletions, error types that are not
prevalent in traditional storage media
</li>
<li>form of inner code: bits that encode for information</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>HEDGES error-correcting codes</h2>
<img class="stretch" src="public/images/hedges-encode.jpg" />
<p>Redundancy is enforced in the encoding strategy.</p>
<cite class="citation">
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., &
Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA
storage corrects indels and allows sequence constraints.
<em>Proceedings of the National Academy of Sciences</em>,
<em>117</em>(31), 18489–18496.
<a href="https://doi.org/10.1073/pnas.2004821117"
>https://doi.org/10.1073/pnas.2004821117</a
>
</cite>
</section>
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>What does this mean?</h2>
<p>
HEDGES uses a hash function to encode redundancy and generate the
base to be synthesized.
</p>
<p class="fragment"><b>What's a hash function?</b></p>
<ul>
<li class="fragment">
maps data of arbitrary size to a fixed-size value
</li>
<li class="fragment">fast to compute hash values</li>
<li class="fragment">
very low chance of returning the same hash value for two different
hash inputs
</li>
</ul>
<p>
We will also use an established checksum algorithm to generate a
checksum to signal if error correction is needed.
</p>
</section>
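<section data-markdown data-background="public/backgrounds/dna_right_corner_backgroud.svg">
<textarea data-template>
## Sketch: checksum as a trigger

A minimal sketch of using a checksum to decide whether error correction is needed. CRC-32 is used here only as one example of an established checksum; which algorithm we adopt is not yet decided, and `needs_correction` is a hypothetical helper name.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320, as used by zlib and PNG).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32;
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            crc = if (crc & 1) == 1 {
                (crc >> 1) ^ 0xEDB8_8320
            } else {
                crc >> 1
            };
        }
    }
    !crc
}

/// True when the payload no longer matches its stored checksum,
/// signalling that error correction is needed.
fn needs_correction(payload: &[u8], stored: u32) -> bool {
    crc32(payload) != stored
}
```

The standard check value holds: `crc32(b"123456789")` is `0xCBF43926`. A table-driven version would be faster for megabyte-sized files.
</textarea>
</section>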
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>HEDGES error-correcting codes: Rationale</h2>
<p>
"Hashing each bit value with its strand ID, bit index, and a few
previous bits “poisons” bad decoding hypotheses, allowing for
correction of indels."
</p>
<p>
"In summary, the algorithm encodes information as a stream of
nucleotides such that any single decoding error in either nucleotide
identity or nucleotide position will “poison” the downstream
predictions. Thus, on decoding, there will be only one good-scoring
chain of guesses—the correct one."
</p>
<cite class="citation">
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., &
Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA
storage corrects indels and allows sequence constraints.
<em>Proceedings of the National Academy of Sciences</em>,
<em>117</em>(31), 18489–18496.
<a href="https://doi.org/10.1073/pnas.2004821117"
>https://doi.org/10.1073/pnas.2004821117</a
>
</cite>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>HEDGES error-correcting codes: Rationale</h2>
<ul>
<li>
designed with orthogonality in mind: can add other error
correction codes without interference, such as RSC
</li>
<li>
variable parameters, so we can tune to our specific DNA synthesis
</li>
<li>MIT license, implementations online in C++ and Python</li>
</ul>
<br />
<br />
<cite class="citation">
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., &
Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA
storage corrects indels and allows sequence constraints.
<em>Proceedings of the National Academy of Sciences</em>,
<em>117</em>(31), 18489–18496.
<a href="https://doi.org/10.1073/pnas.2004821117"
>https://doi.org/10.1073/pnas.2004821117</a
>
</cite>
</section>
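<section data-markdown data-background="public/backgrounds/info_background.svg">
<textarea data-template>
## Sketch: hash-plus-bit encoding

A simplified sketch of the idea quoted above, emitting one base per bit as (hash(strand ID, bit index, previous bits) + bit) mod 4. The mixing function, the 8-bit history window and all names are stand-in assumptions; this is not the published HEDGES implementation and omits its variable code rates and sequence constraints.

```rust
const BASES: [char; 4] = ['A', 'C', 'G', 'T'];

/// Stand-in mixing function (SplitMix64-style finaliser).
fn mix(mut x: u64) -> u64 {
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    x = (x ^ (x >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    x ^ (x >> 31)
}

/// Hash of strand ID, bit index and up to 8 previous bits, reduced mod 4.
fn context_hash(strand_id: u64, index: usize, bits: &[u8]) -> u8 {
    let start = index.saturating_sub(8);
    let prev = bits[start..index]
        .iter()
        .fold(0u64, |acc, &b| (acc << 1) | b as u64);
    (mix(strand_id ^ mix(index as u64 ^ mix(prev))) % 4) as u8
}

/// One base per bit; `bits` must contain only 0s and 1s.
fn encode(strand_id: u64, bits: &[u8]) -> String {
    (0..bits.len())
        .map(|i| BASES[((context_hash(strand_id, i, bits) + (bits[i] & 1)) % 4) as usize])
        .collect()
}

/// Error-free decoding: None as soon as a base fits neither bit value.
fn decode(strand_id: u64, seq: &str) -> Option<Vec<u8>> {
    let mut bits = Vec::new();
    for (i, c) in seq.chars().enumerate() {
        let base = BASES.iter().position(|&b| b == c)? as u8;
        let bit = (base + 4 - context_hash(strand_id, i, &bits)) % 4;
        if bit > 1 {
            return None;
        }
        bits.push(bit);
    }
    Some(bits)
}
```

Because each base can only take two of the four values at any position, a wrong base or a shifted position tends to produce an impossible value soon afterwards, which is the "poisoning" effect described in the quotation.
</textarea>
</section>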
<section
data-auto-animate
data-background="public/backgrounds/bits_background.svg"
>
<h2>Decoding</h2>
<ul>
<li>Sequence Alignment</li>
<li>Error Correction</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/bits_background.svg"
>
<h2>Sequence Alignment</h2>
<p>We will most likely use one (or a combination) of:</p>
<ul>
<li class="fragment">Sanger Sequencing</li>
<li class="fragment"><b>NGS</b></li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>De Novo Assembly</h2>
<p>Why implement de novo assembly?</p>
<ul>
<li>no reference template, so de novo assembly is required</li>
<li>
important proof of concept to demonstrate our software can
reassemble a DNA sequence thousands of bases long
</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>De Novo Assembly</h2>
<img class="stretch" src="public/images/seq_align.png" />
<p>greedy graph search</p>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Why greedy graph search?</h2>
<ul>
<li class="fragment">finding the exact solution is an NP-hard problem</li>
<li class="fragment">
our sequences will be at most 1000-3000 bases long, so being
greedy will usually yield the exact solution anyway
</li>
</ul>
</section>
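<section data-markdown data-background="public/backgrounds/info_background.svg">
<textarea data-template>
## Sketch: greedy overlap merging

A minimal sketch of the greedy strategy, assuming error-free reads and exact suffix-to-prefix overlaps. The function names are illustrative, and a real assembler would also need to tolerate sequencing errors and reads contained inside other reads.

```rust
/// Length of the longest suffix of `a` that is also a prefix of `b`.
fn overlap(a: &str, b: &str) -> usize {
    let max = a.len().min(b.len());
    (1..=max).rev().find(|&k| a.ends_with(&b[..k])).unwrap_or(0)
}

/// Greedy assembly: repeatedly merge the two reads with the largest overlap.
fn greedy_assemble(mut reads: Vec<String>) -> String {
    while reads.len() > 1 {
        // Best (overlap, left index, right index) found so far; with no
        // overlap at all, the first two reads are simply concatenated.
        let mut best: (usize, usize, usize) = (0, 0, 1);
        for i in 0..reads.len() {
            for j in 0..reads.len() {
                if i != j {
                    let k = overlap(&reads[i], &reads[j]);
                    if k > best.0 {
                        best = (k, i, j);
                    }
                }
            }
        }
        let (k, i, j) = best;
        let merged = format!("{}{}", reads[i], &reads[j][k..]);
        // Remove the higher index first so the lower index stays valid.
        reads.remove(i.max(j));
        reads.remove(i.min(j));
        reads.push(merged);
    }
    reads.pop().unwrap_or_default()
}
```

Given the reads `ATGCG`, `GCGTA` and `GTACC`, the two merges produce `ATGCGTACC`. Each round compares every ordered pair of reads, which is affordable at the sequence lengths we expect.
</textarea>
</section>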
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Error Correction</h2>
<p>How does this work?</p>
<ul>
<li>
we are given a DNA sequence; assuming we have correctly
decoded bits[0...i-1], we want bits[i]
</li>
<li>
we use the hash function from encoding stage to "guess" what the
correct base should be
</li>
<li>
if we correctly guess the base, we continue searching that branch;
otherwise we assign a cumulative penalty score or abandon that
branch
</li>
</ul>
</section>
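<section data-markdown data-background="public/backgrounds/info_background.svg">
<textarea data-template>
## Sketch: guessing with penalties

A toy version of the search described above, assuming one base per bit and a unit penalty for each substitution or deletion hypothesis. The stand-in `hash`, the penalty values and the omission of insertions are simplifications for illustration; the published HEDGES decoder differs in its hash, scoring and search limits.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Stand-in hash of the bit index and up to 8 previous bits, in 0..4.
fn hash(index: usize, bits: &[u8]) -> u8 {
    let prev = bits[index.saturating_sub(8)..index]
        .iter()
        .fold(0u64, |acc, &b| (acc << 1) | b as u64);
    let mut x = (index as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15) ^ prev;
    x = (x ^ (x >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    ((x >> 32) % 4) as u8
}

/// Encoder: base value (0..4) = (hash + bit) mod 4, one base per bit.
fn encode(bits: &[u8]) -> Vec<u8> {
    (0..bits.len())
        .map(|i| (hash(i, bits) + (bits[i] & 1)) % 4)
        .collect()
}

/// Best-first search over bit guesses, lowest cumulative penalty first.
fn decode(seq: &[u8], n_bits: usize) -> Option<Vec<u8>> {
    // Heap entries: (penalty, bits decoded so far, position in the read).
    let mut heap = BinaryHeap::new();
    heap.push(Reverse((0usize, Vec::<u8>::new(), 0usize)));
    while let Some(Reverse((penalty, bits, pos))) = heap.pop() {
        if bits.len() == n_bits {
            return Some(bits);
        }
        for guess in 0..2u8 {
            let expected = (hash(bits.len(), &bits) + guess) % 4;
            let mut next = bits.clone();
            next.push(guess);
            if seq.get(pos) == Some(&expected) {
                // The guess agrees with the read: continue for free.
                heap.push(Reverse((penalty, next, pos + 1)));
            } else {
                // Mismatch: assume a substitution (consume one base)
                // or a deletion (consume none), each at a penalty.
                if pos < seq.len() {
                    heap.push(Reverse((penalty + 1, next.clone(), pos + 1)));
                }
                heap.push(Reverse((penalty + 1, next, pos)));
            }
        }
    }
    None
}
```

On an error-free read only the correct chain of guesses stays at penalty zero, so it is returned first. With errors the search returns a lowest-penalty hypothesis, which is not guaranteed to be the original data, and the heap can grow quickly on long or noisy reads.
</textarea>
</section>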
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>HEDGES error-correction code: Reminder</h2>
<img class="stretch" src="public/images/full-hedges.png" />
<br />
<cite class="citation">
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., &
Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA
storage corrects indels and allows sequence constraints.
<em>Proceedings of the National Academy of Sciences</em>,
<em>117</em>(31), 18489–18496.
<a href="https://doi.org/10.1073/pnas.2004821117"
>https://doi.org/10.1073/pnas.2004821117</a
>
</cite>
</section>
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Error Correction</h2>
<img class="stretch" src="public/images/hedges-decode.jpg" />
<cite class="citation">
Press, W. H., Hawkins, J. A., Jones, S. K., Schaub, J. M., &
Finkelstein, I. J. (2020). HEDGES error-correcting code for DNA
storage corrects indels and allows sequence constraints.
<em>Proceedings of the National Academy of Sciences</em>,
<em>117</em>(31), 18489–18496.
<a href="https://doi.org/10.1073/pnas.2004821117"
>https://doi.org/10.1073/pnas.2004821117</a
>
</cite>
</section>
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>ChaosDNA</h2>
<ul>
<li>an in-silico tool for generating faulty DNA sequences</li>
</ul>
<p>Why?</p>
</section>
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>ChaosDNA</h2>
<ul>
<li>an in-silico tool for generating faulty DNA sequences</li>
<li>
will allow us to prove our software tool can deal with files of
more realistic sizes (MB)
</li>
<li>
and enable us to work with DNA sequences of varying levels of
deletion, insertion and mutation errors
</li>
</ul>
<img src="public/images/chaosdna.png" />
</section>
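<section data-markdown data-background="public/backgrounds/dna_right_corner_backgroud.svg">
<textarea data-template>
## Sketch: injecting errors

A minimal sketch of the kind of error injection ChaosDNA is meant to provide, assuming independent per-base probabilities for deletion, substitution and insertion. The `corrupt` signature, the error model and the small built-in generator are illustrative assumptions, not the tool's actual interface.

```rust
/// Tiny deterministic generator (xorshift64*), so the sketch needs no crates.
struct Rng(u64);

impl Rng {
    /// Uniform value in [0, 1).
    fn next_f64(&mut self) -> f64 {
        self.0 ^= self.0 >> 12;
        self.0 ^= self.0 << 25;
        self.0 ^= self.0 >> 27;
        let r = self.0.wrapping_mul(0x2545_F491_4F6C_DD1D);
        (r >> 11) as f64 / (1u64 << 53) as f64
    }

    /// Uniformly random base.
    fn base(&mut self) -> char {
        ['A', 'C', 'G', 'T'][(self.next_f64() * 4.0) as usize]
    }
}

/// Apply insertions, deletions and substitutions independently per base.
fn corrupt(seq: &str, del: f64, sub: f64, ins: f64, seed: u64) -> String {
    let mut rng = Rng(seed.max(1)); // xorshift state must be non-zero
    let mut out = String::new();
    for c in seq.chars() {
        if rng.next_f64() < ins {
            out.push(rng.base()); // insert a random base before this one
        }
        let roll = rng.next_f64();
        if roll < del {
            continue; // drop the base
        } else if roll < del + sub {
            out.push(rng.base()); // replace it (may match by chance)
        } else {
            out.push(c);
        }
    }
    out
}
```

Setting all three rates to 0.0 returns the input unchanged, and `del = 0.3` approximates the 30% deletion rate targeted in our goals. A fixed seed keeps in silico test runs reproducible.
</textarea>
</section>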
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>After DBTL 2?</h2>
<div class="fragment r-fit-text">
<h3>DBTL 3: May to June</h3>
<p>
Implement DNA Storage Alliance specifications, and do in silico
testing on DNA sequences with 1000s of nucleotides.
</p>
</div>
<div class="fragment r-fit-text">
<h3>DBTL 4 and 5: June to July</h3>
<p>
Test our software on sequences synthesized by wet lab, and
refine algorithms with in silico testing and wet lab data.
</p>
</div>
</section>
<section
data-auto-animate
data-background="public/backgrounds/dna_right_corner_backgroud.svg"
>
<h2>Future Directions</h2>
<ul>
<li>DNA Storage Alliance specifications</li>
<li>Outer Codes: GC</li>
<li>Graphical User Interface</li>
<li>SVGs</li>
</ul>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>DNA Storage Alliance specifications</h2>
<p>
"Unlike traditional storage media such as tape, HDD, and SSD, DNA
does not have a fixed physical structure, a built-in controller, or
a way to address different regions of the media linearly, and thus
needs a mechanism to start reading or “booting up” a DNA archive
that does not rely on such a structure. The SNIA DNA Archive Rosetta
Stone (DARS) working group, one of four working groups in the DNA
Data Storage Alliance aimed at defining standards for DNA data
storage systems, has developed two specifications to enable archive
readers to find the sequence to begin booting up the data."
</p>
<cite
><a
href="https://www.snia.org/news_events/newsroom/dna-data-storage-alliance-releases-its-first-specifications"
>https://www.snia.org/news_events/newsroom/dna-data-storage-alliance-releases-its-first-specifications</a
></cite
>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Outer Codes: Guess & Check+</h2>
<p>Correcting Insertions and Deletions in Short DNA sequences</p>
<ul>
<li>Uses Reed-Solomon outer code to encode redundancies</li>
<li>Within the redundant bits, a portion stores different possible indel patterns as 'guesses'</li>
<li>The rest of the parity bits are 'checks', i.e. repetitive bits used to check against the guessed indel pattern</li>
</ul>
<img src="public/images/gc_layman.png" />
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>Graphical User Interface</h2>
</section>
<section
data-auto-animate
data-background="public/backgrounds/info_background.svg"
>
<h2>SVGs and QR Codes</h2>
<div class="fragment r-fit-text">
<h3>Scalable Vector Graphics</h3>
<p>
Store shapes as mathematical equations rather than individual pixels.
Further compressible through traditional mechanisms or
aforementioned text compression mechanisms.
</p>
</div>
<div class="fragment r-fit-text">
<h3>QR Codes</h3>
<p>
Designed to store redundant information, enabling extreme
error correction. Can be efficiently stored in many formats,
including SVG or PNG.
</p>
</div>
</section>
<section
data-auto-animate
data-background="public/backgrounds/bits_background.svg"
>
<h1>Thank you!</h1>
<h3>Questions?</h3>
</section>
</div>
</div>
<script src="dist/reveal.js"></script>
<script src="plugin/notes/notes.js"></script>
<script src="plugin/markdown/markdown.js"></script>
<script src="plugin/highlight/highlight.js"></script>
<script>
// More info about initialization & config:
// - https://revealjs.com/initialization/
// - https://revealjs.com/config/
Reveal.initialize({
hash: true,
// Learn about plugins: https://revealjs.com/plugins/
plugins: [RevealMarkdown, RevealHighlight, RevealNotes],
});
</script>
</body>
</html>