name,mtime,section,body
003 Dynamic Programming.md,1669012068803,---,"--- title: ""003 Dynamic Programming"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
003 Dynamic Programming.md,1669012068803,003 Dynamic Programming,003 Dynamic Programming moc _Dynamic Programming is a tool to solve problems which satisfy the [Principle of Optimality](Notes/Principle%20of%20Optimality.md)._
003 Dynamic Programming.md,1669012068803,Well Known Problems,Well Known Problems - [Fibonacci Sequence](Notes/Fibonacci%20Sequence.md) - [Longest Common Subsequence](Notes/Longest%20Common%20Subsequence.md) - [Longest Increasing Subsequence](Notes/Longest%20Increasing%20Subsequence.md) - [Alignment Problem](Notes/Alignment%20Problem.md) - [Chain Matrix Multiplication](Notes/Chain%20Matrix%20Multiplication.md) - [Knapsack Problem](Notes/Knapsack%20Problem.md) - [Making Change](Notes/Making%20Change.md) - [Travelling Salesman Problem](Notes/Travelling%20Salesman%20Problem.md)
003 Dynamic Programming.md,1669012068803,Strategies,"Strategies Both strategies achieve the same time complexity, but bottom up is usually more CPU-time efficient due to the simplicity of the code (no recursion overhead)."
003 Dynamic Programming.md,1669012068803,Top Down Approach,"Top Down Approach 1. Formulate the problem in terms of recursive smaller subproblems. 2. Use a dictionary to store the solutions to subproblems 3. Turn the formulation into a recursive function 1. Before any recursive call, check the store to see if a solution has been previously computed 2. Store the solution before returning Example with Fib DP:"
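003 Dynamic Programming.md,1669012068803,Top Down Approach,"Top Down Approach The steps above can be sketched as a memoized Fibonacci in Python (an illustrative sketch; the names are mine, not from the original note):

```python
def fib(n, memo=None):
    # top-down DP: recurse on smaller subproblems, caching results
    if memo is None:
        memo = {}
    if n in memo:                # check the store before any recursive call
        return memo[n]
    result = n if n < 2 else fib(n - 1, memo) + fib(n - 2, memo)
    memo[n] = result             # store the solution before returning
    return result
```

Each subproblem is solved once, so the run time is O(n) instead of exponential."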
003 Dynamic Programming.md,1669012068803,Bottom Up Approach,Bottom Up Approach 1. Formulate the problem in terms of recursive smaller subproblems. 2. Draw the subproblem graph to find dependencies 3. Use a dictionary to store the solutions to subproblems 4. Turn the formulation into an iterative computation 1. Compute the solutions to subproblems first 2. Use the solutions to compute the solution for P and store it Example with Fib DP:
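003 Dynamic Programming.md,1669012068803,Bottom Up Approach,"Bottom Up Approach The steps above can be sketched as a bottom-up Fibonacci in Python (an illustrative sketch; the names are mine, not from the original note):

```python
def fib(n):
    # bottom-up DP: solve subproblems in dependency order, then combine
    table = {0: 0, 1: 1}         # store solutions to the base subproblems
    for i in range(2, n + 1):
        table[i] = table[i - 1] + table[i - 2]
    return table[n]
```

No recursion is needed; the subproblem graph is walked from the leaves up."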
003 Dynamic Programming.md,1669012068803,Exercises,Exercises
003 Dynamic Programming.md,1669012068803,Binomial Coefficients,Binomial Coefficients b. c. A top down approach: d. A bottom up approach
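003 Dynamic Programming.md,1669012068803,Binomial Coefficients,"Binomial Coefficients One possible bottom-up sketch for the exercise above, building Pascal's triangle row by row from the rule C(n, k) = C(n-1, k-1) + C(n-1, k) (illustrative only):

```python
def binom(n, k):
    # bottom-up DP over Pascal's rule: each row is computed from the previous one
    if k < 0 or k > n:
        return 0
    row = [1]                    # row 0 of Pascal's triangle
    for _ in range(n):
        row = [1] + [row[j - 1] + row[j] for j in range(1, len(row))] + [1]
    return row[k]
```

Only one row is kept at a time, so the space usage is O(n) rather than O(n^2)."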
003 Dynamic Programming.md,1669012068803,References,References - https://www2.seas.gwu.edu/~ayoussef/cs6212/dynamicprog.html
002 Search Strategies.md,1669012068807,---,"--- title: ""002 Search Strategies"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
002 Search Strategies.md,1669012068807,Search Strategies,Search Strategies moc
002 Search Strategies.md,1669012068807,Factors for search,Factors for search - Completeness: Does it always find a solution if one exists? - Optimality: Does it always find the best solution? [Shortest Path Problem](Notes/Shortest%20Path%20Problem.md) - Average Branching Factor: average number of successors of any node $$ABF=\frac{\text{no. of nodes}}{\text{no. of non-leaf nodes}}$$ - Uninformed Search - [Depth First Search](Notes/Depth%20First%20Search.md) - [Breadth First Search](Notes/Breadth%20First%20Search.md) - [Iterative Deepening Search](Notes/Iterative%20Deepening%20Search.md) - Path Cost - [Uniform Cost Search](Notes/Uniform%20Cost%20Search.md) - [Dijkstra's Algorithm](Notes/Dijkstra's%20Algorithm.md) - Informed Search (Heuristics) - [Greedy Best First Search](Notes/Greedy%20Best%20First%20Search.md) - [A-Star Search](Notes/A-Star%20Search.md) d: depth of the optimal solution m: maximum depth l: cutoff maximum depth
001 Software Engineering.md,1682536590982,---,"--- title: ""001 Software Engineering"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
001 Software Engineering.md,1682536590982,Software Engineering Map of Contents,Software Engineering Map of Contents moc - Unified Modelling Language - Conceptual Models - [Use Case Diagrams](Notes/Use%20Case%20Diagrams.md) - [Class Diagrams](Notes/Class%20Diagrams.md) - Dynamic Models - [Activity Diagrams](Notes/Activity%20Diagrams.md) - [State Machine Diagrams](Notes/State%20Machine%20Diagrams.md) - [Communication Diagrams](Notes/Communication%20Diagrams.md) - [Sequence Diagrams](Notes/Sequence%20Diagrams.md) - Design Patterns - [Strategy Pattern](Notes/Strategy%20Pattern.md) - [Observer Pattern](Notes/Observer%20Pattern.md) - [Factory Pattern](Notes/Factory%20Pattern.md) - [Façade Pattern](Notes/Fa%C3%A7ade%20Pattern.md) - [[Visitor Pattern]] - [Dynamic Loading](Notes/Dynamic%20Loading.md) - [Dependency Injection](Notes/Dependency%20Injection.md) - Software Architecture - [Layered Architecture](Notes/Layered%20Architecture.md) - [Model-View-Controller Architecture](Notes/Model-View-Controller%20Architecture.md) - Software Testing - [Black Box Testing](Notes/Black%20Box%20Testing.md) - [White Box Testing](Notes/White%20Box%20Testing.md) - Tooling - [006 Tools](006%20Tools.md) - Software Methodologies - [Test Driven Development](Test%20Driven%20Development)
005 Sorting Algorithms.md,1669012068805,---,"--- title: ""005 Sorting Algorithms"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
005 Sorting Algorithms.md,1669012068805,005 Sorting Algorithms,005 Sorting Algorithms moc
005 Sorting Algorithms.md,1669012068805,Important algorithms,Important algorithms - [Insertion Sort](Notes/Insertion%20Sort.md) - [Merge Sort](Notes/Merge%20Sort.md) - [Quick Sort](Notes/Quick%20Sort.md) - [Heap Sort](Notes/Heap%20Sort.md)
005 Sorting Algorithms.md,1669012068805,Properties,Properties [Stability](https://en.wikipedia.org/wiki/Sorting_algorithm#Stability): an algorithm is stable if it preserves the original order of any two equal elements in its input. ^85ee66 Time complexity:
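005 Sorting Algorithms.md,1669012068805,Properties,"Properties Stability can be seen directly in Python, whose built-in sorted() is guaranteed stable (a small illustrative check; the records are made up):

```python
records = [('b', 2), ('a', 1), ('b', 1), ('a', 2)]
# sort by the letter only; because sorted() is stable, records with equal
# letters keep their original relative order (the attached numbers don't move)
by_letter = sorted(records, key=lambda r: r[0])
```

An unstable sort would be free to swap ('b', 2) and ('b', 1)."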
004 String Matching.md,1669012068792,---,"--- title: ""004 String Matching"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
004 String Matching.md,1669012068792,004 String Matching,004 String Matching moc
004 String Matching.md,1669012068792,Important Algorithms,Important Algorithms - [Rabin-Karp Algorithm](Notes/Rabin-Karp%20Algorithm.md) - [Boyer-Moore Algorithm](Notes/Boyer-Moore%20Algorithm.md)
004 String Matching.md,1669012068792,Straightforward Solution,Straightforward Solution
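004 String Matching.md,1669012068792,Straightforward Solution,"Straightforward Solution The straightforward solution can be sketched as a naive scan that tries every alignment of the pattern against the text (an illustrative sketch):

```python
def naive_match(text, pattern):
    # slide the pattern over the text and report every index where all
    # characters line up; O(len(text) * len(pattern)) in the worst case
    n, m = len(text), len(pattern)
    return [i for i in range(n - m + 1)
            if text[i:i + m] == pattern]
```

Rabin-Karp and Boyer-Moore improve on this by skipping alignments that cannot match."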
006 Tools.md,1685261803793,---,"--- title: ""006 Tools"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
006 Tools.md,1685261803793,Tools,Tools moc
006 Tools.md,1685261803793,Languages,Languages - [TypeScript](Notes/TypeScript.md) - [Go](Notes/Go.md) - [C](Notes/C.md) - [[Rust]]
006 Tools.md,1685261803793,Frameworks,Frameworks - [Angular](Notes/Angular.md) - [React](React)
006 Tools.md,1685261803793,Libraries,Libraries - [NodeJS](NodeJS) - [ASP.NET Web API](Notes/ASP.NET%20Web%20API.md)
006 Tools.md,1685261803793,Databases,Databases - [SQL](SQL) - [MongoDB](MongoDB)
007 Data Structures.md,1669012068799,---,"--- title: ""007 Data Structures"" date: 2022-11-07 lastmod: 2022-11-21 ---"
007 Data Structures.md,1669012068799,Data Structures,Data Structures - [Linked Lists](Linked%20Lists) - [Hash Tables](Notes/Hash%20Tables.md) - [Heaps](Notes/Heaps.md) - [Binary Tree](Notes/Binary%20Tree.md) - [Binary Search Tree](Notes/Binary%20Search%20Tree.md) - [B-tree](Notes/B-tree.md) - [Union Find](Notes/Union%20Find.md) - [Bitmap](Notes/Bitmap.md)
008 Networking.md,1677604052754,---,"--- title: ""008 Networking"" tags: [moc] date: 2022-11-07 lastmod: 2023-01-18 ---"
008 Networking.md,1677604052754,Networking,Networking moc
008 Networking.md,1677604052754,[Building Blocks of the Internet](Notes/Building%20Blocks%20of%20the%20Internet.md),[Building Blocks of the Internet](Notes/Building%20Blocks%20of%20the%20Internet.md)
008 Networking.md,1677604052754,[[Application Layer]],[[Application Layer]] - [[HTTP]] - [[Transport Layer Security]] - [[DNS]] - [SMTP](Notes/SMTP.md) - [POP3](Notes/POP3.md) - [BitTorrent](Notes/BitTorrent.md) - [Distributed Hash Table](Notes/Distributed%20Hash%20Table.md)
008 Networking.md,1677604052754,[[Transport Layer]],[[Transport Layer]] - [Transmission Control Protocol](Notes/Transmission%20Control%20Protocol.md) - [User Datagram Protocol](Notes/User%20Datagram%20Protocol.md)
008 Networking.md,1677604052754,[[Network Layer]],[[Network Layer]] - [Internet Protocol](Notes/Internet%20Protocol.md) - [Network Address Translation](Notes/Network%20Address%20Translation.md) - [[Browser Networking]] [[Link Layer]] - [[Wireless Networks]]
008 Networking.md,1677604052754,References,References - [High Performance Browser Networking](https://hpbn.co/) - [@kuroseComputerNetworkingTopdown2017](References/@kuroseComputerNetworkingTopdown2017.md)
009 Computer Organisation.md,1669012068797,---,"--- title: ""009 Computer Organisation"" tags: [moc] date: 2022-11-08 lastmod: 2022-11-21 ---"
009 Computer Organisation.md,1669012068797,Computer Organisation,Computer Organisation moc - [Signal Chain Subsystem](Notes/Signal%20Chain%20Subsystem.md)
100 Reading List.md,1669012068790,---,"--- title: ""100 Reading List"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
100 Reading List.md,1669012068790,Reading List,Reading List moc
100 Reading List.md,1669012068790,Algos,Algos 1. [A Common-Sense Guide to Data Structures and Algorithms](https://www.amazon.sg/Common-Sense-Guide-Data-Structures-Algorithms/dp/1680507222/ref=sr_1_9?crid=UQ12IKPHMY7G&keywords=Elements+of+Programming+Interviews&qid=1656292315&sprefix=elements+of+programming+interviews%2Caps%2C300&sr=8-9) 2. [101 Introduction To Algorithms](Notes/101%20Introduction%20To%20Algorithms.md) Link: [Introduction to Algorithms](https://www.amazon.com/Introduction-Algorithms-fourth-Thomas-Cormen/dp/026204630X/ref=pd_cart_crc_cko_cp_2_6/134-8052667-1718550?_encoding=UTF8&content-id=amzn1.sym.7c768d31-fcb6-4e60-bb16-7d8e97d21350&pd_rd_i=026204630X&pd_rd_r=c619e326-f826-46d4-9060-27d427d0abd9&pd_rd_w=pJEXS&pd_rd_wg=zmv6K&pf_rd_p=7c768d31-fcb6-4e60-bb16-7d8e97d21350&pf_rd_r=3FJ07YDA0YDJHBD65F0W&psc=1&refRID=3FJ07YDA0YDJHBD65F0W) 3. [Elements of Programming Interviews](https://www.amazon.sg/Elements-Programming-Interviews-Python-Insiders/dp/1537713949/ref=sr_1_1?crid=UQ12IKPHMY7G&keywords=Elements+of+Programming+Interviews&qid=1656292315&sprefix=elements+of+programming+interviews%2Caps%2C300&sr=8-1)
100 Reading List.md,1669012068790,Distributed Systems,"Distributed Systems 1. [Understanding Distributed Systems](https://www.amazon.com/Understanding-Distributed-Systems-Second-applications/dp/1838430210?keywords=understanding+distributed+systems&qid=1656280535&sprefix=understanding+dis,aps,118&sr=8-1&linkCode=sl1&tag=utsavized0d-20&linkId=a920b5dfb493c084cd500eb954527f5c&language=en_US&ref_=nav_signin&)"
100 Reading List.md,1669012068790,Under the Hood,"Under the Hood 1. [Writing An Interpreter In Go](https://interpreterbook.com/) 2. [Writing A Compiler In Go](https://compilerbook.com/) 3. Ian McLoughlin, Computer Architecture: An Embedded Approach, McGraw-Hill Education (Asia), 2011, ISBN: 978-0071311182."
100 Reading List.md,1669012068790,Databases,Databases 1. High Performance MySQL
100 Reading List.md,1669012068790,Software Engineering,Software Engineering 1. [Clean Code](Notes/Clean%20Code.md)
2005 Operating Systems.md,1689953249629,---,"--- title: ""2005 Operating Systems"" tags: [moc] date: 2022-11-07 lastmod: 2023-07-01 ---"
2005 Operating Systems.md,1689953249629,Operating Systems,Operating Systems moc Essentially a piece of code which controls and coordinates the use of hardware among various programs for various users.
2005 Operating Systems.md,1689953249629,The Boot Process,"The Boot Process When you turn on a computer, it begins executing *firmware code* that is stored in motherboard [ROM](https://en.wikipedia.org/wiki/Read-only_memory). This code performs a [power-on self-test](https://en.wikipedia.org/wiki/Power-on_self-test), detects available RAM, and pre-initializes the CPU and hardware. Afterwards, it looks for a bootable disk and starts booting the operating system kernel."
2005 Operating Systems.md,1689953249629,Boot Process Firmware,"Boot Process Firmware On x86, there are two firmware standards: the “Basic Input/Output System” (**[BIOS](https://en.wikipedia.org/wiki/BIOS)**) and the newer “Unified Extensible Firmware Interface” (**[UEFI](https://en.wikipedia.org/wiki/Unified_Extensible_Firmware_Interface)**). The BIOS standard is old and outdated, but simple and well-supported on any x86 machine since the 1980s. UEFI, in contrast, is more modern and has many more features, but is more complex to set up."
2005 Operating Systems.md,1689953249629,Bootloaders,"Bootloaders When you turn on a computer, it loads the BIOS from some special flash memory located on the motherboard. The BIOS runs self-test and initialization routines of the hardware, then it looks for bootable disks. If it finds one, control is transferred to its *bootloader*, which is a 512-byte portion of executable code stored at the disk’s beginning. Most bootloaders are larger than 512 bytes, so bootloaders are commonly split into a small first stage, which fits into 512 bytes, and a second stage, which is subsequently loaded by the first stage. The bootloader has to determine the location of the kernel image on the disk and load it into memory. It also needs to switch the CPU from the 16-bit real mode first to the 32-bit protected mode, and then to the 64-bit long mode, where 64-bit registers and the complete main memory are available. Its third job is to query certain information (such as a memory map) from the BIOS and pass it to the OS kernel."
2005 Operating Systems.md,1689953249629,Types of OS,"Types of OS 1. Batch Systems: batch similar jobs and automatically transfer control from one job to the next - Only 1 job in memory at any time - When a job waits for IO, the CPU is idle 2. Multiprogram / Time-sharing Systems: several jobs are kept in main memory at the same time - Goal: Improve CPU utilization by running more than one program concurrently even on a single-core CPU - Different from multiprocessing: increasing computing power with parallel architectures - Requires the OS to handle memory management, CPU and I/O scheduling for efficiency 3. Embedded Systems: physical systems where operations are controlled by computing - Examples: - Real time systems: have jobs that must complete within well-defined fixed time constraints (e.g. car airbag deployment) - Handheld systems"
2005 Operating Systems.md,1689953249629,Functions of the OS,Functions of the OS 1. [IO Subsystem](Notes/IO%20Subsystem.md) 2. [Direct Memory Access](Notes/Direct%20Memory%20Access.md) 3. [Interrupts](Notes/Interrupts.md) 4. [[Multitasking]] 5. [Hardware Protection](Notes/Hardware%20Protection.md) 6. Handle [Processes](Notes/Processes.md) 7. [Process scheduling](Notes/Process%20scheduling.md) 8. [Process Synchronization](Notes/Process%20Synchronization.md) 9. [Deadlocks](Notes/Deadlocks.md) 10. [Real Time Operating Systems](Notes/Real%20Time%20Operating%20Systems.md) 11. [Virtualization](Notes/Virtualization.md) 12. [Memory Organisation](Notes/Memory%20Organisation.md) 13. [Virtual Memory](Notes/Virtual%20Memory.md) 14. [[Allocators]] 15. [File Systems](Notes/File%20Systems.md)
2005 Operating Systems.md,1689953249629,References,References
2005 Operating Systems.md,1689953249629,Operating Systems Concepts,Operating Systems Concepts Exercise solutions: - https://codex.cs.yale.edu/avi/os-book/OS10/practice-exercises/index-solu.html Instructor's Manual: - http://web.uettaxila.edu.pk/CMS/AUT2011/seOSbs/tutorial/Sol.%20-%20Silberschatz.Galvin%20-%20Operating.System.Concepts.7th.pdf
2203 Distributed Systems.md,1678366795953,---,"--- title: ""2203 Distributed Systems"" date: 2023-01-21 ---"
2203 Distributed Systems.md,1678366795953,2203 Distributed Systems,2203 Distributed Systems moc - [[Distributed Abstractions]] - [Failure Detectors](Notes/Failure%20Detectors.md) - [[Broadcast Abstractions]] - [[Distributed Shared Memory]] - [[Consensus]] - [[Time Abstractions]] - [[Distributed Data Management]]
2203 Distributed Systems.md,1678366795953,What are distributed systems,"What are distributed systems A set of nodes, connected by a network, which appear to its users as a single coherent system."
2203 Distributed Systems.md,1678366795953,Core problems,Core problems
2203 Distributed Systems.md,1678366795953,Agreement,Agreement
2203 Distributed Systems.md,1678366795953,Two generals problem,Two generals problem “Two generals need to coordinate an attack” - Must agree on time to attack - They’ll win only if they attack simultaneously - Communicate through messengers - Messengers may be killed on their way Generals are unable to come to an agreement within a specified time bound using unreliable communication channels.
2203 Distributed Systems.md,1678366795953,Consensus Problem,"Consensus Problem All nodes/processes propose a value. Some nodes (faulty, i.e. non-correct nodes) might crash and stop responding. The algorithm must ensure a set of properties (specification): - All correct nodes eventually decide - Every node decides the same - Only decide on proposed values This problem models the core issue in distributed databases known as **atomic commits**, where we choose to commit if every node agrees to commit and abort if at least one node aborts. It is consensus over the two values {commit, abort}."
2203 Distributed Systems.md,1678366795953,Broadcast Problem,"Broadcast Problem Atomic Broadcast - A node broadcasts a message - If sender correct, all correct nodes deliver message - All correct nodes deliver the same messages (consensus) - Messages delivered in the same order > [!Note] > Atomic broadcast can be used to solve consensus in the following way: > 1. Decide on the first received proposal > 2. Since all messages are in the same order, all nodes will decide the same > > Consensus can be solved by Atomic broadcast > > *Atomic broadcast is equivalent to Consensus*"
2203 Distributed Systems.md,1678366795953,Modelling Distributed Systems,Modelling Distributed Systems
2203 Distributed Systems.md,1678366795953,Timing assumptions,Timing assumptions - Processes: bounds on time to make a computation step - Network: bounds on time to transmit a message - Clocks: lower and upper bounds on clock drift rate
2203 Distributed Systems.md,1678366795953,Failure assumptions,"Failure assumptions - Processes: what kind of failure? - Network: can network drop messages, temporarily disconnect?"
2203 Distributed Systems.md,1678366795953,Asynchronous System Model,Asynchronous System Model - No bound on time to deliver a message - No bound on time to compute - Clocks are not synchronized
2203 Distributed Systems.md,1678366795953,Synchronous system,"Synchronous system *""My server always serves requests within 1 week""* - Known bound on time to deliver a message (latency) - Known bound on time to compute - Known lower and upper bounds on physical clock drift rate Examples: - Embedded systems (shared clock) - Multicore computers"
2203 Distributed Systems.md,1678366795953,Partial Synchrony,"Partial Synchrony *""My server processes requests within one week when it is running, and it will eventually be running for at least a week, I just don't know when that will be.""* - A system that is asynchronous but eventually exhibits some period of synchrony."
2203 Distributed Systems.md,1678366795953,Measuring Performance,Measuring Performance
2203 Distributed Systems.md,1678366795953,Message complexity,Message complexity The number of messages required to terminate an operation of an abstraction
2203 Distributed Systems.md,1678366795953,Time complexity (Rounds),"Time complexity (Rounds) One time unit in an execution E is the longest message delay in E. We assume all communication steps take one time unit. We also call this a round or step. Time complexity is the maximum time taken by any execution of the algorithm under the assumptions: - A process can execute any finite number of actions (events) in zero time - The time between send(m)_{i,j} and deliver(m)_{i,j} is at most one time unit"
2101 Algorithm Design and Analysis.md,1669012068784,---,"--- title: ""2101 Algorithm Design and Analysis"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
2101 Algorithm Design and Analysis.md,1669012068784,2101 Algorithm Design and Analysis,2101 Algorithm Design and Analysis moc - [Insertion Sort](Notes/Insertion%20Sort.md) - Divide & Conquer Algorithms - [Merge Sort](Notes/Merge%20Sort.md) - [Quick Sort](Notes/Quick%20Sort.md) - Data Structure Based Algorithms - [Heap Sort](Notes/Heap%20Sort.md) - [Union Find](Notes/Union%20Find.md) - [Kruskal's Algorithm](Notes/Kruskal's%20Algorithm.md) - Greedy Algorithms - [Dijkstra's Algorithm](Notes/Dijkstra's%20Algorithm.md) - [Prim's Algorithm](Notes/Prim's%20Algorithm.md) - [Complexity Analysis](Notes/Complexity%20Analysis.md) - [Recurrence Equations](Notes/Recurrence%20Equations.md) - [003 Dynamic Programming](003%20Dynamic%20Programming.md) - [004 String Matching](004%20String%20Matching.md) - [Rabin-Karp Algorithm](Notes/Rabin-Karp%20Algorithm.md) - [Boyer-Moore Algorithm](Notes/Boyer-Moore%20Algorithm.md) - Introduction to P and NP - [P and NP Problems](Notes/P%20and%20NP%20Problems.md)
2704 Finance.md,1669012068785,---,"--- title: ""2704 Finance"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
2704 Finance.md,1669012068785,2704 Finance,2704 Finance moc - [Bonds](Notes/Bonds.md) - [Stock Valuation](Notes/Stock%20Valuation.md) - [Capital Budgeting](Notes/Capital%20Budgeting.md)
2421 Machine Learning.md,1678911806680,---,"--- title: ""2421 Machine Learning"" date: 2023-01-19 tags: [moc] lastmod: 2023-03-13 ---"
2421 Machine Learning.md,1678911806680,2421 Machine Learning,2421 Machine Learning A subfield of [AI](3005%20AI.md) focused on using algorithms trained on data to perform complex tasks. moc - [[Decision Trees]] - [[Regression]] - [Statistical Inference](Notes/Statistical%20Inference.md) - [[Support Vector Machines]] - [[Neural Networks]] - [[Clustering]] - [[Ensemble Learning]] - [[Dimensionality Reduction]]
2421 Machine Learning.md,1678911806680,Training and validation,"Training and validation Training a machine learning model involves the use of data. However, we also need to test the effectiveness of the model; this is called validation. Hence we need to split the data into training and testing sets."
2421 Machine Learning.md,1678911806680,Bias vs Variance,"Bias vs Variance - Bias: the model's systematic inability to fit the true nature of the data; a high-bias model cannot fit the true nature - Variance: the amount by which our predictions would change given a different training data set; this can lead to a worse fit on the test set. Formally, it is the expected divergence of the estimated prediction from its average value. <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/EuBBz3bI-aA"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"" allowfullscreen></iframe> Our intuition may say: - The presence of bias indicates something basically wrong with the model and algorithm... - Variance is also bad, but a model with high variance could at least predict well on average... So should the model minimize bias even at the expense of variance? Not really! Bias and variance are equally important, as we are always dealing with a single realization of the data set."
2421 Machine Learning.md,1678911806680,Bias and variance decomposition,Bias and variance decomposition - True function: $f(x)$ - Prediction function estimated with data D: $\hat{f_D}(x)$ - Average of prediction models: $E_D[\hat{f_D}(x)]$ $$ \begin{align} Variance=E_D[(\hat{f_D}(x)-E_D[\hat{f_D}(x)])^2]\\ Bias=E_D[\hat{f_D}(x)]-f(x) \end{align} $$
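2421 Machine Learning.md,1678911806680,Bias and variance decomposition,"Bias and variance decomposition A small simulation of the decomposition (illustrative; the setup is mine): many datasets D are drawn from a true function f(x) = x^2 plus noise, a deliberately high-bias constant model is fit to each, and the empirical bias and variance of the predictions at x0 = 1 are computed. The constant model underfits, so the bias is large while the variance stays small:

```python
import random
import statistics

random.seed(0)

def f(x):
    return x * x                        # true function f(x)

def make_dataset(n=30, noise=0.1):
    # one realization D: noisy samples of f on [0, 1]
    return [(x, f(x) + random.gauss(0, noise))
            for x in (random.random() for _ in range(n))]

def fit_constant(dataset):
    # high-bias model: always predict the mean of the observed y values
    return statistics.mean(y for _, y in dataset)

x0 = 1.0
preds = [fit_constant(make_dataset()) for _ in range(2000)]
avg = statistics.mean(preds)            # estimate of E_D[fhat_D(x0)]
bias = avg - f(x0)                      # E_D[fhat_D(x0)] - f(x0)
variance = statistics.mean((p - avg) ** 2 for p in preds)
```

Here bias is roughly 1/3 - 1 (the mean of x^2 on [0, 1] minus the true value at x0 = 1), while the variance is tiny because averaging 30 samples is very stable across datasets."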
2421 Machine Learning.md,1678911806680,Overfitting,"Overfitting When the learned models are overly specialized for the training samples, leading to low bias and high variance."
2421 Machine Learning.md,1678911806680,Cross Validation,"Cross Validation How do we know what fraction of the data to hold out for testing? Cross validation sidesteps the question by trying many different train/test partitions and averaging the results. <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/fSytzGwwBVw"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"" allowfullscreen></iframe>"
2421 Machine Learning.md,1678911806680,Interpretability,Interpretability
2421 Machine Learning.md,1678911806680,Shrinking the number of variables,"Shrinking the number of variables Among a large number of variables in the model, there are generally many that have little (or no) effect on Y - Leaving these variables in the model makes it harder to see the big picture, i.e. the effect of the “important variables” - It would be easier to interpret the model after removing the unimportant variables (setting their coefficients to zero)"
2421 Machine Learning.md,1678911806680,Occam's Razor,"Occam's Razor A principle about choosing the simplest explanation for the observed data, which can involve the number of model parameters, data points and fit to data."
3001 Advanced Computer Architecture.md,1669221262558,---,"--- title: ""3001 Advanced Computer Architecture"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
3001 Advanced Computer Architecture.md,1669221262558,Advanced Computer Architecture,Advanced Computer Architecture moc - [Computer Performance](Notes/Computer%20Performance.md) - [Computer Power](Notes/Computer%20Power.md) - [Instruction Set Architecture](Notes/Instruction%20Set%20Architecture.md) - [Datapath and Control Design](Notes/Datapath%20and%20Control%20Design.md) - [Pipelining](Notes/Pipelining.md) - [Instruction Level Parallelism](Notes/Instruction%20Level%20Parallelism.md) - [Custom Computing](Notes/Custom%20Computing.md) - [Cache](Notes/Cache.md) - [GPU Architecture](Notes/GPU%20Architecture.md) - [Data Level Parallelism](Notes/Data%20Level%20Parallelism.md) - [Thread Level Parallelism](Notes/Thread%20Level%20Parallelism.md)
3005 AI.md,1674150632574,---,"--- title: ""3005 AI"" tags: [moc] date: 2022-11-07 lastmod: 2023-01-19 ---"
3005 AI.md,1674150632574,CZ3005 Artificial Intelligence,CZ3005 Artificial Intelligence moc - [Intelligent Agents](Notes/Intelligent%20Agents.md) - [002 Search Strategies](002%20Search%20Strategies.md) - [Constraint Satisfaction Problem](Notes/Constraint%20Satisfaction%20Problem.md) - [Games as Search Problems](Notes/Games%20as%20Search%20Problems.md) - [Markov Decision Process](Notes/Markov%20Decision%20Process.md) - Reinforcement Learning - [Monte Carlo Policy](Notes/Monte%20Carlo%20Policy.md) - [Q-Learning](Notes/Q-Learning.md) - [Game Theory](Notes/Game%20Theory.md) - [Knowledge Representation](Notes/Knowledge%20Representation.md) - [Propositional Logic](Notes/Propositional%20Logic.md) - [First Order Logic](Notes/First%20Order%20Logic.md) - [Default Logic](Notes/Default%20Logic.md)
Annotations.md,1669012068774,---,"--- title: ""Annotations"" date: 2022-11-07 lastmod: 2022-11-21 ---"
Annotations.md,1669012068774,Annotations,"Annotations (03/07/2022, 13:24:26) “Since this subarray must contain A[mid], the for loop of lines 3–7 starts the index i at mid and works down to low, so that every subarray it considers is of the form A[i..mid]. Lines 1–2 initialize the variables left-sum, which holds the greatest sum found so far, and sum, holding the sum of the entries in A[i..mid]. Whenever we find, in line 5, a subarray A[i..mid] with a sum of values greater than left-sum, we update left-sum to this subarray’s sum in line 6, and in line 7 we update the variable max-left to record this index i. Lines 8–14 work analogously for the right half,” ([“Introduction to algorithms”, 2009, p. 72](zotero://select/library/items/7E6KGQXY)) ([pdf](zotero://open-pdf/library/items/X4G75MM7?page=93&annotation=SYVBAT6S)) “For example, we can interpret a character string as an integer expressed in suitable radix notation. Thus, we might interpret the identifier pt as the pair of decimal integers (112, 116), since p = 112 and t = 116 in the ASCII character set; then, expressed as a radix-128 integer, pt becomes (112 · 128) + 116 = 14452.” ([“Introduction to algorithms”, 2009, p. 263](zotero://select/library/items/7E6KGQXY)) ([pdf](zotero://open-pdf/library/items/X4G75MM7?page=284&annotation=8285ZSWS)) “11.3.3 Universal hashing” ([“Introduction to algorithms”, 2009, p. 265](zotero://select/library/items/7E6KGQXY)) ([pdf](zotero://open-pdf/library/items/X4G75MM7?page=286&annotation=MYXVRTMC)) \[image\] ([pdf](zotero://open-pdf/library/items/X4G75MM7?page=950&annotation=4IXV76H4)) ([“Introduction to algorithms”, 2009, p. 929](zotero://select/library/items/7E6KGQXY)) gcd “31.1-2 Prove that there are infinitely many primes. (Hint: Show that none of the primes p1, p2, ..., pk divide (p1 p2 ··· pk) + 1.)” ([“Introduction to algorithms”, 2009, p. 932](zotero://select/library/items/7E6KGQXY)) ([pdf](zotero://open-pdf/library/items/X4G75MM7?page=953&annotation=6S5WI7DM))"
4031 Database Systems.md,1669012068777,---,"--- title: ""4031 Database Systems"" tags: [moc] date: 2022-11-07 lastmod: 2022-11-21 ---"
4031 Database Systems.md,1669012068777,Database Systems,Database Systems moc - [Storage](Notes/Storage.md) - [Buffer Pools](Notes/Buffer%20Pools.md)
4031 Database Systems.md,1669012068777,Indexes,Indexes - [Conventional Indexes](Notes/Conventional%20Indexes.md) - [B+ Tree Index](Notes/B+%20Tree%20Index.md) - [Hash Index](Notes/Hash%20Index.md) - [Multi Key Index](Notes/Multi%20Key%20Index.md)
4031 Database Systems.md,1669012068777,Query Processing,Query Processing - [Query Processing](Notes/Query%20Processing.md) - [One Pass Algorithms](Notes/One%20Pass%20Algorithms.md) - [Two Pass Algorithms](Notes/Two%20Pass%20Algorithms.md) - [Index Based Algorithms](Notes/Index%20Based%20Algorithms.md) - [Query Execution](Notes/Query%20Execution.md) - [Query Compiler](Notes/Query%20Compiler.md)
4031 Database Systems.md,1669012068777,Transactions,Transactions - [Transaction Management](Notes/Transaction%20Management.md) - [Concurrency Control](Notes/Concurrency%20Control.md)
A thought on leetcode.md,1669012068769,---,"--- title: ""A thought on leetcode"" date: 2022-11-08 lastmod: 2022-11-21 ---"
A thought on leetcode.md,1669012068769,A thought on leetcode,"A thought on leetcode A horrendous bug was discovered at work today. How horrendous? It had to do with concurrency. The bug came as a side effect of a cycle in our service API bug detection program. Our program ran by recursively following the verdict result from failing services. After some debugging, essentially:

```mermaid
graph LR;
A(Service A) --> B(Service B);
B --> C(Service C);
C --> A;
```

Immediately a task was issued to implement a fix to detect the cyclic verdict mechanism which was trapping our program. This is a simple problem: once one is able to frame it as a cycle detection problem, it reduces to [a leetcode easy](https://leetcode.com/problems/linked-list-cycle/). The widely accepted optimal solution, which uses O(1) space, is the [Floyd's tortoise and hare](https://en.wikipedia.org/wiki/Cycle_detection#Floyd's_tortoise_and_hare) two-pointer solution. In our case, implementing such a solution would mean establishing an additional runner to complete the loop, finding a cycle only after a minimum of one full additional cycle has been made. This would not have made the most sense, as it meant making additional requests to our verdict endpoint, increasing the overall latency to determine this rare cyclic case. Our go-to solution became the naïve method of storing visited nodes in a HashMap, and checking against it before continuing down our call sequence. Algorithms knowledge provides the programmer with options. The solution to a real-world issue may not be the same algorithm used in a leetcode problem, but the ability to frame the problem, and the knowledge of different potential solutions and their trade-offs, is an essential everyday skill."
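A thought on leetcode.md,1669012068769,A thought on leetcode,"For illustration, here is a minimal sketch of the visited-set approach we went with, using hypothetical numeric service IDs in place of real verdict calls (Rust rather than our actual stack):

```rust
use std::collections::{HashMap, HashSet};

// Each failing service points at the service whose verdict it follows.
// A cycle exists if we revisit a service we have already seen.
fn has_cycle(next: &HashMap<u32, u32>, start: u32) -> bool {
    let mut visited = HashSet::new();
    let mut current = start;
    while let Some(&n) = next.get(&current) {
        if !visited.insert(current) {
            return true; // already visited: cyclic verdict chain
        }
        current = n;
    }
    false // the chain terminated at a service with no further verdict
}
```

The extra memory is linear in the chain length, but it avoids the additional requests a tortoise-and-hare runner would have made."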
ByteDance - 2nd Round.md,1669390137079,---,"--- title: ""ByteDance - 2nd Round"" date: 2022-11-25 lastmod: 2022-11-25 ---"
ByteDance - 2nd Round.md,1669390137079,ByteDance - 2nd Round,"ByteDance - 2nd Round I applied to ByteDance for a backend engineering internship role in video infrastructure. Due to a lack of research, I found myself in a system design interview rather than a leetcode style one. My interviewer was an SRE - Site Reliability Engineer - who seemed quite surprised that I knew what an SRE was (thanks to my role at Shopee monitoring the stability of the UAT environment, which is not unlike an SRE's work)."
ByteDance - 2nd Round.md,1669390137079,The Question,"The Question *""A poor business man wants to set up a new business, selling cloud storage to customers. However, he only has 4 old servers, each of 1 TB capacity.""* Questions in order: 1. Design the IO flow of such a system, to support basic features of uploading and downloading files given a specific file path. 2. How could we support multiple clients? 3. What if a client wants to upload a file larger than the capacity of a single server ( > 1 TB)? 4. One of the servers failed, causing data loss and large costs to compensate users for the lost/corrupted data. How could we improve the reliability of the 4 servers? 5. The businessman wants to be able to *oversell* his service. With only 4TB of storage, we need to be able to sell 8TB worth to customers. This is on the basis that not all customers will use all the storage they purchased."
ByteDance - 2nd Round.md,1669390137079,Wow I am bad,"Wow I am bad Questions 4 and 5 are the most interesting, and stumped me during the interview. Question 4: With only 4 servers, and the constraint of being poor, a RAID configuration was not feasible. I think the interviewer expected me to list some redundancy algorithms, but with no knowledge or experience, I couldn't provide any. Question 5: This question, and the answers from the interviewer, brought to light the technical difficulties which the cloud storage services we use face every day. One solution is to compress the files uploaded. If we can compress and decompress files on the fly while serving requests to clients, we can make the 4TB of storage go a long way. Pair this up with *virtual uploads*. Virtual uploads are based on the assumption that most individuals do not *really* have many personal files or data. This means the majority of storage that most people ever really make use of is for public files - files such as music, or the first season of The Office. Ever wondered what Google Drive is doing during the *Scanning File* portion when you upload a file? Apparently, the local file is hashed in its entirety and the hash sent to the server to check if Google already has the same file in its store. This means that multiple users will have pointers to the same file stored on the server, and the local file is only ""virtually"" uploaded. The final step, and last resort, is to provision more servers. It is hard to provision servers out of thin air, and once bought, they add to the cost of the business. Here the interviewer mentioned how one could use the constraint of network and bandwidth speeds to buy the time the business needs to provision more servers."
ByteDance - 2nd Round.md,1669390137079,Life's tough,"Life's tough If it is not yet evident, I definitely did not come up with any of the responses above. A lot of very smart people have had to come up with such solutions and implement them in the real world. I wonder what other problems lie in plain sight, but have solutions which most will remain oblivious to."
"Why Vim, in the land of Go.md",1670741479592,---,"--- title: ""Why Vim, in the land of Go"" date: 2022-12-10 lastmod: 2022-12-11 --- Editor wars have been fought since at least [1985](https://en.wikipedia.org/wiki/Editor_war). In 2022, however, the [majority](https://survey.stackoverflow.co/2022/most-popular-technologies-new-collab-tools) continue to trend towards Visual Studio Code as their preferred choice of an integrated developer environment (IDE). At my workplace, where the primary language is Go, the editor on screens, beneath the different colour schemes and themes, can quite easily be recognised as JetBrains' proprietary GoLand. Being a paid piece of software, it of course includes most of the feature set which one might expect from a modern IDE: syntax highlighting, symbol navigation methods, built-in debugger etc. Even more importantly, it seemed like GoLand users made bigger bucks than Vim users[^1]! At the behest of one colleague, who felt that GoLand was slow and memory intensive, I decided to come up with my case for why I use vim in the land of Go."
"Why Vim, in the land of Go.md",1670741479592,Vim Motions...Sickness,"Vim Motions...Sickness Vim is not *just* vim. Vim comprises its interface (the program running in a terminal window) and vim **motions**. These are the keyboard shortcuts that can be tied to cursor movements, file operations and even custom functions. I won't go into the weeds about the different commands and modes available, as there are a ton of interactive (and even gamified, if that's your type of thing) tutorials out there which do a much better job than I could hope to do here. Here are some: - vimtutor - [Learn-Vim](https://github.com/iggredible/Learn-Vim) - [vim-be-good](https://github.com/ThePrimeagen/vim-be-good) But why should you learn a bunch of new commands and keystrokes? It seems like a massive time sink. Every day, some new technology comes out demanding your attention and this just ain't helping."
"Why Vim, in the land of Go.md",1670741479592,Need for speed,"Need for speed This brings me to my first reason, it makes you really *really* fast. The basic commands keep your hands on the home row of the keyboard, allowing you to just churn out lines and lines of text without ever touching the mouse. And of course, as programmers, we are not always writing code. Time is spent thinking and designing what is the *best* way to solve a problem. But during this deliberation process, we like to try things, add a few lines here and refactor a function there. In general, we like to break stuff to understand how things work and what we should do next. Being able to break stuff quickly, reducing mouse distractions, speeds up the iterative process that is software engineering."
"Why Vim, in the land of Go.md",1670741479592,SSS combo,"SSS combo [Speed is fun](https://www.scienceabc.com/pure-sciences/why-do-we-feel-so-thrilled-by-speed.html). But if you have ever played brawler or hack and slash type games, you know the thrill of hitting insane combos. This is the same way I feel about vim motions. Want to refactor a nested function? `:10<CR>V10jd<C-d>P`. Want to change all its arguments? `ci(`. Want to give up? `:qa!<CR>`. Keeping that flow state, moving fast, pushing out combos just adds up to quite a lot of fun."
"Why Vim, in the land of Go.md",1670741479592,Vim the program,"Vim the program Now, let's talk about vim the program itself. In terms of applicability, vim is still *somewhat* universal. Vi is installed by default in various Linux distributions, allowing you to edit server files proficiently. But who cares? You just want to be able to write and debug Go code, see pretty rainbow bracket colours and run a separate terminal program all in the same view. Vim *can't* do that, after all, its website looks like it was made in the 90s:"
"Why Vim, in the land of Go.md",1670741479592,Running naked makes you faster?,"Running naked makes you faster? Vim starts off relatively barebones. From its website, you can see that the key features listed are a multi-level undo tree and powerful search and replace, which are all things you would already expect to have. This means that at the start, writing code in vim is extremely painful. It will feel like you are writing code on Notepad but with the added difficulty of hundreds of commands. However, this also means that it is fast. It launches instantaneously on the terminal window and even on low performance virtual machines, the experience is still decent. But what is the point of it all if you are just going to be worse off as a programmer? Vim has an extensive plugin system. Adding functionality which you want is simple, and usually involves finding existing plugins on GitHub, adding them to your initialisation file and configuring the options that you want. See, the key here is what *you* want. You get to decide what features you wish to add to your editor, and what you consider bloat. **To quickly get all the necessary features you are used to in GoLand into Vim, there is [this plugin](https://github.com/fatih/vim-go).**"
"Why Vim, in the land of Go.md",1670741479592,Nah it just allows you to tinkle I mean tinker while you run,"Nah it just allows you to tinkle I mean tinker while you run All this control is not just limited to functionality. Looking at the screen for hours a day means aesthetics are just as important to the programmer. Vim gives you extensive options to make it look the way you want. This type of persistent tinkering may put off some of you who simply wish for some sensible defaults out of the box, and at the beginning it will be constant tinkering to get something you like. However, I assure you that the satisfaction of having something completely personalised to your taste and style will make coding in it 10x more enjoyable. My setup for Go in this year's Advent Of Code:"
"Why Vim, in the land of Go.md",1670741479592,How about running in the open?,"How about running in the open? Vim is open source. This means you can (if you want) scrutinise the code for any malicious intent. This does not automatically make Vim *better* per-se. Instead, this might mean that stability is not as guaranteed as compared to an editor which people are paying $69.99 per month for. In fact, Paul Lutus would request for the user to ""stop complaining for a while and make the world a better place.""[^2] Personally, I use [NeoVim](https://github.com/neovim/neovim), which is a fork of Vim that encourages community contributions among other things. For essential Go development features, checkout the wonderful [go.nvim](https://github.com/ray-x/go.nvim) plugin."
"Why Vim, in the land of Go.md",1670741479592,Concrete steps out of the tarpit,"Concrete steps out of the tarpit Past all my blabbering, I wish to offer some actionable steps which you can take. 1. I would argue that learning vim motions brings you 80% of the way towards using vim as your daily driver. Hence, don't start with Vim, start with vim motions. Look for options or plugins that enable the use of vim key bindings. For GoLand users, look to [IdeaVim](https://plugins.jetbrains.com/plugin/164-ideavim). This means that you can get good with the essential vim commands without leaving the comfort of GoLand. 2. When you feel ready to leave the nest, I recommend installing NeoVim and fiddling with the configurations. Here I also recommend looking at videos, which can offer step by step walkthroughs on the general ideas behind configuration. - [Your first vimrc](https://www.youtube.com/watch?v=x2QJYq4IX6M) - [Neovim from scratch](https://www.youtube.com/watch?v=ctH-a-1eUME) As with anything that is configurable, one must accept the inevitable situation of things breaking. It helps to treat these situations like mini side projects, ones that will help solidify your understanding of the tools you use and make you a better developer out of it. 3. Get inspired. There is a great community out there that use Vim to make incredible things. Many make plugins which I cannot live without, and many others create insanely ""riced"" personal development environments out of their editor. It's hard to get into something tough, if you can't see or want the end goal. [Reddit](https://www.reddit.com/r/neovim) is a good place to explore what you may be interested in. I can also highly recommend following some extremely knowledgeable and entertaining vim content creators like [ThePrimeagen](https://www.youtube.com/@ThePrimeagen) and [TJ](https://www.youtube.com/channel/UCd3dNckv1Za2coSaHGHl5aA)."
"Why Vim, in the land of Go.md",1670741479592,References,References [^1]: https://survey.stackoverflow.co/2022/top-paying-technologies-integrated-development-environment [^2]: https://arachnoid.com/careware/index.html
Brag Doc.md,1669012068771,---,"--- title: ""Brag Doc"" date: 2022-11-07 lastmod: 2022-11-21 ---"
Brag Doc.md,1669012068771,Brag Doc,Brag Doc
Brag Doc.md,1669012068771,Shopee,"Shopee Project: API auto failure detection, triaging and reporting internal tool - Roll out for LATAM regions - Build new service-specific email reporting flow to increase workflow efficiency and standardization for *all* service teams - Helped to increase success rates from 60% to 90% in the UAT environment"
2460 Software Safety and Security.md,1683790573149,---,"--- title: ""2460 Software Safety and Security"" date: 2023-03-27 tags: [moc] lastmod: 2023-04-11 ---"
2460 Software Safety and Security.md,1683790573149,2460 Software Safety and Security,2460 Software Safety and Security moc - [[Risk Analysis]] - [[X.509 Email Address Vulnerability]] - [[Formal Specification]] - [[Notes/Model Checking]] - [[Software Model Checking]] - [[Memory Safety]] [Safety](Notes/Safety%20and%20Liveliness.md): condition of being protected from harm Security: degree of protection from harm
2460 Software Safety and Security.md,1683790573149,Verification vs Validation,Verification vs Validation Verification: does the software do things right? - can be automated by tools to verify specific properties Validation: does the software do the right thing? - requires human judgement to think about which are the correct requirements/operations
2460 Software Safety and Security.md,1683790573149,Verification,Verification Dynamic analysis: performs at run time analysing the real state of the system Static analysis: performs at compile time to analyse the simplified state of the system
101 Introduction To Algorithms.md,1669012068767,---,"--- title: ""101 Introduction To Algorithms"" tags: [book, moc] date: 2022-11-08 lastmod: 2022-11-21 ---"
101 Introduction To Algorithms.md,1669012068767,Introduction To Algorithms,Introduction To Algorithms book moc This contains the map of contents to my set of notes and solutions to the problems laid out in the Introduction to Algorithms book. - [Chapter 5: Probabilistic Analysis and Randomised Algorithms](Notes/Probabilistic%20Analysis%20and%20Randomised%20Algorithms.md)
Activity Diagrams.md,1669012068764,---,"--- title: ""Activity Diagrams"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Activity Diagrams.md,1669012068764,Activity Diagram,Activity Diagram Flow chart of activities performed by the system.
Activity Diagrams.md,1669012068764,Swimlanes,Swimlanes _Partition_ an activity diagram to show who is doing which action.
Activity Diagrams.md,1669012068764,Parallel Paths,"Parallel Paths **_Fork_ nodes indicate the start of concurrent flows of control.** **_Join_ nodes indicate the end of parallel paths.** In a set of parallel paths, execution along **all parallel paths should be complete before the execution can start on the outgoing control flow of the _join_.** > [!EXAMPLE] > In this activity diagram (from an online shop website) the actions _User browses products_ and _System records browsing data_ happen in parallel. Both of them need to finish before the _log out_ action can take place. > >"
Activity Diagrams.md,1669012068764,Examples,Examples
A-Star Search.md,1669012068762,---,"--- title: ""A-Star Search"" date: 2022-11-08 lastmod: 2022-11-21 ---"
A-Star Search.md,1669012068762,A* Search,"A* Search Combines [Greedy Best First Search](Notes/Greedy%20Best%20First%20Search.md) h(n) with [Uniform Cost Search](Notes/Uniform%20Cost%20Search.md) g(n) Evaluation function $$f(n)=g(n)+h(n)$$ **Remember to take the full path cost in calculating g(n) for a node** Optimality: Optimal with an *admissible heuristic* Time Complexity: Exponential in length of solution Space Complexity: Exponential in length of solution [Admissible Heuristic](https://en.wikipedia.org/wiki/Admissible_heuristic) for every node n, it underestimates the cost of getting from n to the closest goal node, i.e. __there is no path from n to a goal that has path cost less than h(n)__. It prevents A* from skipping the optimal solution. > [!Example] Suppose you're trying to [drive from Chicago to New York](https://maps.google.co.uk/maps?q=Chicago+to+New+York&saddr=Chicago&daddr=New+York&hl=en&ll=41.294317,-80.81543&spn=11.071941,20.302734&sll=41.656497,-82.155762&sspn=11.010616,20.302734&geocode=FWICfwIdGuDG-inty_TQPCwOiDEAwMAJrabgrw%3BFVA6bQIdS8KW-yk7CD_TpU_CiTFi_nfhBo8LyA&t=h&z=6) and your heuristic is what your friends think about geography. If your first friend says, ""Hey, Boston is close to New York"" (underestimating), then you'll waste time looking at routes via Boston. Before long, you'll realise that any [sensible route from Chicago to Boston](https://maps.google.co.uk/maps?saddr=Chicago&daddr=Boston&hl=en&ll=42.228517,-79.343262&spn=10.912859,20.302734&sll=40.63063,-73.87207&sspn=11.183158,20.302734&geocode=FWICfwIdGuDG-inty_TQPCwOiDEAwMAJrabgrw%3BFZ9WhgIdw7bD-ykbMT0NLWXjiTGg6GIBJL98eA&t=h&mra=ls&z=6) already gets fairly close to New York before reaching Boston and that actually going via Boston just adds more miles. So you'll stop considering routes via Boston and you'll move on to find the optimal route. Your underestimating friend cost you a bit of planning time but, in the end, you found the right route.
Guaranteed to expand no more nodes than UCS: Heuristics guide the search towards the goal node which prevents expansion of redundant nodes. Where heuristic h(n) = 0, it will expand the same number as UCS."
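A-Star Search.md,1669012068762,A* Search,"A minimal sketch of A* over an adjacency-list graph, with made-up edge costs and heuristic values (the node indices, costs and `a_star` itself are illustrative assumptions):

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

// edges[n] lists (neighbour, edge cost); h[n] is the heuristic estimate
// from n to the goal. Returns the optimal path cost if h is admissible.
fn a_star(edges: &[Vec<(usize, u32)>], h: &[u32], start: usize, goal: usize) -> Option<u32> {
    let mut best_g: HashMap<usize, u32> = HashMap::new();
    let mut frontier = BinaryHeap::new();
    // Order the frontier by f(n) = g(n) + h(n); Reverse turns the
    // default max-heap into a min-heap.
    frontier.push(Reverse((h[start], 0u32, start)));
    while let Some(Reverse((_f, g, n))) = frontier.pop() {
        if n == goal {
            return Some(g);
        }
        if best_g.get(&n).map_or(false, |&seen| seen <= g) {
            continue; // already expanded n with a cheaper path
        }
        best_g.insert(n, g);
        for &(m, cost) in &edges[n] {
            let g2 = g + cost; // full path cost so far, as stressed above
            frontier.push(Reverse((g2 + h[m], g2, m)));
        }
    }
    None
}
```

With h identically zero, this expands the same nodes as UCS, matching the note above."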
A-Star Search.md,1669012068762,Example Graphs,Example Graphs
Alignment Problem.md,1669012068759,---,"--- title: ""Alignment Problem"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Alignment Problem.md,1669012068759,Alignment Problem,Alignment Problem
Alignment Problem.md,1669012068759,Problem Formulation,"Problem Formulation Let $n_1$ and $n_2$ represent the position of the character in the respective subsequences $S_1$ and $S_2$. The cost to align the characters up to $n_1$ and $n_2$ can be found from the solutions to the cost of aligning the characters up to $n_1-1$ or $n_2-1$. If the last two characters are equal, the cost to align is simply the cost to align the rest of the $n_1-1$ and $n_2-1$ characters. If they are not equal, we can ignore one character, either from $S_1$ or $S_2$ (_resulting in $n_1-1$ or $n_2-1$_), by replacing it with an underscore: _resulting in a +1 cost_. Take the minimum of these two."
Alignment Problem.md,1669012068759,Strategy,Strategy
Alignment Problem.md,1669012068759,Pseudocode,Pseudocode
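Alignment Problem.md,1669012068759,Pseudocode,"A minimal sketch of the recurrence in code, assuming matches cost 0 and each inserted underscore costs 1 (`align_cost` is an illustrative name):

```rust
// Bottom-up table: dp[i][j] is the cost of aligning the first i characters
// of s1 against the first j characters of s2.
fn align_cost(s1: &[u8], s2: &[u8]) -> usize {
    let (n1, n2) = (s1.len(), s2.len());
    let mut dp = vec![vec![0usize; n2 + 1]; n1 + 1];
    for i in 0..=n1 {
        dp[i][0] = i; // aligning against an empty string is all underscores
    }
    for j in 0..=n2 {
        dp[0][j] = j;
    }
    for i in 1..=n1 {
        for j in 1..=n2 {
            dp[i][j] = if s1[i - 1] == s2[j - 1] {
                dp[i - 1][j - 1] // last characters match: no extra cost
            } else {
                // ignore one character from either side, paying +1
                1 + dp[i - 1][j].min(dp[i][j - 1])
            };
        }
    }
    dp[n1][n2]
}
```

Each cell takes the minimum of the two choices described in the formulation, so the whole table costs O(n_1 n_2) time and space."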
Angular.md,1669012068752,---,"--- title: ""Angular"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Angular.md,1669012068752,Angular,Angular A frontend development platform built on [TypeScript](Notes/TypeScript.md).
Angular.md,1669012068752,Creating components,Creating components ```console ng generate component <name> ng g c <name> ```
Angular.md,1669012068752,Defining metadata,"Defining metadata A file in the form of `<name>.component.ts` will be generated.

```typescript
@Component({
  selector: 'app-hero-list',
  templateUrl: './hero-list.component.html',
  providers: [ HeroService ]
})
export class HeroListComponent implements OnInit { /* . . . */ }
```

`selector` A CSS selector that tells Angular to create and insert an instance of this component wherever it finds the corresponding tag in template HTML. For example, if an application's HTML contains `<app-hero-list></app-hero-list>`, then Angular inserts an instance of the `HeroListComponent` view between those tags. `templateUrl` The module-relative address of this component's HTML template. Alternatively, you can provide the HTML template inline, as the value of the `template` property. This template defines the component's _host view_. `providers` An array of [providers](https://angular.io/guide/glossary#provider) for services that the component requires. In the example, this tells Angular how to provide the `HeroService` instance that the component's constructor uses to get the list of heroes to display."
Angular.md,1669012068752,Templating,Templating In a file in the form of `<name>.component.html`.
Angular.md,1669012068752,Data Binding,Data Binding
Angular.md,1669012068752,2 way binding,2 way binding
Angular.md,1669012068752,Pipes,"Pipes We can use pipes to transform values into a specific display format in our view. Angular defines various pipes, such as the [date](https://angular.io/api/common/DatePipe) pipe and [currency](https://angular.io/api/common/CurrencyPipe) pipe; for a complete list, see the [Pipes API list](https://angular.io/api?type=pipe). You can also define new pipes. To specify a value transformation in an HTML template, use the [pipe operator (`|`)](https://angular.io/guide/pipes): ``` {{interpolated_value | pipe_name}} ```"
Angular.md,1669012068752,Directives,"Directives Angular templates are dynamic. When Angular renders them, it transforms the DOM according to the instructions given by directives. A directive is a class with a ``@Directive()`` decorator."
Angular.md,1669012068752,Structural directives,"Structural directives They alter layout by adding, removing, and replacing elements in the DOM. [Guide]()

| Directive | Details |
| --------- | ------- |
| [`*ngFor`](https://angular.io/guide/built-in-directives#ngFor) | An iterative; it tells Angular to stamp out one `<li>` per hero in the `heroes` list. |
| [`*ngIf`](https://angular.io/guide/built-in-directives#ngIf) | A conditional; it includes the `HeroDetail` component only if a selected hero exists. |

```typescript
<li *ngFor=""let hero of heroes""></li>
<app-hero-detail *ngIf=""selectedHero""></app-hero-detail>
```"
Angular.md,1669012068752,Attribute directives,"Attribute directives They alter the appearance or behavior of an existing element. In templates they look like regular HTML attributes, hence the name. [Guide](https://angular.io/guide/attribute-directives) | Directive | Details | | --------- | ------- | | [ngModel](https://angular.io/api/forms/NgModel) | `ngModel` modifies the behavior of an existing element (typically `<input>`) by setting its display value property and responding to change events. |"
Angular.md,1669012068752,Services,Services ``` ng generate service <name> ng g s <name> ```
Angular.md,1669012068752,Dependency Injection,"Dependency Injection Angular uses [Dependency Injection](Notes/Dependency%20Injection.md) to increase modularity. Use [Constructor Injection](Notes/Dependency%20Injection.md#Constructor%20injection) to utilize a service:

```typescript
export class ProductDetailsComponent implements OnInit {
  constructor(
    private route: ActivatedRoute,
    private cartService: CartService
  ) { }
}
```"
Angular.md,1669012068752,Hitting APIs,Hitting APIs Configure [HTTPModule](https://angular.io/start/start-data#configure-appmodule-to-use-httpclient) Data is passed from services to components via [Observables](https://angular.io/guide/observables)
Angular.md,1669012068752,[Get](https://angular.io/guide/http#requesting-data-from-a-server),"[Get](https://angular.io/guide/http#requesting-data-from-a-server)

```typescript
class SomeService {
  constructor(private http: HttpClient) {}

  get(): Observable<Task[]> {
    return this.http.get<Task[]>(this.apiUrl)
  }
}
```"
Angular.md,1669012068752,[Post](https://angular.io/guide/http#making-a-post-request),[Post](https://angular.io/guide/http#making-a-post-request)
Angular.md,1669012068752,[Delete](https://angular.io/guide/http#making-a-delete-request),[Delete](https://angular.io/guide/http#making-a-delete-request)
Angular.md,1669012068752,[Put](https://angular.io/guide/http#making-a-put-request),[Put](https://angular.io/guide/http#making-a-put-request)
Angular.md,1669012068752,[Error handling](https://angular.io/guide/http#handling-request-errors),[Error handling](https://angular.io/guide/http#handling-request-errors)
Angular.md,1669012068752,RxJS Observables,"RxJS Observables Makes use of the [Observer Pattern](Notes/Observer%20Pattern.md). <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/T9wOu11uU6U"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"" allowfullscreen></iframe> We use `Observables` when there is some stream of data that is changing and we have multiple subscribers that want to react when there is new data. RxJS provides multiple functions to modify how often we want to call next(), in what ways to format the data etc."
Allocators.md,1689950037140,---,"--- title: ""Allocators"" date: 2023-07-15 ---"
Allocators.md,1689950037140,Allocators,"Allocators A kernel often also requires support for heap allocation. With the support of [Paging](Notes/Memory%20Organisation.mdPaging), we can define a virtual memory range and map it to physical frames. Now, all we need is an allocator. The job of an allocator is to manage the available heap memory. It needs to return unused memory on `alloc` calls and keep track of memory freed by `dealloc` so that it can be reused again. Most importantly, it must never hand out memory that is already in use somewhere else because this would cause undefined behavior."
Allocators.md,1689950037140,Bump Allocator,"Bump Allocator The idea behind a bump allocator is to linearly allocate memory by increasing (_“bumping”_) a `next` variable, which points to the start of the unused memory. At the beginning, `next` is equal to the start address of the heap. On each allocation, `next` is increased by the allocation size so that it always points to the boundary between used and unused memory:"
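Allocators.md,1689950037140,Bump Allocator,"A minimal sketch of the idea, using plain integer addresses over a pretend heap range instead of raw pointers (the struct and its fields are illustrative, not the kernel implementation):

```rust
struct BumpAllocator {
    heap_start: usize,
    heap_end: usize,
    next: usize,        // boundary between used and unused memory
    allocations: usize, // counter of live allocations
}

impl BumpAllocator {
    fn new(start: usize, size: usize) -> Self {
        BumpAllocator { heap_start: start, heap_end: start + size, next: start, allocations: 0 }
    }

    // Bump next forward by the requested size; fail when exhausted.
    fn alloc(&mut self, size: usize) -> Option<usize> {
        let addr = self.next;
        if addr + size > self.heap_end {
            return None;
        }
        self.next = addr + size;
        self.allocations += 1;
        Some(addr)
    }

    // Memory is only reclaimed once every allocation has been freed.
    fn dealloc(&mut self) {
        self.allocations -= 1;
        if self.allocations == 0 {
            self.next = self.heap_start;
        }
    }
}
```

The allocation counter is what lets the allocator reset `next` once everything has been freed, which is also the source of the limitation discussed below."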
Allocators.md,1689950037140,Pros and Cons,"Pros and Cons - The big advantage of bump allocation is that it’s very fast. Compared to other allocator designs that need to actively look for a fitting memory block and perform various bookkeeping tasks on `alloc` and `dealloc`, a bump allocator [can be optimized](https://fitzgeraldnick.com/2019/11/01/always-bump-downwards.html) to just a few assembly instructions. This makes bump allocators useful for optimizing the allocation performance, for example when creating a [virtual DOM library](https://hacks.mozilla.org/2019/03/fast-bump-allocated-virtual-doms-with-rust-and-wasm/). - The main limitation of a bump allocator is that it can only reuse deallocated memory after all allocations have been freed. This means that a single long-lived allocation suffices to prevent memory reuse.

```rust
fn many_boxes_long_lived() {
    let long_lived = Box::new(1); // new
    for i in 0..HEAP_SIZE {
        let x = Box::new(i);
        assert_eq!(*x, i);
    }
    assert_eq!(*long_lived, 1); // new
}
```"
Allocators.md,1689950037140,Tricks,"Tricks - We could update `dealloc` to check whether the freed allocation was the last allocation returned by `alloc` by comparing its end address with the `next` pointer. In case they’re equal, we can safely reset `next` back to the start address of the freed allocation. This way, each loop iteration reuses the same memory block. - We could add an `alloc_back` method that allocates memory from the _end_ of the heap using an additional `next_back` field. Then we could manually use this allocation method for all long-lived allocations, thereby separating short-lived and long-lived allocations on the heap. Note that this separation only works if it’s clear beforehand how long each allocation will live. Another drawback of this approach is that manually performing allocations is cumbersome and potentially unsafe."
Allocators.md,1689950037140,The fundamental issue,"The fundamental issue A bump allocator can't effectively reuse freed memory regions. Suppose there are a total of 5 unused memory regions: the `next` pointer only gives us access to the last one. We could store all the unused regions in a constant-sized array, but we would not know in advance how many regions there could be. We can't use dynamic data structures, because then our heap allocator would depend on itself."
Allocators.md,1689950037140,Linked List Allocator,"Linked List Allocator A linked list can be used to keep track of the freed areas of memory, hence the name, *free list*. Freed blocks should be merged together, otherwise fragmentation will keep increasing:"
Allocators.md,1689950037140,Pros and Cons,"Pros and Cons - Able to reuse freed memory - Poorer performance: list length depends on the number of unused memory blocks, the performance can vary extremely for different programs. A program that only creates a couple of allocations will experience relatively fast allocation performance. For a program that fragments the heap with many allocations, however, the allocation performance will be very bad because the linked list will be very long and mostly contain very small blocks."
Allocators.md,1689950037140,Fixed Size Block Allocator,"Fixed Size Block Allocator Instead of allocating exactly as much memory as requested, we define a small number of block sizes and round up each allocation to the next block size. For example, with block sizes of 16, 64, and 512 bytes, an allocation of 4 bytes would return a 16-byte block, and an allocation of 48 bytes a 64-byte block. Like the linked list allocator, we keep track of the unused memory by creating a linked list in the unused memory. However, instead of using a single list with different block sizes, we create a separate list for each size class. The property that each region in a list has the same size makes for efficient allocations: 1. Round up the requested allocation size to the next block size. For example, when an allocation of 12 bytes is requested, we would choose the block size of 16 in the above example. 2. Retrieve the head pointer for the list, e.g., for block size 16, we need to use `head_16`. 3. Remove the first block from the list and return it. Most notably, we can always return the first element of the list and no longer need to traverse the full list. Thus, allocations are much faster than with the linked list allocator. Deallocations work the same way: by rounding up the freed size and adding the region to the head of the corresponding list, we avoid traversing the list entirely."
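Allocators.md,1689950037140,Fixed Size Block Sketch,"Fixed Size Block Sketch The three allocation steps above can be sketched in Python (hypothetical size classes 16/64/512; fresh blocks are carved from a bump region when a free list is empty, and requests over 512 bytes are assumed to go to a fallback allocator):

```python
from bisect import bisect_left

BLOCK_SIZES = [16, 64, 512]   # hypothetical size classes

class FixedSizeBlockAllocator:
    def __init__(self):
        self.free_lists = {size: [] for size in BLOCK_SIZES}
        self.next_fresh = 0   # bump region used when a free list is empty

    def size_class(self, size):
        # round up the request to the next block size
        return BLOCK_SIZES[bisect_left(BLOCK_SIZES, size)]

    def alloc(self, size):
        cls = self.size_class(size)
        free = self.free_lists[cls]
        if free:
            return free.pop()     # O(1): take the list head, no traversal
        block = self.next_fresh   # otherwise carve a fresh block
        self.next_fresh += cls
        return block

    def dealloc(self, block, size):
        # round up the freed size and push the block back onto its list
        self.free_lists[self.size_class(size)].append(block)

fsa = FixedSizeBlockAllocator()
b = fsa.alloc(12)            # 12 bytes round up to the 16-byte class
fsa.dealloc(b, 12)
assert fsa.alloc(4) == b     # 4 bytes also round to 16: the block is reused
```

Both alloc and dealloc touch only the head of one list, which is where the speedup over the single free list comes from."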
Allocators.md,1689950037140,Fallback Allocator,"Fallback Allocator Using a fallback allocator, such as a linked list allocator, for allocation sizes that are rare can reduce memory waste. Since only very few allocations of that size are expected, the linked list stays small and the (de)allocations remain reasonably fast."
Allocators.md,1689950037140,Variations,Variations
Allocators.md,1689950037140,Slab allocator,"Slab allocator Use block sizes that directly correspond to selected types in the kernel. This way, allocations of those types fit a block size exactly and no memory is wasted. Sometimes, it might be even possible to preinitialize type instances in unused blocks to further improve performance. Slab allocation is often combined with other allocators. For example, it can be used together with a fixed-size block allocator to further split an allocated block in order to reduce memory waste. It is also often used to implement an [object pool pattern](https://en.wikipedia.org/wiki/Object_pool_pattern) on top of a single large allocation."
Allocators.md,1689950037140,Buddy allocator,"Buddy allocator Instead of using a linked list to manage freed blocks, the [buddy allocator](https://en.wikipedia.org/wiki/Buddy_memory_allocation) design uses a [binary tree](https://en.wikipedia.org/wiki/Binary_tree) data structure together with power-of-2 block sizes. When a new block of a certain size is required, it splits a larger sized block into two halves, thereby creating two child nodes in the tree. Whenever a block is freed again, its neighbor block in the tree is analyzed. If the neighbor is also free, the two blocks are joined back together to form a block of twice the size. The advantage of this merge process is that [external fragmentation](https://en.wikipedia.org/wiki/Fragmentation_(computing)#External_fragmentation) is reduced so that small freed blocks can be reused for a large allocation. It also does not use a fallback allocator, so the performance is more predictable. The biggest drawback is that only power-of-2 block sizes are possible, which might result in a large amount of wasted memory due to [internal fragmentation](https://en.wikipedia.org/wiki/Fragmentation_(computing)#Internal_fragmentation). For this reason, buddy allocators are often combined with a slab allocator to further split an allocated block into multiple smaller blocks."
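Allocators.md,1689950037140,Buddy Sketch,"Buddy Sketch Two small facts about the buddy design can be checked directly (a sketch, not a full implementation): requests round up to powers of two, which is the source of internal fragmentation, and a block's buddy is found by XOR-ing its address with its size:

```python
# Requests round up to the next power-of-2 block size.
def buddy_block_size(size):
    block = 1
    while block < size:
        block *= 2
    return block

# A block's buddy sits at its address XOR its size, which is what makes
# the merge-on-free check cheap.
def buddy_of(addr, size):
    return addr ^ size

assert buddy_block_size(17) == 32   # 15 of 32 bytes lost to internal fragmentation
assert buddy_of(0, 32) == 32        # the two halves of a split 64-byte block
assert buddy_of(32, 32) == 0        # are each other's buddies
```
"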
Arrays and Slices.md,1669012068756,---,"--- title: ""Arrays and Slices"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Arrays and Slices.md,1669012068756,Arrays and Slices,Arrays and Slices
Arrays and Slices.md,1669012068756,Arrays,Arrays
Arrays and Slices.md,1669012068756,Slices,Slices
Arrays and Slices.md,1669012068756,Slice reallocation,Slice reallocation
Arrays and Slices.md,1669012068756,Deletion,Deletion
Arrays and Slices.md,1669012068756,Pitfalls,"Pitfalls Since a slice is a pointer to an array, passing a slice into a function will allow the function to modify the original array. However, if slice reallocation occurs inside this new function, the function will no longer be modifying the original array: To avoid this, we can return the modified array from the func"
Arrays and Slices.md,1669012068756,Goroutine Unsafety,Goroutine Unsafety
ASP.NET Web API.md,1669012068749,---,"--- title: ""ASP.NET Web API"" date: 2022-11-08 lastmod: 2022-11-21 ---"
ASP.NET Web API.md,1669012068749,ASP.NET Web API,ASP.NET Web API
ASP.NET Web API.md,1669012068749,Routing,Routing Routing uses `ApiController` Add an attribute to a controller class to tell the framework that this is an API controller ```csharp [ApiController] public class TicketsController : ControllerBase {} ```
ASP.NET Web API.md,1669012068749,Route pattern using _attribute binding_,"Route pattern using _attribute binding_ - `IActionResult` is a generic interface to encapsulate all the return types such as XML, JSON. - Use the route method attribute to set the endpoint ```csharp [HttpGet] [Route(""api/tickets"")] public IActionResult GetTickets(){ return Ok(""Reading tickets""); } //with interpolation [HttpGet] [Route(""api/tickets/{id}"")] public IActionResult GetTicket(int id){ return Ok($""Reading ticket {id}""); } ``` Can also define the route based on the controller name at the class level ```csharp [ApiController] [Route(""api/[controller]"")] public class TicketsController: ControllerBase{ [HttpGet] public IActionResult GetTickets(){ return Ok(""Reading tickets""); } [HttpGet(""{id}"")] public IActionResult GetTicket(int id){ return Ok($""Reading ticket {id}""); } } ```"
ASP.NET Web API.md,1669012068749,Route pattern using _model binding_,"Route pattern using _model binding_ [Primitive type binding](https://docs.microsoft.com/en-us/aspnet/core/mvc/models/model-binding?view=aspnetcore-6.0#sources) - `FromRoute` - `FromQuery`: specifies that this attribute must come from the query string ```csharp [HttpGet] [Route(""/api/projects/{pid}/tickets"")] //slash at the start indicates from root rather than the controller route defined at the class level public IActionResult GetTicketFromProject(int pid, [FromQuery] int tid){ if (tid == 0){ return Ok($""Reading all tickets belonging to project {pid}""); } else { return Ok($""Reading project {pid}, ticket {tid}""); } } ``` Using a complex type: ```csharp public class Ticket{ [FromQuery(Name=""tid"")] public int TicketId {get; set;} [FromRoute(Name=""pid"")] public int ProjectId {get; set;} } [HttpGet] [Route(""/api/projects/{pid}/tickets"")] public IActionResult GetTicketFromProject(Ticket ticket){ if (ticket.TicketId == 0){ return Ok($""Reading all tickets belonging to project {ticket.ProjectId}""); } else { return Ok($""Reading project {ticket.ProjectId}, ticket {ticket.TicketId}""); } } ```"
ASP.NET Web API.md,1669012068749,Post Routes,Post Routes ```csharp [HttpPost] public IActionResult Post([FromBody] Ticket ticket){ return Ok(ticket); //automatically serializes the body into JSON } ```
ASP.NET Web API.md,1669012068749,Validation,"Validation [Data Annotations](https://docs.microsoft.com/en-us/aspnet/core/mvc/models/model-binding?view=aspnetcore-6.0#sources) Place validation on the model attributes ```csharp public class Ticket{ [Required] public int TicketId {get; set;} [Required] public int ProjectId {get; set;} } ``` Custom validation attributes ```csharp public class Ticket_EnsureDueDateForTicketOwner: ValidationAttribute{ protected override ValidationResult IsValid(Object value, ValidationContext validationContext){ var ticket = validationContext.ObjectInstance as Ticket; if (ticket != null && !string.IsNullOrWhiteSpace(ticket.Owner)){ if (!ticket.DueDate.HasValue){ return new ValidationResult(""Due date is required when ticket has owner""); } } return ValidationResult.Success; } } /**some code **/ public string Owner {get; set;} [Ticket_EnsureDueDateForTicketOwner] public DateTime? DueDate {get; set;} ```"
ASP.NET Web API.md,1669012068749,Filters,Filters How the filter pipeline works:
ASP.NET Web API.md,1669012068749,Action Filters,"Action Filters [Place validation on endpoint routes.](https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-6.0#implementation) ```csharp public class ValidateModelAttribute : ActionFilterAttribute { public override void OnActionExecuting(ActionExecutingContext context) { //custom validation on the endpoint if (!context.ModelState.IsValid) { context.ModelState.AddModelError(""SomeKey"", ""Key is missing""); //short circuit the request context.Result = new BadRequestObjectResult(context.ModelState); } } } [HttpPost] [ValidateModelAttribute] public IActionResult Post([FromBody] Ticket ticket){ return Ok(ticket); //automatically serializes the body into JSON } ```"
ASP.NET Web API.md,1669012068749,[Resource Filters](https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-6.0#resource-filters),"[Resource Filters](https://docs.microsoft.com/en-us/aspnet/core/mvc/controllers/filters?view=aspnetcore-6.0#resource-filters) Useful to short-circuit the rest of the pipeline such as during versioning and caching ```csharp public class Version1DiscontinueResourceFilter : Attribute, IResourceFilter{ public void OnResourceExecuting(ResourceExecutingContext context){ if (context.HttpContext.Request.Path.Value.Contains(""v1"")){ context.Result = new BadRequestObjectResult( new { Versioning = new[] {""This API Version is discontinued""} } ); } } public void OnResourceExecuted(ResourceExecutedContext context){ } } ```"
Application Layer.md,1676020278712,---,"--- title: ""Application Layer"" date: 2023-01-18 lastmod: 2023-01-19 ---"
Application Layer.md,1676020278712,Application Layer,Application Layer
Application Layer.md,1676020278712,Network Application Architectures,Network Application Architectures Client-server examples: - Webmail P2P examples: - File sharing - BitTorrent
Application Layer.md,1676020278712,CS vs P2P File Distribution,CS vs P2P File Distribution
Application Layer.md,1676020278712,Client Server,"Client Server - The server must transmit one copy of the file to each of the N peers. Thus the server must transmit NF bits. Since the server's upload rate is $u_s$, the time to distribute the file must be at least $NF/u_s$. - Let $d_{min}$ denote the download rate of the peer with the lowest download rate, that is, $d_{min} = min\{d_1, d_2, ..., d_N\}$. The peer with the lowest download rate cannot obtain all F bits of the file in less than $F/d_{min}$ seconds. Thus the minimum distribution time is at least $F/d_{min}$. $$D_{CS} \ge max\{\frac{NF}{u_s},\frac{F}{d_{min}}\}$$ From this we can observe that the distribution time increases linearly with the number of peers N."
Application Layer.md,1676020278712,P2P,"P2P - To get this file into the community of peers, the server must send each bit of the file at least once into its access link. Thus, the minimum distribution time is at least $F/u_s$. - The peer with the lowest download rate cannot obtain all F bits of the file in less than $F/d_{min}$ seconds. - The total upload capacity of the system as a whole is equal to the upload rate of the server plus the upload rates of each of the individual peers, that is, $u_{total} = u_s + u_1 + ... + u_N$. The system must upload F bits to each of the N peers, thus delivering a total of NF bits. The minimum distribution time is also at least $NF/(u_s + u_1 + ... + u_N)$. $$D_{P2P} \ge max\{\frac{F}{u_s},\frac{F}{d_{min}},\frac{NF}{u_s + \sum_{i=1}^{N}u_i}\}$$"
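Application Layer.md,1676020278712,Distribution Time Sketch,"Distribution Time Sketch Evaluating the client-server and P2P lower bounds above with hypothetical numbers (F = 1000 Mb file, server upload 30 Mbps, slowest download 2 Mbps, every peer uploading at 1 Mbps) shows the linear growth of client-server against P2P:

```python
# Lower bounds on distribution time, in seconds.
def d_cs(N, F, u_s, d_min):
    return max(N * F / u_s, F / d_min)

def d_p2p(N, F, u_s, d_min, u):
    return max(F / u_s, F / d_min, N * F / (u_s + N * u))

F, u_s, d_min, u = 1000, 30, 2, 1    # Mb and Mbps, hypothetical
for N in (10, 100, 1000):
    print(N, round(d_cs(N, F, u_s, d_min)), round(d_p2p(N, F, u_s, d_min, u)))

# Client-server time grows linearly in N; the P2P bound stays near F/u.
assert d_cs(1000, F, u_s, d_min) > 30 * d_p2p(1000, F, u_s, d_min, u)
```
"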
Application Layer.md,1676020278712,Process Communication,Process Communication Network applications on different hosts need a way to communicate with each other (sometimes across different operating systems). Client: process that initiates the communication Server: the other part of the pair
Application Layer.md,1676020278712,Addressing Processes,"Addressing Processes To receive messages, a process must have an identifier. Each host has a unique IP address but this is not enough as there are many processes which can be running on the same host. A **port number** is needed to identify the receiving process/socket: HTTP server: 80 Mail server: 25"
Application Layer.md,1676020278712,Transport Service Requirements,"Transport Service Requirements 1. Data integrity: the amount of fault tolerance an application needs 2. Throughput: rate at which the sending process can deliver bits to the receiver. Because communication lines are shared, some bandwidth-sensitive applications (such as multimedia) may need a guaranteed throughput value. 3. Timing: the amount of latency. An example guarantee might be that every bit that the sender pumps into the socket arrives at the receiver’s socket no more than 100 msec later. Such a service would be appealing to interactive real-time applications, such as multiplayer games. 4. Security: encryption"
Application Layer.md,1676020278712,Application Layer Protocols,"Application Layer Protocols An application layer protocol defines: - The types of messages exchanged, for example, request messages and response messages - The syntax of the various message types, such as the fields in the message and how the fields are delineated - The semantics of the fields, that is, the meaning of the information in the fields - Rules for determining when and how a process sends messages and responds to messages"
Application Layer.md,1676020278712,HTTP,HTTP The Web's application layer protocol is [HTTP](Notes/HTTP.md).
Application Layer.md,1676020278712,Electronic Mail,"Electronic Mail A typical message starts its journey in the sender’s user agent, travels to the sender’s mail server, and travels to the recipient’s mail server, where it is deposited in the recipient’s mailbox. When Bob wants to access the messages in his mailbox, the mail server containing his mailbox authenticates Bob (with usernames and passwords) Each user agent uses a separate mail server rather than directly connecting with each other such that there is some recourse (able to keep retrying to send a message) when the destination is currently unreachable."
Application Layer.md,1676020278712,SMTP,"SMTP The heart of Internet electronic mail is [[SMTP]], which allows for the transfer of messages."
Application Layer.md,1676020278712,Mail Access Protocol,Mail Access Protocol SMTP is a push protocol. Mail access protocols are needed to retrieve mail from the mail server via a pull operation: - [[POP3]]
Application Layer.md,1676020278712,DNS,DNS Many application protocols are built on top of [[DNS]].
Application Layer.md,1676020278712,BitTorrent,BitTorrent The most popular P2P protocol for file distribution is [[BitTorrent]]
Application Layer.md,1676020278712,Distributed Hash Table,Distributed Hash Table Another application of P2P is a [[Distributed Hash Table]]
Application Layer.md,1676020278712,Socket Programming,Socket Programming How are network applications actually created? Processes running on different machines communicate with each other through sockets.
Application Layer.md,1676020278712,TCP,"TCP Network applications may communicate through TCP, and hence the connection socket must support TCP. TCP provides a reliable **byte-stream** service: Basic byte I/O classes in Java"
Application Layer.md,1676020278712,Client,Client The client must perform the following operations: 1. Open TCP connection to the server 2. Send data 3. Receive data on the connection 4. Close the connection
Application Layer.md,1676020278712,Server,Server
Application Layer.md,1676020278712,Encoding/Decoding,"Encoding/Decoding To transfer a string between two processes over the network, we must decide how to represent the string as a sequence of bytes."
Application Layer.md,1676020278712,ASCII,ASCII
Application Layer.md,1676020278712,UTF-8,UTF-8 - Unicode Transformation Format – 8-bit - Variable length encoding - Up to four bytes per symbol - The first 128 are the same as for ASCII - Backwards compatibility – ASCII text is also valid UTF-8 - Dominating format on the Web
Application Layer.md,1676020278712,Helpful Java classes,Helpful Java classes
B+ Tree Index.md,1669012068747,---,"--- title: ""B+ Tree Index"" date: 2022-11-08 lastmod: 2022-11-21 ---"
B+ Tree Index.md,1669012068747,B+ Tree Index,B+ Tree Index Idea: build a multi-layer index in the structure of a [B-tree](Notes/B-tree.md) > [!Properties] > 1. Each tree node is stored within a *block* > 2. Each node stores at most n+1 pointers and n keys > 3. Each level is an index > - sorted within each node > - sorted across nodes at the same level > [!Leaf Node Properties] > Consider a leaf node storing k+1 pointers (k <= n) and k keys > 1. First k pointers are to records and last one is to the next leaf node > 2. Each key is equal to the key that its corresponding pointer is pointing to in the record > [!Internal Node Properties] > Consider an internal node storing k+1 pointers (k <= n) and k keys > 1. The ith key is the lower bound of the range of the i+1 pointer. The 2nd pointer points to a subtree that has the first key as the first element: >
B+ Tree Index.md,1669012068747,Validity,Validity
B+ Tree Index.md,1669012068747,Searching,"Searching 1. If search key is greater than ith key, follow i+1 pointer, else follow ith pointer"
B+ Tree Index.md,1669012068747,Insertion,"Insertion 1. Find which node to insert record 2. If node is not full, insert the record (maintain sorted order) 3. Else 1. Split the node and distribute the keys 2. If no parent, create root node 3. Else, insert into parent recursively until no splits needed"
B+ Tree Index.md,1669012068747,Deletion,"Deletion Case 1: key can be deleted while maintaining constraints Case 2: key can be borrowed from sibling nodes, e.g. take 16 from left: Case 3: cannot borrow from siblings 1. Merge 2 nodes and delete 1 of them 2. Delete key from parent (one child is removed, may need to remove key from parent) 3. Recursively apply delete on this parent if it is not full enough"
B+ Tree Index.md,1669012068747,Construction,"Construction One way to construct the index is through a series of insert operations. This gives no sequential storage of leaf nodes, but is more suitable for dynamic data."
B+ Tree Index.md,1669012068747,Bulk Loading,"Bulk Loading Leaves will be stored sequentially, will work for static data (all data is known before hand) 1. Sort all data entries based on search key 2. Start by creating all leaf nodes by packing the keys 3. Insert internal nodes bottom up <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/HJgXVxsO5YU?start=160"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"" allowfullscreen></iframe>"
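B+ Tree Index.md,1669012068747,Bulk Loading Sketch,"Bulk Loading Sketch Steps 1 and 2 above, plus deriving the first level of separator keys, can be sketched as follows (n = 4 keys per leaf and the key values are hypothetical):

```python
# Sort the entries, then pack them into sequential leaves of up to n keys.
def pack_leaves(keys, n=4):
    keys = sorted(keys)
    return [keys[i:i + n] for i in range(0, len(keys), n)]

leaves = pack_leaves([23, 5, 42, 8, 16, 4, 15, 99, 1, 57], n=4)
assert leaves == [[1, 4, 5, 8], [15, 16, 23, 42], [57, 99]]

# The parent level stores, as its ith key, the lower bound of the
# (i+1)th pointer: the first key of each leaf after the first.
separators = [leaf[0] for leaf in leaves[1:]]
assert separators == [15, 57]
```

Because the input is sorted once up front, the leaves come out stored sequentially, which is the advantage over repeated inserts for static data."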
B+ Tree Index.md,1669012068747,Practice Problems,"Practice Problems a. 1. 5 2. 5 3. 4 4. 4 b. 1. Min key + 1 = 3 2. Min key + 1 = 3 3. Original keys: n. After split at least $\lfloor(n/2)\rfloor=2$ 4. $\lfloor(n+1)/2\rfloor=2$ Height of the B+tree will be $log_{150}1000000=2.75$ Need at least 3 levels in the btree At most will take 3 I/O + 1 I/O to access data block - Each internal node can index 151 children - Last level indexes 150 records a. i. Data level: 1,000,000 records -> 100,000 blocks Leaf level will have 1,000,000 pointers = $1000000/70\approx14286$ blocks 2nd level will have $14286/70\approx205$ blocks 3rd level will have $205/70\approx3$ blocks Root will have 1 block Total blocks: $100000+14286+205+3=114494$ Height of the B-tree will at least be: $log_{70}1000000=3.25\approx4$ Number of I/O = 4+1 = 5 b. Same as a. Since dense index, order of the data record does not matter c. What if a) but sparse index? Leaf level will have 100,000 pointers to each data record block: $100000/70=1429$ blocks 2nd level: $1429/70\approx21$ Root: 1 Total blocks: $100,000+1429+21+1=101451$"
B-tree.md,1669012068744,---,"--- title: ""B-tree"" date: 2022-11-08 lastmod: 2022-11-21 ---"
B-tree.md,1669012068744,B-tree,B-tree A self-balancing tree data structure. Consider a B-Tree of order n (here we use the example of [B+Tree](Notes/Conventional%20Indexes.md#B%20Tree%20Index)): - Each orange box is a key - Each blue line is a pointer to subtree This also means that each internal node has at least $\lfloor{n/2}\rfloor$ +1 children
B-tree.md,1669012068744,Practice Problems,"Practice Problems a. 1. Interior node min keys: $\lfloor(n/2)\rfloor=5$ Min pointers: $min\ key + 1 = 5+1=6$ 2. Leaf node min key: $\lfloor((n+1)/2)\rfloor=5$ Min pointers: $min\ key + 1 = 5+1=6$ b. 1. Interior node min key: 5 Min pointers: 5+1 =6 2. Leaf node min key: 6 Min pointers: $min\ key + 1 = 6+1=7$"
Asynchronous Programming.md,1689954808532,---,"--- title: ""Asynchronous Programming"" date: 2023-07-21 ---"
Asynchronous Programming.md,1689954808532,Asynchronous Programming,Asynchronous Programming
Asynchronous Programming.md,1689954808532,Futures,Futures
Binary Search Tree.md,1669012068740,---,"--- title: ""Binary Search Tree"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Binary Search Tree.md,1669012068740,Binary Search Trees,"Binary Search Trees _BST property: Let x be a node in a binary search tree. If y is a node in the left subtree of x, then y.key $\le$ x.key. If y is a node in the right subtree of x, then y.key $\ge$ x.key._ Inorder tree traversal of the BST will produce a sorted list."
Binary Search Tree.md,1669012068740,Operations,Operations
Binary Search Tree.md,1669012068740,Searching,Searching We are able to search for a specific key in O(h) time where h is the height of the tree. The BST property allows us to perform binary search.
Binary Search Tree.md,1669012068740,Min and Max,Min and Max The smallest node is the leftmost node of the tree and symmetrically the largest node is the rightmost. ``` while x.left != null x = x.left return x ```
Binary Search Tree.md,1669012068740,Successors,"Successors We are able to find the successor of a node without any key comparisons: 1. If there is a right subtree to the node x, the successor is the leftmost element in the right subtree 2. Else, the successor is the parent of the first element which is a left child when traversing upwards through the tree. ``` if x.right != null return Tree-Min(x.right) y = x.p while y != null and x == y.right x = y y = y.p return y ```"
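Binary Search Tree.md,1669012068740,Successor Sketch,"Successor Sketch The two successor cases above as a runnable version of the pseudocode (the insert helper and the example keys are just scaffolding for the demo, not part of the algorithm):

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.p = parent

def tree_min(x):
    # leftmost node of the subtree rooted at x
    while x.left is not None:
        x = x.left
    return x

def successor(x):
    # Case 1: leftmost element of the right subtree
    if x.right is not None:
        return tree_min(x.right)
    # Case 2: walk up until we come out of a left child
    y = x.p
    while y is not None and x is y.right:
        x = y
        y = y.p
    return y

def insert(root, key):
    # standard BST insert, used here only to build the example tree
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key, cur)
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = Node(key, cur)
                return root
            cur = cur.right

root = None
for k in [4, 2, 6, 1, 3, 5, 7]:
    root = insert(root, k)

assert successor(root.left.right).key == 4   # successor of 3 is the root
assert successor(root).key == 5              # successor of 4
assert successor(root.right.right) is None   # 7 is the maximum
```

Note that no key comparisons are made: both cases follow pointers only."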
Binary Search Tree.md,1669012068740,Insert,"Insert The procedure maintains the trailing pointer y as the parent of x. After initialization, the while loop causes these two pointers to move down the tree, going left or right depending on the key comparison, until x becomes null. We need the trailing pointer y, because by the time we find the null where z belongs, the search has proceeded one step beyond the node that needs to be changed. The final lines set the pointers that cause z to be inserted. ``` y = null x = root while x != null y = x if z.key < x.key x = x.left else x = x.right z.p = y if y == null root = z else if z.key < y.key y.left = z else y.right = z ```"
Binary Tree.md,1669012068742,---,"--- title: ""Binary Tree"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Binary Tree.md,1669012068742,Binary Trees,"Binary Trees A tree structure in which each node has at most 2 children. ```javascript class TreeNode { constructor(val, left, right, parent){ this.val = val this.left = left this.right = right // this.parent = parent } } ```"
Binary Tree.md,1669012068742,Properties of binary trees,"Properties of binary trees Full binary trees (all nodes have 0 or 2 children) - The number of nodes n in a full binary tree is at least $2h+1$ and at most $2^{h+1}-1$, where h is the height of the tree. A tree consisting of only a root node has a height of 0. - The number of leaf nodes l in a perfect binary tree is $\frac{(n+1)}{2}$, because the number of non-leaf (a.k.a. internal) nodes $\sum _{k=0}^{\log _{2}(l)-1}2^{k}=2^{\log _{2}(l)}-1=l-1$. - This means that a full binary tree with l leaves has $2l-1$ nodes. Complete binary trees (like those used in [Heaps](Notes/Heaps.md)) - A complete binary tree has $\lfloor n/2 \rfloor$ internal nodes"
Binary Tree.md,1669012068742,Array representation,Array representation For a node at index i: - Left child: 2i + 1 - Right child: 2i + 2 - Parent: $\lfloor{\frac{(i-1)}{2}}\rfloor$
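Binary Tree.md,1669012068742,Array Representation Sketch,"Array Representation Sketch The index arithmetic above as helpers over a 0-indexed array (the example values are hypothetical):

```python
# 0-indexed array layout of a complete binary tree.
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

heap = ['a', 'b', 'c', 'd', 'e']   # level order: root, its children, ...
assert heap[left(0)] == 'b' and heap[right(0)] == 'c'
assert parent(3) == 1 and parent(4) == 1 and parent(2) == 0
```
"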
Bitmap.md,1669012068737,---,"--- title: ""Bitmap"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Bitmap.md,1669012068737,Bitmap,"Bitmap A bitmap is a mapping from some domain (for example, a range of integers) to bits. It is stored in a bit array."
Bitmap.md,1669012068737,Bit Array,Bit Array An array that compactly stores bits.
Binary Exponential Backoff.md,1677622813843,---,"--- title: ""Binary Exponential Backoff"" date: 2023-02-28 ---"
Binary Exponential Backoff.md,1677622813843,Binary Exponential Backoff,"Binary Exponential Backoff 1. After the nth collision, a value $K$ is chosen at random from the interval $\{0,1,2,...,2^{n}-1\}$ 2. As n increases, the size of the set grows exponentially and a larger number is more likely to be chosen"
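Binary Exponential Backoff.md,1677622813843,Backoff Sketch,"Backoff Sketch The draw can be sketched as follows (the cap of 10 mirrors Ethernet's limit and is an assumption here):

```python
import random

# After the nth collision, wait K slot times with K drawn uniformly
# from {0, 1, ..., 2**n - 1}; n is capped so the interval stops growing.
def backoff_slots(n, cap=10, rng=random):
    n = min(n, cap)
    return rng.randrange(2 ** n)

rng = random.Random(0)
draws = [backoff_slots(3, rng=rng) for _ in range(100)]
assert all(0 <= k < 8 for k in draws)   # every draw lies in {0..7}
assert len(set(draws)) > 1              # the interval is sampled, not fixed
```
"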
Bellman-Ford Algorithm.md,1676035353052,---,"--- title: ""Bellman-Ford Algorithm"" date: 2023-02-10 ---"
Bellman-Ford Algorithm.md,1676035353052,Bellman-Ford Algorithm,"Bellman-Ford Algorithm A single source shortest path algorithm which allows for negative edge weights in the graph unlike [Dijkstra's Algorithm](Notes/Dijkstra's%20Algorithm.md). This algorithm only works if there is no negative cycle in the graph. <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/obWXjtg0L64"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share"" allowfullscreen></iframe> ```c function BellmanFord(list vertices, list edges, vertex source) is // This implementation takes in a graph, represented as // lists of vertices (represented as integers [0..n-1]) and edges, // and fills two arrays (distance and predecessor) holding // the shortest path from the source to each vertex distance := list of size n predecessor := list of size n // Step 1: initialize graph for each vertex v in vertices do distance[v] := inf // Initialize the distance to all vertices to infinity predecessor[v] := null // And having a null predecessor distance[source] := 0 // The distance from the source to itself is, of course, zero // Step 2: relax edges repeatedly repeat |V|−1 times: for each edge (u, v) with weight w in edges do if distance[u] + w < distance[v] then distance[v] := distance[u] + w predecessor[v] := u // Step 3: check for negative-weight cycles for each edge (u, v) with weight w in edges do if distance[u] + w < distance[v] then // Step 4: find a negative-weight cycle negativeloop := [v, u] repeat |V|−1 times: u := negativeloop[0] for each edge (u, v) with weight w in edges do if distance[u] + w < distance[v] then negativeloop := concatenate([v], negativeloop) find a cycle in negativeloop, let it be ncycle // use any cycle detection algorithm here error ""Graph contains a negative-weight cycle"", ncycle return distance, predecessor ```"
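Bellman-Ford Algorithm.md,1676035353052,Runnable Sketch,"Runnable Sketch A runnable condensation of the pseudocode above (the relaxation passes plus the negative-cycle check; the example graph is hypothetical):

```python
INF = float('inf')

def bellman_ford(n, edges, source):
    # edges: list of (u, v, w) triples; vertices are integers 0..n-1
    distance = [INF] * n
    predecessor = [None] * n
    distance[source] = 0
    for _ in range(n - 1):              # relax every edge |V|-1 times
        for u, v, w in edges:
            if distance[u] + w < distance[v]:
                distance[v] = distance[u] + w
                predecessor[v] = u
    for u, v, w in edges:               # a further improvement implies
        if distance[u] + w < distance[v]:   # a negative-weight cycle
            raise ValueError('graph contains a negative-weight cycle')
    return distance, predecessor

edges = [(0, 1, 4), (0, 2, 1), (2, 1, -2), (1, 3, 3)]
dist, pred = bellman_ford(4, edges, 0)
assert dist == [0, -1, 1, 2]    # 0 -> 2 -> 1 costs 1 + (-2) = -1
assert pred == [None, 2, 0, 1]
```

The negative edge (2, 1, -2) is exactly the case Dijkstra's algorithm would get wrong."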
BitTorrent.md,1674145601178,---,"--- title: ""BitTorrent"" date: 2023-01-19 ---"
BitTorrent.md,1674145601178,BitTorrent,"BitTorrent The collection of all peers participating in the distribution of a particular file is called a torrent. - Peers in a torrent download equal-size chunks of the file from one another, with a typical chunk size of 256 kbytes. - When a peer first joins a torrent, it has no chunks. Over time it accumulates more and more chunks. - While it downloads chunks it also uploads chunks to other peers. Once a peer has acquired the entire file, it may (selfishly) leave the torrent, or (altruistically) remain in the torrent and continue to upload chunks to other peers. The tracker keeps track of the peers that are participating in the torrent. A user will receive a subset of peers from the tracker of which they will establish concurrent TCP connections. These are the *neighbouring peers*."
BitTorrent.md,1674145601178,Requesting Chunks,Requesting Chunks 1. Alice issue requests for the list of chunks neighbouring peers have. 2. Alice will use **rarest first** to determine the chunks that are the rarest among the neighbours and then request those chunks first
BitTorrent.md,1674145601178,Sending Chunks (tit-for-tat),"Sending Chunks (tit-for-tat) An incentive based trading algorithm: 1. For each neighbour, continually measure the rate at which she receives bits and determine the peers which are sending at the highest rates. 2. Send chunks to these **unchoked peers**. 3. Every 10 seconds, recalculate the rates 4. Every 30 seconds, pick one additional neighbour at random and send it chunks, optimistically unchoking this peer. > [! Consider Alice and an optimistically unchoked Bob] > If the rate Alice is sending data to Bob is high enough, she may become one of Bob's top uploaders. In which case, Bob will begin to send data to Alice. If the rate Bob sends data is high enough, he might become one of Alice's top uploaders. > > *The effect is, peers capable of uploading at compatible rates tend to find each other.* > > The random neighbour selection allows new peers to get chunks so that they can have something to trade. > > All other peers are choked and do not receive chunks"
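BitTorrent.md,1674145601178,Unchoking Sketch,"Unchoking Sketch The unchoking choice can be sketched as follows (peer names and rates are hypothetical; the real protocol also re-evaluates both choices on the 10- and 30-second timers described above):

```python
import random

# rates: neighbour -> measured rate at which we receive bits from them.
def choose_unchoked(rates, top_n=4, rng=random):
    top = sorted(rates, key=rates.get, reverse=True)[:top_n]
    rest = [p for p in rates if p not in top]
    optimistic = rng.choice(rest) if rest else None   # random extra peer
    return top, optimistic

rates = {'bob': 50, 'carol': 10, 'dave': 80, 'erin': 30, 'frank': 5}
top, optimistic = choose_unchoked(rates, top_n=2, rng=random.Random(1))
assert top == ['dave', 'bob']                     # the two fastest uploaders
assert optimistic in ['carol', 'erin', 'frank']   # one random choked peer
```
"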
Black Box Testing.md,1676224426161,---,"--- title: ""Black Box Testing"" date: 2022-11-08 lastmod: 2023-02-12 ---"
Black Box Testing.md,1676224426161,Black Box Testing,"Black Box Testing Testing of requirements and specifications Assumptions: 1. Verifiable requirements (e.g. 'hire those below 18 years old part time' is verifiable, unlike the vaguer 'hire *juniors* on a part time basis') 2. Testable code > [!NOTE] Test case design: > 1. Formulate the equivalence classes > 2. Break down ECs into boundary values. Remove boundary values which fall into other ECs > 3. Create valid test cases using permutations of valid boundary values > 4. Create invalid test cases using permutations of invalid boundary values; *only one parameter can be invalid at one time*"
Black Box Testing.md,1676224426161,Equivalence Class Testing,Equivalence Class Testing *Equivalence Class*: set of values that produce the same output *Example: We are testing if an alert is sent* Valid ECs will produce a positive output according to specification (i.e. will send an alert) Invalid ECs will produce a negative output (i.e. no alert sent). Error or exception ECs are invalid data ranges or types not within specification (i.e. exception is thrown).
Black Box Testing.md,1676224426161,Boundary Value Testing,"Boundary Value Testing For each EC, there are 3 BVs at each of the 2 ends of the range: 1. On the value 2. Below the value 3. Above the value Discrete values have no BVs."
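Black Box Testing.md,1676224426161,Boundary Value Sketch,"Boundary Value Sketch The on/below/above rule can be sketched for a hypothetical EC of valid ages 18 to 64 inclusive; boundary values falling into the valid range feed the valid test cases, the rest feed the invalid ones:

```python
# For a range EC, each end contributes an on / below / above value.
def boundary_values(low, high):
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})

bvs = boundary_values(18, 64)
valid = [v for v in bvs if 18 <= v <= 64]
invalid = [v for v in bvs if not 18 <= v <= 64]
assert valid == [18, 19, 63, 64]
assert invalid == [17, 65]
```
"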
Bonds.md,1669012068730,---,"--- title: ""Bonds"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Bonds.md,1669012068730,Bonds,Bonds
Bonds.md,1669012068730,Terminologies,"Terminologies Callability: Issuer can redeem the bond before maturity _leads to higher risk to the investor, which means the bond will have a **relatively higher YTM**_. Putability: Buyer can redeem the bond before maturity _leads to higher risk to the **issuer**, which means the bond will have a **relatively lower YTM**_."
Bonds.md,1669012068730,Prices,"Prices > [!NOTE] The relationship between bond prices and interest rates > When interest rate increases, people are able to obtain bonds with higher YTM, this makes the current bond which offers a lower YTM __worth less__: becomes a discount bond. > Vice versa for when interest rates decreases > [!NOTE] Price Relationships > - Higher coupon payments will have less sensitivity to changes in interest rates > - Longer maturities will have higher sensitivity to interest rate changes > - Riskier bonds ($\beta\ is\ higher\ hence\ r_e$ is higher) will have lower price"
Bonds.md,1669012068730,Yield To Maturity,"Yield To Maturity The expected return if one was to hold the bond till maturity > [!IMPORTANT] YTM is different from EAR > 8% semi-annual coupon bond selling at par will have $EAR=(1+0.04)^2-1=8.16\%$ but will have a YTM of 8% > > A bond that has same risk and term as the above, but pays _annual_ coupons instead will have a YTM of 8.16% Depends only on the maturity and risk. If these are the same across 2 bonds, they will have the same effective yield with differing coupon payments."
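Bonds.md,1669012068730,YTM vs EAR Sketch,"YTM vs EAR Sketch The figures quoted above can be checked directly (compounding arithmetic only, not a full bond pricer):

```python
# EAR of a rate quoted with m compounding periods per year.
def ear(quoted_rate, m):
    return (1 + quoted_rate / m) ** m - 1

assert round(ear(0.08, 2), 4) == 0.0816    # 8% semi-annual -> 8.16% EAR
assert round(ear(0.0816, 1), 4) == 0.0816  # annual coupons: EAR = quoted rate
```
"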
Bonds.md,1669012068730,Current Yield,Current Yield $$ Current\ Yield=\frac{Annual\ Interest\ Payments}{Current\ Bond\ Price} $$ Zero coupon bonds: Bonds which give out no coupons - Current yield = 0 > [!NOTE] > For all par bonds: > $$YTM=Current\ Yield=Coupon\ Rate$$ > For all discount bonds: > $$YTM>Current\ Yield> Coupon\ Rate$$ > For all premium bonds: > $$YTM<Current\ Yield< Coupon\ Rate$$
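Bonds.md,1669012068730,YTM vs EAR Example,"YTM vs EAR Example The YTM/EAR distinction above can be checked numerically. A minimal sketch (the function name is illustrative, not from the notes):

```python
def ear_from_ytm(ytm, periods_per_year):
    # Effective annual rate implied by a quoted YTM whose coupons
    # compound 'periods_per_year' times per year.
    return (1 + ytm / periods_per_year) ** periods_per_year - 1
```

For the 8% semi-annual bond above, `ear_from_ytm(0.08, 2)` gives 0.0816, i.e. the 8.16% EAR quoted in the note."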
Bonds.md,1669012068730,Term Structure (Yield Curve),"Term Structure (Yield Curve) Inflation premium: reflects the health of the economy. _Investors expect inflation to rise in an economy that is doing well_. Real rate: does not affect the slope of the curve, only translates the curve."
Boyer-Moore Algorithm.md,1669012068732,---,"--- title: ""Boyer-Moore Algorithm"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Boyer-Moore Algorithm.md,1669012068732,Boyer-Moore Algorithm,"Boyer-Moore Algorithm Definitions : T denotes the input text to be searched. Its length is n. : P denotes the string to be searched for, called the pattern. Its length is m. Steps 1. Process the text T[1...n] from **left to right** 2. Scan the pattern P[1...m] from **right to left** 3. Generate tables to find the maximum steps to slide the pattern after a mismatch 4. For every mismatch, we slide by the maximum amount returned by the 2 heuristics 5. Return once we find the pattern, or report failure once the pattern no longer fits (at most n-m+1 alignments are tried)"
Boyer-Moore Algorithm.md,1669012068732,Bad character rule (charJump),Bad character rule (charJump) Case 1 (mismatched character does not exist in the rest of the pattern): Case 2 (mismatched character is found in the rest of the pattern):
Boyer-Moore Algorithm.md,1669012068732,Example Array,Example Array
Boyer-Moore Algorithm.md,1669012068732,Pseudocode,Pseudocode Rightmost occurrence such that we do not over-slide.
Boyer-Moore Algorithm.md,1669012068732,Simple BM scan,"Simple BM scan When using charJump only, taking the max of charJump and $m-k+1$ will ensure that we do not _left shift_ the pattern and that we always at least move by 1 character. Example of where we could have made a _left shift_ of the pattern, which is not what we want:"
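Boyer-Moore Algorithm.md,1669012068732,charJump Scan Example,"charJump Scan Example The bad character rule and the no-left-shift safeguard above can be sketched in runnable form (0-based indexing, unlike the 1-based pseudocode; function names are illustrative, not from the notes):

```python
def bad_char_table(pattern):
    # charJump ingredient: rightmost occurrence of each character
    # in the pattern, so a slide never overshoots a usable match.
    return {ch: i for i, ch in enumerate(pattern)}

def bm_search(text, pattern):
    # Simplified Boyer-Moore scan using only the bad character rule.
    last = bad_char_table(pattern)
    m, n = len(pattern), len(text)
    s = 0  # current alignment of the pattern within the text
    while s <= n - m:
        k = m - 1
        while k >= 0 and pattern[k] == text[s + k]:
            k -= 1  # scan the pattern right to left
        if k < 0:
            return s  # full match at alignment s
        # Line up the mismatched text character with its rightmost
        # occurrence in the pattern; max(1, ...) forbids left shifts.
        s += max(1, k - last.get(text[s + k], -1))
    return -1
```

`max(1, ...)` plays the same role as taking the max of charJump and $m-k+1$ in the note: the pattern always advances by at least one character."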
Boyer-Moore Algorithm.md,1669012068732,Good suffix rule (matchJump),"Good suffix rule (matchJump) Derive maximum shift from the structure of the pattern. Run through each case below in order (1 -> 2 -> 3) so as to shift by the least amount such that a suffix is matched. > [!NOTE] General formula for slide > $(m-k)+(m-q)$ > - k is the position of the mismatch > - q is the position of end of the next matching suffix Case 1: matching suffix occurs earlier in the pattern and is **preceded by a different character**. Case 2: matching suffix occurs at the **start of the pattern**. Case 3: **no occurrence of the matching suffix in the rest of the pattern**. Case 4: mismatch on the first character The last character array entry is always 1. We have no information about matching suffixes from the text as the first comparison is already mismatched. Hence, we can only safely shift by 1 character."
Boyer-Moore Algorithm.md,1669012068732,Example Array,Example Array
Boyer-Moore Algorithm.md,1669012068732,Pseudocode,Pseudocode
Boyer-Moore Algorithm.md,1669012068732,Examples,Examples
Breadth First Search.md,1669012068728,---,"--- title: ""Breadth First Search"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Breadth First Search.md,1669012068728,Breadth First Search,Breadth First Search
Breadth First Search.md,1669012068728,Graph Traversal,Graph Traversal _Assuming ties are handled in alphabetical order_ Expansion Order: A > B > C > D > E > G Final Path: A > C > G
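Breadth First Search.md,1669012068728,BFS Sketch,"BFS Sketch The traversal above can be reproduced with a standard queue-based BFS. The adjacency list below is a hypothetical graph consistent with the expansion order in this note, not one given in the source:

```python
from collections import deque

def bfs(graph, start, goal):
    # Queue-based BFS; neighbours are expanded in sorted
    # (alphabetical) order to match the tie-breaking rule above.
    parent = {start: None}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return order, path[::-1]
        for nb in sorted(graph.get(node, [])):
            if nb not in parent:
                parent[nb] = node
                queue.append(nb)
    return order, None

# Hypothetical graph consistent with the traversal in this note.
graph = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['G']}
order, path = bfs(graph, 'A', 'G')
```

With this graph the expansion order is A, B, C, D, E, G and the final path is A > C > G, matching the note."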
Building Blocks of the Internet.md,1676020289801,---,"--- title: ""Building Blocks of the Internet"" date: 2023-01-17 lastmod: 2023-01-17 ---"
Building Blocks of the Internet.md,1676020289801,Building Blocks of the Internet,"Building Blocks of the Internet The internet consists of billions of end systems (hosts), connected together by a network of communication links and packet switches. - Communication links include wired and wireless: copper wires, optical fibre, radio. - Packet switches such as routers transmit packets of data through a route in the network: > Consider, for example, a factory that needs to move a large amount of cargo to some destination warehouse located thousands of kilometers away. At the factory, the cargo is segmented and loaded into a fleet of trucks. Each of the trucks then independently travels through the network of highways, roads, and intersections to the destination warehouse. At the destination warehouse, the cargo is unloaded and grouped with the rest of the cargo arriving from the same shipment. Thus, in many ways, packets are analogous to trucks, communication links are analogous to highways and roads, packet switches are analogous to intersections, and end systems are analogous to buildings. Just as a truck takes a path through the transportation network, a packet takes a path through a computer network."
Building Blocks of the Internet.md,1676020289801,Internet socket interface,"Internet socket interface How does one program running on one end system instruct the Internet to deliver data to another program running on another end system? End systems attached to the Internet provide a socket interface that specifies how a program running on one end system asks the Internet infrastructure to deliver data to a specific destination program running on another end system. This Internet socket interface is a set of rules that the sending program must follow so that the Internet can deliver the data to the destination program: > Suppose Alice wants to send a letter to Bob using the postal service. Alice, of course, can’t just write the letter (the data) and drop the letter out her window. Instead, the postal service requires that Alice put the letter in an envelope; write Bob’s full name, address, and zip code in the center of the envelope; seal the envelope; put a stamp in the upper-right-hand corner of the envelope; and finally, drop the envelope into an official postal service mailbox. Thus, the postal service has its own “postal service interface,” or set of rules, that Alice must follow to have the postal service deliver her letter to Bob. In a similar manner, the Internet has a socket interface that the program sending data must follow to have the Internet deliver the data to the program that will receive the data."
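Building Blocks of the Internet.md,1676020289801,Socket Interface Example,"Socket Interface Example A minimal sketch of the socket interface described above: the sending program names a destination (host, port) and hands bytes to the infrastructure, which delivers them to the receiving program. Loopback only; port 0 asks the OS for a free port:

```python
import socket

# Receiving program: bind to an address and wait for data.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))  # 0 = let the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

# Sending program: address the destination and send bytes.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))
client.sendall(b'hello')

conn, _ = server.accept()
data = conn.recv(1024)
for s in (conn, client, server):
    s.close()
```
"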
Building Blocks of the Internet.md,1676020289801,Protocols,"Protocols Protocols define format and order of messages sent and received among network entities, and actions taken on message transmission and reception"
Building Blocks of the Internet.md,1676020289801,Protocol Layering,"Protocol Layering Explicit structure allows identification and relationship of the different pieces Modularization eases maintenance and updating of system - change of implementation of layer’s service transparent to rest of system - For example, a change in gate procedure doesn’t affect rest of system Taken together, protocols of the various layers form the protocol stack:"
Building Blocks of the Internet.md,1676020289801,Encapsulation,Encapsulation Each layer encapsulates the payload with additional header information for the next layer to continue the exchange of data with the next layer.
Building Blocks of the Internet.md,1676020289801,Network edge,Network edge Computers and other devices connected to the Internet are called *end systems* because they sit at the edge of the Internet. End systems include both **clients and servers**. The access network is the network that physically connects the end systems to the first router.
Building Blocks of the Internet.md,1676020289801,ISP Access,ISP Access ISP access is how we connect to the ISPs: - Digital Subscriber Line: uses existing telephone line (twisted copper wire) to exchange data with the telco's central office - Fiber To The Home (FTTH): 10 Mbps - 100 Gbps
Building Blocks of the Internet.md,1676020289801,Local Access,Local Access - Ethernet: uses twisted copper wire to connect to an Ethernet switch which in turn connects to the larger internet. - WiFi 802.11
Building Blocks of the Internet.md,1676020289801,Network Core,"Network Core The network core represents the mesh of interconnected routers that make up the ISPs. - Tier 1 ISPs span across the globe. But for customers using different ISPs to exchange data, the ISPs themselves must be connected through Internet Exchange Points (IXP) - Regional ISPs compete with each other and pay T1 ISPs for their traffic - Access ISPs connect to any lower tier ISPs for their traffic - Content provider networks create their own private networks which connects its data centres to the internet, bypassing lower tiered ISPs to bring content close to their customers."
Building Blocks of the Internet.md,1676020289801,Packet switching,"Packet switching Data is broken into smaller chunks called packets, which are transmitted through communication links and packet switches at the **full transmission rate** of the link. Store and forward transmission: the packet switch must receive the entire packet before it can transfer the first bit of the packet."
Building Blocks of the Internet.md,1676020289801,Forwarding Tables and Routing Protocols,"Forwarding Tables and Routing Protocols *How does the router determine which link it should forward the packet onto?* For the Internet, the [Internet Protocol](Notes/Internet%20Protocol.md) dictates a destination IP address in each packet. Each router has a forwarding table that maps destination addresses (or portions of the destination addresses) to that router’s outbound links. When a packet arrives at a router, the router examines the address and searches its forwarding table, using this destination address, to find the appropriate outbound link. *How does the forwarding table get set?* Internet has a number of special routing protocols that are used to automatically set the forwarding tables. A routing protocol may, for example, determine the shortest path from each router to each destination and use the shortest path results to configure the forwarding tables in the routers."
Building Blocks of the Internet.md,1676020289801,Queueing Delay,Queueing Delay
Building Blocks of the Internet.md,1676020289801,Exercises,"Exercises a) What is a communication protocol? A protocol defines the format and order of messages and the set of procedures performed on a message when it is sent or received. b) Name the different layers in the Internet protocol stack, and place the following protocols/functions/concepts at the correct layer: IP, TCP, Ethernet, HTTP, bit coding, FTP, IEEE 802.11 WLAN, TP Category 6, Routing, UDP. | Application | Transport | Network | Link | Physical | | ----------- | --------- | ------- | ----------- | ---------- | | HTTP | TCP | IP | IEEE 802.11 | TP Cat 6 | | FTP | UDP | Routing | Ethernet | bit coding | c) What layer in the Internet protocol stack is responsible for the transfer of a data packet over a single link, between two directly connected devices? Link layer. *d) A router has two main functions, which can be described by the two terms “routing” and “forwarding”. What is the difference between routing and forwarding?* Routing is the global process of determining the route packets take from source to destination, used to set the forwarding tables. Forwarding is the local action of relaying an arriving packet to the appropriate output link. *e) What service does the transport layer describe? Give a short answer* The transfer of data between application processes running on different hosts. $$ \begin{align} &\text{First packet delay} = N*(L/R)\\ &\text{The 2nd packet must wait for the first packet to reach the 2nd router before it is sent}:\\ &\text{2nd packet delay} = N*(L/R)+(L/R)\\ &...\\ &\text{Pth packet delay} = N*(L/R)+(P-1)*(L/R)\\ &\text{All packets sent after Pth delay}:\\ &d_{end-to-end} = (N+P-1)*(L/R) \end{align} $$ a. $d_{prop}=m/s$ b. $d_{trans}=L/R$ c. $d_{end-to-end}=d_{prop}+d_{trans}$ d. At the start of the link e. In the link f. At the destination g. 
$d_{trans}=\frac{120}{56\times10^3}=0.00214s$ $0.00214=m/(2.5\times10^8)$ $m=535.7$ km $$ \begin{align} &\text{All bits must be generated before they can be grouped into packets}:\\ d_{generate}&=56\times8/(64\times10^3)=7ms\\ d_{trans}&=56\times8/(2\times10^6)=0.000224s\\ d_{prop}&=0.01s\\ Total&=17.224ms \end{align} $$ The first bit of host 1 reaches Router A after 0.002s. The last bit of host 1 reaches Router A after $0.002+(1500\times8)/(4\times10^6)=0.005s$. The first bit of host 2 reaches Router A after 0.006s. Hence, no queueing delay is incurred as the last bit of host 1 is propagated before host 2's first bit arrives. No buffering when: $$ d_1+\frac{L}{R_1}< d_2 $$ a. The first packet to be propagated has no queueing delay. The 2nd packet will have $d=L/R$. 3rd packet: $d=2\times L/R$. Nth packet: $d = (N-1)\times L/R$. Avg delay: $$ \begin{align} \frac{1}{N}\sum_{i=0}^{N-1}i\times\frac{L}{R}\\ =\frac{L}{NR}\times\frac{N(N-1)}{2}\\ =\frac{L(N-1)}{2R} \end{align} $$ b. The Nth packet is propagated after $(N-1)L/R$, which is $<NL/R$. The first packet of the next stream will not have to wait. Average queueing delay of such a packet will be the same as in (a)."
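Building Blocks of the Internet.md,1676020289801,Store-and-Forward Delay Example,"Store-and-Forward Delay Example The $d_{end-to-end}=(N+P-1)(L/R)$ result from the exercises can be evaluated directly. A minimal sketch (the function name is illustrative; queueing and propagation delays are ignored, as in the derivation):

```python
def end_to_end_delay(n_links, n_packets, packet_bits, rate_bps):
    # Store-and-forward delay for P packets of L bits over a path
    # of N links, each of rate R bps: (N + P - 1) * (L / R).
    return (n_links + n_packets - 1) * packet_bits / rate_bps
```

For one 1500-byte packet over 3 links at 4 Mbps, `end_to_end_delay(3, 1, 1500 * 8, 4e6)` gives 0.009 s."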
Buffer Pools.md,1669012068719,---,"--- title: ""Buffer Pools"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Buffer Pools.md,1669012068719,Buffer Pools,Buffer Pools An area of memory used as a buffer between the disk and the database system
Buffer Pools.md,1669012068719,Page table,Page table A page table is used to keep track of the pages loaded in the buffer pool. This helps the system to determine if the page is already in buffer without having to go to the disk.
Buffer Pools.md,1669012068719,Buffer Manager,Buffer Manager Used to control the memory in the buffer pool and provide the following features:
Buffer Pools.md,1669012068719,Prefetching,"Prefetching While the current page is being processed, we can prefetch the next required pages to be accessed based on a query plan. This reduces total I/O time as operations are done in parallel."
Buffer Pools.md,1669012068719,Scan sharing,"Scan sharing If a query starts a scan and another query is already scanning the same pages, it attaches to that query's cursor. Once the current query completes, the new query continues scanning the pages that it initially skipped."
Buffer Pools.md,1669012068719,Buffer Replacement Policies,Buffer Replacement Policies Similar to [Cache Replacement Policies](Notes/Cache%20Replacement%20Policies.md). [Page Replacement Policies](Notes/Page%20Replacement%20Policies.md)
Buffer Pools.md,1669012068719,Practice Problems,Practice Problems Process flow: 1. Fetch block 1 2. Process block 1 and Fetch block 2 3. Process block 2 and Fetch block 3 4. Process block 3 a. Total time is 4P; only 4 cycles needed b. R + P + 2R = 3R + P c. R + P + 2P = 3P + R If no pre-fetching: 3(R+P) = 3R + 3P | Reference | LRU | Optimal | | --------- | ------- | ------- | | 5 | ABCDE | ABCDE | | 6 | ABCED | ABCDE | | 7 | BCEDF/A | ABCDF/E | | 8 | BCEFD | ABCDF | | 9 | CEFDG/B | ABCDG/F | | 10 | CEFGD | ABCDG | | 11 | EFGDH/C | ABCDH/G | | 12 | EFGHD | ABCDH | | 13 | FGHDC/E | ABCDH | The LRU is suboptimal in this case because it chooses to replace useful pages like B/C which are needed later. A more optimal strategy is to choose pages for replacement based on the corresponding level of the page in the B-Tree.
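Buffer Pools.md,1669012068719,LRU Simulation Sketch,"LRU Simulation Sketch The LRU column in the practice table can be simulated with an ordered map. A toy sketch (class name illustrative; a real buffer manager also tracks dirty and pin state):

```python
from collections import OrderedDict

class LRUBuffer:
    # Toy buffer pool with LRU replacement: a hit moves the page
    # to the most-recently-used end; a miss evicts the LRU page
    # once capacity is reached.
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)
            return 'hit'
        if len(self.pages) >= self.capacity:
            self.pages.popitem(last=False)  # evict least recently used
        self.pages[page] = True
        return 'miss'
```

For the reference string A, B, C, A, D with capacity 3, the fourth access hits and the fifth evicts B, leaving pages C, A, D."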
Cache Placement Policies.md,1669221683864,---,"--- title: ""Cache Placement Policies"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Cache Placement Policies.md,1669221683864,Cache Placement Policies,Cache Placement Policies We need a way to decide where in the cache a copy of a selected memory block will reside when it is first copied in.
Cache Placement Policies.md,1669221683864,Direct Mapped Cache,"Direct Mapped Cache Each memory block can only be mapped to a single fixed cache line. This means that there are no sets of cache memory, rather each line is a set itself. If another block maps to this set, the previous data block is replaced. E.g. Block 0, Block 64, Block 0, Block 64 etc."
Cache Placement Policies.md,1669221683864,Advantages and Disadvantages,"Advantages and Disadvantages - This placement policy is power efficient as it avoids the search through all the cache lines. - It has a lower cache hit rate, as there is only one cache line available per set. Every time a memory block that maps to an occupied set is referenced, the existing cache line is replaced, which causes conflict misses"
Cache Placement Policies.md,1669221683864,Fully Associative Cache,"Fully Associative Cache To increase flexibility, one way is to allow memory block to be placed anywhere in the cache. This can be framed as a single cache set holding all the cache lines. Index bits are no longer required since there is no distinguishing between sets:"
Cache Placement Policies.md,1669221683864,Advantages and Disadvantages,Advantages and Disadvantages - Fully associative cache structure provides us the flexibility of placing memory block in any of the cache lines and hence full utilization of the cache. - The placement policy provides better cache hit rate. - Offers the use of a wider variety of replacement algorithms on miss - The placement policy is slow as it takes time to iterate through all the lines.
Cache Placement Policies.md,1669221683864,Set-Associative Cache,Set-Associative Cache Trade-off between direct and fully associative cache.
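Cache Placement Policies.md,1669221683864,Address Decomposition Sketch,"Address Decomposition Sketch All three placement policies decompose an address into tag, set index and byte offset; fully associative is simply the case of zero index bits. A minimal sketch (the bit widths in the usage match the worked problems elsewhere in these notes):

```python
def split_address(addr, offset_bits, index_bits):
    # Low bits select the byte within a block, middle bits select
    # the set, and the remaining high bits form the tag.
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For a 5-bit address 10011 with a 3-bit index and no offset, `split_address(0b10011, 0, 3)` returns tag 10, index 011."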
C.md,1669012068716,---,"--- title: ""C"" date: 2022-11-08 lastmod: 2022-11-21 ---"
C.md,1669012068716,C Programming,C Programming
C.md,1669012068716,Special keywords,Special keywords
C.md,1669012068716,Volatile,"Volatile ```c int volatile foo; ``` Volatile is a qualifier that is applied to a variable when it is declared. It tells the compiler that the value of the variable may change at any time, without any action being taken by the code the compiler finds nearby."
C.md,1669012068716,Use in peripheral registers,"Use in peripheral registers These registers may have their values changed asynchronously during program flow. Code without this keyword can be *optimised* by the compiler into an infinite loop. ```c UINT1 * ptr = (UINT1 *) 0x1234; // Wait for register to become non-zero. while (*ptr == 0); // Do something else. ``` Compiler interprets the ptr value is being always 0, as it has already loaded the value in the second line, resulting in an infinite loop: ```assembly mov ptr, 0x1234 mov a, @ptr loop bz loop ``` Same situations can occur for variables that may be modified in [ISRs](Notes/Interrupts.md) or by [multi-threaded applications](Notes/Thread%20Level%20Parallelism.md)."
Cache Replacement Policies.md,1669012068725,---,"--- title: ""Cache Replacement Policies"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Cache Replacement Policies.md,1669012068725,Cache Replacement Policies,Cache Replacement Policies Which block should we replace if there is a cache miss? We need to choose a *victim* - First in first out - [Least Recently Used Policy](Notes/Least%20Recently%20Used%20Policy.md) - Pseudo-random
Broadcast Abstractions.md,1678354533520,---,"--- title: ""Broadcast Abstractions"" date: 2023-02-02 lastmod: 2023-03-07 ---"
Broadcast Abstractions.md,1678354533520,Broadcast Abstractions,Broadcast Abstractions
Broadcast Abstractions.md,1678354533520,Unreliable Broadcast,Unreliable Broadcast Does not guarantee anything. Such events are allowed:
Broadcast Abstractions.md,1678354533520,Best Effort Broadcast,"Best Effort Broadcast Guarantees reliability only if sender is correct - BEB1. Best-effort-Validity: If pi and pj are correct, then any broadcast by pi is eventually delivered by pj - BEB2. No duplication: No message delivered more than once - BEB3. No creation: No message delivered unless broadcast"
Broadcast Abstractions.md,1678354533520,Implementation,"Implementation We can use perfect links: Upon <bebBroadcast | m>, send message m to all processes (for-loop). Correctness - If the sender doesn’t crash, every other correct process receives the message via perfect channels (Validity) - No creation & No duplication are already guaranteed by perfect channels"
Broadcast Abstractions.md,1678354533520,Reliable Broadcast,"Reliable Broadcast BEB gives no guarantees if sender crashes. Reliable strengthens this by giving guarantees even if sender crashes. Guarantee: either all correct processes deliver m or none of them. - RB1 = BEB1. Validity - RB2 = BEB2. No duplication - RB3 = BEB3. No creation - RB4. Agreement. If a correct process delivers m, then every correct process delivers m"
Broadcast Abstractions.md,1678354533520,Fail Stop (Lazy) Implementation,"Fail Stop (Lazy) Implementation - Perfect failure detector (P): use this to detect when process crash - If P is replaced with [Eventually perfect failure detector](Notes/Failure%20Detectors.md#Eventually%20perfect%20failure%20detector): eventual strong accuracy means that some correct processes may be suspected as crashed. However, since we redistribute messages on crash, no property is violated. - BEB: use this to redistribute messages when detect a crash from a process Case 1: detect crash and redistribute Case 2: delivered message, detect crash and redistribute"
Broadcast Abstractions.md,1678354533520,Performance,"Performance Message complexity: best case O(N), worst case O(N^2) Time complexity: best case 1 round. worst case 2 rounds"
Broadcast Abstractions.md,1678354533520,Fail Silent (Eager),Fail Silent (Eager) No failure detector necessary. A pessimistic approach that just redistributes any message by assuming that the process has failed.
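Broadcast Abstractions.md,1678354533520,Eager RB Sketch,"Eager RB Sketch The eager fail-silent approach can be sketched as: on first delivery of a message, relay it to everyone, so if any correct process delivers m, all correct processes do. The in-process simulation below stands in for BEB over perfect links (all names are illustrative, not from the notes):

```python
class EagerReliableBroadcast:
    # Fail-silent eager reliable broadcast: every process
    # re-broadcasts each message the first time it delivers it.
    def __init__(self, pid, send_to_all):
        self.pid = pid
        self.send_to_all = send_to_all  # stands in for bebBroadcast
        self.delivered = set()
        self.log = []  # rbDelivered messages

    def broadcast(self, msg_id, payload):
        self.send_to_all(self.pid, msg_id, payload)

    def on_beb_deliver(self, origin, msg_id, payload):
        if msg_id not in self.delivered:
            self.delivered.add(msg_id)
            self.log.append((origin, payload))
            # Pessimistically relay, assuming the origin may have crashed.
            self.send_to_all(origin, msg_id, payload)

# In-process 'network': bebBroadcast simply delivers to everyone.
procs = {}

def send_to_all(origin, msg_id, payload):
    for p in procs.values():
        p.on_beb_deliver(origin, msg_id, payload)

procs.update({i: EagerReliableBroadcast(i, send_to_all) for i in range(3)})
procs[0].broadcast('m1', 'hello')
```

The delivered set enforces No duplication and terminates the relay recursion after one round per process."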
Broadcast Abstractions.md,1678354533520,Uniform Reliable Broadcast,"Uniform Reliable Broadcast Reliable broadcast creates a problem. If a failed process delivers a message that has a side effect (such as withdrawing some money from an account), the correct processes need not deliver (know of) this side effect. - URB1 = RB1. - URB2 = RB2. - URB3 = RB3. - URB4. Uniform Agreement: For any message m, if a process delivers m, then every correct process delivers m"
Broadcast Abstractions.md,1678354533520,Eager (Fail-stop),"Eager (Fail-stop) Intuition: deliver the message only when we know that every other correct process can deliver the message. If we do not wait for all correct processes (or we do not have the complete set of failed processes using a weaker FD), we might deliver m even though some correct processes did not receive the message, violating agreement."
Broadcast Abstractions.md,1678354533520,Majority-ACK (Fail Silent),Majority-ACK (Fail Silent) Correctness assumption: a majority of processes are always correct. Resilience is N/2 machines can fail
Broadcast Abstractions.md,1678354533520,Causal Broadcast,"Causal Broadcast [Causality](Notes/Distributed%20Abstractions.md#Causality) between broadcast events is preserved by the corresponding delivery events - If broadcast(m1) happens-before broadcast(m2), any delivery(m2) cannot happen-before a delivery(m1) - However, delivering m2 by itself still preserves causal order."
Broadcast Abstractions.md,1678354533520,Examples,"Examples - 3 caused the broadcast of 2, causal order is preserved for {3,2,1} or {3,1,2}"
Broadcast Abstractions.md,1678354533520,Reliable (Fail Silent),Reliable (Fail Silent) Each broadcasted messages carries a history which can be used to ensure causality before delivery. The history is an ordered list of casually preceding messages in the past.
Broadcast Abstractions.md,1678354533520,Fail Silent waiting,Fail Silent waiting
Broadcast Abstractions.md,1678354533520,Orderings,Orderings - Single source FIFO order: delivery is ordered in FIFO order for deliveries of its own broadcasts - Total order: order of delivery is the same across all processes - Causal order: [Causality](Notes/Distributed%20Abstractions.md#Causality)
Cache.md,1689387060732,---,"--- title: ""Cache"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Cache.md,1689387060732,Cache,Cache
Cache.md,1689387060732,Memory Organisation,Memory Organisation Memory in cache is stored as cache lines. A CPU will try to access cache through a memory address:
Cache.md,1689387060732,Instruction cache and Data cache,Instruction cache and Data cache Storing these separately will allow in better parallelism. CPU is able to fetch instructions from instruction cache while writing to data cache for STUR instructions.
Cache.md,1689387060732,[Cache Placement Policies](Notes/Cache%20Placement%20Policies.md),[Cache Placement Policies](Notes/Cache%20Placement%20Policies.md)
Cache.md,1689387060732,[Cache Replacement Policies](Notes/Cache%20Replacement%20Policies.md),[Cache Replacement Policies](Notes/Cache%20Replacement%20Policies.md)
Cache.md,1689387060732,[Cache Write Policies](Notes/Cache%20Write%20Policies.md),[Cache Write Policies](Notes/Cache%20Write%20Policies.md)
Cache.md,1689387060732,Performance,"Performance The key factor affecting cache performance is the effects of cache misses. When a cache miss occurs, compute cycles are needed to find a victim, request the appropriate data from memory, fill the cache line with this new block and resume execution."
Cache.md,1689387060732,Types of Misses,Types of Misses
Cache.md,1689387060732,Compulsory miss,Compulsory miss First reference to a given block of memory. This is an inevitable miss as data has not been accessed before.
Cache.md,1689387060732,Capacity miss,Capacity miss This occurs when the current working set exceeds the cache capacity. Current useful blocks have to be replaced.
Cache.md,1689387060732,Conflict miss,Conflict miss When useful blocks are displaced due to placement policies. E.g. fully associative cache mapping
Cache.md,1689387060732,Design considerations,Design considerations - Number of blocks - More blocks means that we will have larger capacity and result in lesser capacity misses - Associativity - Reduce conflict misses - Increases access time - Block size - Larger blocks exploit spatial locality. More data in the same area is loaded together when requested. - Reduces compulsory misses since more data is loaded at once. - Reduces number of cache blocks for a fixed block size. This leads to increase in conflict miss as higher chance for different data map to the same blocks. - Increases miss penalty as more data needs to be replaced on a miss. - Levels of cache - Using multi-level cache will reduce the miss penalty
Cache.md,1689387060732,False sharing,False sharing An issue arises when different cores have cached values which share the same cache line. The picture below depicts how 2 cores try to access different independent values (X and Y) that resides on the same cache line in L3 cache. If X and Y are highly used variables - Writing to X invalidates the cache line in Core 2 - Writing to Y invalidates the cache line in Core 1
Cache.md,1689387060732,Measuring Impact with CPI,Measuring Impact with CPI $$ \begin{align} &CPU_{time}=(CPU_{\text{execution cycles}}+\text{Memory stall cycles})\times\text{Cycle Time}\\ &CPU_{time}=((IC\times CPI)+(IC\times\%\text{Memory Access}\times\text{Miss Rate}\times\text{Miss Penalty}))\times \text{Cycle Time} \end{align} $$ L1 cache hit can be considered to be part of CPI ideal as it is often possible to complete the data access within the ideal clock cycles.
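Cache.md,1689387060732,CPI Formula Sketch,"CPI Formula Sketch The CPI-with-stalls formula above translates directly into code (names are illustrative; the numbers in the usage match part (b) of the practice problems in this note):

```python
def cpu_time(ic, cpi_ideal, mem_access_frac, miss_rate, miss_penalty,
             cycle_time):
    # Memory stall cycles per instruction =
    #   fraction of memory accesses x miss rate x miss penalty.
    stall_cpi = mem_access_frac * miss_rate * miss_penalty
    return ic * (cpi_ideal + stall_cpi) * cycle_time
```

With IC and cycle time normalised to 1, `cpu_time(1, 1.25, 0.2, 0.2, 30, 1)` reproduces the stalled CPI of 2.45."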
Cache.md,1689387060732,Example,Example
Cache.md,1689387060732,Multi-level cache example,Multi-level cache example
Cache.md,1689387060732,Measuring Impact with Average Memory Access Time (AMAT),"Measuring Impact with Average Memory Access Time (AMAT) We need a way to measure the performance of cache, standalone from the performance of the CPU. AMAT is the average time to access memory considering both hits and misses $$ \begin{aligned} AMAT&=\text{Hit Time}\times(1-\text{Miss Rate})+\text{Miss Rate}\times(\text{Hit Time}+\text{Miss Penalty})\\ &=\text{Time for hit}+\text{Miss Rate}\times\text{Miss Penalty}\\ \end{aligned} $$ Note here that *Miss Penalty* is the loss in cycles for a miss, and not just the cost of main memory access. e.g If time to hit cache = 1, time to hit main memory = 100, miss penalty is $100-1=99$. One example to show that AMAT is superior would be to consider two different caches with similar miss rates, but drastically different hit times. Using the miss rate metric, we would rate both caches the same. Using the AMAT metric, a cache with a lower hit time or lower miss penalty will outperform a cache with a higher respective time, assuming all other variables are the same."
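Cache.md,1689387060732,AMAT Sketch,"AMAT Sketch The AMAT expression above reduces to hit time plus miss rate times miss penalty. A minimal sketch (the 5% miss rate in the usage is an assumed figure, not from the notes):

```python
def amat(hit_time, miss_rate, miss_penalty):
    # Every access pays the hit time; a fraction 'miss_rate' of
    # accesses additionally pay the miss penalty.
    return hit_time + miss_rate * miss_penalty
```

Using the note's numbers (hit = 1 cycle, main memory = 100 cycles, so penalty = 99) with an assumed 5% miss rate, `amat(1, 0.05, 99)` gives 5.95 cycles."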
Cache.md,1689387060732,Practice Problems,"Practice Problems 3 bit index, 2bit tag with 0 bit offset. | Addr | Index | Tag | H/M | State | | ----- | ----- | --- | --- | ------ | | 10011 | 011 | 10 | M | 001,10 | | 00001 | 001 | 00 | H | | | 00110 | 110 | 00 | H | | | 01010 | 010 | 01 | M | 010,01 | | 01110 | 110 | 01 | M | 110,01 | | 11001 | 001 | 11 | M | 001,11 | | 00001 | 001 | 00 | M | 001,00 | | 11100 | 100 | 11 | M | 100,11 | | 10100 | 100 | 10 | M | 100,10 | Bits for offset = $log_24=2$ 2 way set associative cache: Each set contains 2 cache lines -> 2 blocks per set -> 8 bytes per set -> 2 sets Bits for index -> 1 | Access | 10001101 | 10110010 | 10111111 | 100001100 | 10011100 | 11101001 | 11111110 | 11101001 | | ------ | -------- | -------- | -------- | --------- | -------- | -------- | -------- | -------- | | Tag | 10001 | 10110 | 10111 | 10001 | 10011 | 11101 | 11111 | 11101 | | Offset | 01 | 10 | 11 | 00 | 00 | 01 | 10 | 01 | | Index | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | | H/M | M | M | M | H | M | M | M | H | | LRU1 | 10001 | 10110 | 10111 | 10001 | 10011 | 11101 | 11111 | 11101 | | LRU2 | NA | NA | 10001 | 10111 | 10001 | 10110 | 10001 | 10110 | Hit rate: 2/8 = 25% a. $$ \begin{align} &T=IC\times CPI\times Period\\ &CPI_{ideal}=1.25\\ &\text{L1 stall cycles}=0.2\times0.2\times8=0.32\\ &\text{L2 stall cycles}=0.2\times0.2\times0.1\times30=0.12\\ &\text{CPI stall}=1.25+0.32+0.12=1.69\\ \end{align} $$ b. $$ \begin{align} &\text{L1 stall cycles}=0.2\times0.2\times30=1.2\\ &\text{CPI stall}=1.25+1.2=2.45 \end{align} $$"
Cache Write Policies.md,1669012068710,---,"--- title: ""Cache Write Policies"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Cache Write Policies.md,1669012068710,Cache Write Policies,Cache Write Policies How do we keep memory updated while writing on the cache?
Cache Write Policies.md,1669012068710,Write-through,"Write-through Every write to the cache will lead to subsequent writes to the rest of the memory hierarchy, L1 -> L2 -> Main Memory -> Disk."
Cache Write Policies.md,1669012068710,Advantages,Advantages - Memory coherency
Cache Write Policies.md,1669012068710,Disadvantages,"Disadvantages - High bandwidth requirement, every cache write results in high latency - Memory becomes slow as size increases"
Cache Write Policies.md,1669012068710,Write-back,Write-back Only write memory to rest of the memory hierarchy on cache replacement. Maintain the state of each cache line as the following - Invalid: not present - Clean: present and unmodified - Dirty: present and modified Update the state bit on modification.
Cache Write Policies.md,1669012068710,Disadvantages,"Disadvantages - [Cache Coherence](Notes/Thread%20Level%20Parallelism.mdCache%20Coherence): in multi-processors where each core maintains its own level of cache, if one core needs to access the data that has been modified by another core, they will get the stale data from memory as updated data is still in that core's own cache and has not been propagated. - Coherent I/O: I/O devices are able to use [DMA](Notes/Direct%20Memory%20Access.md) and access stale copies of data in main memory."
Capital Budgeting.md,1669012068705,---,"--- title: ""Capital Budgeting"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Capital Budgeting.md,1669012068705,Capital Budgeting,Capital Budgeting
Capital Budgeting.md,1669012068705,NPV,NPV Net Present Value: calculate the present value of future cash flows in order to determine whether a project is worth undertaking. Prefer the project with the higher NPV: $NPV_1\ge NPV_2$.
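Capital Budgeting.md,1669012068705,NPV Example,"NPV Example A short sketch of the discounting itself; the cash flows and the 10% required return are made-up illustration numbers rather than figures from these notes.

```python
def npv(rate, cashflows):
    # cashflows[t] occurs at the end of year t; index 0 is today
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

project = [-1000, 400, 400, 400]     # outlay today, then three inflows
print(round(npv(0.10, project), 2))  # -5.26
```

A negative NPV means the discounted inflows do not cover the outlay, so this project should be rejected at a 10% required return."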
Capital Budgeting.md,1669012068705,Calculating NPV of projects,"Calculating NPV of projects NOWC: the net change in accounts receivable, accounts payable, and inventory during the measurement period. __An increase in working capital uses cash, while a decrease produces cash.__"
Capital Budgeting.md,1669012068705,Unequal Life Projects (Fixed term),Unequal Life Projects (Fixed term)
Capital Budgeting.md,1669012068705,IRR,"IRR The rate of return which makes the NPV of a project = 0. A higher IRR is better. _Intuitively, if the IRR is higher, this means that the cashflows are equivalent to returns at that level of interest rate_. Dependent only on the cashflows and not the required return. Cons: - Non-conventional cashflows make the IRR method unreliable as there may be more than 1 IRR - The IRR can also be 0 - Assumes cash flows are reinvested at the IRR and not the WACC (unrealistic) Relationship to NPV: - IRR and NPV give the same decision for independent projects. - NPV should be used for mutually exclusive projects"
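Capital Budgeting.md,1669012068705,IRR Example,"IRR Example Since the IRR is the rate at which NPV crosses zero, a conventional cash flow (one sign change) can be solved by simple bisection. A hedged sketch with made-up numbers; real tools use faster root finders.

```python
def npv(rate, cashflows):
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cashflows))

def irr(cashflows, lo=0.0, hi=1.0, tol=1e-9):
    # assumes npv is positive at lo and negative at hi: one root in [lo, hi]
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if npv(mid, cashflows) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

project = [-1000, 400, 400, 400]
rate = irr(project)
print(round(rate * 100, 2))   # about 9.7 (percent)
```

With more than one sign change the NPV curve can cross zero several times, which is exactly the multiple-IRR problem listed in the cons above."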
Capital Budgeting.md,1669012068705,Choosing between projects (Crossover rate),Choosing between projects (Crossover rate) Crossover rate is the rate which we are indifferent between 2 projects. Both projects have NPV = 0.
Capital Budgeting.md,1669012068705,MIRR,MIRR The internal rate of return which makes the present value of cash outflows equal to the present value of the terminal value (FV of cash inflows) of the project. Handles the multiple-IRR problem by combining all outflows into one (-) cash flow at time 0 and all inflows into one (+) cash flow at the terminal year.
Chain Matrix Multiplication.md,1669012068708,---,"--- title: ""Chain Matrix Multiplication"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Chain Matrix Multiplication.md,1669012068708,Chain Matrix Multiplication,Chain Matrix Multiplication
Chain Matrix Multiplication.md,1669012068708,Problem Formulation,"Problem Formulation Satisfaction of the Principle of Optimality: Let $A_i$ represent the $i^{th}$ matrix with dimensions $(d_{i-1}\times d_i)$. The optimal way to multiply matrices $A_i\text{ to }A_j$ can be broken down into the optimal way to multiply the matrices $A_i\text{ to }A_k$ and $A_{k+1}\text{ to }A_j$ (for some $k$), plus the cost to multiply the two resulting matrices $= d_i \times d_k \times d_j$. Define $OptCost(i,j)$ to be the optimal cost of multiplying matrices with dimensions $d_i, d_{i+1},...d_j$ Base case: $$OptCost(i,j) = 0 \qquad j-i=1$$ The recursive equation can be formed as follows: loop through possible $k$ to find the min $$OptCost(i,j) = \min_{i+1\le k\le j-1}(OptCost(i,k)+OptCost(k,j)+d_i\times d_k\times d_j) $$"
Chain Matrix Multiplication.md,1669012068708,Strategy,Strategy Store the solutions to subproblems in _cost 2d array_. $Cost[i][j]$ represents the optimal cost of multiplying matrices $A_{i+1} \to A_j$ Store the optimal values of k (index to split the matrix multiplication) in _last 2d array_.
Chain Matrix Multiplication.md,1669012068708,Pseudocode,"Pseudocode ``` java // Matrix A[i] has dimension dims[i-1] x dims[i] for i = 1..n MatrixChainOrder(int dims[]) { // length[dims] = n + 1 n = dims.length - 1; // m[i,j] = Minimum number of scalar multiplications (i.e., cost) // needed to compute the matrix A[i]A[i+1]...A[j] = A[i..j] // The cost is zero when multiplying one matrix for (i = 1; i <= n; i++) m[i, i] = 0; for (len = 2; len <= n; len++) { // Subsequence lengths for (i = 1; i <= n - len + 1; i++) { j = i + len - 1; m[i, j] = MAXINT; for (k = i; k <= j - 1; k++) { cost = m[i, k] + m[k+1, j] + dims[i-1]*dims[k]*dims[j]; if (cost < m[i, j]) { m[i, j] = cost; s[i, j] = k; // Index of the subsequence split that achieved minimal cost } } } } } ```"
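Chain Matrix Multiplication.md,1669012068708,Runnable Version,"Runnable Version The pseudocode above translated into runnable Python (a sketch; variable names follow the pseudocode). dims has n+1 entries and matrix A[i] is dims[i-1] x dims[i].

```python
def matrix_chain_order(dims):
    n = len(dims) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]   # minimal costs
    s = [[0] * (n + 1) for _ in range(n + 1)]   # best split points
    for length in range(2, n + 1):              # subsequence lengths
        for i in range(1, n - length + 2):
            j = i + length - 1
            m[i][j] = float('inf')
            for k in range(i, j):               # split A[i..k] | A[k+1..j]
                cost = m[i][k] + m[k + 1][j] + dims[i - 1] * dims[k] * dims[j]
                if cost < m[i][j]:
                    m[i][j], s[i][j] = cost, k
    return m[1][n], s

# dimensions from the worked exercise: 20x2, 2x15, 15x40, 40x4
cost, s = matrix_chain_order([20, 2, 15, 40, 4])
print(cost)   # 1680
```

The split table gives s[1][4] = 1 and s[2][4] = 3, i.e. the order $A_1*((A_2*A_3)*A_4)$, matching the worked exercise in this note."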
Chain Matrix Multiplication.md,1669012068708,Exercises,"Exercises Suppose the dimensions of the matrices A, B, C, and D are 20x2, 2x15, 15x40, and 40x4, respectively, and we want to know how best to compute AxBxCxD. Show the arrays cost, last, and multOrder computed by Algorithms matrixOrder() in the lecture notes. | | 0 | 1 | 2 | 3 | 4 | | --- | --- | --- | --- | -------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | | 0 | - | 0 | 600 | min{$(0,1)+(1,3)+d_0*d_1*d_3$ <br> $(0,2)+(2,3)+d_0*d_2*d_3$} = 2800 | min{$(0,1)+(1,4)+d_0*d_1*d_4$ <br> $(0,2)+(2,4)+d_0*d_2*d_4$ <br> $(0,3)+(3,4)+d_0*d_3*d_4$} = 1680 | | 1 | | | 0 | 1200 | min{$(1,2)+(2,4)+d_1*d_2*d_4$ <br> $(1,3)+(3,4)+d_1*d_3*d_4$} = 1520 | | 2 | | | | 0 | 2400 | | 3 | | | | | 0 | | 4 | | | | | | Last Table | | 0 | 1 | 2 | 3 | 4 | | --- | --- | --- | --- | --- | --- | | 0 | - | - | 1 | 1 | 1 | | 1 | | | | 2 | 3 | | 2 | | | | | 3 | | 3 | | | | | | | 4 | | | | | | Final order: $A_1*((A_2*A_3)*A_4)$"
Chain Matrix Multiplication.md,1669012068708,Greedy heuristic,Greedy heuristic
Classes.md,1685019841469,---,"--- title: ""Classes"" date: 2023-05-24 ---"
Classes.md,1685019841469,Classes,"Classes Updating our grammar: ``` declaration → classDecl | funDecl | varDecl | statement ; classDecl → ""class"" IDENTIFIER ""{"" function* ""}"" ; ``` In plain English, a class declaration is the class keyword, followed by the class’s name, then a curly-braced body. Inside that body is a list of method declarations. Unlike function declarations, methods don’t have a leading fun keyword. Each method is a name, parameter list, and body. Parser: ```java private Stmt classDeclaration() { Token name = consume(IDENTIFIER, ""Expect class name.""); consume(LEFT_BRACE, ""Expect '{' before class body.""); List<Stmt.Function> methods = new ArrayList<>(); while (!check(RIGHT_BRACE) && !isAtEnd()) { methods.add(function(""method"")); } consume(RIGHT_BRACE, ""Expect '}' after class body.""); return new Stmt.Class(name, methods); } ``` Interpreter: ```java @Override public Void visitClassStmt(Stmt.Class stmt) { environment.define(stmt.name.lexeme, null); LoxClass klass = new LoxClass(stmt.name.lexeme); environment.assign(stmt.name, klass); return null; } ```"
Classes.md,1685019841469,Instances,"Instances We create instances by *calling* a class name: ```java class LoxClass implements LoxCallable { @Override public Object call(Interpreter interpreter, List<Object> arguments) { LoxInstance instance = new LoxInstance(this); return instance; } ```"
Classes.md,1685019841469,Properties,"Properties Properties are accessed using a `.` syntax. `someObject.someProperty` Updating grammar: ``` call → primary ( ""("" arguments? "")"" | ""."" IDENTIFIER )* ; ``` After a primary expression, we allow a series of any mixture of parenthesized calls and dotted property accesses (i.e. get expressions)."
Classes.md,1685019841469,Get Expressions,"Get Expressions A get expression stores the name and the expression: `""Get: Expr object, Token name"",` The get expression will call the `get` method on a LoxInstance, returning named fields in the class: ```java @Override public Object visitGetExpr(Expr.Get expr) { Object object = evaluate(expr.object); if (object instanceof LoxInstance) { return ((LoxInstance) object).get(expr.name); } throw new RuntimeError(expr.name, ""Only instances have properties.""); } Object get(Token name) { if (fields.containsKey(name.lexeme)) { return fields.get(name.lexeme); } throw new RuntimeError(name, ""Undefined property '"" + name.lexeme + ""'.""); } ```"
Classes.md,1685019841469,Set Expressions,"Set Expressions Assignment now supports dotted identifiers on the left hand: ``` assignment → ( call ""."" )? IDENTIFIER ""="" assignment | logic_or ; ``` However, the reference to `call` allows any high-precedence expression before the last dot, including any number of _getters_:"
Classes.md,1685019841469,Methods,"Methods For each method, we create a new LoxFunction and add that to the class via a hashmap. ```java Map<String, LoxFunction> methods = new HashMap<>(); for (Stmt.Function method : stmt.methods) { LoxFunction function = new LoxFunction(method, environment); methods.put(method.name.lexeme, function); } LoxClass klass = new LoxClass(stmt.name.lexeme, methods); ```"
Classes.md,1685019841469,This,"This ```java class Person { sayName() { print this.name; } } var jane = Person(); jane.name = ""Jane""; var bill = Person(); bill.name = ""Bill""; bill.sayName = jane.sayName; bill.sayName(); // ? ``` Does that last line print “Bill” because that’s the instance that we _called_ the method through, or “Jane” because it’s the instance where we first grabbed the method? *Bound methods*: if you take a reference to a method on some object so you can use it as a callback later, you want to remember the instance it belonged to, even if that callback happens to be stored in a field on some other object. We need to take `this` at the point that the method is accessed and attach it to the function through a closure. Put this into the current scope in resolver: ```java beginScope(); scopes.peek().put(""this"", true); ``` For each method, bind this into its closure environment: ```java LoxFunction bind(LoxInstance instance) { Environment environment = new Environment(closure); environment.define(""this"", instance); return new LoxFunction(declaration, environment); } ```"
Classes.md,1685019841469,Constructors,"Constructors Lox uses `init` as a constructor. Store whether a LoxFunction is an initializer or not ```java private final boolean isInitializer; LoxFunction(Stmt.Function declaration, Environment closure, boolean isInitializer) { this.isInitializer = isInitializer; ``` If it is, we get the bound instance in the function closure: ```java if (isInitializer) return closure.getAt(0, ""this""); ```"
Classes.md,1685019841469,Inheritance,"Inheritance Lox uses the `<` to define an *extends* relationship: ``` classDecl → ""class"" IDENTIFIER ( ""<"" IDENTIFIER )? ""{"" function* ""}"" ; ``` The class expression must now capture the superclass relationship: ``` ""Class : Token name, Expr.Variable superclass,"" + "" List<Stmt.Function> methods"", ``` Look for the method in the current class before walking up the superclass chain: ```java if (superclass != null) { return superclass.findMethod(name); } ```"
Classes.md,1685019841469,Super,"Super With `this`, the keyword works sort of like a magic variable, and the expression is that one lone token. But with `super`, the subsequent `.` and property name are inseparable parts of the `super` expression. You can’t have a bare `super` token all by itself. ``` primary → ""true"" | ""false"" | ""nil"" | ""this"" | NUMBER | STRING | IDENTIFIER | ""("" expression "")"" | ""super"" ""."" IDENTIFIER ; ``` The super expression contains the keyword and its method access: ``` ""Super : Token keyword, Token method"", ``` a super expression starts the method lookup from “the superclass”, but which superclass? The naïve answer is the superclass of this, the object the surrounding method was called on. That coincidentally produces the right behavior in a lot of cases, but that’s not actually correct. ```java class A { method() { print ""A method""; } } class B < A { method() { print ""B method""; } test() { super.method(); } } class C < B {} C().test(); // A method ``` Instead, lookup should start on the superclass of _the class containing the `super` expression_. In this case, since `test()` is defined inside B, the `super` expression inside it should start the lookup on _B_’s superclass—A. One important difference is that we bound `this` when the method was _accessed_. The same method can be called on different instances and each needs its own `this`. With `super` expressions, the superclass is a fixed property of the _class declaration itself_. Every time you evaluate some `super` expression, the superclass is always the same. That means we can create the environment for the superclass once, when the class definition is executed. Immediately before we define the methods, we make a new environment to bind the class’s superclass to the name `super`"
Clock (or Second Chance) Policy.md,1669012068698,---,"--- title: ""Clock (or Second Chance) Policy"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Clock (or Second Chance) Policy.md,1669012068698,Clock Replacement Algorithm,"Clock Replacement Algorithm <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/b-dRK8B8dQk?start=319"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"" allowfullscreen></iframe>"
Clock (or Second Chance) Policy.md,1669012068698,Idea,"Idea 1. Keep a circular list of items in memory 2. A ""clock hand"" is used to suggest the next item for eviction 3. It initially points to the oldest page 4. Maintain a _use-bit_ for each item: this will tell us if the item has been accessed recently. Initially, all items have use-bit = 0. 5. Each time an item is accessed, change the use-bit to 1 6. When choosing item to be evicted: 1. If the item has use-bit = 1, we reset it back to 0 (this item has used its second chance) 2. Else we evict it (no second chance, replace it) 7. When bringing in the item (for our case): 1. We do not set the use bit for the incoming page 2. **We advance the clock hand to the next item after bringing in the page** In the worst case, the algorithm behaves the same as FIFO: when all pages have their use-bit set to 1, the clock hand will eventually come back around to the oldest page and replace it."
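Clock (or Second Chance) Policy.md,1669012068698,Sketch in Code,"Sketch in Code The steps above can be sketched as a short page-fault simulator (illustrative Python; the access string is made up):

```python
def clock(accesses, frames):
    memory = [None] * frames      # circular list of resident pages
    use = [0] * frames            # one use-bit per frame
    hand = 0
    faults = 0
    for page in accesses:
        if page in memory:
            use[memory.index(page)] = 1      # grant a second chance
            continue
        faults += 1
        while use[hand] == 1:                # spend second chances
            use[hand] = 0
            hand = (hand + 1) % frames
        memory[hand] = page                  # evict; incoming use-bit stays 0
        hand = (hand + 1) % frames           # advance past the new page
    return faults

print(clock([1, 2, 3, 1, 4, 5], frames=3))   # 5 faults
```

When page 4 arrives the hand clears page 1's use-bit and evicts page 2 instead, which is the second chance in action."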
Class Diagrams.md,1669012068703,---,"--- title: ""Class Diagrams"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Class Diagrams.md,1669012068703,Class Diagrams,Class Diagrams
Class Diagrams.md,1669012068703,Basic Notation,Basic Notation
Class Diagrams.md,1669012068703,Visibility Modifiers,Visibility Modifiers \+ : public \- : private \# : protected ~ : package private
Class Diagrams.md,1669012068703,Associations,Associations
Class Diagrams.md,1669012068703,Stereotypes,"Stereotypes > [!NOTE] Heuristics for identifying entity objects > - Terms that developers or users need to clarify in order to understand the use case > - Recurring nouns in the use cases (e.g., Incident) > - Real-world entities that the system needs to track (e.g., FieldOfficer, Dispatcher, Resource) > - Real-world activities that the system needs to track (e.g., EmergencyOperationsPlan) > - Data sources or sinks (e.g., Printer). > [!NOTE] Heuristics for identifying boundary objects > - Identify user interface controls that the user needs to initiate the use case (e.g., ReportEmergencyButton). > - Identify forms the user needs to enter data into the system (e.g., EmergencyReportForm). > - Identify notices and messages the system uses to respond to the user (e.g., AcknowledgmentNotice). > - When multiple actors are involved in a use case, identify actor terminals (e.g., DispatcherStation) to refer to the user interface under consideration. > [!NOTE] Heuristics for identifying control objects > - Identify one control object per use case. > - Identify one control object per actor in the use case. > - The life span of a control object should cover the extent of the use case or the extent of a user session. If it is difficult to identify the beginning and the end of a control object activation, the corresponding use case probably does not have well-defined entry and exit conditions."
Combinational Circuits.md,1669012068695,---,"--- title: ""Combinational Circuits"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Combinational Circuits.md,1669012068695,Combinational Circuits,Combinational Circuits
Combinational Circuits.md,1669012068695,Multiplexer,Multiplexer A multiplexer is used to select 1 out of n inputs.
Combinational Circuits.md,1669012068695,Decoder,Decoder A decoder is used to select a 1-hot output based on an n bit input.
Clean Code.md,1669012068701,---,"--- title: ""Clean Code"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Clean Code.md,1669012068701,Clean Code - Notes,Clean Code - Notes
Clean Code.md,1669012068701,Comments,Comments
Clean Code.md,1669012068701,Explain yourself in code,Explain yourself in code ```go //check to see if employee is eligible for full benefits if (employee.flags & HOURLY_FLAG) && employee.age > 65 ``` Create a function to describe the comment: ```go if (employee.isEligibleForFullBenefits()) ```
Compiling Expressions.md,1685203101924,---,"--- title: ""Compiling Expressions"" date: 2023-05-26 ---"
Compiling Expressions.md,1685203101924,Compiling Expressions,Compiling Expressions
Compiling Expressions.md,1685203101924,Prefix expressions,"Prefix expressions ```c static void unary() { TokenType operatorType = parser.previous.type; // Compile the operand. expression(); // Emit the operator instruction. switch (operatorType) { case TOKEN_MINUS: emitByte(OP_NEGATE); break; default: return; // Unreachable. } } ``` Ensuring precedence: Here, the operand to `-` should be just the `a.b` expression, not the entire `a.b + c`. ``` -a.b + c; ``` When parsing the operand to unary `-`, we need to compile only expressions at a certain precedence level or higher. In jlox’s recursive descent parser we accomplished that by calling into the parsing method for the lowest-precedence expression we wanted to allow (in this case, `call()`). Each method for parsing a specific expression also parsed any expressions of higher precedence too, so that included the rest of the precedence table."
Compiling Expressions.md,1685203101924,Infix expressions,Infix expressions ```c static void binary() { TokenType operatorType = parser.previous.type; ParseRule* rule = getRule(operatorType); parsePrecedence((Precedence)(rule->precedence + 1)); switch (operatorType) { case TOKEN_PLUS: emitByte(OP_ADD); break; case TOKEN_MINUS: emitByte(OP_SUBTRACT); break; case TOKEN_STAR: emitByte(OP_MULTIPLY); break; case TOKEN_SLASH: emitByte(OP_DIVIDE); break; default: return; // Unreachable. } } ```
Compiling Expressions.md,1685203101924,Vaughan Pratt Parser,"Vaughan Pratt Parser We also know we need a table that, given a token type, lets us find - the function to compile a prefix expression starting with a token of that type, - the function to compile an infix expression whose left operand is followed by a token of that type, and - the precedence of an infix expression that uses that token as an operator. Here’s how the entire function works: At the beginning of `parsePrecedence()`, we look up a prefix parser for the current token. The first token is _always_ going to belong to some kind of prefix expression, by definition. It may turn out to be nested as an operand inside one or more infix expressions, but as you read the code from left to right, the first token you hit always belongs to a prefix expression. After parsing that, which may consume more tokens, the prefix expression is done. Now we look for an infix parser for the next token. If we find one, it means the prefix expression we already compiled might be an operand for it. But only if the call to `parsePrecedence()` has a `precedence` that is low enough to permit that infix operator. If the next token is too low precedence, or isn’t an infix operator at all, we’re done. We’ve parsed as much expression as we can. Otherwise, we consume the operator and hand off control to the infix parser we found. It consumes whatever other tokens it needs (usually the right operand) and returns back to `parsePrecedence()`. Then we loop back around and see if the _next_ token is also a valid infix operator that can take the entire preceding expression as its operand. We keep looping like that, crunching through infix operators and their operands until we hit a token that isn’t an infix operator or is too low precedence and stop. 
Function calls for (-1 + 2) * 3 - -4: ``` expression | parsePrecedence(PREC_ASSIGNMENT) | | grouping | | | expression | | | | parsePrecedence(PREC_ASSIGNMENT) | | | | | unary // for ""-"" | | | | | | parsePrecedence(PREC_UNARY) | | | | | | | number | | | | | binary // for ""+"" | | | | | | parsePrecedence(PREC_FACTOR) // PREC_TERM + 1 | | | | | | | number | | binary // for ""*"" | | | parsePrecedence(PREC_UNARY) // PREC_FACTOR + 1 | | | | number | | binary // for ""-"" | | | parsePrecedence(PREC_FACTOR) // PREC_TERM + 1 | | | | unary // for ""-"" | | | | | parsePrecedence(PREC_UNARY) | | | | | | number ```"
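Compiling Expressions.md,1685203101924,Pratt Loop Sketch,"Pratt Loop Sketch The control flow described above can be condensed into a tiny Pratt-style evaluator (an illustrative Python sketch, not clox's table-driven C; the precedence names and the fact that it evaluates directly instead of emitting bytecode are simplifications):

```python
import operator
import re

PREC_TERM, PREC_FACTOR, PREC_UNARY = 1, 2, 3
INFIX = {'+': (PREC_TERM, operator.add), '-': (PREC_TERM, operator.sub),
         '*': (PREC_FACTOR, operator.mul), '/': (PREC_FACTOR, operator.truediv)}

def tokenize(src):
    return re.findall(r'\d+|[-+*/()]', src)

class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def advance(self):
        self.pos += 1
        return self.tokens[self.pos - 1]

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def parse_precedence(self, precedence):
        tok = self.advance()
        # prefix rules: grouping, unary minus, number literal
        if tok == '(':
            value = self.parse_precedence(0)
            assert self.advance() == ')'
        elif tok == '-':
            value = -self.parse_precedence(PREC_UNARY)
        else:
            value = int(tok)
        # infix loop: continue while the next operator binds >= precedence
        while self.peek() in INFIX and INFIX[self.peek()][0] >= precedence:
            prec, fn = INFIX[self.advance()]
            value = fn(value, self.parse_precedence(prec + 1))  # left-assoc
        return value

def evaluate(src):
    return Parser(tokenize(src)).parse_precedence(0)

print(evaluate('(-1 + 2) * 3 - -4'))   # 7
```

The call into parse_precedence(prec + 1) for the right operand is the same one-level precedence bump that binary() performs above."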
Complexity Analysis.md,1669012068690,---,"--- title: ""Complexity Analysis"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Complexity Analysis.md,1669012068690,Complexity Analysis,Complexity Analysis
Complexity Analysis.md,1669012068690,Asymptotic Notations,Asymptotic Notations Notations used to describe the order of growth of a given function
Complexity Analysis.md,1669012068690,Big-O $O(f(x))$,Big-O $O(f(x))$ Taking the limit of the ratio of the two functions as $n$ tends to infinity gives a constant C such that $$\begin{align}\lim_{n\to \infty}\frac{f(n)}{g(n)}=C \\ C=0 \ or\ 0<C<\infty \end{align}$$
Complexity Analysis.md,1669012068690,Big-Omega $\Omega(f(x))$,Big-Omega $\Omega(f(x))$ Taking the limit of the ratio of the two functions as $n$ tends to infinity gives a constant C such that $$\begin{align}\lim_{n\to \infty}\frac{f(n)}{g(n)}=C \\ C=\infty \ or\ 0<C<\infty \end{align}$$
Complexity Analysis.md,1669012068690,Big-Theta $\theta(f(x))$,Big-Theta $\theta(f(x))$ Taking the limit of the ratio of the two functions as $n$ tends to infinity gives a constant C such that $$\begin{align}\lim_{n\to \infty}\frac{f(n)}{g(n)}=C \\ 0<C<\infty \end{align}$$
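Complexity Analysis.md,1669012068690,Worked Limit Example,"Worked Limit Example A quick application of the limit test above (illustrative functions, not from the notes): take $f(n)=3n^2+5n$ and $g(n)=n^2$ $$\begin{align}\lim_{n\to \infty}\frac{3n^2+5n}{n^2}=\lim_{n\to \infty}\left(3+\frac{5}{n}\right)=3 \\ 0<3<\infty \end{align}$$ so $f(n)\in \theta(n^2)$, and therefore also $f(n)\in O(n^2)$ and $f(n)\in \Omega(n^2)$."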
Complexity Analysis.md,1669012068690,Properties,"Properties $$f(n)\in O(h(n)),g(n)\in O(h(n)) \implies f(n)+g(n)\in O(h(n))$$"
Complexity Analysis.md,1669012068690,Example function comparisons,Example function comparisons
Complexity Analysis.md,1669012068690,Order of Common Functions,Order of Common Functions
Complexity Analysis.md,1669012068690,Specific Complexities,Specific Complexities - [Polynomial Time Complexity](Notes/Polynomial%20Time%20Complexity.md) - [Pseudo-Polynomial Time Complexity](Notes/Pseudo-Polynomial%20Time%20Complexity.md)
Clustering.md,1678738858379,---,"--- title: ""Clustering"" date: 2023-03-02 ---"
Clustering.md,1678738858379,Clustering,"Clustering A form of unsupervised learning. Rather than having prior knowledge of the classes which the data can be in, we want the machine to infer the possible groupings/classes from unlabelled data:"
Clustering.md,1678738858379,K-means clustering,K-means clustering - Using the Euclidean distance as the minimising function causes it to favour spherical clusters which may not suit linearly separated data points:
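Clustering.md,1678738858379,K-means Sketch,"K-means Sketch A minimal sketch of Lloyd's algorithm for k-means on 1-D points (illustrative data, k=2): assign each point to its nearest centre, then move each centre to the mean of its cluster, and repeat.

```python
def kmeans(points, centres, iters=20):
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in points:                         # assignment step
            nearest = min(range(len(centres)),
                          key=lambda i: (p - centres[i]) ** 2)
            clusters[nearest].append(p)
        centres = [sum(c) / len(c) if c else centres[i]   # update step
                   for i, c in enumerate(clusters)]
    return centres, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centres, clusters = kmeans(points, centres=[0.0, 5.0])
print(centres)   # close to [1.0, 9.0]
```

The squared Euclidean distance in the assignment step is what biases k-means towards spherical clusters, as noted above."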
Clustering.md,1678738858379,Expectation Maximisation,"Expectation Maximisation Rather than minimizing the Euclidean distances of the data points to the cluster, try to maximize the probability of the data point being generated from the cluster."
Clustering.md,1678738858379,Step 1,Step 1 Introduce a hidden variable $h$ for each desired cluster and initialize the starting parameters for each cluster distribution
Clustering.md,1678738858379,Step 2,Step 2
Clustering.md,1678738858379,Step 3,Step 3 Update the values of the model parameters iteratively:
Clustering.md,1678738858379,Comparison to K-means,Comparison to K-means
Communication Diagrams.md,1669012068692,---,"--- title: ""Communication Diagrams"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Communication Diagrams.md,1669012068692,Communication Diagrams,Communication Diagrams
Computer Power.md,1669216694882,---,"--- title: ""Computer Power"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Computer Power.md,1669216694882,Computer Power,Computer Power
Computer Power.md,1669216694882,Power dissipation,Power dissipation
Computer Power.md,1669216694882,Dynamic Power,Dynamic Power Dissipated only when computation is performed
Computer Power.md,1669216694882,Static Power,"Static Power Due to leakage current and dissipated whenever the system is powered on. Thus, it is possible that heat reducing solutions like a heat sink can help to reduce power consumption."
Computer Power.md,1669216694882,Total Power,"Total Power When voltage is reduced, the threshold that is used to differentiate between a logic 1 and logic 0 output will be reduced. If this threshold is small, a high frequency will be more prone to noise that could alter the output."
Computer Power.md,1669216694882,Reducing power consumption,"Reducing power consumption 1. Component design 2. Power gating: shutting down unused components 3. Clock gating: reduce unnecessary switching 4. Reduce data movement, number of memory access and register transfer"
Computer Power.md,1669216694882,The problem between power and energy,The problem between power and energy
Computer Power.md,1669216694882,Practice Problems,Practice Problems __Case 1__: Change in voltage = $(3.3 - 3)/3 = 10\%$ i. New frequency = $1.1 * 300 = 330 MHz$ Change in dynamic power = $\frac{3.3^2 \times 330 }{3^2*300} - 1 = 33.1\%$ ii. Change in static power = 10% iii. Perf is directly proportional to freq. Change in perf = 10% iv. $$ \begin{aligned} &\text{Perf} = 1/\text{Time} \\ &\text{A 10\% performance increase means time scales by } 1/1.1\\ &\text{Dynamic energy change} = \frac{1.331P \times T/1.1}{P \times T} - 1 = 21\%\ \text{increase}\\ &\text{Static energy change} = \frac{1.1P \times T/1.1}{P \times T} - 1 = 0\% \end{aligned} $$ __Case 2__: i. $\frac{3.3^2 - 3^2}{3^2} = 21\%$ ii. 10% increase iii. No change in frequency so no change in performance iv. Dynamic energy consumption: 21% increase Static energy consumption: 10% increase
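Computer Power.md,1669216694882,Checking the Numbers,"Checking the Numbers Case 1 above can be checked numerically. Assumptions made explicit: dynamic power scales as $V^2 \times f$ and static power scales linearly with $V$ (fixed leakage current), matching the figures in the worked answer.

```python
def dynamic_ratio(v0, f0, v1, f1):
    return (v1 ** 2 * f1) / (v0 ** 2 * f0)

v0, f0 = 3.0, 300e6
v1, f1 = 3.3, 330e6   # +10% voltage, +10% frequency

print(round((dynamic_ratio(v0, f0, v1, f1) - 1) * 100, 1))   # 33.1 (% dynamic power)
print(round((v1 / v0 - 1) * 100, 1))                         # 10.0 (% static power)

# Energy for a fixed workload: time shrinks by 1/1.1 with +10% performance
dyn_energy_ratio = dynamic_ratio(v0, f0, v1, f1) / 1.1
print(round((dyn_energy_ratio - 1) * 100))                   # 21 (% dynamic energy)
```

The same two helpers reproduce Case 2 by holding f constant and changing only the voltage."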
Computer Performance.md,1669216597785,---,"--- title: ""Computer Performance"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Computer Performance.md,1669216597785,Computer Performance,Computer Performance
Computer Performance.md,1669216597785,Execution time,Execution time One indicator of performance is __execution time__ of a program $$\text{Performance} = \frac{1}{\text{Execution Time}} = \frac{1}{time_{end} - time_{start}}$$ $$\text{Execution Time} = \text{Instruction Count} \times \text{Clocks Per Instruction} \times \text{Clock Period}$$
Computer Performance.md,1669216597785,Instruction Count,Instruction Count
Computer Performance.md,1669216597785,Clocks Per Instruction (CPI),Clocks Per Instruction (CPI) $$\text{Average CPI}=\text{Cycle Count}/\text{Instruction Count}$$
Computer Performance.md,1669216597785,Clock Period,Clock Period Clock period is the inverse of clock frequency. __Memory wall problem__: a higher clock frequency may not result in better performance if memory access is slower than the CPU and becomes the bottleneck
Computer Performance.md,1669216597785,Speed-up,Speed-up Speed-up is the factor by which performance improves over the original machine $$\text{Speedup} = \frac{Perf_a}{Perf_b}$$ We can express the execution time of an enhanced machine in terms of the proportion _E_ of the program that is improved together with the original time _T_ and the enhancement factor _S_ $$T' = (T\times (1-E))+\frac{T\times E}{S}$$
Computer Performance.md,1669216597785,Example,Example
Computer Performance.md,1669216597785,Amdahl's Law,"Amdahl's Law __If the program is of a fixed workload:__ Let _E_ be the fraction of program that is enhanced via _parallelism_, with maximum enhancement factor $S = \infty$, the maximum speedup is $$\text{Max Speedup} =lim_{s\rightarrow \infty}\frac{1}{1-E+\frac{E}{S}}=\frac{1}{1-E}$$ Let *E* be the fraction of program enhanced by Speedup $S_1$ and $1-E$ enhanced by Speedup $S_2$. $$Speedup=\frac{1}{\frac{1-E}{S_2}+\frac{E}{S_1}}$$"
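Computer Performance.md,1669216597785,Amdahl Sketch,"Amdahl Sketch The fixed-workload formula above in code (a sketch; the E and S values are made up for illustration):

```python
def amdahl(E, S):
    # speedup when a fraction E of the work is enhanced by factor S
    return 1 / ((1 - E) + E / S)

print(round(amdahl(0.8, 4), 3))     # 2.5: 80% of the work sped up 4x
print(round(amdahl(0.8, 1e9), 3))   # 5.0: approaching the 1/(1-E) ceiling
```

Even an infinite enhancement factor cannot push the speedup past 1/(1-E) = 5 here, which is the point of Amdahl's law."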
Computer Performance.md,1669216597785,Gustafson's Law,Gustafson's Law __If the program is set to a fixed time period instead:__ do more parallel work in the same amount of time
Computer Performance.md,1669216597785,Other performance metrics,Other performance metrics
Computer Performance.md,1669216597785,Practice Problems,Practice Problems $$ \begin{align} &IC=200+500+300=1000 \\&T_c=100ns\\ &CPI_{average}=(200\times1+500\times2+300\times3)/1000=2.1\\ &T_{execution}=1000\times100ns\times2.1=210\mu s \end{align} $$ a. $$ \begin{align} &10=200\times10^6\times\frac{1}{200\times10^6}\times CPI_{avg}\\ &CPI_{avg}=10\\ &5=160\times10^6\times\frac{1}{300\times10^6}\times CPI_{avg}\\ &CPI_{avg}=9.375 \end{align} $$ b. $$ \begin{align} &IC_a=4\div(\frac{10}{200\times10^6})=80\times10^6\\ &IC_b=3\div(\frac{9.375}{300\times10^6})=96\times10^6 \end{align} $$ a. $$ \begin{align} &T_{M2}=(1+2+3+4)\times\frac{1}{500\times10^6}\times4\\ &T_{M3}=(2+2+4+4)\times\frac{1}{750\times10^6}\times4\\ &\text{Speedup}=1.25 \end{align} $$ b. $$ \begin{align} &T_{M1}=2\times\frac{1}{500\times10^6}\times1\\ &T_{M2}=1\times\frac{1}{500\times10^6}\times1\\ &T_{M3}=2\times\frac{1}{750\times10^6}\times1\\ &\text{Speedup M2 over M1}=2\\ &\text{Speedup M3 over M1}=1.5 \end{align} $$
Concurrency Control.md,1669012068683,---,"--- title: ""Concurrency Control"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Concurrency Control.md,1669012068683,Concurrency Control,"Concurrency Control The DBMS needs to ensure consistency during concurrent execution of transactions: interleaving can leave the database in an inconsistent state even when every individual transaction is correct and no failure occurs."
Concurrency Control.md,1669012068683,Scheduler,Scheduler A schedule is a sequence of interleaved actions from all transactions.
Concurrency Control.md,1669012068683,Serial schedule,"Serial schedule A schedule is serial if its actions consists of all the actions of one transaction, then all the actions of another transaction, and so on."
Concurrency Control.md,1669012068683,Serializable schedule,"Serializable schedule Result is equivalent to a serial schedule, but actions of one transaction do not have to occur before the actions of another."
Concurrency Control.md,1669012068683,Conflict Serializable Schedule,"Conflict Serializable Schedule A pair of actions conflict if interchanging their order can change the behavior of at least one of the transactions. From this, two adjacent actions of different transactions cannot be swapped when: 1. Both actions involve the same element 2. At least one is a write"
Concurrency Control.md,1669012068683,Precedence Graph,"Precedence Graph We can use a precedence graph to determine if a set of transactions are conflict serializable. An edge from one node to another represents a constraint on the order of the transactions. i.e. Actions in a transaction t1 cannot be swapped with another transaction t2. If a cycle occurs, the order of transactions become contradictory and no serial schedule can exist."
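Concurrency Control.md,1669012068683,Precedence Graph Sketch,"Precedence Graph Sketch A sketch of the test described above: build edges from conflicting pairs in schedule order, then look for a cycle. A schedule is conflict-serializable exactly when the graph is acyclic. (Illustrative Python; the two example schedules are made up.)

```python
def precedence_graph(schedule):
    # schedule: list of (txn, op, element) with op in {'R', 'W'}
    edges = set()
    for i, (t1, op1, x1) in enumerate(schedule):
        for t2, op2, x2 in schedule[i + 1:]:
            # conflict: different txns, same element, at least one write
            if t1 != t2 and x1 == x2 and 'W' in (op1, op2):
                edges.add((t1, t2))
    return edges

def has_cycle(edges):
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    def visit(node, stack):
        if node in stack:
            return True                      # back edge -> cycle
        stack.add(node)
        found = any(visit(n, stack) for n in graph.get(node, ()))
        stack.discard(node)
        return found
    return any(visit(t, set()) for t in graph)

ok = [('T1', 'R', 'A'), ('T2', 'R', 'A'), ('T1', 'W', 'B'), ('T2', 'R', 'B')]
bad = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T2', 'R', 'B'), ('T1', 'W', 'B')]
print(has_cycle(precedence_graph(ok)), has_cycle(precedence_graph(bad)))   # False True
```

In the second schedule T1 must precede T2 (on A) and T2 must precede T1 (on B): a cycle, so no serial order exists."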
Concurrency Control.md,1669012068683,Recoverable Schedule,"Recoverable Schedule A schedule is recoverable if transactions commit only after all transactions whose changes they read have committed. Else, the DBMS is unable to guarantee that transactions read data that will be the same as before the crash and after the crash."
Concurrency Control.md,1669012068683,Locks,Locks Ensure that data items that are shared by conflicting operations are accessed one operation at a time. Same as with [Process Synchronization](Notes/Process%20Synchronization.md).
Concurrency Control.md,1669012068683,Two Phase Locking (2PL),Two Phase Locking (2PL) Arbitrary assignment of locks does not lead to a serializable schedule. Two transactions can operate on elements in a different order and produce different results. We can solve this by ensuring that transactions take up all lock actions before all unlock actions.
Concurrency Control.md,1669012068683,Why 2PL works?,"Why 2PL works? Intuitively, each two-phase-locked transaction may be thought to execute in its entirety at the instant it issues its first unlock request. i.e. a legal schedule can only be such that the transaction completes fully, because otherwise, this means that another transaction is attempting to take a held lock, causing a deadlock. Hence, there is at least one conflict-serialisable schedule: the one in which the transactions appear in the same order as first unlocks. Suppose the schedule starts with T1 locking and reading _A_. If T2 locks _B_ before T1 reaches its unlocking phase, then there is a deadlock, and the schedule cannot complete. Thus, if T1 performs an action first, it must perform _all_ its actions before T2 performs any. Likewise, if T2 starts first, it must complete before T1 starts, or there is a deadlock. Thus, only the two serial schedules of these transactions are legal."
Concurrency Control.md,1669012068683,Lock mechanisms,Lock mechanisms
Concurrency Control.md,1669012068683,Shared and Exclusive locks,"Shared and Exclusive locks - Shared lock: to allow for multiple transactions to perform `READ` - Exclusive lock: for `WRITE` A transaction should only ask for an exclusive lock when it is ready to write, so that any read operations can still continue. Upgrade the lock when needed:"
Concurrency Control.md,1669012068683,Update locks,"Update locks Deadlocks can occur when transactions are unable to upgrade their shared locks to exclusive ones, since there are already shared locks taken. A separate lock type that may be later upgraded to an exclusive lock is needed."
Concurrency Control.md,1669012068683,Compatibility matrix,Compatibility matrix
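One common convention for the shared/exclusive/update matrix (deny new shared locks while an update lock is held, so the upgrader is not starved) can be tabulated as follows. This is a sketch of that convention, not any specific DBMS's matrix:

```python
# Rows: lock currently held. Columns: lock requested.
# True means the request can be granted alongside the held lock.
COMPAT = {
    #        requested: S      U      X
    "S": {"S": True,  "U": True,  "X": False},
    "U": {"S": False, "U": False, "X": False},
    "X": {"S": False, "U": False, "X": False},
}

def can_grant(held_locks, requested):
    """held_locks: set of lock modes currently held by other transactions.
    An empty set means no conflict, so the request is always granted."""
    return all(COMPAT[h][requested] for h in held_locks)
```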
Concurrency Control.md,1669012068683,Workings of Scheduler,Workings of Scheduler 1. Part 1: Inserts appropriate lock actions ahead of all DB access operations and release the locks held by the Transaction when it aborts/commits 2. Part 2: maintains a waiting list of transactions that need to acquire locks
Concurrency Control.md,1669012068683,Deadlock Detection & Prevention,Deadlock Detection & Prevention
Concurrency Control.md,1669012068683,Timeout,"Timeout Place a limit on how long a transaction may be active; if it exceeds this time, it is forced to release its locks and other resources and roll back."
Concurrency Control.md,1669012068683,Waits-For Graph,Waits-For Graph Utilises the [cyclic properties of deadlocks](Notes/Deadlocks.md#Cyclic%20Properties%20of%20Deadlocks) to detect them. - Each transaction holding a lock or waiting for one is a node - An edge exists from T1 to T2 if there is some element A where: - T2 holds a lock on A - T1 is waiting for lock on A - T1 cannot get the lock on A unless T2 releases it This graph can become very large and analysing it for every action can take a long time.
Concurrency Control.md,1669012068683,Timestamps,Timestamps Assign each transaction with a timestamp. This timestamp never changes for the transaction even if it is rolled back.
Concurrency Control.md,1669012068683,Wait-Die,Wait-Die
Concurrency Control.md,1669012068683,Wound-Wait,Wound-Wait
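Both timestamp-based deadlock-prevention policies reduce to a comparison between the requester's and holder's timestamps (older transaction = smaller timestamp). A hedged sketch of the two decision rules:

```python
# Outcome for a requesting transaction when the lock is held by another.
# "wait": block; "die": requester aborts and restarts with the same
# timestamp; "wound": the holder is forced to abort instead.

def wait_die(ts_requester, ts_holder):
    # Older transactions may wait; younger ones abort ("die") and restart.
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older transactions pre-empt the holder ("wound"); younger ones wait.
    return "wound" if ts_requester < ts_holder else "wait"
```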
Concurrency Control.md,1669012068683,Comparison,Comparison
Concurrency Control.md,1669012068683,Timestamp Ordering,"Timestamp Ordering An optimistic approach. Use the timestamps of transactions to determine the serialisability of transactions. - Each transaction receives a unique timestamp TS(T). - If TS(Ti) < TS(Tj), then the DBMS must ensure that the execution schedule is equivalent to a serial schedule where Ti appears before Tj."
Concurrency Control.md,1669012068683,Rules,"Rules - A transaction wants to read X, but TS(T) < WT(X): the element has already been written by a later transaction, so T must abort and restart with a new timestamp - A transaction wants to write X, but TS(T) < RT(X) or TS(T) < WT(X): the element has already been read or written by a later transaction, so T must abort and restart"
Concurrency Control.md,1669012068683,Thomas Write Rule,"Thomas Write Rule When a transaction wants to write, and TS(T) < WT(X), ignore the write and allow the transaction to continue without aborting. - Timestamp ordering creates conflict serialisable schedules when this rule is not used - Schedules are not recoverable as TO does not have any checks"
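The timestamp-ordering checks with the Thomas write rule can be sketched as pure functions over a transaction's timestamp and the item's read/write timestamps (RT, WT); the return conventions here are assumptions for illustration:

```python
# Timestamp-ordering checks for one data item.
# RT/WT are the largest timestamps of transactions that read/wrote the item.

def to_read(ts, rt, wt):
    """Return (action, new_rt): abort if a later txn already wrote the item."""
    if ts < wt:
        return "abort", rt
    return "read", max(rt, ts)

def to_write(ts, rt, wt):
    """Return (action, new_wt): abort if a later txn already read the item;
    under the Thomas write rule, a stale write is skipped rather than aborted."""
    if ts < rt:
        return "abort", wt
    if ts < wt:
        return "skip", wt  # Thomas write rule: ignore the obsolete write
    return "write", ts
```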
Concurrency Control.md,1669012068683,Comparisons,Comparisons
Concurrency Control.md,1669012068683,Multi Version Concurrency Control,"Multi Version Concurrency Control A misnomer as it is not actually a concurrency control protocol. DBMS maintains multiple physical versions of a single object in the database: - When transaction writes to an object, the DBMS creates a new version - When transaction reads an object, it reads the newest version that exists when the transaction started Each user connected to the database sees a _snapshot_ of the database at a particular instant in time. Any changes made by a writer will not be seen by other users of the database until the changes have been completed, providing isolation"
Concurrency Control.md,1669012068683,Concurrency Protocol,Concurrency Protocol Able to use any concurrency protocol to underly its implementation. MVCC is simply another layer on top of these protocols to provide transactional memory.
Concurrency Control.md,1669012068683,Version Storage,"Version Storage The DBMS creates a version chain per tuple, which allows it to find the version that is visible to a particular transaction at runtime. Indexes point to the head of the chain. Version ordering: - Oldest to Newest: append new version at the end of the chain -> the entire chain has to be traversed for lookups - Newest to Oldest: append new version at the head of the chain, update index pointers to the head for every new version"
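A newest-to-oldest version chain can be sketched as a linked list: the index points at the head, and a reader walks the chain until it finds the first version created at or before its snapshot timestamp. All names here are illustrative, not a real DBMS API:

```python
class Version:
    def __init__(self, value, begin_ts, next_version=None):
        self.value = value
        self.begin_ts = begin_ts  # timestamp when this version was written
        self.next = next_version  # pointer to the next-older version

def write(head, value, ts):
    """Prepend a new version; the index pointer is updated to the new head."""
    return Version(value, ts, head)

def read(head, snapshot_ts):
    """Return the newest value visible to a transaction started at snapshot_ts."""
    v = head
    while v is not None:
        if v.begin_ts <= snapshot_ts:
            return v.value
        v = v.next
    return None
```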
Concurrency Control.md,1669012068683,Append only,Append only
Concurrency Control.md,1669012068683,Time travel storage,Time travel storage
Concurrency Control.md,1669012068683,Delta storage,Delta storage
Concurrency Control.md,1669012068683,Garbage Collection,Garbage Collection
Concurrency Control.md,1669012068683,Practice Problems,"Practice Problems ```mermaid graph LR; T3 --> T2; T1 --> T2; T3 --> T1; ``` | Time | T1 | T2 | T3 | | ---- | -------- | -------- | ------- | | t1 | | | Read(A) | | t2 | | | Read(B) | | t3 | Read(A) | | | | t4 | Read(C) | | | | t5 | Write(A) | | | | t6 | | Read(C) | | | t7 | | Read(B) | | | t8 | | Write(C) | | | t9 | | Write(B) | | In 2PL, all locks must be acquired by the transaction, operations are done and then all locks are released at once. This schedule is not consistent with 2PL. T1 takes ul(B), xl(B), ul(D), xl(D). T2 reads and writes item B at step 6; this is not possible if T1 still has the exclusive lock on B. Hence, T1 must have released all locks by step 6. However, T1 still takes read and write actions on D at steps 13 and 14. The minimal set of actions to remove: - 5 - 6 | Time | T1 | T2 | | ---- | --------------- | -------------- | | 1 | Read(Savings) | | | 2 | | Read(Checking) | | 3 | Write(Checking) | | | 4 | | Write(Savings) | | Time | T1 | T2 | R(X) | W(X) | R(Y) | W(Y) | | ---- | ----- | ---- | ---- | ---- | ---- | ---- | | 1 | r1(x) | | 1 | 0 | 0 | 0 | | 2 | | r(x) | 2 | 0 | 0 | 0 | | 3 | | w(x) | 2 | 2 | 0 | 0 | | 4 | r(y) | | 2 | 2 | 1 | 0 | | 5 | | r(y) | 2 | 2 | 2 | 0 |"
Concurrency.md,1685868061346,---,"--- title: ""Concurrency"" date: 2023-06-04 ---"
Concurrency.md,1685868061346,Concurrency,Concurrency
Consensus.md,1678398528125,---,"--- title: ""Consensus"" date: 2023-02-16 lastmod: 2023-03-08 ---"
Consensus.md,1678398528125,Consensus,Consensus Processes propose values and they all have to agree on one of these values. - Single Value Consensus - Validity: decided values are those which are proposed - Agreement: no 2 *correct* processes decide differently - Termination: every correct process eventually decides - Integrity: a process decides at most once - Single Value Uniform Consensus - **Uniform** Agreement: no *2 processes* decide different values **Consensus is not solvable in the asynchronous system model if any node is allowed to fail** - Unable to detect failure - Cannot wait for the correct majority of processes - Termination not satisfied: cannot decide
Consensus.md,1678398528125,Paxos Algorithm,Paxos Algorithm An [Eventual Leader Election](Notes/Failure%20Detectors.md#Eventual%20Leader%20Election) (weakest leader elector we can use) can be used to eventually elect 1 single proposer *(providing termination)*. - Proposers: attempt to impose their proposal to acceptors - Acceptors: may accept values issued by proposers (cannot talk to each other) - Learners: decide depending on what is accepted Contention problem: several processes might initially be proposers
Consensus.md,1678398528125,Abortable Consensus,Abortable Consensus Algorithm aborts if there is contention of multiple proposers.
Consensus.md,1678398528125,Version 1 (Centralised),"Version 1 (Centralised) Proposer sends value to a central acceptor. Acceptor decides on the first value which it gets. Problem 1: if this acceptor fails, we will never know of the decision"
Consensus.md,1678398528125,Version 2 (Decentralised),"Version 2 (Decentralised) Proposers talk to a set of acceptors, use a majority [quorum](Notes/Distributed%20Abstractions.md#Quorums) to choose a value and enable fault tolerance. Problem 2: each acceptor accepts the first proposal it receives; if messages arrive out of order, it is possible to have no majority"
Consensus.md,1678398528125,Version 3 (Enable restarts),"Version 3 (Enable restarts) If no majority value, we need to restart until there is one. Since proposers can propose again, we need a way to differentiate between them. - Use a ballot number: sequence number in the form $i, n+i, 2n+i$ for a process $i$ and $n$ processes Problem 3: restarts lead to different majority accepted values across time, learners cannot make a single decision"
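The ballot-number scheme above ($i, n+i, 2n+i$ for process $i$ out of $n$) makes ballots globally unique and lets anyone recover the proposer from a ballot. A small illustrative helper:

```python
# Ballot numbers drawn from the arithmetic sequence i, n+i, 2n+i, ...
# for process i (0-indexed) among n processes.

def ballot(process_id, n, round_no):
    """The ballot process `process_id` uses on its round_no-th attempt."""
    return round_no * n + process_id

def proposer_of(ballot_no, n):
    """Recover which process issued a given ballot number."""
    return ballot_no % n
```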
Consensus.md,1678398528125,Version 4 (Prepare and Accept),"Version 4 (Prepare and Accept) We need a way to ensure that every higher-numbered proposal results in the same chosen value - Satisfied by ensuring acceptors only accept this value - Satisfied by ensuring proposers only propose this value - Proposers need to learn this value from the highest sequence number of those accepted. Proposers need to ensure that this ""highest value"" does not change. Proposers query acceptors so that if a value is accepted, every higher proposal issued has the same value previously accepted 1. Proposer $prepare(n)$: - Gets a promise from acceptors not to accept any proposal with a ballot number lower than n - Acceptor also responds with the value of the highest-ballot proposal it has accepted 2. Proposer $accept(n,v)$: - Pick the value from the maximum proposal number returned. If none of the processes return a value, proposer can pick freely. - Acceptor $accept(n,v)$ if not accepted any $prepare(m)$ such that $m>n$; else $reject$ 3. Proposer $decide(v)$ if majority acks; else $abort$"
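The proposer's value-selection rule in the accept phase can be sketched as follows (the promise representation is an assumption for illustration): adopt the value accepted under the highest ballot among the promises; only if no acceptor reports a value may the proposer pick its own.

```python
def pick_value(promises, own_value):
    """promises: one entry per acceptor, either None (nothing accepted yet)
    or an (accepted_ballot, accepted_value) pair."""
    accepted = [p for p in promises if p is not None]
    if not accepted:
        return own_value  # free choice: no value has ever been accepted
    # Otherwise the proposer is bound to the highest-ballot accepted value.
    return max(accepted, key=lambda p: p[0])[1]
```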
Consensus.md,1678398528125,Optimisations,"Optimisations - Reject `prepare(n)` if accepted `prepare(m); m > n`: Reject a lower prepare - Reject `accept(n,v)` if answered `accept(m,u); m > n`: Reject a lower accept - Reject `prepare(n)` if answered `accept(m,u); m > n` : Reject a lower accept - Ignore messages if majority obtained"
Consensus.md,1678398528125,Multi Paxos,"Multi Paxos The motivation: replicated state machines need to agree on a sequence of commands to execute. Approach: organise the algorithm into rounds. In each round, each server starts a new instance of Paxos. They propose (2 RTT), accept (2 RTT) and decide on 1 command, add that to the log and restart. Initial states - $ProCmds = \emptyset$: stores the list of commands proposed - Log = <>: a log of decided commands A process which wants to execute a command C triggers $rb-broadcast<C, Pid>$. On delivery, the command pair is added to `ProCmds` unless it is already in Log. Problem: the same command across multiple processes might be decided in different slots in time."
Consensus.md,1678398528125,Sequence Consensus,"Sequence Consensus Rather than agreeing on a single command and storing that in a Log, we can directly try to agree on the sequence of commands. - Validity: if process p decides on a value, the value is a sequence of commands - Uniform Agreement: if process p decides u and another decides v, then *one is a prefix of the other* - Integrity: process can later decide another value, but the *previous value is a strict prefix of the newly decided value* - Termination After adopting a value with highest proposal number, the proposer is allowed to extend the sequence with the new command. Problem: in the prepare phase, processes send a lot of redundant state as the full log is transferred between the proposer and acceptor leading to high IO. No pipelining as well since each round must begin with the prepare phase."
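The uniform-agreement property above (any two decided sequences are prefix-related) is easy to state as code; a small illustrative check:

```python
# Two decided command sequences satisfy sequence-consensus agreement
# iff one is a prefix of the other.

def is_prefix(u, v):
    return len(u) <= len(v) and v[:len(u)] == u

def agreement_holds(u, v):
    return is_prefix(u, v) or is_prefix(v, u)
```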
Consensus.md,1678398528125,Log Synchronisation,"Log Synchronisation Modify the prepare phase and shared states such that we can work on a single synchronised log $v_a$. *To do this, let 1 process act as the sole leader (proposer) until it is aborted by an election of higher ballot number*"
Consensus.md,1678398528125,Prepare Phase,"Prepare Phase The leader sends `Prepare`: - current round: $n$ - accepted round: $n_a$ - log length: $|v_a|$ - decided index $l_d$, where the decided sequence is $prefix(v_a,l_d)$ The followers reply with `Promise`: - their accepted round - the log entries which the leader is missing and the leader appends those to the log. `AcceptSync` is used to synchronise the new log. *Promised followers and leader now have the same common log prefix*"
Consensus.md,1678398528125,Accept Phase,"Accept Phase The leader sends `Accept` command with highest $n$ to all promised followers The followers reply with `Accepted` When majority accepted, `Decide` is sent. Any late `Promise` is replied with an `AcceptSync`"
Consensus.md,1678398528125,Partial Connectivity (enabling quorum connectedness),"Partial Connectivity (enabling quorum connectedness) Chained scenario: When one server loses connectivity to the leader, it will try to elect itself as a leader. Livelock situation as servers compete to become the leader. Can be solved if A becomes the leader but can't because it is already connected to a leader. Quorum Loss: When the leader loses quorum connectivity, deadlock situation as a majority cannot be obtained to make progress. B, D, E cannot elect a leader without a quorum. A is quorum connected but cannot elect a new leader since it is still connected to the alive leader C. Constrained Election: Leader is fully disconnected. A can become the new leader but will not be elected as it does not have the most updated log (log length)."
Consensus.md,1678398528125,Failure recovery,"Failure recovery 1. Recover state from persistent storage 2. Send a `PrepareReq` to all peers - If elected as leader, synchronise through a `Prepare` phase - `Prepare` phase from another leader will synchronise"
Consensus.md,1678398528125,Reconfiguration,"Reconfiguration Supporting a way to add/replace any process part of the replicated state machine. A configuration $c_i$ is defined by a set of process ids $\{p1, p2, p3\}$ and the new configuration can be any new set of processes e.g. $\{p1,p2,p4\}$"
Consensus.md,1678398528125,Stop Sign,"Stop Sign To safely stop the current configuration, we must prevent new decisions in the old configuration (""split-brain"" problem) using a stop sign: The stop sign contains information about the new configuration to help processes reconfigure: - the new set of processes in $c_{i+1}$ - the new configuration id number - the identifiers for each replica in the new configuration Each process on viewing the stop sign, can safely shut down and restart in the new configuration A new process not previously part of $c_i$ must perform log migration to catch up with the new instance. Log migration can be done with snapshots of the latest state."
Consensus.md,1678398528125,Raft,Raft A state of the art consensus algorithm.
Consensus.md,1678398528125,State of servers,"State of servers Rather than using process ids to break ties for leader election in omnipaxos, Raft uses a form of random retrying when there are split votes."
Consensus.md,1678398528125,Log Reconciliation,"Log Reconciliation A server must have the most up to date log in order to become a leader, compared to omni-paxos where any server can be the leader and be synced up during the Prepare phase."
Context Switch.md,1689954471879,---,"--- title: ""Context Switch"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Context Switch.md,1689954471879,Context Switch,"Context Switch The OS preserves the state of the CPU by backing up the whole state of the task, including the call stack, the registers and the program counter. __Context switch time is overhead__: note that there is time spent where both processes are idle. As the call stack can be very large, the OS typically sets up a separate call stack for each task instead of backing up the entire call stack content on each task switch. Such a task with its own call stack is a [thread](Notes/Threads.md)."
Constraint Satisfaction Problem.md,1669012068676,---,"--- title: ""Constraint Satisfaction Problem"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Constraint Satisfaction Problem.md,1669012068676,Constraint Satisfaction Problem,Constraint Satisfaction Problem A set of constraints specifies the allowable combinations of values **Forward checking**: Pre-emptively removes inconsistent values from the domains of neighbouring variables. Prevents unnecessary expansion if constraints have already been violated. This is uninformed and can be improved with heuristics:
Constraint Satisfaction Problem.md,1669012068676,Constraint Propagation,"Constraint Propagation Propagate the implications of constraints from assigning 1 variable to the other variables. Variable ordering heuristics optimize the __order of variable assignments__: picking the most constrained variable first makes inconsistent assignments **fail earlier in the search**, which enables more efficient pruning. Value ordering heuristics optimize the __order of value assignments__: picking the least constraining value first reduces dead ends and backtracking by trying the values which are most likely to work."
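The forward-checking step described above can be sketched for a binary CSP (all names and the constraint representation are illustrative assumptions): after assigning a value, prune inconsistent values from each constrained neighbour's domain, and report failure early if any domain empties.

```python
def forward_check(domains, constraints, var, value):
    """domains: {variable: set(values)};
    constraints: {(v1, v2): predicate(val1, val2) -> bool}.
    Returns a pruned copy of domains, or None if some domain becomes empty."""
    new_domains = {v: set(d) for v, d in domains.items()}
    new_domains[var] = {value}
    for (a, b), ok in constraints.items():
        if a == var:
            # Keep only neighbour values consistent with the new assignment.
            new_domains[b] = {w for w in new_domains[b] if ok(value, w)}
            if not new_domains[b]:
                return None  # dead end detected before expanding further
    return new_domains
```

For a 2-variable map-colouring instance, assigning X = red immediately removes red from Y's domain.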
Constraint Satisfaction Problem.md,1669012068676,Example Problems,Example Problems Cryptarithmetic Puzzle
Conventional Indexes.md,1669012068678,---,"--- title: ""Conventional Indexes"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Conventional Indexes.md,1669012068678,Conventional Indexes,Conventional Indexes Indexes are needed to reduce the I/O required to find a record.
Conventional Indexes.md,1669012068678,Updating Indexes,Updating Indexes 1. Locate the targeted record or the place to hold new record 2. Update data file 3. Update index
Conventional Indexes.md,1669012068678,Clustered and Non-Clustered Indexes,"Clustered and Non-Clustered Indexes Clustering index: an index on an attribute such that all the tuples with a fixed value for the search key of this index appear on as few blocks as can hold them. If a relation is clustered (sorted and packed together according to some attribute a), another index on another attribute _b_ would likely be non-clustered unless _a_ and _b_ are highly correlated."
Conventional Indexes.md,1669012068678,Comparisons,Comparisons
Conventional Indexes.md,1669012068678,Read,Read A range read of keys that are close together will result in a high number of I/Os:
Conventional Indexes.md,1669012068678,Update,Update Clustered indexes will not be as good if the database goes through many update operations.
Conventional Indexes.md,1669012068678,Multi-layer Index,Multi-layer Index
Conventional Indexes.md,1669012068678,Practice Problems,"Practice Problems a. Dense: We need 300 key pointer pairs. Each block can hold 10 pairs. Total blocks = 300 / 10 = 30 Sparse: 1 index pointer can point to a block of 3 records. Each block can hold 10 pointers. 1 index block represents 30 records. $300/30=10$ blocks needed b. Worst case: retrieve the last record -> 10 I/O c. Another sparse index to point to a block of sparse index Since the initial sparse index needs 10 blocks to represent, the second level index can use 1 block (10 pointers) to fully represent it. I/O for 2nd level: 1 I/O for 1st level: 1 I/O to read record: 1 Total 3 I/O a. Best case when inserting a record in the not full block with record 9. Insert 10 1 I/O to read the index block, 1 I/O to load the block with record 9. Total 2 I/O b. Worst case when inserting into first data block. Insert 0. 1 I/O to read index block. Need to load every data block to shift records down. Total 1+4=5 I/O."
Custom Computing.md,1669221254604,---,"--- title: ""Custom Computing"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Custom Computing.md,1669221254604,Custom Computing,"Custom Computing Custom computers are special-purpose systems customised for specific applications such as signal processing and database operations, when general-purpose computers are too slow, too bulky or consume too much power."
Custom Computing.md,1669221254604,General Purpose Processors (GPP),General Purpose Processors (GPP)
Custom Computing.md,1669221254604,Application Specific Instruction Set Processor (ASIP),Application Specific Instruction Set Processor (ASIP) Flexibility should just be sufficient instead of unlimited in the case of a GPP. Want to achieve highest performance with minimum power consumption.
Custom Computing.md,1669221254604,Digital Signal Processor,Digital Signal Processor Architecture designed for repetitive multiply-accumulate operations and bit-reversal addressing
Custom Computing.md,1669221254604,GPU,GPU
Custom Computing.md,1669221254604,Field Programmable Gate Array (FPGA),"Field Programmable Gate Array (FPGA) The FPGA has an edge over the DSP by supporting parallel designs and greater performance. However, it is more expensive and takes a longer time to manufacture."
Custom Computing.md,1669221254604,Application Specific Integrated Circuits (ASIC),"Application Specific Integrated Circuits (ASIC) With more customization, the ASIC can achieve lower power consumption but results in inflexibility."
Custom Computing.md,1669221254604,Heterogenous Computing Systems,Heterogenous Computing Systems
Custom Computing.md,1669221254604,Examples,Examples
Control Flow.md,1684871052369,---,"--- title: ""Control Flow"" date: 2023-05-12 lastmod: 2023-05-14 ---"
Control Flow.md,1684871052369,Control Flow,Control Flow
Control Flow.md,1684871052369,Conditional Execution,"Conditional Execution Adding the if statement to grammar: ``` statement → exprStmt | ifStmt | printStmt | block ; ifStmt → ""if"" ""("" expression "")"" statement ( ""else"" statement )? ; ``` In java: ```java private Stmt ifStatement() { consume(LEFT_PAREN, ""Expect '(' after 'if'.""); Expr condition = expression(); consume(RIGHT_PAREN, ""Expect ')' after if condition.""); Stmt thenBranch = statement(); Stmt elseBranch = null; if (match(ELSE)) { elseBranch = statement(); } return new Stmt.If(condition, thenBranch, elseBranch); } ```"
Control Flow.md,1684871052369,Dangling Else Problem,Dangling Else Problem The else clause is optional. Most parsers bind the else to the nearest if that precedes it.
Control Flow.md,1684871052369,Logical Operators,"Logical Operators Updated grammar: ``` expression → assignment ; assignment → IDENTIFIER ""="" assignment | logic_or ; logic_or → logic_and ( ""or"" logic_and )* ; logic_and → equality ( ""and"" equality )* ; ```"
Control Flow.md,1684871052369,While loops,"While loops Updated grammar: ``` statement → exprStmt | ifStmt | printStmt | whileStmt | block ; whileStmt → ""while"" ""("" expression "")"" statement ; ```"
Control Flow.md,1684871052369,For loops,"For loops Updated grammar: ``` statement → exprStmt | forStmt | ifStmt | printStmt | whileStmt | block ; forStmt → ""for"" ""("" ( varDecl | exprStmt | "";"" ) expression? "";"" expression? "")"" statement ; ``` The first clause is the initializer. It is executed exactly once, before anything else. It’s usually an expression, but for convenience, we also allow a variable declaration. In that case, the variable is scoped to the rest of the for loop—the other two clauses and the body. Next is the condition. It’s evaluated once at the beginning of each iteration, including the first. If the result is truthy, it executes the loop body. Otherwise, it bails. The last clause is the increment. It’s an arbitrary expression that does some work at the end of each loop iteration. The result of the expression is discarded, so it must have a side effect to be useful. In practice, it usually increments a variable. Any of these clauses can be omitted. Following the closing parenthesis is a statement for the body, which is typically a block"
Control Flow.md,1684871052369,Desugaring,Desugaring We don't actually *need* the for loop. It is syntactic sugar for the primitive operations we already have. The for loop can be rewritten to: ```java { var i = 0; while (i < 10) { print i; i = i + 1; } } ```
Datapath and Control Design.md,1669012068669,---,"--- title: ""Datapath and Control Design"" date: 2022-11-08 lastmod: 2022-11-21 --- - [Instructions](Notes/Instructions.md)"
Crafting Interpreters.md,1685133493449,---,"--- title: ""Crafting Interpreters"" date: 2023-04-24 lastmod: 2023-05-26 tags: [moc] ---"
Crafting Interpreters.md,1685133493449,Crafting Interpreters,"Crafting Interpreters moc Notes from the book [Crafting Interpreters](http://craftinginterpreters.com/welcome.html). These consists of code examples in Java, but also the core concepts required to build an interpreter from scratch in any language. - [[Notes/Programming Language Design]] - [Scanning](Notes/Scanning.md) - [Representing Code](Notes/Representing%20Code.md) - [[Control Flow]] - [Functions](Notes/Functions.md) - [[Classes]] - [[Virtual Machine]] - [[Compiling Expressions]]"
Crafting Interpreters.md,1685133493449,Parts of a Language,Parts of a Language The paths from source code to machine code:
Crafting Interpreters.md,1685133493449,Scanning,Scanning Also known as lexing or lexical analysis. Take a linear stream of characters and chunk them into *tokens*. More of this in [[Scanning]].
Crafting Interpreters.md,1685133493449,Parsing,Parsing Takes the tokens and forms *grammar* through construction of an [Abstract Syntax Tree](Notes/Representing%20Code.md#Abstract%20Syntax%20Tree).
Crafting Interpreters.md,1685133493449,Static Analysis,"Static Analysis ""In an expression like a + b, we know we are adding a and b, but we don’t know what those names refer to. Are they local variables? Global? Where are they defined?"" - Binding: for each *identifier*, figure out where it is defined and wire them together. This is affected by scoping. - Type checking: if the language is statically typed, figure out the types of those identifiers and report type errors where operations are not supported. Results from the analysis needs to be stored for later use: - AST: stored back as attributes on the AST which were not previously initialised during parsing - Symbol table: a lookup table associating each key (identifier) to what it refers to"
Crafting Interpreters.md,1685133493449,Intermediate Representation,"Intermediate Representation Compiling code can be viewed as involving two ends. The ""front-end"" is specific to the source code language which the program is written in. The ""back-end"" is the target architecture which the program will run. IRs allow different front-ends to produce a shared IR, and allow different back-ends to convert the IR to the target architecture."
Crafting Interpreters.md,1685133493449,Optimisation,Optimisation Swapping parts of the program for parts which have the same semantics but implemented more efficiently.
Crafting Interpreters.md,1685133493449,Code generation,Code generation Converting the IR into machine code. There are two options: 1. Real machine code generation: native code which the OS can directly execute . This is fast but involves complex work due to high number of instructions. It also makes the code less portable as it will only work on the specific target architecture. 2. Virtual machine code i.e. bytecode generation: produce code for a generalised idealised virtual machine which has synthetic instructions mapping more closely to language semantics than to a specific computer architecture
Crafting Interpreters.md,1685133493449,Virtual Machine,"Virtual Machine Not to be confused with the [system virtual machine](Notes/Virtualization.md). This abstraction refers to process virtual machines, a program that emulates the hypothetical chip to support the virtual architecture (targeted by the virtual machine code) at runtime."
Crafting Interpreters.md,1685133493449,Runtime,"Runtime We usually need some services that our language provides while the program is running. E.g. Automatic memory management: a garbage collector needs to be implemented to reclaim unused bits. In compiled languages like Go, the code implementing the runtime is directly inserted into the resulting executable. If a language is run inside an interpreter like Python, the runtime lives there."
Crafting Interpreters.md,1685133493449,Alternate Paths,Alternate Paths
Crafting Interpreters.md,1685133493449,Tree-walk interpreters,"Tree-walk interpreters The interpreter traverses the abstract syntax tree and evaluates each node as it goes. IR, code generation not required. Those are real advantages. But, on the other hand, it’s *not memory-efficient*. Each piece of syntax becomes an AST node. A tiny Lox expression like `1 + 2` turns into a slew of objects with lots of pointers between them, something like:"
Crafting Interpreters.md,1685133493449,Bytecode interpreter,"Bytecode interpreter Structurally, bytecode resembles machine code. It’s a dense, linear sequence of binary instructions. That keeps overhead low and plays nice with the cache. However, it’s a much simpler, higher-level instruction set than any real chip out there. (In many bytecode formats, each instruction is only a single byte long, hence “bytecode”.) To execute the bytecode, we need to write an *emulator*—a simulated chip written in software that interprets the bytecode one instruction at a time. A *virtual machine (VM)*, if you will. If we write our VM in a language like C that is already supported on all the machines we care about, and we can run our emulator on top of any hardware we like."
Crafting Interpreters.md,1685133493449,Transpilers,"Transpilers Writing a complete backend for a language is a lot of work. Another method could be to write the front end of the language and in the backend, produce a string of valid source code for some other language that is about as high level and use the backend tools for that language to do the rest of the work e.g. Typescript to JavaScript."
Crafting Interpreters.md,1685133493449,Interpreter vs Compiler,"Interpreter vs Compiler Compiling is an implementation technique that involves translating a source language to some other, usually lower-level, form. Generating bytecode and transpiling are both examples of compiling. An implementation ""is an interpreter"" if it takes source code and executes it immediately. Go is an interpreter: `go run` compiles Go source code to machine code and runs it. Go is a compiler: `go build` compiles without running."
Data Level Parallelism.md,1669012068673,---,"--- title: ""Data Level Parallelism"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Data Level Parallelism.md,1669012068673,Data Level Parallelism,Data Level Parallelism The same operation is performed on multiple data values concurrently in multiple processing units. Can reduce the Instruction Count to enhance performance
Data Level Parallelism.md,1669012068673,Processors,Processors Different types of hardware can support different levels of data parallelism.
Data Level Parallelism.md,1669012068673,Flynn's Processor Taxonomy,Flynn's Processor Taxonomy Advantages of SIMD > MIMD - Allow sequential thinking yet achieves parallel speedup - Reduced energy usage - More efficient parallel efficiency
Data Level Parallelism.md,1669012068673,Single Instruction Multiple Data (SIMD),Single Instruction Multiple Data (SIMD)
Data Level Parallelism.md,1669012068673,Vector processor,"Vector processor One vector instruction can perform N computations, where N is the vector length. - Reduces the number of instructions: less branching - Less execution time with lower instruction count - Simpler design as there is no requirement for data dependency check since each execution is independent"
Data Level Parallelism.md,1669012068673,Array processor,"Array processor - Array processor works more like a true parallel system, with each processor able to run same instruction on different data - Vector processor works in more of a pipelined fashion."
Data Level Parallelism.md,1669012068673,Multimedia Extensions (MMX),Multimedia Extensions (MMX)
Deadlocks.md,1669012068667,---,"--- title: ""Deadlocks"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Deadlocks.md,1669012068667,Deadlocks,Deadlocks A set of blocked processes each holding a resource and waiting to acquire a resource held by another process in the set *An example with improper semaphore usage:*
Deadlocks.md,1669012068667,Modelling Deadlocks,Modelling Deadlocks
Deadlocks.md,1669012068667,Cyclic Properties of Deadlocks,"Cyclic Properties of Deadlocks > [!NOTE] > Having a cycle in the graph is only a necessary condition but *not a sufficient condition* for a deadlock. If each resource only has 1 instance, a cycle implies a deadlock."
Deadlocks.md,1669012068667,Deadlock Conditions,"Deadlock Conditions A deadlock **may** occur if these conditions hold at the same time: 1. Mutual exclusion: Only one process at a time can use a resource instance 2. Hold and wait: A process holding at least one resource is waiting to acquire additional resources held by other processes 3. No preemption: A resource can be released only voluntarily by the process holding it, after that process has completed its task 4. Circular wait: There exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for P2, …, Pn–1 is waiting for Pn, and Pn is waiting for P0"
Deadlocks.md,1669012068667,Deadlock Prevention,"Deadlock Prevention As long as we can ensure that at least one of the following conditions does not hold, we can prevent a deadlock from occurring. Example using the [Dining Philosophers Problem](Notes/Process%20Synchronization.md#Dining%20Philosophers): each process must request the lower-numbered resource before it can request the higher-numbered one. This breaks circular wait, as requests are made in increasing order (no cycle):"
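Deadlocks.md,1669012068667,Resource ordering sketch,"Resource ordering sketch The ordering rule above can be sketched in Python (a hypothetical two-lock setup; the names are illustrative, not from the notes): by always acquiring the lower-numbered lock first, no cycle can form in the wait-for graph.

```python
import threading

# Illustrative sketch, not from the notes: a fixed global ordering on locks.
# Acquiring locks in increasing index order breaks the circular-wait condition.
locks = [threading.Lock(), threading.Lock()]

def use_resources(i, j):
    # Always lock the lower-numbered resource first, regardless of call order.
    first, second = sorted((i, j))
    with locks[first]:
        with locks[second]:
            return (first, second)
```

Two threads calling use_resources(0, 1) and use_resources(1, 0) both lock resource 0 first, so neither can hold 1 while waiting for 0."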
Deadlocks.md,1669012068667,Deadlock Avoidance,"Deadlock Avoidance Rather than prevent deadlocks as they are about to occur, we can avoid entering a state where deadlocks are possible. This state is called the *unsafe state*. > [!NOTE] Safe state > A state is safe if there exists a completion sequence of all processes without deadlock A process completion sequence is safe if, for each $P_i$, the resources that it requests can be satisfied by the currently available resources + resources held by all $P_j,\ j<i$ Algorithm: 1. When a process requests a resource, determine if the allocation leaves the system in a safe state 2. If safe: grant the request 3. Else: wait"
Deadlocks.md,1669012068667,Banker's Algorithm,"Banker's Algorithm Checking whether the satisfaction of a request will lead to an unsafe state Necessary assumptions: 1. Each process must declare the maximum instances of each resource type that it needs 2. When a process gets all its resources it must return them in a finite amount of time We need to keep track of some information: Let m be the number of resource types and n be the number of processes - Available: `Available = [m]int` the number of instances of each resource currently available to be allocated - Max: `Max[n][m]` is the maximum number of instances of each resource a process can request. *Note*: the process completes once it reaches its max - Allocation: currently allocated resources - Need: number of instances still required to complete (`Need = Max - Allocation`) Each process can also make a request for resources:"
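Deadlocks.md,1669012068667,Banker's safety check sketch,"Banker's safety check sketch A minimal Python sketch of the safety check described above (data layout assumed as in the notes: Available is an m-vector, Allocation and Need are n-by-m matrices). It searches for a safe completion sequence.

```python
def is_safe(available, allocation, need):
    # Try to build a safe completion sequence: repeatedly pick a process
    # whose Need can be met by Work, let it finish, and reclaim its
    # Allocation. Returns the sequence, or None if the state is unsafe.
    work = list(available)
    n = len(allocation)
    finished = [False] * n
    sequence = []
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            if not finished[i] and all(need[i][k] <= work[k] for k in range(len(work))):
                # Process i can run to completion and return its resources.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                sequence.append(i)
                progressed = True
    return sequence if all(finished) else None
```

A request is granted only if the state after the tentative allocation still passes this check."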
Deadlocks.md,1669012068667,Deadlock Detection,"Deadlock Detection Allow the system to enter deadlock state, invoke detection and recovery algorithms."
Deadlocks.md,1669012068667,Practice Problems,"Practice Problems a. False. If there are only 4 people, the circular wait condition is broken b. True. A single process will not be in a deadlock as there are no other processes which it is sharing resources with. *Hold and wait condition can never be satisfied.* c. False. Not all cycles indicate a deadlock. Depends on the number of resource instances a. Available -> 1. P4 allocation = 5. No process can be satisfied with available 1. Unsafe state b. Safe state. Completion order: P3, P4, P2, P1 x = 0 | Process | Allocation | Need | Available | Completed | | ------- | ---------- | ----- | --------- | ------------ | | P0 | 2 1 1 | 0 1 0 | 0 1 0 | P0 Completed | | P1 | 1 1 0 | 2 1 2 | 2 2 1 | P2 Completed | | P2 | 1 1 1 | 2 0 1 | 3 3 2 | P1 Completed | | P3 | 1 1 1 | 4 1 0 | 4 4 2 | P3 Completed |"
Decision Trees.md,1678961336305,---,"--- title: ""Decision Trees"" date: 2023-02-09 lastmod: 2023-03-16 ---"
Decision Trees.md,1678961336305,Decision Trees,Decision Trees A decision tree is an analysis strategy by asking questions about the target sequentially This type of logical expression is easiest for a decision tree to learn.
Decision Trees.md,1678961336305,Growing the Tree,Growing the Tree 1. Choose the best question (measured based on information gain) and split the input data into subsets 2. Terminate when a unique class label is formed (no need for further questions) 3. Grow by recursively extending other branches
Decision Trees.md,1678961336305,Entropy (measuring information gain),Entropy (measuring information gain) - [Entropy](Notes/Information%20Theory.md#Entropy) - [Gini Impurity](Notes/Information%20Theory.md#Gini%20Impurity)
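Decision Trees.md,1678961336305,Entropy and Gini sketch,"Entropy and Gini sketch A small Python sketch of the two impurity measures and the information gain of a split (function names are illustrative, not from the notes):

```python
from math import log2

def entropy(labels):
    # Shannon entropy of a list of class labels, in bits.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def gini(labels):
    # Gini impurity: chance a random sample is misclassified by a
    # random label drawn from the class distribution.
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1 - sum((c / n) ** 2 for c in counts.values())

def information_gain(parent, splits):
    # Parent entropy minus the size-weighted entropy of the child splits.
    n = len(parent)
    return entropy(parent) - sum(len(s) / n * entropy(s) for s in splits)
```

For example, a 50/50 split has entropy 1 bit and Gini impurity 0.5; a question that separates the classes perfectly has information gain equal to the parent entropy."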
Decision Trees.md,1678961336305,Choosing attributes,Choosing attributes
Decision Trees.md,1678961336305,Avoid overfitting,"Avoid overfitting - Stop growing when data split not statistically significant - Grow full tree, then post-prune (e.g. Reduced error pruning)"
Dependency Injection.md,1676224441711,---,"--- title: ""Dependency Injection"" date: 2022-11-08 lastmod: 2023-02-12 ---"
Dependency Injection.md,1676224441711,Dependency Injection,Dependency Injection
Dependency Injection.md,1676224441711,Problems we want to solve,"Problems we want to solve 1. How can a class be independent from the creation of the objects it depends on? 2. How can an application, and the objects it uses support different configurations? 3. How can the behaviour of a piece of code be changed without editing it directly?"
Dependency Injection.md,1676224441711,General Idea,"General Idea An object receives other objects that it depends on. A form of inversion of control, dependency injection aims to separate the concerns of constructing objects and using them, leading to loosely coupled programs."
Dependency Injection.md,1676224441711,Constructor injection,"Constructor injection The most common form of dependency injection is for a class to request its dependencies through its constructor. This ensures the client is always in a valid state, since it cannot be instantiated without its necessary dependencies. ```java // This class accepts a service in its constructor. Client(Service service) { // The client can verify its dependencies are valid before allowing construction. if (service == null) { throw new InvalidParameterException(""service must not be null""); } // Clients typically save a reference so other methods in the class can access it. this.service = service; } ```"
Dependency Injection.md,1676224441711,Pros,Pros 1.
Dependency Injection.md,1676224441711,Cons,Cons 1.
Dijkstra's Algorithm.md,1675630041868,---,"--- title: ""Dijkstra's Algorithm"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Dijkstra's Algorithm.md,1675630041868,Dijkstra's Algorithm,"Dijkstra's Algorithm [Shortest Path Problem](Notes/Shortest%20Path%20Problem.md) Find the shortest path from the source to all other vertices. > [!NOTE] Adding a positive constant to all edges > Dijkstra finds the shortest path in terms of total edge weight, not the number of edges; hence the shortest path may contain many edges. __Adding a constant penalizes paths with more edges more heavily, so the shortest path may change unless the competing paths have the same number of edges.__ > [!NOTE] Multiplying all edges by a positive constant > Shortest paths are preserved. Suppose P is the shortest path in G but some other path Q becomes shorter after scaling; dividing both by the constant leads to a contradiction, as Q is now a shorter path than P but P is the shortest path in G. Assumptions: **Weights must be nonnegative**"
Dijkstra's Algorithm.md,1675630041868,Data Structures Needed,Data Structures Needed
Dijkstra's Algorithm.md,1675630041868,Pseudocode,Pseudocode We can also obtain the number of distinct shortest paths by using an additional n-size array to store the counts of paths which have the same shortest distance:
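Dijkstra's Algorithm.md,1675630041868,Path counting sketch,"Path counting sketch A Python sketch of Dijkstra with the extra count array described above (graph representation assumed: a dict mapping each vertex to a list of (neighbour, weight) pairs):

```python
import heapq

def dijkstra_with_counts(graph, source):
    # dist[v] is the shortest distance from source; count[v] is the
    # number of distinct shortest paths from source to v.
    dist = {v: float('inf') for v in graph}
    count = {v: 0 for v in graph}
    dist[source] = 0
    count[source] = 1
    pq = [(0, source)]
    visited = set()
    while pq:
        d, u = heapq.heappop(pq)
        if u in visited:
            continue
        visited.add(u)
        for v, w in graph[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                count[v] = count[u]      # strictly shorter path found: reset
                heapq.heappush(pq, (dist[v], v))
            elif d + w == dist[v]:
                count[v] += count[u]     # another equally short path
    return dist, count
```

When a vertex is popped for the first time its distance and count are final, since all edge weights are nonnegative."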
Dijkstra's Algorithm.md,1675630041868,Proof of Correctness,Proof of Correctness Why the greedy choice is optimal: > [!important] > This step is the reason why graphs with negative weights do not ensure correctness of Dijkstra's algorithm.
Dijkstra's Algorithm.md,1675630041868,Examples,Examples Manually computing shortest path:
Depth First Search.md,1669012068654,---,"--- title: ""Depth First Search"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Depth First Search.md,1669012068654,Depth First Search,Depth First Search
Depth First Search.md,1669012068654,Graph Traversal,Graph Traversal _Assuming ties are handled in alphabetical order_ Expansion Order: A > B > C > E > F > G Final Path: A > B > C > E > F > G
Depth First Search.md,1669012068654,Pseudocode,"Pseudocode A recursive implementation of DFS:

procedure DFS(G, v) is
    label v as discovered
    for all directed edges from v to w that are in G.adjacentEdges(v) do
        if vertex w is not labeled as discovered then
            recursively call DFS(G, w)

A non-recursive implementation of DFS with worst-case space complexity O(E):

procedure DFS_iterative(G, v) is
    let S be a stack
    S.push(v)
    while S is not empty do
        v = S.pop()
        if v is not labeled as discovered then
            label v as discovered
            for all edges from v to w in G.adjacentEdges(v) do
                S.push(w)"
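Depth First Search.md,1669012068654,Runnable DFS sketch,"Runnable DFS sketch The iterative pseudocode above translates directly into Python (adjacency-list dict assumed; neighbours are pushed in reverse so ties expand in listed order, matching the alphabetical tie-breaking in the traversal example):

```python
def dfs_iterative(graph, start):
    # Stack-based DFS over a dict mapping vertex -> list of neighbours.
    # Returns vertices in the order they are first discovered.
    discovered = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v not in discovered:
            discovered.append(v)
            # Reverse so the first-listed neighbour is popped first.
            for w in reversed(graph[v]):
                stack.append(w)
    return discovered
```
"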
Default Logic.md,1669012068663,---,"--- title: ""Default Logic"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Default Logic.md,1669012068663,Default Logic,Default Logic
Default Logic.md,1669012068663,Definitions,Definitions
Default Logic.md,1669012068663,Reiter Extension,Reiter Extension
Default Logic.md,1669012068663,Makinson Approach,Makinson Approach
Default Logic.md,1669012068663,Process Tree Algorithm,"Process Tree Algorithm A __closed__ default is one that has been instantiated The In-set contains all the consequences from applying a default The Out-set contains all the negations of the justifications from applying a default: these are the predicates which must not be provable from the In-set for the extension to be consistent 1. Start with the root node: Out is initialized to $\emptyset$ while In is set to the current knowledge base 2. For every node, check for direct applicability of defaults (if no defaults are directly applicable, __we have arrived at a closed process__). Direct applicability must satisfy 2 conditions: 1. Default must be triggered: In-set contains the prerequisite 2. Default must be justified: negation of justifications cannot be proven true from the current In-set 3. If the new In-set becomes inconsistent ($In \cap Out \neq \emptyset$ or $In\cup \delta.consq \vdash Out$): __the process is unsuccessful__ __We arrive at an extension for every closed and successful process__."
Direct Memory Access.md,1669012068652,---,"--- title: ""Direct Memory Access"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Direct Memory Access.md,1669012068652,Direct Memory Access (DMA),"Direct Memory Access (DMA) Feature that allows certain hardware subsystems to access main system memory independently of the central processing unit (CPU). - Critically, it allows the CPU to be free during read/write operations to perform other work which does not involve the system bus. The OS is responsible for setting up the memory blocks and counters etc. required."
Direct Memory Access.md,1669012068652,Modes,Modes
Direct Memory Access.md,1669012068652,Burst,Burst DMA controller transfers multiple units of data before returning control. - Fast data transfer rate - CPU is inactive for longer periods as it must wait a long time for control of the data bus
Direct Memory Access.md,1669012068652,Cycle stealing,"Cycle stealing Release data bus after transferring 1 unit of data. Executes between CPU instructions and pipeline stages. - Slow transfer rate - CPU inactive time is very short, making it favourable for applications which need to be responsive"
Direct Memory Access.md,1669012068652,Transparent,Transparent Transfer data only when CPU is not using the data bus - Potentially slowest transfer rate as CPU could always be using the data bus - CPU basically has no inactive time as transfer only done when it is not using data bus - Complex to detect when CPU is not using the data bus
Dimensionality Reduction.md,1678720919980,---,"--- title: ""Dimensionality Reduction"" date: 2023-03-12 ---"
Dimensionality Reduction.md,1678720919980,Dimensionality Reduction,Dimensionality Reduction
Dimensionality Reduction.md,1678720919980,Principal Component Analysis,"Principal Component Analysis A method for dimensionality reduction, by finding the directions which account for the most variance in the data."
Dimensionality Reduction.md,1678720919980,A 2D example,"A 2D example Plot the data, with each axis being one of the variables. Center the plot around the origin. Find the best-fitting line which passes through the data. The best fit is the line which minimizes the sum of squared distances from the data points to the line. By the Pythagorean theorem, it is also the one which maximizes the sum of squared distances from the origin to the projected points. The best-fit line is Principal Component 1 (PC1). Since this is a 2D example, PC2 is simply the line through the origin perpendicular to PC1. Why? The principal components are eigenvectors of the (symmetric) covariance matrix, so they are mutually orthogonal. We can then remove components which account for less of the variation in the data."
Dimensionality Reduction.md,1678720919980,Linear Discriminant Analysis,"Linear Discriminant Analysis LDA is a method of dimensionality reduction, by finding a new axis (or set of axes) which maximizes the separation amongst the categories in data. When we have n-dimensions of data, LDA allows us to find the axes which can separate the categories the best."
Dimensionality Reduction.md,1678720919980,Subspace Methods,"Subspace Methods Samples in the same class are similar to each other. We can think of them as localized in a subspace spanned by a set of basis vectors. If we project the new test data onto this subspace, we can find the similarity of it to the class. One method to find similarity is to choose the subspace which maximizes the projection length:"
Disk.md,1669012068646,---,"--- title: ""Disk"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Disk.md,1669012068646,Disk,Disk
Disk.md,1669012068646,Disk Mechanics,"Disk Mechanics A disk is made up of multiple platters, each with a set of concentric tracks; the set of tracks at the same position across all platters forms a cylinder > [!NOTE] > Each platter consists of 2 surfaces on which data can be read/written Disk capacity calculation:"
Disk.md,1669012068646,Disk Access,Disk Access > [!IMPORTANT] > Data can only be accessed in units of blocks. Each block must be loaded from the disk into main memory. Only in main memory can we individually address each word.
Disk.md,1669012068646,Seek time,"Seek time Seek time depends on the number of cylinders the head must travel across. However, it is not linear, as the time taken also depends on the acceleration of the head."
Disk.md,1669012068646,Rotational Delay,"Rotational Delay $$t = \frac{angle}{rotation\ speed}$$ On average, the head must wait half a full rotation, so the average rotational delay is 0.5 times the time for one full rotation"
Disk.md,1669012068646,Transfer Time,Transfer Time $$t = \frac{block\ size}{transfer\ rate}$$
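Disk.md,1669012068646,Access time sketch,"Access time sketch Putting the three components together (the numbers below are made-up illustrations, not from the notes): average block access time = seek time + average rotational delay + transfer time.

```python
def block_access_time(seek_ms, rpm, block_bytes, transfer_bytes_per_ms):
    # Average rotational delay is half a full rotation.
    ms_per_rotation = 60000 / rpm
    rotational_ms = ms_per_rotation / 2
    transfer_ms = block_bytes / transfer_bytes_per_ms
    return seek_ms + rotational_ms + transfer_ms
```

E.g. a hypothetical 6000 RPM disk rotates once every 10 ms, giving a 5 ms average rotational delay."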
Disk.md,1669012068646,Random Disk Access,"Random Disk Access Average seek time: let i be the cylinder of the block just accessed and j be the cylinder of the block to be accessed, N be the total number of cylinders $$t = \frac{\sum_{i=1}^{N}\sum_{j=1}^{N}seektime(|i-j|)}{N^2}$$"
Disk.md,1669012068646,Sequential Disk Access,Sequential Disk Access Average seek time is approximately 0 as the block to be accessed is likely to be in same cylinder Average rotational delay is approximately 0 as the head points to the next block after current access
Disk.md,1669012068646,Disk Scheduling,Disk Scheduling
Disk.md,1669012068646,First Come First Serve,First Come First Serve
Disk.md,1669012068646,Shortest Seek Time First,Shortest Seek Time First Similar to [shortest job first](Notes/Process%20scheduling.md#Shortest%20Job%20First%20(SJF)). Selects the request with the minimum seek time from the current head position. It is susceptible to starvation.
Disk.md,1669012068646,Elevator / Scan,"Elevator / Scan Disk arm starts at one end of the disk, and moves toward the other end, servicing requests until it gets to the other end of the disk, where the head movement is reversed:"
Disk.md,1669012068646,C-Scan,"C-Scan Variant of elevator: after reversing direction, may not need to service requests immediately as more requests would be on the other end (uniform distribution)"
Disk.md,1669012068646,C-Look,"C-Look Rather than reversing only when reaching one end of the disk, reverse after servicing the last request in the current direction."
Disk.md,1669012068646,Comparison,"Comparison - SSTF is common and has a natural appeal - SCAN and C-SCAN (or LOOK and C-LOOK) perform better for systems that place a heavy load on the disk (since starvation is unlikely) - Performance depends on the number and types of requests - [File allocation methods](Notes/File%20Systems.md#Storage%20allocation) also affect the effectiveness of the algorithm. A linked or indexed file may generate requests that are wide apart. - All the discussed algorithms (except FCFS) leave the underlying issue of starvation unsolved. e.g. SCAN can be prevented from servicing the requests on the other end if new requests keep arriving at the same place."
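Disk.md,1669012068646,Scheduling comparison sketch,"Scheduling comparison sketch Head movement under the simpler policies can be compared with a short Python sketch (the request queue below is an illustrative example, not from the notes):

```python
def total_movement(start, requests, policy):
    # Total head movement in cylinders under FCFS or SSTF.
    head, moved, pending = start, 0, list(requests)
    while pending:
        if policy == 'fcfs':
            nxt = pending.pop(0)
        else:
            # sstf: service the closest pending request to the head.
            nxt = min(pending, key=lambda r: abs(r - head))
            pending.remove(nxt)
        moved += abs(nxt - head)
        head = nxt
    return moved
```

On the queue 98, 183, 37, 122, 14, 124, 65, 67 with the head at cylinder 53, FCFS moves 640 cylinders while SSTF moves 236; SSTF can still starve distant requests."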
Disk.md,1669012068646,Disk Management,Disk Management Formatting - Divide the disk into sectors which the controller can read and write Partitioning - separating the disk into 1 or more groups of cylinders Logical formatting - Making a new file system by creating data structures to support file access across different partitions
Disk.md,1669012068646,Disk Reliability,Disk Reliability
Disk.md,1669012068646,Striping,"Striping Uses a group of disks as one storage unit - Each block is broken into several sub-blocks, with one sub-block stored on each disk - Time to transfer a block into memory is faster because all sub-blocks are transferred in parallel"
Disk.md,1669012068646,Mirroring,Mirroring Keeps a duplicate of each disk by using 2 physical disks in 1 logical disk. If one fails data can still be read by the other.
Disk.md,1669012068646,Redundant Array of Independent Disks (RAID),"Redundant Array of Independent Disks (RAID) RAID 0: Striping RAID 1: Mirroring RAID 0+1: Mirror of Stripes RAID 1+0: Stripe of Mirrors - A difference occurs only when there are at least 6 disks involved. - RAID 10 has better fault tolerance: it stays functional even if disks 1, 3, 5 (one from each mirrored pair) are down - RAID 01 degrades to RAID 0 when any disk fails, i.e. it will only be able to read a file from one stripe group."
Disk.md,1669012068646,Storing relational data,Storing relational data
Disk.md,1669012068646,Fields to Record,Fields to Record
Disk.md,1669012068646,Record to Block,Record to Block There a a few considerations when storing a record into a block
Disk.md,1669012068646,Supporting record separation,Supporting record separation
Disk.md,1669012068646,Order of records,Order of records We can store records in the order of the primary key. Order can be maintained either physically (in memory) or logically (through a pointer)
Disk.md,1669012068646,Practice Problems,"Practice Problems a. 10 + 35 + 20 + 18 + 25 + 3 = 111 b. Order: 11->12->9->16->1->34->36 1+3+7+15+33+2 = 61 c. Order: 11->12->16->34->36->9->1 1+4+18+2+27+8 = 60 Total bytes per tuple = 8+17+1+4+4+4+1 = 39 Block contains meta data of 40bytes a. Total bytes without block meta data: $8\times 1024-40=8152$ Records: $8152\div 39=209.03$ 209 records can be stored b. Total bytes per tuple: 17 byte character string needs to pad additional 3 bytes: 20 byte 1 byte needs to pad additional 3 bytes: 4 byte $8+20+4+4+4+4+4=48$ bytes Records: $8152\div 48 = 169$ 169 Records c. Total bytes for block header: $10\times 8 = 80$ Total bytes per tuple: 17 byte character string needs to pad additional 7 bytes: 24 byte 1 byte needs to pad additional 7 bytes: 8 byte Record header: $2\times 8 + 8 = 24$ $24+8+24+8+8=72$ bytes Records: $(8192-80)\div 72 = 112$ 112 Records a. A “sector” is a physical unit of the disk and a “block” is a logical unit, a creation of whatever software system – operating systems or database systems, for example – is using the disk. As mentioned, it is typical today for blocks to be at least as large as sectors and to consist of one or more sectors. However, there is no reason why a block cannot be a fraction of a sector, with several blocks packed into one sector. In fact, some older systems did use this strategy. As the block size increases, the number of blocks to be accessed for a relational table decreases, but the transfer time per block increases. b. One block consists of multiple sectors. If these sectors are not contiguous, the transfer incurs extra rotational delay (which depends on the RPM) for the head to reach each sector. a. Capacity: $8\times2^{13}\times2^8\times2^9=2^{33}$bytes = 8GB b.
1 round around the track is 256 sectors and 256 gaps, and can be completed in $\frac{1}{3840}min$ or 1/64 seconds To navigate 1 sector and 1 gap: $\frac{1}{64\times256}s=0.061ms$ Min time for 1 sector and 0 gaps: $0.061\times0.9=0.0549ms$ Min time for 8 sectors and 7 gaps: 0.482ms Max rotational delay occurs when we need to traverse 256-8 sectors and gaps to find the block Max cylinder access occurs when we need to traverse all the tracks Max time: $0.061\times248+0.482+17.4=33.01ms$ c. There are 8192 cylinders. If access is on cylinder 1000: block access time = average rotational delay If access is on cylinder 1001: block access time = 1 track seek time + avg rotational delay Average cylinder access: (1000+999...+0+1+...+7192)/8192 = 3218 Average cylinder access time: $3218/500+1=7.44ms$ Average rotational delay: $\frac{1}{64\times2}s=7.8ms$ Average total block access time: 15.24ms"
Distributed Abstractions.md,1681234663587,---,"--- title: ""Distributed Abstractions"" date: 2023-01-31 lastmod: 2023-03-09 ---"
Distributed Abstractions.md,1681234663587,Distributed Abstractions,Distributed Abstractions The basic building blocks of any distributed system is a set of distributed algorithms. which are implemented as a middleware between network (OS) and the application.
Distributed Abstractions.md,1681234663587,Event based Component Model,"Event based Component Model The distributed computing model consists of a set of processes and a network. Events can be used as messages between components of the same process, which trigger event handlers. Types of events - Requests: incoming to component - Indications: outgoing from component"
Distributed Abstractions.md,1681234663587,Specifications,"Specifications A distributed system specification includes the interface, correctness properties and model/assumptions."
Distributed Abstractions.md,1681234663587,Interface,"Interface This specifies the API, importantly, the requests and events of the service"
Distributed Abstractions.md,1681234663587,Correctness Properties,Correctness Properties Any trace property can be expressed as a *conjunction* of [Safety and Liveliness](Notes/Safety%20and%20Liveliness.md) properties.
Distributed Abstractions.md,1681234663587,Model/Assumptions,Model/Assumptions
Distributed Abstractions.md,1681234663587,Failure assumptions,Failure assumptions Processes that do not fail in an execution are **correct**.
Distributed Abstractions.md,1681234663587,Crash stop failure,Crash stop failure Process is not correct if it stops taking steps like sending and receiving messages.
Distributed Abstractions.md,1681234663587,Omission failure,Omission failure Process is not correct if it omits sending or receiving messages - Send omission: not sending messages according to algorithm - Receive omission: not receiving messages that have been sent to the process
Distributed Abstractions.md,1681234663587,Crash recovery,"Crash recovery Process is not correct if it crashes and - never recover, or - recovers an infinite number of times May recover after crashing with a special recovery event automatically generated"
Distributed Abstractions.md,1681234663587,Byzantine,"Byzantine Process behaves arbitrarily, such as sending messages not in its algorithm, or behaving maliciously to attack the system. Model B is a special case of model A if a process that works correctly under A also works correctly under B. - Crash is a special case of omission where all messages are omitted. - Omission is a special case of crash-recovery, as the process recovers but does not restore state - Omission is equivalent to crash-recovery with only volatile memory: state lost in a crash cannot be restored, so some messages around the crash are effectively omitted"
Distributed Abstractions.md,1681234663587,Quorums,Quorums A quorum is any set of majority processes (i.e. $\lfloor N/2\rfloor+1$) - Two quorums always intersect in at least 1 process - There is at least 1 quorum with only correct processes - There is at least 1 correct process in each quorum
Distributed Abstractions.md,1681234663587,Channel Failure Modes,Channel Failure Modes
Distributed Abstractions.md,1681234663587,Fair loss links,Fair loss links The channel delivers any message sent with non-zero probability (no network partitions)
Distributed Abstractions.md,1681234663587,Stubborn links,Stubborn links The channel delivers any message sent infinitely many times We can implement stubborn links using fair loss links - sender stores every message it sends in *sent* - periodically resends all messages in *sent*
Distributed Abstractions.md,1681234663587,Perfect Links,Perfect Links Channels that deliver any message sent exactly once.
Distributed Abstractions.md,1681234663587,Timing assumptions,Timing assumptions Processes: bounds on time to make a computation step Network: bounds on time to transmit a message between a sender and a receiver Clocks: lower and upper bounds on clock rate-drift and clock skew w.r.t. real time Asynchronous systems: no timing assumptions on processes and channels Partially synchronous systems: eventually every execution will exhibit synchrony Synchronous systems: built on solid timed operations and clocks
Distributed Abstractions.md,1681234663587,Causality,"Causality In the asynchronous model, we can only reason about the order of events by observing which events may cause other events."
Distributed Abstractions.md,1681234663587,Computation Theorem and equivalence,Computation Theorem and equivalence A permutation of the same collection events whilst preserving causal order are said to be equivalent.
Distributed Abstractions.md,1681234663587,Logical Clocks,"Logical Clocks A logical clock is an algorithm that assigns a timestamp to each event occurring in a distributed system. $$a\rightarrow b \implies t(a)<t(b)$$"
Distributed Abstractions.md,1681234663587,Lamport clocks:,"Lamport clocks: - Note that this does not mean that $t(a)<t(b) \implies a\rightarrow b$. A smaller timestamp does not necessarily mean the events are causally related To obtain a total order, events with the same timestamp on different processes must be further distinguished (e.g. by process ID)."
Distributed Abstractions.md,1681234663587,Vector clocks,"Vector clocks We want to tell the causal relation using the timestamps. $$ \begin{align} v(a)<v(b) &\implies a\rightarrow_\beta b \\ a\rightarrow_\beta b &\implies v(a)<v(b) \end{align} $$ Limitations - Vectors need to be of size n (one entry per process) - cannot provide total event ordering"
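Distributed Abstractions.md,1681234663587,Vector clock sketch,"Vector clock sketch A minimal Python sketch of the vector clock rules (function names are illustrative, not from the notes): increment your own entry on a local event; on receipt, take the elementwise max with the received vector, then increment your own entry.

```python
def vc_event(v, i):
    # Local event at process i: increment own entry.
    v = list(v)
    v[i] += 1
    return v

def vc_merge(v, received, i):
    # Message receipt at process i: elementwise max, then increment.
    merged = [max(a, b) for a, b in zip(v, received)]
    merged[i] += 1
    return merged

def vc_before(a, b):
    # a causally precedes b iff a <= b elementwise and a != b.
    return all(x <= y for x, y in zip(a, b)) and a != b
```

Two events whose vectors are incomparable (neither vc_before the other) are concurrent."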
Distributed Abstractions.md,1681234663587,Similarity,"Similarity If two executions F and E have the same collection of events, and their causal order is preserved, F and E are said to be similar executions, written `F~E`"
Distributed Abstractions.md,1681234663587,[[Failure Detectors]],[[Failure Detectors]]
Distributed Data Management.md,1678370761712,---,"--- title: ""Distributed Data Management"" date: 2023-03-09 ---"
Distributed Data Management.md,1678370761712,Distributed Data Management,Distributed Data Management
Distributed Data Management.md,1678370761712,Distributed Transactions,Distributed Transactions [Transaction Management](Notes/Transaction%20Management.md) for distributed systems. All shards should either commit/abort the same transaction.
Distributed Data Management.md,1678370761712,Atomic Commit,"Atomic Commit The de-facto protocol for atomic commit is *two-phase commit (2PC)* Approach: use 1 process as the *coordinator* (leader). Given a proposed transaction T, commit if all followers agree to commit. Abort if at least one follower aborts/fails. Problem: if a process fails after the coordinator has made its decision, it will be unable to apply the changes locally in its shard. - Build a more reliable system with a replicated state machine within the shard; the replicated log provides fault tolerance Problem: failure of the whole replica cluster will cause loss of the entire log - Perform replication across different clusters, with a replica of each shard in each data centre Problem: the coordinator is a single point of failure - Replicate the coordinator in the same way for fault tolerance. The second phase of 2PC is no longer needed, as each shard can read the decision (abort/commit) from the local log without an additional broadcast."
Distributed Data Management.md,1678370761712,Distributed Snapshotting,Distributed Snapshotting Capturing the global state of a distributed system.
Distributed Data Management.md,1678370761712,Consistent Cuts,Consistent Cuts Properties: 1. Termination: eventually every process records its state 2. Validity: all recorded states correspond to a consistent cut
Distributed Data Management.md,1678370761712,Chandy Lamport Algorithm,Chandy Lamport Algorithm Approach: disseminate a special marker to mark events during the cut. - Channels and processes record state (add to snapshot) until the marker has been received. E.g. channel incoming to p2 continuously records messages until a marker is passed through it - Snapshot is complete once everyone has seen the marker.
Distributed Data Management.md,1678370761712,Epoch-based Snapshotting,"Epoch-based Snapshotting For continuous data stream processing, it is difficult to log individual task executions. Approach: divide computations into epochs, such as stages, and treat them as 1 transaction. The Chandy Lamport algorithm is not enough, as it will capture a lot of in-flight messages. We want to capture just the states which would in itself reflect the effect of these messages. This is done by *epoch alignment*: 1. Allow all messages to go through until an epoch change marker is introduced 2. On receiving the marker, log the state 3. When a process receiving the marker has multiple channels, prioritise inputs from channels which have not seen the marker until they all see the marker. 4. Terminate once all processes seen the marker."
Dynamic Loading.md,1669012068649,---,"--- title: ""Dynamic Loading"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Dynamic Loading.md,1669012068649,Dynamic Loading,"Dynamic Loading Mechanism for loading a binary and execute functions from external software. Allows program to start up in the absence of these libraries, to discover available libraries, and to potentially gain additional functionality. ```java //java reflection API Class type = Class.forName(name); Object obj = type.newInstance(); ```"
Ensemble Learning.md,1678875778849,---,"--- title: ""Ensemble Learning"" date: 2023-03-02 ---"
Ensemble Learning.md,1678875778849,Ensemble Learning,Ensemble Learning The idea behind ensemble learning is to combine independent and diverse classifiers (high variance low bias) with the hopes of obtaining better predictions i.e. *the variance of the overall model is reduced.*
Ensemble Learning.md,1678875778849,Bootstrap Aggregating (Bagging),"Bootstrap Aggregating (Bagging) Use replicates of the training set, produced by sampling with replacement, to train each model. Combine $B$ such models by running the test data through each one. The classification which receives the most ""votes"" from the replicates is the final prediction."
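Ensemble Learning.md,1678875778849,Bagging sketch,"Bagging sketch A small Python sketch of the two bagging steps (names are illustrative; each model here is just any callable classifier):

```python
import random
from collections import Counter

def bootstrap(data, rng):
    # One bootstrap replicate: sample len(data) points with replacement.
    return [rng.choice(data) for _ in data]

def bagged_predict(models, x):
    # Majority vote over the ensemble members' predictions.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]
```

Each of the $B$ models is trained on its own bootstrap replicate; at test time the replicates vote."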
Ensemble Learning.md,1678875778849,Random Forest,"Random Forest For use in [Decision Trees](Notes/Decision%20Trees.md). Bagging + random feature selection: at every node, a random subset of features is considered when choosing the split."
Ensemble Learning.md,1678875778849,Estimating accuracy,"Estimating accuracy Out-of-bag error: about 1/3 of the training set is left out of each bootstrap replicate. These held-out data points can be used to estimate the accuracy of the random forest."
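Ensemble Learning.md,1678875778849,Why about 1/3,"Why about 1/3 The 1/3 figure follows from sampling with replacement: the chance that a fixed sample is never drawn in $m$ draws is $(1-1/m)^m$, which converges to $e^{-1}\approx 0.37$:

```python
import math

def oob_fraction(m):
    # Probability that a fixed sample never appears in a bootstrap
    # replicate of size m drawn with replacement.
    return (1 - 1 / m) ** m

val = oob_fraction(1000)  # close to 1/e, i.e. roughly a third
```
"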
Ensemble Learning.md,1678875778849,Boosting,Boosting Use the data to train a set of weak learners/classifiers. Combine them to create a single strong classifier.
Ensemble Learning.md,1678875778849,Adaboost,"Adaboost Let the sample data $S = \{(x_1,y_1),...,(x_m,y_m)\}$ 1. Each sample in the training data is given the same weight: $w_i = \frac{1}{m}$ 2. Train a weak classifier (for example a *stump*, a decision tree with only 1 fork) using $S$ and the weights. Choose the weak classifier $h$ which minimises the training error. 3. Compute the **reliability coefficient** $a=\ln\left(\frac{1-error}{error}\right)$ for the chosen weak classifier. This can be seen as the *amount of say* of this classifier. 4. Use the reliability to scale the weights of each training sample: $w_{t+1}=w_{t}\times e^{-a_t y h_t(x)}$. A misclassification causes the weight to increase, and vice versa. 5. Place more emphasis on correctly classifying the higher-weighted samples, e.g. by randomly sampling based on a probability distribution over the weights (higher weight means more likely to be chosen)."
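Ensemble Learning.md,1678875778849,Adaboost weight update sketch,"Adaboost weight update sketch Steps 3 and 4 can be traced numerically (the 4-sample example is invented, and the weights are renormalised to sum to 1, a standard step the note leaves implicit):

```python
import math

def reliability(error):
    # Amount of say: a = ln((1 - error) / error).
    return math.log((1 - error) / error)

def update_weights(weights, correct, a):
    # Misclassified samples are scaled by e^{+a}, correct ones by e^{-a},
    # then the weights are renormalised to sum to 1.
    raw = [w * math.exp(-a if ok else a) for w, ok in zip(weights, correct)]
    total = sum(raw)
    return [w / total for w in raw]

w = [0.25] * 4                     # step 1: uniform weights
a = reliability(0.25)              # the stump misclassifies 1 of 4 samples
w = update_weights(w, [True, True, True, False], a)
```

The misclassified sample ends up carrying weight 3/4, so the next round focuses on it."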
DNS.md,1675798749829,---,"--- title: ""DNS"" date: 2023-01-18 ---"
DNS.md,1675798749829,DNS,"DNS Domain Name System is a distributed database implemented in a hierarchy of DNS servers, and an application layer protocol that allows hosts to query the database."
DNS.md,1675798749829,Features,Features
DNS.md,1675798749829,Address Translation,"Address Translation To identify a host, people prefer the mnemonic hostname identifier such as google.com. However, routers prefer fixed length hierarchically structured IP addresses. DNS's job is to provide the translation between these two references. DNS being employed by HTTP: 1. The same user machine runs the client side of the DNS application. 2. The browser extracts the hostname, www.someschool.edu, from the URL and passes the hostname to the client side of the DNS application. 3. The DNS client sends a query containing the hostname to a DNS server. 4. The DNS client eventually receives a reply, which includes the IP address for the hostname. 5. Once the browser receives the IP address from DNS, it can initiate a TCP connection to the HTTP server process located at port 80 at that IP address"
DNS.md,1675798749829,Host aliasing,"Host aliasing DNS can be invoked by an application to obtain the **canonical hostname** for a supplied alias hostname. A host with a complicated hostname can have one or more alias names. For example, a hostname such as relay1.west-coast.enterprise.com could have two aliases such as enterprise.com and www.enterprise.com. The hostname relay1.west-coast.enterprise.com is said to be a canonical hostname."
DNS.md,1675798749829,Load distribution,"Load distribution Busy sites can have different servers all replicating the same content, each having their own IP address. DNS stores the entire set of addresses, but is able to rotate their order with each reply. Because the client sends its HTTP request message to the first IP address, this performs load distribution."
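DNS.md,1675798749829,Rotation sketch,"Rotation sketch The rotation behaviour can be modelled in a few lines (the address set is made up):

```python
from collections import deque

class RotatingDNSReply:
    # Toy model: the server stores the whole address set and rotates
    # the order with each reply.
    def __init__(self, addresses):
        self.addresses = deque(addresses)

    def reply(self):
        answer = list(self.addresses)
        self.addresses.rotate(-1)  # next reply leads with the next address
        return answer

dns = RotatingDNSReply(['10.0.0.1', '10.0.0.2', '10.0.0.3'])
first = dns.reply()[0]   # this client connects to 10.0.0.1
second = dns.reply()[0]  # the next client is steered to 10.0.0.2
```

Because each client uses the first address in the reply, successive clients are spread across the replicas."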
DNS.md,1675798749829,How it works,"How it works Rather than having one central DNS server, which would not scale, DNS servers are distributed and organized in a hierarchical structure: Root -> Top Level Domain -> Authoritative. - Root: provides the IP addresses of TLD servers. There are over 400 root name server instances scattered across the world - TLD: com, org, net etc. and all country TLDs (uk, sg etc.), maintained by companies and countries - Authoritative: houses the DNS records mapping an organization's hosts to IP addresses, e.g. amazon.com. Can be run in house or outsourced to a service provider - Local: close to the host; acts as a proxy, forwarding queries into the DNS server hierarchy > [!Note] > A TLD server may not directly know the authoritative server's address, but rather some other intermediate server. In that case, two more DNS messages are required Query 1 is a recursive query, as it asks the local DNS server to obtain the mapping on the host's behalf. Subsequent queries are iterative, as all replies are returned directly to the local DNS server. This is the more typical scenario. There are also queries which are all recursive:"
DNS.md,1675798749829,DNS Caching,"DNS Caching In a query chain, when a DNS server receives a DNS reply (containing, for example, a mapping from a hostname to an IP address), it can cache the mapping in its local memory. This DNS server can provide the desired IP address, even if it is not authoritative for the hostname. Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days)."
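DNS.md,1675798749829,Caching sketch,"Caching sketch A minimal TTL cache captures the discard-after-a-period behaviour (the injectable clock and two-day TTL are illustrative choices):

```python
class DNSCache:
    # Entries expire ttl seconds after insertion; expired entries are
    # dropped so a fresh lookup must walk the server hierarchy again.
    def __init__(self, ttl, clock):
        self.ttl, self.clock, self.store = ttl, clock, {}

    def put(self, hostname, ip):
        self.store[hostname] = (ip, self.clock() + self.ttl)

    def get(self, hostname):
        entry = self.store.get(hostname)
        if entry is None:
            return None
        ip, expires = entry
        if self.clock() >= expires:
            del self.store[hostname]  # stale mapping
            return None
        return ip

now = [0]  # fake clock keeps the sketch deterministic
cache = DNSCache(ttl=2 * 24 * 3600, clock=lambda: now[0])
cache.put('www.someschool.edu', '203.0.113.7')
hit = cache.get('www.someschool.edu')
now[0] = 2 * 24 * 3600 + 1  # two days pass
miss = cache.get('www.someschool.edu')
```
"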
DNS.md,1675798749829,DNS Records,DNS Records
DNS.md,1675798749829,Inserting Records,"Inserting Records A registrar is a commercial entity that verifies the uniqueness of the domain name, enters the domain name into the DNS database and collects a small fee. Example: registering the domain name `networkutopia.com` inserts two records: `(networkutopia.com, dns1.networkutopia.com, NS)` and `(dns1.networkutopia.com, 212.212.212.1, A)`"
DNS.md,1675798749829,DNS attacks,DNS attacks
DNS.md,1675798749829,Exercises,"Exercises a. 1. Host sends request to local DNS server 2. Local DNS makes a query to the root DNS server 3. Root DNS returns the Top level domain DNS server for ""com"" 4. Local DNS makes query to TLD 5. TLD returns the authoritative name server for ""fws.com"" 6. Local DNS makes query to DNS server for ""fws.com"" 7. Authoritative DNS returns the IP address for ""punchy.fws.com"" 8. Local DNS returns this IP address to the host b. Query 1 is recursive. The rest are iterative c. Yes"
Distributed Hash Table.md,1674147693577,---,"--- title: ""Distributed Hash Table"" date: 2023-01-19 ---"
Distributed Hash Table.md,1674147693577,Distributed Hash Table,"Distributed Hash Table A DHT is a distributed P2P database. - Each entry is a key-value pair (host, IP address). - A peer queries the DHT with key, and it returns the value matching that key - Peers can also insert pairs For example, it is used in [BitTorrent's distributed tracker](Notes/BitTorrent.md), where the key is a torrent identifier and the value is the set of IP addresses in the torrent."
Distributed Hash Table.md,1674147693577,Hash Table,Hash Table
Distributed Hash Table.md,1674147693577,Distributed (assigning keys to peers),"Distributed (assigning keys to peers) Pairs are evenly distributed among peers, with each peer only knowing a small number of other peers. To resolve a query, a small number of messages are exchanged to obtain the value."
Distributed Hash Table.md,1674147693577,Circular DHT,Circular DHT Each peer is only aware of its immediate successor and predecessor. Average of $N/2$ messages needed.
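Distributed Hash Table.md,1674147693577,Ring lookup cost,"Ring lookup cost The N/2 average can be checked by simulating a lookup that walks the ring one successor at a time (peer IDs 0..n-1 are a simplification):

```python
def lookup_hops(n, start, key_owner):
    # Each peer only knows its immediate successor, so the query is
    # forwarded around the ring until it reaches the key's owner.
    hops, current = 0, start
    while current != key_owner:
        current = (current + 1) % n
        hops += 1
    return hops

n = 16
avg = sum(lookup_hops(n, 0, k) for k in range(n)) / n  # (n-1)/2 = 7.5
```

Averaged over all key positions this is (n-1)/2, i.e. roughly N/2 messages."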
Distributed Hash Table.md,1674147693577,Shortcuts,Shortcuts
Distributed Hash Table.md,1674147693577,Peer churn,Peer churn
Distributed Shared Memory.md,1678363833985,---,"--- title: ""Distributed Shared Memory"" date: 2023-02-08 ---"
Distributed Shared Memory.md,1678363833985,Distributed Shared Memory,Distributed Shared Memory
Distributed Shared Memory.md,1678363833985,Registers,"Registers A register represents each memory location. The operations are: - write(r,v): update the value of register $x_r$ to v - read(r): return the current value of register $x_r$"
Distributed Shared Memory.md,1678363833985,Trace,"Trace A trace is a sequence of events - $r-inv_i(r)$, $r-res_i(v)$: read invocation by process $p_i$ on register $x_r$ and the corresponding response with value $v$ - $w-inv_i(r,v)$, $w-res_i$: write invocation by process $p_i$ on register $x_r$ with value $v$ and the corresponding response (an acknowledgement)"
Distributed Shared Memory.md,1678363833985,Properties,"Properties Well-formed - First event of every process is an invocation - Each process alternates between invocations and responses Sequential - 𝑥-inv by i immediately followed by a corresponding 𝑥-res at i - 𝑥-res by i immediately follows a corresponding 𝑥-inv by i, i.e. no concurrency, read x by p1, write y by p5, ... Legal - T is sequential - Each read to Xr returns last value written to register Xr Complete - Every operation is complete - Otherwise T is partial - An operation O of a trace T is - complete if both invocation & response occurred in T - pending if O invoked, but no response Precedence - op1 precedes op2 in a trace T if (denoted <T) - Response of op1 precedes invocation of op2 in T - op1 and op2 are concurrent if neither precedes the other"
Distributed Shared Memory.md,1678363833985,Regular Register Algorithms,"Regular Register Algorithms A regular register is one that meets the following criteria: Termination - Each read/write operation issued by a correct process eventually completes. Validity - Read returns last value written if - Not concurrent with another write, and - Not concurrent with a failed write - Otherwise may return last or concurrent “value”"
Distributed Shared Memory.md,1678363833985,Fail-Stop Read-one Write-All,"Fail-Stop Read-one Write-All Uses the [perfect failure detector P](Notes/Failure%20Detectors.md#Perfect%20failure%20detector): write(v) 1. Update local value to v 2. [Fail Stop Broadcast](Notes/Broadcast%20Abstractions.md#Fail%20Stop) v to all, and each node locally updates to v: 1 RTT needed 3. Wait for ACK from all correct processes 4. Return: this return means that all correct processes have updated locally to v, so validity is ensured read 1. Return local value: 0 RTT needed An [eventually perfect failure detector](Notes/Failure%20Detectors.md#Eventually%20perfect%20failure%20detector) will not work here, as it might falsely suspect some processes as having crashed. During this time, since a write on another process only waits for ACKs from unsuspected processes, it could return early. A read on the falsely suspected process would then incorrectly return the old value."
Distributed Shared Memory.md,1678363833985,Fail silent Majority voting,"Fail silent Majority voting Make use of timestamp-value pairs, tvp = (ts, v), where the timestamp determines which value is more recent. Each process stores, for each register r, the value of r with the maximum timestamp. The majority idea is based on [quorums](Notes/Distributed%20Abstractions.md#Quorums). - Read and write operations contact a quorum; any two quorums intersect, so at least 1 process in the quorum knows the most recent value - Write: 1 RTT needed - Read: 1 RTT needed"
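Distributed Shared Memory.md,1678363833985,Quorum read sketch,"Quorum read sketch The timestamp-value idea can be shown in one function (replica replies are mocked as plain tuples):

```python
def read_majority(quorum_replies):
    # Each reply is a (timestamp, value) pair from one replica. Any two
    # majority quorums intersect, so at least one reply carries the
    # timestamp of the last completed write; the max timestamp wins.
    return max(quorum_replies)[1]

# Replies from 3 of 5 replicas (a majority); one replica is stale.
value = read_majority([(2, 'v2'), (1, 'v1'), (2, 'v2')])
```
"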
Distributed Shared Memory.md,1678363833985,Sequential Consistency,"Sequential Consistency Allows executions whose results appear as if the operations of each process were executed in some sequential order consistent with ""local time"" (we can reorder operations across processes but not within a process):"
Distributed Shared Memory.md,1678363833985,Liveness requirements,"Liveness requirements - Wait-free: no deadlocks, no livelocks, no starvation - Lock-free: no deadlock, no livelocks, maybe starvation - Obstruction-free: no deadlock, maybe livelocks and starvation"
Distributed Shared Memory.md,1678363833985,Register Linearizability/Atomicity,"Register Linearizability/Atomicity Allows executions whose results appear as if the operations of each process were executed in some sequential order consistent with ""global time"" (no reordering):"
Distributed Shared Memory.md,1678363833985,Read/Write Majority Problem,Read/Write Majority Problem
Distributed Shared Memory.md,1678363833985,Read-Impose Write Majority,Read-Impose Write Majority
Distributed Shared Memory.md,1678363833985,Extending to N readers N writers (Read-impose Write-consult-majority),"Extending to N readers N writers (Read-impose Write-consult-majority) Problem: with multiple writers, a writer does not know the latest timestamp. Before writing, read from a majority to get the latest timestamp (a query phase before the update phase): 2 RTT needed"
Distributed Shared Memory.md,1678363833985,Eventual Consistency,"Eventual Consistency State updates can be issued at any replica/correct process. All updates are disseminated via BEB, RB,... - Each correct process that receives all updates should deterministically converge to the same state. - Eventually every correct process should receive all updates... - Problem: When can a process know it has received all updates??"
Distributed Shared Memory.md,1678363833985,Strong Eventual Consistency,"Strong Eventual Consistency If state operations are **commutative** and processes exchange information, eventually they converge to an identical view."
Distributed Shared Memory.md,1678363833985,Conflict Free Replicated Data Types (CRDTs),"Conflict Free Replicated Data Types (CRDTs) Data structures which implement strong eventual consistency. The join operation gives a commutative way to merge replica states. In return, operations must have a strictly monotonically increasing effect on the state."
Distributed Shared Memory.md,1678363833985,State Based CRDT (CvRDT),State Based CRDT (CvRDT)
Distributed Shared Memory.md,1678363833985,Grow-Only Counter,Grow-Only Counter
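Distributed Shared Memory.md,1678363833985,Grow-Only Counter sketch,"Grow-Only Counter sketch A minimal state-based G-Counter (replica count and IDs are illustrative):

```python
class GCounter:
    # One monotonically growing slot per replica; the join is an
    # elementwise max, which is commutative, associative and idempotent.
    def __init__(self, n, i):
        self.slots, self.i = [0] * n, i

    def increment(self):
        self.slots[self.i] += 1

    def value(self):
        return sum(self.slots)

    def join(self, other):
        self.slots = [max(a, b) for a, b in zip(self.slots, other.slots)]

a, b = GCounter(2, 0), GCounter(2, 1)
a.increment(); a.increment(); b.increment()
a.join(b); b.join(a)  # exchange in either order converges
```
"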
Distributed Shared Memory.md,1678363833985,Up-Down Counter,Up-Down Counter
Distributed Shared Memory.md,1678363833985,Or-Set,Or-Set
Distributed Shared Memory.md,1678363833985,Operation Based CRDTs (CmRDTs),"Operation Based CRDTs (CmRDTs) CmRDTs impose stricter delivery assumptions. Updates are disseminated with [Causal Broadcast](Notes/Broadcast%20Abstractions.md#Causal%20Broadcast), and the join function is replaced with any commutative update function. - Less state and IO required (only the operations are broadcast) - More restrictions on the programming model, hence less flexibility"
Distributed Shared Memory.md,1678363833985,Or-Set,Or-Set
Event-B.md,1681336639744,---,"--- title: ""Event-B"" date: 2023-03-28 ---"
Event-B.md,1681336639744,Event-B,Event-B A formal specification framework based on [Set Theory](Notes/Set%20Theory.md).
Event-B.md,1681336639744,Abstract Machine Notation,Abstract Machine Notation
Event-B.md,1681336639744,Syntax,Syntax
Event-B.md,1681336639744,Context,Context
Event-B.md,1681336639744,Machine,Machine
Event-B.md,1681336639744,Events,Events
Event-B.md,1681336639744,Actions,Actions
Event-B.md,1681336639744,Examples,Examples
Event-B.md,1681336639744,University Access,"University Access A system for controlling access to a university building - A university has some fixed number of students. - Students can be inside or outside the university building. - The system should allow a new student to be registered in order to get access to the university building. - To deny a student access to the building, the system should support deregistration. - The system should allow only registered students to enter the university building."
Event-B.md,1681336639744,Coffee Club,Coffee Club
Event-B.md,1681336639744,Printer Access,Printer Access - A system should support adding a permission for a student in order to get an access to a particular printer and removing a permission. - A system should support removing a student’s access to all printers at once. - A system should support giving the combined permissions of any two students to both of them.
Event-B.md,1681336639744,Requirements Document,Requirements Document
Event-B.md,1681336639744,Modelling,"Modelling - To keep track of changing permissions, it will make use of a variable access whose type is a relation between STUDENTS and PRINTERS."
Event-B.md,1681336639744,New requirement: a student can use no more than 3 printers,New requirement: a student can use no more than 3 printers
Event-B.md,1681336639744,Seat Booking System,Seat Booking System
Factory Pattern.md,1676224459139,---,"--- title: ""Factory Pattern"" date: 2022-11-08 lastmod: 2023-02-12 ---"
Factory Pattern.md,1676224459139,Factory Pattern,Factory Pattern
Factory Pattern.md,1676224459139,Problems we want to solve,Problems we want to solve 1. Decouple class selection and object creation from the place where the object is used. 2. Need to instantiate a set of classes but without knowing exactly which one until runtime. 3. Do not want to expose object creation logic to the client. The interface and concrete product classes implement an additional [Strategy Pattern](Notes/Strategy%20Pattern.md) design which allows the algorithms to be instantiated and changed during runtime.
Factory Pattern.md,1676224459139,Pros,Pros 1. Encapsulation of object creation 2. Extensibility of classes 3. Can easily change object creation logic without affecting context due to decoupling
Factory Pattern.md,1676224459139,Cons,Cons 1. Complexity
Exceptions.md,1688266255058,---,"--- title: ""Exceptions"" date: 2023-07-01 ---"
Exceptions.md,1688266255058,Exceptions,"Exceptions > [!Traps] A trap is a CPU generated interrupt caused by a software error or a request: > - unhandled exceptions in a program, used to transfer control back to the [OS](2005%20Operating%20Systems.md) > - user programs requesting execution of system calls which need the OS CPU exceptions occur in various erroneous situations, for example, when accessing an invalid memory address or when dividing by zero. To react to them, we have to set up an interrupt descriptor table that provides handler functions. On x86, there are about 20 different CPU exception types. The most important are: - Page Fault: A page fault occurs on illegal memory accesses. For example, if the current instruction tries to read from an unmapped page or tries to write to a read-only page. - Invalid Opcode: This exception occurs when the current instruction is invalid, for example, when we try to use new SSE instructions on an old CPU that does not support them. - General Protection Fault: This is the exception with the broadest range of causes. It occurs on various kinds of access violations, such as trying to execute a privileged instruction in user-level code or writing reserved fields in configuration registers. - Double Fault: When an exception occurs, the CPU tries to call the corresponding handler function. If another exception occurs while calling the exception handler, the CPU raises a double fault exception. This exception also occurs when there is no handler function registered for an exception. - Triple Fault: If an exception occurs while the CPU tries to call the double fault handler function, it issues a fatal triple fault. We can’t catch or handle a triple fault. Most processors react by resetting themselves and rebooting the operating system."
Exceptions.md,1688266255058,Interrupt Descriptor Table,"Interrupt Descriptor Table The protected-mode counterpart to the [interrupt vector table](Notes/Interrupts.md#Interrupt%20Service%20Routine). Each entry holds the information needed to invoke the handler for one exception type. For example, the divide-by-zero handler goes at index 0."
Exceptions.md,1688266255058,Interrupt Calling Convention,"Interrupt Calling Convention Calling conventions specify the details of a function call. For example, they specify where function parameters are placed (e.g. in registers or on the stack) and how results are returned. On x86_64 Linux, the following rules apply for C functions (specified in the System V ABI): - the first six integer arguments are passed in registers rdi, rsi, rdx, rcx, r8, r9 - additional arguments are passed on the stack - results are returned in rax and rdx"
Exceptions.md,1688266255058,Preserved and Scratch Registers,"Preserved and Scratch Registers The values of preserved registers must remain unchanged across function calls. So a called function (the “callee”) is only allowed to overwrite these registers if it restores their original values before returning. Therefore, these registers are called “callee-saved”. A common pattern is to save these registers to the stack at the function’s beginning and restore them just before returning. In contrast, a called function is allowed to overwrite scratch registers without restrictions. If the caller wants to preserve the value of a scratch register across a function call, it needs to backup and restore it before the function call (e.g., by pushing it to the stack). So the scratch registers are caller-saved."
Exceptions.md,1688266255058,x86-interrupt convention Preserving All Registers,"x86-interrupt convention Preserving All Registers In contrast to function calls, exceptions can occur on any instruction. Since we don’t know when an exception occurs, we can’t back up any registers beforehand. This means we can’t use a calling convention that relies on caller-saved registers for exception handlers. The `x86-interrupt` calling convention therefore preserves all registers: it backs up every register the function overwrites on function entry and restores them before returning."
Exceptions.md,1688266255058,Interrupt Stack Frame,"Interrupt Stack Frame A normal function call stack frame (the return address is pushed to the stack to allow the CPU to return back to the caller): An interrupt stack frame: - **Saving the old stack pointer**: The CPU reads the stack pointer (`rsp`) and stack segment (`ss`) register values and remembers them in an internal buffer. - **Aligning the stack pointer**: An interrupt can occur at any instruction, so the stack pointer can have any value, too. However, some CPU instructions (e.g., some SSE instructions) require that the stack pointer be aligned on a 16-byte boundary, so the CPU performs such an alignment right after the interrupt. - **Switching stacks** (in some cases): A stack switch occurs when the CPU privilege level changes, for example, when a CPU exception occurs in a user-mode program. It is also possible to configure stack switches for specific interrupts using the so-called _Interrupt Stack Table_ (described in the next post). - **Pushing the old stack pointer**: The CPU pushes the `rsp` and `ss` values from step 0 to the stack. This makes it possible to restore the original stack pointer when returning from an interrupt handler. - **Pushing and updating the `RFLAGS` register**: The [`RFLAGS`](https://en.wikipedia.org/wiki/FLAGS_register) register contains various control and status bits. On interrupt entry, the CPU changes some bits and pushes the old value. - **Pushing the instruction pointer**: Before jumping to the interrupt handler function, the CPU pushes the instruction pointer (`rip`) and the code segment (`cs`). This is comparable to the return address push of a normal function call. - **Pushing an error code** (for some exceptions): For some specific exceptions, such as page faults, the CPU pushes an error code, which describes the cause of the exception. 
- **Invoking the interrupt handler**: The CPU reads the address and the segment descriptor of the interrupt handler function from the corresponding field in the IDT. It then invokes this handler by loading the values into the `rip` and `cs` registers"
Exceptions.md,1688266255058,Double Faults,"Double Faults A double fault is a special exception that occurs when the CPU fails to invoke an exception handler. For example, it occurs when a page fault is triggered but there is no page fault handler registered in the IDT. So it’s kind of similar to catch-all blocks in programming languages with exceptions. Only certain combinations of exceptions can trigger double faults: |First Exception|Second Exception| |---|---| |[Divide-by-zero](https://wiki.osdev.org/ExceptionsDivision_Error), <br>[Invalid TSS](https://wiki.osdev.org/ExceptionsInvalid_TSS), <br>[Segment Not Present](https://wiki.osdev.org/ExceptionsSegment_Not_Present), <br>[Stack-Segment Fault](https://wiki.osdev.org/ExceptionsStack-Segment_Fault), <br>[General Protection Fault](https://wiki.osdev.org/ExceptionsGeneral_Protection_Fault)|[Invalid TSS](https://wiki.osdev.org/ExceptionsInvalid_TSS), <br>[Segment Not Present](https://wiki.osdev.org/ExceptionsSegment_Not_Present), <br>[Stack-Segment Fault](https://wiki.osdev.org/ExceptionsStack-Segment_Fault), <br>[General Protection Fault](https://wiki.osdev.org/ExceptionsGeneral_Protection_Fault)| |[Page Fault](https://wiki.osdev.org/ExceptionsPage_Fault)|[Page Fault](https://wiki.osdev.org/ExceptionsPage_Fault), <br>[Invalid TSS](https://wiki.osdev.org/ExceptionsInvalid_TSS), <br>[Segment Not Present](https://wiki.osdev.org/ExceptionsSegment_Not_Present), <br>[Stack-Segment Fault](https://wiki.osdev.org/ExceptionsStack-Segment_Fault), <br>[General Protection Fault](https://wiki.osdev.org/ExceptionsGeneral_Protection_Fault)| A double fault must be handled properly, else some cases can easily transition into a triple fault causing a system reset. Kernel stack overflow is one of them."
Exceptions.md,1688266255058,Kernel Stack Overflow,"Kernel Stack Overflow What happens if our kernel overflows its stack and the guard page is hit? - A guard page is a special memory page at the bottom of a stack that makes it possible to detect stack overflows. The page is not mapped to any physical frame, so accessing it causes a page fault instead of silently corrupting other memory. The bootloader sets up a guard page for our kernel stack, so a stack overflow causes a page fault. - When a page fault occurs, the CPU looks up the page fault handler in the IDT and tries to push the interrupt stack frame onto the stack. However, the current stack pointer still points to the non-present guard page. Thus, a second page fault occurs, which causes a double fault (according to the above table). - So the CPU tries to call the double fault handler now. However, on a double fault, the CPU tries to push the exception stack frame, too. The stack pointer still points to the guard page, so a third page fault occurs, which causes a triple fault and a system reboot. So our current double fault handler can’t avoid a triple fault in this case."
Exponential Distribution.md,1678919951832,---,"--- title: ""Exponential Distribution"" date: 2023-03-15 ---"
Exponential Distribution.md,1678919951832,Exponential Distribution,"Exponential Distribution $$f(x)=\lambda e^{-\lambda x},\quad x\ge 0$$"
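Exponential Distribution.md,1678919951832,Density check,"Density check A quick numerical sanity check that the density integrates to 1 (the rate $\lambda=2$ and the crude Riemann sum are arbitrary choices):

```python
import math

def exp_pdf(x, lam):
    # Exponential density; zero for negative x.
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

lam, step = 2.0, 0.001
# Left Riemann sum over [0, 20); the tail beyond 20 is negligible.
area = sum(exp_pdf(k * step, lam) * step for k in range(20000))
```
"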
Façade Pattern.md,1676224453768,---,"--- title: ""Façade Pattern"" date: 2022-11-08 lastmod: 2023-02-12 ---"
Façade Pattern.md,1676224453768,Façade Pattern,Façade Pattern
Façade Pattern.md,1676224453768,Problems we want to solve,Problems we want to solve 1. Decoupling of object interactions 2. Open-closed principle 3. Least Knowledge principle
Façade Pattern.md,1676224453768,Pros,Pros 1. Decouples client from complex system logic 2. Reduces dependencies on classes: favour composition over inheritance
Façade Pattern.md,1676224453768,Cons,Cons 1. Complexity and possible rework
Failure Detectors.md,1678318308808,---,"--- title: ""Failure Detectors"" date: 2023-01-31 lastmod: 2023-03-08 ---"
Failure Detectors.md,1678318308808,Failure Detectors,Failure Detectors
Failure Detectors.md,1678318308808,Failure Models,"Failure Models - Fail-stop: can reliably detect failures (achievable in the synchronous model) - Fail-noisy: can eventually detect failures (achievable in the partially synchronous model) - Fail-silent: cannot distinguish a crash from an omission failure (asynchronous model)"
Failure Detectors.md,1678318308808,Properties,Properties
Failure Detectors.md,1678318308808,Completeness,Completeness No false negatives: all failed processes are suspected *Asynchrony: suspect every node to achieve completeness*
Failure Detectors.md,1678318308808,Strong completeness,Strong completeness
Failure Detectors.md,1678318308808,Weak completeness,Weak completeness
Failure Detectors.md,1678318308808,Accuracy,Accuracy No false positives: correct processes are not suspected *Asynchrony: suspect 0 nodes to achieve accuracy*
Failure Detectors.md,1678318308808,Strong accuracy,Strong accuracy No correct process is ever suspected
Failure Detectors.md,1678318308808,Weak accuracy,Weak accuracy There exists a correct process *P* which is never suspected by any process
Failure Detectors.md,1678318308808,Eventual Strong accuracy,"Eventual Strong accuracy After some time, strong accuracy is achieved; prior to this, any behaviour is possible. - This does not satisfy weak accuracy, as before strong accuracy is achieved, any behaviour is allowed"
Failure Detectors.md,1678318308808,Eventual Weak accuracy,"Eventual Weak accuracy After some time, weak accuracy is achieved; prior to this, any behaviour is possible."
Failure Detectors.md,1678318308808,Perfect failure detector,"Perfect failure detector Only implementable in the [synchronous system](2203%20Distributed%20Systems.md#Synchronous%20system); otherwise some processes will be incorrectly suspected while the actual delay bound is still unknown."
Failure Detectors.md,1678318308808,Eventually perfect failure detector,"Eventually perfect failure detector How to achieve eventual strong accuracy? Each time p is inaccurately suspected by a correct q: - Timeout T is increased at q - Eventually the system becomes synchronous, and T becomes larger than the unknown bound δ (T>γ+δ) - q will then receive heartbeats on time, and never suspect p again"
Failure Detectors.md,1678318308808,Leader Election,"Leader Election We want all processes to detect a single and same correct process. To do so, we need to define the set of failed processes (using a FD)"
Failure Detectors.md,1678318308808,Why local accuracy?,Why local accuracy?
Failure Detectors.md,1678318308808,Implementation,Implementation
Failure Detectors.md,1678318308808,Eventual Leader Election,Eventual Leader Election
Failure Detectors.md,1678318308808,Reductions,Reductions
Failure Detectors.md,1678318308808,Strong completeness equivalent to weak completeness,"Strong completeness equivalent to weak completeness - If strong accuracy, no one is ever inaccurate, reduction never spreads inaccurate Susp - If weak accuracy, everyone is accurate about at least one process p, no one spreads inaccurate information about p"
Failure Detectors.md,1678318308808,Eventual leader election $\Omega\equiv \diamond S$,"Eventual leader election $\Omega\equiv \diamond S$ Implement S using $\Omega$: - Strong completeness $\equiv$ weak completeness: if we suspect everyone (except the leader which we know is correct), every process suspects all incorrect processes - Eventual weak accuracy: if we only trust the leader, there exists 1 correct process not suspected"
Failure Detectors.md,1678318308808,Summary,Summary
Fibonacci Sequence.md,1669012068637,---,"--- title: ""Fibonacci Sequence"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Fibonacci Sequence.md,1669012068637,Fibonacci Sequence,Fibonacci Sequence
Fibonacci Sequence.md,1669012068637,Problem Formulation,"Problem Formulation The Fibonacci series: $$F_0=0,\quad F_1=1,\quad F_i=F_{i-1}+F_{i-2}\ \ (i\ge 2)$$"
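Fibonacci Sequence.md,1669012068637,Memoised sketch,"Memoised sketch A short memoised implementation of the recurrence (the base cases $F_0=0$, $F_1=1$ are the usual convention):

```python
def fib(n, memo={0: 0, 1: 1}):
    # The shared default dict caches each F_i so it is computed once,
    # turning the exponential recursion into a linear one.
    if n not in memo:
        memo[n] = fib(n - 1) + fib(n - 2)
    return memo[n]

first_ten = [fib(i) for i in range(10)]
```
"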
Fibonacci Sequence.md,1669012068637,Strategy,Strategy
Fibonacci Sequence.md,1669012068637,Pseudocode,Pseudocode
First Order Logic.md,1669012068633,---,"--- title: ""First Order Logic"" date: 2022-11-08 lastmod: 2022-11-21 ---"
First Order Logic.md,1669012068633,First Order Logic,First Order Logic [Propositional Logic](Notes/Propositional%20Logic.md) can only deal with a finite number of propositions: T: Tommy is faithful J: Jimmy is faithful L: Laika is faithful $All\ dogs\ are\ faithful\iff T\land J\land L$ What if there is an infinite/unknown number of dogs?
First Order Logic.md,1669012068633,Forming FOL sentences,"Forming FOL sentences ""All dogs are mammals"" General form: $\forall x\,Dog(x)\implies Mammal(x)$ Use conjunction? $\forall x\,Dog(x) \land Mammal(x)$: **this means everything is a dog and a mammal!** ""John owns a dog"" General form: $\exists x\,Dog(x)\land Owns(John,x)$ Use implication? $\exists x\,Dog(x)\implies Owns(John,x)$: **this is satisfied as soon as anything exists that is not a dog, whether or not John owns one**"
First Order Logic.md,1669012068633,Inference Rules,Inference Rules Using substitutions is also called _Generalized Modus Ponens_. The substitution used is called the _unifier_.
First Order Logic.md,1669012068633,Getting to CNF,"Getting to CNF $$\begin{align} \exists xStudent(x)\land \neg Takes(x,AI)\tag1\equiv Student(K)\land\neg Takes(K,AI) \\ \tag2 \exists xStudent(x)\land Takes(x,AI)\land\neg pass(x,AI)\equiv \\ Student(F)\land Takes(F,AI)\land \neg pass(F,AI)\tag3\\ \forall x,y \neg Student(x)\lor\neg pass(x,y)\lor\neg hard(y)\lor diligent(x)\models\\\tag4\neg Student(x)\lor\neg pass(x,y)\lor\neg hard(y)\lor diligent(x) \\ 3+4:\neg pass(x,y)\lor\neg hard(y)\lor diligent(x)\tag5 \\ \tag6 3+4+5: Takes(x,AI)\lor\neg hard(AI)\\ 6+Subst\{x/K\}\ Takes(K,AI)\lor\neg hard(y)\tag7\\ 1+7: \neg hard(AI)\tag8\\ 8+iv:\emptyset \end{align}$$"
File Systems.md,1681221415486,---,"--- title: ""File Systems"" tags: [question] date: 2022-11-08 lastmod: 2022-11-21 ---"
File Systems.md,1681221415486,File Systems,File Systems
File Systems.md,1681221415486,File,File A file is an unstructured sequence of bytes. Each byte is individually addressable from the beginning of the file.
File Systems.md,1681221415486,File access,File access - Sequential: information is processed from the beginning of the file one byte after the other - Direct access: bytes can be read in any order by referencing the byte number
File Systems.md,1681221415486,Protection,"Protection - Owner: Permissions used by the assigned owner of the file or directory - Group: Permissions used by members of the group that owns the file or directory - Other: Permissions used by all users other than the file owner, and members of the group that owns the file or the directory"
File Systems.md,1681221415486,Data Structures,Data Structures
File Systems.md,1681221415486,File Control Block,File Control Block
File Systems.md,1681221415486,Open File Table,"Open File Table `open()` syscall: - First searches the system-wide OFT to see if the file is already being used by another process. If it is, a per-process open file table entry is created pointing to it. - If not, the directory structure is searched for the file. The FCB is copied to the system-wide OFT. An entry is made in the per-process OFT and a pointer to the entry is returned. The open file table saves substantial overhead by serving as a cache for the FCB. Data blocks are *not* kept in memory; instead, when the file is closed, the FCB entry is removed and the updated data is copied back to the disk."
File Systems.md,1681221415486,File Descriptor,"File Descriptor A file descriptor is a non-negative integer which indexes into a per-process file descriptor table maintained by the kernel. This in turn indexes into a system-wide open file table, which in turn indexes into the inode table that describes the actual underlying files. All operations on an open file are done through its file descriptor."
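File Systems.md,1681221415486,File Descriptor sketch,"File Descriptor sketch A small illustration of the point above, using Python's os module on a POSIX system (the file name is hypothetical): the value returned by open is just a small non-negative integer, and every subsequent operation is performed through it.

```python
import os
import tempfile

# Hypothetical demo file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

fd = os.open(path, os.O_CREAT | os.O_RDWR)   # fd indexes the per-process table
assert isinstance(fd, int) and fd >= 0

os.write(fd, b'hello')            # all operations take the descriptor
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 5)
os.close(fd)
```
"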
File Systems.md,1681221415486,Storage allocation,"Storage allocation File-Organisation Module: allocates storage space for files, translates logical block addresses to physical block addresses, and manages free disk space."
File Systems.md,1681221415486,Contiguous,"Contiguous Each file occupies a set of contiguous blocks on the disk. Advantages: - Simple as only the starting location and length are required - Supports random access Disadvantages: - Finding a hole big enough may result in external fragmentation - File space is constrained by the size of the hole; the file might need to be moved to a bigger hole in the future - If file space is overestimated there will be internal fragmentation > [! The delete operation] > Deleting a data block stored with contiguous allocation requires shifting of the data blocks. > *e.g. Delete data block 5 in a file with 10 data blocks*: > i.e. Read block 6, write block 5 with data from block 6, read block 7 and write block 6 with block 7 etc."
File Systems.md,1681221415486,Linked,"Linked Each file is a linked list of disk blocks and the blocks may be scattered anywhere on the disk. Advantages: - Simple as only the starting address is needed - No wasted space and no constraint on file size Disadvantages: - No random access *Assuming 4 bytes are reserved for the pointer to the next block:* Why does the displacement need to be incremented by 4? The first 4 bytes of each block store the pointer to the next block, so the displacement must be offset by 4 to skip past the pointer and reach the data. > [! The delete operation] > Deleting a data block stored with linked allocation requires an update to the connected pointer. > *e.g. Delete data block 5 in a file with 10 data blocks*: > 6 reads to reach block 5. 1 write to update the pointer of block 4 to block 6"
File Systems.md,1681221415486,Indexed allocation,"Indexed allocation Each file has an index block which contains all pointers to the allocated blocks. Directory entry contains the block number of the index block. Similar to a [page table](Notes/Memory%20Organisation.md#^b8969e) for memory allocation. Advantages: - Supports random access - Dynamic storage allocation without external fragmentation Disadvantages: - Overhead in keeping index blocks - Internal fragmentation as the last block that the index is pointing to may not be fully utilised > [! The delete operation] > Deleting a data block stored with indexed allocation requires an update to the indexed pointers. > *e.g. Delete data block 5 in a file with 10 data blocks*: > 1 read for the index block, 4 writes to update pointers"
File Systems.md,1681221415486,inode,"inode An inode is an index block. For each file and directory there is an inode. The inode contains file attributes, 12 pointers to direct blocks (data blocks) and 3 pointers point to indirect blocks (index blocks) with 3 levels of indirection. Indirection allows the system to support large file sizes: Using the inode:"
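File Systems.md,1681221415486,inode capacity sketch,"inode capacity sketch The maximum file size this scheme supports can be computed directly. A quick back-of-the-envelope calculation, assuming hypothetical 1 KB blocks and 4-byte block pointers:

```python
block = 1024                          # hypothetical block size in bytes
ptrs_per_block = block // 4           # pointers one index block can hold

direct = 12 * block                   # 12 direct pointers
single = ptrs_per_block * block       # one single-indirect block
double = ptrs_per_block ** 2 * block  # one double-indirect block
triple = ptrs_per_block ** 3 * block  # one triple-indirect block

max_file_size = direct + single + double + triple
```

With these numbers the triple-indirect level alone reaches $256^3$ data blocks, so the limit is dominated by the deepest level of indirection."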
File Systems.md,1681221415486,Directories,Directories
File Systems.md,1681221415486,Structure,"Structure A directory can be structured in two ways: 1. each entry contains a file name and other attributes 2. each entry contains a file name and a pointer to another data structure where file attributes can be found > [!Disk reads when navigating a directory] > Assume that root directory is in memory > Open(“/usr/ast/mbox”) will require 5 disk reads > 1. load inode of “usr” > 2. load data block of “usr” (i.e., directory “usr”) > 3. load inode for “ast” > 4. load data block of “ast” (i.e., directory “ast”) > 5. load inode for “mbox”"
File Systems.md,1681221415486,Tree Structured,"Tree Structured Path Name - Absolute Path Name: begins at the root and follows a path down to the specific file, e.g., /spell/mail/prt/first - Relative Path Name: Defines a path from the current directory, e.g. current directory is: /spell/mail relative path name for the above file is: prt/first Characteristics: - Efficient Searching: File can be easily located according to the path name. - Naming: Files can have the same name under different directories. - Grouping: files can be grouped logically according to their properties"
File Systems.md,1681221415486,Acyclic Graph Directories,Acyclic Graph Directories The tree structure prohibits sharing of files or directories while an acyclic graph allows this. It is a natural generalisation of the tree structure.
File Systems.md,1681221415486,Links as a UNIX implementation,"Links as a UNIX implementation A link is a directory entry which is a pointer to another file or subdirectory - A hard link points to the data on storage, while a soft link can point to another link which points to information on storage. - Both linking strategies allow a separate file name to be used for the source file name. This source file name will resolve to the target file data by following the link."
File Systems.md,1681221415486,What happens on deletion?,"What happens on deletion? One possibility is to remove the file whenever anyone deletes it, but this action may leave dangling pointers to the now-nonexistent file. Worse, if the remaining file pointers contain actual disk addresses, and the space is subsequently reused for other files, these dangling pointers may point into the middle of other files. Soft links - Search for links and remove them: expensive unless a list of links is kept with the file OR - Leave the links and remove them only when trying to access them Hard links - Preserve file until all references are deleted. A count of the number of references is maintained with the file (a new link ++, deleting a link --)."
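File Systems.md,1681221415486,Hard link count sketch,"Hard link count sketch The reference-count behaviour of hard links can be observed directly on a POSIX system: st_nlink is the kernel's count of directory entries referring to the inode (the file names here are hypothetical).

```python
import os
import tempfile

d = tempfile.mkdtemp()
src = os.path.join(d, 'a')
with open(src, 'w') as f:
    f.write('data')
assert os.stat(src).st_nlink == 1     # one directory entry so far

os.link(src, os.path.join(d, 'b'))    # new hard link: count++
assert os.stat(src).st_nlink == 2

os.remove(os.path.join(d, 'b'))       # deleting a link: count--
assert os.stat(src).st_nlink == 1
```
"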
File Systems.md,1681221415486,Why not just duplicate the file?,Why not just duplicate the file? Duplicate directory entries make the original and the copy indistinguishable. A major problem with this is maintaining consistency when a file is modified
File Systems.md,1681221415486,Disk Space Management,"Disk Space Management Block size affects both data rate and disk space utilisation - Big block size: file fits into few blocks resulting in fast to find & transfer blocks, but wastes space if file does not occupy the entire last block - Small block size: file may consist of many blocks resulting in slow data rate"
File Systems.md,1681221415486,Managing Free Blocks,Managing Free Blocks There is a need to track which blocks are free in order to allocate disk space to files
File Systems.md,1681221415486,Bitmap,"Bitmap Each block is represented by 1 bit, 1 (free) and 0 (allocated) Advantage: - Simple and efficient to find the first free block via bit manipulation. i.e. Find the first non-0 word, and find the first bit 1 in the word. Disadvantage: - Takes up additional space as each block requires 1 bit - Inefficient to look up this bitmap unless the entire map is kept in memory"
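File Systems.md,1681221415486,Bitmap scan sketch,"Bitmap scan sketch The two-step scan described above (find the first non-0 word, then the first 1 bit in it) can be sketched as follows; the bitmap contents are hypothetical:

```python
# 1 = free, 0 = allocated, packed into 8-bit words.
words = [0b00000000, 0b00000000, 0b00010100]

def first_free_block(words, bits_per_word=8):
    for w, word in enumerate(words):
        if word != 0:                          # first non-zero word
            for b in range(bits_per_word):     # first 1 bit, MSB first
                if word & (1 << (bits_per_word - 1 - b)):
                    return w * bits_per_word + b
    return None                                # no free block
```

Real implementations do the inner scan with a single hardware instruction such as find-first-set rather than a bit loop."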
File Systems.md,1681221415486,Linked list,"Linked list The pointer to the next block is stored in the block itself, hence to read the entire list, each block must be read sequentially requiring substantial I/O time."
File Systems.md,1681221415486,Practice Problems,"Practice Problems a. False. The owner and the group which the owner belongs to are able to read. b. False. The OFT caches the FCB rather than the data block. c. False. Using linked file allocation, any free data block can be used. a. The previous links will now point to the data of the new file. To avoid this, dangling links need to be cleaned up. b. Single copy - Race conditions, mutual exclusion Multiple copy - Storage waste - Inconsistency a. 5 disk accesses 1. Load inode of usr 2. Load directory for usr 3. Load inode for ast 4. Load directory for ast 5. Load inode for mbox b. Seek: no disk reads needed Current position is 5900: Logical block 5, byte 900. read(100): 1 disk read by following direct pointer read(200): 2 disk reads by following single indirect pointer 3 disk reads total c. Number of pointers in 1 index block = $1000/2=500$ File size supported = $(6+500) \times 1000=506,000B$ File data can be stored across different physical storage blocks. A smaller physical block helps to reduce internal fragmentation as the last block occupied by the file is only 512B compared to 4KB. Using the larger block size would also help to improve throughput."
Finite State Machines.md,1675247864731,---,"--- title: ""Finite State Machines"" date: 2023-02-01 ---"
Finite State Machines.md,1675247864731,Finite State Machines,Finite State Machines
Functions.md,1685696717882,---,"--- title: ""Functions"" date: 2023-05-23 lastmod: 2023-05-23 ---"
Functions.md,1685696717882,Functions,"Functions The name of the function being called isn’t actually part of the call syntax. The thing being called, *the callee*, can be any expression that evaluates to a function. `getCallback()();` The first pair of parentheses has `getCallback` as its callee. But the second call has the entire `getCallback()` expression as its callee. It is the parentheses following an expression that indicate a function call. You can think of a call as sort of like a postfix operator that starts with `(`. Updating our grammar: ``` unary → ( ""!"" | ""-"" ) unary | call ; call → primary ( ""("" arguments? "")"" )* ; arguments → expression ( "","" expression )* ; ``` We can say that a function is one that implements an interface: ```java interface LoxCallable { int arity(); Object call(Interpreter interpreter, List<Object> arguments); } ``` Interpreting function calls: ```java @Override public Object visitCallExpr(Expr.Call expr) { Object callee = evaluate(expr.callee); List<Object> arguments = new ArrayList<>(); for (Expr argument : expr.arguments) { arguments.add(evaluate(argument)); } LoxCallable function = (LoxCallable)callee; return function.call(this, arguments); } ```"
Functions.md,1685696717882,Currying,"Currying Named after Haskell Curry, the rule uses `*` to allow matching a series of calls like `fn(1)(2)(3)`. In this style, defining a function that takes multiple arguments is done as a series of nested functions. Each function takes one argument and returns a new function. That function consumes the next argument, returns yet another function, and so on."
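Functions.md,1685696717882,Currying sketch,"Currying sketch The same idea in Python (an illustration, not Lox): each function consumes one argument and returns the function awaiting the next, so a three-argument add becomes a chain of single-argument calls.

```python
def add(a):
    def take_b(b):
        def take_c(c):
            return a + b + c
        return take_c
    return take_b

result = add(1)(2)(3)   # a series of calls, like fn(1)(2)(3)
```
"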
Functions.md,1685696717882,Arity,"Arity Arity is the fancy term for the number of arguments a function or operation expects. Unary operators have arity one, binary operators two, etc. With functions, the arity is determined by the number of parameters it declares."
Functions.md,1685696717882,Native Functions,"Native Functions **Primitives**, **external functions**, or **foreign functions**. They are functions that the interpreter exposes to user code but that are implemented in the host language (in our case Java), not the language being implemented (Lox). They provide access to the fundamental services that all programs are defined in terms of. If you don’t provide native functions to access the file system, a user’s going to have a hell of a time writing a program that reads and displays a file. Add a new globals environment which will store all the native methods in fixed reference to the global scope: ```java class Interpreter implements Expr.Visitor<Object>, Stmt.Visitor<Void> { final Environment globals = new Environment(); private Environment environment = globals; Interpreter() { globals.define(""clock"", new LoxCallable() { @Override public int arity() { return 0; } @Override public Object call(Interpreter interpreter, List<Object> arguments) { return (double)System.currentTimeMillis() / 1000.0; } @Override public String toString() { return ""<native fn>""; } }); } ```"
Functions.md,1685696717882,Function declaration,"Function declaration Updated grammar: ``` declaration → funDecl | varDecl | statement ; funDecl → ""fun"" function ; function → IDENTIFIER ""("" parameters? "")"" block ; parameters → IDENTIFIER ( "","" IDENTIFIER )* ; ```"
Functions.md,1685696717882,Function Objects,"Function Objects The implementation of the call method is as follows: ```java @Override public Object call(Interpreter interpreter, List<Object> arguments) { Environment environment = new Environment(interpreter.globals); for (int i = 0; i < declaration.params.size(); i++) { environment.define(declaration.params.get(i).lexeme, arguments.get(i)); } interpreter.executeBlock(declaration.body, environment); return null; } ``` 1. Functions should encapsulate its parameters, meaning no code outside the function should be able to see them. Create a new environment with access to global environment. 2. Bind the params to the values based in as arguments 3. Execute the body of the function in a block"
Functions.md,1685696717882,Call frames,"Call frames The compiler allocates stack slots for local variables. How should that work when the set of local variables in a program is distributed across multiple functions? We solved this above, by dynamically allocating memory for an environment each time a function was called or a block entered. In clox, we don’t want that kind of performance cost on every function call."
Functions.md,1685696717882,Naive Static Option,"Naive Static Option One option would be to keep them totally separate. Each function would get its own dedicated set of slots in the VM stack that it would own forever, even when the function isn’t being called. Each local variable in the entire program would have a bit of memory in the VM that it keeps to itself."
Functions.md,1685696717882,Frame Pointers,"Frame Pointers When a function is called, we don’t know where the top of the stack will be because it can be called from different contexts. But, wherever that top happens to be, we do know where all of the function’s local variables will be relative to that starting point. ```java fun first() { var a = 1; second(); var b = 2; second(); } fun second() { var c = 3; var d = 4; } first(); ``` At the beginning of each function call, the VM records the location of the first slot where that function’s own locals begin. The instructions for working with local variables access them by a slot index relative to that, instead of relative to the bottom of the stack like they do today. At compile time, we calculate those relative slots. At runtime, we convert that relative slot to an absolute stack index by adding the function call’s starting slot."
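Functions.md,1685696717882,Frame pointer sketch,"Frame pointer sketch A toy model of the relative-slot idea, in Python rather than clox's C: the compiler emits relative slot numbers, and at runtime each access adds the current frame's recorded starting index.

```python
stack = []
frame_starts = []                        # one entry per live call

def call(num_locals):
    frame_starts.append(len(stack))      # record where this frame begins
    stack.extend([None] * num_locals)

def set_local(slot, value):
    stack[frame_starts[-1] + slot] = value   # relative -> absolute

call(2)               # first(): locals a, b start at absolute slot 0
set_local(0, 1)       # a = 1
call(2)               # second(): locals c, d start at absolute slot 2
set_local(0, 3)       # same relative slot 0, different absolute slot
```

The same relative slot 0 lands at absolute indices 0 and 2 depending on which invocation is running."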
Functions.md,1685696717882,Return Addresses,"Return Addresses For each live function invocation—each call that hasn’t returned yet—we need to track where on the stack that function’s locals begin, and where the caller should resume. Again, thanks to recursion, there may be multiple return addresses for a single function, so this is a property of each invocation and not the function itself."
Functions.md,1685696717882,Return Statements,"Return Statements In Lox, the body of a function is a list of statements which don’t produce values, so we need dedicated syntax for emitting a result. ``` statement → exprStmt | forStmt | ifStmt | printStmt | returnStmt | whileStmt | block ; returnStmt → ""return"" expression? "";"" ; ```"
Functions.md,1685696717882,Closures,"Closures Because the interpreter does not keep the environment surrounding a function around, a closure is essentially a data structure which helps to hold onto surrounding variables where the function is declared. ```java fun makeCounter() { var i = 0; fun count() { i = i + 1; print i; } return count; } var counter = makeCounter(); counter(); // ""1"". counter(); // ""2"". ``` Here we pass in the current state of the interpreter environment in function declaration semantics: ```java LoxFunction(Stmt.Function declaration, Environment closure) { this.closure = closure; this.declaration = declaration; ... public Void visitFunctionStmt(Stmt.Function stmt) { LoxFunction function = new LoxFunction(stmt, environment); environment.define(stmt.name.lexeme, function); ... Environment environment = new Environment(closure); for (int i = 0; i < declaration.params.size(); i++) { ```"
Functions.md,1685696717882,Static Scoping,"Static Scoping A variable usage refers to the preceding declaration with the same name in the innermost scope that encloses the expression where the variable is used. It is static scoping because running the program should not affect this. ```java var a = ""global""; { fun showA() { print a; } showA(); var a = ""block""; showA(); } ``` ""global"" should be printed twice, as `a` refers to the outermost `a` which is the preceding declaration in the innermost scope. Code may not always execute in the textual order with the introduction of functions which can defer it. **A block is not all the same scope** It’s like each `var` statement splits the block into two separate scopes, the scope before the variable is declared and the one after, which includes the new variable."
Functions.md,1685696717882,Persistent Environments,"Persistent Environments **Persistent data structures**: unlike the squishy data structures you’re familiar with in imperative programming, a persistent data structure can never be directly modified. Instead, any “modification” to an existing structure produces a brand new object that contains all of the original data and the new modification. The original is left unchanged."
Functions.md,1685696717882,Semantic Analysis for variable bindings,"Semantic Analysis for variable bindings Where a parser tells only if a program is grammatically correct (a _syntactic_ analysis), semantic analysis goes farther and starts to figure out what pieces of the program actually mean. In this case, our analysis will resolve variable bindings. We’ll know not just that an expression _is_ a variable, but _which_ variable it is."
Functions.md,1685696717882,Variable resolution pass,"Variable resolution pass After the parser produces the syntax tree, but before the interpreter starts executing it, we’ll do a single walk over the tree to resolve all of the variables it contains. It walks the tree, visiting each node, but a static analysis is different from a dynamic execution: - **There are no side effects.** When the static analysis visits a print statement, it doesn’t actually print anything. Calls to native functions or other operations that reach out to the outside world are stubbed out and have no effect. - **There is no control flow.** Loops are visited only once. Both branches are visited in `if` statements. Logic operators are not short-circuited."
Formal Specification.md,1681337183048,---,"--- title: ""Formal Specification"" date: 2023-03-28 lastmod: 2023-03-28 ---"
Formal Specification.md,1681337183048,Formal Specification,"Formal Specification A formal specification is the expression, in some formal language and at some level of abstraction, of a collection of properties the system should satisfy through its behaviour."
Formal Specification.md,1681337183048,Motivation,Motivation *Boehm’s First Law: Errors are more frequent during requirements and design activities and are more expensive the later they are removed.* Formality helps us to obtain higher quality specifications which are able to detect serious problems in original informal specifications. It also enables automated analysis of the specification.
Formal Specification.md,1681337183048,Problem Abstraction,Problem Abstraction Process of simplifying the problem at hand and facilitating our understanding of a system. - focus on intended purpose - ignore details of how the purpose is achieved
Formal Specification.md,1681337183048,Systems,"Systems - Application: A physical entity whose function and operation is being monitored and controlled - Controller: Hardware and software monitoring and controlling the application in real time - Actuator (effector): A device that converts an electrical signal from the output of the computer to a physical quantity, which affects the function of the application. - Sensor: A device that converts an application’s physical quantity into an electric signal for input into the computer"
Formal Specification.md,1681337183048,Example on cold vaccine storage,"Example on cold vaccine storage A system that stores vaccine at a temperature that *should not exceed -70 degrees* - Application: storage chamber - Sensor: temperature sensor - Actuator: cooling engine - Controller (software): - checks measurements - sets the cooling engine Might also: - output information on a display - Write to log file and send it over network Safety property: $temp+\delta\le-70$ [Fault Tree Analysis](Notes/Risk%20Analysis.md#Fault%20Tree%20Analysis): Safety invariants (things that should *always* hold) that need to be verified: 1. Always after controller has reacted, if sensor is not OK then alarm is raised and actuator is in decr 2. Always after controller reacted, if sensor is OK and temp + Δ ≥ -70 then cooler is in decr"
Formal Specification.md,1681337183048,Formal Specification Frameworks,Formal Specification Frameworks [[Event-B]]
Formal Specification.md,1681337183048,Building a Safety Case,"Building a Safety Case Fundamental elements: - Supporting evidence e.g. observation - High level argument: explain how the evidence can be reasonably interpreted as indicating acceptable safety. 1. Define safety requirements (SR) 2. Use formal specification in Event-B to model the requirements 3. Discharging the proof obligations produces evidence that SR is met. Shown through [Goal Structured Notation](Notes/Risk%20Analysis.md#Goal%20Structured%20Notation):"
Game Theory.md,1669012068628,---,"--- title: ""Game Theory"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Game Theory.md,1669012068628,Game Theory,"Game Theory __Any game with a finite number of actions will have a Nash Equilibrium__ Pure strategy set: a pure-strategy Nash equilibrium exists if there is a state in which no party can gain a higher utility by unilaterally choosing a different action, given that the other parties adhere to their current actions. Mixed strategy set: __when there is no pure-strategy equilibrium, an equilibrium must exist in mixed strategies.__ It is a probability distribution over 2 or more pure strategies."
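Game Theory.md,1669012068628,Mixed strategy sketch,"Mixed strategy sketch For a 2x2 zero-sum game with no pure-strategy equilibrium (matching pennies payoffs are used as the example here), the row player's equilibrium mix follows from the indifference condition: pick $p$ so the row player's expected payoff is the same whichever column the opponent plays.

```python
from fractions import Fraction

# Row player's payoffs in matching pennies (a zero-sum example):
# a, b = row Heads vs (col Heads, col Tails); c, d = row Tails likewise.
a, b = Fraction(1), Fraction(-1)
c, d = Fraction(-1), Fraction(1)

# Indifference: p*a + (1-p)*c == p*b + (1-p)*d, solved for p.
p = (d - c) / ((a - b) + (d - c))
```

Here $p = 1/2$: randomising evenly is the unique equilibrium, illustrating the claim that an equilibrium always exists once mixed strategies are allowed."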
Go.md,1669012068626,---,"--- title: ""Go"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Go.md,1669012068626,Go,Go https://go.dev/doc/effective_go - [Arrays and Slices](Notes/Arrays%20and%20Slices.md) - [Strings](Notes/Strings.md) - [Maps](Notes/Maps.md)
GPU Architecture.md,1669012068623,---,"--- title: ""GPU Architecture"" date: 2022-11-08 lastmod: 2022-11-21 ---"
GPU Architecture.md,1669012068623,GPU Architecture,GPU Architecture The general purpose CPU is designed for single-threaded code optimised for *low latency.* The GPU allows us to achieve higher throughput in exchange for *higher latency.* Need to achieve massive data parallelism for computing tasks such as vector processing and Multiplication and Accumulation (MAC) operations in matrices. SIMD: Single instruction multiple data
GPU Architecture.md,1669012068623,CUDA,CUDA
GPU Architecture.md,1669012068623,Architecture,Architecture
GPU Architecture.md,1669012068623,Programming Model,Programming Model CUDA works on a heterogeneous programming model that consists of a host and device. Host calls the device to run the program. - Host: CPU - Device: GPU
GPU Architecture.md,1669012068623,Programming Language,Programming Language The source code is split into host (compiled by standard compilers like gcc) and device components (compiled by nvcc).
GPU Architecture.md,1669012068623,Kernel,Kernel
GPU Architecture.md,1669012068623,Threads and Thread Blocks,Threads and Thread Blocks We can access important properties of the kernel: - Block ID: `blockIdx.x` gives us the ID of the thread block - Thread ID: `threadIdx.x` gives us the ID of the thread within a thread block - Dimension: `blockDim.x` gives us the number of threads per block The exact thread number can be found using `blockIdx.x* blockDim.x + threadIdx.x` Multi-dimensionality:
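GPU Architecture.md,1669012068623,Thread indexing sketch,"Thread indexing sketch The flattening formula can be checked with plain arithmetic (in Python here, since the CUDA expression blockIdx.x * blockDim.x + threadIdx.x is just integer math); assume a hypothetical launch of 2 blocks with 3 threads each:

```python
block_dim = 3    # threads per block, i.e. blockDim.x
num_blocks = 2   # number of blocks, i.e. gridDim.x

# Global id for every (blockIdx.x, threadIdx.x) pair.
global_ids = [block_idx * block_dim + thread_idx
              for block_idx in range(num_blocks)
              for thread_idx in range(block_dim)]
```

Every thread gets a unique, contiguous index, which is what lets each thread pick out its own element of an input array."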
GPU Architecture.md,1669012068623,Synchronisation,Synchronisation
GPU Architecture.md,1669012068623,Memory management,"Memory management The above code does not take advantage of GPU parallelism in the CUDA core. We can create 1 block with 3 threads to achieve parallelism: `vector_add_cu<<<1,3>>>(d_c, d_a, d_b);` Use the threadIdx to access the memory: > [! Threads vs Blocks] > The example can also be achieved using 3 blocks each with 1 thread. However, parallel threads have the advantage to directly communicate and synchronise with each other due to shared hardware. Sharing memory between blocks would require *global memory access*"
GPU Architecture.md,1669012068623,Example,"Example ```c++ //initialize 1 block and 3 threads. We cannot use 3 blocks for this implementation as blocks would not be able to share the local variable memory dot_prod_cu<<<1,3>>>(d_c, d_a, d_b); __global__ void dot_prod_cu(int *d_c, int *d_a, int *d_b){ //use __shared__ to allow threads to share data __shared__ int tmp[3]; int i = threadIdx.x; tmp[i] = d_a[i] * d_b[i]; //wait for all threads to complete to prevent premature entering into if block __syncthreads(); if (i==0){ int sum = 0; for (int j = 0; j < 3; j++) sum = sum + tmp[j]; *d_c = sum; } } ```"
GPU Architecture.md,1669012068623,Internal Operations,"Internal Operations - Each SM contains multiple SP cores. Each core can only execute 1 thread. - Each block of threads can be scheduled on any available SM by the runtime system, but 1 block can only exist on 1 SM."
GPU Architecture.md,1669012068623,Warps,Warps
GPU Architecture.md,1669012068623,SIMT,"SIMT Warps enable a unique architecture called Single Instruction Multiple Thread. This means each warp executes only one common instruction for all threads. Within a single thread, its instructions are - pipelined to achieve instruction-level parallelism - issued in order, with no branch prediction and speculative execution Individual threads in a warp start together, at the same instruction address - but each has its own instruction address counter and registers - free to branch and execute independently when the thread diverges, such as due to data-dependent conditional execution and branch."
GPU Architecture.md,1669012068623,Thread Divergence,"Thread Divergence Branch statements will result in some threads in a warp wasting their clock cycles. This is because the threads in the warp must all execute the same instruction. For some which satisfy the condition, computation is done else NOP."
GPU Architecture.md,1669012068623,Practice Problems,"Practice Problems ```c __global__ void stencil(int N, int *input, int *output) { int i = threadIdx.x + blockIdx.x * blockDim.x; int sum = input[i]; for (int j = 1; j < 3; j++) { sum += input[i - j]; sum += input[i + j]; } output[i] = sum; } int numBlocks = N / BLOCK_SIZE; int *output; cudaMalloc(&output, N * sizeof(int)); stencil<<<numBlocks, BLOCK_SIZE>>>(N, input, output); ```"
Games as Search Problems.md,1669012068631,---,"--- title: ""Games as Search Problems"" date: 2022-11-08 lastmod: 2022-11-21 --- Perfect information also means *fully observable*, unlike in Poker where you cannot see the opponent's hand."
Games as Search Problems.md,1669012068631,Minimax Search,"Minimax Search Maximize own utility and minimize opponent's 1. Generate game tree to terminal state or a certain depth 2. Calculate the utility from the bottom-up (MAX turn will maximize own utility, MIN turn will minimize MAX utility) 3. Select the best move (we can assume that we start as MAX) Tic-Tac-Toe Tree Example: [Reflective and rotational symmetries:](https://courses.cs.duke.edu/cps100e/current/assign/ttt/:~:text=There%20are%20four%20reflective%20symmetries,the%20board%20on%20the%20left.&text=This%20means%20there%20are%20eight,board%20on%20each%20line%20above)"
Garbage Collection.md,1688005881622,---,"--- title: ""Garbage Collection"" date: 2023-06-26 ---"
Garbage Collection.md,1688005881622,Garbage Collection,"Garbage Collection In a *managed language*, the language implementation manages memory allocation and freeing on the user’s behalf. When a user performs an operation that requires some dynamic memory, the [Virtual Machine](Notes/Virtual%20Machine.md) automatically allocates it. The programmer never worries about deallocating anything. It ensures any memory the program is using sticks around as long as needed using a **garbage collector**."
Garbage Collection.md,1688005881622,Reachability,"Reachability How does a VM tell what memory is not needed? It considers a piece of memory to still be in use if it could possibly be read in the future. A value is *reachable* if there is some way for a user program to reference it. Some values can be directly accessed: ``` var global = ""string""; { var local = ""another""; print global + local; } ``` These are available on the stack or as an entry in the global hashmap and are called **roots**. Any values referenced by roots must still be alive and hence also reachable. 1. Starting with the roots, traverse through object references to find the full set of reachable objects. 2. Free all objects _not_ in that set."
Garbage Collection.md,1688005881622,Mark-Sweep Garbage Collection,"Mark-Sweep Garbage Collection - Marking: start with the roots and traverse through all of the objects those roots refer to. This is a classic graph traversal of all of the reachable objects. Each time we visit an object, we mark it in some way. - Sweeping: Once the mark phase completes, every reachable object in the heap has been marked. That means any unmarked object is unreachable and ripe for reclamation. We go through all the unmarked objects and free each one."
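Garbage Collection.md,1688005881622,Mark-sweep sketch,"Mark-sweep sketch The two phases can be sketched in a few lines of Python (a toy model of the algorithm, not the VM's actual C implementation); the heap maps each object to the list of objects it references:

```python
def collect(roots, heap):
    # Mark: graph traversal of everything reachable from the roots.
    marked = set()
    worklist = list(roots)
    while worklist:
        obj = worklist.pop()
        if obj not in marked:
            marked.add(obj)
            worklist.extend(heap[obj])
    # Sweep: free every object that was never marked.
    for obj in list(heap):
        if obj not in marked:
            del heap[obj]

heap = {'a': ['b'], 'b': [], 'c': ['c']}   # 'c' is an unreachable cycle
collect(['a'], heap)
```

Note that the self-referencing 'c' is reclaimed: unlike reference counting, mark-sweep handles cycles."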
Garbage Collection.md,1688005881622,Tricolor Abstraction,"Tricolor Abstraction Each object has a conceptual “color” that tracks what state the object is in, and what work is left to do. This allows the GC to pause and pick up where it left off when needed. - **White:** At the beginning of a garbage collection, every object is white. This color means we have not reached or processed the object at all. - **Gray:** During marking, when we first reach an object, we darken it gray. This color means we know the object itself is reachable and should not be collected. But we have not yet traced _through_ it to see what _other_ objects it references. In graph algorithm terms, this is the _worklist_—the set of objects we know about but haven’t processed yet. - **Black:** When we take a gray object and mark all of the objects it references, we then turn the gray object black. This color means the mark phase is done processing that object."
Garbage Collection.md,1688005881622,Weak References,"Weak References A reference that does not protect the referenced object from collection by the GC. For example, [strings saved in the intern table](Notes/String%20Interning.md) have a direct (root) reference by the VM, but must be treated as weak references; otherwise the GC would never clean up their memory."
Garbage Collection.md,1688005881622,When to collect,"When to collect Every managed language pays a performance price compared to explicit, user-authored deallocation. The time spent actually freeing memory is the same, but the GC spends cycles figuring out which memory to free. That is time not spent running the user’s code and doing useful work. In our implementation, that’s the entirety of the mark phase. The goal of a sophisticated garbage collector is to minimize that overhead. - **Throughput** is the total fraction of time spent running user code versus doing garbage collection work - **Latency** is the longest _continuous_ chunk of time where the user’s program is completely paused while garbage collection happens. It’s a measure of how “chunky” the collector is."
Garbage Collection.md,1688005881622,Self adjusting heap,"Self adjusting heap The collector frequency automatically adjusts based on the live size of the heap. We track the total number of bytes of managed memory that the VM has allocated. When it goes above some threshold, we trigger a GC. After that, we note how many bytes of memory remain—how many were not freed. Then we adjust the threshold to some value larger than that. The result is that as the amount of live memory increases, we collect less frequently in order to avoid sacrificing throughput by re-traversing the growing pile of live objects. As the amount of live memory goes down, we collect more frequently so that we don’t lose too much latency by waiting too long."
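Garbage Collection.md,1688005881622,Self adjusting heap,"A sketch of the threshold adjustment described above; the growth factor and initial threshold are arbitrary illustrative values:

```python
# Self-adjusting GC trigger: after each collection, set the next
# threshold to a multiple of the surviving (live) byte count.
GROW_FACTOR = 2

class ManagedHeap:
    def __init__(self):
        self.bytes_allocated = 0
        self.next_gc = 1024          # first collection at 1 KiB

    def allocate(self, size, collect):
        self.bytes_allocated += size
        if self.bytes_allocated > self.next_gc:
            # collect() returns how many bytes survived
            self.bytes_allocated = collect(self.bytes_allocated)
            self.next_gc = self.bytes_allocated * GROW_FACTOR

h = ManagedHeap()
h.allocate(2048, lambda live: live // 4)  # pretend 3/4 was garbage
```

The factor trades throughput against latency: a larger value collects less often but lets more garbage accumulate between collections."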
Garbage Collection.md,1688005881622,Generational GC,"Generational GC A collector loses throughput if it spends a long time re-visiting objects that are still alive. But it can increase latency if it avoids collecting and accumulates a large pile of garbage to wade through. If only there were some way to tell which objects were likely to be long-lived and which weren’t. Then the GC could avoid revisiting the long-lived ones as often and clean up the ephemeral ones more frequently. The key observation was that most objects are very short-lived but once they survive beyond a certain age, they tend to stick around quite a long time. The longer an object _has_ lived, the longer it likely will _continue_ to live. - Every time a new object is allocated, it goes into a special, relatively small region of the heap called the “nursery”. Since objects tend to die young, the garbage collector is invoked frequently over the objects just in this region. - Those that survive are now considered one generation older, and the GC tracks this for each object. If an object survives a certain number of generations—often just a single collection—it gets _tenured_. At this point, it is copied out of the nursery into a much larger heap region for long-lived objects. The garbage collector runs over that region too, but much less frequently since odds are good that most of those objects will still be alive."
Hash Index.md,1669012068614,---,"--- title: ""Hash Index"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Hash Index.md,1669012068614,Hash Index,"Hash Index Idea: use a [hash table](Notes/Hash%20Tables.md) 1. Take a search key and hash it into an integer in the range of 0 to B-1 where B is the number of buckets 2. A bucket array holds the headers of B linked lists, one for each bucket 3. If a record has search key K, store the record by linking it to bucket list number h(K) Implementations 1. We can directly hash a key, which points to the record 2. Add a level of indirection: use an array of pointers to blocks to represent the buckets rather than an array holding the data itself"
Hash Index.md,1669012068614,Static Hash,Static Hash
Hash Index.md,1669012068614,Insertion and Deletion,Insertion and Deletion Bucket overflow can be handled naively by adding an additional pointer to a separate overflow block.
Hash Index.md,1669012068614,Extensible Hash Index,"Extensible Hash Index If the number of buckets is fixed from the start, we will end up with many additional pointers to additional overflow buckets. This incurs more I/O as more records are added. Idea: use only the first i bits output from the hash function to point to a directory."
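Hash Index.md,1669012068614,Extensible Hash Index,"A sketch of taking only the first i bits of the hash output as a directory index; the 32-bit hash width is an assumption for illustration:

```python
HASH_BITS = 32

def directory_index(key, i):
    h = hash(key) & ((1 << HASH_BITS) - 1)  # truncate to HASH_BITS bits
    return h >> (HASH_BITS - i)             # keep only the top i bits

# Growing i from 1 to 2 doubles the directory without rehashing:
# the old index is simply the new index with the extra bit dropped.
idx1 = directory_index('some key', 1)
idx2 = directory_index('some key', 2)
```

Unlike the static scheme, growing the directory this way touches no data blocks until an individual bucket actually splits."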
Hash Index.md,1669012068614,Insertion and Deletion,Insertion and Deletion Increase i when overflow: Update the directory structure: Delete implementations: 1. Merge blocks when removing record 2. Do not merge blocks at all
Hash Index.md,1669012068614,Duplicate Keys,Duplicate Keys Unable to allocate different directory for duplicate keys:
Hash Index.md,1669012068614,Practice Problems,Practice Problems i. 2 I/O ii. All blocks need to be searched: 11 I/O a. Hash index on ID b. B+ tree index on ID to support better range query c. Multi key index on ID and name
Greedy Best First Search.md,1669012068620,---,"--- title: ""Greedy Best First Search"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Greedy Best First Search.md,1669012068620,Greedy Search,Greedy Search Expands the node that appears to be closest to the goal based on the evaluation function h(n) i.e. expands the lowest h(n) values first. Completeness: No Optimality: No Time Complexity: $O(b^m)$ Space Complexity: $O(b^m)$
Greedy Best First Search.md,1669012068620,Graph Traversal,Graph Traversal _Assuming ties are handled in alphabetical order_ Expansion Order: A > B > C > G Final Path: A > B > C > G
Hardware Protection.md,1688265772127,---,"--- title: ""Hardware Protection"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Hardware Protection.md,1688265772127,Hardware Protection,Hardware Protection
Hardware Protection.md,1688265772127,Dual mode operation,"Dual mode operation Differentiates between at least 2 modes of operation 1. User mode: execution of user processes 2. __Monitor mode__ (supervisor/system/kernel mode): execution of operating system processes > [!Hardware Context Switching] > The above is an example of [hardware context switching](Notes/Context%20Switch.md), which is no longer supported in 64-bit mode. > [!Kernel mode vs root/admin] > Kernel mode is not the same as root/admin privileges. Kernel or user modes are hardware operation modes while the root/admin is just a user account in the OS. > > The root/admin may execute code in kernel mode indirectly."
Hardware Protection.md,1688265772127,I/O Protection,I/O Protection _All I/O instructions are privileged instructions._ The OS will ensure that they are correct and legal.
Hardware Protection.md,1688265772127,Memory Protection,Memory Protection OS needs to set the range of legal addresses a program may access. This is done using 2 registers. Base register: holds the first legal memory address Limit register: contains the size of the legal range
Hardware Protection.md,1688265772127,Practice Problems,"Practice Problems Given a base register value of 0x1000 and a limit register value of 0x1000, access to memory location 0x1FFF will generate a trap. False. Each access to memory by a process must be in the range [base, base+limit-1]. In this case, it translates to the range [0x1000, 0x1FFF]."
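Hardware Protection.md,1688265772127,Memory Protection,"The base/limit check from the problem above can be sketched directly:

```python
def is_legal(addr, base, limit):
    # Legal range is [base, base + limit - 1]; anything outside traps.
    return base <= addr <= base + limit - 1

ok = is_legal(0x1FFF, 0x1000, 0x1000)        # last legal address
trap = not is_legal(0x2000, 0x1000, 0x1000)  # one past the end: trap
```

A real MMU performs this comparison in hardware on every memory reference; the OS is only involved when the check fails and a trap is raised."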
Grid World Scenario.md,1669012068616,---,"--- title: ""Grid World Scenario"" date: 2022-11-08 lastmod: 2022-11-21 ---"
HTTP.md,1674045428739,---,"--- title: ""HTTP"" date: 2022-12-03 lastmod: 2022-12-05 ---"
HTTP.md,1674045428739,Hypertext Transfer Protocol,"Hypertext Transfer Protocol HTTP is the Web's application layer protocol, above the transport or optional encryption layer. A Web page contains many objects and is addressable using a Uniform Resource Locator (URL): **HTTP uses TCP as its underlying transport protocol.**"
HTTP.md,1674045428739,A brief history rundown,"A brief history rundown 1. HTTP 0.9 began in 1991 with the goal of transferring HTML between client and server. 2. HTTP 1.0 evolved to add more capabilities such as header fields and support for file types beyond HTML, making the name a misnomer: it had become a hypermedia transport. A typical plaintext HTTP request 3. HTTP 1.1 introduced critical performance optimisations such as keepalive connections, chunked encoding transfers and additional caching mechanisms. 4. HTTP 2.0 aimed to improve transport performance for lower latency and higher throughput."
HTTP.md,1674045428739,HTTP message format,HTTP message format
HTTP.md,1674045428739,Request,"Request GET, POST, PUT, UPDATE, DELETE, HEAD methods are available"
HTTP.md,1674045428739,Response,Response
HTTP.md,1674045428739,User-Server State,"User-Server State HTTP is a *stateless* protocol, and does not maintain information about the clients. This simplifies server design and allows for high-performance web servers."
HTTP.md,1674045428739,Cookies,"Cookies It is often desirable for a Web site to identify users, either to restrict user access or serve specific content. Cookie technology consists of 4 components: 1. Cookie header line in the HTTP response message 2. Cookie header line in the HTTP request message 3. Cookie file kept on the user’s end system and managed by the user’s browser 4. Back-end database at the Web site"
HTTP.md,1674045428739,Web Caching,Web Caching A proxy server acts as a Web cache with its own disk storage and keeps recently requested objects. The proxy server sits on the LAN and reduces response time for a request.
HTTP.md,1674045428739,Optimisations in HTTP 1.1,Optimisations in HTTP 1.1
HTTP.md,1674045428739,HTTP Keepalive,HTTP Keepalive Reuse existing TCP connections paired with [TCP Keep-Alive](Notes/Transmission%20Control%20Protocol.mdTCP%20Keep-Alive) to save 1 roundtrip of network latency
HTTP.md,1674045428739,HTTP Pipelining,"HTTP Pipelining Persistent HTTP implies a strict FIFO order of client requests: *dispatch request, wait for full response, dispatch next request*. Pipelining moves the queue to the server side, allows the client to send all requests at once, and reduces server idle time by processing requests immediately without delay."
HTTP.md,1674045428739,Why not do server processing in parallel?,"Why not do server processing in parallel? The HTTP 1.x protocol enforces a requirement similar to that encountered in [TCP Head-of-Line Blocking due to its requirement for strict in-order packet delivery](Notes/Transmission%20Control%20Protocol.mdHead-of-Line%20Blocking), where there must be strict serialization of returned responses. Hence, although the CSS response finishes first, the server must wait for the full HTML response before it can deliver the CSS asset."
HTTP.md,1674045428739,Parallel TCP Connections,"Parallel TCP Connections Rather than opening one TCP connection, and sending each request one after another on the client, we can open multiple TCP connections in parallel. In practice, most browsers use a value of 6 connections per host. These connections are considered independent, and hence do not face the same head-of-line blocking issues in parallel server processing."
HTTP.md,1674045428739,Domain Sharding,"Domain Sharding Although browsers can maintain a connection pool of up to 6 TCP streams per host, this might not be enough considering how an average page needs 90+ individual resources. If delivered all by the same host, there will be queueing delays: Sharding can artificially split up a single host *e.g. www.example.com into {shard1,shard2}.example.com*, helping to achieve higher levels of parallelism at a cost of additional network resources."
HTTP.md,1674045428739,Enhancements in HTTP 2.0,"Enhancements in HTTP 2.0 HTTP 2.0 extends the standards from previous versions, and is designed to allow all applications using previous versions to carry on without modification."
HTTP.md,1674045428739,Binary Framing Layer,"Binary Framing Layer At the core of the performance enhancements, is this layer which dictates how the HTTP messages are encapsulated and transferred. Rather than delimiting parts of the protocol in newlines like in HTTP 1.x, all communication is split into smaller frames and encoded in binary: > [! Frames, Messages and Streams] > Frames: The smallest unit of communication in HTTP 2.0, each containing a frame header, which at minimum identifies the stream to which the frame belongs. It contains data such as HTTP headers, payload etc. > > Messages: A complete sequence of frames that map to a logical message. It can be a request or response. > > Stream: A bidirectional flow of bytes within an established connection > > *Each TCP connection can carry any number of streams which communicates in messages consisting of one or multiple frames.*"
HTTP.md,1674045428739,Request and Response Multiplexing,"Request and Response Multiplexing In [Why not do server processing in parallel?](Notes/HTTP.mdWhy%20not%20do%20server%20processing%20in%20parallel?), we find that only one response can be delivered at a time per connection. HTTP 2.0 removes these limitations. With this, workaround optimisations in HTTP 1.x such as domain sharding is no longer necessary."
HTTP.md,1674045428739,Request Prioritisation,"Request Prioritisation The exact order in which the frames are interleaved and delivered can be optimised further by assigning a 31-bit priority value (0 representing the highest priority, $2^{31}-1$ the lowest). HTTP 2.0 merely provides the mechanism by which priority data can be exchanged, and *does not implement any specific prioritisation algorithm*. It is up to the server to implement this."
HTTP.md,1674045428739,Server Push,"Server Push A document contains dozens of resources which the client will discover. To eliminate extra latency, let the server figure out what resources the client will require and push it ahead of time. Essentially, the server can send multiple replies for a single request, without the client having to explicitly request each resource."
HTTP.md,1674045428739,Header Compression,"Header Compression Each HTTP transfer carries a set of headers that describe the transferred resource and its properties. In HTTP 1.x, this metadata is always sent as plain text and adds anywhere from 500–800 bytes of overhead per request, and kilobytes more if HTTP cookies are required."
HTTP.md,1674045428739,Header table,Header table A header table is used on both the client and server to track and store previously sent key value pairs. They are persisted for the entire connection and incrementally updated both by the client and server. Each new pair is either appended or replaces a previous value in the table. This allows a new set of headers to be coded as a simple difference from the previous set:
Heap Sort.md,1669012068603,---,"--- title: ""Heap Sort"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Heap Sort.md,1669012068603,Heap Sort,"Heap Sort <iframe width=""560"" height=""315"" src=""https://www.youtube.com/embed/2DmK_H7IdTo"" title=""YouTube video player"" frameborder=""0"" allow=""accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"" allowfullscreen></iframe>"
Heap Sort.md,1669012068603,General Idea,"General Idea Data structure based sorting algorithm using [Heaps](Notes/Heaps.md). Makes use of the Partial Order Tree property: A tree is a maximising partial order tree if and only if each node has a key value greater than or equal to each of its child nodes. 1. Create a maximizing heap 2. Remove the root node, giving us the largest value. Fill an array from the back with this max value 3. Fix the heap structure 4. Repeat until we obtain a sorted array"
Heap Sort.md,1669012068603,Pseudocode,Pseudocode Remove the max element and retain the heap structure using [Fix Heap (maximising)](Notes/Heaps.mdFix%20Heap%20maximising)
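Heap Sort.md,1669012068603,Pseudocode,"A runnable array-based sketch of the algorithm; sift_down plays the role of fixHeap and, like it, assumes both subtrees are already maximising heaps:

```python
def sift_down(a, root, end):
    # Restore the max-heap property for the subtree rooted at 'root'.
    while True:
        child = 2 * root + 1
        if child > end:
            return
        if child + 1 <= end and a[child + 1] > a[child]:
            child += 1                       # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heapsort(a):
    n = len(a)
    for i in range(n // 2 - 1, -1, -1):      # heapify bottom-up
        sift_down(a, i, n - 1)
    for end in range(n - 1, 0, -1):          # move max to the back
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end - 1)
    return a

data = heapsort([5, 1, 4, 2, 3])
```

Each sift_down costs O(log n) and runs O(n) times in the sort phase, giving the O(n log n) total."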
Heap Sort.md,1669012068603,Complexity,"Complexity Summary of complexities for Heapsort and corresponding heap structure methods: Heapsort has a best, worst and average case time complexity of $O(n\log n)$"
Heap Sort.md,1669012068603,Examples,Examples
Hash Tables.md,1669012068611,---,"--- title: ""Hash Tables"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Hash Tables.md,1669012068611,Hash Tables,"Hash Tables The problem with direct addressing: when the universe of possible keys is large, we are unable to map each key to its own slot."
Hash Tables.md,1669012068611,Hash functions,"Hash functions A hash function allows us to compute the slot from the key and reduce the amount of storage required to store all the keys. The expected time complexity for searching an element is $O(1)$. _A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to._"
Hash Tables.md,1669012068611,Division method,Division method Divide the key by some prime number not too close to a power of 2 (bits are in powers of 2) $h(k)=k\ mod\ m$
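Hash Tables.md,1669012068611,Division method,"For example, with an assumed table size of m = 701 (a prime not close to a power of 2):

```python
m = 701  # illustrative prime table size

def h(k):
    return k % m

slot = h(123456)  # 123456 mod 701 = 80
```

Any integer key maps into [0, m-1], so the table needs only m slots regardless of the key universe."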
Hash Tables.md,1669012068611,Multiplication method,"Multiplication method Multiply the key by a constant $0<A<1$, extract the fractional part, and scale by m: $h(k)=\lfloor m(kA\ mod\ 1)\rfloor$. Knuth suggests $A\approx(\sqrt{5}-1)/2$. Refer to “Introduction to algorithms”, 2009, p. 263"
Hash Tables.md,1669012068611,Collision Resolution,Collision Resolution
Hash Tables.md,1669012068611,Linked lists,"Linked lists We can combine the hash table with a [Linked Lists](Linked%20Lists), placing all the elements that hash to the same slot into the same linked list: | Search | Insert | Delete | | -------------------------------------- | ------ | ------ | | Proportional to the length of the list | O(1) | O(1) if doubly-linked |"
Hash Tables.md,1669012068611,Probing,"Probing The extra memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory, potentially yielding fewer collisions and faster retrieval."
Hash Tables.md,1669012068611,Open addressing,"Open addressing We systematically examine table slots until either we find the desired element or we have ascertained that the element is not in the table. ``` HASH-INSERT(T, k) i = 0 repeat j = h(k,i) if T[j] == NIL T[j] = k return j else i++ until i==m error ""hash table overflow"" ```"
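Hash Tables.md,1669012068611,Open addressing,"A runnable equivalent of HASH-INSERT using linear probing, i.e. the probe sequence h(k, i) = (h(k) + i) mod m; the table size is an arbitrary small value for illustration:

```python
m = 7
table = [None] * m

def hash_insert(table, k):
    for i in range(m):
        j = (k + i) % m            # h(k, i) with h(k) = k mod m
        if table[j] is None:
            table[j] = k
            return j
    raise OverflowError('hash table overflow')

# 10, 17 and 24 all hash to slot 3, so probing walks to 3, 4, 5.
slots = [hash_insert(table, k) for k in (10, 17, 24)]
```

Swapping in a different probe sequence (e.g. double hashing) only changes the expression for j."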
Index Based Algorithms.md,1669012068609,---,"--- title: ""Index Based Algorithms"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Index Based Algorithms.md,1669012068609,Index Based Algorithms,"Index Based Algorithms Having an index on 1 or more attributes of a relation makes some algorithms more feasible. > [!note] $V(R,a)$ >This represents the number of distinct values of an attribute $a$ in a relation _R_ >"
Index Based Algorithms.md,1669012068609,Index Based Selection,Index Based Selection For a given selection operation $\sigma_{a=v}(R)$ (_meaning select all tuples in R where attribute a = v_) we can use an index on attribute a to gain cost savings.
Index Based Algorithms.md,1669012068609,Clustered Index,"Clustered Index Since each tuple with the same attribute value are packed in as little blocks as possible, a clustered index will have savings averaging: $$IO =B(R)/V(R,a)$$ The actual value can be higher due to: 1. All tuples with $a=v$ might be spread across more than 1 block"
Index Based Algorithms.md,1669012068609,Non-clustered Index,"Non-clustered Index We can assume that each tuple will be on a different block, a non-clustered index will have savings averaging: $$IO =T(R)/V(R,a)$$"
Index Based Algorithms.md,1669012068609,Index Based Joining,"Index Based Joining Assume an operation to join S and R over an attribute $C$. If only S has an index on C, algorithm: 1. Iterate over all the blocks of R 2. For each tuple $t$ in each block $b$ with an attribute value $t.C$ use the index to find the matching attribute value in S 3. Join and output"
Index Based Algorithms.md,1669012068609,Cost,"Cost Step 1: incur a cost of B(R) as we need to read all blocks of R Step 2: each tuple of R requires a read of the index - Clustered index: $T(R)\times B(S) / V(S,C)$ - Non-Clustered index: $T(R)\times T(S) / V(S,C)$ Total cost: $B(R) + \text{index cost}$"
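Index Based Algorithms.md,1669012068609,Index Based Joining,"The three steps above can be sketched with an in-memory dict standing in for the index on S.C; tuples here are (id, C) pairs purely for illustration:

```python
R = [(1, 'x'), (2, 'y'), (3, 'x')]
S = [(10, 'x'), (11, 'z')]

# Build the index on S.C (in a DBMS this already exists on disk).
index_S = {}
for s in S:
    index_S.setdefault(s[1], []).append(s)

# Iterate over R; probe the index for matching S tuples; output joins.
joined = [(r, s) for r in R for s in index_S.get(r[1], [])]
```

The index probe on the last line is what replaces a full scan of S for every tuple of R."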
Index Based Algorithms.md,1669012068609,Sorted Index Join - Zig Zag Join,"Sorted Index Join - Zig Zag Join With a sorted index on both relations, we can just perform the final step of [sort-based joining](Notes/Two%20Pass%20Algorithms.mdSort%20Based%20Algorithms). The index lets us skip retrieving data blocks where there are no matching keys."
Index Based Algorithms.md,1669012068609,Cost,"Cost From example 15.12, the number of disk I/O's if R and S both have sorted indexes, the total cost would simply be that cost to read all blocks of R and S: 1500. Consider that a large fraction of R or S cannot match tuples of the other relation, then the total cost will be considerably less than 1500."
Index Based Algorithms.md,1669012068609,Practice Problems,"Practice Problems Cost of accessing the data blocks: There are $\frac{k}{10}$ distinct values that we must access. There are a total of $B(R)/k=1000/k$ blocks per distinct value. Total block accesses = $\frac{1000}{k}\times\frac{k}{10}=100$"
Heaps.md,1669277689787,---,"--- title: ""Heaps"" date: 2022-11-08 lastmod: 2022-11-24 ---"
Heaps.md,1669277689787,Heaps,Heaps A specialized tree based data structure. Efficient data structure to implement a priority queue.
Heaps.md,1669277689787,Implementations,Implementations Binary Heap built with a [Binary Tree](Notes/Binary%20Tree.md):
Heaps.md,1669277689787,Operations,Operations
Heaps.md,1669277689787,Fix Heap (maximising),"Fix Heap (maximising) Method to retain the heap structure after the root is removed. Assumes both left and right subtrees are already maximising heaps. ```java fixHeap(H,k){ if (H is a leaf) insert k in root of H; else { compare left and right child; largerSubHeap = the larger child of H; //key is the largest element if (k >= key of root(largerSubHeap)) insert k in root of H; //key is not the largest, move the largest element up else{ insert root(largerSubHeap) in root of H; //recursively find the spot where key is the largest fixHeap(largerSubHeap, k) } } } ```"
Heaps.md,1669277689787,Example,"Example If there are 2 child nodes, this operation will take 2 comparisons:"
Heaps.md,1669277689787,Heapify,Heapify Method to obtain the maximising heap property from an arbitrary tree. Fix the heap from the bottom up:
Heaps.md,1669277689787,Complexity for heap construction:,Complexity for heap construction:
Instruction Level Parallelism.md,1669012068594,---,"--- title: ""Instruction Level Parallelism"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Instruction Level Parallelism.md,1669012068594,Instruction Level Parallelism,Instruction Level Parallelism We can group multiple independent instructions and execute them concurrently in different functional units of a single processor.
Instruction Level Parallelism.md,1669012068594,Approaches,Approaches Hardware approach: Superscalar processor Software approach: compiler based using a Very Long Instruction Word (VLIW) processor
Instruction Level Parallelism.md,1669012068594,Superscalar Processing,Superscalar Processing Way-2 can be assigned load and store instructions while Way-1 will handle other instructions: We can introduce previous techniques to reduce data hazard and improve overall CPI:
Instruction Level Parallelism.md,1669012068594,Very Long Instruction Word,Very Long Instruction Word
Instruction Level Parallelism.md,1669012068594,Practice Problems,Practice Problems a. P1 in order: | Time | Lane 1 | Lane 2 | | ---- | ------ | ------ | | 1 | A | | | 2 | B | | | 3 | C | D | | 4 | E | F | | 5 | G | | | 6 | H | | P2 out of order: | Time | Lane 1 | Lane 2 | | ---- | ------ | ------ | | 1 | A | D | | 2 | B | E | | 3 | C | F | | 4 | G | | | 5 | H | | b. $Speedup = 6/5=1.2$
Instructions.md,1669221191167,---,"--- title: ""Instructions"" tags: [question] date: 2022-11-08 lastmod: 2022-11-21 ---"
Instructions.md,1669221191167,Instructions,Instructions An example using the ARM ISA > [!The datapaths shown below are examples given a single cycle datapath]
Instructions.md,1669221191167,Register Type,Register Type All data values are located in registers Addressing Mode: __register addressing mode__ Rm: First source register Rn: Second source register Rd: Destination register shamt: Shift amount for use in shift operations
Instructions.md,1669221191167,Datapath,Datapath
Instructions.md,1669221191167,Data transfer type,Data transfer type Addressing Mode: __Base/Displacement addressing__
Instructions.md,1669221191167,Datapath,"Datapath LDUR 1. Rn contains the information about WHERE the data in memory is 2. Offset Rn by address value to get the memory address 3. Store the data from this memory address into Rt STUR 1. Rn register contains the information about WHERE to store the data 2. Offset the information in Rn by the address value (22 + 64) = 90, to get the destination memory address 3. Store the data inside Rt into this offset value > [!We can utilize a set of extra multiplexers to reuse components for both types] >"
Instructions.md,1669221191167,Immediate type,Immediate type Addressing mode: Immediate addressing
Instructions.md,1669221191167,Datapath,Datapath
Instructions.md,1669221191167,Conditional Branch type,Conditional Branch type PC relative addressing mode
Instructions.md,1669221191167,Datapath,Datapath
Instructions.md,1669221191167,What's with the shift left by 2? question,"What's with the shift left by 2? question Each instruction word is 32 bits (4 bytes) long, so instruction addresses are always multiples of 4. The branch offset in the instruction counts instructions, so we shift it left by 2 (multiply by 4) to convert it into a byte offset. For example, branching ahead by 2 instructions means moving 8 bytes."
Instructions.md,1669221191167,Unconditional Branch type,Unconditional Branch type Addressing mode: PC relative addressing
Instructions.md,1669221191167,Combine all types into a single datapath,Combine all types into a single datapath
Instructions.md,1669221191167,R-Type,"R-Type Critical path: ```mermaid graph LR; A(Reg2Loc Mux) --> T(""2 x REG(read)"") --> C(ALUSrc Mux) --> ALU --> E(""Mem2Reg Mux"") --> F(""REG(write)"") ``` Notes: - Reg2Loc (0) used to select Rm as a source register - ALUSrc (0) to select register data rather than sign-extended address - Mem2Reg (0) to select data from ALU rather than memory"
Instructions.md,1669221191167,I-Type,"I-Type Critical path: ```mermaid graph LR; A(""REG(read)"") --> T(Zero Extend) --> C(ALUSrc Mux) --> ALU --> E(""Mem2Reg Mux"") --> F(""REG(write)"") ``` - Immediate address is zero extended and hence there is no delay here"
Instructions.md,1669221191167,Load,"Load Critical path: ```mermaid graph LR; A(""REG(read)"") --> C(ALUSrc Mux) --> ALU --> T(D-MEM)--> E(""Mem2Reg Mux"") --> F(""REG(write)"") ``` Notes: - Reg2Loc not used as only Rn is used - ALUSrc (1) to select sign-extended address - Mem2Reg (1) to select data from memory"
Instructions.md,1669221191167,Store,"Store ```mermaid graph LR; A(""REG(read)"") --> C(ALUSrc Mux) --> ALU --> T(D-MEM) ``` - Reg2Loc used to select Rt as the read register 2. Rt data is passed into the D-MEM and not used by the ALU. Hence, the RegFile + Reg2Loc Mux delay is overshadowed by the ALU."
Instructions.md,1669221191167,Conditional Branch,"Conditional Branch Critical path: ```mermaid graph LR; T(Reg2Loc Mux)--> A(""REG(read)"")--> C(ALUSrc Mux) --> ALU --> E(Branch MUX) -->AND-->OR-->F(PCin/out) ``` Notes: - Reg2Loc (1) to read Rt - ALUSrc (0) to use data from register rather than address - Zero-flag in AND-gate together with Branch-flag to select address to add for branching rather than default +4 to load into PC"
Instructions.md,1669221191167,Unconditional Branch,Unconditional Branch Critical path: ```mermaid graph LR; T(Sign Extend)--> A(Shift left)--> C(ADD)-->E(Branch MUX)-->F(PCin/out) ``` Notes: - Additional OR-gate to always select the address for branching
Instructions.md,1669221191167,Practice Problems,"Practice Problems i. All instructions ii. All instructions iii. All except unconditional branch instructions iv. ALU instructions, Load/Store instructions and Conditional Branch. *Why doesn't the unconditional branch need it?* v. Load/Store instructions PC++, PCin -> PCout and I-MEM is used for all datapaths Propagation delay is the time delay for the signal to reach its destination. Some signals are sent out in parallel (e.g. PC++) and the delay there is overshadowed by the overall delay by main logic. i. Reg2Loc Mux -> 2 x REG(read) -> ALUSrc Mux -> ALU -> Mem2Reg Mux -> REG(write) 2 Reg read signals are done in parallel. $500+50+200+50+2000+50+200=3050ps$ ii. REG(read) -> Zero Extend -> ALUSrc Mux -> ALU -> Mem2Reg Mux -> REG(write) The delay from ALUSrcMux is overshadowed by the REG(R) $500+200+2000+200+50=2950ps$ iii. REG(read) -> ALUSrc Mux -> ALU -> D-MEM -> Mem2Reg Mux -> REG(write) ALUSrc MUX delay is overshadowed by the delay in REG(read) $500+200+2000+2000+50+200=4950$ iv. STUR is LDUR but without the Mem2Reg Mux and REG write $4950-200-50=4700ps$ v. Reg2Loc Mux -> REG(read) -> ALUSrc Mux -> ALU -> BranchMUX -> PCin/out $500+50+200+50+2000+50+100=2950$ vi. Sign extend -> Shift -> Add -> BranchMUX -> PCin/out $500+25+0+1500+50+100=2175$ i. Minimum clock period must allow all types of instructions to complete within that clock period. Hence the minimum clock period is the time needed to complete the longest instruction: 4950ps ii. Minimum clock period of a specific cycle must allow the longest stage to complete. Hence, the longest stage is EX or MA which has 2000ps."
Insertion Sort.md,1669012068600,---,"--- title: ""Insertion Sort"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Insertion Sort.md,1669012068600,Insertion Sort,Insertion Sort
Insertion Sort.md,1669012068600,General Idea,"General Idea Incremental approach 1. Iterate through the array and maintain a sorted array _in-place_, at the start of the array. 2. For every element, loop through the sorted array backwards. Compare and swap the elements until it is in the correct spot. 3. Repeat until the entire array has been iterated through."
Insertion Sort.md,1669012068600,Pseudocode,Pseudocode _Equal elements are not swapped in position_. This means that Insertion Sort is [stable](005%20Sorting%20Algorithms.md^85ee66).
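Insertion Sort.md,1669012068600,Pseudocode,"A runnable sketch; the strict greater-than comparison is what keeps equal elements in their original order (stability):

```python
def insertion_sort(a):
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:   # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key                 # drop key into its correct spot
    return a

out = insertion_sort([5, 2, 4, 6, 1, 3])
```

On a nearly sorted input the while loop rarely runs, matching the O(n + I) bound where I is the number of inversions."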
Insertion Sort.md,1669012068600,Complexity,"Complexity > [!NOTE] Best Case > - Occurs when the array is already sorted > - 1,2,3,4,5 > - 1 Key comparison is still required at each iteration > - Total of $n-1$ comparisons Observe that the time complexity is $O(n+I)$ where $I$ is the number of inversions. This also means that the time complexity to sort an array that is almost sorted i.e. _small number of inversions_, is linear. > [!NOTE] Worst Case > - Occurs when the array is reversely sorted, $\theta(n^2)$ inversions > - 5,4,3,2,1 > - Each iteration requires iterating through the entire sorted array > - Total: $$1+2+...+(n-1)=\sum_{i=1}^{n-1}i=\frac{(n-1)n}{2} $$"
Insertion Sort.md,1669012068600,Examples,Examples
Insertion Sort.md,1669012068600,Overall Evaluation,Overall Evaluation
Instruction Set Architecture.md,1669012068595,---,"--- title: ""Instruction Set Architecture"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Instruction Set Architecture.md,1669012068595,Instruction Set Architecture (ISA),Instruction Set Architecture (ISA) A set of specifications a programmer must know to write correct and efficient programs for a specific machine
Instruction Set Architecture.md,1669012068595,RISC vs CISC,RISC vs CISC __RISC__: Reduced Instruction Set Architecture __CISC__: Complex Instruction Set Architecture
Instruction Set Architecture.md,1669012068595,ARM ISA,ARM ISA Advanced RISC Machine (ARM)
Instruction Set Architecture.md,1669012068595,Register specification,Register specification
Instruction Set Architecture.md,1669012068595,Register File,Register File A register file is a set of registers that can be read and written by supplying a register number. This is done using [multiplexers](Notes/Combinational%20Circuits.md#Multiplexer) to choose source registers and a [decoder](Notes/Combinational%20Circuits.md#Decoder) to select the destination register.
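Instruction Set Architecture.md,1669012068595,Register File,"As a toy illustration of that structure (an assumed model, not any specific hardware): each read port behaves like a multiplexer indexed by the register number, and a write behaves like a decoder asserting one register's write-enable.

```python
class RegisterFile:
    # Toy model: read ports act like multiplexers selecting by register
    # number; the write port acts like a decoder enabling one destination.
    def __init__(self, n=32):
        self.regs = [0] * n

    def read(self, rs1, rs2):
        # Two read ports, selected in parallel
        return self.regs[rs1], self.regs[rs2]

    def write(self, rd, value):
        self.regs[rd] = value

rf = RegisterFile()
rf.write(5, 42)
print(rf.read(5, 0))  # (42, 0)
```
"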
Instruction Set Architecture.md,1669012068595,Memory organization,Memory organization
Instruction Set Architecture.md,1669012068595,Instruction memory,Instruction memory
Instruction Set Architecture.md,1669012068595,Data memory,Data memory
Instruction Set Architecture.md,1669012068595,Instructions Format,Instructions Format Based on the system we can design a set of computer [Instructions](Notes/Instructions.md).
Instruction Set Architecture.md,1669012068595,Addressing Modes,Addressing Modes
Instruction Set Architecture.md,1669012068595,Practice Problems,"Practice Problems ``` ADDI X2, X3, 101 ;save loop termination index loop: LDUR X4, [X11, 0] ;x4 = b[i] ADDI X11, X11, 8 ;x11 = x11 + 8 ADD X4, X4, X1 ;x4 = b[i] + c STUR X4, [X10, 0] ;a[i] = x4 ADDI X10, X10, 8 ;x10 = x10 + 8 SUBI X2, X2, 1 ;x2 = x2 - 1 CBNZ X2, loop exit: END ``` ii. Line 1 runs once; lines 2 -> 8 run 101 times: 1 + 7*101 = 708 instructions. iii. The LDUR and STUR (loop lines 1 and 4) are memory references, each executed 101 times: 202 references."
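Instruction Set Architecture.md,1669012068595,Practice Problems,"The counts can be checked numerically (a sketch; 'setup' is the one ADDI before the loop and the loop body is the 7 instructions from LDUR to CBNZ):

```python
setup_instructions = 1   # the ADDI before the loop
loop_body = 7            # LDUR .. CBNZ
iterations = 101
total = setup_instructions + loop_body * iterations
memory_references = 2 * iterations   # one LDUR and one STUR per iteration
print(total, memory_references)  # 708 202
```
"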
Information Theory.md,1678961241253,---,"--- title: ""Information Theory"" date: 2023-03-14 lastmod: 2023-03-14 ---"
Information Theory.md,1678961241253,Information Theory,Information Theory
Information Theory.md,1678961241253,Shannon Information Content (Surprise),Shannon Information Content (Surprise) Can be interpreted as the *surprise* of an outcome. A lower probability outcome is more surprising! $$Surprise=log_2\frac{1}{p(x)}$$
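Information Theory.md,1678961241253,Shannon Information Content (Surprise),"The formula translates directly into a one-line Python function (a sketch, measured in bits):

```python
import math

def surprise(p):
    # Shannon information content in bits: rarer outcomes are more surprising
    return math.log2(1 / p)

print(surprise(0.5))   # 1.0 bit (a fair coin flip)
print(surprise(0.25))  # 2.0 bits
```
"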
Information Theory.md,1678961241253,Entropy,Entropy A measure of uncertainty. It is the expected surprise of an event. $$ \begin{aligned} Entropy&= E(Surprise)\\ &=\sum xP(X=x); \ x=surprise\\ &=\sum log\frac{1}{p(x)}\times p(x)\\ &=\sum p(x)\times (log(1)-log(p(x)))\\ &=\sum-p(x)log(p(x)) \end{aligned} $$
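Information Theory.md,1678961241253,Entropy,"The last line of the derivation translates directly into code (a sketch, in bits; zero-probability outcomes contribute nothing and are skipped):

```python
import math

def entropy(probs):
    # Expected surprise: sum of -p * log2(p) over all outcomes
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))  # 1.0: a fair coin is maximally uncertain
print(entropy([1.0]))       # 0.0: a certain outcome has no surprise
```
"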
Information Theory.md,1678961241253,Information Gain,Information Gain Can be interpreted as the reduction in entropy (made things more certain/less surprising).
Information Theory.md,1678961241253,Example,"Example Drawing 3 cards out of a standard deck of 52 cards with replacement. Win = all cards are the same colour. Lose = not all the same colour. $$ \begin{aligned} &P(Win)=2\times(\frac{1}{2})^3=\frac{1}{4}\\ &P(Lose)=\frac{3}{4} \\ &Entropy_{game}= -\frac{1}{4}log(1/4)-\frac{3}{4}log(3/4)=0.811\\ \end{aligned} $$ What is the information gain in the event you drew 2 cards (i.e. you know the colours of 2 of the 3 cards)? $$ \begin{aligned} &\text{If both are the same colour}:\\ &P(Win)=1/2\\ &P(Lose)=1/2\\ &Entropy=1\\ &\text{If both are different colours}:\\ &P(Win)=0\\ &P(Lose)=1\\ &Entropy = 0\\ &E(Entropy)= \frac{1}{2}(1)+\frac{1}{2}(0)=0.5\\ &Gain=0.811-0.5=0.311 \end{aligned} $$"
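Information Theory.md,1678961241253,Example,"The worked numbers can be verified with a short script (probabilities as derived above; entropies in bits):

```python
import math

def entropy2(p):
    # Binary (win/lose) entropy in bits
    q = 1 - p
    return sum(-x * math.log2(x) for x in (p, q) if x > 0)

h_game = entropy2(1 / 4)
# After seeing two cards: same colour or different, each with probability 1/2
h_after = 0.5 * entropy2(1 / 2) + 0.5 * entropy2(0)
gain = h_game - h_after
print(round(h_game, 3), round(gain, 3))  # 0.811 0.311
```
"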
Information Theory.md,1678961241253,Gini Impurity,Gini Impurity
Intelligent Agents.md,1669012068588,---,"--- title: ""Intelligent Agents"" date: 2022-11-08 lastmod: 2022-11-21 ---"
Intelligent Agents.md,1669012068588,Intelligent Agents,"Intelligent Agents Rational agents: make decisions by maximising a specific value, based on built-in knowledge about the environment. Autonomous agents: do not rely entirely on built-in knowledge; they adapt to the environment through experience and learning."