CS188, Distributed Systems, Spring 2019

Course Objective

This class teaches design and implementation techniques that enable the building of fast, scalable, fault-tolerant distributed systems.

Course Description

Distributed systems enable developers to leverage computing and storage resources across many networked computers to supply services that are fast, highly available, and scalable. Study covers fundamental concepts of design and implementation of such distributed systems. Topics include communication models for distributed machines (e.g., sockets, RPCs); synchronization (e.g., clock synchronization, logical clocks, vector clocks); failure recovery (e.g., snapshotting, primary-backup); consistency models (e.g., linearizability, eventual, causal); consensus protocols (e.g., Paxos, Raft); distributed transactions; and locking. Students gain hands-on, practical experience with these topics through multiple programming assignments, which work through the steps of creating a fault-tolerant, sharded key/value store. The course also explores how these concepts manifest in several real-world, large-scale distributed systems used by Internet companies such as Google, Facebook, and Amazon.

Languages Covered

Programming Language: Go

Assignment Overviews

  • P1 - MapReduce
    • Built a MapReduce library, both as an introduction to the Go programming language and as a way to learn about fault tolerance in distributed systems (see the word-count sketch after this list).
      • Part I: Word Count - wrote a simple MapReduce program
      • Part II: Distributed MapReduce jobs - wrote a Master that hands out jobs to workers
      • Part III: Handling worker failures - modified the Master to handle failure of workers
  • P2 - Primary/Backup Key/Value Service
    • Built a key/value service using primary/backup replication, assisted by a view service that decides which machines are alive. The view service allows the primary/backup service to work correctly in the presence of network partitions. The view service itself is not replicated, however, and is therefore a single point of failure (see the view-service sketch after this list).
      • Part A: The Viewservice
      • Part B: The primary/backup key/value service
  • P3 - Paxos-Based Key/Value Service
    • Built a key/value service that uses the Paxos protocol to replicate the key/value database, so there is no single point of failure and network partitions are handled correctly. This key/value service is slower than a non-replicated key/value server would be, but it is fault tolerant (see the Paxos sketch after this list).
      • Part A: Paxos
      • Part B: Paxos-based Key/Value Server
  • P4 - Sharded Key/Value Service
    • Built a key/value storage system that "shards" the keys over a set of replica groups. Each replica group handles puts and gets for just a few of the shards, and the groups operate in parallel; thus total system throughput increases in proportion to the number of groups. The sharded key/value store consists of two components: (1) a set of replica groups, each responsible for a subset of the shards and consisting of a handful of servers that use Paxos to replicate the group's shards, and (2) a shard master that decides which replica group should serve each shard (i.e., the configuration). There is a single shard master for the whole system, implemented as a fault-tolerant service using Paxos (see the sharding sketch after this list).
      • Part A: The Shard Master
      • Part B: Sharded Key/Value Server
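
Below are a few illustrative Go sketches of the core idea in each assignment. First, the P1 word count: a minimal sketch assuming a KeyValue pair type and mapF/reduceF hooks in the spirit of the lab skeleton (the names and signatures here are illustrative, not the assignment's exact interface):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"unicode"
)

// KeyValue is the pair type passed from the map phase to the reduce phase.
type KeyValue struct {
	Key   string
	Value string
}

// mapF splits the document contents into words and emits (word, "1") pairs.
func mapF(contents string) []KeyValue {
	words := strings.FieldsFunc(contents, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	kvs := make([]KeyValue, 0, len(words))
	for _, w := range words {
		kvs = append(kvs, KeyValue{Key: w, Value: "1"})
	}
	return kvs
}

// reduceF sums the counts emitted for a single word.
func reduceF(key string, values []string) string {
	total := 0
	for _, v := range values {
		n, _ := strconv.Atoi(v)
		total += n
	}
	return strconv.Itoa(total)
}

func main() {
	// Group the map output by key, then reduce each group, as the
	// MapReduce framework would do across workers.
	grouped := make(map[string][]string)
	for _, kv := range mapF("the quick brown fox and the lazy dog and the cat") {
		grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
	}
	for k, vs := range grouped {
		fmt.Printf("%s: %s\n", k, reduceF(k, vs))
	}
}
```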
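For P2, the crux of the view service is deciding when to promote the backup. A minimal sketch with illustrative View/ViewServer types; the real service derives liveness from missed Pings over RPC and also recruits idle servers as new backups, both omitted here:

```go
package main

import "fmt"

// View names the current primary and backup; Viewnum increases
// monotonically each time membership changes.
type View struct {
	Viewnum uint
	Primary string
	Backup  string
}

// ViewServer decides views. In the lab, primary liveness is judged
// from missed Pings within a deadline; here it is passed in directly.
type ViewServer struct {
	current View
	acked   bool // has the primary acknowledged the current view?
}

func (vs *ViewServer) promote(primaryDead bool) {
	// The view service may only move to a new view once the current
	// primary has acked; otherwise it could discard the only
	// up-to-date copy of the data.
	if !vs.acked {
		return
	}
	if primaryDead && vs.current.Backup != "" {
		vs.current = View{
			Viewnum: vs.current.Viewnum + 1,
			Primary: vs.current.Backup, // backup takes over
			Backup:  "",                // filled later by an idle server
		}
		vs.acked = false
	}
}

func main() {
	vs := &ViewServer{current: View{Viewnum: 1, Primary: "s1", Backup: "s2"}, acked: true}
	vs.promote(true)
	fmt.Printf("%+v\n", vs.current) // {Viewnum:2 Primary:s2 Backup:}
}
```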
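For P3, the core is the two-phase Paxos exchange. This sketch runs one proposer round against in-process acceptors; in the assignment these calls are RPCs that can be lost, delayed, or reordered, and the type and field names here are assumptions:

```go
package main

import "fmt"

// Acceptor holds per-instance Paxos state (n_p, n_a, v_a).
type Acceptor struct {
	promised int         // highest proposal number promised (n_p)
	accepted int         // number of the highest accepted proposal (n_a)
	value    interface{} // value of that accepted proposal (v_a)
}

// Prepare is phase 1: promise to ignore proposals numbered below n,
// and report any previously accepted proposal.
func (a *Acceptor) Prepare(n int) (ok bool, nAccepted int, v interface{}) {
	if n > a.promised {
		a.promised = n
		return true, a.accepted, a.value
	}
	return false, 0, nil
}

// Accept is phase 2: accept (n, v) unless a higher-numbered Prepare was seen.
func (a *Acceptor) Accept(n int, v interface{}) bool {
	if n >= a.promised {
		a.promised = n
		a.accepted = n
		a.value = v
		return true
	}
	return false
}

// propose runs one proposer round and reports whether a majority accepted.
func propose(acceptors []*Acceptor, n int, v interface{}) (interface{}, bool) {
	majority := len(acceptors)/2 + 1
	promises, highest := 0, 0
	choice := v
	for _, a := range acceptors {
		if ok, na, va := a.Prepare(n); ok {
			promises++
			// Adopt the value of the highest-numbered proposal already
			// accepted by any acceptor; this is what makes Paxos safe.
			if na > highest {
				highest, choice = na, va
			}
		}
	}
	if promises < majority {
		return nil, false
	}
	accepts := 0
	for _, a := range acceptors {
		if a.Accept(n, choice) {
			accepts++
		}
	}
	return choice, accepts >= majority
}

func main() {
	acceptors := []*Acceptor{{}, {}, {}}
	v, ok := propose(acceptors, 1, "Put(x, 1)")
	fmt.Println(v, ok) // Put(x, 1) true
}
```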
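For P4, key routing is a pure function from key to shard, plus a configuration that maps shards to replica groups. A sketch assuming FNV hashing and illustrative Config fields (the lab's actual key-to-shard scheme may differ):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const NShards = 10

// key2shard maps a key to one of NShards shards; the shard master then
// maps shards to replica groups via the current configuration.
func key2shard(key string) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32()) % NShards
}

// Config is one configuration produced by the shard master: for every
// shard, which replica group (by GID) currently serves it.
type Config struct {
	Num    int                // configuration number
	Shards [NShards]int64     // shard index -> replica group id
	Groups map[int64][]string // GID -> server names in that group
}

func main() {
	cfg := Config{Num: 1, Groups: map[int64][]string{100: {"s1", "s2", "s3"}}}
	for s := range cfg.Shards {
		cfg.Shards[s] = 100 // initially a single group serves every shard
	}
	k := "user:42"
	fmt.Printf("key %q -> shard %d -> group %d\n", k, key2shard(k), cfg.Shards[key2shard(k)])
}
```

Because key2shard is deterministic, clients and servers agree on routing without coordination; only the shard-to-group mapping changes across configurations, which is what the Paxos-replicated shard master manages.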
