Skip to content

Latest commit

 

History

History
68 lines (53 loc) · 1.58 KB

20150219.md

File metadata and controls

68 lines (53 loc) · 1.58 KB

#Lecture Notes: NoSQL

http://cu-data-engineering-s15.github.io/lecture_12/#/

Homework 3 Discussion Review fetch upstream Reiew pull request

##Credit where due Big Data: Principles and best practices of scalable realtime data systems Makeing Sendse of NoSQL: A guide for managers and the rest of use

##Problem: Traditional DBs Imagine: Web Analytics App to track page views by customerId Many customers/views=timeouts So....You batch updates..All good for now More requests in....Flooding ensues Still a DB bottleneck Relational Solutions Vertical Scaling Upgrade Hardware Add Cache Server SHARD Multi-Instance Multi-Hardware Partition Data Fault Tolerance still needs to be considered (HARD) Human Error Considerations (Lack of Fault Tolerance) Operational Complexity Hidden Glue Code (Maintenance) Application Layer Complexity (Maintenance)

##NoSQL to the Rescue Distributed by nature (SHARD/Replication managed by default) Horizontally scalable Just Add Servers Avoids Mutable Data (Versioned Data) Fault Tolerant

##Types of NoSQL DB

  • Key-Value
  • Graphs
  • Columnar (Column Store different)
  • Document

###Key-Value Hash table style Benefit: Siimply

###Graph Store Graph Relationships Benefit: Structural Query Languages

###Columnar Stores (Column Family Stores) Benefit: Scales well, fast writes Basic Data Model: Hashtables all the way Uneven multi-demensional array Distributed HashTable

###Document Stores Insert Documents (Bags for key-value pairs) Data is heavily indexed