#Lecture Notes: NoSQL
http://cu-data-engineering-s15.github.io/lecture_12/#/
Homework 3 Discussion Review fetch upstream Reiew pull request
##Credit where due Big Data: Principles and best practices of scalable realtime data systems Makeing Sendse of NoSQL: A guide for managers and the rest of use
##Problem: Traditional DBs Imagine: Web Analytics App to track page views by customerId Many customers/views=timeouts So....You batch updates..All good for now More requests in....Flooding ensues Still a DB bottleneck Relational Solutions Vertical Scaling Upgrade Hardware Add Cache Server SHARD Multi-Instance Multi-Hardware Partition Data Fault Tolerance still needs to be considered (HARD) Human Error Considerations (Lack of Fault Tolerance) Operational Complexity Hidden Glue Code (Maintenance) Application Layer Complexity (Maintenance)
##NoSQL to the Rescue Distributed by nature (SHARD/Replication managed by default) Horizontally scalable Just Add Servers Avoids Mutable Data (Versioned Data) Fault Tolerant
##Types of NoSQL DB
- Key-Value
- Graphs
- Columnar (Column Store different)
- Document
###Key-Value Hash table style Benefit: Siimply
###Graph Store Graph Relationships Benefit: Structural Query Languages
###Columnar Stores (Column Family Stores) Benefit: Scales well, fast writes Basic Data Model: Hashtables all the way Uneven multi-demensional array Distributed HashTable
###Document Stores Insert Documents (Bags for key-value pairs) Data is heavily indexed