Big Data Joe Brain Dump: RDBMS / NoSQL Concepts
Just a bunch of notes on RDBMS / NoSQL concepts. Hopefully someone finds them useful.
ACID
- Atomicity
– Each transaction is all or nothing. If one part fails, the entire transaction fails.
- Consistency
– Ensures that any transaction will bring the database from one valid state to another.
- Isolation
– Ensures that concurrent execution of transactions results in the same system state as if the transactions had been executed serially.
- Durability
– Once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
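To make the atomicity bullet concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names, and the amounts are all made up for illustration; the point is only that a failed transfer leaves no partial change behind.

```python
import sqlite3

# In-memory database with two hypothetical accounts (names and amounts are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first on purpose, so a failing debit leaves a partial change to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was kept.
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


first = transfer(conn, "alice", "bob", 30)    # fits within alice's balance
second = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
print(first, second, balances(conn))
```

The second transfer credits bob before the debit fails, yet bob's balance is unchanged afterwards because the whole transaction was rolled back.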
CAP Theorem aka Distributed System Model
- Consistency
– All nodes see the same data at the same time
- Availability
– A guarantee that every request receives a response about whether it was successful or failed
- Partition Tolerance
– The system continues to operate despite arbitrary message loss or failure of part of the system
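- You can only guarantee 2 of these 3 at the same time. Since network partitions can't be ruled out in a distributed system, the practical choice during a partition is between consistency and availability.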
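A toy illustration of the trade-off in the list above, not a model of any real database: a hypothetical two-node cluster that either refuses writes during a partition (keeping consistency) or accepts them locally (keeping availability). The class names and the "CP" / "AP" mode strings are my own shorthand.

```python
class Node:
    """One replica holding a single value (a toy stand-in for a database node)."""

    def __init__(self, name):
        self.name = name
        self.value = None


class TwoNodeCluster:
    """Hypothetical two-node cluster; mode is 'CP' or 'AP'."""

    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Node("a"), Node("b")
        self.partitioned = False  # True means a and b cannot exchange messages

    def write(self, node, value):
        other = self.b if node is self.a else self.a
        if not self.partitioned:
            node.value = other.value = value  # replicate synchronously
            return "ok"
        if self.mode == "CP":
            # Stay consistent: refuse the write rather than let replicas diverge.
            return "unavailable"
        # AP: stay available, accept the write locally; replicas now disagree.
        node.value = value
        return "ok"


cp = TwoNodeCluster("CP")
cp.write(cp.a, 1)
cp.partitioned = True
cp_result = cp.write(cp.a, 2)  # rejected, both nodes still hold 1

ap = TwoNodeCluster("AP")
ap.write(ap.a, 1)
ap.partitioned = True
ap_result = ap.write(ap.a, 2)  # accepted, node a holds 2 while node b holds 1

print(cp_result, cp.a.value, cp.b.value)
print(ap_result, ap.a.value, ap.b.value)
```

While the network is healthy both modes behave identically; the difference only shows up once the partition flag is set.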
BASE aka Eventual Consistency aka Optimistic Replication
- Basically Available
- Soft State
- Eventually Consistent
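A rough sketch of what "eventually consistent" can look like in practice. It assumes a last-write-wins rule based on timestamps, which is only one of several possible conflict-resolution strategies; the Replica class and anti_entropy function are hypothetical names for this example.

```python
class Replica:
    """Toy replica storing key -> (timestamp, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, ts):
        # Last-write-wins: keep the entry with the highest timestamp.
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(r1, r2):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (ts, value) in list(r1.data.items()):
        r2.write(key, value, ts)
    for key, (ts, value) in list(r2.data.items()):
        r1.write(key, value, ts)


r1, r2 = Replica(), Replica()
r1.write("cart", ["book"], ts=1)         # an older write lands on replica 1
r2.write("cart", ["book", "pen"], ts=2)  # a newer write lands on replica 2

before = (r1.read("cart"), r2.read("cart"))  # replicas disagree (soft state)
anti_entropy(r1, r2)
after = (r1.read("cart"), r2.read("cart"))   # replicas agree on the newest write

print(before)
print(after)
```

Both replicas answered reads the whole time (basically available), gave different answers for a while (soft state), and converged once they synchronized (eventually consistent).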
Key-value Stores:
- Each record is an object identified by a unique key
- Easy to scale across multiple machines
- Great choice for applications requiring high read/write speeds
- Examples
– Voldemort (LinkedIn)
– DynamoDB (Amazon)
– Cassandra (Apache project, commercially backed by DataStax; strictly a wide-column store, but rows are located by key)
– Redis
- Disadvantages
– Data cannot be accessed by value; it can only be accessed by key
– Cannot use query techniques common to relational databases
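The points above can be sketched with plain dicts standing in for machines. This is a simplification under my own assumptions (hash-based placement, a made-up ShardedKV class), not how any of the listed products is implemented. It shows why lookups by key scale out easily while lookups by value force a full scan.

```python
import hashlib


class ShardedKV:
    """Toy key-value store spread over several in-memory 'machines' (plain dicts)."""

    def __init__(self, num_shards=3):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, key):
        # Stable hash so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, key, value):
        self._shard_for(key)[key] = value

    def get(self, key):
        # Only one shard is consulted, no matter how many shards exist.
        return self._shard_for(key).get(key)

    def find_by_value(self, predicate):
        # No index on values: the only option is to scan every shard.
        return sorted(
            k for shard in self.shards for k, v in shard.items() if predicate(v)
        )


store = ShardedKV()
store.put("user:1", {"name": "Ann", "city": "Oslo"})
store.put("user:2", {"name": "Bo", "city": "Lima"})

by_key = store.get("user:1")
by_value = store.find_by_value(lambda v: v["city"] == "Lima")
print(by_key, by_value)
```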
Document Stores:
- Schema-less, data storage in a format like JSON
- Data values can be accessed directly
- Useful for applications serving documents that require high read rates
- Use case would be something like a blog with high traffic
- Examples
– MongoDB
– CouchDB
- Disadvantages
– Challenging to update fields across many records
– Difficult to enforce transactions because every record could have a different schema
– Database sizes get very large, hard to compress
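A small sketch of the document-store idea using nothing but Python's json module. The blog posts, field names, and the find / update_many helpers are hypothetical; real document databases provide their own query APIs and indexes.

```python
import json

# Three hypothetical blog posts; note that they do not share one schema.
raw = """
[
  {"_id": 1, "title": "Hello", "tags": ["intro"], "views": 120},
  {"_id": 2, "title": "CAP notes", "author": {"name": "Joe"}, "views": 4500},
  {"_id": 3, "title": "Draft"}
]
"""
posts = json.loads(raw)


def find(docs, field, predicate):
    """Return documents whose field exists and satisfies predicate."""
    return [d for d in docs if field in d and predicate(d[field])]


def update_many(docs, field, value):
    """Set field on every document; each record is touched one by one."""
    for d in docs:
        d[field] = value
    return len(docs)


# Unlike a pure key-value store, values inside the document can be queried directly.
popular_titles = [d["title"] for d in find(posts, "views", lambda v: v > 1000)]

# Changing a field across all records means rewriting every document.
touched = update_many(posts, "published", True)

print(popular_titles, touched)
```

The third post has no views field at all, which is legal here and is exactly why cross-record updates and constraints take more care than in a fixed-schema table.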
Misc Notes:
- The relational database model attempts to retain consistency at all costs
- In the BASE model, the consistency guarantee described in the CAP Theorem is relaxed in favor of availability, as long as the use case can tolerate temporarily stale data.
- If you can solve the problem with a relational model, and you don’t have to worry about scale, serving millions of users, etc., it’s often the best choice.
- When that doesn’t work, you’ll need a non-relational datastore
- It’s tough to scale and split a relational database because once the data lives on 2 machines, a single local transaction no longer covers all of it, so some ACID guarantees (atomicity and isolation in particular) become hard to preserve.
- The term “Non-Relational Databases” is more accurate and less confusing than “NoSQL Databases”.
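To illustrate the note above about splitting a relational database across 2 machines, here is a hedged sketch with two independent SQLite databases standing in for the two machines. The naive_transfer function is deliberately simplistic: each side commits its own local transaction, so a crash in between breaks atomicity. Real systems usually reach for a coordination protocol such as two-phase commit, which this sketch does not attempt.

```python
import sqlite3


def make_shard(rows):
    """Create one independent in-memory database (a stand-in for one machine)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    db.executemany("INSERT INTO accounts VALUES (?, ?)", rows)
    db.commit()
    return db


def balance(db, name):
    row = db.execute("SELECT balance FROM accounts WHERE name = ?", (name,)).fetchone()
    return row[0]


def naive_transfer(src_db, src, dst_db, dst, amount, crash_between=False):
    """Debit on one shard, credit on the other, each in its own local transaction."""
    with src_db:  # local transaction on the first machine, committed on exit
        src_db.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
    if crash_between:
        raise RuntimeError("simulated crash after the debit committed")
    with dst_db:  # separate local transaction on the second machine
        dst_db.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )


shard1 = make_shard([("alice", 100)])
shard2 = make_shard([("bob", 50)])

try:
    naive_transfer(shard1, "alice", shard2, "bob", 30, crash_between=True)
except RuntimeError:
    pass  # the debit is already durable on shard1; the credit never happened

total = balance(shard1, "alice") + balance(shard2, "bob")
print(balance(shard1, "alice"), balance(shard2, "bob"), total)
```

The combined balance started at 150 and ends at 120: each shard is still ACID on its own, but the transfer as a whole was not all or nothing.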