Is NoSQL the Future of Data Storage?

By Gary Short
Developer Express
Introduction
• Gary Short
• Technical Evangelist for Developer Express
• C# MVP
• [email protected]
• www.garyshort.org
• @garyshort
Where Does NoSQL Originate?
• 1998
– Open source relational database
• Didn’t expose an SQL interface
• Created by Carlo Strozzi
– He said the newer NoSQL movement
• “departs from the relational model altogether...”
• “...should have been called ‘NoREL’”.
More Recently...
• Eric Evans reintroduced the term in 2009
– Johan Oskarsson (last.fm)
• Event to discuss open source distributed databases
• The term now labels a growing number of datastores
– Open source
– Non-relational
– Distributed
– (often) don’t guarantee ACID.
Atlanta 2009
• No:sql(east) conference
• Billed as “conference of no-rel datastores”
• Worst tag line ever
– SELECT fun, profit FROM real_world WHERE rel=false.
Not Anti-RDBMS
Key Attributes of NoSQL Databases
• Don’t require fixed table schemas
• Non-relational
• (Usually) avoid join operations
• Scale horizontally
– Adding more nodes to a storage system.
What Does the Taxonomy Look Like?
Document Store
• Apache Jackrabbit
• CouchDB
• MongoDB
• SimpleDB
• XML Databases
– MarkLogic Server
– eXist.
Document What?
• Okay, think of a web page...
– Relational model requires a column per tag
– Lots of empty columns
– Wasted space
• Document model just stores the pages as is
– Saves on space
– Very flexible.
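A minimal sketch of the contrast above, assuming two made-up pages that each use a different set of tags. The page contents and tag names are illustrative only; the point is that a one-column-per-tag layout has to pad every missing tag, while the document layout stores only what each page actually has.

```python
# Hypothetical pages: each one uses a different set of tags.
pages = [
    {"title": "Home", "h1": "Welcome", "img": "logo.png"},
    {"title": "Contact", "form": "enquiry"},
]

# Relational view: one column per tag seen anywhere, padded with None.
columns = sorted({tag for page in pages for tag in page})
rows = [tuple(page.get(col) for col in columns) for page in pages]
empty_cells = sum(cell is None for row in rows for cell in row)

# Document view: each page is stored as is, so there is nothing to pad.
stored_fields = sum(len(page) for page in pages)

print(columns, empty_cells, stored_fields)
```

Here the relational layout needs 8 cells for 5 real values; the gap grows with every new tag that only some pages use.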
Graph Storage
• AllegroGraph
• Core Data
• Neo4j
• DEX
• FlockDB.
Which Means?
• Graph consists of
– Nodes (‘stations’ of the graph)
– Edges (lines between them)
• FlockDB
– Created by the Twitter folks
– Nodes = Users
– Edges = Nature of relationship between nodes.
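As a rough illustration of nodes and edges, here is a sketch that treats users as nodes and a "follows" relationship as the edge. The user names and the tuple layout are assumptions made for the example, not FlockDB's actual data format.

```python
from collections import defaultdict

# Edges as (source, relationship, destination); user names are made up.
edges = [
    ("alice", "follows", "bob"),
    ("carol", "follows", "bob"),
    ("bob", "follows", "alice"),
]

following = defaultdict(set)  # node -> nodes it follows
followers = defaultdict(set)  # node -> nodes that follow it
for source, relationship, destination in edges:
    if relationship == "follows":
        following[source].add(destination)
        followers[destination].add(source)

print(sorted(followers["bob"]))
```

Keeping both directions indexed means "who do I follow" and "who follows me" are each a single lookup.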
Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
– Weak Definition
• “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
– Strong Definition
• “for a given update and a given replica eventually either the update reaches the replica or the replica retires”
• Ordered
– Distributed Hash Table allows lexicographical processing.
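The weak definition can be sketched with two in-memory replicas. This is a toy model under stated assumptions: every write carries a version number, the higher version wins, and the `sync` function stands in for whatever background propagation a real store uses. The `Replica` class and the key names are hypothetical.

```python
class Replica:
    """One copy of a key/value store; each key holds (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, version, value):
        # Keep the write only if it is newer than what this replica has.
        if key not in self.data or version > self.data[key][0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def sync(a, b):
    """Propagation pass: each replica offers the other everything it holds."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, version, value)


r1, r2 = Replica(), Replica()
r1.write("cart:42", 1, ["book"])
r2.write("cart:42", 2, ["book", "pen"])
stale = r1.read("cart:42")  # r1 has not yet seen version 2
sync(r1, r2)                # updates have stopped; now they propagate
```

Before `sync` a reader of `r1` sees the older value; afterwards both replicas agree, which is all the weak definition promises.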
Object Databases
• Db4o
• GemStone/S
• InterSystems Caché
• Objectivity/DB
• ZODB.
Okay, Got It. Now Let’s Compare Some Real World Scenarios
You Need Constant Consistency
• You’re dealing with financial transactions
• You’re dealing with medical records
• You’re dealing with bonded goods
• Best you use an RDBMS.
You Need Horizontal Scalability
• You’re working across defined timezones
• You’re aggregating large quantities of data
• You’re maintaining a chat server (Facebook chat)
• Use NoSQL.
Up in the Clouds Baby
• If you are using Azure or AWS
– Compare costs of Azure Storage or SimpleDB to SQL Azure or Elastic RDBMS
• Could be cheaper for your scenario.
It’s all About the iPhone!
Frequently Written Rarely Read
• Think web counters and the like
• Every time a user comes to a page = ctr++
• But it’s only read when the report is run
• Use NoSQL (key-value storage).
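The write-heavy counter pattern can be sketched as follows. An in-memory `Counter` stands in for the key/value store here, and `record_visit` and `run_report` are hypothetical names chosen for the example.

```python
from collections import Counter

hits = Counter()  # stands in for a key/value store: page -> hit count


def record_visit(page):
    hits[page] += 1  # the frequent, cheap write (ctr++)


def run_report():
    return hits.most_common()  # the rare read, only when the report runs


for page in ["/home", "/pricing", "/home", "/home"]:
    record_visit(page)

report = run_report()
```

Each visit touches exactly one key and needs no join or schema, which is why a key/value store suits this workload.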
I Got Big Data!
• Think weather stats
• Satellite images
• Maps
• Use NoSQL (something like Hadoop).
Binary Baby!
• If you are YouTube
• Flickr
• Twitpic
• Spotify
• NoSQL (Amazon S3).
Here Today Gone Tomorrow
• Transient data like...
– Web sessions
– Locks
– Short term stats
• Shopping cart contents
• Use NoSQL (Memcached).
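To show what "transient" means in practice, here is a small in-memory sketch of a store whose entries expire. It is not the Memcached API; `TransientStore`, its methods, and the fake clock used to drive it are all assumptions made for the example.

```python
import time


class TransientStore:
    """In-memory key/value store whose entries expire after a time to live."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value, ttl_seconds):
        self._items[key] = (self._clock() + ttl_seconds, value)

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: drop it lazily on read
            return None
        return value


# A fake clock makes the expiry easy to see without waiting.
now = [0.0]
store = TransientStore(clock=lambda: now[0])
store.set("session:abc", {"user": "gary"}, ttl_seconds=30)
fresh = store.get("session:abc")
now[0] = 31.0
expired = store.get("session:abc")
```

Losing an expired session or a stale lock is acceptable by design, so the data never needs durable, transactional storage.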
Data Replication
• Same data in two or more locations
– Music Library
• Web browser
• iPhone app
• NoSQL (CouchDB).
Hit me Baby One More Time!
• High Availability
– High number of important transactions
• Online gambling
• Pay-per-view
– Ahem!
• Online Auction
• NoSQL (Cassandra – automatic clustering).
Give me a Real World Example
• Twitter
– The challenges
• Needs to store many graphs
– Who you are following
– Who’s following you
– Who you receive phone notifications from etc
• To deliver a tweet requires rapid paging of followers
• Heavy write load as followers are added and removed
• Set arithmetic for @mentions (intersection of users).
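One way to read the set arithmetic point is that a tweet opening with an @mention goes to users who follow both the author and the mentioned user, and that the result has to be served a page at a time. The sketch below assumes that reading; the user ids, the `page` helper, and the cursor scheme are hypothetical.

```python
from bisect import bisect_right

# Made-up follower ids for the tweet's author and for the mentioned user.
followers_of_author = {1, 2, 3, 5, 8}
followers_of_mentioned = {2, 3, 4, 8, 9}

# Intersection of the two follower sets, sorted so it can be paged.
recipients = sorted(followers_of_author & followers_of_mentioned)


def page(ids, cursor=0, size=2):
    """Return up to `size` ids greater than `cursor`, plus the next cursor."""
    start = bisect_right(ids, cursor)
    chunk = ids[start:start + size]
    next_cursor = chunk[-1] if start + size < len(ids) else None
    return chunk, next_cursor


first, cursor = page(recipients)
second, end = page(recipients, cursor)
```

Using the last id seen as the cursor means a page request stays valid even if followers are added or removed between requests.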
What Did They Try?
• Relational Databases
• Key-Value storage of denormalized lists
• Did it work?
– Nope!
• Either good at
– Handling the write load
– Or paging large amounts of data
– But not both.
What Did They Need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
– Arrive out of order
– Or be processed more than once
• Failures should result in redundant work
– Not lost work!
The Result was FlockDB
• Stores graph data
• Not optimised for graph traversal operations
• Optimised for large adjacency lists
– List of all edges in a graph
• Key is the edge, value is the set of node end points
• Optimised for fast read and write
• Optimised for page-able set arithmetic.
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
– All queries can be answered by a single partition
• Write operations are idempotent
– Can be applied multiple times without changing the result
• And commutative
– Changing the order in which writes are applied doesn’t change the result.
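A common way to get writes that are both idempotent and commutative is to stamp each write with a time and let the newest stamp win. The sketch below assumes that approach and assumes timestamps are unique per edge; `EdgeStore` and its fields are hypothetical and not FlockDB's real schema.

```python
class EdgeStore:
    """Edges keyed by (source, destination); the newest timestamp wins."""

    def __init__(self):
        self.edges = {}  # (source, destination) -> (timestamp, state)

    def apply(self, source, destination, state, timestamp):
        key = (source, destination)
        current = self.edges.get(key)
        # Anything not strictly newer is ignored, so replays and
        # late arrivals are harmless no-ops.
        if current is None or timestamp > current[0]:
            self.edges[key] = (timestamp, state)


writes = [
    ("alice", "bob", "added", 1),
    ("alice", "bob", "removed", 2),
    ("alice", "bob", "added", 3),
]

in_order, shuffled = EdgeStore(), EdgeStore()
for write in writes:
    in_order.apply(*write)
for write in reversed(writes + writes):  # out of order and duplicated
    shuffled.apply(*write)
```

Both stores end in the same state, which is what lets a failed or retried job simply redo work instead of losing it.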
Commutative Writes Help Bring Up Partitions
• Partition can receive write traffic immediately
• Receive dump of data in the background
• Live for read as soon as the dump is complete.
Performance?
• Currently store 13 billion edges
• 20K writes / second
• 100K reads / second.
Lessons Learned?
• Use aggressive timeouts
– Cut a client loose after timeout expired
– Let it try again on another app server
• Use same code path for error and normal ops
– Error requests are periodically retried
• Instrument.
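The aggressive timeout lesson can be sketched from the caller's side: give each app server a short deadline and move on to the next one if it is missed. The servers here are plain Python functions standing in for real app servers, and `call_with_failover` and the timeout value are assumptions made for the example.

```python
import concurrent.futures as cf
import time


def call_with_failover(request, servers, timeout=0.1):
    """Try each server in turn, abandoning any that exceeds the timeout."""
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(servers)))
    try:
        for server in servers:
            future = pool.submit(server, request)
            try:
                return future.result(timeout=timeout)
            except cf.TimeoutError:
                continue  # cut this attempt loose and try the next server
        raise TimeoutError("all servers timed out")
    finally:
        pool.shutdown(wait=False)  # do not wait for abandoned calls


def slow_server(request):
    time.sleep(0.5)  # simulates an overloaded app server
    return "slow:" + request


def fast_server(request):
    return "fast:" + request


answer = call_with_failover("ping", [slow_server, fast_server])
```

Because the writes are idempotent, it does no harm if the abandoned call on the slow server eventually completes as well.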
Punchline?
• Under all the bells and whistles...
– It’s MySQL.
So is this the Future?
• Yes!
• And No!
Questions?
• Contact me
– [email protected]
– @garyshort
Coming up…
• P/X001: Understanding and Preventing SQL Injection Attacks (Kevin Kline)
• P/L001: SSIS Fieldnotes (Darren Green)
• P/L002: The (Geospatial) Shapes of Things to Come (Simon Munro)
• P/L005: End to End Master Data Management with SQL Server Master Data Services (Jeremy Kashel)
• P/T007: Understanding Microsoft Certification in SQL Server (Chris Testa-O'Neill)
#SQLBITS
