Is NoSQL the Future of Data Storage?

By Gary Short
Developer Express
Gary Short
Technical Evangelist for Developer Express
[email protected]
Where Does NoSQL Originate?
• 1998
– Open-source relational database
• Didn’t expose an SQL interface
• Created by Carlo Strozzi
– Strozzi said the current NoSQL movement
• “departs from the relational model altogether...”
• “...should have been called ‘NoREL’”.
More Recently...
• Eric Evans reintroduced the term in 2009
– Johan Oskarsson (
• Event to discuss open-source distributed databases
• The label now covers a growing number of datastores that are
– Open source
– Non-relational
– Distributed
– (often) don’t guarantee ACID.
Atlanta 2009
• No:sql(east) conference
• Billed as “conference of no-rel datastores”
• Worst tag line ever
– SELECT fun, profit FROM real_world WHERE rel=false.
Key Attributes of NoSQL Databases
Don’t require fixed table schemas
(Usually) avoid join operations
Scale horizontally
– Adding more nodes to a storage system.
What Does the Taxonomy Look Like?
Document Store
Apache Jackrabbit
XML Databases
– MarkLogic Server
– eXist.
Document What?
• Okay, think of a web page...
– Relational model requires a column per tag
– Lots of empty columns
– Wasted space
• Document model just stores the pages as is
– Saves on space
– Very flexible.
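To make the wasted-space point concrete, here is a minimal Python sketch. The two sample pages and their tag names are invented for illustration and are not from the talk; plain dictionaries stand in for a real document store.

```python
# Hypothetical pages, each using a different subset of tags.
pages = [
    {"title": "Home", "h1": "Welcome", "p": "Hello there"},
    {"title": "Gallery", "img": "cat.png"},
]

# Relational style: one column per tag seen anywhere, so every row carries
# a cell for every tag, and the unused ones are empty (None).
columns = sorted({tag for page in pages for tag in page})
rows = [[page.get(col) for col in columns] for page in pages]
empty_cells = sum(cell is None for row in rows for cell in row)

# Document style: each page is stored as is, with only the tags it uses.
stored_fields = sum(len(page) for page in pages)

print(columns)
print(empty_cells, stored_fields)
```

The gap between the two counts grows with every new tag that only a few pages use.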
Graph Storage
Core Data
Which Means?
• Graph consists of
– Nodes (‘stations’ of the graph)
– Edges (lines between them)
• FlockDB
– Created by the Twitter folks
– Nodes = Users
– Edges = Nature of relationship between nodes.
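A rough Python sketch of the nodes-and-edges idea, assuming a single "follows" relationship. The user names are made up, and this is an in-memory illustration only, not how FlockDB stores anything.

```python
from collections import defaultdict

# Hypothetical users (nodes) and "follows" relationships (edges).
# Each edge is a (source, destination) pair: source follows destination.
edges = {("alice", "bob"), ("alice", "carol"), ("bob", "carol")}

following = defaultdict(set)   # node -> nodes it follows
followers = defaultdict(set)   # node -> nodes that follow it
for source, destination in edges:
    following[source].add(destination)
    followers[destination].add(source)

print(sorted(following["alice"]))
print(sorted(followers["carol"]))
```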
Key/Value Stores
• On disk
• Cache in RAM
• Eventually Consistent
– Weak Definition
• “If no updates occur for a period, eventually all updates will propagate through the system and all replicas will be consistent”
– Strong Definition
• “for a given update and a given replica eventually either the update reaches the replica or the replica retires”
• Ordered
– Distributed Hash Table allows lexicographical processing.
Object Databases
InterSystems Caché
Okay, Got It. Now Let’s Compare Some Real-World Scenarios
You Need Constant Consistency
You’re dealing with financial transactions
You’re dealing with medical records
You’re dealing with bonded goods
Best you use an RDBMS.
You Need Horizontal Scalability
You’re working across defined timezones
You’re aggregating large quantities of data
You’re maintaining a chat server (Facebook chat)
Use NoSQL.
Up in the Clouds Baby
• If you are using Azure or AWS
– Compare costs of Azure Storage or SimpleDB to SQL Azure or Elastic RDBMS
• Could be cheaper for your scenario.
It’s all About the iPhone!
Frequently Written Rarely Read
Think web counters and the like
Every time a user comes to a page = ctr++
But it’s only read when the report is run
Use NoSQL (key-value storage).
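The counter pattern can be sketched in a few lines of Python. Here `collections.Counter` is only an in-memory stand-in for a key-value store, and the function names `record_visit` and `run_report` are hypothetical.

```python
from collections import Counter

# In-memory stand-in for a key-value store: key = page, value = hit count.
hits = Counter()

def record_visit(page):
    """Write path: runs on every page view, a single-key increment."""
    hits[page] += 1

def run_report():
    """Read path: runs rarely and reads every counter at once."""
    return dict(hits)

for page in ["/home", "/about", "/home"]:
    record_visit(page)

print(run_report())
```

The write touches exactly one key and needs no join or schema, which is why a key-value store suits it.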
I Got Big Data!
Think weather stats
Satellite Images
Use NoSQL (something like Hadoop).
Binary Baby!
If you are YouTube
NoSQL (Amazon S3).
Here Today Gone Tomorrow
• Transient data like...
– Web Sessions
– Locks
– Short Term Stats
• Shopping cart contents
• Use NoSQL (Memcached).
Data Replication
• Same data in two or more locations
– Music Library
• Web browser
• iPhone app
• NoSQL (CouchDB).
Hit me Baby One More Time!
• High Availability
– High number of important transactions
• Online gambling
• Pay-per-view
– Ahem!
• Online Auction
• NoSQL (Cassandra – automatic clustering).
Give me a Real World Example
• Twitter
– The challenges
• Needs to store many graphs
– Who you are following
– Who’s following you
– Who you receive phone notifications from etc
• To deliver a tweet requires rapid paging of followers
• Heavy write load as followers are added and removed
• Set arithmetic for @mentions (intersection of users).
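As an illustration of the set arithmetic and paging involved, here is a small Python sketch. It assumes a reply is delivered to users who follow both the author and the mentioned user; the user ids and the `page` helper are hypothetical, and Python sets stand in for the stored follower lists.

```python
def page(ids, cursor, count):
    """Return up to `count` ids greater than `cursor`, plus the next cursor."""
    window = [i for i in sorted(ids) if i > cursor][:count]
    next_cursor = window[-1] if window else None
    return window, next_cursor

# Hypothetical follower ids for the tweet's author and the mentioned user.
followers_of_author = {3, 5, 8, 13, 21}
followers_of_mentioned = {5, 8, 9, 21, 34}

# Intersection: users who follow both accounts.
recipients = followers_of_author & followers_of_mentioned

# Walk the result two ids at a time instead of loading it all at once.
first, cursor = page(recipients, cursor=0, count=2)
second, _ = page(recipients, cursor=cursor, count=2)
print(first, second)
```

With follower lists in the millions, both the intersection and the paging have to be cheap, which is the challenge these bullets describe.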
What Did They Try?
• Relational Databases
• Key-Value storage of denormalized lists
• Did it work?
– Nope!
• Either good at
– Handling the write load
– Or paging large amounts of data
– But not both.
What Did They Need?
• Simplest possible thing that would work
• Allow for horizontal partitioning
• Allow write operations to
– Arrive out of order
– Or be processed more than once
• Failures should result in redundant work
– Not lost work!
The Result was FlockDB
• Stores graph data
• Not optimised for graph traversal operations
• Optimised for large adjacency lists
– List of all edges in a graph
• Key is the edge; value is the set of node end points
• Optimised for fast read and write
• Optimised for page-able set arithmetic.
How Does it Work?
• Stores graphs as sets of edges between nodes
• Data is partitioned by node
– All queries can be answered by a single partition
• Write operations are idempotent
– Can be applied multiple times without changing the result
• And commutative
– Changing the order of operands doesn’t change the result.
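One way to get writes that are both idempotent and commutative is to tag each write with a timestamp and keep only the newest one per edge. The Python sketch below assumes that approach and assumes timestamps are unique per edge; it is an illustration of the two properties, not FlockDB's actual implementation, and all names in it are hypothetical.

```python
import itertools

def apply_write(store, write):
    """Keep the write only if it is newer than what is already stored."""
    edge, timestamp, state = write
    current = store.get(edge)
    if current is None or timestamp > current[0]:
        store[edge] = (timestamp, state)
    return store

def replay(sequence):
    """Apply a sequence of writes to an empty store and return the store."""
    store = {}
    for write in sequence:
        apply_write(store, write)
    return store

# Hypothetical writes: (edge, timestamp, state).
writes = [
    (("alice", "bob"), 1, "added"),
    (("alice", "bob"), 2, "removed"),
    (("alice", "carol"), 3, "added"),
]

expected = replay(writes)
# Commutative: every arrival order ends in the same store.
results = [replay(order) for order in itertools.permutations(writes)]
# Idempotent: processing each write twice changes nothing.
duplicated = replay(writes + writes)
print(all(r == expected for r in results), duplicated == expected)
```

Because a retried or reordered write cannot corrupt the result, a failure costs only redundant work, never lost work.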
Commutative Writes Help Bring Up New Partitions
• Partition can receive write traffic immediately
• Receive dump of data in the background
• Live for read as soon as the dump is complete.
• Currently store 13 billion edges
• 20K writes / second
• 100K reads / second.
Lessons Learned?
• Use aggressive timeouts
– Cut a client loose after timeout expired
– Let it try again on another app server
• Use same code path for error and normal ops
– Error requests are periodically retried
• Instrument.
• Under all the bells and whistles...
– It’s MySQL.
So is this the Future?
• Yes!
• And No!
• Contact me
– [email protected]
– @garyshort
