ScaleDB: Persistence for Stream Data

ScaleDB: Big Fast Data w/MariaDB
[Quadrant chart. Axes: Data Velocity (Driven by Performance) and Data Volume (Driven by Cost – DRAM vs. Disk).]
• In-Memory: SAP HANA, BigQuery
• High-Velocity / Disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop

Demo
• Payment Table
  * P.K.
  * FK: Account, Time
  * Fields: Store, Amount, Coupon
• Inserts
• Lookup by Primary Key
• Lookup by Account (Foreign Key)
• Complex queries - BI & analytics

Demo

© Copyright 2014 ScaleDB. The information contained herein is subject to change without notice.

ScaleDB’s Solution
• 1M Inserts/Second (indexed) with Simultaneous Queries
• Commodity “Cloud” Instance
  • Total: 6 Nodes, 48 cores, 0.2TB main memory
  • ~1M inserts/second, cost is less than $15,000
• SAP HANA (In-memory DBMS)
  • Cluster total: 100 Nodes, 4,000 cores, 100TB of main memory
  • “1.5M inserts/second” (Vishal Sikka, SAP TechED)
  • In Memory: DRAM cost alone is ~$2M
More Than 2 Orders of Magnitude Cost Advantage

Data Volumes are Exploding
• Tweets per Day
• iPhone Downloads
• AWS S3 & Dropbox Data Objects
…Driven by new data sources and data types
• Devices
• Social
• Log Files
• Analytics
• Business

Faster Insights = More Value (Complements Kinesis, Storm, etc.)
[Chart. Axes: Response Latency (0 ms; milliseconds to minutes; later, possibly much later) and Value of the Data to Users/Advertisers (Lower to Higher). Labels: Fast Data, Big Data, Twitter Storm, MillWheel.]

Big Data / Fast Data
Fast Data
• Real-Time Data
• Ad Hoc (SQL) Processing
• ScaleDB & Stream Processors
• BigQuery
Big Data
• Pools of Data at Rest
• Batch (programmatic) Processing
• Hadoop

Hadoop’s Batch Processing
“…MapReduce technologies are good at handling large volumes of data.
But they are fundamentally batch-based, and struggle with enabling real-time decisions on a neverending – and never fully complete – stream of data.”
Terry Hanold, Vice President of New Business Initiatives, Amazon AWS

Fast Data: The Car Metaphor
• Limited View / Real-Time Data; No Historical View
• Historical View; “Batch Lag”
• Real-Time Data; Historical View; SQL Support

DRAM Too Expensive for Stream Data
Media Costs Based upon Data Volume (DRAM vs. Disk)

Data Volume    DRAM           Disk
1TB            $20,000        $43
10TB           $200,000       $430
100TB          $2,000,000     $4,300
1 Petabyte     $20,000,000    $43,000

This is why Amazon uses disk-based S3 (non-DBMS) for Kinesis
• 1M inserts/second (100 byte rows), 24 hours = >8.5 TB/Day
• Disk Media Cost = ~$370
• DRAM Media Cost = ~$172,800 (>450X more)

But Data Volumes Increase 78% CAGR
According to IDC [1] and Gartner [2], data volumes have been measured to increase ten-fold every five years.
1. Gantz, John F. The Diverse and Exploding Digital Universe: An Updated Forecast of Worldwide Information Growth Through 2011. Tech. An IDC White Paper
2. Paquet, Raymond. “Technology Trends You Can’t Afford to Ignore.” Lecture. Gartner Webinar. Gartner.com. Gartner Inc., Jan. 2010.

In-Memory & Big Data
[Two charts. Axes: Increase Multiplier (Volume/Affordability) vs. Years, one over 5 years and one over 10 years. Series: Data Volumes (78%), DRAM Prices (-30%), DRAM Affordability (30%).]
Data Volume Growth Dramatically Outpaces DRAM Affordability

ScaleDB: Big Fast Data w/MariaDB
[Quadrant chart. Axes: Data Velocity (Driven by Performance), marked at 1,000,000 Inserts per second, and Data Volume (Driven by Cost – DRAM vs. Disk).]
• In-Memory: SAP HANA, BigQuery
• High-Velocity / Disk: ScaleDB
• Disk: MariaDB, Oracle, SQL Server, etc.
• Disk: Hadoop
BigQuery Cost: $86,400/day
ScaleDB Cost*: $46/day
* AWS: $28 for 8.4TB storage, $18 for 6 instances of heavy usage EBS optimized

How it Works
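Before the architecture slides, the media-cost figures quoted on the "DRAM Too Expensive for Stream Data" slide can be re-derived with a few lines of arithmetic. The sketch below is only a check of the slide's numbers, not anything from ScaleDB: the function names are hypothetical, the prices per terabyte are the slide's own, and decimal terabytes (10**12 bytes) with no index or replication overhead are assumptions.

```python
# Inputs quoted on the slides; decimal terabytes (10**12 bytes) are an assumption.
INSERTS_PER_SECOND = 1_000_000
ROW_BYTES = 100
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12
DRAM_USD_PER_TB = 20_000
DISK_USD_PER_TB = 43


def daily_volume_tb(inserts_per_second=INSERTS_PER_SECOND, row_bytes=ROW_BYTES):
    """Raw data written in 24 hours, in terabytes (no indexes or overhead)."""
    return inserts_per_second * row_bytes * SECONDS_PER_DAY / BYTES_PER_TB


def media_cost_usd(volume_tb, usd_per_tb):
    """Media cost of holding volume_tb at a flat price per terabyte."""
    return volume_tb * usd_per_tb


def growth_multiplier(annual_rate, periods):
    """Compound growth factor after `periods` years at `annual_rate`."""
    return (1 + annual_rate) ** periods


if __name__ == "__main__":
    tb = daily_volume_tb()
    print(f"Daily volume: {tb:.2f} TB")
    print(f"Disk media:   ${media_cost_usd(tb, DISK_USD_PER_TB):,.2f}")
    print(f"DRAM media:   ${media_cost_usd(tb, DRAM_USD_PER_TB):,.2f}")
    print(f"DRAM / disk:  {DRAM_USD_PER_TB / DISK_USD_PER_TB:.0f}x")
    for years in (4, 5):
        print(f"78% CAGR over {years} periods: {growth_multiplier(0.78, years):.1f}x")
```

Under these assumptions the daily volume is 8.64 TB, the disk media cost is about $371.52, the DRAM media cost is $172,800, and the ratio is about 465x, which agrees with the slide's rounded figures. At 78% per year the compound multiplier is roughly 10x after four periods and roughly 17.9x after five.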
Scaling the Database
[Diagram. DBMS Instance: MariaDB with storage engines MyISAM, InnoDB, ScaleDB. Storage Instance: ScaleDB, Storage Data.]

Scaling the Database Tier
[Diagram. Four DBMS Instances and a Cluster Manager over two Storage Instances.]

Scaling the Storage Tier
[Diagram. Four DBMS Instances and a Cluster Manager over five Storage Instances.]

High-Availability
[Diagram. Four DBMS Instances and a Cluster Manager over five Storage Instances with Mirrored Volumes.]

NoSQL v. MySQL

Function                                                 NoSQL                                  ScaleDB
Transactions                                             No                                     Yes
Joins                                                    No                                     Yes
Data Consistency                                         No (Eventual)                          Yes
SQL Support                                              No                                     Yes
ACID Compliant                                           No                                     Yes
Mature Ecosystem (e.g. MySQL tools, apps, developers)    No                                     Yes
Optimal for Analytics / BI / Reporting                   No                                     Yes
Disk-Based Insert Performance                            25,000-40,000/second                   1,000,000/second
Ideal Use Case                                           Storing/Accessing Individual Objects   Processing Large Quantities of Data

Push-Down: Distributed Parallel Processing
• Push Processing to the Data
• Result: High-Performance Parallel Processing Similar to Map/Reduce
[Diagram. MariaDB sends the query to three ScaleDB Storage nodes; each node returns its own Query Response.]

Customer Success Story

Customer Success Story: Statricks
Target: 300M-450M Listings per Day
From: eBay, Craigslist ….
Processing:
• Price trends
• Listing Longevity
• Spam Detection
• Ad Metrics
• Price Trend Time Series
• Statistical Analysis

Thank You
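As a supplementary illustration of the Push-Down slide, the general idea of pushing processing to the data can be sketched as a scatter-gather aggregation: each storage node computes a partial result over its own rows, and the database tier only merges the partials. This is a generic sketch, not ScaleDB's implementation or API. The node contents, account names, and function names are all hypothetical, and threads stand in for separate storage instances.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each inner list stands in for the (account, amount)
# payment rows held by one storage node.
STORAGE_NODES = [
    [("acct-1", 10.0), ("acct-2", 5.0)],
    [("acct-1", 2.5), ("acct-3", 7.0)],
    [("acct-2", 1.5), ("acct-3", 3.0)],
]


def partial_totals(rows):
    """Work done at one storage node: total amount per account, local rows only."""
    totals = Counter()
    for account, amount in rows:
        totals[account] += amount
    return totals


def pushed_down_totals(nodes):
    """Scatter the aggregation to every node in parallel, then merge the partials."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partials = list(pool.map(partial_totals, nodes))
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)


if __name__ == "__main__":
    print(pushed_down_totals(STORAGE_NODES))
```

Only the small per-node summaries cross the network in this arrangement, which is the property the slide compares to Map/Reduce: the per-node function plays the map role and the merge plays the reduce role.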