Middleware-based Database Replication

Performance Benchmarking in Systems
Emmanuel Cecchet
University of Massachusetts Amherst
Laboratory for Advanced Systems Software & UMass Digital Data Forensics Research
WHY ARE WE BENCHMARKING?
Because my advisor told me to do it?
Because others are doing it?
Because I can’t get my paper published without it?

Why am I building a new system?
  What am I trying to improve?
  Does it need to be improved?
  How am I going to measure it?
  What do I expect to see?
  Am I really measuring the right thing?
CFSE – [email protected]

PERFORMANCE
Faster is better?
Bigger is better?
Scalable is better?
What about manageability?

Which is the right metric?
  Hardware counters
  Throughput
  Latency
  Watts
  $…
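As a minimal sketch of how two of the metrics above, throughput and latency, can come out of the same raw measurements, the following assumes each request is logged as a start and end timestamp. The `Sample` record and the `summarize` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from statistics import mean, quantiles


@dataclass
class Sample:
    start: float  # request sent, in seconds
    end: float    # response fully received, in seconds


def summarize(samples):
    """Return throughput (req/s) plus mean, median and 95th percentile latency (s)."""
    latencies = sorted(s.end - s.start for s in samples)
    # Wall-clock span of the whole run, from first send to last response.
    duration = max(s.end for s in samples) - min(s.start for s in samples)
    cuts = quantiles(latencies, n=100, method="inclusive")  # 99 cut points
    return {
        "throughput": len(samples) / duration,
        "mean": mean(latencies),
        "p50": cuts[49],
        "p95": cuts[94],
    }
```

The same run can look good on one metric and poor on another, which is one reason to report several of them rather than a single number.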
EXPERIMENTAL METHODOLOGY
Limiting performance bias
  "Producing Wrong Data Without Doing Anything Obviously Wrong!" (T. Mytkowicz, A. Diwan, M. Hauswirth, P. Sweeney, ASPLOS 2009)
Performance is sensitive to the experimental setup
  Changing a UNIX environment variable can change program performance by anywhere from 33% to 300%
  Setup randomization
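One possible way to randomize the setup is sketched below. It assumes the factor being randomized is the size of the UNIX environment, and it pads the environment with a dummy variable of random length before each run. The variable name `BENCH_PADDING` and both function names are hypothetical.

```python
import os
import random
import subprocess
import sys
import time


def randomized_env(rng, max_padding=4096):
    """Copy the current environment and add a dummy variable of random length."""
    env = dict(os.environ)
    env["BENCH_PADDING"] = "x" * rng.randint(0, max_padding)  # hypothetical name
    return env


def run_trials(cmd, trials=5, seed=0):
    """Run cmd once per trial, each time under a differently sized environment."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        env = randomized_env(rng)
        begin = time.perf_counter()
        subprocess.run(cmd, env=env, check=True)
        elapsed = time.perf_counter() - begin
        results.append((len(env["BENCH_PADDING"]), elapsed))
    return results
```

For example, `run_trials([sys.executable, "-c", "pass"])` returns one `(padding size, seconds)` pair per trial. Reporting the spread across trials, rather than a single run, is what makes the bias visible.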
EXPERIMENTAL ENVIRONMENT
Software used
  OS
  Libraries
  Middleware
  JVMs
  Application version
  Compiler / build options
  Logging/debug overhead
  Monitoring software

Hardware used
  CPU / mem / IO
  Network topology
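Part of this environment can be recorded automatically alongside the results. The sketch below uses only the Python standard library. The function name and the `extra` parameter are hypothetical, and items the runtime cannot see must be supplied by hand.

```python
import json
import os
import platform
import sys


def capture_environment(extra=None):
    """Collect basic facts about the machine and runtime running an experiment."""
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "python_version": platform.python_version(),
        "executable": sys.executable,
    }
    # Anything the standard library cannot see (JVM version, build options,
    # logging level, network topology) has to be supplied by the experimenter.
    info.update(extra or {})
    return info


if __name__ == "__main__":
    print(json.dumps(capture_environment({"logging": "disabled"}), indent=2))
```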
SCI NETWORK PERFORMANCE AND PROCESSOR STEPPING
[Figure: bandwidth in MB/s (0 to 80) versus packet size in bytes (8 to 2031616), comparing 2 nodes with 64-bit stepping 1 against 2 nodes with 64-bit stepping 2]
OUTLINE
How Relevant are Standard Systems Benchmarks?
BenchLab: Realistic Web Application Benchmarking
An Agenda for Systems Benchmarking Research


SPEC BENCHMARKS
http://www.spec.org
Benchmark groups
  Open Systems Group
    CPU (int & fp)
    JAVA (client and server)
    MAIL (mail server)
    SFS (file server)
    WEB
  High Performance Group
    OMP (OpenMP)
    HPC
    MPI
  Graphics Performance Group
    APC (Graphics applications)
    OPC (OpenGL)
TYPICAL E-COMMERCE PLATFORM
Virtualization
Elasticity/Pay as you go in the Cloud
[Diagram: Internet -> Frontend/Load balancer -> App. Servers -> Databases]
TYPICAL E-COMMERCE BENCHMARK
Setup for performance benchmarking
  Browser emulator
  Static load distribution
  LAN environment
[Diagram labels: Internet, Emulated clients, App. Servers, Database]

OPEN VS CLOSED
"Open Versus Closed: A Cautionary Tale" (B. Schroeder, A. Wierman, M. Harchol-Balter, NSDI’06)
  The response time difference between open and closed systems can be large
  Scheduling is more beneficial in open systems
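The difference between the two models can be explored with a toy simulation. The sketch below is a simplification and not the setup used in the cited paper: it assumes a single FIFO server with a fixed service time, Poisson arrivals for the open system, and exponentially distributed think times for the closed system. All function and parameter names are hypothetical.

```python
import heapq
import random


def open_system(rate, service, n_requests, seed=0):
    """Open loop: arrivals follow a Poisson process, independent of completions."""
    rng = random.Random(seed)
    clock = 0.0        # arrival time of the current request
    server_free = 0.0  # time at which the single FIFO server becomes idle
    responses = []
    for _ in range(n_requests):
        clock += rng.expovariate(rate)
        start = max(clock, server_free)
        server_free = start + service
        responses.append(server_free - clock)
    return responses


def closed_system(users, think, service, n_requests, seed=0):
    """Closed loop: each user waits for its response, thinks, then sends again."""
    rng = random.Random(seed)
    # Heap of (time at which the user sends its next request, user id).
    pending = [(rng.expovariate(1.0 / think), u) for u in range(users)]
    heapq.heapify(pending)
    server_free = 0.0
    responses = []
    while len(responses) < n_requests:
        sent, user = heapq.heappop(pending)
        start = max(sent, server_free)
        server_free = start + service
        responses.append(server_free - sent)
        # The user only becomes active again after its response arrives.
        next_send = server_free + rng.expovariate(1.0 / think)
        heapq.heappush(pending, (next_send, user))
    return responses
```

In the closed model at most `users` requests are ever outstanding, so the queue is bounded. In the open model nothing throttles arrivals, so response times can grow much larger as the arrival rate approaches the service capacity.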
TYPICAL DB VIEW OF E-COMMERCE BENCHMARKS
Direct injection of SQL load into the database
[Diagram labels: Internet, SQL, Database]
TPC-W BENCHMARK
Open source PHP and Java servlet implementations with MySQL/PostgreSQL
Browser emulators have significant variance in replay

WHY IS TPC-W OBSOLETE?
HTTP 1.0, no CSS, no JS…
And seriously… did you recognize Amazon.com?
RUBIS BENCHMARK
Auction site (a la eBay.com)
Many open source implementations
  PHP
  Java: Servlet, JEE, Hibernate, JDO…

Everybody complains about it
Everybody uses it

Why?
  It is available
  It is small enough to be able to mess with it
  Others are publishing papers with it!
WEB APPLICATIONS HAVE CHANGED
Web 2.0 applications
  Rich client interactions (AJAX, JS…)
  Multimedia content
  Replication, caching…
  Large databases (few GB to multiple TB)

Complex Web interactions
  HTTP 1.1, CSS, images, Flash, HTML 5…
  WAN latencies, caching, Content Delivery Networks…
MORE REASONS WHY BENCHMARKS ARE OBSOLETE?
Benchmark       HTML  CSS  JS  Multimedia  Total
RUBiS              1    0   0           1      2
eBay.com           1    3   3          31     38
TPC-W              1    0   0           5      6
amazon.com         6   13  33          91    141
CloudStone         1    2   4          21     28
facebook.com       6   13  22         135    176
wikibooks.org      1   19  23          35     78
wikipedia.org      1    5  10          20     36
Number of interactions to fetch the home page of various web sites and benchmarks
STATE SIZE MATTERS
Does the entire DB of Amazon or eBay fit in the memory of a cell phone?
  TPC-W DB size: 684MB
  RUBiS DB size: 1022MB
Impact of CloudStone database size on performance

Dataset size  State size (in GB)  Database rows  Avg CPU load with 25 users
25 users                     3.2         173745                          8%
100 users                     12         655344                         10%
200 users                     22        1151590                         16%
400 users                     38        1703262                         41%
500 users                     44        1891242                         45%
CloudStone Web application server load observed for various dataset sizes using a workload trace of 25 users replayed with Apache HttpClient 3.
OUTLINE
How Relevant are Standard Systems Benchmarks?
BenchLab: Realistic Web Application Benchmarking
An Agenda for Systems Benchmarking Research
BENCHMARK DESIGN
Traditional approach (TPC-W, RUBiS…)
  Workload definition + Web Emulator -> Application under Test
BenchLab approach
  HTTP trace + Real Web Browsers -> Application under Test
BENCHLAB: TRACE RECORDING
Record traces of real Web sites
HTTP Archive (HAR format)
[Diagram: Internet -> Frontend/Load balancer (HA Proxy recorder) -> App. Servers (httpd recorder) -> Databases (SQL recorder?)]
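A HAR file is a JSON document whose `log.entries` array holds one record per request. The sketch below shows how a recorded trace might be read back. The sample trace and its example.org URLs are made up for illustration, and `summarize_har` is a hypothetical helper, not part of BenchLab.

```python
import json


def summarize_har(har):
    """Return (method, url, status, total time in ms) for each entry of a HAR log."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        rows.append((request["method"], request["url"],
                     entry["response"]["status"], entry["time"]))
    return rows


# Tiny hand-written example; a real recording has many more fields per entry.
SAMPLE = json.loads("""
{"log": {"version": "1.2", "entries": [
  {"startedDateTime": "2011-01-01T00:00:00.000Z", "time": 60,
   "request": {"method": "GET", "url": "http://example.org/wiki/page"},
   "response": {"status": 200}},
  {"startedDateTime": "2011-01-01T00:00:00.100Z", "time": 250,
   "request": {"method": "GET", "url": "http://example.org/skins/main-ltr.css"},
   "response": {"status": 200}}
]}}
""")

if __name__ == "__main__":
    for method, url, status, ms in summarize_har(SAMPLE):
        print(f"{method} {url} -> {status} in {ms} ms")
```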
BENCHLAB WEBAPP
JEE WebApp with embedded database
Repository of benchmarks and traces
Schedule and control experiment execution
Results repository
Can be used to distribute / reproduce experiments and compare results
[Diagram: a Web Frontend and an Experiment scheduler manage traces (HAR or access_log), results (HAR or latency), experiment configs and benchmark VMs. Browsers connect for registration, experiment start/stop, trace download and results upload. Users upload traces / VMs, define and run experiments, compare results, and distribute benchmarks, traces, configs and results.]
BENCHLAB CLIENT RUNTIME (BCR)
Replay traces in real Web browsers
Small Java runtime based on Selenium/WebDriver
Collect detailed response times in HAR format
Can record HTML and page snapshots
Upload results to BenchLab WebApp when done
[Diagram: BCR -> Web page browsing and rendering -> HAR results]
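The BCR itself is a Java runtime. Purely as an illustration of trace replay with recorded pacing, here is a Python sketch in which the browser is abstracted behind a `visit` callback. The trace layout (pairs of offset in seconds and URL) and all names are assumptions made for this example.

```python
import time


def replay(trace, visit, clock=time.monotonic, sleep=time.sleep):
    """Replay (offset_seconds, url) pairs, keeping the recorded pacing.

    `visit` is called with each URL and would typically wrap a real browser,
    for example the `get` method of a Selenium WebDriver instance.
    Returns a list of (url, seconds spent inside visit).
    """
    results = []
    origin = clock()
    for offset, url in sorted(trace):
        delay = origin + offset - clock()
        if delay > 0:
            sleep(delay)  # wait until the recorded send time
        begin = clock()
        visit(url)
        results.append((url, clock() - begin))
    return results
```

Passing `clock` and `sleep` as parameters keeps the pacing logic testable without a browser or real waiting.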
WIKIMEDIA FOUNDATION WIKIS



Wikimedia Wiki open source software stack
  Lots of extensions
  Very complex to set up/install
Real database dumps (up to 6TB)
  3 months to create a dump
  3 years to restore with default tools
Multimedia content
  Images, audio, video
  Generators (dynamic or static) to avoid copyright issues
Real Web traces from Wikimedia
Packaged as Virtual Appliances
WIKIPEDIA DEMO

Wikimedia Wikis
  Real software
  Real dataset
  Real traces
  Packaged as Virtual Appliances

Real Web Browsers
  Firefox
  Chrome
  Internet Explorer
HTTP VS BROWSER REPLAY
Browsers are smart
  Parallelism on multiple connections
  JavaScript execution can trigger additional queries
  Rendering introduces delays in resource access
  Caching and pre-fetching
HTTP replay cannot approximate real Web browser access to resources
[Figure: timeline of a real browser loading GET /wiki/page in four phases. (1) Analyze page, then fetch CSS and JavaScript files. (2) Rendering + JavaScript, then fetch further JavaScript files. (3) Rendering + JavaScript, then fetch image files. (4) Rendering, then fetch the remaining image files. The server generates the page and sends files at each phase. Annotated with per-phase times, a total network time, and + 2.21s total rendering time.]
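The effect of parallel connections alone can be approximated with a toy model. The sketch below assumes every resource is known up front and each fetch starts on the first free connection. It ignores rendering, JavaScript and caching, which the slide identifies as equally important. The function names and any durations passed in are hypothetical.

```python
import heapq


def sequential_time(durations):
    """Total time when resources are fetched one after another."""
    return sum(durations)


def parallel_time(durations, connections):
    """Completion time when each fetch starts on the first free connection."""
    free_at = [0.0] * connections  # time at which each connection is idle
    heapq.heapify(free_at)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(free_at)
        end = start + d
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish
```

With fetch times of 1, 2, 3 and 4 seconds, `sequential_time` gives 10 while `parallel_time` with two connections gives 6, so a sequential HTTP replayer and a browser put load on the server at very different moments.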
TYPING SPEED MATTERS
Auto-completion in search fields is common
Each keystroke can generate a query
GET /api.php?action=opensearch&search=W
GET /api.php?action=opensearch&search=Web
GET /api.php?action=opensearch&search=Web+
GET /api.php?action=opensearch&search=Web+2
GET /api.php?action=opensearch&search=Web+2.
GET /api.php?action=opensearch&search=Web+2.0
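The request sequence above can be generated from the typed text. This sketch assumes one query per keystroke. A real browser may skip intermediate prefixes depending on typing speed, as the trace above does for "We". The function name is hypothetical.

```python
from urllib.parse import urlencode


def keystroke_queries(text, path="/api.php"):
    """Return the request path issued after each typed character of `text`."""
    queries = []
    for end in range(1, len(text) + 1):
        params = urlencode({"action": "opensearch", "search": text[:end]})
        queries.append(f"{path}?{params}")
    return queries
```

Calling `keystroke_queries("Web 2.0")` yields seven paths, ending with `/api.php?action=opensearch&search=Web+2.0`.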
JAVASCRIPT EFFECTS ON WORKLOAD

Browser side input validation
Additional queries during form processing
[Figure: emulated browser versus real browser, with good input and bad input]
LAN VS WAN LOAD INJECTION
Deployed BCR instances in Amazon EC2 data centers
  As little as $0.59/hour for 25 instances for Linux
  Windows from $0.84 to $3/hour
Latency
  WAN latency >= 3 x LAN latency
  Latency standard deviation increases with distance
CPU usage varies greatly on server for same workload (LAN 38.3% vs WAN 54.4%)

                    US East  US West  Europe  Asia
Average latency     920ms    1573ms   1720ms  3425ms
Standard deviation  526      776      906     1670
OUTLINE
How Relevant are Standard Systems Benchmarks?
BenchLab: Realistic Web Application Benchmarking
An Agenda for Systems Benchmarking Research
OPEN CHALLENGES - METRICS

Manageability
  Online operations
  Autonomic aspects
HA / Disaster recovery
  Fault loads
  RTO/RPO
Elasticity
Scalability
  Private cloud
  Internet scale
Cacheability
  Replication
  CDNs
OPEN CHALLENGES - WORKLOADS

Capture
  Quantifiable overhead
  Complex interactions
  Correlation of distributed traces
Separating trace generation from replay
Scaling traces
Security
  Anonymization
  Content of updates
Replay
  Complex interactions
  Parallelism vs Determinism
  Internet scale
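As one possible approach to the anonymization challenge, client addresses in a trace can be replaced by a keyed hash, so that requests from the same client remain correlatable without exposing the address. This sketch assumes an access_log style line whose first field is the client address. It is an illustration, not a complete anonymization scheme, and the function names are hypothetical.

```python
import hashlib
import hmac


def anonymize_ip(ip, secret):
    """Replace an IP address with a keyed hash so requests remain correlatable."""
    digest = hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return "client-" + digest[:12]


def anonymize_line(line, secret):
    """Anonymize the first field of an access_log style line (the client address)."""
    client, _, rest = line.partition(" ")
    return anonymize_ip(client, secret) + " " + rest
```

URLs, query strings and the content of updates can still leak private data, which is why the slide lists them as open problems.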
OPEN CHALLENGES - EXPERIMENTS

Experiment automation
  Capturing experimental environment
  Reproducing experiments
  Minimizing setup bias
Experimental results
  Certifying results
  Results repository
  Mining/comparing results
Realistic benchmarks
  Applications
  Workloads
  Injection
CONCLUSION
Benchmarking is hard
Applications are becoming more complex
  Realistic workloads/interactions
  Realistic applications
BenchLab for Internet scale Benchmarking of real applications
A lot to explore…
Q&A
http://lass.cs.umass.edu/projects/benchlab/