Characterizing Cloud Management Performance
Adarsh Jagadeeshwaran
CMG INDIA CONFERENCE,
December 12, 2014
© 2014 VMware Inc. All rights reserved.
Agenda
• Building Blocks of VMware’s Cloud Infrastructure
• The Software Defined Datacenter
• Cloud Management Performance at VMware
• Performance Challenges
• Tools and Benchmarks
• Role of Simulation
• Performance Testing Methodology
• Conclusion
Building Blocks of VMware’s Cloud Infrastructure
It all started with x86 virtualization
Traditional Architecture
Virtual Architecture
CONFIDENTIAL
And features like:
• VM.migrate
– Move the compute state of a Virtual Machine (VM) from one physical box to
another
– Typically used for resource load balancing
• VM.snapshot
– Preserve the state and data of a VM at a specific point in time
– Snapshots help avoid damage to VMs when a patch or upgrade goes wrong
• Distributed Resource Scheduling
Building the cloud
The New Role for IT: IT as a Service
Virtual Workspace
Manage access to services, applications and data for any device
Private Clouds
Public Clouds
60%
Hybrid Cloud
Seamlessly extend your data center to the public cloud
Software-Defined Data Center
Virtualize the entire data center
Management and Automation
Storage and Availability
Compute
Network and Security
Cloud Infrastructure = Software-Defined Data Center
Compute: CPU and memory resources
[Diagram: eight VMs, each an APP on an OS, running on the Compute layer]
+ Storage
[Diagram: the eight VMs on top of the Storage and Compute layers]
+ Networking/Security
[Diagram: the eight VMs on top of the Network/Security, Storage, and Compute layers]
+ Automation/Management – This is key
[Diagram: the eight VMs on top of the Network/Security, Storage, and Compute layers, with Automation & Management added]
= Virtual Datacenter
[Diagram: the Software-defined Datacenter (Network/Security, Storage, Compute, Automation & Management) hosting the eight VMs, divided into VDC 1 and VDC 2]
Typical Deployment
Finance
R&D
Grid
Software-defined Datacenter Services
Software-defined Datacenter Services
Software-defined Datacenter
Cloud Management Performance at VMware
SDDC Management Suite
[Diagram: VMware vCloud® Suite for the SDDC, with blocks labeled Cloud Service Provisioning, Operations Management, Virtual Networking and Security, and Software-Defined Storage and Availability; one further label is truncated to "Virtual"]
VMware Performance R&D
Three activities around PERFORMANCE:
• MEASURE – instrument, benchmark, analyze
• OPTIMIZE – design, fix code, tune settings
• PUBLISH – white papers, blogs, KB articles, flings
Performance Challenges
The Management Server
[Diagram: components shown are UI Clients, UI Server, Single Sign-On, Stats Processing, Inventory DB (xml), Relational Database, and two managed servers (Server 1, Server 2), each running a host_agent and VMs]
Components affecting performance
• VM resources such as CPU and memory – shared with other VMs on the
same physical server (host)
• Virtual devices – storage, networking, VM devices – data stored in the
management server database
• Number of managed objects – data stored in the management server database
– ESXi hosts
– VMs
– Resource Pools
– Clusters
• Performance statistics about objects – stored and processed in the
database
– Multiple levels of statistics, from less to more detailed
• Incoming tasks and queries – CPU/memory usage on the management server
Deployment Size
• Overall Size:
– Small – up to 150 servers, 3,000 VMs
– Medium – up to 300 servers, 6,000 VMs
– Large – up to 1,000 servers, 10,000 VMs
• Single Cluster Size:
– Resource Scheduling, Availability and Power Management work at the cluster
level
– Up to 32 servers or 4,000 VMs in a single cluster
• A setup with 50 servers and 2,000 VMs, collecting the least detailed statistics,
can result in a database size of approx. 16 GB
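The overall size tiers can be expressed as a small lookup. The sketch below is illustrative only: the function name and the "unsupported" label are assumptions, and only the server and VM limits come from the slide.

```python
# Tier limits copied from the slide: (name, max servers, max VMs).
TIERS = [
    ("small", 150, 3000),
    ("medium", 300, 6000),
    ("large", 1000, 10000),
]


def classify_deployment(servers, vms):
    """Return the smallest tier whose limits cover the inventory."""
    for name, max_servers, max_vms in TIERS:
        if servers <= max_servers and vms <= max_vms:
            return name
    return "unsupported"  # label is an assumption, not from the slide


print(classify_deployment(50, 2000))
print(classify_deployment(200, 2500))
```

Grouping test inventories and customer data by the same tiers keeps benchmark results comparable across runs.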
Identify Common Use Cases
Cloud Solutions – Ex: vCloud Director
(Spans multiple Management Servers)
Cloud Management Workflow 1: Instantiate vApp → Deploy vApp → Edit vApp → Undeploy vApp → Delete vApp
Cloud Management Workflow 2: Clone vApp → Delete vApp
Identify Common Use Cases – Contd.
Customer Usage Patterns
• Customer Support Data
• Software support bundle – logs, events, traces
• Identify common operation patterns and their frequency
• Group patterns by deployment size
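Extracting patterns from support data amounts to counting operations and grouping the counts by deployment size. A minimal sketch follows, assuming the support bundles have already been parsed into (deployment size, operation) records; the record format and the sample data are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records parsed from support bundles: (deployment size, operation).
events = [
    ("small", "powerOn"), ("small", "powerOn"), ("small", "clone"),
    ("large", "migrate"), ("large", "powerOn"), ("large", "migrate"),
]


def frequency_by_size(records):
    """Count how often each operation appears, grouped by deployment size."""
    grouped = defaultdict(Counter)
    for size, operation in records:
        grouped[size][operation] += 1
    return grouped


patterns = frequency_by_size(events)
for size, counts in patterns.items():
    print(size, counts.most_common())
```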
Build Tools for Stats and Monitoring
• Monitor Resource Usage
– Server level
– Management level
– Components of the Management Server
• Build Internal Profiling Counters
– Count of objects in memory
– Aggregated stats about tasks, events, etc.
– Locking information
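As a rough illustration of internal profiling counters, the sketch below keeps named counts behind a lock and records time spent waiting on a shared lock. The class, the counter names, and the `inventory_lock` stand-in are all hypothetical, not VMware's actual instrumentation.

```python
import threading
import time
from collections import defaultdict


class ProfilingCounters:
    """Hypothetical in-process counters; not an actual VMware API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = defaultdict(int)     # e.g. objects in memory, tasks seen
        self._totals = defaultdict(float)   # e.g. accumulated lock wait seconds

    def incr(self, name, delta=1):
        with self._lock:
            self._counts[name] += delta

    def add_time(self, name, seconds):
        with self._lock:
            self._totals[name] += seconds

    def snapshot(self):
        """Return copies of all counters for logging or post-processing."""
        with self._lock:
            return dict(self._counts), dict(self._totals)


counters = ProfilingCounters()
inventory_lock = threading.Lock()  # stand-in for a contended server lock


def register_vm(vm_name):
    """Record one more object in memory and how long the lock wait took."""
    start = time.perf_counter()
    with inventory_lock:
        waited = time.perf_counter() - start
        counters.incr("vm_objects_in_memory")
    counters.add_time("inventory_lock_wait_s", waited)


for i in range(5):
    register_vm(f"vm-{i}")
counts, totals = counters.snapshot()
print(counts["vm_objects_in_memory"])
```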
Tools and Benchmarks
Microbenchmark
• Simulates load on server from a given operation
– Example: 256 VM.powerOn operations in sequence
• Focus on specific operation (no background load)
• Study scaling trend for a given operation (latency)
• Study resource usage trend
• Performance of a specific server component
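A microbenchmark of this kind can be sketched as a loop that issues one operation repeatedly and records each latency. The code below is a minimal illustration, assuming a stand-in `fake_power_on` function in place of a real VM.powerOn call; it does not use any VMware API.

```python
import statistics
import time


def fake_power_on(vm_id):
    """Stand-in for a real VM.powerOn call; does trivial work only."""
    return sum(range(100)) + vm_id


def run_microbenchmark(operation, count):
    """Issue `count` operations in sequence; return each latency in seconds."""
    latencies = []
    for vm_id in range(count):
        start = time.perf_counter()
        operation(vm_id)
        latencies.append(time.perf_counter() - start)
    return latencies


latencies = run_microbenchmark(fake_power_on, 256)
print(f"ops={len(latencies)} mean={statistics.mean(latencies):.6f}s "
      f"max={max(latencies):.6f}s")
```

Because no other load is running, any change in these latencies between builds points at the operation itself.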
Macro-benchmark
• In-house benchmark: VCBench
• Simulate (Admin) User Tasks
– Issues management operations using public APIs
• Simulate Multiple Users
– Multiple threads issuing a series of operations
• (User) Think time
– User can specify “think” time between operations
• Realistic workload
– Operation mix & frequency extracted from customer data
• Measure throughput – number of operations completed in a given time
• Measure operation latency under load, along with the corresponding
resource usage
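The structure described above (several users, each a thread issuing operations with think time in between) can be sketched as follows. This is not VCBench: the function names, the stand-in operation, and the parameter values are assumptions made for illustration.

```python
import random
import threading
import time


def fake_operation(name):
    """Stand-in for a management API call; returns immediately."""
    return name


def user_session(user_id, operations, ops_per_user, think_time_s, results, lock):
    """One simulated admin: issue operations with think time between them."""
    rng = random.Random(user_id)  # per-user seed keeps runs repeatable
    for _ in range(ops_per_user):
        op = rng.choice(operations)
        start = time.perf_counter()
        fake_operation(op)
        latency = time.perf_counter() - start
        with lock:
            results.append((op, latency))
        time.sleep(think_time_s)


def run_benchmark(num_users, operations, ops_per_user, think_time_s):
    """Run all user threads; return per-operation results and throughput."""
    results, lock = [], threading.Lock()
    threads = [
        threading.Thread(
            target=user_session,
            args=(u, operations, ops_per_user, think_time_s, results, lock))
        for u in range(num_users)
    ]
    started = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - started
    return results, len(results) / elapsed  # operations completed per second


results, throughput = run_benchmark(
    num_users=4, operations=["powerOn", "powerOff", "clone"],
    ops_per_user=5, think_time_s=0.01)
print(len(results), f"{throughput:.1f} ops/s")
```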
Benchmark Run Profile
• Two primary modes
– “Light”: around 100 operations issued per minute
– “Heavy”: around 500 operations issued per minute
• Light load is slightly above most customer workloads
– Lets us exercise the entire management stack
– And anticipate realistic demand growth in the short term
• Heavy load is for saturating the management server
– Saturation is the point where giving the server more resources no longer
increases throughput
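One way to locate the saturation point from measurements is to look for the first resource level after which throughput stops growing. The sketch below assumes a 5% relative-gain threshold and uses made-up numbers; neither comes from the presentation.

```python
def find_saturation_point(samples, min_gain=0.05):
    """Return the first resource level after which throughput stops improving.

    `samples` is a list of (resource_units, throughput) pairs sorted by
    resource_units. `min_gain` is an assumed threshold: a relative
    throughput gain below it counts as "no increase".
    """
    for (res, tput), (_, next_tput) in zip(samples, samples[1:]):
        if next_tput < tput * (1 + min_gain):
            return res
    return None  # never saturated within the measured range


# Made-up measurements: (vCPUs given to the server, operations per minute).
measurements = [(2, 180), (4, 340), (8, 490), (16, 505), (32, 507)]
print(find_saturation_point(measurements))
```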
Realistic Operation Mix
Operation          Operations/min (light)
Power On VM        40
Power Off VM       40
Clone VM           10
Migrate VM         40
Remove VM          10
Create Snapshot    5
Delete Snapshot    5
Reconfigure VM     10
• Mix of operations revised constantly based on new features and changing
datacenter use cases.
• Mix and frequency varied simply by editing a run list.
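A run list can be as simple as a mapping from operation to rate. The sketch below uses the light-profile mix listed above and draws operations in proportion to their rates; the dictionary layout and function names are assumptions, since the presentation does not show the actual run-list format.

```python
import random

# Light-profile mix from the slide, in operations per minute.
RUN_LIST = {
    "Power On VM": 40,
    "Power Off VM": 40,
    "Clone VM": 10,
    "Migrate VM": 40,
    "Remove VM": 10,
    "Create Snapshot": 5,
    "Delete Snapshot": 5,
    "Reconfigure VM": 10,
}


def operation_fractions(run_list):
    """Convert per-minute rates into each operation's share of the mix."""
    total = sum(run_list.values())
    return {op: rate / total for op, rate in run_list.items()}


def next_operations(run_list, count, seed=0):
    """Draw `count` operations at random, weighted by the run list."""
    rng = random.Random(seed)
    ops = list(run_list)
    return rng.choices(ops, weights=[run_list[op] for op in ops], k=count)


fractions = operation_fractions(RUN_LIST)
print(f"Power On VM share: {fractions['Power On VM']:.2%}")
print(next_operations(RUN_LIST, 5))
```

Changing the mix then means editing only the `RUN_LIST` entries, with no change to the driver code.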
Tools for monitoring performance
• Resource Usage Tool
– Tool built into hypervisor (esxtop) and management server
– Monitoring at component level
• Profiling tools (post-process)
– Uses management server’s internal profiling information from log bundle
– Summarizes performance metrics of internal objects
Role of Simulation
Why Simulation?
• Running 1024 physical ESXi servers (hosts) is a management nightmare
• Plus 15K VMs and the associated networking and storage components
• Solution?
– Have a simulated version of the hypervisor
– Fake the existence of VMs and datastores
– Management Server sees no difference
Simulating the hypervisor
• The hypervisor agent is the management server’s agent running on the
ESXi server
• With the hardware and VMs simulated, we can run the real
hypervisor agents as separate threads in different containers
• Agent-to-management-server communication stays intact
• The number of objects and properties managed by the server remains the same
• Some Challenges:
– Simulating performance statistics, events and alarms
– Simulating VM IO
• Advantages:
– Hypervisor layer is a black box with consistent performance
– No hypervisor or storage performance bottleneck
– Focus is purely on management server scaling and performance
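The idea of faking hosts and VMs can be illustrated with a toy model in which a "host" is only inventory state. This is a conceptual sketch, not the simulator described here: the class, its methods, and the state strings are hypothetical, and the real setup runs actual hypervisor agents over simulated hardware.

```python
class SimulatedHost:
    """Hypothetical stand-in for an ESXi host: no hardware, only inventory state."""

    def __init__(self, name, num_vms):
        self.name = name
        # Fake VMs are just names mapped to a power state.
        self.vms = {f"{name}-vm{i}": "poweredOff" for i in range(num_vms)}

    def power_on(self, vm_name):
        """Flip the recorded state; a real host would start a VM here."""
        if vm_name not in self.vms:
            raise KeyError(f"unknown VM: {vm_name}")
        self.vms[vm_name] = "poweredOn"
        return {"vm": vm_name, "state": "poweredOn"}

    def inventory(self):
        """What a management server would see when it queries this host."""
        return {"host": self.name, "vms": dict(self.vms)}


# A small simulated inventory; the same loop scales to far more hosts.
hosts = [SimulatedHost(f"host{h}", num_vms=15) for h in range(8)]
hosts[0].power_on("host0-vm3")
total_vms = sum(len(h.vms) for h in hosts)
print(total_vms, hosts[0].inventory()["vms"]["host0-vm3"])
```

Because each simulated host responds in constant time, latency measured against it reflects the management layer rather than the hypervisor or storage.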
Performance Testing Methodology
Testing for Performance and Scale
• Testing at supported scale
• Hypervisor Scaling (Scale-up)
– Stacking more VMs on the same physical box
– Focus is on Hypervisor performance
• Management Server Scaling (Scale-out)
– Managing more physical boxes and VMs
– Focus is on Management Server performance
– a) Single Cluster at scale
– b) Overall large deployment
Test configurations
• Scale-Up
– 1 or 2 ESXi hosts
– 500 to 1,000 VMs per host
– Microbenchmark with focus on one operation at a time
– 1, 32, 64, 128, 256, or 512 operations of one type (vm.powerOn, vm.reconfigure, etc.)
– Metrics measured: end-to-end latency, CPU/memory usage
• Scale-Out
– 1024 ESXi hosts managed by a single management server
– 15K VMs total
– Benchmark with concurrently issued operations: datacenter.powerOn,
vm.migrate, etc.
– Metrics measured: operation throughput, latency, CPU/memory usage
Regression Tracking
• Performance Automation automates processes for setup and
regression tracking
• Tracking for different scale inventories
• Track benchmark data (throughput, latency), and resource usage of
management server components for regression
• Analyze and fix regressions in performance
• Also useful for sizing guidelines
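Regression tracking boils down to comparing each run's metrics against a baseline and flagging changes beyond a noise threshold. The sketch below assumes a 10% tolerance and uses made-up metric names and values; the actual automation and its thresholds are not described in the slides.

```python
def find_regressions(baseline, current, tolerance=0.10):
    """Compare metrics against a baseline and list those that got worse.

    Each dict maps metric name -> (value, higher_is_better). `tolerance`
    is an assumed threshold: changes within 10% are treated as noise.
    """
    regressions = []
    for name, (base_value, higher_is_better) in baseline.items():
        cur_value, _ = current[name]
        change = (cur_value - base_value) / base_value
        worse = -change if higher_is_better else change
        if worse > tolerance:
            regressions.append((name, round(change, 3)))
    return regressions


# Made-up numbers for illustration only.
baseline = {
    "throughput_ops_per_min": (500.0, True),
    "power_on_latency_s": (2.0, False),
    "server_cpu_pct": (60.0, False),
}
current = {
    "throughput_ops_per_min": (430.0, True),   # 14% lower: regression
    "power_on_latency_s": (2.1, False),        # 5% higher: within tolerance
    "server_cpu_pct": (75.0, False),           # 25% higher: regression
}
print(find_regressions(baseline, current))
```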
Conclusion
Takeaways
• Understand factors affecting performance
• Have a comprehensive stats/monitoring framework
• Build a realistic benchmark that replicates customer behavior
• Ideal benchmark run should
– Include common use cases and user behavior
– Remove variability in a multi-tiered setup
– Be able to focus on a single component
• Simulation can help remove variability and make large-scale testing practical
• Generate microbenchmarks that stress a single component or a small number of
components
References
Thanks to
• VMware vCenter Server Performance Team
• “Benchmarking a Virtualized Platform” – Vijayaraghavan
Soundararajan et al., IISWC 2014
(http://www.iiswc.org/iiswc2014/program2014.html)
Backup
Example SDDC Management Task:
Distributed Resource Scheduling using VMotion
Resource Pool
VMware ESX
VMware ESX
VMware ESXi
• Balance VM Load in a cluster of ESXi servers
• Enforce Policy Based Rules
• Power Management
