Windows Performance Troubleshooting and Analysis

Daniel Pearson
David Solomon Expert Seminars
• Started working with Windows NT 3.51
• Three years at Digital Equipment Corporation
• Supporting Intel and Alpha systems running Windows NT
• Seven years at Microsoft
• Senior Escalation Lead in Windows base team
• Worked in the Mobile Internet sustained engineering team
• Instructor for David Solomon, co-author of the Windows Internals
book series
• Components of performance analysis
• Understanding the tools for troubleshooting and analyzing
performance issues
• Troubleshooting CPU and memory issues using various
Windows tools
* Portions of this session are based on material developed by
Mark Russinovich and David Solomon
Components of Performance Analysis
• Event Tracing for Windows
• Core component of the operating system
• Kernel mode data structures
• Used to store information about the system and system objects that
can be read by various tools
• e.g. dt nt!_KTHREAD KernelTime
• CPU performance monitoring events
• Refer to the Intel 64 and IA-32 Architectures Software Developer’s Manual
Event Tracing for Windows
• Built in to the system
• High performance, low overhead and scalable
• 2.5% CPU usage for a sustained rate of 10,000 events/sec on
a 2 GHz CPU [1]
• Operations throughout the system that are of interest to performance
are fully instrumented
• e.g. process and thread activity, registry I/O, disk I/O
1. Milirud, Michael. 2008. Windows Performance Analysis: Using Windows
Performance Tools. Presented at Microsoft's WinHEC conference, November
5-7, Los Angeles, CA.
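As a rough worked example of the overhead figure quoted above: 2.5% of a 2 GHz CPU spread across 10,000 events per second works out to about 5,000 cycles per event. The sketch below only restates that arithmetic; the function name is illustrative and not part of any tool.

```python
def cycles_per_event(cpu_hz: float, cpu_fraction: float, events_per_sec: float) -> float:
    """Average CPU cycles spent per logged event."""
    return cpu_hz * cpu_fraction / events_per_sec


# Figures taken from the slide: 2 GHz CPU, 2.5% usage, 10,000 events/sec.
cost = cycles_per_event(cpu_hz=2e9, cpu_fraction=0.025, events_per_sec=10_000)
print(f"{cost:.0f} cycles per event")  # 5000 cycles per event
```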
Event Tracing for Windows
• Uses a buffering and logging mechanism implemented in the kernel
• Per-processor buffers are written to disk by an asynchronous
writer thread
• Tracing can be enabled and disabled dynamically
• Supports a managed code provider
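The buffering scheme described above can be pictured with a small toy model. This is a sketch under stated assumptions, not how the kernel implements it: the class name, the buffer size of 4 events and the in-memory list standing in for the log file are all hypothetical, and events are logged from a single thread to keep the example short.

```python
import queue
import threading

BUFFER_SIZE = 4  # events per buffer; illustrative, not a real ETW value


class TraceSession:
    """Toy model: one in-memory buffer per CPU, drained by one writer thread."""

    def __init__(self, cpu_count: int) -> None:
        self.buffers = [[] for _ in range(cpu_count)]
        self.full = queue.Queue()  # full buffers waiting to be written
        self.log = []              # stands in for the trace file on disk
        self.writer = threading.Thread(target=self._write_loop)
        self.writer.start()

    def log_event(self, cpu: int, event: str) -> None:
        """Append to that CPU's buffer; hand the buffer off when it fills."""
        buf = self.buffers[cpu]
        buf.append(event)
        if len(buf) == BUFFER_SIZE:
            self.full.put(buf)
            self.buffers[cpu] = []

    def _write_loop(self) -> None:
        """Writer thread: flush full buffers until the stop marker arrives."""
        while True:
            buf = self.full.get()
            if buf is None:
                break
            self.log.extend(buf)

    def stop(self) -> None:
        """Flush partially filled buffers, then stop the writer thread."""
        for buf in self.buffers:
            if buf:
                self.full.put(buf)
        self.full.put(None)
        self.writer.join()


session = TraceSession(cpu_count=2)
for i in range(10):
    session.log_event(cpu=i % 2, event=f"event-{i}")
session.stop()
print(len(session.log))  # 10
```

The point of the design is that the code logging an event only appends to a buffer; the slower write to the log happens on a separate thread.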
Sysinternals Utilities
• Process Explorer
• Useful for displaying which files, registry keys and other objects
processes have open and which DLLs they have loaded
• Process Monitor
• Useful for showing real-time file system, registry and process &
thread activity
• Available for download from the TechNet site
Resource Monitor
• Included with Windows Vista and greatly enhanced in Windows 7 and
Windows Server 2008 R2
• Allows the viewing of CPU, memory, disk and network resources as well
as handles and modules in real time
• Ability to end, suspend and resume processes as well as to start, stop
and restart Windows services
• Useful for identifying the highest resource consumers by individual
resource type, e.g. CPU
• Able to list the wait chain tree of a process to determine if a process is
waiting on another
Using Resource Monitor
Performance Monitor
• Queries performance counters that measure system state or activity
• Current values are read at specific intervals
• Performance counters are included in the operating system and can be
included as part of applications
• Able to collect event trace data from trace providers that report actions
or events
• Can combine multiple trace providers into a single session
• Configuration information can be collected from registry keys at a
specific time or interval
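Because counter values are read at intervals, a rate such as "operations per second" is typically derived from the difference between two successive readings. The sketch below illustrates that idea with made-up sample values; the function name and data are assumptions for illustration and do not call any Performance Monitor API.

```python
from typing import List, Tuple


def rates_per_second(samples: List[Tuple[float, int]]) -> List[float]:
    """Turn (timestamp_seconds, cumulative_count) samples into per-interval rates."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates


# Hypothetical readings of a cumulative counter taken once per second.
samples = [(0.0, 1000), (1.0, 1250), (2.0, 1250), (3.0, 2050)]
print(rates_per_second(samples))  # [250.0, 0.0, 800.0]
```

Note that activity between two readings is averaged over the interval, so short bursts are smoothed out at longer sampling intervals.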
Using Performance Monitor
Windows Performance Analyzer
• Part of the Windows Performance Toolkit
• Supports x86, x64, and IA64 architectures
• Consists of three primary programs
• xperf.exe: used for controlling tracing and processing trace data
• xbootmgr.exe: automates on and off state transitions and captures
traces during those transitions
• xperfview.exe: a graphical trace visualization tool that represents
data in the form of interactive graphs and summary tables
Windows Performance Analyzer
• Primarily uses the Event Tracing for Windows infrastructure built in to
the system
• Can be enabled or disabled at any time without requiring a system or
process restart
• Supports symbol decoding, sample profiling, and recording of call
stacks on kernel events
• Designed to be used during automation
• All the functions of the tools are available via the command line tool
Support for Earlier Systems
• The Windows Performance Toolkit will fail to install on Windows XP and
on Windows Server 2003 although data collection is supported
• Copy xperf.exe and perfctrl.dll from a system where the toolkit is installed
• Trace analysis is only supported on Windows Vista and later systems
Capturing a Performance Trace
• Kernel options divided into two parts
• Kernel Flags
• Identified by the use of uppercase characters
• Kernel Groups
• Identified by the use of title case characters
• e.g. Base, Diag, Latency, FileIO
• Kernel Groups are made up of a collection of Kernel Flags
• Flags and groups are separated by the ‘+’ token
• e.g. xperf.exe -on FileIO+DISK_IO_INIT
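The naming convention above can be expressed as a short sketch that splits an `-on` argument on `+` and classifies each token by its case. This only mirrors the convention as described on the slide; the helper name is hypothetical, and xperf itself resolves names against its own lists of flags and groups.

```python
from typing import Dict, List


def split_kernel_options(spec: str) -> Dict[str, List[str]]:
    """Split an xperf -on argument on '+' and classify each token by case."""
    result: Dict[str, List[str]] = {"flags": [], "groups": []}
    for token in spec.split("+"):
        if token == token.upper():
            result["flags"].append(token)   # all uppercase: Kernel Flag
        else:
            result["groups"].append(token)  # title case: Kernel Group
    return result


print(split_kernel_options("FileIO+DISK_IO_INIT"))
# {'flags': ['DISK_IO_INIT'], 'groups': ['FileIO']}
```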
Merging of Performance Trace Data
• Traces can be copied to another system for analysis
• The trace file should be “merged” on the collection system before
analysis to include additional system information
• xperf -d trace.etl
[Diagram: the kernel trace is combined with system and symbol information to produce the merged trace]
Using the Windows
Performance Toolkit
Understanding CPU Activity
• Windows uses 32 priority levels
• The system implements a preemptive,
priority driven scheduler
• Priority adjustments can be applied to
threads in the “dynamic” range
• At least one runnable thread with the
highest priority will be running
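A minimal sketch of priority-driven selection, assuming one first-in, first-out ready queue per priority level (0 through 31). The class and thread names are hypothetical; the real dispatcher also handles preemption, quantums and multiple processors, none of which are modeled here.

```python
from collections import deque
from typing import Deque, List, Optional

PRIORITY_LEVELS = 32  # levels 0..31


class ReadyQueues:
    """Toy model of a priority-driven dispatcher: one FIFO queue per priority."""

    def __init__(self) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(PRIORITY_LEVELS)]

    def make_ready(self, thread: str, priority: int) -> None:
        """Place a runnable thread at the tail of its priority's queue."""
        self.queues[priority].append(thread)

    def pick_next(self) -> Optional[str]:
        """Return the first thread from the highest non-empty priority level."""
        for priority in range(PRIORITY_LEVELS - 1, -1, -1):
            if self.queues[priority]:
                return self.queues[priority].popleft()
        return None


ready = ReadyQueues()
ready.make_ready("worker", 8)
ready.make_ready("audio", 24)
ready.make_ready("background", 4)
print(ready.pick_next())  # audio
print(ready.pick_next())  # worker
```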
Context Switching
• A switch from one thread to another is known as a context switch
• Switching involves saving the hardware state of one thread and
restoring the state of another
• When a thread is scheduled, that thread’s context switch count is
incremented
• The context switch count represents how often a thread begins
running, not how long it ran
Time Accounting Quirks
• Looking at total CPU time for each process may not reveal where the
system has spent its time
• CPU time accounting is driven by an interval timer interrupt which is
set by the Hardware Abstraction Layer
• Usually at either 10 or 15 msec intervals
• Thread execution and context switches that happen between clock
intervals are not accounted for
• e.g. a thread runs and enters a wait before the clock fires
• Thus threads may run but never get charged
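The quirk can be reproduced with a small simulation. It is a toy model under stated assumptions: a 15 ms clock interval, a hypothetical thread "short" that always enters a wait before the clock fires, and a hypothetical thread "other" that happens to be running at each tick. Whichever thread is running when the clock fires is charged the whole interval.

```python
from typing import Dict, List, Tuple

TICK_MS = 15  # clock interval in milliseconds (assumed for this example)

Segment = Tuple[str, int, int]  # (thread, start_ms, stop_ms)


def tick_based_charge(segments: List[Segment], end_ms: int) -> Dict[str, int]:
    """Charge one full tick to whichever thread is running when the clock fires."""
    charged: Dict[str, int] = {}
    for tick in range(TICK_MS, end_ms + 1, TICK_MS):
        for thread, start, stop in segments:
            if start <= tick < stop:
                charged[thread] = charged.get(thread, 0) + TICK_MS
    return charged


def actual_run_time(segments: List[Segment]) -> Dict[str, int]:
    """Sum the time each thread really spent running."""
    ran: Dict[str, int] = {}
    for thread, start, stop in segments:
        ran[thread] = ran.get(thread, 0) + (stop - start)
    return ran


segments: List[Segment] = []
for k in range(4):
    base = k * TICK_MS
    segments.append(("short", base + 1, base + 13))   # waits before the tick
    segments.append(("other", base + 13, base + 16))  # running at the tick

print(actual_run_time(segments))        # {'short': 48, 'other': 12}
print(tick_based_charge(segments, 60))  # {'other': 60}
```

In this run "short" consumes 48 ms of CPU time yet is charged nothing, while "other" is charged 60 ms for 12 ms of work.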
Time Accounting Prior to Windows Vista
• Windows accounted for CPU time based on the interval clock timer
• Thread quantum expiration was not always fair
• A thread switched in just before the clock fired might get almost no turn
• Threads were also charged for interrupts that occurred while they
were running
Time Accounting Since Windows Vista
• Windows Vista and later reads the Time Stamp Counter during every
context switch
• The actual CPU cycles consumed are charged to a thread
• Any interrupt time is not charged to the interrupted thread
• Allows for more accurate quantum accounting
• A thread gets at least one full turn and at most one turn plus an
additional tick
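A minimal sketch of cycle-based charging, assuming a cycle counter reading is available at every context switch (standing in for the Time Stamp Counter). The class, method and thread names are hypothetical; the model only shows that elapsed cycles go to the outgoing thread and that interrupt cycles are left out of its charge.

```python
from typing import Dict, Optional


class CycleAccounting:
    """Toy model: charge elapsed cycles to the outgoing thread at each switch."""

    def __init__(self) -> None:
        self.charged: Dict[str, int] = {}
        self.current: Optional[str] = None
        self.switched_in_at = 0
        self.interrupt_cycles = 0  # interrupt cycles since the last switch

    def context_switch(self, now_cycles: int, next_thread: Optional[str]) -> None:
        """Charge the outgoing thread, then start timing the incoming one."""
        if self.current is not None:
            ran = now_cycles - self.switched_in_at - self.interrupt_cycles
            self.charged[self.current] = self.charged.get(self.current, 0) + ran
        self.current = next_thread
        self.switched_in_at = now_cycles
        self.interrupt_cycles = 0

    def interrupt(self, cycles: int) -> None:
        """Record interrupt time so it is not charged to the interrupted thread."""
        self.interrupt_cycles += cycles


acct = CycleAccounting()
acct.context_switch(0, "brief")
acct.context_switch(1_000, "busy")  # "brief" ran 1,000 cycles, then waited
acct.interrupt(300)                 # interrupt arrives while "busy" runs
acct.context_switch(5_000, None)
print(acct.charged)  # {'brief': 1000, 'busy': 3700}
```

Even a thread that runs for only a fraction of a clock interval is charged for exactly what it used.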
Troubleshooting High
CPU Utilization
Understanding Memory Management
• Windows provides two system memory pools
• Nonpaged Pool and Paged Pool
• Used for system wide persistent data
• Prior to Windows Vista, pool sizes were a function of memory size and
whether or not the system was configured as a server or a workstation
• Windows Vista introduced the concept of a dynamic system
address space
Dynamic System Address Space
• In 32-bit Windows Vista and later, virtual memory is assigned as needed
• Permits larger paged, nonpaged, and session pools
• Components still cannot exceed 2 GB on 32-bit systems
• On 64-bit systems, address space regions are configured to their
current maximum limits for all memory sizes
Memory Leaks
Additional Information
• Windows Internals 5th edition
• Windows Performance Analysis Developer Center
• Windows Server Performance Team Blog
• Ask the Performance Team Blog
• David Solomon Expert Seminars offers training on Windows Internals
as public and private workshops and as public webinars
• Currently scheduled upcoming classes
• Public workshop in London, April 12th – April 16th
• Public webinar, April 26th & April 28th
• Public workshop in New York, May 3rd – May 7th
• Public workshop in San Francisco, November 8th – November 12th
• Visit the David Solomon Expert Seminars website for further course
descriptions and up-to-date information

