Thursday, October 30, 2014

Mac OS X local privilege escalation (IOBluetoothFamily)

(This post is a joint work with @joystick, see also his blog here)

Nowadays, exploitation of user-level vulnerabilities is becoming more and more difficult, because of the widespread diffusion of several protection methods, including ASLR, NX, various heap protections, stack canaries, and sandboxed execution. As a natural consequence, instead of extricating themselves with such a plethora of defensive methods, attackers prefer to take the “easy” way and started to move at the kernel-level, where sophisticated protection techniques are still not very common (indeed, things like as KASLR and SMEP are implemented only in the latest versions of the most popular OSes). This trend is also confirmed by the rising number of kernel-level vulnerabilities reported in the last few months in Windows, Linux, and OS X.

Following this trend, we recently looked at few OS X drivers (“KEXT”s) and found a integer signedness bug affecting service IOBluetoothHCIController (implemented by the IOBluetoothFamily KEXT). This vulnerability can be exploited by a local attacker to gain root privileges. The issue is present on the latest versions of OS X Mavericks (tested on 10.9.4 and 10.9.5), but has been “silently” patched by Apple in OS X Yosemite.

Vulnerability overview

In a nutshell, the bug lies in the IOBluetoothHCIUserClient::SimpleDispatchWL() function. The function eventually takes a user-supplied 32-bit signed integer value and uses it to index a global array of structures containing a function pointer. The chosen function pointer is finally called. As the reader can easily imagine, SimpleDispatchWL() fails at properly sanitizing the user-supplied index, thus bad things may happen if a malicious user is able to control the chosen function pointer.

More in detail, the vulnerable part of the function is summarized in the pseudocode below. At line 14, the user-supplied 32-bit integer is casted to a 64-bit value. Then, the "if" statement at line 16 returns an error if the casted (signed) value is greater than the number of methods available in the global _sRoutines array; obviously, due to the signed comparison, any negative value for the method_index variable will pass this test. At line 20 method_index is used to access the _sRoutines array, and the retrieved callback is finally called at line 23.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
typedef struct {
  void (*function_pointer)();
  uint64 num_arguments;
} BluetoothMethod;

BluetoothMethod _sRoutines[] = {
  ...
};

uint64 _sRoutineCount = sizeof(_sRoutines)/sizeof(BluetoothMethod);

IOReturn IOBluetoothHCIUserClient::SimpleDispatchWL(IOBluetoothHCIDispatchParams *params) {
  // Here "user_param" is a signed 32-bit integer parameter
  int64 method_index = (int64) user_param;

  if (method_index >= _sRoutineCount) {
    return kIOReturnUnsupported;
  }

  BluetoothMethod method = _sRoutines[method_index];
  ...
  if (method.num_arguments < 8) {
       method.function_pointer(...);
  }
  ...  
}

Exploitation details

Exploitation of this vulnerability is just a matter of supplying the proper negative integer value in order to make IOBluetoothFamily index the global _sRoutines structure out of its bounds, and to fetch an attacker-controlled structure. The supplied value must be negative to index outside the _sRoutines structure while still satisfying the check at line 16.

As a foreword, consider that for our "proof-of-concept" we disabled both SMEP/SMAP and KASLR, so some additional voodoo tricks are required to get a fully weaponized exploit. Thus, our approach was actually very simple: we computed a value for the user-supplied parameter that allowed us to index a BluetoothMethod structure such that BluetoothMethod.function_ptr is a valid user-space address (where we placed our shellcode), while BluetoothMethod.num_arguments is an integer value less than 8 (to satisfy the check performed by SimpleDispatchWL() at line 22).

As shown in the C code fragment above, the user-supplied 32-bit value (user_param) is first casted to a 64-bit signed value, and then used as an index in _sRoutines. Each entry of the global _sRoutines array is 16-byte wide (two 8-byte values). These operations are implemented by the following assembly code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
; r12+70h points to the user-supplied index value
mov     ecx, [r12+70h]
mov     r13d, kIOReturnUnsupported
lea     rdx, _sRoutineCount
cmp     ecx, [rdx]
jge     fail
; Go on and fetch _sRoutine[method_index]
...
movsxd  rax, ecx             ; Sign extension to 64-bit value
shl     rax, 4               ; method_index *= sizeof(BluetoothMethod)
lea     rdx, _sRoutines
mov     esi, [rdx+rax+8]     ; esi = _sRoutines[method_index].num_arguments
cmp     esi, 7               ; Check method.num_arguments < 8
ja      loc_289BA
...

At a higher-level, the address of the BluetoothMethod structure fetched when processing an index value "user_param" is computed by the following formula:
struct_addr = (ext(user_param & 0xffffffff) * 16) + _sRoutine
Where ext() is the sign-extension operation (implemented by the movsxd instruction in the assembly code snipped above).

By solving this formula for user_param and searching inside the kernel address space, we found several candidate addresses that matched our criteria (i.e., a valid user-space pointer followed by an integer value < 8). The rest of the exploit is just a matter of mmap()'ing the shellcode at the proper user-space address, connecting to the IOBluetoothHCIController service and invoking the vulnerable method.

The source code for a (very rough) proof-of-concept implementation of the aforementioned exploit is available here, while the following figure shows the exploit "in action".

Execution of our "proof-of-concept" exploit

Patching

We verified the security issue both on OS X Mavericks 10.9.4 and 10.9.5 (MD5 hash values for the IOBluetoothFamily KEXT bundle on these two OS versions are 2a55b7dac51e3b546455113505b25e75 and b7411f9d80bfeab47f3eaff3c36e128f, respectively). After the release of OS X Yosemite (10.10), we noticed the vulnerability has been silently patched by Apple, with no mention about it in the security change log.

A side-by-side comparison between versions 10.9.x and 10.10 of IOBluetoothFamily confirms Apple has patched the device driver by rejecting negative values for the user-supplied index. In the figure below, the user-supplied index value is compared against _sRoutineCount (orange basic block). Yosemite adds an additional check to ensure the (signed) index value is non-negative (green basic block, on the right).

Comparison of the vulnerable OS X driver (Mavericks, on the left) and patched version (Yosemite, on the right)

Conclusions

We contacted Apple on October 20th, 2014, asking for their intention to back-port the security fix to OS X Mavericks. Unfortunately, we got no reply, so we decided to publicly disclose the details of this vulnerability: Yosemite has now been released since a while and is available for free for Apple customers; thus, we don’t think the public disclosure of this bug could endanger end-users.

Update (31/10/2014)

Yesterday evening, few hours after the publication of our blog post, we received a reply from Apple Product Security. They confirmed the bug has been fixed in Yosemite, and they are still evaluating whether the issue should be addressed in the previous OS versions as well.

Wednesday, March 26, 2014

Introducing QTrace, a "zero knowledge" system call tracer

It has been a while since my last post on this blog (one year!), but I have been quite busy both with my work and personal matters, including my wedding :-). Today, I would like to break the silence to introduce QTrace, a "zero knowledge" system call tracer I have just released open source.

TL;DR: QTrace is yet-another (Windows) system call tracer. Its peculiarity is that it requires no information about the structure of system call arguments: the tracer can infer their format automagically, by observing kernel memory access patterns. QTrace also includes a taint-tracking module to identify data dependencies between system calls.

Please, not another syscall tracer!

Some time ago I was writing an IOCTL fuzzer to test a Windows device driver. The fuzzer itself was pretty standard stuff: intercept the IOCTLs issued by the driver, fuzz the input buffer and monitor what happens. Unfortunately, blindly fuzzing IOCTL input buffers, without any idea of their format is usually ineffective, especially when facing structured arguments, and can reveal only shallow bugs. The typical solution is to undertake a boring reverse engineering session to determine the structure of the input data the kernel driver is expecting, and adapt the fuzzer accordingly. A somehow similar problem arose when I was developing WUSSTrace, a user-space syscall tracer for Windows. That time we relied on some preprocessor macros to recursively serialize system call arguments with as little code as possible. This approach was great to save some coding, but we still had to provide the tracer with the prototypes for all Windows system calls, together with the definitions for all the data types used by their arguments. Even if some of them are well-documented, others aren't, especially when moving to GUI (i.e., win32k.sys) syscalls.

But wouldn't it be nice if it was possible to automatically infer the format of system calls input arguments, in a (almost) OS-independent manner? This is exactly what QTrace tries to do!

QTrace in a nutshell

QTrace is a "zero knowledge" system call tracer. Its main characteristic is that system call arguments are dumped without the need to instruct the tracer about their structure. The only information explicitly provided to QTrace are the names of the system calls, but this is required just for "pretty printing" purposes (saying "NtOpenFile" is way better than just "0xb3"). This also makes QTrace almost OS-independent, at least when moving to different Windows versions.

The basic idea behind QTrace is that arguments structures can be determined by observing kernel memory access patterns: during the execution of a system call, if the kernel accesses a user-space memory address, then this location must belong to a system call argument, with only few exceptions (credits for this nice idea go to Lorenzo). Similarly, user-space data pointers can be recognized by observing the kernel accessing a location whose address has also been previously read from user-space.

Practically speaking, QTrace is implemented on top of QEMU and includes two main modules: the system call tracer itself and a dynamic taint-tracking engine. The former implements the "zero knowledge" syscall tracing technique just described, while the latter is used to track data dependencies (use/def) between system call arguments (more about this later). Traced system calls, together with taint information, are serialized to a protobuf stream and can be parsed off-line. QTrace also includes some basic Python post-processing tools to parse and display syscall traces in a human-readable form; recorded system calls can be even re-executed.

For the rest of this post we are assuming we are dealing with a 32-bit Windows system, but the same approach can be adapted to 64-bit environments as well. 

Example: Syscall tracing

Imagine the Windows kernel is reading a UNICODE_STRING syscall argument located at address 0x0205e2ac. The definition for this structure is provided below; inline comments specify field offsets and sample values.
typedef struct _UNICODE_STRING {
  USHORT Length;          // Offset 0, value: 0x0042
  USHORT MaximumLength;   // Offset 2, value: 0x021a
  PWSTR  Buffer;          // Offset 4, value: 0x02bd11a0
} UNICODE_STRING, *PUNICODE_STRING;
To access the Unicode data buffer (field "Buffer") the kernel will first access user-space address 0x0205e2b0 (corresponding to 0x0205e2ac+4) and read a pointer from there (value 0x02bd11a0). Then, kernel will start accessing 0x42 bytes starting at address 0x02bd11a0. By observing this memory access pattern, we can reconstruct the overall layout of the UNICODE_STRING parameter.

To give an idea about the information that can be inferred through this approach, figure below shows an excerpt of a sample QTrace trace file from a Windows 7 system. In this case, process explorer.exe executed a NtOpenFile() system call, passing six arguments. As an example, the third argument (at 0x0205e258) is a pointer to a structure located at address 0x0205e274; this structure includes a data pointer (at offset 8) to 0x0205e2ac, which in turn has a data pointer (at offset 4) pointing to the Unicode string "\??\C:\Windows\System32\pnpui.dll".

Sample QTrace log file for a NtOpenFile system call (Windows 7)
We can assess the correctness of this information by checking Microsoft documentation for the prototype of this system call:
NTSTATUS ZwOpenFile(
  _Out_  PHANDLE FileHandle,
  _In_   ACCESS_MASK DesiredAccess,
  _In_   POBJECT_ATTRIBUTES ObjectAttributes,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG ShareAccess,
  _In_   ULONG OpenOptions
);
Third parameter of NtOpenFile() is actually a pointer to a OBJECT_ATTRIBUTES structure, which contains a UNICODE_STRING (at offset 8), that in turn contains a wide-character buffer (at offset 4) that provides the name of the file to be opened. Except for parameter names (which of course cannot be guessed), inferred arguments structure matches the official documentation. Obviously things are slightly more complicated than this, as there are some corner-cases that must be considered, but this should give a rough idea of the approach.

Example: Taint-tracking

Briefly, in our context system call B depends on system call A if one of the output arguments of A is eventually used as an input argument for B. The reason why we could be interested in recording this kind of information is that syscall B cannot be re-executed alone, as it probably leverages some resources created by system call A. Details about the dynamic taint-tracking module will be presented in a future blog post.

A very simple example of a data dependency between two system calls is provided by the next figure. Here NtQuerySymbolicLinkObject() operates on the handle of a symbolic link object; this handle value is provided through its first argument. As can be seen from the "taint" column for this system call, the HANDLE argument is tainted with label 257 (the one highlighted in red): this taint label identifies the system call that opened this handle, in this case NtOpenSymbolicLinkObject().

Data dependency between NtOpenSymbolicLinkObject() and NtQuerySymbolicLinkObject() system calls; the former defines a handle value (0x00000010) that is then used by the latter

Conclusions

This post just sketched out the basic idea behind QTrace and its architecture. QTrace is now available open source, but do not expect it to be bug-free ;-) In some future blog posts I will discuss QTrace internals more in detail, describing the syscall tracer, the taint-tracking engine and providing some use cases.