Monday, January 12, 2015

Time to fill OS X (Blue)tooth: Local privilege escalation vulnerabilities in Yosemite

(This post is a joint work with @joystick, see also his blog here)

Motivated by our previous findings, we performed some more tests on service IOBluetoothHCIController of the latest version of Mac OS X (Yosemite 10.10.1), and we found five additional security issues. The issues have been reported to Apple Security and, since the deadline we agreed upon with them expired, we now disclose details & PoCs for four of them (the last one was notified few days later and is still under investigation by Apple). All the issues are in class IOBluetoothHCIController, implemented in the IOBluetoothFamily kext (md5 e4123caff1b90b81d52d43f9c47fec8f).

Issue 1 (crash-issue1.c)

Many callback routines handled by IOBluetoothHCIController blindly dereference pointer arguments without checking them. The caller of these callbacks, IOBluetoothHCIUserClient::SimpleDispatchWL(), may actually pass NULL pointers, that are eventually dereferenced.

More precisely, every user-space argument handled by SimpleDispatchWL() consists of a value and a size field (see crash-issue1.c for details). When a user-space client provides an argument with a NULL value but a large size, a subsequent call to IOMalloc(size) fails, returning a NULL pointer that is eventually passed to callees, causing the NULL pointer dereference.

The PoC we provide targets method DispatchHCICreateConnection(), but the very same approach can be used to cause a kernel crash using other callback routines (basically any other callback that receives one or more pointer arguments). At first, we ruled out this issue as a mere local DoS. However, as discussed here, Yosemite only partially prevents mapping the NULL page from user-space, so it is still possible to exploit NULL pointer dereferences to mount LPE attacks. For instance, the following code can be used to map page zero:

Mac:tmp $ cat zeropage.c
#include <mach/mach.h>
#include <mach/mach_vm.h>
#include <mach/vm_map.h>
#include <stdio.h>

int main(int argc, char **argv)
{
  mach_vm_address_t addr = 0;
  vm_deallocate(mach_task_self(), 0x0, 0x1000);
  int r = mach_vm_allocate(mach_task_self(), &addr, 0x1000, 0);
  printf("%08llx %d\n", addr, r);
  *((uint32_t *)addr) = 0x41414141;
  printf("%08x\n", *((uint32_t *)addr));
}
Mac:tmp $ llvm-gcc -Wall -o zeropage{,.c} -Wl,-pagezero_size,0 -m32
Mac:tmp $ ./zeropage
00000000 0
41414141
Mac:tmp $

Trying the same without the -m32 flag results in the 64-bit Mach-O being blocked at load time by the OS with message "Cannot enforce a hard page-zero for ./zeropage" (unless you do it as "root", but then what’s the point?).

Issue 2 (crash-issue2.c)

As shown in the screenshot below, IOBluetoothHCIController::BluetoothHCIChangeLocalName() is affected by an "old-school" stack-based buffer overflow, due to a bcopy(src, dest, strlen(src)) call where src is fully controlled by the attacker. To the best of our knowledge, this bug cannot be directly exploited due to the existing stack canary protection. However, it may still be useful to mount a LPE attack if used in conjunction with a memory leak vulnerability, leveraged to disclose the canary value.

Issue 2, a plain stack-based buffer overflow

Issue 3 (crash-issue3.c)

IOBluetoothHCIController::TransferACLPacketToHW() receives as an input parameter a pointer to an IOMemoryDescriptor object. The function carefully checks that the supplied pointer is non-NULL; however, regardless of the outcome of this test, it then dereferences the pointer (see the figure below, the attacker-controlled input parameter is stored in register r15). The IOMemoryDescriptor object is created by the caller (DispatchHCISendRawACLData()) using the IOMemoryDescriptor::withAddress() constructor. As this constructor is provided with a user-controlled value, it may fail and return a NULL pointer. See Issue 1 discussion regarding the exploitability of NULL pointer dereferences on Yosemite.

Issue 3, the module checks if r15 is NULL, but dereferences it anyway

Issue 4 (lpe-issue1.c)

In this case, the problem is due to a missing sanity check on the arguments of the following function:

IOReturn BluetoothHCIWriteStoredLinkKey(
                           uint32_t req_index, 
                           uint32_t num_of_keys, 
                           BluetoothDeviceAddress *p_device_addresses, 
                           BluetoothKey *p_bluetooth_keys, 
                           BluetoothHCINumLinkKeysToWrite *outNumKeysWritten);

The first parameter, req_index, is used to find an HCI Request in the queue of allocated HCI Requests (thus this exploit requires first to fill this queue with possibly fake requests). The second integer parameter (num_of_keys) is used to calculate the total size of the other inputs, respectively pointed by p_device_addresses and p_bluetooth_keys. As shown in the screenshot below, these values are not checked before being passed to function IOBluetoothHCIController::SendHCIRequestFormatted(), which has the following prototype:

IOReturn SendHCIRequestFormatted(uint32_t req_index, uint16_t inOpCode,
                                 uint64_t outResultsSize,
                                 void *outResultBuf,
                                 const char *inFormat, ...);


Issue 4, an exploitable buffer overflow (click to enlarge)

The passed format string "HbbNN" will eventually cause size_of_addresses bytes to be copied from p_device_addresses to outResultBuf in reverse order (the 'N' format consumes two arguments, the first is a size, the second a pointer to read from). If the calculated size_of_addresses is big enough (i.e., if we provide a big enough num_of_keys parameter), the copy overflows outResultBuf, thrashing everything above it, including a number of function pointers in the vtable of a HCI request object. These pointers are overwritten with attacker-controlled data (i.e., those pointed by p_bluetooth_keys) and called before returning to userspace, thus we can divert the execution wherever we want.

As a PoC, lpe-issue1.c exploits this bug and attempts to call a function located at the attacker-controller address 0x4141414142424242. Please note that the attached PoC requires some more tuning before it can cleanly return to user-space, since more than one vtable pointer is corrupted during the overflow and needs to be fixed with valid pointers.

Execution of our "proof-of-concept" exploit (Issue 4)

Notes

All the PoCs we provide in this post are not "weaponized", i.e., they do not contain a real payload, nor they attempt to bypass existing security features of Yosemite (e.g., kASLR and SMEP). If you’re interested in bypass techniques (as you probably are, if you made it here), Ian Beer of Google Project Zero covered pretty much all of it in a very thorough blog post. In this case, he used a leak in the IOKit registry to calculate kslide and defeat kASLR, while he used an in-kernel ROP-chain to bypass SMEP. More recently, @i0n1c posted here about how kASLR is fundamentally broken on Mac OS X at the moment.

Conclusions

Along the last issue identified, we shared with Apple our conclusions on this kext: according to the issues we identified, we speculate there are many other crashes and LPE vulnerabilities in it. Ours, however, is just a best-effort analysis done in our spare time, and given the very small effort that took us to identify the vulnerabilities, we would suggest a serious security evaluation of the whole kext code.

Disclosure timeline

  • 02/11: Notification of issues 1, 2 and 3.
  • 23/11: No answer received from Apple, notification of issue 4. As no answer was received since the first contact, propose December 2 as possible disclosure date.
  • 25/11: Apple answers requesting more time. We propose to move the disclosure date to January 12.
  • 27/11: Apple accepts the new deadline.
  • 05/12: Contact Apple asking for the status of the vulnerabilities.
  • 06/12: Apple says they're still "investigating the issue".
  • 23/12: Notification of a new issue (#5), proposing January 23 as a tentative disclosure date.
  • 06/01: Apple asks for more time for issue #5. We propose to move the disclosure date to February 23. We remind our intention to disclose the 4 previous issues on January 12.
  • 12/01: No answer from Apple, disclosing first 4 issues.

Thursday, October 30, 2014

Mac OS X local privilege escalation (IOBluetoothFamily)

(This post is a joint work with @joystick, see also his blog here)

Nowadays, exploitation of user-level vulnerabilities is becoming more and more difficult, because of the widespread diffusion of several protection methods, including ASLR, NX, various heap protections, stack canaries, and sandboxed execution. As a natural consequence, instead of extricating themselves with such a plethora of defensive methods, attackers prefer to take the “easy” way and started to move at the kernel-level, where sophisticated protection techniques are still not very common (indeed, things like as KASLR and SMEP are implemented only in the latest versions of the most popular OSes). This trend is also confirmed by the rising number of kernel-level vulnerabilities reported in the last few months in Windows, Linux, and OS X.

Following this trend, we recently looked at few OS X drivers (“KEXT”s) and found a integer signedness bug affecting service IOBluetoothHCIController (implemented by the IOBluetoothFamily KEXT). This vulnerability can be exploited by a local attacker to gain root privileges. The issue is present on the latest versions of OS X Mavericks (tested on 10.9.4 and 10.9.5), but has been “silently” patched by Apple in OS X Yosemite.

Vulnerability overview

In a nutshell, the bug lies in the IOBluetoothHCIUserClient::SimpleDispatchWL() function. The function eventually takes a user-supplied 32-bit signed integer value and uses it to index a global array of structures containing a function pointer. The chosen function pointer is finally called. As the reader can easily imagine, SimpleDispatchWL() fails at properly sanitizing the user-supplied index, thus bad things may happen if a malicious user is able to control the chosen function pointer.

More in detail, the vulnerable part of the function is summarized in the pseudocode below. At line 14, the user-supplied 32-bit integer is casted to a 64-bit value. Then, the "if" statement at line 16 returns an error if the casted (signed) value is greater than the number of methods available in the global _sRoutines array; obviously, due to the signed comparison, any negative value for the method_index variable will pass this test. At line 20 method_index is used to access the _sRoutines array, and the retrieved callback is finally called at line 23.


 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
typedef struct {
  void (*function_pointer)();
  uint64 num_arguments;
} BluetoothMethod;

BluetoothMethod _sRoutines[] = {
  ...
};

uint64 _sRoutineCount = sizeof(_sRoutines)/sizeof(BluetoothMethod);

IOReturn IOBluetoothHCIUserClient::SimpleDispatchWL(IOBluetoothHCIDispatchParams *params) {
  // Here "user_param" is a signed 32-bit integer parameter
  int64 method_index = (int64) user_param;

  if (method_index >= _sRoutineCount) {
    return kIOReturnUnsupported;
  }

  BluetoothMethod method = _sRoutines[method_index];
  ...
  if (method.num_arguments < 8) {
       method.function_pointer(...);
  }
  ...  
}

Exploitation details

Exploitation of this vulnerability is just a matter of supplying the proper negative integer value in order to make IOBluetoothFamily index the global _sRoutines structure out of its bounds, and to fetch an attacker-controlled structure. The supplied value must be negative to index outside the _sRoutines structure while still satisfying the check at line 16.

As a foreword, consider that for our "proof-of-concept" we disabled both SMEP/SMAP and KASLR, so some additional voodoo tricks are required to get a fully weaponized exploit. Thus, our approach was actually very simple: we computed a value for the user-supplied parameter that allowed us to index a BluetoothMethod structure such that BluetoothMethod.function_ptr is a valid user-space address (where we placed our shellcode), while BluetoothMethod.num_arguments is an integer value less than 8 (to satisfy the check performed by SimpleDispatchWL() at line 22).

As shown in the C code fragment above, the user-supplied 32-bit value (user_param) is first casted to a 64-bit signed value, and then used as an index in _sRoutines. Each entry of the global _sRoutines array is 16-byte wide (two 8-byte values). These operations are implemented by the following assembly code:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
; r12+70h points to the user-supplied index value
mov     ecx, [r12+70h]
mov     r13d, kIOReturnUnsupported
lea     rdx, _sRoutineCount
cmp     ecx, [rdx]
jge     fail
; Go on and fetch _sRoutine[method_index]
...
movsxd  rax, ecx             ; Sign extension to 64-bit value
shl     rax, 4               ; method_index *= sizeof(BluetoothMethod)
lea     rdx, _sRoutines
mov     esi, [rdx+rax+8]     ; esi = _sRoutines[method_index].num_arguments
cmp     esi, 7               ; Check method.num_arguments < 8
ja      loc_289BA
...

At a higher-level, the address of the BluetoothMethod structure fetched when processing an index value "user_param" is computed by the following formula:
struct_addr = (ext(user_param & 0xffffffff) * 16) + _sRoutine
Where ext() is the sign-extension operation (implemented by the movsxd instruction in the assembly code snipped above).

By solving this formula for user_param and searching inside the kernel address space, we found several candidate addresses that matched our criteria (i.e., a valid user-space pointer followed by an integer value < 8). The rest of the exploit is just a matter of mmap()'ing the shellcode at the proper user-space address, connecting to the IOBluetoothHCIController service and invoking the vulnerable method.

The source code for a (very rough) proof-of-concept implementation of the aforementioned exploit is available here, while the following figure shows the exploit "in action".

Execution of our "proof-of-concept" exploit

Patching

We verified the security issue both on OS X Mavericks 10.9.4 and 10.9.5 (MD5 hash values for the IOBluetoothFamily KEXT bundle on these two OS versions are 2a55b7dac51e3b546455113505b25e75 and b7411f9d80bfeab47f3eaff3c36e128f, respectively). After the release of OS X Yosemite (10.10), we noticed the vulnerability has been silently patched by Apple, with no mention about it in the security change log.

A side-by-side comparison between versions 10.9.x and 10.10 of IOBluetoothFamily confirms Apple has patched the device driver by rejecting negative values for the user-supplied index. In the figure below, the user-supplied index value is compared against _sRoutineCount (orange basic block). Yosemite adds an additional check to ensure the (signed) index value is non-negative (green basic block, on the right).

Comparison of the vulnerable OS X driver (Mavericks, on the left) and patched version (Yosemite, on the right)

Conclusions

We contacted Apple on October 20th, 2014, asking for their intention to back-port the security fix to OS X Mavericks. Unfortunately, we got no reply, so we decided to publicly disclose the details of this vulnerability: Yosemite has now been released since a while and is available for free for Apple customers; thus, we don’t think the public disclosure of this bug could endanger end-users.

Update (31/10/2014)

Yesterday evening, few hours after the publication of our blog post, we received a reply from Apple Product Security. They confirmed the bug has been fixed in Yosemite, and they are still evaluating whether the issue should be addressed in the previous OS versions as well.

Wednesday, March 26, 2014

Introducing QTrace, a "zero knowledge" system call tracer

It has been a while since my last post on this blog (one year!), but I have been quite busy both with my work and personal matters, including my wedding :-). Today, I would like to break the silence to introduce QTrace, a "zero knowledge" system call tracer I have just released open source.

TL;DR: QTrace is yet-another (Windows) system call tracer. Its peculiarity is that it requires no information about the structure of system call arguments: the tracer can infer their format automagically, by observing kernel memory access patterns. QTrace also includes a taint-tracking module to identify data dependencies between system calls.

Please, not another syscall tracer!

Some time ago I was writing an IOCTL fuzzer to test a Windows device driver. The fuzzer itself was pretty standard stuff: intercept the IOCTLs issued by the driver, fuzz the input buffer and monitor what happens. Unfortunately, blindly fuzzing IOCTL input buffers, without any idea of their format is usually ineffective, especially when facing structured arguments, and can reveal only shallow bugs. The typical solution is to undertake a boring reverse engineering session to determine the structure of the input data the kernel driver is expecting, and adapt the fuzzer accordingly. A somehow similar problem arose when I was developing WUSSTrace, a user-space syscall tracer for Windows. That time we relied on some preprocessor macros to recursively serialize system call arguments with as little code as possible. This approach was great to save some coding, but we still had to provide the tracer with the prototypes for all Windows system calls, together with the definitions for all the data types used by their arguments. Even if some of them are well-documented, others aren't, especially when moving to GUI (i.e., win32k.sys) syscalls.

But wouldn't it be nice if it was possible to automatically infer the format of system calls input arguments, in a (almost) OS-independent manner? This is exactly what QTrace tries to do!

QTrace in a nutshell

QTrace is a "zero knowledge" system call tracer. Its main characteristic is that system call arguments are dumped without the need to instruct the tracer about their structure. The only information explicitly provided to QTrace are the names of the system calls, but this is required just for "pretty printing" purposes (saying "NtOpenFile" is way better than just "0xb3"). This also makes QTrace almost OS-independent, at least when moving to different Windows versions.

The basic idea behind QTrace is that arguments structures can be determined by observing kernel memory access patterns: during the execution of a system call, if the kernel accesses a user-space memory address, then this location must belong to a system call argument, with only few exceptions (credits for this nice idea go to Lorenzo). Similarly, user-space data pointers can be recognized by observing the kernel accessing a location whose address has also been previously read from user-space.

Practically speaking, QTrace is implemented on top of QEMU and includes two main modules: the system call tracer itself and a dynamic taint-tracking engine. The former implements the "zero knowledge" syscall tracing technique just described, while the latter is used to track data dependencies (use/def) between system call arguments (more about this later). Traced system calls, together with taint information, are serialized to a protobuf stream and can be parsed off-line. QTrace also includes some basic Python post-processing tools to parse and display syscall traces in a human-readable form; recorded system calls can be even re-executed.

For the rest of this post we are assuming we are dealing with a 32-bit Windows system, but the same approach can be adapted to 64-bit environments as well. 

Example: Syscall tracing

Imagine the Windows kernel is reading a UNICODE_STRING syscall argument located at address 0x0205e2ac. The definition for this structure is provided below; inline comments specify field offsets and sample values.
typedef struct _UNICODE_STRING {
  USHORT Length;          // Offset 0, value: 0x0042
  USHORT MaximumLength;   // Offset 2, value: 0x021a
  PWSTR  Buffer;          // Offset 4, value: 0x02bd11a0
} UNICODE_STRING, *PUNICODE_STRING;
To access the Unicode data buffer (field "Buffer") the kernel will first access user-space address 0x0205e2b0 (corresponding to 0x0205e2ac+4) and read a pointer from there (value 0x02bd11a0). Then, kernel will start accessing 0x42 bytes starting at address 0x02bd11a0. By observing this memory access pattern, we can reconstruct the overall layout of the UNICODE_STRING parameter.

To give an idea about the information that can be inferred through this approach, figure below shows an excerpt of a sample QTrace trace file from a Windows 7 system. In this case, process explorer.exe executed a NtOpenFile() system call, passing six arguments. As an example, the third argument (at 0x0205e258) is a pointer to a structure located at address 0x0205e274; this structure includes a data pointer (at offset 8) to 0x0205e2ac, which in turn has a data pointer (at offset 4) pointing to the Unicode string "\??\C:\Windows\System32\pnpui.dll".

Sample QTrace log file for a NtOpenFile system call (Windows 7)
We can assess the correctness of this information by checking Microsoft documentation for the prototype of this system call:
NTSTATUS ZwOpenFile(
  _Out_  PHANDLE FileHandle,
  _In_   ACCESS_MASK DesiredAccess,
  _In_   POBJECT_ATTRIBUTES ObjectAttributes,
  _Out_  PIO_STATUS_BLOCK IoStatusBlock,
  _In_   ULONG ShareAccess,
  _In_   ULONG OpenOptions
);
Third parameter of NtOpenFile() is actually a pointer to a OBJECT_ATTRIBUTES structure, which contains a UNICODE_STRING (at offset 8), that in turn contains a wide-character buffer (at offset 4) that provides the name of the file to be opened. Except for parameter names (which of course cannot be guessed), inferred arguments structure matches the official documentation. Obviously things are slightly more complicated than this, as there are some corner-cases that must be considered, but this should give a rough idea of the approach.

Example: Taint-tracking

Briefly, in our context system call B depends on system call A if one of the output arguments of A is eventually used as an input argument for B. The reason why we could be interested in recording this kind of information is that syscall B cannot be re-executed alone, as it probably leverages some resources created by system call A. Details about the dynamic taint-tracking module will be presented in a future blog post.

A very simple example of a data dependency between two system calls is provided by the next figure. Here NtQuerySymbolicLinkObject() operates on the handle of a symbolic link object; this handle value is provided through its first argument. As can be seen from the "taint" column for this system call, the HANDLE argument is tainted with label 257 (the one highlighted in red): this taint label identifies the system call that opened this handle, in this case NtOpenSymbolicLinkObject().

Data dependency between NtOpenSymbolicLinkObject() and NtQuerySymbolicLinkObject() system calls; the former defines a handle value (0x00000010) that is then used by the latter

Conclusions

This post just sketched out the basic idea behind QTrace and its architecture. QTrace is now available open source, but do not expect it to be bug-free ;-) In some future blog posts I will discuss QTrace internals more in detail, describing the syscall tracer, the taint-tracking engine and providing some use cases.

Tuesday, March 19, 2013

Owning Samsung phones for fun (...but with no profit :-))

I was planning to open a blog since some months, but I decided to do it only now, to summarize some of the findings of a quick look I gave at a couple of Samsung Android devices.

But let's start at the beginning. During last Christmas holidays I finally had some free time to try to better understand the inner workings of some Samsung devices, focusing on the manufacturer's customizations to the Android system. I confess I was quite surprised to see how many Samsung applications are included in the original firmware image, including several customizations to lots of Android core packages.