Ivan Chew, Cybernetic Hand (1990)
The 32-bit compatibility mode in 64-bit Linux kernels (or in any kernel for that matter) is a little bit scary. Not just because it's an increased attack surface versus having purely 32-bit or purely 64-bit modes, but because of the type of input processing that has to be performed by any such compatibility layer. It invariably involves a significant amount of subtle bit wrangling between 32/64-bit values, using primitives that I'd argue most programmers aren't normally exposed to. The possibility of misuse and abuse is very real.
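As a small illustration of that 32/64-bit bit wrangling, the sketch below is plain userspace C, not kernel code, and the helper names are hypothetical. It shows how the same 32-bit pattern becomes two very different 64-bit values depending on whether it is widened as signed or as unsigned.

```c
#include <stdint.h>

/* Widen a 32-bit value received from a 32-bit caller to 64 bits.
 * Signed widening copies bit 31 into the upper 32 bits (sign extension).
 * Note: converting values above INT32_MAX to int32_t is
 * implementation-defined before C23; GCC and Clang wrap modulo 2^32. */
static int64_t widen_signed(uint32_t raw)
{
    return (int64_t)(int32_t)raw;
}

/* Unsigned widening leaves the upper 32 bits zero (zero extension). */
static uint64_t widen_unsigned(uint32_t raw)
{
    return (uint64_t)raw;
}
```

A length field of 0xfffffff0 becomes -16 when sign-extended but roughly 4 GB when zero-extended, and 0x80000000 sign-extends to 0xffffffff80000000, an address in the kernel half of the address space. Mixing the two conventions is where this class of bug tends to hide.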

I found this out over the last couple of weeks when I discovered two distinctly exploitable local privilege escalation vulnerabilities in x86_64 kernels with compatibility mode enabled (which, unfortunately, is the case in every 64-bit kernel configuration I've come across).

I found the first vulnerability after Tavis Ormandy showed me a serious information leak bug he had found. One of the curiosities of his bug was a multiplication integer overflow in the argument to access_ok, the macro used to check whether a range of memory lies safely within the bounds of userspace. Although this probably wouldn't lead to a security bug in most cases, my theory was that it was indicative of some fairly non-defensive programming habits, so I set about enumerating every case of this behavior I could find.
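The overflow pattern can be sketched in userspace C. In this sketch range_ok is a hypothetical stand-in for access_ok (not the kernel's implementation), and USER_LIMIT is an assumed upper bound for user addresses. The naive caller computes count * size, which can wrap around and pass the range check; the defensive version rejects the overflowing multiplication first.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed top of the user address range (illustrative only). */
#define USER_LIMIT 0x00007ffffffff000ULL

/* Hypothetical stand-in for access_ok: is [addr, addr + size) below USER_LIMIT? */
static bool range_ok(uint64_t addr, uint64_t size)
{
    return size <= USER_LIMIT && addr <= USER_LIMIT - size;
}

/* Non-defensive: count * size may wrap modulo 2^64 before it is checked. */
static bool array_ok_naive(uint64_t addr, uint64_t count, uint64_t size)
{
    return range_ok(addr, count * size);
}

/* Defensive: reject element counts whose byte length would overflow. */
static bool array_ok_checked(uint64_t addr, uint64_t count, uint64_t size)
{
    if (size != 0 && count > UINT64_MAX / size)
        return false;
    return range_ok(addr, count * size);
}
```

With count = 2^63 + 1 and size = 2, the product wraps to 2, so array_ok_naive accepts a range that is really far larger than the address space, while array_ok_checked rejects it.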

This led to an interesting case in the video4linux subsystem where I thought there was a potential heap overflow. I wrote up a test case that triggered a kernel oops in the compat ioctl. After a little bit of idle day-dreaming about the feasibility of a Chat Roulette worm, I did some analysis on the crash and discovered that it wasn't what I thought it was; in fact, it wasn't even close.

I had actually triggered a stack pointer underflow - that is, my userspace stack pointer had been decremented into kernel space, and then the kernel was trying to write a value to the new "stack" location. The problem started in an allocation routine used by the compatibility layer, from "arch/x86/include/asm/compat.h":

```c
static inline void __user *compat_alloc_user_space(long len)
{
        struct pt_regs *regs = task_pt_regs(current);
        return (void __user *)regs->sp - len;
}
```
As you can see, no check is performed to ensure that the user space stack pointer doesn't underflow. If the compatibility layer uses the returned pointer without checking that it resides in user space, kernel memory corruption can occur.
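To see the underflow numerically, the sketch below models the allocator's arithmetic in userspace with the stack pointer as a plain integer. A 32-bit process can load any value into its stack register before making a system call, so a small sp is realistic. USER_LIMIT and alloc_checked are assumptions for illustration only, not the code from the eventual kernel fix.

```c
#include <stdint.h>

/* Assumed top of the 64-bit user address range (illustrative only). */
#define USER_LIMIT 0x00007ffffffff000ULL

/* Mirrors the unchecked allocator: the "allocation" is just the saved
 * user stack pointer minus len, computed modulo 2^64. */
static uint64_t alloc_unchecked(uint64_t sp, long len)
{
    return sp - (uint64_t)len;
}

/* Hypothetical checked variant: returns 0 (standing in for NULL) unless
 * the whole block [sp - len, sp) stays inside the user range. */
static uint64_t alloc_checked(uint64_t sp, long len)
{
    if (len < 0 || (uint64_t)len > sp || sp > USER_LIMIT)
        return 0;
    return sp - (uint64_t)len;
}
```

With sp = 0x1000 and len = 0x7fffffff, alloc_unchecked returns 0xffffffff80001001, a kernel address, whereas alloc_checked refuses. An ordinary request such as sp = 0xffffd000 and len = 0x100 yields 0xffffcf00 from both.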

In the end I found two cases where this could happen (and I probably missed some others). The first was the video4linux ioctl. The second case was in the IP multicast getsockopt compat layer, which had very good properties for exploitation. From compat_mc_getsockopt in "net/compat.c":

```c
kgf = compat_alloc_user_space(klen+sizeof(*optlen));
if (!access_ok(VERIFY_READ, gf32, __COMPAT_GF0_SIZE) ||
    __get_user(interface, &gf32->gf_interface) ||
    __get_user(fmode, &gf32->gf_fmode) ||
    __get_user(numsrc, &gf32->gf_numsrc) ||
    __put_user(interface, &kgf->gf_interface) ||
    __put_user(fmode, &kgf->gf_fmode) ||
    __put_user(numsrc, &kgf->gf_numsrc) ||
    ...)
        return -EFAULT;
```
The value of klen is a non-negative 32-bit integer supplied from user space, meaning the kgf pointer can be wrapped around to point somewhere high in kernel space. The gf32 pointer is a valid user space address, and the contents of the structure it points to are attacker-controlled. Note that __put_user, the "nocheck" variant of put_user, is used (meaning no further access_ok check is performed), so the fact that kgf points to kernel space doesn't matter; the controlled values will be written to the kgf structure.
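A userspace model of where kgf ends up for a given stack pointer and klen makes the wraparound concrete. It assumes sizeof(*optlen) is 4 (an int) and an illustrative USER_LIMIT for the user/kernel split; checked_store_allowed is a hypothetical model of the range test that put_user performs and __put_user skips. None of this is kernel code.

```c
#include <stdbool.h>
#include <stdint.h>

#define USER_LIMIT  0x00007ffffffff000ULL /* assumed user/kernel split */
#define OPTLEN_SIZE 4ULL                  /* assumed sizeof(*optlen), an int */

/* Where kgf lands: compat_alloc_user_space(klen + sizeof(*optlen)),
 * i.e. the stack pointer minus that length, modulo 2^64. */
static uint64_t kgf_address(uint64_t sp, int32_t klen)
{
    return sp - ((uint64_t)klen + OPTLEN_SIZE);
}

/* Model of a checked store (like put_user): only destinations whose
 * whole [dst, dst + size) range lies in user space are allowed. */
static bool checked_store_allowed(uint64_t dst, uint64_t size)
{
    return size <= USER_LIMIT && dst <= USER_LIMIT - size;
}
```

With sp = 0x1000 and klen = 0x7fffffff, kgf_address returns 0xffffffff80000ffd, which the checked model rejects. Because the real code path uses __put_user, there is no such rejection and the stores go ahead.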

This path allows an attacker to write a chosen value anywhere within the top 2^31 bytes (2 GB) of the 64-bit address space, which lies in kernel space. In practice, this seems to be more than enough for exploitation. My proof of concept overwrote the interrupt descriptor table, but there are likely other good options too.
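The "top 2^31 bytes" claim can be checked with a little modular arithmetic. The sketch below, again a userspace model with a hypothetical function name, inverts kgf = sp - (klen + 4): for any target in the highest 2 GB of the 64-bit address space it picks a 32-bit stack pointer and a non-negative klen that place kgf exactly on the target. It assumes, as described above, that any klen from 0 to INT_MAX is accepted and that sizeof(*optlen) is 4.

```c
#include <stdbool.h>
#include <stdint.h>

#define OPTLEN_SIZE 4ULL /* assumed sizeof(*optlen) */

/* Choose sp (a 32-bit stack pointer) and klen >= 0 such that
 * sp - (klen + 4) == target (mod 2^64). Returns false for targets
 * outside the top 2^31 bytes of the address space. */
static bool solve_klen(uint64_t target, uint32_t *sp_out, int32_t *klen_out)
{
    uint64_t dist = 0 - target; /* 2^64 - target, when target != 0 */
    if (target == 0 || dist > 0x80000000ULL)
        return false;

    /* sp = 0 works whenever dist >= 4; otherwise use sp = 4 so klen stays >= 0. */
    uint32_t sp = dist >= OPTLEN_SIZE ? 0 : 4;
    *sp_out = sp;
    *klen_out = (int32_t)(sp + dist - OPTLEN_SIZE); /* at most 2^31 - 4 */
    return true;
}
```

For the lowest address in the window, 0xffffffff80000000, this picks sp = 0 and klen = 0x7ffffffc. In practice the values are written at the offsets of the individual fields within the structure, so a real target would be adjusted by those field offsets.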

You can see the patch for CVE-2010-3081 here:


Part 2 covers a "rediscovered" vulnerability originally found by the late Wojciech "cliph" Purczynski.

- hawkes@inertiawar.com (@benhawkes)