reversing
C: Functions and Core Concepts
- C is a foundational programming language used across system programming and beyond.
- Functions as code blocks: take inputs (parameters) and return outputs.
- Example provided:
int doubleNum(int a) {
    // Take a number and multiply it by 2
    int x = a * 2;
    return x;
}
- Key parts of a function:
- Function block: carries out a logical function (e.g., multiply by 2).
- Name: e.g., `doubleNum`.
- Parameters/Inputs: e.g., `int a` means the function takes one integer input.
- Return type: the first `int` in `int doubleNum(int a)` indicates an integer output.
- Local variables: `x` stores intermediate results; here `x = 2 * a`.
- Conceptual flow:
- High-level code is translated by a compiler into a lower-level representation, eventually turning into machine code.
- The typical path is: High-level language (C) -> compiler -> assembly language -> machine code (1’s and 0’s).
- Returning values and commenting:
- The `return` statement signals the output of the function (e.g., `return x;`).
- Comments (denoted by `//`) explain logic and are ignored by the compiler.
- Datatypes and variables:
- Datatype describes the kind of data a variable holds.
- Parameters and local variables store inputs and intermediate results.
- Example highlights: `int a` (parameter), `int x` (local variable).
- Practical note: the same function concepts apply across many languages; basics like parameters, return values, and comments are universal.
- Concept: Abstraction vs. performance trade-off
- Higher-level language features abstract away memory management and details.
- Understanding low-level steps (like compilation and assembly) helps optimize performance and debugging.
- Formulas and symbolic representations:
- The multiplication in the example can be summarized as $x = 2a$ inside the function.
C: Compiling and Running a C Program with GCC
- Compiler role:
- GCC (the GNU Compiler Collection, originally the GNU C Compiler) translates C code into executable machine code through stages such as preprocessing, compiling, assembling, and linking.
- Basic syntax to compile:
$ gcc -Wall filename.c -o filename
- `filename.c` is the input C source file.
- `-Wall` enables a broad set of warning messages.
- `-o` specifies the name of the output executable. The transcript states that the next token is the name of the executable object, e.g. `filename.o` (note: in practice, `-o` names the final executable, commonly `filename`).
- Running the program:
$ ./filename
- The `./` prefix runs the executable in the current directory.
- The program can take inputs and produce outputs as defined by the code.
- Key concepts explained:
- The compiler chain: C code → assembly language → object code → executable.
- Compiler flags control warnings, optimizations, and output naming, so it pays to understand them.
Assembly: Core Concepts and Syntax
- Why assembly?
- Assembly provides control with less abstraction, potentially enabling faster code and more direct hardware interaction.
- What assembly looks like (basic structure):
- Components:
- label (optional): a programmer-chosen name for a statement.
- name: the operation (e.g., ADD, MOV).
- operands: optional list of arguments to the operation.
- comment: denoted by `;` to explain intent.
- Registers:
- CPUs provide a small set of fast storage locations called registers.
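- Conceptually, the transcript describes 32 registers, each a 32-bit storage location that holds data for fast access. The actual count and width depend on the architecture; x86-64, for example, has 16 general-purpose 64-bit registers.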
- Rationale: main memory access is slow; registers speed up data access and manipulation.
Assembly: A Simple Addition Example and Instructions
- Common instructions in x86-64 (short overview):
- `MOV Dest, Source` copies Source into Dest (e.g., register-to-register or immediate-to-register).
- `ADD Dest, Source` adds Source to Dest and stores the result in Dest.
- Example that adds two numbers (described in the transcript):
; Example: add two numbers
MOV RAX, 5 ; load 5 into RAX
MOV RBX, 4 ; load 4 into RBX
ADD RAX, RBX ; RAX = RAX + RBX = 9
- In the transcript's example, `rax` and `rbx` hold the operands, `rax` is the destination, and the result ends up in `rax`.
- Note on labels and control flow:
- The transcript mentions that the example did not use labels.
- Labels combined with jump instructions such as `JMP` enable loops, functions, and recursion in assembly.
Endianness: Big Endian vs Little Endian
- Endianness defines how multi-byte data is stored in memory.
- Example word: a 16-bit value `0xFEED` stored starting at address `0x4000`.
- Byte-level storage considering 2 bytes per 16-bit word:
- Big Endian (most significant byte first):
- Memory at `0x4000` = `FE`
- Memory at `0x4001` = `ED`
- Little Endian (least significant byte first):
- Memory at `0x4000` = `ED`
- Memory at `0x4001` = `FE`
- Practical notes:
- Big Endian is commonly used in networking applications.
- Little Endian is used by many processors, including the x86 and x86-64 families.
- Key mathematical statements:
- Number of bytes in a 16-bit word: $16 / 8 = 2$.
- $1\ \text{byte} = 8\ \text{bits}$.
- Word representation: the two-byte representation depends on endianness.
Python: Overview and Our Foundational Context
- Python is highlighted as a high-level language that is easy to learn and versatile for many tasks.
- The transcript references CMU’s introductory CS course using Python as a resource for basics in Python and computer science.
- Clarifying note about terminology:
- In the context of the transcript, a “word” is described as a hex number; specifically, a 16-bit word is a value 16 bits wide, usually written in hexadecimal.
- Related fact: a 16-bit word corresponds to 4 hexadecimal digits (because each hex digit represents 4 bits): $16 / 4 = 4$.
- Practical implication: Python’s high level of abstraction contrasts with C and Assembly, offering easier programming at the cost of less low-level control.
Cross-cutting Connections and Relevance
- Foundational concepts that recur across languages:
- Functions (C) vs. procedures in assembly: input handling, intermediate storage, and return values.
- Compilation vs. interpretation: C requires compilation to machine code; Python is typically interpreted or JIT-compiled depending on implementation.
- Memory storage considerations: high-level languages abstract memory, while assembly requires explicit memory/register handling.
- Real-world relevance:
- Understanding compilation, linking, and the role of the GCC compiler is crucial for debugging and optimization.
- Endianness matters for network protocols, file formats, and hardware interoperability.
- Basic assembly knowledge helps with performance-critical sections and reverse engineering challenges.
Ethical, Philosophical, and Practical Implications
- The trade-off between abstraction and control:
- Higher-level languages improve productivity but may conceal inefficiencies; low-level languages offer control at the cost of complexity.
- Portability vs. optimization:
- Code optimized for a specific architecture (assembly) may not be portable to others.
- Security considerations:
- Understanding how memory is managed (as in C/assembly) helps prevent common issues like buffer overflows and memory corruption.
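Notes on formulas and notation used in these notes:
- Function result relation: $x = 2a$, where $a$ is the input to `doubleNum`.
- Data size and memory relationships: $1\ \text{byte} = 8\ \text{bits}$, so a 16-bit word occupies $16 / 8 = 2$ bytes.
- $32\ \text{bits} = 4\ \text{bytes}$ per register if interpreted as 32-bit storage in some contexts, with the transcript noting 32 registers, which together would store $32 \times 32 = 1024$ bits (128 bytes) of information.
- Word and hex-digits relationship:
- $16 / 4 = 4$ hex digits per 16-bit word, and the example value can be represented in hex as `0xFEED`.
- Addressing examples use hexadecimal notation such as `0xFEED` and memory addresses like `0x4000`.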