CMPSC 311 – Bit/Byte Operations & Data Representation

Number Systems

Any base- $X$ positional system shares two core ideas:
- A finite set of digits ${0,1,\dots ,X-1}$ .
- Place values that grow as successive powers of the base: $X^{0},X^{1},X^{2},\dots$
Base- $10$ (decimal)
- Digits: $0$ – $9$ .
- Example decomposition: $123=3\times10^{0}+2\times10^{1}+1\times10^{2}=123$ .
Base- $2$ (binary)
- Digits: $0$ , $1$ .
- Example: $1011{2}=1\times2^{0}+1\times2^{1}+0\times2^{2}+1\times2^{3}=11{10}$ .
Base- $16$ (hexadecimal)
- Digits: $0$ – $9$ and $A$ – $F$ ( $A=10\dots F=15$ ).
- Example: $0xAFC=C\times16^{0}+F\times16^{1}+A\times16^{2}=2812_{10}$ .
- Practical value: compactly expresses binary (exactly four bits per hex digit).

Converting Decimal to Binary (Division Algorithm)

Repeatedly divide the decimal number $n$ by $2$ , recording remainders (rightmost bit first):
- Pseudocode:
1. while $n\neq0$ :
 - output $n\bmod2$ (this is the next binary digit).
 - set $n\leftarrow\lfloor n/2 \rfloor$ .
- Worked example (decimal $235$ ):
 | $n$ | remainder | resulting bit position |
 |------|-----------|------------------------|
 | $235$ | $1$ | $0$ |
 | $117$ | $1$ | $1$ |
 | $58$ | $0$ | $2$ |
 | $29$ | $1$ | $3$ |
 | $14$ | $0$ | $4$ |
 | $7$ | $1$ | $5$ |
 | $3$ | $1$ | $6$ |
 | $1$ | $1$ | $7$ |
- Reads upward to obtain $235{10}=11101011{2}$ .
Significance: identical in spirit to converting to any base; division by the base peels off least-significant digits.

Converting Decimal ↔ Hex and Binary ↔ Hex

Decimal $\rightarrow$ Hexadecimal
- Use analogous division algorithm but divide by $16$ .
- Example: $235_{10}$
- $235\div16=14$ remainder $11\,(B)$ → least-significant hex digit.
- $14\div16=0$ remainder $14\,(E)$ → next digit.
- Result: $235_{10}=0xEB$ .
Binary $\leftrightarrow$ Hexadecimal
- Group binary into nibbles (4-bit chunks) from the right, then translate each nibble to its hex equivalent.
- Example: $111010112 = 1110\;10112 = 0xE\,0xB = 0xEB$ .
Practical memory reading skill: quickly recognize that an 8-digit hex value (e.g., $0xDEADBEEF$ ) represents $8$ nibbles = $4$ bytes.

Conversion Summary Cheat-Sheet

Binary $\rightarrow$ Decimal: add $\sum b_i2^{i}$ .
Hex $\rightarrow$ Decimal: add $\sum h_i16^{i}$ .
Decimal $\rightarrow$ Binary: repeated division by $2$ .
Decimal $\rightarrow$ Hex: repeated division by $16$ or via decimal→binary then binary→hex.
Binary $\leftrightarrow$ Hex: chunk into 4-bit nibbles.

Byte Ordering (Endianness)

The question: which byte of a multi-byte word is stored at the lowest memory address?
Big-Endian (e.g., SPARC): most-significant byte gets the smallest (lowest) address.
Little-Endian (e.g., x86): least-significant byte gets the smallest address.
Example: integer $x=305419896_{10}=0x12345678$ located at address $0x100$ .
- Big-Endian layout (addresses increase → right):
  | $0x100$ | $0x101$ | $0x102$ | $0x103$ |
  |-----------|-----------|-----------|-----------|
  | $0x12$ | $0x34$ | $0x56$ | $0x78$ |
- Little-Endian:
  | $0x100$ | $0x101$ | $0x102$ | $0x103$ |
  |-----------|-----------|-----------|-----------|
  | $0x78$ | $0x56$ | $0x34$ | $0x12$ |
Practical implications
- Network protocols standardize on big-endian ("network order"), thus host/byte-order conversions (htons, ntohl, etc.) are essential.
- Misinterpreting endianness leads to subtle data corruption and security flaws.

Inspecting Byte Layout Programmatically

Sample C program (architecture detection):

#include <stdio.h>
#include <stdint.h>
void show_bytes(uint8_t *start, int len) {
  for (int i = 0; i < len; ++i)
    printf("%p\t0x%.2x\n", start + i, start[i]);
  printf("\n");
}
int main(void) {
  int a = 305419896;          // 0x12345678
  printf("a lives at address %p\n\n", &a);
  show_bytes((uint8_t *)&a, sizeof(int));
}

Example output on Linux/x86 (little-endian):
- Address $0x…74$ contains $0x78$ , then $0x56$ , $0x34$ , $0x12$ .
Conceptual lesson: by casting to $uint8_t *$ you can treat *any* C object as a raw byte array.

Boolean Algebra Refresher

Originated by George Boole; formalizes logical reasoning via algebra over ${0,1}$ .
Fundamental operators (with truth criteria):
- AND ( $A\&B$ ): $1$ only when both operands $1$ .
- OR ( $A|B$ ): $1$ when at least one operand $1$ .
- NOT ( $\sim A$ ): bit-wise complement, flips $0\leftrightarrow1$ .
- XOR ( $A\hat{}B$ ): $1$ when exactly one operand is $1$ (exclusive).
Digital circuits, database query planning, compiler optimizations all rely on these identities.

Bit-Level Operators in C

Operators: $\&$ , $|$ , $\sim$ , $\hat{}$ .
Defined for any integral type (signed or unsigned; char → long).
Operate bit-wise on each corresponding bit position.
Character-size examples:
- Complement: $\sim0x41 \rightarrow 0xBE$ (flips every bit in $010000012$ → $101111102$ ).
- Bit-masking: $0x69\&0x55\rightarrow0x41$ (isolates bits where both are $1$ ).
- Bit-setting: $0x69|0x55\rightarrow0x7D$ (sets bits where either operand has $1$ ).
Common programming applications:
- Permissions/flags, embedded device registers, compression, cryptography, graphics.

Logical vs Bitwise Operators

Logical (&&, ||, !) interpret whole values as truthiness:
- $0$ ≙ false, any non-zero ≙ true.
- Results are always exactly $0$ or $1$ .
- Short-circuit evaluation may bypass RHS evaluation.
Bitwise operators inspect every bit; no short-circuiting; result preserves width.
Pitfall: confusing && with & in conditions leads to unintended all-bits evaluation and slower code.

Using Bit Vectors to Represent Sets

A $w$ -bit word encodes a subset of ${0,\dots ,w-1}$ where bit $j$ set to $1$ means element $j$ ∈ set.
Set operations map cleanly to bitwise operators:
- Intersection: $A\&B$ (only bits set in both).
- Union: $A|B$ (bits set in either).
- Symmetric difference: $A\hat{}B$ (exclusive membership).
- Complement (within universe of width $w$ ): $\sim A$ .
Visualization example (8-bit universe):
- $A=010101012={0,2,4,6}$ , $B=011010012={0,3,5,6}$ .
- $A\&B=010000012={0,6}$ , $A|B=011111012={0,2,3,4,5,6}$ , etc.
Practical relevance: CPU instruction sets implement these directly; used in bloom filters, database bitmaps, scheduler affinity masks.

Shift Operations

Syntax: $x<<k$ (left shift), x>>k (right shift).
Effect: moves bits $k$ positions, discards bits shifted out, injects zeros (for unsigned; arithmetic right shift may sign-extend).
Pictorial example (left shift by $3$ ):
- Before: $ABCDEFGH$ ( $1$ $0$ $1$ $0$ $1$ $1$ $0$ $0$ ).
- After: $ABCD$ discarded; $EFGH000$ produced ( $0$ s fill vacated spots).
Multiply/divide trick: for unsigned integers, $x<<k = x\times2^{k}$ and x>>k = \lfloor x/2^{k} \rfloor (fast power-of-two arithmetic).
Security caution: shifting signed data is undefined for negative numbers in C, leading to portability bugs.

Packing Multiple Byte-Sized Values into a 32-bit Word

Goal: store four independent $8$ -bit values $(a,b,c,d)$ into one $uint32_t$ so that
- $a$ occupies bits $7$ – $0$ (least-significant byte).
- $b$ occupies bits $15$ – $8$ , $c$ bits $23$ – $16$ , $d$ bits $31$ – $24$ .
Step-by-step (mask-then-shift):

uint32_t pack_bytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
  uint32_t tempa = a & 0xff;           // ensure only low 8 bits
  uint32_t tempb = (b & 0xff) << 8;   // shift into 2nd byte
  uint32_t tempc = (c & 0xff) << 16;  // shift into 3rd byte
  uint32_t tempd = (d & 0xff) << 24;  // shift into 4th byte (MSB)
  return tempa | tempb | tempc | tempd; // combine via OR
}

Diagnostic printout (input $0x111,0x222,0x333,0x444$ ):
- $A:0x00000011$
- $B:0x00002200$
- $C:0x00330000$
- $D:0x44000000$
- Combined: $0x44332211$ .
Applications: network packet headers, graphics RGBA pixels, SIMD vector loading.
Broader concept: serialization—converting structured data into a raw byte stream; essential for file I/O, IPC, and cryptographic hashing.

Ethical & Practical Reflections

Low-level data representation mastery underpins systems security. Misaligned assumptions (e.g., endianness, signed shifts) can create exploitable vulnerabilities (buffer overflows, integer wrapping).
Efficient bit fiddling enables energy-aware embedded programming and high-performance data-plane processing but complicates code readability. Strive for clear abstractions and extensive unit tests.
Adherence to standards (network byte order, two’s-complement arithmetic) fosters portability and interoperability across disparate hardware.