CMPSC 311 – Bit/Byte Operations & Data Representation

Number Systems

  • Any base-XX positional system shares two core ideas:
    • A finite set of digits 0,1,,X1{0,1,\dots ,X-1}.
    • Place values that grow as successive powers of the base: X0,X1,X2,X^{0},X^{1},X^{2},\dots
  • Base-1010 (decimal)
    • Digits: 0099.
    • Example decomposition: 123=3×100+2×101+1×102=123123=3\times10^{0}+2\times10^{1}+1\times10^{2}=123.
  • Base-22 (binary)
    • Digits: 00, 11.
    • Example: 1011<em>2=1×20+1×21+0×22+1×23=11</em>101011<em>{2}=1\times2^{0}+1\times2^{1}+0\times2^{2}+1\times2^{3}=11</em>{10}.
  • Base-1616 (hexadecimal)
    • Digits: 0099 and AAFF (A=10F=15A=10\dots F=15).
    • Example: 0xAFC=C×160+F×161+A×162=2812100xAFC=C\times16^{0}+F\times16^{1}+A\times16^{2}=2812_{10}.
    • Practical value: compactly expresses binary (exactly four bits per hex digit).

Converting Decimal to Binary (Division Algorithm)

  • Repeatedly divide the decimal number nn by 22, recording remainders (rightmost bit first):
    • Pseudocode:
    1. while n0n\neq0:
      • output nmod2n\bmod2 (this is the next binary digit).
      • set nn/2n\leftarrow\lfloor n/2 \rfloor.
    • Worked example (decimal 235235):
      | nn | remainder | resulting bit position |
      |------|-----------|------------------------|
      | 235235 | 11 | 00 |
      | 117117 | 11 | 11 |
      | 5858 | 00 | 22 |
      | 2929 | 11 | 33 |
      | 1414 | 00 | 44 |
      | 77 | 11 | 55 |
      | 33 | 11 | 66 |
      | 11 | 11 | 77 |
    • Reads upward to obtain 235<em>10=11101011</em>2235<em>{10}=11101011</em>{2}.
  • Significance: identical in spirit to converting to any base; division by the base peels off least-significant digits.

Converting Decimal ↔ Hex and Binary ↔ Hex

  • Decimal \rightarrow Hexadecimal
    • Use analogous division algorithm but divide by 1616.
    • Example: 23510235_{10}
    • 235÷16=14235\div16=14 remainder 11(B)11\,(B) → least-significant hex digit.
    • 14÷16=014\div16=0 remainder 14(E)14\,(E) → next digit.
    • Result: 23510=0xEB235_{10}=0xEB.
  • Binary \leftrightarrow Hexadecimal
    • Group binary into nibbles (4-bit chunks) from the right, then translate each nibble to its hex equivalent.
    • Example: 11101011<em>2=1110  1011</em>2=0xE0xB=0xEB11101011<em>2 = 1110\;1011</em>2 = 0xE\,0xB = 0xEB.
  • Practical memory reading skill: quickly recognize that an 8-digit hex value (e.g., 0xDEADBEEF0xDEADBEEF) represents 88 nibbles = 44 bytes.

Conversion Summary Cheat-Sheet

  • Binary \rightarrow Decimal: add bi2i\sum b_i2^{i}.
  • Hex \rightarrow Decimal: add hi16i\sum h_i16^{i}.
  • Decimal \rightarrow Binary: repeated division by 22.
  • Decimal \rightarrow Hex: repeated division by 1616 or via decimal→binary then binary→hex.
  • Binary \leftrightarrow Hex: chunk into 4-bit nibbles.

Byte Ordering (Endianness)

  • The question: which byte of a multi-byte word is stored at the lowest memory address?
  • Big-Endian (e.g., SPARC): most-significant byte gets the smallest (lowest) address.
  • Little-Endian (e.g., x86): least-significant byte gets the smallest address.
  • Example: integer x=30541989610=0x12345678x=305419896_{10}=0x12345678 located at address 0x1000x100.
    • Big-Endian layout (addresses increase → right):
      | 0x1000x100 | 0x1010x101 | 0x1020x102 | 0x1030x103 |
      |-----------|-----------|-----------|-----------|
      | 0x120x12 | 0x340x34 | 0x560x56 | 0x780x78 |
    • Little-Endian:
      | 0x1000x100 | 0x1010x101 | 0x1020x102 | 0x1030x103 |
      |-----------|-----------|-----------|-----------|
      | 0x780x78 | 0x560x56 | 0x340x34 | 0x120x12 |
  • Practical implications
    • Network protocols standardize on big-endian ("network order"), thus host/byte-order conversions (htons, ntohl, etc.) are essential.
    • Misinterpreting endianness leads to subtle data corruption and security flaws.

Inspecting Byte Layout Programmatically

  • Sample C program (architecture detection):
#include <stdio.h>
#include <stdint.h>
void show_bytes(uint8_t *start, int len) {
  for (int i = 0; i < len; ++i)
    printf("%p\t0x%.2x\n", start + i, start[i]);
  printf("\n");
}
int main(void) {
  int a = 305419896;          // 0x12345678
  printf("a lives at address %p\n\n", &a);
  show_bytes((uint8_t *)&a, sizeof(int));
}
  • Example output on Linux/x86 (little-endian):
    • Address 0x740x…74 contains 0x780x78, then 0x560x56, 0x340x34, 0x120x12.
  • Conceptual lesson: by casting to uint8tuint8_t * you can treat *any* C object as a raw byte array.

Boolean Algebra Refresher

  • Originated by George Boole; formalizes logical reasoning via algebra over 0,1{0,1}.
  • Fundamental operators (with truth criteria):
    • AND (A&amp;BA\&amp;B): 11 only when both operands 11.
    • OR (ABA|B): 11 when at least one operand 11.
    • NOT (A\sim A): bit-wise complement, flips 010\leftrightarrow1.
    • XOR (A^BA\hat{}B): 11 when exactly one operand is 11 (exclusive).
  • Digital circuits, database query planning, compiler optimizations all rely on these identities.

Bit-Level Operators in C

  • Operators: &amp;\&amp;, |, \sim, ^\hat{}.
  • Defined for any integral type (signed or unsigned; char → long).
  • Operate bit-wise on each corresponding bit position.
  • Character-size examples:
    • Complement: 0x410xBE\sim0x41 \rightarrow 0xBE (flips every bit in 01000001<em>201000001<em>210111110</em>210111110</em>2).
    • Bit-masking: 0x69&amp;0x550x410x69\&amp;0x55\rightarrow0x41 (isolates bits where both are 11).
    • Bit-setting: 0x690x550x7D0x69|0x55\rightarrow0x7D (sets bits where either operand has 11).
  • Common programming applications:
    • Permissions/flags, embedded device registers, compression, cryptography, graphics.

Logical vs Bitwise Operators

  • Logical (&&, ||, !) interpret whole values as truthiness:
    • 00false, any non-zero ≙ true.
    • Results are always exactly 00 or 11.
    • Short-circuit evaluation may bypass RHS evaluation.
  • Bitwise operators inspect every bit; no short-circuiting; result preserves width.
  • Pitfall: confusing && with & in conditions leads to unintended all-bits evaluation and slower code.

Using Bit Vectors to Represent Sets

  • A ww-bit word encodes a subset of 0,,w1{0,\dots ,w-1} where bit jj set to 11 means element jj ∈ set.
  • Set operations map cleanly to bitwise operators:
    • Intersection: A&amp;BA\&amp;B (only bits set in both).
    • Union: ABA|B (bits set in either).
    • Symmetric difference: A^BA\hat{}B (exclusive membership).
    • Complement (within universe of width ww): A\sim A.
  • Visualization example (8-bit universe):
    • A=01010101<em>2=0,2,4,6A=01010101<em>2={0,2,4,6}, B=01101001</em>2=0,3,5,6B=01101001</em>2={0,3,5,6}.
    • A&amp;B=01000001<em>2=0,6A\&amp;B=01000001<em>2={0,6}, AB=01111101</em>2=0,2,3,4,5,6A|B=01111101</em>2={0,2,3,4,5,6}, etc.
  • Practical relevance: CPU instruction sets implement these directly; used in bloom filters, database bitmaps, scheduler affinity masks.

Shift Operations

  • Syntax: x<<kx<<k (left shift), x>>k (right shift).
  • Effect: moves bits kk positions, discards bits shifted out, injects zeros (for unsigned; arithmetic right shift may sign-extend).
  • Pictorial example (left shift by 33):
    • Before: ABCDEFGHABCDEFGH (11 00 11 00 11 11 00 00).
    • After: ABCDABCD discarded; EFGH000EFGH000 produced (00s fill vacated spots).
  • Multiply/divide trick: for unsigned integers, x<<k=x×2kx<<k = x\times2^{k} and x>>k = \lfloor x/2^{k} \rfloor (fast power-of-two arithmetic).
  • Security caution: shifting signed data is undefined for negative numbers in C, leading to portability bugs.

Packing Multiple Byte-Sized Values into a 32-bit Word

  • Goal: store four independent 88-bit values (a,b,c,d)(a,b,c,d) into one uint32tuint32_t so that
    • aa occupies bits 7700 (least-significant byte).
    • bb occupies bits 151588, cc bits 23231616, dd bits 31312424.
  • Step-by-step (mask-then-shift):
uint32_t pack_bytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
  uint32_t tempa = a & 0xff;           // ensure only low 8 bits
  uint32_t tempb = (b & 0xff) << 8;   // shift into 2nd byte
  uint32_t tempc = (c & 0xff) << 16;  // shift into 3rd byte
  uint32_t tempd = (d & 0xff) << 24;  // shift into 4th byte (MSB)
  return tempa | tempb | tempc | tempd; // combine via OR
}
  • Diagnostic printout (input 0x111,0x222,0x333,0x4440x111,0x222,0x333,0x444):
    • A:0x00000011A:0x00000011
    • B:0x00002200B:0x00002200
    • C:0x00330000C:0x00330000
    • D:0x44000000D:0x44000000
    • Combined: 0x443322110x44332211.
  • Applications: network packet headers, graphics RGBA pixels, SIMD vector loading.
  • Broader concept: serialization—converting structured data into a raw byte stream; essential for file I/O, IPC, and cryptographic hashing.

Ethical & Practical Reflections

  • Low-level data representation mastery underpins systems security. Misaligned assumptions (e.g., endianness, signed shifts) can create exploitable vulnerabilities (buffer overflows, integer wrapping).
  • Efficient bit fiddling enables energy-aware embedded programming and high-performance data-plane processing but complicates code readability. Strive for clear abstractions and extensive unit tests.
  • Adherence to standards (network byte order, two’s-complement arithmetic) fosters portability and interoperability across disparate hardware.