CMPSC 311 – Bit/Byte Operations & Data Representation
Number Systems
- Any base-X positional system shares two core ideas:
- A finite set of digits 0,1,…,X−1.
- Place values that grow as successive powers of the base: X0,X1,X2,…
- Base-10 (decimal)
- Digits: 0–9.
- Example decomposition: 123=3×100+2×101+1×102=123.
- Base-2 (binary)
- Digits: 0, 1.
- Example: 1011<em>2=1×20+1×21+0×22+1×23=11</em>10.
- Base-16 (hexadecimal)
- Digits: 0–9 and A–F (A=10…F=15).
- Example: 0xAFC=C×160+F×161+A×162=281210.
- Practical value: compactly expresses binary (exactly four bits per hex digit).
Converting Decimal to Binary (Division Algorithm)
- Repeatedly divide the decimal number n by 2, recording remainders (rightmost bit first):
- while n=0:
- output nmod2 (this is the next binary digit).
- set n←⌊n/2⌋.
- Worked example (decimal 235):
| n | remainder | resulting bit position |
|------|-----------|------------------------|
| 235 | 1 | 0 |
| 117 | 1 | 1 |
| 58 | 0 | 2 |
| 29 | 1 | 3 |
| 14 | 0 | 4 |
| 7 | 1 | 5 |
| 3 | 1 | 6 |
| 1 | 1 | 7 | - Reads upward to obtain 235<em>10=11101011</em>2.
- Significance: identical in spirit to converting to any base; division by the base peels off least-significant digits.
Converting Decimal ↔ Hex and Binary ↔ Hex
- Decimal → Hexadecimal
- Use analogous division algorithm but divide by 16.
- Example: 23510
- 235÷16=14 remainder 11(B) → least-significant hex digit.
- 14÷16=0 remainder 14(E) → next digit.
- Result: 23510=0xEB.
- Binary ↔ Hexadecimal
- Group binary into nibbles (4-bit chunks) from the right, then translate each nibble to its hex equivalent.
- Example: 11101011<em>2=11101011</em>2=0xE0xB=0xEB.
- Practical memory reading skill: quickly recognize that an 8-digit hex value (e.g., 0xDEADBEEF) represents 8 nibbles = 4 bytes.
Conversion Summary Cheat-Sheet
- Binary → Decimal: add ∑bi2i.
- Hex → Decimal: add ∑hi16i.
- Decimal → Binary: repeated division by 2.
- Decimal → Hex: repeated division by 16 or via decimal→binary then binary→hex.
- Binary ↔ Hex: chunk into 4-bit nibbles.
Byte Ordering (Endianness)
- The question: which byte of a multi-byte word is stored at the lowest memory address?
- Big-Endian (e.g., SPARC): most-significant byte gets the smallest (lowest) address.
- Little-Endian (e.g., x86): least-significant byte gets the smallest address.
- Example: integer x=30541989610=0x12345678 located at address 0x100.
- Big-Endian layout (addresses increase → right):
| 0x100 | 0x101 | 0x102 | 0x103 |
|-----------|-----------|-----------|-----------|
| 0x12 | 0x34 | 0x56 | 0x78 | - Little-Endian:
| 0x100 | 0x101 | 0x102 | 0x103 |
|-----------|-----------|-----------|-----------|
| 0x78 | 0x56 | 0x34 | 0x12 |
- Practical implications
- Network protocols standardize on big-endian ("network order"), thus host/byte-order conversions (htons, ntohl, etc.) are essential.
- Misinterpreting endianness leads to subtle data corruption and security flaws.
Inspecting Byte Layout Programmatically
- Sample C program (architecture detection):
#include <stdio.h>
#include <stdint.h>
void show_bytes(uint8_t *start, int len) {
for (int i = 0; i < len; ++i)
printf("%p\t0x%.2x\n", start + i, start[i]);
printf("\n");
}
int main(void) {
int a = 305419896; // 0x12345678
printf("a lives at address %p\n\n", &a);
show_bytes((uint8_t *)&a, sizeof(int));
}
- Example output on Linux/x86 (little-endian):
- Address 0x…74 contains 0x78, then 0x56, 0x34, 0x12.
- Conceptual lesson: by casting to uint8t∗ you can treat *any* C object as a raw byte array.
Boolean Algebra Refresher
- Originated by George Boole; formalizes logical reasoning via algebra over 0,1.
- Fundamental operators (with truth criteria):
- AND (A&B): 1 only when both operands 1.
- OR (A∣B): 1 when at least one operand 1.
- NOT (∼A): bit-wise complement, flips 0↔1.
- XOR (A^B): 1 when exactly one operand is 1 (exclusive).
- Digital circuits, database query planning, compiler optimizations all rely on these identities.
Bit-Level Operators in C
- Operators: &, ∣, ∼, ^.
- Defined for any integral type (signed or unsigned; char → long).
- Operate bit-wise on each corresponding bit position.
- Character-size examples:
- Complement: ∼0x41→0xBE (flips every bit in 01000001<em>2 → 10111110</em>2).
- Bit-masking: 0x69&0x55→0x41 (isolates bits where both are 1).
- Bit-setting: 0x69∣0x55→0x7D (sets bits where either operand has 1).
- Common programming applications:
- Permissions/flags, embedded device registers, compression, cryptography, graphics.
Logical vs Bitwise Operators
- Logical (&&, ||, !) interpret whole values as truthiness:
- 0 ≙ false, any non-zero ≙ true.
- Results are always exactly 0 or 1.
- Short-circuit evaluation may bypass RHS evaluation.
- Bitwise operators inspect every bit; no short-circuiting; result preserves width.
- Pitfall: confusing && with & in conditions leads to unintended all-bits evaluation and slower code.
Using Bit Vectors to Represent Sets
- A w-bit word encodes a subset of 0,…,w−1 where bit j set to 1 means element j ∈ set.
- Set operations map cleanly to bitwise operators:
- Intersection: A&B (only bits set in both).
- Union: A∣B (bits set in either).
- Symmetric difference: A^B (exclusive membership).
- Complement (within universe of width w): ∼A.
- Visualization example (8-bit universe):
- A=01010101<em>2=0,2,4,6, B=01101001</em>2=0,3,5,6.
- A&B=01000001<em>2=0,6, A∣B=01111101</em>2=0,2,3,4,5,6, etc.
- Practical relevance: CPU instruction sets implement these directly; used in bloom filters, database bitmaps, scheduler affinity masks.
Shift Operations
- Syntax: x<<k (left shift), x>>k (right shift).
- Effect: moves bits k positions, discards bits shifted out, injects zeros (for unsigned; arithmetic right shift may sign-extend).
- Pictorial example (left shift by 3):
- Before: ABCDEFGH (1 0 1 0 1 1 0 0).
- After: ABCD discarded; EFGH000 produced (0s fill vacated spots).
- Multiply/divide trick: for unsigned integers, x<<k=x×2k and x>>k = \lfloor x/2^{k} \rfloor (fast power-of-two arithmetic).
- Security caution: shifting signed data is undefined for negative numbers in C, leading to portability bugs.
Packing Multiple Byte-Sized Values into a 32-bit Word
- Goal: store four independent 8-bit values (a,b,c,d) into one uint32t so that
- a occupies bits 7–0 (least-significant byte).
- b occupies bits 15–8, c bits 23–16, d bits 31–24.
- Step-by-step (mask-then-shift):
uint32_t pack_bytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
uint32_t tempa = a & 0xff; // ensure only low 8 bits
uint32_t tempb = (b & 0xff) << 8; // shift into 2nd byte
uint32_t tempc = (c & 0xff) << 16; // shift into 3rd byte
uint32_t tempd = (d & 0xff) << 24; // shift into 4th byte (MSB)
return tempa | tempb | tempc | tempd; // combine via OR
}
- Diagnostic printout (input 0x111,0x222,0x333,0x444):
- A:0x00000011
- B:0x00002200
- C:0x00330000
- D:0x44000000
- Combined: 0x44332211.
- Applications: network packet headers, graphics RGBA pixels, SIMD vector loading.
- Broader concept: serialization—converting structured data into a raw byte stream; essential for file I/O, IPC, and cryptographic hashing.
Ethical & Practical Reflections
- Low-level data representation mastery underpins systems security. Misaligned assumptions (e.g., endianness, signed shifts) can create exploitable vulnerabilities (buffer overflows, integer wrapping).
- Efficient bit fiddling enables energy-aware embedded programming and high-performance data-plane processing but complicates code readability. Strive for clear abstractions and extensive unit tests.
- Adherence to standards (network byte order, two’s-complement arithmetic) fosters portability and interoperability across disparate hardware.