Hashing, MACs, and Attacks – Study Notes

Topics covered: Hashing, vulnerability to MiTM, keyed hashing (MAC), replay attacks, and secure protocols (HTTPS/TLS, IPsec) with a two-phase pattern.

Hashing basics

Q1: What is hashing?
- A hash function takes an arbitrary number of data bytes and generates a fixed-size hash value (digest).
- Example sizes:
- $\text{SHA-1}() \rightarrow 20 \text{ bytes } (160 \text{ bits})$
- $\text{SHA-256}() \rightarrow 32 \text{ bytes } (256 \text{ bits})$
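The fixed digest sizes can be checked with a minimal sketch using Python's standard `hashlib` module. The helper name `digest_size_bytes` and the sample inputs are illustrative assumptions, not part of the notes.

```python
import hashlib

def digest_size_bytes(algorithm: str, data: bytes) -> int:
    """Return the digest length in bytes for the given algorithm and input."""
    return len(hashlib.new(algorithm, data).digest())

# Inputs of very different lengths still produce fixed-size digests.
for data in (b"", b"abc", b"x" * 10_000):
    print(len(data),
          digest_size_bytes("sha1", data),
          digest_size_bytes("sha256", data))
```

Whatever the input length, the SHA-1 column stays at 20 and the SHA-256 column at 32.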
Q2: Is hashing encryption?
- No. Hashing is one-way (non-reversible): you cannot feasibly recover the original data from the hash.
- Encryption flow (conceptual):
- Encryption: $E_{\text{key1}}(\text{data bytes}) \rightarrow$ ciphertext (enciphered data bytes)
- Decryption: $D_{\text{key2}}(\text{ciphertext}) \rightarrow$ original data bytes (assuming: (a) the ciphertext is unchanged; (b) key2 is correct; (c) the decryption algorithm is correct)
Symmetric vs Asymmetric cryptography (Q2):
- Symmetric cryptography: key1 = key2 (same key for encryption and decryption).
- Asymmetric cryptography: key1 and key2 are inverse keys (public/private pair).
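To make the symmetric case (key1 = key2) concrete, here is a toy sketch. The repeating-key XOR "cipher" is an assumption chosen only because it fits in a few lines; it is not secure and is not a cipher named in the notes.

```python
from itertools import cycle

def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR each byte with a repeating key (NOT secure)."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

key = b"shared-secret"                        # key1 == key2 in the symmetric case
ciphertext = xor_cipher(b"attack at dawn", key)
recovered = xor_cipher(ciphertext, key)       # same operation, same key
wrong = xor_cipher(ciphertext, b"other-key")  # an incorrect key2 gives garbage
```

Decryption recovers the data only under the three assumptions listed above: unchanged ciphertext, correct key, correct algorithm.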
Q2a: How would we benefit from asymmetric crypto?
- In a keypair, each participant has two keys: public key (shared) and private key (secret).
- Public keys are known to others; private keys are known only to the owner.
- Scenario: John’s keypair = (pubJ, privJ); Mary’s keypair = (pubM, privM).
- Public key encryption (confidentiality):
- Q2a.1: If Mary wants to send a secret message to John, she encrypts with John’s public key to produce ciphertext, and sends it to John. John decrypts with his private key to recover the message.
 - Ciphertext = $E_{\text{pub}_J}(\text{message})$
 - Plaintext = $D_{\text{priv}_J}(\text{ciphertext})$
- Q2a.2: Would this provide confidentiality? Yes, because John’s private key is known only to John, so only John can decrypt.
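The Mary-to-John exchange can be sketched with textbook RSA. This is a toy under stated assumptions: the tiny primes, the integer-encoded message, and the names `pub_J`, `priv_J`, `encrypt`, and `decrypt` are illustrative only, and real systems use vetted libraries with padding and much larger keys.

```python
# Toy RSA keypair for John (tiny primes; illustration only, NOT secure).
p, q = 61, 53
n = p * q                    # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

pub_J, priv_J = (e, n), (d, n)

def encrypt(pub, m: int) -> int:
    """E_pub(m): anyone holding the public key can do this."""
    exp, mod = pub
    return pow(m, exp, mod)

def decrypt(priv, c: int) -> int:
    """D_priv(c): only the private-key holder can do this."""
    exp, mod = priv
    return pow(c, exp, mod)

message = 65                              # a message encoded as an integer < n
ciphertext = encrypt(pub_J, message)      # Mary uses John's public key
plaintext = decrypt(priv_J, ciphertext)   # John uses his private key
```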
Digital signatures (concept introduced in Page 1):
- A signer uses their private key to sign the data (conceptually, "encrypting" a hash of the data with the private key) to produce a digital signature.
- Receiver uses signer’s public key to verify the signature.
- Digital signatures provide:
- Data integrity
- Origin integrity (authenticity of the signer)
- Non-repudiation (signer cannot deny signing)
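A hash-then-sign flow can be sketched as follows. It assumes textbook RSA with two small known primes and no padding, which is not secure. The names `digest_as_int`, `sign`, and `verify` are hypothetical helpers for illustration.

```python
import hashlib

# Toy RSA keypair for the signer (small known primes; NOT secure).
p, q = 2**61 - 1, 2**31 - 1
n = p * q
e = 65537                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))      # private exponent (Python 3.8+)

def digest_as_int(data: bytes) -> int:
    """Hash the data and reduce it so it fits below the toy modulus."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n

def sign(data: bytes) -> int:
    """Signer: apply the private key to the hash of the data."""
    return pow(digest_as_int(data), d, n)

def verify(data: bytes, signature: int) -> bool:
    """Receiver: apply the public key and compare with a fresh hash."""
    return pow(signature, e, n) == digest_as_int(data)

signature = sign(b"I owe Mary $5")
print(verify(b"I owe Mary $5", signature))     # unmodified data
print(verify(b"I owe Mary $500", signature))   # modified data
```

Only the private-key holder could have produced a signature that verifies, which is what underlies origin integrity and non-repudiation.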

Hashing for data integrity and vulnerabilities

Q3: Main use cases of hashing
- Primary use: data integrity verification.
- Scenario:
- User A saves a file f and stores its hash hash1 = hf(f).
- User B retrieves the file, computes hash2 = hf(retrieved_f).
- If hash2 == hash1, data is trusted; if not, data may be corrupted.
- Underlying assumption: both users A and B use the same hash function hf.
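The scenario above can be sketched in a few lines. The notes do not fix a particular hash function, so SHA-256 is used here as an assumed choice for `hf`, and the file contents are made up for illustration.

```python
import hashlib

def hf(data: bytes) -> str:
    """The hash function both users agree on (assumed here: SHA-256)."""
    return hashlib.sha256(data).hexdigest()

# User A saves the file and records its hash.
f = b"quarterly report v1"
hash1 = hf(f)

# User B retrieves the file and recomputes the hash.
retrieved_f = f
hash2 = hf(retrieved_f)
trusted = hash2 == hash1

# A single changed byte produces a different digest.
corrupted_f = b"quarterly report v2"
corrupted_trusted = hf(corrupted_f) == hash1
```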
Q4: Vulnerabilities of hashing
- A Man-in-the-Middle (MiTM) attack can target keyless hashing.
Q5: How MiTM attacks against keyless hashing work
- Attack idea: attacker changes stored data (dataNew) and generates a new hashNew = hf(dataNew).
- The attacker swaps out the original data and hash with dataNew and hashNew.
- When a user retrieves the file and verifies the hash, they may accept dataNew as authentic.
Q5a: Prerequisites for the MiTM attack to work
- The attacker must have access to both the stored file and the stored hash.
Practical note: to simulate MiTM against keyless hashing, one could write a program that:
- Reads the stored file and its hash, modifies the file bytes, computes a new hash for the modified data, and overwrites both the file and hash with the new values.
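The program described in the practical note could look roughly like this. It is a sketch under assumptions: the file names `file.bin` and `file.hash`, the SHA-256 choice, and the helper names are all hypothetical, and a temporary directory stands in for the shared storage.

```python
import hashlib
import tempfile
from pathlib import Path

def hf(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def save(store: Path, data: bytes) -> None:
    """Legitimate user: store the file and its keyless hash side by side."""
    (store / "file.bin").write_bytes(data)
    (store / "file.hash").write_text(hf(data))

def verify(store: Path) -> bool:
    """Legitimate user: recompute the hash and compare with the stored one."""
    data = (store / "file.bin").read_bytes()
    return hf(data) == (store / "file.hash").read_text()

def mitm_tamper(store: Path) -> None:
    """Attacker: modify the data, then overwrite BOTH the file and the hash."""
    data_new = (store / "file.bin").read_bytes() + b" [tampered]"
    hash_new = hf(data_new)
    (store / "file.bin").write_bytes(data_new)
    (store / "file.hash").write_text(hash_new)

with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    save(store, b"pay Bob $10")
    mitm_tamper(store)
    accepted = verify(store)     # the check passes despite the tampering
    final_data = (store / "file.bin").read_bytes()
```

Because computing `hf` requires no secret, the verifier cannot distinguish the attacker's hash from the original one.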
Q6: Keyed hashing (MAC) – how it works
- A shared secret key K is used by both the data creator and the verifier.
- Saving process (file saver):
- Data bytes -- $\text{MAC}_K(\text{data})$ → MAC code (the MAC)
- Store both the data bytes and the MAC into the system.
- Retrieval process (file retriever):
- Retrieve data bytes and MAC, and obtain the shared key K.
- Compute $\text{MAC}_K(\text{data})$ → MAC code 2
- If MAC code 2 equals the stored MAC, data is trusted; otherwise, do not trust.
- Notation: $\text{MAC}_K(m)$ where m is the message/data.
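The save and retrieve steps can be sketched with Python's standard `hmac` module. HMAC-SHA-256 is an assumed choice of MAC, since the notes do not name one, and `mac` and `verify` are illustrative helper names.

```python
import hashlib
import hmac
import secrets

def mac(key: bytes, data: bytes) -> bytes:
    """MAC_K(data), here instantiated as HMAC-SHA-256."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, stored_mac: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(mac(key, data), stored_mac)

K = secrets.token_bytes(32)        # shared secret key

# File saver: compute the MAC and store it with the data.
data = b"pay Bob $10"
stored_mac = mac(K, data)

# File retriever: recompute with the same key K and compare.
ok = verify(K, data, stored_mac)
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not leak how many leading bytes matched.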
Q7: How keyed hashing mitigates MiTM against keyless hashing
- If the attacker does not possess the shared key K, they cannot produce a valid MAC for the modified data, so tampering is detectable.
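A sketch of why the earlier attack now fails, again assuming HMAC-SHA-256 as the MAC. The three attacker strategies shown are illustrative guesses at what someone without K might try.

```python
import hashlib
import hmac
import secrets

def mac(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(mac(key, data), tag)

K = secrets.token_bytes(32)              # known to saver and retriever only
data = b"pay Bob $10"
stored_mac = mac(K, data)

# The attacker can read and overwrite storage but does not know K.
data_new = b"pay Mallory $10000"
attempts = {
    "reuse old MAC": stored_mac,
    "keyless hash": hashlib.sha256(data_new).digest(),
    "MAC under guessed key": mac(secrets.token_bytes(32), data_new),
}

# The retriever checks each forged pair with the real key K.
results = {name: verify(K, data_new, tag) for name, tag in attempts.items()}
print(results)
```

Every attempt fails verification, so the retriever detects the tampering.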
Q8: Replay attacks
- Definition: attacker copies a message and replays it to the original destination (one or more times).
- Consequence: the receiver may process a once-valid message again (e.g., a duplicated request), and flooding with repeated messages can cause Denial-of-Service (DoS).
Q8a: Defenses against replay attacks (anti-replay)
- Make each message unique in the protocol.
- Techniques: include a timestamp, a sequence number, or a nonce (a unique random number) in each message.
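One way to apply the sequence-number technique is sketched below. Covering the sequence number with a MAC and tracking only the highest number seen are assumptions of this sketch, not details from the notes. `make_message` and `Receiver` are hypothetical names.

```python
import hashlib
import hmac

def make_message(key: bytes, seq: int, payload: bytes):
    """Sender: bind the sequence number to the payload under the MAC."""
    tag = hmac.new(key, seq.to_bytes(8, "big") + payload, hashlib.sha256).digest()
    return seq, payload, tag

class Receiver:
    def __init__(self, key: bytes):
        self.key = key
        self.highest_seq = -1        # highest sequence number accepted so far

    def accept(self, seq: int, payload: bytes, tag: bytes) -> bool:
        expected = hmac.new(self.key, seq.to_bytes(8, "big") + payload,
                            hashlib.sha256).digest()
        if not hmac.compare_digest(expected, tag):
            return False             # tampered or forged
        if seq <= self.highest_seq:
            return False             # old sequence number: replay
        self.highest_seq = seq
        return True

key = b"shared key for this sketch"
receiver = Receiver(key)
msg = make_message(key, 1, b"transfer $100 to Bob")
first = receiver.accept(*msg)        # fresh message
replayed = receiver.accept(*msg)     # identical bytes sent again
```

Because the sequence number is inside the MAC input, an attacker cannot renumber a captured message without invalidating its tag.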
Q9: Is keyless hashing replay-prone?
- Yes; an attacker can replay a copied data+hash pair to the receiver.
Q10: Is keyed hashing replay-prone?
- Yes; a replay attack can be launched against any communication protocol. MACs help detect tampering, but a previously valid message can still be replayed unchanged and will pass verification.
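A short sketch of the point: a MAC check alone cannot tell a replay from the original. HMAC-SHA-256 and the hard-coded key are assumptions made for illustration.

```python
import hashlib
import hmac

K = b"shared key known to sender and receiver"

def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

# The sender produces one valid (message, MAC) pair.
message = b"transfer $100 to Bob"
tag = hmac.new(K, message, hashlib.sha256).digest()

# The attacker records the pair and resends it unchanged; no key is needed.
first_delivery = verify(K, message, tag)
replayed_delivery = verify(K, message, tag)
```

Both deliveries verify, because the replayed bytes are identical to the original. Rejecting the second one requires uniqueness data (timestamp, sequence number, or nonce) in the message.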
Real-world context: Secure network protocols commonly employ anti-replay protections within secure communications (e.g., HTTPS over TLS, IPsec).
HTTPS/TLS overview and IPsec (two-phase pattern)
- TLS/HTTPS is used for secure HTTP traffic; IPsec is a secure protocol for the Internet Protocol layer.
- Two-phase pattern:
- Phase one: Handshake
  - Authentication: the client and server authenticate each other (e.g., the server presents its certificate).
  - Key exchange: agreement on a shared session key (symmetric key) for efficient encryption/decryption.
  - Note: Symmetric keys are more economical than public-key operations for ongoing data exchange.
- Phase two: Secure sessions
  - The session key established in Phase one is used to generate a MAC for messages to verify integrity.
  - The receiver uses the MAC to verify message integrity; if verification passes, the message is trusted.
- Rationale: Public-key cryptography (asymmetric) is computationally expensive; symmetric cryptography is faster for bulk data.
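The two-phase pattern can be sketched end to end with toy primitives. Everything here is an assumption for illustration: textbook RSA with small known primes stands in for the handshake's public-key step, certificate checking is omitted, and the names `derive_session_key`, `protect`, and `check` are hypothetical. Real TLS and IPsec handshakes are considerably more involved.

```python
import hashlib
import hmac
import secrets

# Phase one (handshake), toy version: the server owns a public/private keypair.
p, q = 2**61 - 1, 2**31 - 1            # small known primes; NOT secure
n, e = p * q, 65537                    # server public key (e, n)
d = pow(e, -1, (p - 1) * (q - 1))      # server private exponent

# The client picks a random secret and sends it under the server's public key.
client_secret = secrets.randbelow(n - 2) + 2
wrapped = pow(client_secret, e, n)     # this value travels over the network
server_secret = pow(wrapped, d, n)     # only the private-key holder recovers it

def derive_session_key(secret: int) -> bytes:
    """Both sides derive the symmetric session key K_s from the secret."""
    return hashlib.sha256(secret.to_bytes(16, "big")).digest()

K_s_client = derive_session_key(client_secret)
K_s_server = derive_session_key(server_secret)

# Phase two (secure session): cheap symmetric MACs under K_s.
def protect(key: bytes, message: bytes):
    return message, hmac.new(key, message, hashlib.sha256).digest()

def check(key: bytes, message: bytes, tag: bytes) -> bool:
    expected = hmac.new(key, message, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

message, tag = protect(K_s_client, b"GET /account HTTP/1.1")
accepted = check(K_s_server, message, tag)
```

The expensive public-key operation happens once per session, and every later message costs only a symmetric MAC computation.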
Final note: In-class assignment on encryption/decryption
- Encourages practice with constructing and verifying encryption/decryption workflows using the concepts above.
Summary of key ideas
- Hashing vs encryption: hashing produces one-way, fixed-length digests; encryption is reversible with the correct keys.
- Asymmetric crypto enables confidentiality and digital signatures; public keys enable others to encrypt and verify signatures, while private keys enable decryption and signing.
- Hashing provides data integrity; MACs (keyed hashing) add authenticity and integrity with a shared secret key, mitigating certain tampering attacks.
- Replay attacks exploit repeated transmission; anti-replay measures (timestamps, nonces, sequence numbers) help deter them.
- TLS/HTTPS and IPsec use a two-phase approach: handshake to authenticate and derive a session key, then secure sessions with MACs and symmetric encryption.
Important numerical/formula references from the notes
- Hash output sizes:
- $20 \text{ bytes}$ for $\text{SHA-1}$, $160 \text{ bits}$
- $32 \text{ bytes}$ for $\text{SHA-256}$, $256 \text{ bits}$
- MAC notation: $\text{MAC}_K(m)$
- Phase two session key: $K_s$ (shared symmetric session key)
- Example verification: if the recomputed $\text{MAC}_K(\text{data})$ equals the stored $\text{MAC}_K(\text{data})'$, then data is trusted; otherwise it is rejected.
Practical implications and connections
- Real-world protocols rely on the combination of hashing/MAC and public-key operations to provide confidentiality, integrity, and authenticity.
- Anti-replay is a fundamental defense in secure communications, ensuring that old messages cannot be replayed to produce undesired effects.
- The choice between MACs and public-key signatures depends on the scenario: MACs for efficiency with shared keys; signatures for non-repudiation without shared secrets.