Hashing and Data Integrity Notes (CSCI 3341)

A hash function (e.g., SHA1(), MD5(), SHA256(), …) takes an arbitrary sequence of bytes and returns a fixed number of bytes, called the hash value.
- $SHA1()$ returns 20 bytes (160 bits).
- $MD4()$ and $MD5()$ return 16 bytes (128 bits).
- $SHA256()$, a member of the SHA-2 family, returns 256 bits (32 bytes).
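These output sizes can be checked with Python's standard `hashlib` module. The sketch below is a minimal example. The helper name `digest_sizes` is hypothetical. MD4 is left out because `hashlib` only exposes it when the underlying OpenSSL build provides it, and many current builds do not.

```python
import hashlib

def digest_sizes(data: bytes) -> dict[str, int]:
    """Return the output length in bytes of a few hash functions for `data`."""
    sizes = {}
    for name in ("sha1", "md5", "sha256"):
        digest = hashlib.new(name, data).digest()  # raw hash bytes
        sizes[name] = len(digest)
    return sizes

print(digest_sizes(b"How are you doing?"))
# {'sha1': 20, 'md5': 16, 'sha256': 32}
```

The lengths do not depend on the input: an empty input and a multi-gigabyte file produce hash values of the same size.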

Using a hash to check stored data for corruption:
Step 1: The writer hashes the data to produce a hash value hv1.
- In notation: $hv_1 = H(data)$ where $H$ is the hashing function.
Step 2: The writer saves both the data and hv1 on the disk.
Step 3: The accessor retrieves the data and the hash value hv1 from the disk.
Step 4: The accessor hashes the retrieved data to produce hv2.
- $hv_2 = H(retrieved\text{-}data)$
Step 5: If hv1 == hv2, the accessor assumes that the retrieved data is correct; otherwise, it treats the data as corrupted and aborts.
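The five steps can be sketched in Python as follows. Several details are assumptions made for illustration. SHA-256 stands in for $H$. The hash is stored in a sidecar file with a `.hash` suffix. The helper names `write_with_hash` and `read_and_verify` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def H(data: bytes) -> str:
    """Hash function H from the notes, instantiated here (by assumption) with SHA-256."""
    return hashlib.sha256(data).hexdigest()

def write_with_hash(path: Path, data: bytes) -> None:
    # Steps 1-2: the writer computes hv1 = H(data) and saves both data and hv1.
    path.write_bytes(data)
    path.with_suffix(".hash").write_text(H(data))

def read_and_verify(path: Path) -> bytes:
    # Step 3: the accessor retrieves the data and hv1 from disk.
    data = path.read_bytes()
    hv1 = path.with_suffix(".hash").read_text()
    # Step 4: the accessor recomputes hv2 over the retrieved data.
    hv2 = H(data)
    # Step 5: accept only if hv1 == hv2; otherwise abort.
    if hv1 != hv2:
        raise ValueError("hash mismatch: retrieved data is corrupted")
    return data

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "record.bin"
    write_with_hash(p, b"How are you doing?")
    print(read_and_verify(p))               # b'How are you doing?'
    p.write_bytes(b"How are you going?")    # simulate corruption of the data only
    try:
        read_and_verify(p)
    except ValueError as err:
        print(err)                          # hash mismatch: retrieved data is corrupted
```

This check catches accidental corruption, such as a bad disk sector. It does not stop someone who can rewrite both files, which is the subject of Q4 below.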
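Note: The strength of a hash function depends largely on the size of its hash value; for example, $SHA1()$ is not as strong as $SHA256()$. It also depends on the algorithm's design: practical collision attacks are known for MD4, MD5, and SHA-1.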
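One way to make "depends on the size" concrete is the generic brute-force bound for an ideal $n$-bit hash function. Finding any two inputs with the same hash (a collision) takes roughly $2^{n/2}$ hash evaluations, by the birthday paradox. Finding a second input that matches one given hash takes roughly $2^{n}$. These figures are upper limits for an ideal function. They say nothing about shortcuts that exploit weaknesses in a particular design.

$$
\text{collision} \approx 2^{n/2}, \qquad \text{second preimage} \approx 2^{n}
$$

$$
\text{MD5: } 2^{128/2} = 2^{64}, \qquad \text{SHA-1: } 2^{160/2} = 2^{80}, \qquad \text{SHA-256: } 2^{256/2} = 2^{128} \quad (\text{collision bound})
$$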
Example comparison using an online tool: https://www.tools4noobs.com/online_tools/hash/
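- "How are you doing?"
  - SHA-256 (64 hex digits = 32 bytes): ec3db42c7f1e7ed8b05e55506e18a5cd06216b097b029da89105e3983ddd2689
  - MD4 (32 hex digits = 16 bytes): f7fde94d44f655ae56b40c7a1430f756
- "What are you doing?"
  - SHA-256 (64 hex digits = 32 bytes): 5abff00c28a3f309d13738779a878490f0152be63e95d03a143d97c364b77fa4
  - MD4 (32 hex digits = 16 bytes): 7800b65d8345d9a6ac75235014b4cf32
Changing a single word of the input produces completely different hash values.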
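The comparison can also be reproduced locally instead of with the online tool. This is a minimal sketch using SHA-256 only. It assumes each sentence is hashed as UTF-8 with no trailing newline. The notes do not say how the online tool treats its input, so the printed digests may not match the values above character for character. The helper name `sha256_hex` is hypothetical.

```python
import hashlib

def sha256_hex(text: str) -> str:
    # Hash the UTF-8 bytes of the text, with no trailing newline.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

a = sha256_hex("How are you doing?")
b = sha256_hex("What are you doing?")
print(a)
print(b)

# Count hex positions where the two digests differ. For unrelated digests,
# about 15 of every 16 positions are expected to differ.
diff = sum(1 for x, y in zip(a, b) if x != y)
print(f"{diff} of {len(a)} hex digits differ")
```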
Q4: Is hashing subject to ‘man-in-the-middle attacks’? Explain how.

In a man-in-the-middle (MITM) attack, an adversary sits between two communicating parties and can eavesdrop on, modify, or inject messages, undermining confidentiality and integrity.

In a keyless hashing scenario (where there is no secret key to authenticate the hash), a MITM attacker can:
- Intercept the original data and its hash hv1 as sent by the writer.
- Modify the data into tampered data and compute a new hash hv1' over it.
- Forward the tampered data together with hv1' in place of the original pair, so that when the recipient recomputes hv2 over the tampered data, hv2 matches hv1'.
- Consequently, the recipient is deceived into accepting the tampered data as valid without detecting tampering.
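The attack above can be simulated in a few lines of Python. SHA-256 is assumed as the hash function. The message strings are made up for illustration. The point is that the attacker needs no secret: $H$ is public, so recomputing it over the tampered data is trivial.

```python
import hashlib

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def recipient_accepts(data: bytes, hv1: str) -> bool:
    # Steps 4-5: recompute hv2 over the received data and compare it with hv1.
    return H(data) == hv1

# The writer sends (data, hv1).
data = b"Pay Alice $10"
hv1 = H(data)

# The MITM intercepts, tampers, and recomputes the hash; no secret is needed.
tampered = b"Pay Mallory $10,000"
hv1_prime = H(tampered)

# The recipient receives (tampered, hv1') and the check still passes.
print(recipient_accepts(tampered, hv1_prime))  # True
# Only a mismatched pair is caught, e.g. tampered data with the original hv1.
print(recipient_accepts(tampered, hv1))        # False
```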
Practical takeaway: Without an authentication mechanism, hashing alone does not prevent MITM tampering. Anyone can compute $H$, so an attacker can alter both the data and its hash.
Recommended defense: Use a keyed hash such as an HMAC (computed with a secret key shared by writer and accessor) or a digital signature. An attacker who lacks the secret key (HMAC) or the private key (signature) cannot produce a valid value for tampered data, so these mechanisms provide authenticity as well as integrity.
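The HMAC option is sketched below with Python's standard `hmac` module and HMAC-SHA256. The sketch assumes the writer and accessor have already shared a 32-byte random key out of band. Key distribution itself is not shown. The helper names `tag` and `verify` are hypothetical.

```python
import hashlib
import hmac
import secrets

def tag(key: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 tag over the data; computing it requires the secret key.
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, received_tag: bytes) -> bool:
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(tag(key, data), received_tag)

shared_key = secrets.token_bytes(32)  # assumed known only to writer and accessor

data = b"Pay Alice $10"
t = tag(shared_key, data)

# The attacker tampers with the data but does not know shared_key, so the
# best it can do is tag the forgery under some other key.
tampered = b"Pay Mallory $10,000"
attacker_key = secrets.token_bytes(32)
forged = tag(attacker_key, tampered)

print(verify(shared_key, data, t))           # True
print(verify(shared_key, tampered, forged))  # False
print(verify(shared_key, tampered, t))       # False
```

Unlike the keyless case, recomputing the tag over tampered data is no longer possible for the attacker, because the computation depends on a key it does not have.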