Hashing and Data Integrity Notes (CSCI 3341)
Q1: What is hashing?
Hashing applies a function (e.g., SHA1(), MD5(), SHA256(), …) that takes a sequence of bytes of any length and returns a fixed number of bytes as the hash value.
SHA1() returns 20 bytes (160 bits) as the hash.
MD4() and MD5() return 16 bytes (128 bits).
SHA256() returns 256 bits (or 32 bytes).
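The fixed output sizes can be checked directly. The following is a minimal sketch using Python's standard `hashlib` module; the variable names `message` and `digest_sizes` are illustrative only, and it assumes the local Python build exposes `md5` and `sha1` (some restricted builds do not).

```python
import hashlib

message = b"How are you doing?"

# Map each algorithm name to the size of its hash value in bytes.
digest_sizes = {
    name: hashlib.new(name, message).digest_size
    for name in ("md5", "sha1", "sha256")
}

for name, size in digest_sizes.items():
    print(f"{name}: {size} bytes ({size * 8} bits)")

# The output length does not depend on the input length.
short_len = len(hashlib.sha256(b"").digest())
long_len = len(hashlib.sha256(b"x" * 10_000).digest())
print(short_len == long_len)
```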
Q2: What is ‘data integrity’?
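Data integrity means that data is correct and unaltered: what is read back is exactly what was written. Checking data integrity means validating that correctness.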
Q3: How is hashing used to provide data integrity?
Step 1: The writer hashes the data to produce a hash value hv1.
In notation: $hv1 = H(data)$, where $H$ is the hashing function.
Step 2: The writer saves both the data and hv1 on the disk.
Step 3: The accessor retrieves the data and the hash value hv1 from the disk.
Step 4: The accessor hashes the retrieved data to produce hv2.
Step 5: If hv1 == hv2, the accessor assumes that the retrieved data is correct; otherwise, it rejects the data and aborts.
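Steps 1 to 5 can be sketched in code as follows. This is an illustration under stated assumptions, not a prescribed implementation: SHA-256 is chosen as $H$, and the helper names `hash_path`, `write_with_hash`, and `read_and_verify`, as well as the convention of storing hv1 in a sidecar file ending in `.sha256`, are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_path(path: Path) -> Path:
    """Hypothetical convention: hv1 is stored next to the data as '<name>.sha256'."""
    return Path(str(path) + ".sha256")


def write_with_hash(path: Path, data: bytes) -> None:
    """Steps 1-2: hash the data, then save both the data and hv1."""
    hv1 = hashlib.sha256(data).hexdigest()
    path.write_bytes(data)
    hash_path(path).write_text(hv1)


def read_and_verify(path: Path) -> bytes:
    """Steps 3-5: retrieve data and hv1, recompute hv2, and compare."""
    data = path.read_bytes()
    hv1 = hash_path(path).read_text()
    hv2 = hashlib.sha256(data).hexdigest()
    if hv1 != hv2:
        raise ValueError("integrity check failed: hv1 != hv2")
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data.bin"
        write_with_hash(target, b"How are you doing?")
        print(read_and_verify(target))  # data is unchanged, so the check passes

        # Simulate accidental corruption of the data only (hv1 is untouched).
        target.write_bytes(b"How are you doinG?")
        try:
            read_and_verify(target)
        except ValueError as err:
            print(err)
```

Note that this check catches corruption of the data alone; it does not help if whoever changes the data can also rewrite the stored hash.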
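Note: The strength of a hashing function depends in part on the size of the hash value, because a longer hash value makes collisions harder to find. For example, a function with a 128-bit hash value (such as MD5) is not as strong as one with a 256-bit hash value (such as SHA256).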
Example comparison using an online tool: https://www.tools4noobs.com/online_tools/hash/
| Source data | SHA256 hash | MD4 hash |
| --- | --- | --- |
| How are you doing? | ec3db42c7f1e7ed8b05e55506e18a5cd06216b097b029da89105e3983ddd2689 | f7fde94d44f655ae56b40c7a1430f756 |
| What are you doing? | 5abff00c28a3f309d13738779a878490f0152be63e95d03a143d97c364b77fa4 | 7800b65d8345d9a6ac75235014b4cf32 |
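A comparison like this can also be run locally. The sketch below hashes the two source strings with SHA-256 only, since MD4 is often unavailable in current `hashlib` builds. It assumes the strings are encoded as UTF-8 with no trailing newline; whether the printed values match the online tool's output depends on the tool hashing exactly the same bytes. The names `first`, `second`, and `differing` are illustrative.

```python
import hashlib

first = "How are you doing?".encode("utf-8")
second = "What are you doing?".encode("utf-8")

hash_first = hashlib.sha256(first).hexdigest()
hash_second = hashlib.sha256(second).hexdigest()

print(hash_first)
print(hash_second)

# Count hex positions where the two hash values differ. Similar inputs
# are expected to produce hash values that look unrelated.
differing = sum(a != b for a, b in zip(hash_first, hash_second))
print(f"{differing} of {len(hash_first)} hex digits differ")
```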
Q4: Is hashing subject to ‘man-in-the-middle attacks’? Explain how.
Q4a: What is a man-in-the-middle attack?
In a man-in-the-middle (MITM) attack, an adversary sits between two communicating parties and can eavesdrop on, modify, or inject messages, potentially undermining confidentiality and integrity.
Q4b: How would a man-in-the-middle attack be launched against keyless hashing?
In a keyless hashing scenario (where there is no secret key to authenticate the hash), a MITM attacker can:
Intercept the original data and its hash hv1 as sent by the writer.
Modify the data and compute a new hash hv1' over the tampered data.
Replace hv1 with hv1' and forward the tampered data together with hv1' to the recipient, so that when the recipient computes hv2 on the tampered data, hv2 matches hv1'.
Consequently, the recipient is deceived into accepting the tampered data as valid without detecting tampering.
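The attack can be simulated in a few lines. This is a sketch only: SHA-256 stands in for $H$, and the functions `writer_send`, `mitm_tamper`, and `recipient_accepts` are hypothetical names that model the three parties, with a message represented as a `(data, hash)` pair.

```python
import hashlib


def H(data: bytes) -> str:
    """Keyless hash: anyone, including the attacker, can compute it."""
    return hashlib.sha256(data).hexdigest()


def writer_send(data: bytes) -> tuple[bytes, str]:
    """The writer sends the data together with hv1 = H(data)."""
    return data, H(data)


def mitm_tamper(message: tuple[bytes, str], tampered: bytes) -> tuple[bytes, str]:
    """The attacker discards the original pair and sends (tampered, hv1')."""
    _original_data, _hv1 = message
    return tampered, H(tampered)


def recipient_accepts(message: tuple[bytes, str]) -> bool:
    """The recipient recomputes hv2 and compares it with the received hash."""
    data, received_hash = message
    return H(data) == received_hash


original = writer_send(b"Pay Alice 10 dollars")
forged = mitm_tamper(original, b"Pay Mallory 10000 dollars")

print(recipient_accepts(original))  # True
print(recipient_accepts(forged))    # True: the tampering goes undetected

# If the attacker changed only the data, the check would catch it.
data_only = (b"Pay Mallory 10000 dollars", original[1])
print(recipient_accepts(data_only))  # False
```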
Practical takeaway: Without an authentication mechanism (e.g., an HMAC with a secret key, or a digital signature), hashing alone does not protect against a MITM attack, because the attacker can alter both the data and its hash.
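Recommended defense: Authenticate the hash. Use an HMAC computed with a secret key shared by the writer and the accessor, or a digital signature made with the writer's private key, so that an attacker who lacks the key cannot produce a valid value for tampered data. This provides authenticity in addition to integrity.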
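A minimal sketch of the HMAC variant, using Python's standard `hmac` and `hashlib` modules, is shown below. It assumes the writer and the accessor already share `key` over some secure channel, which is outside the scope of this example; the function names `make_tag` and `verify_tag` are illustrative.

```python
import hashlib
import hmac
import secrets

# Assumption: this key is known only to the writer and the accessor.
key = secrets.token_bytes(32)


def make_tag(key: bytes, data: bytes) -> bytes:
    """Compute HMAC-SHA256 over the data with the shared secret key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def verify_tag(key: bytes, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(make_tag(key, data), tag)


data = b"Pay Alice 10 dollars"
tag = make_tag(key, data)
print(verify_tag(key, data, tag))  # True

# The attacker can still recompute a plain hash of the tampered data,
# but without the key that value is not a valid tag.
tampered = b"Pay Mallory 10000 dollars"
forged_tag = hashlib.sha256(tampered).digest()
print(verify_tag(key, tampered, forged_tag))  # False

# Reusing the original tag with tampered data fails as well.
print(verify_tag(key, tampered, tag))  # False
```

`hmac.compare_digest` is used instead of `==` so that the comparison time does not reveal how many leading bytes of a forged tag were correct.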