Hashing and Data Integrity Notes (CSCI 3341)

Q1: What is hashing?

  • Hashing applies a hash function (e.g., SHA1(), MD5(), SHA256(), …) that takes an input of any length as a sequence of bytes and returns a fixed number of bytes, called the hash value.

    • SHA1() returns 20 bytes (160 bits).

    • MD4() and MD5() return 16 bytes (128 bits).

    • SHA256() returns 32 bytes (256 bits).
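The digest sizes listed above can be checked with Python's standard `hashlib` module. This is a minimal sketch assuming Python 3. MD4 is not guaranteed to be available (it depends on the OpenSSL build Python links against, and OpenSSL 3 disables it by default), so the sketch reports MD4 only if `hashlib.new("md4")` succeeds. The function name `digest_sizes` is illustrative.

```python
import hashlib


def digest_sizes() -> dict:
    """Return the digest size in bytes for each algorithm available here."""
    sizes = {
        "SHA1": hashlib.sha1().digest_size,      # 20 bytes
        "MD5": hashlib.md5().digest_size,        # 16 bytes
        "SHA256": hashlib.sha256().digest_size,  # 32 bytes
    }
    try:
        # Assumption: MD4 is provided only by some OpenSSL builds.
        sizes["MD4"] = hashlib.new("md4").digest_size  # 16 bytes
    except ValueError:
        pass  # MD4 unsupported on this system; skip it
    return sizes


if __name__ == "__main__":
    for name, size in digest_sizes().items():
        print(f"{name}: {size} bytes ({size * 8} bits)")
```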

Q2: What is ‘data integrity’?

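  • Data integrity is the assurance that data is correct and unaltered: the data retrieved is exactly the data that was originally written, with no accidental corruption or deliberate tampering.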

Q3: How is hashing used to provide data integrity?

  • Step 1: The writer hashes the data to produce a hash value hv1.

    • In notation: $hv_1 = H(\text{data})$, where $H$ is the hashing function.

  • Step 2: The writer saves both the data and hv1 on the disk.

  • Step 3: The accessor retrieves the data and the hash value hv1 from the disk.

  • Step 4: The accessor hashes the retrieved data to produce hv2.

    • In notation: $hv_2 = H(\text{retrieved-data})$.

  • Step 5: If hv1 == hv2, the accessor assumes that the retrieved data is correct; otherwise, it concludes that the data has been corrupted and aborts.

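  • Note: The strength of a hashing function depends largely on the size of its hash value: a brute-force (birthday) search for two inputs with the same $n$-bit hash takes on the order of $2^{n/2}$ hash computations. For example, SHA1() (160 bits, about $2^{80}$) is not as strong as SHA256() (256 bits, about $2^{128}$). Size is not the whole story, though: practical collision attacks exploiting design weaknesses are known for MD4, MD5, and SHA-1.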

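  • Example comparison, computed with an online tool (https://www.tools4noobs.com/online_tools/hash/): the two source strings differ by a single word, yet their hashes are completely different. Each hex character encodes 4 bits, so a SHA-256 hash is 64 hex characters (32 bytes) and an MD4 hash is 32 hex characters (16 bytes).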

    • Source data: "How are you doing?"

      • SHA-256: ec3db42c7f1e7ed8b05e55506e18a5cd06216b097b029da89105e3983ddd2689

      • MD4: f7fde94d44f655ae56b40c7a1430f756

    • Source data: "What are you doing?"

      • SHA-256: 5abff00c28a3f309d13738779a878490f0152be63e95d03a143d97c364b77fa4

      • MD4: 7800b65d8345d9a6ac75235014b4cf32
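Steps 1 to 5 can be sketched in Python as follows. This is a minimal illustration, not a prescribed implementation: it assumes SHA-256 as $H$ and a hypothetical file layout in which hv1 is stored as hex text in a sidecar file next to the data. The names `H`, `hash_path`, `write_with_hash`, and `read_and_verify` are illustrative. The last lines hash the two sample strings from the example to show that a one-word change yields an unrelated digest of the same length.

```python
import hashlib
import tempfile
from pathlib import Path


def H(data: bytes) -> str:
    """The hashing function H from the notes (SHA-256 here), as a hex string."""
    return hashlib.sha256(data).hexdigest()


def hash_path(path: Path) -> Path:
    """Hypothetical layout: hv1 is stored next to the data as '<name>.sha256'."""
    return path.with_name(path.name + ".sha256")


def write_with_hash(path: Path, data: bytes) -> None:
    """Steps 1-2: the writer computes hv1 = H(data) and saves data and hv1."""
    hv1 = H(data)
    path.write_bytes(data)
    hash_path(path).write_text(hv1)


def read_and_verify(path: Path) -> bytes:
    """Steps 3-5: the accessor retrieves data and hv1, computes hv2, compares."""
    data = path.read_bytes()
    hv1 = hash_path(path).read_text()
    hv2 = H(data)
    if hv1 != hv2:
        raise ValueError("integrity check failed: hv1 != hv2")  # abort
    return data


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = Path(d) / "record.bin"
        write_with_hash(p, b"How are you doing?")
        print(read_and_verify(p))              # b'How are you doing?'
        p.write_bytes(b"What are you doing?")  # simulate corruption of the data only
        try:
            read_and_verify(p)
        except ValueError as err:
            print(err)                         # integrity check failed: hv1 != hv2

    # A one-word change gives an unrelated digest of the same length.
    a = H(b"How are you doing?")
    b = H(b"What are you doing?")
    print(len(a), len(b), a == b)              # 64 64 False
```

This detects corruption of the data alone. If both the data and the stored hash are rewritten consistently, the check still passes, which is the weakness examined in Q4.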

Q4: Is hashing subject to ‘man-in-the-middle attacks’? Explain how.

Q4a: What is a man-in-the-middle attack?

  • A man-in-the-middle (MITM) attack is one in which an adversary sits between two communicating parties and can eavesdrop on, modify, or inject messages, potentially undermining confidentiality and integrity.

Q4b: How would a man-in-the-middle attack be launched against keyless hashing?

  • In a keyless hashing scenario (where there is no secret key to authenticate the hash), a MITM attacker can:

    • Intercept the original data and its hash hv1 as sent by the writer.

    • Modify the data and recompute a new hash hv1' over the tampered data.

    • Replace the original data and hv1 with the tampered data and hv1' (in transit, or in the recipient's storage), so that when the recipient recomputes hv2 over the tampered data, hv2 matches hv1'.

    • Consequently, the recipient accepts the tampered data as valid, and the tampering goes undetected.

  • Practical takeaway: Without an authentication mechanism (e.g., an HMAC with a secret key, or a digital signature), hashing alone does not protect against MITM attacks, because an attacker can alter both the data and its hash.

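  • Recommended defense: Use a keyed or signed integrity check, such as an HMAC computed with a shared secret key or a digital signature made with the writer's private key. An attacker who lacks the key cannot produce a valid tag for tampered data, so the check provides authenticity as well as integrity.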
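The attack in Q4b and the HMAC defense can be contrasted in a short Python sketch using the standard-library `hashlib`, `hmac`, and `secrets` modules. It is a simplified model: it assumes the attacker can replace both the message and its tag in transit, and it assumes the writer and reader already share a secret key (secure key distribution is not modeled). The message `b"Pay Bob $100"` and the helpers `keyless_tag`, `hmac_tag`, and `tamper` are hypothetical.

```python
import hashlib
import hmac
import secrets


def keyless_tag(data: bytes) -> bytes:
    """Keyless integrity tag: just H(data). Anyone, including the attacker, can compute it."""
    return hashlib.sha256(data).digest()


def hmac_tag(key: bytes, data: bytes) -> bytes:
    """Keyed tag: HMAC-SHA256 over the data with a secret key."""
    return hmac.new(key, data, hashlib.sha256).digest()


def tamper(data: bytes) -> bytes:
    """Hypothetical attacker edit made in transit."""
    return data.replace(b"$100", b"$900")


if __name__ == "__main__":
    message = b"Pay Bob $100"

    # Keyless hashing: the attacker replaces the data AND recomputes the hash.
    tampered = tamper(message)
    forged_hv1 = keyless_tag(tampered)             # hv1' = H(tampered data)
    print(keyless_tag(tampered) == forged_hv1)     # True: tampering undetected

    # HMAC: writer and reader share a key the attacker does not know (assumed pre-shared).
    key = secrets.token_bytes(32)
    genuine_tag = hmac_tag(key, message)
    forged_tag = hmac_tag(secrets.token_bytes(32), tampered)  # attacker must guess a key
    # compare_digest compares tags without leaking timing information.
    print(hmac.compare_digest(hmac_tag(key, tampered), forged_tag))  # False: detected
    print(hmac.compare_digest(hmac_tag(key, message), genuine_tag))  # True: accepted
```

The only difference between the two scenarios is who can compute a valid tag: anyone can compute H(tampered data), but only holders of the key can compute the matching HMAC.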