Cryptography and System Security - Hashes Message Digests and Digital Certificates

Hashes, Message Digests and Digital Certificates

Cryptographic Hash Functions

A hash function maps digital data of arbitrary size to digital data of a fixed size. The values returned are called hash values, hash codes, or simply hashes.

H=h(m)H = h(m)

Where:

  • hh is a hash function.

  • mm is a message.

  • HH is the authentication tag (message digest).

It is considered practically impossible to invert a cryptographic hash function.

Data Integrity

Cryptographic hash functions ensure data integrity. For example, Bob can verify if H=h(m)H = h(m).

Alice sends Bob the message and hash (m,H)(m, H).

Properties of Cryptographic Hash Functions
  1. Computationally infeasible to find data mapping to specific hash (one-way property).

  2. Computationally infeasible to find two data to the same hash (collision-free property).

Properties of Hash Functions

Ideal cryptographic hash functions have these properties:

  1. Easy to compute the hash value for any given message.

  2. Infeasible to generate a message from its hash (one-way commitment).

  3. Infeasible to modify a message without changing the hash.

  4. Infeasible to find two different messages with the same hash.

These properties ensure that a malicious adversary cannot replace or modify the input data without changing its digest. If two strings have the same digest, one can be very confident that they are identical.

Requirements
  • Compression: The output length of H=h(m)H = h(m) is fixed and small, regardless of the size of input mm.

  • Efficiency: It must be easy to compute h(m)h(m) for any input mm. Computational effort grows with the length of mm, but should not grow too fast.

Security Properties

A cryptographic hash function must withstand known cryptanalytic attacks and have the following properties:

  • Pre-image resistance: Given a hash value HH, it should be difficult to find any message mm such that H=h(m)H = h(m). This is related to a one-way function.

  • Second pre-image resistance (Weak collision resistance): Given m1m1 and h(m1)h(m1), it’s infeasible to find m2m2 with m2m1m2 ≠ m1, such that h(m2)=h(m1)h(m2) = h(m1).

  • Strong Collision resistance: It should be difficult to find two different messages m1m1 and m2m2, with m1m2m1 ≠ m2, such that h(m1)=h(m2)h(m1) = h(m2). Such a pair is called a cryptographic hash collision.

Hash Function Uses

  • Message Integrity Check (MIC): Send the hash of the message (digest). MIC always encrypts the message optionally.

  • Message Authentication Code (MAC): Send a keyed hash of the message. The message can be optionally encrypted.

  • Digital Signature (non-repudiation): Encrypt the hash with a private (signing) key and verify with a public (verification) key.

Message Authentication Code (MAC)

Message authentication is achieved using a message authentication code (MAC), also known as a keyed hash function. MACs are used between two parties that share a secret key to authenticate information exchanged between those parties.

A MAC function takes as input a secret key and a data block, producing a hash value (MAC) associated with the protected message. An attacker altering the message will be unable to alter the associated MAC value without knowing the secret key.

E(K,H(M))E(K, H(M)) is a function of a variable-length message MM and a secret key KK, producing a fixed-size output secure against an opponent who does not know the secret key.

Hash Functions & Message Authentication

Symmetric Key - Unkeyed Hash

a) Message encrypted
b) Message unencrypted

Symmetric Key - Keyed Hash

c) Message unencrypted
d) Message encrypted

Digital Signatures

The operation of a digital signature is similar to that of the MAC. The hash value of a message is encrypted with a user’s private key. Anyone who knows the user’s public key can verify the integrity of the message associated with the digital signature. An attacker needs to know the private key to alter the message.

The hash code is encrypted using public-key encryption with the sender’s private key, providing authentication and a digital signature. Only the sender could have produced the encrypted hash code.

If confidentiality and a digital signature are desired, the message plus the private-key-encrypted hash code can be encrypted using a symmetric secret key.

Hash Functions & Digital Signatures - PKCS

(a) Source A sends message MM to Destination B with digital signature. The hash HH of message MM is encrypted with source A's private key (PRPR) and sent along with the message. Destination B decrypts the encrypted hash with source A's public key (PUPU) and compares it with the hash of the received message.E(PR,H(M))E(PR, H(M))

(b) Source A sends message MM encrypted with symmetric key KK along with the hash HH of message MM encrypted with source A's private key (PRPR). Destination B decrypts the symmetric key encrypted data using the symmetric key and decrypts the encrypted hash with source A's public key (PUPU) and compares it with the hash of the received message.E(K,[ME(PR,H(M))])E(K, [M || E(PR, H(M))])

Other Hash Function Uses

  • Information security applications:

    • Verifying the integrity of files or messages

    • Password verification: store the hash of the password, not the actual password

    • Proof-of-work

    • File or data identifier

    • For intrusion detection and virus detection: keep & check hash of files on system

  • Pseudorandom function (PRF):

    • Generate session keys

    • Produce key from password

    • Derive keys from master key cooperatively

  • Pseudorandom number generator (PRNG):

    • Vernam Cipher/OTP

    • S/Key, proof of “what you have” via messages

  • To create a one-way password file:

    • Store hash of password not actual password (e.g., Unix, Windows NT, etc.)

    • Salt to deter pre-computation attacks
      *Rainbow tables (is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a password (or credit card numbers, etc.) up to a certain length consisting of a limited set of characters.

  • For intrusion detection and virus detection:

    • Keep & check hash of files on system (e.g., Tripwire - Is a free software security and data integrity tool for monitoring and alerting on specific file change(s) on a range of systems.)

Applications

  • Information security applications:

    • Authentication: Digital signatures, Message authentication codes (MACs).

    • Ordinary hash functions: to index data in hash tables.

    • Fingerprinting: to detect duplicate data or uniquely identify files.

    • Checksums: to detect accidental data corruption.

    • Cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values

Hash Function Requirements

Requirement

Description

Variable input size

HH can be applied to a block of data of any size.

Fixed output size

HH produces a fixed-length output.

Efficiency

H(x)H(x) is relatively easy to compute for any given xx, making both hardware and software implementations practical.

Preimage resistant (one-way property)

For any given hash value hh, it is computationally infeasible to find yy such that H(y)=hH(y) = h.

Second preimage resistant

For any given block xx, it is computationally infeasible to find y!=xy != x with H(y)=H(x)H(y) = H(x).

Collision resistant

It is computationally infeasible to find any pair (x,y)(x, y) such that H(x)=H(y)H(x) = H(y).

Pseudorandomness

Output of HH meets standard tests for pseudorandomness

Attacks on Hash Functions

  • Brute-force attacks: simplest method to gain access by trying various combinations of usernames and passwords.

  • Cryptanalysis: attacks exploit some property of the algorithm so faster than exhaustive search.

    • A preimage or second preimage attack: Find yy such that H(y)H(y) equals a given hash value.

    • Collision resistance: Find two messages xx & yy with the same hash so H(x)=H(y)H(x) = H(y). Hence value 2m/22^{m/2} determines strength of hash code against brute-force attacks. 128-bits inadequate, 160-bits suspect.

Birthday Attacks

The probability that all 23 people have different birthdays is

1×(11365)×(12365)(122365)=0.4931 \times (1-\frac{1}{365}) \times (1-\frac{2}{365}) \dots (1-\frac{22}{365}) = 0.493

Therefore, the probability of at least two having the same birthday is 10.493=0.5071 - 0.493 = 0.507

More generally, suppose we have NN objects, where NN is large. There are rr people, and each chooses an object. Then

P(there is a match)=1er2/2N1rNP(\text{there is a match}) = 1 - e^{-r^2/2N} \approx 1 - \frac{r}{N}

Choosing r2/2N=ln2r^2/2N = ln2, we find that if r1.177Nr ≈ 1.177 \sqrt{N}, then the probability is 50% that at least two people choose the same object.

If there are NN possibilities and we have a list of length N\sqrt{N}, then there is a good chance of a match.

If we want to increase the chance of a match, we can make a list of length of a constant times N\sqrt{N}.

Hash Function Cryptanalysis

Cryptanalytic attacks exploit some property of the algorithm, making them faster than exhaustive search. Hash functions use iterative structure to process messages in blocks (including length). Attacks focus on collisions in function ff.

Block Ciphers as Hash Functions

Can use block ciphers as hash functions. Using H=0H = 0 and zero-pad of the final block. Compute:

H<em>i=E</em>M<em>i[H</em>i1]H<em>i = E</em>{M<em>i}[H</em>{i-1}]

and use the final block as the hash value. The resulting hash is too small (64-bit), both due to direct birthday attack and “meet-in-the-middle” attack. Other variants are also susceptible to attack.

  • Block cipher key length BB

  • Pad Message MM to a multiple of BB

  • Break padded MM into LL blocks

  • L=M/BL = |M|/B

  • M=M<em>1M</em>2MLM = M<em>1 M</em>2 … M_L

  • Use blocks of MM as keys in the block cipher, iteratively encrypting the state value starting with constant H0H_0, resulting in a hash value.

  • H=H<em>L=E(M</em>L,.E(M<em>2,E(M</em>1,H0)))H = H<em>L = E(M</em>L,….E(M<em>2,E(M</em>1,H_0))…)

MD5: Message Digest

  • Designed by Ronald Rivest in 1991.

  • Latest in a series of MD2, MD4.

  • It takes an input of arbitrary length.

  • Produces a 128-bit message digest.

  • The input message is processed in 512-bit blocks.

  • MD5 has become a popular hashing tool through PHP.

  • The most widely used hash algorithm is MD5.

  • MD5 is not collision-resistant.

  • MD5 hash is a 32-digit hexadecimal number.

  • The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" securely before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.

  • Long messages need an integrity check before encryption.

MD5 Algorithm Steps
  1. Append padding bits

  2. Append length

  3. Initialize MD buffer

  4. Process messages in 512-bit (16 words of 32-bit) blocks

  5. Output (message digest)

MD5 Implementation Steps
  • Step 1: Append padding bits

    • The input message is

/