Cryptography and System Security - Hashes Message Digests and Digital Certificates

Hashes, Message Digests and Digital Certificates

Cryptographic Hash Functions

A hash function maps digital data of arbitrary size to digital data of a fixed size. The values returned are called hash values, hash codes, or simply hashes.

$H = h(m)$

Where:

$h$ is a hash function.
$m$ is a message.
$H$ is the authentication tag (message digest).

It is considered practically impossible to invert a cryptographic hash function.

Data Integrity

Cryptographic hash functions ensure data integrity. For example, Bob can verify if $H = h(m)$ .

Alice sends Bob the message and hash $(m, H)$ .

Properties of Cryptographic Hash Functions

Computationally infeasible to find data mapping to specific hash (one-way property).
Computationally infeasible to find two data to the same hash (collision-free property).

Properties of Hash Functions

Ideal cryptographic hash functions have these properties:

Easy to compute the hash value for any given message.
Infeasible to generate a message from its hash (one-way commitment).
Infeasible to modify a message without changing the hash.
Infeasible to find two different messages with the same hash.

These properties ensure that a malicious adversary cannot replace or modify the input data without changing its digest. If two strings have the same digest, one can be very confident that they are identical.

Requirements

Compression: The output length of $H = h(m)$ is fixed and small, regardless of the size of input $m$ .
Efficiency: It must be easy to compute $h(m)$ for any input $m$ . Computational effort grows with the length of $m$ , but should not grow too fast.

Security Properties

A cryptographic hash function must withstand known cryptanalytic attacks and have the following properties:

Pre-image resistance: Given a hash value $H$ , it should be difficult to find any message $m$ such that $H = h(m)$ . This is related to a one-way function.
Second pre-image resistance (Weak collision resistance): Given $m1$ and $h(m1)$ , it’s infeasible to find $m2$ with $m2 ≠ m1$ , such that $h(m2) = h(m1)$ .
Strong Collision resistance: It should be difficult to find two different messages $m1$ and $m2$ , with $m1 ≠ m2$ , such that $h(m1) = h(m2)$ . Such a pair is called a cryptographic hash collision.

Hash Function Uses

Message Integrity Check (MIC): Send the hash of the message (digest). MIC always encrypts the message optionally.
Message Authentication Code (MAC): Send a keyed hash of the message. The message can be optionally encrypted.
Digital Signature (non-repudiation): Encrypt the hash with a private (signing) key and verify with a public (verification) key.

Message Authentication Code (MAC)

Message authentication is achieved using a message authentication code (MAC), also known as a keyed hash function. MACs are used between two parties that share a secret key to authenticate information exchanged between those parties.

A MAC function takes as input a secret key and a data block, producing a hash value (MAC) associated with the protected message. An attacker altering the message will be unable to alter the associated MAC value without knowing the secret key.

$E(K, H(M))$ is a function of a variable-length message $M$ and a secret key $K$ , producing a fixed-size output secure against an opponent who does not know the secret key.

Hash Functions & Message Authentication

Symmetric Key - Unkeyed Hash

a) Message encrypted
b) Message unencrypted

Symmetric Key - Keyed Hash

c) Message unencrypted
d) Message encrypted

Digital Signatures

The operation of a digital signature is similar to that of the MAC. The hash value of a message is encrypted with a user’s private key. Anyone who knows the user’s public key can verify the integrity of the message associated with the digital signature. An attacker needs to know the private key to alter the message.

The hash code is encrypted using public-key encryption with the sender’s private key, providing authentication and a digital signature. Only the sender could have produced the encrypted hash code.

If confidentiality and a digital signature are desired, the message plus the private-key-encrypted hash code can be encrypted using a symmetric secret key.

Hash Functions & Digital Signatures - PKCS

(a) Source A sends message $M$ to Destination B with digital signature. The hash $H$ of message $M$ is encrypted with source A's private key ( $PR$ ) and sent along with the message. Destination B decrypts the encrypted hash with source A's public key ( $PU$ ) and compares it with the hash of the received message. $E(PR, H(M))$

(b) Source A sends message $M$ encrypted with symmetric key $K$ along with the hash $H$ of message $M$ encrypted with source A's private key ( $PR$ ). Destination B decrypts the symmetric key encrypted data using the symmetric key and decrypts the encrypted hash with source A's public key ( $PU$ ) and compares it with the hash of the received message. $E(K, [M || E(PR, H(M))])$

Other Hash Function Uses

Information security applications:
- Verifying the integrity of files or messages
- Password verification: store the hash of the password, not the actual password
- Proof-of-work
- File or data identifier
- For intrusion detection and virus detection: keep & check hash of files on system
Pseudorandom function (PRF):
- Generate session keys
- Produce key from password
- Derive keys from master key cooperatively
Pseudorandom number generator (PRNG):
- Vernam Cipher/OTP
- S/Key, proof of “what you have” via messages
To create a one-way password file:
- Store hash of password not actual password (e.g., Unix, Windows NT, etc.)
- Salt to deter pre-computation attacks
  *Rainbow tables (is a precomputed table for reversing cryptographic hash functions, usually for cracking password hashes. Tables are usually used in recovering a password (or credit card numbers, etc.) up to a certain length consisting of a limited set of characters.
For intrusion detection and virus detection:
- Keep & check hash of files on system (e.g., Tripwire - Is a free software security and data integrity tool for monitoring and alerting on specific file change(s) on a range of systems.)

Applications

Information security applications:
- Authentication: Digital signatures, Message authentication codes (MACs).
- Ordinary hash functions: to index data in hash tables.
- Fingerprinting: to detect duplicate data or uniquely identify files.
- Checksums: to detect accidental data corruption.
- Cryptographic hash values are sometimes called (digital) fingerprints, checksums, or just hash values

Hash Function Requirements

Requirement	Description
Variable input size	$H$ can be applied to a block of data of any size.
Fixed output size	$H$ produces a fixed-length output.
Efficiency	$H(x)$ is relatively easy to compute for any given $x$ , making both hardware and software implementations practical.
Preimage resistant (one-way property)	For any given hash value $h$ , it is computationally infeasible to find $y$ such that $H(y) = h$ .
Second preimage resistant	For any given block $x$ , it is computationally infeasible to find $y != x$ with $H(y) = H(x)$ .
Collision resistant	It is computationally infeasible to find any pair $(x, y)$ such that $H(x) = H(y)$ .
Pseudorandomness	Output of $H$ meets standard tests for pseudorandomness

Attacks on Hash Functions

Brute-force attacks: simplest method to gain access by trying various combinations of usernames and passwords.
Cryptanalysis: attacks exploit some property of the algorithm so faster than exhaustive search.
- A preimage or second preimage attack: Find $y$ such that $H(y)$ equals a given hash value.
- Collision resistance: Find two messages $x$ & $y$ with the same hash so $H(x) = H(y)$ . Hence value $2^{m/2}$ determines strength of hash code against brute-force attacks. 128-bits inadequate, 160-bits suspect.

Birthday Attacks

The probability that all 23 people have different birthdays is

$1 \times (1-\frac{1}{365}) \times (1-\frac{2}{365}) \dots (1-\frac{22}{365}) = 0.493$

Therefore, the probability of at least two having the same birthday is $1 - 0.493 = 0.507$

More generally, suppose we have $N$ objects, where $N$ is large. There are $r$ people, and each chooses an object. Then

$P(\text{there is a match}) = 1 - e^{-r^2/2N} \approx 1 - \frac{r}{N}$

Choosing $r^2/2N = ln2$ , we find that if $r ≈ 1.177 \sqrt{N}$ , then the probability is 50% that at least two people choose the same object.

If there are $N$ possibilities and we have a list of length $\sqrt{N}$ , then there is a good chance of a match.

If we want to increase the chance of a match, we can make a list of length of a constant times $\sqrt{N}$ .

Hash Function Cryptanalysis

Cryptanalytic attacks exploit some property of the algorithm, making them faster than exhaustive search. Hash functions use iterative structure to process messages in blocks (including length). Attacks focus on collisions in function $f$ .

Block Ciphers as Hash Functions

Can use block ciphers as hash functions. Using $H = 0$ and zero-pad of the final block. Compute:

$Hi = E{Mi}[H{i-1}]$

and use the final block as the hash value. The resulting hash is too small (64-bit), both due to direct birthday attack and “meet-in-the-middle” attack. Other variants are also susceptible to attack.

Block cipher key length $B$
Pad Message $M$ to a multiple of $B$
Break padded $M$ into $L$ blocks
$L = |M|/B$
$M = M1 M2 … M_L$
Use blocks of $M$ as keys in the block cipher, iteratively encrypting the state value starting with constant $H_0$ , resulting in a hash value.
$H = HL = E(ML,….E(M2,E(M1,H_0))…)$

MD5: Message Digest

Designed by Ronald Rivest in 1991.
Latest in a series of MD2, MD4.
It takes an input of arbitrary length.
Produces a 128-bit message digest.
The input message is processed in 512-bit blocks.
MD5 has become a popular hashing tool through PHP.
The most widely used hash algorithm is MD5.
MD5 is not collision-resistant.
MD5 hash is a 32-digit hexadecimal number.
The MD5 algorithm is intended for digital signature applications, where a large file must be "compressed" securely before being encrypted with a private (secret) key under a public-key cryptosystem such as RSA.
Long messages need an integrity check before encryption.

MD5 Algorithm Steps

Append padding bits
Append length
Initialize MD buffer
Process messages in 512-bit (16 words of 32-bit) blocks
Output (message digest)

MD5 Implementation Steps

Step 1: Append padding bits
- The input message is