Lecture Notes on Digital Signature Schemes and Cryptographic Hash Functions

Problems with Signature Schemes

Basic Model:
- Input file divided into blocks. Each block is a plaintext element $P$ . The file is a sequence of elements $x1, x2, …, x_n$ .
- Each $xi$ is encrypted separately to create $yi$ .
- The same model is used for signature schemes, leading to problems.
Problems
- Length of the signature.
 - RSA Signature Scheme
 - Size of plaintext space is the same as the size of the signature space (both are $Z_n$ ).
 - If one plaintext element is 512 bits, the signature is also 512 bits.
 - If the input file is 10 megabytes, the signature is also 10 megabytes.
 - The signed file becomes 20 megabytes.
 - Elgamal Signature Scheme
 - Plaintext space is $Zp^$ , but the signature space is $Zp^ \times Z_{p-1}$ .
 - The length of the signature is approximately two times the size of the message.
 - If the file is 10 megabytes, the signature is 20 megabytes, and the signed file becomes 30 megabytes.
 - This requires a lot of bandwidth to transmit the file, which is inefficient.
- Amount of computation.
 - The time required to create the signature is too long.
 - Public key cryptosystems are slow.
 - Hybrid encryption is used to solve this problem in encryption, but signature creation still relies on public key cryptosystems.
 - The problem is exacerbated because a signature must be created for each block of the plaintext.
- Cut and Paste Attack.
 - If each plaintext block is signed separately, an attacker can reassemble the components to create a different message.
 - The attacker can move signed components around.
 - The signature remains verifiable on each component, so the sender cannot deny the altered message.
Example of Cut and Paste Attack
- A note saying, "I owe you $100 and not $1,000,000 as you said."
- The attacker has the signature on "$100", and the signature on "$1,000,000."
- The attacker can rearrange the message to read, "I owe you $1,000,000 and not $100 as you said."
Solution: Hash Functions
- Using hash functions solves all three problems.

Cryptographic Hash Functions

Definition
- A hash function is a function that maps an arbitrary-length message to a fixed-size output.
- Denoted as $h$ .
- The co-domain is typically a bit string of a specific length, e.g., 160 bits as in the Digital Signature Standard.
- $h: {0, 1}^* \rightarrow {0, 1}^{160}$ , where $0, 1^*$ represents an arbitrary string of arbitrary length.
The New Model
- Instead of signing each plaintext element separately, the entire message is hashed first.
- The output of the hash function is called the message digest (z).
- The message digest is then signed, and the signature is attached to the message.
- Alice creates her signature on the message digest z and attaches the signature y to the message x.
- The pair x, y is transmitted over the channel
Signing and Verification Process
- Alice takes a long message $x$ and computes $z = h(x)$ .
- Alice signs $z$ to get $y = sig(z)$ .
- Alice sends $(x, y)$ .
- Bob receives $(x, y)$ .
  - Bob computes $z' = h(x)$ .
  - Bob verifies the signature $y$ on $z'$ using Alice's public key.
  - $verify(pk_A, z', y) == True$
- Bob computes $z = h(x)$ because the hash function is public.
- Bob checks if $y$ is a valid signature of Alice on the message digest $z$ .
Benefits of Using Hash Functions
- Solves the problem of long signature lengths.
  - If a file is 10 megabytes long, hashing compresses it to a 160-bit message digest.
  - Using RSA, the signature is 160 bits, so the total file size is 10 megabytes + 160 bits.
  - Using Elgamal, the signature would be 320 bits (twice the message digest size), resulting in 10 megabytes + 320 bits.
  - The overhead is independent of the size of the original file.
- Reduces computation time.
  - The signing algorithm is used only once on the message digest instead of on each plaintext block.
- Prevents cut and paste attacks.
  - The entire message is signed holistically.
Attacks and Properties of Hash Functions
- Hash functions must be carefully designed to avoid attacks.
- A good cryptographic hash function should have specific properties to ensure security.

Security Properties of Cryptographic Hash Functions

Infinite Domain to Finite Range
- Hash functions map an infinite domain and codomain.
- Because the domain is larger than the co-domain, collisions are unavoidable.
First Attack: Second Preimage Problem
- Goal: Given a hash function $h$ and a message $x1$ , find another message $x2 != x1$ such that $h(x1) = h(x_2)$ .
- Attack Scenario:
 - Alice sends $(x1, y1)$ , where $y1 = sig(h(x1))$ .
 - Oscar observes $(x1, y1)$ .
 - Oscar computes $x2$ such that $h(x2) = h(x_1)$ .
 - Oscar sends $(x2, y1)$ to Bob, posing as Alice.
 - Bob verifies the signature, and it appears valid because $h(x2) = h(x1)$ .
- Second Preimage Resistance
Definition: A hash function is second preimage resistant if, given $h$ and $x1$ , it is computationally infeasible to find $x2 != x1$ such that $h(x1) = h(x_2)$ .
Second Attack: Collision Problem
- Goal: Find two distinct messages $x1$ and $x2$ such that $h(x1) = h(x2)$ .
- Attack Scenario:
 - Oscar finds $x1$ and $x2$ such that $h(x1) = h(x2) = z$ .
 - Oscar convinces Alice to sign $x1$ , resulting in $(x1, y)$ , where $y = sig(h(x_1))$ .
 - Oscar sends $(x_2, y)$ to Bob.
 - Bob verifies the signature, and it appears valid because $h(x2) = h(x1)$ .
- Collision Resistance:
 - Definition: A hash function is collision-resistant if it is computationally infeasible to find distinct messages $x1$ and $x2$ such that $h(x1) = h(x2)$ .
Third Attack: Preimage Problem (One-Way Hash Function)
- Exploits the ability to sign random messages in signature schemes.
- Attack Scenario:
  - Oscar computes a signature $y$ and constructs a value $z$ such that $verify(z, y) == True$ .
  - Oscar finds a message $x$ such that $h(x) = z$ .
  - Oscar sends $(x, y)$ to Bob.
  - Bob verifies the signature, and it appears valid because $h(x) = z$ .
- Preimage Resistance (One-Way Hash Function):
  - Definition: A hash function is preimage resistant (one-way) if, given $h$ and $z$ , it is computationally infeasible to find $x$ such that $h(x) = z$ .
  - Computing $h(x)$ should be easy, but computing $x$ given $h(x)$ should be hard.
    Implications and Relationships:
- Collision resistance is the most important property. If a hash function is collision-resistant, it is also one-way and second-preimage resistant.
  - If a hash function is not second preimage resistant, it is not collision resistant.
  - If it is easy to find a secondary image, then you have found a collision.

Hash Function Constructions and Standards

Provably Secure Hash Functions
- In 1991, David Chaum, Juan Hastad, and Bridget Pfitzmann created a provably secure hash function.
- Assuming the discrete log problem is difficult, their hash function is collision-resistant.
- The construction is theoretical and not used in practice.
Merkle-Damgård Construction
- Uses a compression function to compress a long bit string to a short string.
- Iterates the compression function to create a hash function from an infinite domain.
  MD Series
- Ron Rivest created a sequence of algorithms: MD0, MD1, MD2, MD3, MD4, and MD5.
- MD4 (1990) and MD5 (1992) are real hash functions.
Secure Hash Algorithm (SHA) Series:
- NIST created the Secure Hash Algorithm (SHA) standard in 1993.
- SHA-0:
  - Based on MD4.
  - Vulnerabilities were found in MD4 and MD5.
- SHA-1:
  - Collisions were found in 2017.
- SHA-2:
  - Multiple sizes: 224, 256, 384, and 512 bits.
  - Not broken, but uses the same principles as SHA-1.
- SHA-3:
  - Became a standard in 2015.
  - Based on the sponge construction.
  - Absorbing and squeezing phases.