Lecture Notes on Digital Signature Schemes and Cryptographic Hash Functions

Problems with Signature Schemes

  • Basic Model:
    • Input file divided into blocks. Each block is a plaintext element PP. The file is a sequence of elements x<em>1,x</em>2,,xnx<em>1, x</em>2, …, x_n.
    • Each x<em>ix<em>i is encrypted separately to create y</em>iy</em>i.
    • The same model is used for signature schemes, leading to problems.
  • Problems
    • Length of the signature.
      • RSA Signature Scheme
        • Size of plaintext space is the same as the size of the signature space (both are ZnZ_n).
        • If one plaintext element is 512 bits, the signature is also 512 bits.
        • If the input file is 10 megabytes, the signature is also 10 megabytes.
        • The signed file becomes 20 megabytes.
      • Elgamal Signature Scheme
        • Plaintext space is Z<em>p<em>Z<em>p^<em>, but the signature space is Z</em>p</em>×Zp1Z</em>p^</em> \times Z_{p-1} .
        • The length of the signature is approximately two times the size of the message.
        • If the file is 10 megabytes, the signature is 20 megabytes, and the signed file becomes 30 megabytes.
        • This requires a lot of bandwidth to transmit the file, which is inefficient.
    • Amount of computation.
      • The time required to create the signature is too long.
      • Public key cryptosystems are slow.
      • Hybrid encryption is used to solve this problem in encryption, but signature creation still relies on public key cryptosystems.
      • The problem is exacerbated because a signature must be created for each block of the plaintext.
    • Cut and Paste Attack.
      • If each plaintext block is signed separately, an attacker can reassemble the components to create a different message.
      • The attacker can move signed components around.
      • The signature remains verifiable on each component, so the sender cannot deny the altered message.
  • Example of Cut and Paste Attack
    • A note saying, "I owe you $100 and not $1,000,000 as you said."
    • The attacker has the signature on "$100", and the signature on "$1,000,000."
    • The attacker can rearrange the message to read, "I owe you $1,000,000 and not $100 as you said."
  • Solution: Hash Functions
    • Using hash functions solves all three problems.

Cryptographic Hash Functions

  • Definition
    • A hash function is a function that maps an arbitrary-length message to a fixed-size output.
    • Denoted as hh.
    • The co-domain is typically a bit string of a specific length, e.g., 160 bits as in the Digital Signature Standard.
    • h:0,10,1160h: {0, 1}^* \rightarrow {0, 1}^{160}, where 0,10, 1^* represents an arbitrary string of arbitrary length.
  • The New Model
    • Instead of signing each plaintext element separately, the entire message is hashed first.
    • The output of the hash function is called the message digest (z).
    • The message digest is then signed, and the signature is attached to the message.
    • Alice creates her signature on the message digest z and attaches the signature y to the message x.
    • The pair x, y is transmitted over the channel
  • Signing and Verification Process
    • Alice takes a long message $x$ and computes z=h(x)z = h(x).
    • Alice signs zz to get y=sig(z)y = sig(z).
    • Alice sends (x,y)(x, y).
    • Bob receives (x,y)(x, y).
      • Bob computes z=h(x)z' = h(x).
      • Bob verifies the signature yy on zz' using Alice's public key.
      • verify(pkA,z,y)==Trueverify(pk_A, z', y) == True
    • Bob computes z=h(x)z = h(x) because the hash function is public.
    • Bob checks if yy is a valid signature of Alice on the message digest zz.
  • Benefits of Using Hash Functions
    • Solves the problem of long signature lengths.
      • If a file is 10 megabytes long, hashing compresses it to a 160-bit message digest.
      • Using RSA, the signature is 160 bits, so the total file size is 10 megabytes + 160 bits.
      • Using Elgamal, the signature would be 320 bits (twice the message digest size), resulting in 10 megabytes + 320 bits.
      • The overhead is independent of the size of the original file.
    • Reduces computation time.
      • The signing algorithm is used only once on the message digest instead of on each plaintext block.
    • Prevents cut and paste attacks.
      • The entire message is signed holistically.
  • Attacks and Properties of Hash Functions
    • Hash functions must be carefully designed to avoid attacks.
    • A good cryptographic hash function should have specific properties to ensure security.

Security Properties of Cryptographic Hash Functions

  • Infinite Domain to Finite Range
    • Hash functions map an infinite domain and codomain.
    • Because the domain is larger than the co-domain, collisions are unavoidable.
  • First Attack: Second Preimage Problem
    • Goal: Given a hash function $h$ and a message x<em>1x<em>1, find another message x</em>2!=x<em>1x</em>2 != x<em>1 such that h(x</em>1)=h(x2)h(x</em>1) = h(x_2).
    • Attack Scenario:
      • Alice sends (x<em>1,y</em>1)(x<em>1, y</em>1), where y<em>1=sig(h(x</em>1))y<em>1 = sig(h(x</em>1)).
      • Oscar observes (x<em>1,y</em>1)(x<em>1, y</em>1).
      • Oscar computes x<em>2x<em>2 such that h(x</em>2)=h(x1)h(x</em>2) = h(x_1).
      • Oscar sends (x<em>2,y</em>1)(x<em>2, y</em>1) to Bob, posing as Alice.
      • Bob verifies the signature, and it appears valid because h(x<em>2)=h(x</em>1)h(x<em>2) = h(x</em>1).
    • Second Preimage Resistance
  • Definition: A hash function is second preimage resistant if, given hh and x<em>1x<em>1, it is computationally infeasible to find x</em>2!=x<em>1x</em>2 != x<em>1 such that h(x</em>1)=h(x2)h(x</em>1) = h(x_2).
  • Second Attack: Collision Problem
    • Goal: Find two distinct messages x<em>1x<em>1 and x</em>2x</em>2 such that h(x<em>1)=h(x</em>2)h(x<em>1) = h(x</em>2).
    • Attack Scenario:
      • Oscar finds x<em>1x<em>1 and x</em>2x</em>2 such that h(x<em>1)=h(x</em>2)=zh(x<em>1) = h(x</em>2) = z.
      • Oscar convinces Alice to sign x<em>1x<em>1, resulting in (x</em>1,y)(x</em>1, y), where y=sig(h(x1))y = sig(h(x_1)).
      • Oscar sends (x2,y)(x_2, y) to Bob.
      • Bob verifies the signature, and it appears valid because h(x<em>2)=h(x</em>1)h(x<em>2) = h(x</em>1).
    • Collision Resistance:
      • Definition: A hash function is collision-resistant if it is computationally infeasible to find distinct messages x<em>1x<em>1 and x</em>2x</em>2 such that h(x<em>1)=h(x</em>2)h(x<em>1) = h(x</em>2).
  • Third Attack: Preimage Problem (One-Way Hash Function)
    • Exploits the ability to sign random messages in signature schemes.
    • Attack Scenario:
      • Oscar computes a signature yy and constructs a value zz such that verify(z,y)==Trueverify(z, y) == True.
      • Oscar finds a message xx such that h(x)=zh(x) = z.
      • Oscar sends (x,y)(x, y) to Bob.
      • Bob verifies the signature, and it appears valid because h(x)=zh(x) = z.
    • Preimage Resistance (One-Way Hash Function):
      • Definition: A hash function is preimage resistant (one-way) if, given hh and zz, it is computationally infeasible to find xx such that h(x)=zh(x) = z.
      • Computing h(x)h(x) should be easy, but computing xx given h(x)h(x) should be hard.
        Implications and Relationships:
    • Collision resistance is the most important property. If a hash function is collision-resistant, it is also one-way and second-preimage resistant.
      • If a hash function is not second preimage resistant, it is not collision resistant.
      • If it is easy to find a secondary image, then you have found a collision.

Hash Function Constructions and Standards

  • Provably Secure Hash Functions
    • In 1991, David Chaum, Juan Hastad, and Bridget Pfitzmann created a provably secure hash function.
    • Assuming the discrete log problem is difficult, their hash function is collision-resistant.
    • The construction is theoretical and not used in practice.
  • Merkle-Damgård Construction
    • Uses a compression function to compress a long bit string to a short string.
    • Iterates the compression function to create a hash function from an infinite domain.
      MD Series
    • Ron Rivest created a sequence of algorithms: MD0, MD1, MD2, MD3, MD4, and MD5.
    • MD4 (1990) and MD5 (1992) are real hash functions.
  • Secure Hash Algorithm (SHA) Series:
    • NIST created the Secure Hash Algorithm (SHA) standard in 1993.
    • SHA-0:
      • Based on MD4.
      • Vulnerabilities were found in MD4 and MD5.
    • SHA-1:
      • Collisions were found in 2017.
    • SHA-2:
      • Multiple sizes: 224, 256, 384, and 512 bits.
      • Not broken, but uses the same principles as SHA-1.
    • SHA-3:
      • Became a standard in 2015.
      • Based on the sponge construction.
      • Absorbing and squeezing phases.