Lecture 1 Notes: Intro to Crypto and Cryptocurrencies

Cryptocurrencies

Most of the slides used are derived from "Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction", Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller & Steven Goldfeder, 2016, Princeton University Press.

This lecture covers:

  • Background on cryptographic primitives useful for building "crypto"currencies.

  • Hash functions.

  • Properties of hash functions we are interested in.

  • Applications that make use of these properties.

  • Data structures that use hash functions.

  • Digital signatures.

  • Public Keys and Identities.

  • Two simple cryptocurrencies (precursors to Bitcoin).

    • Goofycoin.

    • Scroogecoin.

Cryptographic Hash Functions

A hash function is a mathematical function that:

  • Maps items (values) in the domain to items (values) in the range.

  • Converts any digital data into an output string with a fixed number of characters.

  • Hashing is the one-way act of converting the data (called a message) into the output (called the hash).

  • Inputs (or items in the domain) can be of any size (not fixed); in practice the input size is bounded, though the bound is enormous (SHA-256, for example, accepts messages of fewer than $2^{64}$ bits).

  • Outputs (or items in the range) are fixed-size (we’ll generally employ a hash function such as SHA-256 that has an output size of 256 bits).

  • Efficiently computable, i.e., the mapping should be efficiently (in polynomial time in terms of the input size) computable.
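As a small illustration of the fixed-size output, the following Python sketch hashes a 1-byte input and a 1,000,000-byte input with SHA-256 from the standard library's hashlib module. The helper name sha256_hex is made up for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")              # 1-byte input
long_digest = sha256_hex(b"a" * 1_000_000)   # 1,000,000-byte input

# Each hex character encodes 4 bits, so both digests are 256 bits long.
print(len(short_digest) * 4, len(long_digest) * 4)
```

Both digests have the same length even though the inputs differ in size by six orders of magnitude.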

Cryptographically Secure Hash Functions

In cryptography (and cryptography based applications such as bitcoins), we are interested in a special type of hash function, often referred to as cryptographically secure hash functions.

A cryptographically secure hash function satisfies the following additional security properties:

  • Collision Resistance.

  • Hiding or Pre-image Resistance.

  • Puzzle-friendliness.

Property 1: Collision Resistance

  • Strong Collision Resistance:

    • A hash function is said to have strong collision resistance if it is computationally infeasible to find any two distinct inputs that produce the same hash output.

    • In other words, it is extremely difficult to find any collision at all. Strong collision resistance provides a high level of security against adversaries attempting to find collisions.

  • Weak Collision Resistance (or second pre-image resistance):

    • A hash function is considered to have weak collision resistance if, given an input message, it is computationally infeasible to find any other input that produces the same hash output as the original message.

    • In this case, the original message is known, and the goal is to find a different input that hashes to the same value. Weak collision resistance is generally easier to achieve compared to strong collision resistance.

Collision Resistance: Equations

  • Strong collision resistance: Infeasible to find x and y such that $x \neq y$ and $H(x) = H(y)$

  • Weak collision resistance: Given x and $H(x)$, it is infeasible to find $y \neq x$ such that $H(y) = H(x)$

Existence of Collisions

  • For all hash functions (cryptographically secure or not), collisions exist.

  • For cryptographically secure hash functions, it is difficult to find collisions.

Finding Collisions in Secure Hash Functions

  • How to find a collision in a Secure Hash function with a 256-bit output?

    • Strategy 1: Brute force. Keep picking random inputs and computing their hashes until two of them collide.

      • How long does this take?

        • Worst case: $2^{256} + 1$ inputs (by the pigeonhole principle, at least two of them must share an output).

        • On average: by the birthday paradox, the chance of a collision reaches about 50% after on the order of $2^{128}$ inputs (roughly $1.2 \times 2^{128}$).

        • More than 99.8% chance of a collision after $2^{130}$ randomly chosen inputs.

      • Brute force always finds a collision, no matter what H is. However, it takes too long to matter ($2^{128}$ is a lot of tries!).

    • Strategy 2: Find cryptographic or other weaknesses in hash functions.

      • Is the following function cryptographically secure: $H(x) = x \bmod 2^3$? Yes/No? Why?

      • Hash functions once believed to be secure can turn out to have weaknesses. E.g., MD5 was considered secure until, after many years of research, collisions were found. SHA-256 (a currently used secure hash function) has no known practical collision attacks, but we don’t know that it is secure! No hash function has been proven to be collision resistant!
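Strategy 1 can be tried out on a deliberately weakened hash. The sketch below truncates SHA-256 to its first 24 bits (an assumption made only so that the search finishes quickly) and hashes the counters 0, 1, 2, ... until two of them collide. The names truncated_hash and find_collision are made up for this example. With a 24-bit output, the birthday paradox suggests a collision after a few thousand tries, far fewer than the worst case of 2^24 + 1.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """Return the first `bits` bits of SHA-256(data) as an integer."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int):
    """Hash 0, 1, 2, ... until two inputs share a truncated hash."""
    seen = {}  # truncated hash -> first input that produced it
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            # Return the two colliding inputs and the number of inputs tried.
            return seen[h], msg, i + 1
        seen[h] = msg

a, b, tries = find_collision(24)
print(a, b, tries)
```

The same search against the full 256-bit output would need on the order of 2^128 tries, which is why brute force does not matter in practice.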

Example Calculation

$2^{256} = 115792089237316195423570985008687907853269984665640564039457584007913129639936$

Since $3 < 2^{256}$, the remainder is simply:

$3 \bmod 2^{256} = 3$

$2^{256} \bmod 2^{256} = 0$

So:

$(3 + 2^{256}) \bmod 2^{256} = (3 \bmod 2^{256} + 2^{256} \bmod 2^{256}) \bmod 2^{256} = (3 + 0) \bmod 2^{256} = 3$

The distinct inputs $3$ and $3 + 2^{256}$ therefore leave the same remainder modulo $2^{256}$, so they collide under any hash of the form $H(x) = x \bmod 2^{256}$.
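The arithmetic above can be checked directly in Python, whose integers have arbitrary precision. The function toy_hash is a hypothetical stand-in for a hash of the form H(x) = x mod 2^bits; it only keeps the low bits of its input, so collisions are trivial to construct.

```python
def toy_hash(x: int, bits: int = 256) -> int:
    """Toy (insecure) hash: keep only the low `bits` bits of x."""
    return x % (2 ** bits)

x = 3
y = 3 + 2 ** 256

# Two distinct inputs with the same output: a collision.
print(x != y, toy_hash(x) == toy_hash(y))
```

Adding any multiple of 2^bits to an input yields another colliding input.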

Application of Collision-resistance: Hash as Message Digest

  • Message digest: Output of a hash function is also called a message digest.

  • Now, if H is a secure hash function and we know $H(x) = H(y)$, is it safe to assume that $x = y$? Why?

    • Yes, for all practical purposes! If $x \neq y$, then x and y would form a collision, and finding one is infeasible when H is collision resistant.

  • Application of message digests?

    • Verify integrity of large files.

    • To verify integrity, rather than comparing files just compare hashes or message digests!

    • Useful because message digest or hash is smaller compared to inputs!

  • A hash serves as a fixed-length digest, or unambiguous summary, of a message!
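A minimal sketch of the integrity check described above, using in-memory byte strings in place of real files. The names digest and is_intact are made up for this example, and the published digest is assumed to have been obtained through a trusted channel.

```python
import hashlib

def digest(data: bytes) -> str:
    """Message digest of `data` (SHA-256, hex encoded)."""
    return hashlib.sha256(data).hexdigest()

def is_intact(received: bytes, expected_digest: str) -> bool:
    """Compare digests instead of comparing the full contents."""
    return digest(received) == expected_digest

original = b"large file contents " * 10_000
published_digest = digest(original)      # only 64 hex characters to remember

received_ok = bytes(original)            # an unmodified copy
received_bad = original[:-1] + b"!"      # the last byte was changed

print(is_intact(received_ok, published_digest))
print(is_intact(received_bad, published_digest))
```

Only the short digest has to be stored or transmitted reliably; the file itself can come from an untrusted source.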

Property 2: Hiding

  • Pre-image Resistance:

    • A hash function is said to have pre-image resistance if it is computationally infeasible to find any input that maps to a specific hash output.

    • In other words, given a hash value, it should be difficult to determine an input that produced that hash. Intuitively, the hash output should not give away the input.

Hiding or Pre-image Resistance

  • Given $H(x)$, it is infeasible to find x.

  • The property cannot hold in the form stated. Consider a simple example: x is either “heads” or “tails”. Given $H(x)$, it is easy to find x: compute H(“heads”) and H(“tails”) and see which one matches. Why? The input x is not spread out (not uniformly distributed over a large set): $P(x = \text{“heads”}) = 0.5$, $P(x = \text{“tails”}) = 0.5$, and $P(\text{any other } x) = 0$!
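The heads/tails example can be sketched in a few lines: when the set of plausible inputs is tiny, an attacker simply hashes every candidate and compares. The function names H and recover_input are made up for this example.

```python
import hashlib

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def recover_input(target: str, candidates):
    """Hash every candidate input; return the one matching `target`, if any."""
    for guess in candidates:
        if H(guess) == target:
            return guess
    return None

observed = H(b"heads")  # the attacker sees only this hash value
recovered = recover_input(observed, [b"heads", b"tails"])
print(recovered)
```

The hash function itself is not broken here; the input distribution is simply too predictable.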

Achieving Hiding or Pre-image Resistance

  • Combine the input x which is not spread out (or not uniformly distributed) with another input r which is spread out or uniformly distributed!

Hiding Property (More Precisely)

  • If r is chosen from a probability distribution that has high min-entropy (the possible outcomes of the random variable are difficult to predict or guess), then given $H(r \mid x)$, where | denotes concatenation, it is infeasible to find x.

  • Min-entropy: The min-entropy of a probability distribution captures how predictable an outcome is. High min-entropy means that even the most likely outcome has only a negligible probability.

  • Let r be a random variable which takes values from the set $\{1, 2, 3, \ldots, n\}$, where n is very large, such as $2^{256}$, according to some probability distribution function f.

    • If f has high min-entropy -> the distribution of the value of r is “very spread out”.

    • For example, if f is uniform, r takes each value 1,2,…,n with probability exactly 1/n. If n is very large (e.g., $2^{256}$), the probability of correctly guessing which value it took is very small (negligible)!
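To make min-entropy concrete: for a distribution with outcome probabilities p_1, ..., p_n, it is commonly defined as -log2(max p_i), i.e., it depends only on the most likely outcome. The sketch below computes it for three small distributions; the function name min_entropy and the example distributions are chosen for illustration only.

```python
import math

def min_entropy(probabilities) -> float:
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    return -math.log2(max(probabilities))

coin = [0.5, 0.5]                        # "heads" or "tails"
uniform_256 = [1 / 256] * 256            # uniform over 256 values
skewed = [0.9] + [0.1 / 255] * 255       # 256 values, but one dominates

print(min_entropy(coin))         # 1 bit: a single guess succeeds half the time
print(min_entropy(uniform_256))  # 8 bits
print(min_entropy(skewed))       # well under 1 bit despite 256 possible values
```

A uniformly random 256-bit nonce has 256 bits of min-entropy, which is why guessing it is considered infeasible.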

Application of Hiding: Commitments

  • Commit to a value, reveal it later.

  • Analogy: Want to “seal a value in an envelope”, and “open the envelope” later.

  • A commitment scheme consists of two algorithms:

    • com := commit(msg, nonce)

    • match := verify(com, msg, nonce)

  • We require that the following two security properties hold:

    • Hiding: Given com, it is infeasible to find msg.

    • Binding: It is infeasible to find two pairs (msg, nonce) and (msg’, nonce’) such that msg != msg’ and commit(msg, nonce) == commit(msg’, nonce’)

      • For a hash-based commitment, binding follows from collision resistance: it is computationally infeasible to find two different inputs that produce the same hash output.

Commitment API

  • com := commit(msg, nonce)

    • The commit function takes a message and a secret random value, called a nonce, as input and returns a commitment.

  • match := verify(com, msg, nonce)

    • The verify function takes a commitment, nonce, and message as input. It returns true if com == commit(msg, nonce) and false otherwise.

  • To seal msg in envelope:

    1. Select a random nonce and keep it secret.

    2. Compute the commitment com := commit(msg, nonce).

    3. Publish com.

  • To open envelope:

    1. Publish nonce, msg.

    2. Anyone can use verify() to check validity of msg and the previously published com.

Security Properties of Commitment APIs

  • com := commit(msg, nonce)

  • match := verify(com, msg, nonce)

  • Security properties:

    • Hiding: Given com (and if nonce is chosen from a distribution with high min-entropy), infeasible to find msg.

    • Binding: Infeasible to find <msg’, nonce’> != <msg, nonce> such that: verify(commit(msg, nonce), msg’, nonce’) == true

  • So, how to implement a commitment scheme such that these two properties hold?

Implementing Commitment APIs using Hash Functions

  • commit(msg, nonce) := H(nonce | msg) where nonce is a random 256-bit value & H is a cryptographically secure hash function

  • verify(com, msg, nonce) := (H(nonce | msg) == com )

  • Security properties:

    • Hiding: Given H(nonce | msg), infeasible to find msg.

      • Hiding or pre-image resistant property of secure Hash function

    • Binding: Infeasible to find <msg’, nonce’> != <msg, nonce> such that H(nonce’ | msg’) == H(nonce | msg)

      • Collision-resistance property of secure Hash function
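The hash-based scheme above can be sketched in Python as follows. This is a minimal illustration, assuming SHA-256 as H and a fixed-length 32-byte nonce placed before the message, so the concatenation nonce | msg can be split in only one way.

```python
import hashlib
import hmac
import secrets

def commit(msg: bytes, nonce: bytes) -> bytes:
    """com := H(nonce | msg), with a fixed-length 32-byte nonce."""
    return hashlib.sha256(nonce + msg).digest()

def verify(com: bytes, msg: bytes, nonce: bytes) -> bool:
    """Return True if and only if com == commit(msg, nonce)."""
    return hmac.compare_digest(commit(msg, nonce), com)

# Seal the envelope: pick a random 256-bit nonce, keep it secret, publish com.
nonce = secrets.token_bytes(32)
com = commit(b"heads", nonce)

# Open the envelope: publish (msg, nonce); anyone can run verify().
print(verify(com, b"heads", nonce))
print(verify(com, b"tails", nonce))
```

Without the nonce, the commitment to "heads" would be the same on every run and could be recognized by hashing the two candidate messages.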

Property 3: Puzzle-friendliness

Puzzle-friendliness property: For every possible n-bit output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that $H(k \mid x) = y$ in time significantly less than $2^n$.

Intuition: If you want to target a Hash function H to have a particular output value y, and if part of the input (i.e., k) is chosen in a suitably randomized fashion, then it’s very difficult to find the other part of the input x to exactly hit the target output value (y).

Application of Puzzle-friendliness: Search Puzzle

  • What is a search puzzle?

    • Given: A “puzzle ID” id (from a high min-entropy distribution), and a target set Y.

    • Objective: Try to find a “solution” x such that $H(id \mid x) \in Y$.

  • Puzzle-friendly property implies that no solving strategy is much better than trying random values of x.

  • Strength of the puzzle depends on the size of Y

    • $Y = \{y_1\}$: most time-consuming!

    • $Y = \{y_1, y_2, y_3\}$

    • $Y = \{y_1, y_2, y_3, \ldots, y_{2^{256}}\}$: trivial!
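A search puzzle can be sketched as below. As an assumption for this example, the target set Y is taken to be all 256-bit outputs whose first difficulty_bits bits are zero, so a larger difficulty_bits means a smaller Y and a harder puzzle. The function names and the 8-byte encoding of x are also choices made here, not part of the notes.

```python
import hashlib
import secrets
from itertools import count

def in_target_set(digest: bytes, difficulty_bits: int) -> bool:
    """Y = all 256-bit outputs whose first `difficulty_bits` bits are zero."""
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

def puzzle_hash(puzzle_id: bytes, x: int) -> bytes:
    """Compute H(id | x), encoding x as 8 big-endian bytes."""
    return hashlib.sha256(puzzle_id + x.to_bytes(8, "big")).digest()

def solve(puzzle_id: bytes, difficulty_bits: int) -> int:
    """Try x = 0, 1, 2, ... until H(id | x) lands in Y."""
    for x in count():
        if in_target_set(puzzle_hash(puzzle_id, x), difficulty_bits):
            return x

puzzle_id = secrets.token_bytes(32)  # puzzle ID with high min-entropy
x = solve(puzzle_id, 16)             # expect roughly 2**16 tries
print(x, puzzle_hash(puzzle_id, x).hex())
```

Checking a claimed solution takes a single hash evaluation, while finding one takes about 2^difficulty_bits evaluations on average.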

Construction of Hash Functions

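Hash functions that accept arbitrary-length inputs are typically constructed from fixed-length-input compression functions!

Example: See the construction of the SHA-256 hash function, which is the hash function used in Bitcoin.

This construction is also referred to as the Merkle-Damgård transform.

Theorem: If the underlying compression function c is collision resistant, then SHA-256 is collision resistant.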
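The shape of the Merkle-Damgård transform can be sketched as follows. This is a toy, not SHA-256: the compression function below is a stand-in built from hashlib (real SHA-256 defines its own compression function and initial value), and the names compress, pad, md_hash, BLOCK and IV are assumptions made for this example. It only shows how a fixed-size compression step (768 bits in, 256 bits out) is chained over a padded message.

```python
import hashlib

BLOCK = 64        # message block size in bytes (512 bits)
IV = bytes(32)    # toy initial chaining value (256 bits of zeros)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in compression function c: 32 + 64 bytes in, 32 bytes out."""
    assert len(state) == 32 and len(block) == BLOCK
    return hashlib.sha256(state + block).digest()

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 8-byte message length in bits."""
    length = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK)  # zero fill up to a block boundary
    return padded + length

def md_hash(message: bytes) -> bytes:
    """Chain the compression function over the padded message blocks."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK):
        state = compress(state, padded[i:i + BLOCK])
    return state

print(md_hash(b"abc").hex())
```

Including the message length in the padding is what lets a collision in the full hash be traced back to a collision in the compression function.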

Hash Pointers and Data Structures

  • Hash pointer is:

    • Pointer to where some info/data is stored, and

    • (Cryptographic) hash of the info

  • What can you do with a hash pointer?

    • Retrieve or get back the info/data

    • Verify that the info/data hasn’t changed

  • Use hash pointers to build data structures!

Block Chains

  • What is a Block Chain?

    • Linked list with hash pointers

  • What is it used for?

    • Tamper-evident log: data is appended at the end, and any later change to earlier data can be detected

Tamper-evident Log

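  • What is a tamper-evident log, and how is tampering detected?

    • Changing the data in any block changes that block’s hash, so it no longer matches the hash pointer stored in the next block. As long as the head hash pointer is stored safely, any tampering shows up as a mismatch somewhere along the chain.

  • How many verifications to prove that some data has not been tampered with (worst-case)?

    • O(n)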
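A minimal sketch of a block chain as a linked list with hash pointers, assuming SHA-256 and a Python list as the storage. The names Block, block_hash, append and verify_chain are made up for this example. Verification walks back from the trusted head hash to the first block, which is the O(n) check mentioned above.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Block:
    data: bytes
    prev_hash: Optional[bytes]  # hash pointer to the previous block (None for the first)

def block_hash(block: Block) -> bytes:
    """Hash the whole block: its data and its hash pointer."""
    prev = block.prev_hash or bytes(32)
    return hashlib.sha256(prev + block.data).digest()

def append(chain: List[Block], data: bytes) -> bytes:
    """Append a block and return the new head hash, to be stored safely."""
    prev_hash = block_hash(chain[-1]) if chain else None
    chain.append(Block(data, prev_hash))
    return block_hash(chain[-1])

def verify_chain(chain: List[Block], head_hash: Optional[bytes]) -> bool:
    """Walk from the head to the first block, checking every hash pointer."""
    expected = head_hash
    for block in reversed(chain):
        if block_hash(block) != expected:
            return False
        expected = block.prev_hash
    return expected is None

chain: List[Block] = []
head = None
for item in [b"entry 1", b"entry 2", b"entry 3"]:
    head = append(chain, item)

print(verify_chain(chain, head))   # untouched log
chain[0].data = b"entry 1 (edited)"
print(verify_chain(chain, head))   # tampering with the oldest entry is detected
```

To hide the edit, an attacker would have to change every later hash pointer as well, including the head hash that is stored out of reach.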

Merkle Tree

  • Binary tree with hash pointers!

  • Drawback:

    • Requires more blocks (internal nodes holding hash pointers, in addition to the data blocks)

  • Advantage:

    • Proving membership of a data block in the tree is easy

    • Only need to show O(log n) items

    • In other words, membership verification in O(log n) time/space

  • How to prove non-membership?

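    • Sorted Merkle trees: Order the leaves of the tree in some fashion, say lexicographically, numerically, etc.

    • Show membership of the two adjacent leaves that would come immediately before and after the missing item; since they are consecutive, nothing can lie between them!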

    • Non-membership verification also takes O(log n) time/space
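A membership proof in a Merkle tree can be sketched as below. The choices here are assumptions for illustration: SHA-256 as the hash, odd levels padded by repeating their last hash, and the names build_levels, prove and verify_membership. A production design would add further hardening that this sketch omits.

```python
import hashlib
from typing import List, Tuple

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves: List[bytes]) -> List[List[bytes]]:
    """Return all tree levels, from the leaf hashes up to the root."""
    levels = [[H(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        level = levels[-1]
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad an odd level by repeating its last hash
        levels.append([H(level[i] + level[i + 1]) for i in range(0, len(level), 2)])
    return levels

def prove(levels: List[List[bytes]], index: int) -> List[Tuple[bytes, bool]]:
    """Collect the sibling hash at each level: O(log n) items."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))  # (hash, sibling is on the left)
        index //= 2
    return proof

def verify_membership(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path from the leaf to the root using the proof."""
    node = H(leaf)
    for sibling, sibling_is_left in proof:
        node = H(sibling + node) if sibling_is_left else H(node + sibling)
    return node == root

leaves = [f"tx{i}".encode() for i in range(5)]
levels = build_levels(leaves)
root = levels[-1][0]

proof = prove(levels, 4)
print(len(proof), verify_membership(leaves[4], proof, root))
```

The verifier needs only the root hash, the data block, and the sibling hashes along one path, not the rest of the tree.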

Hash Pointers in Data Structures

Hash pointers can be used in any pointer-based data structure that has no cycles

Digital Signatures

  • Second cryptographic primitive (in addition to Hash functions) that we will need to build cryptocurrencies (and bitcoins)

  • What are the properties we need from digital signatures? – same as properties we need from handwritten signatures

    • Only you can sign, but anyone can verify (security)

    • Signature tied to a particular document - can’t be cut-and-pasted to another document (unforgeability)

Digital Signatures APIs

  • (sk, pk) := generateKeys(keysize)

    • sk: secret signing key

    • pk: public verification key

  • sig := sign(sk, message)

    • generateKeys and sign can be randomized algorithms

  • isValid := verify(pk, message, sig)

    • verify is a deterministic algorithm

Digital Signatures Requirements

  • Valid signatures must always verify correctly

    • i.e., verify(pk, message, sign(sk, message)) == true

    • Basic property for signatures to be useful!

  • Signatures should be existentially unforgeable -> can’t forge signatures

    • i.e., adversary who knows pk and gets to see signatures on messages of his choice, still can’t produce a verifiable signature on another message

    • Can be formalized by means of the unforgeability game described next
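To make the API concrete without external libraries, the sketch below implements it with a Lamport one-time signature built from SHA-256. This is a swapped-in scheme chosen because it needs only a hash function; it is not ECDSA, and each key pair may safely sign only one message. The function names mirror the API above, but the code is an illustration under those assumptions.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys():
    """sk: 256 pairs of random secrets; pk: the hashes of those secrets."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def _bits(message: bytes):
    """The 256 bits of Hash(message), so messages of any length can be signed."""
    digest = int.from_bytes(H(message), "big")
    return [(digest >> i) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret per digest bit (one-time use only!)."""
    return [sk[i][bit] for i, bit in enumerate(_bits(message))]

def verify(pk, message: bytes, sig) -> bool:
    """Deterministic check: each revealed secret must hash to the public value."""
    return len(sig) == 256 and all(
        H(sig[i]) == pk[i][bit] for i, bit in enumerate(_bits(message))
    )

sk, pk = generate_keys()
sig = sign(sk, b"pay 1 coin to Alice")
print(verify(pk, b"pay 1 coin to Alice", sig))
print(verify(pk, b"pay 100 coins to Mallory", sig))
```

The first check illustrates that valid signatures verify; the second shows a signature failing on a different message.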

Unforgeability Game

  • Challenger generates key pair (sk, pk) and gives pk to the attacker

  • Attacker requests signatures on messages m0, m1, …

  • Challenger returns signatures sign(sk, m0), sign(sk, m1), …

  • Attacker outputs a message M and signature sig

  • Attacker wins if verify(pk, M, sig) == true and M is not in {m0, m1, …}

  • Signature scheme is unforgeable if and only if, no matter what algorithm the adversary is using, his chance of successfully generating a valid M, sig is negligibly small!

Digital Signatures - Practical Concerns

  • Many digital signature algorithms are randomized (esp. ones used in cryptocurrencies)

    • Need good source of randomness

    • Bad randomness -> Even an otherwise secure signature algorithm is not secure

  • Signature algorithms have fixed sized inputs -> How to sign large messages (whose size is greater than input size of the algorithm)?

    • Use Hash(message) rather than message

  • How to sign the entire Block chain?

    • Sign the hash pointer of the head block!

    • This signature “covers” the whole block chain structure

Digital Signatures used by Bitcoins

  • Bitcoin uses Elliptic Curve Digital Signature Algorithm (ECDSA) standard

    • ECDSA is a US Government standard

    • Bitcoin uses ECDSA over the standard elliptic curve secp256k1 -> this curve is rarely used outside Bitcoin

    • Provides 128 bits of security (equivalent to performing $2^{128}$ symmetric encryptions)

      • Private key – 256 bits

      • Public key compressed – 257 bits

      • Message to be signed – 256 bits

      • Signature – 512 bits

    • Technical details of ECDSA will be skipped here

    • Good randomness is essential for ECDSA

      • If you foul this up in generateKeys() or sign() -> you probably leaked your private key

Public Keys as Identities

  • Can we use public key pk, as generated before by generateKeys(keysize), as an identity?

    • For example, if you see signature sig such that verify(pk, msg, sig)==true, think of it as: pk says, “[msg]”.

  • But, pk by itself cannot be used as an identity!

    • To “speak for” pk, you must know matching secret key sk

Identities in a Cryptocurrency

  • How to make a new identity?

    • Create a new, random key-pair (sk, pk)

    • pk is the public “name” you can use [usually better to use H(pk)]

    • sk lets you “speak for” the identity

    • You control the identity pk, because only you know sk

    • Even if pk “looks random” that’s fine, nobody needs to know your real identity for the cryptocurrency application

    • Just as when spending an actual currency note

Decentralized Identity Management

  • Anybody can make a new identity at any time, and make as many as they want!

  • No central point of coordination

  • These identities are called “addresses” in Bitcoin

Privacy

  • Addresses not directly connected to real-world identity

  • But observer can link together an address’s activity over time, make inferences

  • Later: a whole chapter on privacy in Bitcoins …

Simple Cryptocurrencies

GoofyCoin

  • Coin Creation:

    • Goofy can create new coins signed by pkGoofy

    • CreateCoin [uniqueCoinID]

    • New coins belong to me.

  • Spending Coins:

    • A coin’s owner can spend it.

      • CreateCoin [uniqueCoinID] signed by pkGoofy

      • Pay to pkAlice : H( ) signed by pkGoofy

      • Alice owns it now.

    • The recipient can pass on the coin again.

      • CreateCoin [uniqueCoinID] signed by pkGoofy

      • Pay to pkAlice : H( ) signed by pkGoofy

      • Pay to pkBob : H( ) signed by pkAlice

      • Bob owns it now.

    • Double Spending Attack:

      • CreateCoin [uniqueCoinID] signed by pkGoofy

      • Pay to pkAlice : H( ) signed by pkGoofy

      • Pay to pkBob : H( ) signed by pkAlice

      • Pay to pkChuck : H( ) signed by pkAlice

    • Double-spending attack

      • The main design challenge in digital currency

ScroogeCoin

  • Scrooge publishes a history of all transactions (a block chain, signed by Scrooge)

    • Optimization: put multiple transactions in the same block

  • CreateCoins transaction creates new coins

    • transID: 73

    • type:CreateCoins

    • coins created

      • num: 0, value: 3.2, recipient: 0x…

        • coinID: 73(0)

      • num: 1, value: 1.4, recipient: 0x…

        • coinID: 73(1)

      • num: 2, value: 7.1, recipient: 0x…

        • coinID: 73(2)

    • Valid, because I said so.

  • PayCoins transaction consumes (and destroys) some coins, and creates new coins of the same total value

    • transID: 73

    • type:PayCoins

    • coins created

      • num: 0, value: 3.2, recipient: 0x…

      • num: 1, value: 1.4, recipient: 0x…

      • num: 2, value: 7.1, recipient: 0x…

    • consumed coinIDs: 68(1), 42(0), 72(3)

    • signatures

    • Valid if:

      • consumed coins valid,

      • not already consumed,

      • total value out = total value in, and

      • signed by owners of all consumed coins

  • Immutable coins

    • Coins can’t be transferred, subdivided, or combined.

    • How to overcome this problem?

      • You can get the effect of dividing coins by using transactions to subdivide: create new transaction to consume your coin and pay out two new coins to yourself

  • Advantages?

    • Prevents double spending

    • Makes sure all transactions are valid, i.e., each coin is consumed only once

  • Problems?

    • Scrooge has too much influence

    • He cannot spend other people’s coins (can’t create fake transactions) – why?

    • However, he can stop endorsing other people’s transactions and deny them service

    • If he is greedy, he may also ask for a service/transaction fee for endorsing every transaction

    • Also, as Scrooge is in charge of creating new coins, he can create as many for himself as he wants

    • Last, he can get “bored” of the system and just stop updating the block chain altogether!
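The PayCoins validity rules listed earlier in this section can be sketched as a small ledger check. Several simplifications are assumed here: owners are plain strings standing in for public keys, signature verification is abstracted into a set of owners whose signatures are taken to have verified, values are integers in some smallest unit, and the class and field names are made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

CoinID = Tuple[int, int]  # (transID, num): coin 73(0) is written (73, 0)

@dataclass
class Coin:
    value: int   # in some smallest unit, to avoid floating-point rounding
    owner: str   # stand-in for the recipient's public key

@dataclass
class PayCoins:
    trans_id: int
    consumed: List[CoinID]
    created: List[Coin]
    signers: Set[str]  # owners whose signatures on this transaction verified

@dataclass
class Ledger:
    coins: Dict[CoinID, Coin] = field(default_factory=dict)
    spent: Set[CoinID] = field(default_factory=set)

    def is_valid(self, tx: PayCoins) -> bool:
        if len(set(tx.consumed)) != len(tx.consumed):
            return False  # the same coin is listed twice
        if any(c not in self.coins for c in tx.consumed):
            return False  # consumed coins must be valid (exist in the ledger)
        if any(c in self.spent for c in tx.consumed):
            return False  # not already consumed (no double spending)
        value_in = sum(self.coins[c].value for c in tx.consumed)
        value_out = sum(coin.value for coin in tx.created)
        if value_in != value_out:
            return False  # total value out = total value in
        owners = {self.coins[c].owner for c in tx.consumed}
        return owners <= tx.signers  # signed by owners of all consumed coins

    def apply(self, tx: PayCoins) -> bool:
        """Record tx if it is valid; return whether it was accepted."""
        if not self.is_valid(tx):
            return False
        self.spent.update(tx.consumed)
        for num, coin in enumerate(tx.created):
            self.coins[(tx.trans_id, num)] = coin
        return True

ledger = Ledger(coins={(73, 0): Coin(32, "alice")})
pay_bob = PayCoins(74, [(73, 0)], [Coin(20, "bob"), Coin(12, "alice")], {"alice"})
double_spend = PayCoins(75, [(73, 0)], [Coin(32, "chuck")], {"alice"})

print(ledger.apply(pay_bob))       # accepted: Alice pays Bob and gets change
print(ledger.apply(double_spend))  # rejected: coin 73(0) was already consumed
```

The first transaction also shows how a coin is effectively subdivided: one coin is consumed and two new coins of the same total value are created.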

Descroogifying the Currency

Crucial question:

Can we descroogify the currency, and operate without any central, trusted party?