Lecture 1 Notes: Intro to Crypto and Cryptocurrencies

Cryptocurrencies

Most of the slides used are derived from "Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction", Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller & Steven Goldfeder, 2016, Princeton University Press.


This lecture covers:

Background on cryptographic primitives useful for building "crypto"currencies.
Hash functions.
Properties of hash functions we are interested in.
Applications that make use of these properties.
Data structures that use hash functions.
Digital signatures.
Public Keys and Identities.
Two simple cryptocurrencies (precursors to Bitcoin).
- Goofycoin.
- Scroogecoin.

Cryptographic Hash Functions

A hash function is a mathematical function that:

Maps items (values) in the domain to items (values) in the range.
Converts any digital data into an output string with a fixed number of characters.
Hashing is the one-way act of converting the data (called a message) into the output (called the hash).
Inputs (or items in the domain) can be of any size (not fixed); technically there is an upper bound on the input size in practice, but it is far larger than any realistic input.
Outputs (or items in the range) are fixed-size (we’ll generally employ a hash function such as SHA-256 that has an output size of 256 bits).
Efficiently computable, i.e., the mapping should be efficiently (in polynomial time in terms of the input size) computable.
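
As a small illustration of the fixed-size output, here is a sketch using SHA-256 from Python's standard `hashlib` module. The helper name `sha256_hex` and the sample inputs are chosen only for this example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")               # 1-byte input
long_digest = sha256_hex(b"a" * 1_000_000)    # 1,000,000-byte input

# Inputs of very different sizes map to outputs of the same fixed size:
# 256 bits = 32 bytes = 64 hex characters.
print(len(short_digest), len(long_digest))    # 64 64
print(short_digest != long_digest)            # True
```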

Cryptographically Secure Hash Functions

In cryptography (and cryptography-based applications such as Bitcoin), we are interested in a special type of hash function, often referred to as a cryptographically secure hash function.

A cryptographically secure hash function satisfies the following additional security properties:

Collision Resistance.
Hiding or Pre-image Resistance.
Puzzle-friendliness.

Property 1: Collision Resistance

Strong Collision Resistance:
- A hash function is said to have strong collision resistance if it is computationally infeasible to find any two distinct inputs that produce the same hash output.
- In other words, it is extremely difficult to find any collision at all. Strong collision resistance provides a high level of security against adversaries attempting to find collisions.
Weak Collision Resistance (or second pre-image resistance):
- A hash function is considered to have weak collision resistance if, given an input message, it is computationally infeasible to find any other input that produces the same hash output as the original message.
- In this case, the original message is known, and the goal is to find a different input that hashes to the same value. Weak collision resistance is generally easier to achieve compared to strong collision resistance.

Collision Resistance: Equations

Strong collision resistance: Infeasible to find x and y such that $x \neq y$ and $H(x) = H(y)$
Weak collision resistance: Given x and $H(x)$ , it is infeasible to find $y \neq x$ such that $H(y) = H(x)$

Existence of Collisions

For all hash functions (cryptographically secure or not), collisions exist.
For cryptographically secure hash functions, it is difficult to find collisions.

Finding Collisions in Secure Hash Functions

How to find a collision in a secure hash function with a 256-bit output?
- Strategy 1: Brute force. Keep picking random inputs and computing their hashes until two of them collide.
  - How long does this take?
    - Worst case: $2^{256} + 1$ inputs (by the pigeonhole principle, two of them must share an output).
    - On average: roughly $2^{128}$ inputs (birthday paradox). The chance of a collision is about 39% after $2^{128}$ inputs and passes 50% at about $1.18 \times 2^{128}$ inputs.
    - More than 99.8% chance of collision after $2^{130}$ randomly chosen inputs.
  - Brute force always finds a collision, no matter what H is. However, it takes too long to matter ( $2^{128}$ is a lot of tries!).
- Strategy 2: Find cryptographic or other weaknesses in hash functions.
  - Is the following function cryptographically secure $H(x) = x \mod 2^3$ ? Yes/No? Why?
  - Hash functions that are believed to be secure may still have undiscovered weaknesses. E.g., MD5 was considered secure until, after many years of research, collisions were found. SHA-256 (a secure hash function in current use) has no known practical collision attack, but it has not been proven secure. No hash function has been proven to be collision resistant!
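
The birthday effect behind Strategy 1 can be observed on a deliberately weakened hash. The sketch below truncates SHA-256 to 24 bits, which is an assumption made purely so that the search finishes quickly; it also tries inputs in counter order instead of at random, which behaves the same way for a good hash. The names `truncated_hash` and `find_collision` are hypothetical helpers.

```python
import hashlib
import itertools

def truncated_hash(data: bytes, n_bits: int = 24) -> int:
    """Toy hash: the first n_bits of SHA-256 (n_bits must be a multiple of 8)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[: n_bits // 8], "big")

def find_collision(n_bits: int = 24):
    """Hash inputs 0, 1, 2, ... until two of them share a truncated hash."""
    seen = {}  # hash value -> input that produced it
    for counter in itertools.count():
        x = counter.to_bytes(8, "big")
        h = truncated_hash(x, n_bits)
        if h in seen:
            return seen[h], x, counter + 1
        seen[h] = x

x, y, tries = find_collision(24)
print(f"collision after {tries} inputs (2^12 = {2**12}, 2^24 = {2**24})")
```

The number of inputs tried is typically within a small factor of $2^{12}$, far below the $2^{24} + 1$ worst case. The same square-root scaling applied to a 256-bit output gives the $2^{128}$ figure.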

Example Calculation

$2^{256} = 115792089237316195423570985008687907853269984665640564039457584007913129639936$

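This calculation shows that, for $H(x) = x \mod 2^{256}$ , the distinct inputs $3$ and $3 + 2^{256}$ collide. Since $3 < 2^{256}$ , the remainder is simply: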

$3 \mod 2^{256} = 3$

$2^{256} \mod 2^{256} = 0$

So:

$(3 + 2^{256}) \mod 2^{256} = (3 \mod 2^{256} + 2^{256} \mod 2^{256}) \mod 2^{256} = (3 + 0) \mod 2^{256} = 3$
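
The arithmetic above can be checked directly with Python's arbitrary-precision integers. Reading the calculation as a collision for the function $H(x) = x \mod 2^{256}$ is an interpretation, and the names `N` and `H` are chosen only for this sketch.

```python
N = 2**256

def H(x: int) -> int:
    """The insecure example function H(x) = x mod 2^256."""
    return x % N

x, y = 3, 3 + N

# Two distinct inputs with the same output: a collision found with no search at all.
print(x != y, H(x) == H(y))  # True True
```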

Application of Collision-resistance: Hash as Message Digest

Message digest: Output of a hash function is also called a message digest.
Now, if H is a secure hash function and if we know $H(x) = H(y)$ , is it safe to assume that $x = y$ ? Why?
- Yes, in practice! If $x \neq y$ , the pair (x, y) would be a collision, and for a collision-resistant H nobody can feasibly find one.
Application of message digests?
- Verify integrity of large files.
- To verify integrity, rather than comparing files just compare hashes or message digests!
- Useful because message digest or hash is smaller compared to inputs!
A hash serves as a fixed-length digest, or unambiguous summary, of a message!
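
A minimal sketch of the integrity check described above, assuming SHA-256 as H. In-memory byte streams stand in for files so that the example is self-contained; `digest_stream` is a hypothetical helper name.

```python
import hashlib
import io

def digest_stream(stream, chunk_size: int = 64 * 1024) -> str:
    """Hash a file-like object in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    while chunk := stream.read(chunk_size):
        h.update(chunk)
    return h.hexdigest()

original = b"large file contents " * 100_000
trusted_digest = digest_stream(io.BytesIO(original))  # small value remembered earlier

downloaded_ok = io.BytesIO(original)
downloaded_bad = io.BytesIO(original[:-1] + b"!")      # last byte altered

print(digest_stream(downloaded_ok) == trusted_digest)   # True
print(digest_stream(downloaded_bad) == trusted_digest)  # False
```

Only the 32-byte digest has to be stored or transmitted over a trusted channel, not the roughly 2 MB of data.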

Property 2: Hiding

Pre-image Resistance:
- A hash function is said to have pre-image resistance if it is computationally infeasible to find any input that maps to a specific hash output.
- In other words, given a hash value, it should be difficult to determine an input that produced that hash. The hiding property asks that the hash output not help an adversary recover the input.

Hiding or Pre-image Resistance

Given $H(x)$ , it is infeasible to find x.
The property cannot be true in the form stated. Consider a simple example: x is either “heads” or “tails”. An adversary can compute H(“heads”) and H(“tails”) and compare them with the given output, so x is easy to find! Why? Input x is not spread out or uniformly distributed: $P(x=\text{“heads”}) = 0.5$ , $P(x=\text{“tails”}) = 0.5$ , $P(\text{all other x}) = 0$ !

Achieving Hiding or Pre-image Resistance

Combine the input x which is not spread out (or not uniformly distributed) with another input r which is spread out or uniformly distributed!
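
The following sketch shows both halves of this argument, assuming SHA-256 as H and byte concatenation as the way of combining r and x. The helper `guess_input` is hypothetical and simply tries every plausible candidate.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def guess_input(target: bytes, candidates):
    """Try to recover x from a hash by hashing every plausible candidate."""
    for c in candidates:
        if H(c) == target:
            return c
    return None

candidates = [b"heads", b"tails"]
x = b"tails"

# Without r: at most two hash evaluations recover x.
print(guess_input(H(x), candidates))      # b'tails'

# With a secret random 256-bit r in front of x, the same attack fails,
# because the attacker would also have to guess r.
r = secrets.token_bytes(32)
print(guess_input(H(r + x), candidates))  # None
```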

Hiding Property (More Precisely)

If r is chosen from a probability distribution that has high min-entropy (its outcomes are difficult to predict or guess), then given $H(r | x)$ , where | denotes concatenation, it is infeasible to find x.
Min-entropy: A measure of how predictable the outcome of a random variable is; high min-entropy means that no single value is likely.
Let r be a random variable which takes values from the set $\{1, 2, 3, \ldots, n\}$ , where n is very large such as $2^{256}$ , according to some probability distribution function f.
- If f has high min-entropy -> the distribution of the value of r is “very spread out”, with no value much more likely than the rest.
- For example, if f is uniform, r takes each value 1,2,…,n with probability exactly 1/n. If n is very large (e.g., $2^{256}$ ), the probability of correctly guessing which value it took is very small (negligible)!
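
For concreteness, min-entropy is usually defined as $-\log_2$ of the probability of the most likely outcome. The notes do not state this formula, so it is added here as the standard definition. A short sketch with made-up example distributions:

```python
import math

def min_entropy_bits(probabilities) -> float:
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    return -math.log2(max(probabilities))

coin = [0.5, 0.5]                      # x in {"heads", "tails"}
uniform_1024 = [1 / 1024] * 1024       # uniform over 1024 values
skewed = [0.9] + [0.1 / 1023] * 1023   # 1024 values, but one is very likely

print(min_entropy_bits(coin))          # 1.0
print(min_entropy_bits(uniform_1024))  # 10.0
print(min_entropy_bits(skewed))        # about 0.152
```

The skewed distribution has as many possible values as the uniform one, yet its min-entropy is tiny because a single guess succeeds 90% of the time. A uniform 256-bit value has 256 bits of min-entropy.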

Application of Hiding: Commitments

Commit to a value, reveal it later.
Analogy: Want to “seal a value in an envelope”, and “open the envelope” later.
A commitment scheme consists of two algorithms:
- com := commit(msg, nonce)
- match := verify(com, msg, nonce)
We require that the following two security properties hold:
- Hiding: Given com, it is infeasible to find msg.
- Binding: It is infeasible to find two pairs (msg, nonce) and (msg’, nonce’) such that $msg \neq msg'$ and commit( msg, nonce ) == commit( msg’, nonce’ )
  - When the scheme is built from a hash function, binding follows from collision resistance: it is computationally infeasible to find two different inputs that produce the same hash output.

Commitment API

com := commit(msg, nonce)
- The commit function takes a message and a secret random value, called a nonce, as input and returns a commitment.
match := verify(com, msg, nonce)
- The verify function takes a commitment, nonce, and message as input. It returns true if com == commit(msg, nonce) and false otherwise.
To seal msg in envelope:
1. Select a random nonce and keep it secret.
2. Compute the commitment com := commit(msg, nonce).
3. Publish com.
To open envelope:
1. Publish nonce, msg.
2. Anyone can use verify() to check validity of msg and the previously published com.

Security Properties of Commitment APIs

com := commit(msg, nonce)
match := verify(com, msg, nonce)
Security properties:
- Hiding: Given com (and if nonce is chosen from a distribution with high min-entropy), infeasible to find msg.
- Binding: Infeasible to find <msg’, nonce’> != <msg, nonce> such that: verify(commit(msg, nonce), msg’, nonce’) == true
So, how to implement a commitment scheme such that these two properties hold?

Implementing Commitment APIs using Hash Functions

commit(msg, nonce) := H(nonce | msg) where nonce is a random 256-bit value & H is a cryptographically secure hash function
verify(com, msg, nonce) := (H(nonce | msg) == com )
Security properties:
- Hiding: Given H(nonce | msg), infeasible to find msg.
  - Follows from the hiding (pre-image resistance) property of a secure hash function, since nonce has high min-entropy.
- Binding: Infeasible to find <msg’, nonce’> != <msg, nonce> such that H(nonce’ | msg’) == H(nonce | msg)
  - Follows from the collision-resistance property of a secure hash function.
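
A minimal sketch of this construction, assuming SHA-256 as H and plain byte concatenation for `|`. Because the nonce always has a fixed length of 32 bytes, the boundary between nonce and msg is unambiguous.

```python
import hashlib
import hmac
import secrets

def commit(msg: bytes, nonce: bytes) -> bytes:
    """com := H(nonce | msg), with | as byte concatenation."""
    return hashlib.sha256(nonce + msg).digest()

def verify(com: bytes, msg: bytes, nonce: bytes) -> bool:
    """Return True if com == commit(msg, nonce)."""
    return hmac.compare_digest(com, commit(msg, nonce))

# Seal the envelope: pick a secret random 256-bit nonce, publish only com.
nonce = secrets.token_bytes(32)
com = commit(b"heads", nonce)

# Open the envelope: publish (msg, nonce); anyone can check them against com.
print(verify(com, b"heads", nonce))  # True
print(verify(com, b"tails", nonce))  # False
```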

Property 3: Puzzle-friendliness

Puzzle-friendliness property: For every possible n-bit output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that $H(k | x) = y$ in time significantly less than $2^n$ .

Intuition: If you want to target a Hash function H to have a particular output value y, and if part of the input (i.e., k) is chosen in a suitably randomized fashion, then it’s very difficult to find the other part of the input x to exactly hit the target output value (y).

Application of Puzzle-friendliness: Search Puzzle

What is a search puzzle?
- Given: A “puzzle ID” id (from a high min-entropy distribution), and a target set Y.
- Objective: Try to find a “solution” x such that $H(id | x) \in Y$ .
Puzzle-friendly property implies that no solving strategy is much better than trying random values of x.
Strength of the puzzle depends on the size of Y
- $Y = \{y_1\}$ , Most time-consuming!
- $Y = \{y_1, y_2, y_3\}$
- $Y = \{y_1, y_2, y_3, \ldots, y_{2^{256}}\}$ (all possible outputs), Trivial!
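
A sketch of such a search puzzle, assuming SHA-256 as H and taking Y to be the set of digests whose integer value is below a threshold (equivalently, digests starting with `difficulty_bits` zero bits). That choice of Y and the parameter name `difficulty_bits` are assumptions for this example; a larger Y (fewer required zero bits) makes the puzzle easier.

```python
import hashlib
import itertools
import secrets

def in_target_set(puzzle_id: bytes, x: bytes, difficulty_bits: int) -> bool:
    """True if H(id | x), read as an integer, is below 2^(256 - difficulty_bits)."""
    digest = hashlib.sha256(puzzle_id + x).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

def solve_puzzle(puzzle_id: bytes, difficulty_bits: int):
    """Try x = 0, 1, 2, ... until H(id | x) lands in the target set."""
    for counter in itertools.count():
        x = counter.to_bytes(8, "big")
        if in_target_set(puzzle_id, x, difficulty_bits):
            return x, counter + 1

puzzle_id = secrets.token_bytes(32)  # high min-entropy puzzle ID
for bits in (4, 8, 12):
    x, tries = solve_puzzle(puzzle_id, bits)
    print(f"difficulty {bits} bits: solved after {tries} tries "
          f"(expected about {2**bits})")
```

Checking a proposed solution takes a single hash evaluation, while finding one takes about $2^{\text{difficulty\_bits}}$ evaluations on average.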

Construction of Hash Functions

Hash functions are typically constructed from fixed-input compression functions!

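Example: SHA-256, the hash function used in Bitcoin, is constructed this way.

This construction is also referred to as the Merkle–Damgård transform.

Theorem: If the compression function c is collision resistant, then the hash function built from it (here, SHA-256) is also collision resistant.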
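
To make the chaining idea concrete, here is a toy Merkle–Damgård construction. It is not SHA-256: the block and state sizes are arbitrary toy values (SHA-256 itself uses 512-bit blocks and a 256-bit state), and the compression function `compress` is a stand-in built from truncated SHA-256 purely for illustration.

```python
import hashlib

BLOCK_SIZE = 32          # bytes per message block (toy value)
STATE_SIZE = 16          # bytes of chaining state (toy value)
IV = bytes(STATE_SIZE)   # fixed initial value

def compress(state: bytes, block: bytes) -> bytes:
    """Toy fixed-input compression function c: (state, block) -> new state."""
    assert len(state) == STATE_SIZE and len(block) == BLOCK_SIZE
    return hashlib.sha256(state + block).digest()[:STATE_SIZE]

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 8-byte message length."""
    padded = message + b"\x80"
    padded += bytes(-(len(padded) + 8) % BLOCK_SIZE)
    return padded + len(message).to_bytes(8, "big")

def md_hash(message: bytes) -> bytes:
    """Chain the compression function over the padded message blocks."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i : i + BLOCK_SIZE])
    return state

print(md_hash(b"").hex())
print(md_hash(b"a" * 1000).hex())
```

Inputs of any length are reduced to a fixed-size output by repeatedly feeding the current state and the next block into the fixed-input function.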

Hash Pointers and Data Structures

Hash pointer is:
- Pointer to where some info/data is stored, and
- (Cryptographic) hash of the info
What can you do with a hash pointer?
- Retrieve or get back the info/data
- Verify that the info/data hasn’t changed
Use hash pointers to build data structures!

Block Chains

What is a Block Chain?
- Linked list with hash pointers
What is it used for?
- As a tamper-evident log (or register) of data

Tamper-evident Log

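What is a tamper-evident log and how is tampering detected?
- A log in which any change to earlier data is detectable: changing a block changes its hash, so the hash pointer stored in the next block no longer matches. As long as the hash pointer to the head block is stored safely, tampering anywhere in the chain shows up as a mismatch.
How many verifications are needed to prove that some data has not been tampered with (worst case)?
- O(n)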
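
A minimal sketch of a block chain used as a tamper-evident log, assuming SHA-256 as the hash. The "pointer" half of each hash pointer is simulated by the position in a Python list, and the `Block` layout is hypothetical.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Block:
    data: bytes
    prev_hash: bytes  # H(previous block); all zeros for the first block

def block_hash(block: Block) -> bytes:
    return hashlib.sha256(block.prev_hash + block.data).digest()

def build_chain(items):
    """Return (blocks, head_hash); head_hash is what a verifier must remember."""
    blocks, prev = [], bytes(32)
    for data in items:
        block = Block(data, prev)
        blocks.append(block)
        prev = block_hash(block)
    return blocks, prev

def verify_chain(blocks, head_hash) -> bool:
    """Walk from the head back to the first block: O(n) hash checks."""
    expected = head_hash
    for block in reversed(blocks):
        if block_hash(block) != expected:
            return False
        expected = block.prev_hash
    return expected == bytes(32)

blocks, head = build_chain([b"tx1", b"tx2", b"tx3"])
print(verify_chain(blocks, head))  # True
blocks[0].data = b"tx1-tampered"
print(verify_chain(blocks, head))  # False
```

To hide the change, an adversary would have to rewrite every later hash pointer as well, and would still be caught at the head hash that the verifier stored separately.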

Merkle Tree

Binary tree with hash pointers!
Drawback:
- Requires more blocks (the internal nodes that hold only hashes)
Advantage:
- Proving membership of a data block in the tree is easy
- Only need to show O(log n) items
- In other words, membership verification in O(log n) time/space
How to prove non-membership?
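- Sorted Merkle trees: Order the leaves of the tree in some fashion, say lexicographically, numerically, etc.
- Show membership of the items immediately before and after the position where the missing item would be; if those two items are adjacent in the tree, the item cannot be present!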
- Non-membership verification also takes O(log n) time/space
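
A sketch of a Merkle tree with membership proofs, assuming SHA-256 as the hash. Two details are choices made for this example and not taken from the notes: an odd-sized level is padded by duplicating its last hash, and leaf and internal hashes use different prefixes (`leaf:` and `node:`).

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the root."""
    level = [H(b"leaf:" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                # odd count: duplicate the last hash
            level = level + [level[-1]]
            levels[-1] = level
        level = [H(b"node:" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Membership proof: one sibling hash per level, O(log n) items."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify(leaf: bytes, proof, root: bytes) -> bool:
    """Recompute the path from the leaf to the root using the sibling hashes."""
    node = H(b"leaf:" + leaf)
    for sibling, sibling_is_left in proof:
        node = H(b"node:" + (sibling + node if sibling_is_left else node + sibling))
    return node == root

leaves = [b"a", b"b", b"c", b"d", b"e"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 4)  # proof for leaf b"e"
print(len(proof), verify(b"e", proof, root), verify(b"x", proof, root))  # 3 True False
```

The verifier needs only the root and the three sibling hashes, not the other leaves.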

Hash Pointers in Data Structures

Hash pointers can be used in any pointer-based data structure that has no cycles

Digital Signatures

Second cryptographic primitive (in addition to Hash functions) that we will need to build cryptocurrencies (and bitcoins)
What are the properties we need from digital signatures? – same as properties we need from handwritten signatures
- Only you can sign, but anyone can verify (security)
- Signature tied to a particular document - can’t be cut-and-pasted to another document (unforgeability)

Digital Signatures APIs

(sk, pk) := generateKeys(keysize)
- sk: secret signing key
- pk: public verification key
sig := sign(sk, message)
isValid := verify(pk, message, sig)
generateKeys and sign can be randomized algorithms; verify is a deterministic algorithm.
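
To show the shape of this API with the standard library alone, the sketch below implements a Lamport one-time signature built from SHA-256. This is a swapped-in scheme chosen for illustration: it is not ECDSA, its `generate_keys` takes no keysize parameter, and each key pair may safely sign only one message.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys():
    """sk: 256 pairs of random 32-byte secrets; pk: the hashes of those secrets."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(s0), H(s1)) for s0, s1 in sk]
    return sk, pk

def message_bits(message: bytes):
    """The 256 bits of H(message), so messages of any length can be signed."""
    digest = int.from_bytes(H(message), "big")
    return [(digest >> i) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret per digest bit. Use each key pair for ONE message only."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(pk, message: bytes, sig) -> bool:
    """Check that each revealed secret hashes to the matching public value."""
    if len(sig) != 256:
        return False
    return all(H(sig[i]) == pk[i][bit]
               for i, bit in enumerate(message_bits(message)))

sk, pk = generate_keys()
sig = sign(sk, b"pay 1 coin to Alice")
print(verify(pk, b"pay 1 coin to Alice", sig))  # True
print(verify(pk, b"pay 9 coins to Bob", sig))   # False
```

Only the holder of sk can produce the revealed secrets, while anyone holding pk can check them by hashing.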

Digital Signatures Requirements

Valid signatures must always verify correctly
- i.e., verify(pk, message, sign(sk, message)) == true
- Basic property for signatures to be useful!
Signatures should be existentially unforgeable -> can’t forge signatures
- i.e., adversary who knows pk and gets to see signatures on messages of his choice, still can’t produce a verifiable signature on another message
- Can be formalized by means of the unforgeability game described next

Unforgeability Game

Challenger generates key pair (sk, pk) and gives pk to the attacker
Attacker requests signatures on messages m0, m1, …
Challenger returns signatures sign(sk, m0), sign(sk, m1), …
Attacker outputs a message M and signature sig
Attacker wins if verify(pk, M, sig) == true and M is not in {m0, m1, …}
Signature scheme is unforgeable if and only if, no matter what algorithm the adversary is using, his chance of successfully generating a valid M, sig is negligibly small!
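
The game can be written as a small test harness. In the sketch below, the scheme under test is deliberately broken (its "signature" ignores the secret key), so that a simple attacker wins. The function `play_game`, the toy scheme, and the attacker are all hypothetical and exist only to illustrate the rules of the game.

```python
import hashlib
import hmac
import secrets

# Deliberately broken toy scheme: the "signature" never uses the secret key.
def generate_keys():
    sk = secrets.token_bytes(32)
    pk = hashlib.sha256(sk).digest()
    return sk, pk

def sign(sk: bytes, message: bytes) -> bytes:
    return hashlib.sha256(message).digest()  # flaw: sk is ignored

def verify(pk: bytes, message: bytes, sig: bytes) -> bool:
    return hmac.compare_digest(sig, hashlib.sha256(message).digest())

def play_game(attacker) -> bool:
    """Run one round of the game; return True if the attacker wins."""
    sk, pk = generate_keys()          # challenger keeps sk, hands out pk
    queried = []

    def signing_oracle(message: bytes) -> bytes:
        queried.append(message)       # remember m0, m1, ...
        return sign(sk, message)

    forged_message, forged_sig = attacker(pk, signing_oracle)
    return verify(pk, forged_message, forged_sig) and forged_message not in queried

def attacker(pk, signing_oracle):
    signing_oracle(b"m0")  # queries are allowed, though not needed here
    M = b"a message the challenger never signed"
    return M, hashlib.sha256(M).digest()

print(play_game(attacker))  # True: the toy scheme is forgeable
```

Replaying a signature obtained from the oracle does not count as a win, because the forged message must be one the challenger never signed.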

Digital Signatures - Practical Concerns

Many digital signature algorithms are randomized (esp. ones used in cryptocurrencies)
- Need good source of randomness
- Bad randomness -> Even an otherwise secure signature algorithm is not secure
Signature algorithms have fixed-size inputs -> How to sign large messages (whose size is greater than the input size of the algorithm)?
- Sign Hash(message) rather than the message itself
How to sign the entire block chain?
- Sign the hash pointer of the head block!
- This signature “covers” the whole block chain structure

Digital Signatures used by Bitcoins

Bitcoin uses Elliptic Curve Digital Signature Algorithm (ECDSA) standard
- ECDSA is a US Government standard
- Bitcoin uses ECDSA over the standard elliptic curve secp256k1 -> this curve is rarely used outside Bitcoin
- Provides 128 bits of security (breaking it is as hard as performing $2^{128}$ symmetric encryptions)
  - Private key – 256 bits
  - Public key compressed – 257 bits
  - Message to be signed – 256 bits
  - Signature – 512 bits
- Technical details of ECDSA will be skipped here
- Good randomness is essential for ECDSA
  - If you foul this up in generateKeys() or sign() -> you probably leaked your private key

Public Keys as Identities

Can we use public key pk, as generated before by generateKeys(keysize), as an identity?
- For example, if you see signature sig such that verify(pk, msg, sig)==true, think of it as: pk says, “[msg]”.
But, pk by itself cannot be used as an identity!
- To “speak for” pk, you must know matching secret key sk

Identities in a Cryptocurrency

How to make a new identity?
- Create a new, random key-pair (sk, pk)
- pk is the public “name” you can use [usually better to use H(pk)]
- sk lets you “speak for” the identity
- You control the identity pk, because only you know sk
- Even if pk “looks random” that’s fine, nobody needs to know your real identity for the cryptocurrency application
- Just as when spending an actual currency note

Decentralized Identity Management

Anybody can make a new identity at any time, and make as many as they want!
No central point of coordination
These identities are called “addresses” in Bitcoin

Privacy

Addresses not directly connected to real-world identity
But observer can link together an address’s activity over time, make inferences
Later: a whole chapter on privacy in Bitcoins …

Simple Cryptocurrencies

GoofyCoin

Coin Creation:
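- Goofy can create new coins, each signed with Goofy’s secret key so that anyone can verify it with pkGoofy (written “signed by pkGoofy” in these notes)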
- CreateCoin [uniqueCoinID]
- New coins belong to me.
Spending Coins:
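- A coin’s owner can spend it by signing a statement that pays the coin to a new owner; H( ) denotes a hash pointer to the coin being spent.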
  - CreateCoin [uniqueCoinID] signed by pkGoofy
  - Pay to pkAlice : H( ) signed by pkGoofy
  - Alice owns it now.
- The recipient can pass on the coin again.
  - CreateCoin [uniqueCoinID] signed by pkGoofy
  - Pay to pkAlice : H( ) signed by pkGoofy
  - Pay to pkBob : H( ) signed by pkAlice
  - Bob owns it now.
- Double-spending attack:
  - CreateCoin [uniqueCoinID] signed by pkGoofy
  - Pay to pkAlice : H( ) signed by pkGoofy
  - Pay to pkBob : H( ) signed by pkAlice
  - Pay to pkChuck : H( ) signed by pkAlice
  - Alice has spent the same coin twice; Bob and Chuck each hold a statement that looks valid.
  - Preventing double spending is the main design challenge in digital currency.

ScroogeCoin

Scrooge publishes a history of all transactions (a block chain, signed by Scrooge)
- Optimization: put multiple transactions in the same block
CreateCoins transaction creates new coins
- transID: 73
- type:CreateCoins
- coins created
  - num: 0, value: 3.2, recipient: 0x…
    - coinID: 73(0)
  - num: 1, value: 1.4, recipient: 0x…
    - coinID: 73(1)
  - num: 2, value: 7.1, recipient: 0x…
    - coinID: 73(2)
- Valid, because I said so.
PayCoins transaction consumes (and destroys) some coins, and creates new coins of the same total value
- transID: 73
- type:PayCoins
- coins created
  - num: 0, value: 3.2, recipient: 0x…
  - num: 1, value: 1.4, recipient: 0x…
  - num: 2, value: 7.1, recipient: 0x…
- consumed coinIDs: 68(1), 42(0), 72(3)
- signatures
- Valid if:
  - consumed coins valid,
  - not already consumed,
  - total value out = total value in, and
  - signed by owners of all consumed coins
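
The four validity rules can be sketched as a check that Scrooge might run before adding a PayCoins transaction to the history. Everything below is a hypothetical model: coin values are integers to avoid rounding issues, coin IDs are (transID, num) pairs, and signature verification is assumed to have happened elsewhere, with `signed_by` holding the public keys whose signatures checked out.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Coin:
    value: int
    owner: str  # owner's public key, treated as an opaque string here

@dataclass
class PayCoins:
    trans_id: int
    consumed: list   # coin IDs as (trans_id, num) pairs
    created: list    # list of Coin
    signed_by: set = field(default_factory=set)  # keys with verified signatures

@dataclass
class Ledger:
    coins: dict = field(default_factory=dict)  # coin ID -> Coin
    spent: set = field(default_factory=set)    # coin IDs already consumed

    def is_valid(self, tx: PayCoins) -> bool:
        if len(set(tx.consumed)) != len(tx.consumed):
            return False                         # same coin listed twice
        if any(cid not in self.coins for cid in tx.consumed):
            return False                         # consumed coins valid
        if any(cid in self.spent for cid in tx.consumed):
            return False                         # not already consumed
        value_in = sum(self.coins[cid].value for cid in tx.consumed)
        value_out = sum(coin.value for coin in tx.created)
        if value_in != value_out:
            return False                         # total value out = total value in
        owners = {self.coins[cid].owner for cid in tx.consumed}
        return owners <= tx.signed_by            # signed by owners of all consumed coins

    def apply(self, tx: PayCoins) -> bool:
        """Record the transaction if it is valid; return whether it was accepted."""
        if not self.is_valid(tx):
            return False
        self.spent.update(tx.consumed)
        for num, coin in enumerate(tx.created):
            self.coins[(tx.trans_id, num)] = coin
        return True

ledger = Ledger(coins={(73, 0): Coin(320, "pkAlice")})
pay_bob = PayCoins(74, [(73, 0)], [Coin(320, "pkBob")], {"pkAlice"})
pay_chuck = PayCoins(75, [(73, 0)], [Coin(320, "pkChuck")], {"pkAlice"})
print(ledger.apply(pay_bob), ledger.apply(pay_chuck))  # True False
```

The second payment is rejected because coin 73(0) is already in the spent set, which is how a single published history prevents the double spend that GoofyCoin allowed.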
Immutable coins
- Coins can’t be transferred, subdivided, or combined.
- How to overcome this problem?
  - You can get the effect of dividing coins by using transactions to subdivide: create new transaction to consume your coin and pay out two new coins to yourself
Advantages?
- Prevents double spending
- Makes sure all transactions are valid, i.e., each coin is consumed only once
Problems?
- Scrooge has too much influence
- He cannot spend other people’s coins (can’t create fake transactions) – why? Because he cannot forge the owners’ signatures.
- However, he can stop endorsing other people’s transactions and deny them service
- If he is greedy, he may also ask for a service/transaction fee for endorsing every transaction
- Also, as Scrooge is in charge of creating new coins, he can create as many for himself as he wants
- Last, he can get “bored” of the system and just stop updating the block chain altogether!

Descroogifying the Currency

Crucial question:

Can we descroogify the currency, and operate without any central, trusted party?