Lecture 1 Notes: Intro to Crypto and Cryptocurrencies
Cryptocurrencies
Most of the slides used are derived from "Bitcoin and Cryptocurrency Technologies: A Comprehensive Introduction", Arvind Narayanan, Joseph Bonneau, Edward Felten, Andrew Miller & Steven Goldfeder, 2016, Princeton University Press.
This lecture covers:
Background on cryptographic primitives useful for building "crypto"currencies.
Hash functions.
Properties of hash functions we are interested in.
Applications that make use of these properties.
Data structures that use hash functions.
Digital signatures.
Public Keys and Identities.
Two simple cryptocurrencies (precursors to Bitcoins).
Goofycoin.
Scroogecoin.
Cryptographic Hash Functions
A hash function is a mathematical function that:
Maps items (values) in the domain to items (values) in the range.
Converts any digital data into an output string with a fixed number of characters.
Hashing is the one-way act of converting the data (called a message) into the output (called the hash).
Inputs (or items in the domain) can be of any size (not fixed). In practice the input size is bounded, but the bound is enormous (e.g., SHA-256 accepts messages of fewer than $2^{64}$ bits).
Outputs (or items in the range) are fixed-size (we’ll generally employ a hash function such as SHA-256 that has an output size of 256 bits).
Efficiently computable, i.e., the mapping should be efficiently (in polynomial time in terms of the input size) computable.
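The fixed-output-size property can be observed with Python's standard hashlib module. This is a minimal sketch; the helper name sha256_hex is an illustrative assumption, not something from the slides.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hex characters (256 bits)."""
    return hashlib.sha256(data).hexdigest()

short_digest = sha256_hex(b"a")              # 1-byte input
long_digest = sha256_hex(b"a" * 1_000_000)   # 1 MB input

# Both digests have the same length regardless of the input size.
print(len(short_digest), len(long_digest))   # 64 64
```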
Cryptographically Secure Hash Functions
In cryptography (and cryptography based applications such as bitcoins), we are interested in a special type of hash function, often referred to as cryptographically secure hash functions.
A cryptographically secure hash function satisfies the following additional security properties:
Collision Resistance.
Hiding or Pre-image Resistance.
Puzzle-friendliness.
Property 1: Collision Resistance
Strong Collision Resistance:
A hash function is said to have strong collision resistance if it is computationally infeasible to find any two distinct inputs that produce the same hash output.
In other words, it is extremely difficult to find any collision at all. Strong collision resistance provides a high level of security against adversaries attempting to find collisions.
Weak Collision Resistance (or second pre-image resistance):
A hash function is considered to have weak collision resistance if, given an input message, it is computationally infeasible to find any other input that produces the same hash output as the original message.
In this case, the original message is known, and the goal is to find a different input that hashes to the same value. Weak collision resistance is generally easier to achieve compared to strong collision resistance.
Collision Resistance: Equations
Strong collision resistance: Infeasible to find x and y such that $x \neq y$ and $H(x) = H(y)$
Weak collision resistance: Given x and $H(x)$, it is infeasible to find $y \neq x$ such that $H(y) = H(x)$
Existence of Collisions
For all hash functions (cryptographically secure or not), collisions exist.
For cryptographically secure hash functions, it is difficult to find collisions.
Finding Collisions in Secure Hash Functions
How to find a collision in a Secure Hash function with a 256-bit output?
Strategy 1: Brute force. Keep picking random inputs and computing their hashes until two of them collide.
How long does this take?
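Worst case: $2^{256} + 1$ inputs (by then, two inputs must share an output).
On average: more than a 50% chance of finding a collision after roughly $1.2 \times 2^{128}$ inputs, i.e., about the square root of the number of possible outputs (birthday paradox).
More than 99.8% chance of a collision after $2^{130}$ randomly chosen inputs.
Brute force always works in finding a collision, no matter what H is. However, it takes too long to matter ($2^{128}$ is a lot of tries!).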
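The brute-force strategy becomes practical once the output is shortened. The sketch below is an illustration under assumptions: it truncates SHA-256 to 24 bits (a deliberately weak toy hash) and feeds it a counter instead of truly random inputs, which is enough to show a collision turning up after far fewer tries than the number of possible outputs.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bits: int = 24) -> int:
    """First n_bits of SHA-256 as an integer (toy hash with a tiny output)."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest, "big") >> (256 - n_bits)

def find_collision(n_bits: int = 24):
    """Hash distinct inputs until two share an output; return (x, y, tries)."""
    seen = {}  # maps hash value -> the input that produced it
    for i in count():
        x = i.to_bytes(8, "big")
        h = truncated_hash(x, n_bits)
        if h in seen:
            return seen[h], x, i + 1
        seen[h] = x

x, y, tries = find_collision(24)
print(x != y, truncated_hash(x) == truncated_hash(y), tries)
```

With a 24-bit output there are $2^{24}$ possible values, yet the birthday bound suggests a collision after on the order of $2^{12}$ tries. The same reasoning scaled to 256 bits gives the $2^{128}$ figure.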
Strategy 2: Find cryptographic or other weaknesses in hash functions.
Is the function $H(x) = x \bmod 2^{256}$ cryptographically secure? Yes/No? Why?
Hash functions once believed to be cryptographically secure can turn out to have weaknesses. E.g., MD5 was considered secure until, after many years of research, collisions were found. SHA-256 (a secure hash function in current use) has no known collision attacks, but we do not know that it is secure! No hash function has been proven to be collision resistant!
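Example Calculation
Take $H(x) = x \bmod 2^{256}$ and $x = 3$. Since $3 < 2^{256}$, the remainder is simply: $3 \bmod 2^{256} = 3$
So: $H(3) = 3$. The input $3 + 2^{256}$ leaves the same remainder, so $H(3 + 2^{256}) = 3$ as well, and the two inputs collide.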
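Assuming the function in question is $H(x) = x \bmod 2^{256}$ (the reading suggested by the remainder calculation above), a collision can be written down directly. The name weak_hash is an illustrative assumption.

```python
MODULUS = 2 ** 256

def weak_hash(x: int) -> int:
    """Toy hash: keep only the remainder modulo 2^256 (not collision resistant)."""
    return x % MODULUS

a = 3
b = 3 + MODULUS  # a different input that leaves the same remainder

print(weak_hash(a), weak_hash(b), a != b)  # 3 3 True
```

Any two inputs that differ by a multiple of $2^{256}$ collide, so finding collisions takes no search at all.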
Application of Collision-resistance: Hash as Message Digest
Message digest: Output of a hash function is also called a message digest.
Now, if H is a secure hash function and if we know $H(x) = H(y)$, is it safe to assume that $x = y$? Why?
Yes! Because if the above is not true, it violates the collision-resistance property and H would not be secure!
Application of message digests?
Verify integrity of large files.
To verify integrity, rather than comparing files just compare hashes or message digests!
Useful because message digest or hash is smaller compared to inputs!
A hash serves as a fixed-length digest, or unambiguous summary, of a message!
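The integrity check described above can be sketched as follows: instead of comparing two large files byte by byte, compare their SHA-256 digests. The function names and the chunk size are illustrative assumptions.

```python
import hashlib

def file_digest(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 message digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a: str, path_b: str) -> bool:
    """Treat two files as identical if their digests match."""
    return file_digest(path_a) == file_digest(path_b)
```

In practice only the 256-bit digest of the original needs to be stored or transmitted; a downloaded copy is accepted if its recomputed digest matches.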
Property 2: Hiding
Pre-image Resistance:
A hash function is said to have pre-image resistance if it is computationally infeasible to find any input that maps to a specific hash output.
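In other words, given a hash value, it should be difficult to determine the original input that produced that hash. Pre-image resistance only makes the input hard to recover by inverting H; it does not help when the input comes from a small set that can simply be tried exhaustively, which is why the hiding property is stated more carefully below.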
Hiding or Pre-image Resistance
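Given $H(x)$, it is infeasible to find x.
The property cannot be true in the form stated. Consider a simple example: x is either “heads” or “tails”. Given $H(x)$, it is easy to find x: compute H(“heads”) and H(“tails”) and see which one matches! Why? The input x is not spread out (not uniformly distributed over a large set); it takes only two possible values.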
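The heads/tails example can be checked directly. In this sketch (helper names are assumptions) an observer who sees only the hash recovers the input by hashing every candidate.

```python
import hashlib

def H(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def recover_input(target: str, candidates):
    """Try every candidate input and return the one whose hash matches."""
    for candidate in candidates:
        if H(candidate) == target:
            return candidate
    return None

published = H(b"heads")  # the observer sees only this value
print(recover_input(published, [b"heads", b"tails"]))  # b'heads'
```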
Achieving Hiding or Pre-image Resistance
Combine the input x which is not spread out (or not uniformly distributed) with another input r which is spread out or uniformly distributed!
Hiding Property (More Precisely)
If r is chosen from a probability distribution that has high min-entropy (the possible outcomes of the random variable are difficult to predict or guess), then given $H(r \mid x)$, the hash of r concatenated with x, it is infeasible to find x.
Min-entropy: Entropy of a probability distribution function captures the predictability of the output of the function!
Let r be a random variable which takes values from the set $\{1, 2, \ldots, n\}$, where n is very large, such as $2^{256}$, according to some probability distribution function f.
If f has high min-entropy -> distribution of the value of r is “very spread out” or “uniform”.
In the uniform case, r takes each value 1, 2, …, n with probability exactly 1/n. If n is very large (e.g., $2^{256}$), the probability of correctly guessing which value it took is very small (negligible)!
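Min-entropy is commonly defined as $-\log_2$ of the probability of the most likely outcome. The notes do not spell this out, so the formula is an assumption about the intended definition. A small sketch:

```python
import math

def min_entropy_bits(probabilities):
    """Min-entropy in bits: -log2 of the most likely outcome's probability."""
    return -math.log2(max(probabilities))

coin = [0.5, 0.5]               # "heads"/"tails": trivially guessable
uniform_1024 = [1 / 1024] * 1024

print(min_entropy_bits(coin))          # 1.0
print(min_entropy_bits(uniform_1024))  # 10.0
```

A uniformly random 256-bit value has 256 bits of min-entropy by the same formula, which is why guessing it is hopeless.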
Application of Hiding: Commitments
Commit to a value, reveal it later.
Analogy: Want to “seal a value in an envelope”, and “open the envelope” later.
A commitment scheme consists of two algorithms:
com := commit(msg, nonce)
match := verify(com, msg, nonce)
We require that the following two security properties hold:
Hiding: Given com, it is infeasible to find msg.
Binding: It is infeasible to find two pairs (msg, nonce) and (msg’, nonce’) such that msg != msg’ and commit(msg, nonce) == commit(msg’, nonce’).
The binding property ensures that, once a commitment has been published, it cannot later be opened to a different message.
Commitment API
com := commit(msg, nonce)
The commit function takes a message and a secret random value, called a nonce, as input and returns a commitment.
match := verify(com, msg, nonce)
The verify function takes a commitment, a message, and a nonce as input. It returns true if com == commit(msg, nonce) and false otherwise.
To seal msg in envelope:
Select a random nonce and keep it secret.
Compute the commitment com := commit(msg, nonce).
Publish com.
To open envelope:
Publish nonce, msg.
Anyone can use verify() to check that msg matches the previously published com.
Security Properties of Commitment APIs
com := commit(msg, nonce)
match := verify(com, msg, nonce)
Security properties:
Hiding: Given com (and if nonce is chosen from a distribution with high min-entropy), it is infeasible to find msg.
Binding: It is infeasible to find <msg’, nonce’> != <msg, nonce> such that verify(commit(msg, nonce), msg’, nonce’) == true.
So, how to implement a commitment scheme such that these two properties hold?
Implementing Commitment APIs using Hash Functions
commit(msg, nonce) := H(nonce | msg), where nonce is a random 256-bit value, H is a cryptographically secure hash function, and | denotes concatenation.
verify(com, msg, nonce) := (H(nonce | msg) == com)
Security properties:
Hiding: Given H(nonce | msg), it is infeasible to find msg. This follows from the hiding (pre-image resistance) property of the secure hash function.
Binding: It is infeasible to find <msg’, nonce’> != <msg, nonce> such that H(nonce’ | msg’) == H(nonce | msg). This follows from the collision-resistance property of the secure hash function.
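The hash-based commitment scheme above can be sketched in a few lines. This is an illustration, not a vetted implementation: SHA-256 stands in for H, the nonce is a fixed 32 bytes (256 bits) so the concatenation nonce | msg is unambiguous, and the constant-time comparison is an added precaution.

```python
import hashlib
import hmac
import secrets

def commit(msg: bytes, nonce: bytes) -> bytes:
    """com := H(nonce | msg), with SHA-256 as H."""
    return hashlib.sha256(nonce + msg).digest()

def verify(com: bytes, msg: bytes, nonce: bytes) -> bool:
    """Return True if com is the commitment to (msg, nonce)."""
    return hmac.compare_digest(com, commit(msg, nonce))

# Seal the envelope: pick a secret random 256-bit nonce, publish only com.
nonce = secrets.token_bytes(32)
com = commit(b"heads", nonce)

# Open the envelope: publish msg and nonce; anyone can check them against com.
print(verify(com, b"heads", nonce))  # True
print(verify(com, b"tails", nonce))  # False
```

Because the nonce is drawn from a high min-entropy distribution, the exhaustive heads/tails guess no longer works against com.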
Property 3: Puzzle-friendliness
Puzzle-friendliness property: For every possible n-bit output value y, if k is chosen from a distribution with high min-entropy, then it is infeasible to find x such that $H(k \mid x) = y$ in time significantly less than $2^n$.
Intuition: If you want to target a Hash function H to have a particular output value y, and if part of the input (i.e., k) is chosen in a suitably randomized fashion, then it’s very difficult to find the other part of the input x to exactly hit the target output value (y).
Application of Puzzle-friendliness: Search Puzzle
What is a search puzzle?
Given: A “puzzle ID” id (from a high min-entropy distribution), and a target set Y.
Objective: Try to find a “solution” x such that $H(id \mid x) \in Y$.
Puzzle-friendly property implies that no solving strategy is much better than trying random values of x.
Strength of the puzzle depends on the size of Y
Y contains a single value ($|Y| = 1$): Most time-consuming!
Y contains every n-bit string ($|Y| = 2^n$): Trivial!
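A search puzzle can be sketched by choosing Y to be the set of hash values whose leading bits are all zero. That choice of Y, the parameter name difficulty_bits, and the 8-byte encoding of x are assumptions made for illustration.

```python
import hashlib
import secrets
from itertools import count

def puzzle_hash(puzzle_id: bytes, x: int) -> int:
    """H(id | x) as an integer, with SHA-256 as H."""
    digest = hashlib.sha256(puzzle_id + x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")

def check_solution(puzzle_id: bytes, x: int, difficulty_bits: int) -> bool:
    """Y = all 256-bit values whose top difficulty_bits bits are zero."""
    return puzzle_hash(puzzle_id, x) < (1 << (256 - difficulty_bits))

def solve_puzzle(puzzle_id: bytes, difficulty_bits: int) -> int:
    """No known shortcut: try x = 0, 1, 2, ... until H(id | x) lands in Y."""
    for x in count():
        if check_solution(puzzle_id, x, difficulty_bits):
            return x

puzzle_id = secrets.token_bytes(32)   # high min-entropy puzzle ID
solution = solve_puzzle(puzzle_id, 12)
print(check_solution(puzzle_id, solution, 12))  # True
```

Each extra bit of difficulty halves Y and roughly doubles the expected number of tries. With difficulty_bits = 0, Y is every possible output and the first x tried is a solution.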
Construction of Hash Functions
Hash functions are typically constructed from fixed-input compression functions!
Example: See construction of SHA-256 Hash function -> SHA-256 used in Bitcoins
Also referred to as the Merkle-Damgård transform
Theorem: If the compression function c is collision resistant, then the hash function built from it by this transform (here, SHA-256) is collision resistant.
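The idea of building a hash for arbitrary-length inputs from a fixed-input compression function can be sketched as below. Everything specific here is a toy assumption: the stand-in compression function (SHA-256 over 64 bytes), the 32-byte block size, the all-zero initial value, and the padding layout. Real SHA-256 uses its own compression function and constants.

```python
import hashlib

BLOCK_SIZE = 32   # bytes per message block (toy choice)
IV = bytes(32)    # all-zero initial value (toy choice)

def compress(state: bytes, block: bytes) -> bytes:
    """Stand-in fixed-input compression function: 64 bytes in, 32 bytes out."""
    assert len(state) == 32 and len(block) == BLOCK_SIZE
    return hashlib.sha256(state + block).digest()

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes, then the message length in bits (8 bytes)."""
    length = (8 * len(message)).to_bytes(8, "big")
    padded = message + b"\x80"
    padded += bytes((-len(padded) - 8) % BLOCK_SIZE)
    return padded + length

def md_hash(message: bytes) -> bytes:
    """Chain the compression function over the padded message blocks."""
    state = IV
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state

print(md_hash(b"hello").hex())
```

Encoding the message length in the final block is what lets a collision in the full hash be traced back to a collision in the compression function.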
Hash Pointers and Data Structures
Hash pointer is:
Pointer to where some info/data is stored, and
(Cryptographic) hash of the info
What can you do with a hash pointer?
Retrieve or get back the info/data
Verify that the info/data hasn’t changed
Use hash pointers to build data structures!
Block Chains
What is a Block Chain?
Linked list with hash pointers
What is it used for?
Tamper-evident log or register data
Tamper-evident Log
What is a Tamper-evident log and how to detect tampering?
How many verifications to prove that some data has not been tampered with (worst-case)?
O(n)
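A tamper-evident log can be sketched as a list of blocks in which each block stores the hash of its predecessor, with the head hash held separately by the verifier. The dictionary layout and the all-zero genesis value are illustrative assumptions.

```python
import hashlib

GENESIS = bytes(32)  # placeholder "previous hash" for the first block

def block_hash(prev_hash: bytes, data: bytes) -> bytes:
    """Hash of a block: covers its data and its pointer to the previous block."""
    return hashlib.sha256(prev_hash + data).digest()

def build_chain(items):
    """Return (blocks, head_hash); each block stores its predecessor's hash."""
    blocks, prev = [], GENESIS
    for data in items:
        blocks.append({"prev_hash": prev, "data": data})
        prev = block_hash(prev, data)
    return blocks, prev

def verify_chain(blocks, head_hash: bytes) -> bool:
    """Walk the whole chain: O(n) hash checks in the worst case."""
    prev = GENESIS
    for block in blocks:
        if block["prev_hash"] != prev:
            return False
        prev = block_hash(block["prev_hash"], block["data"])
    return prev == head_hash

blocks, head = build_chain([b"tx1", b"tx2", b"tx3"])
print(verify_chain(blocks, head))   # True
blocks[0]["data"] = b"tx1-forged"   # tamper with the oldest block
print(verify_chain(blocks, head))   # False
```

Changing any block changes its hash, which no longer matches the pointer stored in the next block, so the mismatch propagates to the head.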
Merkle Tree
Binary tree with hash pointers!
Drawback:
More blocks to store (the internal hash nodes, in addition to the data blocks)
Advantage:
Proving membership of a data block in the tree is easy
Only need to show O(log n) items
In other words, membership verification in O(log n) time/space
How to prove non-membership?
Sorted Merkle trees: Order the leaves of the tree in some fashion, say lexicographically, numerically, etc.
Verify membership of the two adjacent leaves that fall immediately before and after where the missing item would be!
Non-membership verification also takes O(log n) time/space
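Membership proofs in a Merkle tree can be sketched as follows. The details are assumptions made for a compact illustration: SHA-256 as the hash, one-byte prefixes to separate leaf hashes from internal hashes, and duplication of the last node on odd-sized levels.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_levels(leaves):
    """Return all tree levels, from the leaf hashes up to the root."""
    level = [H(b"\x00" + leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                 # odd count: duplicate the last node
            level = level + [level[-1]]
            levels[-1] = level
        level = [H(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index: int):
    """Collect one sibling hash per level: O(log n) items."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof

def verify_membership(root: bytes, leaf: bytes, proof) -> bool:
    """Recompute the path to the root from the leaf and its siblings."""
    node = H(b"\x00" + leaf)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = H(b"\x01" + pair)
    return node == root

leaves = [b"a", b"b", b"c", b"d", b"e", b"f", b"g", b"h"]
levels = build_levels(leaves)
root = levels[-1][0]
proof = prove(levels, 5)
print(len(proof), verify_membership(root, b"f", proof))  # 3 True
```

For non-membership in a sorted tree, the same proof is produced for the two neighbouring leaves, together with a check that they are adjacent.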
Hash Pointers in Data Structures
Hash pointers can be used in any pointer-based data structure that has no cycles
Digital Signatures
Second cryptographic primitive (in addition to Hash functions) that we will need to build cryptocurrencies (and bitcoins)
What are the properties we need from digital signatures? – same as properties we need from handwritten signatures
Only you can sign, but anyone can verify (security)
Signature tied to a particular document - can’t be cut-and-pasted to another document (unforgeability)
Digital Signatures APIs
(sk, pk) := generateKeys(keysize)
sk: secret signing key
pk: public verification key
sig := sign(sk, message)
generateKeys and sign can be randomized algorithms.
isValid := verify(pk, message, sig)
verify is a deterministic algorithm.
Digital Signatures Requirements
Valid signatures must always verify correctly
i.e., verify(pk, message, sign(sk, message)) == true
Basic property for signatures to be useful!
Signatures should be existentially unforgeable -> can’t forge signatures
i.e., adversary who knows pk and gets to see signatures on messages of his choice, still can’t produce a verifiable signature on another message
Can be formalized by means of the unforgeability game described next
Unforgeability Game
Challenger generates key pair (sk, pk) and gives pk to the attacker
Attacker requests signatures on messages m0, m1, …
Challenger returns signatures sign(sk, m0), sign(sk, m1), …
Attacker outputs a message M and signature sig
Attacker wins if verify(pk, M, sig) == true and M is not in {m0, m1, …}.
Signature scheme is unforgeable if and only if, no matter what algorithm the adversary uses, the chance of successfully producing a valid (M, sig) pair is negligibly small!
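The generateKeys / sign / verify API can be illustrated with the Python standard library alone by using a Lamport one-time signature. This is a swapped-in scheme chosen because it needs only a hash function; it is not ECDSA, and it is not the scheme used by Bitcoin. Each key pair may sign only one message.

```python
import hashlib
import secrets

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def generate_keys():
    """sk: 256 pairs of random 32-byte secrets; pk: the hashes of those secrets."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def message_bits(message: bytes):
    """The 256 bits of H(message): the signature covers the hash, not the message."""
    digest = int.from_bytes(H(message), "big")
    return [(digest >> i) & 1 for i in range(256)]

def sign(sk, message: bytes):
    """Reveal one secret from each pair, selected by the bits of H(message)."""
    return [sk[i][bit] for i, bit in enumerate(message_bits(message))]

def verify(pk, message: bytes, sig) -> bool:
    """Deterministic check that each revealed secret hashes to the public value."""
    bits = message_bits(message)
    return len(sig) == 256 and all(
        H(s) == pk[i][bit] for i, (s, bit) in enumerate(zip(sig, bits))
    )

sk, pk = generate_keys()
sig = sign(sk, b"Pay to pkAlice")
print(verify(pk, b"Pay to pkAlice", sig))  # True
print(verify(pk, b"Pay to pkBob", sig))    # False
```

The second check shows the signature being tied to one document: it does not verify when pasted onto a different message.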
Digital Signatures - Practical Concerns
Many digital signature algorithms are randomized (esp. ones used in cryptocurrencies)
Need good source of randomness
Bad randomness -> Even an otherwise secure signature algorithm is not secure
Signature algorithms have fixed-size inputs -> How to sign large messages (whose size is greater than the input size of the algorithm)?
Use Hash(message) rather than message
How to sign the entire Block chain?
Sign the hash pointer to the head block!
This signature “covers” the whole block chain structure
Digital Signatures used by Bitcoins
Bitcoin uses Elliptic Curve Digital Signature Algorithm (ECDSA) standard
ECDSA is a US Government standard
Bitcoin uses ECDSA over the standard elliptic curve secp256k1 -> this curve is rarely used outside Bitcoins
Provides 128 bits of security (equivalent to performing $2^{128}$ symmetric encryptions)
Private key – 256 bits
Public key compressed – 257 bits
Message to be signed – 256 bits
Signature – 512 bits
Technical details of ECDSA will be skipped here
Good randomness is essential for ECDSA
If you foul this up in generateKeys() or sign() -> you probably leaked your private key
Public Keys as Identities
Can we use public key pk, as generated before by generateKeys(keysize), as an identity?
For example, if you see signature sig such that verify(pk, msg, sig) == true, think of it as: pk says, “[msg]”.
But, pk by itself cannot be used as an identity!
To “speak for” pk, you must know matching secret key sk
Identities in a Cryptocurrency
How to make a new identity?
Create a new, random key-pair (sk, pk)
pk is the public “name” you can use [usually better to use H(pk)]
sk lets you “speak for” the identity
You control the identity pk, because only you know sk
Even if pk “looks random” that’s fine, nobody needs to know your real identity for the cryptocurrency application
Just as when spending a physical banknote
Decentralized Identity Management
Anybody can make a new identity at any time; make as many as you want!
No central point of coordination
These identities are called “addresses” in Bitcoin
Privacy
Addresses not directly connected to real-world identity
But observer can link together an address’s activity over time, make inferences
Later: a whole chapter on privacy in Bitcoins …
Simple Cryptocurrencies
GoofyCoin
Coin Creation:
Goofy can create new coins signed by pkGoofy
CreateCoin [uniqueCoinID]
New coins belong to me.
Spending Coins:
A coin’s owner can spend it.
CreateCoin [uniqueCoinID], signed by pkGoofy
Pay to pkAlice : H( ), signed by pkGoofy (H( ) is a hash pointer to the statement above)
Alice owns it now.
The recipient can pass on the coin again.
CreateCoin [uniqueCoinID], signed by pkGoofy
Pay to pkAlice : H( ), signed by pkGoofy
Pay to pkBob : H( ), signed by pkAlice
Bob owns it now.
Double Spending Attack:
CreateCoin [uniqueCoinID], signed by pkGoofy
Pay to pkAlice : H( ), signed by pkGoofy
Pay to pkBob : H( ), signed by pkAlice
Pay to pkChuck : H( ), signed by pkAlice (Alice spends the same coin a second time)
Double-spending attack
The main design challenge in digital currency
ScroogeCoin
Scrooge publishes a history of all transactions (a block chain, signed by Scrooge)
Optimization: put multiple transactions in the same block
CreateCoins transaction creates new coins
transID: 73    type: CreateCoins
coins created:
num    value    recipient    coinID
0      3.2      0x…          73(0)
1      1.4      0x…          73(1)
2      7.1      0x…          73(2)
Valid, because I said so.
PayCoins transaction consumes (and destroys) some coins, and creates new coins of the same total value
transID: 73    type: PayCoins
coins created:
num    value    recipient
0      3.2      0x…
1      1.4      0x…
2      7.1      0x…
consumed coinIDs: 68(1), 42(0), 72(3)
signatures
Valid if:
consumed coins valid,
not already consumed,
total value out = total value in, and
signed by owners of all consumed coins
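The four validity checks can be sketched as a single function. Several things here are assumptions for illustration: the ledger is a dictionary from coin ID to coin, values are integers in some smallest unit, the coin IDs and owners are hypothetical, and signature checking is abstracted into a set of public keys whose signatures on the transaction have already been verified.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Coin:
    value: int   # amount in a smallest unit (integers avoid rounding issues)
    owner: str   # public key (address) of the current owner

def is_valid_pay_coins(coins, consumed, consumed_ids, created, verified_signers):
    """Return True if a PayCoins transaction passes the four checks."""
    # 1. Every consumed coin must exist (created by an earlier transaction).
    if any(cid not in coins for cid in consumed_ids):
        return False
    # 2. No coin may be consumed twice, earlier or within this transaction.
    if len(set(consumed_ids)) != len(consumed_ids):
        return False
    if any(cid in consumed for cid in consumed_ids):
        return False
    # 3. Total value out must equal total value in.
    total_in = sum(coins[cid].value for cid in consumed_ids)
    total_out = sum(coin.value for coin in created)
    if total_in != total_out:
        return False
    # 4. The owner of every consumed coin must have signed the transaction.
    return all(coins[cid].owner in verified_signers for cid in consumed_ids)

# Hypothetical ledger state.
coins = {"1(0)": Coin(10, "pkAlice"), "2(0)": Coin(5, "pkBob")}
consumed = set()

# Alice subdivides her coin by paying two new coins to herself.
split = [Coin(4, "pkAlice"), Coin(6, "pkAlice")]
print(is_valid_pay_coins(coins, consumed, ["1(0)"], split, {"pkAlice"}))  # True

consumed.add("1(0)")  # the coin is now destroyed
print(is_valid_pay_coins(coins, consumed, ["1(0)"], split, {"pkAlice"}))  # False
```

The second call is the double-spending case: the same coin ID is offered again after it has been consumed, and the check on already-consumed coins rejects it.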
Immutable coins
Coins can’t be transferred, subdivided, or combined.
How to overcome this problem?
You can get the effect of dividing coins by using transactions to subdivide: create new transaction to consume your coin and pay out two new coins to yourself
Advantages?
Prevents double spending
Makes sure all transactions are valid, i.e., each coin is consumed only once
Problems?
Scrooge has too much influence
He cannot spend other people’s coins (can’t create fake transactions) – why?
However, he can stop endorsing other people’s transactions and deny them service
If he is greedy, he may also ask for a service/transaction fee for endorsing every transaction
Also, as Scrooge is in charge of creating new coins, he can create as many for himself as he wants
Last, he can get “bored” of the system and just stop updating the block chain altogether!
Descroogifying the Currency
Crucial question:
Can we descroogify the currency, and operate without any central, trusted party?