Stat 5101 Lecture Slides: Deck 1 Probability and Expectation on Finite Sample Spaces
Sets
- In mathematics, a set is a collection of objects considered as a single entity.
- The objects within a set are called its elements.
- x \in S denotes that x is an element of the set S.
- A \subset S indicates that set A is a subset of set S, meaning every element of A is also an element of S.
- Sets can be indicated by listing elements within curly brackets, e.g., {1, 2, 3, 4}.
- Sets can contain various types of elements, not just numbers; e.g., {1, 2, \pi, cabbage, {0, 1, 2}}.
- The empty set is the unique set containing no elements.
- The empty set is denoted by \emptyset or {}.
- N denotes the set of natural numbers {0, 1, 2, \ldots}.
- Z denotes the set of integers {\ldots, -2, -1, 0, 1, 2, \ldots}.
- R denotes the set of real numbers.
- Set builder notation: { x \in S : some condition on x } represents the set of elements in S that satisfy the specified condition.
- { h(x) : x \in S } denotes the image or range of the function h with domain S.
- Example: { x \in R : x > 0 } is the set of positive real numbers.
Intervals
- Intervals are a special type of set.
- Notation:
- (a, b) = { x \in R : a < x < b }: Open interval with endpoints a and b.
- [a, b] = { x \in R : a \leq x \leq b }: Closed interval with endpoints a and b.
- (a, b] = { x \in R : a < x \leq b }: Half-open interval.
- [a, b) = { x \in R : a \leq x < b }: Half-open interval.
- These notations assume a and b are real numbers with a < b.
- Infinite intervals:
- (a, \infty) = { x \in R : a < x }: Open interval.
- [a, \infty) = { x \in R : a \leq x }: Closed interval.
- (-\infty, b) = { x \in R : x < b }: Open interval.
- (-\infty, b] = { x \in R : x \leq b }: Closed interval.
- (-\infty, \infty) = R: The set of all real numbers, both open and closed.
Functions
A mathematical function is a rule that maps each point in one set (the domain) to a point in another set (the codomain).
Functions can also be called maps, mappings, or transformations.
Functions are often denoted by single letters, such as f.
f(x) represents the value of the function f at the point x.
If X is the domain and Y the codomain of the function f, this is written as f : X \to Y or X \xrightarrow{f} Y.
To define a function, a formula can be used, e.g., f(x) = x^2, x \in R, where the domain is specified.
Alternatively, the notation x \mapsto x^2 can be used, read as “x maps to x squared,” but the domain must be indicated separately.
For small finite sets, a table can define a function:
- Input 1 | 2 | 3 | 4
- Output 1/10 | 2/10 | 3/10 | 4/10
Functions can map any set to any set. For instance, one could have:
- Input red | orange | yellow | green | blue
- Output tomato | orange | lemon | lime | corn
It's crucial to be precise about the domain of a function.
For example, as a real-valued function, f(x) = \sqrt{x} is only properly defined for x \geq 0.
The exponential function is denoted as
- exp : R \to (0, \infty) with values exp(x), also written as e^x.
The logarithmic function is denoted as
- log : (0, \infty) \to R with values log(x).
These functions are inverses of each other:
- log(exp(x)) = x for all x in the domain of exp.
- exp(log(x)) = x for all x in the domain of log.
In this course, log always denotes the base e logarithm (natural logarithm).
Constant functions (e.g., x \mapsto c for a fixed number c) and identity functions (e.g., x \mapsto x) are simple but important.
It is more correct to say that x \mapsto x^2 is a function than to say that x^2 is a function.
Probability Models
- A probability model, also called a probability distribution, is a fundamental concept in probability theory.
- Specifying a probability model can be done in several ways:
- Probability mass function (PMF).
- Probability density function (PDF).
- Distribution function (DF).
- Probability measure.
- Expectation operator.
- Function mapping from one probability model to another.
Probability Mass Functions (PMF)
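- A probability mass function (PMF) is a function f : S \to R where:
- S is the sample space (nonempty set).
- R is the set of real numbers.
- f(x) \geq 0 for all x \in S (non-negativity).
- \sum_{x \in S} f(x) = 1 (sums to one).
- If the sample space is {x_1, \ldots, x_n}, the conditions can be written as f(x_i) \geq 0 for i = 1, \ldots, n and \sum_{i=1}^n f(x_i) = 1.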
- The underlying concept of a PMF is more important than the specific notation used.
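The two defining conditions can be checked mechanically. The following is a minimal sketch, assuming a PMF on a finite sample space is stored as a Python dict from outcomes to probabilities; the name `is_pmf` and the dict representation are illustrative choices, not notation from these slides.

```python
from fractions import Fraction


def is_pmf(f):
    """Return True if the dict f (outcome -> probability) is a valid PMF."""
    if not f:
        return False  # the sample space must be nonempty
    nonnegative = all(p >= 0 for p in f.values())
    sums_to_one = sum(f.values()) == 1
    return nonnegative and sums_to_one


# The table from the Functions section: f(x) = x/10 on S = {1, 2, 3, 4}.
f = {x: Fraction(x, 10) for x in range(1, 5)}
print(is_pmf(f))  # True
```

Exact fractions are used so that the sums-to-one check is not affected by floating-point rounding.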
Interpretation of PMFs
- An element of the sample space is called an outcome.
- The value of the PMF at an outcome x is the probability of that outcome.
- Probabilities are real numbers between 0 and 1, inclusive.
- Probability 0 means "cannot happen" (or is ignored).
- Probability 1 means "certain to happen" (or the possibility of it not happening is ignored).
Finite Probability Models
- A probability model is finite if its sample space is a finite set.
- Example: Smallest possible sample space {x}, with f(x) = 1.
- Example: Next simplest sample space {x_1, x_2}, with f(x_1) = p and f(x_2) = 1 - p where 0 \leq p \leq 1.
Bernoulli Distribution
- A probability distribution on the sample space {0, 1} is called a Bernoulli distribution.
- If f(1) = p, then it is denoted as Ber(p).
- A Bernoulli distribution can represent a distribution on any two-point set by coding the points as 0 and 1.
Statistical Models
- A statistical model is a family of probability models.
- The Bernoulli distribution often refers to the Bernoulli family of distributions, the set of all distributions Ber(p) for 0 \leq p \leq 1.
- The PMF of the Ber(p) distribution can be defined by
f_p(x) = \begin{cases}
1 - p, & x = 0 \\
p, & x = 1
\end{cases}
- This family of PMFs is { f_p : 0 \leq p \leq 1 }.
- x is the argument of the function f_p, while p is the parameter.
- The set of allowed parameter values is called the parameter space.
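- For the Bernoulli statistical model, the parameter space is the interval [0, 1].
- Example: Sample space with three points {x_1, x_2, x_3}, with f(x_1) = p_1, f(x_2) = p_2, and f(x_3) = 1 - p_1 - p_2, where p_1 \geq 0, p_2 \geq 0, and p_1 + p_2 \leq 1.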
Discrete Uniform Distribution
- For a sample space {x_1, \ldots, x_n}, the uniform distribution assigns equal probability to each outcome.
- The PMF is defined as f(x_i) = 1/n for i = 1, \ldots, n.
- Applications include coin flips and dice rolls.
- Coin flip: Modeled by the uniform distribution on a two-point sample space.
- Die roll: Modeled by a uniform distribution on a six-point sample space.
Supports
- The support of a probability distribution is the set { x \in S : f(x) > 0 }, where S is the sample space and f is the PMF.
- The distribution is concentrated on the support.
- Points not in the support can be removed from the sample space without consequence.
- In the Bernoulli family, all distributions have support {0, 1} except for
- the distribution for p = 0, which is concentrated at 0, and
- the distribution for p = 1, which is concentrated at 1.
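The support of each member of the Bernoulli family can be computed directly from the definition. This is a small sketch, assuming PMFs are stored as dicts; the helper names `bernoulli_pmf` and `support` are hypothetical.

```python
from fractions import Fraction


def bernoulli_pmf(p):
    """PMF on the sample space {0, 1} that puts probability p on 1."""
    return {0: 1 - p, 1: p}


def support(f):
    """Set of outcomes that have strictly positive probability."""
    return {x for x, prob in f.items() if prob > 0}


print(support(bernoulli_pmf(Fraction(1, 3))))  # {0, 1}
print(support(bernoulli_pmf(Fraction(0))))     # {0}
print(support(bernoulli_pmf(Fraction(1))))     # {1}
```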
Events and Measures
- A subset of the sample space is called an event.
- If f is the PMF, the probability of an event A is defined by
\Pr(A) = \sum_{x \in A} f(x)
- By convention, a sum with no terms is zero, so \Pr(\emptyset) = 0.
- This defines a probability measure \Pr that maps events to real numbers
A \mapsto \Pr(A).
- PMF and probability measures determine each other
- \Pr(A) = \sum_{x \in A} f(x), A \subset S, goes from PMF to measure, and
- f(x) = \Pr({x}), x \in S, goes from measure to PMF.
- Note the distinction between the outcome x and the event {x}.
- For any event A, we have \Pr(A) \geq 0 because all the terms in the sum defining \Pr(A) are nonnegative.
- For any event A, we have \Pr(A) \leq 1 because all the terms in the sum are nonnegative, so the sum over A is no larger than the sum over all of S, which is 1.
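The passage from PMF to probability measure, summing the PMF over the outcomes in the event, can be sketched as follows. The fair-die PMF is a hypothetical example and `pr` is an illustrative name.

```python
from fractions import Fraction


def pr(event, f):
    """Probability of an event: the sum of f over the outcomes in the event."""
    return sum((f[x] for x in event), Fraction(0))


# Hypothetical example: a fair die, uniform on {1, ..., 6}.
f = {x: Fraction(1, 6) for x in range(1, 7)}
even = {2, 4, 6}

print(pr(even, f))          # 1/2
print(pr(set(), f))         # 0, the empty sum
print(pr(set(f), f))        # 1, the whole sample space
print(pr({3}, f) == f[3])   # True: the event {3} recovers the PMF value at 3
```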
Random Variables and Expectation
- A real-valued function on the sample space is called a random variable.
- If f is the PMF, then the expectation of a random variable X is defined by
E(X) = \sum_{s \in S} X(s) f(s)
- This defines an expectation operator E that maps random variables to real numbers
X \mapsto E(X).
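As a rough illustration, the expectation of a random variable is the sum over the sample space of its value times the PMF. The sketch below assumes a random variable is a Python function on the sample space and the PMF is a dict; the fair-die PMF is a hypothetical example.

```python
from fractions import Fraction


def expectation(X, f):
    """E(X): sum of X(s) * f(s) over all outcomes s in the sample space."""
    return sum((X(s) * f[s] for s in f), Fraction(0))


# Hypothetical example: a fair die, uniform on {1, ..., 6}.
f = {s: Fraction(1, 6) for s in range(1, 7)}

print(expectation(lambda s: s, f))      # 7/2
print(expectation(lambda s: s * s, f))  # 91/6
```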
Sets Again: Cartesian Product
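- The Cartesian product of sets A and B, denoted A \times B, is the set of all pairs of elements
A \times B = { (x, y) : x \in A and y \in B }
- We write the Cartesian product of A with itself as A^2.
- In particular, R^2 is the space of two-dimensional vectors or points in two-dimensional space.
- Similarly for triples
A \times B \times C = { (x, y, z) : x \in A and y \in B and z \in C }
- We write A \times A \times A = A^3.
- In particular, R^3 is the space of three-dimensional vectors or points in three-dimensional space.
- Similarly for n-tuples
A_1 \times A_2 \times \cdots \times A_n = { (x_1, x_2, \ldots, x_n) : x_i \in A_i, i = 1, \ldots, n }
- We write A \times A \times \cdots \times A = A^n when there are n sets in the product.
- In particular, R^n is the space of n-dimensional vectors or points in n-dimensional space.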
- Any function of random variables is a random variable.
Averages and Weighted Averages
- The average of the numbers x_1, \ldots, x_n is
\frac{1}{n} \sum_{i=1}^n x_i
- The weighted average of the numbers x_1, \ldots, x_n with the weights w_1, \ldots, w_n is
\sum_{i=1}^n w_i x_i
- The weights in a weighted average are required to be nonnegative and sum to one.
- Expectation and weighted averages are the same concept in different language and notation.
- In expectation we sum
values of random variable · probabilities
- in weighted averages we sum
arbitrary numbers · weights
- but weights are just like probabilities (nonnegative and sum to one) and the values of a random variable can be defined arbitrarily (whatever we please) and are numbers.
Random Variables and Expectation (cont.)
- When using f for the PMF, S for the sample space, and x for points of S, if S \subset R, then we often use X for the identity random variable x \mapsto x. Then, for any function g on S,
\begin{aligned}
E(X) &= \sum_{x \in S} x f(x) \\
E\{g(X)\} &= \sum_{x \in S} g(x) f(x)
\end{aligned}
Probability of Events and Random Variables
- Suppose we are interested in \Pr(A), where A is an event involving a random variable
A = { s \in S : 4 < X(s) < 6 }
- A convenient shorthand for this is \Pr(4 < X < 6).
- The explicit subset A of the sample space the event consists of is not mentioned.
- Nor is the sample space S explicitly mentioned.
- Since X is a function S \to R, the sample space is implicitly mentioned.
Sets Again: Set Difference
- The difference of sets A and B, denoted A \ B, is the set of all points of A that are not in B
A \ B = { x \in A : x \notin B }
Functions Again: Indicator Functions
- If A \subset S, the function I_A : S \to R defined by
I_A(x) = \begin{cases}
0, & x \in S \backslash A \\
1, & x \in A
\end{cases}
is called the indicator function of the set A.
- If S is the sample space of a probability model, then I_A is a random variable.
Indicator Random Variables
- Any indicator function on the sample space is a random variable.
- Conversely, any random variable X that takes only the values zero or one (we say zero-or-one-valued) is an indicator function.
- Define A = { s \in S : X(s) = 1 }.
- Then X = I_A.
Probability is a Special Case of Expectation
- If \Pr is the probability measure and E the expectation operator of a probability model, then
\Pr(A) = E(I_A)
for any event A.
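The identity says that the probability of an event equals the expectation of its indicator function. A quick numerical check, as a sketch with hypothetical helper names and the PMF f(x) = x/10 on {1, 2, 3, 4} used earlier as a table example:

```python
from fractions import Fraction


def indicator(A):
    """Indicator function of the set A: 1 on A, 0 elsewhere."""
    return lambda s: 1 if s in A else 0


def expectation(X, f):
    """E(X): sum of X(s) * f(s) over the sample space."""
    return sum((X(s) * f[s] for s in f), Fraction(0))


def pr(A, f):
    """Pr(A): sum of f over the outcomes in A."""
    return sum((f[s] for s in A), Fraction(0))


f = {x: Fraction(x, 10) for x in range(1, 5)}
A = {2, 4}

print(pr(A, f), expectation(indicator(A), f))  # 3/5 3/5
```

The sum defining E(I_A) drops every term with s outside A and keeps f(s) for s in A, which is exactly the sum defining Pr(A).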
Philosophy
- Philosophers and philosophically inclined mathematicians and scientists have spent centuries trying to say exactly what probability and expectation are.
- This project has been a success in that it has piled up an enormous literature.
- It has not generated agreement about the nature of probability and expectation.
- If you ask two philosophers what probability and expectation are, you will get three or four conflicting opinions.
Frequentism
- The frequentist theory of probability and expectation holds that they are objective facts about the world.
- Probabilities and expectations can actually be measured in an infinite sequence of repetitions of a random phenomenon, if each repetition has no influence whatsoever on any other repetition.
- Let X_1, X_2, \ldots be such an infinite sequence of random variables and for each n define
\bar{X}_n = \frac{1}{n} \sum_{i=1}^n X_i
- Then \bar{X}_n gets closer and closer to E(X_i) as n goes to infinity. Here E(X_i) is assumed to be the same for all i because each X_i is the “same” random phenomenon.
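The frequentist picture can be imitated on a computer, with the caveat that a simulation only ever produces a finite sequence and uses pseudorandom numbers. The sketch below averages n simulated zero-or-one-valued variables that equal 1 with probability p; the value p = 0.3, the seed, and the function name are arbitrary choices for illustration.

```python
import random


def sample_mean_of_bernoulli(p, n, seed=0):
    """Average of n simulated zero-or-one variables, each 1 with probability p."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    total = 0
    for _ in range(n):
        total += 1 if rng.random() < p else 0
    return total / n


# The averages should tend to settle near the expectation p = 0.3 as n grows.
for n in (10, 1000, 100000):
    print(n, sample_mean_of_bernoulli(0.3, n))
```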
Subjectivism
- The subjectivist theory of probability and expectation holds that they are all in our heads, a mere reflection of our uncertainty about what will happen or has happened.
- Consequently, subjectivism is personalistic.
- You have your probabilities, which reflect or “measure” your uncertainties.
- I have mine.
- There is no reason we should agree, unless our information about the world is identical, which it never is.
- Hiding probabilities and expectations inside the human mind, which is incompletely understood, avoids the troubles of frequentism, but it makes it hard to motivate any properties of such a hidden, perhaps mythical, thing.
Formalism
- The mainstream philosophy of all of mathematics — not just probability theory — of the twentieth century and the twenty-first, what there is of it so far, is formalism.
“Mathematics may be defined as the subject in which we never know what we are talking about, nor whether what we are saying is true” (Bertrand Russell)
- Formalists only care about the form of arguments, that theorems have correct proofs, conclusions following from hypotheses and definitions by logically correct arguments.
- It does not matter what the hypotheses and definitions “really” mean (“we never know what we are talking about”) nor whether they are “really” true (“nor whether what we are saying is true”).
Everyday Philosophy
How statisticians really think about probability and expectation.
- You’ve got two kinds of variables:
- random variables are denoted by capital letters like X and
- ordinary variables are denoted by lower case letters like x.
- A random variable X doesn’t have a value yet, because you haven’t seen the results of the random process that generates it. After you have seen it, it is either a number or an ordinary variable x standing for whatever number it is.
Change of Variable
- Suppose f_X is the PMF of a random variable X having sample space S, and Y = g(X) is another random variable.
- If we want to consider Y as the “original” random variable rather than X, then we need to determine its PMF f_Y.
- This is a function on the codomain of g, call that T, given by
f_Y(y) = \Pr(Y = y) = \Pr\{g(X) = y\}, y \in T
- and
\Pr\{g(X) = y\} = \sum_{x \in S : g(x) = y} f_X(x)
- Thus we have derived the change-of-variable formula for discrete probability distributions
f_Y(y) = \sum_{x \in S : g(x) = y} f_X(x), y \in T (*)
- The probability distribution with PMF f_Y is sometimes called the image distribution of the distribution with PMF f_X because its support is the image of the support of X under the function g
g(S) = { g(x) : x \in S }
(if S is the support of X). But (*) works even if S is larger than the support of X.
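The change-of-variable formula amounts to grouping the outcomes x by their value g(x) and adding up the probabilities in each group. A minimal sketch, assuming dict-based PMFs; the function name `image_pmf` and the example (uniform on five points with g(x) = x^2) are illustrative.

```python
from collections import defaultdict
from fractions import Fraction


def image_pmf(f_X, g):
    """PMF of Y = g(X): f_Y(y) is the sum of f_X(x) over all x with g(x) == y."""
    f_Y = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for x, prob in f_X.items():
        f_Y[g(x)] += prob
    return dict(f_Y)


# Hypothetical example: X uniform on {-2, -1, 0, 1, 2} and Y = X^2.
f_X = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}
f_Y = image_pmf(f_X, lambda x: x * x)

for y in sorted(f_Y):
    print(y, f_Y[y])
# 0 1/5
# 1 2/5
# 4 2/5
```

This version only records values y that are actually attained, so it returns the PMF restricted to the image of the sample space; points of T outside the image have probability zero.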
The PMF of a Random Variable
- A random variable is a function on the sample space. Hence it induces an image distribution by the change-of-variable formula.
- We say two random variables X and Y having different probability models (possibly different sample spaces and different PMF’s) are equal in distribution or have the same distribution if they have the same image distribution.
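- If probability theory is to make sense, it had better be true that if Y = g(X) and f_X and f_Y are the PMF’s of X and Y, with sample spaces S and T, then
E(Y) = \sum_{y \in T} y f_Y(y) = \sum_{x \in S} g(x) f_X(x) = E\{g(X)\}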
The PMF of a Random Vector
- For any random variable X taking values in a finite subset S of R and any random variable Y taking values in a finite subset T of R define
f(x, y) = \Pr(X = x and Y = y), x \in S, y \in T
- By the change-of-variable formula, f is the PMF of the two-dimensional random vector (X, Y).
- For any random variables X_1, \ldots, X_n taking values in finite subsets S_1, \ldots, S_n of R, respectively, define
f(x_1, \ldots, x_n) = \Pr(X_i = x_i, i = 1, \ldots, n), x_i \in S_i, i = 1, \ldots, n
- By the change-of-variable formula, f is the PMF of the n-dimensional random vector (X_1, \ldots, X_n).
Independence
- Only one notion of independence is used in probability theory. It is sometimes called statistical independence or stochastic independence for emphasis, but the adjectives are redundant.
- Random variables X_1, \ldots, X_n are independent if the PMF f of the random vector (X_1, \ldots, X_n) is the product of the PMF’s of the component random variables
f(x_1, \ldots, x_n) = \prod_{i=1}^n f_i(x_i)
- where f_i(x_i) = \Pr(X_i = x_i) is the PMF of X_i.
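For two random variables, the product condition can be checked directly from a joint PMF. The sketch below assumes the joint PMF is a dict keyed by pairs (x, y) and obtains each component PMF by summing the joint PMF over the other coordinate, which is the change-of-variable formula applied to the coordinate maps. All names and both example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import product


def component_pmfs(f):
    """PMFs of X and Y computed from the joint PMF f of the vector (X, Y)."""
    f1, f2 = {}, {}
    for (x, y), prob in f.items():
        f1[x] = f1.get(x, 0) + prob
        f2[y] = f2.get(y, 0) + prob
    return f1, f2


def is_independent(f):
    """True if f(x, y) == f1(x) * f2(y) for every pair (x, y)."""
    f1, f2 = component_pmfs(f)
    return all(f.get((x, y), 0) == f1[x] * f2[y] for x, y in product(f1, f2))


# Two separate fair coin flips, coded as 0 and 1.
indep = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}
# One fair coin flip recorded twice, so Y always equals X.
dep = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(indep))  # True
print(is_independent(dep))    # False
```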
Counting
- How many ways are there to arrange n distinct things?
- You have n choices for the first. After the first is chosen, you have n − 1 choices for the second. After the second is chosen, you have n − 2 choices for the third.
- There are
n! = n (n - 1) (n - 2) \cdots 3 \cdot 2 \cdot 1
arrangements, which is read “n factorial”.
- n factorial can also be written
n! = \prod_{i=1}^n i
- By definition 0! = 1. There is one way to order zero things.
- How many ways are there to arrange k things chosen from n distinct things?
- You have n choices for the first. After the first is chosen, you have n − 1 choices for the second. After the second is chosen, you have n − 2 choices for the third. You stop when you have made k choices.
- There are
(n)_k = n (n - 1) (n - 2) \cdots (n - k + 1)
arrangements, which is read “the number of permutations of n things taken k at a time”.
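Both counts can be confirmed by brute-force enumeration for small n. This sketch uses Python's standard library (`math.perm` requires Python 3.8 or later); the values n = 5 and k = 3 are arbitrary.

```python
import math
from itertools import permutations

n, k = 5, 3
things = range(n)  # n distinct things

# Arrangements of all n things: n factorial.
print(math.factorial(n))                  # 120
print(len(list(permutations(things))))    # 120

# Arrangements of k things chosen from n: n (n - 1) ... (n - k + 1).
print(math.perm(n, k))                    # 60
print(len(list(permutations(things, k)))) # 60

# There is exactly one way to order zero things.
print(math.factorial(0))                  # 1
```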
The Binomial Distribution
- Let X_1, \ldots, X_n be independent and identically distributed Bernoulli random variables.
Identically distributed means they all have the same parameter value: they are all Ber(p) with the same p.
Define
Y = X_1 + X_2 + \cdots + X_n
The distribution of Y is called the binomial distribution for sample size n and success probability p, indicated Bin(n, p) for short.
Binomial Distribution (cont.)
Hence the binomial distribution has PMF
f(y) = \binom{n}{y} p^y (1 - p)^{n - y}, y = 0, 1, \ldots, n
The sample space is {0, 1, \ldots, n} and the parameter space is [0, 1] just like for the Bernoulli distribution.
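The standard binomial formula, C(n, y) p^y (1 - p)^(n - y) for y = 0, ..., n, can be checked against the definition of Y as a sum of independent Bernoulli variables by enumerating all 2^n outcomes of the vector and applying the change-of-variable formula. This is a sketch for small n only; the function names and the values n = 4, p = 1/3 are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf_by_enumeration(n, p):
    """PMF of Y = X_1 + ... + X_n for independent X_i, each 1 with probability p."""
    f = {y: Fraction(0) for y in range(n + 1)}
    for xs in product((0, 1), repeat=n):
        # By independence, the joint PMF is the product of the component PMFs.
        prob = Fraction(1)
        for x in xs:
            prob *= p if x == 1 else 1 - p
        f[sum(xs)] += prob  # change of variable: group outcomes by their sum
    return f


def binomial_pmf_formula(n, p):
    """PMF given by the closed-form binomial formula."""
    return {y: comb(n, y) * p**y * (1 - p) ** (n - y) for y in range(n + 1)}


n, p = 4, Fraction(1, 3)
print(binomial_pmf_by_enumeration(n, p) == binomial_pmf_formula(n, p))  # True
```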
Addition Rules
- We have now met another “brand name” distribution Bin(n, p).
- We have also met our first “addition rule”.
- If X_1, \ldots, X_n are independent and identically distributed (IID) Ber(p) random variables, then X_1 + \cdots + X_n is a Bin(n, p) random variable.