Random Variables and Probability Models Notes

Random Variables and Probability Models

6.1 Expected Value of a Random Variable

A random variable is a variable whose value is a numerical outcome of a random phenomenon. It can be discrete or continuous.

  • Discrete Random Variable: A variable that can only take on a finite number of values or a countably infinite number of values. Outcomes can be listed.

  • Continuous Random Variable: A variable that can take on any value within a given range. It is not restricted to specific, separate values.

For both discrete and continuous random variables, the set of all the possible values and their associated probabilities is called the probability model. A probability model provides the probabilities associated with each possible value of a random variable.

When the probability model is known, the expected value can be calculated for a discrete random variable. The expected value represents the average value we expect the random variable to take over many trials. It's a weighted average where each value is weighted by its probability.

Let X be a discrete random variable that can take on the values x_1, x_2, …, x_n. The expected value of X, written as E(X) or μ, is calculated using the formula:

E(X) = \mu = \sum_{i=1}^{n} x_i \cdot P(x_i)

This formula means you multiply each possible value of X by its corresponding probability P(x_i), and then sum all of these products. The result is the average value you expect X to have over many trials, which makes it a measure of the central tendency of the random variable.
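The weighted-average formula can be sketched in a few lines of Python. The raffle payouts and probabilities below are hypothetical values chosen only for illustration; they do not come from these notes, and the function name `expected_value` is likewise an assumption.

```python
import math

def expected_value(values, probs):
    """Return E(X) = sum of x_i * P(x_i) for a discrete probability model."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if any(p < 0 or p > 1 for p in probs):
        raise ValueError("each probability must be between 0 and 1")
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in zip(values, probs))

# Hypothetical model (not from the notes): a raffle ticket pays 0, 10, or 100.
payouts = [0, 10, 100]
probabilities = [0.90, 0.09, 0.01]

mean_payout = expected_value(payouts, probabilities)
print(round(mean_payout, 2))  # 0(0.90) + 10(0.09) + 100(0.01) = 1.9
```

The validation checks are a design choice: a model whose probabilities do not sum to 1 would still produce a number, but that number would not be an expected value.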

Example: Life Insurance Policy

The probability model for a particular life insurance policy is shown. The expected annual payout on a policy is $200 per policy per year. This means that, on average, the insurance company expects to pay out $200 per policy each year. This value is calculated by considering the probabilities of different payout scenarios (e.g., death benefit, etc.) and their corresponding costs.

6.1 Standard Deviation of a Discrete Random Variable

The standard deviation measures the spread or variability of the random variable around its expected value. A higher standard deviation indicates greater variability, meaning the values are more spread out from the expected value.

Variance: \sigma^2 = Var(X) = \sum (x - \mu)^2 \cdot P(x)

The variance is calculated by summing the squared differences between each value and the expected value, weighted by the probability of each value. This provides a measure of how much the possible values deviate from the expected value, taking into account the likelihood of each value.

Standard Deviation: \sigma = SD(X) = \sqrt{Var(X)}

The standard deviation is the square root of the variance and provides a measure of the typical distance of the values from the mean. It is expressed in the same units as the random variable, making it easier to interpret.

6.2 Example: Book Store Purchases

Suppose the probabilities of a customer purchasing 0, 1, or 2 books at a book store are 0.2, 0.4 and 0.4 respectively.

What is the expected number of books a customer will purchase?

\mu = 0(0.2) + 1(0.4) + 2(0.4) = 1.2

What is the standard deviation of the book purchases?

\sigma^2 = \sum (x - \mu)^2 \cdot P(x) = (0 - 1.2)^2 (0.2) + (1 - 1.2)^2 (0.4) + (2 - 1.2)^2 (0.4) = 0.288 + 0.016 + 0.256 = 0.56

\sigma = \sqrt{0.56} \approx 0.748
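As a check on the arithmetic, the same calculation can be reproduced in Python. This is a minimal sketch that stores the book store model as a dictionary; the variable names are illustrative.

```python
import math

# Probability model from the example: number of books -> probability.
model = {0: 0.2, 1: 0.4, 2: 0.4}

mu = sum(x * p for x, p in model.items())
variance = sum((x - mu) ** 2 * p for x, p in model.items())
sigma = math.sqrt(variance)

print(f"mean = {mu:.1f}, variance = {variance:.2f}, sd = {sigma:.3f}")
# mean = 1.2, variance = 0.56, sd = 0.748
```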

6.3 Properties of Expected Values and Variances

Adding a constant c to X: E(X + c) = E(X) + c and Var(X + c) = Var(X)

Adding a constant to a random variable shifts the expected value by that constant. For example, if you increase every possible value of the random variable by 5, the expected value also increases by 5. However, adding a constant does not change the variance because the spread of the values remains the same.

Multiplying X by a constant a: E(aX) = aE(X) and Var(aX) = a^2 Var(X)

Multiplying a random variable by a constant multiplies the expected value by that constant. For example, if you double every possible value of the random variable, the expected value also doubles. The variance, however, is multiplied by the square of the constant. So, if you double every possible value, the variance is multiplied by 4.
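The shift and scale rules can be verified numerically. The sketch below reuses the book store model from these notes; the constants c = 5 and a = 2 are arbitrary choices, and the helper name `mean_and_variance` is an assumption.

```python
import math

def mean_and_variance(model):
    """Return (E(X), Var(X)) for a dict mapping value -> probability."""
    mu = sum(x * p for x, p in model.items())
    var = sum((x - mu) ** 2 * p for x, p in model.items())
    return mu, var

# Book store model from these notes; c and a are arbitrary illustrative constants.
base = {0: 0.2, 1: 0.4, 2: 0.4}
c, a = 5, 2

shifted = {x + c: p for x, p in base.items()}  # model of X + c
scaled = {a * x: p for x, p in base.items()}   # model of aX

mu, var = mean_and_variance(base)
mu_shift, var_shift = mean_and_variance(shifted)
mu_scale, var_scale = mean_and_variance(scaled)

# Each comparison should print True.
print(math.isclose(mu_shift, mu + c), math.isclose(var_shift, var))
print(math.isclose(mu_scale, a * mu), math.isclose(var_scale, a**2 * var))
```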

6.3 Addition Rule for Expected Values and Variances

Addition Rule for Expected Values of Random Variables: E(X + Y) = E(X) + E(Y)

The expected value of the sum of two random variables is the sum of their expected values. This rule holds regardless of whether the random variables are independent.

Addition Rule for Variances of (independent) Random Variables: Var(X + Y) = Var(X) + Var(Y)

The variance of the sum of two independent random variables is the sum of their variances. This rule only applies if the random variables are independent. If the random variables are not independent, the covariance between them must be considered.

6.3 Example: Insurance Policy Payouts

The expected annual payout per insurance policy is $200 and the variance is 14,960,000 (in squared dollars). If the payout amounts are doubled, what are the new expected value and variance?

New expected value: E(2X) = 2 * 200 = $400

New variance: Var(2X) = 2^2 * 14,960,000 = 59,840,000 (squared dollars)

Compare this to the expected value and variance on two independent policies at the original payout amount. For two independent policies X_1 and X_2, E(X_1 + X_2) = 200 + 200 = $400 and Var(X_1 + X_2) = 14,960,000 + 14,960,000 = 29,920,000.

Note: The expected values are the same but the variances are different. Doubling the payout on one policy quadruples the variance, while covering two independent policies only doubles it.
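The comparison between one doubled policy and two independent policies can be laid out side by side. This sketch uses only the mean and variance stated in the example; the variable names are illustrative.

```python
import math

# Figures stated in the example above.
mean_one = 200        # expected annual payout per policy, in dollars
var_one = 14_960_000  # variance of the payout, in squared dollars

# One policy with doubled payouts: 2X.
mean_doubled = 2 * mean_one
var_doubled = 2**2 * var_one

# Two independent policies at the original payout: X1 + X2.
mean_two = mean_one + mean_one
var_two = var_one + var_one

print(mean_doubled, mean_two)  # 400 400
print(var_doubled, var_two)    # 59840000 29920000

# Standard deviations, back in dollars, for easier comparison.
print(round(math.sqrt(var_doubled), 2), round(math.sqrt(var_two), 2))
```

Writing the two policies as X1 and X2 rather than as X + X keeps the independent-sum case from being confused with 2X.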

6.3 Covariance and Correlation

The association of two random variables can be measured using the covariance of X and Y:

Cov(X, Y) = E[(X - E(X))(Y - E(Y))]

The covariance measures how much two random variables change together. A positive covariance indicates that the variables tend to increase or decrease together, while a negative covariance indicates they tend to move in opposite directions. However, the magnitude of the covariance is not easily interpretable.

Then, the covariance gives us the extra information needed to find the variance of the sum or difference of two random variables when they are not independent:

Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y)

Var(X - Y) = Var(X) + Var(Y) - 2Cov(X, Y)

These formulas show how covariance affects the variance of the sum or difference of random variables. If X and Y are positively correlated (positive covariance), the variance of their sum is greater than the sum of their individual variances. If they are negatively correlated (negative covariance), the variance of their sum is less than the sum of their individual variances.
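Both identities can be checked against a small joint probability model. The joint probabilities below are hypothetical (they are not from these notes) and were chosen so that X and Y are positively associated; the helper `expect` is likewise an assumption.

```python
import math

# Hypothetical joint model (not from the notes): (x, y) -> P(X = x, Y = y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] under the joint model."""
    return sum(g(x, y) * p for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
cov_xy = expect(lambda x, y: (x - mean_x) * (y - mean_y))

# Variances of X + Y and X - Y computed directly from the definition.
var_sum = expect(lambda x, y: (x + y - (mean_x + mean_y)) ** 2)
var_diff = expect(lambda x, y: (x - y - (mean_x - mean_y)) ** 2)

# Each comparison should print True.
print(math.isclose(var_sum, var_x + var_y + 2 * cov_xy))
print(math.isclose(var_diff, var_x + var_y - 2 * cov_xy))
```

Here the covariance is 0.15, so the variance of the sum (0.8) exceeds Var(X) + Var(Y) = 0.5, while the variance of the difference (0.2) falls below it.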

Covariance doesn’t have to be between –1 and +1, which makes it harder to interpret. To fix this “problem”, we can divide the covariance by each of the standard deviations to get the correlation:

Corr(X, Y) = \frac{Cov(X, Y)}{SD(X) \cdot SD(Y)}

The correlation is a standardized measure of the linear relationship between two random variables, ranging from -1 to +1. A correlation of +1 indicates a perfect positive relationship, -1 indicates a perfect negative relationship, and 0 indicates no linear relationship. The correlation is easier to interpret than the covariance because it is always between -1 and +1.
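One way to see why correlation is easier to interpret is to change the units of X and watch what happens to each measure. The joint model and the helper `cov_and_corr` below are hypothetical, introduced only for this sketch.

```python
import math

# Hypothetical joint model (not from the notes): (x, y) -> P(X = x, Y = y).
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def cov_and_corr(joint_model, scale_x=1.0):
    """Return (Cov, Corr) of (scale_x * X, Y) under the joint model."""
    pts = [(scale_x * x, y, p) for (x, y), p in joint_model.items()]
    mx = sum(x * p for x, _, p in pts)
    my = sum(y * p for _, y, p in pts)
    cov = sum((x - mx) * (y - my) * p for x, y, p in pts)
    sd_x = math.sqrt(sum((x - mx) ** 2 * p for x, _, p in pts))
    sd_y = math.sqrt(sum((y - my) ** 2 * p for _, y, p in pts))
    return cov, cov / (sd_x * sd_y)

cov_1, corr_1 = cov_and_corr(joint)
cov_100, corr_100 = cov_and_corr(joint, scale_x=100.0)  # e.g. dollars -> cents

print(round(cov_1, 4), round(corr_1, 4))
print(round(cov_100, 4), round(corr_100, 4))
```

Rescaling X by 100 multiplies the covariance by 100 but leaves the correlation unchanged.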

6.4 Bernoulli Trials

Bernoulli trials have the following characteristics:

  • There are only two outcomes per trial, called success and failure.

  • The probability of success, called p, is the same on every trial (the probability of failure, 1 – p, is often called q).

  • The trials are independent.

These three conditions define Bernoulli trials, the simplest type of random experiment. Examples include tossing a coin (heads or tails), collecting yes/no responses from a survey, and shooting free throws in basketball. In each case, every trial has only two possible outcomes.

When using Bernoulli trials to develop probability models, we require that the trials are independent. Independence means that the outcome of one trial does not affect the outcome of any other trial. This is a crucial assumption for many probability models.

10% Condition: Sampling without replacement from a finite population makes the trials slightly dependent, because each selection changes the makeup of what remains. As a rule of thumb, as long as the number of trials (the sample size) is less than 10% of the population size, removing each observation changes the probabilities for later observations so little that we can treat the trials as independent.
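A small exact calculation shows how little sampling without replacement matters when the population is large. The population sizes and the 30% success rate below are hypothetical values, not figures from these notes.

```python
from fractions import Fraction

def second_draw_given_first_success(population, successes):
    """P(2nd draw is a success | 1st was a success), sampling without replacement."""
    return Fraction(successes - 1, population - 1)

# Hypothetical populations (not from the notes), each with 30% successes.
p = Fraction(3, 10)
small = second_draw_given_first_success(population=10, successes=3)
large = second_draw_given_first_success(population=10_000, successes=3_000)

# Probability on the first draw, then the conditional probability on the
# second draw for the small and the large population.
print(float(p), float(small), float(large))
```

In the population of 10, one draw shifts the success probability from 3/10 to 2/9. In the population of 10,000 it barely moves, which is the situation the 10% Condition is meant to identify.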

6.5 Discrete Probability Models

The Uniform Model

If X is a random variable with possible outcomes 1, 2, …, n and P(X = i) = 1/n for each i, then we say X has a discrete Uniform distribution U[1, …, n]. In a uniform distribution, each outcome is equally likely.

When tossing a fair die, each number is equally likely to occur. So tossing a fair die is described by the Uniform model U[1, 2, 3, 4, 5, 6], with P(X = i) = 1/6. This means each face of the die has an equal chance of landing face up. This model is useful when all outcomes are equally probable.
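The fair-die model can be written out explicitly. The sketch below uses exact fractions so the probabilities print as 1/6 rather than as rounded decimals; the function name `uniform_model` is an assumption.

```python
from fractions import Fraction

def uniform_model(n):
    """Return the discrete Uniform model U[1, ..., n] as {outcome: probability}."""
    return {i: Fraction(1, n) for i in range(1, n + 1)}

die = uniform_model(6)
p_even = sum(p for face, p in die.items() if face % 2 == 0)
mean_roll = sum(face * p for face, p in die.items())

print(die[3], p_even, mean_roll)  # 1/6 1/2 7/2
```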

6.5 Other Discrete Probability Models

The Geometric Model: Predicting the number of Bernoulli trials required to achieve the first success. The geometric model is used when we are interested in how many trials it takes to get the first success. For example, how many coin flips until you get heads?

The Binomial Model: Predicting the number of successes in a fixed number of Bernoulli trials. The binomial model is used when we have a fixed number of trials and want to know the probability of getting a certain number of successes. For example, if you flip a coin 10 times, what is the probability of getting exactly 5 heads?
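These notes do not state the probability formulas for the two models, so the sketch below assumes the standard textbook forms: P(X = k) = (1 - p)^(k-1) p for the Geometric model and P(X = k) = C(n, k) p^k (1 - p)^(n-k) for the Binomial model. The function names and the fair-coin value p = 0.5 are illustrative assumptions.

```python
from math import comb

def geometric_pmf(k, p):
    """P(first success occurs on trial k), for k = 1, 2, 3, ..."""
    return (1 - p) ** (k - 1) * p

def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Fair-coin illustrations (p = 0.5 is an assumption).
print(geometric_pmf(3, 0.5))     # first head on the third flip: 0.125
print(binomial_pmf(5, 10, 0.5))  # exactly 5 heads in 10 flips: 0.24609375
```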

6.5 Example: Customer Acquisition Probabilities

Example: Probability in customer acquisition

A venture capital firm has a list of potential investors who have previously invested in new technologies. On average, these investors invest about 5% of the time. A new client of the firm is interested in finding investors for a mobile phone application that enables financial transactions, an application that is finding increasing acceptance in much of the developing world. An analyst at the firm is about to start calling potential investors. This scenario can be modeled using probability distributions.

6.5 Example: Closing Sales with Geometric Distribution

Example: Closing Sales

A salesman normally closes a sale on 80% of his presentations. Assuming the presentations are independent:

What model should be used to determine the probability that he closes his first sale on the fourth presentation? Geometric

What is the probability that he closes his first sale on the fourth presentation?

With p = 0.8 and q = 1 - p = 0.2, the first sale comes on the fourth presentation only if the first three presentations fail and the fourth succeeds: P(X = 4) = q^3 p = (0.2)^3 (0.8) = 0.0064.

6.5 Example: Professional Tennis and Binomial Distribution

Example: Professional Tennis

A tennis player makes a successful first serve 67% of the time. Of the first 6 serves of the next match:

What model should be used to determine the probability that all 6 first serves will be in bounds? Binomial

What is the probability that all 6 first serves will be in bounds? P(X = 6) = (0.67)^6 \approx 0.0905

How many first serves can be expected to be in bounds? E(X) = np = 6(0.67) = 4.02

This calculation shows the expected number of successful first serves based on the given probability, using the expected value formula for a binomial distribution.
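The tennis example can be worked through by building the whole Binomial model for n = 6 and p = 0.67. This sketch assumes the serves are independent, as the Binomial model requires, and the variable names are illustrative.

```python
from math import comb

n, p = 6, 0.67  # six first serves, each in bounds with probability 0.67

# Binomial model: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k)
pmf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

p_all_in = pmf[n]                                      # all six serves in bounds
expected_in = sum(k * pk for k, pk in enumerate(pmf))  # should match n * p

print(round(p_all_in, 4), round(expected_in, 2))  # 0.0905 4.02
```

Summing k * P(X = k) over the full model gives the same 4.02 as the shortcut E(X) = np.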

6.5 Example: Satisfaction Survey and Uniform Distribution

Example: Satisfaction Survey

What probability model would be used to model the selection of a single number? Uniform, all numbers are equally likely.

What is the probability the number selected will be an even number? 0.5

What is the probability the number selected will end in 000? 0.001

These questions illustrate how uniform distributions can be applied. Each number has an equal chance of being selected.

6.5 Example: Probability in Customer Acquisition (Revisited)

Recall the venture capital analyst from the earlier example: the investors on the firm's list invest about 5% of the time, and she is about to start calling them. Two practical questions follow. What is the probability that the first person she calls will want to invest, and how many investors will she have to call, on average, to find someone interested?

6.5 Example: Solving Customer Acquisition Probabilities

Example: (Continued) Probability in customer acquisition

What is the probability that the first person she calls will want to invest? Each investor has a 5% or 1/20 chance of wanting to invest, so the chance that the first person she calls is interested is 1/20.

How many investors will she have to call, on average, to find someone interested? This uses a Geometric model. Let X = number of people she calls until the first interested person. The expected number of calls until the first success is E(X) = 1/p = 1/(1/20) = 20 calls.
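The result E(X) = 1/p can be checked by summing k * P(X = k) directly. The sketch below assumes the standard Geometric probability P(X = k) = (1 - p)^(k-1) p, which these notes do not state, and truncates the infinite sum at 2000 terms.

```python
p = 0.05  # chance that any one investor is interested, from the example

def geometric_pmf(k, p):
    """P(first interested investor is call number k)."""
    return (1 - p) ** (k - 1) * p

# Truncate the infinite sum E(X) = sum of k * P(X = k); terms beyond
# k = 2000 are far too small to affect the result.
mean_calls = sum(k * geometric_pmf(k, p) for k in range(1, 2001))

# Chance she needs more than 20 calls: the first 20 investors all decline.
p_more_than_20 = (1 - p) ** 20

print(round(mean_calls, 6), round(p_more_than_20, 3))
```

The average is 20 calls, but roughly a third of the time the first 20 calls all fail, so 20 is an expectation rather than a guarantee.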

6.5 Further Analysis of Customer Acquisition

Example: (Continued) Probability in customer acquisition

If she calls 10 investors, what is the probability that exactly 2 of them will be interested? Using the Binomial model, let Y = number of people interested in 10 calls. Then P(Y = 2) = \binom{10}{2} (0.05)^2 (0.95)^8 \approx 0.0746.

What assumptions are you making to answer these questions? We are assuming that the trials are independent and that the probability of being interested in investing is the same for all potential investors. These assumptions are crucial for the validity of the model.
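The Binomial calculation for the 10 calls can be sketched as follows, assuming the standard formula P(Y = k) = C(n, k) p^k (1 - p)^(n-k). The second quantity, the chance of at least one interested investor, is an extra illustration that the example itself does not ask for.

```python
from math import comb

n, p = 10, 0.05  # ten calls, each investor interested with probability 0.05

def binomial_pmf(k, n, p):
    """P(exactly k interested investors out of n calls)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_exactly_two = binomial_pmf(2, n, p)
p_at_least_one = 1 - binomial_pmf(0, n, p)  # complement of "nobody interested"

print(round(p_exactly_two, 4), round(p_at_least_one, 4))
```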

What Can Go Wrong?

Probability distributions are still just models. They are simplifications of reality and may not perfectly capture the true behavior of a random variable. Always be aware of the assumptions and limitations of the model.

If the model is wrong, so is everything else. The accuracy of any conclusions or predictions depends on the appropriateness of the chosen model. Choosing the right model is essential for accurate analysis.

Watch out for variables that aren’t independent. Many statistical techniques assume independence, and violating this assumption can lead to incorrect results. Always check for independence before applying these techniques.

Don’t write independent instances of a random variable with notation that looks like they are the same variables. This can cause confusion and lead to errors in calculations. Use distinct notation for distinct variables.

What Can Still Go Wrong?

Don’t forget: Variances of independent random variables add. Standard deviations don’t. This is a common mistake. Variances, not standard deviations, should be added when combining independent random variables.

Don’t forget: Variances of independent random variables add, even when you’re looking at the difference between them. This principle applies whether you are adding or subtracting independent random variables.
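Both reminders can be confirmed by enumerating two independent customers who each follow the book store model from these notes. The helpers `variance` and `combine` are assumptions made for this sketch; independence is imposed by multiplying the two marginal probabilities.

```python
import math
from itertools import product

# Two independent customers, each following the book store model in these notes.
model = {0: 0.2, 1: 0.4, 2: 0.4}

def variance(dist):
    """Variance of a distribution given as {value: probability}."""
    mu = sum(v * p for v, p in dist.items())
    return sum((v - mu) ** 2 * p for v, p in dist.items())

def combine(op):
    """Distribution of op(X1, X2) for independent X1, X2 drawn from model."""
    dist = {}
    for (x1, p1), (x2, p2) in product(model.items(), repeat=2):
        key = op(x1, x2)
        dist[key] = dist.get(key, 0.0) + p1 * p2
    return dist

var_one = variance(model)
var_sum = variance(combine(lambda a, b: a + b))
var_diff = variance(combine(lambda a, b: a - b))

print(round(var_one, 2), round(var_sum, 2), round(var_diff, 2))  # 0.56 1.12 1.12

# SD of the sum versus the (incorrect) sum of the two SDs.
print(round(math.sqrt(var_sum), 3), round(2 * math.sqrt(var_one), 3))
```

The sum and the difference have the same variance, and the standard deviation of the sum is smaller than the two standard deviations added together.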

Be sure you have Bernoulli trials. The conditions for Bernoulli trials (two outcomes, constant probability of success, independence) must be met for the associated models (Binomial, Geometric) to be valid. Double-check that these conditions are satisfied before using the models.

From Learning to Earning

Apply the facts about probability to determine whether an assignment of probabilities is legitimate.

Probability is long-run relative frequency. Probabilities are based on the idea that if an experiment is repeated many times, the relative frequency of an event will approach its probability. This is the foundation of probability theory.

Individual probabilities must be between 0 and 1. This is a fundamental rule of probability. A probability cannot be negative or greater than 1. Probabilities must be valid.

The sum of probabilities assigned to all outcomes must be 1. This ensures that the probability distribution covers all possible outcomes. The probability distribution must be complete.
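The two legitimacy rules translate directly into a short check. This is a minimal sketch; the function name `is_legitimate` and the tolerance used for floating-point sums are assumptions.

```python
import math

def is_legitimate(probs, tol=1e-9):
    """Check the two rules: each probability in [0, 1], and all sum to 1."""
    in_range = all(0 <= p <= 1 for p in probs)
    sums_to_one = math.isclose(sum(probs), 1.0, abs_tol=tol)
    return in_range and sums_to_one

print(is_legitimate([0.2, 0.4, 0.4]))  # book store model: True
print(is_legitimate([0.5, 0.6]))       # sums to 1.1: False
print(is_legitimate([1.2, -0.2]))      # sums to 1 but values outside [0, 1]: False
```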

Understand the Law of Large Numbers and that the common understanding of the “Law of Averages” is false. The Law of Large Numbers states that as the number of trials increases, the sample mean will approach the population mean. The “Law of Averages,” which suggests that past results influence future outcomes in a random process, is a misunderstanding of probability. Only the probability matters on a single trial.
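A short simulation illustrates the Law of Large Numbers with a fair die, whose long-run mean is 3.5. The seed, the sample sizes, and the function name `mean_of_rolls` are arbitrary choices for this sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def mean_of_rolls(n_rolls):
    """Average of n_rolls simulated fair-die rolls."""
    total = sum(random.randint(1, 6) for _ in range(n_rolls))
    return total / n_rolls

# Small samples can stray from 3.5; large samples settle close to it.
for n in (10, 1_000, 100_000):
    print(n, round(mean_of_rolls(n), 3))

large_sample_mean = mean_of_rolls(100_000)
```

Each roll is generated independently of the ones before it, so the average settles near 3.5 only because early deviations are swamped by the growing number of rolls, not because later rolls compensate for them.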

From Learning to Earning: Probability Models

Understand how probability models relate values to probabilities.

For discrete random variables, probability models