Chapter 5 - Discrete Probability Distributions
Random Variables
A random variable x represents a numerical value associated with each outcome of a probability experiment.
Discrete Random Variable:
- Has a finite or countable number of possible outcomes that can be listed.
- Example: x = 0, 2, 4, 6, 8, 10
Continuous Random Variable:
- Has an uncountable number of possible outcomes, represented by intervals on a number line.
- Example: x can take any value in the interval 0 \leq x \leq 10
Discrete vs. Continuous Random Variables - Examples
Distance car travels on a tank of gas:
- Continuous because it is a measurement that can take any value in an interval; it cannot be counted.
- Measured quantities such as distance, time, and weight are generally treated as continuous random variables.
Number of students in a statistics class:
- Discrete because it can be counted.
Discrete Probability Distributions
Lists each possible value the random variable can assume, together with its probability.
Conditions for a valid probability distribution:
- The probability of each value of the discrete random variable is between 0 and 1, inclusive: 0 \leq P(x) \leq 1
- The sum of all the probabilities is 1: \sum P(x) = 1
Constructing a Discrete Probability Distribution
- Make a frequency distribution for the possible outcomes (x_1, x_2, …, x_n).
- Find the sum of the frequencies.
- Find the probability of each possible outcome by dividing its frequency by the sum of the frequencies.
- Check that each probability is between 0 and 1 and that the sum is 1.
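The four construction steps can be sketched in Python. This is a minimal illustration rather than the course's Excel workflow: the function name probability_distribution and the frequency table are hypothetical, made up only to exercise the steps.

```python
import math


def probability_distribution(frequencies):
    """Turn {outcome: frequency} into {outcome: probability} using relative frequencies."""
    total = sum(frequencies.values())                        # step 2: sum of the frequencies
    dist = {x: f / total for x, f in frequencies.items()}    # step 3: divide each frequency by the sum

    # Step 4: check the two conditions for a valid probability distribution.
    assert all(0 <= p <= 1 for p in dist.values()), "each P(x) must be between 0 and 1"
    assert math.isclose(sum(dist.values()), 1.0), "the probabilities must sum to 1"
    return dist


# Hypothetical frequency table (made up for illustration): outcome x -> frequency.
counts = {0: 5, 1: 12, 2: 8, 3: 5}
for x, p in probability_distribution(counts).items():
    print(f"P({x}) = {p:.3f}")
```

Because the checks are assertions, a table whose probabilities do not sum to 1 stops the script instead of producing a silently invalid distribution.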
Example: Probability Distribution of Persons per Household
Data from The Statistical Abstract of the United States.
Calculating probabilities as relative frequencies.
The probability of each value of x, the number of persons per household, is computed as its relative frequency.
- P(1) = 35.2 / 126.1 = 0.279
- P(2) = 43.5 / 126.1 = 0.345
- P(3) = 19.5 / 126.1 = 0.155
- P(4) = 16.2 / 126.1 = 0.128
- P(5) = 7.3 / 126.1 = 0.058
- P(6) = 2.8 / 126.1 = 0.022
- P(\geq 7) = 1.6 / 126.1 = 0.013
- \text{Total} = 126.1 / 126.1 = 1.000
Example: Calculation of P(x \geq 4)
- P(x \geq 4) = P(4) + P(5) + P(6) + P(\geq 7) = 0.128 + 0.058 + 0.022 + 0.013 = 0.221
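A short Python sketch that recomputes the relative frequencies and P(x \geq 4) from the household table. The dictionary keys are household sizes, with 7 standing in for the "7 or more" category; that encoding is a choice made for this sketch.

```python
# Frequencies from the persons-per-household example; key 7 represents "7 or more".
freq = {1: 35.2, 2: 43.5, 3: 19.5, 4: 16.2, 5: 7.3, 6: 2.8, 7: 1.6}

total = sum(freq.values())                      # sum of the frequencies (126.1)
prob = {x: f / total for x, f in freq.items()}  # relative frequency of each outcome

for x, p in prob.items():
    label = ">=7" if x == 7 else str(x)
    print(f"P({label}) = {p:.3f}")

# P(x >= 4): add the probabilities of 4, 5, 6 and 7-or-more persons.
p_at_least_4 = sum(p for x, p in prob.items() if x >= 4)
print(f"P(x >= 4) = {p_at_least_4:.3f}")
```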
Describing the Population/Probability Distribution
Population Mean (Expected Value):
- \mu = E[x] = \sum xP(x)
Population Variance:
- \sigma^2 = \sum (x - \mu)^2 P(x)
Population Standard Deviation:
- \sigma = \sqrt{\sigma^2}
Example: Describing the Population of the Number of Persons per Household
Calculating the mean, variance, and standard deviation for the number of persons per household.
Mean:
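- \mu = \sum xP(x) = 1(0.279) + 2(0.345) + … + 7(0.013) = 2.46 (the \geq 7 category is entered as x = 7)
Variance:
- \sigma^2 = \sum (x - \mu)^2 P(x) = (1-2.46)^2(0.279) + (2-2.46)^2(0.345) + … + (7-2.46)^2(0.013) \approx 1.931 (computed from the unrounded relative frequencies; the three-decimal probabilities shown give about 1.934)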
Standard Deviation:
- \sigma = \sqrt{1.931} = 1.39
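The mean, variance, and standard deviation can be checked with a few lines of Python. This sketch starts from the frequencies rather than the rounded probabilities and, like the hand calculation, enters the "7 or more" category as x = 7.

```python
import math

# Persons-per-household frequencies; x = 7 represents the "7 or more" category.
freq = {1: 35.2, 2: 43.5, 3: 19.5, 4: 16.2, 5: 7.3, 6: 2.8, 7: 1.6}
total = sum(freq.values())
prob = {x: f / total for x, f in freq.items()}

mu = sum(x * p for x, p in prob.items())               # mean: sum of x * P(x)
var = sum((x - mu) ** 2 * p for x, p in prob.items())  # variance: sum of (x - mu)^2 * P(x)
sigma = math.sqrt(var)                                 # standard deviation

print(f"mean = {mu:.2f}, variance = {var:.3f}, std dev = {sigma:.2f}")
```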
Using Expected Value for Decision-Making in Business
- Investment in two different projects:
- Project A: 90% chance of a profit of $100,000; 10% chance of a loss of $20,000
- Project B: 50% chance of a profit of $200,000; 50% chance of a loss of $50,000
Calculating Expected Value for Projects A and B
Expected Value for Project A:
- E[A] = (0.9 \times $100,000) + (0.1 \times (-$20,000)) = $90,000 - $2,000 = $88,000
Expected Value for Project B:
- E[B] = (0.5 \times $200,000) + (0.5 \times (-$50,000)) = $100,000 - $25,000 = $75,000
Based on expected value, Project A has a higher expected profit ($88,000) than Project B ($75,000).
Even though Project B offers the possibility of a larger profit ($200,000), Project A is the better option on the basis of expected value, and its possible loss is also smaller ($20,000 vs. $50,000).
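Expected value is a probability-weighted sum, so the project comparison reduces to a few lines of Python. The helper name expected_value is hypothetical; losses are entered as negative values.

```python
def expected_value(outcomes):
    """Expected value of a discrete distribution given (value, probability) pairs."""
    assert abs(sum(p for _, p in outcomes) - 1.0) < 1e-9, "probabilities must sum to 1"
    return sum(value * p for value, p in outcomes)


project_a = [(100_000, 0.9), (-20_000, 0.1)]  # profit, and a loss entered as a negative value
project_b = [(200_000, 0.5), (-50_000, 0.5)]

print(f"E[A] = ${expected_value(project_a):,.0f}")
print(f"E[B] = ${expected_value(project_b):,.0f}")
```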
Understanding Investment Risk Through Variance
- Investment in two different stock portfolios:
- Portfolio A: 40% chance of gaining $10,000; 40% chance of gaining $15,000; 20% chance of gaining $20,000
- Portfolio B: 40% chance of gaining $5,000; 40% chance of gaining $25,000; 20% chance of gaining $35,000
Expected Value for Portfolios A and B
Expected Value for Portfolio A:
- E[A] = 0.4 \times $10,000 + 0.4 \times $15,000 + 0.2 \times $20,000 = $14,000
Expected Value for Portfolio B:
- E[B] = 0.4 \times $5,000 + 0.4 \times $25,000 + 0.2 \times $35,000 = $19,000
Variance is the probability-weighted average of the squared differences from the mean: \sigma^2 = \sum (x - \mu)^2 P(x).
Variance for Portfolio A:
Variance(A) = 0.4 \times (10000-14000)^2 + 0.4 \times (15000-14000)^2 + 0.2 \times (20000-14000)^2 = 14,000,000
Std Dev = \sqrt{14000000} = $3,741.66
Variance for Portfolio B:
Variance(B) = 0.4 \times (5000-19000)^2 + 0.4 \times (25000-19000)^2 + 0.2 \times (35000-19000)^2 = 144,000,000
Std Dev = \sqrt{144000000} = $12,000
Decision Making Based on Variance
- Portfolio A has a lower variance.
- Portfolio B has a higher variance.
- Key Takeaway: Although Portfolio B has a higher expected return ($19,000 vs. $14,000), it comes with higher risk, as indicated by its higher variance.
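A Python sketch that computes the mean, variance, and standard deviation for both portfolios from their (value, probability) pairs; the helper name describe is hypothetical.

```python
import math


def describe(outcomes):
    """Return (mean, variance, std dev) for a list of (value, probability) pairs."""
    mean = sum(x * p for x, p in outcomes)
    variance = sum((x - mean) ** 2 * p for x, p in outcomes)
    return mean, variance, math.sqrt(variance)


portfolio_a = [(10_000, 0.4), (15_000, 0.4), (20_000, 0.2)]
portfolio_b = [(5_000, 0.4), (25_000, 0.4), (35_000, 0.2)]

for name, outcomes in [("A", portfolio_a), ("B", portfolio_b)]:
    mean, var, sd = describe(outcomes)
    print(f"Portfolio {name}: mean = ${mean:,.0f}, variance = {var:,.0f}, std dev = ${sd:,.2f}")
```

Recomputing by script is a cheap way to catch arithmetic slips, since each later quantity (variance, standard deviation) depends on the mean being right.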
Mini Workshops
- Mini Workshop 1:
- Rolling of a single die.
- What are the outcomes?
- What are the probabilities for each outcome?
- Use Excel to build the discrete probability distribution in both tabular and graphical form.
- Mini Workshop 2:
- Rolling of two dice.
- What are all the possible outcomes? This is the sample space.
- What are the probabilities for each outcome?
- Use Excel to build the discrete probability distribution in both tabular and graphical form.
- Mini Workshop 3:
- Download the file Households.xlsx
- The data are the number of people living in each of 1,000 randomly selected households.
- What are the outcomes?
- What are the frequencies for each outcome?
- What are the probabilities for each outcome?
- Use Excel to build the discrete probability distribution in both tabular and graphical form.
- What is the mean number of family members living in a household, that is, the expected value of the discrete probability distribution? The expected value can also be thought of as a long-run average or an anticipated value.
- Develop a column with cumulative probabilities, P(X \leq x).
- What are; P(x
Binomial Distribution
The binomial distribution is the probability distribution of the number of successes in a binomial experiment, which has the following properties:
- The experiment consists of a sequence of n smaller experiments called trials, where n is fixed in advance of the experiment.
- Each trial has exactly two possible outcomes, generically labeled success (S) and failure (F).
- The trials are independent, so that the outcome of any trial does not influence the outcome of any other trial.
- The probability of success P(S) is constant from trial to trial; we denote this probability by p.
Notation for Binomial Experiments
- n = The number of times a trial is repeated.
- p = P(S) = The probability of success in a single trial.
- q = P(F) = The probability of failure in a single trial (q = 1 - p).
- x = The random variable that counts the number of successes in n trials: x = 0, 1, 2, 3, … , n.
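For reference, the probability of exactly x successes in n trials follows the standard binomial probability formula, written here in the notation above. The worked line uses the weighted-coin numbers from the Excel example (n = 5, p = 0.6).

```latex
P(x) = \binom{n}{x} p^{x} q^{n-x} = \frac{n!}{x!\,(n-x)!}\, p^{x} q^{n-x}, \qquad x = 0, 1, \dots, n

% Worked instance: n = 5, p = 0.6, q = 0.4, x = 2
P(2) = \binom{5}{2} (0.6)^{2} (0.4)^{3} = 10 \times 0.36 \times 0.064 = 0.2304
```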
Examples of Binomial Experiments
Flip a coin 10 times.
- n = 10
- Two outcomes: heads (success) and tails (failure).
- If the coin is fair: P(heads) = P(tails) = 0.5
- Each coin toss is independent.
Draw five cards out of a shuffled deck with replacement.
- n = 5
- Success is drawing a card of the suit we seek, such as a club.
- If success is a club, then P(club) = 13 / 52 = 0.25
- Each draw is independent only if we replace the drawn card in the deck and reshuffle each time.
Binomial Probability Distribution Using Excel
- Even for a relatively small value of n, the computation of binomial probabilities can be tedious.
- Excel allows you to calculate the Binomial distribution directly:
BINOM.DIST(x, n, p, cumulative)
- x = number of successes in the n trials
- n = number of trials
- p = probability of success
- cumulative = TRUE if you want P(X \leq x)
- cumulative = FALSE if you want P(X = x)
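Outside Excel, BINOM.DIST can be mimicked with Python's standard library (math.comb, Python 3.8+). The function binom_dist below is a hypothetical stand-in written for this sketch, not an Excel or library API: it evaluates the binomial formula for cumulative = False and sums those terms from 0 to x for cumulative = True.

```python
from math import comb


def binom_dist(x, n, p, cumulative):
    """Hypothetical Python stand-in for Excel's BINOM.DIST(x, n, p, cumulative).

    cumulative=False -> P(X = x)
    cumulative=True  -> P(X <= x)
    """
    def pmf(k):
        # C(n, k) * p^k * (1 - p)^(n - k)
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)


# Weighted coin from the example: 5 tosses, P(heads) = 0.6.
for heads in range(3):
    print(f"P({heads} heads) = {binom_dist(heads, 5, 0.6, False):.5f}")
```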
Example using Excel
I toss a weighted coin 5 times. The probability of heads is 60%.
- The probability that I will toss no heads:
BINOM.DIST(0,5,0.6,FALSE) = 0.01024
- The probability that I will toss exactly 1 head:
BINOM.DIST(1,5,0.6,FALSE) = 0.0768
- The probability that I will toss exactly 2 heads:
BINOM.DIST(2,5,0.6,FALSE) = 0.2304
Example: Pat Statsdude and the Statistics Quiz
Pat Statsdude is a (not good) student taking a statistics course. Pat’s exam strategy is to rely on luck for the next quiz.
The quiz consists of 10 multiple-choice questions.
Each question has five possible answers, only one of which is correct. Pat plans to guess the answer to each question.
- What is the probability that Pat gets no answers correct?
- What is the probability that Pat gets exactly two answers correct?
This is a binomial experiment because:
- n = 10
- Two outcomes: correct and incorrect answer.
- Probability of a correct answer: p = 1/5 = 0.2.
- Answers to questions are independent.
We can apply the binomial probability distribution to answer both questions:
- x = 0:
BINOM.DIST(0,10,0.2,FALSE) = 0.1074
- x = 2:
BINOM.DIST(2,10,0.2,FALSE) = 0.3020
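The full distribution of Pat's score can be tabulated in Python. This sketch uses math.comb with n = 10 and p = 0.2 taken from the example; the two requested values appear as the x = 0 and x = 2 rows.

```python
from math import comb

n, p = 10, 0.2  # 10 questions; each guess is correct with probability 1/5

# Full binomial distribution of x, the number of correct guesses.
dist = {x: comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(n + 1)}

for x, prob in dist.items():
    print(f"P(x = {x:2d}) = {prob:.4f}")

print(f"Total = {sum(dist.values()):.4f}")
```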
Cumulative Probability
- The probability that a random variable is less than or equal to a value x is called a cumulative probability, and it is represented as P(X \leq x).
- In the case of a discrete probability distribution, such as the binomial distribution, we can write:
- P(X \leq x) = \sum_{i=0}^{x} P(X = i)
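As a worked cumulative example using the Pat Statsdude quiz (n = 10, p = 0.2), the probability that Pat gets at most two answers correct sums the first three terms; in Excel the same value is BINOM.DIST(2,10,0.2,TRUE).

```latex
P(X \leq 2) = P(X = 0) + P(X = 1) + P(X = 2)
            = (0.8)^{10} + \binom{10}{1}(0.2)(0.8)^{9} + \binom{10}{2}(0.2)^{2}(0.8)^{8}
            \approx 0.1074 + 0.2684 + 0.3020 = 0.6778
```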
Using Excel - Example
- Assume that the probability of a car stopping at an intersection is 30%. A researcher studies 15 cars approaching the intersection.
- Using Excel (BINOM.DIST(x, n, p, cumulative)), what is the probability that:
- Exactly 5 cars stop?
- Answer:
BINOM.DIST(5, 15, 0.3, FALSE) = 0.2061
- More than 5 cars stop?
- Answer:
1 - BINOM.DIST(5, 15, 0.3, TRUE) = 0.2784
- At least 5 cars stop?
- Answer:
1 - BINOM.DIST(4, 15, 0.3, TRUE) = 0.4845
- Fewer than 5 cars stop?
- Answer:
BINOM.DIST(4, 15, 0.3, TRUE) = 0.5155
- At most 5 cars stop?
- Answer:
BINOM.DIST(5, 15, 0.3, TRUE) = 0.7216
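The five car-stopping answers can be cross-checked in Python. In this sketch the helper names pmf and cdf are hypothetical; the complement rules mirror the Excel formulas, so "more than" and "at least" are 1 minus a cumulative probability.

```python
from math import comb

n, p = 15, 0.3  # 15 cars, each stops with probability 0.3


def pmf(k):
    """P(X = k), the same quantity as BINOM.DIST(k, n, p, FALSE)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def cdf(k):
    """P(X <= k), the same quantity as BINOM.DIST(k, n, p, TRUE)."""
    return sum(pmf(i) for i in range(k + 1))


answers = {
    "exactly 5": pmf(5),
    "more than 5": 1 - cdf(5),   # P(X > 5)  = 1 - P(X <= 5)
    "at least 5": 1 - cdf(4),    # P(X >= 5) = 1 - P(X <= 4)
    "fewer than 5": cdf(4),      # P(X < 5)  = P(X <= 4)
    "at most 5": cdf(5),         # P(X <= 5)
}
for question, prob in answers.items():
    print(f"P({question} cars stop) = {prob:.4f}")
```

The key design point is the off-by-one choice: "at least 5" subtracts the cumulative probability up to 4, not up to 5.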
Population Parameters of a Binomial Distribution: Mean, Variance, and Standard Deviation
Mean:
- \mu = np
Variance:
- \sigma^2 = npq
Standard deviation:
- \sigma = \sqrt{npq}
Example: One out of five students at a local college says that they skip breakfast in the morning. Find the mean, variance, and standard deviation of the number of students who skip breakfast when 10 students are randomly selected.
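Applying \mu = np, \sigma^2 = npq, and \sigma = \sqrt{npq} to the breakfast example (n = 10, p = 1/5) takes only a few lines; a Python sketch:

```python
import math

n, p = 10, 1 / 5  # 10 students; each skips breakfast with probability 1/5
q = 1 - p

mean = n * p                   # mu = np
variance = n * p * q           # sigma^2 = npq
std_dev = math.sqrt(variance)  # sigma = sqrt(npq)

print(f"mean = {mean:.1f}, variance = {variance:.1f}, std dev = {std_dev:.2f}")
```

This gives a mean of 2 students, a variance of 1.6, and a standard deviation of about 1.26.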
Mini Workshop 5
- According to recommendations from a large investment firm and a national survey, 25% of retirees have too much of their savings portfolio invested in equities/stocks.
- In a random sample of 15 near-retirees, what is the probability that…
- 0 had too much in stocks?
- Exactly one had too much in stocks?
- 3 or fewer?
- More than 3?
- 4 \leq x < 10?