Chapter 2: Discrete Probability Distributions
Discrete Random Variables
- Objectives:
- Define discrete vs. continuous variables.
- Define a probability distribution.
- Understand the shape of a probability distribution.
- Define and calculate the expected value.
- Install R.
Discrete vs. Continuous Variables
- Discrete variables are not measured on a continuous spectrum; measuring on a spectrum would imply infinite precision.
- Examples: Sex/Gender/Sexual_Preference
- A discrete variable has either a finite or countable number of possible values.
- Countable means infinite but able to be listed (e.g., natural numbers 1, 2, 3, …).
- A continuous variable has an uncountably infinite number of possible values.
- Examples:
- Number of children from a married couple: Discrete
- Number of canned lima beans in stock at Fareway: Discrete
- Number of students at Wartburg: Discrete
- Weight of a fish: Continuous
- Weight of a fish (rounded to the nearest tenth of a gram): Artificially Discrete
- Age of a professor: Continuous
- Age of a professor (rounded to years): Artificially Discrete
- Whether a variable is discrete can depend on the rounding used in an experiment.
Probability Distribution
A random variable X is a numerical measure of the outcome of a probability experiment. Random variables are denoted with capital letters.
- Example: Experiment: Flip a coin. Let X=1 indicate heads and X=0 indicate tails. Then X is a random variable.
A probability distribution provides the possible values of the random variable X and their corresponding probabilities.
A probability distribution can be in the form of a table, graph, or mathematical formula.
Example of a discrete probability distribution:
X=x          P(x)
0            0.06
1            0.58
2            0.22
3            0.10
4            0.03
5 or more    0.01
To find a missing value in a probability distribution, use the fact that all of the probabilities must sum to 1.
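As a small illustration of the sum-to-1 rule, the distribution above can be entered in R and checked. This is only a sketch: coding the open-ended "5 or more" row as 5 is an assumption made for illustration, and the variable names are hypothetical.

```r
# Values and probabilities from the distribution above.
# Assumption: the "5 or more" row is coded as 5.
x <- c(0, 1, 2, 3, 4, 5)
p <- c(0.06, 0.58, 0.22, 0.10, 0.03, 0.01)

# A valid distribution has probabilities between 0 and 1 that sum to 1.
valid_range <- all(p >= 0 & p <= 1)
sums_to_one <- isTRUE(all.equal(sum(p), 1))

# If P(1) were missing, it would be 1 minus the sum of the others.
missing_p <- 1 - sum(p[-2])
missing_p   # 0.58
```

Using all.equal() instead of == avoids false alarms from floating-point rounding.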
Shape of a Probability Distribution
- A probability histogram is a histogram in which the horizontal axis is the random variable X and the vertical axis represents probabilities between 0 and 1.
- Shapes of distributions:
- Uniform (symmetric)
- Bell-shaped (symmetric)
- Skewed Right
- Skewed Left
- Example: Drawing a probability histogram for marital status of an older man (skewed to the right).
Expected Value
- The mean of a discrete random variable X (also called the average or expected value) is denoted by $E(X)$.
- Formula: $E(X) = \sum x \cdot P(x)$, where x is a value of X and P(x) is the associated probability of x.
- Example 1: Expected value calculation for the number of times a 60-year-old man has been married.
- Example 2: Simple dice game.
- Cost to play: $1
- Win $2 for rolling a 2 or 4.
- Win nothing for rolling an odd number.
- Win $3 for rolling a 6.
- Expected value: E(X) = 1(0.333) - 1(0.500) + 2(0.167) = $0.167
- Since the expected net winnings are positive, the game is favorable on average, so you should play it as many times as possible.
- Example 3: Life insurance policy.
- Policy pays $250,000 upon death; one-time cost is $530.
- Probability of survival: 0.99791
- E(X) = 530(0.99791) + (-249470)(0.00209) = $7.50
- Example 4: Roulette.
- Cost to play: $10
- Bet on a number (e.g., 6), with a (1/38) probability of winning $350.
- E(X) = $340(1/38) + (-$10)(37/38) = -$0.79
- If played 1000 times, expect to lose approximately $790.
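The dice and roulette calculations can be reproduced in R by multiplying each net outcome by its probability and summing. The helper name expected_value is hypothetical and is not part of the course materials; this is a minimal sketch.

```r
# Hypothetical helper: expected value of a discrete random variable.
expected_value <- function(x, p) {
  stopifnot(length(x) == length(p), isTRUE(all.equal(sum(p), 1)))
  sum(x * p)
}

# Example 2: net winnings after the $1 cost to play.
dice_net  <- c(1, -1, 2)         # roll a 2 or 4, roll an odd number, roll a 6
dice_prob <- c(2/6, 3/6, 1/6)
ev_dice <- expected_value(dice_net, dice_prob)

# Example 4: net winnings on a $10 single-number roulette bet.
roulette_net  <- c(340, -10)
roulette_prob <- c(1/38, 37/38)
ev_roulette <- expected_value(roulette_net, roulette_prob)

round(ev_dice, 3)           # 0.167
round(ev_roulette, 2)       # -0.79
round(1000 * ev_roulette)   # -789, i.e. a loss of roughly $790 over 1000 plays
```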
Installing R
- Guidance on downloading and installing R.
- Download R from the official website.
- Select a mirror in Iowa.
- Choose the appropriate version for Windows, macOS, or Linux.
- Windows: Install R for the first time and download the latest version.
- macOS: Select the latest release for macOS.
- iPad and Chromebook users may need to register for a free account at www.posit.cloud and use R from a cloud server.
- How to enter data in R and compute the expected value.
- Actuary example in R from both the insurance company's and the person's perspective.
- The company profits $7.50 per policy sold (on average).
- The person loses $7.50 for a policy (on average).
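One possible way to enter the life insurance example in R is sketched below, using the figures from Example 3. The variable names are illustrative assumptions.

```r
p_survive <- 0.99791
p_die     <- 1 - p_survive          # 0.00209
probs     <- c(p_survive, p_die)

# Company's perspective: keeps the $530 premium if the person survives,
# and is out $250,000 - $530 = $249,470 if the person dies.
company_x  <- c(530, -249470)
company_ev <- sum(company_x * probs)

# Person's perspective: the same amounts with opposite signs.
person_x  <- -company_x
person_ev <- sum(person_x * probs)

round(company_ev, 2)   # 7.5
round(person_ev, 2)    # -7.5
```

The two expected values are equal in size and opposite in sign because every dollar the company gains is a dollar the policyholder pays.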
Binomial Distribution
- Objectives:
- Determine if an experiment is binomial.
- Use R to calculate binomial probabilities with the dbinom(k,n,p) function.
- Compute the mean and standard deviation.
- Construct binomial probability histograms.
- Use the pbinom() and 1-pbinom() functions.
Criteria for a Binomial Probability Experiment
- Fixed number of trials.
- Independent trials.
- Binary outcomes in trials: 0 or 1, success or failure.
- The probability of success is fixed for each trial.
- Examples of identifying binomial experiments (basketball free throws, drawing cards without replacement, etc.).
Binomial Probability Formula
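- For a random variable X that is binomial, the probability of k successes is given by: $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$, which R computes as dbinom(k,n,p).
- Where:
- k is the number of successes (the value you supply)
- n is the number of trials
- p is the probability of success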
- Example: A basketball player shoots 20 free throws with a free throw percentage of 0.75. Calculate P(14), the probability of 14 successes.
- R code to print all values:
# Print each k from 0 to 20 next to P(k), rounded to two decimals
for (i in 0:20) {
  print(c(i, round(dbinom(i, 20, 0.75), 2)))
}
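For the free-throw example, P(14) and the cumulative probabilities named in the objectives can be computed as sketched below. The cross-check against the binomial formula is included only to show that dbinom() and the formula agree.

```r
n <- 20
p <- 0.75

# P(X = 14): exactly 14 made free throws.
p14 <- dbinom(14, n, p)

# The same value from the binomial formula, as a cross-check.
p14_formula <- choose(n, 14) * p^14 * (1 - p)^(n - 14)

# P(X <= 14) and P(X > 14) using the cumulative function.
p_at_most_14   <- pbinom(14, n, p)
p_more_than_14 <- 1 - pbinom(14, n, p)

round(p14, 2)   # 0.17
round(c(p_at_most_14, p_more_than_14), 4)
```

Note that pbinom(k, n, p) includes k itself, so 1 - pbinom(14, n, p) is the probability of 15 or more successes.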
Probability Histogram (Excel)
- Example showcasing the number of free throws made in the binomial experiment.
The rbinom() Command
- Example usage of the rbinom() command:
rbinom(number of simulated experiments, n, p)
- To simulate 50 experiments:
rbinom(50,20,0.75)
Making a Histogram of Your Simulation
- R code for visualizing simulations.
hist(rbinom(10000, 20, 0.75))
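A simulation like this can also be summarized numerically. The sketch below assumes an arbitrary seed (so that the run is repeatable) and compares the simulated mean and standard deviation with the theoretical values n·p = 15 and sqrt(n·p·(1-p)) ≈ 1.94.

```r
set.seed(1)   # arbitrary seed, chosen only to make the run repeatable

# 10,000 simulated experiments of 20 free throws each with p = 0.75.
sims <- rbinom(10000, 20, 0.75)

sim_mean <- mean(sims)   # should be close to 15
sim_sd   <- sd(sims)     # should be close to 1.94

c(sim_mean, sim_sd)
```

The simulated values will not match the theoretical ones exactly, but with 10,000 experiments they should be very close.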
Compute the Mean and Standard Deviation of a Binomial Random Variable
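- Formula:
- Mean: $\mu = np$
- Standard deviation: $\sigma = \sqrt{np(1-p)}$
- Example: for the free-throw shooter in the previous example, $\mu = 20(0.75) = 15$ and $\sigma = \sqrt{20(0.75)(0.25)} \approx 1.94$.
- “We expect the player to make 15 shots on average, with a plus-minus margin of about 1.94”. By the Empirical Rule, there is roughly a 68% chance that the number of shots made is within one standard deviation of the mean, i.e. between about 13 and 17.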
- Example: 35% of households own a pet dog. A sample of 400 households is taken.
What is the mean number of households with dogs from the sample?
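Both examples can be checked in R with the standard binomial formulas, mean n·p and standard deviation sqrt(n·p·(1-p)). The function names binom_mean and binom_sd are hypothetical helpers, not built-in R functions.

```r
# Hypothetical helpers for a binomial random variable.
binom_mean <- function(n, p) n * p
binom_sd   <- function(n, p) sqrt(n * p * (1 - p))

# Free throws: n = 20, p = 0.75.
ft_mean <- binom_mean(20, 0.75)    # 15
ft_sd   <- binom_sd(20, 0.75)      # about 1.94

# Households with dogs: n = 400, p = 0.35.
dog_mean <- binom_mean(400, 0.35)  # 140
dog_sd   <- binom_sd(400, 0.35)    # about 9.54

round(c(ft_mean, ft_sd, dog_mean, dog_sd), 2)
```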
Empirical Rule
- Use the Empirical Rule to identify unusual observations in a binomial experiment.
- The Empirical Rule states that in a bell-shaped distribution about 95% of all observations lie within two standard deviations of the mean.
- 95% of the observations lie between $\mu - 2\sigma$ and $\mu + 2\sigma$.
- Any observation that lies far outside this interval is unusual because the observation occurs less than 5% of the time.
- Example: 35% of all households own a pet dog. A researcher believes this percentage is higher nowadays (since the time of the original survey of 35%). He conducts a simple random sample of 400 households and finds that 162 have a pet dog. Is this result unusual?
The result is unusual: with $\mu = 400(0.35) = 140$ and $\sigma = \sqrt{400(0.35)(0.65)} \approx 9.54$, the upper bound is $\mu + 2\sigma \approx 159.1$, and 162 > 159.1.
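The dog-ownership check can be sketched in R as follows. The exact binomial tail probability at the end is an extra cross-check that goes slightly beyond the Empirical Rule; it is an addition for illustration, not part of the stated method.

```r
n <- 400
p <- 0.35

mu    <- n * p                     # 140
sigma <- sqrt(n * p * (1 - p))     # about 9.54

# Empirical Rule interval: mean plus or minus two standard deviations.
lower <- mu - 2 * sigma
upper <- mu + 2 * sigma
round(c(lower, upper), 1)          # 120.9 159.1

# Is the observed count of 162 outside the interval?
is_unusual <- 162 < lower || 162 > upper
is_unusual                         # TRUE

# Cross-check: exact chance of 162 or more households under p = 0.35.
p_tail <- 1 - pbinom(161, n, p)
p_tail < 0.05                      # TRUE
```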
Poisson Probability Distribution
Let k be the number of successes in an experiment that is usually contained in an interval of time (but sometimes in other ways). We say that k has a Poisson distribution if the following conditions are met.
- The number of successes in each interval ranges from k=0,1,2,… (where the “…” means there is no clear upper limit of successes)
- (Independence) The occurrence of a success does not affect the chance of future successes. (no snowball effect)
- The number of successes in one interval is ‘most likely’ to be k=0 or k=1.
- The average number of successes per interval (lambda) remains constant (does not fluctuate).
Poisson command in R:
Let k be the number of successes in an interval. Then the chance of k successes is $P(k) = \frac{\lambda^k e^{-\lambda}}{k!}$, which R computes as dpois(k, lambda),
where lambda is the average number of successes in one interval (lambda must be given in the problem, or approximated).
Examples:
From the website, Manchester United averages 1.91 goals per game. (1 game = 1 interval of time)
The probability that the team scores k goals in their next game is shown below.
Number of goals per game Probability (rounded to two)
0 0.15
1 0.28
2 0.27
3 0.17
4 0.08
5 0.03
6 0.01
7 0.00
What’s the chance that the team scores 3 or fewer goals in their next game?
What’s the chance that the team scores 3 or more goals in their next game?
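The table and both questions can be worked in R with dpois() and its cumulative counterpart ppois(). This is a sketch that assumes the Poisson model with lambda = 1.91 from the example.

```r
lambda <- 1.91   # average goals per game, from the example

# Reproduce the table: P(k) for k = 0, ..., 7, rounded to two decimals.
goal_probs <- round(dpois(0:7, lambda), 2)
goal_probs       # 0.15 0.28 0.27 0.17 0.08 0.03 0.01 0.00

# P(3 or fewer goals) = P(0) + P(1) + P(2) + P(3).
p_3_or_fewer <- ppois(3, lambda)

# P(3 or more goals) = 1 - P(2 or fewer goals).
p_3_or_more <- 1 - ppois(2, lambda)

round(c(p_3_or_fewer, p_3_or_more), 2)   # 0.87 0.30
```

The two answers add to more than 1 because both events include the outcome of exactly 3 goals.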
The expected value of a Poisson random variable is $E(X) = \lambda$ (here, 1.91 goals per game).
The number of errors on one page of typed writing averages 2. Let X be the number of errors in an 11-page pamphlet. Calculate P(20).
Answer: We interpret each page of the pamphlet as an ‘interval’. We have $\lambda = 2$ for 1 page, and so $\lambda = 2 \cdot 11 = 22$ for 11 pages.
A 1000-page book on World War II is known to contain 300 errors. On page 425 is a picture of Emperor Hirohito. What’s the chance of 2 errors on that page?
Answer: We interpret each page of the book as an ‘interval’. Even though $\lambda$ is not given, we can calculate it: $\lambda = 300/1000 = 0.3$ errors per page.
The average number of homes sold by Ace Realty is 2 homes per day. What’s the chance that exactly 3 homes will be sold tomorrow?
We interpret each day as an ‘interval’. We have $\lambda = 2$ per interval, and so we calculate P(3) with $\lambda = 2$. The mean is $\lambda = 2$ homes per day.
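The three Poisson examples above reduce to one dpois() call each once lambda is identified. A minimal sketch, with illustrative variable names:

```r
# Pamphlet: lambda = 2 errors per page * 11 pages = 22; find P(20).
p_pamphlet <- dpois(20, 2 * 11)

# Book: lambda = 300 errors / 1000 pages = 0.3 per page; find P(2).
p_book <- dpois(2, 300 / 1000)

# Ace Realty: lambda = 2 homes per day; find P(3).
p_homes <- dpois(3, 2)

round(c(p_pamphlet, p_book, p_homes), 4)
```

In each case the key step is matching lambda to the interval the question asks about (11 pages, 1 page, 1 day).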
Hypergeometric Probability Distribution
Requirements:
An experiment is hypergeometric if
- The number of trials is fixed. (Sometimes, people use this distribution when sampling from a binary population; Each consecutive draw from the sample is thought to be a ‘trial’)
- The trials are dependent, i.e. the probability of success changes with each trial.
- Two possible outcomes for each trial; success or failure, 0 or 1, good or bad.
Notation:
- N: total population
- n: sample size
- G: total “good” items
- g: the number of “good” items chosen in the sample
- B: the total “bad” items
- b: the number of “bad” items chosen in the sample
Let the random variable X denote “the number of good items g in a sample of size n”. So P(g) is the probability that we choose g good items in a sample of n.
Hypergeometric command in R:
The probability of obtaining g good items is given by
dhyper(g,G,B,n)
where g = number of “good” items in the sample, G = total “good” items, B = total “bad” items, n = sample size.
Example:
Calculate all of the possible results from the previous problem.
dhyper(0,18,12,4)
[1] 0.0180624
dhyper(1,18,12,4)
[1] 0.1444992
dhyper(2,18,12,4)
[1] 0.3684729
dhyper(3,18,12,4)
[1] 0.3573071
dhyper(4,18,12,4)
[1] 0.1116585
dhyper(5,18,12,4)
[1] 0
To calculate the probability of 1 red apple… Notice that we switch our perspective so that red apples are good.
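The mean and standard deviation of a Hypergeometric random variable
Mean: $\mu = n \cdot \frac{G}{N}$
Variance: $\sigma^2 = n \cdot \frac{G}{N} \cdot \frac{B}{N} \cdot \frac{N-n}{N-1}$
Standard Deviation: $\sigma = \sqrt{\sigma^2}$
(to find the standard deviation you must take the square root of the variance above)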
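As an illustration, the standard hypergeometric mean n·G/N and variance n·(G/N)·(B/N)·(N-n)/(N-1) can be evaluated for the G = 18, B = 12, n = 4 example and cross-checked against the dhyper() probabilities. The variable names are assumptions made for this sketch.

```r
G <- 18          # total good items
B <- 12          # total bad items
n <- 4           # sample size
N <- G + B       # total population

# Formula-based mean, variance, and standard deviation.
hyper_mean <- n * G / N                                    # 2.4
hyper_var  <- n * (G / N) * (B / N) * (N - n) / (N - 1)
hyper_sd   <- sqrt(hyper_var)                              # about 0.93

# Cross-check directly from the probability distribution.
g          <- 0:n
probs      <- dhyper(g, G, B, n)
mean_check <- sum(g * probs)
sd_check   <- sqrt(sum((g - mean_check)^2 * probs))

round(c(hyper_mean, hyper_sd, mean_check, sd_check), 4)
```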
Jury problem:
There are 40 potential jurors, 2 of whom will oppose conviction under any circumstance, regardless of the evidence. On a jury of 12, in a court case with overwhelming evidence, what is the probability of selecting a jury that convicts? Assume we need a unanimous verdict, i.e. all 12 jurors must convict.
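One way to set up the jury problem with dhyper() is sketched below. Treating the 2 jurors who always oppose conviction as the "good" items is a modeling choice made here for illustration; the jury convicts only if none of them is selected.

```r
# G = 2 jurors who always oppose conviction ("good" items in dhyper terms),
# B = 38 jurors who will convict, n = 12 jurors selected.
# The jury convicts only if g = 0 opposing jurors are chosen.
p_convict <- dhyper(0, 2, 38, 12)

# Cross-check by direct counting: choose all 12 from the 38 convicting jurors.
p_check <- choose(38, 12) / choose(40, 12)

round(p_convict, 4)   # 0.4846
```

So even with only 2 holdouts among 40 candidates, the chance of seating a jury that convicts is a little under one half.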
The Geometric Probability Distribution
Criteria for a Geometric Probability Experiment
An experiment may be geometric if
- The number of trials is (unbounded) = 1,2,3,…
- The trials are independent.
- There are only two possible outcomes for each trial.
- The probability of success is the same for each trial.
- The experiment is performed until the 1st success is achieved.
Geometric probability formula
Let X=“the number of failures before the 1st success”. Then, $P(X = f) = (1-p)^f \, p$, which R computes as dgeom(f, p),
where p is the probability of success.
Example: Slot machine
A man will play the slot machine until he wins. The probability of winning is p = 0.10. Calculate the probability that he wins on the 5th attempt (after 4 failures). Calculate the probability that he wins on the 6th try.
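The slot machine probabilities can be computed with dgeom(), whose first argument is the number of failures before the first success. A minimal sketch:

```r
p <- 0.10   # probability of winning on any one play

# Win on the 5th attempt = 4 failures, then a success.
p_win_5th <- dgeom(4, p)    # 0.9^4 * 0.1 = 0.06561

# Win on the 6th attempt = 5 failures, then a success.
p_win_6th <- dgeom(5, p)    # 0.9^5 * 0.1 = 0.059049

c(p_win_5th, p_win_6th)
```

Because dgeom() counts failures rather than attempts, the attempt number minus 1 is what gets passed in.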
For a geometric random variable, the mean (expected value) and standard deviation formulas are given below. ($p$ is the prob. of success, and $q = 1 - p$ is the prob. of failure)
Mean: $E(X) = \frac{q}{p}$
Standard Deviation: $\sigma = \frac{\sqrt{q}}{p}$
Example: Slot machine (continued)
Calculate the average of this distribution, i.e. the average number of failures before the 1st win.
E(X) = 0.90/0.10 = 9
Conclusion: The mean isn’t always where the peak is.
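The conclusion can be seen numerically. The sketch below uses the standard formulas for the number of failures before the first success (mean q/p, standard deviation sqrt(q)/p) and locates the peak of the distribution; truncating the list of values at 50 failures is an arbitrary choice made for illustration.

```r
p <- 0.10
q <- 1 - p

geom_mean <- q / p          # 9 failures on average
geom_sd   <- sqrt(q) / p    # about 9.49

# Probabilities for 0 to 50 failures (truncated for illustration).
k     <- 0:50
probs <- dgeom(k, p)

# The most likely number of failures (the peak of the distribution).
peak <- k[which.max(probs)]
peak                        # 0

round(c(geom_mean, geom_sd), 2)
```

The peak is at 0 failures, while the mean is 9, which is why the mean is not always where the peak is.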
The Negative Binomial Probability Distribution
Criteria for a Negative Binomial Probability Experiment
An experiment may be negative binomial if
- The number of trials is (unbounded) = 1,2,3,…
- The trials are independent.
- There are only two possible outcomes for each trial.
- The probability of success is the same for each trial.
- The experiment is performed until the s-th success is achieved.
Negative binomial probability formula
Let X=“the number of failures before the s-th success”. To calculate P(X=f) we use dnbinom(f,s,p).
For a negative binomial random variable with $p$ the probability of success, $f$ the number of failures, $q = 1 - p$ the probability of failure, and $s$ the number of successes:
Mean: $E(X) = \frac{sq}{p}$
Standard deviation: $\sigma = \frac{\sqrt{sq}}{p}$
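To illustrate dnbinom(f,s,p), here is a hypothetical extension of the slot machine example: the man now plays until his 3rd win. The choice of s = 3 and f = 20 is an assumption made purely for illustration and does not come from the notes. The mean s·q/p and standard deviation sqrt(s·q)/p used below are the standard formulas for the number of failures before the s-th success.

```r
p <- 0.10   # probability of winning on any one play
s <- 3      # hypothetical: play until the 3rd win
f <- 20     # hypothetical: exactly 20 losses before the 3rd win

# P(X = f): f failures before the s-th success.
p_f <- dnbinom(f, s, p)

# Cross-check with the negative binomial formula.
p_f_formula <- choose(f + s - 1, f) * p^s * (1 - p)^f

# Mean and standard deviation of the number of failures.
q       <- 1 - p
nb_mean <- s * q / p          # 27
nb_sd   <- sqrt(s * q) / p    # about 16.43

round(c(nb_mean, nb_sd), 2)
```

With s = 1 these formulas reduce to the geometric case, q/p and sqrt(q)/p.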
Suspicious Events
An event is suspicious if the chance of it occurring, or something more extreme occurring, is less than 5% (unless the event is forced [see next slide]).