Chapter 5 - Discrete Probability Distributions

Random Variables

  • A random variable x represents a numerical value associated with each outcome of a probability experiment.

  • Discrete Random Variable:

    • Has a finite or countable number of possible outcomes that can be listed.
    • Example: x = 0, 2, 4, 6, 8, 10
  • Continuous Random Variable:

    • Has an uncountable number of possible outcomes, represented by intervals on a number line.
    • Example: any value in the interval 0 \leq x \leq 10

Discrete vs. Continuous Random Variables - Examples

  • Distance car travels on a tank of gas:

    • Continuous because distance is measured rather than counted; it can take any value in an interval.
    • Measurements such as distance, time, and weight are typically treated as continuous random variables.
  • Number of students in a statistics class:

    • Discrete because it can be counted.

Discrete Probability Distributions

  • Lists each possible value the random variable can assume, together with its probability.

  • Conditions for a valid probability distribution:

    1. The probability of each value of the discrete random variable is between 0 and 1, inclusive: 0 \leq P(x) \leq 1
    2. The sum of all the probabilities is 1: \sum P(x) = 1

Constructing a Discrete Probability Distribution

  1. Make a frequency distribution for the possible outcomes (x_1, x_2, …, x_n).
  2. Find the sum of the frequencies.
  3. Find the probability of each possible outcome by dividing its frequency by the sum of the frequencies.
  4. Check that each probability is between 0 and 1 and that the sum is 1.

Example: Probability Distribution of Persons per Household

  • Data from The Statistical Abstract of the United States.

  • Calculating probabilities as relative frequencies.

  • The probability of each value of x, the number of persons per household, is computed as the relative frequency.

    • P(1) = 35.2 / 126.1 = 0.279
    • P(2) = 43.5 / 126.1 = 0.345
    • P(3) = 19.5 / 126.1 = 0.155
    • P(4) = 16.2 / 126.1 = 0.128
    • P(5) = 7.3 / 126.1 = 0.058
    • P(6) = 2.8 / 126.1 = 0.022
    • P(\geq 7) = 1.6 / 126.1 = 0.013
    • \text{Total} = 126.1 / 126.1 = 1.000
  • Example: Calculation of P(x \geq 4)

    • P(x \geq 4) = P(4) + P(5) + P(6) + P(\geq 7) = 0.128 + 0.058 + 0.022 + 0.013 = 0.221
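The construction steps can be sketched in Python using the household frequencies from this example. The variable names are illustrative, and the label "7+" for the open-ended last category is an assumption made here for convenience.

```python
# Step 1: frequency distribution from the example (same units as the source).
frequencies = {"1": 35.2, "2": 43.5, "3": 19.5, "4": 16.2,
               "5": 7.3, "6": 2.8, "7+": 1.6}

# Step 2: sum of the frequencies.
total = sum(frequencies.values())

# Step 3: probability of each outcome = frequency / sum of frequencies.
distribution = {x: f / total for x, f in frequencies.items()}

# Step 4: each probability lies in [0, 1] and the probabilities sum to 1.
assert all(0 <= p <= 1 for p in distribution.values())
assert abs(sum(distribution.values()) - 1) < 1e-9

for x, p in distribution.items():
    print(f"P({x}) = {p:.3f}")

# P(x >= 4): add the probabilities of the categories 4, 5, 6 and 7+.
p_at_least_4 = sum(distribution[x] for x in ("4", "5", "6", "7+"))
print(f"P(x >= 4) = {p_at_least_4:.3f}")
```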

Describing the Population/Probability Distribution

  • Population Mean (Expected Value):

    • \mu = E[x] = \sum xP(x)
  • Population Variance:

    • \sigma^2 = \sum (x - \mu)^2 P(x)
  • Population Standard Deviation:

    • \sigma = \sqrt{\sigma^2}

Example: Describing the Population of the Number of Persons per Household

  • Calculating the mean, variance, and standard deviation for the number of persons per household.

  • Mean:

    • \mu = \sum xP(x) = 1(0.279) + 2(0.345) + … + 7(0.013) = 2.46
  • Variance:

    • \sigma^2 = \sum (x - \mu)^2 P(x) = (1-2.46)^2(0.279) + (2-2.46)^2(0.345) + … + (7-2.46)^2(0.013) = 1.931
  • Standard Deviation:

    • \sigma = \sqrt{1.931} = 1.39
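The same calculation as a short Python sketch. It assumes, as the worked example does, that the open-ended category "7 or more" is treated as x = 7. It also uses the unrounded relative frequencies (35.2 / 126.1 and so on), which is what reproduces the variance of 1.931; plugging in the three-decimal probabilities gives roughly 1.934 instead.

```python
import math

# Household frequencies from the earlier example; "7 or more" is treated
# as 7, matching the worked calculation (an approximation).
frequencies = {1: 35.2, 2: 43.5, 3: 19.5, 4: 16.2, 5: 7.3, 6: 2.8, 7: 1.6}
total = sum(frequencies.values())
probs = {x: f / total for x, f in frequencies.items()}

# mu = sum of x * P(x)
mean = sum(x * p for x, p in probs.items())

# sigma^2 = sum of (x - mu)^2 * P(x)
variance = sum((x - mean) ** 2 * p for x, p in probs.items())

# sigma = square root of the variance
std_dev = math.sqrt(variance)

print(f"mean = {mean:.2f}, variance = {variance:.3f}, std dev = {std_dev:.2f}")
```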

Using Expected Value for Decision-Making in Business

  • Investment in two different projects:
    • Project A: 90% chance of a profit of $100,000; 10% chance of a loss of $20,000
    • Project B: 50% chance of a profit of $200,000; 50% chance of a loss of $50,000

Calculating Expected Value for Projects A and B

  • Expected Value for Project A:

    • E[A] = (0.9 \times $100,000) + (0.1 \times (-$20,000)) = $90,000 - $2,000 = $88,000
  • Expected Value for Project B:

    • E[B] = (0.5 \times $200,000) + (0.5 \times (-$50,000)) = $100,000 - $25,000 = $75,000
  • Based on expected value, Project A has a higher expected profit ($88,000) compared to Project B ($75,000).

  • Even though Project B offers a higher possible profit ($200,000), Project A is the better option on an expected-value basis. Project B also carries the larger possible loss ($50,000 vs. $20,000).
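The project comparison can be reproduced with a small helper. This is a minimal sketch; the function name `expected_value` and the (probability, payoff) pair layout are choices made here for illustration.

```python
def expected_value(outcomes):
    """Return the sum of probability * payoff over (probability, payoff) pairs."""
    return sum(p * payoff for p, payoff in outcomes)

# Losses are entered as negative payoffs.
project_a = [(0.9, 100_000), (0.1, -20_000)]
project_b = [(0.5, 200_000), (0.5, -50_000)]

ev_a = expected_value(project_a)
ev_b = expected_value(project_b)

print(f"E[A] = ${ev_a:,.0f}")
print(f"E[B] = ${ev_b:,.0f}")
print("Higher expected profit:", "Project A" if ev_a > ev_b else "Project B")
```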

Understanding Investment Risk Through Variance

  • Investment in two different stock portfolios:
    • Portfolio A: 40% chance of gaining $10,000; 40% chance of gaining $15,000; 20% chance of gaining $20,000
    • Portfolio B: 40% chance of gaining $5,000; 40% chance of gaining $25,000; 20% chance of gaining $35,000

Expected Value for Portfolios A and B

  • Expected Value for Portfolio A:

    • E[A] = 0.4 \times $10,000 + 0.4 \times $15,000 + 0.2 \times $20,000 = $14,000
  • Expected Value for Portfolio B:

    • E[B] = 0.4 \times $5,000 + 0.4 \times $25,000 + 0.2 \times $35,000 = $19,000
  • Variance is the probability-weighted average of the squared differences from the mean.

  • Variance for Portfolio A:

    • Variance(A) = 0.4 \times (10000-14000)^2 + 0.4 \times (15000-14000)^2 + 0.2 \times (20000-14000)^2 = 14,000,000

    • Std Dev = \sqrt{14000000} = $3,741.66

  • Variance for Portfolio B:

    • Variance(B) = 0.4 \times (5000-19000)^2 + 0.4 \times (25000-19000)^2 + 0.2 \times (35000-19000)^2 = 144,000,000

    • Std Dev = \sqrt{144000000} = $12,000.00

Decision Making Based on Variance

  • Portfolio A has a lower variance.
  • Portfolio B has a higher variance.
  • Key Takeaway: Although Portfolio B has a higher expected return ($19,000 vs. $14,000), it comes with higher risk, as indicated by its higher variance.
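As a check on the arithmetic, the sketch below recomputes the expected value, variance, and standard deviation of each portfolio directly from the listed probabilities and gains. The helper name `summarize` is hypothetical.

```python
import math

def summarize(outcomes):
    """Return (mean, variance, std dev) for a list of (probability, gain) pairs."""
    mean = sum(p * x for p, x in outcomes)
    variance = sum(p * (x - mean) ** 2 for p, x in outcomes)
    return mean, variance, math.sqrt(variance)

portfolio_a = [(0.4, 10_000), (0.4, 15_000), (0.2, 20_000)]
portfolio_b = [(0.4, 5_000), (0.4, 25_000), (0.2, 35_000)]

for name, outcomes in (("A", portfolio_a), ("B", portfolio_b)):
    mean, variance, std_dev = summarize(outcomes)
    print(f"Portfolio {name}: mean = ${mean:,.0f}, "
          f"variance = {variance:,.0f}, std dev = ${std_dev:,.2f}")
```

The portfolio with the larger standard deviation has gains spread further from its mean, which is the sense in which it is riskier.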

Mini Workshops

  • Mini Workshop 1:
    • Rolling of a single die.
    • What are the outcomes?
    • What are the probabilities for each outcome?
    • Use Excel to build the discrete probability distribution in both tabular and graphical form.
  • Mini Workshop 2:
    • Rolling of two dice.
    • What are all the possible outcomes? This is the sample space.
    • What are the probabilities for each outcome?
    • Use Excel to build the discrete probability distribution in both tabular and graphical form.
  • Mini Workshop 3:
    • Download the file Households.xlsx
    • The data are 1000 random households with the number of people living in each home.
    • What are the outcomes?
    • What are the frequencies for each outcome?
    • What are the probabilities for each outcome?
    • Use Excel to build the discrete probability distribution in both tabular and graphical form.
    • What is the mean number of people living in a household? This is the expected value of the discrete probability distribution; it can also be thought of as a long-run average or an anticipated value.
    • Develop a column with cumulative probabilities, P(X \leq x).
    • What are; P(x

Binomial Distribution

  • The binomial distribution is the result of a binomial experiment, which has the following properties:

    1. The experiment consists of a sequence of n smaller experiments called trials, where n is fixed in advance of the experiment.
    2. Each trial has two possible outcomes, which we generically denote by success (S) and failure (F).
    3. The trials are independent, so that the outcome of any trial does not influence the outcome of any other trial.
    4. The probability of success P(S) is constant from trial to trial; we denote this probability by p.

Notation for Binomial Experiments

  • n = The number of times a trial is repeated.
  • p = P(S) = The probability of success in a single trial.
  • q = P(F) = The probability of failure in a single trial. (q = 1 – p)
  • x = The random variable that counts the number of successes in n trials: x = 0, 1, 2, 3, … , n.

Examples of Binomial Experiments

  1. Flip a coin 10 times.

    • n = 10
    • Two outcomes: heads (success) and tails (failure).
    • If the coin is fair: P(heads) = P(tails) = 0.5
    • Each coin toss is independent.
  2. Draw five cards out of a shuffled deck with replacement.

    • n = 5
    • We label success any suit of cards we seek, such as clubs.
    • If success is a club, then P(club) = 13 / 52
    • Each draw is independent only if we replace the drawn card in the deck and reshuffle each time.

Binomial Probability Distribution Using Excel

  • Even for a relatively small value of n, the computation of binomial probabilities can be tedious.
  • Excel allows you to calculate the Binomial distribution directly:
  • BINOM.DIST(x, n, p, cumulative)
    • x = number of successes in the n trials
    • n = number of trials
    • p = probability of success
    • Cumulative = TRUE if you want the cumulative probability P(X \leq x)
    • Cumulative = FALSE if you want the probability of exactly x successes, P(X = x)

Example using Excel

  • I toss a weighted coin 5 times. The probability of heads is 60%.

    1. The probability that I will toss no heads: BINOM.DIST(0,5,0.6,FALSE) = 0.01024
    2. The probability that I will toss exactly 1 head: BINOM.DIST(1,5,0.6,FALSE) = 0.0768
    3. The probability that I will toss exactly 2 heads: BINOM.DIST(2,5,0.6,FALSE) = 0.2304
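For readers without Excel, both forms can be mirrored in Python with the standard binomial formula P(X = x) = C(n, x) p^x q^(n-x). The function name `binom_dist` below is a hypothetical stand-in chosen to echo BINOM.DIST; it is not part of any library.

```python
from math import comb

def binom_dist(x, n, p, cumulative):
    """Binomial probability for x successes in n trials.

    cumulative=False returns P(X = x); cumulative=True returns P(X <= x).
    """
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)

    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

# Weighted coin: 5 tosses, P(heads) = 0.6.
for heads in (0, 1, 2):
    print(f"P({heads} heads) = {binom_dist(heads, 5, 0.6, False):.5f}")
```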

Example: Pat Statsdude and the Statistics Quiz

  • Pat Statsdude is a (not good) student taking a statistics course. Pat’s exam strategy is to rely on luck for the next quiz.

  • The quiz consists of 10 multiple-choice questions.

  • Each question has five possible answers, only one of which is correct. Pat plans to guess the answer to each question.

    • What is the probability that Pat gets no answers correct?
      • This is a binomial experiment because:
        • n = 10
        • Two outcomes: correct and incorrect answer.
        • Probability of correct answer: p = 1/5 = 0.2.
        • Answers to questions are independent.
      • We can apply the binomial probability distribution to answer both questions:
        • x = 0: BINOM.DIST(0,10,0.2,FALSE) = 0.1074
    • What is the probability that Pat gets exactly two answers correct?
      • x = 2: BINOM.DIST(2,10,0.2,FALSE) = 0.3020
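As a cross-check outside Excel, the sketch below tabulates Pat's entire score distribution with the binomial formula C(n, x) p^x q^(n-x) and confirms that the probabilities sum to 1. The helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 10, 0.2  # 10 questions, 1-in-5 chance of guessing each one correctly

# Probability of every possible score from 0 to 10 correct answers.
scores = {x: binomial_pmf(x, n, p) for x in range(n + 1)}

for x, prob in scores.items():
    print(f"P({x} correct) = {prob:.4f}")

# A valid distribution: the probabilities over all possible scores sum to 1.
assert abs(sum(scores.values()) - 1) < 1e-12
```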

Cumulative Probability

  • The probability that a random variable is less than or equal to a value x is called a cumulative probability, and it is represented as P(X \leq x).
  • In the case of a discrete probability distribution, such as the binomial distribution, we can write:
    • P(X \leq x) = \sum_{i=0}^{x} P(X = i)

Using Excel - Example

  • Assume that the probability of a car stopping at an intersection is 30%. A researcher studies 15 cars approaching an intersection.
    • Using Excel (BINOM.DIST(x, n, p, cumulative)),
      • What is the probability that:
        • Exactly 5 cars stop?
          • Answer: BINOM.DIST(5, 15, 0.3, False)
        • More than 5 cars stop?
          • Answer: 1 - BINOM.DIST(5, 15, 0.3, True)
        • At least 5 cars stop?
          • Answer: 1 - BINOM.DIST(4, 15, 0.3, True)
        • Fewer than 5 cars stop?
          • Answer: BINOM.DIST(4, 15, 0.3, True)
        • At most 5 cars stop?
          • Answer: BINOM.DIST(5, 15, 0.3, True)
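The five answers all follow from the cumulative probability P(X \leq x). The sketch below evaluates them in Python; `binom_cdf` is a hypothetical helper that sums the binomial formula C(n, i) p^i q^(n-i) from i = 0 to x, and "exactly 5" is written as a difference of two cumulative values to show the connection.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x): the sum of P(X = i) for i = 0, 1, ..., x."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(x + 1))

n, p = 15, 0.3  # 15 cars, each stopping with probability 0.3

exactly_5 = binom_cdf(5, n, p) - binom_cdf(4, n, p)  # P(X = 5)
more_than_5 = 1 - binom_cdf(5, n, p)                 # P(X > 5)
at_least_5 = 1 - binom_cdf(4, n, p)                  # P(X >= 5)
fewer_than_5 = binom_cdf(4, n, p)                    # P(X < 5) = P(X <= 4)
at_most_5 = binom_cdf(5, n, p)                       # P(X <= 5)

results = [("exactly 5", exactly_5), ("more than 5", more_than_5),
           ("at least 5", at_least_5), ("fewer than 5", fewer_than_5),
           ("at most 5", at_most_5)]
for label, value in results:
    print(f"P({label} cars stop) = {value:.4f}")
```

Because X takes only whole-number values, "more than 5" is the complement of "at most 5", while "at least 5" is the complement of "at most 4".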

Mean, Variance and Standard Deviation Population Parameters of a Binomial Distribution

  • Mean:

    • \mu = np
  • Variance:

    • \sigma^2 = npq
  • Standard deviation:

    • \sigma = \sqrt{npq}
  • Example: One out of 5 students at a local college say that they skip breakfast in the morning. Find the mean, variance and standard deviation if 10 students are randomly selected.
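A minimal sketch of the breakfast example, applying the three formulas with n = 10 and p = 1/5. The variable names are illustrative.

```python
import math

n = 10      # number of students selected
p = 1 / 5   # probability that a student skips breakfast
q = 1 - p   # probability that a student does not skip breakfast

mean = n * p                    # mu = np
variance = n * p * q            # sigma^2 = npq
std_dev = math.sqrt(variance)   # sigma = sqrt(npq)

print(f"mean = {mean:.1f}, variance = {variance:.1f}, std dev = {std_dev:.3f}")
```

On average, about 2 of the 10 selected students would be expected to skip breakfast.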

Mini Workshop 5

  • According to recommendations from a large investment firm and a national survey, 25% of retirees have too much of their savings portfolio invested in equities/stocks.
  • In a random sample of 15 near-retirees, what is the probability that…
    • 0 had too much in stocks?
    • Exactly one had too much in stocks?
    • 3 or fewer?
    • More than 3?
    • 4 \leq x < 10