Chapter 7: Random Variables and Probability Distributions

Section 7.1 Random Variables

  • Random Variable (RV) Definition: A numerical variable whose value depends on the outcome of a chance experiment; it assigns a numerical value to each outcome of that experiment.

    • Example: Rolling a fair die.

      • Outcomes: {1, 2, 3, 4, 5, 6} (each with probability 1/6).

      • Let X = the number showing on a roll of the die.

      • P(X=1) = 1/6, P(X=2) = 1/6, …, P(X=6) = 1/6.

  • Discrete Random Variable Definition: A random variable is discrete if its set of possible values is a collection of distinct, countable values.

  • Continuous Random Variable Definition: A random variable is continuous if its set of possible values includes an entire interval (i.e., it can take on any value within a range).

  • Example: Book Sales

    • Experiment: Three successive customers purchase a single book (Print (P) or Digital (D)).

    • Random Variable: X = number of customers purchasing a digital book.

    • Possible Outcomes and Corresponding X Values:

      • PPP: X=0

      • DPP, PDP, PPD: X=1

      • DDP, DPD, PDD: X=2

      • DDD: X=3

    • Conclusion: There are only four possible X values (0, 1, 2, 3), making X a discrete random variable.
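
The outcome-to-value mapping above can be checked by brute-force enumeration; a minimal Python sketch (the variable names are ours, not from the text):

```python
from itertools import product

# Enumerate all 2^3 outcomes for three customers choosing Print (P) or Digital (D),
# and record X = number of digital purchases for each outcome.
outcomes = ["".join(seq) for seq in product("PD", repeat=3)]
x_values = {o: o.count("D") for o in outcomes}

# X takes only the values 0, 1, 2, 3 -> a discrete random variable.
possible_x = sorted(set(x_values.values()))
print(possible_x)  # [0, 1, 2, 3]
```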

Section 7.2 Probability Distributions for Discrete Random Variables

  • Probability Distribution Definition: Specifies the probability for each possible value of a random variable.

  • Properties of a Probability Distribution (Axioms):

    1. Axiom #1: The probability of any specific value must be between 0 and 1, inclusive: 0 \leq P(X=x_i) \leq 1 for every possible x_i.

    2. Axiom #2: The sum of all possible probabilities must equal 1: \sum P(X=x_i) = 1.

  • Example: Energy Efficient Cars

    • Experiment: Four randomly selected customers purchase a car, choosing either Electric (E) or Gas (G).

    • Assumptions: Choices are independent, P(E) = 0.4, P(G) = 0.6.

    • Random Variable: X = the number of customers who buy an electric car (among the four).

    • Individual Outcome Probabilities (due to independence):

      • P(E) = 0.4

      • P(G) = 0.6

      • Example: P(EGGE) = P(E) \cdot P(G) \cdot P(G) \cdot P(E) = (0.4)(0.6)(0.6)(0.4) = 0.0576

    • Table of Outcomes, Probabilities, and X Values (Partial list for demonstration):

      Outcome   Probability   X Value
      GGGG      0.1296        0
      EGGG      0.0864        1
      GGGE      0.0864        1
      EGGE      0.0576        2
      EEEE      0.0256        4

    • Constructing the Probability Distribution for X:

      • P(X=0) = P(GGGG) = 0.1296

      • P(X=1) = P(\text{EGGG or GEGG or GGEG or GGGE}) = 4 \times 0.0864 = 0.3456

      • P(X=2) = P(\text{EEGG or EGEG or EGGE or GEEG or GEGE or GGEE}) = 6 \times 0.0576 = 0.3456

      • P(X=3) = P(\text{GEEE or EGEE or EEGE or EEEG}) = 4 \times 0.0384 = 0.1536

      • P(X=4) = P(EEEE) = 0.0256

    • Probability Distribution Table for X:

      X Value   p(X)
      0         0.1296
      1         0.3456
      2         0.3456
      3         0.1536
      4         0.0256

    • Calculating the Probability of an Event: P(X \geq 2) = P(X=2) + P(X=3) + P(X=4) = 0.3456 + 0.1536 + 0.0256 = 0.5248
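
The whole distribution can be rebuilt by enumerating all 16 outcomes and summing their probabilities, a sketch assuming independent choices with P(E) = 0.4:

```python
from itertools import product

p_e, p_g = 0.4, 0.6  # P(Electric), P(Gas)

# Build P(X = x) by summing the probabilities of all outcomes with x electric buyers.
dist = {x: 0.0 for x in range(5)}
for seq in product("EG", repeat=4):
    prob = 1.0
    for c in seq:
        prob *= p_e if c == "E" else p_g
    dist[seq.count("E")] += prob

print({x: round(p, 4) for x, p in dist.items()})
p_at_least_2 = dist[2] + dist[3] + dist[4]
print(round(p_at_least_2, 4))  # 0.5248
```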

  • Probability Histogram: A graphical representation of a discrete probability distribution.

    • It features a rectangle centered above each possible value of X.

    • The area of each rectangle is proportional to the probability of the corresponding value.

Section 7.3 Probability Distributions for Continuous Random Variables

  • From Relative Frequency Histograms to Probability Density Curves:

    • For a sample, a relative frequency histogram shows the distribution of values.

    • As the sample size increases and class intervals narrow, the histogram becomes smoother.

    • For an entire population, the distribution of a continuous random variable can be approximated by a smooth curve, called a probability density curve.

  • Probability Density Curves:

    • The function defining this curve is denoted by f(x), known as the density function.

    • Properties of Continuous Probability Distributions:

      1. f(x) \geq 0: The curve cannot fall below the horizontal axis.

      2. The total area under the density curve is equal to 1: \text{Area} = 1.

  • Probabilities for Continuous Variables:

    • For a continuous random variable, the probability of X taking on a single specific value is 0: P(X=a) = 0 for any number a.

      • This is because a single point has no area under the curve.

    • The values on the y-axis (density) are not probabilities themselves but represent the concentration of probability over a region.

  • Probabilities as Areas: Probabilities for continuous random variables are represented by the area under the density curve over a specified interval.

    • P(a < X < b): The event that X assumes a value between a and b. (Area under the curve from a to b)

    • P(X < a): The event that X assumes a value less than a. (Area to the left of a)

    • P(X > b): The event that X assumes a value greater than b. (Area to the right of b)

  • Example: Application Processing Times

    • Random Variable: X = amount of time (in minutes) to process a credit card application.

    • Density Function: (A rectangular distribution is implied in the example, where the density is constant over an interval).

    • Question: What is the probability that an application is processed between 4.5 and 5.5 minutes? P(4.5 < X < 5.5).

      • Assuming a constant height (density) of 0.5 between 4.5 and 5.5 minutes (from the figure).

      • P(4.5 < X < 5.5) = \text{Area} = (\text{Base}) \times (\text{Height}) = (5.5 - 4.5)(0.5) = (1)(0.5) = 0.5.

    • Question: Find the probability it takes longer than 5.5 minutes to process an application. P(X > 5.5).

      • (If the distribution is, for instance, uniform from 3 to 7, with height 0.25):

      • P(X > 5.5) = (7 - 5.5)(0.25) = (1.5)(0.25) = 0.375
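
The rectangle-area computations can be wrapped in a small helper. This sketch assumes the uniform-on-[3, 7] model stated in the second question, so the height is 0.25 everywhere; under that model P(4.5 < X < 5.5) = 0.25, which differs from the first question's answer of 0.5 because the first question's figure implies a taller (height 0.5) density over that interval:

```python
# Probabilities for a flat (rectangular/uniform) density as base x height.
def uniform_prob(a, b, low=3.0, high=7.0):
    """Area under the flat density between a and b (clipped to [low, high])."""
    height = 1.0 / (high - low)  # total area must equal 1
    a, b = max(a, low), min(b, high)
    return max(0.0, (b - a) * height)

print(uniform_prob(4.5, 5.5))  # 0.25 under the [3, 7] uniform model
print(uniform_prob(5.5, 7.0))  # P(X > 5.5) = (1.5)(0.25) = 0.375
```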

  • Cumulative Areas:

    • A cumulative area is the entire area under the density curve to the left of a specific value.

    • P(X < b) represents the cumulative area to the left of b.

    • Calculating Probabilities using Cumulative Areas:

      • P(a < X < b) = P(X < b) - P(X < a).

      • For continuous variables, including or excluding endpoints does not change the probability: P(a \leq X \leq b) = P(a < X \leq b) = P(a \leq X < b) = P(a < X < b).

Section 7.4 Mean and Standard Deviation of a Random Variable

Mean Value of a Discrete Random Variable
  • Definition: The mean value (or expected value), denoted by \mu_x or E(X), of a discrete random variable X is calculated by:

    • Multiplying each possible X value by its probability.

    • Summing these resulting quantities.

    • Formula: \mu_x = \sum x_i \cdot P(X=x_i)

  • Example: Exam Attempts

    • Random Variable: X = number of times a student took an online exam.

    • Probability Distribution:

      X     p(X)
      1     0.10
      2     0.20
      3     0.30
      4     0.40

    • Calculation of Mean:

      • \mu_x = (1)(0.10) + (2)(0.20) + (3)(0.30) + (4)(0.40)

      • \mu_x = 0.10 + 0.40 + 0.90 + 1.60 = 3.0

      • On average, a student takes the exam 3 times.
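
The mean calculation above can be written directly:

```python
# Expected value of a discrete RV: sum of (value x probability).
# Uses the exam-attempts distribution from the example.
values = [1, 2, 3, 4]
probs = [0.10, 0.20, 0.30, 0.40]

mu = sum(x * p for x, p in zip(values, probs))
print(mu)  # 3.0
```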

Variance and Standard Deviation of a Discrete Random Variable
  • Variance Definition: The variance of a discrete random variable X, denoted by \sigma_x^2, is calculated by:

    • Formula: \sigma_x^2 = \sum (x_i - \mu_x)^2 \cdot P(X=x_i), or equivalently \sigma_x^2 = \left[\sum x_i^2 P(X=x_i)\right] - \mu_x^2

  • Standard Deviation Definition: The standard deviation of a discrete random variable X, denoted by \sigma_x, is the square root of the variance.

    • Formula: \sigma_x = \sqrt{\sigma_x^2}

  • Example: Glass Panels

    • Random Variable: X = number of flaws in a glass panel from Supplier X.

    • Distribution for Supplier X:

      X     p(X)
      0     0.4
      1     0.3
      2     0.2
      3     0.1
      4     0.0

    • Random Variable: Y = number of flaws in a glass panel from Supplier Y.

    • Distribution for Supplier Y:

      Y     p(Y)
      0     0.2
      1     0.6
      2     0.2
      3     0.0
      4     0.0

    • (The example implies calculating the mean and standard deviation of each distribution for comparison, though the text does not show the calculation explicitly. Figure 7.12 shows that the two distributions differ in spread (variability) even though their means are the same.)
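
The implied comparison can be sketched with the formulas from the definitions above (the resulting numbers are our own computation, not from the text):

```python
from math import sqrt

def mean_sd(values, probs):
    """Mean and standard deviation of a discrete distribution."""
    mu = sum(x * p for x, p in zip(values, probs))
    var = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return mu, sqrt(var)

vals = [0, 1, 2, 3, 4]
mu_x, sd_x = mean_sd(vals, [0.4, 0.3, 0.2, 0.1, 0.0])  # Supplier X
mu_y, sd_y = mean_sd(vals, [0.2, 0.6, 0.2, 0.0, 0.0])  # Supplier Y

# Same mean, different spread: Supplier Y is less variable.
print(mu_x, round(sd_x, 3))
print(mu_y, round(sd_y, 3))
```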

Mean and Standard Deviation When X Is Continuous
  • For continuous probability distributions, \mu_x and \sigma_x are defined and calculated using calculus methods (not covered in this text).

  • Interpretation for Continuous RVs: The interpretation of \mu_x and \sigma_x remains the same as for discrete random variables.

    • \mu_x (mean value): Locates the center of the continuous distribution; represents the approximate long-run average of many observed X values.

    • \sigma_x (standard deviation): Measures the extent to which the continuous distribution (density curve) spreads out around \mu_x; provides information about the variability expected in a long sequence of observed X values.

Mean and Variance of Linear Functions and Linear Combinations

Linear Functions
  • Definition: If X is a random variable with mean \mu_X and variance \sigma_X^2, and a and b are numbers, the random variable Y defined by Y = a + bX is called a linear function of X.

  • Mean of a Linear Function:

    • \mu_Y = \mu_{a+bX} = a + b\mu_X

  • Variance of a Linear Function:

    • \sigma_Y^2 = \sigma_{a+bX}^2 = b^2\sigma_X^2

  • Standard Deviation of a Linear Function:

    • \sigma_Y = |b|\sigma_X (since standard deviation is always non-negative).

  • Example: Propane Tank Pricing Models

    • Random Variable: X = number of gallons required to fill a customer's propane tank.

    • Given: \mu_X = 318 gallons, \sigma_X = 42 gallons.

    • Pricing Models:

      • Model 1: Y_1 = 3X ($3 per gallon)

      • Model 2: Y_2 = 50 + 2.8X ($50 service charge + $2.80 per gallon)

    • Calculations for Mean Billing Amount:

      • Model 1: \mu_{Y_1} = 3\mu_X = 3(318) = $954

      • Model 2: \mu_{Y_2} = 50 + 2.8\mu_X = 50 + 2.8(318) = 50 + 890.4 = $940.40

    • Interpretation: The mean billing amount for Model 1 is slightly higher ($954 vs. $940.40). The text implies Model 2 will have slightly more consistent billing amounts (lower variability), which would be evident from calculating the standard deviations (not shown directly in the transcript for this example).
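
The standard deviations the text alludes to follow from \sigma_{a+bX} = |b|\sigma_X; a quick check:

```python
# Mean and SD of the billing amount under the two linear pricing models.
mu_x, sd_x = 318.0, 42.0  # gallons required

# Model 1: Y1 = 3X          -> mean = 3*mu, SD = |3|*sigma
# Model 2: Y2 = 50 + 2.8X   -> mean = 50 + 2.8*mu, SD = |2.8|*sigma
mu_y1, sd_y1 = 3 * mu_x, abs(3) * sd_x
mu_y2, sd_y2 = 50 + 2.8 * mu_x, abs(2.8) * sd_x

print(mu_y1, sd_y1)                       # Model 1: higher mean, higher SD
print(round(mu_y2, 2), round(sd_y2, 2))   # Model 2: lower mean, lower SD
```

The additive constant 50 shifts the mean but does not affect the spread, which is why Model 2's billing is less variable.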

Linear Combinations
  • Definition: If X_1, X_2, \dots, X_n are random variables and a_1, a_2, \dots, a_n are numbers, the random variable Y defined as Y = a_1X_1 + a_2X_2 + \dots + a_nX_n is a linear combination of random variables.

  • Mean of a Linear Combination (Always True):

    • \mu_Y = a_1\mu_1 + a_2\mu_2 + \dots + a_n\mu_n

    • This result holds true regardless of whether the X_i's are independent.

  • Variance of a Linear Combination (When X_1, \dots, X_n are INDEPENDENT):

    • \sigma_Y^2 = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \dots + a_n^2\sigma_n^2

    • This formula is only appropriate when the X_i's are independent.

  • Example: Baggage Weights

    • Random Variable: X = weight (in pounds) of baggage checked by a randomly selected passenger.

    • Given: \mu_X = 45

    • Standard Deviation: \sigma_X = 16

    • Scenario: A flight with 10 passengers, all traveling alone.

    • Linear Combination: Total weight of checked baggage, Y = X_1 + X_2 + \dots + X_{10} (where X_i is the baggage weight for passenger i).

    • Mean of Y:

      • \mu_Y = \mu_{X_1} + \mu_{X_2} + \dots + \mu_{X_{10}}

      • Since each \mu_{X_i} = 45, \mu_Y = 10 \times 45 = 450

    • Assumption: Passengers traveling alone means baggage weights are independent.

    • Variance of Y:

      • \sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \dots + \sigma_{X_{10}}^2

      • Since each \sigma_{X_i} = 16, each \sigma_{X_i}^2 = 16^2 = 256.

      • \sigma_Y^2 = 10 \times 256 = 2560

    • Standard Deviation of Y:

      • \sigma_Y = \sqrt{2560} \approx 50.596
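
The mean/variance bookkeeping for the total baggage weight can be condensed:

```python
from math import sqrt

# Total baggage weight for n independent passengers:
# means add; variances (not standard deviations) add under independence.
n, mu_x, sd_x = 10, 45.0, 16.0

mu_total = n * mu_x          # 10 * 45 = 450
var_total = n * sd_x ** 2    # 10 * 256 = 2560
sd_total = sqrt(var_total)   # ~50.596

print(mu_total, var_total, round(sd_total, 3))
```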

Distinction between X1 + X2 and 2X (One Last Note)
  • Though the means are the same, the variances and standard deviations are different.

  • Example (using a baggage weight distribution with \mu_X = 42 and \sigma_X = 16):

    • Random Variable: X = baggage weight for one passenger.

    • Mean of X: \mu_X = 42

    • Standard Deviation of X: \sigma_X = 16

    • Case 1: Linear Function 2X (doubling the weight of a single randomly selected passenger)

      • Mean: \mu_{2X} = 2\mu_X = 2(42) = 84

      • Variance: \sigma_{2X}^2 = 2^2\sigma_X^2 = 4(16^2) = 4(256) = 1024

      • Standard Deviation: \sigma_{2X} = \sqrt{1024} = 32

    • Case 2: Linear Combination X1 + X2 (sum of weights for two independent randomly selected passengers)

      • Mean: \mu_{X_1+X_2} = \mu_{X_1} + \mu_{X_2} = 42 + 42 = 84

      • Variance: \sigma_{X_1+X_2}^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 = 16^2 + 16^2 = 256 + 256 = 512

      • Standard Deviation: \sigma_{X_1+X_2} = \sqrt{512} \approx 22.627

  • Conclusion: Observations of 2X will show more variability (\sigma = 32) than observations of X_1 + X_2 (\sigma \approx 22.6), which may seem counterintuitive but is correct.
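
Both cases reduce to the linear-function and linear-combination rules above:

```python
from math import sqrt

mu_x, sd_x = 42.0, 16.0

# Case 1: Y = 2X (double one passenger's weight). SD scales by |b| = 2.
mu_2x, sd_2x = 2 * mu_x, 2 * sd_x                  # 84.0, 32.0

# Case 2: Y = X1 + X2 (two independent passengers). Variances add.
mu_sum = mu_x + mu_x                               # 84.0
sd_sum = sqrt(sd_x ** 2 + sd_x ** 2)               # sqrt(512) ~ 22.627

print(mu_2x, sd_2x)
print(mu_sum, round(sd_sum, 3))
```

Same mean, different spread: doubling one draw doubles its deviation from the mean, while two independent draws partially offset each other.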

Section 7.5 Binomial and Geometric Distributions

Binomial Distributions
  • Properties of a Binomial Experiment:

    1. Fixed Number of Trials (n): The experiment consists of a predetermined count of identical trials.

    2. Two Possible Outcomes: Each trial can only result in one of two outcomes: Success (S) or Failure (F).

    3. Independence: The outcomes of different trials are independent of each other.

    4. Constant Probability of Success (p): The probability that a trial results in a success is the same for every trial.

  • Binomial Random Variable: X = number of successes observed when a binomial experiment of n trials is performed.

  • Binomial Probability Distribution: The probability distribution of X, usually denoted P(X=x).

    • Formula for Binomial Probability: P(X=x) = {}_nC_x \cdot p^x \cdot (1-p)^{n-x}, where {}_nC_x = \frac{n!}{x!(n-x)!} counts the ways to arrange x successes among n trials.

    • Note: This formula can be tedious; statistical software, graphing calculators, or Appendix Table 9 can be used.

  • Example: Computer Sales

    • Scenario: 60\% of computers sold are laptops (p=0.60), 40\% are desktops (1-p=0.40). Observe the next 12 customers (n=12).

    • Random Variable: X = number of laptops among these 12 customers.

    • Verification as Binomial: Fixed n=12, two outcomes (laptop/desktop), independence (assumed), constant p=0.60.

    • Calculating Probabilities:

      • P(X=4) = {}_{12}C_4 \cdot (0.60)^4 \cdot (0.40)^8

      • P(4 \leq X \leq 7) = P(X=4) + P(X=5) + P(X=6) + P(X=7).

      • P(4 < X < 7) = P(X=5) + P(X=6).

  • Mean and Standard Deviation of a Binomial Random Variable:

    • Mean: \mu_x = np

    • Standard Deviation: \sigma_x = \sqrt{np(1-p)}

  • Example: Budgets and Tracking Spending

    • Scenario: 40\% of U.S. adults budget and track spending (p=0.4). Random sample of n=25 adults.

    • Random Variable: X = number in the sample who budget and track spending (Success).

    • Mean (\mu_x): \mu_x = np = 25(0.40) = 10.0

    • Standard Deviation (\sigma_x): \sigma_x = \sqrt{np(1-p)} = \sqrt{25(0.40)(0.60)} = \sqrt{6} \approx 2.449
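
The binomial formula and summary measures can be sketched with Python's math.comb (the numeric results for the computer-sales probabilities are our own computation, not from the text):

```python
from math import comb, sqrt

def binom_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Computer sales: n = 12 customers, p = 0.60 buy laptops.
p4 = binom_pmf(4, 12, 0.60)
p4to7 = sum(binom_pmf(x, 12, 0.60) for x in range(4, 8))  # P(4 <= X <= 7)
print(round(p4, 4), round(p4to7, 4))

# Budgeting example: n = 25, p = 0.40.
mu = 25 * 0.40               # 10.0
sd = sqrt(25 * 0.40 * 0.60)  # sqrt(6) ~ 2.449
print(mu, round(sd, 3))
```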

Geometric Distributions
  • Properties of a Geometric Experiment:

    1. Independence: The trials are independent.

    2. Two Possible Outcomes: Each trial can result in Success or Failure.

    3. Constant Probability of Success (p): The probability of success is the same for all trials.

    4. No Fixed Number of Trials: The experiment continues until the first success is observed.

  • Geometric Random Variable: X = number of trials until the first success is observed (including the success trial).

  • Geometric Probability Distribution: P(X=x) = (1-p)^{x-1}p

  • Example: Jumper Cables

    • Scenario: 40\% of students (p=0.4) carry jumper cables. Stopping students until one is found with cables.

    • Random Variable: X = number of students stopped until the first one with jumper cables is found.

    • Calculating Probabilities:

      • P(X=1) = (1-0.4)^{1-1}(0.4) = (0.6)^0(0.4) = 0.4

      • P(X=2) = (1-0.4)^{2-1}(0.4) = (0.6)^1(0.4) = 0.24

      • P(X=3) = (1-0.4)^{3-1}(0.4) = (0.6)^2(0.4) = 0.144

      • P(X \leq 3) = P(X=1) + P(X=2) + P(X=3) = 0.4 + 0.24 + 0.144 = 0.784
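
The geometric pmf is a one-liner, and the jumper-cables numbers follow directly:

```python
def geom_pmf(x, p):
    """P(first success on trial x) = (1-p)^(x-1) * p."""
    return (1 - p) ** (x - 1) * p

p = 0.4  # probability a stopped student carries jumper cables
probs = [geom_pmf(x, p) for x in (1, 2, 3)]
print([round(q, 3) for q in probs])  # [0.4, 0.24, 0.144]
print(round(sum(probs), 3))          # P(X <= 3) = 0.784
```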

Section 7.6 Normal Distributions

Normal Distributions (General Properties)
  • Characteristics: Continuous probability distributions that are bell-shaped and symmetric.

  • Distinguished by Parameters: Identified by their mean (\mu) and standard deviation (\sigma).

  • Other Names: Sometimes called normal curves.

Standard Normal Distribution
  • Definition: The normal distribution with a mean of \mu = 0 and a standard deviation of \sigma = 1.

  • Standard Normal Curve (or z-curve): The corresponding density curve for the standard normal distribution.

  • Z-variable: The letter z represents a random variable whose distribution is the standard normal curve.

  • Required Skills for Normal Distributions:

    1. Calculate Probabilities: Find areas under a normal curve over given intervals.

    2. Characterize Extreme Values: Determine values corresponding to the largest \%, smallest \%, etc.

  • Using the Table of Standard Normal Curve Areas (Appendix Table 2):

    • The table provides the cumulative area to the left of a given z-value: P(z < z^*) or P(z \leq z^*).

    • To use: Locate the row for the sign and first decimal of z^*, and the column for the second decimal of z^*.

  • Examples of Finding Standard Normal Curve Areas:

    • P(z < -1.76) = 0.0392 (from table, -1.7 row, .06 column). This means ~3.9\% of z values are smaller than -1.76.

    • P(z \leq 0.58) = 0.7190 (0.5 row, .08 column).

    • P(z < -4.12) \approx 0 (values far in the tail, beyond table limits, have probabilities close to 0).

    • P(z < 4.18) \approx 1 (values far in the upper tail, beyond table limits, have probabilities close to 1).

    • Area between two z-values: P(a < z < b) = P(z < b) - P(z < a).

      • P(-1.76 < z < 0.58) = P(z < 0.58) - P(z < -1.76) = 0.7190 - 0.0392 = 0.6798.

      • P(-2.00 < z < 2.00) = P(z < 2.00) - P(z < -2.00) = 0.9772 - 0.0228 = 0.9544 \approx 0.95.

        • Connection to Empirical Rule: Approximately 95\% of values are within 2 standard deviations of the mean for a normal distribution.

    • Area to the right of a z-value: P(z > z^*) = 1 - P(z < z^*).

      • P(z > 1.96) = 1 - P(z < 1.96) = 1 - 0.9750 = 0.0250. (This means 2.5\% of the area is to the right of 1.96).

      • P(z > -1.28) = 1 - P(z < -1.28) = 1 - 0.1003 = 0.8997 \approx 0.90.
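
The table lookups above can be mirrored with Python's statistics.NormalDist, whose cdf returns the same cumulative (left-tail) areas to table precision:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mu = 0, sigma = 1

print(round(z.cdf(-1.76), 4))                # ~0.0392
print(round(z.cdf(0.58), 4))                 # ~0.7190
print(round(z.cdf(0.58) - z.cdf(-1.76), 4))  # ~0.6798
print(round(z.cdf(2.0) - z.cdf(-2.0), 4))    # ~0.9545 (table rounding gives 0.9544)
print(round(1 - z.cdf(1.96), 4))             # ~0.0250
```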

Identifying Extreme Values
  • Goal: Find the z-value (z^*) that corresponds to a given probability (e.g., the value that separates the smallest 2\%).

  • Method: Look up the desired cumulative area in the body of Appendix Table 2 and find the corresponding z^* value.

  • Example: Finding the Smallest 2\%

    • Find z^* such that P(z < z^*) = 0.02.

    • In the table, find the area closest to 0.0200, which is 0.0202, corresponding to z^* = -2.05.

    • Conclusion: Z-values less than -2.05 constitute the smallest 2\% of the standard normal distribution.

  • Example: Finding the Largest 5\%

    • Find z^ such that P(z > z^) = 0.05.

    • Since the table gives left-tail areas, convert to cumulative area: P(z < z^*) = 1 - 0.05 = 0.95.

    • In the table, 0.9500 falls exactly between 0.9495 (z=1.64) and 0.9505 (z=1.65).

    • Use the midpoint: z^* = 1.645.

    • Conclusion: Z-values greater than 1.645 constitute the largest 5\% of the standard normal distribution. By symmetry, -1.645 separates the smallest 5\% from others.

Other Normal Distributions
  • Finding Normal Distribution Probabilities (for any \mu and \sigma):

    1. Standardize Interval Endpoints: Convert the x values to z-scores.

      • z = \frac{x - \mu}{\sigma}

    2. Use z-curve Areas: Use technology or the standard normal table to find probabilities for the z-scores.

      • P(x < b) = P(z < b^*)

      • P(x > a) = P(z > a^*)

      • P(a < x < b) = P(a^* < z < b^*)

  • Example: Newborn Birth Weights

    • Random Variable: X = birth weight (grams).

    • Given: Normal distribution with \mu = 3500 grams, \sigma = 600 grams.

    • Question 1: Proportion of birth weights between 2900 and 4700 grams? (P(2900 < X < 4700))

      • Standardize x=2900: z_1 = \frac{2900 - 3500}{600} = \frac{-600}{600} = -1.00

      • Standardize x=4700: z_2 = \frac{4700 - 3500}{600} = \frac{1200}{600} = 2.00

      • P(2900 < X < 4700) = P(-1.00 < z < 2.00)

      • = P(z < 2.00) - P(z < -1.00) = 0.9772 - 0.1587 = 0.8185

      • Conclusion: About 82\% of birth weights are between 2900 and 4700 grams.

    • Question 2: Probability that a randomly selected baby has a birth weight greater than 4500 grams? (P(X > 4500))

      • Standardize x=4500: z = \frac{4500 - 3500}{600} = \frac{1000}{600} \approx 1.67

      • P(X > 4500) = P(z > 1.67)

      • = 1 - P(z < 1.67) = 1 - 0.9525 = 0.0475
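
The same answers can be obtained without standardizing by hand, using statistics.NormalDist with the birth-weight parameters (tiny differences from the table come from rounding z to two decimals):

```python
from statistics import NormalDist

# Birth weights: normal with mu = 3500 g, sigma = 600 g.
X = NormalDist(mu=3500, sigma=600)

# P(2900 < X < 4700); standardizing gives z in (-1.00, 2.00).
p_between = X.cdf(4700) - X.cdf(2900)
print(round(p_between, 4))  # ~0.8186 (table lookup: 0.8185)

# P(X > 4500); z = 1000/600 ~ 1.67.
p_over = 1 - X.cdf(4500)
print(round(p_over, 4))     # ~0.0478 (table, with z rounded to 1.67: 0.0475)
```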

Describing Extreme Values in a Normal Distribution
  • Goal: Find the x^* value that separates a given percentage of extreme values.

  • Method: First find the corresponding z^ value using the standard normal table, then translate it back to an x^ value.

    • Conversion Formula: x^* = \mu + z^*\sigma

  • Example: Garbage Truck Processing Times

    • Random Variable: X = total processing time (minutes).

    • Given: Normal distribution with \mu = 13 minutes, \sigma = 3.9 minutes.

    • Question 1: Describe the 10\% of trucks with the longest processing times.

      • This corresponds to finding x^* such that P(X > x^*) = 0.10.

      • First, find z^*: P(z > z^*) = 0.10 \implies P(z < z^*) = 0.90.

      • From table, the closest z^* for cumulative area 0.90 is 1.28.

      • Convert to x^*: x^* = \mu + z^*\sigma = 13 + 1.28(3.9) = 13 + 4.992 = 17.992 minutes.

      • Conclusion: About 10\% of trucks have processing times longer than 17.992 minutes.

    • Question 2: Describe the 5\% of trucks with the fastest processing times.

      • This corresponds to finding x^* such that P(X < x^*) = 0.05.

      • First, find z^*: From table, the closest z^* for cumulative area 0.05 is -1.645.

      • Convert to x^*: x^* = \mu + z^*\sigma = 13 + (-1.645)(3.9) = 13 - 6.416 = 6.584 minutes.

      • Conclusion: About 5\% of trucks have processing times less than 6.584 minutes.
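
Both extreme-value questions can be answered with NormalDist.inv_cdf, which returns the exact z^* rather than the table's rounded 1.28 and -1.645, so the x^* values differ slightly from those above:

```python
from statistics import NormalDist

# Processing times: normal with mu = 13 min, sigma = 3.9 min.
mu, sigma = 13.0, 3.9
z = NormalDist()

# Longest 10%: cumulative area 0.90 to the left of z*.
x_long = mu + z.inv_cdf(0.90) * sigma   # ~17.998 (table gives 17.992)
print(round(x_long, 2))

# Fastest 5%: cumulative area 0.05 to the left of z*.
x_fast = mu + z.inv_cdf(0.05) * sigma   # ~6.585 (table gives 6.584)
print(round(x_fast, 2))
```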