Class 8

Class Overview

  • Class Title: Random Variables and Probability Models

  • Course: Introduction to Statistics for Social Sciences: Statistics I

  • Institution: Department of Statistics, UC3M

Chapter 8: Key Topics

  1. Random Variables

  2. Discrete Probability Models:

    • Bernoulli Distribution

    • Binomial Distribution

    • Geometric Distribution

  3. Continuous Probability Models:

    • Normal Distribution

Objective

  • Introduce random variables and selected families of random variables frequently encountered in real-life situations.

    • Helps quantify the uncertainty involved in the sampling process.

1. Random Variables

  • Move from querying specific events to analyzing distributions of events.

    • Example Question: Distribution of the number of demonstrations attended by UC3M students.

    • Related questions concern summary values, such as the average number of public demonstrations attended.

Frequency Tables and Probability Distributions

  • Sample data on demonstration attendance for 100 UC3M students:

    • Frequency Table

      • Times attended demonstrations (X) | Absolute Frequency (ni) | Relative Frequency (fi) | Cumulative Relative Frequency (Fi)

      • 0 | 31 | 0.31 | 0.31

      • 1 | 19 | 0.19 | 0.50

      • 2 | 8 | 0.08 | 0.58

      • 3 | 9 | 0.09 | 0.67

      • 4 | 12 | 0.12 | 0.79

      • 5 | 14 | 0.14 | 0.93

      • 6 | 7 | 0.07 | 1.00

      • >6 | 0 | 0 | 1.00
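The relative and cumulative relative frequencies in the table follow mechanically from the absolute frequencies. A minimal sketch of that calculation is shown here; the names `counts` and `frequency_table` are illustrative and not part of the course material.

```python
# Absolute frequencies from the table: times attended -> number of students.
counts = {0: 31, 1: 19, 2: 8, 3: 9, 4: 12, 5: 14, 6: 7}

n = sum(counts.values())  # sample size


def frequency_table(counts):
    """Return rows (x, n_i, f_i, F_i), sorted by x."""
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        # f_i = n_i / n, F_i = (n_1 + ... + n_i) / n
        rows.append((x, counts[x], counts[x] / total, cumulative / total))
    return rows


table = frequency_table(counts)
for x, n_i, f_i, F_i in table:
    print(f"{x} | {n_i} | {f_i:.2f} | {F_i:.2f}")
```

Accumulating the integer counts before dividing keeps the cumulative column free of floating-point drift.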

What If Scenarios

  • What if we could observe all UC3M students instead of a sample? Relative frequencies would be replaced by the probabilities of attending each number of demonstrations:

    • Probability Table

      • Times attended demonstrations (X) | Probability P(X=x) | Cumulative Probability P(X≤x)

      • 0 | P(X=0) | P(X≤0)

      • 1 | P(X=1) | P(X≤1)

      • ...

      • Total Probability = 1.00

Statistics Measures

  • Questions involve calculating the average and the median for the population. The population mean is denoted μ or E[X].

    • Population Mean Calculation:

      • [ μ = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + … ]

    • Population Median: the smallest value x for which P(X ≤ x) is at least 0.5.

    • Other measures: Population Variance (σ²) and Standard Deviation (σ).
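The measures above can be sketched numerically. The population probabilities P(X = x) are not given in these notes, so this example assumes, purely for illustration, that the sample relative frequencies from the frequency table stand in for them. The names `probs` and `median_from_counts` are hypothetical.

```python
# Assumption: sample counts from the frequency table are used as a stand-in
# for the unknown population distribution.
counts = {0: 31, 1: 19, 2: 8, 3: 9, 4: 12, 5: 14, 6: 7}
n = sum(counts.values())
probs = {x: c / n for x, c in counts.items()}

# Mean: sum of x * P(X = x).
mean = sum(x * p for x, p in probs.items())

# Variance: sum of (x - mean)^2 * P(X = x); standard deviation is its square root.
variance = sum((x - mean) ** 2 * p for x, p in probs.items())
std_dev = variance ** 0.5


def median_from_counts(counts):
    """Smallest x whose cumulative share is at least 50%."""
    total = sum(counts.values())
    cumulative = 0
    for x in sorted(counts):
        cumulative += counts[x]
        if 2 * cumulative >= total:  # integer comparison, no rounding issues
            return x
    return None


median = median_from_counts(counts)
print(mean, median, variance, std_dev)
```

Under this assumption the mean is about 2.22 demonstrations and the median is 1, since the cumulative relative frequency first reaches 0.50 at x = 1.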

2. Discrete Probability Models

Bernoulli Trials

  • Characteristics of a Bernoulli model:

    • Two possible results: success (B = 1) and failure (B = 0).

    • Independent trials with a constant probability of success (P(B=1) = p).

Geometric Distribution

  • Refers to the number of failures (F) before the first success in a sequence of independent Bernoulli trials:

    • P(F=0) = p

    • P(F=1) = (1-p)⋅p

    • P(F=f) = (1-p)^f⋅p for f = 0, 1, 2, ...

    • Mean (E[F]): [ E[F] = \frac{1-p}{p} ]

    • Variance (V[F]): [ V[F] = \frac{1-p}{p^2} ]
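The mean and variance formulas can be checked numerically against the probability function. This sketch sums over a truncated range of f, which is an approximation (the omitted tail is negligible here). The value p = 0.25 and the name `geometric_pmf` are illustrative assumptions.

```python
def geometric_pmf(f, p):
    """P(F = f): f failures followed by the first success."""
    return (1 - p) ** f * p


p = 0.25             # illustrative success probability (assumption)
support = range(500)  # truncated support; remaining tail mass is negligible

total = sum(geometric_pmf(f, p) for f in support)
mean = sum(f * geometric_pmf(f, p) for f in support)
variance = sum((f - mean) ** 2 * geometric_pmf(f, p) for f in support)

print(total)                      # close to 1
print(mean, (1 - p) / p)          # numerical sum vs. (1-p)/p
print(variance, (1 - p) / p**2)   # numerical sum vs. (1-p)/p^2
```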

Example: CCOO Delegate Probability

  • If 10% of CCOO members are delegates, the probability that the first delegate is the fourth person interviewed (three failures, then a success, with p = 0.1): [ P(F = 3) = 0.9^3 \cdot 0.1 = 0.0729 ]

Exercises

  • YouGov Survey: Probability of finding a strongly supportive person among respondents.

  • Calculate expected opposition or support based on survey data.

3. Continuous Random Variables

Cumulative Distribution Function (CDF)

  • For a discrete variable: step function;

  • For a continuous variable: smooth, non-decreasing function.

    • Properties:

      • 0 ≤ F(x) ≤ 1

      • F(-∞) = 0

      • F(∞) = 1

Probability Density Function

  • For discrete variables: Probability mass function (pmf) is defined;

  • For continuous variables: utilize a density function (f(x)).

    • Properties:

      • The area under the density to the left of x equals the CDF, F(x); the total area under the density equals 1.
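Both density properties can be illustrated numerically. This sketch uses the standard normal density (the model introduced in the next subsection) as an example and approximates areas with the trapezoid rule. The helper names, the integration limits of ±10, and the step count are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density f(x) of a normal variable with mean mu and std. deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def trapezoid_area(f, a, b, steps=20_000):
    """Approximate the area under f between a and b with the trapezoid rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


# -10 stands in for minus infinity; the density is negligible beyond that.
total_area = trapezoid_area(normal_pdf, -10, 10)
area_up_to_1 = trapezoid_area(normal_pdf, -10, 1)
cdf_at_1 = NormalDist().cdf(1)  # library value of F(1) for comparison

print(total_area)              # close to 1
print(area_up_to_1, cdf_at_1)  # area to the left of 1 vs. F(1)
```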

Normal Distribution (Gaussian Distribution)

  • Bell-shaped and symmetric around the mean; commonly used to model:

    • Examples: weights, heights, course grades.

  • Notation: [ X \sim N(μ,σ²) ]

Properties of the Normal Distribution

  • Approximately 95.45% of observations lie within 2 standard deviations of the mean.
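The 95.45% figure can be reproduced with Python's standard library. This is a minimal sketch; the function name `prob_within` is illustrative, and the second call uses arbitrary example values of μ and σ to show that the result does not depend on them.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


within_2 = prob_within(2)
within_2_other = prob_within(2, mu=10.0, sigma=3.0)  # arbitrary mu and sigma

print(round(within_2 * 100, 2))        # percentage within 2 standard deviations
print(round(within_2_other * 100, 2))  # same percentage for another normal
```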

Calculating Probabilities

  • Historically, probabilities were found by standardizing X to a standard normal variable (Z-score) and looking up printed tables: [ Z = \frac{X-μ}{σ} ]

  • Today, software such as Excel computes these probabilities directly.

Example: Pedro Sanchez Ratings

  • Mean rating of 4.04 with standard deviation 2.75.

  • Probability of a rating of at least 5, treating ratings as normally distributed: [ P(X \geq 5) = 1 - P(X \leq 5) = 1 - 0.6365 = 0.3635 ]
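The calculation in this example can be sketched in two equivalent ways: by standardizing to a Z-score first, or by asking the software for the N(4.04, 2.75²) probability directly. The sketch assumes, as the example does, that ratings follow a normal distribution; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 4.04, 2.75  # mean and standard deviation from the example
threshold = 5

# Route 1: standardize, then use the standard normal CDF.
z = (threshold - mu) / sigma
p_standard = 1 - NormalDist().cdf(z)

# Route 2: use the N(mu, sigma^2) CDF directly.
p_direct = 1 - NormalDist(mu, sigma).cdf(threshold)

print(round(z, 4))
print(round(p_standard, 4), round(p_direct, 4))
```

Both routes give approximately 0.3635, matching the value above.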

Exercises

  • UC3M professor rating probability based on given statistics.

  • Graphical exercise related to income distribution among voters.