Class 8
Class Overview
Class Title: Random Variables and Probability Models
Course: Introduction to Statistics for Social Sciences: Statistics I
Institution: Department of Statistics, UC3M
Chapter 8: Key Topics
Random Variables
Discrete Probability Models:
Bernoulli Distribution
Binomial Distribution
Geometric Distribution
Continuous Probability Models:
Normal Distribution
Objective
Introduce random variables and selected classes of variables frequently encountered in real-life situations.
These models help quantify the uncertainty involved in the sampling process.
1. Random Variables
Move from querying specific events to analyzing distributions of events.
Example Question: Distribution of the number of demonstrations attended by UC3M students.
Questions involve average attendance in public demonstrations.
Frequency Tables and Probability Distributions
Generated sample data on demonstration attendance for 100 UC3M students:
Frequency Table
Times attended demonstrations (X) | Absolute Frequency (ni) | Relative Frequency (fi) | Cumulative Relative Frequency (Fi)
0 | 31 | 0.31 | 0.31
1 | 19 | 0.19 | 0.50
2 | 8 | 0.08 | 0.58
3 | 9 | 0.09 | 0.67
4 | 12 | 0.12 | 0.79
5 | 14 | 0.14 | 0.93
6 | 7 | 0.07 | 1.00
>6 | 0 | 0.00 | 1.00
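The relative and cumulative columns of the table can be reproduced from the absolute frequencies alone. The sketch below is only an illustration; the variable names (`counts`, `rel_freq`, `cum_freq`, `sample_mean`) are not part of the course material.

```python
# Absolute frequencies n_i from the table (X = 0..6, 100 students).
counts = {0: 31, 1: 19, 2: 8, 3: 9, 4: 12, 5: 14, 6: 7}
n = sum(counts.values())  # sample size

# Relative frequency f_i = n_i / n
rel_freq = {x: c / n for x, c in counts.items()}

# Cumulative relative frequency F_i: accumulate the counts first,
# then divide, which avoids floating-point drift.
cum_freq = {}
running = 0
for x in sorted(counts):
    running += counts[x]
    cum_freq[x] = running / n

# Sample mean of the number of demonstrations attended.
sample_mean = sum(x * c for x, c in counts.items()) / n

for x in sorted(counts):
    print(x, counts[x], rel_freq[x], cum_freq[x])
print("sample mean:", sample_mean)
```

The sample mean answers the "average attendance" question for these 100 students only; it is not yet a statement about all UC3M students.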
What If Scenarios
What if we could observe all UC3M students instead of a sample? The relative frequencies would then be the probabilities of attending each number of demonstrations:
Probability Table
Times attended demonstrations (X) | Probability P(X=x) | Cumulative Probability P(X≤x)
0 | P(X=0) | P(X≤0)
1 | P(X=1) | P(X≤1)
...
Total Probability = 1.00
Statistics Measures
Questions involve calculating the average and the median for the population. The population mean is denoted μ or E[X].
Population Mean Calculation:
[ μ = 0 \cdot P(X = 0) + 1 \cdot P(X = 1) + 2 \cdot P(X = 2) + … ]
Definition of Population Median: the smallest value x for which P(X ≤ x) is at least 50%.
Other measures: Population Variance (σ²) and Standard Deviation (σ).
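As a rough illustration of these definitions, the sketch below computes μ, the median, σ² and σ from a probability table. The population probabilities are unknown in the notes, so the `pmf` values here are an assumption: they simply reuse the sample relative frequencies from the frequency table as stand-ins.

```python
# ASSUMPTION: hypothetical population probabilities P(X = x),
# borrowed from the sample relative frequencies for illustration only.
pmf = {0: 0.31, 1: 0.19, 2: 0.08, 3: 0.09, 4: 0.12, 5: 0.14, 6: 0.07}

# Population mean: mu = sum of x * P(X = x)
mu = sum(x * p for x, p in pmf.items())

# Population variance: sigma^2 = sum of (x - mu)^2 * P(X = x)
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = var ** 0.5


def population_median(pmf):
    """Smallest x whose cumulative probability P(X <= x) reaches 0.5."""
    cum = 0.0
    for x in sorted(pmf):
        cum += pmf[x]
        if cum >= 0.5 - 1e-12:  # small tolerance for rounding error
            return x
    return None  # only reached if probabilities sum to less than 0.5


median = population_median(pmf)
print(mu, median, var, sigma)
```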
2. Discrete Probability Models
Bernoulli Trials
Characteristics of a Bernoulli model:
Two possible results: success (B = 1) and failure (B = 0).
Independent trials with a constant probability of success (P(B=1) = p).
Geometric Distribution
Describes the number of failures F before the first success in a sequence of independent Bernoulli trials:
P(F=0) = p
P(F=1) = (1-p)⋅p
P(F=f) = (1-p)^f⋅p for f = 0, 1, 2, ...
Mean (E[F]): [ E[F] = \frac{1-p}{p} ]
Variance (V[F]): [ V[F] = \frac{1-p}{p^2} ]
Example: CCOO Delegate Probability
If 10% of CCOO members are delegates, the first delegate is the fourth person interviewed when three non-delegates are followed by one delegate (F = 3), so the probability is: [ P(F = 3) = 0.9^3 \cdot 0.1 = 0.0729 ]
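The CCOO example can be checked with a few lines of code. This is a minimal sketch of the geometric formulas in this section; the helper name `geometric_pmf` is made up for the illustration.

```python
def geometric_pmf(f, p):
    """P(F = f): f failures before the first success, success probability p."""
    return (1 - p) ** f * p


p = 0.1  # share of CCOO members who are delegates

# First delegate is the 4th person interviewed: 3 failures, then a success.
prob_fourth = geometric_pmf(3, p)

# Expected number and variance of non-delegates interviewed
# before the first delegate.
mean_failures = (1 - p) / p
var_failures = (1 - p) / p ** 2

# Sanity check: the probabilities over f = 0, 1, 2, ... add up to 1
# (the sum is truncated at 1000 terms here).
total = sum(geometric_pmf(f, p) for f in range(1000))

print(prob_fourth, mean_failures, var_failures, total)
```

With p = 0.1, on average nine non-delegates are interviewed before the first delegate is found.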
Exercises
YouGov Survey: Probability of finding a strongly supportive person among respondents.
Calculate expected opposition or support based on survey data.
3. Continuous Random Variables
Cumulative Distribution Function (CDF)
For a discrete variable: step function;
For a continuous variable: smooth, non-decreasing function.
Properties:
0 ≤ F(x) ≤ 1
F(-∞) = 0
F(∞) = 1
Probability Density Function
For discrete variables: the probability mass function (pmf) gives P(X = x) for each value x;
For continuous variables: P(X = x) = 0 for every single value, so a density function f(x) is used instead.
Properties:
The area under the density to the left of x equals the CDF, F(x), and the total area under the density equals 1.
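The link between density and CDF can be checked numerically. The sketch below uses the standard normal density as the example (the normal model is the continuous model covered in this class) and a simple trapezoid rule; the function name `area_under_density` and the cut-off at ±8 are choices made for this illustration only.

```python
from statistics import NormalDist


def area_under_density(dist, lower, upper, steps=10_000):
    """Approximate the area under dist.pdf between lower and upper
    with the trapezoid rule."""
    h = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h


Z = NormalDist(0, 1)  # standard normal

# Area to the left of x = 1 (starting far in the left tail) vs. the CDF at 1.
area_to_1 = area_under_density(Z, -8, 1)
print(area_to_1, Z.cdf(1))

# Total area under the density (tails beyond +/-8 are negligible).
total_area = area_under_density(Z, -8, 8)
print(total_area)
```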
Normal Distribution (Gaussian Distribution)
Bell-shaped density, symmetric around the mean μ:
Examples: weights, heights, course grades.
Notation: [ X \sim N(μ,σ²) ]
Properties of the Normal Distribution
Approximately 95.45% of observations lie within 2 standard deviations from the mean.
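The 95.45% figure does not depend on μ or σ, which can be verified with Python's standard library. This is a small sketch; the helper `within_k_sd` and the parameter values in the second call are arbitrary choices for the illustration.

```python
from statistics import NormalDist


def within_k_sd(dist, k):
    """P(mu - k*sigma <= X <= mu + k*sigma) for a normal distribution."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


# Standard normal, and an arbitrary normal with other parameters.
within_2_standard = within_k_sd(NormalDist(0, 1), 2)
within_2_other = within_k_sd(NormalDist(mu=170, sigma=10), 2)

print(round(within_2_standard, 4), round(within_2_other, 4))
```

Both calls give the same probability, because the statement is made in units of standard deviations from the mean.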
Calculating Probabilities
Historically, X was transformed into a standard normal variable (Z-score) and the probability was looked up in a table: [ Z = \frac{(X-μ)}{σ} ]
Nowadays, software such as Excel computes these probabilities directly.
Example: Pedro Sanchez Ratings
Mean rating of 4.04 with standard deviation 2.75.
Probability of a rating of at least 5, assuming ratings follow a normal distribution: [ P(X \geq 5) = 1 - P(X < 5) = 1 - 0.6365 = 0.3635 ]
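The calculation can be reproduced both ways, via the Z-score and directly with software. The sketch below uses Python's `statistics.NormalDist` in place of Excel and assumes, as the example does, that ratings are normally distributed.

```python
from statistics import NormalDist

mu, sigma = 4.04, 2.75  # mean and standard deviation of the ratings

# Route 1: standardize, then use the standard normal CDF.
z = (5 - mu) / sigma
p_below_via_z = NormalDist().cdf(z)

# Route 2: let the software work with N(mu, sigma^2) directly.
rating = NormalDist(mu=mu, sigma=sigma)
p_below = rating.cdf(5)

# Probability of a rating of at least 5.
p_at_least_5 = 1 - p_below

print(round(z, 4), round(p_below, 4), round(p_at_least_5, 4))
```

Both routes give the same P(X < 5), since standardizing only rescales the variable.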
Exercises
UC3M professor rating probability based on given statistics.
Graphical exercise related to income distribution among voters.