Study Notes on Random Variables and Their Distributions
Chapter 4: Distributions of Random Variables
Author: Leo Zexian Wang
Random Variables
Definition: A random variable (r.v.) is a numeric quantity that takes different values with specified probabilities.
Types of Random Variables:
Discrete Random Variable: Takes values from a discrete set (countable values).
The set of values can be finite (like the number of students in a classroom) or countably infinite (such as the number of trials until the first success).
Continuous Random Variable: Takes values from a continuous range (e.g., any value in an interval).
Discrete Random Variables
Probability Mass Function (pmf): Denoted as $p(x) = P(X = x)$, assigns a probability to each possible value $x$ in the support $S$.
The sum of probabilities must equal 1: $\sum_{x \in S} p(x) = 1$.
Example:
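Let $X$ be the discrete r.v. denoting the number of heads in two successive tosses of a fair coin.
The sample space: $\{HH, HT, TH, TT\}$, each outcome with probability $1/4$.
Support: $\{0, 1, 2\}$, with $p(0) = 1/4$, $p(1) = 1/2$, $p(2) = 1/4$.
Cumulative Distribution Function (cdf): Denoted as $F(x) = P(X \le x)$, describes the probability that $X$ is less than or equal to $x$. $F(x)$ is a non-decreasing function ranging from 0 to 1.
Example cdf: $F(x) = 0$ for $x < 0$; $F(x) = 1/4$ for $0 \le x < 1$; $F(x) = 3/4$ for $1 \le x < 2$; $F(x) = 1$ for $x \ge 2$.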
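The pmf and cdf for the two-toss example can be checked numerically in R. This is a minimal sketch that assumes the number of heads follows a Binomial(2, 0.5) distribution (two independent tosses of a fair coin); the variable names are illustrative.

```r
# Number of heads in two fair coin tosses: possible values 0, 1, 2
x <- 0:2

# pmf: P(X = x), assuming X ~ Binomial(size = 2, prob = 0.5)
pmf <- dbinom(x, size = 2, prob = 0.5)

# cdf at each support point: P(X <= x), the running total of the pmf
cdf <- cumsum(pmf)

print(data.frame(x = x, pmf = pmf, cdf = cdf))
```

The cdf column is just the cumulative sum of the pmf column, which is why it never decreases and ends at 1.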
Expected Value and Variance of Discrete Random Variables
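Expected Value $E(X)$: $E(X) = \mu = \sum_{x} x\,p(x)$, where $p(x)$ is the pmf.
Example: For the coin toss, $E(X) = 0(1/4) + 1(1/2) + 2(1/4) = 1$
Variance $\mathrm{Var}(X)$: $\mathrm{Var}(X) = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2\,p(x)$
This can also be expressed as: $\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
Example from previous scenario: $E(X^2) = 0^2(1/4) + 1^2(1/2) + 2^2(1/4) = 1.5$
Thus, $\mathrm{Var}(X) = 1.5 - 1^2 = 0.5$
Standard Deviation $\sigma$:
Given by $\sigma = \sqrt{\mathrm{Var}(X)}$; for the coin toss, $\sigma = \sqrt{0.5} \approx 0.707$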
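The expected value, variance, and standard deviation for the coin-toss example can be reproduced directly from the pmf. The sketch below types in the pmf values 1/4, 1/2, 1/4 by hand and computes the variance both from the definition and from the shortcut formula; the variable names are illustrative.

```r
x   <- 0:2                    # number of heads in two tosses
pmf <- c(0.25, 0.5, 0.25)     # P(X = 0), P(X = 1), P(X = 2)

ex  <- sum(x * pmf)           # E(X): probability-weighted average of the values
ex2 <- sum(x^2 * pmf)         # E(X^2)

var_shortcut <- ex2 - ex^2                # E(X^2) - [E(X)]^2
var_def      <- sum((x - ex)^2 * pmf)     # E[(X - mu)^2]
sd_x         <- sqrt(var_shortcut)        # standard deviation

print(c(mean = ex, variance = var_shortcut, sd = sd_x))
```

Both variance formulas give the same number, which is a useful self-check when doing these by hand.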
Continuous Random Variables
Probability Density Function (pdf): Denoted as $f(x)$; the area under the curve between any two points $a$ and $b$ equals the probability that $X$ falls between them: $P(a \le X \le b) = \int_a^b f(x)\,dx$
Total area under the curve must equal 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Example:
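Let $X$ be the time it takes for a bus to arrive, following a uniform distribution on an interval $[a, b]$: $f(x) = \frac{1}{b - a}$ for $a \le x \le b$, and $f(x) = 0$ otherwise.
Support: $[a, b]$
CDF: $F(x) = 0$ for $x < a$; $F(x) = \frac{x - a}{b - a}$ for $a \le x \le b$; $F(x) = 1$ for $x > b$.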
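The "area equals probability" idea for the uniform bus example can be checked numerically. The endpoints used here (a wait between 0 and 10 minutes) are an assumption for illustration only, not values taken from these notes.

```r
# Hypothetical endpoints for the waiting time, in minutes (assumed for illustration)
a <- 0
b <- 10

# pdf of the continuous uniform distribution on [a, b]
f <- function(x) dunif(x, min = a, max = b)

# Total area under the pdf over the support (should be 1)
total_area <- integrate(f, lower = a, upper = b)$value

# P(2 <= X <= 5) as an area under the pdf ...
p_area <- integrate(f, lower = 2, upper = 5)$value

# ... and the same probability from the cdf: F(5) - F(2)
p_cdf <- punif(5, min = a, max = b) - punif(2, min = a, max = b)

print(c(total_area = total_area, p_area = p_area, p_cdf = p_cdf))
```

For a uniform distribution the answer is simply (interval length) / (b - a), here 3/10.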
Expected Value and Variance of Continuous Random Variables
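Expected Value $E(X)$: $E(X) = \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx$
For the uniform example on $[a, b]$: $E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2}$
Variance $\mathrm{Var}(X)$: $\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - [E(X)]^2$
Example calculation: $E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \frac{a^2 + ab + b^2}{3}$
Variance is thus calculated as: $\mathrm{Var}(X) = \frac{a^2 + ab + b^2}{3} - \left(\frac{a + b}{2}\right)^2 = \frac{(b - a)^2}{12}$
Standard deviation: $\sigma = \frac{b - a}{\sqrt{12}}$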
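The integrals for the mean and variance of a continuous random variable can be evaluated numerically with `integrate`. As before, the endpoints 0 and 10 are assumed for illustration; the results are compared with the closed-form uniform formulas (a + b)/2 and (b - a)^2/12.

```r
# Hypothetical endpoints for the uniform waiting time (assumed for illustration)
a <- 0
b <- 10
f <- function(x) dunif(x, min = a, max = b)   # uniform pdf on [a, b]

# E(X) = integral of x * f(x) over the support
mu <- integrate(function(x) x * f(x), lower = a, upper = b)$value

# E(X^2) = integral of x^2 * f(x) over the support
ex2 <- integrate(function(x) x^2 * f(x), lower = a, upper = b)$value

var_x <- ex2 - mu^2     # Var(X) = E(X^2) - [E(X)]^2
sd_x  <- sqrt(var_x)    # standard deviation

print(c(mean = mu, variance = var_x, sd = sd_x))
```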
Common Distributions of Discrete Random Variables
Discrete Uniform Distribution:
Each outcome is equally likely.
Binomial Distribution:
Models the number of successes in $n$ independent trials (e.g., flipping a coin).
Parameters: number of trials $n$ and probability of success $p$.
Geometric Distribution:
Models the number of trials until the first success occurs.
Poisson Distribution:
Models the number of events occurring in a fixed interval of time or space, given a known average rate of occurrence.
Common Distributions of Continuous Random Variables
Continuous Uniform Distribution:
All intervals of the same length are equally probable.
Normal Distribution:
Bell-shaped curve symmetric about the mean.
Standard normal distribution: has mean 0 and variance 1.
Student’s t Distribution:
Similar to normal distribution but with heavier tails.
Chi-square Distribution:
Represents the distribution of a sum of squares of independent standard normal variables; the number of terms is the degrees of freedom.
Exponential Distribution:
Models time until an event occurs (like waiting for a bus).
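The link between the chi-square and standard normal distributions listed above can be checked in the simplest case of one squared term. If Z is standard normal, then P(Z^2 <= x) = P(|Z| <= sqrt(x)) = 2 * pnorm(sqrt(x)) - 1, which should match the chi-square cdf with 1 degree of freedom. The test points in `x` are arbitrary.

```r
# Arbitrary positive test points
x <- c(0.5, 1, 2, 3.84)

# cdf of a chi-square distribution with 1 degree of freedom
chisq_cdf <- pchisq(x, df = 1)

# The same probability from the standard normal: P(-sqrt(x) <= Z <= sqrt(x))
normal_based <- 2 * pnorm(sqrt(x)) - 1

print(data.frame(x = x, chisq_cdf = chisq_cdf, normal_based = normal_based))
```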
Normal Distribution
Definition: A continuous distribution characterized by its mean $\mu$ and standard deviation $\sigma$.
Properties:
Mean = Median = Mode.
The density curve approaches but never touches the x-axis.
Changing $\mu$ shifts the curve left or right, and changing $\sigma$ affects the spread.
Standardization: For any normal random variable $X \sim N(\mu, \sigma^2)$, transform to a standard normal variable using: $Z = \frac{X - \mu}{\sigma}$
Calculating Probabilities with Standardization
Example: Absorption rate of cones in the eye follows a normal distribution.
Given a mean of 535 nm and a standard deviation of 65 nm, calculate the proportion of cones absorbing wavelengths between 550 nm and 575 nm.
P(550 < X < 575) = P(X < 575) - P(X < 550)
Compute z-scores: $z_1 = \frac{550 - 535}{65} \approx 0.23$ and $z_2 = \frac{575 - 535}{65} \approx 0.62$
Result: P(Z < 0.62) - P(Z < 0.23) = 0.7324 - 0.5910 = 0.1414
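The cone-absorption calculation can be reproduced with `pnorm`, either on the original scale or after standardizing. A sketch, with illustrative variable names: the exact answer differs slightly from the table answer because the table rounds z-scores to two decimals.

```r
mu    <- 535   # mean wavelength (nm)
sigma <- 65    # standard deviation (nm)

# Directly on the original scale
p_direct <- pnorm(575, mean = mu, sd = sigma) - pnorm(550, mean = mu, sd = sigma)

# After standardizing: z = (x - mu) / sigma
z_lo <- (550 - mu) / sigma
z_hi <- (575 - mu) / sigma
p_z  <- pnorm(z_hi) - pnorm(z_lo)

# Using the z-scores rounded to two decimals, as a printed z-table would
p_table <- pnorm(0.62) - pnorm(0.23)

print(c(direct = p_direct, standardized = p_z, table_rounded = p_table))
```

The first two values agree exactly, which is the point of standardization; the third reproduces the table-based 0.1414.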
Z-Table and Finding Probabilities
Z-Table values represent cumulative probabilities for corresponding z-scores.
Example: To find P(Z < 1.26) identify the area to the left of z = 1.26 in the table (0.8962).
To find P(Z > 1.26), use the complement: P(Z > 1.26) = 1 - P(Z < 1.26) = 1 - 0.8962 = 0.1038.
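In R, `pnorm` plays the role of the z-table. A short sketch of the two lookups above; `lower.tail = FALSE` is the built-in way to ask for the right-hand area instead of subtracting from 1.

```r
# Area to the left of z = 1.26 (what the z-table reports)
p_left <- pnorm(1.26)

# Area to the right of z = 1.26
p_right <- pnorm(1.26, lower.tail = FALSE)

print(round(c(left = p_left, right = p_right), 4))
```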
Percentiles and Their Calculation
Finding Percentiles:
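To find the height that marks the top 10% (e.g., for female heights with a mean of 64 inches and an sd of 2.5 inches), find the 90th percentile:
Determine the corresponding z-score for the cumulative probability: $P(Z < z) = 0.90$ gives $z \approx 1.28$.
Height: $x = \mu + z\sigma = 64 + 1.28(2.5) = 67.2$ inches.
Example for Hummingbirds:
Given a weight distribution with a mean of 13 g and an SD of 3.4 g, find the weight that is lower than that of 65% of all hummingbirds (the 35th percentile). Solve with z-score substitution:
$P(Z < z) = 0.35$ gives $z \approx -0.39$, which yields weight $= 13 + (-0.39)(3.4) \approx 11.7$ g.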
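Percentiles run the table lookup in reverse, which is what `qnorm` does. A sketch for the two percentile examples (heights and hummingbird weights), using the means and standard deviations given above; small differences from hand calculations come from rounding the z-score.

```r
# 90th percentile of heights: mean 64 in, sd 2.5 in (cutoff for the top 10%)
height_90 <- qnorm(0.90, mean = 64, sd = 2.5)

# 35th percentile of hummingbird weights: mean 13 g, sd 3.4 g
weight_35 <- qnorm(0.35, mean = 13, sd = 3.4)

# Equivalent two-step version: find z first, then unstandardize with x = mu + z * sigma
z_90 <- qnorm(0.90)
height_90_by_z <- 64 + z_90 * 2.5

print(c(height_90 = height_90, weight_35 = weight_35))
```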
The Empirical Rule
Definitions:
Approximately 68% of values lie within one standard deviation of the mean.
Approximately 95% of values lie within two standard deviations.
Approximately 99.7% of values lie within three standard deviations.
Application to ITBS scores:
Questions about the proportion of scores within a given range can be answered using the Empirical Rule.
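The 68%, 95%, and 99.7% figures in the Empirical Rule are rounded normal probabilities, which can be confirmed with `pnorm`. A minimal check on the standard normal scale (it applies to any normal distribution after standardizing):

```r
# Number of standard deviations from the mean
k <- 1:3

# P(-k < Z < k) for a standard normal Z
within_k <- pnorm(k) - pnorm(-k)

print(round(within_k, 4))
```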
Binomial Distribution Assumptions
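Parameterization: Binomial(n, p) where $n$ is the number of independent trials, and $p$ is the probability of success.
Example: If each egg in a dozen has a 5% chance of being broken, the number of broken eggs is $X \sim \text{Binomial}(12, 0.05)$, with $P(X = k) = \binom{12}{k}(0.05)^k(0.95)^{12 - k}$.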
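The broken-egg example can be explored with `dbinom` and `pbinom`. This sketch assumes each of the 12 eggs breaks independently with probability 0.05; which specific probability the original example asked for is not recoverable from these notes, so a few common ones are shown.

```r
n <- 12     # eggs in a dozen
p <- 0.05   # assumed chance that any single egg is broken

p_none         <- dbinom(0, size = n, prob = p)   # P(X = 0): no broken eggs
p_at_least_one <- 1 - p_none                      # P(X >= 1)
p_at_most_one  <- pbinom(1, size = n, prob = p)   # P(X <= 1)
expected       <- n * p                           # E(X) = n * p

print(round(c(none = p_none, at_least_one = p_at_least_one,
              at_most_one = p_at_most_one, expected = expected), 4))
```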
Geometric and Poisson Distributions
Geometric Distribution: Models the number of trials up to and including the first success. Every repeated attempt counts as a trial.
Poisson Distribution: Models the number of occurrences in a fixed time/space.
Key quantities are the average rate of occurrence (e.g., phone calls per hour) and the resulting count (e.g., bacteria in a sample).
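Both distributions have built-in R functions, with one catch: R's `dgeom` counts the number of failures before the first success, not the number of trials. To get P(first success on trial k), pass k - 1. The success probability and rate below are made-up values for illustration.

```r
# Geometric: hypothetical success probability per trial
p <- 0.2
k <- 3                               # first success occurs on the 3rd trial
p_geom <- dgeom(k - 1, prob = p)     # k - 1 failures, then a success

# Poisson: hypothetical average rate of 4 phone calls per hour
lambda <- 4
p_pois <- dpois(2, lambda = lambda)  # P(exactly 2 calls in one hour)

print(c(geometric = p_geom, poisson = p_pois))
```

The geometric value equals (1 - p)^(k - 1) * p, and the Poisson value equals exp(-lambda) * lambda^2 / 2!.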
Memoryless Property
A distribution is memoryless if future probabilities do not depend on how long one has already waited: $P(X > s + t \mid X > s) = P(X > t)$. Both the exponential and geometric distributions have this property.
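The memoryless property can be verified numerically as P(X > s + t | X > s) = P(X > t). The rate, success probability, and waiting times below are arbitrary values chosen for illustration. For the geometric case, X is the number of trials, so P(X > n) is obtained from R's failure-count parameterization as `pgeom(n - 1, p, lower.tail = FALSE)`.

```r
s <- 2   # time (or trials) already waited
t <- 3   # additional time (or trials)

# Exponential with a hypothetical rate of 0.5
rate <- 0.5
exp_conditional   <- pexp(s + t, rate = rate, lower.tail = FALSE) /
                     pexp(s, rate = rate, lower.tail = FALSE)
exp_unconditional <- pexp(t, rate = rate, lower.tail = FALSE)

# Geometric with a hypothetical success probability of 0.3
p <- 0.3
geom_conditional   <- pgeom(s + t - 1, prob = p, lower.tail = FALSE) /
                      pgeom(s - 1, prob = p, lower.tail = FALSE)
geom_unconditional <- pgeom(t - 1, prob = p, lower.tail = FALSE)

print(c(exp_conditional, exp_unconditional, geom_conditional, geom_unconditional))
```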
Student’s t Distribution
Definition: Characterized by degrees of freedom, $\nu$.
As $\nu$ decreases, it becomes more heavy-tailed; approaches normal distribution as $\nu \to \infty$.
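One way to see the heavier tails is to compare 97.5th percentiles: the t cutoff is larger than the normal cutoff and shrinks toward it as the degrees of freedom grow. The degrees of freedom chosen below are arbitrary.

```r
# A few illustrative degrees of freedom
df <- c(2, 10, 30, 1000)

# 97.5th percentile of the t distribution for each df
t_crit <- qt(0.975, df = df)

# 97.5th percentile of the standard normal, for comparison
z_crit <- qnorm(0.975)

print(data.frame(df = df, t_crit = round(t_crit, 4), z_crit = round(z_crit, 4)))
```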
Functions and R Commands
Practical use in R for generating random variables, calculating mean, probabilities, and densities of distributions. Examples include:
Generate normal random variables:
rnorm(n = 10, mean = 0, sd = 1)
Find probabilities using functions like pnorm, and densities with dnorm.
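R names its distribution functions with a one-letter prefix: `r` for random draws, `p` for cumulative probabilities, `d` for densities (or pmf values), and `q` for quantiles. A short sketch using the normal family; the seed is set only so the random draws are reproducible.

```r
set.seed(1)                             # make the random draws reproducible

x <- rnorm(n = 10, mean = 0, sd = 1)    # 10 random standard normal values
sample_mean <- mean(x)                  # sample mean of the draws

prob    <- pnorm(1.96)    # P(Z <= 1.96), cumulative probability
density <- dnorm(0)       # height of the standard normal pdf at 0
quant   <- qnorm(0.975)   # z-value with 97.5% of the area to its left

print(c(sample_mean = sample_mean, prob = prob, density = density, quant = quant))
```

The same prefixes work for other families, e.g. `dbinom`, `ppois`, `qt`, `rexp`.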
Bivariate Distributions
Joint probability mass and density functions describe how two random variables behave together and are used to calculate marginal and conditional probabilities.
Covariance and correlation measure the direction and strength of the linear relationship between two random variables.
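A small joint pmf makes these ideas concrete. The sketch below builds on the two-toss coin example: X is the number of heads and Y indicates whether the first toss is heads. The variable Y is introduced here only for illustration and does not appear elsewhere in these notes.

```r
# Joint pmf: rows are X = 0, 1, 2 (number of heads); columns are Y = 0, 1
# (Y = 1 if the first toss is heads). Each of TT, TH, HT, HH has probability 1/4.
joint <- matrix(c(0.25, 0.00,
                  0.25, 0.25,
                  0.00, 0.25),
                nrow = 3, byrow = TRUE,
                dimnames = list(X = 0:2, Y = 0:1))
x <- 0:2
y <- 0:1

px <- rowSums(joint)   # marginal pmf of X
py <- colSums(joint)   # marginal pmf of Y

ex  <- sum(x * px)                  # E(X)
ey  <- sum(y * py)                  # E(Y)
exy <- sum(outer(x, y) * joint)     # E(XY)

cov_xy <- exy - ex * ey             # Cov(X, Y) = E(XY) - E(X)E(Y)
var_x  <- sum(x^2 * px) - ex^2
var_y  <- sum(y^2 * py) - ey^2
cor_xy <- cov_xy / sqrt(var_x * var_y)

# Conditional probability P(X = 2 | Y = 1) = P(X = 2, Y = 1) / P(Y = 1)
p_x2_given_y1 <- unname(joint["2", "1"] / py["1"])

print(c(cov = cov_xy, cor = cor_xy, p_x2_given_y1 = p_x2_given_y1))
```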
Bivariate Normal Distribution
The joint distribution of two continuous variables, characterized by five parameters: the two means, the two variances, and the correlation.
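A bivariate normal sample can be simulated in base R by combining two independent standard normals. This is one standard construction, not necessarily the one used in the course; all parameter values below are hypothetical.

```r
set.seed(42)    # reproducible simulation
n <- 100000     # number of simulated pairs

# Hypothetical parameters: two means, two standard deviations, one correlation
mu_x <- 0; sd_x <- 1
mu_y <- 5; sd_y <- 2
rho  <- 0.6

# Two independent standard normal samples
z1 <- rnorm(n)
z2 <- rnorm(n)

# X uses z1 only; Y mixes z1 and z2 so that Cor(X, Y) = rho
x <- mu_x + sd_x * z1
y <- mu_y + sd_y * (rho * z1 + sqrt(1 - rho^2) * z2)

print(round(c(mean_x = mean(x), mean_y = mean(y),
              sd_x = sd(x), sd_y = sd(y), cor_xy = cor(x, y)), 3))
```

The sample means, standard deviations, and correlation should land close to the chosen parameters, with small differences due to sampling variability.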