Notes on Special Distributions and the Likelihood Function
Acknowledgement of Traditional Owners
- Recognition of Turrbal and Yugara as First Nations owners of QUT lands.
- Respect paid to their elders, stories, customs, and creation spirits.
- Acknowledgment of the role of Aboriginal and Torres Strait Islander peoples at QUT.
Unit Outline
- Overview of key units in the course:
- Data and Variables
- Visualisation
- Data Gathering
- Introduction to Probability
- Special Distributions
- Parameter Uncertainty
- Hypothesis Testing
- Linear Regression in four parts (Polynomials, Categorical Variables, Multiple Variables)
- Model Refinement
Probability Distributions
- Objective: Use sample data to answer questions about a population.
- Probability distributions help in modeling likely ranges of values in predictive models.
- Almost all statistical models are grounded in distributions.
Discrete Distributions - Review
- Defined using a probability mass function (PMF), satisfying:
[ \sum_{n} p_n = 1 ]
- Key discrete distributions include:
- Uniform
- Poisson
- Binomial
- Negative Binomial
- Geometric
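The PMF condition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the parameter values (n = 10, p = 0.3, rate 4) are illustrative assumptions, not values from the notes.

```python
import math

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# Binomial has finite support {0, ..., n}, so the sum covers every outcome.
binomial_total = sum(binomial_pmf(k, n=10, p=0.3) for k in range(11))

# Poisson has infinite support; for lam = 4 the mass beyond k = 59 is negligible.
poisson_total = sum(poisson_pmf(k, lam=4.0) for k in range(60))

print(binomial_total, poisson_total)  # both should be very close to 1
```

The Poisson sum is truncated, so it only approximates 1; a larger rate would need a longer range.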
Continuous Distributions - Review
- Defined using a probability density function (PDF), satisfying:
[ \int_{x} f(x) \, dx = 1 ]
- Key continuous distributions include:
- Uniform
- Exponential
- Normal
- Student's t
- Note: Normal and t-distributions are foundational in statistical modeling.
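The PDF condition can be illustrated in the same way by integrating a density numerically. This sketch assumes a simple midpoint rule and a hypothetical Uniform(2, 5) example; `integrate` and `uniform_pdf` are names made up for illustration, while `statistics.NormalDist` is part of the standard library.

```python
from statistics import NormalDist

def integrate(f, lo, hi, steps=20_000):
    """Approximate the integral of f over [lo, hi] with the midpoint rule."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

def uniform_pdf(x, a=2.0, b=5.0):
    """Density of Uniform(a, b): constant on [a, b], zero elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

uniform_area = integrate(uniform_pdf, 2.0, 5.0)

# The Normal density is positive everywhere, but the mass beyond
# 10 standard deviations from the mean is negligible.
normal_area = integrate(NormalDist(mu=0.0, sigma=1.0).pdf, -10.0, 10.0)

print(uniform_area, normal_area)  # both should be very close to 1
```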
The Normal Distribution
- Described as a bell curve, common in statistics.
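- If a variable ( X ) is normally distributed with mean ( \mu ) and variance ( \sigma^2 ), we write:
[ X \sim N(\mu, \sigma^2) ]
- PDF formula:
[ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} ]
- The shape varies with the parameters: changing ( \mu ) (mean) shifts the centre of the curve, and changing ( \sigma ) (standard deviation) widens or narrows it.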
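To make the PDF formula concrete, here is a minimal sketch that codes it directly and compares it with the standard library's `statistics.NormalDist`. The function name `normal_pdf` and the parameter values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written directly from the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Changing mu moves the peak; changing sigma changes its height and spread.
for mu, sigma in [(0.0, 1.0), (2.0, 1.0), (0.0, 2.0)]:
    peak = normal_pdf(mu, mu, sigma)  # the density is highest at x = mu
    print(f"mu={mu}, sigma={sigma}: peak height {peak:.4f}")

# Cross-check against the standard library implementation.
reference = NormalDist(mu=2.0, sigma=1.5).pdf(3.0)
ours = normal_pdf(3.0, mu=2.0, sigma=1.5)
print(ours, reference)
```

A larger ( \sigma ) gives a lower peak because the same total area of 1 is spread over a wider range.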
Estimating a Normal Distribution
- Use sample data to estimate parameters ( \mu ) and ( \sigma ):
- Sample mean ( \bar{x} ) estimates population mean ( \mu ).
- Sample standard deviation ( s ) estimates population standard deviation ( \sigma ).
- Calculate population mean from census data:
[ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i ]
- Sample mean from sample data:
[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ]
where ( N > n ).
- The sample standard deviation is defined to adjust for degrees of freedom:
[ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} ]
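The two sample formulas can be sketched in a few lines. The data below are made-up values for illustration only; the comparison uses `statistics.mean` and `statistics.stdev` from the standard library, the latter of which also divides by n - 1.

```python
import math
import statistics

# Hypothetical sample of n = 6 measurements (made-up values).
sample = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6]
n = len(sample)

# Sample mean: sum of the observations divided by n.
x_bar = sum(sample) / n

# Sample standard deviation: squared deviations divided by n - 1, then square-rooted.
s = math.sqrt(sum((x - x_bar) ** 2 for x in sample) / (n - 1))

print(x_bar, s)
print(statistics.mean(sample), statistics.stdev(sample))  # same definitions
```

Dividing by n instead (as `statistics.pstdev` does) gives a slightly smaller value, which is appropriate only when the data are the whole population.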
The Standard Normal Distribution
- Defined with ( \mu = 0 ) and ( \sigma = 1 ).
- Transformation to standard form:
[ Z = \frac{X - \mu}{\sigma} ]
- Useful for comparison in statistical tests.
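As a small illustration of why standardising helps with comparison, the sketch below converts two scores measured on different scales to z-scores. The scores, means and standard deviations are hypothetical, and `z_score` is a name introduced here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above the mean."""
    return (x - mu) / sigma

# Two hypothetical tests on different scales.
z_a = z_score(82.0, mu=70.0, sigma=8.0)     # test A
z_b = z_score(650.0, mu=500.0, sigma=100.0)  # test B

# Under a Normal model, the standard normal CDF gives the share of values below z.
share_below = NormalDist().cdf(z_a)

print(z_a, z_b, share_below)
```

Both scores sit the same number of standard deviations above their respective means, so on the standard scale they are equally unusual even though the raw numbers differ.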
Quantile-Quantile Plots
- Tool for assessing normality visually.
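- Standardise the data and plot its quantiles against those of the standard normal distribution; points lying close to a straight line suggest normality.
- Deviation from the line is common at extreme percentiles (the tails), often because of outliers.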
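The coordinates of a Q-Q plot can be computed without any plotting library. This is a sketch under stated assumptions: the data are made up, and the plotting positions (i - 0.5)/n are one common convention among several. `qq_points` is a hypothetical helper name.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Return (theoretical quantile, standardised sample quantile) pairs."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    # Standardise the data, then sort to obtain the sample quantiles.
    z_sorted = sorted((x - x_bar) / s for x in data)
    # Matching standard normal quantiles at positions (i - 0.5) / n (an assumed convention).
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, z_sorted))

# Hypothetical sample (made-up values).
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 10.9, 9.1, 11.7]
points = qq_points(sample)

for theo, obs in points:
    print(f"{theo:7.3f} {obs:7.3f}")
```

Plotting the pairs and comparing them with the line y = x gives the usual visual check; large gaps at the first or last few pairs correspond to the tail behaviour noted above.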
More Continuous Distributions
- Introduce distributions relevant for specific scenarios, e.g., time between events.
Student’s t Distribution
- Developed by William Sealy Gosset (published as ‘Student’) for small samples.
- Has heavier tails than the Normal distribution, which account for the extra variability in small samples.
- Approaches the Normal distribution as the degrees of freedom ( \nu ) increase.
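The heavy tails and the convergence to the Normal can be seen numerically. The notes do not give the t density, so this sketch assumes the standard formula for it; `t_pdf` is a name introduced here, and the degrees of freedom tried are arbitrary.

```python
import math
from statistics import NormalDist

def t_pdf(x, nu):
    """Density of Student's t with nu degrees of freedom (standard formula)."""
    # lgamma avoids overflow of the gamma function for large nu.
    log_coeff = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coeff = math.exp(log_coeff) / math.sqrt(nu * math.pi)
    return coeff * (1 + x * x / nu) ** (-(nu + 1) / 2)

std_normal = NormalDist()
dfs = [1, 5, 30, 100]

# Gap between the t and Normal densities at the centre (x = 0).
centre_gaps = [abs(t_pdf(0.0, nu) - std_normal.pdf(0.0)) for nu in dfs]

# Density in the tail (x = 3), where the t distribution carries more mass.
tail_values = [t_pdf(3.0, nu) for nu in dfs]

for nu, gap, tail in zip(dfs, centre_gaps, tail_values):
    print(f"nu={nu:3d}: centre gap {gap:.4f}, t density at 3 = {tail:.4f}")
print(f"Normal density at 3 = {std_normal.pdf(3.0):.4f}")
```

The centre gap shrinks as the degrees of freedom grow, while the tail density stays above the Normal value for each case tried.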
The Exponential Distribution
- Describes the time between events; it takes non-negative values only.
- Characterized by rate parameter ( \lambda ):
[ f(x) = \lambda e^{-\lambda x}, \; x \geq 0 ]
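A short simulation can illustrate the density above. This is a sketch with an assumed rate of 2 events per unit time; `random.expovariate` is the standard library sampler, and the comparison with 1/lambda relies on the well-known mean of the Exponential distribution, which the notes do not state.

```python
import math
import random

def exponential_pdf(x, lam):
    """Density of Exponential(lam): lam * exp(-lam * x) for x >= 0, else 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0

lam = 2.0  # hypothetical rate: 2 events per unit time
rng = random.Random(0)  # fixed seed so the run is repeatable
waits = [rng.expovariate(lam) for _ in range(100_000)]

sample_mean = sum(waits) / len(waits)

print(exponential_pdf(0.0, lam))  # the density at zero equals the rate
print(sample_mean, 1.0 / lam)     # simulated mean wait vs. theoretical mean 1/lam
```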
The Likelihood Function
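- Given a parametric probability model with parameter ( \theta ) for independent observations ( X_1, \ldots, X_n ), the likelihood is:
[ L(\theta) = \prod_{i=1}^{n} f_i(X_i; \theta) ]
- Denotes the joint probability (or probability density, for continuous data) of observing these values for a given parameter ( \theta ), viewed as a function of ( \theta ).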
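As a concrete instance of the product formula, the sketch below evaluates the likelihood of an Exponential model at a few candidate rates. The waiting times are made-up values, the choice of the Exponential model is an assumption for illustration, and `exponential_likelihood` is a hypothetical name.

```python
import math

def exponential_likelihood(data, lam):
    """L(lam) = product over i of lam * exp(-lam * x_i)."""
    return math.prod(lam * math.exp(-lam * x) for x in data)

# Hypothetical waiting times (made-up values).
waits = [0.5, 1.2, 0.3, 2.0, 1.0]

for lam in (0.5, 1.0, 2.0):
    print(lam, exponential_likelihood(waits, lam))
```

The data stay fixed while the parameter varies; a larger value of the likelihood means the observed data are more plausible under that rate.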
Maximum Likelihood Estimator (MLE)
- Identifies the parameter value that maximizes the likelihood function.
- The log-likelihood is often used for easier computation:
[ l(\theta) = \log L(\theta) ]
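The following sketch finds an MLE in two ways for an assumed Exponential model: a crude grid search over the log-likelihood, and the standard closed form n / sum(x_i) obtained by setting the derivative to zero. It also shows one reason the log is easier to compute with: on a larger sample the raw product underflows to zero in floating point while the log-likelihood stays finite. The data and function names are illustrative assumptions.

```python
import math

def exponential_log_likelihood(data, lam):
    """l(lam) = sum of log(lam * exp(-lam * x_i)) = n*log(lam) - lam*sum(x_i)."""
    return len(data) * math.log(lam) - lam * sum(data)

# Hypothetical waiting times (made-up values).
waits = [0.8, 1.9, 0.4, 2.7, 1.2]

# Grid search: evaluate the log-likelihood at 0.001, 0.002, ..., 5.000.
grid = [i / 1000 for i in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: exponential_log_likelihood(waits, lam))

# Closed form for the Exponential model.
lam_closed = len(waits) / sum(waits)
print(lam_grid, lam_closed)

# With 1000 observations the product of densities underflows,
# but the sum of logs is still a usable number.
big_sample = waits * 200
raw_likelihood = math.prod(lam_closed * math.exp(-lam_closed * x) for x in big_sample)
log_likelihood = exponential_log_likelihood(big_sample, lam_closed)
print(raw_likelihood, log_likelihood)
```

A grid search is used here only for transparency; in practice a numerical optimiser or the closed form would be preferred.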
Facts about the MLE
- MLE characteristics include:
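- ( l'(\hat{\theta}_{MLE}) = 0 ), indicating a stationary point of the log-likelihood (the maximum, when it lies in the interior of the parameter space).
- ( \hat{\theta}_{MLE} ) is approximately Normal in large samples, one reason it is considered a good estimator.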
- Mean Squared Error (MSE) is an important measure of performance in estimation, discussed later in the course.
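The first two facts can be checked by simulation. This sketch assumes an Exponential model with a hypothetical true rate of 2, for which the MLE is n / sum(x_i) and the large-sample standard deviation of the MLE is approximately lambda / sqrt(n) (a standard result not stated in the notes). If the Normal approximation is reasonable, roughly 95% of estimates should fall within 1.96 of these standard deviations from the true rate.

```python
import math
import random

def exponential_mle(data):
    """MLE of the rate for an Exponential model: n / sum(x_i)."""
    return len(data) / sum(data)

def score(data, lam):
    """Derivative of the Exponential log-likelihood: n/lam - sum(x_i)."""
    return len(data) / lam - sum(data)

rng = random.Random(1)  # fixed seed so the run is repeatable
true_lam, n, replications = 2.0, 200, 2000  # hypothetical settings

estimates = []
for _ in range(replications):
    sample = [rng.expovariate(true_lam) for _ in range(n)]
    estimates.append(exponential_mle(sample))

# Fact 1: the derivative of the log-likelihood vanishes at the MLE
# (checked on the last simulated sample).
last_score = score(sample, estimates[-1])

# Fact 2: the estimates are approximately Normal around the true rate.
approx_sd = true_lam / math.sqrt(n)
inside = sum(abs(est - true_lam) <= 1.96 * approx_sd for est in estimates)
coverage = inside / replications

print(last_score, coverage)
```

The coverage will not be exactly 0.95, both because of simulation noise and because the Normal approximation is only approximate at a finite sample size.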