Notes on Special Distribution and the Likelihood Function

Acknowledgement of Traditional Owners

  • Recognition of Turrbal and Yugara as First Nations owners of QUT lands.
  • Respect paid to their elders, stories, customs, and creation spirits.
  • Acknowledgment of the role of Aboriginal and Torres Strait Islander peoples at QUT.

Unit Outline

  • Overview of key units in the course:
    • Data and Variables
    • Visualisation
    • Data Gathering
    • Introduction to Probability
    • Special Distributions
    • Parameter Uncertainty
    • Hypothesis Testing
    • Linear Regression in four parts (Polynomials, Categorical Variables, Multiple Variables)
    • Model Refinement

Probability Distributions

  • Objective: Use sample data to answer questions about a population.
  • Probability distributions help in modeling likely ranges of values in predictive models.
  • Almost all statistical models are grounded in distributions.

Discrete Distributions - Review

  • Defined using a probability mass function (PMF), satisfying:
    [ \sum_{n} p_n = 1 ]
  • Key discrete distributions include:
    • Uniform
    • Poisson
    • Binomial
    • Negative Binomial
    • Geometric
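
The PMF condition above can be checked numerically. The following is a minimal sketch using only the Python standard library; the parameter values (n = 10, p = 0.3, rate 4.0) are arbitrary assumptions chosen for illustration, not values from the unit.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# The Binomial has finite support (0..n), so the sum covers every outcome.
binomial_total = sum(binomial_pmf(k, 10, 0.3) for k in range(11))

# The Poisson has infinite support; truncating at 100 terms leaves a
# negligible tail for a rate of 4.0 (illustrative value).
poisson_total = sum(poisson_pmf(k, 4.0) for k in range(100))

print(binomial_total, poisson_total)
```

Both totals should agree with 1 up to floating-point rounding.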

Continuous Distributions - Review

  • Defined using a probability density function (PDF), satisfying:
    [ \int_{x} f(x) dx = 1 ]
  • Key continuous distributions include:
    • Uniform
    • Exponential
    • Normal
    • Student's t
  • Note: Normal and t-distributions are foundational in statistical modeling.

The Normal Distribution

  • Described as a bell curve, common in statistics.
  • If a variable is normally distributed:
    [ X \sim N(\mu, \sigma^2) ]
  • PDF formula:
    [ f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2} ]
  • The curve changes with its parameters: ( \mu ) (mean) shifts its centre and ( \sigma ) (standard deviation) controls its spread.
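
The PDF formula above translates directly into code. This is a small sketch, with `normal_pdf` a name introduced here for illustration; it is cross-checked against `statistics.NormalDist` from the Python standard library.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x, written to mirror the formula."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Same mean, different spread: doubling sigma halves the peak height.
peak_narrow = normal_pdf(0.0, 0.0, 1.0)
peak_wide = normal_pdf(0.0, 0.0, 2.0)

# Cross-check against the standard library implementation.
reference = NormalDist(mu=0.0, sigma=1.0).pdf(0.0)

print(peak_narrow, peak_wide, reference)
```

Changing ( \mu ) slides the curve left or right without changing its shape, while a larger ( \sigma ) flattens and widens it.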

Estimating a Normal Distribution

  • Use sample data to estimate parameters ( \mu ) and ( \sigma ):
    • Sample mean ( \bar{x} ) estimates population mean ( \mu ).
    • Sample standard deviation ( s ) estimates population standard deviation ( \sigma ).
  • Calculate population mean from census data:
    [ \mu = \frac{1}{N} \sum_{i=1}^{N} x_i ]
  • Sample mean from sample data:
    [ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ] where ( N > n ).
  • The sample standard deviation is defined to adjust for degrees of freedom:
    [ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2} ]
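
The two estimators above can be sketched as follows. The data values are hypothetical, and the function names are introduced here only for illustration; the results are compared with `statistics.mean` and `statistics.stdev`, which use the same ( n - 1 ) divisor.

```python
import math
import statistics

def sample_mean(xs):
    """x-bar: the sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_sd(xs):
    """s: square root of the squared deviations summed and divided by n - 1."""
    xbar = sample_mean(xs)
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1))

data = [4.1, 5.3, 4.8, 6.0, 5.2]  # hypothetical sample

xbar = sample_mean(data)
s = sample_sd(data)

# The standard library versions should agree with the hand-written ones.
print(xbar, statistics.mean(data))
print(s, statistics.stdev(data))
```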

The Standard Normal Distribution

  • Defined with ( \mu = 0 ) and ( \sigma = 1 ).
  • Transformation to standard form:
    [ Z = \frac{X - \mu}{\sigma} ]
  • Useful for comparison in statistical tests.
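
As a brief illustration of the transformation, the sketch below standardises one value and confirms that a probability computed on the original scale matches the one computed on the standard scale. The mean of 100 and standard deviation of 15 are assumed values for the example.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Convert x to a z-score: the number of standard deviations from the mean."""
    return (x - mu) / sigma

mu, sigma = 100.0, 15.0  # hypothetical population parameters
x = 130.0

z = standardise(x, mu, sigma)

# P(X <= x) on the original scale equals P(Z <= z) on the standard scale.
p_direct = NormalDist(mu, sigma).cdf(x)
p_standard = NormalDist().cdf(z)

print(z, p_direct, p_standard)
```

Because every Normal variable reduces to the same reference distribution, values from differently scaled variables can be compared through their z-scores.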

Quantile-Quantile Plots

  • Tool for assessing normality visually.
  • Standardise and plot quantiles against the standard normal distribution.
  • Deviation from the reference line is common at the extreme percentiles (the tails), where outliers show up.
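
The coordinates behind a Q-Q plot can be computed without any plotting library. The sketch below is one possible approach: the data are hypothetical, and the plotting positions ( (i - 0.5)/n ) are one common convention assumed here (software packages differ on this choice).

```python
import statistics
from statistics import NormalDist

def qq_points(xs):
    """Return (theoretical quantile, standardised sample quantile) pairs."""
    n = len(xs)
    xbar = statistics.mean(xs)
    s = statistics.stdev(xs)
    # Standardise, then sort so the i-th value is the i-th sample quantile.
    sample_q = sorted((x - xbar) / s for x in xs)
    # Assumed plotting positions (i - 0.5) / n, mapped through the
    # standard normal inverse CDF.
    theoretical_q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical_q, sample_q))

data = [4.1, 5.3, 4.8, 6.0, 5.2, 4.6, 5.9, 5.0, 4.4, 5.5]  # hypothetical sample
points = qq_points(data)

for theoretical, observed in points:
    print(round(theoretical, 3), round(observed, 3))
```

If the data are roughly Normal, the pairs fall close to the line where the two quantiles are equal; departures usually appear first in the smallest and largest pairs.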

More Continuous Distributions

  • Introduce distributions relevant for specific scenarios, e.g., time between events.

Student’s t Distribution

  • Developed by William Sealy Gosset (published as ‘Student’) for small samples.
  • Known for heavy tails accounting for variability in small sample sizes.
  • Approaches the Normal distribution as the degrees of freedom ( \nu ) increase.
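
The heavy tails and the convergence to the Normal can be seen numerically. This sketch implements the standard t density with `math.lgamma` (the function name `t_pdf` and the evaluation point 3.0 are choices made here for illustration) and compares it with the standard normal density at the same point.

```python
import math
from statistics import NormalDist

def t_pdf(t, nu):
    """Density of Student's t with nu degrees of freedom at t."""
    # Work on the log scale so the gamma functions do not overflow for large nu.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(t * t / nu))

# Density three units from the centre, for increasing degrees of freedom.
normal_tail = NormalDist().pdf(3.0)
tail_by_df = {nu: t_pdf(3.0, nu) for nu in (2, 10, 100, 1000)}

for nu, density in tail_by_df.items():
    print(nu, density)
print("normal", normal_tail)
```

The t density in the tail is larger than the Normal density for small degrees of freedom and shrinks towards it as the degrees of freedom grow.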

The Exponential Distribution

  • Describes the time between events, so it takes only non-negative values.
  • Characterized by rate parameter ( \lambda ):
    [ f(x) = \lambda e^{-\lambda x}, \; x \geq 0 ]
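
A short sketch of the density above, with a numerical check that it integrates to 1 as any PDF must. The rate of 0.5 is an assumed value, and the midpoint rule on a truncated range is used purely as an illustration.

```python
import math

def exponential_pdf(x, lam):
    """Density of the Exponential(lam) distribution; zero for negative x."""
    if x < 0:
        return 0.0
    return lam * math.exp(-lam * x)

lam = 0.5   # hypothetical rate: 0.5 events per unit time
dx = 0.001

# Midpoint-rule approximation of the integral over [0, 60]; the mass beyond
# 60 is negligible for this rate.
area = sum(exponential_pdf((i + 0.5) * dx, lam) * dx for i in range(60000))

print(area)
```

The density is highest at ( x = 0 ), where it equals ( \lambda ), and decays from there, so short waiting times are the most likely.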

The Likelihood Function

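  • For a parametric probability model with parameter ( \theta ) and independent observations ( X_i ), the likelihood is:
    [ L(\theta) = \prod_{i=1}^{n} f_i(X_i; \theta) ]
  • Viewed as a function of ( \theta ), it gives the probability (or, for continuous data, the joint density) of the observed values under that parameter value.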
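
To make the product concrete, the sketch below evaluates the likelihood for an exponential model at a few candidate rates. The waiting times and the candidate values are hypothetical, and the observations are assumed to be independent and identically distributed.

```python
import math

def exponential_likelihood(lam, xs):
    """L(lam): product of the exponential densities of the observations."""
    return math.prod(lam * math.exp(-lam * x) for x in xs)

waits = [0.8, 2.1, 1.4, 0.3, 3.4]   # hypothetical waiting times
candidates = [0.2, 0.625, 2.0]      # hypothetical rates to compare

likelihoods = {lam: exponential_likelihood(lam, waits) for lam in candidates}

for lam, value in likelihoods.items():
    print(lam, value)
```

The data stay fixed while the parameter varies: a candidate with a larger likelihood is better supported by the observed values.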

Maximum Likelihood Estimator (MLE)

  • Identifies the parameter value that maximizes the likelihood function.
  • The log-likelihood is often used for easier computation:
    [ l(\theta) = \log L(\theta) ]
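
As an illustration, the sketch below maximises the log-likelihood of an exponential model by a simple grid search and compares the result with the known closed-form estimate ( n / \sum x_i ). The data are hypothetical, and the grid search is only a teaching device here, not a recommended optimiser.

```python
import math

def exponential_log_likelihood(lam, xs):
    """l(lam) = n log(lam) - lam * sum(x_i) for i.i.d. exponential data."""
    return len(xs) * math.log(lam) - lam * sum(xs)

waits = [0.8, 2.1, 1.4, 0.3, 3.4]   # hypothetical waiting times

# Evaluate the log-likelihood on a grid of candidate rates and keep the best.
grid = [i / 1000 for i in range(1, 5001)]
lam_grid = max(grid, key=lambda lam: exponential_log_likelihood(lam, waits))

# Closed-form MLE for the exponential rate: n divided by the sum of the data.
lam_closed = len(waits) / sum(waits)

print(lam_grid, lam_closed)
```

Taking logs turns the product into a sum, which is easier to differentiate and avoids numerical underflow when many small densities are multiplied; because the logarithm is increasing, the maximiser is unchanged.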

Facts about the MLE

  • MLE characteristics include:
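    • ( l'(\hat{\theta}_{MLE}) = 0 ), indicating a stationary point of the log-likelihood (when the maximum lies in the interior of the parameter space).
    • ( \hat{\theta}_{MLE} ) is approximately Normal in large samples (under regularity conditions), which is part of what makes it a good estimator.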
    • Mean Squared Error (MSE) is an important measure of performance in estimation, discussed later in the course.
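
The first fact can be illustrated with a worked derivation. The example below is a sketch for the exponential model, assuming independent and identically distributed observations ( x_1, \dots, x_n ) with rate ( \lambda ); the choice of model is for illustration only.

```latex
\begin{align*}
L(\lambda) &= \prod_{i=1}^{n} \lambda e^{-\lambda x_i}
            = \lambda^{n} e^{-\lambda \sum_{i=1}^{n} x_i}, \\
l(\lambda) &= \log L(\lambda) = n \log \lambda - \lambda \sum_{i=1}^{n} x_i, \\
l'(\lambda) &= \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0
  \quad\Longrightarrow\quad
  \hat{\lambda}_{MLE} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}, \\
l''(\lambda) &= -\frac{n}{\lambda^{2}} < 0 .
\end{align*}
```

The negative second derivative confirms that the stationary point is a maximum, so in this model the estimated rate is the reciprocal of the sample mean.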