Probability Distributions and Linear Functions Notes

Linear Functions in Statistics

  • Linear functions are commonly used in statistics.
  • A new variable is created as a function of another variable.
  • Example: Profit as a function of sales.

Profit Calculation Example

  • Profit = 0.3 \times \text{Sales} - \text{Fixed Cost}, where fixed cost is R6,000.
  • The coefficient 0.3 (that is, 30% of sales) comes from the company's records.
  • Sales is a variable.
  • If sales is R1,000, then the profit is (0.3 \times 1000) - 6000 = 300 - 6000 = -R5,700, i.e. a loss.
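The profit rule above translates directly into a one-line function. This is a minimal sketch; the constant names MARGIN and FIXED_COST are illustrative choices, while the values 0.3 and R6,000 are the ones in the notes.

```python
# Values from the notes: profit is 30% of sales minus a fixed cost of R6,000.
MARGIN = 0.3
FIXED_COST = 6000.0

def profit(sales: float) -> float:
    """Profit as a linear function of sales: 0.3 * sales - 6000."""
    return MARGIN * sales - FIXED_COST

print(profit(1000))  # -5700.0, a loss at sales of R1,000
```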

General Form of a Linear Function

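  • Z = aX + b, where a and b are constants and X is the variable; b can be positive or negative.
  • a can be a fraction (scaling X down, like division) or greater than one (scaling X up, like multiplication).
  • Because b can be negative, the + b term covers both addition and subtraction of a constant.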

Expanding the Equation with Multiple Variables

  • Z = a_1 X_1 + a_2 X_2 + \dots + a_p X_p + c, where X_1, \dots, X_p are the p variables and a_1, \dots, a_p and c are constants.
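A linear combination of several variables is just a weighted sum plus a constant. The sketch below is illustrative: the function name linear_combination is hypothetical, and the second usage line uses made-up coefficients (advertising entering with coefficient -1.0) purely to show more than one variable.

```python
from typing import Sequence

def linear_combination(coeffs: Sequence[float], xs: Sequence[float], c: float) -> float:
    """Return Z = a1*X1 + a2*X2 + ... + ap*Xp + c."""
    if len(coeffs) != len(xs):
        raise ValueError("need exactly one coefficient per variable")
    return sum(a * x for a, x in zip(coeffs, xs)) + c

# With one variable this reduces to the profit example: 0.3 * 1000 - 6000.
print(linear_combination([0.3], [1000], -6000))
# Hypothetical two-variable case: sales and advertising spend.
print(linear_combination([0.3, -1.0], [25000, 2000], -6000))
```

With p = 1 the function is exactly Z = aX + b, so the single-variable form is a special case of the multi-variable one.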
  • In practice, profit depends on sales, advertising, labor costs, etc.

Linear Combinations in Statistics

  • Powerful, useful, and mathematically simple to work with.
  • Multivariate statistics focuses on these types of models.
  • Questions include how to estimate the coefficients (a values).

Applying Linear Functions

  • Calculate the expected value of Z when we know the expected value of X.
  • Z could be profit, X could be sales.

Expected Value Calculation

  • Expected value of sales is given as 25,000.
  • a = 0.3, b = -6,000.
  • Question: What is the expected profit?
  • Expected profit = 0.3 \times \text{Expected Sales} - 6000.
  • = 0.3 \times 25000 - 6000 = 7500 - 6000 = 1500, so the expected profit is R1,500.
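The calculation above uses linearity of expectation, E[aX + b] = a E[X] + b. A minimal sketch (the function name expected_linear is hypothetical):

```python
def expected_linear(a: float, b: float, mean_x: float) -> float:
    """E[aX + b] = a * E[X] + b, by linearity of expectation."""
    return a * mean_x + b

# Notes' values: a = 0.3, b = -6000, E[Sales] = 25000.
expected_profit = expected_linear(a=0.3, b=-6000, mean_x=25000)
print(expected_profit)  # 1500.0
```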

Variance Calculation

  • Standard deviation of sales (\sigma) is given as 4,000.
  • Variance of sales (\sigma^2) = (4000)^2 = 16,000,000.

Variance Formula

  • Variance of profit = \text{Variance}(0.3 \times \text{Sales} - 6000).
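  • The additive constant falls away and the multiplier is squared, since Var(aX + b) = a^2 Var(X); so the variance of profit is 0.3^2 \times \text{Variance of Sales}.
  • Variance of profit = 0.3^2 \times (4000)^2 = 0.09 \times 16,000,000 = 1,440,000, so the standard deviation of profit is \sqrt{1,440,000} = 1,200.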
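The variance rule Var(aX + b) = a^2 Var(X) can be checked numerically with the notes' values. A minimal sketch; the function name variance_linear is hypothetical.

```python
import math

def variance_linear(a: float, var_x: float) -> float:
    """Var(aX + b) = a**2 * Var(X); the constant b does not change the spread."""
    return a ** 2 * var_x

sd_sales = 4000.0
var_profit = variance_linear(0.3, sd_sales ** 2)
sd_profit = math.sqrt(var_profit)
print(round(var_profit), round(sd_profit))  # 1440000 1200
```

Note that b never appears in variance_linear: shifting every profit value by the same constant moves the line up or down but does not change how spread out the values are.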

Interpretation of Variance

  • The variance measures how much profit fluctuates around its expected value.
  • Graph: Profit plotted against Sales is a straight line with positive slope 0.3 and y-intercept -6,000.
  • The expected profit lies on the line: at expected sales of 25,000 it is 1,500.
  • Variance/standard deviation allows studying fluctuation around expected value.
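The fluctuation around the expected value can be seen by simulation. This sketch assumes sales are normally distributed; the notes give only the mean (25,000) and standard deviation (4,000), not the shape of the distribution. The simulated profits should have a mean close to 1,500 and a standard deviation close to 1,200.

```python
import random
import statistics

def simulate_profit(n: int, mean_sales: float, sd_sales: float, seed: int = 1) -> list[float]:
    """Draw n sales values (assumed normal) and map each through 0.3 * sales - 6000."""
    rng = random.Random(seed)
    return [0.3 * rng.gauss(mean_sales, sd_sales) - 6000 for _ in range(n)]

profits = simulate_profit(100_000, 25_000, 4_000)
mean_profit = statistics.fmean(profits)
sd_profit = statistics.stdev(profits)
print(mean_profit, sd_profit)
```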

Estimation

  • The 0.3 and -6,000 are often estimated from data.
  • Sales fluctuate, but we look for a signal within that fluctuation.

Regression

  • Regression models a variable as a linear combination of other variables: simple regression uses one explanatory variable, multiple regression uses several.

Probability Distributions

  • Building blocks in statistics.
  • Distributions like binomial, Poisson, hypergeometric, etc.
  • Mathematical expressions upon which models are built.

Binomial Distribution

  • Logistic regression, widely used in practice, relies on the binomial distribution.
  • Examples: predicting whether a person should get a loan, or whether they have a disease.

Poisson Distribution

  • Underlies Poisson regression.
  • Predicting counts (e.g., the number of phone calls a call center receives).

Normal Distribution

  • Deals with continuous data.
  • Underlies least squares regression, LASSO, and ridge regression.
  • Foundation for hypothesis tests and confidence intervals.

Binomial Distribution Details

  • Examples:
    • Tossing a coin (two outcomes).
    • Throwing a die (six outcomes; binomial only once one outcome, such as rolling a six, is defined as success).
    • Inspecting light bulbs (defective or non-defective).
  • Random variable: number of heads or number of defective light bulbs.
  • If the probability of success is known, we can predict the probability of each possible count.

Success and Failure

  • Success: The event of interest (e.g., getting heads, defective product).
  • Probability of success: \pi.
  • Probability of failure: 1 - \pi.
  • For defective products, \pi is an empirical probability, estimated from observed data.

Applications

  • Manufacturing plant (defective/non-defective).
  • Contracts (yes/no).
  • Each trial always involves a yes/no outcome.

Combinations

  • n: the sample size (number of independent trials).
  • Each trial has two outcomes: success and failure.
  • X is the random variable, and x denotes its possible values (0, 1, 2, …, n).

Formula for Binomial Distribution

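  • P(X = x) = \binom{n}{x} \pi^x (1 - \pi)^{n-x}, where \binom{n}{x} ("n choose x") is the number of ways to choose which x of the n trials are successes.
  • Evaluating this for x = 0, 1, …, n completes the table of probabilities.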
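The binomial formula can be evaluated for every x to build the probability table. A minimal sketch using math.comb for \binom{n}{x}; the function names are hypothetical, and the three fair coin tosses are an illustrative choice.

```python
from math import comb

def binomial_pmf(x: int, n: int, pi: float) -> float:
    """P(X = x) = C(n, x) * pi**x * (1 - pi)**(n - x); zero outside 0..n."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def binomial_table(n: int, pi: float) -> list[float]:
    """Probabilities for x = 0, 1, ..., n."""
    return [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Three fair coin tosses, success = heads.
for x, p in enumerate(binomial_table(3, 0.5)):
    print(x, p)
```

The printed table is 0.125, 0.375, 0.375, 0.125 for x = 0, 1, 2, 3, and its entries sum to 1.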

Properties of Binomial Distribution

  • Expected value: E[X] = n \pi
  • Variance: Var(X) = n \pi (1 - \pi)
  • \mu = n \pi, \sigma^2 = n \pi (1 - \pi)
  • X \sim \text{Binomial}(n, \pi)
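The shortcut formulas E[X] = n\pi and Var(X) = n\pi(1 - \pi) can be checked against the mean and variance computed directly from the probability table. The values n = 10 and \pi = 0.2 (say, 10 bulbs with a 20% defect rate) are hypothetical.

```python
from math import comb

def binomial_moments(n: int, pi: float) -> tuple[float, float]:
    """Mean and variance computed directly from the binomial probabilities."""
    probs = [comb(n, x) * pi ** x * (1 - pi) ** (n - x) for x in range(n + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var

n, pi = 10, 0.2
mean, var = binomial_moments(n, pi)
print(mean, n * pi)            # both about 2.0
print(var, n * pi * (1 - pi))  # both about 1.6
```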

Poisson Distribution Details

  • Has a different setup from the binomial: there is no fixed number of trials n.
  • Instead, it counts occurrences within an area of opportunity.

Area of Opportunity Examples

  • Time: number of phone calls per day (the day is the area of opportunity).
  • Space: number of scratches on a car per square meter.
  • The variable is still a discrete (count) variable.

Modeling with Poisson Distribution

  • The number of computer crashes per day.
  • The number of mosquito bites on a person.
  • Lambda (\lambda): the expected number of occurrences per area of opportunity.
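The notes stop at \lambda. For reference, the standard Poisson probability formula is P(X = x) = e^{-\lambda} \lambda^x / x! for x = 0, 1, 2, …, and its mean is \lambda. The sketch below evaluates it; the value \lambda = 3 crashes per day is a hypothetical choice for illustration.

```python
import math

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ...; zero for x < 0."""
    if x < 0:
        return 0.0
    return math.exp(-lam) * lam ** x / math.factorial(x)

lam = 3.0  # hypothetical: 3 computer crashes expected per day
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 4))
```

Unlike the binomial, x has no upper limit n, which matches the "no fixed number of trials" setup.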