Probability Distributions and Linear Functions Notes
Linear Functions in Statistics
- Linear functions are commonly used in statistics.
- A new variable is created as a function of another variable.
- Example: Profit as a function of sales.
Profit Calculation Example
- Profit = 0.3 \times \text{Sales} - \text{Fixed Cost}, where fixed cost is R6,000.
- The coefficient 0.3 (that is, 30% of sales) comes from the company's records.
- Sales is a variable.
- If sales is R1,000, then the profit is (0.3 \times 1000) - 6000 = -5,700, i.e., a loss of R5,700.
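The profit formula above can be written as a small function. This is only a sketch: the names MARGIN, FIXED_COST and profit are chosen here for illustration, and the sales levels other than R1,000 are made-up inputs.

```python
MARGIN = 0.3        # share of sales kept as profit (from the notes)
FIXED_COST = 6000   # fixed cost in rand (from the notes)


def profit(sales):
    """Return profit in rand for a given sales figure in rand."""
    return MARGIN * sales - FIXED_COST


# Sales levels chosen for illustration; only R1,000 appears in the notes.
for sales in (1000, 20000, 25000):
    print(f"sales = R{sales:>6,}  profit = R{profit(sales):>9,.2f}")
```

A negative result means the fixed cost is not yet covered at that level of sales.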
General Form of a Linear Function
- Z = aX + b, where X is the variable and a and b are constants.
- a scales X: it can be a fraction (which amounts to dividing) or greater than one (multiplying).
- b shifts the result: a positive b adds and a negative b subtracts, as with the fixed cost (b = -6000).
Expanding the Equation with Multiple Variables
- Z = a_1 X_1 + a_2 X_2 + \dots + a_p X_p + c, where X_1, \dots, X_p are the variables and a_1, \dots, a_p and c are constants.
- In practice, profit depends on sales, advertising, labor costs, etc.
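A linear combination of several variables is just a weighted sum plus a constant. The sketch below assumes a hypothetical three-variable profit model; the function name linear_combination and every coefficient and input except 0.3 and -6000 are invented for illustration and do not come from the notes.

```python
def linear_combination(coeffs, values, constant):
    """Return a1*x1 + a2*x2 + ... + ap*xp + c."""
    if len(coeffs) != len(values):
        raise ValueError("need exactly one coefficient per variable")
    return sum(a * x for a, x in zip(coeffs, values)) + constant


# Hypothetical model: profit from sales, advertising spend and labor cost.
coeffs = [0.3, 1.5, -1.0]       # a1, a2, a3 (a2 and a3 are made-up values)
values = [25000, 2000, 3000]    # x1, x2, x3 in rand (made-up values)
z = linear_combination(coeffs, values, constant=-6000)
print(z)
```

With a single variable the same function reduces to Z = aX + b.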
Linear Combinations in Statistics
- Powerful, useful, and mathematically simple to work with.
- Multivariate statistics focuses on these types of models.
- Questions include how to estimate the coefficients (a values).
Applying Linear Functions
- Calculate the expected value of Z if we know something about X.
- Z could be profit, X could be sales.
Expected Value Calculation
- Expected value of sales is given as 25,000.
- a = 0.3, b = -6,000 (the fixed cost is subtracted).
- Question: What is the expected profit?
- Expected profit = 0.3 \times \text{Expected Sales} - 6000.
- = 0.3 \times 25000 - 6000 = 7500 - 6000 = 1,500.
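The calculation above uses the rule E[aX + b] = a E[X] + b. A minimal sketch with the numbers from the notes; the function name expected_linear is an illustrative choice.

```python
def expected_linear(a, b, mean_x):
    """Expected value of Z = a*X + b, given the expected value of X."""
    return a * mean_x + b


# a = 0.3, b = -6000 (fixed cost subtracted), expected sales = 25,000.
expected_profit = expected_linear(a=0.3, b=-6000, mean_x=25000)
print(expected_profit)
```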
Variance Calculation
- Standard deviation (\sigma) is given as 4,000.
- Variance of sales (\sigma^2) = (4000)^2 = 16,000,000.
Variance Formula
- Variance of profit = \text{Variance}(0.3 \times \text{Sales} - 6000).
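- The constant -6000 falls away because adding a constant does not change the spread, and the coefficient is squared: Variance of profit = 0.3^2 \times \text{Variance of Sales}.
- Variance of profit = 0.3^2 \times (4000)^2 = 0.09 \times 16,000,000 = 1,440,000.
- Standard deviation of profit = \sqrt{1,440,000} = 0.3 \times 4000 = 1,200.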
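The variance rule Var(aX + b) = a^2 Var(X) can be checked with a few lines of arithmetic. This sketch uses the numbers from the notes; the function name variance_linear is an illustrative choice.

```python
import math


def variance_linear(a, var_x):
    """Variance of Z = a*X + b; the constant b does not affect it."""
    return a**2 * var_x


sd_sales = 4000
var_sales = sd_sales**2                      # 16,000,000
var_profit = variance_linear(0.3, var_sales)
sd_profit = math.sqrt(var_profit)            # equals |a| * sd_sales
print(var_profit, sd_profit)
```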
Interpretation of Variance
- The variance tells us something about the fluctuation of the profit.
- Graph: Sales vs. Profit with a positive slope and a negative y-intercept at -6,000.
- The slope is 0.3.
- The expected profit lies on the line, at the point where sales equals its expected value: (25,000, 1,500).
- Variance/standard deviation allows studying fluctuation around expected value.
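A short simulation can make the fluctuation around the expected value visible. The sketch below assumes sales are normally distributed with mean 25,000 and standard deviation 4,000; the notes give only the mean and standard deviation, so the normal shape is an assumption made purely for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumption: sales follow a normal distribution (not stated in the notes).
sales = [rng.gauss(25000, 4000) for _ in range(10_000)]
profits = [0.3 * s - 6000 for s in sales]

mean_profit = statistics.fmean(profits)
sd_profit = statistics.pstdev(profits)
print(round(mean_profit), round(sd_profit))
```

The simulated mean and standard deviation should land close to the theoretical 1,500 and 1,200, and the standard deviation of the profits equals 0.3 times that of the simulated sales.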
Estimation
- The 0.3 and -6,000 are often estimated from data.
- Sales fluctuate, but we look for a signal within that fluctuation.
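One common way to estimate the slope and intercept from data is least squares. The sketch below generates synthetic data around the line from the notes and then recovers the two constants; the noise level, sample size, sales range and variable names are all made up for illustration.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic data: the line from the notes plus made-up noise.
sales = [rng.uniform(10000, 40000) for _ in range(200)]
profits = [0.3 * s - 6000 + rng.gauss(0, 500) for s in sales]

n = len(sales)
mean_x = sum(sales) / n
mean_y = sum(profits) / n

# Least squares: slope = S_xy / S_xx, intercept from the two means.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sales, profits))
sxx = sum((x - mean_x) ** 2 for x in sales)
a_hat = sxy / sxx
b_hat = mean_y - a_hat * mean_x
print(f"estimated a = {a_hat:.3f}, estimated b = {b_hat:.0f}")
```

The estimates will not match 0.3 and -6,000 exactly; the noise in the data is the fluctuation, and the fitted line is the signal.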
Regression
- Regression deals with linear combinations (simple and multiple regression).
Probability Distributions
- Building blocks in statistics.
- Distributions like binomial, Poisson, hypergeometric, etc.
- Mathematical expressions upon which models are built.
Binomial Distribution
- Logistic regression, widely used in practice, relies on the binomial distribution.
- It predicts yes/no outcomes, e.g., whether a person should get a loan or whether they have a disease.
Poisson Distribution
- Associated method: Poisson regression.
- Predicting counts (e.g., the number of phone calls a call center receives).
Normal Distribution
- Deals with continuous data.
- Associated methods: least squares regression, LASSO, and ridge regression.
- Foundation for hypothesis tests and confidence intervals.
Binomial Distribution Details
- Examples:
- Tossing a coin (two outcomes).
- Throwing a die (six outcomes; binomial only once these are grouped into two, e.g., a six or not a six).
- Inspecting light bulbs (defective or non-defective).
- Random variable: number of heads or number of defective light bulbs.
- Predict outcomes if probabilities are known.
Success and Failure
- Success: The event of interest (e.g., getting heads, defective product).
- Probability of success: \pi.
- Probability of failure: 1 - \pi.
- Empirical probabilities are used for defective products.
Applications
- Manufacturing plant (defective/non-defective).
- Contracts (yes/no).
- Always a yes or no involved.
Combinations
- Sample size: n.
- Two outcomes: success and failure.
- X is the random variable, and x denotes its possible values (0, 1, 2, …, n).
Formula for Binomial Distribution
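- P(X = x) = \binom{n}{x} \pi^x (1 - \pi)^{n-x}, where \binom{n}{x} is "n choose x", the number of ways to place x successes among n trials.
- Evaluating the formula for each x = 0, 1, …, n completes the table of probabilities.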
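The table of probabilities can be produced directly from the formula. This sketch uses three tosses of a fair coin as an illustrative case (n = 3 and \pi = 0.5 are chosen here, not taken from the notes); binomial_pmf is an illustrative name.

```python
from math import comb


def binomial_pmf(x, n, pi):
    """P(X = x) for X ~ Binomial(n, pi)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)


n, pi = 3, 0.5  # three tosses of a fair coin (illustrative choice)
table = {x: binomial_pmf(x, n, pi) for x in range(n + 1)}
for x, p in table.items():
    print(f"P(X = {x} heads) = {p:.3f}")
```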
Properties of Binomial Distribution
- Expected value: \mu = E[X] = n \pi
- Variance: \sigma^2 = Var(X) = n \pi (1 - \pi)
- X \sim \text{Binomial}(n, \pi)
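The two formulas can be checked numerically by computing the mean and variance straight from the probability table. The values n = 20 and \pi = 0.05 are a hypothetical light bulb inspection, not figures from the notes.

```python
from math import comb

n, pi = 20, 0.05  # hypothetical: 20 bulbs inspected, 5% defective

# Probability of x defective bulbs, for x = 0, 1, ..., n.
pmf = [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]

# Mean and variance computed from the definition, not from the shortcuts.
mean = sum(x * p for x, p in enumerate(pmf))
variance = sum((x - mean) ** 2 * p for x, p in enumerate(pmf))

print(mean, n * pi)                  # both should agree
print(variance, n * pi * (1 - pi))   # both should agree
```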
Poisson Distribution Details
- Has a different setup than the binomial.
- Deals with an area of opportunity.
Area of Opportunity Examples
- Time: Number of phone calls per day (the day is the area of opportunity).
- Space: Number of scratches on a car per square meter.
- The variable is still a discrete variable.
Modeling with Poisson Distribution
- The number of computer crashes per day.
- The number of mosquito bites on a person.
- Lambda (\lambda): Expected value per area of opportunity.
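The notes stop before giving a formula, so the sketch below uses the standard Poisson probability function, P(X = x) = e^{-\lambda} \lambda^x / x!, for x = 0, 1, 2, …. The value \lambda = 3 crashes per day is a made-up example, and poisson_pmf is an illustrative name.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson count with expected value lam."""
    return exp(-lam) * lam**x / factorial(x)


lam = 3  # hypothetical: 3 computer crashes per day on average
for x in range(6):
    print(f"P(X = {x}) = {poisson_pmf(x, lam):.4f}")

# Summing far enough into the tail: probabilities add to 1, mean equals lam.
total = sum(poisson_pmf(x, lam) for x in range(60))
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
```

Unlike the binomial, there is no fixed upper limit n on the count; the sum is cut off at 60 only because the remaining probabilities are negligible for this \lambda.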