Probabilistic Models
Fitting the Model: The Least Squares Approach
Model Assumptions
Assessing the Utility of the Model: Making Inferences about the Slope β1
The Coefficients of Correlation and Determination
Using the Model for Estimation and Prediction
A Complete Example
Purpose: Relate one quantitative variable to another quantitative variable.
Assessing Model Fit: Determine how well the model represents the sample data.
Definition: Represents phenomena with relationships between variables.
Types:
Deterministic Models: Exact relationships with negligible prediction error (e.g., y = 15x).
Probabilistic Models: Incorporate a random error term (e.g., y = 15x + ε) that accounts for variation due to variables not included in the model.
Equation: y = Deterministic component + Random error
Assumption: The mean value of the random error (ε) is 0, which leads to E(y) = Deterministic component.
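A minimal Python sketch of this idea, under stated assumptions: the error is taken to be normal with standard deviation 1 (an illustrative choice, not from these notes), and simulate_y is a hypothetical helper. Averaging many draws of y = 15x + ε at a fixed x should land close to the deterministic component 15x.

```python
import random

def simulate_y(x, n_draws=10_000, sigma=1.0, seed=0):
    """Draw y = 15x + eps repeatedly at one fixed x.

    Assumption for illustration: eps ~ Normal(0, sigma), with sigma = 1.
    """
    rng = random.Random(seed)
    return [15 * x + rng.gauss(0.0, sigma) for _ in range(n_draws)]

draws = simulate_y(x=2)
mean_y = sum(draws) / len(draws)

# Because the errors average out to roughly 0, the simulated mean of y
# should be close to the deterministic component 15 * 2 = 30.
print(f"deterministic component: {15 * 2}, simulated mean of y: {mean_y:.3f}")
```

Individual draws still differ from 30; only their average settles near the deterministic component.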
Equation: y = β0 + β1x + ε
y: Dependent variable (response)
x: Independent variable (predictor)
β0: y-intercept (mean value of y when x = 0)
β1: Slope (change in the mean of y for a 1-unit increase in x)
Scatterplot: Visual representation of the (xi, yi) pairs, used to judge whether a straight-line model is plausible.
Properties:
Mean error = 0.
Minimum sum of squared errors (SSE).
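Equation for estimates: ŷ = β̂0 + β̂1x
Formulas for estimates:
\( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \)
\( \hat{\beta}_1 = \frac{SS_{xy}}{SS_{xx}} \)
where \( SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \) and \( SS_{xx} = \sum (x_i - \bar{x})^2 \).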
Data collected over 5 months showing advertising expenditure and sales revenue.
Calculated estimates: ŷ = -0.1 + 0.7x (the slope 0.7 is the estimated increase in mean sales revenue for each 1-unit increase in advertising expenditure).
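The least squares formulas can be sketched in Python as follows. The five data points are an assumption: they are illustrative values chosen because they give the same fitted line as the one quoted above, not data taken from these notes. The function name least_squares is likewise hypothetical.

```python
def least_squares(x, y):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx          # slope estimate
    b0 = y_bar - b1 * x_bar     # intercept estimate
    return b0, b1

# Illustrative data (assumed): x = advertising expenditure, y = sales revenue.
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

b0, b1 = least_squares(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)

print(f"y-hat = {b0:.1f} + {b1:.1f}x")
print(f"sum of errors = {sum(residuals):.4f}, SSE = {sse:.2f}")
```

The residual check at the end illustrates the two properties of the least squares line: the errors sum to (numerically) zero, and SSE is the smallest value any straight line can achieve on these points.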
Mean of probability distribution of ε is 0.
Variance of ε is constant across all x values.
Normal distribution of ε.
Independence of ε values.
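Under these assumptions, the sampling distribution of the slope estimator β̂1 is normal with mean β1.
Tests of significance can then be conducted (e.g., testing H0: β1 = 0 against Ha: β1 ≠ 0 to check whether x contributes information for predicting y).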
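As a rough sketch of such a test, the snippet below computes the usual t statistic for H0: β1 = 0, namely t = β̂1 / (s / √SS_xx) with s² = SSE / (n − 2). This formula is the standard textbook one and is not spelled out in these notes; the data and the function name slope_t_statistic are assumptions for illustration.

```python
import math

def slope_t_statistic(x, y):
    """Return (t, df) for testing H0: beta1 = 0 in simple linear regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    sse = ss_yy - b1 * ss_xy            # shortcut form of the sum of squared errors
    s = math.sqrt(sse / (n - 2))        # estimated standard deviation of eps
    se_b1 = s / math.sqrt(ss_xx)        # estimated standard error of the slope
    return b1 / se_b1, n - 2

# Illustrative data (assumed, not from the notes).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

t_stat, df = slope_t_statistic(x, y)
print(f"t = {t_stat:.2f} with {df} degrees of freedom")
```

The resulting |t| would then be compared with a critical value from a t table for the chosen significance level and n − 2 degrees of freedom.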
Range: -1 to +1; measures the strength and direction of the linear relationship between x and y.
Formula:
\( r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}} \)
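A short Python sketch of this formula, using illustrative data that is assumed rather than taken from the notes (the function name correlation is also hypothetical):

```python
import math

def correlation(x, y):
    """Coefficient of correlation r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Illustrative data (assumed).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

r = correlation(x, y)
print(f"r = {r:.3f}")
```

Because r shares its numerator SS_xy with the slope estimate, r and β̂1 always have the same sign.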
Measures the proportion of the total sample variability in y that is explained by the linear relationship with x.
Formula:
\( r^2 = 1 - \frac{SSE}{SS_{yy}} \)
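The same quantity can be sketched in Python by fitting the line, summing the squared residuals to get SSE, and comparing with SS_yy. The data and the function name r_squared are assumptions for illustration only.

```python
def r_squared(x, y):
    """Coefficient of determination r^2 = 1 - SSE / SS_yy."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    # SSE: squared deviations of the observed y values from the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / ss_yy

# Illustrative data (assumed).
x = [1, 2, 3, 4, 5]
y = [1, 1, 2, 2, 4]

r2 = r_squared(x, y)
print(f"r^2 = {r2:.3f}")
```

A value near 1 means the fitted line removes most of the variability around ȳ; a value near 0 means it removes very little.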
Probabilistic Model: Combines deterministic elements with random error.
Method of Least Squares: Estimates the intercept and slope of the regression line by minimizing the sum of squared errors (SSE).
Utility of Model: Conduct hypothesis tests to evaluate significance of predictor variable.
Coefficient of Correlation: Evaluates the strength and direction of the linear association between variables without implying causation.
Coefficient of Determination: Provides insight into the effectiveness of the model in explaining variability of the dependent variable.