Input Variable
The input of a model/function
Output Variable
The output of a model/function
Reducible Error
Error in the prediction caused by the model we construct not being a perfect estimate of the true model; it can be reduced by estimating the model better
Irreducible Error
Error we cannot reduce no matter how well we estimate the model; it comes from noise and factors not captured by the inputs
Goals of Prediction
We want to use the inputs in order to predict the values of the output variable
Goals of Inference
We want to understand the relationship between the independent and dependent variables so that we can influence the output variable by manipulating one or more of the input variables
Parametric Approach
Makes assumptions about the form or shape of the model (e.g., linear, exponential)
Uses training data to estimate the model's parameters
Non-parametric Approach
Does not make assumptions about the functional form of f
Seeks an estimate of f that is:
as close to the data points as possible
not too wiggly or rough (the two approaches are contrasted in the sketch below)
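As a minimal sketch of the contrast (the toy data and the choice of k are assumptions for illustration, not from the course), the parametric fit below assumes the form y = b0 + b1*x and estimates two parameters, while the non-parametric prediction simply averages the k nearest training points and assumes no functional form:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = np.sin(x) + rng.normal(0, 0.3, size=x.size)    # noisy, non-linear toy data

# Parametric: assume y = b0 + b1*x and estimate the two parameters.
b1, b0 = np.polyfit(x, y, deg=1)

# Non-parametric: k-nearest-neighbour average, no assumed functional form.
def knn_predict(x0, k=5):
    nearest = np.argsort(np.abs(x - x0))[:k]        # indices of the k closest points
    return y[nearest].mean()                        # average their responses

print("parametric prediction at x=2:", b0 + b1 * 2)
print("non-parametric prediction at x=2:", knn_predict(2))
```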
Regression
Usually has a quantitative response
Classification
Usually has a qualitative response
When do we use simple linear regression?
When you want to model, understand, or predict the linear relationship between one continuous independent variable (X) and one continuous dependent variable (Y)
Estimating coefficients
Estimate the coefficients using the training data
Find the intercept and the slope so that the line is as close as possible to all data points
Closeness is measured with the least squares method; a sketch of the calculation follows
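A minimal sketch of least squares estimation with the standard closed-form formulas, assuming made-up toy data: the slope estimate divides the co-variation of X and Y by the variation of X, and the intercept forces the line through the point of means.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Least squares estimates for y = b0 + b1*x
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

residuals = y - (b0 + b1 * x)          # what the fitted line fails to explain
print(b0, b1, np.sum(residuals ** 2))  # intercept, slope, residual sum of squares
```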
Assessing coefficients
We use standard errors to compute confidence intervals for the coefficients
Standard errors are also used to perform hypothesis testing
Details of the processes and computations are beyond the scope of this course
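The computations themselves are out of scope here, but as a hedged illustration of what these quantities look like in practice, a library such as statsmodels reports the standard errors, confidence intervals, and p-values directly (the data below are assumed toy values):

```python
import numpy as np
import statsmodels.api as sm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

X = sm.add_constant(x)              # adds the intercept column
fit = sm.OLS(y, X).fit()

print(fit.bse)                      # standard errors of intercept and slope
print(fit.conf_int(alpha=0.05))     # 95% confidence intervals
print(fit.pvalues)                  # p-values for H0: coefficient = 0
```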
Assessing the model's accuracy
The quality of a linear regression fit is typically assessed using two related quantities
RSE (Residual Standard Error): an absolute measure of the lack of fit
R-squared: an alternative, relative measure of fit
takes the form of a proportion
always between 0 and 1
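A small sketch of how both quantities come out of the residuals, assuming toy data; for simple linear regression the RSE uses n − 2 degrees of freedom:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1, b0 = np.polyfit(x, y, deg=1)

rss = np.sum((y - (b0 + b1 * x)) ** 2)    # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)         # total sum of squares

rse = np.sqrt(rss / (len(x) - 2))         # absolute measure of lack of fit
r_squared = 1 - rss / tss                 # proportion of variance explained
print(rse, r_squared)
```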
When to use multiple linear regression?
When you want to estimate a model with more than one predictor
Estimating and assessing coefficients (MLR)
coefficients are estimated using the same least squares approach as simple linear regression
the model dimension increases as the number of predictors increases
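A minimal sketch of fitting a model with two predictors by least squares, using np.linalg.lstsq on an assumed toy data set:

```python
import numpy as np

# Two predictors (columns) and one response; values are illustrative.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([5.1, 5.9, 11.2, 12.8, 16.1])

X_design = np.column_stack([np.ones(len(X)), X])       # intercept column + predictors
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)  # [b0, b1, b2]
print(coeffs)
```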
Important Predictors
If the F-statistic is statistically significant, at least one of the predictors is related to Y
Which individual predictors are significant is determined by their p-values
Comparing the R-squared contribution of each predictor shows which one has the biggest impact on Y
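A hedged sketch of reading off the overall F-test and the per-predictor p-values with statsmodels, on the same assumed toy data as above:

```python
import numpy as np
import statsmodels.api as sm

X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.1, 5.9, 11.2, 12.8, 16.1])

fit = sm.OLS(y, sm.add_constant(X)).fit()
print(fit.fvalue, fit.f_pvalue)   # overall F-test: is at least one predictor related to Y?
print(fit.pvalues)                # per-coefficient p-values: which predictors matter?
```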
Multiple R-Squared
As more predictors are added to the regression model, the multiple R-squared keeps increasing, just due to chance.
a model with many predictors tends to capture random noise in the data
Adjusted R-Square
a modified version of R-squared that has been adjusted for the number of predictors in the model
allows comparing the explanatory power of models with different numbers of predictors
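A small sketch of the usual adjustment formula, where n is the number of observations and p the number of predictors (the example values are assumptions):

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)

# Example: R^2 = 0.92 with 50 observations and 6 predictors.
print(adjusted_r_squared(0.92, n=50, p=6))
```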
Non-linear Relationships
LR assumes a linear relationship between response and predictors
In many cases, the true relationship is non-linear
Polynomial Regression
Includes polynomial terms of the predictor (e.g., a quadratic term)
Models non-linear relationships in the data by fitting a curve to the relationship between an independent variable and a dependent variable
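A minimal sketch of a quadratic fit with numpy, on assumed toy data: the model is still linear in its coefficients, but the predictor enters through powers of x, so the fitted curve can bend.

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 2.8, 7.1, 13.2, 21.4, 31.0])   # roughly quadratic in x

# Fit y = b0 + b1*x + b2*x^2 by least squares.
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(b0, b1, b2)
print(np.polyval([b2, b1, b0], 2.5))              # predicted value at x = 2.5
```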
Qualitative Predictors
can include qualitative variables as predictors
If a qualitative variable has only two levels, a single dummy variable coded 0/1 is used
For qualitative predictors with three or more levels, we need to create one dummy variable fewer than the number of levels (k − 1 dummies for k levels)
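A sketch of dummy coding with pandas (the column names and values are illustrative assumptions): a two-level variable becomes one 0/1 column, and a three-level variable becomes two columns when drop_first=True.

```python
import pandas as pd

df = pd.DataFrame({
    "income": [50, 60, 55, 70, 65],
    "region": ["East", "West", "North", "East", "North"],  # 3 levels -> 2 dummies
    "student": ["Yes", "No", "No", "Yes", "No"],           # 2 levels -> 1 dummy
})

dummies = pd.get_dummies(df, columns=["region", "student"], drop_first=True)
print(dummies)
```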
Probability Sampling
random selection
allows statistical inference about the entire population
Nonprobability Sampling
Non-random selection based on convenience or other criteria
easy to collect initial data
does not support statistical inference because of potential selection bias
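A small sketch of the contrast, assuming a made-up frame of 1,000 units: the probability sample draws units at random from the whole frame, while the convenience sample just takes the first units encountered.

```python
import numpy as np

rng = np.random.default_rng(42)
population = np.arange(1000)                        # assumed sampling frame of 1,000 units

probability_sample = rng.choice(population, size=50, replace=False)  # simple random sample
convenience_sample = population[:50]                                 # first 50 units only

print(probability_sample[:10])
print(convenience_sample[:10])
```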