9.30 Notes
Exam Results
Exam Duration: Approximately 60 minutes
Average Score: 138 or 139 out of 150
Top Performance: ~30% of students scored 150 out of 150
Difficulty Progression:
Current exam difficulty: 4/10 to 5/10
Future exams up to 7/10 to 7.5/10
Class Changes
Attendance changes:
Transition from quizzes to regular attendance tracking
Signing attendance sheet required at the end of class
Linear Regression Overview
Session Focus: Linear regression and introduction to logistic regression
Key File: insurance.csv
Linear Regression Requirements:
All input variables must be numerical
Attributes in insurance.csv
Output Label: Charges
Attributes and measurement types:
Age (Numerical)
Sex (Categorical: Male/Female)
Conversion: Male=0, Female=1
Smoker (Categorical: Yes/No)
Conversion: Yes=1, No=0
Region (Categorical with four types)
Method: One-hot encoding needed
Categorical Variable Handling
One-hot Encoding Explanation:
For categorical variables with multiple levels, create separate columns for each category except one.
Example for regions (Northeast, Northwest, Southeast, Southwest):
Create three columns: IsNortheast, IsNorthwest, IsSoutheast
0 or 1 indicator for each
Correlation Issues:
Multicollinearity: Avoid perfect correlation among independent variables
Example: if both gender columns are created, one is redundant and must be removed.
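The encoding steps above can be sketched in pandas. This is a minimal sketch on a few made-up rows that mimic insurance.csv's schema (the values are illustrative, not the real file); `drop_first=True` keeps three of the four region indicators, dropping the alphabetically first category (northeast) as the baseline to avoid the perfect-correlation problem.

```python
import pandas as pd

# Hypothetical rows mimicking insurance.csv's schema (illustrative values only).
df = pd.DataFrame({
    "age": [19, 33, 45, 52],
    "sex": ["female", "male", "male", "female"],
    "smoker": ["yes", "no", "no", "yes"],
    "region": ["southwest", "northeast", "southeast", "northwest"],
    "charges": [16884.92, 4449.46, 8240.59, 25000.00],
})

# Binary categoricals become single 0/1 columns.
df["sex"] = df["sex"].map({"male": 0, "female": 1})
df["smoker"] = df["smoker"].map({"no": 0, "yes": 1})

# One-hot encode region; drop_first=True keeps 3 of the 4 indicator
# columns, avoiding perfect correlation among the dummies.
df = pd.get_dummies(df, columns=["region"], drop_first=True)
print(df.columns.tolist())
```

Dropping one dummy column is exactly the redundancy fix described above: the omitted category is implied when all remaining indicators are 0.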
Data Preparation Steps
Convert categorical variables into numerical format before regression analysis
Verify correct data types for attributes (integer, binomial, real)
Importance of Age and Smoking on Charges
Hypothesis: Older patients incur higher charges
If age shows a negative coefficient in regression, the model is flawed
Smoker status expected to show higher charges due to associated health risks
Model Application
Check for missing values before model fitting
Key Steps in Model Setup:
Split data into training (80%) and testing (20%) sets
Perform linear regression modeling
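The model-setup steps above can be sketched with scikit-learn. The data here is a synthetic numeric stand-in for the encoded insurance attributes (an assumption; the real exercise loads the prepared insurance.csv), but the workflow is the same: check for missing values, split 80/20, then fit.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the encoded insurance data (assumption:
# the real exercise uses the prepared insurance.csv instead).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "age": rng.integers(18, 65, 200),
    "smoker": rng.integers(0, 2, 200),
})
y = 250 * X["age"] + 20000 * X["smoker"] + rng.normal(0, 1000, 200)

# Check for missing values before fitting.
assert X.isna().sum().sum() == 0

# 80/20 train/test split, then fit the regression.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.intercept_)
```

Note the sanity check from the hypothesis above: the fitted age coefficient should come out positive.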
Performance Metrics
R-squared: Measurement of variance explained by the model (unitless).
RMSE (Root Mean Square Error): Has units (dollars here); comparable between models on the same target, but not across targets with different units or scales.
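Both metrics can be computed directly; the actual/predicted charges below are toy numbers chosen only to show the units point (R² is a unitless fraction, RMSE lands in dollars).

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

# Toy actual vs. predicted charges (illustrative numbers only).
y_true = np.array([12000.0, 4500.0, 30000.0, 8000.0])
y_pred = np.array([11000.0, 5000.0, 28000.0, 9000.0])

r2 = r2_score(y_true, y_pred)                      # unitless, variance explained
rmse = mean_squared_error(y_true, y_pred) ** 0.5   # in dollars
print(round(r2, 3), round(rmse, 1))
```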
Regularization in Linear Regression
Purpose: Remove variables that do not contribute significantly to model accuracy
Typically, p-values > 0.05 indicate a variable could be removed, although other factors may warrant keeping it in the model.
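One automated route to the pruning described above is L1 (lasso) regularization, which shrinks the coefficients of weakly contributing variables all the way to zero. This is a sketch on synthetic data (the p-value approach above would instead use a statistics package's regression summary); the third feature is pure noise by construction.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
# Only the first two features actually drive the target;
# the third is noise and should be regularized away.
y = 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(0, 0.5, 200)

lasso = Lasso(alpha=0.5).fit(X, y)
print(lasso.coef_)  # the noise feature's coefficient is driven to (near) zero
```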
Building Decision Trees
Decision Trees can handle categorical data directly; no need for one-hot encoding.
Splitting criterion: gain ratio (for classification).
Optimization Techniques
Hyperparameter tuning through grid search:
Using tools to find optimal hyperparameters improves model efficacy.
Example: Adjusting max depth and min size of splits in decision trees.
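The grid-search example above can be sketched with scikit-learn's GridSearchCV on synthetic data; max_depth and min_samples_split play the roles of the max depth and min split size mentioned in class (parameter names follow sklearn's DecisionTreeClassifier, an assumption about tooling).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic binary classification problem (illustrative only).
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Grid over tree depth and minimum samples required to split a node.
param_grid = {"max_depth": [2, 4, 6], "min_samples_split": [2, 10, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

GridSearchCV tries every combination with cross-validation and keeps the one with the best mean score, which is the "find optimal hyperparameters" step described above.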
Logistic Regression Overview
Primary Use: Classification problems, especially when outcome variable is binary (0 or 1)
Sigmoid Function: Converts linear model outputs to probabilities between 0 and 1.
Formula: sigmoid(Z) = 1 / (1 + e^(-Z))
Output helps determine prediction cutoff at 0.5 for classification decisions.
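The sigmoid and the 0.5 cutoff above are a few lines of Python (a minimal sketch; the cutoff function is a hypothetical helper for illustration):

```python
import math

def sigmoid(z):
    """Map a linear-model output z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, cutoff=0.5):
    """Predict class 1 when the probability reaches the cutoff."""
    return 1 if sigmoid(z) >= cutoff else 0

print(sigmoid(0))      # 0.5, the decision boundary
print(classify(2.0))   # 1
print(classify(-2.0))  # 0
```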
Loss Function in Logistic Regression
Measures the total error in prediction; if squared error is applied to the sigmoid output, the loss surface is non-convex and can have multiple local minima.
This differs from linear regression, where the squared-error loss is convex and local and global minima coincide; logistic regression therefore uses the log loss (cross-entropy) instead.
Practical Application of Logistic Regression
Walked through analyzing customer churn with logistic regression, including encoding the categorical variables first.
Reminder: measure model performance after fitting the logistic model.
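The churn workflow above can be sketched end to end; the dataset and column names here are hypothetical (invented for illustration, not from class), but the pipeline mirrors the notes: encode categoricals, split, fit logistic regression, then measure performance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hypothetical churn data (column names are illustrative assumptions).
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "tenure_months": rng.integers(1, 72, 300),
    "contract": rng.choice(["monthly", "yearly"], 300),
})
df["churn"] = ((df["tenure_months"] < 24) &
               (df["contract"] == "monthly")).astype(int)

# Same encoding step as for linear regression: categoricals become numeric.
X = pd.get_dummies(df[["tenure_months", "contract"]], drop_first=True)
y = df["churn"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
clf = LogisticRegression().fit(X_train, y_train)

# Performance measurement after fitting.
print(round(accuracy_score(y_test, clf.predict(X_test)), 3))
```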