Supervised Learning with Scikit-Learn Notes

Machine Learning: The process whereby computers learn to make decisions from data without being explicitly programmed.
- Examples:
  - Predicting whether an email is spam or not based on its content and sender.
  - Clustering books into different categories based on the words they contain, then assigning new books to existing clusters.

Supervised Learning: Type of machine learning where the values to be predicted (the outcomes) are already known for the training data.
- Aim: Build a model that can accurately predict the target values of previously unseen data.
- Uses features to predict a target variable.
- Example: Predicting a basketball player's position based on points per game.
Unsupervised Learning: Process of discovering hidden patterns in unlabeled data.
- Example: Grouping customers based on purchasing behavior without predefined categories.
- This is an example of clustering, which is one branch of unsupervised learning.
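To illustrate the difference from supervised learning, here is a minimal clustering sketch. It assumes scikit-learn's KMeans as the clustering algorithm and uses made-up purchasing data; note that no target variable is passed, only features. The cluster numbers KMeans assigns are arbitrary, so only the grouping matters.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical purchasing data: each row is a customer,
# columns are [orders per year, average order value].
purchases = np.array([
    [2, 20.0], [3, 25.0], [2, 22.0],        # infrequent, low-value buyers
    [30, 200.0], [28, 210.0], [32, 190.0],  # frequent, high-value buyers
])

# Unsupervised: KMeans only sees the features, there is no y.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(purchases)

# One cluster index per customer; which group gets 0 or 1 is arbitrary.
print(labels)
```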

Classification: Predicting the label or category of an observation.
- Example: Predicting whether a bank transaction is fraudulent or non-fraudulent.
- This scenario describes binary classification (two possible outcomes).
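A minimal binary classification sketch, assuming LogisticRegression as the classifier and made-up transaction features (amount in thousands and a hypothetical "made abroad" flag). The point is that the model's output is always one of the two known labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical transactions: [amount in thousands, 1 if made abroad else 0].
X = np.array([[0.02, 0], [0.035, 0], [0.015, 0], [5.0, 1], [7.0, 1], [6.2, 1]])
# Binary target: 0 = non-fraudulent, 1 = fraudulent.
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Each prediction is one of the two labels, never a value in between.
predictions = clf.predict(np.array([[0.03, 0], [6.5, 1]]))
print(predictions)  # [0 1]
```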
Regression: Predicting continuous values.
- Example: Using features like the number of bedrooms and property size to predict the property's price.
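For contrast, a minimal regression sketch, assuming LinearRegression and made-up property data. The toy prices follow an exact linear rule (an assumption chosen so the result is easy to check), and the prediction is a continuous number rather than a category.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical properties: [number of bedrooms, size in square metres].
X = np.array([[1, 50], [2, 70], [3, 80], [3, 120], [4, 150]])
# Toy prices generated by: 20000 * bedrooms + 3000 * size + 50000.
y = np.array([220000, 300000, 350000, 470000, 580000])

reg = LinearRegression()
reg.fit(X, y)

# Predict the price of an unseen 2-bedroom, 100 m^2 property.
predicted_price = reg.predict(np.array([[2, 100]]))
print(predicted_price)  # approximately [390000.]
```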

Feature: Also called an independent variable or predictor variable; "feature" is the term used throughout the course.
Target Variable: Also called a dependent variable or response variable; "target variable" is the term used throughout the course.
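A short sketch of how features and the target variable are typically separated when the data live in a pandas DataFrame. The column names here are hypothetical.

```python
import pandas as pd

# Hypothetical housing data; column names are illustrative only.
df = pd.DataFrame({
    "bedrooms": [1, 2, 3],
    "size_m2": [50, 70, 80],
    "price": [220000, 300000, 350000],
})

# Features: every column except the target, as a NumPy array.
X = df.drop("price", axis=1).values
# Target variable: the column to be predicted.
y = df["price"].values

print(X.shape, y.shape)  # (3, 2) (3,)
```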

Data Criteria (requirements before using scikit-learn):
- Must not have missing values.
- Must be in numeric format.
- Must be stored in pandas DataFrames or NumPy arrays.
Exploratory Data Analysis (EDA): Needed first to check that the data meet these criteria before performing supervised learning.
- Tools: pandas methods for descriptive statistics, plus appropriate data visualizations.
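A minimal EDA sketch with pandas that checks the criteria above on made-up data. The fixes at the end (dropping rows with missing values, one-hot encoding a text column with get_dummies) are one possible approach assumed for illustration, not the only way to meet the requirements.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with one missing value and one text column.
df = pd.DataFrame({
    "size_m2": [50, 70, np.nan, 120],
    "city": ["A", "B", "A", "B"],
    "price": [220000, 300000, 350000, 470000],
})

print(df.isna().sum())  # number of missing values per column
print(df.dtypes)        # "city" holds text, not numbers
print(df.describe())    # descriptive statistics for numeric columns

# One way to satisfy the criteria (an assumption for this sketch):
df = df.dropna()                                      # remove rows with missing values
df = pd.get_dummies(df, columns=["city"], dtype=int)  # encode text as 0/1 columns
```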

General Workflow Steps:
1. Import a model (the algorithm for the supervised learning problem) from a scikit-learn module.
2. Instantiate the model (e.g., create a variable named model that holds an instance of it).
3. Fit the model to the data so it learns patterns relating the features to the target variable.
- Fit the model to:
  - x: An array of features.
  - y: An array of target variable values.
4. Use the model's predict method, passing in new observations (e.g., x_new).
- Example: Passing the features of six emails to a spam classification model returns an array of six predicted values:
  - 1 indicates spam.
  - 0 indicates not spam.
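The four workflow steps can be sketched end to end as follows. This assumes KNeighborsClassifier as the model and uses made-up email features (number of links and number of hypothetical "spammy" words); X and X_new play the roles of x and x_new above.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier  # 1. import a model

# Hypothetical training emails: [number of links, number of "spammy" words].
X = np.array([[0, 0], [1, 0], [0, 1], [8, 5], [10, 7], [9, 6]])
y = np.array([0, 0, 0, 1, 1, 1])  # 1 = spam, 0 = not spam

model = KNeighborsClassifier(n_neighbors=3)  # 2. instantiate the model
model.fit(X, y)                              # 3. fit to features and target

# 4. predict labels for six new, unseen emails
X_new = np.array([[0, 0], [9, 5], [1, 1], [7, 8], [2, 0], [10, 6]])
predictions = model.predict(X_new)
print(predictions)  # [0 1 0 1 0 1]
```

Each new email gets the majority label of its three nearest training emails, so the output is an array of six 0/1 values, one per email.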

Goal: Check understanding of the principles of supervised learning and practice implementing it on real data throughout the course.