DSC510: Machine Learning
Introduction to Data Science and Analytics
Overview of machine learning (ML) as a crucial component of data science.
Machine Learning facilitates various steps of the data analysis cycle.
Taxonomy of Machine Learning
Types of Learning:
Supervised Learning: Uses labeled input/output pairs to learn a function (y = f(X)).
Types:
Classification: Output y is discrete labels (e.g., cat or dog).
Regression: Output y is continuous (e.g., predicting prices).
Unsupervised Learning: Works with unlabeled input to find patterns.
Types:
Clustering: Group data points based on similarities.
Dimensionality Reduction: Reduces number of variables.
Examples of Machine Learning
Supervised Learning Applications:
Image recognition (deciding if an image is a cat or dog).
Predicting user ratings for restaurants.
Spam detection in emails.
Unsupervised Learning Applications:
Clustering handwritten digits into classes.
Identifying trending topics on social media.
Machine Learning Techniques
Supervised Learning Techniques:
k-Nearest Neighbors (k-NN)
Naïve Bayes
Linear Regression & Logistic Regression
Support Vector Machines (SVM)
Random Forests
Neural Networks
Unsupervised Learning Techniques:
Clustering algorithms
Matrix Factorization (PCA, SVD)
Hidden Markov Models (HMM)
Predictive Performance Criteria
Metrics:
Accuracy
Area Under the ROC Curve (AUC-ROC)
Precision and Recall
F1 Score
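The classification metrics above can be computed directly from confusion-matrix counts; a minimal sketch (the function name `precision_recall_f1` and the example counts are illustrative):

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp)          # of predicted positives, how many are right
    recall = tp / (tp + fn)             # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
# precision = 0.8, recall = 0.8, F1 = 0.8
```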
Considerations:
Speed and Scalability
Robustness against outliers, noise, and missing values
Interpretability (transparency of model decisions)
Model compactness for deployment on mobile devices.
Introduction to k-Nearest Neighbors (k-NN)
Concept:
Identify the k closest labeled instances to a query item.
Use the most frequent label among the nearest neighbors for classification.
Voting Method:
Majority voting for classification.
Average for regression.
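The procedure above can be sketched in a few lines of Python (the `knn_predict` helper and the toy data are illustrative, assuming Euclidean distance and majority voting):

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    # Distance from the query to every labeled training point.
    dists = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    # Majority vote among the k nearest labels.
    labels = [y for _, y in dists[:k]]
    return Counter(labels).most_common(1)[0][0]

X = [(1, 1), (1, 2), (5, 5), (6, 5)]
y = ["cat", "cat", "dog", "dog"]
knn_predict(X, y, (1.5, 1.5), k=3)  # -> "cat"
```

For regression, the final line would instead average the k neighbors' target values.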
Distance Measures in k-NN
Common Distances:
Euclidean Distance: d(x, y) = || x - y ||
Manhattan Distance: Sum of absolute differences.
Cosine Distance: One minus the cosine similarity; mainly for text data.
Hamming Distance: Used for categorical data.
Jaccard Distance: Measures similarity between sets.
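The five measures above can be sketched with plain Python (function names are illustrative; inputs are numeric sequences, equal-length sequences, or sets as appropriate):

```python
import math

def euclidean(x, y):
    # Straight-line distance.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute coordinate differences.
    return sum(abs(a - b) for a, b in zip(x, y))

def cosine_distance(x, y):
    # 1 minus the cosine of the angle between the vectors.
    dot = sum(a * b for a, b in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))

def hamming(x, y):
    # Number of positions where equal-length sequences differ.
    return sum(a != b for a, b in zip(x, y))

def jaccard_distance(s, t):
    # 1 minus |intersection| / |union| of two sets.
    s, t = set(s), set(t)
    return 1 - len(s & t) / len(s | t)
```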
Bias and Variance in Model Training
Definitions:
Bias: Error due to overly simplistic assumptions in the learning algorithm.
Variance: Error due to excessive model complexity, which makes the model sensitive to fluctuations in the training set.
Bias-Variance Tradeoff:
Complex models tend to have lower bias and higher variance.
Simple models tend toward higher bias and lower variance.
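The tradeoff above is often stated via the standard bias-variance decomposition of expected squared error (shown here for squared loss, with irreducible noise variance σ²):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{Bias}^2}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{Variance}}
  + \sigma^2
```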
Choosing the Value of k in k-NN
Tradeoff:
Small k: Low bias but high variance.
Large k: High bias but low variance.
Cross-Validation Techniques
Leave-One-Out: Each instance serves as the validation set exactly once, with the model trained on all remaining instances.
K-Fold Cross-Validation: Data is divided into k folds; each fold serves once as the validation set while the remaining folds are used for training.
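K-fold splitting can be sketched as follows (a simplified version that drops any remainder when `len(data)` is not divisible by k; `model_fn` is a hypothetical callable that fits on the training split and returns a score on the validation split):

```python
def k_fold_cv(data, k, model_fn):
    """Average the validation score over k folds."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        # Fold i is held out for validation; the rest is training data.
        val = data[i * fold_size:(i + 1) * fold_size]
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]
        scores.append(model_fn(train, val))
    return sum(scores) / k
```

Leave-one-out is the special case k = len(data), where each fold holds a single instance.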
Overfitting and Underfitting
Overfitting: Model performs well on training data but poorly on unseen data.
Underfitting: Model does not capture underlying trend of the data adequately.
Decision Trees
Structure:
Flow-chart-like model for decisions and classifications.
Internal nodes represent feature tests, branches represent outcomes, and leaves hold predictions.
Generation:
Constructed using greedy algorithms based on information gain or Gini impurity.
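Gini impurity, one of the greedy splitting criteria mentioned above, is one minus the sum of squared class proportions at a node; a minimal sketch:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions.

    0.0 means a pure node; 0.5 is the maximum for two balanced classes.
    """
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

gini(["cat", "cat", "dog", "dog"])  # 0.5 (maximally impure for two classes)
gini(["cat", "cat", "cat"])         # 0.0 (pure node)
```

A greedy tree builder evaluates candidate splits and picks the one that most reduces the children's weighted impurity.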
Ensemble Methods
Use multiple models to improve predictions:
Bagging: Trains models on bootstrap samples and combines predictions by averaging or voting.
Boosting: Sequentially builds models, each correcting errors made by previous ones.
Stacking: Combines multiple models at different levels.
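The bagging idea can be sketched as bootstrap sampling plus majority voting (the `bootstrap_sample` and `bagging_predict` helpers are illustrative placeholders; the trained models are represented as plain callables):

```python
from collections import Counter
import random

def bootstrap_sample(data, rng):
    # Sample len(data) items with replacement.
    return [rng.choice(data) for _ in data]

def bagging_predict(models, x):
    # Each model votes; the most common label wins.
    votes = [model(x) for model in models]
    return Counter(votes).most_common(1)[0][0]
```

In practice each model would be fit on its own bootstrap sample before voting; boosting differs in that models are built sequentially and reweight the errors of their predecessors.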
Random Forests
Ensemble of decision trees trained on different subsets of data with random feature selection at each split.
Reduces variance and improves predictive performance.
Logistic Regression
Applies the logistic (sigmoid) function to a linear combination of features to produce probability estimates, which are thresholded into class predictions.
Regression coefficients are estimated using maximum likelihood estimation.
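The sigmoid transformation and thresholding described above can be sketched as follows (the weights and threshold are illustrative; maximum likelihood estimation of the coefficients is omitted):

```python
import math

def sigmoid(z):
    # Maps any real number into (0, 1).
    return 1 / (1 + math.exp(-z))

def predict_proba(weights, bias, x):
    # Probability of the positive class from a linear combination of features.
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

def predict(weights, bias, x, threshold=0.5):
    # Threshold the probability into a hard class label.
    return 1 if predict_proba(weights, bias, x) >= threshold else 0
```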
Perceptron Algorithm
Simple online learning model for binary classification.
Adjusts weights based on misclassifications, making it adaptive.
Online Learning Adaptability
Continuously updates weights as new data comes in, adapting to changes without retraining from scratch.
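A single online perceptron step can be sketched as follows (a minimal sketch assuming labels in {-1, +1} and an explicit bias term):

```python
def perceptron_update(w, b, x, y, lr=1.0):
    """One online learning step: update weights only on a misclassification.

    w: weight list, b: bias, x: feature tuple, y: label in {-1, +1}.
    """
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    if y * activation <= 0:  # wrong side of (or on) the decision boundary
        w = [wi + lr * y * xi for wi, xi in zip(w, x)]
        b = b + lr * y
    return w, b
```

Because each step touches only the current example, the model adapts to a data stream without ever retraining from scratch.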