Machine Learning Notes
Module 1: Introduction to Machine Learning
Part 1: Machine Learning vs. Statistical Learning
Focus:
Statistical Learning: Hypothesis testing and interpretability.
Machine Learning: Predictive accuracy.
Driver:
Statistical Learning: Math, theory, hypothesis.
Machine Learning: Fitting data.
Data Size:
Statistical Learning: Any reasonably sized data set.
Machine Learning: Big data.
Data Type:
Statistical Learning: Structured.
Machine Learning: Structured, unstructured, semi-structured.
Dimensions/Scalability:
Statistical Learning: Mostly low dimensional data.
Machine Learning: High dimensional data.
Model Choice:
Statistical Learning: Parameter significance & in-sample goodness of fit.
Machine Learning: Cross-validation of predictive accuracy on partitions of data.
Interpretability:
Statistical Learning: High.
Machine Learning: Low.
Strength:
Statistical Learning: Understanding causal relationships and behavior.
Machine Learning: Prediction (forecasting and nowcasting).
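The Model Choice contrast above (in-sample goodness of fit versus cross-validated predictive accuracy) can be sketched in a few lines. This is a minimal illustration on synthetic data, not part of the original notes: the helper names (fit_line, r_squared, cross_validated_mse), the data-generating process, and the choice of 5 folds are all assumptions made for the example.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def mean_squared_error(intercept, slope, xs, ys):
    return statistics.fmean([(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)])

def r_squared(intercept, slope, xs, ys):
    """In-sample goodness of fit: share of variance in y explained by the line."""
    mean_y = statistics.fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot

def cross_validated_mse(xs, ys, k=5):
    """Average error on held-out folds: fit on k-1 partitions, score on the rest."""
    n = len(xs)
    fold_errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        a, b = fit_line([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        fold_errors.append(
            mean_squared_error(a, b, [xs[i] for i in test_idx], [ys[i] for i in test_idx])
        )
    return statistics.fmean(fold_errors)

# Synthetic data (assumed for illustration): y = 2 + 0.5 * x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2 + 0.5 * x + rng.gauss(0, 1) for x in xs]

intercept, slope = fit_line(xs, ys)
in_sample_r2 = r_squared(intercept, slope, xs, ys)   # statistical-learning style check
cv_mse = cross_validated_mse(xs, ys, k=5)            # machine-learning style check
print(f"slope={slope:.3f}  in-sample R^2={in_sample_r2:.3f}  5-fold CV MSE={cv_mse:.3f}")
```

The first number judges the model on the same data it was fitted to; the second judges it only on data it has not seen.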
The Big Picture
As researchers or practitioners, our goal is to solve real-world problems through inference or prediction.
Examples of relationships to explore:
Sales as a function of advertising, R&D expenditure, seasonality, and industry.
Quantity demanded as a function of price, income, technology, and competitors' prices.
Wage as a function of education, age, gender, and experience.
Simple Example: Quantifying Wage Components
Drivers: Education, age, experience, IQ, ethnicity, race, gender, industry, location, working hours.
Linear model example:
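A minimal sketch of what such a linear model could look like, assuming a log-wage specification with three of the drivers listed above. The choice of regressors, the log transform, and the symbols are illustrative assumptions, not taken from the original notes:

```latex
\[
\log(\text{wage}_i) = \beta_0 + \beta_1\,\text{education}_i + \beta_2\,\text{experience}_i + \beta_3\,\text{age}_i + \varepsilon_i
\]
```

In a model of this form, each coefficient \beta_j is read as the change in log wage associated with a one-unit change in its driver, holding the other drivers fixed.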
Considerations:
Interpretability of the model.
Predictive ability.
Different Example: Cat vs. Dog Classification (Image Recognition)
Considerations:
Interpretability of the model is not as important.
Accuracy of predictions is critical.
Limitations of Econometrics/Structured ML
Econometrics/structured ML can only handle structured (tabular) data.
Unstructured data includes images, text, audio, and video.
More Complex Example: Stock Price Prediction
Classical drivers: Company's fundamentals, competitors, technical analysis, seasonality.
Other factors: Market sentiment (news, tweets, blogger opinions), satellite images of parking lots.
Why Learn Machine Learning?
Deep learning is prevalent.
Better career opportunities.
Hedge against the next recession.
Part 2: What is Machine Learning?
Machine Learning is a subset of AI that enables computers to learn from data.
A machine learning system is trained on data using learning algorithms rather than being explicitly programmed with rules.
ML involves automatically detecting meaningful patterns in data and applying those patterns to make predictions on unseen data.
The goal is to generalize: to maximize performance on unseen data rather than on the training data.
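To make "trained rather than explicitly programmed" concrete, here is a small sketch in which a classification rule is learned from labeled examples and then scored on data it has not seen. The one-dimensional synthetic data, the threshold rule, and the 300/100 split are assumptions chosen for illustration only.

```python
import random

rng = random.Random(42)

# Synthetic 1-D data (assumed): class 1 tends to have larger x than class 0.
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(200)] + \
       [(rng.gauss(3.0, 1.0), 1) for _ in range(200)]
rng.shuffle(data)
train, test = data[:300], data[300:]

def accuracy(threshold, samples):
    """Share of samples for which the rule 'x > threshold means class 1' is right."""
    hits = sum((x > threshold) == bool(label) for x, label in samples)
    return hits / len(samples)

# "Training": nobody hard-codes the cutoff; it is picked from the training data.
candidates = sorted(x for x, _ in train)
learned_threshold = max(candidates, key=lambda t: accuracy(t, train))

train_acc = accuracy(learned_threshold, train)
test_acc = accuracy(learned_threshold, test)   # performance on unseen data
print(f"learned threshold: {learned_threshold:.2f}")
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The test accuracy, not the training accuracy, is the number that speaks to generalization.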
Artificial Intelligence vs. Machine Learning vs. Deep Learning
Artificial Intelligence: Any technique that enables machines to mimic human behavior (since the 1950s).
Machine Learning: A subset of AI that enables computers to learn from data; models are trained with learning algorithms (since the 1980s).
Deep Learning: A subset of ML that extracts patterns from data using neural networks (since the 2010s).
Part 3: Different Types of Machine Learning Algorithms
Supervised Learning
Unsupervised Learning
Reinforcement Learning
Supervised Learning
Computers learn to model relationships from labeled training data, in which each input is paired with a known output.
The trained model is then used to predict outcomes for unseen (test) data.
Regression:
Predicting stock market returns.
Predicting housing prices.
Classification:
Generating buy, sell, hold signals.
Estimating the likelihood of a successful M&A or IPO.
Predicting whether a borrower will default on credit.
Classifying funds or ETFs as winners or losers.
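The two supervised tasks above differ only in the kind of output being predicted. A minimal sketch using k-nearest neighbors, which can do both: a majority vote gives a class label (classification) and an average gives a number (regression). The two features, the "buy"/"sell" labels, and the return figures are made-up toy values, not real market data.

```python
import math
from collections import Counter

def k_nearest(train_X, query, k):
    """Indices of the k training points closest to the query (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return order[:k]

def knn_classify(train_X, train_y, query, k=3):
    """Classification: majority vote among the k nearest labeled examples."""
    votes = Counter(train_y[i] for i in k_nearest(train_X, query, k))
    return votes.most_common(1)[0][0]

def knn_regress(train_X, train_y, query, k=3):
    """Regression: average target value of the k nearest labeled examples."""
    idx = k_nearest(train_X, query, k)
    return sum(train_y[i] for i in idx) / k

# Toy labeled training data (hypothetical features, e.g. two standardized scores).
features = [(1.0, 1.2), (0.8, 1.0), (1.2, 0.9), (-1.0, -0.8), (-0.9, -1.1), (-1.2, -1.0)]
signals = ["buy", "buy", "buy", "sell", "sell", "sell"]   # classification target
returns = [0.05, 0.04, 0.06, -0.03, -0.04, -0.05]         # regression target

new_stock = (0.9, 1.0)
predicted_signal = knn_classify(features, signals, new_stock)
predicted_return = knn_regress(features, returns, new_stock)
print(predicted_signal, round(predicted_return, 4))
```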
Unsupervised Learning
Computers are trained on unlabeled data, without guidance, to discover underlying patterns and find groups of samples that behave similarly.
Clustering:
Grouping companies into peer groups based on non-standard characteristics.
Client profiling and asset allocation.
Portfolio diversification and stock selection based on similarities in co-movement.
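Grouping of this kind can be sketched with a bare-bones k-means routine. This is an illustrative implementation under simple assumptions (two synthetic, well-separated groups of "companies" described by two hypothetical characteristics, a fixed number of iterations), not a production clustering method.

```python
import math
import random

def k_means(points, k, n_iter=20, seed=0):
    """Plain k-means: alternate between assigning points and moving centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members))
            if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels

# Synthetic, unlabeled data (assumed): two peer groups in a 2-D characteristic space.
rng = random.Random(7)
group_a = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(30)]
group_b = [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(30)]
companies = group_a + group_b

centroids, labels = k_means(companies, k=2)
print(labels)
```

No labels are supplied; the algorithm recovers the two groups from the distances between observations alone.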
Dimensionality Reduction:
Identifying the most predictive factors underlying asset price movements (to avoid the "factor zoo").
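As a rough sketch of dimensionality reduction, the following extracts the first principal component of three synthetic return series that share one common driver. The data-generating process (one factor, the loadings in betas, the noise level) is an assumption for illustration, and power iteration is used here only because it needs no external libraries.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of the columns of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
            for a in range(d)]

def first_principal_component(cov, n_iter=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, eigenvalue

# Synthetic returns (assumed): three assets driven by one common factor plus noise.
rng = random.Random(3)
betas = [1.0, 0.8, 1.2]
asset_returns = []
for _ in range(500):
    factor = rng.gauss(0, 1)
    asset_returns.append([b * factor + rng.gauss(0, 0.3) for b in betas])

cov = covariance_matrix(asset_returns)
pc1, top_variance = first_principal_component(cov)
explained_share = top_variance / sum(cov[j][j] for j in range(len(cov)))
print("loadings:", [round(x, 3) for x in pc1])
print(f"share of variance explained by the first component: {explained_share:.3f}")
```

When one component explains most of the variance, three correlated series can be summarized by a single factor.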
Reinforcement Learning
A computer (agent) learns from interacting with its environment by taking actions and observing the rewards they produce. The agent balances exploration (trying new actions) with exploitation (repeating actions known to pay off) to maximize cumulative reward.
Example: A virtual trader (agent) follows trading rules (actions) in a market (environment) to maximize profits (reward).
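The explore/exploit idea can be sketched with an epsilon-greedy agent choosing among a few trading rules whose average payoffs are unknown to it. This is a deliberately simplified setting (a multi-armed bandit with no market state), and the payoff means, the noise, and epsilon = 0.1 are assumptions for illustration.

```python
import random

def run_epsilon_greedy(true_means, n_steps=5000, epsilon=0.1, seed=0):
    """Agent repeatedly picks one action (trading rule) and observes a noisy reward."""
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions          # how often each action was taken
    estimates = [0.0] * n_actions     # running average reward per action
    total_reward = 0.0
    for _ in range(n_steps):
        if rng.random() < epsilon:
            action = rng.randrange(n_actions)                            # explore
        else:
            action = max(range(n_actions), key=lambda a: estimates[a])   # exploit
        reward = rng.gauss(true_means[action], 1.0)   # the environment's response
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward

# Hypothetical average profit of three trading rules; the agent does not see these.
true_means = [0.0, 0.5, 1.0]
estimates, counts, total_reward = run_epsilon_greedy(true_means)
print("times each rule was used:", counts)
print("estimated payoffs:", [round(e, 2) for e in estimates])
```

Over time the agent concentrates on the rule with the highest estimated payoff while still occasionally testing the others.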
ML Algorithm Road Map
Supervised:
Regression: Linear/Polynomial, Penalized regression, KNN, SVR, Tree-based Regression models
Classification: Logistic regression, KNN, SVC, Tree-based Classification models
Unsupervised:
Dimensionality Reduction: Principal Component Analysis (PCA)
Clustering: K-Means, Hierarchical
GitHub Modules
Module 1: Introduction to Machine Learning
Module 2: Setting up Machine Learning Environment
Module 3: Linear Regression (Econometrics approach)
Module 4: Machine Learning Fundamentals
Module 5: Linear Regression (Machine Learning approach)
Module 6: Penalized Regression (Ridge, LASSO, Elastic Net)
Module 7: Logistic Regression
Module 8: K-Nearest Neighbors (KNN)
Module 9: Classification and Regression Trees (CART)
Module 10: Bagging and Boosting
Module 11: Dimensionality Reduction (PCA)
Module 12: Clustering (K-Means, Hierarchical)
Warning: An ML algorithm will always find a pattern, even if there is none.
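A small sketch of this warning: a 1-nearest-neighbor classifier is fitted to pure noise, where the labels are drawn independently of the features, so there is nothing to learn. The data sizes and the choice of classifier are assumptions made for the example.

```python
import math
import random

rng = random.Random(1)

def make_noise(n, d=5):
    """Features and labels drawn independently, so no real pattern exists."""
    X = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y

def predict_1nn(train_X, train_y, query):
    """Return the label of the single closest training point."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return train_y[nearest]

def accuracy(train_X, train_y, X, y):
    hits = sum(predict_1nn(train_X, train_y, q) == label for q, label in zip(X, y))
    return hits / len(y)

train_X, train_y = make_noise(200)
test_X, test_y = make_noise(200)

train_acc = accuracy(train_X, train_y, train_X, train_y)  # each point matches itself
test_acc = accuracy(train_X, train_y, test_X, test_y)     # fresh noise
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The model looks perfect on the data it memorized and is close to a coin flip on new data, which is why performance should be judged out of sample.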