SEIS 763 Machine Learning - Vocabulary Flashcards (Fall 2025)
Course Logistics and Orientation
- Course: SEIS 763 Machine Learning (Fall 2025)
- Welcome message: lectures will be recorded
- Canvas link: https://stthomas.instructure.com/courses/81396
- Start Here: Syllabus, Modules, Welcome content
Course Goals
- Build foundational Machine Learning concepts and algorithms
- Understand basic Math principles that guide Machine Learning
- Use off-the-shelf Python ML Libraries
- Learn standard Training Pipelines
- Data preparation
- Model selection
- Model training and evaluation
Course Boundaries (What this course won’t do)
- Not a Python tutorial
- Not intended to make you an ML expert (yet)
- Not focused on advanced concepts (deep learning, reinforcement learning, computer vision, natural language processing)
- Not about model deployment
Prerequisites
- SEIS 631: Data Preparation and Analysis
- SEIS 632: Data Analytics and Visualization (may be taken concurrently)
- If you’re not comfortable with Python, go over tutorials
Start Here / Welcome Content (Overview)
- Machine Learning builds computational systems that learn from and adapt to data
- ML is essential in IT today and underpins applications across engineering, medicine, finance, and commerce
- Course covers widely used supervised and unsupervised ML algorithms in technical depth
- Emphasis on theoretical underpinnings and hands-on implementation
- Students will learn to evaluate effectiveness and avoid common pitfalls when applying ML
Topics and Practice: The Coding Pyramid
- The Coding Pyramid concept:
- No Code
- Open Source
- From Scratch
- Indicates progression from high-level tools to deep, hands-on implementation
AI, ML, and Deep Learning: Quick Clarifications
- Artificial Intelligence (AI): broad field of making machines exhibit intelligent behavior
- Machine Learning (ML): subset of AI focusing on learning from data to make predictions
- Deep Learning (DL): subset of ML using deep neural networks
- The slide emphasizes this hierarchy and practical distinctions
A Brief History of Artificial Intelligence (Timeline Highlights)
- 1943: McCulloch & Pitts propose the first mathematical model of an artificial neuron
- 1950: Turing’s Computing Machinery and Intelligence (proposes the Turing Test as a measure of machine intelligence)
- 1951–1956: Early AI milestones culminate in the 1956 Dartmouth Conference, which marks the birth of AI as a field
- 1957: Rosenblatt’s Perceptron—the first artificial neural network capable of learning
- 1965: Weizenbaum develops ELIZA, an early natural language processing program that simulates conversation
- 1967: Newell & Simon develop the General Problem Solver (GPS)
- 1974: The first AI winter begins due to unmet expectations and limited progress
- 1980: Expert systems gain popularity for financial forecasting and medical diagnoses
- 1986: Learning representations by back-propagating errors enables training of deeper networks (Rumelhart, Hinton, Williams)
- 1997: IBM Deep Blue defeats Garry Kasparov in chess
- 1999–2010s: NLP and DL advances accelerate; more milestones listed below
- 2011: IBM’s Watson defeats Jeopardy! champions
- 2012: DeepMind’s work gains prominence in deep learning applications
- 2014: DeepFace (Facebook) achieves near-human facial recognition accuracy
- 2016: AlphaGo (DeepMind) defeats top Go player Lee Sedol
- 2017: AlphaZero defeats best chess and shogi engines
- 2020: OpenAI GPT-3 marks a major breakthrough in NLP
- 2021: DeepMind’s AlphaFold2 solves protein folding problem
- 2022: Controversy around LLMs (e.g., Google's LaMDA) and claims that the model is sentient
- 2023: Legal actions filed against AI-art tools (Stability AI, DeviantArt, Midjourney) over copyright concerns about remixing artists' works
- Additional notes include DeepMind’s acquisition by Google (~2014) for ~$500M, AlphaZero’s successive victories, etc.
- The timeline highlights ethical, legal, and societal implications associated with AI and ML (e.g., copyright, accountability, transparency)
Significance: This timeline shows how ML/AI evolved from theoretical concepts to practical, high-impact technologies while highlighting ongoing ethical and societal debates.
The Human-Reasoning Bridge: Theory vs Practice
- The course emphasizes walking the tightrope between theory and practice
- The idea is to understand theory deeply while gaining hands-on experience with real data
The Coding Pyramid (Revisited)
- From scratch to open source to no-code tools; progressively build intuition and capability
What is Machine Learning? (Core Idea)
- Given some clues (features), guess the answer (prediction)
- Features are the inputs that describe the data
- The learning process optimizes how to map features to outputs
- The prediction is denoted by ŷ (hat y)
- Fundamental question: how to choose a function (model) that generalizes well to unseen data
What is the Machine Learning Problem? (Formal View)
- Given clues/features, find their importance or weights that best predict the answer
- In practice: learn weights W such that the model output matches the true labels
- Intuition: adjust weights to improve predictions on training data and generalize to new data
The 20 Questions Analogy
- ML can be thought of as iteratively asking questions to home in on the right answer by adjusting internal parameters (weights) based on feedback (loss)
Components of a Machine Learning System
- Features
- Label (Output / Target)
- Loss (Cost) Function
- Optimizer (Algorithm to adjust weights)
- Model (The function that maps features to predictions)
Features, Labels, and the Learning Objective
- Features: x (input vector)
- Label: y (true output)
- Prediction: ŷ (predicted output)
- Loss measures how far ŷ is from y
- Objective: learn weights W (and bias b) to minimize the loss
Features: A Simple Intuition with Numbers
- A "What number am I thinking?" example uses digits 0–9 to illustrate features and classification
- Visualization exercises describe each digit by simple properties (whether the stroke starts and ends at the same point, curved edges, self-intersections) to motivate feature selection and decision boundaries
- This kind of exercise motivates the idea that features describe data points and that a learning system will try to separate or predict based on those features
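A toy sketch (not from the slides) of what such hand-coded digit features might look like; the feature names and values below are assumptions chosen only to make the idea concrete:

```python
# Toy example: hand-coded features for a few digits.
# Feature names and values are illustrative assumptions.
digit_features = {
    0: {"closed_loop": 1, "curved_edges": 1, "self_intersects": 0},
    1: {"closed_loop": 0, "curved_edges": 0, "self_intersects": 0},
    8: {"closed_loop": 1, "curved_edges": 2, "self_intersects": 1},
}

# A learning system would receive such feature vectors as x and try to
# predict the digit label y from them.
for digit, feats in digit_features.items():
    print(digit, feats)
```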
The Linear Model: Form and Intuition
- Model form (regression):
- Pointwise form: y_i = W^T x_i + b
- Matrix form: ŷ = X W + b
- Notation:
- n: total samples
- d: number of features (dimensions)
- X ∈ R^{n×d} (rows are samples, columns are features)
- W ∈ R^{d×1}
- b ∈ R (bias term)
- Alternative perspective: sometimes written as y = XW + b with y ∈ R^{n×1}
- Geometric interpretation: W contains discriminative power for each feature; larger magnitude means the feature has more influence on the prediction
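A minimal NumPy sketch of the two equivalent forms above, with made-up shapes and values:

```python
import numpy as np

n, d = 5, 3                          # n samples, d features
rng = np.random.default_rng(0)

X = rng.normal(size=(n, d))          # X in R^{n x d}: rows are samples
W = rng.normal(size=(d, 1))          # W in R^{d x 1}: one weight per feature
b = 0.5                              # b in R: bias, broadcast to all samples

y_hat = X @ W + b                    # dataset-wide form, shape (n, 1)

# The sample-wise form gives the same value for any single sample x_i:
i = 0
y_hat_i = W.T @ X[i].reshape(d, 1) + b
print(np.allclose(y_hat[i], y_hat_i))   # True
```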
Two Common Linear Forms (Equivalent Views)
- View 1 (sample-wise): y_i = W^T x_i + b
- View 2 (dataset-wide): ŷ = X W + b
- The goal: find W and b that minimize the chosen loss on the training data
The Learning Question: How Do You Learn W?
- You define a loss function that measures prediction error
- You then optimize W (and b) to minimize the loss
- The idea is to adjust W in the direction that reduces the loss (gradient-based optimization in many cases)
- In class, a simple illustrative objective is shown as finding the slope W that minimizes the Mean Squared Error (MSE)
Loss (Cost) Function: Why It Matters
- Loss measures the discrepancy between true labels y and predictions ŷ
- Common choice for regression: Mean Squared Error (MSE)
- Formula: J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
- Aim: find W (and b) that minimize J(W)
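A minimal sketch of the MSE computation, assuming NumPy arrays of predictions and labels (values made up):

```python
import numpy as np

def mse(y_hat, y):
    """Mean Squared Error: J = (1/n) * sum_i (y_hat_i - y_i)^2."""
    return np.mean((y_hat - y) ** 2)

y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.1, 1.9, 3.2])
print(mse(y_hat, y))   # small positive number; 0 only for perfect predictions
```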
Conceptual Visual: Gradient Descent (Intuition)
- Loss landscape: as you adjust W, the loss value changes
- The optimizer moves in the direction of decreasing loss (toward a minimum)
- Special case shown: the line slope W that minimizes J, yielding perfect or near-perfect predictions on training data
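A minimal gradient-descent sketch for a 1-D model ŷ = W·x (bias omitted for brevity); the data and learning rate are made up for illustration:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x                      # true slope is 2, so W should approach 2

W, lr = 0.0, 0.05                # initial weight and learning rate (made up)
for step in range(100):
    y_hat = W * x
    grad = 2 * np.mean((y_hat - y) * x)   # dJ/dW for the MSE loss
    W -= lr * grad                        # step against the gradient
print(W)                         # close to 2.0
```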
A Simple Linear Regression Example (Key Takeaways)
- Model: ŷ = W x + b
- If W = 1 and b = 0, then ŷ = x (identity mapping)
- If W ≠ 1 or b ≠ 0, predictions scale/shift relative to x
- The optimal W* and b* minimize J(W, b)
- Empirical demonstration: a table shows different W values with corresponding loss, highlighting that W* produces the smallest loss
- Notation for the learned model: y = W^T x + b
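A sketch in the spirit of that table: sweep candidate values of W and report the loss for each (data made up; with the identity mapping, W* = 1 and b* = 0):

```python
import numpy as np

x = np.linspace(0, 5, 6)
y = x                                  # identity mapping, so W* = 1, b* = 0

for W in [0.0, 0.5, 1.0, 1.5, 2.0]:
    y_hat = W * x
    loss = np.mean((y_hat - y) ** 2)
    print(f"W = {W:.1f}  ->  J(W) = {loss:.3f}")   # smallest loss at W = 1.0
```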
Regression vs Classification: Conceptual Difference (Brief Mention)
- Regression: predict a continuous value (e.g., height, weight, flight duration)
- Classification: predict a discrete label (e.g., male/female; digit classes 0–9)
- The same linear model concepts apply with different loss functions (e.g., MSE for regression, cross-entropy for classification)
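A minimal sketch contrasting the two loss types, with binary cross-entropy standing in for the classification loss (all values made up):

```python
import numpy as np

# Regression: MSE between continuous predictions and targets
y_true_reg = np.array([65.0, 70.0, 72.0])      # e.g., heights in inches
y_pred_reg = np.array([66.0, 69.0, 73.0])
mse = np.mean((y_pred_reg - y_true_reg) ** 2)

# Classification: binary cross-entropy between predicted probabilities and labels
y_true_cls = np.array([1, 0, 1])               # e.g., 1 = class A, 0 = class B
p_pred = np.array([0.9, 0.2, 0.7])             # predicted P(y = 1)
bce = -np.mean(y_true_cls * np.log(p_pred)
               + (1 - y_true_cls) * np.log(1 - p_pred))

print(mse, bce)
```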
The Standard Data Pipeline (High-Level)
- Practical sequence:
- Data collection
- Data labeling
- Data preparation
- Model selection
- Model training
- Model evaluation
- Model deployment
- Reference slide emphasizes a standard, end-to-end workflow
Practical Demo and Assignments (What’s Coming)
- Python Demo and Assignment 1 components:
- conda setup
- Jupyter Lab
- Python basics
- Submission guide
- Assignment 1 details:
- Due 09/16/2025 at 5:00 PM
- Tests basic Python skills
- Loading multiple files
- Basic data operations using pandas
- Plotting using Matplotlib
- Data source: 20 DL160 flights (MSP-AMS)
- Data fields: Timestamp, Position, Altitude, Speed, Direction
- Deliverables: plot and generate insights
- If you struggle, review Python resources and consider taking the class next semester
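A minimal sketch of the kind of workflow Assignment 1 asks for: loading several flight files with pandas and plotting with Matplotlib. The file pattern and exact column names below are assumptions for illustration; use whatever the assignment actually provides.

```python
import glob
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical file pattern; adjust to the files actually provided.
frames = []
for path in sorted(glob.glob("flights/DL160_*.csv")):
    df = pd.read_csv(path, parse_dates=["Timestamp"])
    df["source_file"] = path
    frames.append(df)
flights = pd.concat(frames, ignore_index=True)

# Basic pandas operations: e.g., peak altitude and mean speed per flight
summary = flights.groupby("source_file")[["Altitude", "Speed"]].agg(["max", "mean"])
print(summary)

# Simple Matplotlib plot: altitude profile of one flight over time
one = flights[flights["source_file"] == flights["source_file"].iloc[0]]
plt.plot(one["Timestamp"], one["Altitude"])
plt.xlabel("Timestamp")
plt.ylabel("Altitude")
plt.title("Altitude profile for one DL160 flight")
plt.show()
```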
Reference Plots for Flight Data (Q5 & Q6 Examples)
- Q5 Reference Plot: Flight Duration (HH:MM) vs. Departure Date (October 2023)
- Q6 Reference Plot: Time (HH:MM) breakdown by phase
- Phases: On-ground before takeoff, In-air time, On-ground after landing
- These plots illustrate typical ML/data visualization tasks (time-series, phase breakdowns, distributions)
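A minimal sketch of a Q6-style phase-breakdown plot (stacked bars of minutes per phase for each flight); the numbers here are made up and would really be computed from the flight data:

```python
import numpy as np
import matplotlib.pyplot as plt

flight_ids = ["Flight 1", "Flight 2", "Flight 3"]
ground_before = np.array([25, 30, 20])    # minutes on ground before takeoff (made up)
in_air = np.array([470, 480, 465])        # minutes in the air (made up)
ground_after = np.array([15, 10, 18])     # minutes on ground after landing (made up)

plt.bar(flight_ids, ground_before, label="On-ground before takeoff")
plt.bar(flight_ids, in_air, bottom=ground_before, label="In-air time")
plt.bar(flight_ids, ground_after, bottom=ground_before + in_air,
        label="On-ground after landing")
plt.ylabel("Time (minutes)")
plt.legend()
plt.show()
```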
Formulating a Machine Learning Problem (Detailed View)
- Input values/features: 𝑥
- Example: a description of the digit 8 serves as the feature input
- Label/Output: 𝑦
- Objective: Learn weights W such that 𝑥 W ≈ 𝑦
- For 1D inputs, this reduces to a simple linear relation: y = W x + b
Linear Model Details: Notation and Dimensions
- Model: y = W x + b
- General form for multiple samples:
- y ∈ R^{n×1} (n samples)
- X ∈ R^{n×d} (features)
- W ∈ R^{d×1}
- b ∈ R (bias, broadcast to all samples)
- Transposed form (for other conventions):
- y^T = W^T X^T + b^T
- The essence: labels are predicted via a weighted combination of features plus a bias term
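A quick NumPy check (illustrative values) that the two orientations describe the same predictions:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 4, 2
X = rng.normal(size=(n, d))
W = rng.normal(size=(d, 1))
b = 0.3

y = X @ W + b              # shape (n, 1): one convention
yT = W.T @ X.T + b         # shape (1, n): the transposed convention

print(np.allclose(y, yT.T))   # True -- same predictions, different orientation
```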
Label, Feature, Loss, Optimizer, and Model (Summarized)
- Label: y (true value)
- Features: x (input vector)
- Loss: measures prediction error between y and ŷ
- Optimizer: algorithm that updates weights to reduce loss
- Model: the functional form that maps features to predictions
A Simple 1-D Example: Feature-Based Classification/Regression Intuition
- Features can be scalars or vectors; weights combine them into a prediction
- The loss function guides how to adjust the weights to better fit the data
Height/Weight Distribution by Sex (Example Visualization)
- A sample scatter-like concept: Height (in) vs Weight (lbs), separated by Sex (Male, Female)
- Demonstrates how linear relationships can differ across groups and how features might separate classes or predict a continuous target differently by group
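A toy sketch of this visualization with synthetic height/weight data (all numbers made up):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
height_m = rng.normal(70, 3, 50)                           # inches
weight_m = 5.0 * height_m - 180 + rng.normal(0, 10, 50)    # lbs
height_f = rng.normal(64, 3, 50)
weight_f = 4.5 * height_f - 160 + rng.normal(0, 10, 50)

plt.scatter(height_m, weight_m, label="Male")
plt.scatter(height_f, weight_f, label="Female")
plt.xlabel("Height (in)")
plt.ylabel("Weight (lbs)")
plt.legend()
plt.show()
```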
Transpose and Shape Notes (Coding vs Theory)
- You can express the same linear model in multiple equivalent ways depending on the matrix orientation
- Two common forms shown:
- ŷ = X W + b (n×d times d×1 = n×1)
- y_i = W^T x_i + b (individual sample form)
- Dimensional consistency: n = number of samples; d = number of features
Practical Visualization and Interpretation (Homework of the Day)
- The course uses simple numeric demonstrations (e.g., digits 0–9 with features such as start/end, curves, intersections) to illustrate feature extraction and decision boundaries
- These exercises emphasize understanding what features convey about the target and how a model uses them
The Data Pipeline in Practice: Summary
- Data collection → Data labeling → Data preparation → Model selection → Model training → Model evaluation → Deployment
- Each step is essential for building reliable ML systems
Assignments and Resources Summary
- Assignment 1:
- Python skills test; file loading; pandas operations; plotting with Matplotlib
- Data source: 20 DL160 flights (MSP-AMS) with fields: Timestamp, Position, Altitude, Speed, Direction
- Emphasis on practicing data handling and visualization to extract insights
- Resources: Python tutorials and class tutorials; if needed, consider taking the course next semester
Final Quick Reference: Core Equations and Concepts
- Linear model (regression):
- Pointwise form: y_i = W^T x_i + b
- Dataset form: ŷ = X W + b, with X ∈ R^{n×d}, W ∈ R^{d×1}, b ∈ R
- Loss (Mean Squared Error):
- J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
- Objective: find W and b that minimize J(W) (and generalize to unseen data)
- Conceptual interpretation: Weights capture the discriminative power of features
- Data pipeline components: Features, Label, Loss Function, Optimizer, Model
- Model understanding: Your brain builds a model; in ML, we learn a model by optimization
- Ethical/Practical implications highlighted in course timeline: copyright concerns in AI-generated art, governance of AI systems, and accountability for outputs
Quick Index of Key Terms from Slides
- Artificial Intelligence (AI)
- Machine Learning (ML)
- Deep Learning (DL)
- Perceptron, ELIZA, GPS, Deep Blue, Watson, DeepFace, AlphaGo, AlphaZero, GPT-3, AlphaFold2
- Turing Test, Dartmouth Conference, ELIZA, Backpropagation, Neural Networks
- Loss Function, MSE, Optimizer, Model, Features, Label
- Data Pipeline, Convolution, strategic-game ML milestones (chess, Go, shogi), Ethical implications (copyright lawsuits)