SEIS 763 Machine Learning - Vocabulary Flashcards (Fall 2025)
Course Logistics and Orientation
Course: SEIS 763 Machine Learning (Fall 2025)
Welcome message: Lecture will be recorded
Canvas link: https://stthomas.instructure.com/courses/81396
Start Here: Syllabus, Modules, Welcome content
Course Goals
Build foundational Machine Learning concepts and algorithms
Understand basic Math principles that guide Machine Learning
Use off-the-shelf Python ML Libraries
Learn standard Training Pipelines
Data preparation
Model selection
Model training and evaluation
Course Boundaries (What this course won’t do)
Not a Python tutorial
Not intended to make you an ML expert (yet)
Not focused on advanced concepts (deep learning, reinforcement learning, computer vision, natural language processing)
Not about model deployment
Prerequisites
SEIS 631: Data Preparation and Analysis
SEIS 632: Data Analytics and Visualization (may be taken concurrently)
If you’re not comfortable with Python, go over tutorials
Start Here / Welcome Content (Overview)
Machine Learning builds computational systems that learn from and adapt to data
ML is essential in IT today and underpins applications across engineering, medicine, finance, and commerce
Course covers widely used supervised and unsupervised ML algorithms in technical depth
Emphasis on theoretical underpinnings and hands-on implementation
Students will learn to evaluate effectiveness and avoid common pitfalls when applying ML
Topics and Practice: The Coding Pyramid
The Coding Pyramid concept:
No Code
Open Source
From Scratch
Indicates progression from high-level tools to deep, hands-on implementation
AI, ML, and Deep Learning: Quick Clarifications
Artificial Intelligence (AI): broad field of making machines exhibit intelligent behavior
Machine Learning (ML): subset of AI focusing on learning from data to make predictions
Deep Learning (DL): subset of ML using deep neural networks
The slide emphasizes this hierarchy and practical distinctions
A Brief History of Artificial Intelligence (Timeline Highlights)
1943: McCulloch and Pitts propose the first mathematical model of an artificial neuron
1950: Turing’s Computing Machinery and Intelligence (proposes the Turing Test as a measure of machine intelligence)
1951–1956: early AI milestones culminate in the 1956 Dartmouth Conference, which marks the birth of AI as a field
1957: Rosenblatt’s Perceptron—the first artificial neural network capable of learning
1965: Weizenbaum develops ELIZA, an early natural language processing program that simulates conversation
1967: Newell & Simon develop the General Problem Solver (GPS)
1974: The first AI winter begins due to unmet expectations and limited progress
1980: Expert systems gain popularity for financial forecasting and medical diagnoses
1986: "Learning representations by back-propagating errors" (Rumelhart, Hinton, and Williams) popularizes backpropagation, enabling training of deeper networks
1997: IBM Deep Blue defeats Garry Kasparov in chess
1999–2010s: NLP and DL advances accelerate; more milestones listed below
2011: IBM’s Watson defeats Jeopardy! champions
2012: Deep learning gains prominence (notably AlexNet's ImageNet win); DeepMind's work draws growing attention
2014: DeepFace (Facebook) achieves near-human facial recognition accuracy
2015–2016: AlphaGo (DeepMind) defeats professional Go players, culminating in a 2016 win over top player Lee Sedol
2017: AlphaZero defeats best chess and shogi engines
2020: OpenAI GPT-3 marks a major breakthrough in NLP
2021: DeepMind's AlphaFold2 achieves a breakthrough in protein structure prediction (the protein folding problem)
2022: Controversy around LLMs, e.g., claims that Google's LaMDA had become sentient
2023: Copyright lawsuits filed against AI-art tools (Stability AI, DeviantArt, Midjourney) over training on and remixing copyrighted artworks
Additional notes include DeepMind’s acquisition by Google (~2014) for ~$500M, AlphaZero’s successive victories, etc.
The timeline highlights ethical, legal, and societal implications associated with AI and ML (e.g., copyright, accountability, transparency)
Significance: This timeline shows how ML/AI evolved from theoretical concepts to practical, high-impact technologies while highlighting ongoing ethical and societal debates.
The Human-Reasoning Bridge: Theory vs Practice
The course emphasizes walking the tight rope between theory and practice
The idea is to understand theory deeply while gaining hands-on experience with real data
The Coding Pyramid (Revisited)
From scratch to open source to no-code tools; progressively build intuition and capability
What is Machine Learning? (Core Idea)
Given some clues (features), guess the answer (prediction)
Features are the inputs that describe the data
The learning process optimizes how to map features to outputs
The prediction is denoted by ŷ (hat y)
Fundamental question: how to choose a function (model) that generalizes well to unseen data
What is the Machine Learning Problem? (Formal View)
Given clues/features, find their importance or weights that best predict the answer
In practice: learn weights W such that the model output matches the true labels
Intuition: adjust weights to improve predictions on training data and generalize to new data
The 20 Questions Analogy
ML can be thought of as iteratively asking questions to home in on the right answer by adjusting internal parameters (weights) based on feedback (loss)
Components of a Machine Learning System
Features
Label (Output / Target)
Loss (Cost) Function
Optimizer (Algorithm to adjust weights)
Model (The function that maps features to predictions)
Features, Labels, and the Learning Objective
Features: x (input vector)
Label: y (true output)
Prediction: ŷ (predicted output)
Loss measures how far ŷ is from y
Objective: learn weights W (and bias b) to minimize the loss
Features: A Simple Intuition with Numbers
A "What number am I thinking of?" example uses the digits 0–9 to illustrate features and classification
Visualization exercises describe each digit by properties such as start/end points, curved edges, and self-intersections to motivate feature selection and decision boundaries
This kind of exercise motivates the idea that features describe data points and that a learning system will try to separate or predict based on those features
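A minimal Python sketch of this idea (the feature names and values below are hypothetical, chosen only to show how features describe digits):
    # Hypothetical hand-crafted features for a few digits: (has curved edges, number of self-intersections)
    digit_features = {
        0: (True, 0),
        1: (False, 0),
        4: (False, 1),
        8: (True, 1),
    }
    def guess_digit(has_curves, num_intersections):
        # Toy "model": return every digit whose features match the given clues
        return [d for d, f in digit_features.items() if f == (has_curves, num_intersections)]
    print(guess_digit(True, 1))  # -> [8]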
The Linear Model: Form and Intuition
Model form (regression):
Pointwise form: y_i = W^T x_i + b
Matrix form: ŷ = X W + b
Notation:
n: total samples
d: number of features (dimensions)
X ∈ R^{n×d} (rows are samples, columns are features)
W ∈ R^{d×1}
b ∈ R (bias term)
Alternative perspective: sometimes written as y = XW + b with y ∈ R^{n×1}
Geometric interpretation: W contains discriminative power for each feature; larger magnitude means the feature has more influence on the prediction
Two Common Linear Forms (Equivalent Views)
View 1 (sample-wise): y_i = W^T x_i + b
View 2 (dataset-wide): ŷ = X W + b
The goal: find W and b that minimize the chosen loss on the training data
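A minimal NumPy sketch (with made-up numbers) confirming the two views produce identical predictions:
    import numpy as np
    n, d = 5, 3
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, d))      # X in R^{n x d}
    W = rng.normal(size=(d, 1))      # W in R^{d x 1}
    b = 0.5                          # bias, broadcast to every sample
    y_hat_dataset = X @ W + b                                        # View 2: all samples at once, shape (n, 1)
    y_hat_samplewise = np.array([W.T @ X[i] + b for i in range(n)])  # View 1: one sample at a time, shape (n, 1)
    print(np.allclose(y_hat_dataset, y_hat_samplewise))              # True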
The Learning Question: How Do You Learn W?
You define a loss function that measures prediction error
You then optimize W (and b) to minimize the loss
The idea is to adjust W in the direction that reduces the loss (gradient-based optimization in many cases)
In class, a simple illustrative objective is shown as finding the slope W that minimizes the Mean Squared Error (MSE)
Loss (Cost) Function: Why It Matters
Loss measures the discrepancy between true labels y and predictions ŷ
Common choice for regression: Mean Squared Error (MSE)
Formula:
J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
Aim: find W (and b) that minimize J(W)
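A minimal NumPy sketch of this loss, assuming y and ŷ are arrays of the same shape:
    import numpy as np
    def mse(y_true, y_pred):
        # J = (1/n) * sum_i (y_hat_i - y_i)^2
        y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
        return np.mean((y_pred - y_true) ** 2)
    print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.417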
Conceptual Visual: Gradient Descent (Intuition)
Loss landscape: as you adjust W, the loss value changes
The optimizer moves in the direction of decreasing loss (toward a minimum)
Special case shown: the line slope W that minimizes J, yielding perfect or near-perfect predictions on training data
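A minimal gradient-descent sketch for the 1-D model ŷ = W x + b under MSE (the synthetic data and learning rate below are illustrative choices, not values from the slides):
    import numpy as np
    rng = np.random.default_rng(42)
    x = rng.uniform(0, 10, size=100)
    y = 3.0 * x + 2.0 + rng.normal(scale=0.5, size=100)   # true W = 3, b = 2, plus noise
    W, b, lr = 0.0, 0.0, 0.01
    for step in range(2000):
        err = (W * x + b) - y           # prediction error on the whole batch
        W -= lr * 2 * np.mean(err * x)  # gradient of MSE with respect to W
        b -= lr * 2 * np.mean(err)      # gradient of MSE with respect to b
    print(round(W, 2), round(b, 2))     # close to 3.0 and 2.0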
A Simple Linear Regression Example (Key Takeaways)
Model: ŷ = W x + b
If W = 1 and b = 0, then ŷ = x (identity mapping)
If W ≠ 1 or b ≠ 0, predictions scale/shift relative to x
The optimal W* and b* minimize J(W, b)
Empirical demonstration: a table shows different W values with corresponding loss, highlighting that W* produces the smallest loss
Notation for the learned model: y = W^T x + b
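A minimal sketch of the kind of table described above: sweep a few candidate values of W (with b fixed at 0) and record the loss for each, so the smallest loss identifies W*:
    import numpy as np
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = x.copy()                        # identity mapping: true W = 1, b = 0
    for W in [0.0, 0.5, 1.0, 1.5, 2.0]:
        loss = np.mean((W * x - y) ** 2)
        print(f"W = {W:.1f} -> MSE = {loss:.2f}")
    # The loss is exactly 0 at W = 1.0, the identity mapping from the example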
Regression vs Classification: Conceptual Difference (Brief Mention)
Regression: predict a continuous value (e.g., height, weight, flight duration)
Classification: predict a discrete label (e.g., male/female; digit class 0–9)
The same linear model concepts apply with different loss functions (e.g., MSE for regression, cross-entropy for classification)
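A minimal scikit-learn sketch of the distinction (toy data made up for illustration): LinearRegression fits a continuous target with squared error, while LogisticRegression fits a discrete label with a cross-entropy-style loss:
    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y_reg = np.array([2.1, 3.9, 6.2, 7.8])        # continuous target -> regression
    y_cls = np.array([0, 0, 1, 1])                # discrete label -> classification
    print(LinearRegression().fit(X, y_reg).predict([[5.0]]))   # a continuous value near 10
    print(LogisticRegression().fit(X, y_cls).predict([[5.0]])) # a class label, e.g. [1]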
The Standard Data Pipeline (High-Level)
Practical sequence:
Data collection
Data labeling
Data preparation
Model selection
Model training
Model evaluation
Model deployment
Reference slide emphasizes a standard, end-to-end workflow
Practical Demo and Assignments (What’s Coming)
Python Demo and Assignment 1 components:
conda setup
Jupyter Lab
Python basics
Submission guide
Assignment 1 details:
Due 09/16/2025 at 5:00 PM
Tests basic Python skills
Loading multiple files
Basic data operations using pandas
Plotting using Matplotlib
Data source: 20 DL160 flights (MSP-AMS)
Data fields: Timestamp, Position, Altitude, Speed, Direction
Deliverables: plot and generate insights (a minimal loading-and-plotting sketch follows at the end of this section)
If you struggle, review Python resources and consider taking the class next semester
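A hedged sketch of the kind of workflow Assignment 1 asks for. The folder name, file layout, and the assumption of one CSV per flight are illustrative guesses; only the field names (Timestamp, Position, Altitude, Speed, Direction) come from the assignment description:
    import glob
    import pandas as pd
    import matplotlib.pyplot as plt
    frames = []
    for path in sorted(glob.glob("flights/*.csv")):   # assumption: one CSV per flight
        df = pd.read_csv(path)
        df["flight_file"] = path                      # remember which file each row came from
        frames.append(df)
    flights = pd.concat(frames, ignore_index=True)
    print(flights.head())                                      # basic inspection
    print(flights.groupby("flight_file")["Altitude"].max())    # a simple per-flight summary
    one = flights[flights["flight_file"] == flights["flight_file"].iloc[0]]
    plt.plot(range(len(one)), one["Altitude"])                 # altitude profile of one flight
    plt.xlabel("Sample index")
    plt.ylabel("Altitude")
    plt.title("Altitude profile for one flight")
    plt.show()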
Reference Plots for Flight Data (Q5 & Q6 Examples)
Q5 Reference Plot: Flight Duration (HH:MM) vs. Departure Date (2023-10-)
Example labels shown: Flight Duration (HH:MM) and Departure Date
Q6 Reference Plot: Time (HH:MM) breakdown by phase
Phases: On-ground before takeoff, In-air time, On-ground after landing
These plots illustrate typical ML/data visualization tasks (time-series, phase breakdowns, distributions)
Formulating a Machine Learning Problem (Detailed View)
Input values/features: x
Example: the digit 8 described by its features
Label/Output: y
Objective: learn weights W such that x W ≈ y
For 1D inputs, this reduces to a simple linear relation: y = W x + b
Linear Model Details: Notation and Dimensions
Model: y = W x + b
General form for multiple samples:
y ∈ R^{n×1} (n samples)
X ∈ R^{n×d} (features)
W ∈ R^{d×1}
b ∈ R (bias, broadcast to all samples)
Transposed form (for other conventions):
y^T = W^T X^T + b^T
The essence: labels are predicted via a weighted combination of features plus a bias term
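A minimal NumPy sketch checking these shapes (arbitrary values); the scalar bias b is broadcast across all n samples:
    import numpy as np
    n, d = 4, 2
    X = np.arange(n * d, dtype=float).reshape(n, d)   # X in R^{n x d}
    W = np.array([[0.5], [-1.0]])                     # W in R^{d x 1}
    b = 2.0                                           # bias, added to every sample via broadcasting
    y = X @ W + b
    print(X.shape, W.shape, y.shape)                  # (4, 2) (2, 1) (4, 1)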
Label, Feature, Loss, Optimizer, and Model (Summarized)
Label: y (true value)
Features: x (input vector)
Loss: measures prediction error between y and ŷ
Optimizer: algorithm that updates weights to reduce loss
Model: the functional form that maps features to predictions
A Simple 1-D Example: Feature-Based Classification/Regression Intuition
Features can be scalars or vectors; weights combine them into a prediction
The loss function guides how to adjust the weights to better fit the data
Height/Weight Distribution by Sex (Example Visualization)
A sample scatter-like concept: Height (in) vs Weight (lbs), separated by Sex (Male, Female)
Demonstrates how linear relationships can differ across groups and how features might separate classes or predict a continuous target differently by group
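A minimal Matplotlib sketch of this kind of visualization; the heights and weights below are synthetic numbers invented purely to show the plotting pattern:
    import numpy as np
    import matplotlib.pyplot as plt
    rng = np.random.default_rng(1)
    h_f = rng.normal(64, 2.5, 100); w_f = 3.5 * h_f - 120 + rng.normal(0, 10, 100)   # synthetic "Female" group
    h_m = rng.normal(70, 2.5, 100); w_m = 4.0 * h_m - 130 + rng.normal(0, 12, 100)   # synthetic "Male" group
    plt.scatter(h_f, w_f, label="Female", alpha=0.6)
    plt.scatter(h_m, w_m, label="Male", alpha=0.6)
    plt.xlabel("Height (in)")
    plt.ylabel("Weight (lbs)")
    plt.legend()
    plt.title("Height vs Weight by Sex (synthetic data)")
    plt.show()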
Transpose and Shape Notes (Coding vs Theory)
You can express the same linear model in multiple equivalent ways depending on the matrix orientation
Two common forms shown:
ŷ = X W + b (n×d times d×1 = n×1)
y_i = W^T x_i + b (individual sample form)
Dimensional consistency: n = number of samples; d = number of features
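A tiny NumPy check (arbitrary values) that the two orientations agree: transposing ŷ = X W + b gives the W^T X^T + b form:
    import numpy as np
    rng = np.random.default_rng(0)
    X = rng.normal(size=(4, 3))           # n x d
    W = rng.normal(size=(3, 1))           # d x 1
    b = 0.25
    y_col = X @ W + b                     # shape (n, 1)
    y_row = W.T @ X.T + b                 # shape (1, n), the transposed convention
    print(np.allclose(y_col, y_row.T))    # True: same predictions, different orientation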
Practical Visualization and Interpretation (Homework of the Day)
The course uses simple numeric demonstrations (e.g., digits 0–9 with features such as start/end, curves, intersections) to illustrate feature extraction and decision boundaries
These exercises emphasize understanding what features convey about the target and how a model uses them
The Data Pipeline in Practice: Summary
Data collection → Data labeling → Data preparation → Model selection → Model training → Model evaluation → Deployment
Each step is essential for building reliable ML systems
Assignments and Resources Summary
Assignment 1:
Python skills test; file loading; pandas operations; plotting with Matplotlib
Data source: 20 DL160 flights (MSP-AMS) with fields: Timestamp, Position, Altitude, Speed, Direction
Emphasis on practicing data handling and visualization to extract insights
Resources: Python tutorials and class tutorials; if needed, consider taking the course next semester
Final Quick Reference: Core Equations and Concepts
Linear model (regression):
Pointwise form: y_i = W^T x_i + b
Dataset form: ŷ = X W + b, with X ∈ R^{n×d}, W ∈ R^{d×1}, b ∈ R
Loss (Mean Squared Error):
J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
Objective: find W and b that minimize J(W) and generalize to unseen data (a minimal least-squares sketch appears at the end of this quick reference)
Conceptual interpretation: Weights capture the discriminative power of features
Data pipeline components: Features, Label, Loss Function, Optimizer, Model
Model understanding: Your brain builds a model; in ML, we learn a model by optimization
Ethical/Practical implications highlighted in course timeline: copyright concerns in AI-generated art, governance of AI systems, and accountability for outputs
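A minimal end-to-end sketch tying these equations together with ordinary least squares on synthetic data (a column of ones is appended so the bias b is learned as the last weight; the true parameters are made up for the demo):
    import numpy as np
    rng = np.random.default_rng(7)
    n, d = 200, 3
    X = rng.normal(size=(n, d))
    true_W, true_b = np.array([1.5, -2.0, 0.5]), 0.75
    y = X @ true_W + true_b + rng.normal(scale=0.1, size=n)
    X_aug = np.hstack([X, np.ones((n, 1))])            # append a ones column for the bias
    params, *_ = np.linalg.lstsq(X_aug, y, rcond=None) # closed-form least-squares fit
    W_hat, b_hat = params[:d], params[d]
    mse = np.mean((X_aug @ params - y) ** 2)
    print(np.round(W_hat, 2), round(b_hat, 2), round(mse, 4))  # near [1.5, -2.0, 0.5], 0.75, ~0.01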
Quick Index of Key Terms from Slides
Artificial Intelligence (AI)
Machine Learning (ML)
Deep Learning (DL)
Perceptron, ELIZA, GPS, Deep Blue, Watson, DeepFace, AlphaGo, AlphaZero, GPT-3, AlphaFold2
Turing Test, Dartmouth Conference, ELIZA, Backpropagation, Neural Networks
Loss Function, MSE, Optimizer, Model, Features, Label
Data Pipeline, Convolution, game-playing ML milestones (chess, Go, shogi), Ethical implications (copyright lawsuits)