SEIS 763 Machine Learning - Vocabulary Flashcards (Fall 2025)

Course Logistics and Orientation

  • Course: SEIS 763 Machine Learning (Fall 2025)

  • Welcome message: lectures will be recorded

  • Canvas link: https://stthomas.instructure.com/courses/81396

  • Start Here: Syllabus, Modules, Welcome content

Course Goals

  • Build foundational Machine Learning concepts and algorithms

  • Understand basic Math principles that guide Machine Learning

  • Use off-the-shelf Python ML Libraries

  • Learn standard Training Pipelines

    • Data preparation

    • Model selection

    • Model training and evaluation

Course Boundaries (What this course won’t do)

  • Not a Python tutorial

  • Not intended to make you an ML expert (yet)

  • Not focused on advanced concepts (deep learning, reinforcement learning, computer vision, natural language processing)

  • Not about model deployment

Prerequisites

  • SEIS 631: Data Preparation and Analysis

  • SEIS 632: Data Analytics and Visualization (may be taken concurrently)

  • If you’re not comfortable with Python, work through Python tutorials before the course begins

Start Here / Welcome Content (Overview)

  • Machine Learning builds computational systems that learn from and adapt to data

  • ML is essential in IT today and underpins applications across engineering, medicine, finance, and commerce

  • Course covers widely used supervised and unsupervised ML algorithms in technical depth

  • Emphasis on theoretical underpinnings and hands-on implementation

  • Students will learn to evaluate effectiveness and avoid common pitfalls when applying ML

Topics and Practice: The Coding Pyramid

  • The Coding Pyramid concept:

    • No Code

    • Open Source

    • From Scratch

  • Indicates progression from high-level tools to deep, hands-on implementation

AI, ML, and Deep Learning: Quick Clarifications

  • Artificial Intelligence (AI): broad field of making machines exhibit intelligent behavior

  • Machine Learning (ML): subset of AI focusing on learning from data to make predictions

  • Deep Learning (DL): subset of ML using deep neural networks

  • The slide emphasizes this hierarchy and practical distinctions

A Brief History of Artificial Intelligence (Timeline Highlights)

  • 1943: McCulloch and Pitts publish the first mathematical model of an artificial neuron

  • 1950: Turing’s Computing Machinery and Intelligence (proposes the Turing Test as a measure of machine intelligence)

  • 1951–1956: early AI milestones culminate in the 1956 Dartmouth Conference, which marks the birth of AI as a field

  • 1957: Rosenblatt’s Perceptron—the first artificial neural network capable of learning

  • 1957–1959: Newell, Shaw, and Simon develop the General Problem Solver (GPS)

  • 1965: Weizenbaum develops ELIZA, an early natural language processing program that simulates conversation

  • 1974: The first AI winter begins due to unmet expectations and limited progress

  • 1980: Expert systems gain popularity for financial forecasting and medical diagnoses

  • 1986: Learning representations by back-propagating errors enables training of deeper networks (Rumelhart, Hinton, Williams)

  • 1997: IBM Deep Blue defeats Garry Kasparov in chess

  • 1999–2010s: NLP and DL advances accelerate; more milestones listed below

  • 2011: IBM’s Watson defeats Jeopardy! champions

  • 2012: Deep learning gains mainstream prominence (e.g., AlexNet’s ImageNet win); DeepMind’s deep learning work attracts attention

  • 2014: DeepFace (Facebook) achieves near-human facial recognition accuracy

  • 2016: AlphaGo (DeepMind) defeats top Go player Lee Sedol

  • 2017: AlphaZero defeats best chess and shogi engines

  • 2020: OpenAI GPT-3 marks a major breakthrough in NLP

  • 2021: DeepMind’s AlphaFold2 achieves breakthrough accuracy on the protein-folding problem

  • 2022: Controversies around LLMs (e.g., Google’s LaMDA) and claims of perceived sentience

  • 2023: Legal actions against AI-art tools (Stability AI, DeviantArt, Midjourney) over copyright concerns about remixing artists’ works

  • Additional notes: Google’s acquisition of DeepMind (2014, ~$500M), AlphaZero’s successive victories, etc.

  • The timeline highlights ethical, legal, and societal implications associated with AI and ML (e.g., copyright, accountability, transparency)

Significance: This timeline shows how ML/AI evolved from theoretical concepts to practical, high-impact technologies while highlighting ongoing ethical and societal debates.

The Human-Reasoning Bridge: Theory vs Practice

  • The course emphasizes walking the tightrope between theory and practice

  • The idea is to understand theory deeply while gaining hands-on experience with real data

The Coding Pyramid (Revisited)

  • From no-code tools to open source to from-scratch implementation; progressively build intuition and capability

What is Machine Learning? (Core Idea)

  • Given some clues (features), guess the answer (prediction)

  • Features are the inputs that describe the data

  • The learning process optimizes how to map features to outputs

  • The prediction is denoted ŷ (read “y-hat”)

  • Fundamental question: how to choose a function (model) that generalizes well to unseen data

What is the Machine Learning Problem? (Formal View)

  • Given clues/features, find their importance or weights that best predict the answer

  • In practice: learn weights W such that the model output matches the true labels

  • Intuition: adjust weights to improve predictions on training data and generalize to new data

The 20 Questions Analogy

  • ML can be thought of as iteratively asking questions to home in on the right answer by adjusting internal parameters (weights) based on feedback (loss)

Components of a Machine Learning System

  • Features

  • Label (Output / Target)

  • Loss (Cost) Function

  • Optimizer (Algorithm to adjust weights)

  • Model (The function that maps features to predictions)

Features, Labels, and the Learning Objective

  • Features: x (input vector)

  • Label: y (true output)

  • Prediction: ŷ (predicted output)

  • Loss measures how far ŷ is from y

  • Objective: learn weights W (and bias b) to minimize the loss

Features: A Simple Intuition with Numbers

  • A “What number am I thinking of?” exercise uses digits 0–9 to illustrate features and classification

  • Visualization exercises use numbers with specific properties (start/end points, curved edges, self-intersections) to motivate feature selection and decision boundaries

  • This kind of exercise motivates the idea that features describe data points and that a learning system will try to separate or predict based on those features
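
  • One way to make this concrete is to encode each digit with simple hand-crafted features; a minimal Python sketch (the particular encodings are illustrative choices, not from the slides):

      # Hand-crafted digit features: (has_curve, has_straight_line, self_intersects)
      # These encodings are illustrative; other reasonable choices exist.
      digit_features = {
          0: (1, 0, 0),
          1: (0, 1, 0),
          4: (0, 1, 1),   # depends on how the 4 is drawn (open vs. closed top)
          8: (1, 0, 1),   # curves plus a self-intersection
      }

      # A learning system would assign weights to such features
      # in order to separate or predict the digits.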

The Linear Model: Form and Intuition

  • Model form (regression):

    • Pointwise form: ŷ_i = W^T x_i + b

    • Matrix form: ŷ = X W + b

  • Notation:

    • n: total samples

    • d: number of features (dimensions)

    • X ∈ R^{n×d} (rows are samples, columns are features)

    • W ∈ R^{d×1}

    • b ∈ R (bias term)

  • Alternative perspective: sometimes written as y = XW + b with y ∈ R^{n×1}

  • Geometric interpretation: W contains discriminative power for each feature; larger magnitude means the feature has more influence on the prediction

Two Common Linear Forms (Equivalent Views)

  • View 1 (sample-wise): ŷ_i = W^T x_i + b

  • View 2 (dataset-wide): ŷ = X W + b

  • The goal: find W and b that minimize the chosen loss on the training data
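
  • A minimal NumPy sketch (toy shapes, assumed for illustration) confirming that the two forms produce identical predictions:

      import numpy as np

      rng = np.random.default_rng(0)
      n, d = 4, 3
      X = rng.normal(size=(n, d))    # X ∈ R^{n×d}: rows are samples
      W = rng.normal(size=(d, 1))    # W ∈ R^{d×1}: one weight per feature
      b = 0.5                        # scalar bias, broadcast to all samples

      # View 2 (dataset-wide): ŷ = XW + b, shape (n, 1)
      y_hat = X @ W + b

      # View 1 (sample-wise): ŷ_i = W^T x_i + b, one sample at a time
      y_hat_pointwise = np.array([(W.T @ X[i] + b).item() for i in range(n)])

      assert np.allclose(y_hat.ravel(), y_hat_pointwise)   # identical results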

The Learning Question: How Do You Learn W?

  • You define a loss function that measures prediction error

  • You then optimize W (and b) to minimize the loss

  • The idea is to adjust W in the direction that reduces the loss (gradient-based optimization in many cases)

  • In class, a simple illustrative objective is shown as finding the slope W that minimizes the Mean Squared Error (MSE)

Loss (Cost) Function: Why It Matters

  • Loss measures the discrepancy between true labels y and predictions ŷ

  • Common choice for regression: Mean Squared Error (MSE)

  • Formula: J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2

  • Aim: find W (and b) that minimize J(W)
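
  • As a quick sanity check, MSE is one line of NumPy (the toy arrays are assumed for illustration):

      import numpy as np

      y     = np.array([1.0, 2.0, 3.0])   # true labels
      y_hat = np.array([1.1, 1.9, 3.2])   # model predictions

      # J = (1/n) · Σ (ŷ_i − y_i)²
      mse = np.mean((y_hat - y) ** 2)
      print(mse)                           # ≈ 0.02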

Conceptual Visual: Gradient Descent (Intuition)

  • Loss landscape: as you adjust W, the loss value changes

  • The optimizer moves in the direction of decreasing loss (toward a minimum)

  • Special case shown: the line slope W that minimizes J, yielding perfect or near-perfect predictions on training data
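
  • A minimal gradient-descent sketch for 1-D linear regression (synthetic data; the learning rate and iteration count are illustrative, not course-prescribed):

      import numpy as np

      # Synthetic data generated from y = 2x + 1 (an assumption for this demo)
      x = np.linspace(0.0, 1.0, 50)
      y = 2.0 * x + 1.0

      W, b, lr = 0.0, 0.0, 0.1           # initial slope, bias, learning rate
      for _ in range(2000):
          y_hat = W * x + b
          grad_W = 2.0 * np.mean((y_hat - y) * x)   # ∂J/∂W for J = mean((ŷ − y)²)
          grad_b = 2.0 * np.mean(y_hat - y)         # ∂J/∂b
          W -= lr * grad_W                          # step toward decreasing loss
          b -= lr * grad_b

      print(W, b)                        # converges toward W ≈ 2.0, b ≈ 1.0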

A Simple Linear Regression Example (Key Takeaways)

  • Model: ŷ = W x + b

  • If W = 1 and b = 0, then ŷ = x (identity mapping)

  • If W ≠ 1 or b ≠ 0, predictions scale/shift relative to x

  • The optimal W* and b* minimize J(W, b)

  • Empirical demonstration: a table shows different W values with corresponding loss, highlighting that W* produces the smallest loss

  • Notation for the learned model: ŷ = W^T x + b (a sweep over candidate W values is sketched below)
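
  • The in-class table of W values vs. loss can be reproduced with a small sweep over candidate slopes (toy data with true W* = 1 and b = 0, assumed for illustration):

      import numpy as np

      x = np.array([1.0, 2.0, 3.0, 4.0])
      y = x.copy()                         # identity mapping: true W* = 1, b* = 0

      for W in (0.0, 0.5, 1.0, 1.5, 2.0):
          loss = np.mean((W * x - y) ** 2)           # b held at 0 for simplicity
          print(f"W = {W:.1f}  ->  J = {loss:.3f}")  # J is smallest at W* = 1.0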

Regression vs Classification: Conceptual Difference (Brief Mention)

  • Regression: predict a continuous value (e.g., height, weight, flight duration)

  • Classification: predict a discrete label (e.g., male/female; digit class)

  • The same linear model concepts apply with different loss functions (e.g., MSE for regression, cross-entropy for classification)

The Standard Data Pipeline (High-Level)

  • Practical sequence:

    • Data collection

    • Data labeling

    • Data preparation

    • Model selection

    • Model training

    • Model evaluation

    • Model deployment

  • Reference slide emphasizes a standard, end-to-end workflow
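
  • A sketch of how the training and evaluation stages might look with scikit-learn (the synthetic dataset, model choice, and split ratio are all illustrative assumptions):

      from sklearn.datasets import make_regression
      from sklearn.linear_model import LinearRegression
      from sklearn.metrics import mean_squared_error
      from sklearn.model_selection import train_test_split

      # Data collection/preparation: a synthetic stand-in for a real dataset
      X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(
          X, y, test_size=0.25, random_state=0
      )

      # Model selection and training
      model = LinearRegression().fit(X_train, y_train)

      # Model evaluation (deployment would follow, but is out of scope for this course)
      print(mean_squared_error(y_test, model.predict(X_test)))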

Practical Demo and Assignments (What’s Coming)

  • Python Demo and Assignment 1 components:

    • conda setup

    • Jupyter Lab

    • Python basics

    • Submission guide

  • Assignment 1 details:

    • Due 09/16/2025 at 5:00 PM

    • Tests basic Python skills

    • Loading multiple files

    • Basic data operations using pandas

    • Plotting using Matplotlib

    • Data source: 20 DL160 flights (MSP-AMS)

    • Data fields: Timestamp, Position, Altitude, Speed, Direction

    • Deliverables: plot and generate insights (see the starter sketch after this list)

  • If you struggle, review Python resources and consider taking the class next semester
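
  • A starter sketch for Assignment 1 (the file-name pattern, directory layout, and column parsing are assumptions; check the actual flight files):

      import glob

      import matplotlib.pyplot as plt
      import pandas as pd

      # Load multiple flight files (hypothetical "data/DL160_*.csv" pattern; adjust to the real layout)
      frames = []
      for path in sorted(glob.glob("data/DL160_*.csv")):
          df = pd.read_csv(path, parse_dates=["Timestamp"])  # assumes a Timestamp column
          df["flight_file"] = path                           # tag rows with their source file
          frames.append(df)
      flights = pd.concat(frames, ignore_index=True)

      # Basic pandas operations: summary statistics for two of the fields
      print(flights[["Altitude", "Speed"]].describe())

      # Simple Matplotlib plot: altitude profile of the first flight
      first = flights[flights["flight_file"] == flights["flight_file"].iloc[0]]
      plt.plot(first["Timestamp"], first["Altitude"])
      plt.xlabel("Timestamp")
      plt.ylabel("Altitude")
      plt.title("Altitude over time (one flight)")
      plt.show()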

Reference Plots for Flight Data (Q5 & Q6 Examples)

  • Q5 Reference Plot: Flight Duration (HH:MM) vs. Departure Date (October 2023)

    • Example labels shown: Flight Duration (HH:MM) and Departure Date

  • Q6 Reference Plot: Time (HH:MM) breakdown by phase

    • Phases: On-ground before takeoff, In-air time, On-ground after landing

  • These plots illustrate typical ML/data visualization tasks (time-series, phase breakdowns, distributions)
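
  • For a Q6-style phase breakdown, one option is a stacked bar chart; a minimal Matplotlib sketch (the phase durations are made-up placeholders):

      import matplotlib.pyplot as plt

      flights       = ["Flight 1", "Flight 2", "Flight 3"]   # placeholder labels
      ground_before = [0.4, 0.6, 0.5]                        # hours, made-up values
      in_air        = [8.1, 8.3, 8.0]
      ground_after  = [0.2, 0.3, 0.4]

      # Stack the three phases per flight
      plt.bar(flights, ground_before, label="On-ground before takeoff")
      plt.bar(flights, in_air, bottom=ground_before, label="In-air time")
      offsets = [g + a for g, a in zip(ground_before, in_air)]
      plt.bar(flights, ground_after, bottom=offsets, label="On-ground after landing")
      plt.ylabel("Time (hours)")
      plt.legend()
      plt.show()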

Formulating a Machine Learning Problem (Detailed View)

  • Input values/features: 𝑥

  • Example: the digit 8, described by its features (e.g., curved edges, a self-intersection)

  • Label/Output: 𝑦

  • Objective: Learn weights W such that 𝑥 W ≈ 𝑦

  • For 1D inputs, this reduces to a simple linear relation: y = W x + b

Linear Model Details: Notation and Dimensions

  • Model: y = W x + b

  • General form for multiple samples:

    • y ∈ R^{n×1} (n samples)

    • X ∈ R^{n×d} (features)

    • W ∈ R^{d×1}

    • b ∈ R (bias, broadcast to all samples)

  • Transposed form (for other conventions):

    • y^T = W^T X^T + b^T

  • The essence: labels are predicted via a weighted combination of features plus a bias term
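
  • The transposed convention follows by taking the transpose of the dataset-wide form (for a scalar bias, b^T = b); in LaTeX:

      \hat{y} = XW + b
      \quad\Longrightarrow\quad
      \hat{y}^{\top} = (XW + b)^{\top} = W^{\top} X^{\top} + b^{\top}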

Label, Feature, Loss, Optimizer, and Model (Summarized)

  • Label: y (true value)

  • Features: x (input vector)

  • Loss: measures prediction error between y and ŷ

  • Optimizer: algorithm that updates weights to reduce loss

  • Model: the functional form that maps features to predictions

A Simple 1-D Example: Feature-Based Classification/Regression Intuition

  • Features can be scalars or vectors; weights combine them into a prediction

  • The loss function guides how to adjust the weights to better fit the data

Height/Weight Distribution by Sex (Example Visualization)

  • A sample scatter plot: Height (in) vs. Weight (lbs), separated by Sex (Male, Female)

  • Demonstrates how linear relationships can differ across groups and how features might separate classes or predict a continuous target differently by group

Transpose and Shape Notes (Coding vs Theory)

  • You can express the same linear model in multiple equivalent ways depending on the matrix orientation

  • Two common forms shown:

    • ŷ = X W + b (n×d times d×1 = n×1)

    • ŷ_i = W^T x_i + b (individual sample form)

  • Dimensional consistency: n = number of samples; d = number of features

Practical Visualization and Interpretation (Homework of the Day)

  • The course uses simple numeric demonstrations (e.g., digits 0–9 with features such as start/end, curves, intersections) to illustrate feature extraction and decision boundaries

  • These exercises emphasize understanding what features convey about the target and how a model uses them

The Data Pipeline in Practice: Summary

  • Data collection → Data labeling → Data preparation → Model selection → Model training → Model evaluation → Deployment

  • Each step is essential for building reliable ML systems

Assignments and Resources Summary

  • Assignment 1:

    • Python skills test; file loading; pandas operations; plotting with Matplotlib

    • Data source: 20 DL160 flights (MSP-AMS) with fields: Timestamp, Position, Altitude, Speed, Direction

    • Emphasis on practicing data handling and visualization to extract insights

  • Resources: Python tutorials and class tutorials; if needed, consider taking the course next semester

Final Quick Reference: Core Equations and Concepts

  • Linear model (regression):

    • Pointwise form: ŷ_i = W^T x_i + b

    • Dataset form: ŷ = X W + b, with X ∈ R^{n×d}, W ∈ R^{d×1}, b ∈ R

  • Loss (Mean Squared Error):

    • J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2

  • Objective: find W and b that minimize J(W) (and generalize to unseen data)

  • Conceptual interpretation: Weights capture the discriminative power of features

  • Data pipeline components: Features, Label, Loss Function, Optimizer, Model

  • Model understanding: Your brain builds a model; in ML, we learn a model by optimization

  • Ethical/Practical implications highlighted in course timeline: copyright concerns in AI-generated art, governance of AI systems, and accountability for outputs

Quick Index of Key Terms from Slides

  • Artificial Intelligence (AI)

  • Machine Learning (ML)

  • Deep Learning (DL)

  • Perceptron, ELIZA, GPS, Deep Blue, Watson, DeepFace, AlphaGo, AlphaZero, GPT-3, AlphaFold2

  • Turing Test, Dartmouth Conference, Backpropagation, Neural Networks

  • Loss Function, MSE, Optimizer, Model, Features, Label

  • Data Pipeline, Convolution, game-playing ML milestones (chess, Go, shogi), ethical implications (copyright lawsuits)