SEIS 763 Machine Learning - Vocabulary Flashcards (Fall 2025)

Course Logistics and Orientation

  • Course: SEIS 763 Machine Learning (Fall 2025)
  • Welcome message: lectures will be recorded
  • Canvas link: https://stthomas.instructure.com/courses/81396
  • Start Here: Syllabus, Modules, Welcome content

Course Goals

  • Build foundational Machine Learning concepts and algorithms
  • Understand basic Math principles that guide Machine Learning
  • Use off-the-shelf Python ML Libraries
  • Learn standard Training Pipelines
    • Data preparation
    • Model selection
    • Model training and evaluation

Course Boundaries (What this course won’t do)

  • Not a Python tutorial
  • Not intended to make you an ML expert (yet)
  • Not focused on advanced concepts (deep learning, reinforcement learning, computer vision, natural language processing)
  • Not about model deployment

Prerequisites

  • SEIS 631: Data Preparation and Analysis
  • SEIS 632: Data Analytics and Visualization (may be taken concurrently)
  • If you’re not comfortable with Python, work through the recommended tutorials first

Start Here / Welcome Content (Overview)

  • Machine Learning builds computational systems that learn from and adapt to data
  • ML is essential in IT today and underpins applications across engineering, medicine, finance, and commerce
  • Course covers widely used supervised and unsupervised ML algorithms in technical depth
  • Emphasis on theoretical underpinnings and hands-on implementation
  • Students will learn to evaluate effectiveness and avoid common pitfalls when applying ML

Topics and Practice: The Coding Pyramid

  • The Coding Pyramid concept:
    • No Code
    • Open Source
    • From Scratch
  • Indicates progression from high-level tools to deep, hands-on implementation

AI, ML, and Deep Learning: Quick Clarifications

  • Artificial Intelligence (AI): broad field of making machines exhibit intelligent behavior
  • Machine Learning (ML): subset of AI focusing on learning from data to make predictions
  • Deep Learning (DL): subset of ML using deep neural networks
  • The slide emphasizes this hierarchy and practical distinctions

A Brief History of Artificial Intelligence (Timeline Highlights)

  • 1943: McCulloch & Pitts propose the first mathematical model of an artificial neuron, laying groundwork for neural networks
  • 1950: Turing’s Computing Machinery and Intelligence (proposes the Turing Test as a measure of machine intelligence)
  • 1951–1956: Early AI milestones culminate in the 1956 Dartmouth Conference, which marks the birth of AI as a field
  • 1957: Rosenblatt’s Perceptron—the first artificial neural network capable of learning
  • 1957–1959: Newell & Simon develop the General Problem Solver (GPS)
  • 1966: Weizenbaum develops ELIZA, an early natural language processing program that simulates conversation
  • 1974: The first AI winter begins due to unmet expectations and limited progress
  • 1980: Expert systems gain popularity for financial forecasting and medical diagnoses
  • 1986: Learning representations by back-propagating errors enables training of deeper networks (Rumelhart, Hinton & Williams)
  • 1997: IBM Deep Blue defeats Garry Kasparov in chess
  • 1999–2010s: NLP and DL advances accelerate; more milestones listed below
  • 2011: IBM’s Watson defeats Jeopardy! champions
  • 2012: Deep learning gains prominence (AlexNet wins the ImageNet challenge); DeepMind’s deep-learning work also draws attention
  • 2014: DeepFace (Facebook) achieves near-human facial recognition accuracy
  • 2016: AlphaGo (DeepMind) defeats top Go player Lee Sedol (after beating European champion Fan Hui in 2015)
  • 2017: AlphaZero defeats best chess and shogi engines
  • 2020: OpenAI GPT-3 marks a major breakthrough in NLP
  • 2021: DeepMind’s AlphaFold2 achieves a breakthrough in protein structure prediction
  • 2022: Controversies around LLMs (e.g., Google’s LaMDA) and claims of perceived sentience
  • 2023: Artists file copyright lawsuits against AI-art tools (Stability AI, DeviantArt, Midjourney) for training on and remixing their artworks
  • Additional notes include DeepMind’s acquisition by Google (~2014) for ~$500M, AlphaZero’s successive victories, etc.
  • The timeline highlights ethical, legal, and societal implications associated with AI and ML (e.g., copyright, accountability, transparency)

Significance: This timeline shows how ML/AI evolved from theoretical concepts to practical, high-impact technologies while highlighting ongoing ethical and societal debates.

The Human-Reasoning Bridge: Theory vs Practice

  • The course emphasizes walking the tightrope between theory and practice
  • The idea is to understand theory deeply while gaining hands-on experience with real data

The Coding Pyramid (Revisited)

  • No-code tools → open-source libraries → from-scratch implementations; each level builds deeper intuition and capability

What is Machine Learning? (Core Idea)

  • Given some clues (features), guess the answer (prediction)
  • Features are the inputs that describe the data
  • The learning process optimizes how to map features to outputs
  • The prediction is denoted by ŷ (“y-hat”)
  • Fundamental question: how to choose a function (model) that generalizes well to unseen data

What is the Machine Learning Problem? (Formal View)

  • Given clues/features, find their importance or weights that best predict the answer
  • In practice: learn weights W such that the model output matches the true labels
  • Intuition: adjust weights to improve predictions on training data and generalize to new data

The 20 Questions Analogy

  • ML can be thought of as iteratively asking questions to home in on the right answer by adjusting internal parameters (weights) based on feedback (loss)

Components of a Machine Learning System

  • Features
  • Label (Output / Target)
  • Loss (Cost) Function
  • Optimizer (Algorithm to adjust weights)
  • Model (The function that maps features to predictions)

Features, Labels, and the Learning Objective

  • Features: x (input vector)
  • Label: y (true output)
  • Prediction: ŷ (predicted output)
  • Loss measures how far ŷ is from y
  • Objective: learn weights W (and bias b) to minimize the loss

Features: A Simple Intuition with Numbers

  • “What number am I thinking of?”: an example using digits 0–9 to illustrate features and classification
  • Visualization exercises use numbers with specific properties (start/end points, curved edges, self-intersections) to motivate feature selection and decision boundaries
  • This kind of exercise motivates the idea that features describe data points and that a learning system will try to separate or predict based on those features
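
A toy version of this guessing game fits in a few lines. The feature encoding below is one plausible hand-crafted choice for typical printed digits, not taken from the slides; the point is only that clues (features) narrow down the answer:

    # Describe each digit 0-9 with hand-crafted binary features, then
    # "guess the number" from clues. The values are illustrative and
    # depend on how the digits are drawn (e.g., open- vs closed-top 4).
    DIGIT_FEATURES = {
        0: dict(has_curve=1, has_straight=0, self_intersects=0),
        1: dict(has_curve=0, has_straight=1, self_intersects=0),
        2: dict(has_curve=1, has_straight=1, self_intersects=0),
        3: dict(has_curve=1, has_straight=0, self_intersects=0),
        4: dict(has_curve=0, has_straight=1, self_intersects=1),
        5: dict(has_curve=1, has_straight=1, self_intersects=0),
        6: dict(has_curve=1, has_straight=0, self_intersects=0),
        7: dict(has_curve=0, has_straight=1, self_intersects=0),
        8: dict(has_curve=1, has_straight=0, self_intersects=1),
        9: dict(has_curve=1, has_straight=0, self_intersects=0),
    }

    def candidates(**clues):
        """Return the digits consistent with the given feature clues."""
        return [d for d, f in DIGIT_FEATURES.items()
                if all(f[k] == v for k, v in clues.items())]

    print(candidates(has_curve=1, self_intersects=1))  # -> [8]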

The Linear Model: Form and Intuition

  • Model form (regression):
    • Pointwise form: y_i = W^T x_i + b
    • Matrix form: ŷ = X W + b
  • Notation:
    • n: total samples
    • d: number of features (dimensions)
    • X ∈ R^{n×d} (rows are samples, columns are features)
    • W ∈ R^{d×1}
    • b ∈ R (bias term)
  • Alternative perspective: sometimes written as y = XW + b with y ∈ R^{n×1}
  • Geometric interpretation: W contains discriminative power for each feature; larger magnitude means the feature has more influence on the prediction
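
A minimal NumPy sketch of these shapes, assuming the conventions above (rows are samples, W is a column vector, b is a scalar):

    import numpy as np

    n, d = 5, 3                       # n samples, d features
    rng = np.random.default_rng(0)

    X = rng.normal(size=(n, d))       # X in R^{n x d}: rows are samples
    W = rng.normal(size=(d, 1))       # W in R^{d x 1}: one weight per feature
    b = 0.5                           # scalar bias, broadcast to every sample

    y_hat = X @ W + b                 # dataset form: y_hat in R^{n x 1}
    assert y_hat.shape == (n, 1)

    # The sample-wise form gives the same numbers: y_i = W^T x_i + b
    i = 2
    y_i = (W.T @ X[i]).item() + b
    assert np.isclose(y_i, y_hat[i, 0])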

Two Common Linear Forms (Equivalent Views)

  • View 1 (sample-wise): y_i = W^T x_i + b
  • View 2 (dataset-wide): ŷ = X W + b
  • The goal: find W and b that minimize the chosen loss on the training data

The Learning Question: How Do You Learn W?

  • You define a loss function that measures prediction error
  • You then optimize W (and b) to minimize the loss
  • The idea is to adjust W in the direction that reduces the loss (gradient-based optimization in many cases)
  • In class, a simple illustrative objective is shown as finding the slope W that minimizes the Mean Squared Error (MSE)

Loss (Cost) Function: Why It Matters

  • Loss measures the discrepancy between true labels y and predictions ŷ
  • Common choice for regression: Mean Squared Error (MSE)
  • Formula: J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
  • Aim: find W (and b) that minimize J(W)
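
The formula translates directly to NumPy; this small helper is a straightforward sketch, not course-provided code:

    import numpy as np

    def mse(y_hat, y):
        """J = (1/n) * sum_i (y_hat_i - y_i)^2, straight from the formula."""
        y_hat, y = np.asarray(y_hat, float), np.asarray(y, float)
        return np.mean((y_hat - y) ** 2)

    print(mse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # (0.25 + 0 + 1) / 3 ≈ 0.4167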

Conceptual Visual: Gradient Descent (Intuition)

  • Loss landscape: as you adjust W, the loss value changes
  • The optimizer moves in the direction of decreasing loss (toward a minimum)
  • Special case shown: the line slope W that minimizes J, yielding perfect or near-perfect predictions on training data
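
A minimal gradient-descent sketch for the 1-D slope example, with made-up synthetic data (the in-class demo may differ in its details):

    import numpy as np

    # Synthetic 1-D data with a "true" slope of 2.0 (chosen for illustration)
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    y = 2.0 * x + rng.normal(scale=0.1, size=100)

    W, lr = 0.0, 0.5                          # initial slope and learning rate
    for step in range(50):
        grad = 2 * np.mean((W * x - y) * x)   # dJ/dW for J = mean((Wx - y)^2)
        W -= lr * grad                        # step toward decreasing loss
    print(round(W, 3))                        # converges near the true slope 2.0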

A Simple Linear Regression Example (Key Takeaways)

  • Model: ŷ = W x + b
  • If W = 1 and b = 0, then ŷ = x (identity mapping)
  • If W ≠ 1 or b ≠ 0, predictions scale/shift relative to x
  • The optimal W* and b* minimize J(W, b)
  • Empirical demonstration: a table shows different W values with corresponding loss, highlighting that W* produces the smallest loss (recreated in the sketch below)
  • Notation for the learned model: ŷ = W^T x + b
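
The W-versus-loss table can be recreated with a short sweep; the data below is a made-up identity-mapping example (y = x), so the loss bottoms out at W* = 1:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = x.copy()                    # data generated by the identity mapping y = x

    for W in [0.0, 0.5, 1.0, 1.5, 2.0]:
        loss = np.mean((W * x - y) ** 2)
        print(f"W = {W:3.1f}   J(W) = {loss:.3f}")
    # The loss is smallest (exactly 0) at W* = 1.0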

Regression vs Classification: Conceptual Difference (Brief Mention)

  • Regression: predict a continuous value (e.g., height, weight, flight duration)
  • Classification: predict a discrete label (e.g., male/female; digit classes 0–9)
  • The same linear model concepts apply with different loss functions (e.g., MSE for regression, cross-entropy for classification)
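
A scikit-learn sketch of that contrast with synthetic data (the course may demonstrate it differently): the same linear machinery is fit under a squared-error loss for regression and a cross-entropy loss for classification.

    import numpy as np
    from sklearn.linear_model import LinearRegression, LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 2))

    # Regression: continuous target, fit under a squared-error loss
    y_reg = 3.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
    print(LinearRegression().fit(X, y_reg).coef_)             # close to [3, -1]

    # Classification: discrete target, fit under a cross-entropy loss
    y_cls = (X[:, 0] + X[:, 1] > 0).astype(int)
    print(LogisticRegression().fit(X, y_cls).predict(X[:3]))  # labels 0/1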

The Standard Data Pipeline (High-Level)

  • Practical sequence:
    • Data collection
    • Data labeling
    • Data preparation
    • Model selection
    • Model training
    • Model evaluation
    • Model deployment
  • Reference slide emphasizes a standard, end-to-end workflow
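
A compressed sketch of steps 3 through 6 with scikit-learn and made-up data (deployment is out of scope for this course, per the boundaries above):

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    # Toy "collected and labeled" data, invented for illustration
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

    # Preparation + model selection: scale features, pick a linear model
    model = make_pipeline(StandardScaler(), LinearRegression())

    # Training + evaluation: fit on a train split, score on held-out data
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                              random_state=0)
    model.fit(X_tr, y_tr)
    print(mean_squared_error(y_te, model.predict(X_te)))   # small test MSE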

Practical Demo and Assignments (What’s Coming)

  • Python Demo and Assignment 1 components:
    • conda setup
    • Jupyter Lab
    • Python basics
    • Submission guide
  • Assignment 1 details:
    • Due 09/16/2025 at 5:00 PM
    • Tests basic Python skills
    • Loading multiple files
    • Basic data operations using pandas
    • Plotting using Matplotlib
    • Data source: 20 DL160 flights (MSP-AMS)
    • Data fields: Timestamp, Position, Altitude, Speed, Direction
    • Deliverables: plots and generated insights (see the loading/plotting sketch after this list)
  • If you struggle, review Python resources and consider taking the class next semester
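
A minimal loading-and-plotting sketch in the assignment’s spirit. The directory layout, file names, and parsing details are assumptions; adjust to whatever the assignment actually provides:

    from pathlib import Path
    import pandas as pd
    import matplotlib.pyplot as plt

    # Hypothetical layout: one CSV per flight under data/, with columns
    # matching the listed fields (Timestamp, Position, Altitude, Speed,
    # Direction) and a Timestamp column parseable as a datetime.
    frames = []
    for path in sorted(Path("data").glob("*.csv")):
        df = pd.read_csv(path, parse_dates=["Timestamp"])
        df["flight"] = path.stem      # remember which file each row came from
        frames.append(df)
    flights = pd.concat(frames, ignore_index=True)

    # One basic operation and one plot: the altitude profile of one flight
    one = flights[flights["flight"] == flights["flight"].iloc[0]]
    plt.plot(one["Timestamp"], one["Altitude"])
    plt.xlabel("Timestamp"); plt.ylabel("Altitude")
    plt.show()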

Reference Plots for Flight Data (Q5 & Q6 Examples)

  • Q5 Reference Plot: Flight Duration (HH:MM) vs. Departure Date (2023-10-)
    • Example labels shown: Flight Duration (HH:MM) and Departure Date
  • Q6 Reference Plot: Time (HH:MM) breakdown by phase
    • Phases: On-ground before takeoff, In-air time, On-ground after landing
  • These plots illustrate typical ML/data visualization tasks (time-series, phase breakdowns, distributions)
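
The Q6-style phase breakdown might be drawn as a stacked bar chart; the numbers below are invented purely to show the plot structure:

    import numpy as np
    import matplotlib.pyplot as plt

    # Made-up per-flight times in minutes, one bar segment per phase
    flights = ["F1", "F2", "F3"]
    before = np.array([22, 35, 18])     # on-ground before takeoff
    in_air = np.array([498, 505, 490])  # in-air time
    after = np.array([12, 9, 15])       # on-ground after landing

    plt.bar(flights, before, label="On-ground before takeoff")
    plt.bar(flights, in_air, bottom=before, label="In-air time")
    plt.bar(flights, after, bottom=before + in_air,
            label="On-ground after landing")
    plt.ylabel("Time (minutes)")
    plt.legend()
    plt.show()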

Formulating a Machine Learning Problem (Detailed View)

  • Input values/features: x
  • Example: a feature-based description of the number 8
  • Label/Output: y
  • Objective: learn weights W such that xW ≈ y (a least-squares sketch follows this list)
  • For 1-D inputs, this reduces to a simple linear relation: y = Wx + b
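
For a linear model this objective can also be solved in closed form; here is a least-squares sketch with synthetic data (the true W and b are chosen purely for illustration):

    import numpy as np

    # Recover W and b for y ≈ Wx + b from noisy data via least squares
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=50)
    y = 4.0 * x + 2.0 + rng.normal(scale=0.2, size=50)   # "true" W = 4, b = 2

    A = np.column_stack([x, np.ones_like(x)])   # column of 1s absorbs the bias
    (W, b), *_ = np.linalg.lstsq(A, y, rcond=None)
    print(round(W, 2), round(b, 2))             # close to 4.0 and 2.0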

Linear Model Details: Notation and Dimensions

  • Model: y = W x + b
  • General form for multiple samples:
    • y ∈ R^{n×1} (n samples)
    • X ∈ R^{n×d} (features)
    • W ∈ R^{d×1}
    • b ∈ R (bias, broadcast to all samples)
  • Transposed form (for other conventions):
    • y^T = W^T X^T + b (bias broadcast across samples)
  • The essence: labels are predicted via a weighted combination of features plus a bias term

Label, Feature, Loss, Optimizer, and Model (Summarized)

  • Label: y (true value)
  • Features: x (input vector)
  • Loss: measures prediction error between y and ŷ
  • Optimizer: algorithm that updates weights to reduce loss
  • Model: the functional form that maps features to predictions

A Simple 1-D Example: Feature-Based Classification/Regression Intuition

  • Features can be scalars or vectors; weights combine them into a prediction
  • The loss function guides how to adjust the weights to better fit the data

Height/Weight Distribution by Sex (Example Visualization)

  • A sample scatter-like concept: Height (in) vs Weight (lbs), separated by Sex (Male, Female)
  • Demonstrates how linear relationships can differ across groups and how features might separate classes or predict a continuous target differently by group
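
A quick way to mock up such a plot; the height/weight distributions below are rough synthetic assumptions, not course data:

    import numpy as np
    import matplotlib.pyplot as plt

    # Synthetic, roughly plausible samples per group (means and spreads
    # are assumptions made only to illustrate the visualization)
    rng = np.random.default_rng(0)
    h_m = rng.normal(70, 3, 100); w_m = 5.5 * h_m - 220 + rng.normal(0, 15, 100)
    h_f = rng.normal(64, 3, 100); w_f = 5.0 * h_f - 190 + rng.normal(0, 15, 100)

    plt.scatter(h_m, w_m, label="Male", alpha=0.6)
    plt.scatter(h_f, w_f, label="Female", alpha=0.6)
    plt.xlabel("Height (in)"); plt.ylabel("Weight (lbs)")
    plt.legend(); plt.show()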

Transpose and Shape Notes (Coding vs Theory)

  • You can express the same linear model in multiple equivalent ways depending on the matrix orientation
  • Two common forms shown:
    • ŷ = X W + b (n×d times d×1 = n×1)
    • y_i = W^T x_i + b (individual sample form)
  • Dimensional consistency: n = number of samples; d = number of features

Practical Visualization and Interpretation (Homework of the Day)

  • The course uses simple numeric demonstrations (e.g., digits 0–9 with features such as start/end, curves, intersections) to illustrate feature extraction and decision boundaries
  • These exercises emphasize understanding what features convey about the target and how a model uses them

The Data Pipeline in Practice: Summary

  • Data collection → Data labeling → Data preparation → Model selection → Model training → Model evaluation → Deployment
  • Each step is essential for building reliable ML systems

Assignments and Resources Summary

  • Assignment 1:
    • Python skills test; file loading; pandas operations; plotting with Matplotlib
    • Data source: 20 DL160 flights (MSP-AMS) with fields: Timestamp, Position, Altitude, Speed, Direction
    • Emphasis on practicing data handling and visualization to extract insights
  • Resources: Python tutorials and class tutorials; if needed, consider taking the course next semester

Final Quick Reference: Core Equations and Concepts

  • Linear model (regression):
    • Pointwise form: y_i = W^T x_i + b
    • Dataset form: ŷ = X W + b, with X ∈ R^{n×d}, W ∈ R^{d×1}, b ∈ R
  • Loss (Mean Squared Error):
    • J(W) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2
  • Objective: find W and b that minimize J(W) (and generalize to unseen data)
  • Conceptual interpretation: Weights capture the discriminative power of features
  • Data pipeline components: Features, Label, Loss Function, Optimizer, Model
  • Model understanding: Your brain builds a model; in ML, we learn a model by optimization
  • Ethical/Practical implications highlighted in course timeline: copyright concerns in AI-generated art, governance of AI systems, and accountability for outputs

Quick Index of Key Terms from Slides

  • Artificial Intelligence (AI)
  • Machine Learning (ML)
  • Deep Learning (DL)
  • Perceptron, ELIZA, GPS, Deep Blue, Watson, DeepFace, AlphaGo, AlphaZero, GPT-3, AlphaFold2
  • Turing Test, Dartmouth Conference, Backpropagation, Neural Networks
  • Loss Function, MSE, Optimizer, Model, Features, Label
  • Data Pipeline, Convolution, strategic-game ML milestones (chess, Go, shogi), Ethical implications (copyright lawsuits)