Vector Spaces & Matrices – Lecture Notes
House-Keeping, Technical Glitches & Session Logistics
The class opened with several minutes of logistical coordination (host rights, an invisible attendee list, late arrivals, screen-sharing permissions). Though not conceptually important, it is a reminder that online sessions need a stable host, confirmed audio, and working screen sharing before instruction can begin. After a brief wait for late arrivals, the instructor started the technical material.
Review of Previous Lecture – Evolution of Number Systems
The instructor briefly recalled the previous class:
Numbers evolved from individual real numbers to ordered pairs (complex numbers), then to ordered n-tuples, which in turn led to the abstract notion of vector spaces.
Aggregating reals further yielded matrices, higher-order tensors, and ultimately data structures (images, video, etc.).
Real-World Motivation: How Data Are Packed
Scalars – single numbers such as temperature or weight.
3-element tuples – e.g. a date (day, month, year) or spatial coordinates (x, y, z).
1-D temporally indexed sequences – audio waveforms (sample series).
2-D tables / spreadsheets – addressed by row and column indices.
3-D tensors – grayscale images seen as matrices; colour images as RGB stacks (x,y,c).
4-D tensors – video as x,y,c,t.
The lecture’s first goal is to formalise such multi-indexed objects with vector-space language so that ML algorithms can measure distances, separability, etc.
Why Vector Spaces Matter in ML
Using the “cat vs. dog” image example, the instructor motivates:
A feature-extraction transformation maps raw images to a feature space where each image is a feature vector.
A good transformation clusters dog vectors together, cat vectors together, and leaves a large distance between the two clusters.
Therefore we must (i) design feature extractors, (ii) choose meaningful notions of distance, and (iii) exploit vector-space structure to improve separability via further linear or non-linear transforms.
Vector Space – Formal Definition & Axioms
Take a set V equipped with
Addition +: V x V -> V (closure)
Scalar multiplication .: R x V -> V
plus a distinguished null vector 0.
Properties required:
Commutativity x + y = y + x
Associativity (x + y) + z = x + (y + z)
Additive identity x + 0 = 0 + x = x
Additive inverse x + (-x) = 0
Scalar rules
0x = 0, 1x = x, c(dx) = (cd)x
Distributivity
c(x + y) = cx + cy, (c + d)x = cx + dx.
No complex scalars will be used; all scalars are real.
Linear Independence & Bases
Vectors {v1, …, vn} are linearly independent if
Sum over i=1 to n (alpha_i * v_i) = 0 implies alpha_i = 0 for all i.
Example in R^3:
v1 = (1,0,0), v2 = (0,1,0), v3 = (0,0,1) are linearly independent.
A basis of an n-dimensional space is a set of n linearly independent vectors. Every x in R^n can be expressed uniquely as
x = Sum over i=1 to n (ai * vi).
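The rank test for independence and the basis expansion can be sketched with NumPy (the library choice and the vector x are illustrative assumptions, not part of the lecture):

```python
import numpy as np

# Standard basis of R^3 from the example; any full-rank set would do.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([0.0, 0.0, 1.0])
V = np.column_stack([v1, v2, v3])

# Linear independence <=> the matrix whose columns are the vectors has full rank.
print(np.linalg.matrix_rank(V))  # 3

# The unique coordinates a of x in this basis solve V a = x.
x = np.array([2.0, -1.0, 5.0])  # hypothetical vector for illustration
a = np.linalg.solve(V, x)
print(a)  # [ 2. -1.  5.]
```

With the standard basis the coordinates coincide with the entries of x; with a different basis `np.linalg.solve` would return the (still unique) coefficients.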
Inner (Dot) Product
Two equivalent forms
&lt;x, y&gt; = Sum over i=1 to n (x_i * y_i) = ||x|| * ||y|| * cos(theta).
Interpretations:
Measures alignment; 0 implies orthogonality.
Basic building block for correlation, pattern matching, and for counting Multiply–Accumulate (MAC) operations.
Example: x = (1, 2, 3), y = (2, -1, 4)
&lt;x, y&gt; = 1*2 + 2*(-1) + 3*4 = 12, angle theta = cos^(-1)(12 / (sqrt(14) * sqrt(21))) approximately 45.6 degrees.
Computational cost for an n-dimensional dot product: n MACs = 2n FLOPs.
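A minimal NumPy sketch of the worked example and the MAC count (NumPy itself is an assumption; the lecture does not prescribe a library):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, -1.0, 4.0])

dot = np.dot(x, y)  # 1*2 + 2*(-1) + 3*4 = 12
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))
print(dot, theta)  # 12.0 and roughly 45.6 degrees

# An n-dimensional dot product costs n multiply-accumulates = 2n FLOPs.
n = x.size
print(n, 2 * n)  # 3 MACs, 6 FLOPs
```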
Norms (Lengths)
General lp norm ||x||p = (Sum over i=1 to n (|x_i|^p))^(1/p).
Special cases:
p=1 Manhattan (taxicab) length.
p=2 Euclidean length.
Example x=(1,-2,4): ||x||1 = 7, ||x||2 = sqrt(21).
Geometric picture in 2-D: ||(4,3)||1 = 7 is the “right-angled bend” perimeter, whereas ||(4,3)||2 = 5 is the straight-line hypotenuse.
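Both examples above can be reproduced with `numpy.linalg.norm` (a sketch, assuming NumPy):

```python
import numpy as np

x = np.array([1.0, -2.0, 4.0])
print(np.linalg.norm(x, ord=1))  # |1| + |-2| + |4| = 7
print(np.linalg.norm(x, ord=2))  # sqrt(1 + 4 + 16) = sqrt(21), about 4.583

# 2-D picture: taxicab "bend" vs straight-line hypotenuse for (4, 3).
v = np.array([4.0, 3.0])
print(np.linalg.norm(v, ord=1), np.linalg.norm(v, ord=2))  # 7.0 5.0
```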
Metrics – Formal Distance Functions
A function d: V x V -> [0, infinity) is a metric if it satisfies
Non-negativity d(x,y) >= 0 (with equality iff x = y).
Symmetry d(x,y) = d(y,x).
Triangle inequality d(x,z) + d(z,y) >= d(x,y).
Common metrics:
Euclidean dE(x,y) = sqrt(Sum over i=1 to n ((x_i - y_i)^2)).
Manhattan dM(x,y) = Sum over i=1 to n (|x_i - y_i|).
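Both metrics are one-liners; the following sketch (points chosen for illustration) also spot-checks the triangle inequality on one triple:

```python
import numpy as np

def d_euclidean(x, y):
    # Straight-line distance between x and y.
    return np.sqrt(np.sum((x - y) ** 2))

def d_manhattan(x, y):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(x - y))

x = np.array([0.0, 0.0])
y = np.array([4.0, 3.0])
z = np.array([1.0, 1.0])

print(d_euclidean(x, y))  # 5.0
print(d_manhattan(x, y))  # 7.0

# Triangle inequality: going through z is never shorter than going direct.
print(d_euclidean(x, z) + d_euclidean(z, y) >= d_euclidean(x, y))  # True
```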
Weighted Euclidean Distance & Feature Scaling
When feature components have disparate numeric ranges (e.g. weight 0–100 kg, aspect-ratio 0–1) plain Euclidean distance can be dominated by large-range dimensions. Remedy:
dW(x,y) = sqrt(Sum over i=1 to n (((x_i - y_i)^2) / (s_i^2))) where s_i rescales each coordinate (often chosen as the range, standard deviation, etc.). A forthcoming statistics lecture will show that the Mahalanobis distance has the same structure but learns s_i (and cross-covariances) from data.
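A sketch of the dominance problem and its remedy; the two features (weight in kg, aspect ratio) and their values are hypothetical, echoing the lecture's example ranges:

```python
import numpy as np

def d_weighted(x, y, s):
    # Weighted Euclidean distance: each coordinate is rescaled by s_i.
    return np.sqrt(np.sum(((x - y) / s) ** 2))

# Feature vectors: (weight in kg, aspect ratio).
x = np.array([80.0, 0.9])
y = np.array([50.0, 0.2])
s = np.array([100.0, 1.0])  # per-feature scale, e.g. the feature range

plain = np.sqrt(np.sum((x - y) ** 2))
weighted = d_weighted(x, y, s)
print(plain, weighted)  # ~30.01 vs ~0.76
```

Unweighted, the 30 kg gap swamps the 0.7 aspect-ratio gap; after rescaling, both features contribute on comparable terms.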
Introduction to Matrices
Preliminary definition – a two-dimensional array of real numbers of size m x n (rows x columns).
Notation: A is an element of R^(m x n), A_ij is element in row i, column j.
Row extraction: A_i,: ; column extraction: A_:,j.
A square matrix has m = n.
Transpose
(A^T)_ij = A_ji. Example:
A = [[1, 2, 3], [11, 12, 13]] implies A^T = [[1, 11], [2, 12], [3, 13]].
A matrix with A = A^T is symmetric.
Constructing a Random Symmetric Matrix
Sample any square matrix R.
Set B = R + R^T. Then B = B^T is symmetric.
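The two-step recipe above in NumPy (the seed and matrix size are arbitrary choices for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4))  # step 1: any square matrix
B = R + R.T                      # step 2: B is symmetric by construction

print(np.allclose(B, B.T))  # True
```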
Scalar–Matrix Interaction
For scalar c in R,
cA multiplies every entry: [cA]_ij = c * A_ij.
Matrix Addition / Subtraction
Defined element-wise only if the two matrices share identical shape.
For scalars c,d,
C = cA + dB, C_ij = c * A_ij + d * B_ij.
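A quick element-wise check (matrices and scalars are made-up values):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, -1.0], [5.0, 2.0]])
c, d = 2.0, -1.0

C = c * A + d * B  # element-wise; A and B must have identical shape
print(C)  # [[2. 5.] [1. 6.]]
```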
Matrix × Vector Multiplication – Linear Operator View
If A is an element of R^(m x n) and x is an element of R^n, then
y = Ax is an element of R^m, where y_i = Sum over j=1 to n (A_ij * x_j).
Observations:
Each y_i is an inner product between row i of A and x.
Computational load: m rows x n-length dot product implies mn MACs = 2mn FLOPs.
Linear operator property:
A(alpha * x1 + beta * x2) = alpha * (A * x1) + beta * (A * x2).
Thus a matrix transforms an n-D vector space to an m-D one in a linear fashion.
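The linearity property and the MAC count can be verified numerically on random data (shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))  # linear map from R^4 to R^3
x1 = rng.standard_normal(4)
x2 = rng.standard_normal(4)
alpha, beta = 2.0, -0.5

lhs = A @ (alpha * x1 + beta * x2)
rhs = alpha * (A @ x1) + beta * (A @ x2)
print(np.allclose(lhs, rhs))  # True: the operator is linear

# Cost of one product Ax: m rows, each an n-length dot product.
m, n = A.shape
print(m * n, 2 * m * n)  # mn MACs = 2mn FLOPs
```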
Matrix × Matrix Multiplication
For A is an element of R^(m x n), B is an element of R^(n x p) define
C = AB is an element of R^(m x p), where C_ij = Sum over k=1 to n (A_ik * B_kj).
Each entry is again an inner product (row of A with column of B).
Computational cost: mp such inner products implies mpn MACs = 2mpn FLOPs.
Worked Example
A = [[1, 2, -3], [7, -1, 2]], B = [[-2, 3], [2, -1], [1, 3]]
produces
AB = [[-1, -8], [-14, 28]].
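The product and its cost are easy to confirm with NumPy:

```python
import numpy as np

A = np.array([[1, 2, -3], [7, -1, 2]])
B = np.array([[-2, 3], [2, -1], [1, 3]])

C = A @ B
print(C)  # [[ -1  -8] [-14  28]]

# Cost: m*p inner products of length n => mpn MACs = 2mpn FLOPs.
m, n = A.shape
p = B.shape[1]
print(m * p * n, 2 * m * p * n)  # 12 MACs, 24 FLOPs
```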
Multiplication Rules
Non-commutative: generally AB != BA.
Associative: (AB)C = A(BC).
Left & right distributive: A(B + C) = AB + AC, (A + B)C = AC + BC.
Transpose of a product: (AB)^T = B^T * A^T.
Generalised: (A_1 * A_2 * ... * A_n)^T = A_n^T * A_(n-1)^T * ... * A_1^T.
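The rules above can be sanity-checked on random matrices (shapes and seed chosen arbitrarily); note that here AB and BA do not even share a shape, an extreme case of non-commutativity:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Transposing a product reverses the order of the factors.
print(np.allclose((A @ B).T, B.T @ A.T))            # True
print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))  # True
```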
FLOPs, MACs & Hardware Relevance
Throughout, the instructor repeatedly asked for FLOP or MAC counts – critical for estimating computational complexity of ML models and for exploiting specialised hardware (e.g. multiply-accumulate units, GPUs, TPUs). Inner products (dot products) dominate these costs.
Session Closure & Next Steps
The lecture paused for a five-minute break midway, then finished the matrix segment. Students were reminded to fill a feedback form. A follow-up session ("part 2") will continue the linear-algebra foundations (including gradient, further products, statistics-based distances).