Vector Spaces & Matrices – Lecture Notes
House-Keeping, Technical Glitches & Session Logistics
The class opened with several minutes of logistical coordination (host rights, an invisible attendee list, late arrivals, screen-sharing permissions). Though not conceptually important, it is a reminder that online sessions need a stable host, confirmed audio, and working screen sharing before instruction can begin. After a brief wait for late arrivals, the instructor started the technical material.
Review of Previous Lecture – Evolution of Number Systems
The instructor briefly recalled the previous class:
Numbers evolved from individual real numbers to ordered pairs (complex numbers), then to ordered n-tuples, which in turn led to the abstract notion of vector spaces.
Aggregating reals further yielded matrices, higher-order tensors, and ultimately data structures (images, video, etc.).
Real-World Motivation: How Data Are Packed
Scalars – single numbers such as temperature or weight.
3-element tuples – e.g. a date (day, month, year) or spatial coordinates (x, y, z).
1-D temporally indexed sequences – audio waveforms (sample series).
2-D tables / spreadsheets – addressed by row and column indices.
3-D tensors – grayscale images seen as matrices; colour images as RGB stacks (x,y,c).
4-D tensors – video as x,y,c,t.
The lecture’s first goal is to formalise such multi-indexed objects with vector-space language so that ML algorithms can measure distances, separability, etc.
Why Vector Spaces Matter in ML
Using the “cat vs. dog” image example, the instructor motivates:
A feature-extraction transformation maps raw images to a feature space where each image is a feature vector.
A good transformation clusters dog vectors together, cat vectors together, and leaves a large distance between the two clusters.
Therefore we must (i) design feature extractors, (ii) choose meaningful notions of distance, and (iii) exploit vector-space structure to improve separability via further linear or non-linear transforms.
Vector Space – Formal Definition & Axioms
Take a set V equipped with
Addition +: V x V -> V (closure)
Scalar multiplication .: R x V -> V
plus a distinguished null vector 0.
Properties required:
Commutativity x + y = y + x
Associativity (x + y) + z = x + (y + z)
Additive identity x + 0 = 0 + x = x
Additive inverse x + (-x) = 0
Scalar rules
0x = 0, 1x = x, c(dx) = (cd)x
Distributivity
c(x + y) = cx + cy, (c + d)x = cx + dx.
No complex scalars will be used; all scalars are real.
Linear Independence & Bases
Vectors {v1, …, vn} are linearly independent if
Sum over i=1 to n (alpha_i * v_i) = 0 implies alpha_i = 0 for all i.
Example in R^3:
v1 = (1,0,0), v2 = (0,1,0), v3 = (0,0,1) are linearly independent.
A basis of an n-dimensional space is a set of n linearly independent vectors. Every x in R^n can be expressed uniquely as
x = Sum over i=1 to n (ai * vi).
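The rank test for independence and the basis expansion can be sketched with NumPy (the library choice and the vector x are illustrative assumptions, not part of the lecture):

```python
import numpy as np

# Standard basis of R^3 from the example; any full-rank set would do.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 1.0, 0.0])
v3 = np.array([0.0, 0.0, 1.0])
V = np.column_stack([v1, v2, v3])

# Linear independence <=> the matrix whose columns are the vectors has full rank.
print(np.linalg.matrix_rank(V))  # 3

# The unique coordinates a of x in this basis solve V a = x.
x = np.array([2.0, -1.0, 5.0])  # hypothetical vector for illustration
a = np.linalg.solve(V, x)
print(a)  # [ 2. -1.  5.]
```

With the standard basis the coordinates coincide with the entries of x; with a different basis `np.linalg.solve` would return the (still unique) coefficients.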
Inner (Dot) Product
Two equivalent forms
&lt;x, y&gt; = Sum over i=1 to n (x_i * y_i) = ||x|| * ||y|| * cos(theta).
Interpretations:
Measures alignment; 0 implies orthogonality.
Basic building block for correlation, pattern matching, and for counting Multiply–Accumulate (MAC) operations.
Example: x = (1, 2, 3), y = (2, -1, 4)
&lt;x, y&gt; = 1*2 + 2*(-1) + 3*4 = 12, angle theta = cos^(-1)(12 / (sqrt(14) * sqrt(21))) approximately 45.6 degrees.
Computational cost for an n-dimensional dot product: n MACs = 2n FLOPs.
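A minimal NumPy sketch of the worked example and the MAC count (NumPy itself is an assumption; the lecture does not prescribe a library):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, -1.0, 4.0])

dot = np.dot(x, y)  # 1*2 + 2*(-1) + 3*4 = 12
cos_theta = dot / (np.linalg.norm(x) * np.linalg.norm(y))
theta = np.degrees(np.arccos(cos_theta))
print(dot, theta)  # 12.0 and roughly 45.6 degrees

# An n-dimensional dot product costs n multiply-accumulates = 2n FLOPs.
n = x.size
print(n, 2 * n)  # 3 MACs, 6 FLOPs
```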
Norms (Lengths)
General lp norm ||x||p = (Sum over i=1 to n (|x_i|^p))^(1/p).
Special cases:
p=1 Manhattan (taxicab) length.
p=2 Euclidean length.
Example x=(1,-2,4): ||x||1 = 7, ||x||2 = sqrt(21).
Geometric picture in 2-D: ||(4,3)||1 = 7 is the “right-angled bend” perimeter, whereas ||(4,3)||2 = 5 is the straight-line hypotenuse.
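Both examples above can be reproduced with `numpy.linalg.norm` (a sketch, assuming NumPy):

```python
import numpy as np

x = np.array([1.0, -2.0, 4.0])
print(np.linalg.norm(x, ord=1))  # |1| + |-2| + |4| = 7
print(np.linalg.norm(x, ord=2))  # sqrt(1 + 4 + 16) = sqrt(21), about 4.583

# 2-D picture: taxicab "bend" vs straight-line hypotenuse for (4, 3).
v = np.array([4.0, 3.0])
print(np.linalg.norm(v, ord=1), np.linalg.norm(v, ord=2))  # 7.0 5.0
```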
Metrics – Formal Distance Functions
A function d: V x V -> [0, infinity) is a metric if it satisfies
Non-negativity d(x,y) >= 0 (with equality iff x = y).
Symmetry d(x,y) = d(y,x).
Triangle inequality d(x,z) + d(z,y) >= d(x,y).
Common metrics:
Euclidean dE(x,y) = sqrt(Sum over i=1 to n ((x_i - y_i)^2)).
Manhattan dM(x,y) = Sum over i=1 to n (|x_i - y_i|).
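Both metrics are one-liners; the following sketch (points chosen for illustration) also spot-checks the triangle inequality on one triple:

```python
import numpy as np

def d_euclidean(x, y):
    # Straight-line distance between x and y.
    return np.sqrt(np.sum((x - y) ** 2))

def d_manhattan(x, y):
    # Sum of absolute coordinate differences.
    return np.sum(np.abs(x - y))

x = np.array([0.0, 0.0])
y = np.array([4.0, 3.0])
z = np.array([1.0, 1.0])

print(d_euclidean(x, y))  # 5.0
print(d_manhattan(x, y))  # 7.0

# Triangle inequality: going through z is never shorter than going direct.
print(d_euclidean(x, z) + d_euclidean(z, y) >= d_euclidean(x, y))  # True
```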
Weighted Euclidean Distance & Feature Scaling
When feature components have disparate numeric ranges (e.g. weight 0–100 kg, aspect-ratio 0–1) plain Euclidean distance can be dominated by large-range dimensions. Remedy:
dW(x,y) = sqrt(Sum over i=1 to n (((x_i - y_i)^2) / (s_i^2))) where s_i rescales each coordinate (often chosen as the range, standard deviation, etc.). A forthcoming statistics lecture will show that the Mahalanobis distance has the same structure but learns s_i (and cross-covariances) from data.
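A sketch of the dominance problem and its remedy; the two features (weight in kg, aspect ratio) and their values are hypothetical, echoing the lecture's example ranges:

```python
import numpy as np

def d_weighted(x, y, s):
    # Weighted Euclidean distance: each coordinate is rescaled by s_i.
    return np.sqrt(np.sum(((x - y) / s) ** 2))

# Feature vectors: (weight in kg, aspect ratio).
x = np.array([80.0, 0.9])
y = np.array([50.0, 0.2])
s = np.array([100.0, 1.0])  # per-feature scale, e.g. the feature range

plain = np.sqrt(np.sum((x - y) ** 2))
weighted = d_weighted(x, y, s)
print(plain, weighted)  # ~30.01 vs ~0.76
```

Unweighted, the 30 kg gap swamps the 0.7 aspect-ratio gap; after rescaling, both features contribute on comparable terms.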
Introduction to Matrices
Preliminary definition – a two-dimensional array of real numbers of size m x n (rows x columns).
Notation: A is an element of R^(m x n), A_ij is element in row i, column j.
Row extraction: A_i,: ; column extraction: A_:,j.
A square matrix has m = n.
Transpose
(A^T)_ij = A_ji. Example:
A = [[1, 2, 3], [11, 12, 13]] implies A^T = [[1, 11], [2, 12], [3, 13]].
A matrix with A = A^T is symmetric.
Constructing a Random Symmetric Matrix
Sample any square matrix R.
Set B = R + R^T. Then B = B^T is symmetric.
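The two-step recipe above in NumPy (the seed and matrix size are arbitrary choices for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4))  # step 1: any square matrix
B = R + R.T                      # step 2: B is symmetric by construction

print(np.allclose(B, B.T))  # True
```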
Scalar–Matrix Interaction
For scalar c in R,
cA multiplies every entry: [cA]_ij = c * A_ij.
Matrix Addition / Subtraction
Defined element-wise only if the two matrices share identical shape.
For scalars c,d,
C = cA + dB, C_ij = c * A_ij + d * B_ij.
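A quick element-wise check (matrices and scalars are made-up values):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, -1.0], [5.0, 2.0]])
c, d = 2.0, -1.0

C = c * A + d * B  # element-wise; A and B must have identical shape
print(C)  # [[2. 5.] [1. 6.]]
```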
Matrix × Vector Multiplication – Linear Operator View
If A is an element of R^(m x n) and x is an element of R^n, then
y = Ax is an element of R^m, where y_i = Sum over j=1 to n (A_ij * x_j).
Observations:
Each y_i is an inner product between row i of A and x.
Computational load: m rows x n-length dot product implies mn MACs = 2mn FLOPs.
Linear operator property:
A(alpha * x1 + beta * x2) = alpha * (A * x1) + beta * (A * x2).
Thus a matrix transforms an n-D vector space to an m-D one in a linear fashion.
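The linearity property and the MAC count can be verified numerically on random data (shapes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))  # linear map from R^4 to R^3
x1 = rng.standard_normal(4)
x2 = rng.standard_normal(4)
alpha, beta = 2.0, -0.5

lhs = A @ (alpha * x1 + beta * x2)
rhs = alpha * (A @ x1) + beta * (A @ x2)
print(np.allclose(lhs, rhs))  # True: the operator is linear

# Cost of one product Ax: m rows, each an n-length dot product.
m, n = A.shape
print(m * n, 2 * m * n)  # mn MACs = 2mn FLOPs
```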
Matrix × Matrix Multiplication
For A is an element of R^(m x n), B is an element of R^(n x p) define
C = AB is an element of R^(m x p), where C_ij = Sum over k=1 to n (A_ik * B_kj).
Each entry is again an inner product (row of A with column of B).
Computational cost: mp such inner products implies mpn MACs = 2mpn FLOPs.
Worked Example
A = [[1, 2, -3], [7, -1, 2]], B = [[-2, 3], [2, -1], [1, 3]]
produces
AB = [[-1, -8], [-14, 28]].
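The product and its cost are easy to confirm with NumPy:

```python
import numpy as np

A = np.array([[1, 2, -3], [7, -1, 2]])
B = np.array([[-2, 3], [2, -1], [1, 3]])

C = A @ B
print(C)  # [[ -1  -8] [-14  28]]

# Cost: m*p inner products of length n => mpn MACs = 2mpn FLOPs.
m, n = A.shape
p = B.shape[1]
print(m * p * n, 2 * m * p * n)  # 12 MACs, 24 FLOPs
```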
Multiplication Rules
Non-commutative: generally AB != BA.
Associative: (AB)C = A(BC).
Left & right distributive: A(B + C) = AB + AC, (A + B)C = AC + BC.
Transpose of a product: (AB)^T = B^T * A^T.
Generalised: (A_1 * A_2 * ... * A_n)^T = A_n^T * A_(n-1)^T * ... * A_1^T.
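The rules above can be sanity-checked on random matrices (shapes and seed chosen arbitrarily); note that here AB and BA do not even share a shape, an extreme case of non-commutativity:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 2))

# Transposing a product reverses the order of the factors.
print(np.allclose((A @ B).T, B.T @ A.T))            # True
print(np.allclose((A @ B @ C).T, C.T @ B.T @ A.T))  # True
```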
FLOPs, MACs & Hardware Relevance
Throughout, the instructor repeatedly asked for FLOP or MAC counts – critical for estimating computational complexity of ML models and for exploiting specialised hardware (e.g. multiply-accumulate units, GPUs, TPUs). Inner products (dot products) dominate these costs.
Session Closure & Next Steps
The lecture paused for a five-minute break midway, then finished the matrix segment. Students were reminded to fill a feedback form. A follow-up session ("part 2") will continue the linear-algebra foundations (including gradient, further products, statistics-based distances).