Vector Spaces & Matrices – Lecture Notes

House-Keeping, Technical Glitches & Session Logistics

The class opened with several minutes of logistical coordination (host rights, attendee list not visible, participants joining late, screen-sharing permissions, etc.). Although not conceptually important, the episode is a reminder that online sessions need a stable host, confirmed audio, and working screen-sharing before instruction can begin. After a brief wait for late arrivals the instructor started the technical material.


Review of Previous Lecture – Evolution of Number Systems

The instructor briefly recalled the previous class:

  • Numbers evolved from individual real numbers to ordered pairs (complex numbers), then to ordered n-tuples, which in turn led to the abstract notion of vector spaces.

  • Aggregating reals further yielded matrices, higher-order tensors, and ultimately data structures (images, video, etc.).


Real-World Motivation: How Data Are Packed
  1. Scalars – single numbers such as temperature or weight.

  2. 3-element tuples – e.g. a date (day, month, year),

    • spatial coordinates (x,y,z).

  3. 1-D temporally indexed sequences – audio waveforms (sample series).

  4. 2-D tables / spreadsheets – addressed by row and column indices.

  5. 3-D tensors – colour images as RGB stacks (x,y,c); a grayscale image is the 2-D special case (a matrix).

  6. 4-D tensors – video as x,y,c,t.

    The lecture’s first goal is to formalise such multi-indexed objects with vector-space language so that ML algorithms can measure distances, separability, etc.
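The hierarchy above maps directly onto array shapes. The following sketch (with hypothetical sizes chosen only for illustration) shows each level as a NumPy array:

```python
import numpy as np

# Hypothetical shapes, chosen only to illustrate the scalar -> tensor hierarchy.
scalar = np.float64(36.6)             # 0-D: a single reading (e.g. temperature)
coords = np.array([1.0, 2.5, -0.3])   # 1-D: (x, y, z) spatial coordinates
audio = np.zeros(16000)               # 1-D sequence: one second at 16 kHz
table = np.zeros((100, 5))            # 2-D: 100 rows x 5 columns
rgb_image = np.zeros((480, 640, 3))   # 3-D: height x width x colour channel
video = np.zeros((480, 640, 3, 30))   # 4-D: adds a time axis (30 frames)

for name, a in [("coords", coords), ("audio", audio), ("table", table),
                ("rgb_image", rgb_image), ("video", video)]:
    print(name, a.ndim, a.shape)
```

Each extra index adds one dimension; the vector-space machinery below treats all of these uniformly as points in R^n once flattened.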


Why Vector Spaces Matter in ML

Using the “cat vs. dog” image example, the instructor motivates:

  • A feature-extraction transformation maps raw images to a feature space where each image is a feature vector.

  • A good transformation clusters dog vectors together, cat vectors together, and leaves a large distance between the two clusters.

  • Therefore we must (i) design feature extractors, (ii) choose meaningful notions of distance, and (iii) exploit vector-space structure to improve separability via further linear or non-linear transforms.


Vector Space – Formal Definition & Axioms

Take a set V equipped with

  • Addition +: V x V -> V (closure)

  • Scalar multiplication .: R x V -> V

    plus a distinguished null vector 0.

    Properties required:

  1. Commutativity x + y = y + x

  2. Associativity (x + y) + z = x + (y + z)

  3. Additive identity x + 0 = 0 + x = x

  4. Additive inverse x + (-x) = 0

  5. Scalar rules

    0x = 0, 1x = x, c(dx) = (cd)x

  6. Distributivity

    c(x + y) = cx + cy, (c + d)x = cx + dx.

No complex scalars will be used; all scalars are real.
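The axioms can be spot-checked numerically in R^3. This is a sketch on one sample of vectors and scalars, not a proof:

```python
import numpy as np

# Sample vectors and scalars in R^3 (arbitrary values).
x, y, z = np.array([1., 2., 3.]), np.array([-4., 0., 5.]), np.array([2., 2., -1.])
c, d = 2.0, -3.0

assert np.allclose(x + y, y + x)                 # 1. commutativity
assert np.allclose((x + y) + z, x + (y + z))     # 2. associativity
assert np.allclose(x + np.zeros(3), x)           # 3. additive identity
assert np.allclose(x + (-x), np.zeros(3))        # 4. additive inverse
assert np.allclose(c * (d * x), (c * d) * x)     # 5. scalar compatibility
assert np.allclose(c * (x + y), c * x + c * y)   # 6. distributivity (vectors)
assert np.allclose((c + d) * x, c * x + d * x)   # 6. distributivity (scalars)
print("all axioms hold on this sample")
```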


Linear Independence & Bases
  • Vectors {v1, …, vn} are linearly independent if

    Sum over i=1 to n (alpha_i * v_i) = 0 implies alpha_i = 0 for all i.

    Example in R^3:

    v1 = (1,0,0), v2 = (0,1,0), v3 = (0,0,1) are linearly independent.

  • A basis of an n-dimensional space is a set of n linearly independent vectors. Every x in R^n can be expressed uniquely as

    x = Sum over i=1 to n (a_i * v_i).
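One practical independence test (a sketch): stack the vectors as columns of a matrix and check its rank; full column rank means the set is linearly independent.

```python
import numpy as np

# The standard basis of R^3 from the example above.
v1, v2, v3 = np.array([1., 0., 0.]), np.array([0., 1., 0.]), np.array([0., 0., 1.])
V = np.column_stack([v1, v2, v3])
print(np.linalg.matrix_rank(V))   # 3 -> linearly independent

# A dependent set for contrast: the third column equals v1 + v2.
W = np.column_stack([v1, v2, v1 + v2])
print(np.linalg.matrix_rank(W))   # 2 -> linearly dependent
```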


Inner (Dot) Product

Two equivalent forms

<x, y> = Sum over i=1 to n (x_i * y_i) = ||x|| * ||y|| * cos(theta).

Interpretations:

  • Measures alignment; 0 implies orthogonality.

  • Basic building block for correlation, pattern matching, and for counting Multiply–Accumulate (MAC) operations.

  • Example: x=(1,2,3), y=(2,-1,4)

    <x, y> = 1*2 + 2*(-1) + 3*4 = 12, angle theta = cos^(-1)(12 / (sqrt(14) * sqrt(21))) approximately 45.6 degrees.

  • Computational cost for an n-dimensional dot product: n MACs = 2n FLOPs.
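The worked example can be reproduced in a few lines; the dot product here costs n = 3 MACs:

```python
import numpy as np

x = np.array([1., 2., 3.])
y = np.array([2., -1., 4.])

dot = np.dot(x, y)   # 1*2 + 2*(-1) + 3*4 = 12
theta = np.degrees(np.arccos(dot / (np.linalg.norm(x) * np.linalg.norm(y))))
print(dot, round(theta, 1))
```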


Norms (Lengths)

General l_p norm ||x||_p = (Sum over i=1 to n (|x_i|^p))^(1/p).

Special cases:

  • p=1 Manhattan (taxicab) length.

  • p=2 Euclidean length.

    Example x=(1,-2,4): ||x||_1 = 7, ||x||_2 = sqrt(21).

    Geometric picture in 2-D: ||(4,3)||_1 = 7 is the length of the axis-aligned “right-angled bend” path, whereas ||(4,3)||_2 = 5 is the straight-line hypotenuse.
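Both norms are available through `numpy.linalg.norm` via the `ord` argument:

```python
import numpy as np

x = np.array([1., -2., 4.])
print(np.linalg.norm(x, 1))   # l1: |1| + |-2| + |4| = 7
print(np.linalg.norm(x, 2))   # l2: sqrt(1 + 4 + 16) = sqrt(21) ~ 4.583

v = np.array([4., 3.])
print(np.linalg.norm(v, 1))   # 7: axis-aligned "bend" path
print(np.linalg.norm(v, 2))   # 5: straight-line hypotenuse
```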


Metrics – Formal Distance Functions

A function d: V x V -> [0, infinity) is a metric if it satisfies

  1. Non-negativity d(x,y) >= 0 (with equality iff x = y).

  2. Symmetry d(x,y) = d(y,x).

  3. Triangle inequality d(x,z) + d(z,y) >= d(x,y).

Common metrics:

  • Euclidean dE(x,y) = sqrt(Sum over i=1 to n ((x_i - y_i)^2)).

  • Manhattan dM(x,y) = Sum over i=1 to n (|x_i - y_i|).
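Minimal implementations of both metrics, with a spot-check of the triangle inequality (a sketch on one sample, not a proof):

```python
import numpy as np

def d_euclidean(x, y):
    # sqrt of the sum of squared coordinate differences
    return np.sqrt(np.sum((x - y) ** 2))

def d_manhattan(x, y):
    # sum of absolute coordinate differences
    return np.sum(np.abs(x - y))

x, y, z = np.array([0., 0.]), np.array([3., 4.]), np.array([1., 1.])
print(d_euclidean(x, y))   # 5.0 (the 3-4-5 triangle)
print(d_manhattan(x, y))   # 7.0

# Triangle inequality: going through z can never be shorter.
assert d_euclidean(x, z) + d_euclidean(z, y) >= d_euclidean(x, y)
```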

Weighted Euclidean Distance & Feature Scaling

When feature components have disparate numeric ranges (e.g. weight 0–100 kg, aspect-ratio 0–1) plain Euclidean distance can be dominated by large-range dimensions. Remedy:

dW(x,y) = sqrt(Sum over i=1 to n (((x_i - y_i)^2) / (s_i^2))) where s_i rescales each coordinate (often chosen as the range, standard deviation, etc.). A forthcoming statistics lecture will show that the Mahalanobis distance has the same structure but learns s_i (and cross-covariances) from data.
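The effect of the scaling is easy to see numerically. In this sketch the feature values and the scales s_i are hypothetical (s_i chosen as each feature's range, as suggested above):

```python
import numpy as np

def d_weighted(x, y, s):
    # Weighted Euclidean distance: each coordinate difference divided by s_i.
    return np.sqrt(np.sum(((x - y) / s) ** 2))

# Feature 0: weight in kg (range ~0-100); feature 1: aspect ratio (range 0-1).
x = np.array([80.0, 0.5])
y = np.array([60.0, 0.9])
s = np.array([100.0, 1.0])   # rescale each coordinate by its range

print(np.sqrt(np.sum((x - y) ** 2)))   # plain Euclidean: dominated by weight
print(d_weighted(x, y, s))             # both features contribute comparably
```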


Introduction to Matrices
  • Preliminary definition – a two-dimensional array of real numbers of size m x n (rows x columns).

    Notation: A is an element of R^(m x n); A_ij is the element in row i, column j.

  • Row extraction: A_i,:; column extraction: A_:,j.

  • A square matrix has m = n.

Transpose

(A^T)_ij = A_ji. Example:

A = [[1, 2, 3], [11, 12, 13]] implies A^T = [[1, 11], [2, 12], [3, 13]].

A matrix with A = A^T is symmetric.

Constructing a Random Symmetric Matrix

  1. Sample any square matrix R.

  2. Set B = R + R^T. Then B = B^T is symmetric.
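The two steps above can be sketched as:

```python
import numpy as np

# 1. Sample any square matrix R.
rng = np.random.default_rng(0)
R = rng.standard_normal((4, 4))

# 2. Symmetrise: B_ij = R_ij + R_ji = B_ji, so B equals its own transpose.
B = R + R.T
assert np.allclose(B, B.T)
print(B)
```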

Scalar–Matrix Interaction

For scalar c in R,

cA multiplies every entry: [cA]_ij = c * A_ij.

Matrix Addition / Subtraction

Defined element-wise only if the two matrices share identical shape.

For scalars c,d,

C = cA + dB, C_ij = c * A_ij + d * B_ij.


Matrix × Vector Multiplication – Linear Operator View

If A is an element of R^(m x n) and x is an element of R^n, then

y = Ax is an element of R^m, where y_i = Sum over j=1 to n (A_ij * x_j).

Observations:

  • Each y_i is an inner product between row i of A and x.

  • Computational load: m rows x n-length dot product implies mn MACs = 2mn FLOPs.

  • Linear operator property:

    A(alpha * x1 + beta * x2) = alpha * (A * x1) + beta * (A * x2).

    Thus a matrix transforms an n-D vector space to an m-D one in a linear fashion.
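The linear-operator property can be checked on a sample (a spot-check, not a proof); here A maps R^3 to R^2, so each output entry costs n = 3 MACs and the full product mn = 6 MACs:

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])   # R^3 -> R^2
x1 = np.array([1., 0., -1.])
x2 = np.array([2., 3., 1.])
alpha, beta = 2.0, -1.5

lhs = A @ (alpha * x1 + beta * x2)          # transform the combination
rhs = alpha * (A @ x1) + beta * (A @ x2)    # combine the transforms
assert np.allclose(lhs, rhs)                # linearity holds
print(lhs)
```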


Matrix × Matrix Multiplication

For A is an element of R^(m x n), B is an element of R^(n x p) define

C = AB is an element of R^(m x p), where C_ij = Sum over k=1 to n (A_ik * B_kj).

Each entry is again an inner product (row of A with column of B).

Computational cost: mp such inner products implies mpn MACs = 2mpn FLOPs.

Worked Example

A = [[1, 2, -3], [7, -1, 2]], B = [[-2, 3], [2, -1], [1, 3]]

produces

AB = [[-1, -8], [-14, 28]].
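The product can be checked numerically; each of the mp = 4 entries is a row-column inner product of length n = 3, so C costs mpn = 12 MACs in total:

```python
import numpy as np

A = np.array([[1., 2., -3.],
              [7., -1., 2.]])
B = np.array([[-2., 3.],
              [2., -1.],
              [1., 3.]])
print(A @ B)   # [[-1, -8], [-14, 28]]
```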

Multiplication Rules
  1. Non-commutative: generally AB != BA.

  2. Associative: (AB)C = A(BC).

  3. Left & right distributive: A(B + C) = AB + AC, (A + B)C = AC + BC.

  4. Transpose of a product: (AB)^T = B^T * A^T.

    Generalised: (A_1 * A_2 * … * A_n)^T = A_n^T * A_(n-1)^T * … * A_1^T.
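The four rules can be spot-checked on random square matrices (a sketch on one sample, not a proof; two random matrices almost surely fail to commute):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
C = rng.standard_normal((3, 3))

assert not np.allclose(A @ B, B @ A)             # 1. generally non-commutative
assert np.allclose((A @ B) @ C, A @ (B @ C))     # 2. associative
assert np.allclose(A @ (B + C), A @ B + A @ C)   # 3. left distributive
assert np.allclose((A + B) @ C, A @ C + B @ C)   # 3. right distributive
assert np.allclose((A @ B).T, B.T @ A.T)         # 4. transpose reverses order
print("all rules hold on this sample")
```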


FLOPs, MACs & Hardware Relevance

Throughout, the instructor repeatedly asked for FLOP or MAC counts – critical for estimating computational complexity of ML models and for exploiting specialised hardware (e.g. multiply-accumulate units, GPUs, TPUs). Inner products (dot products) dominate these costs.


Session Closure & Next Steps

The lecture paused for a five-minute break midway, then finished the matrix segment. Students were reminded to fill a feedback form. A follow-up session ("part 2") will continue the linear-algebra foundations (including gradient, further products, statistics-based distances).