Comprehensive Notes on Linear Algebra Concepts

Vector Perspectives and Representations

Scalars vs. vectors (from the transcript):
- Scalars: values that can be added, subtracted, etc., representing some quantity.
- Vectors: lists of numbers that represent something in a geometry/coordinate sense.
Computer science perspective: vectors are lists of numbers that represent something.
Physicist's perspective: vectors have a magnitude and a direction; a vector stays the same wherever it is drawn in the plane.
Mathematician's perspective: vectors are a combination of magnitude and direction, generalized to broader contexts.

Vector Operations

Vector addition and subtraction:
- If V1 = (x1, y1) and V2 = (x2, y2), then
- V1 + V2 = (x1 + x2, y1 + y2)
- V1 - V2 = (x1 - x2, y1 - y2)
- Interpretation: addition yields the displacement resulting from performing V1 followed by V2 (tip-to-tail method).
- Example from transcript: with V1 = (3, 2) and V2 = (2, 3), the combined displacement is V1 + V2 = (3 + 2, 2 + 3) = (5, 5).
Scalar multiplication:
- If k is a scalar and V = (x, y), then kV = (k x, k y).
- Direction and magnitude:
- If k > 0, magnitude scales by |k| and direction stays the same.
- If k < 0, magnitude scales by |k| and direction is reversed.
- Transcript note: multiplying by a positive scalar grows the vector (k > 1) or shrinks it (0 < k < 1); multiplying by a negative scalar also reverses its direction.
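
The component-wise rules above can be sketched in a few lines of Python. This is a minimal illustration, assuming 2D vectors stored as tuples; the function names `add`, `sub`, and `scale` are illustrative and do not come from the transcript.

```python
from typing import Tuple

Vec2 = Tuple[float, float]

def add(v1: Vec2, v2: Vec2) -> Vec2:
    """Component-wise sum: the displacement of v1 followed by v2."""
    return (v1[0] + v2[0], v1[1] + v2[1])

def sub(v1: Vec2, v2: Vec2) -> Vec2:
    """Component-wise difference v1 - v2."""
    return (v1[0] - v2[0], v1[1] - v2[1])

def scale(k: float, v: Vec2) -> Vec2:
    """Scalar multiple k*v; a negative k reverses the direction."""
    return (k * v[0], k * v[1])

print(add((3, 2), (2, 3)))   # (5, 5)
print(sub((3, 2), (2, 3)))   # (1, -1)
print(scale(-2, (-2, 3)))    # (4, -6)
```

The inputs reuse vectors that appear in the worked examples of these notes, so the printed results can be checked by hand.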
Projection (shadow of one vector on another):
- Given V1 and V2, there are two projection concepts:
- Scalar projection (signed length of the projection of V1 onto V2): $\text{proj}_{\text{scalar}}(V1 \text{ onto } V2) = \frac{V1\cdot V2}{|V2|}$
- Vector projection: $\text{proj}_{V2}(V1) = \frac{V1\cdot V2}{|V2|^2}\,V2$
- Significance: projections help analyze components of an unknown vector along a known direction.
- Example outline: If V1 = (x1, y1) and V2 = (x2, y2), you can compute the scalar projection length and the vector projection using the formulas above.
Notation and intuition:
- Projections relate to how much of V1 lies in the direction of V2.
- Projections can be used to infer information about an unknown vector by projecting onto a known one.
- A vector can be drawn starting at any point: shifting it leaves it unchanged as long as its magnitude and direction are preserved (a vector does not depend on where it sits in the plane).
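
A minimal Python sketch of the scalar and vector projection formulas, assuming 2D tuples and a non-zero vector to project onto. The helper names (`dot`, `norm`, `scalar_projection`, `vector_projection`) are illustrative, not taken from the transcript.

```python
import math

def dot(a, b):
    """Dot product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

def norm(a):
    """Magnitude (length) of a 2D vector."""
    return math.sqrt(dot(a, a))

def scalar_projection(a, b):
    """Signed length of a along b: (a . b) / |b|. Assumes b is non-zero."""
    return dot(a, b) / norm(b)

def vector_projection(a, b):
    """Component of a along b: ((a . b) / |b|^2) * b. Assumes b is non-zero."""
    factor = dot(a, b) / dot(b, b)
    return (factor * b[0], factor * b[1])

a, b = (3, 2), (2, 3)
print(scalar_projection(a, b))        # equals 12 / sqrt(13)
print(vector_projection(a, b))        # equals (24/13, 36/13)
print(vector_projection(a, (1, 0)))   # (3.0, 0.0): the "shadow" of a on the x-axis
```

Projecting onto the x-axis direction (1, 0) simply picks out the x-component, which matches the "shadow" intuition.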

Vector Representations and Bases

Unit vectors: i and j, where
- $\mathbf{i} = (1,0)$ , $\mathbf{j} = (0,1)$
Examples of vector representations:
- If a = 3î - 2ĵ and b = 2î + 3ĵ, then
- a = (3, -2), b = (2, 3)
- a + b = (3+2, -2+3) = (5, 1)
- If a = (3, 2) and b = (2, 3) (from a different example in the transcript), then
- a + b = (3+2, 2+3) = (5, 5)
In general, vectors can be shifted in the plane without changing their magnitude and direction; representation in i, j is just a coordinate depiction.

Dot Product and Magnitude (Length) of Vectors

Dot product definition for a = (x1, y1) and b = (x2, y2):
- $a\cdot b = x_1 x_2 + y_1 y_2$
Length (magnitude) of a: for a = (x, y)
- $|a| = \sqrt{x^2 + y^2}$
Law of cosines relation (for vectors a and b with angle θ between them):
- $|a - b|^2 = |a|^2 + |b|^2 - 2|a||b|\cos\theta$
- This allows computing cos θ via: $\cos\theta = \frac{|a|^2 + |b|^2 - |a - b|^2}{2 |a| |b|}$
Special angle cases mentioned in the transcript:
- If θ = 0°, vectors point in the same direction; if θ = 180°, opposite directions; if θ = 90°, vectors are orthogonal (dot product 0).
- Orthogonality condition (for non-zero a and b): $a\cdot b = 0\quad\text{iff}\quad \theta = 90^{\circ}$
Relationship between dot product and angle:
- $a\cdot b = |a||b|\cos\theta$
Normalized projection discussion (context from transcript): projections help quantify alignment and can be used to infer unknown vectors from known directions.
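
The angle relations above can be checked numerically. The sketch below is illustrative only (function names are assumptions, and both vectors are assumed non-zero): it computes θ from the dot product and confirms that the law-of-cosines formula gives the same cos θ.

```python
import math

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1]

def magnitude(a):
    return math.sqrt(dot(a, a))

def angle_degrees(a, b):
    """Angle between non-zero vectors a and b, from a . b = |a||b| cos(theta)."""
    cos_theta = dot(a, b) / (magnitude(a) * magnitude(b))
    cos_theta = max(-1.0, min(1.0, cos_theta))  # guard against rounding just outside [-1, 1]
    return math.degrees(math.acos(cos_theta))

def cos_from_law_of_cosines(a, b):
    """cos(theta) = (|a|^2 + |b|^2 - |a - b|^2) / (2 |a| |b|)."""
    diff = (a[0] - b[0], a[1] - b[1])
    return (magnitude(a) ** 2 + magnitude(b) ** 2 - magnitude(diff) ** 2) / (
        2 * magnitude(a) * magnitude(b)
    )

print(angle_degrees((1, 0), (0, 1)))    # orthogonal vectors
print(angle_degrees((1, 0), (-1, 0)))   # opposite directions
print(cos_from_law_of_cosines((3, 2), (2, 3)), dot((3, 2), (2, 3)) / 13)
```

For a = (3, 2) and b = (2, 3), both routes give cos θ = 12/13, since |a||b| = 13.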

Projection Revisited: Scalar and Vector Projection Connections

Scalar projection (length of the projection of a onto b): $\text{proj}_{\text{scalar}} = \frac{a\cdot b}{|b|}$
Vector projection (the actual projected vector along b): $\text{proj}_{b}(a) = \frac{a\cdot b}{|b|^2}\,b$
The dot product is central to both projections: for 0° ≤ θ < 90° the scalar projection is positive, at θ = 90° it is zero, and for 90° < θ ≤ 180° it is negative (the projection points opposite to b).
Both projections depend on θ only through cos θ, so their sign and size track how closely a aligns with b.
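
The connection to the angle can be made explicit by substituting $a\cdot b = |a||b|\cos\theta$ into the two formulas. This short derivation is a sketch that assumes $b \neq 0$:

```latex
\begin{aligned}
\text{proj}_{\text{scalar}} &= \frac{a\cdot b}{|b|}
  = \frac{|a||b|\cos\theta}{|b|}
  = |a|\cos\theta, \\
\text{proj}_{b}(a) &= \frac{a\cdot b}{|b|^2}\,b
  = \bigl(|a|\cos\theta\bigr)\,\frac{b}{|b|}.
\end{aligned}
```

In words, the vector projection is the scalar projection multiplied by the unit vector $b/|b|$.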

Matrices: Basics and Intuition

A matrix is a rectangular (or square) array of numbers, symbols, or expressions used to transform vectors or to represent systems of linear equations.
A matrix A can transform a vector X into another vector Y: $A\mathbf{x} = \mathbf{y}$ , where A is the transformation matrix.
A matrix can also encode several linear equations (for example, a system of lines) in one compact form.
Notation hints from transcript:
- Matrix A, Vector 1 (input), and Vector 2 (output) illustrate the transformation concept.
Core idea: matrices are building blocks to convert vector forms, perform transformations, and simplify operations.
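
To make $A\mathbf{x} = \mathbf{y}$ concrete, here is a small Python sketch of a matrix acting on a vector. The function name `mat_vec` and both example matrices are hypothetical choices for illustration; matrices are assumed to be lists of rows.

```python
def mat_vec(A, x):
    """Apply matrix A (a list of rows) to vector x: y_i = sum_k A[i][k] * x[k]."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in A]

# As a transformation: this matrix scales x by 2 and y by 3.
scaling = [[2, 0],
           [0, 3]]
print(mat_vec(scaling, [1, 1]))   # [2, 3]

# As a system of equations: 2x + y = 5 and x - y = 1.
# Plugging in the candidate solution (x, y) = (2, 1) reproduces the right-hand side.
system = [[2, 1],
          [1, -1]]
print(mat_vec(system, [2, 1]))    # [5, 1]
```

The same function serves both readings of a matrix: a transformation of an input vector, or the left-hand side of a linear system.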

Matrix Operations

Matrix addition and subtraction:
- Two matrices must be of the same size (order) to add/subtract element-wise.
- If A and B are both m × n, then $(A \pm B)_{ij} = a_{ij} \pm b_{ij}$.
Matrix multiplication:
- If A is m × n and B is n × p, then AB is m × p with
- $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$
- The inner dimensions must match (the number of columns of A equals the number of rows of B).
- Example format in transcript showed row-by-column multiplication.
Transpose of a matrix:
- The transpose $A^T$ interchanges rows and columns: $(A^T)_{ij} = a_{ji}$.
- Useful for flipping dimensions and for certain transformations.
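
The three operations above can be sketched in plain Python, assuming matrices are stored as lists of rows. The function names and example matrices are illustrative; they are not from the transcript.

```python
def mat_add(A, B):
    """Element-wise sum; A and B must have the same size."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(A):
    """Swap rows and columns: (A^T)[i][j] = A[j][i]."""
    return [list(col) for col in zip(*A)]

def mat_mul(A, B):
    """Row-by-column product; columns of A must equal rows of B."""
    if len(A[0]) != len(B):
        raise ValueError("inner dimensions must match")
    columns_of_B = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in columns_of_B]
            for row in A]

A = [[1, 2, 3],
     [4, 5, 6]]          # 2 x 3
B = [[1, 0],
     [0, 1],
     [1, 1]]             # 3 x 2
print(mat_mul(A, B))     # [[4, 5], [10, 11]]  (2 x 2)
print(transpose(A))      # [[1, 4], [2, 5], [3, 6]]
print(mat_add([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[6, 8], [10, 12]]
```

Note how the shapes follow the rule: a 2 × 3 matrix times a 3 × 2 matrix gives a 2 × 2 result.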
Determinant (Det):
- Determinant is a scalar value associated with a square matrix that encodes scale factor of the linear transformation and whether it preserves orientation.
- For a 2×2 matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$,
- $\det(A) = ad - bc$
- The determinant equals zero exactly when the matrix is singular (no inverse).
Inverse of a matrix:
- A^{-1} exists iff det(A) ≠ 0.
- For 2×2: if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, then
- $A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$
- Property: A A^{-1} = A^{-1} A = I, where I is the identity matrix.
- Not all matrices have inverses (det = 0 means no inverse).
The identity matrix:
- 2×2 identity: $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Practical note: inverse computation can be done via Gaussian elimination / row reduction to obtain the inverse (augment A with I and reduce to I | A^{-1}).
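
As a rough illustration of the 2×2 determinant and inverse formulas, the sketch below uses Python's `fractions.Fraction` so the results are exact. The function names and the example matrix are assumptions made for this example.

```python
from fractions import Fraction

def det2(A):
    """Determinant of a 2x2 matrix [[a, b], [c, d]]: ad - bc."""
    (a, b), (c, d) = A
    return a * d - b * c

def inverse2(A):
    """Inverse of a 2x2 matrix via (1/det) * [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = det2(A)
    if det == 0:
        raise ValueError("matrix is singular (det = 0), so it has no inverse")
    f = Fraction(1) / Fraction(det)   # exact reciprocal of the determinant
    return [[d * f, -b * f],
            [-c * f, a * f]]

A = [[4, 7],
     [2, 6]]
print(det2(A))       # 10
print(inverse2(A))   # entries 3/5, -7/10, -1/5, 2/5 as Fractions
```

Multiplying `A` by the result gives the identity matrix, which is a quick way to sanity-check an inverse by hand.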

Inverse Method and Gaussian Elimination

Concept: Solving AX = B for X, given A and B, is the core task; with B = I the solution X is the inverse A^{-1}, and the same row operations solve general linear systems.
The transcript mentions using Gaussian elimination and echelon forms as a computational method to obtain inverses and solve linear systems.
Example outline (conceptual): To find X solving AX = I, perform row operations to transform [A | I] into [I | A^{-1}].
Important takeaway: If det(A) = 0, the inverse does not exist; the system AX = B then has either no solution or infinitely many solutions, never exactly one.
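
The [A | I] to [I | A^{-1}] procedure can be sketched as follows. This is one possible Gauss-Jordan implementation under stated assumptions (a square matrix given as a list of rows, exact arithmetic via `Fraction`); the function name is hypothetical and the transcript does not prescribe any particular code.

```python
from fractions import Fraction

def inverse_gauss_jordan(A):
    """Invert a square matrix by row-reducing the augmented matrix [A | I]."""
    n = len(A)
    # Build [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is singular; no inverse exists")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The left half is now I; the right half is the inverse.
    return [row[n:] for row in aug]

print(inverse_gauss_jordan([[4, 7], [2, 6]]))  # entries 3/5, -7/10, -1/5, 2/5
```

When no non-zero pivot can be found in some column, the matrix is singular, which corresponds to the det(A) = 0 case.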

Eigenvalues and Eigenvectors

Definitions:
- An eigenvector v of a linear transformation represented by A is a non-zero vector satisfying $A\mathbf{v} = \lambda \mathbf{v}$, where $\lambda$ is the corresponding eigenvalue.
- Eigenvectors point in directions that are preserved under the transformation (they may be stretched or shrunk, but not rotated away from their line).
Interpretation across transformations (from transcript):
- Scaling: vectors along the x- and y-axes stay on their original lines, and the eigenvalues give the scale factor along each of those directions.
- Shearing: in a horizontal shear, horizontal vectors keep their direction (eigenvalue 1), while vectors in every other direction are tilted off their line.
- Rotation: in general, a rotation in the plane has no real eigenvectors, because every vector is turned off its line; a rotation by 180° is a special case where every vector is an eigenvector with eigenvalue -1 (all vectors are flipped).
Key statements:
- Eigenvectors lie along the same line before and after the transformation.
- Eigenvalues measure how much those eigenvectors are stretched or compressed by the transformation.
- The PageRank algorithm, famously associated with Larry Page, applies eigenvectors and eigenvalues to ranking, linking eigenanalysis to the structure of a network.
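
The scaling, shearing, and rotation cases can be checked numerically for 2×2 matrices. The sketch below relies on a standard fact that these notes do not state: the eigenvalues of a 2×2 matrix are the roots of λ² − (trace)λ + det = 0. The function name and the example matrices are illustrative assumptions.

```python
import math

def eigenvalues_2x2(A):
    """Distinct real eigenvalues of a 2x2 matrix, or [] if there are none.

    Solves lambda^2 - trace*lambda + det = 0 with the quadratic formula.
    """
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    disc = trace * trace - 4 * det
    if disc < 0:
        return []  # complex roots: no real direction is preserved
    root = math.sqrt(disc)
    return sorted({(trace - root) / 2, (trace + root) / 2})

print(eigenvalues_2x2([[2, 0], [0, 3]]))    # scaling: [2.0, 3.0]
print(eigenvalues_2x2([[1, 1], [0, 1]]))    # horizontal shear: [1.0]
print(eigenvalues_2x2([[0, -1], [1, 0]]))   # rotation by 90 degrees: []
print(eigenvalues_2x2([[-1, 0], [0, -1]]))  # rotation by 180 degrees: [-1.0]
```

The empty result for the 90° rotation reflects that no real vector stays on its own line, while the 180° rotation returns the single eigenvalue -1.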

Applications in Machine Learning and Data Science

Principal Component Analysis (PCA):
- Used for dimensionality reduction to improve data quality and simplify datasets by capturing the directions of maximum variance.
- Considered the most important application mentioned in the transcript.
Transformations on image data (pixel data):
- Linear algebra underpins many image processing techniques via coordinate transformations, filtering, and feature extraction.
Encoding of datasets: linear algebra underlies encoding schemes and representations of data.
Singular Value Decomposition (SVD):
- A factorization that reveals intrinsic structure in data and is used for dimensionality reduction, denoising, and collaborative filtering in some ML contexts.
Optimization of deep learning models:
- Linear algebra operations underpin many optimization routines and neural network computations.
Conceptual note: Dimensionality reduction refers to reducing the number of input variables while preserving as much information as possible.
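
To give a feel for PCA as dimensionality reduction, here is a deliberately small sketch that reduces 2D points to 1D. It is not how the transcript or any library implements PCA: it assumes at least two 2D points, uses the closed-form largest eigenvector of the 2×2 sample covariance matrix, and all function names are hypothetical.

```python
import math

def center(points):
    """Subtract the mean from every 2D point."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    return [(x - mean_x, y - mean_y) for x, y in points]

def first_principal_component(points):
    """Unit vector along the direction of maximum variance."""
    centered = center(points)
    n = len(centered)
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)
    # Largest eigenvalue from lambda^2 - trace*lambda + det = 0.
    trace, det = sxx + syy, sxx * syy - sxy * sxy
    lam = (trace + math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2
    # A matching eigenvector; fall back to an axis when the covariance is diagonal.
    if sxy != 0:
        v = (lam - syy, sxy)
    else:
        v = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

def reduce_to_1d(points):
    """Coordinate of each centered point along the first principal component."""
    ux, uy = first_principal_component(points)
    return [x * ux + y * uy for x, y in center(points)]

points = [(1, 1), (2, 2), (3, 3)]   # all points lie on the line y = x
print(first_principal_component(points))
print(reduce_to_1d(points))
```

Because the sample points lie exactly on y = x, the principal direction is the unit vector along (1, 1) and one coordinate per point captures all of the variation.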

Quick Worked Examples (from the transcript)

Example 1: Addition of vectors
- Let V1 = (3, 2) and V2 = (2, 3).
- V1 + V2 = (3+2, 2+3) = (5, 5).
Example 2: Scalar multiplication and a different pair
- Let a = (3, 2) and b = (-2, 3) (as shown in a later part where -2b was computed).
- 3a = (9, 6).
- -2b = (4, -6).
Example 3: Dot product and projection (from projection section)
- Let a = (3, 2) and b = (2, 3).
- a · b = 3·2 + 2·3 = 6 + 6 = 12.
- $|b| = \sqrt{2^2 + 3^2} = \sqrt{13}$.
- Scalar projection of a onto b: $\text{proj}_{\text{scalar}}(a\text{ onto } b) = \frac{a\cdot b}{|b|} = \frac{12}{\sqrt{13}}.$
- Vector projection: $\text{proj}_{b}(a) = \frac{a\cdot b}{|b|^2}\,b = \frac{12}{13}\,(2,3) = \left(\frac{24}{13}, \frac{36}{13}\right).$
Example 4: Unit vectors and vector addition in i, j form
- a = 3i - 2j = (3, -2)
- b = 2i + 3j = (2, 3)
- a + b = (3+2, -2+3) = (5, 1).
Example 5: 2×2 Matrix Operations (simple forms)
- For $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ and $B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$,
- $A + B = \begin{pmatrix} a+e & b+f \\ c+g & d+h \end{pmatrix}$,
- $AB = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}$.
Example 6: 2×2 Determinant and Inverse
- $\det(A) = ad - bc$ for $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$.
- If $\det(A) \neq 0$, $A^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$.

Summary of Key Concepts to Remember

Vector operations: addition, subtraction, scalar multiplication; projection concepts; dot product and magnitude relations.
Vector representations: coordinates with i and j; basis; unit vectors.
Core matrix concepts: transformation interpretation, matrix operations (addition, subtraction, multiplication, transpose), determinant, inverse, identity matrix.
Inverse method: Gaussian elimination as a practical method to compute inverses and solve linear systems.
Eigenvalues and eigenvectors: invariant directions under linear transforms; interpretation across scaling, shearing, rotation; special cases (e.g., rotation by 180° where all vectors are eigenvectors with eigenvalue -1).
Applications in ML: PCA, SVD, dimensionality reduction, transformations of image data, and optimization in deep learning.

Notation Cheat Sheet

Vector: $\mathbf{v} = (x, y)$ or $\mathbf{v} = x\mathbf{i} + y\mathbf{j}$
Dot product: $\mathbf{a}\cdot\mathbf{b} = a_x b_x + a_y b_y$
Magnitude: $|\mathbf{a}| = \sqrt{a_x^2 + a_y^2}$
Projection (scalar): $\text{proj}_{\text{scalar}}(\mathbf{a} \text{ onto } \mathbf{b}) = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|}$
Projection (vector): $\text{proj}_{\mathbf{b}}(\mathbf{a}) = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}$
Matrix product: if A is m × n and B is n × p, then AB is m × p, with $(AB)_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}$ (the ith row of A times the jth column of B)
Inverse (2×2): $A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}, \quad \det(A)=ad-bc$
Identity matrix (2×2): $I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Eigenvalue equation: $A\mathbf{v} = \lambda \mathbf{v}$
PCA, SVD, and dimensionality reduction concepts: short-hand references to the transcript’s ML applications.