Linear Algebra and Geometric Interpretations

Dot Product

  • Encodes the angle between two vectors (made precise in the geometric interpretation below).
  • Given two vectors, take the transpose of the first vector (making it a row vector).
  • Multiply the transposed vector with the second vector.
  • This involves multiplying corresponding terms in each vector and summing the results.
  • The dot product results in a single number (scalar).
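A minimal NumPy sketch of the recipe above (the vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# x^T y: multiply corresponding entries, then sum -> a single scalar
dot = x @ y            # equivalent to np.sum(x * y)
```

Here `dot` is 1*4 + 2*5 + 3*6 = 32, a single number rather than a vector.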

Geometric Interpretation

  • In two dimensions, consider vector x and vector y.
  • \theta_x is the angle vector x makes with the horizontal axis.
  • x_1 = ||x||_2 \cos(\theta_x), where ||x||_2 is the two-norm (magnitude) of x.
  • x_2 = ||x||_2 \sin(\theta_x)
  • Similarly, for vector y: y_1 = ||y||_2 \cos(\theta_y) and y_2 = ||y||_2 \sin(\theta_y)
  • x_1 y_1 + x_2 y_2 = ||x||_2 ||y||_2 \cos(\theta_x) \cos(\theta_y) + ||x||_2 ||y||_2 \sin(\theta_x) \sin(\theta_y)
  • By the cosine difference identity, this simplifies to ||x||_2 ||y||_2 \cos(\theta_x - \theta_y): the product of the magnitudes times the cosine of the angle between the vectors.
  • If the angle is zero, the dot product is the product of the magnitudes.
  • If the angle is 90 degrees (orthogonal), the dot product is zero, since \cos(90^\circ) = 0.
  • When both vectors are normalized (two-norm equal to one), the dot product is exactly the cosine of the angle between them.
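A quick NumPy check of both special cases (example vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0]) / np.sqrt(2.0)   # normalized: ||y||_2 = 1

# For unit vectors the dot product is cos(theta) directly
cos_theta = x @ y
theta_deg = np.degrees(np.arccos(cos_theta))

z = np.array([0.0, 1.0])   # orthogonal to x, so the dot product is 0
```

Here `theta_deg` comes out to 45 degrees, and `x @ z` is zero, matching cos(90°) = 0.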

Geometric Interpretation with Unit Vector

  • If one vector is a unit vector (e.g., \beta) and the other is not (e.g., x), the dot product x^T \beta has a geometric meaning.
  • x^T \beta = ||x||_2 ||\beta||_2 \cos(\theta), where ||\beta||_2 = 1.
  • So, x^T \beta = ||x||_2 \cos(\theta)
  • This represents the length of the projection of x onto \beta.
  • Consider a right triangle formed by dropping a perpendicular from point x onto the \beta axis.
  • The dot product gives the length of the base of this triangle.
  • Projections are fundamental in many reasoning tasks involving points.
  • This concept is crucial when discussing hyperplanes and using linear surfaces for class separation.
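The projection formula can be sketched in NumPy (the point and axis below are illustrative):

```python
import numpy as np

x = np.array([3.0, 4.0])
beta = np.array([1.0, 0.0])    # unit vector: the horizontal axis

# Since ||beta||_2 = 1, x^T beta = ||x||_2 cos(theta):
# the length of the base of the right triangle dropped from x onto beta
proj_len = x @ beta
```

For this x, `proj_len` is 3: the point (3, 4) projects onto the horizontal axis at distance 3 from the origin.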

Matrices as Collections of Points

  • A matrix can be viewed as a collection of points in high-dimensional space, where each column is a point.

Pre-multiplying with a Unit Vector

  • Consider a unit vector \beta and a matrix A.
  • Pre-multiplying A with \beta^T (i.e., \beta^T A) has a geometric interpretation.
  • Algebraically, if A has columns a_1, a_2, \dots, then \beta^T A results in a row vector: [\beta^T a_1, \beta^T a_2, \dots].
  • Each element \beta^T a_i is the projection of point a_i onto the unit vector \beta.
  • This operation projects all points in the matrix onto a single vector \beta, effectively looking at those points in that dimension.
  • This principle underlies Principal Component Analysis (PCA).
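The projection of all columns at once is a single matrix product (the matrix below is illustrative):

```python
import numpy as np

beta = np.array([0.0, 1.0])               # unit vector along the second axis
A = np.array([[1.0, 4.0, 7.0],
              [2.0, 5.0, 8.0]])           # three 2-D points, one per column

# beta^T A: a row of projections [beta^T a_1, beta^T a_2, beta^T a_3]
projections = beta @ A
```

Each of the three points is reduced to its coordinate along `beta`, i.e., the second row of A.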

Matrix-Vector Multiplication Geometry

  • Given a matrix A and a vector x, the product Ax results in a new vector.
  • If A is n \times k and x is k \times 1, the result is an n \times 1 vector.
  • Each entry in the resulting vector is a dot product of a row of A with the vector x.
  • Alternatively, this can be viewed as a linear combination of the columns of A, where the coefficients are the entries of x.
  • Ax = x_1 a_1 + x_2 a_2 + \dots + x_k a_k, where a_i are the columns of A and x_i are the elements of x.
  • Geometrically, you take each column vector of A, scale it by the corresponding entry in x, and then sum all the scaled vectors to get a new point.
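Both views of the product give the same vector, as a small NumPy sketch shows (the matrix is illustrative):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])     # n x k with n = 3, k = 2
x = np.array([2.0, 3.0])

# Row view: each entry of Ax is a row of A dotted with x
row_view = A @ x
# Column view: scale each column by the matching entry of x, then sum
col_view = x[0] * A[:, 0] + x[1] * A[:, 1]
```

Both computations produce the same 3 x 1 result, here (2, 3, 5).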

Vector Space and Rank

  • If a matrix A has only two columns, every linear combination of those columns lies in a single plane, no matter how many rows A has.
  • The set of all linear combinations of the columns of A (their span) forms a vector space.
  • The rank of A is the dimension of this vector space, i.e., the number of independent directions it covers.
  • If all the points lie on a plane, the rank is 2, even if the original space is higher-dimensional.
  • This vector space, the column space of A, is the set of all points that can be reached by Ax over all possible choices of x.
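A concrete rank computation (the matrix is illustrative):

```python
import numpy as np

# Three columns in 3-D, but the third is the sum of the first two,
# so every combination Ax stays inside a 2-D plane
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])

rank = np.linalg.matrix_rank(A)
```

The rank is 2: the column space is a plane inside 3-D space, not all of it.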

Outer Product Geometry

  • The outer product of two vectors x and y (i.e., xy^T) results in a matrix.
  • If x is n \times 1 and y is m \times 1, then xy^T (an n \times 1 matrix times a 1 \times m matrix) is an n \times m matrix.
  • When m = 1, xy^T is just the vector x scaled by the scalar y_1.
  • If y has multiple entries (y_1, y_2, \dots, y_m), each column of the resulting matrix is a scaled version of the vector x: column j is y_j x.
  • All columns of the matrix will lie on the same line passing through the origin.
  • This matrix has a rank of 1 because its vector space is confined to a single dimension.
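A small NumPy check of the outer product's shape and rank (vectors are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])   # n = 3
y = np.array([4.0, 5.0])        # m = 2

M = np.outer(x, y)              # the n x m matrix x y^T
# Column j of M is y_j * x, so all columns lie on one line through the origin
```

Here M is 3 x 2, its first column is 4x, its second is 5x, and its rank is 1.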

Key Takeaways

  • Vectors are points in high-dimensional space.
  • Dot product and projections.
  • Matrices as collections of points.
  • Matrix-vector multiplication.
  • Vector space and rank.
  • Emphasis on geometric intuition.

Supervised Classification and Hyperplanes

  • Given two classes of points, if they are separated in space, the goal is to find a surface that separates them.
  • In two dimensions, this is a line; in three dimensions, it's a plane; in n dimensions, it's a hyperplane.
  • A hyperplane is an (n-1)-dimensional surface.
  • In n-dimensional space there is a single direction orthogonal to the hyperplane, i.e., orthogonal to every vector lying within it.
  • The hyperplane is fully specified by the unit vector \beta normal to it and the shortest distance \beta_0 from the origin to the hyperplane.

Hyperplane Definition

  • Let x_1 be a point on the hyperplane. Then, \beta^T x_1 = \beta_0.
  • The projection of any point on the hyperplane onto \beta is always \beta_0.
  • The hyperplane L is the set of all points x such that \beta^T x = \beta_0.
  • In two dimensions, this generalizes the equation of a line.

Distance from a Point to a Hyperplane

  • For a point x not on the hyperplane, the shortest distance from x to the hyperplane is given by |\beta^T x - \beta_0|.
  • \beta^T x is the projection of x onto \beta.
  • Subtracting \beta_0 gives the signed distance to the hyperplane.
  • If \beta^T x - \beta_0 = 0, the point lies on the hyperplane.
  • The sign of \beta^T x - \beta_0 indicates which side of the hyperplane the point lies on.
  • If two points have the same sign for \beta^T x - \beta_0, they are on the same side; if they have opposite signs, they are on opposite sides.
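A sketch of the signed-distance test in NumPy (the normal, offset, and test points are illustrative):

```python
import numpy as np

beta = np.array([0.6, 0.8])    # unit normal: 0.6^2 + 0.8^2 = 1
beta0 = 2.0                    # shortest distance from the origin

def signed_distance(x):
    # zero on the hyperplane; the sign says which side x falls on
    return beta @ x - beta0

d_on = signed_distance(np.array([2.0, 1.0]))    # lies on the hyperplane
d_pos = signed_distance(np.array([5.0, 5.0]))   # one side
d_neg = signed_distance(np.array([0.0, 0.0]))   # the other side
```

The point (2, 1) satisfies \beta^T x = \beta_0 and gets distance 0, while the other two points get opposite signs because they lie on opposite sides.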

Applications in Bioinformatics

  • Gene expression or proteomics experiments result in an expression matrix.
  • Samples fall into classes (e.g., ALL and AML leukemia).
  • The problem is to predict the class of a new sample based on its expression values (supervised classification).
  • Another problem is to determine if points intrinsically separate into different classes (unsupervised learning).

Single-Cell Data Analysis

  • Single-cell data provides expression vectors for individual cells.
  • The goal is to cluster cells by type and project them into lower dimensions for visualization.
  • Each cell type should ideally form a distinct cluster.
  • This allows for comparison of cell compositions between samples.

Dimensionality Reduction

  • Given a matrix (collection of points), represent each sample by a single point in a lower dimension.
  • Algebraically, find a direction \beta and pre-multiply the matrix by \beta^T.
  • Geometrically, project all points onto the direction \beta.

Principal Component Analysis (PCA) Intuition

  • The goal is to project points in a way that preserves as much information as possible.
  • If points are projected onto a direction where they cluster tightly, information is lost.
  • The best direction is the one along which the projected points have the maximum variance.
  • Find \beta such that the variance of the projected points is maximized. This \beta is the principal component.

PCA Algebra

  • Compute the mean vector m of all data points.
  • Subtract the mean from each data point to center the data at the origin: x_i \leftarrow x_i - m.
  • Project the centered data points onto \beta: \beta^T (x_i - m).
  • These projected points have a mean of zero.

Variance Calculation

  • The sample variance of the projected points is proportional to \sum_i (\beta^T (x_i - m))^2.
  • The goal is to maximize this variance over all choices of the unit vector \beta.

Maximizing Variance

  • \sum_i (\beta^T (x_i - m))^2 = \sum_i \beta^T (x_i - m) (x_i - m)^T \beta = \beta^T S \beta, where S = \sum_i (x_i - m) (x_i - m)^T is the scatter matrix (the sample covariance matrix up to a constant factor).
  • The problem becomes: Maximize \beta^T S \beta subject to ||\beta||_2 = 1.
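The maximizer of \beta^T S \beta over unit vectors \beta is the eigenvector of S with the largest eigenvalue (a standard fact about Rayleigh quotients, not derived above). A minimal NumPy sketch on synthetic data; note that, unlike the column convention used earlier, points are stored as rows here for convenience:

```python
import numpy as np

rng = np.random.default_rng(0)
# 200 synthetic 2-D points, spread mostly along the direction (1, 1)
X = rng.normal(size=(200, 1)) * np.array([1.0, 1.0]) + 0.1 * rng.normal(size=(200, 2))

m = X.mean(axis=0)                 # mean vector
Xc = X - m                         # center the data at the origin
S = Xc.T @ Xc                      # scatter matrix: sum_i (x_i - m)(x_i - m)^T

# beta^T S beta is maximized by the top eigenvector of S
eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric
beta = eigvecs[:, -1]                  # eigenvector of the largest eigenvalue

projected = Xc @ beta                  # the 1-D projections beta^T (x_i - m)
```

On this data the recovered `beta` points (up to sign) along (1, 1)/sqrt(2), the direction of maximum spread, and the projected points have mean zero as claimed above.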