Linear Algebra and Geometric Interpretations
Dot Product
- Closely related to the angle between the two vectors (made precise below).
- Given two vectors, take the transpose of the first vector (making it a row vector).
- Multiply the transposed vector with the second vector.
- This involves multiplying corresponding terms in each vector and summing the results.
- The dot product results in a single number (scalar).
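The algebraic recipe above can be sketched in a few lines of NumPy (the example vectors are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

# x^T y: multiply corresponding terms and sum the results
dot = np.sum(x * y)        # 1*4 + 2*5 + 3*6 = 32
assert dot == x @ y        # same as NumPy's built-in dot product
```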
Geometric Interpretation
- In two dimensions, consider vector x and vector y.
- \theta_x is the angle vector x makes with the horizontal axis.
- x_1 = ||x||_2 \cos(\theta_x), where ||x||_2 is the two-norm (magnitude) of x.
- x_2 = ||x||_2 \sin(\theta_x)
- Similarly, for vector y: y_1 = ||y||_2 \cos(\theta_y) and y_2 = ||y||_2 \sin(\theta_y)
- x_1 y_1 + x_2 y_2 = ||x||_2 ||y||_2 \cos(\theta_x) \cos(\theta_y) + ||x||_2 ||y||_2 \sin(\theta_x) \sin(\theta_y)
- This simplifies to ||x||_2 ||y||_2 \cos(\theta_x - \theta_y): the product of the magnitudes times the cosine of the angle between the vectors.
- If the angle is zero, the dot product is the product of the magnitudes.
- If the angle is 90 degrees (orthogonal), the dot product is zero, since \cos(90^\circ) = 0.
- When both vectors are normalized (two-norm equal to one), the dot product equals the cosine of the angle between them.
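The identity x^T y = ||x||_2 ||y||_2 \cos(\theta) can be checked numerically; the two vectors below are arbitrary choices that subtend a 45-degree angle:

```python
import numpy as np

x = np.array([2.0, 0.0])   # along the horizontal axis
y = np.array([1.0, 1.0])   # 45 degrees above the horizontal

# recover the angle from the dot product and the two-norms
cos_theta = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))
angle_deg = np.degrees(np.arccos(cos_theta))   # 45 degrees
```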
Geometric Interpretation with Unit Vector
- If one vector is a unit vector (e.g., \beta) and the other is not (e.g., x), the dot product x^T \beta has a geometric meaning.
- x^T \beta = ||x||_2 ||\beta||_2 \cos(\theta), where ||\beta||_2 = 1.
- So, x^T \beta = ||x||_2 \cos(\theta)
- This represents the length of the projection of x onto \beta.
- Consider a right triangle formed by dropping a perpendicular from point x onto the \beta axis.
- The dot product gives the length of the base of this triangle.
- Projections are fundamental in many reasoning tasks involving points.
- This concept is crucial when discussing hyperplanes and using linear surfaces for class separation.
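A small numerical illustration of the projection length (the unit vector here is simply the horizontal axis, chosen for clarity):

```python
import numpy as np

beta = np.array([1.0, 0.0])   # a unit vector: the horizontal axis
x = np.array([3.0, 4.0])      # an arbitrary point

# x^T beta = ||x||_2 cos(theta): the length of the projection of x onto beta
proj_len = x @ beta           # here just the horizontal coordinate of x, 3.0
```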
Matrices as Collections of Points
- A matrix can be viewed as a collection of points in high-dimensional space, where each column is a point.
Pre-multiplying with a Unit Vector
- Consider a unit vector \beta and a matrix A.
- Pre-multiplying A with \beta^T (i.e., \beta^T A) has a geometric interpretation.
- Algebraically, if A has columns a_1, a_2, \dots, then \beta^T A results in a row vector: [\beta^T a_1, \beta^T a_2, \dots].
- Each element \beta^T a_i is the projection of point a_i onto the unit vector \beta.
- This operation projects all points in the matrix onto a single vector \beta, effectively looking at those points in that dimension.
- This principle underlies Principal Component Analysis (PCA).
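Projecting all columns at once can be sketched as follows (the matrix and direction are made-up examples):

```python
import numpy as np

# columns of A are three points in 2D
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
beta = np.array([0.0, 1.0])   # unit vector: the vertical axis

# beta^T A = [beta^T a1, beta^T a2, beta^T a3]
projections = beta @ A        # each point viewed along the beta direction
```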
Matrix-Vector Multiplication Geometry
- Given a matrix A and a vector x, the product Ax results in a new vector.
- If A is n \times k and x is k \times 1, the result is an n \times 1 vector.
- Each entry in the resulting vector is a dot product of a row of A with the vector x.
- Alternatively, this can be viewed as a linear combination of the columns of A, where the coefficients are the entries of x.
- Ax = x_1 a_1 + x_2 a_2 + \dots + x_k a_k, where a_i are the columns of A and x_i are the elements of x.
- Geometrically, you take each column vector of A, scale it by the corresponding entry in x, and then sum all the scaled vectors to get a new point.
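The two views of Ax (row-wise dot products versus a column combination) give the same result, as this small check shows:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])      # n = 3 rows, k = 2 columns
x = np.array([2.0, 3.0])

# view 1: each entry is a dot product of a row of A with x
v1 = A @ x
# view 2: a linear combination of the columns of A with coefficients x_i
v2 = x[0] * A[:, 0] + x[1] * A[:, 1]
assert np.allclose(v1, v2)      # both give [2, 3, 5]
```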
Vector Space and Rank
- If a matrix A has only two columns, all linear combinations of these columns lie in a single plane through the origin.
- The span of all possible linear combinations of the columns of A forms a vector space.
- The rank of the matrix is the dimension of this vector space: the number of linearly independent directions it covers.
- If all points lie on a plane, the rank is 2, even if the original space is higher-dimensional.
- The vector space is the set of all points that can be reached by Ax for all possible choices of x.
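A quick check of the rank claim, using a made-up matrix whose two columns live in 3D:

```python
import numpy as np

# two linearly independent columns in 3D: every Ax lies in the plane they span
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

rank = np.linalg.matrix_rank(A)   # 2, even though the ambient space is 3D
```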
Outer Product Geometry
- The outer product of two vectors x and y (i.e., xy^T) results in a matrix.
- If x is n \times 1 and y is m \times 1 (so y^T is 1 \times m), then xy^T is an n \times m matrix.
- When m = 1, xy^T is just the vector x scaled by the scalar y_1.
- If y has multiple entries (y_1, y_2, \dots, y_m), each column of the resulting matrix is a scaled version of the vector x.
- All columns of the matrix will lie on the same line passing through the origin.
- This matrix has a rank of 1 because its vector space is confined to a single dimension.
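The rank-1 structure of an outer product is easy to verify (arbitrary example vectors):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0, 5.0])

M = np.outer(x, y)                # x y^T, a 2 x 3 matrix
# every column of M is a scaled copy of x, so all columns lie on one line
rank = np.linalg.matrix_rank(M)   # 1
```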
Key Takeaways
- Vectors are points in high-dimensional space.
- Dot product and projections.
- Matrices as collections of points.
- Matrix-vector multiplication.
- Vector space and rank.
- Emphasis on geometric intuition.
Supervised Classification and Hyperplanes
- Given two classes of points, if they are separated in space, the goal is to find a surface that separates them.
- In two dimensions, this is a line; in three dimensions, it's a plane; in n dimensions, it's a hyperplane.
- A hyperplane is an (n-1)-dimensional surface.
- In n-dimensional space, there is a single direction (the normal) orthogonal to every vector lying within the hyperplane.
- The unit vector \beta normal to the hyperplane, and the shortest distance from the hyperplane to the origin \beta_0, define the hyperplane.
Hyperplane Definition
- Let x_1 be a point on the hyperplane. Then, \beta^T x_1 = \beta_0.
- The projection of any point on the hyperplane onto \beta is always \beta_0.
- The hyperplane L is the set of all points x such that \beta^T x = \beta_0.
- In two dimensions, this generalizes the equation of a line.
Distance from a Point to a Hyperplane
- For a point x not on the hyperplane, the shortest distance from x to the hyperplane is given by |\beta^T x - \beta_0|.
- \beta^T x is the projection of x onto \beta.
- Subtracting \beta_0 gives the signed distance to the hyperplane.
- If \beta^T x - \beta_0 = 0, the point lies on the hyperplane.
- The sign of \beta^T x - \beta_0 indicates which side of the hyperplane the point lies on.
- If two points have the same sign for \beta^T x - \beta_0, they are on the same side; if they have opposite signs, they are on opposite sides.
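The side test can be sketched with a simple 2D hyperplane, i.e. a line (the normal, offset, and test points below are invented for illustration):

```python
import numpy as np

beta = np.array([1.0, 0.0])   # unit normal to the vertical line x1 = 2
beta0 = 2.0                   # shortest distance from the line to the origin

def signed_distance(x):
    """Signed distance from point x to the hyperplane beta^T x = beta0."""
    return beta @ x - beta0

a = np.array([3.0, 1.0])      # signed distance +1: right of the line
b = np.array([0.0, 5.0])      # signed distance -2: left of the line
same_side = signed_distance(a) * signed_distance(b) > 0   # False here
```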
Gene Expression Classification
- Gene expression or proteomics experiments result in an expression matrix.
- Samples fall into classes (e.g., ALL and AML leukemia).
- The problem is to predict the class of a new sample based on its expression values (supervised classification).
- Another problem is to determine if points intrinsically separate into different classes (unsupervised learning).
Single-Cell Data Analysis
- Single-cell data provides expression vectors for individual cells.
- The goal is to cluster cells by type and project them into lower dimensions for visualization.
- Each cell type should ideally form a distinct cluster.
- This allows for comparison of cell compositions between samples.
Dimensionality Reduction
- Given a matrix (collection of points), represent each sample by a single point in a lower dimension.
- Algebraically, find a direction \beta and pre-multiply the matrix by \beta^T.
- Geometrically, project all points onto the direction \beta.
Principal Component Analysis (PCA) Intuition
- The goal is to project points in a way that preserves as much information as possible.
- If points are projected onto a direction where they cluster tightly, information is lost.
- The best direction is the one along which the projected points have the maximum variance.
- Find \beta such that the variance of the projected points is maximized. This \beta is the principal component.
PCA Algebra
- Compute the mean vector m of all data points.
- Subtract the mean from each data point to center the data at the origin: x_i \leftarrow x_i - m.
- Project the centered data points onto \beta: \beta^T (x_i - m).
- These projected points have a mean of zero.
Variance Calculation
- The sample variance of the projected points is \frac{1}{n} \sum_i (\beta^T (x_i - m))^2; the constant factor 1/n does not affect the maximization, so it is dropped below.
- The goal is to maximize this variance over all choices of the unit vector \beta.
Maximizing Variance
- \sum_i (\beta^T (x_i - m))^2 = \sum_i \beta^T (x_i - m) (x_i - m)^T \beta = \beta^T S \beta, where S = \sum_i (x_i - m) (x_i - m)^T is the scatter (unnormalized covariance) matrix.
- The problem becomes: Maximize \beta^T S \beta subject to ||\beta||_2 = 1.
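The maximizer of \beta^T S \beta over unit vectors is the eigenvector of S with the largest eigenvalue (a standard fact not derived in these notes). A minimal sketch on synthetic data, with the data distribution chosen so the principal direction is the horizontal axis:

```python
import numpy as np

rng = np.random.default_rng(0)
# synthetic data: 200 points in 2D, stretched along the horizontal axis
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0],
                                          [0.0, 0.5]])

m = X.mean(axis=0)            # mean vector
S = (X - m).T @ (X - m)       # scatter (unnormalized covariance) matrix

# top eigenvector of S maximizes beta^T S beta over unit vectors
eigvals, eigvecs = np.linalg.eigh(S)
beta = eigvecs[:, -1]         # principal component (largest eigenvalue)

projected = (X - m) @ beta    # 1D representation: mean-zero projections
```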