Norms, p-Norms, Matrix Norms, and Min-Norm Problems
Norms and p-Norms
- Norm: a function ||·|| : V → R on a vector space V (over a field F) that satisfies for all c ∈ F and v,w ∈ V:
- Absolute homogeneity: ||c v|| = |c| · ||v||
- Triangle inequality: ||v + w|| ≤ ||v|| + ||w||
- Positive definiteness: ||v|| ≥ 0, with ||v|| = 0 iff v = 0
- Norm of a vector in R^n (Euclidean norm as a specific norm):
- Length of x = (x1, …, xn) in R^n is
|x| = \sqrt{\langle x, x \rangle} = \sqrt{|x_1|^2 + \cdots + |x_n|^2}.
- This familiar length function motivates the general definition: a norm is a length/size function on a vector space.
p-Norm of a Vector
- For real p ≥ 1, the p-norm on C^n (or R^n) is defined for x = (x_1, …, x_n) as
\|x\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}.
- Special cases (commonly used):
- p = 1: \|x\|_1 = |x_1| + \cdots + |x_n|. (Manhattan norm)
- p = 2: \|x\|_2 = \sqrt{|x_1|^2 + \cdots + |x_n|^2}. (Euclidean norm)
- p = ∞: \|x\|_\infty = \max_{1 \le i \le n} |x_i|. (Maximum norm)
- For p ≥ 1, these norms measure different notions of length/distance in R^n and induce different unit balls.
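The three special cases are easy to compute directly; a minimal pure-Python sketch (the function name `p_norm` is illustrative, not from the lecture):

```python
import math

def p_norm(x, p):
    """p-norm of a vector for real p >= 1, or p = math.inf for the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1))         # 7.0  (Manhattan)
print(p_norm(x, 2))         # 5.0  (Euclidean)
print(p_norm(x, math.inf))  # 4.0  (maximum)
```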
Geometric Interpretation in R^2 (Unit-balls and sub-level curves)
- Unit ball for p = 2 (Euclidean norm) is a circle: {x ∈ R^2 : |x|_2 ≤ 1}.
- Unit ball for p = 1 is a diamond (rhombus) with corners at (±1,0) and (0,±1).
- Unit ball for p = ∞ is a square with corners at (±1,±1).
- Sub-level curves (level sets) of the p-norm give the shapes shown in the lecture:
- p = 2: concentric circles / circular level sets.
- p = 1: diamond-shaped level sets (rhombi).
- p = ∞: square-shaped level sets.
- Distances and closest-point behavior depend on the chosen p-norm: which feasible point counts as "closest" to a given vector changes with the shape of the norm's level sets.
- For x ∈ R^2, the norms are:
- \|x\|_1 = |x_1| + |x_2|.
- \|x\|_2 = \sqrt{x_1^2 + x_2^2}.
- \|x\|_\infty = \max\{|x_1|, |x_2|\}.
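The unit-ball shapes can be checked numerically: a point on one unit sphere is generally not on another. A small sketch (the helper name `norms` is an assumption for illustration):

```python
import math

def norms(x):
    """Return the 1-, 2-, and infinity-norms of a point in R^2."""
    n1 = abs(x[0]) + abs(x[1])
    n2 = math.hypot(x[0], x[1])
    ninf = max(abs(x[0]), abs(x[1]))
    return n1, n2, ninf

# (1, 0) sits on all three unit spheres; (1, 1) is a corner of the
# infinity-norm square but lies outside the 1- and 2-norm unit balls.
print(norms([1.0, 0.0]))  # (1.0, 1.0, 1.0)
print(norms([1.0, 1.0]))  # (2.0, 1.414..., 1.0)
```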
Weighted Norm of a Vector
- The weighted norm of a vector x with weight matrix W is defined as
\|x\|_W = \sqrt{ x^T W x }.
- W must be symmetric positive definite for this to define a norm.
- This allows anisotropic weighting of components in the vector distance measurement.
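A minimal sketch of the weighted norm, assuming a diagonal SPD weight matrix for simplicity (the helper `weighted_norm` is illustrative):

```python
import math

def weighted_norm(x, W):
    """sqrt(x^T W x) for a symmetric positive definite W given as nested lists."""
    Wx = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return math.sqrt(sum(x[i] * Wx[i] for i in range(len(x))))

x = [3.0, 4.0]
W = [[4.0, 0.0],   # diagonal SPD weights: the first axis counts four times as much
     [0.0, 1.0]]
print(weighted_norm(x, W))  # sqrt(4*9 + 16) = sqrt(52) ≈ 7.211
```

With W equal to the identity, the weighted norm reduces to the ordinary Euclidean norm.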
Matrix Norms: Overview
- Matrix norms extend vector norms to matrices; common examples include:
- Frobenius norm (also called Hilbert–Schmidt norm):
\|A\|_F = \sqrt{ \sum_{i,j} |a_{ij}|^2 }.
- Spectral (operator) norm, induced by the Euclidean vector norm (p = 2):
\|A\|_2 = \max_{\|x\|_2 = 1} \|A x\|_2 = \sigma_{\max}(A),
where \sigma_{\max}(A) is the largest singular value of A.
- Nuclear norm (trace norm): the sum of the singular values,
\|A\|_* = \sum_i \sigma_i(A).
- More generally, the matrix norm induced by a vector p-norm is
\|A\|_p = \max_{\|x\|_p = 1} \|A x\|_p.
- Summary of meanings:
- Frobenius norm measures overall energy of A; invariant under orthogonal transformations.
- Operator norm (p = 2) gives the maximum stretching factor of A on unit vectors.
- Nuclear norm promotes low-rank solutions in optimization (analogous to L1 promoting sparsity in vectors).
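For a diagonal matrix the singular values are just the absolute values of the diagonal entries, so all three matrix norms can be checked by hand, without an SVD routine; a small sketch:

```python
import math

# A = diag(3, -4): its singular values are |3| = 3 and |-4| = 4.
diag = [3.0, -4.0]
sigmas = sorted((abs(d) for d in diag), reverse=True)  # [4.0, 3.0]

frobenius = math.sqrt(sum(d * d for d in diag))  # sqrt(9 + 16) = 5.0
spectral = sigmas[0]                             # largest singular value = 4.0
nuclear = sum(sigmas)                            # 4.0 + 3.0 = 7.0

print(frobenius, spectral, nuclear)  # 5.0 4.0 7.0
```

Note the general ordering visible here: spectral ≤ Frobenius ≤ nuclear.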
Norm Applications in Optimization (Min-Norm Problems)
- General problem: minimize ||x||_p subject to y = A x (exact model)
- x ∈ R^n, y ∈ R^m, A ∈ R^{m×n}.
- When the system is underdetermined (m < n), there are infinitely many x such that A x = y; the norm objective selects a preferred one.
- Common special cases (p ∈ {1, 2, ∞}):
- p = 2 (Euclidean norm):
- If A has full row rank (m ≤ n, rank(A) = m), the minimum-norm solution is the unique x* with smallest Euclidean norm among all feasible x, given by
x^* = A^T (A A^T)^{-1} y.
- p = 1 (L1 norm):
- Encourages sparsity (few nonzero components in x).
- Can be formulated as a linear program: minimize \sum_i t_i subject to -t_i \le x_i \le t_i, A x = y, and t_i \ge 0.
- p = ∞ (L∞ norm):
- Minimizes the maximum absolute component of x:
minimize t subject to -t \le x_i \le t for all i and A x = y, with t \ge 0; solvable as a linear program.
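For a single measurement a^T x = y in R^2, all three minimizers can be written in closed form, which makes the contrast between the norms concrete; a small sketch (the specific a and y are illustrative, and y > 0 is assumed):

```python
import math

# Single constraint a^T x = y in R^2, with illustrative a = (1, 2), y = 5.
a = [1.0, 2.0]
y = 5.0

# p = 2: closed form x* = A^T (A A^T)^{-1} y; for a one-row A this is a * y / (a.a).
aa = sum(ai * ai for ai in a)
x2 = [ai * y / aa for ai in a]             # [1.0, 2.0], ||x||_2 = sqrt(5) ≈ 2.236

# p = 1: all weight on a coordinate with the largest |a_i| -> sparse solution.
k = max(range(len(a)), key=lambda i: abs(a[i]))
x1 = [0.0] * len(a)
x1[k] = y / a[k]                           # [0.0, 2.5], ||x||_1 = 2.5

# p = inf: equal magnitudes x_i = sign(a_i) * t with t = y / (|a_1| + |a_2|).
t = y / sum(abs(ai) for ai in a)
xinf = [math.copysign(t, ai) for ai in a]  # [5/3, 5/3], ||x||_inf = 5/3 ≈ 1.667

# All three satisfy the constraint exactly but minimize different norms.
for x in (x1, x2, xinf):
    assert abs(sum(ai * xi for ai, xi in zip(a, x)) - y) < 1e-12
```

The sparse axis-point for p = 1, the closest-to-origin point for p = 2, and the equal-magnitude point for p = ∞ match the diamond, circle, and square level-set pictures above.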
- Noisy measurements (robust/min-norm under noise):
- If observations are Ax ≈ y (with noise e), one common formulation is
\min \|x\|_p \quad \text{subject to} \quad \|y - A x\| \le \delta,
where δ reflects the noise level and the chosen norm for the residual.
- Typical interpretation for different p in underdetermined systems (x ∈ R^n, m < n):
- p = 1: tends to produce sparse x (few nonzeros).
- p = 2: yields the least-norm (closest to origin in Euclidean sense) among all feasible x.
- p = ∞: minimizes the largest absolute entry of x, balancing all coordinates.
Example Highlights (Conceptual)
- Case I (underdetermined): with a single equation in R^2, the feasible set {x : A x = y} is a line; minimizing ||x||_2 finds the point on that line closest to the origin.
- Case II (two-column A, y = Ax): underdetermined systems admit infinitely many solutions; the norm objective selects a unique minimizer depending on p.
- Visualization intuition:
- For p = 2, grow r from 0 until the level set ||x||_2 = r first touches the feasible set; the touching point is the minimizer.
- For p = 1, the feasible set intersects the diamond ||x||_1 = r; the optimal point tends to lie on a coordinate axis (sparse solution).
- For p = ∞, the feasible set intersects the square ||x||_∞ = r; the solution pushes coordinates toward equal magnitude when possible.
Recap: What Each p-norm Encourages
- p = 1: sparsity in x (many zeros) -> useful in compressed sensing and sparse recovery.
- p = 2: smallest Euclidean norm (least-norm solution) among feasible x; connects to projection onto column space of A.
- p = ∞: minimizes the maximum magnitude among components; distributes weight more evenly across coordinates.
- In the exact model (y = A x), for p = 2 the solution is x* = A^T (A A^T)^{-1} y if A has full row rank; for p = 1 or ∞, the problem is convex and typically solved via linear programming or related convex formulations.
Notes on Real-World Relevance and Connections
- Norms quantify length, size, and distance in different ways; the choice of norm affects regularization, stability, and interpretability in algorithms.
- L1 regularization (sparsity) is central to feature selection, compressed sensing, and sparse signal representation.
- L2 regularization (Ridge) yields smooth solutions and numeric stability; ties to least-squares and projection onto column space.
- Nuclear norm regularization promotes low-rank solutions in matrix problems (e.g., matrix completion, collaborative filtering).
- Weighted norms allow anisotropic scaling, emphasizing some directions more than others, which is important in ill-conditioned problems.
Key Formulas (Summary)
- Vector norms
- \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad p \ge 1.
- Special cases: \|x\|_1 = \sum_i |x_i|, \quad \|x\|_2 = \sqrt{\sum_i x_i^2}, \quad \|x\|_\infty = \max_i |x_i|.
- Weighted norm
- \|x\|_W = \sqrt{ x^T W x }, where W is symmetric positive definite.
- Matrix norms
- Frobenius: \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}.
- Operator (spectral): \|A\|_2 = \max_{\|x\|_2 = 1} \|A x\|_2 = \sigma_{\max}(A).
- Nuclear: \|A\|_* = \sum_i \sigma_i(A).
- Minimum-norm problem (exact data)
- Minimize \|x\|_p subject to A x = y.
- p = 2: if rank(A) = m ≤ n, then x^* = A^T (A A^T)^{-1} y.
- p = 1 or p = ∞: formulate as linear programs (LPs) to obtain the minimizers.
- Noisy data variant
- Minimize \|x\|_p subject to \|y - A x\| \le \delta.
Suggested Takeaways for Exam Preparation
- Be able to state and prove the three norm axioms, and recognize their geometric meaning.
- Recall the definitions and key special cases of p-norms, and know their unit-ball shapes in R^2.
- Understand the difference between Frobenius, nuclear, and operator (spectral) norms for matrices.
- Explain why p = 1 promotes sparsity and p = ∞ distributes weight more evenly (and how this affects optimization results).
- Know the classic minimum-norm solution for p = 2 in an underdetermined system: x* = A^T (A A^T)^{-1} y (when A has full row rank).
- Be able to recast p = 1 and p = ∞ minimization problems as linear programs for practical solution methods.
- Recognize the impact of measurement noise and how it shifts problems from exact constraints to inequality constraints on residuals.