Norms, p-Norms, Matrix Norms, and Min-Norm Problems
Norms and p-Norms
- Norm: a function ||·|| : V → R on a vector space V (over a field F) that satisfies for all c ∈ F and v,w ∈ V:
- Absolute homogeneity: ||c v|| = |c| · ||v||
- Triangle inequality: ||v + w|| ≤ ||v|| + ||w||
- Positive definiteness: ||v|| ≥ 0, with ||v|| = 0 iff v = 0
- Norm of a vector in R^n (Euclidean norm as a specific norm):
- Length of x = (x1, …, xn) in R^n is
|x| = \sqrt{\langle x, x \rangle} = \sqrt{|x_1|^2 + \cdots + |x_n|^2}.
- This familiar length function motivates the general definition: a norm is a length/size function on a vector space.
p-Norm of a Vector
- For real p ≥ 1, the p-norm on C^n (or R^n) is defined for x = (x_1, …, x_n) as
\|x\|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}.
- Special cases (commonly used):
- p = 1: \|x\|_1 = |x_1| + \cdots + |x_n|. (Manhattan norm)
- p = 2: \|x\|_2 = \sqrt{|x_1|^2 + \cdots + |x_n|^2}. (Euclidean norm)
- p = ∞: \|x\|_\infty = \max_{1 \le i \le n} |x_i|. (Maximum norm)
- For p ≥ 1, these norms measure different notions of length/distance in R^n and induce different unit balls.
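The three special cases are easy to compute directly; a minimal pure-Python sketch (the function name `p_norm` is illustrative, not from the lecture):

```python
import math

def p_norm(x, p):
    """p-norm of a vector for real p >= 1, or p = math.inf for the max norm."""
    if p == math.inf:
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3.0, -4.0]
print(p_norm(x, 1))         # 7.0  (Manhattan)
print(p_norm(x, 2))         # 5.0  (Euclidean)
print(p_norm(x, math.inf))  # 4.0  (maximum)
```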
Geometric Interpretation in R^2 (Unit-balls and sub-level curves)
- Unit ball for p = 2 (Euclidean norm) is a circle: {x ∈ R^2 : |x|_2 ≤ 1}.
- Unit ball for p = 1 is a diamond (rhombus) with corners at (±1,0) and (0,±1).
- Unit ball for p = ∞ is a square with corners at (±1,±1).
- Sub-level curves (level sets) of the p-norm give the shapes shown in the lecture:
- p = 2: concentric circles / circular level sets.
- p = 1: diamond-shaped level sets (rhombi).
- p = ∞: square-shaped level sets.
- Distances and closest-point behavior depend on the chosen p-norm: which feasible point counts as "closest" to a given vector changes with the shape of the norm's level sets.
- For x ∈ R^2, the norms are:
- \|x\|_1 = |x_1| + |x_2|.
- \|x\|_2 = \sqrt{x_1^2 + x_2^2}.
- \|x\|_\infty = \max\{|x_1|, |x_2|\}.
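The unit-ball shapes can be checked numerically: a point on one unit sphere is generally not on another. A small sketch (the helper name `norms` is an assumption for illustration):

```python
import math

def norms(x):
    """Return the 1-, 2-, and infinity-norms of a point in R^2."""
    n1 = abs(x[0]) + abs(x[1])
    n2 = math.hypot(x[0], x[1])
    ninf = max(abs(x[0]), abs(x[1]))
    return n1, n2, ninf

# (1, 0) sits on all three unit spheres; (1, 1) is a corner of the
# infinity-norm square but lies outside the 1- and 2-norm unit balls.
print(norms([1.0, 0.0]))  # (1.0, 1.0, 1.0)
print(norms([1.0, 1.0]))  # (2.0, 1.414..., 1.0)
```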
Weighted Norm of a Vector
- The weighted norm of a vector x with weight matrix W is defined as
\|x\|_W = \sqrt{ x^T W x }.
- W must be symmetric positive definite for this to define a norm.
- This allows anisotropic weighting of components in the vector distance measurement.
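A minimal sketch of the weighted norm, assuming a diagonal SPD weight matrix for simplicity (the helper `weighted_norm` is illustrative):

```python
import math

def weighted_norm(x, W):
    """sqrt(x^T W x) for a symmetric positive definite W given as nested lists."""
    Wx = [sum(W[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return math.sqrt(sum(x[i] * Wx[i] for i in range(len(x))))

x = [3.0, 4.0]
W = [[4.0, 0.0],   # diagonal SPD weights: the first axis counts four times as much
     [0.0, 1.0]]
print(weighted_norm(x, W))  # sqrt(4*9 + 16) = sqrt(52) ≈ 7.211
```

With W equal to the identity, the weighted norm reduces to the ordinary Euclidean norm.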
Matrix Norms: Overview
- Matrix norms extend vector norms to matrices; common examples include:
- Frobenius norm (also called Hilbert–Schmidt norm):
\|A\|_F = \sqrt{ \sum_{i,j} |a_{ij}|^2 }.
- Spectral (operator) norm, induced by the Euclidean vector norm (p = 2):
\|A\|_2 = \max_{\|x\|_2 = 1} \|A x\|_2 = \sigma_{\max}(A),
where \sigma_{\max}(A) is the largest singular value of A.
- Nuclear norm (trace norm): the sum of the singular values,
\|A\|_* = \sum_i \sigma_i(A).
- More generally, the matrix norm induced by a vector p-norm is
\|A\|_p = \max_{\|x\|_p = 1} \|A x\|_p.
- Summary of meanings:
- Frobenius norm measures overall energy of A; invariant under orthogonal transformations.
- Operator norm (p = 2) gives the maximum stretching factor of A on unit vectors.
- Nuclear norm promotes low-rank solutions in optimization (analogous to L1 promoting sparsity in vectors).
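For a diagonal matrix the singular values are just the absolute values of the diagonal entries, so all three matrix norms can be checked by hand, without an SVD routine; a small sketch:

```python
import math

# A = diag(3, -4): its singular values are |3| = 3 and |-4| = 4.
diag = [3.0, -4.0]
sigmas = sorted((abs(d) for d in diag), reverse=True)  # [4.0, 3.0]

frobenius = math.sqrt(sum(d * d for d in diag))  # sqrt(9 + 16) = 5.0
spectral = sigmas[0]                             # largest singular value = 4.0
nuclear = sum(sigmas)                            # 4.0 + 3.0 = 7.0

print(frobenius, spectral, nuclear)  # 5.0 4.0 7.0
```

Note the general ordering visible here: spectral ≤ Frobenius ≤ nuclear.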
Norm Applications in Optimization (Min-Norm Problems)
- General problem: minimize ||x||_p subject to y = A x (exact model)
- x ∈ R^n, y ∈ R^m, A ∈ R^{m×n}.
- When the system is underdetermined (m < n), there are infinitely many x such that A x = y; the norm objective selects a preferred one.
- Common special cases (p ∈ {1, 2, ∞}):
- p = 2 (Euclidean norm):
- If A has full row rank (m ≤ n, rank(A) = m), the minimum-norm solution is the unique x* with smallest Euclidean norm among all feasible x, given by
x^* = A^T (A A^T)^{-1} y.
- p = 1 (L1 norm):
- Encourages sparsity (few nonzero components in x).
- Can be formulated as a linear program: minimize \sum_i t_i subject to -t_i \le x_i \le t_i, A x = y, and t_i \ge 0.
- p = ∞ (L∞ norm):
- Minimizes the maximum absolute component of x:
minimize t subject to -t \le x_i \le t for all i and A x = y, with t \ge 0; solvable as a linear program.
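For a single measurement a^T x = y in R^2, all three minimizers can be written in closed form, which makes the contrast between the norms concrete; a small sketch (the specific a and y are illustrative, and y > 0 is assumed):

```python
import math

# Single constraint a^T x = y in R^2, with illustrative a = (1, 2), y = 5.
a = [1.0, 2.0]
y = 5.0

# p = 2: closed form x* = A^T (A A^T)^{-1} y; for a one-row A this is a * y / (a.a).
aa = sum(ai * ai for ai in a)
x2 = [ai * y / aa for ai in a]             # [1.0, 2.0], ||x||_2 = sqrt(5) ≈ 2.236

# p = 1: all weight on a coordinate with the largest |a_i| -> sparse solution.
k = max(range(len(a)), key=lambda i: abs(a[i]))
x1 = [0.0] * len(a)
x1[k] = y / a[k]                           # [0.0, 2.5], ||x||_1 = 2.5

# p = inf: equal magnitudes x_i = sign(a_i) * t with t = y / (|a_1| + |a_2|).
t = y / sum(abs(ai) for ai in a)
xinf = [math.copysign(t, ai) for ai in a]  # [5/3, 5/3], ||x||_inf = 5/3 ≈ 1.667

# All three satisfy the constraint exactly but minimize different norms.
for x in (x1, x2, xinf):
    assert abs(sum(ai * xi for ai, xi in zip(a, x)) - y) < 1e-12
```

The sparse axis-point for p = 1, the closest-to-origin point for p = 2, and the equal-magnitude point for p = ∞ match the diamond, circle, and square level-set pictures above.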
- Noisy measurements (robust/min-norm under noise):
- If observations are Ax ≈ y (with noise e), one common formulation is
\min \|x\|_p \quad \text{subject to} \quad \|y - A x\| \le \delta,
where δ reflects the noise level and the chosen norm for the residual.
- Typical interpretation for different p in underdetermined systems (x ∈ R^n, m < n):
- p = 1: tends to produce sparse x (few nonzeros).
- p = 2: yields the least-norm (closest to origin in Euclidean sense) among all feasible x.
- p = ∞: minimizes the largest absolute entry of x, balancing all coordinates.
Example Highlights (Conceptual)
- Case I (underdetermined): with a single equation in R^2, the feasible set {x : A x = y} is a line; minimizing ||x||_2 finds the point on that line closest to the origin.
- Case II (two-column A, y = Ax): underdetermined systems admit infinitely many solutions; the norm objective selects a unique minimizer depending on p.
- Visualization intuition:
- For p = 2, grow r from 0 until the level set ||x||_2 = r first touches the feasible set; the touching point is the minimizer.
- For p = 1, the feasible set intersects the diamond ||x||_1 = r; the optimal point tends to lie on a coordinate axis (sparse solution).
- For p = ∞, the feasible set intersects the square ||x||_∞ = r; the solution pushes coordinates toward equal magnitude when possible.
Recap: What Each p-norm Encourages
- p = 1: sparsity in x (many zeros) -> useful in compressed sensing and sparse recovery.
- p = 2: smallest Euclidean norm (least-norm solution) among feasible x; connects to projection onto column space of A.
- p = ∞: minimizes the maximum magnitude among components; distributes weight more evenly across coordinates.
- In the exact model (y = A x), for p = 2 the solution is x* = A^T (A A^T)^{-1} y if A has full row rank; for p = 1 or ∞, the problem is convex and typically solved via linear programming or related convex formulations.
Notes on Real-World Relevance and Connections
- Norms quantify length, size, and distance in different ways; the choice of norm affects regularization, stability, and interpretability in algorithms.
- L1 regularization (sparsity) is central to feature selection, compressed sensing, and sparse signal representation.
- L2 regularization (Ridge) yields smooth solutions and numeric stability; ties to least-squares and projection onto column space.
- Nuclear norm regularization promotes low-rank solutions in matrix problems (e.g., matrix completion, collaborative filtering).
- Weighted norms allow anisotropic scaling, emphasizing some directions more than others, which is important in ill-conditioned problems.
Key Formulas (Summary)
- Vector norms
- \|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad p \ge 1.
- Special cases: \|x\|_1 = \sum_i |x_i|, \quad \|x\|_2 = \sqrt{\sum_i x_i^2}, \quad \|x\|_\infty = \max_i |x_i|.
- Weighted norm
- \|x\|_W = \sqrt{ x^T W x }, where W is symmetric positive definite.
- Matrix norms
- Frobenius: \|A\|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}.
- Operator (spectral): \|A\|_2 = \max_{\|x\|_2 = 1} \|A x\|_2 = \sigma_{\max}(A).
- Nuclear: \|A\|_* = \sum_i \sigma_i(A).
- Minimum-norm problem (exact data)
- Minimize \|x\|_p subject to A x = y.
- p = 2: if rank(A) = m ≤ n, then x^* = A^T (A A^T)^{-1} y.
- p = 1 or p = ∞: formulate as linear programs (LPs) to obtain the minimizers.
- Noisy data variant
- Minimize \|x\|_p subject to \|y - A x\| \le \delta.
Suggested Takeaways for Exam Preparation
- Be able to state and prove the three norm axioms, and recognize their geometric meaning.
- Recall the definitions and key special cases of p-norms, and know their unit-ball shapes in R^2.
- Understand the difference between Frobenius, nuclear, and operator (spectral) norms for matrices.
- Explain why p = 1 promotes sparsity and p = ∞ distributes weight more evenly (and how this affects optimization results).
- Know the classic minimum-norm solution for p = 2 in an underdetermined system: x* = A^T (A A^T)^{-1} y (when A has full row rank).
- Be able to recast p = 1 and p = ∞ minimization problems as linear programs for practical solution methods.
- Recognize the impact of measurement noise and how it shifts problems from exact constraints to inequality constraints on residuals.