Norms, p-Norms, Matrix Norms, and Min-Norm Problems

Norms and p-Norms

  • Norm: a function ||·|| : V → R on a vector space V (over a field F) that satisfies for all c ∈ F and v,w ∈ V:
    • Absolute homogeneity: ||c v|| = |c| · ||v||
    • Triangle inequality: ||v + w|| ≤ ||v|| + ||w||
    • Positive definiteness: ||v|| ≥ 0, with ||v|| = 0 iff v = 0
  • Norm of a vector in R^n (Euclidean norm as a specific norm):
    • Length of x = (x_1, …, x_n) in R^n is
      |x| = \sqrt{\langle x, x \rangle} = \sqrt{|x_1|^2 + \cdots + |x_n|^2}.
  • Familiar examples such as the Euclidean length motivate the general definition: a norm is an abstract length/size function on a vector space.

p-Norm of a Vector

  • For real p ≥ 1, the p-norm on C^n (or R^n) is defined for x = (x_1, …, x_n) as
    |x|_p = \left( |x_1|^p + \cdots + |x_n|^p \right)^{1/p}.
  • Special cases (commonly used):
    • p = 1: |x|_1 = |x_1| + \cdots + |x_n|. (Manhattan norm)
    • p = 2: |x|_2 = \sqrt{|x_1|^2 + \cdots + |x_n|^2}. (Euclidean norm)
    • p = ∞: |x|_\infty = \max_{1 \le i \le n} |x_i|. (Maximum norm)
  • For p ≥ 1, these norms measure different notions of length/distance in R^n and induce different unit balls.
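As a quick check of the definitions above, here is a minimal NumPy sketch (the vector x is an arbitrary example) comparing the explicit formulas with `numpy.linalg.norm`:

```python
import numpy as np

# Example vector (arbitrary choice for illustration).
x = np.array([3.0, -4.0, 0.0])

norm1 = np.sum(np.abs(x))               # |x|_1: sum of absolute values
norm2 = np.sqrt(np.sum(np.abs(x)**2))   # |x|_2: Euclidean length
norm_inf = np.max(np.abs(x))            # |x|_inf: largest absolute entry

# The hand-written formulas agree with numpy's built-in norm.
assert np.isclose(norm1, np.linalg.norm(x, 1))
assert np.isclose(norm2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
print(norm1, norm2, norm_inf)  # 7.0 5.0 4.0
```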

Geometric Interpretation in R^2 (Unit-balls and sub-level curves)

  • Unit ball for p = 2 (Euclidean norm) is a circle: {x ∈ R^2 : |x|_2 ≤ 1}.
  • Unit ball for p = 1 is a diamond (rhombus) with corners at (±1,0) and (0,±1).
  • Unit ball for p = ∞ is a square with corners at (±1,±1).
  • Sub-level curves (level sets) of the p-norm give the shapes shown in the lecture:
    • p = 2: concentric circles / circular level sets.
    • p = 1: diamond-shaped level sets (rhombi).
    • p = ∞: square-shaped level sets.
  • Distances and closest-point behavior depend on the chosen p-norm: because each norm has differently shaped level sets, which point counts as "closest" to a given vector can change with p.

Case Study: p-Norms in R^2 (Explicit forms)

  • For x ∈ R^2, the norms are:
    • |x|_1 = |x_1| + |x_2|.
    • |x|_2 = \sqrt{x_1^2 + x_2^2}.
    • |x|_\infty = \max\{|x_1|, |x_2|\}.
  • Sub-level curves for p = 1 are diamonds; for p = 2 are circles; for p = ∞ are squares.

Weighted Norm of a Vector

  • The weighted norm of a vector x with weight matrix W is defined as
    |x|_W = \sqrt{ x^T W x }.
  • W must be symmetric positive definite for |x|_W to satisfy the norm axioms (in particular, positive definiteness).
  • This allows anisotropic weighting of components in the vector distance measurement.
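A minimal sketch of the weighted norm, assuming a hypothetical diagonal SPD weight matrix W that emphasizes the first coordinate:

```python
import numpy as np

# Hypothetical weight matrix: diagonal, symmetric positive definite.
W = np.diag([4.0, 1.0])
x = np.array([1.0, 2.0])

# |x|_W = sqrt(x^T W x) = sqrt(4*1^2 + 1*2^2) = sqrt(8)
weighted = np.sqrt(x @ W @ x)
print(weighted)
```

With W = I this reduces to the ordinary Euclidean norm; a non-identity W stretches distances anisotropically along the weighted directions.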

Matrix Norms: Overview

  • Matrix norms extend vector norms to matrices; common examples include:
    • Frobenius norm (also called Hilbert–Schmidt norm):
      |A|_F = \sqrt{ \sum_{i,j} |a_{ij}|^2 }.
    • Spectral (operator) norm (induced by the Euclidean vector norm, p = 2):
      |A|_2 = \max_{|x|_2 = 1} |A x|_2 = \sigma_{\max}(A), where \sigma_{\max}(A) is the largest singular value of A.
    • Nuclear norm (trace norm): sum of singular values:
      |A|_* = \sum_i \sigma_i(A).
    • Induced matrix norms from vector p-norms are defined by
      |A|_p = \max_{|x|_p = 1} |A x|_p.
  • Summary of meanings:
    • Frobenius norm measures overall energy of A; invariant under orthogonal transformations.
    • Operator norm (p = 2) gives the maximum stretching factor of A on unit vectors.
    • Nuclear norm promotes low-rank solutions in optimization (analogous to L1 promoting sparsity in vectors).
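The three matrix norms can be illustrated on a small example matrix (chosen arbitrarily here); all three are recoverable from the singular values, which NumPy exposes directly:

```python
import numpy as np

# Arbitrary example matrix with singular values 4 and 3.
A = np.array([[3.0, 0.0],
              [0.0, 4.0]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values, descending

frob = np.sqrt(np.sum(A**2))   # overall "energy" of the entries
spectral = sigma[0]            # largest singular value: max stretch factor
nuclear = np.sum(sigma)        # sum of singular values

# Cross-check against numpy's built-in matrix norms.
assert np.isclose(frob, np.linalg.norm(A, 'fro'))
assert np.isclose(spectral, np.linalg.norm(A, 2))
assert np.isclose(nuclear, np.linalg.norm(A, 'nuc'))
print(frob, spectral, nuclear)  # 5.0 4.0 7.0
```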

Key Formulas (Matrix Norms)

  • Frobenius norm:
    |A|_F = \sqrt{ \sum_{i,j} |a_{ij}|^2 }.
  • Operator (spectral) norm (p = 2):
    |A|_2 = \max_{|x|_2 = 1} |A x|_2 = \sigma_{\max}(A).
  • Nuclear norm:
    |A|_* = \sum_i \sigma_i(A).

Norm Applications in Optimization (Min-Norm Problems)

  • General problem: minimize ||x||_p subject to A x = y (exact model)
    • x ∈ R^n, y ∈ R^m, A ∈ R^{m×n}.
    • When the system is underdetermined (m < n), there are infinitely many x such that A x = y; the norm objective selects a preferred one.
  • Common special cases (p ∈ {1, 2, ∞}):
    • p = 2 (Euclidean norm):
    • If A has full row rank (m ≤ n, rank(A) = m), the minimum-norm solution is the unique x* with smallest Euclidean norm among all feasible x, given by
      x^* = A^T (A A^T)^{-1} y.
    • p = 1 (L1 norm):
    • Encourages sparsity (few nonzero components in x).
    • Can be formulated as a linear program: minimize \sum_i t_i subject to -t_i ≤ x_i ≤ t_i for all i and A x = y; t_i ≥ 0.
    • p = ∞ (L∞ norm):
    • Minimizes the maximum absolute component of x:
      minimize t subject to -t ≤ x_i ≤ t for all i and Ax = y; t ≥ 0. Solvable as a linear program.
  • Noisy measurements (robust/min-norm under noise):
    • If observations are Ax ≈ y (with noise e), one common formulation is
      \min |x|_p \quad \text{subject to} \quad |y - A x| \le \delta,
      where δ reflects the noise level and the chosen norm for the residual.
  • Typical interpretation for different p in underdetermined systems (x ∈ R^n, m < n):
    • p = 1: tends to produce sparse x (few nonzeros).
    • p = 2: yields the least-norm (closest to origin in Euclidean sense) among all feasible x.
    • p = ∞: minimizes the largest absolute entry of x, balancing all coordinates.
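The closed-form p = 2 solution above can be sanity-checked numerically; the matrix A and vector y below are arbitrary examples, with A of full row rank (m = 2 < n = 3):

```python
import numpy as np

# Underdetermined system: 2 equations, 3 unknowns, full row rank.
A = np.array([[1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])
y = np.array([4.0, 2.0])

# Minimum-Euclidean-norm solution: x* = A^T (A A^T)^{-1} y.
x_star = A.T @ np.linalg.solve(A @ A.T, y)

assert np.allclose(A @ x_star, y)                  # x* is feasible
assert np.allclose(x_star, np.linalg.pinv(A) @ y)  # matches the pseudoinverse

# Any other feasible point x* + z (with z in the null space of A)
# has strictly larger Euclidean norm.
z = np.array([1.0, -1.0, 1.0])   # spans the null space of this A
assert np.allclose(A @ z, 0)
assert np.linalg.norm(x_star + 0.5 * z) > np.linalg.norm(x_star)
print(x_star)
```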

Example Highlights (Conceptual)

  • Case I (underdetermined): the solution set of A x = y is a line in R^2; minimizing ||x||_2 selects the point on that line closest to the origin.
  • Case II (two-column A, y = Ax): underdetermined systems admit infinitely many solutions; the norm objective selects a unique minimizer depending on p.
  • Visualization intuition:
    • For p = 2, grow r until the circle ||x||_2 = r first touches the feasible set; the single touching point is the minimizer (the feasible point closest to the origin).
    • For p = 1, the feasible set intersects the diamond ||x||_1 = r; the optimal point tends to lie on a coordinate axis (sparse solution).
    • For p = ∞, the feasible set intersects the square ||x||_∞ = r; the solution pushes coordinates toward equal magnitude when possible.
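The level-set intuition above can be illustrated numerically: for a single constraint a^T x = y0 in R^2 (a and y0 are arbitrary example values), parameterize the feasible line and locate the point of smallest p-norm by a fine grid search. This is an illustrative sketch, not a practical solver:

```python
import numpy as np

# One linear constraint a^T x = y0 in R^2 (arbitrary example values).
a = np.array([1.0, 2.0])
y0 = 2.0

x0 = a * y0 / (a @ a)            # a particular feasible point
d = np.array([-a[1], a[0]])      # direction along the line (a^T d = 0)
t = np.linspace(-2.0, 2.0, 400001)
X = x0[None, :] + t[:, None] * d[None, :]   # points on the feasible line

minimizers = {}
for p in (1, 2, np.inf):
    norms = np.linalg.norm(X, ord=p, axis=1)
    minimizers[p] = X[np.argmin(norms)]
    print(p, np.round(minimizers[p], 3))
```

Up to grid resolution, the p = 1 minimizer lands on a coordinate axis (sparse, near (0, 1)), the p = 2 minimizer is the orthogonal projection of the origin onto the line (near (0.4, 0.8)), and the p = ∞ minimizer has components of equal magnitude (near (2/3, 2/3)).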

Recap: What Each p-norm Encourages

  • p = 1: sparsity in x (many zeros) -> useful in compressed sensing and sparse recovery.
  • p = 2: smallest Euclidean norm (least-norm solution) among feasible x; connects to projection onto column space of A.
  • p = ∞: minimizes the maximum magnitude among components; distributes weight more evenly across coordinates.
  • In the exact model (y = A x), for p = 2 the solution is x* = A^T (A A^T)^{-1} y if A has full row rank; for p = 1 or ∞, the problem is convex and typically solved via linear programming or related convex formulations.

Notes on Real-World Relevance and Connections

  • Norms quantify length, size, and distance in different ways; the choice of norm affects regularization, stability, and interpretability in algorithms.
  • L1 regularization (sparsity) is central to feature selection, compressed sensing, and sparse signal representation.
  • L2 regularization (Ridge) yields smooth solutions and numeric stability; ties to least-squares and projection onto column space.
  • Nuclear norm regularization promotes low-rank solutions in matrix problems (e.g., matrix completion, collaborative filtering).
  • Weighted norms allow anisotropic scaling, emphasizing some directions more than others, which is important in ill-conditioned problems.

Quick Reference (Key Formulas)

  • Vector norms
    • |x|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}, \quad p \ge 1.
    • Special cases: |x|_1 = \sum_i |x_i|, \quad |x|_2 = \sqrt{\sum_i x_i^2}, \quad |x|_\infty = \max_i |x_i|.
  • Weighted norm
    • |x|_W = \sqrt{ x^T W x },
    • W is symmetric positive definite.
  • Matrix norms
    • Frobenius: |A|_F = \sqrt{\sum_{i,j} |a_{ij}|^2}.
    • Operator (spectral) norm: |A|_2 = \max_{|x|_2 = 1} |A x|_2 = \sigma_{\max}(A).
    • Nuclear: |A|_* = \sum_i \sigma_i(A).
  • Minimum-norm problem (exact data)
    • Minimize |x|_p subject to A x = y.
    • p = 2: if rank(A) = m ≤ n, then x^* = A^T (A A^T)^{-1} y.
    • p = 1 or p = ∞: formulate as linear programs (LPs) to obtain the minimizers.
  • Noisy data variant
    • Minimize |x|_p subject to |y - A x| \le \delta.

Suggested Takeaways for Exam Preparation

  • Be able to state and prove the three norm axioms, and recognize their geometric meaning.
  • Recall the definitions and key special cases of p-norms, and know their unit-ball shapes in R^2.
  • Understand the difference between Frobenius, nuclear, and operator (spectral) norms for matrices.
  • Explain why p = 1 promotes sparsity and p = ∞ distributes weight more evenly (and how this affects optimization results).
  • Know the classic minimum-norm solution for p = 2 in an underdetermined system: x* = A^T (A A^T)^{-1} y (when A has full row rank).
  • Be able to recast p = 1 and p = ∞ minimization problems as linear programs for practical solution methods.
  • Recognize the impact of measurement noise and how it shifts problems from exact constraints to inequality constraints on residuals.