Notes on Least Squares and Systems of Linear Equations

The Method of Least Squares

  • Goal: find a straight line y = mx + b that best fits a set of data points (xi, yi) by minimizing the total squared residuals.
  • Data setup: n data points Pi with coordinates (xi, yi).
  • Practical use: the resulting trend line can be used to forecast or predict future values (e.g., sales trends).
  • Visual aid: plot a scatter diagram of the data and overlay the least-squares regression line.
  • General form of the regression line:
    y = f(x) = mx + b
  • What makes the line “best” in the least-squares sense: the residuals ei = yi - (m xi + b) are as small as possible in the sum of squares, i.e., m and b minimize
    S = \sum_{i=1}^n \bigl(y_i - (m x_i + b)\bigr)^2.

Normal equations (the conditions for the least-squares line)

  • The constants m and b satisfy the normal equations obtained by setting the partial derivatives of S with respect to m and b to zero:

    \begin{cases}
    \displaystyle \sum_{i=1}^n y_i = m\sum_{i=1}^n x_i + b\,n, \\
    \displaystyle \sum_{i=1}^n x_i y_i = m\sum_{i=1}^n x_i^2 + b\sum_{i=1}^n x_i.
    \end{cases}
  • These two equations can be solved simultaneously for m and b.
  • A compact way to write them using sums:
    \sum y_i = m\sum x_i + b\,n, \quad \sum x_i y_i = m\sum x_i^2 + b\sum x_i.
  • Brief derivation sketch:
    • Define residuals ei = yi - (m x_i + b).
    • Minimize S = \sum e_i^2 with respect to m and b.
    • Set the partial derivatives ∂S/∂m = 0 and ∂S/∂b = 0 to obtain the normal equations above.
  • Practical steps to compute:
    • Compute the sums: Σxi, Σyi, Σxi^2, Σxi y_i, and n.
    • Solve the 2×2 linear system given by the normal equations for m and b.
  • Example results from transcript:
    • Example 1 (data: P1(1,8), P2(2,6), P3(5,6), P4(7,4), P5(10,1)) yields the regression line
      Y = -0.685x + 8.426.
    • Here, m ≈ -0.685 and b ≈ 8.426.
    • Example 2 (another dataset) yields the line
      Y = -0.95x + 10.35.
    • Here, m = -0.95 and b = 10.35.
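The practical steps above (compute the sums, then solve the 2×2 normal equations) can be sketched in pure Python; the function name is illustrative, and the result is checked against the Example 1 data from the transcript:

```python
# Least-squares fit via the normal equations (no external libraries).

def least_squares_line(points):
    """Return (m, b) for y = mx + b minimizing the sum of squared residuals."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    sum_xy = sum(x * y for x, y in points)
    # Solve the 2x2 normal equations:
    #   sum_y  = m*sum_x  + b*n
    #   sum_xy = m*sum_x2 + b*sum_x
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

# Example 1 data: P1(1,8), P2(2,6), P3(5,6), P4(7,4), P5(10,1)
data = [(1, 8), (2, 6), (5, 6), (7, 4), (10, 1)]
m, b = least_squares_line(data)
print(round(m, 3), round(b, 3))  # -0.685 8.426
```

This reproduces the transcript's Example 1 line Y = -0.685x + 8.426.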

Worked outline for applying the method

  • Step 1: collect data points (xi, yi).
  • Step 2: compute the required sums: \sum x_i, \sum y_i, \sum x_i^2, \sum x_i y_i, and n.
  • Step 3: plug sums into the normal equations and solve for m and b.
  • Step 4: plot the data and draw the regression line y = mx + b.
  • Step 5: use the model to predict future values and assess fit (e.g., via residuals, R^2, or other metrics).
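The fit assessment in Step 5 can be sketched as follows; `r_squared` is an illustrative helper (not from the transcript), applied here to the Example 1 fit:

```python
# Step 5 sketch: assess a fit y = mx + b via residuals and R^2.

def r_squared(points, m, b):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y for _, y in points) / len(points)
    ss_res = sum((y - (m * x + b)) ** 2 for x, y in points)  # residual sum of squares
    ss_tot = sum((y - y_mean) ** 2 for _, y in points)       # total sum of squares
    return 1 - ss_res / ss_tot

data = [(1, 8), (2, 6), (5, 6), (7, 4), (10, 1)]
m = -185 / 270        # exact slope for Example 1 (rounds to -0.685)
b = 5 - 5 * m         # exact intercept (rounds to 8.426)
print(round(r_squared(data, m, b), 3))  # 0.905
```

An R^2 near 1 indicates the line explains most of the variation in y; here about 90% of it.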

Connections to broader principles

  • The least-squares solution is the orthogonal projection of the data vector y onto the column space of the design matrix X = [ [1, x1], [1, x2], …, [1, x_n] ].
  • In matrix form, with that column ordering, the normal equations can be written as (X^T X) [b, m]^T = X^T y.
  • The method assumes:
    • A linear relationship between x and y (linear in parameters m and b).
    • Errors that are roughly normally distributed with constant variance (homoscedastic), needed for inference.
    • No or manageable influence from outliers (outliers can distort the fit).
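A minimal sketch of the matrix form: with X = [[1, x_1], ..., [1, x_n]], the entries of X^T X and X^T y reduce to the familiar sums, and the resulting 2×2 system can be solved with Cramer's rule (the function name is illustrative):

```python
# Normal equations in matrix form: (X^T X) beta = X^T y, beta = [b, m].

def normal_equations_fit(xs, ys):
    n = len(xs)
    # With X = [[1, x_i]]: X^T X = [[n, sum_x], [sum_x, sum_x2]],
    #                      X^T y = [sum_y, sum_xy]
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    det = n * sum_x2 - sum_x * sum_x  # det(X^T X)
    # Cramer's rule on the 2x2 system
    b = (sum_y * sum_x2 - sum_x * sum_xy) / det
    m = (n * sum_xy - sum_x * sum_y) / det
    return b, m

b, m = normal_equations_fit([1, 2, 5, 7, 10], [8, 6, 6, 4, 1])
print(round(m, 3), round(b, 3))  # -0.685 8.426
```

The matrix route gives the same coefficients as the sum-based normal equations, as expected.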

Systems of Linear Equations: Overview

  • A system consists of two or more linear equations in two variables (x, y): L1: a1 x + b1 y = c1, L2: a2 x + b2 y = c2, etc.
  • Key question: how many solutions does the system have? (one, infinite, or none)
  • Terminology:
    • Unique solution: the two lines L1 and L2 intersect at exactly one point.
    • Infinitely many solutions (dependent system): the equations represent the same line (coincident lines).
    • No solution (inconsistent system): the equations represent distinct parallel lines.
  • Terminology about consistency:
    • A system is consistent if it has at least one solution (one or infinitely many).
    • A system is inconsistent if it has no solution.
  • Quick classification criterion (two-equation case):
    • If the determinant Δ = a1 b2 − a2 b1 ≠ 0, the system has a unique solution.
    • If Δ = 0, check for dependence or inconsistency by comparing ratios (a1:a2, b1:b2, c1:c2).
    • If (a2, b2, c2) is a scalar multiple of (a1, b1, c1), there are infinitely many solutions.
    • If (a2, b2, c2) is not a scalar multiple of (a1, b1, c1), there is no solution.
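The classification criterion above can be sketched as a small function (the name `classify_system` is illustrative); cross-multiplication avoids dividing by zero when comparing the ratios:

```python
# Classify a 2x2 system  a1 x + b1 y = c1,  a2 x + b2 y = c2.

def classify_system(a1, b1, c1, a2, b2, c2):
    det = a1 * b2 - a2 * b1
    if det != 0:
        return "unique solution"
    # det == 0: left-hand sides are proportional; check whether the full
    # triples (a, b, c) are proportional too (cross-multiply, no division).
    if a1 * c2 == a2 * c1 and b1 * c2 == b2 * c1:
        return "infinitely many solutions"
    return "no solution"

print(classify_system(2, -4, -10, 3, 2, 1))    # unique solution
print(classify_system(5, -6, 8, 10, -12, 16))  # infinitely many solutions
print(classify_system(5, -6, 8, 10, -12, 17))  # no solution
```

The three calls correspond to intersecting, coincident, and parallel-distinct lines respectively; the first two match the worked examples below.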

Worked Examples (from transcript)

Example 1: Unique solution

  • System:

    \begin{cases}
    2x - 4y = -10, \\
    3x + 2y = 1.
    \end{cases}
  • Solve by substitution:
    • From the first equation: x = 2y - 5.
    • Substitute into the second: 3(2y - 5) + 2y = 1.
    • Compute: 6y - 15 + 2y = 1 \Rightarrow 8y = 16 \Rightarrow y = 2.
    • Then x = 2(2) - 5 = -1.
  • Solution:
    (x, y) = (-1, 2).
  • Verification:
    • 2(-1) - 4(2) = -2 - 8 = -10 ✔
    • 3(-1) + 2(2) = -3 + 4 = 1 ✔

Example 2: Infinitely many solutions

  • System:

    \begin{cases}
    5x - 6y = 8, \\
    10x - 12y = 16.
    \end{cases}
  • Observation: The second equation is exactly 2 times the first equation (2 × (5x - 6y) = 10x - 12y and 2 × 8 = 16).
  • Conclusion: The two equations are dependent and represent the same line; there are infinitely many solutions.
  • Description of the solution set: all points (x, y) that satisfy 5x - 6y = 8; e.g., one can parametrize quickly: x = (8 + 6y)/5 for any y.
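The parametrization can be checked numerically; a small sketch (the helper name is illustrative) that generates points on 5x - 6y = 8 and confirms each one satisfies both equations of the system:

```python
# Enumerate solutions of the dependent system from Example 2.
# Every point on 5x - 6y = 8 also satisfies 10x - 12y = 16.

def solutions_of_5x_minus_6y_eq_8(ys):
    """For each chosen y, return the (x, y) pair with x = (8 + 6y) / 5."""
    return [((8 + 6 * y) / 5, y) for y in ys]

for x, y in solutions_of_5x_minus_6y_eq_8([0, 1, 2, -3]):
    # Each generated point satisfies both equations of the system.
    assert abs(5 * x - 6 * y - 8) < 1e-9
    assert abs(10 * x - 12 * y - 16) < 1e-9
    print(f"({x}, {y})")
```

Any value of y yields a valid solution, illustrating why the solution set is infinite.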

Quick Reference: How to classify a 2×2 linear system

  • Given system:
    \begin{cases}
    a_1 x + b_1 y = c_1, \\
    a_2 x + b_2 y = c_2.
    \end{cases}
  • Compute determinant: \Delta = a_1 b_2 - a_2 b_1.
  • If \Delta \neq 0: unique solution.
  • If \Delta = 0: check for dependence vs inconsistency:
    • If there exists a scalar k with (a2, b2, c2) = k(a1, b1, c1): infinitely many solutions (coincident lines).
    • If no such k exists (the triples are not proportional): no solution (parallel distinct lines).

Connections, implications, and practical notes

  • Least-squares regression provides the best linear fit in the sense of minimizing squared errors under the model assumptions.
  • Practical considerations:
    • Data quality, outliers, and extrapolation beyond the data range can affect accuracy and predictive power.
    • The normal equations give a stable method when data are well-conditioned; ill-conditioned data (e.g., x_i very close in value) can lead to numerical instability.
  • Real-world relevance:
    • Used for forecasting sales trends, demand planning, and many other predictive analytics tasks.
    • In linear systems, understanding the number of solutions helps in modeling feasibility and consistency with observed data.

Key formulas (quick reference in LaTeX)

  • Least-squares line:
    y=mx+by = mx + b
  • Normal equations:
    \sum_{i=1}^n y_i = m\sum_{i=1}^n x_i + b\,n, \qquad \sum_{i=1}^n x_i y_i = m\sum_{i=1}^n x_i^2 + b\sum_{i=1}^n x_i.
  • Determinant criterion (2×2 system):
    \Delta = a_1 b_2 - a_2 b_1.
  • Solution existence:
    • If \Delta \neq 0, unique solution.
    • If \Delta = 0 and ratios match, infinite solutions; if not, no solution.

Notes

  • The transcript provides concrete example lines (e.g., Y = -0.685x + 8.426 and Y = -0.95x + 10.35) illustrating the least-squares fits.
  • The substitution method shown in the examples demonstrates a straightforward approach to solving linear systems by expressing one variable in terms of the other and back-substituting.
  • Ethical/practical caveats: regression results should be interpreted in light of data quality, context, and the risk of overreliance on extrapolation.