Notes on Least Squares and Systems of Linear Equations
The Method of Least Squares
- Goal: find a straight line y = mx + b that best fits a set of data points (xi, yi) by minimizing the total squared residuals.
- Data setup: n data points Pi with coordinates (xi, yi).
- Practical use: the resulting trend line can be used to forecast or predict future values (e.g., sales trends).
- Visual aid: plot a scatter diagram of the data and overlay the least-squares regression line.
- General form of the regression line:
  y = f(x) = mx + b
- What makes the line “best” in the least-squares sense: the residuals ei = yi − (m xi + b) are as small as possible in the sum of squares, i.e., minimize
  S = ∑_{i=1}^n (yi − (m xi + b))².
Normal equations (the conditions for the least-squares line)
- The constants m and b satisfy the normal equations obtained by setting the partial derivatives of S with respect to m and b to zero:
\begin{cases}
\displaystyle \sum_{i=1}^n y_i = m\sum_{i=1}^n x_i + b\,n, \\
\displaystyle \sum_{i=1}^n x_i y_i = m\sum_{i=1}^n x_i^2 + b\sum_{i=1}^n x_i.
\end{cases}
- These two equations can be solved simultaneously for m and b.
- A compact way to write them using sums:
  ∑yi = m∑xi + bn,  ∑xi yi = m∑xi² + b∑xi.
- Brief derivation sketch:
- Define residuals ei = yi − (m xi + b).
- Minimize S = \sum e_i^2 with respect to m and b.
- Set the partial derivatives ∂S/∂m = 0 and ∂S/∂b = 0 to obtain the normal equations above.
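The derivation sketch above can be written out explicitly. Differentiating S with respect to b and m and setting each partial derivative to zero gives exactly the two normal equations:

```latex
\frac{\partial S}{\partial b} = -2\sum_{i=1}^{n}\bigl(y_i - (m x_i + b)\bigr) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} y_i = m\sum_{i=1}^{n} x_i + bn,

\frac{\partial S}{\partial m} = -2\sum_{i=1}^{n} x_i\bigl(y_i - (m x_i + b)\bigr) = 0
\quad\Longrightarrow\quad \sum_{i=1}^{n} x_i y_i = m\sum_{i=1}^{n} x_i^2 + b\sum_{i=1}^{n} x_i.
```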
- Practical steps to compute:
- Compute the sums: Σxi, Σyi, Σxi², Σxi yi, and n.
- Solve the 2×2 linear system given by the normal equations for m and b.
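The two steps above can be sketched in plain Python (a minimal illustration, not a production fitter; the helper name `least_squares_line` is mine). It accumulates the sums and solves the 2×2 normal equations by Cramer's rule:

```python
def least_squares_line(points):
    """Fit y = m*x + b to (x, y) pairs by solving the 2x2 normal equations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    # Normal equations in unknowns (m, b):
    #   sx*m  + n*b  = sy
    #   sxx*m + sx*b = sxy
    det = sx * sx - n * sxx          # determinant of [[sx, n], [sxx, sx]]
    m = (sy * sx - sxy * n) / det    # Cramer's rule
    b = (sx * sxy - sxx * sy) / det
    return m, b

# Example 1 data from these notes: P1(1,8), P2(2,6), P3(5,6), P4(7,4), P5(10,1)
m, b = least_squares_line([(1, 8), (2, 6), (5, 6), (7, 4), (10, 1)])
# m ≈ -0.685, b ≈ 8.426, matching the transcript's line Y = -0.685x + 8.426
```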
- Example results from transcript:
- Example 1 (data: P1(1,8), P2(2,6), P3(5,6), P4(7,4), P5(10,1)) yields the regression line
  Y = −0.685x + 8.426.
  - Here, m ≈ −0.685 and b ≈ 8.426.
- Example 2 (another dataset) yields the line
  Y = −0.95x + 10.35.
  - Here, m = −0.95 and b = 10.35.
Worked outline for applying the method
- Step 1: collect data points (xi, yi).
- Step 2: compute the required sums: ∑xi, ∑yi, ∑xi², ∑xi yi, and n.
- Step 3: plug sums into the normal equations and solve for m and b.
- Step 4: plot the data and draw the regression line y = mx + b.
- Step 5: use the model to predict future values and assess fit (e.g., via residuals, R^2, or other metrics).
Connections to broader principles
- The least-squares solution is the orthogonal projection of the data vector y onto the column space of the design matrix X = [ [1, x1], [1, x2], …, [1, x_n] ].
- In matrix form, the normal equations can be written as (X^T X) [m, b]^T = X^T y.
- The method assumes:
- A linear relationship between x and y (linear in parameters m and b).
- Homoscedastic, roughly normally distributed errors with constant variance (for inference).
- No or manageable influence from outliers (outliers can distort the fit).
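The matrix form above can be sketched with NumPy on the Example 1 data. The column order here is [m, b] (the notes write X with the 1s column first, which simply swaps the order to [b, m]); `np.linalg.lstsq` is shown alongside because it solves the same problem without explicitly forming X^T X, which is numerically safer for ill-conditioned data:

```python
import numpy as np

# Example 1 data from the notes
x = np.array([1.0, 2.0, 5.0, 7.0, 10.0])
y = np.array([8.0, 6.0, 6.0, 4.0, 1.0])

# Design matrix with columns [x_i, 1], so the coefficient vector is [m, b]
X = np.column_stack([x, np.ones_like(x)])

# Solve the normal equations (X^T X) beta = X^T y directly
beta = np.linalg.solve(X.T @ X, X.T @ y)
m, b = beta

# Equivalent, better-conditioned route: least-squares solver on X itself
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```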
Systems of Linear Equations: Overview
- A system consists of two or more linear equations in two variables (x, y): L1: a1 x + b1 y = c1, L2: a2 x + b2 y = c2, etc.
- Key question: how many solutions does the system have? (one, infinite, or none)
- Terminology:
- Unique solution: the two lines L1 and L2 intersect at exactly one point.
- Infinitely many solutions (dependent system): the equations represent the same line (parallel and coincident).
- No solution (inconsistent system): the equations represent distinct parallel lines.
- Terminology about consistency:
- A system is consistent if it has at least one solution (one or infinitely many).
- A system is inconsistent if it has no solution.
- Quick classification criterion (two-equation case):
- If the determinant Δ = a1 b2 − a2 b1 ≠ 0, the system has a unique solution.
- If Δ = 0, check for dependence or inconsistency by comparing ratios (a1:a2, b1:b2, c1:c2).
- If (a2, b2, c2) is a scalar multiple of (a1, b1, c1), there are infinitely many solutions.
- If (a2, b2, c2) is not a scalar multiple of (a1, b1, c1), there is no solution.
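The classification criterion above translates directly into code (a small sketch; the function name `classify_system` is mine, and it assumes exact integer/rational coefficients so `== 0` comparisons are safe):

```python
def classify_system(a1, b1, c1, a2, b2, c2):
    """Classify the system a1*x + b1*y = c1, a2*x + b2*y = c2."""
    det = a1 * b2 - a2 * b1
    if det != 0:
        return "unique solution"
    # det == 0: coefficient rows are proportional. The triples (a, b, c) are
    # proportional iff the remaining 2x2 minors involving c also vanish.
    if a1 * c2 - a2 * c1 == 0 and b1 * c2 - b2 * c1 == 0:
        return "infinitely many solutions"
    return "no solution"

# The two worked examples below:
classify_system(2, -4, -10, 3, 2, 1)    # unique solution (det = 16)
classify_system(5, -6, 8, 10, -12, 16)  # infinitely many solutions
```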
Worked Examples (from transcript)
Example 1: Unique solution
- System:
\begin{cases}
2x - 4y = -10, \
3x + 2y = 1.
\end{cases}
- Solve by substitution:
- From the first equation: x = 2y - 5.
- Substitute into the second: 3(2y - 5) + 2y = 1.
- Compute: 6y - 15 + 2y = 1 \Rightarrow 8y = 16 \Rightarrow y = 2.
- Then x = 2(2) - 5 = -1.
- Solution:
  (x, y) = (−1, 2).
- Verification:
  - 2(−1) − 4(2) = −2 − 8 = −10 ✔
  - 3(−1) + 2(2) = −3 + 4 = 1 ✔
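The substitution steps above can be replayed mechanically, which also double-checks the arithmetic:

```python
# Substitution method for Example 1:
#   2x - 4y = -10  =>  x = 2y - 5
#   3x + 2y = 1    =>  3(2y - 5) + 2y = 1  =>  8y = 16  =>  y = 2
y = 2
x = 2 * y - 5          # back-substitute into x = 2y - 5

assert (x, y) == (-1, 2)
assert 2 * x - 4 * y == -10   # first equation holds
assert 3 * x + 2 * y == 1     # second equation holds
```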
Example 2: Infinitely many solutions
- System:
\begin{cases}
5x - 6y = 8, \
10x - 12y = 16.
\end{cases}
- Observation: The second equation is exactly 2 times the first equation (2 × (5x - 6y) = 10x - 12y and 2 × 8 = 16).
- Conclusion: The two equations are dependent and represent the same line; there are infinitely many solutions.
- Description of the solution set: all points (x, y) that satisfy 5x - 6y = 8; e.g., one can parametrize quickly: x = (8 + 6y)/5 for any y.
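The parametrization of Example 2's solution set can be spot-checked in code: pick any y, compute x = (8 + 6y)/5, and both equations hold (the helper name `x_of` is mine):

```python
# All solutions of 5x - 6y = 8 (and hence of 10x - 12y = 16) form one line.
def x_of(y):
    return (8 + 6 * y) / 5

for y in (2, 7, -3):
    x = x_of(y)
    assert 5 * x - 6 * y == 8      # satisfies the first equation...
    assert 10 * x - 12 * y == 16   # ...and therefore the second as well
```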
Quick Reference: How to classify a 2×2 linear system
- Given system:
  a1 x + b1 y = c1,  a2 x + b2 y = c2.
- Compute the determinant: Δ = a1 b2 − a2 b1.
- If Δ ≠ 0: unique solution.
- If Δ = 0: check for dependence vs inconsistency:
- If there exists a scalar k with (a2, b2, c2) = k(a1, b1, c1): infinitely many solutions (coincident lines).
- If no such k exists (the triples are not proportional): no solution (parallel distinct lines).
Connections, implications, and practical notes
- Least-squares regression provides the best linear fit in the sense of minimizing squared errors under the model assumptions.
- Practical considerations:
- Data quality, outliers, and extrapolation beyond the data range can affect accuracy and predictive power.
- The normal equations give a stable method when data are well-conditioned; ill-conditioned data (e.g., x_i very close in value) can lead to numerical instability.
- Real-world relevance:
- Used for forecasting sales trends, demand planning, and many other predictive analytics tasks.
- In linear systems, understanding the number of solutions helps in modeling feasibility and consistency with observed data.
Key formulas
- Least-squares line:
  y = mx + b
- Normal equations:
  ∑_{i=1}^n yi = m∑_{i=1}^n xi + bn,  ∑_{i=1}^n xi yi = m∑_{i=1}^n xi² + b∑_{i=1}^n xi.
- Determinant criterion (2×2 system):
  Δ = a1 b2 − a2 b1.
- Solution existence:
  - If Δ ≠ 0, unique solution.
  - If Δ = 0 and the ratios match, infinitely many solutions; if not, no solution.
Notes
- The transcript provides concrete example lines (e.g., Y = -0.685x + 8.426 and Y = -0.95x + 10.35) illustrating the least-squares fits.
- The substitution method shown in the examples demonstrates a straightforward approach to solving linear systems by expressing one variable in terms of the other and back-substituting.
- Ethical/practical caveats: regression results should be interpreted in light of data quality, context, and the risk of overreliance on extrapolation.