A b o u t   M e      |       P r o j e c t s     |       N o t e s       |       T h e   D a y    ︎ ︎

## Normal Equations

For an mtimes n matrix A and a vector b in mathbb{R^m}, the linear system Ax=b is to find the optimal solution x in mathbb{R^n}.

# 1. The Residual Minimization

The problem Ax=b can be replaced by the problem minimizing the residual, ||b-Ax||_2^2. Let A_i be the i-th row vector of A. Then the cost function f(x) can be defined as follows:
\begin{align} f(x)=\left\| b-Ax \right\|_2^2 = \left\| \begin{pmatrix} b_1 - A_1^t x \\ \vdots \\ b_m-A_m^t x \end{pmatrix} \right\| _2^2 = \sum_{k} (b_k-A_k^t x)^2 \end{align}    For minimizing f, finding the critical point is required. Let a_{ij} be the (i, j) element of A, then \begin{align} \frac{\partial f}{\partial x_i} &= \sum_{k} 2(b_k-A_k^tx) \frac{\partial}{\partial x_i}(b_k-A_k^t x) = 2\sum_{k} (b_k-A_k^tx) (-a_{ki}) = 0 \\ \\ &\Rightarrow \sum_{k} a_{ki} A_k^t x = \sum_{k} a_{ki} b_k =  \begin{pmatrix} a_{1i} & \cdots & a_{mi} \end{pmatrix} \begin{pmatrix} A_1^t \\ \vdots \\ A_m^t \end{pmatrix} x = \begin{pmatrix} a_{1i} & \cdots & a_{mi} \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \\ &\Rightarrow \begin{pmatrix} a_{11} & \cdots & a_{m1} \\ \  & \vdots & \   \\ a_{1n} & \cdots & a_{mn} \end{pmatrix} \begin{pmatrix} A_1^t \\ \vdots \\ A_m^t \end{pmatrix} x = \begin{pmatrix} a_{11} & \cdots & a_{m1} \\ \  & \vdots & \   \\a_{1n} & \cdots & a_{mn}\end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix} \\ \\ &\Rightarrow \color{red}{A^tAx=A^tb} \end{align}
This equation is called the normal equation. Moreover, A^tA is positive (semi-)definite because x^tA^tAx=(Ax)^t(Ax)=||Ax||_2^2 geq 0, so it minimizes f. Especially, if ||Ax||_2^2 ne 0, it tells that there is no x such that Ax=0 except x=0.  It also means that A is nonsingular whose column vectors are linearly independent. Therefore, the solution of A^tAx=A^tb is the optimal solution of Ax=b when A is nonsingular.

2. Another Approach When m>n

In this case, A maps the lower dimension to the higher dimension. For example, when m=3 and n=2, there may be a point b  in the objective space which Ax cannot cover.

So, the linear system Ax=b cannot be solved exactly. However, we can still find the point which is closest to b. Let y=Ax be the closest point to b.
When y is closest to b, the vector b-y=b-Ax is perpendicular to the normal of hyperplane text{Im}  A. Suppose that e_1, cdots, e_n are the standard basis in the original space. Then Ae_1, cdots, Ae_n consist of the basis of the objective space.
In other words, each column vector of A consists of the hyperplane text{Im}  A, which means it is perpendicular to b-Ax. Therefore, A^t(b-Ax)=0, it yields the normal equation. Accordingly, the normal equation estimates the optimal solution of Ax=b even though the exact solution cannot be found.

# 3. Another Approach Using Projector

A square matrix P is idempotent if P^2=P, and it is called projector. P projects any vector to text{Im}  P and if the vectors are already on text{Im}  P, they would be still there after projected.
If P is symmetric, it is called orthogonal projector, and I-P is also orthogonal projector to the hyperplane which is perpendicular to text{Im}  P. Of course, (I-P)^2=I-2P+P^2=I-P. For a vector v,

\begin{align} (Pv)^t((I-P)v) &= v^tP^t(v-Pv)=v^tP(v-Pv) \\&=v^tPv-v^tP^2v = v^tPv-v^tPv =0 \\ \\ & \Rightarrow Pv \  \bot \  (I-P)v \\ & \Rightarrow v=Pv+(I-P)v \end{align}    Now, to apply this to solving Ax=b, assume that P is the orthogonal projector onto text{span}  A, which means PA=A. Then the cost function f(x) can be rewritten as \begin{align} f(x)=\left\| b-Ax \right\|_2^2 &= \left\| P(b-Ax)+(I-P)(b-Ax) \right\|_2^2 \\ &=  \left\| P(b-Ax) \right\|_2^2 + \left\| (I-P)(b-Ax) \right\|_2^2 \\ &= \left\| Pb-PAx \right\|_2^2 + \left\| (I-P)b-(I-P)Ax \right\|_2^2 \\ &= \left\| Pb-Ax \right\|_2^2 + \left\| (I-P)b \right\|_2^2. \end{align}    Therefore, for minimizing this cost function, || Pb-Ax ||_2^2  should be zero. \begin{align} Pb-Ax &= A^tPb-A^tAx= A^tP^tb-A^tAx \\ &= (PA)^tb-A^tAx= A^tb-A^tAx=O \end{align}
Note that it also produces the normal equation.

4. Evaluation

The solution from the normal equation can be evaluated by the condition number and angle between b and Ax.
The condition number of A tells how close to singular A is. However, if m ne n, A is not invertible, so the pseudoinverse A^+=(A^tA)^{-1}A^t should be introduced. In this case, the condition number of A is
\begin{align} \text{Cond}_2(A)=\left\| A \right\|_2  \left\| A^+ \right\|_2. \end{align}
Meanwhile, as the angle between b and Ax is smaller, b is closer to Ax, so it is a good measure to find whether the solution is well-conditioned. This angle can be obtain as cos theta = frac{||Ax||_2}{||b||_2}.

︎Reference

[1] Michael T. Heath, Scientific Computing: An Introductory Survey. 2nd Edition, McGraw-Hill Higher Education.

emoy.net