
Geometric Meaning of the Gradient

    The gradient of a function `f: mathbb{R}^n rightarrow mathbb{R}` is the vector-valued function `nabla f: mathbb{R}^n rightarrow mathbb{R}^n`,
\begin{align} \nabla f(x)=  \begin{pmatrix} \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_n} \end{pmatrix} \end{align}
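    The definition above can be checked numerically. The following is a minimal sketch, not a reference implementation: the helper `numerical_gradient`, the step size `h`, and the test function `paraboloid` are all assumptions chosen for illustration. Each partial derivative is approximated by a central difference.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of f at the point x (a list of floats).

    Each component is a central difference along one coordinate axis.
    The step size h is an illustrative choice, not a tuned value.
    """
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def paraboloid(p):
    """Hypothetical test function f(x, y) = x^2 + y^2."""
    return p[0] ** 2 + p[1] ** 2


# The analytic gradient is (2x, 2y), so the result at (1, 2) should be close to (2, 4).
approx = numerical_gradient(paraboloid, [1.0, 2.0])
print(approx)
```

    The output is a list with one entry per coordinate, matching the column vector of partial derivatives in the definition.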

1. The gradient points in the direction in which `f` increases fastest.

    By Taylor's theorem, for a small step `s` away from `x`,
\begin{align} f(x+s) \approx f(x)+\nabla f(x)^t s \end{align}
    To increase `f` as much as possible, we choose `s` so that the first-order term `nabla f(x)^t s` is as large as possible: for a step of fixed length `||s||`, `f(x+s)` is approximately maximized when `nabla f(x)^t s` is maximized. As `nabla f(x)^t s` is the inner product of two vectors,
\begin{align} \nabla f(x)^t s = \left\| \nabla f(x) \right\| \left\| s \right\| \cos\theta \end{align}
    where `theta` is the angle between `nabla f(x)` and `s`. For fixed `||s||`, this is maximized when `theta=0`, that is, when `s` points in the same direction as `nabla f(x)`. Therefore, moving `x` in the direction of `nabla f(x)` gives the largest local increase in `f`.
    For example, consider `f(x)=x^2` and `f(x,y)=x^2+y^2` for `x, y in mathbb{R}`. Then their gradients are `nabla f(x)=2x` and `nabla f(x, y)=(2x, 2y)^t`.

    Each gradient points in the direction in which its `f` increases at the given point. Moreover, `-nabla f` points in the direction in which `f` decreases fastest.
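    The claim can be tested by brute force on `f(x,y)=x^2+y^2`. The sketch below is only an illustration: the helper `best_direction`, the step length, the number of sampled angles, and the test point `(1, 1)` are assumptions. It tries unit directions at evenly spaced angles, keeps the one that increases `f` the most, and compares it with the direction of the gradient.

```python
import math


def f(x, y):
    return x ** 2 + y ** 2


def grad_f(x, y):
    """Analytic gradient of f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)


def best_direction(x, y, step=1e-3, samples=360):
    """Return the angle (radians) of the sampled unit direction that increases f the most.

    step and samples are illustrative values, not tuned ones.
    """
    best_angle, best_gain = None, -math.inf
    for k in range(samples):
        angle = 2 * math.pi * k / samples
        gain = f(x + step * math.cos(angle), y + step * math.sin(angle)) - f(x, y)
        if gain > best_gain:
            best_angle, best_gain = angle, gain
    return best_angle


x0, y0 = 1.0, 1.0
gx, gy = grad_f(x0, y0)
gradient_angle = math.atan2(gy, gx)      # direction of the gradient at (1, 1)
sampled_angle = best_direction(x0, y0)   # best direction found by sampling

# A small step along the negative gradient should lower f.
step = 1e-3
decreased = f(x0 - step * gx, y0 - step * gy) < f(x0, y0)

print(gradient_angle, sampled_angle, decreased)
```

    At `(1, 1)` the gradient is `(2, 2)^t`, so both angles should agree at 45 degrees, and the step along `-nabla f` should report a decrease.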

2. For an implicit function, the gradient is perpendicular to the tangent plane.

    The gradient has a different meaning for explicit and implicit functions.
  • For an explicit function `y=f(x)`, the gradient gives the slope of the tangent at `x`.
  • For an implicit function `f(x,y)=0`, the gradient is the normal vector of the tangent line (the tangent plane in higher dimensions) at `(x,y)^t`.

    For instance, consider `f(x,y)=x^2-y=0`. Then its gradient is `nabla f=(2x, -1)^t`. Along the curve, the total differential of `f` is `2x dx-dy=0`, so `nabla f^t (dx, dy)^t=0`. Since `(dx, dy)^t` is tangent to the curve `f(x,y)=0`, `nabla f` is perpendicular to the tangent.

    For another example, consider `f(x,y,z)=x^2+y^2-z=0`. Then its gradient is `nabla f=(2x, 2y, -1)^t`. On the surface, the total differential of `f` is `2x dx+2y dy-dz=0`, so `nabla f^t (dx, dy, dz)^t=0`. Since every `(dx, dy, dz)^t` satisfying this equation is tangent to the surface `f(x,y,z)=0`, `nabla f` is perpendicular to the tangent plane.
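    Both examples can be verified with a few inner products. This is a minimal sketch under stated assumptions: the helpers `dot`, `curve_check`, and `surface_check` are hypothetical names, and the tangent vectors come from parametrizing the curve as `(x, x^2)` and the surface as `(x, y, x^2+y^2)` and differentiating with respect to each parameter.

```python
def dot(u, v):
    """Inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


def curve_check(x):
    """Inner product of the gradient and the tangent for x^2 - y = 0 at (x, x^2)."""
    gradient = (2 * x, -1.0)   # nabla f = (2x, -1)
    tangent = (1.0, 2 * x)     # derivative of (x, x^2) with respect to x
    return dot(gradient, tangent)


def surface_check(x, y):
    """Inner products of the gradient with two tangents for x^2 + y^2 - z = 0."""
    gradient = (2 * x, 2 * y, -1.0)   # nabla f = (2x, 2y, -1)
    tangent_x = (1.0, 0.0, 2 * x)     # derivative of (x, y, x^2 + y^2) with respect to x
    tangent_y = (0.0, 1.0, 2 * y)     # derivative of (x, y, x^2 + y^2) with respect to y
    return dot(gradient, tangent_x), dot(gradient, tangent_y)


print(curve_check(1.5))
print(surface_check(0.5, -2.0))
```

    Every inner product should come out as zero. Because `tangent_x` and `tangent_y` span the tangent plane, the gradient is then perpendicular to the whole plane, not just to these two vectors.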

Reference

[1] Michael T. Heath, Scientific Computing: An Introductory Survey. 2nd Edition, McGraw-Hill Higher Education.