
[Artificial Intelligence] Gradient and Directional Derivatives--How CNNs Learn

Before reading this article, read this one first:
Partial Derivatives and Vector Fields

1. Gradient


The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$

In particular, this means $\nabla f$ is a vector-valued function.

  • If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f$ tells you which direction you should travel to increase the value of $f$ most rapidly.

  • These gradient vectors $\nabla f$ are also perpendicular to the contour lines of $f$.

For scalar-valued multivariable functions, those with a multidimensional input but a one-dimensional output, the full derivative is the gradient.
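
To make this concrete, here is a minimal Python sketch (my own illustration, not from the article) that approximates the gradient with central finite differences; the helper `numerical_gradient` and the test point are assumptions chosen for demonstration:

```python
# Approximate the gradient of a scalar-valued multivariable function
# with central finite differences (a minimal sketch for illustration).
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at the point x (a 1-D numpy array)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        # Central difference: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Example: f(x, y) = x^2 - x*y, so df/dx = 2x - y and df/dy = -x.
f = lambda p: p[0] ** 2 - p[0] * p[1]
print(numerical_gradient(f, np.array([1.0, 2.0])))  # approx [0.0, -1.0]
```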

Credit To: The gradient

2. Directional Derivatives


Consider some multivariable function:

$$f(x, y) = x^2 - xy$$

We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input in either the $x$ or $y$ direction.
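
For concreteness, with $f(x, y) = x^2 - xy$ as above, those two partial derivatives work out to:

$$\frac{\partial f}{\partial x} = 2x - y, \qquad \frac{\partial f}{\partial y} = -x$$

So at a point like $(1, 2)$, nudging in the $x$-direction changes $f$ at a rate of $0$, and nudging in the $y$-direction changes it at a rate of $-1$.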

The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.

For example, the image below shows the graph of $f$ along with a small step along a vector $\overrightarrow{v}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\overrightarrow{v}$ compares to the height of the graph above its tail?

[Figure: the graph of $f$ with a small step along $\overrightarrow{v}$ in the $xy$-plane]

As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.

Just as the partial derivative is taken with respect to some input variable, e.g., $x$ or $y$, the directional derivative is taken along some vector $\overrightarrow{v}$ in the input space.

One very helpful way to think about this is to picture a point in the input space moving with velocity $\overrightarrow{v}$.
The directional derivative of $f$ along $\overrightarrow{v}$ is the resulting rate of change of the function's output.

3. Compute the Directional Derivative


Let's say you have a multivariable function $f(x, y, z)$, which takes in three variables, $x$, $y$ and $z$, and you want to compute its directional derivative (the rate of change of the function's output) along the following vector:

$$\overrightarrow{v} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$$

The answer, as it turns out, is

$$\nabla_{\overrightarrow{v}} f = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$$

This should make sense because a tiny nudge along $\overrightarrow{v}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction.

More generally, we can write the vector $\overrightarrow{v}$ abstractly as follows:

$$\overrightarrow{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

The directional derivative looks like this:

$$\nabla_{\overrightarrow{v}} f = v_1\frac{\partial f}{\partial x} + v_2\frac{\partial f}{\partial y} + v_3\frac{\partial f}{\partial z}$$

This can be written in a super-pleasing compact way using the dot product and the gradient:


$$\nabla_{\overrightarrow{v}} f = \nabla f \cdot \overrightarrow{v}$$
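
As a quick sanity check, here is a minimal Python sketch of this dot-product formula; the gradient values are hypothetical, chosen only for illustration, while the direction vector is the $(2, 3, -1)$ from the text:

```python
# The directional derivative as a dot product between the gradient
# and the direction vector (a minimal sketch for illustration).
import numpy as np

def directional_derivative(grad, v):
    """Rate of change of f along v, given the gradient of f at a point."""
    return np.dot(grad, v)

# Suppose the gradient of some f at a point is (df/dx, df/dy, df/dz):
grad_f = np.array([4.0, -2.0, 0.5])   # hypothetical values for illustration
v = np.array([2.0, 3.0, -1.0])
print(directional_derivative(grad_f, v))  # 2*4 + 3*(-2) + (-1)*0.5 = 1.5
```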

Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, $34.8^\circ$ clockwise from the $x$-axis... Madness!

Credit To: Directional derivatives (introduction)

4. Difference Between the Gradient and the Directional Derivative


For those who are a little confused about the difference between the gradient and the directional derivative:

In the case given in the video, the gradient is a vector whose components are scalars, each representing the rate of change of the function along the standard unit vectors of whatever basis is being used. (A lot of the time it's the Cartesian plane, and the unit basis vectors are $i$, $j$ and $k$.)

The gradient only tells us how the function is changing with respect to the axes of our coordinate system. But our mathematical interests hardly ever lie solely along those axes, so we need the directional derivative.

The directional derivative is a scalar value which represents the rate of change of the function along a direction which is typically NOT in the direction of one of the standard basis vectors.

In conclusion, if you want to find the derivative of a multivariable function along a vector $V$, first find the unit vector $u$ in the direction of $V$, and then take $\nabla f \cdot u$. If $u = \langle a, b \rangle$, then $\nabla f \cdot u = a\,\frac{\partial f}{\partial x} + b\,\frac{\partial f}{\partial y}$.
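
In code, that recipe looks like this minimal sketch (the gradient and $V$ are hypothetical values for illustration):

```python
# Normalize V to a unit vector u, then dot the gradient with u
# (a minimal sketch of the recipe above; all values are illustrative).
import numpy as np

def directional_derivative_unit(grad, V):
    """Directional derivative along the *direction* of V (unit-vector form)."""
    u = V / np.linalg.norm(V)  # unit vector in the direction of V
    return np.dot(grad, u)

grad_f = np.array([3.0, 4.0])          # hypothetical gradient (df/dx, df/dy)
V = np.array([1.0, 1.0])
print(directional_derivative_unit(grad_f, V))  # (3 + 4)/sqrt(2) ≈ 4.95
```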


This video shows us how a CNN learns through gradient descent, i.e., how gradient descent changes each weight; the model in question consists entirely of fully connected layers.

$\nabla C$ is the gradient of the cost function at the point $(1, 1)$. It tells us the direction in which the cost function's output increases fastest, so we simply subtract it. This changes the values of $x$ and $y$ at the same time, and by different amounts: $x$ may increase while $y$ decreases, or $x$ may decrease while $y$ increases.
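
Here is a minimal gradient-descent sketch in Python (my own illustration, not the video's code; the quadratic cost function is hypothetical, chosen so the gradient is easy to compute by hand). Each step subtracts the gradient, so both coordinates change at once and by different amounts, exactly as described above:

```python
# Gradient descent on a hypothetical cost function C(x, y)
# (a minimal sketch for illustration).
import numpy as np

def cost(p):
    # Hypothetical cost function chosen only for illustration.
    return (p[0] - 3) ** 2 + 2 * (p[1] + 1) ** 2

def grad_cost(p):
    # Its gradient, computed by hand: (dC/dx, dC/dy).
    return np.array([2 * (p[0] - 3), 4 * (p[1] + 1)])

p = np.array([1.0, 1.0])        # start at the point (1, 1) as in the text
lr = 0.1                        # learning rate (step size)
for _ in range(100):
    p = p - lr * grad_cost(p)   # step against the gradient
print(p, cost(p))               # converges toward (3, -1), where C is minimal
```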

After watching the video above, this next one is even better; it explains in detail how a CNN learns: What is backpropagation really doing?
