
[Artificial Intelligence] Gradient and Directional Derivatives--How CNNs Learn

Before reading this article, read this one first:
Partial Derivatives and Vector Fields

1. Gradient


The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$

In particular, this means $\nabla f$ is a vector-valued function.

  • If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f$ tells you which direction you should travel to increase the value of $f$ most rapidly.

  • These gradient vectors $\nabla f$ are also perpendicular to the contour lines of $f$.

For scalar-valued multivariable functions, those with a multidimensional input but a one-dimensional output, the full derivative is the gradient.
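
To make this concrete, here is a minimal Python sketch (my own illustration, not from the article) that approximates the gradient with central finite differences; the helper `numerical_gradient` and the test point are assumptions chosen for demonstration:

```python
# Approximate the gradient of a scalar-valued multivariable function
# with central finite differences (a minimal sketch for illustration).
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at the point x (a 1-D numpy array)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        step = np.zeros_like(x, dtype=float)
        step[i] = h
        # Central difference: (f(x + h*e_i) - f(x - h*e_i)) / (2h)
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

# Example: f(x, y) = x^2 - x*y, so df/dx = 2x - y and df/dy = -x.
f = lambda p: p[0] ** 2 - p[0] * p[1]
print(numerical_gradient(f, np.array([1.0, 2.0])))  # approx [0.0, -1.0]
```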

Credit To: The gradient

2. Directional Derivatives


Consider some multivariable function:

$$f(x, y) = x^2 - xy$$

We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input in either the $x$ or $y$ direction.
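
For concreteness, with $f(x, y) = x^2 - xy$ as above, those two partial derivatives work out to:

$$\frac{\partial f}{\partial x} = 2x - y, \qquad \frac{\partial f}{\partial y} = -x$$

So at a point like $(1, 2)$, nudging in the $x$-direction changes $f$ at a rate of $0$, and nudging in the $y$-direction changes it at a rate of $-1$.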

The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.

For example, the image below shows the graph of $f$ along with a small step along a vector $\overrightarrow{v}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\overrightarrow{v}$ compares to the height of the graph above its tail?

[Figure: the graph of $f$ with a small step along $\overrightarrow{v}$ in the $xy$-plane]

As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.

Just as the partial derivative is taken with respect to some input variable, e.g., $x$ or $y$, the directional derivative is taken along some vector $\overrightarrow{v}$ in the input space.

One very helpful way to think about this is to picture a point in the input space moving with velocity $\overrightarrow{v}$.
The directional derivative of $f$ along $\overrightarrow{v}$ is the resulting rate of change of the function's output.

3. Compute the Directional Derivative


Let's say you have a multivariable function $f(x, y, z)$, which takes in three variables, $x$, $y$ and $z$, and you want to compute its directional derivative (the rate of change of the function's output) along the following vector:

$$\overrightarrow{v} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$$

The answer, as it turns out, is

$$\nabla_{\overrightarrow{v}} f = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$$

This should make sense because a tiny nudge along $\overrightarrow{v}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction.

More generally, we can write the vector $\overrightarrow{v}$ abstractly as follows:

$$\overrightarrow{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

The directional derivative looks like this:

$$\nabla_{\overrightarrow{v}} f = v_1\frac{\partial f}{\partial x} + v_2\frac{\partial f}{\partial y} + v_3\frac{\partial f}{\partial z}$$

This can be written in a super-pleasing compact way using the dot product and the gradient:


$$\nabla_{\overrightarrow{v}} f = \nabla f \cdot \overrightarrow{v}$$
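
As a quick sanity check, here is a minimal Python sketch of this dot-product formula; the gradient values are hypothetical, chosen only for illustration, while the direction vector is the $(2, 3, -1)$ from the text:

```python
# The directional derivative as a dot product between the gradient
# and the direction vector (a minimal sketch for illustration).
import numpy as np

def directional_derivative(grad, v):
    """Rate of change of f along v, given the gradient of f at a point."""
    return np.dot(grad, v)

# Suppose the gradient of some f at a point is (df/dx, df/dy, df/dz):
grad_f = np.array([4.0, -2.0, 0.5])   # hypothetical values for illustration
v = np.array([2.0, 3.0, -1.0])
print(directional_derivative(grad_f, v))  # 2*4 + 3*(-2) + (-1)*0.5 = 1.5
```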

Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, $34.8^\circ$ clockwise from the $x$-axis... Madness!

Credit To: Directional derivatives (introduction)

4. Difference Between the Gradient and the Directional Derivative


For those who are a little confused about the difference between the gradient and the directional derivative:

In the case given in the video, the gradient is a vector whose components are scalars, each representing the rate of change of the function along the standard unit vectors of whatever basis is being used. (A lot of the time it's the Cartesian plane, and the unit basis vectors are $i$, $j$ and $k$.)

The gradient only tells us how the function is changing with respect to the axes of our coordinate system. But our mathematical interests hardly ever lie solely along those axes, so we need the directional derivative.

The directional derivative is a scalar value which represents the rate of change of the function along a direction which is typically NOT in the direction of one of the standard basis vectors.

In conclusion, if you want to find the derivative of a multivariable function along a vector $V$, first find the unit vector $u$ in the direction of $V$, and then take $\nabla f \cdot u$. If $u = \langle a, b \rangle$, then $\nabla f \cdot u = a\,\frac{\partial f}{\partial x} + b\,\frac{\partial f}{\partial y}$.
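
In code, that recipe looks like this minimal sketch (the gradient and $V$ are hypothetical values for illustration):

```python
# Normalize V to a unit vector u, then dot the gradient with u
# (a minimal sketch of the recipe above; all values are illustrative).
import numpy as np

def directional_derivative_unit(grad, V):
    """Directional derivative along the *direction* of V (unit-vector form)."""
    u = V / np.linalg.norm(V)  # unit vector in the direction of V
    return np.dot(grad, u)

grad_f = np.array([3.0, 4.0])          # hypothetical gradient (df/dx, df/dy)
V = np.array([1.0, 1.0])
print(directional_derivative_unit(grad_f, V))  # (3 + 4)/sqrt(2) ≈ 4.95
```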


This video shows us how a CNN learns through gradient descent, i.e., how gradient descent changes each weight; the model in question consists entirely of fully connected layers.

$\nabla C$ is the gradient of the cost function at the point $(1, 1)$. It tells us the direction in which the cost function's output increases fastest, so we simply subtract it. This changes the values of $x$ and $y$ at the same time, and by different amounts: $x$ may increase while $y$ decreases, or $x$ may decrease while $y$ increases.
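
Here is a minimal gradient-descent sketch in Python (my own illustration, not the video's code; the quadratic cost function is hypothetical, chosen so the gradient is easy to compute by hand). Each step subtracts the gradient, so both coordinates change at once and by different amounts, exactly as described above:

```python
# Gradient descent on a hypothetical cost function C(x, y)
# (a minimal sketch for illustration).
import numpy as np

def cost(p):
    # Hypothetical cost function chosen only for illustration.
    return (p[0] - 3) ** 2 + 2 * (p[1] + 1) ** 2

def grad_cost(p):
    # Its gradient, computed by hand: (dC/dx, dC/dy).
    return np.array([2 * (p[0] - 3), 4 * (p[1] + 1)])

p = np.array([1.0, 1.0])        # start at the point (1, 1) as in the text
lr = 0.1                        # learning rate (step size)
for _ in range(100):
    p = p - lr * grad_cost(p)   # step against the gradient
print(p, cost(p))               # converges toward (3, -1), where C is minimal
```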

After watching the video above, this next one is even better; it explains in detail how a CNN learns: What is backpropagation really doing?
