[AI] Gradient and Directional Derivatives--How CNNs Learn
Before reading this article, read this one first.

1. Gradient
In particular, this means $\nabla f$ is a vector-valued function (while $f$ itself is scalar-valued).
Credit To: The gradient

2. Directional Derivatives

Consider some multivariable function:

$$f(x, y) = x^2 - xy$$

We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input either in the $x$ or $y$ direction. The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes. For example, the image below shows the graph of $f$ along with a small step along a vector $\vec{v}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\vec{v}$ compares to the height of the graph above its tail?

One very helpful way to think about this is to picture a point in the input space moving with velocity $\vec{v}$.

3. Compute the Directional Derivative

Let's say you have a multivariable function $f(x, y, z)$, which takes in three variables ($x$, $y$ and $z$), and you want to compute its directional derivative (the rate of change of the function's output) along the following vector:

$$\vec{v} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$$

The answer, as it turns out, is

$$\nabla_{\vec{v}} f = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$$

This should make sense, because a tiny nudge along $\vec{v}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction.

More generally, we can write the vector $\vec{v}$ abstractly as follows:

$$\vec{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

The directional derivative then looks like this:

$$\nabla_{\vec{v}} f = v_1\frac{\partial f}{\partial x} + v_2\frac{\partial f}{\partial y} + v_3\frac{\partial f}{\partial z}$$

This can be written in a super-pleasing compact way using the dot product and the gradient:

$$\nabla_{\vec{v}} f = \nabla f \cdot \vec{v}$$
Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, $34.8^\circ$ clockwise from the $x$-axis... Madness!

Credit To: Directional derivatives (introduction)

4. Difference Between the Gradient and the Directional Derivative

For those who are a little confused about the difference between the gradient and the directional derivative: in the case given in the video, the gradient is a vector whose components are scalars, each representing the rate of change of the function along one of the standard unit vectors of whatever basis is being used (a lot of the time it is the Cartesian plane, and the unit basis vectors are $i$, $j$ and $k$). The gradient only tells us how the function is changing with respect to the axes of our coordinate system. But it is hardly the case that our mathematical interests lie solely on the axes of our coordinate system, which is why we need the directional derivative. The directional derivative is a scalar value which represents the rate of change of the function along a direction which is typically NOT the direction of one of the standard basis vectors.

In conclusion, if you want to find the derivative of a multivariable function along a vector $V$, first find the unit vector $u$ in the direction of $V$, then take $\nabla f \cdot u$. If $u = \langle a, b \rangle$, then $\nabla f \cdot u = a \frac{\partial f}{\partial x} + b \frac{\partial f}{\partial y}$.
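As a sketch of that recipe (assuming NumPy, reusing the $f(x, y) = x^2 - xy$ example from above, and picking a hypothetical direction $V = \langle 1, 2 \rangle$):

```python
import numpy as np

def grad_f(x, y):
    """Analytic gradient of f(x, y) = x**2 - x*y:
    df/dx = 2x - y,  df/dy = -x."""
    return np.array([2*x - y, -x])

def directional_derivative(x, y, v):
    """Rate of change of f along v: normalize v to a unit vector u,
    then take (grad f) . u."""
    u = np.asarray(v, dtype=float)
    u = u / np.linalg.norm(u)
    return grad_f(x, y) @ u

# Slope of f at (1, 1) in the direction of V = <1, 2>:
print(directional_derivative(1.0, 1.0, [1.0, 2.0]))  # -1/sqrt(5) ~ -0.447
```

Note the normalization step: with the unit vector $u$, the result is the slope per unit of distance moved, independent of the magnitude of $V$.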
5. How a CNN Learns

This video shows how a CNN learns through gradient descent, i.e. how gradient descent changes each weight. The model in the video consists entirely of fully connected layers.

$\nabla C$ is the gradient of the cost function at the point $(1, 1)$. It tells us the direction in which the cost function's output increases fastest, so we simply subtract it. This changes $x$ and $y$ at the same time, and by different amounts: $x$ may increase while $y$ decreases, or $x$ may decrease while $y$ increases.

After the video above, this next one is even better; it explains in detail how a CNN learns: What is backpropagation really doing?
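A minimal sketch of that update rule, assuming a simple hypothetical quadratic cost $C(x, y) = x^2 + 2y^2$ in place of a real network's cost, and an assumed learning rate of 0.1:

```python
import numpy as np

def grad_C(w):
    """Analytic gradient of the hypothetical cost C(x, y) = x**2 + 2*y**2."""
    x, y = w
    return np.array([2*x, 4*y])

w = np.array([1.0, 1.0])  # start at the point (1, 1), as in the text
lr = 0.1                  # learning rate (step size), an assumed value

for _ in range(50):
    # grad_C points in the direction of fastest increase of the cost,
    # so we subtract it. Note that x and y change by different amounts
    # each step (here x shrinks by 20% per step, y by 40%).
    w = w - lr * grad_C(w)

print(w)  # approaches the minimum at (0, 0)
```

In a real network the same subtraction is applied to every weight, with $\nabla C$ computed by backpropagation, which is exactly what the second video covers.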