1. requires_grad
If we want gradients to be computed with respect to a tensor, we need to set requires_grad=True on it; tensors are created with requires_grad=False by default.
- If requires_grad=True is not set, calling backward() later raises an error.
(1) Default setting (requires_grad not specified)
import torch
from torch import nn
x = torch.ones(5)
y = 2*torch.dot(x,x)
y.backward()
print(f"x.grad={x.grad}")
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
(2)requires_grad=False
import torch
from torch import nn
x_false = torch.ones(5, requires_grad=False)
y_false = 2 * torch.dot(x_false, x_false)
y_false.backward()
print(f"x_false.grad={x_false.grad}")
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
(3)requires_grad=True
import torch
from torch import nn
x_true = torch.ones(5, requires_grad=True)
y_true = 2 * torch.dot(x_true,x_true)
y_true.backward()
print(f"x_true.grad={x_true.grad}")
print(f"4*x_true={4*x_true}")
print(f"x_true.grad==4*x_true={x_true.grad==4*x_true}")
x_true.grad=tensor([4., 4., 4., 4., 4.])
4*x_true=tensor([4., 4., 4., 4., 4.], grad_fn=<MulBackward0>)
x_true.grad==4*x_true=tensor([True, True, True, True, True])
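This matches the math: y_true = 2 * torch.dot(x_true, x_true) = 2 * sum_i x_true[i]^2, so dy_true/dx_true[i] = 4 * x_true[i]. With x_true all ones, every component of the gradient is 4, which is exactly what x_true.grad shows.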
2. grad_fn and grad
grad: after y.backward() has been executed, the gradient of x is available as x.grad. grad_fn records how a variable was produced, i.e. which operation created this node of the computation graph, so that the subsequent backward pass knows how to propagate gradients.
Take the example z = 2*x^2 + 6. From this formula we can see:
- x: the very bottom of the graph (a leaf tensor created directly by the user, the lowly workhorse of the whole chain), so x.grad_fn=None
- y = 2*x^2: it comes from a multiplication, so y.grad_fn = MulBackward
- z = y + 6: it comes from an addition, so z.grad_fn = AddBackward
import torch

x_true = torch.ones(5, requires_grad=True)
y_true = 2 * torch.dot(x_true, x_true)
z_true = y_true + 6
z_true.backward()
print(f"x_true.grad={x_true.grad}")
print(f"x_true.grad_fn={x_true.grad_fn}")
print(f"y_true.grad_fn={y_true.grad_fn}")
print(f"z_true.grad_fn={z_true.grad_fn}")
Output:
x_true.grad=tensor([4., 4., 4., 4., 4.])
x_true.grad_fn=None
y_true.grad_fn=<MulBackward0 object at 0x00000180E0DF3550>
z_true.grad_fn=<AddBackward0 object at 0x00000180E0DF3550>
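A small follow-up sketch (assuming the tensors from the snippet above are still in scope): leaf tensors created directly by the user report is_leaf=True and have no grad_fn, while intermediate results are non-leaf nodes that carry one.
print(f"x_true.is_leaf={x_true.is_leaf}")  # True: created by the user
print(f"y_true.is_leaf={y_true.is_leaf}")  # False: result of a multiplication
print(f"z_true.is_leaf={z_true.is_leaf}")  # False: result of an addition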
3. with torch.no_grad()
torch.no_grad is a context manager that disables gradient computation. Disabling gradient computation is useful for inference, when you are sure you will not call Tensor.backward(); it reduces the memory consumption of computations that would otherwise have requires_grad=True. There are two ways to use it:
- as a context manager: with torch.no_grad():
- as a decorator: @torch.no_grad()
import torch

x = torch.tensor([1.0], requires_grad=True)

# Context-manager form: gradient tracking is disabled inside the block
with torch.no_grad():
    y = x * 2
print(f"y.requires_grad={y.requires_grad}")

# Decorator form: gradient tracking is disabled inside the decorated function
@torch.no_grad()
def doubler(x):
    return x * 2

z = doubler(x)
print(f"z.requires_grad={z.requires_grad}")
4. tensor.detach()
Returns a new tensor that is detached from the current computation graph.
import torch
from torch import nn
x = torch.ones(5)
y = torch.zeros(3)
w = torch.randn((5, 3), requires_grad=True)
b = torch.randn(3, requires_grad=True)
z = torch.matmul(x, w) + b
loss = torch.nn.functional.binary_cross_entropy_with_logits(z,y)
print(f"z.grad_fn={z.grad_fn}")
print(f"loss.grad_fn={loss.grad_fn}")
loss.backward()
print(f"w.grad={w.grad}")
print(f"b.grad={b.grad}")
print(f"z.requires_grad={z.requires_grad}")
z_dect = z.detach()
print(f"z_dect.requires_grad={z_dect.requires_grad}")
5. Summary
There are the following reasons to disable gradient tracking:
(1) to mark some parameters of a neural network as frozen parameters, a very common scenario when fine-tuning a pretrained network (see the sketch below);
(2) to speed up computation when only the forward pass is needed, because computations on tensors that do not track gradients are more efficient.
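A minimal freezing sketch (the two-layer backbone below is a hypothetical stand-in for a pretrained network, not code from this article): turning off requires_grad on the backbone means no gradients are computed or stored for it, while the newly added head still trains normally.
import torch
from torch import nn

backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU())  # stand-in for a pretrained feature extractor
head = nn.Linear(16, 2)                                  # new task-specific layer

for p in backbone.parameters():
    p.requires_grad = False  # freeze: autograd will not track these parameters

x = torch.randn(4, 10)
loss = head(backbone(x)).sum()
loss.backward()

print(next(backbone.parameters()).grad)  # None: frozen parameters receive no gradient
print(head.weight.grad is not None)      # True: only the head accumulates gradients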