1. Hooks on torch.Tensor

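When using PyTorch, only the gradients of leaf nodes that require gradients are kept. A leaf node is one whose value is specified directly rather than computed from other variables, such as the network input. The gradients of all other, intermediate nodes are released automatically once backpropagation finishes, in order to save GPU memory.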

For example:

import torch

x=torch.Tensor([1,2]).requires_grad_(True)
y=torch.Tensor([3,4]).requires_grad_(True)
z=((y-x)**2).mean()
# z.retain_grad()
z.backward()

print('x.requires_grad:',x.requires_grad)
print('y.requires_grad:',y.requires_grad)
print('z.requires_grad:',z.requires_grad)

print('x.grad:',x.grad)
print('y.grad:',y.grad)
print('z.grad:',z.grad)

Output:

x.requires_grad: True
y.requires_grad: True
z.requires_grad: True
x.grad: tensor([-2., -2.])
y.grad: tensor([2., 2.])
/home/wangguoyu/test.py:14: UserWarning: The .grad attribute of a Tensor that is not a leaf Tensor is being accessed. Its .grad attribute won't be populated during autograd.backward(). If you indeed want the gradient for a non-leaf Tensor, use .retain_grad() on the non-leaf Tensor. If you access the non-leaf Tensor by mistake, make sure you access the leaf Tensor instead. See github.com/pytorch/pytorch/pull/30531 for more information.
  print('z.grad:',z.grad)
z.grad: None

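Here x and y are leaf nodes, so their grad is kept after backward. z has requires_grad set to True, but because it is not a leaf node its gradient is not kept. If we really need the gradient of a non-leaf node, we have to call retain_grad on it before backward (that is, uncomment the line in the code above); z.grad is then accessible. However, a gradient kept with retain_grad occupies GPU memory. If we do not want to pay that cost, we can use a hook instead. For an intermediate tensor a, a.register_hook(hook_fn) lets us operate on its gradient (modify it, save it, and so on). Here hook_fn is a user-defined function with the following signature: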

hook_fn(grad) -> Tensor or None

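Its input is the grad of a. If it returns a Tensor, that Tensor replaces a's original grad and is propagated onward through the rest of the backward pass. If it returns nothing or returns None, a's grad is left unchanged and propagation continues as usual.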
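To make the "save" use case concrete, here is a minimal sketch of a hook that only records the gradient of an intermediate tensor and returns None. The names saved, save_grad, and d are illustrative assumptions and do not come from the original examples.

```python
import torch

saved = {}  # hypothetical container for captured gradients

def save_grad(grad):
    # Keep a detached copy; returning None leaves the gradient unchanged.
    saved['d'] = grad.detach().clone()

x = torch.tensor([1., 2.], requires_grad=True)
y = torch.tensor([3., 4.], requires_grad=True)
d = y - x              # intermediate (non-leaf) tensor
z = (d ** 2).mean()

d.register_hook(save_grad)
z.backward()

print('grad of d:', saved['d'])   # gradient of z with respect to d
print('x.grad:', x.grad)
print('y.grad:', y.grad)
```

Note that the stored copy occupies memory for as long as saved holds it, so a hook like this is best used to keep only the values that are actually needed.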

import torch

def hook_fn(grad):
  print('here is the hook_fn')
  print(grad)
  
x=torch.Tensor([1,2]).requires_grad_(True)
y=torch.Tensor([3,4]).requires_grad_(True)
z=((y-x)**2).mean()

z.register_hook(hook_fn)

print('before backward')
z.backward()
print('after backward')

print('x.requires_grad:',x.requires_grad)
print('y.requires_grad:',y.requires_grad)
print('z.requires_grad:',z.requires_grad)

print('x.grad:',x.grad)
print('y.grad:',y.grad)
print('z.grad:',z.grad)

Output:

before backward
here is the hook_fn
tensor(1.)
after backward
x.requires_grad: True
y.requires_grad: True
z.requires_grad: True
x.grad: tensor([-2., -2.])
y.grad: tensor([2., 2.])
z.grad: None

As we can see, once hook_fn is bound to z, z's grad is printed during backward. Because the hook returns None, z's grad is left unchanged. Next, we change z's grad:

import torch

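def hook_fn(grad):
  # Build a new tensor instead of using grad*=2: a hook should not modify its argument in place
  grad = grad * 2
  print('here is the hook_fn')
  print(grad)
  return grad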
  
x=torch.Tensor([1,2]).requires_grad_(True)
y=torch.Tensor([3,4]).requires_grad_(True)
z=((y-x)**2).mean()

# z.register_hook(lambda x: 2*x)
z.register_hook(hook_fn)

print('before backward')
z.backward()
print('after backward')

print('x.requires_grad:',x.requires_grad)
print('y.requires_grad:',y.requires_grad)
print('z.requires_grad:',z.requires_grad)

print('x.grad:',x.grad)
print('y.grad:',y.grad)
print('z.grad:',z.grad)

Output:

before backward
here is the hook_fn
tensor(2.)
after backward
x.requires_grad: True
y.requires_grad: True
z.requires_grad: True
x.grad: tensor([-4., -4.])
y.grad: tensor([4., 4.])
z.grad: None
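The effect of the returned gradient can also be checked directly. The sketch below uses the lambda form that appears commented out in the code above and compares the leaf gradients with and without the hook. The helper grads and its parameter scale_hook are hypothetical names introduced only for this illustration.

```python
import torch

def grads(scale_hook=None):
    # Rebuild the same small graph each time so the two runs are independent.
    x = torch.tensor([1., 2.], requires_grad=True)
    y = torch.tensor([3., 4.], requires_grad=True)
    z = ((y - x) ** 2).mean()
    if scale_hook is not None:
        z.register_hook(scale_hook)
    z.backward()
    return x.grad, y.grad

base_x, base_y = grads()                   # no hook
hook_x, hook_y = grads(lambda g: 2 * g)    # hook doubles z's gradient

print('x.grad without hook:', base_x)
print('x.grad with hook:   ', hook_x)
print('y.grad without hook:', base_y)
print('y.grad with hook:   ', hook_y)
```

Because the hook replaces z's gradient before it is propagated to x and y, both leaf gradients come out exactly twice as large, which matches the output shown above.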