-
import torch

a = torch.ones((4, 8)) * 6
b = torch.ones(8) * 4
b[2] = 2
a / b  # / broadcasts: shapes must line up from the trailing dimension (or be 1 / missing); here each row of a is divided element-wise by b
-
a @ b.T  # b is 1-D, so .T is a no-op here; this is the matrix-vector product a @ b -> shape (4,)
# equivalent to
a.matmul(b.T)  # does not modify a
a.mean(0)  # mean over dim 0 -> shape (8,)
-
import pprint as pp

x = torch.tensor([2.], requires_grad=True)
y = x * x * 4
y.backward()       # fine without grad_tensors because y has a single element
pp.pprint(x.grad)  # dy/dx = 8x = 16 -> tensor([16.])
-
We can use nn.Linear(H_in, H_out) to create a linear layer. This will take a matrix of (N, *, H_in) dimensions and output a matrix of (N, *, H_out) dimensions. The * denotes that there could be an arbitrary number of dimensions in between. The linear layer performs the operation Ax+b, where A and b are initialized randomly. Aha! The linear layer only operates on each word separately (i.e. on the last dimension), so the first and second dimensions are left unchanged. -
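A minimal sketch, with arbitrary sizes, checking that nn.Linear only transforms the last dimension: -
import torch
import torch.nn as nn
linear = nn.Linear(16, 32)     # H_in=16, H_out=32
inp = torch.randn(4, 5, 16)    # (N, *, H_in), here * is a sequence of 5 words
out = linear(inp)              # the linear map is applied to the last dimension only
print(out.shape)               # torch.Size([4, 5, 32]); the first two dims are unchanged
-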
list(conv1.parameters()) or conv1.named_parameters() lists the layer's parameters. in_channels controls the first dimension of each convolution kernel; kernel_size controls dimensions 2 and 3 (height and width). -
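A quick sketch (the conv1 sizes here are made up) showing where in_channels and kernel_size appear in the parameter shapes: -
import torch.nn as nn
conv1 = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=(5, 5))  # made-up sizes
for name, p in conv1.named_parameters():
    print(name, p.shape)
# weight torch.Size([8, 3, 5, 5])  i.e. (out_channels, in_channels, kH, kW)
# bias torch.Size([8])
-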
Mathematically, an embedding is just a single fully connected layer whose input is a one-hot vector. In other words, there is no such thing as an "embedding"; there are only one-hots. #DL -
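A small sketch checking this claim: an nn.Embedding lookup gives the same result as multiplying a one-hot vector by the embedding weight matrix (vocabulary and embedding sizes are arbitrary): -
import torch
import torch.nn as nn
import torch.nn.functional as F
emb = nn.Embedding(10, 4)                              # vocab size 10, embedding dim 4
idx = torch.tensor([3])
one_hot = F.one_hot(idx, num_classes=10).float()       # shape (1, 10)
print(torch.allclose(emb(idx), one_hot @ emb.weight))  # True: lookup == one-hot times weight
-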
PyTorch autograd and backward, explained in detail -
Tensors have requires_grad=False by default. -
retain_graph: normally PyTorch destroys the computation graph automatically after one call to backward, so to call backward repeatedly on some variable this argument must be set to True. A separate point: after backward, grad is None for non-leaf nodes and is only kept for leaf nodes; call y.retain_grad() beforehand to keep it. -
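A minimal sketch of both points, using a throwaway tensor: -
import torch
x = torch.tensor([2.0], requires_grad=True)  # leaf node
z = x * 3                                    # non-leaf node
z.retain_grad()                              # keep z.grad after backward
y = z.sum()
y.backward(retain_graph=True)                # keep the graph so backward can run again
print(x.grad, z.grad)                        # tensor([3.]) tensor([1.])
y.backward()                                 # works because the graph was retained; grads accumulate
print(x.grad)                                # tensor([6.])
-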
RuntimeError: grad can be implicitly created only for scalar outputs. So the tensor passed to backward must be a scalar? Not quite: you can pass grad_tensors:
-
x = torch.ones(2,requires_grad=True)
z = x + 2
z.backward(torch.ones_like(z)) # grad_tensors must have the same shape as the tensor being differentiated
print(x.grad)
# output: tensor([1., 1.])
-
A comment on that post: if you know the chain rule for vectors, this is easy to understand. Suppose backward is called at z; the grad_tensors argument should be the gradient of a scalar objective f with respect to z, so that ∂f/∂x = (∂z/∂x) · ∂f/∂z, and the second factor is exactly the grad_tensors that is passed in. From the PyTorch docs: “The graph is differentiated using the chain rule. If any of tensors are non-scalar (i.e. their data has more than one element) and require gradient, then the Jacobian-vector product would be computed, in this case the function additionally requires specifying grad_tensors. It should be a sequence of matching length, that contains the “vector” in the Jacobian-vector product, usually the gradient of the differentiated function w.r.t. corresponding tensors (None is an acceptable value for all tensors that don’t need gradient tensors).” -
The computation graph only produces z; how f is obtained from z is chosen by you. In other words, ∂f/∂z is user-specified; in the usual case f is simply the sum of all elements of z, so ∂f/∂z is a vector of ones. -
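A small sketch of that last point: passing a vector of ones as grad_tensors gives the same gradients as calling backward on z.sum(): -
import torch
x1 = torch.ones(2, requires_grad=True)
(x1 + 2).backward(torch.ones(2))   # Jacobian-vector product with a vector of ones
x2 = torch.ones(2, requires_grad=True)
(x2 + 2).sum().backward()          # f = sum of all elements of z
print(x1.grad, x2.grad)            # tensor([1., 1.]) tensor([1., 1.]); identical
-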
1 in {1:2, 2:3} is True, no need to .keys() -
The model first touches tensors at the embedding layer, so the embedding weights are leaf nodes and backpropagation computes gradients all the way back to them, which means every parameter along that path gets updated. Of course, the parameters of the other parts of the model are leaf nodes too, so partial derivatives are computed there as well. -
For layers like torch.nn.Conv1d, the parameters' requires_grad defaults to True. -
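A quick check of the defaults mentioned above, with arbitrary layer sizes; a plain tensor has requires_grad=False, while the parameters of a Conv1d default to True and are leaf nodes: -
import torch
import torch.nn as nn
t = torch.ones(3)
conv = nn.Conv1d(in_channels=2, out_channels=4, kernel_size=3)
print(t.requires_grad)                                  # False
print(all(p.requires_grad for p in conv.parameters()))  # True
print(conv.weight.is_leaf)                              # True: gradients are stored here
-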
>>> from functools import partial
# freeze (pre-fill) some arguments
>>> inttwo = partial(int,base=2)
>>> inttwo("10101")
21
-
We can create these windows by using for loops, but there is a faster PyTorch alternative, which is the unfold(dimension, size, step) method. -
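A short sketch of unfold on a throwaway 1-D tensor (window size 3, step 1): -
import torch
t = torch.arange(6)          # tensor([0, 1, 2, 3, 4, 5])
windows = t.unfold(0, 3, 1)  # dimension=0, size=3, step=1
print(windows)
# tensor([[0, 1, 2],
#         [1, 2, 3],
#         [2, 3, 4],
#         [3, 4, 5]])
-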
if self.freeze_embeddings:
    self.embed_layer.weight.requires_grad = False
-
def train_epoch(loss_function, optimizer, model, loader):
    total_loss = 0
    for batch_inputs, batch_labels, batch_lengths in loader:
        optimizer.zero_grad()
        outputs = model(batch_inputs)  # calling the module directly is preferred over model.forward(...)
        loss = loss_function(outputs, batch_labels, batch_lengths)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    return total_loss

def train(loss_function, optimizer, model, loader, num_epochs=10000):
    for epoch in range(num_epochs):
        epoch_loss = train_epoch(loss_function, optimizer, model, loader)
        if epoch % 100 == 0: print(epoch_loss)
-
test_corpus = ["She comes from Paris"]
test_sentences = [s.lower().split() for s in test_corpus]
test_labels = [[0, 0, 0, 1]]
test_data = list(zip(test_sentences, test_labels))
batch_size = 1
shuffle = False
window_size = 2
collate_fn = partial(custom_collate_fn, window_size=window_size, word_to_ix=word_to_ix)
test_loader = torch.utils.data.DataLoader(test_data,
                                          batch_size=batch_size,
                                          shuffle=shuffle,
                                          collate_fn=collate_fn)
Reference: https://colab.research.google.com/drive/1Z6K6nwbb69XfuInMx7igAp-NNVj_2xc3?usp=sharing