[Artificial Intelligence] An annotated walkthrough of the DCGAN demo code (PyTorch)

Source code: https://pytorch.org/tutorials/beginner/dcgan_faces_tutorial.html

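1 Downloading and reading the CelebA dataset

1.1 Downloading the dataset

CelebA is a large-scale face dataset with more than 200K celebrity images, each annotated with 40 attributes. The images cover large pose variations and background clutter. CelebA is diverse, large, and richly annotated, including:

10,177 identities,
202,599 face images, and
5 landmark locations and 40 binary attribute annotations per image.

The dataset can serve as a training and test set for the following computer vision tasks: face attribute recognition, face detection, landmark (or facial part) localization, and face editing and synthesis.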

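Official download: http://mmlab.ie.cuhk.edu.hk/projects/CelebA.html

Baidu Netdisk: https://pan.baidu.com/s/1wULCjsqPh1bowOYMBZ0IEw  Extraction code: 1ndf

1.2 Storing the dataset

After downloading and unzipping img_align_celeba.zip, the images sit at .\img_align_celeba\xxx.jpg. Note that torchvision.datasets.ImageFolder, the dataset class used to read the images, expects every class label to be a subfolder of its root directory, in the layout shown below. So wrap the img_align_celeba folder in one more folder and pass that outer folder as ImageFolder's root argument, giving .\data\img_align_celeba\xxx.jpg. Because my computer cannot train on the full dataset, I took only the first 20,000 images and stored them at .\dataset\data\img_align_celeba\xxx.jpg, which is the dataroot = "dataset/data" used later.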

img_file:
	label1:
		1.jpg
		2.jpg
		3.jpg
	label2:
		1...
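
The subset step described above can be sketched with the standard library alone. This is a minimal sketch, not the author's actual script: the helper name copy_first_n and its parameters are hypothetical, and it assumes the images carry zero-padded file names so that sorting by name gives the "first" images. The demonstration at the end runs on a throwaway directory with empty placeholder files, so it is safe to execute.

```python
import shutil
import tempfile
from pathlib import Path


def copy_first_n(src_dir, dst_root, n, class_name="img_align_celeba"):
    """Copy the first n .jpg files (sorted by name) into dst_root/class_name/.

    dst_root is what would later be passed to ImageFolder as root;
    class_name becomes the single "label" folder that ImageFolder expects.
    """
    dst_dir = Path(dst_root) / class_name
    dst_dir.mkdir(parents=True, exist_ok=True)
    files = sorted(Path(src_dir).glob("*.jpg"))[:n]
    for f in files:
        shutil.copy2(f, dst_dir / f.name)
    return len(files)


# Demonstration on a temporary directory with 5 empty placeholder "images".
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "img_align_celeba"
    src.mkdir()
    for i in range(1, 6):
        (src / f"{i:06d}.jpg").touch()

    dst_root = Path(tmp) / "dataset" / "data"
    copied = copy_first_n(src, dst_root, n=3)
    kept = sorted(p.name for p in (dst_root / "img_align_celeba").iterdir())

print(copied, kept)
```

For the real dataset the call would look like copy_first_n("img_align_celeba", "dataset/data", 20000), with both paths adjusted to wherever the archive was unzipped.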

2 Importing the required packages

from __future__ import print_function
#%matplotlib inline
import argparse
import os
import random
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from IPython.display import HTML

# Set the random seed for reproducibility
manualSeed = 999
#manualSeed = random.randint(1, 10000) # use if you want new results
print("Random Seed: ", manualSeed)
random.seed(manualSeed)
torch.manual_seed(manualSeed)

The output is:

Random Seed:  999
<torch._C.Generator at 0x2389116cf90>

3 Parameter settings and the image dataset path

Pay attention to the directory layout in which the images are stored.

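# Image dataset path: ./dataset/data/img_align_celeba/xxx.jpg
dataroot = "dataset/data"

# Number of worker processes the DataLoader uses to load data.
# (This seems to be where PyCharm raised an error for me. On Windows, num_workers > 0
# needs the script entry point guarded by `if __name__ == "__main__":`; otherwise set workers = 0.)
workers = 2

# Batch size during training. The DCGAN paper uses a batch size of 128.
batch_size = 128

# Spatial size of the training images. This implementation defaults to 64x64.
# For a different size, the structures of D and G must be changed, because their input and output sizes change.
image_size = 64

# Number of color channels in the input images. Color images have 3 channels.
nc = 3

# Length of the latent vector (the dimension of the random noise vector)
nz = 100

# Size of feature maps in generator
ngf = 64

# Size of feature maps in discriminator
ndf = 64

# Number of training epochs. Training longer may give better results, but it also takes longer.
num_epochs = 5

# Learning rate for optimizers
lr = 0.0002

# Beta1 hyperparam for Adam optimizers
beta1 = 0.5

# Number of GPUs available. Use 0 for CPU mode.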
ngpu = 1

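4 Preprocessing the dataset images and splitting the data into batches

Because my computer's compute power is limited, I use only the first 20,000 images for training.

Three important concepts:

Epoch: one pass in which all training samples have been fed to the model
Iteration: one batch of samples fed to the model
Batchsize: the batch size, which determines how many Iterations make up one Epoch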
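
To make the relationship between the three concepts concrete, here is a small arithmetic sketch using the numbers from this post (20,000 images, batch_size = 128, num_epochs = 5). The helper name iterations_per_epoch is hypothetical; the drop_last flag mirrors the DataLoader option of the same name, which defaults to False so the final, smaller batch is kept.

```python
import math


def iterations_per_epoch(num_samples, batch_size, drop_last=False):
    """Number of batches (Iterations) needed to see every sample once."""
    if drop_last:
        # The incomplete final batch is discarded.
        return num_samples // batch_size
    # The incomplete final batch still counts as one Iteration.
    return math.ceil(num_samples / batch_size)


iters_per_epoch = iterations_per_epoch(20000, 128)
last_batch_size = 20000 - (iters_per_epoch - 1) * 128
total_iters = iters_per_epoch * 5

print(iters_per_epoch, last_batch_size, total_iters)  # 157 32 785
```

So with these settings one Epoch is 157 Iterations, the last of which holds only 32 images, and the whole run is 785 Iterations.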

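PyTorch loads data in the following order:

Create a dataset object (this is mainly where the data preprocessing happens)
Create a dataloader object (this splits the data into batches)
Loop over the dataloader object and feed each (data, label) pair to the model for training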

# We can use an image folder dataset the way we have it set up.
# Create the dataset: this applies the image preprocessing and converts the images to tensors
dataset = dset.ImageFolder(root=dataroot,
                               transform=transforms.Compose([
                               transforms.Resize(image_size), # resize so the shorter side is 64 (aspect ratio kept)
                               transforms.CenterCrop(image_size), # crop the central 64x64 region
                               transforms.ToTensor(), # convert to a tensor with values in [0, 1]
                               transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), # normalize each channel to [-1, 1]
                           ]))
# Create the dataloader: it splits the dataset into shuffled batches
dataloader = torch.utils.data.DataLoader(dataset, batch_size=batch_size,
                                         shuffle=True, num_workers=workers)

# Decide which device we want to run on (use the GPU if one is available)
device = torch.device("cuda:0" if (torch.cuda.is_available() and ngpu > 0) else "cpu")

# Plot some training images
real_batch = next(iter(dataloader))  # returns one batch as [images, labels]; images has shape (128, 3, 64, 64)
# print(real_batch[0].shape)
plt.figure(figsize=(8,8))
plt.axis("off")
plt.title("Training Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=2, normalize=True).cpu(),(1,2,0)))
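
The Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) step in the preprocessing above computes (x - mean) / std per channel. A minimal arithmetic sketch, using plain floats instead of tensors (the function names normalize and denormalize are illustrative, not torchvision APIs), shows that pixel values in [0, 1] land in [-1, 1], which is the same range the generator's final Tanh produces:

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-value version of what transforms.Normalize applies to each channel."""
    return (x - mean) / std


def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, useful when converting model-range values back to [0, 1]."""
    return y * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

Black (0.0) maps to -1.0, mid-gray (0.5) to 0.0, and white (1.0) to 1.0. When plotting, vutils.make_grid(..., normalize=True) rescales the values back into a displayable range.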

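Some of the training images are shown below:

5 Implementation (initialization and instantiation)

5.1 Weight initialization

In the DCGAN paper, the authors specify that all model weights should be randomly initialized from a normal distribution with mean 0 and standard deviation 0.02. The weights_init function takes an initialized model as input and reinitializes all convolutional, transposed-convolutional, and batch normalization layers to meet this criterion. The function is applied to the models immediately after they are created.

# custom weights initialization called on netG and netD
# Per the DCGAN paper, all model weights are randomly initialized from a normal distribution with mean=0, stdev=0.02.
# Batch normalization weights are drawn around 1.0 instead (stdev=0.02), and their biases are set to 0.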
def weights_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight.data, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)
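
weights_init decides what to do by searching the layer's class name for a substring, so 'Conv' matches both Conv2d and ConvTranspose2d. The sketch below illustrates only that dispatch logic. It uses empty stand-in classes that merely share their names with the PyTorch layers (an assumption made so the example runs without PyTorch), and the hypothetical init_rule returns a description instead of touching any weights.

```python
# Empty stand-ins: only the class names matter for this illustration.
class Conv2d: pass
class ConvTranspose2d: pass
class BatchNorm2d: pass
class ReLU: pass


def init_rule(m):
    """Return which branch of weights_init a module of this class would take."""
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        return "weight ~ N(0.0, 0.02)"
    elif classname.find('BatchNorm') != -1:
        return "weight ~ N(1.0, 0.02), bias = 0"
    return "unchanged"


rules = {cls.__name__: init_rule(cls())
         for cls in (Conv2d, ConvTranspose2d, BatchNorm2d, ReLU)}
for name, rule in rules.items():
    print(f"{name:16s} {rule}")
```

Layers without parameters, such as ReLU, fall through both branches and are left untouched.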

5.2 Defining and instantiating the generator

# Generator Code
# The generator upsamples with transposed convolutions
class Generator(nn.Module):
    def __init__(self, ngpu):
        super(Generator, self).__init__()
        self.ngpu = ngpu  # Number of GPUs available. Use 0 for CPU mode.
        self.main = nn.Sequential(
            # input is Z (nz = 100 channels, 1 x 1), going into a transposed convolution
            # Signature for reference:
            # torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0,
            #                          output_padding=0, groups=1, bias=True, dilation=1)
            nn.ConvTranspose2d( nz, ngf * 8, 4, 1, 0, bias=False), # (1-1)*1-2*0+4=4
            nn.BatchNorm2d(ngf * 8),
            nn.ReLU(True),
            # state size. (ngf*8) x 4 x 4   4*4*512

            nn.ConvTranspose2d(ngf * 8, ngf * 4, 4, 2, 1, bias=False), # (4-1)*2-2*1+4=8
            nn.BatchNorm2d(ngf * 4),
            nn.ReLU(True),
            # state size. (ngf*4) x 8 x 8   8*8*256
            
            nn.ConvTranspose2d( ngf * 4, ngf * 2, 4, 2, 1, bias=False), # (8-1)*2-2*1+4=16
            nn.BatchNorm2d(ngf * 2),
            nn.ReLU(True),
            # state size. (ngf*2) x 16 x 16  16*16*128
            
            nn.ConvTranspose2d( ngf * 2, ngf, 4, 2, 1, bias=False), # (16-1)*2-2*1+4=32
            nn.BatchNorm2d(ngf),
            nn.ReLU(True),
            # state size. (ngf) x 32 x 32   32*32*64
            
            
            nn.ConvTranspose2d( ngf, nc, 4, 2, 1, bias=False), # (32-1)*2-2*1+4=64
            nn.Tanh()
            # state size. (nc) x 64 x 64   64*64*3
        )

    def forward(self, input):
        return self.main(input)
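
The size arithmetic in the comments of the generator can be checked with a few lines of plain Python. This is a sketch of the output-size formula documented for ConvTranspose2d; the helper name conv_transpose_out is hypothetical, and the layer list simply repeats the (kernel_size, stride, padding) triples from the generator above.

```python
def conv_transpose_out(size, kernel, stride, padding, output_padding=0, dilation=1):
    """Spatial output size of a transposed convolution along one dimension."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel - 1) + output_padding + 1


# (kernel_size, stride, padding) for the five ConvTranspose2d layers of the generator
g_layers = [(4, 1, 0), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1)]

size = 1  # the latent vector enters as a 1 x 1 "image" with nz channels
g_sizes = [size]
for kernel, stride, padding in g_layers:
    size = conv_transpose_out(size, kernel, stride, padding)
    g_sizes.append(size)

print(g_sizes)  # [1, 4, 8, 16, 32, 64]
```

With dilation 1 and no output padding the formula reduces to (size - 1) * stride - 2 * padding + kernel, which is the form used in the inline comments. Each stride-2 layer doubles the spatial size until the 64 x 64 output is reached.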

Instantiating the generator

# Instantiate the generator and apply the weights_init function
# Create the generator
netG = Generator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netG = nn.DataParallel(netG, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights
#  to mean=0, stdev=0.02.
netG.apply(weights_init)

# Print the model
print(netG)

The resulting generator network structure is:

Generator(
  (main): Sequential(
    (0): ConvTranspose2d(100, 512, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (4): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (7): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace=True)
    (9): ConvTranspose2d(128, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (10): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (11): ReLU(inplace=True)
    (12): ConvTranspose2d(64, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (13): Tanh()
  )
)

5.3 Defining and instantiating the discriminator

# Batch norm and leaky ReLU promote healthy gradient flow
class Discriminator(nn.Module):
    def __init__(self, ngpu):
        super(Discriminator, self).__init__()
        self.ngpu = ngpu
        self.main = nn.Sequential(
            # input is (nc) x 64 x 64     a batch has shape 128 x 3 x 64 x 64
            nn.Conv2d(nc, ndf, 4, 2, 1, bias=False), # (64+2*1-4)/2+1=32
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf) x 32 x 32  64*32*32
            
            nn.Conv2d(ndf, ndf * 2, 4, 2, 1, bias=False), # (32+2*1-4)/2+1=16
            nn.BatchNorm2d(ndf * 2),
            nn.LeakyReLU(0.2, inplace=True), 
            # state size. (ndf*2) x 16 x 16  128*16*16
            
            nn.Conv2d(ndf * 2, ndf * 4, 4, 2, 1, bias=False), # (16+2*1-4)/2+1=8
            nn.BatchNorm2d(ndf * 4), # 256
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*4) x 8 x 8   256*8*8
            
            nn.Conv2d(ndf * 4, ndf * 8, 4, 2, 1, bias=False), # (8+2*1-4)/2+1=4
            nn.BatchNorm2d(ndf * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # state size. (ndf*8) x 4 x 4  512*4*4
            
            nn.Conv2d(ndf * 8, 1, 4, 1, 0, bias=False), # (4+2*0-4)/1+1=1
            nn.Sigmoid() # output: 128 x 1 x 1 x 1, one probability per image
        )

    def forward(self, input):
        return self.main(input)
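
The discriminator mirrors the generator: each strided convolution halves the spatial size until a single value per image remains. The sketch below checks this with the output-size formula documented for Conv2d; the helper name conv_out is hypothetical, and the layer list repeats the (kernel_size, stride, padding) triples from the discriminator above.

```python
def conv_out(size, kernel, stride, padding, dilation=1):
    """Spatial output size of a convolution along one dimension (floor division)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


# (kernel_size, stride, padding) for the five Conv2d layers of the discriminator
d_layers = [(4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 2, 1), (4, 1, 0)]

size = 64  # input images are 64 x 64
d_sizes = [size]
for kernel, stride, padding in d_layers:
    size = conv_out(size, kernel, stride, padding)
    d_sizes.append(size)

print(d_sizes)  # [64, 32, 16, 8, 4, 1]
```

The final 1 x 1 output is why the training loop can flatten the discriminator's result with view(-1) into one score per image.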

Instantiating the discriminator:

# Create the Discriminator and apply the weights_init function
netD = Discriminator(ngpu).to(device)

# Handle multi-gpu if desired
if (device.type == 'cuda') and (ngpu > 1):
    netD = nn.DataParallel(netD, list(range(ngpu)))

# Apply the weights_init function to randomly initialize all weights
#  to mean=0, stdev=0.02.
netD.apply(weights_init)

# Print the model
print(netD)

The resulting discriminator network structure is:

Discriminator(
  (main): Sequential(
    (0): Conv2d(3, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): LeakyReLU(negative_slope=0.2, inplace=True)
    (2): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (3): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (4): LeakyReLU(negative_slope=0.2, inplace=True)
    (5): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (6): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): LeakyReLU(negative_slope=0.2, inplace=True)
    (8): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (9): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(512, 1, kernel_size=(4, 4), stride=(1, 1), bias=False)
    (12): Sigmoid()
  )
)

5.4 Defining the loss function and optimizers

# Initialize the Binary Cross Entropy loss (BCELoss) function
criterion = nn.BCELoss()

# Create batch of latent vectors that we will use to visualize
#  the progression of the generator
fixed_noise = torch.randn(64, nz, 1, 1, device=device)

# Establish convention for real and fake labels during training
real_label = 1.
fake_label = 0.

# Setup Adam optimizers for both G and D
optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
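
To see why the label convention matters, it helps to write binary cross entropy out by hand. The sketch below is a plain-Python version of the per-sample formula, -[y*log(x) + (1-y)*log(1-x)], averaged over a batch as nn.BCELoss does by default. The function names and the example scores are illustrative assumptions, not values from a real run.

```python
import math


def bce(prediction, label):
    """Binary cross entropy for one prediction in (0, 1) and a label of 0.0 or 1.0."""
    return -(label * math.log(prediction) + (1 - label) * math.log(1 - prediction))


def bce_batch(predictions, label):
    """Mean BCE over a batch in which every sample shares the same label."""
    return sum(bce(p, label) for p in predictions) / len(predictions)


real_label, fake_label = 1.0, 0.0

# With real_label the loss is -log(D(x)): small when D scores real images near 1.
print(round(bce(0.9, real_label), 4))   # 0.1054
# With fake_label the loss is -log(1 - D(G(z))): large if D scores a fake near 1.
print(round(bce(0.9, fake_label), 4))   # 2.3026
# Hypothetical discriminator scores for a small batch of real images.
print(round(bce_batch([0.9, 0.8, 0.5], real_label), 4))
```

This is how one loss function serves both networks: the discriminator minimizes the loss with real_label on real images and fake_label on generated ones, while the generator minimizes it with real_label on its own generated images.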

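6 Training

Because my computer's compute power is limited, I trained on only the first 20,000 images. Even so, the result images already look like faces, although they are still fairly blurry and distorted.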

# Training Loop
# Training is split into two main parts. Part 1 updates the discriminator, part 2 updates the generator.

# Lists to keep track of progress
img_list = []
G_losses = []
D_losses = []
iters = 0

print("Starting Training Loop...")
# For each epoch
for epoch in range(num_epochs):
    # For each batch in the dataloader
    for i, data in enumerate(dataloader, 0):

        
        ##############################################################
        # (1) Update D network: maximize log(D(x)) + log(1 - D(G(z)))
        # Keep the generator fixed and train the discriminator
        ##############################################################
        
        # Train on the real images
        netD.zero_grad() # zero the gradients
        
        # Format batch
        real_cpu = data[0].to(device)
        b_size = real_cpu.size(0)
        label = torch.full((b_size,), real_label, dtype=torch.float, device=device)
        
        # Forward pass real batch through D
        output = netD(real_cpu).view(-1)
        
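        # Compute the loss on the real images
        errD_real = criterion(output, label)
        # Backpropagate
        errD_real.backward()
        D_x = output.mean().item()

        # Train on the fake images
        # Sample a batch of nz-dimensional (100-dimensional) latent vectors
        noise = torch.randn(b_size, nz, 1, 1, device=device)
        
        # Generate fake images with the generator
        fake = netG(noise)
        label.fill_(fake_label)
        
        # Classify the fake images with the discriminator; detach() keeps gradients from flowing into G
        output = netD(fake.detach()).view(-1)
        # Discriminator loss on the fake images
        errD_fake = criterion(output, label)
        
        # Backpropagate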
        # Calculate the gradients for this batch, accumulated (summed) with previous gradients
        errD_fake.backward()
        D_G_z1 = output.mean().item()
        
        # Compute error of D as sum over the fake and the real batches
        errD = errD_real + errD_fake
        # Update the discriminator
        optimizerD.step()

        ##############################################################
        # (2) Update G network: maximize log(D(G(z)))
        # Keep the discriminator fixed and train the generator
        ##############################################################
        netG.zero_grad() # zero the gradients
        
        label.fill_(real_label)  # fake labels are real for generator cost
        
        # Since we just updated D, perform another forward pass of all-fake batch through D
        output = netD(fake).view(-1)
        # Calculate G's loss based on this output
        errG = criterion(output, label)
        # Calculate gradients for G
        errG.backward()
        D_G_z2 = output.mean().item()
        # Update G
        optimizerG.step()

        # Output training stats
        if i % 50 == 0:
            print('[%d/%d][%d/%d]\tLoss_D: %.4f\tLoss_G: %.4f\tD(x): %.4f\tD(G(z)): %.4f / %.4f'
                  % (epoch, num_epochs, i, len(dataloader),
                     errD.item(), errG.item(), D_x, D_G_z1, D_G_z2))

        # Save Losses for plotting later
        G_losses.append(errG.item())
        D_losses.append(errD.item())

        # Check how the generator is doing by saving G's output on fixed_noise
        if (iters % 500 == 0) or ((epoch == num_epochs-1) and (i == len(dataloader)-1)):
            with torch.no_grad():
                fake = netG(fixed_noise).detach().cpu()
            img_list.append(vutils.make_grid(fake, padding=2, normalize=True))

        iters += 1
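
The bookkeeping conditions in the loop above determine how many log lines are printed and how many image grids end up in img_list. The following dry run reproduces only those conditions, with no model involved. It assumes the settings of this post (20,000 images, batch_size = 128, num_epochs = 5, final partial batch kept), and the names snapshot_iters and log_lines are introduced here purely for illustration.

```python
import math

num_samples, batch_size, num_epochs = 20000, 128, 5
batches_per_epoch = math.ceil(num_samples / batch_size)  # stands in for len(dataloader)

snapshot_iters = []  # global iteration indices at which a grid would be saved
log_lines = 0        # number of times the training stats would be printed
iters = 0
for epoch in range(num_epochs):
    for i in range(batches_per_epoch):
        if i % 50 == 0:
            log_lines += 1
        if (iters % 500 == 0) or ((epoch == num_epochs - 1) and (i == batches_per_epoch - 1)):
            snapshot_iters.append(iters)
        iters += 1

print(snapshot_iters, log_lines, iters)  # [0, 500, 784] 20 785
```

Under these assumptions img_list holds only three grids, so the animation in section 7.2 has three frames. Lowering the 500 in the snapshot condition gives a smoother animation at the cost of more memory.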

The training output is shown below:

7 Results

7.1 Loss versus training iteration

plt.figure(figsize=(10,5))
plt.title("Generator and Discriminator Loss During Training")
plt.plot(G_losses,label="G")
plt.plot(D_losses,label="D")
plt.xlabel("iterations")
plt.ylabel("Loss")
plt.legend()
plt.show()

7.2 Visualizing G's training progress

#%%capture
fig = plt.figure(figsize=(8,8))
plt.axis("off")
ims = [[plt.imshow(np.transpose(i,(1,2,0)), animated=True)] for i in img_list]
ani = animation.ArtistAnimation(fig, ims, interval=1000, repeat_delay=1000, blit=True)

HTML(ani.to_jshtml())

7.3 Real images vs. fake images

# Grab a batch of real images from the dataloader
real_batch = next(iter(dataloader))

# Plot the real images
plt.figure(figsize=(15,15))
plt.subplot(1,2,1)
plt.axis("off")
plt.title("Real Images")
plt.imshow(np.transpose(vutils.make_grid(real_batch[0].to(device)[:64], padding=5, normalize=True).cpu(),(1,2,0)))

# Plot the fake images from the last epoch
plt.subplot(1,2,2)
plt.axis("off")
plt.title("Fake Images")
plt.imshow(np.transpose(img_list[-1],(1,2,0)))
plt.show()