
[Artificial Intelligence] Using and Visualizing Several Learning-Rate Scheduler Strategies in PyTorch

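Note: the examples in this post use AlexNet as the network architecture (it expects 227x227x3 input images), CIFAR10 as the dataset, and SGD as the optimizer.

File layout when running the programs: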

/content/drive/MyDrive/coder/Simple-CV-Pytorch-master
|
|----AlexNet----train.py (train_adjust_learning_rate.py, train_MultiStepLR.py, etc.)
|
|----tensorboard (folder for TensorBoard logs)
|
|----checkpoint (folder for saved models)
|
|----data (folder containing the dataset)
|
|----run.ipynb (notebook used to launch the scripts)

A learning rate chosen at the start of training may, at some point, stop reducing the loss, so it has to be adjusted during training. In PyTorch this is what the schedulers are for.

The scheduler strategies live in torch.optim.lr_scheduler.XX.

To change the learning rate directly, without any scheduler, set it on every parameter group of the optimizer:

for param_group in optim.param_groups:
    param_group['lr'] = lr

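Roughly seven approaches are in common use. We go through them one by one, give the code for each, and visualize the result to make it easier to follow:

1. Custom learning-rate decay: adjust_learning_rate()

This function (written by the author) decays the learning rate in steps. After every size epochs (2 here, with the first epoch numbered 0), the learning rate is set to the initial rate multiplied by gamma raised to the number of steps completed so far, i.e. learning_rate * gamma ** ((epoch + 1) // size).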

def adjust_learning_rate(optim, epoch, size=2, gamma=0.1):
    if (epoch + 1) % size == 0:
        pow = (epoch + 1) // size
        lr = learning_rate * np.power(gamma, pow)
        for param_group in optim.param_groups:
            param_group['lr'] = lr
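To check the schedule this function produces without training anything, it can be driven with a stand-in object. The sketch below is an illustration rather than the author's code: the SimpleNamespace object is a hypothetical substitute that only provides the param_groups attribute the function touches, base_lr is passed as an argument instead of being read from a global learning_rate, and plain ** replaces np.power.

```python
from types import SimpleNamespace


def adjust_learning_rate(optim, epoch, base_lr, size=2, gamma=0.1):
    """Set lr to base_lr * gamma ** k once every `size` epochs (k = completed steps)."""
    if (epoch + 1) % size == 0:
        steps = (epoch + 1) // size
        for param_group in optim.param_groups:
            param_group['lr'] = base_lr * gamma ** steps


def preview_schedule(num_epochs=20, base_lr=1e-3):
    # Hypothetical stand-in for a torch optimizer: only `param_groups` is needed here.
    optim = SimpleNamespace(param_groups=[{'lr': base_lr}])
    used = []
    for epoch in range(num_epochs):
        used.append(optim.param_groups[0]['lr'])     # lr in effect during this epoch
        adjust_learning_rate(optim, epoch, base_lr)  # called at the end of the epoch
    return used


schedule = preview_schedule()
for epoch, lr in enumerate(schedule):
    print(f"epoch {epoch:2d}: lr = {lr:.3e}")
```

Because the function is called at the end of each epoch, the first decay takes effect from epoch 2 onward.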

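For an explanation of the training code itself, see my earlier post:

19. First Steps with PyTorch: the Complete Model Workflow (Cleaned-up Code) https://blog.csdn.net/XiaoyYidiaodiao/article/details/122720320?spm=1001.2014.3001.5501

Note: this code is very rough. It is only a draft of how I usually write code and does not follow conventions, but it is enough to get the general idea.

Code: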

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np


def adjust_learning_rate(optim, epoch, size=2, gamma=0.1):
    if (epoch + 1) % size == 0:
        pow = (epoch + 1) // size
        lr = learning_rate * np.power(gamma, pow)
        for param_group in optim.param_groups:
            param_group['lr'] = lr


# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
lr = learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)

# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    adjust_learning_rate(optim, i)
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the program above was run directly from PyCharm.

train.py is inside the AlexNet folder, and the data, tensorboard, and checkpoint folders sit at the same level as AlexNet. The paths therefore go up one level: ../data, ../tensorboard, ../checkpoint.

Output:

[Screenshots of the console output]

Visualization of lr and loss (for lr, look at the faint orange line):

[TensorBoard screenshots of the lr and loss curves]

Analysis:

(1) epochs 0-1: lr is 0.001;
(2) epochs 2-3: lr is 0.0001;
(3) epochs 4-5: lr is 1.0000000000000003e-05;
(4) epochs 6-7: lr is 1.0000000000000002e-06;
(5) epochs 8-9: lr is 1.0000000000000002e-07;
(6) epochs 10-11: lr is 1.0000000000000004e-08;
(7) epochs 12-13: lr is 1.0000000000000005e-09;
(8) epochs 14-15: lr is 1.0000000000000004e-10;
(9) epochs 16-17: lr is 1.0000000000000006e-11;
(10) epochs 18-19: lr is 1.0000000000000006e-12.

The trailing digits in values such as 1.0000000000000003e-05 are floating-point rounding, not part of the schedule.

2. Decay at chosen milestones: MultiStepLR()

scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)

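Parameters:

 def __init__(self, optimizer, milestones, gamma=0.1, last_epoch=-1, verbose=False):
optimizer: the optimizer whose learning rate is scheduled
milestones: list of epoch indices at which the learning rate is decayed
gamma: factor the learning rate is multiplied by at each milestone
last_epoch=-1: index of the last completed epoch, used when resuming training; the default -1 starts from the beginning and is best left unchanged
verbose=False: whether to print a message on every update

For example:
MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)
lr=1e-3, 20 epochs, milestones=[5, 10, 15], gamma=0.1
epoch <= 4, lr=1e-3
5 <= epoch <= 9, lr=1e-4
10 <= epoch <= 14, lr=1e-5
15 <= epoch < 20, lr=1e-6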
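The table above can be reproduced with a closed-form expression: the learning rate is the initial rate multiplied by gamma once for every milestone already reached. The sketch below is plain Python written for previewing the schedule, not PyTorch's implementation; multistep_lr is a hypothetical helper name.

```python
from bisect import bisect_right


def multistep_lr(epoch, base_lr=1e-3, milestones=(5, 10, 15), gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under a milestone schedule."""
    passed = bisect_right(sorted(milestones), epoch)  # milestones already reached
    return base_lr * gamma ** passed


multistep_schedule = [multistep_lr(e) for e in range(20)]
for e in (0, 4, 5, 9, 10, 14, 15, 19):
    print(f"epoch {e:2d}: lr = {multistep_schedule[e]:.1e}")
```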

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.MultiStepLR(optim, milestones=[5, 10, 15], gamma=0.1, verbose=True)

# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr1: {} | lr2: {} |loss: {} | np.mean(loss): {} "
                    .format(i, iter, scheduler.get_lr()[0], scheduler.get_last_lr()[0], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", scheduler.get_lr()[0], i)
    writer.add_scalar("lr_last", scheduler.get_last_lr()[0], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the program above was launched from a .ipynb notebook with:

import os
os.chdir("/content/drive/MyDrive/coder/Simple-CV-Pytorch-master")
!python AlexNet/train.py

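In other words, although train.py is inside the AlexNet folder, the working directory of the run is /content/drive/MyDrive/coder/Simple-CV-Pytorch-master. The data, tensorboard, and checkpoint folders are therefore referenced directly as data, tensorboard, and checkpoint, without the ../ prefix used in ../data, ../tensorboard, and ../checkpoint.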
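Since the relative paths depend on the directory the script is launched from, one option is to derive them from the script's own location instead. This is a minimal sketch, assuming the folder layout shown at the top of this post; project_dirs is a hypothetical helper name and is not part of the original code.

```python
from pathlib import Path


def project_dirs(script_path):
    """Return the project-level data, tensorboard, and checkpoint folders."""
    # script_path is e.g. <root>/AlexNet/train.py, so parents[1] is <root>
    root = Path(script_path).resolve().parents[1]
    return {name: root / name for name in ("data", "tensorboard", "checkpoint")}


# Inside train.py one would pass __file__; a literal path is used here for illustration.
dirs = project_dirs("/content/drive/MyDrive/coder/Simple-CV-Pytorch-master/AlexNet/train.py")
for name, path in dirs.items():
    print(name, "->", path)
```

With paths built this way, the same script could be run from PyCharm or from the notebook without editing the path strings.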

To learn how to use Google's servers for free, see my earlier post:

How I, a Broke Student, Use Google's GPUs for Free: A Tutorial https://blog.csdn.net/XiaoyYidiaodiao/article/details/122751289?spm=1001.2014.3001.5501

Output:

[Screenshots of the console output]

Visualization of lr and loss (for lr, look at the faint orange line):

[TensorBoard screenshots of the lr, lr_last, and loss curves]

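Analysis:

lr: scheduler.get_lr()[0]

(1) epochs 0-4: lr is 1e-3;
(2) epoch 5: lr is 1e-5;
(3) epochs 6-9: lr is 1e-4;
(4) epoch 10: lr is 1e-6;
(5) epochs 11-14: lr is 1e-5;
(6) epoch 15: lr is 1e-7;
(7) epochs 16-19: lr is 1e-6.

lr_last: scheduler.get_last_lr()[0]
(1) epochs 0-4: lr_last is 1e-3;
(2) epochs 5-9: lr_last is 1e-4;
(3) epochs 10-14: lr_last is 1e-5;
(4) epochs 15-19: lr_last is 1e-6.

The learning rate actually in effect is therefore the one reported by scheduler.get_last_lr()[0]. get_lr() is meant to be called from inside scheduler.step(); called directly at a milestone epoch, it multiplies the already-decayed rate by gamma once more, which is why it reports 1e-5, 1e-6, and 1e-7 at epochs 5, 10, and 15.

3. Decay every fixed number of epochs: StepLR()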

scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.2)

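Parameters:

def __init__(self, optimizer, step_size, gamma=0.1, last_epoch=-1, verbose=False):

optimizer: the optimizer whose learning rate is scheduled
step_size: number of epochs between decays
gamma: factor the learning rate is multiplied by every step_size epochs
last_epoch=-1: best left unchanged
verbose=False: whether to print a message on every update

For example:
StepLR(optim, step_size=5, gamma=0.2)
lr=1e-3, 20 epochs, step_size=5, gamma=0.2
epoch < 5, lr=1e-3
5 <= epoch < 10, lr=2e-4
10 <= epoch < 15, lr=4e-5
15 <= epoch < 20, lr=8e-6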
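The same values follow from a one-line closed form: the initial rate multiplied by gamma once per completed block of step_size epochs. The sketch below is plain Python for previewing the schedule, not PyTorch's implementation; step_lr is a hypothetical helper name.

```python
def step_lr(epoch, base_lr=1e-3, step_size=5, gamma=0.2):
    """Learning rate in effect during `epoch` (0-based) when decaying every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)


step_schedule = [step_lr(e) for e in range(20)]
for e in (0, 4, 5, 10, 15, 19):
    print(f"epoch {e:2d}: lr = {step_schedule[e]:.1e}")
```

With step_size=5 over 20 epochs this gives the same values as a milestone schedule with milestones at 5, 10, and 15 and gamma=0.2.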

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.StepLR(optim, step_size=5, gamma=0.2)
# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, scheduler.get_last_lr()[0], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", scheduler.get_last_lr()[0], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Note: the program above was run directly from PyCharm (my Google GPU quota had run out).

train.py is inside the AlexNet folder, and the data, tensorboard, and checkpoint folders sit at the same level as AlexNet. The paths therefore go up one level: ../data, ../tensorboard, ../checkpoint.

Output:

[Screenshots of the console output]

Visualization of lr and loss (for lr, look at the faint orange line):

[TensorBoard screenshots of the lr and loss curves]

Analysis:

lr_last: scheduler.get_last_lr()[0]
(1) epochs 0-4: lr_last is 1e-3;
(2) epochs 5-9: lr_last is 2e-4;
(3) epochs 10-14: lr_last is 4e-5;
(4) epochs 15-19: lr_last is 8e-6.

4. Learning rate from an anonymous function: LambdaLR()

lambda1 = lambda epoch: (epoch) // 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=lambda1)

Parameters:

def __init__(self, optimizer, lr_lambda, last_epoch=-1, verbose=False):

optimizer: the optimizer whose learning rate is scheduled
lr_lambda: a function, or a list of functions with one per parameter group, that maps the epoch index to a multiplier of the initial learning rate
last_epoch=-1: best left unchanged
verbose=False: whether to print a message on every update

For example:
new_lr = lr_lambda(epoch) * initial_lr

lambda1 = lambda epoch: epoch // 2
LambdaLR(optimizer, lr_lambda=lambda1, last_epoch=-1)
epoch=0: new_lr = (0 // 2) * 0.001 = 0 * 0.001 = 0
epoch=1: new_lr = (1 // 2) * 0.001 = 0 * 0.001 = 0
epoch=2: new_lr = (2 // 2) * 0.001 = 1 * 0.001 = 0.001
epoch=3: new_lr = (3 // 2) * 0.001 = 1 * 0.001 = 0.001
epoch=4: new_lr = (4 // 2) * 0.001 = 2 * 0.001 = 0.002
...

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
lambda1 = lambda epoch: (epoch) // 2
scheduler = torch.optim.lr_scheduler.LambdaLR(optim, lr_lambda=lambda1)
# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Output:

[Screenshots of the console output]

Visualization of lr and loss:

[TensorBoard screenshots of the lr and loss curves]

Analysis:

new_lr = lr_lambda(epoch) * initial_lr

lambda1 = lambda epoch: epoch // 2
LambdaLR(optimizer, lr_lambda=lambda1, last_epoch=-1)
epoch=0: new_lr = (0 // 2) * 0.001 = 0 * 0.001 = 0
epoch=1: new_lr = (1 // 2) * 0.001 = 0 * 0.001 = 0
epoch=2: new_lr = (2 // 2) * 0.001 = 1 * 0.001 = 0.001
epoch=3: new_lr = (3 // 2) * 0.001 = 1 * 0.001 = 0.001
epoch=4: new_lr = (4 // 2) * 0.001 = 2 * 0.001 = 0.002
epoch=5: new_lr = (5 // 2) * 0.001 = 2 * 0.001 = 0.002
epoch=6: new_lr = (6 // 2) * 0.001 = 3 * 0.001 = 0.003
epoch=7: new_lr = (7 // 2) * 0.001 = 3 * 0.001 = 0.003
epoch=8: new_lr = (8 // 2) * 0.001 = 4 * 0.001 = 0.004
epoch=9: new_lr = (9 // 2) * 0.001 = 4 * 0.001 = 0.004
epoch=10: new_lr = (10 // 2) * 0.001 = 5 * 0.001 = 0.005
epoch=11: new_lr = (11 // 2) * 0.001 = 5 * 0.001 = 0.005
epoch=12: new_lr = (12 // 2) * 0.001 = 6 * 0.001 = 0.006
epoch=13: new_lr = (13 // 2) * 0.001 = 6 * 0.001 = 0.006
epoch=14: new_lr = (14 // 2) * 0.001 = 7 * 0.001 = 0.007
epoch=15: new_lr = (15 // 2) * 0.001 = 7 * 0.001 = 0.007
epoch=16: new_lr = (16 // 2) * 0.001 = 8 * 0.001 = 0.008
epoch=17: new_lr = (17 // 2) * 0.001 = 8 * 0.001 = 0.008
epoch=18: new_lr = (18 // 2) * 0.001 = 9 * 0.001 = 0.009
epoch=19: new_lr = (19 // 2) * 0.001 = 9 * 0.001 = 0.009
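With lambda epoch: epoch // 2 the multiplier is 0 for epochs 0 and 1, so the weights do not change during those epochs, and afterwards the rate grows instead of decaying. The sketch below previews the multipliers of that lambda next to a decaying alternative, using new_lr = lr_lambda(epoch) * initial_lr. The name decay_lambda and its 0.9 factor are illustrative assumptions, not something from the original code.

```python
initial_lr = 1e-3

growing_lambda = lambda epoch: epoch // 2   # the lambda used in this post
decay_lambda = lambda epoch: 0.9 ** epoch   # hypothetical alternative: 10% decay per epoch


def preview(lr_lambda, num_epochs=20):
    """new_lr = lr_lambda(epoch) * initial_lr for each epoch."""
    return [lr_lambda(epoch) * initial_lr for epoch in range(num_epochs)]


growing = preview(growing_lambda)
decaying = preview(decay_lambda)
print("growing :", [f"{lr:.3f}" for lr in growing[:6]])
print("decaying:", [f"{lr:.2e}" for lr in decaying[:6]])
```

Either function could be passed as lr_lambda to torch.optim.lr_scheduler.LambdaLR.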

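5. Adaptive learning-rate adjustment: ReduceLROnPlateau()

This strategy monitors a performance metric of the model. When the metric stops improving and stays that way for more than patience epochs, the learning rate is reduced automatically.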

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=3, verbose=True)

scheduler.step(np.mean(loss))

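Parameters:

 def __init__(self, optimizer, mode='min', factor=0.1, patience=10,
                 threshold=1e-4, threshold_mode='rel', cooldown=0,
                 min_lr=0, eps=1e-8, verbose=False):

optimizer: the optimizer whose learning rate is scheduled
mode: with min, the learning rate is reduced when the metric (e.g. loss) stops decreasing; with max, when the metric (e.g. accuracy) stops increasing
factor: factor by which the learning rate is reduced, new_lr = lr * factor; default 0.1
patience: number of epochs with no improvement to tolerate before reducing the learning rate; default 10
threshold: only changes larger than this threshold count as an improvement; default 1e-4
threshold_mode: how the threshold is applied, either rel or abs;
rel: in max mode a value above best * (1 + threshold) counts as an improvement, in min mode a value below best * (1 - threshold);
abs: in max mode a value above best + threshold counts as an improvement, in min mode a value below best - threshold
cooldown: number of epochs to wait after a reduction before monitoring resumes, which keeps lr from dropping too quickly; default 0
min_lr=0: lower bound on the learning rate; default 0
eps=1e-8: if the difference between the old and new learning rate is smaller than eps, the update is skipped; default 1e-8
verbose=False: whether to print a message on every update

For example:
ReduceLROnPlateau(optim, patience=3, verbose=True)
Once the loss has failed to improve for more than 3 (patience) epochs, the learning rate is reduced automatically.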
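The decision rule can be traced on a made-up loss sequence. The sketch below is a simplified re-implementation written for illustration, not PyTorch's code: it covers only mode='min' with threshold_mode='rel', and it ignores cooldown, min_lr, and eps. The function name simulate_plateau and the loss values are assumptions.

```python
def simulate_plateau(losses, lr=1e-3, factor=0.1, patience=3, threshold=1e-4):
    """Return the learning rate in effect after each reported loss."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best * (1 - threshold):  # significant improvement ('rel' rule, 'min' mode)
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:          # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Made-up losses: improve for three epochs, then plateau at 0.7
plateau_losses = [1.0, 0.8, 0.7] + [0.7] * 8
plateau_history = simulate_plateau(plateau_losses)
for epoch, (loss, lr) in enumerate(zip(plateau_losses, plateau_history)):
    print(f"epoch {epoch:2d}: loss = {loss:.2f}, lr = {lr:.1e}")
```

In this trace the first reduction happens on the fourth consecutive epoch without improvement, i.e. one epoch after patience is exhausted.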

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optim, patience=3, verbose=True)
# 8. Set some parameters to control loop
# epoch
epoch = 20
iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    loss = []  # per-batch losses of this epoch
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        loss.append(loss_train.item())
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                            np.mean(loss)))

    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    # Step on the mean loss of the whole epoch rather than on the last batch only
    scheduler.step(np.mean(loss))
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Output:

[Screenshots of the console output]

Visualization of lr and loss (for lr, look at the faint orange line):

[TensorBoard screenshots of the lr and loss curves]

6. Exponential learning-rate decay: ExponentialLR()

scheduler = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.2)

Parameters:

def __init__(self, optimizer, gamma, last_epoch=-1, verbose=False):

optimizer: the optimizer whose learning rate is scheduled
gamma: multiplicative decay factor applied to the learning rate every epoch
last_epoch=-1: best left unchanged
verbose=False: whether to print a message on every update

For example:
lr=1e-3, ExponentialLR(optim, gamma=0.2)
new_lr = lr * gamma^(epoch)
epoch=0: new_lr = 0.001 * 0.2^0 = 0.001
epoch=1: new_lr = 0.001 * 0.2^1 = 0.0002
epoch=2: new_lr = 0.001 * 0.2^2 = 4e-5
...

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optim, gamma=0.2)
# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Output:

[Screenshots of the console output]

Visualization of lr and loss (for lr, look at the faint orange line):

[TensorBoard screenshots of the lr and loss curves]

Analysis:

new_lr = lr * gamma^(epoch)
epoch=0: new_lr = 0.001 * 0.2^0 = 0.001
epoch=1: new_lr = 0.001 * 0.2^1 = 0.0002
epoch=2: new_lr = 0.001 * 0.2^2 = 4e-5
epoch=3: new_lr = 0.001 * 0.2^3 = 8e-6
epoch=4: new_lr = 0.001 * 0.2^4 = 1.6e-6
epoch=5: new_lr = 0.001 * 0.2^5 = 3.2e-7
epoch=6: new_lr = 0.001 * 0.2^6 = 6.4e-8
epoch=7: new_lr = 0.001 * 0.2^7 = 1.28e-8
epoch=8: new_lr = 0.001 * 0.2^8 = 2.56e-9
epoch=9: new_lr = 0.001 * 0.2^9 = 5.12e-10
epoch=10: new_lr = 0.001 * 0.2^10 = 1.024e-10
epoch=11: new_lr = 0.001 * 0.2^11 = 2.048e-11
epoch=12: new_lr = 0.001 * 0.2^12 = 4.096e-12
epoch=13: new_lr = 0.001 * 0.2^13 = 8.192e-13
epoch=14: new_lr = 0.001 * 0.2^14 = 1.6384e-13
epoch=15: new_lr = 0.001 * 0.2^15 = 3.2768e-14
epoch=16: new_lr = 0.001 * 0.2^16 = 6.5536e-15
epoch=17: new_lr = 0.001 * 0.2^17 = 1.31072e-15
epoch=18: new_lr = 0.001 * 0.2^18 = 2.62144e-16
epoch=19: new_lr = 0.001 * 0.2^19 = 5.24288e-17
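With gamma=0.2 the learning rate is already below 1e-9 by epoch 9, so the second half of the run trains with an effectively zero step. If a particular final learning rate is wanted instead, gamma can be solved from new_lr = lr * gamma^(epoch). A small sketch follows; gamma_for_target is a hypothetical helper name, and the target of 1e-5 is only an example value.

```python
def gamma_for_target(initial_lr, target_lr, last_epoch):
    """Solve target_lr = initial_lr * gamma ** last_epoch for gamma."""
    return (target_lr / initial_lr) ** (1.0 / last_epoch)


gamma = gamma_for_target(initial_lr=1e-3, target_lr=1e-5, last_epoch=19)
exp_schedule = [1e-3 * gamma ** epoch for epoch in range(20)]
print(f"gamma = {gamma:.4f}")
print(f"lr at epoch 0 = {exp_schedule[0]:.1e}, lr at epoch 19 = {exp_schedule[19]:.1e}")
```

The resulting gamma could then be passed to torch.optim.lr_scheduler.ExponentialLR in place of 0.2.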

7. Cosine-annealing learning rate: CosineAnnealingLR()

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=5)

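Parameters:

def __init__(self, optimizer, T_max, eta_min=0, last_epoch=-1, verbose=False):

The learning rate under cosine annealing varies periodically.
optimizer: the optimizer whose learning rate is scheduled
T_max (int): half of the cosine period, in epochs; the learning rate falls from its initial value to eta_min over T_max epochs and rises back over the next T_max epochs
eta_min (float): minimum learning rate reached within a period; default 0
last_epoch=-1: index of the last completed epoch, used when resuming training; best left unchanged
verbose=False: whether to print a message on every update

For example:
CosineAnnealingLR(optim, T_max=5)
new_lr = eta_min + 0.5 * (initial_lr - eta_min) * (1 + cos(epoch / T_max * pi))
eta_min is the minimum learning rate and T_max is half of the cosine period.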
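Evaluating the formula above for 20 epochs makes the period visible: with T_max=5 the rate reaches eta_min at epochs 5 and 15 and returns to the initial value at epoch 10. The sketch is plain Python applying the formula directly, not PyTorch's implementation; cosine_lr is a hypothetical helper name.

```python
import math


def cosine_lr(epoch, initial_lr=1e-3, T_max=5, eta_min=0.0):
    """new_lr = eta_min + 0.5 * (initial_lr - eta_min) * (1 + cos(epoch / T_max * pi))"""
    return eta_min + 0.5 * (initial_lr - eta_min) * (1 + math.cos(epoch / T_max * math.pi))


cosine_schedule = [cosine_lr(e) for e in range(20)]
for e, lr in enumerate(cosine_schedule):
    print(f"epoch {e:2d}: lr = {lr:.2e}")
```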

Code:

from torch.utils.data import DataLoader
from torchvision.models import AlexNet
from torchvision import transforms
import torchvision
import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter
import time
import numpy as np

# 1.Create SummaryWriter
writer = SummaryWriter("../tensorboard")

# 2.Ready dataset
train_dataset = torchvision.datasets.CIFAR10(root="../data", train=True, transform=transforms.Compose(
    [transforms.Resize(227), transforms.ToTensor()]), download=True)

print('CUDA available: {}'.format(torch.cuda.is_available()))

# 3.Length
train_dataset_size = len(train_dataset)
print("the train dataset size is {}".format(train_dataset_size))

# 4.DataLoader
train_dataloader = DataLoader(dataset=train_dataset, batch_size=64)

# 5.Create model
model = AlexNet()

if torch.cuda.is_available():
    model = model.cuda()
    model = torch.nn.DataParallel(model).cuda()
else:
    model = torch.nn.DataParallel(model)

# 6.Create loss
cross_entropy_loss = nn.CrossEntropyLoss()

# 7.Optimizer
learning_rate = 1e-3
optim = torch.optim.SGD(model.parameters(), lr=learning_rate)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optim, T_max=5)

# 8. Set some parameters to control loop
# epoch
epoch = 20

iter = 0
t0 = time.time()
for i in range(epoch):
    t1 = time.time()
    print(" -----------------the {} number of training epoch --------------".format(i))
    model.train()
    for data in train_dataloader:
        imgs, targets = data
        if torch.cuda.is_available():
            cross_entropy_loss = cross_entropy_loss.cuda()
            imgs, targets = imgs.cuda(), targets.cuda()
        outputs = model(imgs)
        loss_train = cross_entropy_loss(outputs, targets)
        writer.add_scalar("train_loss", loss_train.item(), iter)
        optim.zero_grad()
        loss_train.backward()
        optim.step()
        iter = iter + 1
        if iter % 100 == 0:
            print(
                "Epoch: {} | Iteration: {} | lr: {} | loss: {} | np.mean(loss): {} "
                    .format(i, iter, optim.param_groups[0]['lr'], loss_train.item(),
                            np.mean(loss_train.item())))

    writer.add_scalar("lr", optim.param_groups[0]['lr'], i)
    scheduler.step()
    t2 = time.time()
    h = (t2 - t1) // 3600
    m = ((t2 - t1) % 3600) // 60
    s = ((t2 - t1) % 3600) % 60
    print("epoch {} is finished, and time is {}h{}m{}s".format(i, int(h), int(m), int(s)))

    if i % 1 == 0:
        print("Save state, iter: {} ".format(i))
        torch.save(model.state_dict(), "../checkpoint/AlexNet_{}.pth".format(i))

torch.save(model.state_dict(), "../checkpoint/AlexNet.pth")
t3 = time.time()
h_t = (t3 - t0) // 3600
m_t = ((t3 - t0) % 3600) // 60
s_t = ((t3 - t0) % 3600) % 60
print("The finished time is {}h{}m{}s".format(int(h_t), int(m_t), int(s_t)))
writer.close()

Output:

[Screenshots of the console output]

Visualization of lr and loss:

[TensorBoard screenshots of the lr and loss curves]

That's a wrap!

Added: 2022-05-13 11:44:22  Updated: 2022-05-13 11:48:05
 