Kaggle Dogs vs. Cats: CNN classification with PyTorch (2): main program, training function, accuracy curve plotting

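The main program is the trunk of the training code and everything else hangs off it; to run the project you normally just run main.py. Without further ado, here is the main program code (the complete code is at the end):

Main program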

if __name__ == '__main__':
    # os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

    ## about model
    num_classes = 2

    ## about data
    data_dir = "../data/"
    input_size = 224
    batch_size = 36

    ## about training
    num_epochs = 30
    lr = 0.001
    log_interval = 200

    ## model initialization from models.py
    model = models.model_A(num_classes=num_classes)
    print(model)

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # if torch.cuda.device_count() > 1:
    #     print("Let's use", torch.cuda.device_count(), "GPUs!")
    #     model = nn.DataParallel(model)
    print('\ncurrent device is', device)
    model = model.to(device)

    ## data preparation
    train_loader, valid_loader = data.load_data(data_dir=data_dir, input_size=input_size, batch_size=batch_size)

    ## optimizer
    optimizer = optim.Adam(model.parameters(), lr=lr)

    ## loss function
    criterion = nn.CrossEntropyLoss()

    ## train the model
    train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=num_epochs)
           

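Here is a step-by-step breakdown of the main program:

(1) Environment setup: CUDA_VISIBLE_DEVICES lists the indices of the GPUs to use. I only have one GPU, so this line is commented out.

(2) Data parameters: the data path, the standard input size (the images come in different sizes, so they have to be resized to a common one first), and batch_size (covered in the previous part: the number of samples in each batch).

(3) Training parameters:

num_epochs: the maximum number of training epochs. While debugging the network, do not set it too high; around 10 is enough to see the trend, and you can raise it later for fine-tuning.

lr: the learning rate. If the curve fluctuates a lot (even though it converges), lower the learning rate; a learning rate that is too low slows convergence. A fixed learning rate is used here.

log_interval: the number of batches between progress printouts (it usually does not need tuning; a value that is too small slows processing, and it should not be too large either).

(4) Model loading: num_classes is the number of classes (2 for binary classification, more for multi-class problems).

(5) Device selection: GPU or CPU.

(6) Data preparation: call the data.load_data function defined in data.py in the previous part to create the two data loaders, train_loader and valid_loader.

(7) Optimizer: construct the optimizer by passing it an iterable containing all the parameters to optimize, then specify optimizer-specific options such as the learning rate and weight decay. Adam is used here.

(8) Loss function: cross-entropy is used here.

(9) Call the model training function.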
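The main program defines log_interval, but the training function in this post never reads it. A minimal sketch of how such an interval might be used inside the batch loop follows. The helper names should_log and format_progress, the batch count of 600, and the fixed 30 correct predictions per batch are all made up for illustration; the real loop would update the counters from the model's predictions.

```python
def should_log(batch_idx, log_interval):
    """Return True on every log_interval-th batch (batch_idx is 0-based)."""
    return (batch_idx + 1) % log_interval == 0


def format_progress(batch_idx, num_batches, running_correct, running_total):
    """Build one progress line from the running accuracy counters."""
    acc = 100.0 * running_correct / running_total
    return 'batch %d/%d  running train_acc = %.4f' % (batch_idx + 1, num_batches, acc)


log_interval = 200        # same value as in the main program
batch_size = 36           # same value as in the main program
num_batches = 600         # hypothetical number of batches in one epoch
running_correct = 0
running_total = 0
for batch_idx in range(num_batches):
    # Stand-in for the real forward/backward pass: pretend 30 of 36 are correct.
    running_correct += 30
    running_total += batch_size
    if should_log(batch_idx, log_interval):
        print(format_progress(batch_idx, num_batches, running_correct, running_total))
```

Printing only every log_interval batches keeps the console readable and avoids spending time on output in every iteration, which matches the advice in (3) not to set the value too low.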

Model training function

def train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=20):
    def train():
        model.train(True)                             # training mode
        train_correct = 0                             # number of correctly classified training samples
        train_total = 0                               # number of training samples seen so far
        for i, data in enumerate(train_loader):       # go through the training set batch by batch
            inputs, labels = data                     # inputs: images, labels: class indices
            inputs = inputs.to(device)                # the Variable wrapper has been unnecessary since PyTorch 0.4
            labels = labels.to(device)
            optimizer.zero_grad()                     # clear the gradients left over from the previous step
            outputs = model(inputs)                   # forward pass
            loss = criterion(outputs, labels)         # compute the loss
            loss.backward()                           # backward pass: compute the gradients
            optimizer.step()                          # update the weights
            ## Write your code here to return the accuracy rate.
            train_total += labels.size(0)             # add this batch's sample count to the total
            _, train_predictions = torch.max(outputs, 1)          # predicted class index for every sample in the batch
            train_correct += (train_predictions == labels).sum().item()   # count the correct predictions
        train_acc = 100.0 * train_correct / train_total        # accuracy in percent, as a Python float
        print('train_acc =', ('%.4f' % train_acc))             # four decimal places
        return train_acc

    def valid():
        model.train(False)                            # evaluation mode, same as model.eval()
        valid_correct = 0
        valid_total = 0
        with torch.no_grad():                         # no gradients are needed for evaluation
            for inputs, labels in valid_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                ## Write your code here to return the accuracy rate.
                valid_total += labels.size(0)
                _, valid_predictions = torch.max(outputs, 1)
                valid_correct += (valid_predictions == labels).sum().item()
        valid_acc = 100.0 * valid_correct / valid_total
        print('valid_acc =', ('%.4f' % valid_acc))
        return valid_acc

    train_acc = np.zeros(num_epochs)
    valid_acc = np.zeros(num_epochs)

    best_valid_acc = 0.0
    for epoch in range(num_epochs):                # one pass over the data per epoch
        print('The', epoch + 1, 'time of training and testing')
        train_acc[epoch] = train()
        valid_acc[epoch] = valid()
        if valid_acc[epoch] > best_valid_acc:       # keep the model with the best validation accuracy so far
            best_valid_acc = valid_acc[epoch]       # update the record so worse epochs do not overwrite the file
            torch.save(model, 'best_model.pt')
    plot_curves(num_epochs, train_acc, valid_acc)      # plot the accuracy curves
           

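Two functions are defined here: train, which trains on the training set, and valid, which evaluates on the validation set. In each epoch the model is first trained and then evaluated; if the validation accuracy beats the best one so far, the model is saved as the best model. After all epochs finish, the accuracy curves are plotted. (There are also ways to plot the accuracy curves in real time, but they seem to need extra software, so I took the lazy route.)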
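The "keep the best model" step can also be written so that only the weights are stored. The sketch below is an alternative to saving the whole model object, not the method used in this post. The helper update_best, the tiny nn.Linear stand-in network, and the three accuracy values are assumptions made for illustration.

```python
import copy

import torch
import torch.nn as nn


def update_best(model, valid_acc, best_acc, best_state, path=None):
    """Return the updated (best_acc, best_state) after one epoch.

    A deep copy is taken because state_dict() returns references to the live
    parameter tensors, which keep changing as training continues.
    """
    if valid_acc > best_acc:
        best_acc = valid_acc
        best_state = copy.deepcopy(model.state_dict())
        if path is not None:
            torch.save(best_state, path)   # stores only the weights
    return best_acc, best_state


# Tiny stand-in network and made-up validation accuracies, for illustration only.
model = nn.Linear(4, 2)
best_acc, best_state = 0.0, None
for valid_acc in [61.0, 74.5, 70.2]:
    best_acc, best_state = update_best(model, valid_acc, best_acc, best_state)
    with torch.no_grad():
        model.weight.add_(1.0)             # stand-in for further training

print(best_acc)

# Restore the weights from the best epoch into the model.
model.load_state_dict(best_state)
```

Comparing against a running best value, rather than against zero, is what prevents a later, worse epoch from overwriting the saved weights.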

Accuracy curve plotting

def plot_curves(epoch, train_acc, valid_acc):
    ## your code here
    epochs = range(1, epoch + 1)
    plt.plot(epochs, train_acc, ls='-', lw=2, label='train accuracy', color='b')   # training accuracy curve, blue
    plt.plot(epochs, valid_acc, ls='-', lw=2, label='valid accuracy', color='r')   # validation accuracy curve, red
    plt.legend()                                                                   # show the legend
    train_max_indx = np.argmax(train_acc)                                          # index of the best epoch
    valid_max_indx = np.argmax(valid_acc)
    plt.plot(train_max_indx + 1, train_acc[train_max_indx], 'ks')                  # mark the best points
    plt.plot(valid_max_indx + 1, valid_acc[valid_max_indx], 'gs')
    show_max_1 = '[Best Training accuracy ' + ('%.4f' % train_acc[train_max_indx]) + ']'      # labels for the best accuracies
    show_max_2 = '[Best Validation accuracy ' + ('%.4f' % valid_acc[valid_max_indx]) + ']'
    plt.annotate(show_max_1, xy=(train_max_indx + 1, train_acc[train_max_indx]), xytext=(25, 75))   # xytext: label position in data coordinates
    plt.annotate(show_max_2, xy=(valid_max_indx + 1, valid_acc[valid_max_indx]), xytext=(25, 70))
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.savefig('./acc.png')
    plt.show()
    print('The accuracy curves are saved.\n')
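The label positions passed to plt.annotate above are fixed data coordinates, which only look right for about 30 epochs and accuracies in the 70s. One possible alternative is to place each label at an offset from the point it marks. In the sketch below, the helper annotate_best, the accuracy values, the offsets, and the output file name acc_sketch.png are all made up for illustration.

```python
import matplotlib
matplotlib.use('Agg')                 # non-interactive backend, so no display is required
import matplotlib.pyplot as plt
import numpy as np


def annotate_best(ax, acc, label, offset):
    """Mark the highest point of an accuracy curve and label it.

    offset is given in points relative to the marked point, so the label stays
    next to the marker whatever the number of epochs or the accuracy range.
    """
    best = int(np.argmax(acc))
    ax.plot(best + 1, acc[best], 's')
    ax.annotate('[%s %.4f]' % (label, acc[best]),
                xy=(best + 1, acc[best]),
                xytext=offset, textcoords='offset points')
    return best + 1, float(acc[best])


# Made-up accuracies, for illustration only.
train_acc = np.array([60.0, 72.5, 80.1, 85.3, 88.0])
valid_acc = np.array([58.0, 70.2, 76.4, 75.9, 74.8])
epochs = range(1, len(train_acc) + 1)

fig, ax = plt.subplots()
ax.plot(epochs, train_acc, 'b-', lw=2, label='train accuracy')
ax.plot(epochs, valid_acc, 'r-', lw=2, label='valid accuracy')
best_train = annotate_best(ax, train_acc, 'Best Training accuracy', (-120, 8))
best_valid = annotate_best(ax, valid_acc, 'Best Validation accuracy', (-120, -16))
ax.set_xlabel('Epochs')
ax.set_ylabel('Accuracy')
ax.legend()
fig.savefig('acc_sketch.png')
```

The helper returns the best epoch (1-based) and its accuracy, so the same values can be printed or logged alongside the figure.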
           

main.py

import torch
import torch.nn as nn
import torch.optim as optim
import data
import models
import os
import time
import matplotlib.pyplot as plt
import numpy as np
import torchvision
from torchvision import transforms
from PIL import Image
from torch.autograd import Variable


# "data" and "models" are our own modules (data.py and models.py)

## Note that: here we provide a basic solution for training and validation.
## You can directly change it if you find something wrong or not good enough.

def train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=20):
    def train():
        model.train(True)                             # training mode
        train_correct = 0                             # number of correctly classified training samples
        train_total = 0                               # number of training samples seen so far
        for i, data in enumerate(train_loader):       # go through the training set batch by batch
            inputs, labels = data                     # inputs: images, labels: class indices
            inputs = inputs.to(device)                # the Variable wrapper has been unnecessary since PyTorch 0.4
            labels = labels.to(device)
            optimizer.zero_grad()                     # clear the gradients left over from the previous step
            outputs = model(inputs)                   # forward pass
            loss = criterion(outputs, labels)         # compute the loss
            loss.backward()                           # backward pass: compute the gradients
            optimizer.step()                          # update the weights
            ## Write your code here to return the accuracy rate.
            train_total += labels.size(0)             # add this batch's sample count to the total
            _, train_predictions = torch.max(outputs, 1)          # predicted class index for every sample in the batch
            train_correct += (train_predictions == labels).sum().item()   # count the correct predictions
        train_acc = 100.0 * train_correct / train_total        # accuracy in percent, as a Python float
        print('train_acc =', ('%.4f' % train_acc))             # four decimal places
        return train_acc

    def valid():
        model.train(False)                            # evaluation mode, same as model.eval()
        valid_correct = 0
        valid_total = 0
        with torch.no_grad():                         # no gradients are needed for evaluation
            for inputs, labels in valid_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                ## Write your code here to return the accuracy rate.
                valid_total += labels.size(0)
                _, valid_predictions = torch.max(outputs, 1)
                valid_correct += (valid_predictions == labels).sum().item()
        valid_acc = 100.0 * valid_correct / valid_total
        print('valid_acc =', ('%.4f' % valid_acc))
        return valid_acc

    train_acc = np.zeros(num_epochs)
    valid_acc = np.zeros(num_epochs)

    best_valid_acc = 0.0
    for epoch in range(num_epochs):                # one pass over the data per epoch
        print('The', epoch + 1, 'time of training and testing')
        train_acc[epoch] = train()
        valid_acc[epoch] = valid()
        if valid_acc[epoch] > best_valid_acc:       # keep the model with the best validation accuracy so far
            best_valid_acc = valid_acc[epoch]       # update the record so worse epochs do not overwrite the file
            torch.save(model, 'best_model.pt')
    plot_curves(num_epochs, train_acc, valid_acc)      # plot the accuracy curves


## plot training and validation curves and save as .png
## please complete this function.

def plot_curves(epoch, train_acc, valid_acc):
    ## your code here
    epochs = range(1, epoch + 1)
    plt.plot(epochs, train_acc, ls='-', lw=2, label='train accuracy', color='b')   # training accuracy curve, blue
    plt.plot(epochs, valid_acc, ls='-', lw=2, label='valid accuracy', color='r')   # validation accuracy curve, red
    plt.legend()                                                                   # show the legend
    train_max_indx = np.argmax(train_acc)                                          # index of the best epoch
    valid_max_indx = np.argmax(valid_acc)
    plt.plot(train_max_indx + 1, train_acc[train_max_indx], 'ks')                  # mark the best points
    plt.plot(valid_max_indx + 1, valid_acc[valid_max_indx], 'gs')
    show_max_1 = '[Best Training accuracy ' + ('%.4f' % train_acc[train_max_indx]) + ']'      # labels for the best accuracies
    show_max_2 = '[Best Validation accuracy ' + ('%.4f' % valid_acc[valid_max_indx]) + ']'
    plt.annotate(show_max_1, xy=(train_max_indx + 1, train_acc[train_max_indx]), xytext=(25, 75))   # xytext: label position in data coordinates
    plt.annotate(show_max_2, xy=(valid_max_indx + 1, valid_acc[valid_max_indx]), xytext=(25, 70))
    plt.title('Training and Validation Accuracy')
    plt.xlabel('Epochs')
    plt.ylabel('Accuracy')
    plt.savefig('./acc.png')
    plt.show()
    print('The accuracy curves are saved.\n')


if __name__ == '__main__':
    # os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3,4,5,6,7"

    ## about model
    num_classes = 2

    ## about data
    data_dir = "../data/"
    input_size = 224
    batch_size = 36

    ## about training
    num_epochs = 30
    lr = 0.0002
    log_interval = 200

    ## model initialization from models.py
    model = models.model_A(num_classes=num_classes)
    print(model)

    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # if torch.cuda.device_count() > 1:
    #     print("Let's use", torch.cuda.device_count(), "GPUs!")
    #     model = nn.DataParallel(model)
    print('\ncurrent device is', device)
    model = model.to(device)

    ## data preparation
    train_loader, valid_loader = data.load_data(data_dir=data_dir, input_size=input_size, batch_size=batch_size)

    ## optimizer
    optimizer = optim.Adam(model.parameters(), lr=lr)

    ## loss function
    criterion = nn.CrossEntropyLoss()

    ## train the model
    train_model(model, train_loader, valid_loader, criterion, optimizer, num_epochs=num_epochs)