
Multi-label Classification: Explanation and Practice (Keras)

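Contents

Multi-label classification

How to use multi-label classification

A multi-label example

Training

Import libraries and set hyperparameters

Set global parameters

Generate the multi-label labels

Split the training and validation sets

Data augmentation

Set up the callbacks

Set up the model

Train the model and save the final model

Plot the training log

Complete code

Testing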

Multi-label classification problem: in multi-label classification, a single sample can carry more than one label at the same time, i.e. one sample corresponds to several labels.

When making predictions for a multi-label problem, suppose the last layer outputs [-1.0, 5.0, -0.5, 5.0, -0.5]. If we apply the softmax function, the output is:

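import numpy as np

def Softmax_sim(x):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

z = np.array([-1.0, 5.0, -0.5, 5.0, -0.5])
print(Softmax_sim(z))
# Output: [ 0.00123281  0.49735104  0.00203256  0.49735104  0.00203256]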

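Softmax clearly picks out labels 2 and 4. But to turn these scores into predictions we would still need to know how many labels each sample has, or choose a probability threshold. That is not what we want: whether a sample has one label should be independent of whether it has another, yet softmax forces all outputs to sum to 1, so raising one label's probability necessarily lowers the others.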
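To make the coupling concrete, here is a small sketch with made-up logits for a sample that truly carries three labels. Because softmax spreads a total probability of 1 over all outputs, the three positive labels each receive only about a third of the mass, so a fixed 0.5 cutoff (an assumed threshold, not from the original) selects nothing.

```python
import numpy as np

def softmax(x):
    # subtract the max before exponentiating for numerical stability
    e = np.exp(x - np.max(x))
    return e / e.sum()

# hypothetical logits for a sample whose true labels are indices 0, 1 and 2
z = np.array([5.0, 5.0, 5.0, -1.0, -1.0])
p = softmax(z)

print(p.round(3))                # the three positive labels split the mass roughly evenly
print(np.isclose(p.sum(), 1.0))  # softmax outputs always sum to 1
print(np.flatnonzero(p > 0.5))   # a 0.5 cutoff selects no label at all
```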

For a binary classification problem, the usual output activation is the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$

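PS: For a long time sigmoid was a standard activation function in neural networks (today ReLU is generally used in hidden layers). One important reason is that its derivative is cheap to compute, because it can be expressed in terms of sigmoid itself:

$$\sigma'(x) = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$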

In Python:

import numpy as np

def Sigmoid_sim(x):
    return 1 / (1 + np.exp(-x))

a = np.array([-1.0, 5.0, -0.5, 5.0, -0.5])
print(Sigmoid_sim(a))
# Output: [ 0.26894142  0.99330715  0.37754067  0.99330715  0.37754067]
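The derivative identity from the PS can be checked numerically, and the same sigmoid outputs can be turned into label decisions by thresholding each probability on its own. The 0.5 cutoff and the helper name `sigmoid_grad` are illustrative assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # closed form: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1.0 - s)

a = np.array([-1.0, 5.0, -0.5, 5.0, -0.5])

# compare the closed form with a central finite difference
h = 1e-5
numeric = (sigmoid(a + h) - sigmoid(a - h)) / (2 * h)
print(np.allclose(sigmoid_grad(a), numeric))  # True

# each probability is thresholded independently (0.5 is an assumed cutoff)
print(np.flatnonzero(sigmoid(a) > 0.5))  # [1 3]: 0-based indices, i.e. labels 2 and 4
```

Unlike the softmax case, no per-sample label count is needed: every label is kept or dropped by its own probability.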

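With sigmoid, each label's probability is independent. Once the model is built, the most important remaining step is choosing the loss function when compiling it. Multi-label classification mostly uses binary_crossentropy rather than the categorical_crossentropy loss normally used for multi-class classification. This may look counterintuitive, but since every output node is independent, a binary loss fits: it models the network output as an independent Bernoulli distribution for each label. The whole multi-label model looks like this: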

from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(10,))
hidden = Dense(units=10, activation='relu')(inputs)
# one sigmoid unit per label, so each label gets an independent probability
output = Dense(units=5, activation='sigmoid')(hidden)
model = Model(inputs=inputs, outputs=output)
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
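To see what binary_crossentropy computes for one multi-label sample, here is a NumPy sketch: each output is scored as its own Bernoulli variable and the per-label losses are averaged over the label axis, which matches how Keras's binary_crossentropy reduces by default. The `eps` value and the sample numbers are illustrative assumptions.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # clip predictions so log() never sees exactly 0 or 1
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    # negative log-likelihood of an independent Bernoulli per label
    per_label = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    # average over the label axis, giving one loss value per sample
    return per_label.mean(axis=-1)

# hypothetical targets (labels 2 and 4 present) and sigmoid outputs
y_true = np.array([[0.0, 1.0, 0.0, 1.0, 0.0]])
y_pred = np.array([[0.1, 0.9, 0.2, 0.8, 0.1]])
print(binary_crossentropy(y_true, y_pred))  # about 0.152 for this single sample
```

Because the loss is a sum of independent per-label terms, a wrong prediction on one label does not push the other labels' probabilities around, which is exactly the property softmax lacked.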

We use the popular clothing dataset to implement multi-label classification, with ResNet50 as the network.

Dataset download link:

https://pan.baidu.com/s/1eANXTnWl2nf853IEiLOvWg

Extraction code: jo4h


Our dataset consists of 5547 images in 12 categories:

black_dress (333 images)

black_jeans (344 images)

black_shirt (436 images)

black_shoe (534 images)

blue_dress (386 images)

blue_jeans (356 images)

blue_shirt (369 images)

red_dress (384 images)

red_shirt (332 images)

red_shoe (486 images)

white_bag (747 images)

white_shoe (840 images)
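Because each folder contributes two tags, what a multi-label model actually sees are per-tag frequencies. A short tally of the counts listed above (using nothing beyond those numbers) shows, for example, that "shoe" appears far more often than "jeans".

```python
from collections import Counter

# per-folder image counts from the list above
folder_counts = {
    "black_dress": 333, "black_jeans": 344, "black_shirt": 436, "black_shoe": 534,
    "blue_dress": 386, "blue_jeans": 356, "blue_shirt": 369,
    "red_dress": 384, "red_shirt": 332, "red_shoe": 486,
    "white_bag": 747, "white_shoe": 840,
}

# every image contributes to both its color tag and its item tag
tag_counts = Counter()
for folder, n in folder_counts.items():
    for tag in folder.split("_"):
        tag_counts[tag] += n

print(sum(folder_counts.values()))  # 5547
for tag, n in sorted(tag_counts.items()):
    print(tag, n)
```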

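The goal of our convolutional neural network is to predict both the color and the clothing type at the same time. The code is written for TensorFlow 2.0 or later. The implementation is explained step by step below: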

# import the necessary packages

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from imutils import paths
import tensorflow as tf
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# use the public tf.keras API throughout; mixing it with the private
# tensorflow.python.keras package can break in newer TensorFlow releases
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array
 
# construct the argument parse and parse the arguments
 
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", default='../dataset',
                help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", default='model.h5',
                help="path to output model")
ap.add_argument("-l", "--labelbin", default='labelbin',
                help="path to output label binarizer")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
                help="path to output accuracy/loss plot")
args = vars(ap.parse_args())      

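The command-line arguments:

--dataset: path to the input dataset (a directory of images).

--model: path of the output Keras model.

--labelbin: path of the output multi-label binarizer object.

--plot: path of the output training loss/accuracy plot. (The plotting code below does not read this argument; it writes to WW_acc.jpg and WW_loss.jpg instead.)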

# number of epochs, initial learning rate, batch size, and input image dimensions
EPOCHS = 150
INIT_LR = 1e-3
BS = 16
IMAGE_DIMS = (224, 224, 3)

Load the data:

print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)

# initialize the data and labels
data = []
labels = []

# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # the parent folder name (e.g. "red_shirt") holds both labels
    l = imagePath.split(os.path.sep)[-2].split("_")
    labels.append(l)

# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print(labels)

Output:

[['red' 'shirt']
 ['black' 'jeans']
 ['black' 'shoe']
 ...
 ['black' 'dress']
 ['black' 'shirt']
 ['white' 'shoe']]
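The label extraction depends only on the folder name. A sketch with a hypothetical file path (the file name is made up; the folder follows the dataset's color_item naming):

```python
import os

# hypothetical path following the <dataset>/<color>_<item>/<file> layout
imagePath = os.path.join("..", "dataset", "red_shirt", "00000001.jpg")

# same expression as in the loading loop: parent folder name, split on "_"
l = imagePath.split(os.path.sep)[-2].split("_")
print(l)  # ['red', 'shirt']
```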

print("[INFO] class labels:")
mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(labels)

# loop over each of the possible class labels and show them
for (i, label) in enumerate(mlb.classes_):
    print("{}. {}".format(i + 1, label))
print(labels)

Fitting MultiLabelBinarizer() produces the label encoding. We print the classes and the encoded labels. The classes are:

[INFO] class labels:
1. bag
2. black
3. blue
4. dress
5. jeans
6. red
7. shirt
8. shoe
9. white

The encoded labels are printed as:

[[0 0 0 ... 1 0 0]
 [0 1 0 ... 0 0 0]
 [0 1 0 ... 0 1 0]
 ...
 [0 1 0 ... 1 0 0]
 [0 0 0 ... 0 1 1]]

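To make the encoding easier to read, the table below shows three of the samples:

| Labels | Bag | Black | Blue | Dress | Jeans | Red | Shirt | Shoe | White |
|---|---|---|---|---|---|---|---|---|---|
| ['red' 'shirt'] | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 |
| ['black' 'jeans'] | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| ['white' 'shoe'] | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |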
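The rows for the three samples in the label table can be reproduced directly with scikit-learn. In this sketch the class order is pinned with the `classes` argument to the nine classes printed earlier; fitted on only these three samples without it, the binarizer would learn just six classes.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# the three samples from the label table
samples = [["red", "shirt"], ["black", "jeans"], ["white", "shoe"]]

# pin the class order to the nine classes of the full dataset
mlb = MultiLabelBinarizer(classes=["bag", "black", "blue", "dress", "jeans",
                                   "red", "shirt", "shoe", "white"])
Y = mlb.fit_transform(samples)
print(Y)

# inverse_transform maps each row back to a tuple of label names, in class order
print(mlb.inverse_transform(Y))
```

Note that inverse_transform returns the names in class order, so the third sample comes back as shoe, white rather than white, shoe.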

Next, save the fitted MultiLabelBinarizer so it can be reused at test time:

print("[INFO] serializing label binarizer...")
with open(args["labelbin"], "wb") as f:
    f.write(pickle.dumps(mlb))
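Saving the binarizer matters because the test script needs the same class order to turn the network's output vector back into names. A minimal round trip, assuming a temporary file in place of args["labelbin"], made-up probabilities, and a 0.5 cutoff:

```python
import os
import pickle
import tempfile
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
mlb.fit([["red", "shirt"], ["black", "jeans"], ["white", "shoe"]])

# stand-in for args["labelbin"]; a temporary file keeps the sketch self-contained
path = os.path.join(tempfile.gettempdir(), "labelbin_demo.pickle")
with open(path, "wb") as f:
    pickle.dump(mlb, f)

# at test time: restore the fitted binarizer from disk
with open(path, "rb") as f:
    mlb_loaded = pickle.load(f)

# hypothetical sigmoid outputs, one per class in mlb_loaded.classes_ order
probs = [0.9, 0.8, 0.1, 0.2, 0.05, 0.3]
selected = [str(c) for c, p in zip(mlb_loaded.classes_, probs) if p > 0.5]
print([str(c) for c in mlb_loaded.classes_])
print(selected)  # ['black', 'jeans']
```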

# hold out 20% of the images for validation
(trainX, testX, trainY, testY) = train_test_split(data,
                                                  labels, test_size=0.2, random_state=42)

# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=True, fill_mode="nearest")

checkpointer = ModelCheckpoint(filepath='weights_best_Reset50_model.hdf5',
                               monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
reduce = ReduceLROnPlateau(monitor='val_accuracy', patience=10,
                           verbose=1,
                           factor=0.5,
                           min_lr=1e-6)

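checkpointer keeps the best model: with save_best_only=True and mode='max', it overwrites weights_best_Reset50_model.hdf5 only when val_accuracy improves. reduce adjusts the learning rate dynamically: when val_accuracy has not improved for 10 epochs (patience=10), it multiplies the learning rate by 0.5 (factor), never going below min_lr=1e-6.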
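The behavior of reduce can be pictured with a simplified pure-Python re-implementation. This is not Keras's code: it leaves out min_delta (which has a small positive default in Keras) and cooldown, and the accuracy curve is invented.

```python
def simulate_reduce_lr(val_acc, lr=1e-3, patience=10, factor=0.5, min_lr=1e-6):
    """Simplified ReduceLROnPlateau for a metric that should increase.

    Ignores Keras's min_delta and cooldown options; returns the learning rate
    in effect after each epoch.
    """
    best = float("-inf")
    wait = 0
    lrs = []
    for acc in val_acc:
        if acc > best:
            best, wait = acc, 0       # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:      # no improvement for `patience` epochs
                lr = max(lr * factor, min_lr)
                wait = 0
        lrs.append(lr)
    return lrs

# hypothetical val_accuracy curve: improves for 3 epochs, then stays flat
history = [0.50, 0.60, 0.70] + [0.70] * 27
lrs = simulate_reduce_lr(history)
print(lrs[11], lrs[12], lrs[22], lrs[-1])  # 0.001 0.0005 0.00025 0.00025
```

With INIT_LR = 1e-3 and factor=0.5, it takes about 10 halvings (10 plateaus of 10 epochs each) before min_lr=1e-6 becomes the binding limit.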

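# use a sigmoid (instead of the default softmax) on the top layer so each label
# gets an independent probability, matching binary_crossentropy; the
# classifier_activation argument is available in recent TensorFlow 2.x releases
model = ResNet50(weights=None, classes=len(mlb.classes_),
                 classifier_activation="sigmoid")
optimizer = Adam(learning_rate=INIT_LR)
model.compile(loss="binary_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])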

print("[INFO] training network...")
history = model.fit(
    x=aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS, callbacks=[checkpointer, reduce], verbose=1)
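With the figures given earlier, the split and the number of batches per epoch work out as follows. This is a back-of-the-envelope check that assumes all 5547 images load successfully.

```python
import math

n_images = 5547                      # dataset size stated above
n_test = math.ceil(0.2 * n_images)   # test_size=0.2; scikit-learn rounds the test count up
n_train = n_images - n_test
BS = 16
steps_per_epoch = n_train // BS      # len(trainX) // BS in the training call
print(n_train, n_test, steps_per_epoch)  # 4437 1110 277
```

So each epoch performs 277 weight updates on batches of 16 augmented images, and validation runs on the 1110 held-out images.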

# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"], save_format="h5")

# plot the training loss and accuracy
import matplotlib.pyplot as plt

loss_trend_graph_path = r"WW_loss.jpg"
acc_trend_graph_path = r"WW_acc.jpg"
print("Now, we start drawing the loss and acc trends graph...")

# summarize history for accuracy
fig = plt.figure(1)
plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(acc_trend_graph_path)
plt.close(1)

# summarize history for loss
fig = plt.figure(2)
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(loss_trend_graph_path)
plt.close(2)

print("We are done, everything seems OK...")
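When the loss is binary_crossentropy, Keras resolves metrics=["accuracy"] to binary accuracy, which scores each of the 9 label slots separately. Since every image in this dataset has 2 positive labels out of 9, even an all-zero prediction scores about 78%, which is worth keeping in mind when reading the accuracy curve. A NumPy sketch (the samples come from the label table; exact-match accuracy is added only as a comparison):

```python
import numpy as np

def binary_accuracy(y_true, y_pred, threshold=0.5):
    # per sample: fraction of the label slots whose thresholded prediction matches
    return np.mean((y_pred > threshold) == (y_true == 1), axis=-1)

# three samples encoded with the 9 classes (bag, black, blue, dress, jeans,
# red, shirt, shoe, white); each has exactly one color and one item
y_true = np.zeros((3, 9))
y_true[0, [5, 6]] = 1   # red, shirt
y_true[1, [1, 4]] = 1   # black, jeans
y_true[2, [7, 8]] = 1   # shoe, white

# a useless "model" that predicts no label at all
y_pred = np.zeros_like(y_true)
print(binary_accuracy(y_true, y_pred).mean())  # 7/9, about 0.778

# stricter view: a sample counts only if all 9 slots are right
exact_match = np.all((y_pred > 0.5) == (y_true == 1), axis=-1).mean()
print(exact_match)  # 0.0
```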

Complete code:

# import the necessary packages

from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from imutils import paths
import tensorflow as tf
import numpy as np
import argparse
import random
import pickle
import cv2
import os
# use the public tf.keras API throughout
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, ReduceLROnPlateau
from tensorflow.keras.preprocessing.image import ImageDataGenerator, img_to_array
 
# construct the argument parse and parse the arguments
 
ap = argparse.ArgumentParser()
ap.add_argument("-d", "--dataset", default='../dataset',
                help="path to input dataset (i.e., directory of images)")
ap.add_argument("-m", "--model", default='model.h5',
                help="path to output model")
ap.add_argument("-l", "--labelbin", default='labelbin',
                help="path to output label binarizer")
ap.add_argument("-p", "--plot", type=str, default="plot.png",
                help="path to output accuracy/loss plot")
args = vars(ap.parse_args())
 
# initialize the number of epochs to train for, initial learning rate,
# batch size, and image dimensions
EPOCHS = 150
INIT_LR = 1e-3
BS = 16
IMAGE_DIMS = (224, 224, 3)
# disable eager execution
tf.compat.v1.disable_eager_execution()
# grab the image paths and randomly shuffle them
print("[INFO] loading images...")
imagePaths = sorted(list(paths.list_images(args["dataset"])))
random.seed(42)
random.shuffle(imagePaths)
# initialize the data and labels
data = []
labels = []
# loop over the input images
for imagePath in imagePaths:
    # load the image, pre-process it, and store it in the data list
    image = cv2.imread(imagePath)
    image = cv2.resize(image, (IMAGE_DIMS[1], IMAGE_DIMS[0]))
    image = img_to_array(image)
    data.append(image)
    # extract set of class labels from the image path and update the
    # labels list
    l = imagePath.split(os.path.sep)[-2].split("_")
    labels.append(l)
# scale the raw pixel intensities to the range [0, 1]
data = np.array(data, dtype="float") / 255.0
labels = np.array(labels)
print("[INFO] data matrix: {} images ({:.2f}MB)".format(
    len(imagePaths), data.nbytes / (1024 * 1000.0)))
# binarize the labels using scikit-learn's special multi-label
# binarizer implementation
print("[INFO] class labels:")
mlb = MultiLabelBinarizer()
labels = mlb.fit_transform(labels)
# loop over each of the possible class labels and show them
for (i, label) in enumerate(mlb.classes_):
    print("{}. {}".format(i + 1, label))
print(labels)
# partition the data into training and testing splits using 80% of
# the data for training and the remaining 20% for testing
(trainX, testX, trainY, testY) = train_test_split(data,
                                                  labels, test_size=0.2, random_state=42)
print("[INFO] serializing label binarizer...")
with open(args["labelbin"], "wb") as f:
    f.write(pickle.dumps(mlb))
# construct the image generator for data augmentation
aug = ImageDataGenerator(rotation_range=25, width_shift_range=0.1,
                         height_shift_range=0.1, shear_range=0.2, zoom_range=0.2,
                         horizontal_flip=True, fill_mode="nearest")
 
checkpointer = ModelCheckpoint(filepath='weights_best_Reset50_model.hdf5',
                               monitor='val_accuracy', verbose=1, save_best_only=True, mode='max')
 
reduce = ReduceLROnPlateau(monitor='val_accuracy', patience=10,
                           verbose=1,
                           factor=0.5,
                           min_lr=1e-6)
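# sigmoid top layer so each label gets an independent probability
model = ResNet50(weights=None, classes=len(mlb.classes_),
                 classifier_activation="sigmoid")
optimizer = Adam(learning_rate=INIT_LR)
model.compile(loss="binary_crossentropy", optimizer=optimizer,
              metrics=["accuracy"])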
# train the network
print("[INFO] training network...")
history = model.fit(
    x=aug.flow(trainX, trainY, batch_size=BS),
    validation_data=(testX, testY),
    steps_per_epoch=len(trainX) // BS,
    epochs=EPOCHS, callbacks=[checkpointer, reduce], verbose=1)
# save the model to disk
print("[INFO] serializing network...")
model.save(args["model"], save_format="h5")
 
# plot the training loss and accuracy
loss_trend_graph_path = r"WW_loss.jpg"
acc_trend_graph_path = r"WW_acc.jpg"
import matplotlib.pyplot as plt
 
print("Now, we start drawing the loss and acc trends graph...")
# summarize history for accuracy
fig = plt.figure(1)
plt.plot(history.history["accuracy"])
plt.plot(history.history["val_accuracy"])
plt.title("Model accuracy")
plt.ylabel("accuracy")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(acc_trend_graph_path)
plt.close(1)
# summarize history for loss
fig = plt.figure(2)
plt.plot(history.history["loss"])
plt.plot(history.history["val_loss"])
plt.title("Model loss")
plt.ylabel("loss")
plt.xlabel("epoch")
plt.legend(["train", "test"], loc="upper left")
plt.savefig(loss_trend_graph_path)
plt.close(2)
print("We are done, everything seems OK...")