Medical Imaging | Breast Cancer Classification with Deep Learning (with a Python Walkthrough)

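Breast cancer is the second most common cancer worldwide. In 2012, it accounted for 12% of all new cancer cases and 25% of all cancers in women.

Breast cancer starts when cells in the breast begin to grow out of control. These cells usually form a tumor that can often be seen on an X-ray or felt as a lump. The tumor is malignant if its cells can grow into surrounding tissue or spread to other parts of the body.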

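Some reported figures:

About 1 in 8 U.S. women (about 12%) will develop invasive breast cancer over the course of her lifetime.
In 2019, an estimated 268,600 new cases of invasive breast cancer are expected in the U.S., along with 62,930 new cases of non-invasive breast cancer.
About 85% of breast cancers occur in women with no family history of breast cancer. These cases are attributed to genetic mutations that are acquired rather than inherited.
A woman's risk of breast cancer nearly doubles if she has a first-degree relative (mother, sister, daughter) who has been diagnosed with breast cancer. Fewer than 15% of women who get breast cancer have a family member diagnosed with it.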

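Challenge

Build an algorithm that automatically identifies whether a patient has breast cancer by looking at biopsy images. The algorithm must be highly accurate because lives are at stake.

Data

The dataset can be downloaded from here (https://web.inf.ufpr.br/vri/databases/breast-cancer-histopathological-database-breakhis/). This is a binary classification problem. I split the data as shown below.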

dataset
  train
    benign
      b1.jpg
      b2.jpg
      ...
    malignant
      m1.jpg
      m2.jpg
      ...
  validation
    benign
      b1.jpg
      b2.jpg
      ...
    malignant
      m1.jpg
      m2.jpg
      ...

The training folder has 1000 images per class, and the validation folder has 250 images per class.
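Before training, it can be worth confirming that the folders really contain the expected number of images per class. The helper below is a minimal sketch under the assumption of a root/split/class layout; `count_images` and `IMAGE_EXTENSIONS` are names made up for this illustration and are not part of the original project.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to match the files you actually have.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}

def count_images(root):
    """Return {split: {class_name: image_count}} for a root/split/class layout."""
    counts = {}
    for split_dir in sorted(Path(root).iterdir()):
        if not split_dir.is_dir():
            continue
        counts[split_dir.name] = {
            class_dir.name: sum(
                1 for f in class_dir.iterdir()
                if f.suffix.lower() in IMAGE_EXTENSIONS
            )
            for class_dir in sorted(split_dir.iterdir())
            if class_dir.is_dir()
        }
    return counts
```

Called on the root folder of the split described above, it should report 1000 images for each class under train and 250 for each class under validation.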

[Figure: two benign samples]

[Figure: two malignant samples]

Environment and tools

scikit-learn
keras
numpy
pandas
matplotlib

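Image classification

The complete image classification pipeline can be formalized as follows:

Our input is a training dataset of N images, each labeled with its class.

We then use this training set to train a classifier that learns what each class looks like.

Finally, we evaluate the classifier by asking it to predict labels for a new set of images it has never seen, and then compare the true labels of those images with the predicted ones.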
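The three stages above can be sketched on toy data. This is only an illustration of the train / predict / compare loop: it uses a nearest-centroid classifier on random 4-dimensional vectors standing in for images, not the DenseNet model used later, and every name in it is hypothetical.

```python
import numpy as np

def fit_centroids(features, labels):
    """Training step: store the mean feature vector of each class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(features, classes, centroids):
    """Prediction step: assign each sample to the class with the nearest centroid."""
    distances = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[distances.argmin(axis=1)]

rng = np.random.default_rng(0)
# Toy "images": class 0 is centered at 0, class 1 is centered at 5.
toy_train = np.concatenate([rng.normal(0, 1, (20, 4)), rng.normal(5, 1, (20, 4))])
toy_train_labels = np.array([0] * 20 + [1] * 20)
toy_new = np.concatenate([rng.normal(0, 1, (5, 4)), rng.normal(5, 1, (5, 4))])
toy_new_labels = np.array([0] * 5 + [1] * 5)

classes, centroids = fit_centroids(toy_train, toy_train_labels)
toy_predictions = predict(toy_new, classes, centroids)

# Evaluation step: compare predictions on unseen samples with their true labels.
toy_accuracy = (toy_predictions == toy_new_labels).mean()
print(toy_accuracy)
```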

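Implementation

Let's get started with the code. The complete project is available on GitHub at this link (https://github.com/abhinavsagar/Breast-cancer-classification).

Let's start by loading all the libraries and dependencies.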

import json
import math
import os
import cv2
from PIL import Image
import numpy as np
from keras import layers
from keras.applications import DenseNet201
from keras.callbacks import Callback, ModelCheckpoint, ReduceLROnPlateau, TensorBoard
from keras.preprocessing.image import ImageDataGenerator
from keras.utils import to_categorical  # keras.utils.np_utils is not available in newer Keras releases
from keras.models import Sequential
from keras.optimizers import Adam
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import cohen_kappa_score, accuracy_score
import scipy
from tqdm import tqdm
import tensorflow as tf
from keras import backend as K
import gc
from functools import partial
from sklearn import metrics
from collections import Counter
import itertools

Next, I load the images from their respective folders.

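def Dataset_loader(DIR, RESIZE, sigmaX=10):
    """Load every image in DIR as an RGB array resized to RESIZE x RESIZE."""
    # sigmaX is unused; it is kept so existing calls keep working.
    IMG = []
    read = lambda imname: np.asarray(Image.open(imname).convert("RGB"))
    for IMAGE_NAME in tqdm(os.listdir(DIR)):
        PATH = os.path.join(DIR, IMAGE_NAME)
        _, ftype = os.path.splitext(PATH)
        # Accept .png as well as .jpg/.jpeg, since the folder listing shows .jpg files.
        if ftype.lower() in (".png", ".jpg", ".jpeg"):
            img = read(PATH)
            img = cv2.resize(img, (RESIZE, RESIZE))
            IMG.append(img)
    return IMG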

benign_train = np.array(Dataset_loader('data/train/benign',224))
malign_train = np.array(Dataset_loader('data/train/malignant',224))
benign_test = np.array(Dataset_loader('data/validation/benign',224))
malign_test = np.array(Dataset_loader('data/validation/malignant',224))

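After that, I created a numpy array of zeros to label the benign images and an array of ones to label the malignant images. I also shuffled the dataset and converted the labels to categorical (one-hot) format.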

benign_train_label = np.zeros(len(benign_train))
malign_train_label = np.ones(len(malign_train))
benign_test_label = np.zeros(len(benign_test))
malign_test_label = np.ones(len(malign_test))

X_train = np.concatenate((benign_train, malign_train), axis = 0)
Y_train = np.concatenate((benign_train_label, malign_train_label), axis = 0)
X_test = np.concatenate((benign_test, malign_test), axis = 0)
Y_test = np.concatenate((benign_test_label, malign_test_label), axis = 0)

s = np.arange(X_train.shape[0])
np.random.shuffle(s)
X_train = X_train[s]
Y_train = Y_train[s]

s = np.arange(X_test.shape[0])
np.random.shuffle(s)
X_test = X_test[s]
Y_test = Y_test[s]

Y_train = to_categorical(Y_train, num_classes= 2)
Y_test = to_categorical(Y_test, num_classes= 2)
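For readers unfamiliar with the categorical format: label 0 (benign) becomes [1, 0] and label 1 (malignant) becomes [0, 1]. The snippet below is a NumPy-only sketch of that mapping; `one_hot` is a hypothetical helper written for this illustration, not the Keras function itself.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Map integer class labels to one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

toy_labels = np.array([0, 1, 1, 0])
encoded = one_hot(toy_labels, num_classes=2)
print(encoded)

# np.argmax along axis 1 recovers the original integer labels,
# which is why the evaluation code later calls np.argmax(..., axis=1).
recovered = np.argmax(encoded, axis=1)
```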

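I then split the training data into two sets: a training set with 80% of the images and a validation set with the remaining 20%. Let's look at some sample benign and malignant images.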

x_train, x_val, y_train, y_val = train_test_split(
    X_train, Y_train, 
    test_size=0.2, 
    random_state=11
)

fig = plt.figure(figsize=(15, 15))
columns = 4
rows = 3

# Show the first 12 training images with their labels.
for i in range(columns * rows):
    ax = fig.add_subplot(rows, columns, i + 1)
    # Read the label from y_train so that it matches x_train after the split.
    if np.argmax(y_train[i]) == 0:
        ax.title.set_text('Benign')
    else:
        ax.title.set_text('Malignant')
    plt.imshow(x_train[i], interpolation='nearest')
plt.show()

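I used a batch size of 16. Batch size is one of the most important hyperparameters in deep learning. I prefer larger batches because they let the GPU's parallelism speed up computation. However, batches that are too large are known to generalize poorly. At one extreme, a batch equal to the entire dataset gives the exact gradient of the objective, which guarantees convergence to the global optimum only when the objective is convex, and each update is expensive, so progress is slow. At the other extreme, smaller batches have been shown to converge to good solutions faster. Intuitively, a smaller batch lets the model start learning before it has seen all the data. The downside is that the noisier updates come with no guarantee of reaching the global optimum. A common recommendation is therefore to start with a small batch and increase the batch size gradually during training to speed up convergence.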
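The "start small, then grow" recommendation can be written as a simple schedule. Note that this article trains with a fixed batch size of 16; the function below is only a sketch of the idea, and the growth factor, interval and cap are arbitrary values chosen for illustration.

```python
def batch_size_for_epoch(epoch, start=16, factor=2, every=5, max_size=128):
    """Batch size for a 0-based epoch: multiply by `factor` every `every` epochs."""
    size = start * factor ** (epoch // every)
    return min(size, max_size)

# Batch size that would be used in each of 20 epochs.
schedule = [batch_size_for_epoch(epoch) for epoch in range(20)]
print(schedule)
```

With these defaults the batch size stays at 16 for the first five epochs, then doubles every five epochs until it reaches the cap of 128.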

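I also applied data augmentation, an effective way to increase the size of the training set. Augmenting the training examples lets the network see more diverse, but still representative, data points during training.

I then created a data generator that feeds augmented batches to Keras automatically. Keras provides convenient Python generator functions for this.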

BATCH_SIZE = 16

train_generator = ImageDataGenerator(
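        zoom_range=2,  # random zoom; a float z means factors drawn from [1 - z, 1 + z]
        rotation_range=90,  # random rotation of up to 90 degrees
        horizontal_flip=True,  # random horizontal flip
        vertical_flip=True,  # random vertical flip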
    )
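To make the idea of augmentation concrete, here is a NumPy-only sketch that applies random flips and quarter-turn rotations to a single image. It is not what ImageDataGenerator does internally (Keras also rotates by arbitrary angles and zooms, with interpolation); `random_flip_rotate` is a hypothetical helper restricted to operations that only rearrange pixels.

```python
import numpy as np

def random_flip_rotate(image, rng):
    """Return a randomly flipped and quarter-turn-rotated copy of a square HxWxC image."""
    if rng.random() < 0.5:
        image = image[:, ::-1, :]        # horizontal flip
    if rng.random() < 0.5:
        image = image[::-1, :, :]        # vertical flip
    quarter_turns = int(rng.integers(0, 4))
    return np.rot90(image, k=quarter_turns, axes=(0, 1)).copy()

rng = np.random.default_rng(0)
toy_image = np.arange(4 * 4 * 3, dtype=np.uint8).reshape(4, 4, 3)
augmented = random_flip_rotate(toy_image, rng)
print(augmented.shape)
```

Every call produces a different view of the same tissue sample, with the same pixel values in a new arrangement, so the label stays valid.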

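The next step is to build the model. This can be described in the following 3 steps:

I used DenseNet201 with weights pre-trained on ImageNet as the backbone, and set the learning rate to 0.0001.
On top of it, I used a global average pooling layer followed by 50% dropout to reduce overfitting.
I used batch normalization and a dense layer with 2 neurons and softmax activation for the 2 output classes, benign and malignant.

I used Adam as the optimizer and binary cross-entropy as the loss function.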

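def build_model(backbone, lr=1e-4):
    # Stack a small classification head on top of the pre-trained backbone.
    model = Sequential()
    model.add(backbone)
    model.add(layers.GlobalAveragePooling2D())  # one value per feature map
    model.add(layers.Dropout(0.5))
    model.add(layers.BatchNormalization())
    model.add(layers.Dense(2, activation='softmax'))  # benign vs. malignant

    model.compile(
        loss='binary_crossentropy',
        # Recent Keras versions name this argument learning_rate instead of lr.
        optimizer=Adam(lr=lr),
        metrics=['accuracy']
    )
    return model

# DenseNet201 with ImageNet weights and without its original classifier.
densenet = DenseNet201(
    weights='imagenet',
    include_top=False,
    input_shape=(224,224,3)
)

model = build_model(densenet, lr=1e-4)
model.summary()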

Let's look at the output shape and parameter count of each layer.
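A note on the loss chosen for this model: pairing binary cross-entropy with a two-unit softmax may look unusual, but with one-hot labels it gives the same value as categorical cross-entropy, because the two softmax outputs sum to 1. The check below is a plain-Python sketch; the helper names are illustrative rather than Keras API, and it assumes the binary loss is averaged across the two output units.

```python
import math

def softmax2(z0, z1):
    """Softmax over two logits, shifted by the max for numerical stability."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def binary_crossentropy(y_true, y_prob):
    """Element-wise binary cross-entropy, averaged over the output units."""
    terms = [-(t * math.log(p) + (1 - t) * math.log(1 - p))
             for t, p in zip(y_true, y_prob)]
    return sum(terms) / len(terms)

def categorical_crossentropy(y_true, y_prob):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for t, p in zip(y_true, y_prob))

probs = softmax2(0.3, 1.7)      # arbitrary example logits
label = (0.0, 1.0)              # one-hot label for "malignant"
bce = binary_crossentropy(label, probs)
cce = categorical_crossentropy(label, probs)
print(bce, cce)
```

Both values equal the negative log of the probability assigned to the true class, so for this two-class setup the choice between the two losses does not change what is optimized.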

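Before training the model, it is useful to define one or more callbacks. Two particularly handy ones are ModelCheckpoint and ReduceLROnPlateau.

ModelCheckpoint: when training takes many epochs and a long time to reach a good result, ModelCheckpoint saves the best model seen during training.
ReduceLROnPlateau: reduces the learning rate when a metric has stopped improving. Once learning stagnates, models often benefit from lowering the learning rate by a factor of 2-10. This callback monitors a metric and lowers the learning rate if there is no improvement for 'patience' epochs.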
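The plateau rule can be simulated in a few lines. This is a simplified sketch of the idea, not the Keras implementation: it ignores options such as a minimum improvement threshold and cooldown, and `simulate_reduce_lr` and the validation accuracies are invented for illustration. The defaults mirror the values used in this article (factor 0.2, patience 5, minimum learning rate 1e-7).

```python
def simulate_reduce_lr(scores, lr=1e-4, factor=0.2, patience=5, min_lr=1e-7):
    """Return the learning rate in effect after each epoch for a metric to maximize."""
    best = float("-inf")
    wait = 0                      # epochs since the last improvement
    history = []
    for score in scores:
        if score > best:
            best = score
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)   # never go below min_lr
                wait = 0
        history.append(lr)
    return history

# Hypothetical validation accuracies: no improvement after the second epoch
# until the last one.
toy_val_acc = [0.80, 0.85, 0.85, 0.84, 0.85, 0.83, 0.85, 0.86]
learning_rates = simulate_reduce_lr(toy_val_acc)
print(learning_rates)
```

Here the learning rate stays at 1e-4 for six epochs and is multiplied by 0.2 after the fifth consecutive epoch without improvement.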

I trained the model for 20 epochs.

# Newer Keras versions log this metric as 'val_accuracy' rather than 'val_acc'.
learn_control = ReduceLROnPlateau(monitor='val_acc', patience=5,
                                  verbose=1,factor=0.2, min_lr=1e-7)

filepath="weights.best.hdf5"
checkpoint = ModelCheckpoint(filepath, monitor='val_acc', verbose=1, save_best_only=True, mode='max')

# fit_generator is the older Keras API; newer versions accept generators in model.fit.
history = model.fit_generator(
    train_generator.flow(x_train, y_train, batch_size=BATCH_SIZE),
    steps_per_epoch=math.ceil(x_train.shape[0] / BATCH_SIZE),  # whole number of batches per epoch
    epochs=20,
    validation_data=(x_val, y_val),
    callbacks=[learn_control, checkpoint]
)

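Performance metrics

The most common metric for evaluating model performance is accuracy. However, when only 2% of your dataset belongs to one class (malignant) and 98% to the other (benign), accuracy is not informative. A classifier can be 98% accurate and still catch none of the malignant cases by labeling everything as benign, which makes it a bad classifier.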
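The 98% example is easy to reproduce with made-up labels. The sketch below builds 100 hypothetical cases, 2 malignant and 98 benign, and scores a classifier that always answers "benign".

```python
toy_true = [1] * 2 + [0] * 98      # 1 = malignant, 0 = benign
toy_pred = [0] * 100               # classifier that always says "benign"

accuracy = sum(t == p for t, p in zip(toy_true, toy_pred)) / len(toy_true)

# Share of malignant cases that the classifier actually found.
malignant_found = sum(1 for t, p in zip(toy_true, toy_pred) if t == 1 and p == 1)
recall_malignant = malignant_found / sum(toy_true)

print(accuracy, recall_malignant)  # 0.98 0.0
```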

history_df = pd.DataFrame(history.history)
history_df[['loss', 'val_loss']].plot()

# The accuracy columns are named 'accuracy'/'val_accuracy' in newer Keras versions.
history_df[['acc', 'val_acc']].plot()

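Precision, recall, and F1 score

To understand misclassifications better, we often use the following metrics, which are built from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN).

Precision is the fraction of samples predicted as positive that are actually positive.

Recall is the fraction of actually positive samples that the classifier predicts as positive.

The F1 score is the harmonic mean of precision and recall.

The higher the F1 score, the better the model. For all three metrics, 0 is the worst value and 1 is the best.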
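In terms of the counts, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 = 2 * precision * recall / (precision + recall). A small sketch with invented counts follows; returning 0.0 when a denominator is zero is a convention chosen here for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 40 malignant cases caught, 10 false alarms, 20 missed.
precision, recall, f1 = precision_recall_f1(tp=40, fp=10, fn=20)
print(round(precision, 3), round(recall, 3), round(f1, 3))
```

In a screening setting, recall on the malignant class deserves particular attention, since each false negative is a missed cancer.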

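Confusion matrix

The confusion matrix is an important tool for analyzing misclassifications. In the matrix computed below, each row corresponds to the instances of an actual class and each column to the instances of a predicted class. The diagonal holds the correctly classified instances. This is helpful because we learn not only which classes are misclassified but also what they are misclassified as.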
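As a small illustration, a confusion matrix can be built by hand from two label lists. The sketch follows the scikit-learn convention, where rows are true classes and columns are predicted classes; `confusion_counts` and the labels are made up for this example.

```python
def confusion_counts(y_true, y_pred, num_classes=2):
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix

# 0 = benign, 1 = malignant (hypothetical labels and predictions).
example_true = [0, 0, 0, 1, 1, 1, 1]
example_pred = [0, 0, 1, 1, 1, 0, 1]
example_cm = confusion_counts(example_true, example_pred)
print(example_cm)  # [[2, 1], [1, 3]]
```

The off-diagonal entries say what went wrong: one benign sample was flagged as malignant (top right) and one malignant sample was missed (bottom left).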

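from sklearn.metrics import classification_report
# Y_pred_tta is not defined in this article; it is assumed to hold the model's
# predicted class probabilities for X_test, with shape (n_samples, 2).
print(classification_report(np.argmax(Y_test, axis=1), np.argmax(Y_pred_tta, axis=1)))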

from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    if normalize:
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')

    print(cm)

    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=55)
    plt.yticks(tick_marks, classes)
    fmt = '.2f' if normalize else 'd'
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, format(cm[i, j], fmt),
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")

    plt.ylabel('True label')
    plt.xlabel('Predicted label')
    plt.tight_layout()

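# Y_pred is assumed to hold the model's predicted probabilities for X_test, e.g. model.predict(X_test).
cm = confusion_matrix(np.argmax(Y_test, axis=1), np.argmax(Y_pred, axis=1))

cm_plot_label =['benign', 'malignant']
plot_confusion_matrix(cm, cm_plot_label, title ='Confusion Matrix for Breast Cancer')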

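ROC curve

The 45-degree line is the random baseline, for which the area under the curve (AUC) is 0.5. The farther the curve lies above this line, the higher the AUC and the better the model. The best achievable AUC is 1, where the curve runs straight up the left edge and across the top, forming a right-angled triangle with the diagonal. The ROC curve can also help debug a model. For example, if the bottom-left part of the curve is close to the random line, the model is misclassifying at Y = 0, whereas if the top-right part is close to the random line, the errors occur at Y = 1.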

from sklearn.metrics import roc_auc_score, auc
from sklearn.metrics import roc_curve
y_true = np.argmax(Y_test, axis=1)
# Use the predicted probability of the malignant class (column 1) as the score;
# hard argmax labels would reduce the curve to a single operating point.
y_score = Y_pred_tta[:, 1]
roc_log = roc_auc_score(y_true, y_score)
false_positive_rate, true_positive_rate, threshold = roc_curve(y_true, y_score)
area_under_curve = auc(false_positive_rate, true_positive_rate)

plt.plot([0, 1], [0, 1], 'r--')
plt.plot(false_positive_rate, true_positive_rate, label='AUC = {:.3f}'.format(area_under_curve))
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
plt.legend(loc='best')
plt.show()
#plt.savefig(ROC_PLOT_FILE, bbox_inches='tight')
plt.close()
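To see where the points of an ROC curve come from, the sketch below sweeps a threshold over a handful of invented scores and integrates the curve with the trapezoid rule. It is a plain-Python illustration, not the scikit-learn implementation, and all names and numbers are hypothetical. It also compares continuous scores with hard 0/1 predictions, which collapse the curve to a single operating point.

```python
def roc_points(y_true, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    fpr, tpr = [0.0], [0.0]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        tpr.append(tp / positives)
        fpr.append(fp / negatives)
    return fpr, tpr

def trapezoid_auc(fpr, tpr):
    """Area under the piecewise-linear curve through the (fpr, tpr) points."""
    return sum((fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
               for i in range(1, len(fpr)))

roc_true = [0, 0, 0, 1, 1, 1]
roc_scores = [0.1, 0.2, 0.6, 0.55, 0.7, 0.9]   # predicted probability of class 1

auc_from_scores = trapezoid_auc(*roc_points(roc_true, roc_scores))

# Thresholding at 0.5 first throws away the ranking information.
hard_labels = [1 if s >= 0.5 else 0 for s in roc_scores]
auc_from_labels = trapezoid_auc(*roc_points(roc_true, hard_labels))

print(round(auc_from_scores, 3), round(auc_from_labels, 3))
```

On this toy data the two areas differ, which is why probability scores are the more informative input for an ROC curve.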

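Results

Conclusion

Although this project is far from complete, it is remarkable to see deep learning succeed on such varied real-world problems. In this post, I showed how to classify benign and malignant breast cancer from a collection of microscopy images using convolutional neural networks and transfer learning.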
