
Image Classification with Fashion MNIST: Why Convolutional Neural Networks Outperform Traditional Methods

In the last decade, with the rise of deep learning, the field of image classification has experienced a renaissance. Traditional machine learning methods have been replaced by newer and more powerful deep learning algorithms, such as the convolutional neural network. However, to truly understand and appreciate deep learning, we must know why it succeeds where the other methods fail. In this article, we try to answer some of those questions by applying various classification algorithms to the Fashion MNIST dataset.

Dataset information

Fashion MNIST was introduced in August 2017 by the research lab at Zalando. Its goal is to serve as a new benchmark for testing machine learning algorithms, as MNIST has become too easy and overused. While MNIST consists of handwritten digits, Fashion MNIST is made of images of 10 different clothing objects. Each image has the following properties:

  • Its size is 28 × 28 pixels.

  • Rotated accordingly and represented in grayscale, with integer values ranging from 0 to 255.

  • Blank space is represented by the color black and has the value 0.

In the dataset, we distinguish between the following clothing objects:

  • T-shirt/Top

  • Trousers

  • Pullover

  • Dress

  • Coat

  • Sandal

  • Shirt

  • Sneaker

  • Bag

  • Ankle Boot

Exploratory data analysis

As the dataset is available as part of the Keras library, and the images are already processed, there is no need for much preprocessing on our part. The only change we made was converting the images from a 2D array into a 1D array, as that makes them easier to work with.
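For reference, this step can be sketched with the Keras API roughly as follows (the variable names are our own, not necessarily those of the original notebook):

```python
from tensorflow.keras.datasets import fashion_mnist

# Load the dataset: 60000 training and 10000 test images, 28 x 28 grayscale
(X_train, y_train), (X_test, y_test) = fashion_mnist.load_data()

# Flatten each 28 x 28 image into a 1D array of 784 pixel values
X_train_flat = X_train.reshape(len(X_train), -1)
X_test_flat = X_test.reshape(len(X_test), -1)
```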

The dataset consists of 70000 images, of which 60000 make up the training set and 10000 the test set. As in the original MNIST dataset, the items are distributed evenly (6000 of each class in the training set and 1000 in the test set).

Examples of images of different items of clothing. Photo by the author.

However, a single image still has 784 dimensions, so we turned to principal component analysis (PCA) to see which pixels are the most important. We set the traditional benchmark of 80% of the cumulative variance, and the plot told us that this can be achieved with only around 25 principal components (3% of the total number of components). That is not surprising: as we can see in the photo above, there is a lot of shared unused space in each image, and different classes of clothing have different parts of the image that are black. The latter is connected to the fact that around 70% of the cumulative variance is explained by only 8 principal components.
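A minimal sketch of this analysis with scikit-learn, assuming the flattened arrays from the previous sketch (the cutoff of 25 components comes from the plot described above):

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA on the flattened training images and inspect the cumulative variance
pca = PCA().fit(X_train_flat)
cumulative = np.cumsum(pca.explained_variance_ratio_)
n_for_80 = np.argmax(cumulative >= 0.80) + 1  # around 25 components on this data

# Keep the first 25 components as inputs for the classical methods below
pca_25 = PCA(n_components=25)
X_train_pca = pca_25.fit_transform(X_train_flat)
X_test_pca = pca_25.transform(X_test_flat)
```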

Cumulative percent of variance explained. Photo by the author.

We will apply the principal components in logistic regression, random forest, and support vector machines.

Image classification problems represent just a small subset of classification problems. The most-used image classification methods are deep learning algorithms, one of which is the convolutional neural network. The rest of the employed methods are a small collection of common classification methods. As the class labels are evenly distributed, with no misclassification penalties, we will evaluate the algorithms using the accuracy metric.

CONVOLUTIONAL NEURAL NETWORK (CNN)

The first method we employed was a CNN. As the images are in grayscale, we used only one channel. We selected the following architecture (a Keras sketch follows the list):

  • Two convolutional layers with 32 and 64 filters, 3 × 3 kernel size, and relu activation.

  • Pooling layers operating on 2 × 2 tiles, selecting the maximal element in them (max pooling).

  • Two dense layers, with the first one selecting 128 features, having relu and softmax activation respectively.
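In Keras, the described architecture can be expressed roughly as follows (a sketch; the exact layer configuration of the original notebook may differ):

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),  # max pooling on 2 x 2 tiles
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),    # first dense layer: 128 features
    layers.Dense(10, activation='softmax'),  # class probabilities for 10 items
])
```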

There is nothing special about this architecture. In fact, it is one of the simplest architectures we can use for a CNN. That shows us the true power of this class of methods: getting great results with a benchmark structure.

For the loss function, we chose categorical cross-entropy. To avoid overfitting, we set aside 9400 images from the training set to serve as a validation set for our parameters. We used the adam optimizer, which improves on standard gradient descent methods by using a different learning rate for each parameter, with a batch size of 64. The model was trained for 50 epochs. We present the accuracy and loss values in the graphs below.
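Assuming the model from the previous sketch, the training setup might look like this; the pixel scaling to [0, 1] is our assumption, while the loss, optimizer, batch size, epoch count, and 9400-image validation set come from the text:

```python
from tensorflow.keras.utils import to_categorical

# Reshape to (n, 28, 28, 1), scale to [0, 1] (assumed), one-hot encode labels
X_train_img = X_train.reshape(-1, 28, 28, 1) / 255.0
y_train_cat = to_categorical(y_train, 10)

model.compile(optimizer='adam', loss='categorical_crossentropy',
              metrics=['accuracy'])

# 9400 of the 60000 training images are held out for validation
history = model.fit(X_train_img, y_train_cat, batch_size=64, epochs=50,
                    validation_split=9400 / 60000)
```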

Training accuracy and loss over the epochs. Photos by the author.

We see that the algorithm converged after 15 epochs and that it is not overtrained, so we tested it. The obtained test accuracy was 89%, the best result out of all the methods!

Before proceeding to other methods, let's explain what the convolutional layers have done. An intuitive explanation is that the first layer captures straight lines and the second one captures curves. On both layers we applied max pooling, which selects the maximal value in the kernel, separating clothing parts from blank space. In that way, we capture the representative nature of the data. In other words, neural networks perform feature selection by themselves. After the last pooling layer, we get an ordinary artificial neural network. Because we are dealing with a classification problem, the final layer uses softmax activation to get class probabilities. As the class probabilities follow a certain distribution, the cross-entropy indicates the distance from the network's preferred distribution.

Multinomial Logistic Regression

As pixel values are categorical variables, we can apply Multinomial Logistic Regression. We apply it in a one-vs-rest fashion, training ten binary logistic regression classifiers that we then use to select items. To avoid overtraining, we used L2 regularization. We get 80% accuracy with this algorithm, 9% less than the convolutional neural network. But we have to take into account that this algorithm worked on grayscale images which are centred and normally rotated, with lots of blank space, so it may not work for more complex images.
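A sketch of this setup with scikit-learn, reusing the PCA features from earlier (the solver settings are our own):

```python
from sklearn.linear_model import LogisticRegression

# One-vs-rest logistic regression with L2 regularization: ten binary classifiers
log_reg = LogisticRegression(multi_class='ovr', penalty='l2', max_iter=1000)
log_reg.fit(X_train_pca, y_train)
print(log_reg.score(X_test_pca, y_test))  # around 0.80 reported in the text
```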

Nearest neighbors and centroid algorithms

We used two different nearest-distance algorithms:

  • K-nearest neighbors

  • Nearest Centroid

The nearest centroid algorithm finds the mean value of the elements of each class and assigns a test element to the class whose centroid is nearest. Both algorithms were implemented with respect to the L1 and L2 distances. The accuracy of k-nearest neighbors was 85%, while the centroid algorithm had an accuracy of 67%. These results were obtained for k = 12. The high accuracy of k-nearest neighbors tells us that images belonging to the same class tend to occupy similar places in the image and also to have similar pixel intensities. While nearest neighbors obtained good results, they still perform worse than CNNs, as they don't operate on the neighbourhood of each specific feature, while centroids fail because they don't distinguish between similar-looking objects (e.g. pullover vs. t-shirt/top).
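Both algorithms are available in scikit-learn; a sketch of the L2 (euclidean) variants with the reported k = 12, again on the PCA features:

```python
from sklearn.neighbors import KNeighborsClassifier, NearestCentroid

# k-nearest neighbors; metric='manhattan' would give the L1 variant
knn = KNeighborsClassifier(n_neighbors=12, metric='euclidean')
knn.fit(X_train_pca, y_train)

# Nearest centroid: one mean vector per class
centroid = NearestCentroid(metric='euclidean')
centroid.fit(X_train_pca, y_train)

print(knn.score(X_test_pca, y_test), centroid.score(X_test_pca, y_test))
```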

Random Forest

To select the best parameters for estimation, we performed a grid search over the square root and the full number of features, the Gini and entropy criteria, and trees with maximal depth 5 and 6. The grid search suggested that we should use the square root of the number of features with the entropy criterion (both expected for a classification task). However, the obtained accuracy was only 77%, implying that random forest is not a particularly good method for this task. The reason it failed is that the principal components don't represent the rectangular partitions of an image on which random forests operate. The same reasoning applies to the full-size images as well, as the trees would be too deep and would lose interpretability.
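The described grid search might be sketched as follows (the number of cross-validation folds is our assumption):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Grid over feature subsampling, split criterion, and tree depth, as in the text
param_grid = {
    'max_features': ['sqrt', None],  # square root vs. full number of features
    'criterion': ['gini', 'entropy'],
    'max_depth': [5, 6],
}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=3)
search.fit(X_train_pca, y_train)
print(search.best_params_, search.score(X_test_pca, y_test))
```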

Support Vector Machines (SVM)

We applied SVMs with a radial and a polynomial kernel. The radial kernel achieves 77% accuracy, while the polynomial kernel fails miserably at only 46% accuracy. Although image classification is not their strength, SVMs are still highly useful for other binary classification tasks. Their biggest caveat is that they require feature selection, which brings accuracy down; without it, they can be computationally expensive. Also, they apply multiclass classification in a one-vs-rest fashion, making it harder to efficiently create a separating hyperplane, thus losing value when working with non-binary classification tasks.
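A sketch of the two kernels with scikit-learn, on the PCA features (the polynomial degree of 3 is our assumption):

```python
from sklearn.svm import SVC

# Radial-basis and polynomial kernels
svm_rbf = SVC(kernel='rbf').fit(X_train_pca, y_train)
svm_poly = SVC(kernel='poly', degree=3).fit(X_train_pca, y_train)

print(svm_rbf.score(X_test_pca, y_test))   # ~0.77 reported in the text
print(svm_poly.score(X_test_pca, y_test))  # ~0.46 reported in the text
```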

Conclusions

In this article, we applied various classification methods to an image classification problem. We have explained why CNNs are the best method we can employ out of those considered, and why the other methods fail. Some of the reasons why CNNs are the most practical and usually the most accurate method are:

  • They can transfer learning through layers, saving inferences and making new ones on subsequent layers.

  • There is no need for feature extraction before using the algorithm; it is done during training.

  • They recognize important features.

However, they also have their caveats. They are known to fail on images that are rotated or scaled differently, which is not the case here, as the data was pre-processed. And although the other methods fail to give equally good results on this dataset, they are still used for other tasks related to image processing (sharpening, smoothing etc.).

Code: https://github.com/radenjezic153/Stat_ML/blob/master/project.ipynb

Translated from: https://towardsdatascience.com/image-classification-with-fashion-mnist-why-convolutional-neural-networks-outperform-traditional-df531e0533c2
