[Solution] Programming Assignment ex3: Multi-class Classification and Neural Networks (Machine Learning)

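A quick rant: this one is a little hard, but everything can be worked out. Every part felt worth writing up, so I wrote them all and included my reasoning along the way. If you have a better approach, feel free to leave a comment, haha.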

Problem:

Download the programming assignment here.

This ZIP file contains the instructions in a PDF and the starter code. You may use either MATLAB or Octave (>= 3.8.0). To submit this assignment, call the included submit function from MATLAB / Octave. You will need to enter the token provided on the right-hand side of this page.

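lrCostFunction, my solution:

The PDF gives two hints here. First, the result of a vectorized computation can be sanity-checked by looking at its dimensions with size. Second, theta(2:end) slices off the first element, and .^2 then squares the rest element-wise. The point I think needs the most care is that theta0 (theta(1) in the code) is not affected by lambda, so in both J and grad the regularization term must start from theta1. The starter code hints at this as well.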
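The two hints can be tried out on a toy parameter vector before touching the real data. The sketch below is only an illustration: the values of theta, lambda and m are made up, and the names penalty_slice, penalty_temp and grad_reg are mine. It suggests that the theta(2:end) slice and the temp-vector trick from the starter code comments give the same penalty, and that the bias entry of the gradient is left untouched.

```octave
% Toy values, chosen only for illustration (not from the assignment data).
theta  = [5; 1; -2];   % theta(1) is the bias term (theta0 in the notes)
lambda = 3;
m      = 4;

% Hint 1: check dimensions with size. theta is a 3 x 1 column vector.
disp(size(theta));

% Hint 2: slice off the bias term and square element-wise.
penalty_slice = lambda / (2 * m) * sum(theta(2:end) .^ 2);

% Same penalty via the temp-vector trick from the starter code comments.
temp = theta;
temp(1) = 0;
penalty_temp = lambda / (2 * m) * (temp' * temp);

% Regularization part of the gradient: the first entry stays zero.
grad_reg = lambda / m * temp;
disp(grad_reg');
```

Either form works; the temp vector is convenient because the same variable can be reused for both J and grad.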

function [J, grad] = lrCostFunction(theta, X, y, lambda)

%LRCOSTFUNCTION Compute cost and gradient for logistic regression with

%regularization

% J = LRCOSTFUNCTION(theta, X, y, lambda) computes the cost of using

% theta as the parameter for regularized logistic regression and the

% gradient of the cost w.r.t. to the parameters.

% Initialize some useful values

m = length(y); % number of training examples

% You need to return the following variables correctly

J = 0;

grad = zeros(size(theta));

% ====================== YOUR CODE HERE ======================

% Instructions: Compute the cost of a particular choice of theta.

% You should set J to the cost.

% Compute the partial derivatives and set grad to the partial

% derivatives of the cost w.r.t. each parameter in theta

% Hint: The computation of the cost function and gradients can be

% efficiently vectorized. For example, consider the computation

% sigmoid(X * theta)

% Each row of the resulting matrix will contain the value of the

% prediction for that example. You can make use of this to vectorize

% the cost function and gradient computations.

% Hint: When computing the gradient of the regularized cost function,

% there're many possible vectorized solutions, but one solution

% looks like:

% grad = (unregularized gradient for logistic regression)

% temp = theta;

% temp(1) = 0; % because we don't add anything for j = 0

% grad = grad + YOUR_CODE_HERE (using the temp variable)

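% Hypothesis for all examples at once (m x 1)

h = sigmoid(X * theta);

% Cost: the penalty skips theta(1), the bias term

J = 1/m * (-y'*log(h) - (1-y)'*log(1-h)) + lambda/(2*m) * sum(theta(2:end).^2);

% Unregularized gradient, reusing h instead of recomputing the sigmoid

grad = 1/m * X' * (h - y);

% Zero the bias entry so that it is not regularized

temp = theta;

temp(1) = 0;

grad = grad + lambda/m * temp;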

% =============================================================

grad = grad(:);

end

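oneVsAll, my solution:

I didn't fully understand this function at first, but rereading the definition of one-vs-all in my notes helped. h^(i)(x) is the probability that x belongs to class i, and the value of i that maximizes h^(i)(x) is the predicted class. So each h^(i)(x) has its own theta, and num_labels classifiers need num_labels sets of theta. The comment in the code says: ONEVSALL trains multiple logistic regression classifiers and returns all the classifiers in a matrix all_theta, where the i-th row of all_theta corresponds to the classifier for label i. In other words, the i-th theta goes into the i-th row of all_theta, so the column vector returned by fmincg is transposed before being stored. Running the snippet from the tips in the PDF shows that it returns a matrix with the same dimensions as a, made up only of 0s and 1s that stand for false and true. So the c in y == c is just a scalar, not a vector.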
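The behavior of y == c and the row layout of all_theta can be checked on a few made-up labels. This is a minimal sketch under assumptions: the labels, the feature count n and the theta values are hypothetical stand-ins, and no optimizer is called.

```octave
% Toy labels, made up for illustration.
y = [1; 3; 2; 3];
c = 3;                 % a single class label (a scalar)

% Comparing a vector with a scalar gives a logical vector of the same size.
is_c = (y == c);       % true where the label equals c
disp(size(is_c));

% Each trained theta is a column vector; transpose it to fill row c.
n = 2;                         % hypothetical number of features
all_theta = zeros(3, n + 1);   % one row per class
theta = [0.1; 0.2; 0.3];       % stand-in for what fmincg would return
all_theta(c, :) = theta';
disp(all_theta);
```

The rows for classes 1 and 2 stay zero here because only class 3 was filled in; in the real function the loop fills every row.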

function [all_theta] = oneVsAll(X, y, num_labels, lambda)

%ONEVSALL trains multiple logistic regression classifiers and returns all

%the classifiers in a matrix all_theta, where the i-th row of all_theta

%corresponds to the classifier for label i

% [all_theta] = ONEVSALL(X, y, num_labels, lambda) trains num_labels

% logistic regression classifiers and returns each of these classifiers

% in a matrix all_theta, where the i-th row of all_theta corresponds

% to the classifier for label i

% Some useful variables

m = size(X, 1);

n = size(X, 2);

% You need to return the following variables correctly

all_theta = zeros(num_labels, n + 1);

% Add ones to the X data matrix

X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================

% Instructions: You should complete the following code to train num_labels

% logistic regression classifiers with regularization

% parameter lambda.

% Hint: theta(:) will return a column vector.

% Hint: You can use y == c to obtain a vector of 1's and 0's that tell you

% whether the ground truth is true/false for this class.

% Note: For this assignment, we recommend using fmincg to optimize the cost

% function. It is okay to use a for-loop (for c = 1:num_labels) to

% loop over the different classes.

% fmincg works similarly to fminunc, but is more efficient when we

% are dealing with large number of parameters.

% Example Code for fmincg:

% % Set Initial theta

% initial_theta = zeros(n + 1, 1);

% % Set options for fminunc

% options = optimset('GradObj', 'on', 'MaxIter', 50);

% % Run fmincg to obtain the optimal theta

% % This function will return theta and the cost

% [theta] = ...

% fmincg (@(t)(lrCostFunction(t, X, (y == c), lambda)), ...

% initial_theta, options);

for c = 1:num_labels

  % Set initial theta

  initial_theta = zeros(n + 1, 1);

  % Set options for fmincg

  options = optimset('GradObj', 'on', 'MaxIter', 50);

  % Run fmincg to obtain the optimal theta for class c

  [theta] = fmincg(@(t)(lrCostFunction(t, X, (y == c), lambda)), initial_theta, options);

  % Store theta (a column vector) as the c-th row of all_theta

  all_theta(c, :) = theta';

end

% =========================================================================

end

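predictOneVsAll, my solution:

At first the description looked complicated, and the instructions mention "from 1 to num_labels", so I tried to do this with a for loop. That didn't work out and felt far too tedious. I then looked up what max(A, [], 2) means: it takes the maximum of each row (https://www.cnblogs.com/liuxjie/p/12024942.html). So the idea changes to building some matrix and taking the maximum of each row. Looking at the dimensions, all_theta is num_labels * (n+1), X is m * (n+1), and the return value p is m * 1, so the intermediate matrix A is naturally g(X*all_theta'), which is m * num_labels.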
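The dimension argument and the two-output form of max can be checked on a small made-up example. The sketch below assumes nothing beyond the built-in max; the matrices X_toy, theta_toy and A are hypothetical and only there to show the shapes and the returned indices.

```octave
% Shape check: m = 3 examples, n + 1 = 5 columns, 4 classes (all made up).
X_toy     = ones(3, 5);
theta_toy = zeros(4, 5);
scores    = X_toy * theta_toy';    % (m x (n+1)) * ((n+1) x K) = m x K
disp(size(scores));

% Toy score matrix: one row per example, one column per class.
A = [0.1 0.7 0.2 0.0;
     0.9 0.3 0.1 0.2;
     0.2 0.1 0.3 0.8];

% Row-wise maximum. The second output is the column index of each maximum,
% which is the predicted label when column k belongs to class k.
[best, p] = max(A, [], 2);
disp(p');
```

Because the index comes back directly as the second output, no loop over classes is needed.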

function p = predictOneVsAll(all_theta, X)

%PREDICT Predict the label for a trained one-vs-all classifier. The labels

%are in the range 1..K, where K = size(all_theta, 1).

% p = PREDICTONEVSALL(all_theta, X) will return a vector of predictions

% for each example in the matrix X. Note that X contains the examples in

% rows. all_theta is a matrix where the i-th row is a trained logistic

% regression theta vector for the i-th class. You should set p to a vector

% of values from 1..K (e.g., p = [1; 3; 1; 2] predicts classes 1, 3, 1, 2

% for 4 examples)

m = size(X, 1);

num_labels = size(all_theta, 1);

% You need to return the following variables correctly

p = zeros(size(X, 1), 1);

% Add ones to the X data matrix

X = [ones(m, 1) X];

% ====================== YOUR CODE HERE ======================

% Instructions: Complete the following code to make predictions using

% your learned logistic regression parameters (one-vs-all).

% You should set p to a vector of predictions (from 1 to

% num_labels).

% Hint: This code can be done all vectorized using the max function.

% In particular, the max function can also return the index of the

% max element, for more information see 'help max'. If your examples

% are in rows, then, you can use max(A, [], 2) to obtain the max

% for each row.

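% A(i, k) is the predicted probability that example i belongs to class k

A = sigmoid(X * all_theta');

% The column index of each row's maximum is the predicted label

[~, p] = max(A, [], 2);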

% =========================================================================

end

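predict, my solution:

Working through the dimensions shows that this is simply how it is done. One thing to note: I first thought Octave could not hold the layers in a single multi-dimensional matrix. Octave does support N-dimensional arrays, but the layers have different numbers of columns and so cannot be stacked into one array. Writing them as separate variables A1, A2, A3 is the simple way out.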
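If keeping the layers in one variable is preferred, a cell array is one option, since each cell can hold a matrix of a different size. The following is a minimal sketch on a tiny hypothetical network (2 inputs, 3 hidden units, 2 classes) with made-up weights, not the assignment's trained Theta1 and Theta2. The sigmoid here is defined inline as the standard logistic function, so the snippet does not depend on the course files.

```octave
% Tiny hypothetical network: 2 inputs, 3 hidden units, 2 output classes.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % standard logistic function
X      = [0 0; 1 1; 2 -1];           % m = 3 examples, values made up
Theta1 = zeros(3, 3);                % hidden units x (inputs + 1)
Theta2 = [0 1 1 1; 0 -1 -1 -1];      % classes x (hidden units + 1)
m = size(X, 1);

% The layers have different widths, so a cell array holds them together.
A = cell(1, 3);
A{1} = [ones(m, 1) X];                          % m x 3, bias column added
A{2} = [ones(m, 1) sigmoid(A{1} * Theta1')];    % m x 4, bias column added
A{3} = sigmoid(A{2} * Theta2');                 % m x 2, one column per class

% The column index of each row's maximum is the predicted label.
[~, p] = max(A{3}, [], 2);
disp(p');
```

With only three layers, separate variables read just as well; the cell array mainly pays off when the number of layers varies.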

function p = predict(Theta1, Theta2, X)

%PREDICT Predict the label of an input given a trained neural network

% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the

% trained weights of a neural network (Theta1, Theta2)

% Useful values

m = size(X, 1);

num_labels = size(Theta2, 1);

% You need to return the following variables correctly

p = zeros(size(X, 1), 1);

% ====================== YOUR CODE HERE ======================

% Instructions: Complete the following code to make predictions using

% your learned neural network. You should set p to a

% vector containing labels between 1 to num_labels.

% Hint: The max function might come in useful. In particular, the max

% function can also return the index of the max element, for more

% information see 'help max'. If your examples are in rows, then, you

% can use max(A, [], 2) to obtain the max for each row.

% Add ones to the X data matrix

X = [ones(m, 1) X];

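% Input layer, with the bias column added above

A1 = X;

% Hidden layer activations, with a bias column prepended

A2 = [ones(m, 1) sigmoid(A1 * Theta1')];

% Output layer: one column per class

A3 = sigmoid(A2 * Theta2');

% The column index of each row's maximum is the predicted label

[~, p] = max(A3, [], 2);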

% =========================================================================

end
