
Machine Learning Programming Exercise 4: Neural Networks (Python)

This exercise was implemented with Python 3, Anaconda3 (64-bit), and Jupyter Notebook. It builds on the assignments from the 深度之眼 "Machine Learning Bootcamp"; a few pieces of code have been modified. It is shared for learning and discussion purposes.


Load the data. The file ex4data1.mat contains 5,000 handwritten-digit examples; each example is a 20×20 grayscale image flattened into a 400-dimensional row of X, and the labels in y run from 1 to 10 (with 10 standing for the digit 0).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.io import loadmat

data = loadmat('ex4data1.mat')
data, data['X'].shape, data['y'].shape
           

Create some useful variables.

X = data['X']
y = data['y']

X.shape, y.shape  # check the dimensions
           

Out[3]:

((5000, 400), (5000, 1))
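The notebook imports matplotlib but never uses it. As an optional aside (not part of the original assignment), a quick way to see what the data looks like is to reshape a few rows of X back into 20×20 images; order='F' assumes the MATLAB-style column-major layout, so drop it if the digits appear transposed.

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, idx in zip(axes, np.random.choice(X.shape[0], 5, replace=False)):
    ax.imshow(X[idx].reshape(20, 20, order='F'), cmap='gray')  # one 400-pixel row back to a 20x20 image
    ax.set_title(int(y[idx, 0]))  # the label (10 stands for the digit 0)
    ax.axis('off')
plt.show()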
           

Apply one-hot encoding to our y labels. One-hot encoding turns a class label n (out of k classes) into a vector of length k in which index n is "hot" (1) and the remaining entries are 0. Scikit-learn has a built-in utility we can use for this. Note that in this dataset the digit 0 is labeled 10, so (as the output below shows) y[0] = 10 maps to a vector whose last entry is 1.

from sklearn.preprocessing import OneHotEncoder

# OneHotEncoder turns each class label into an indicator vector.
# sparse controls the output format; it defaults to True (a sparse matrix),
# and with sparse=False there is no need to call toarray().
encoder = OneHotEncoder(sparse=False)
y_onehot = encoder.fit_transform(y)  # encode y (each row has a single 1 in the column matching the digit)
y_onehot.shape
           
y[0], y_onehot[0,:]
           

Out[5]:

(array([10], dtype=uint8), array([0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]))
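Version note, not part of the original assignment: in newer scikit-learn releases the sparse keyword of OneHotEncoder has been replaced by sparse_output, so on those versions the encoder would be created roughly as follows.

from sklearn.preprocessing import OneHotEncoder
encoder = OneHotEncoder(sparse_output=False)  # newer scikit-learn; older releases use sparse=False
y_onehot = encoder.fit_transform(y)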
           

Sigmoid function

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
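As a quick sanity check (not in the original notebook): sigmoid(0) should be 0.5, and because it is written with numpy operations it broadcasts over arrays.

print(sigmoid(0))                       # 0.5
print(sigmoid(np.array([-10, 0, 10])))  # roughly [0., 0.5, 1.]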
           

Forward propagation function

Input layer (400 + 1) -> hidden layer (25 + 1) -> output layer (10)
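For reference, using the row-vector convention of the code below (each example is a row, and the leading 1s are the bias units), the feedforward equations being implemented are:

$$a^{(1)} = [1,\ x],\quad z^{(2)} = a^{(1)}\Theta_1^T,\quad a^{(2)} = [1,\ g(z^{(2)})],\quad z^{(3)} = a^{(2)}\Theta_2^T,\quad h_\theta(x) = g(z^{(3)})$$

where g is the sigmoid function defined above.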

def forward_propagate(X, theta1, theta2):
    # INPUT: parameter matrices theta1, theta2 and the data X
    # OUTPUT: the forward-propagation results for the current parameter values

    # STEP 1: get the number of samples
    m = X.shape[0]

    # STEP 2: forward propagation through the network
    a1 = np.insert(X, 0, 1, axis=1)            # prepend a column of ones (bias) to X
    z2 = a1 * theta1.T
    a2 = np.insert(sigmoid(z2), 0, 1, axis=1)  # prepend the bias column again
    z3 = a2 * theta2.T
    h = sigmoid(z3)

    return a1, z2, a2, z3, h
           

Cost function
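The comments in the code below refer to "the formula above"; for completeness, the regularized cost being computed is the standard neural-network cost from the assignment:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K}\Big[-y_k^{(i)}\log\big(h_\theta(x^{(i)})_k\big) - \big(1-y_k^{(i)}\big)\log\big(1-h_\theta(x^{(i)})_k\big)\Big] + \frac{\lambda}{2m}\Big(\sum_{j,k}\big(\Theta^{(1)}_{j,k}\big)^2 + \sum_{j,k}\big(\Theta^{(2)}_{j,k}\big)^2\Big)$$

where K is the number of classes and the regularization sums skip the bias columns.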

def cost(params, input_size, hidden_size, theta1, theta2, h, num_labels, X, y, lamda):
    # INPUT: network parameters, layer sizes, forward-propagation output h,
    #        training data and labels, regularization parameter
    # OUTPUT: the cost for the current parameter values

    # STEP 1: get the number of samples
    m = X.shape[0]

    # STEP 2: convert X and y to numpy matrices
    X = np.matrix(X)
    y = np.matrix(y)

    # STEP 3: (alternative) unroll theta1 and theta2 from params
    # theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
    # theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))

    # STEP 4: (alternative) run forward propagation here instead of passing h in
    # a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)

    # STEP 5: initialize the cost
    J = 0

    # STEP 6: compute the cost
    # Method 1: loop over every sample
    # for i in range(m):
    #     first_term = -y[i,:] * np.log(h[i,:]).T
    #     second_term = (1 - y[i,:]) * np.log(1 - h[i,:]).T
    #     J += np.sum(first_term - second_term)
    # J = J / m

    # Method 2: vectorized, using element-wise multiplication
    J1 = -np.multiply(y, np.log(h)) - np.multiply((1 - y), np.log(1 - h))
    J = J1.sum() / m

    # STEP 7: add the regularization term (the bias columns are excluded)
    J += (np.power(theta1[:,1:], 2).sum() + np.power(theta2[:,1:], 2).sum()) * lamda / (2 * m)

    return J
           

Initial setup. The parameters are initialized randomly with small values in [-0.125, 0.125) to break the symmetry between hidden units.

input_size = 400
hidden_size = 25
num_labels = 10
lamda = 1

# randomly initialize a parameter array sized for the full network
# np.random.random() draws floats uniformly from [0, 1)
params = (np.random.random(size=hidden_size * (input_size + 1) + num_labels * (hidden_size + 1)) - 0.5) * 0.25
print('params:', params, len(params))

m = X.shape[0]
X = np.matrix(X)
y = np.matrix(y)
print('m:', m)
print('X:', X, type(X))
print('y:', y, type(y))

# unroll the parameter array into parameter matrices for each layer
theta1 = np.matrix(np.reshape(params[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
theta2 = np.matrix(np.reshape(params[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))
print('theta1:', theta1)
print('theta2:', theta2)
theta1.shape, theta2.shape
           

The output should be: ((25, 401), (10, 26)), i.e. (hidden_size, input_size + 1) and (num_labels, hidden_size + 1).

a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
print('a1:', a1)
print('z2:', z2)
print('a2:', a2)
print('z3:', z3)
print('h:', h)
a1.shape, z2.shape, a2.shape, z3.shape, h.shape
           

The output should be: ((5000, 401), (5000, 25), (5000, 26), (5000, 10), (5000, 10))

Apply the cost function to compute the total error between y and h.

j = cost(params, input_size, hidden_size, theta1, theta2, h, num_labels, X, y_onehot, lamda)
print('j:', j)
           

The output should be roughly: 7.1170579556373621 (the exact value varies with the random initialization).

Next comes the backpropagation algorithm. The backpropagation parameter updates reduce the network's error on the training data. The first thing we need is a function that computes the gradient of the sigmoid function we created earlier.
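For the sigmoid g(z) = 1 / (1 + e^{-z}), the gradient has the convenient closed form used in the function below:

$$g'(z) = \frac{d}{dz}\,\frac{1}{1+e^{-z}} = \frac{e^{-z}}{(1+e^{-z})^2} = g(z)\big(1-g(z)\big)$$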

def sigmoid_gradient(z):
    return np.multiply(sigmoid(z), (1 - sigmoid(z)))
           

Now implement backpropagation to compute the gradients. Since the computations required for backpropagation largely overlap with those in the cost function, we extend the cost function so that it performs backpropagation and returns both the cost and the gradient.
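For reference, written per-example with column vectors as in the assignment PDF, the quantities the code accumulates are the standard backpropagation deltas:

$$\delta^{(3)} = a^{(3)} - y,\qquad \delta^{(2)} = \big(\Theta^{(2)}\big)^T\delta^{(3)} \odot g'\big(z^{(2)}\big)\ \text{(bias row removed)},\qquad \Delta^{(l)} \mathrel{+}= \delta^{(l+1)}\big(a^{(l)}\big)^T$$

and the final gradients are $\frac{1}{m}\Delta^{(l)}$ plus $\frac{\lambda}{m}\Theta^{(l)}$ on the non-bias columns. The vectorized "method 2" below computes the same thing for all examples at once.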

def backprop(params, input_size, hidden_size, num_labels, X, y, lamda):
    # INPUT: network parameters, layer sizes, training data and labels, regularization parameter
    # OUTPUT: the cost and the gradient for the current parameter values

    # STEP 1: get the number of samples (5000)
    m = X.shape[0]

    # STEP 2: convert X and y to numpy matrices
    X = np.matrix(X)
    y = np.matrix(y)

    # STEP 3: unroll the parameters: theta1 (25, 401), theta2 (10, 26)
    theta1 = np.matrix(np.reshape(params[:(input_size + 1) * hidden_size], (hidden_size, (input_size + 1))))
    theta2 = np.matrix(np.reshape(params[(input_size + 1) * hidden_size:], (num_labels, (hidden_size + 1))))

    # STEP 4: run forward propagation
    a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)

    # STEP 5: initialize
    J = 0
    delta1 = np.zeros(theta1.shape)  # (25, 401)
    delta2 = np.zeros(theta2.shape)  # (10, 26)

    # STEP 6: compute the cost (reuse the cost function)
    J = cost(params, input_size, hidden_size, theta1, theta2, h, num_labels, X, y, lamda)

    # STEP 7: backpropagation (see page 9 of the original assignment PDF)
    # Method 1: loop over the samples and accumulate the deltas
    # for t in range(m):
    #     a1t = a1[t,:]  # (1, 401)
    #     z2t = z2[t,:]  # (1, 25)
    #     a2t = a2[t,:]  # (1, 26)
    #     ht = h[t,:]    # (1, 10)
    #     yt = y[t,:]    # (1, 10)
    #     d3t = ht - yt                                                 # (1, 10)
    #     z2t = np.insert(z2t, 0, 1, axis=1)                            # (1, 26)
    #     d2t = np.multiply(theta2.T * d3t.T, sigmoid_gradient(z2t).T)  # (26, 1)
    #     delta2 += np.matrix(d3t).T * np.matrix(a2t)                   # (10, 26)
    #     delta1 += np.matrix(d2t[1:]) * np.matrix(a1t)                 # (25, 401)

    # Method 2: fully vectorized, which is much faster
    z2 = np.insert(z2, 0, 1, axis=1)
    d3 = h - y
    delta1 = (np.multiply(theta2.T * d3.T, sigmoid_gradient(z2).T))[1:,:] * a1
    delta2 = d3.T * a2

    delta1 = delta1 / m
    delta2 = delta2 / m

    # STEP 8: add regularization (the bias columns are not regularized)
    delta1[:,1:] = delta1[:,1:] + (theta1[:,1:] * lamda) / m
    delta2[:,1:] = delta2[:,1:] + (theta2[:,1:] * lamda) / m

    # STEP 9: unroll the gradient matrices into a single array
    grad = np.concatenate((np.ravel(delta1), np.ravel(delta2)))

    return J, grad
           

Let's test it to make sure the function returns what we expect.

J, grad = backprop(params, input_size, hidden_size, num_labels, X, y_onehot, lamda)
J, grad.shape
           

The output should be roughly: (7.3264766401720607, (10285,)) (again, the cost depends on the random initialization; the gradient vector always has 10285 = 25*401 + 10*26 entries).
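A stronger test than checking shapes is numerical gradient checking. Here is a minimal sketch (not part of the original notebook; it assumes the variables defined above): it perturbs a few randomly chosen parameters and compares the finite-difference slope of the cost with the corresponding backprop gradient entries.

def gradient_check(params, grad, num_checks=5, eps=1e-4):
    # compare a few entries of the analytic gradient with two-sided finite differences
    for i in np.random.choice(len(params), num_checks, replace=False):
        plus, minus = params.copy(), params.copy()
        plus[i] += eps
        minus[i] -= eps
        J_plus, _ = backprop(plus, input_size, hidden_size, num_labels, X, y_onehot, lamda)
        J_minus, _ = backprop(minus, input_size, hidden_size, num_labels, X, y_onehot, lamda)
        numeric = (J_plus - J_minus) / (2 * eps)
        print(i, numeric, grad[i])  # the two values should agree to several decimal places

gradient_check(params, grad)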

We are finally ready to train our network and use it to make predictions. This is roughly similar to the earlier exercise on multi-class logistic regression.

from scipy.optimize import minimize

# minimize the objective function
# fun is the objective to optimize; x0 is the starting point (theta);
# args are the extra arguments passed to fun; jac=True tells minimize that fun also returns the gradient
fmin = minimize(fun=backprop, x0=params,
                args=(input_size, hidden_size, num_labels, X, y_onehot, lamda),
                method='TNC', jac=True, options={'maxiter': 250})
fmin
           

The output should look something like: fun: 0.34648873971573946 (the value varies from run to run).

Let's take the parameters it found and run forward propagation through the network to obtain predictions. Because the output layer's columns correspond to the labels 1 through 10 in order, np.argmax(h, axis=1) + 1 turns the index of the most probable column back into a class label.

X = np.matrix(X)
theta1 = np.matrix(np.reshape(fmin.x[:hidden_size * (input_size + 1)], (hidden_size, (input_size + 1))))
theta2 = np.matrix(np.reshape(fmin.x[hidden_size * (input_size + 1):], (num_labels, (hidden_size + 1))))

a1, z2, a2, z3, h = forward_propagate(X, theta1, theta2)
y_pred = np.array(np.argmax(h, axis=1) + 1)
y_pred
           

Out[16]:

array([[10], [10], [10], ..., [ 9], [ 9], [ 9]], dtype=int64)
           

Finally, we can compute the accuracy to see how well our trained neural network performs.

correct = [1 if a == b else 0 for (a, b) in zip(y_pred, y)]
accuracy = sum(map(int, correct)) / float(len(correct))
print('accuracy = {0}%'.format(accuracy * 100))
           

Output: accuracy = 99.32%
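Note that this accuracy is measured on the same data the network was trained on, so it is an optimistic estimate of how the model would do on new digits. As an optional extra check (not part of the original notebook), scikit-learn's classification_report gives a per-class breakdown, assuming y and y_pred are defined as above:

from sklearn.metrics import classification_report
print(classification_report(np.array(y).ravel(), y_pred.ravel()))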