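Implement a convolutional neural network with TensorFlow. The resulting architecture is shown in the figure below. The input passes through two convolution operations, and each convolution is followed by one pooling step that reduces the spatial dimensions.
Process overview:
First, the source data is convolved with the first kernel, [5,5,1,32]: kernel size 5*5, 1 input channel, stride 1. With zero padding this produces 32 feature maps of 28*28, which are then passed through the ReLU non-linear activation function.
Pooling stage: the first pooling uses ksize [1,2,2,1], i.e. a 2*2 window with stride 2 and the max method, giving 32 feature maps of 14*14.
Then the second convolution, [5,5,32,64]: kernel size 5*5, 32 input channels, stride 1. This produces 64 feature maps of 14*14, again followed by the ReLU non-linear activation function.
Pooling stage: the second pooling uses ksize [1,2,2,1], i.e. a 2*2 window with stride 2 and the max method, giving 64 feature maps of 7*7.
Finally, a fully connected network computes the result, which is classified with softmax.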
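The feature-map sizes in the steps above can be checked with a few lines of plain Python. This is only a sketch of TensorFlow's documented output-size rules for 'SAME' and 'VALID' padding; the helper names `output_size` and `trace_shapes` are hypothetical and are not part of TensorFlow or of the script below.

```python
import math


def output_size(n_in, n_window, stride, padding):
    """Output length along one spatial axis for a convolution or pooling window."""
    if padding == "SAME":
        # SAME: the input is zero-padded so that output = ceil(input / stride)
        return math.ceil(n_in / stride)
    if padding == "VALID":
        # VALID: no padding, only windows that fit entirely inside the input
        return (n_in - n_window) // stride + 1
    raise ValueError("padding must be 'SAME' or 'VALID'")


def trace_shapes(conv_padding, size=28):
    """Follow a square input through conv(5x5, stride 1) + max-pool(2x2, stride 2), twice."""
    shapes = []
    for out_channels in (32, 64):
        size = output_size(size, 5, 1, conv_padding)
        shapes.append(("conv", size, out_channels))
        size = output_size(size, 2, 2, "VALID")
        shapes.append(("pool", size, out_channels))
    return shapes


print(trace_shapes("SAME"))
print(trace_shapes("VALID"))
```

With 'SAME' convolutions the final feature maps are 7*7, which matches the 7*7*64 flatten size used in the model code. With 'VALID' convolutions they would be 4*4 instead.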
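# CNNs are powerful for image processing because of two properties:
# (1) Local invariance: pooling makes the output largely insensitive to small translations, rotations, and scalings of the image
# (2) Compositionality: each filter captures a local patch of lower-level features, and these are composed into higher-level representations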
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets(".", one_hot=True)
import tensorflow as tf
# Parameters
learning_rate = 0.001
training_epochs = 30
batch_size = 100
display_step = 1
# Network Parameters
n_input = 784 # MNIST data input (img shape: 28*28)
n_classes = 10 # MNIST total classes (0-9 digits)
# tf Graph input
x = tf.placeholder("float", [None, n_input])
y = tf.placeholder("float", [None, n_classes])
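# Narrow vs. wide CNN
# Wide CNN: the borders are zero-padded, so the kernel can also be applied to the border elements.
#   Output size (stride 1): n_{out} = (n_{in} + 2*n_{padding} - n_{filter}) + 1
# Narrow CNN: no zero padding. Output size (stride 1): n_{out} = n_{in} - n_{filter} + 1
# Note: the 7*7*64 flatten size used in the model below requires zero-padded (wide, 'SAME') convolutions;
#   with narrow ('VALID') 5*5 convolutions the final feature maps would be 4*4.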
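# tf.nn.conv2d: given a 4-D input and a 4-D filter, computes a 2-D convolution over the height and width axes.
# The function is defined as:
#   def conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None, data_format=None, name=None):
# The leading parameters input, filter, strides, padding, use_cudnn_on_gpu, ... are explained one by one below.
# input: the data to convolve. It must be a tensor of shape [batch, in_height, in_width, in_channels],
#   i.e. [number of images in one training batch, image height, image width, number of image channels].
# filter: the convolution kernel (W in the code below). It must have shape
#   [filter_height, filter_width, in_channels, out_channels], i.e. kernel height, kernel width,
#   number of input channels, number of output channels. in_channels must equal the in_channels of input;
#   out_channels is the number of kernels, which is also the number of feature maps produced (reference 2).
#   Example: take the simplest case, a 3×3 single-channel image (shape [1,3,3,1]) convolved with a
#   1×1 kernel (shape [1,1,1,1]). The result is one 3×3 feature map.
#   With a kernel of shape [2,2,1,5] and no padding, the result is 5 feature maps of 2*2.
# strides: a list of length 4 giving how far the convolution window slides over input in each dimension.
#   With the default NHWC layout, strides[0] and strides[3] must be 1; strides[1] is the vertical (height) step
#   and strides[2] is the horizontal (width) step. Here every stride is set to 1.
# padding: either SAME or VALID. It controls whether the border positions, where the kernel only partially
#   overlaps the image, are kept. With SAME they are kept by zero-padding: for a 5*5 input and a kernel of
#   shape [3,3,1,1], padding='SAME' gives a 5*5 output feature map (wide CNN). With padding='VALID'
#   the output feature map is 3*3 (narrow CNN).
# use_cudnn_on_gpu: whether to use cuDNN acceleration. Defaults to True.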
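def conv2d(x, W):
    # 'SAME' zero-pads the input so the spatial size is preserved, which the 7*7*64 flatten size below relies on
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')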
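The 5*5 input with a 3*3 kernel from the padding comment above can be reproduced without TensorFlow. The following is a minimal sketch for a single channel and stride 1. `conv2d_single` is a hypothetical helper written for illustration only, and it assumes an odd kernel size for the 'SAME' case. Like `tf.nn.conv2d`, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d_single(image, kernel, padding="VALID"):
    """Stride-1 2-D cross-correlation of one single-channel image with one kernel."""
    kh, kw = len(kernel), len(kernel[0])
    if padding == "SAME":
        # Zero-pad each border so the output keeps the input's height and width
        ph, pw = kh // 2, kw // 2
        padded_width = len(image[0]) + 2 * pw
        padded = [[0] * padded_width for _ in range(ph)]
        padded += [[0] * pw + list(row) + [0] * pw for row in image]
        padded += [[0] * padded_width for _ in range(ph)]
        image = padded
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [[1] * 5 for _ in range(5)]    # 5x5 input filled with ones
kernel = [[1] * 3 for _ in range(3)]   # 3x3 kernel filled with ones

valid_out = conv2d_single(image, kernel, "VALID")
same_out = conv2d_single(image, kernel, "SAME")
print(len(valid_out), len(valid_out[0]))  # 3 3
print(len(same_out), len(same_out[0]))    # 5 5
```

In the 'SAME' result the corner values are smaller than the center values because part of the window covers padded zeros.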
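# Pooling: subsamples each feature map, most commonly by taking the max
# Purpose: (1) It can provide a fixed-size output matrix. For example, with 1000 kernels and max pooling over
#   each whole feature map, the output is 1000-dimensional regardless of the kernel size and the input size
# (2) It reduces dimensionality while trying to keep the most salient features
# Pooling methods
# tf.nn.avg_pool: average pooling
# tf.nn.max_pool: max pooling
# tf.nn.max_pool_with_argmax: max pooling that returns a tuple (output, argmax), i.e. the max values and their indices
# tf.nn.avg_pool3d: 3-D average pooling
# tf.nn.max_pool3d: 3-D max pooling
# tf.nn.fractional_avg_pool
# tf.nn.fractional_max_pool
# ksize: the size of the pooling window; here it is a 2-D window of height 2 and width 2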
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                          strides=[1, 2, 2, 1], padding='VALID')
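To make the effect of a 2*2 window with stride 2 concrete, here is a small pure-Python sketch of max pooling on a single 4*4 feature map. `max_pool_2x2_list` is a hypothetical stand-in used only for illustration; the script itself uses `tf.nn.max_pool`.

```python
def max_pool_2x2_list(feature_map):
    """2x2 max pooling with stride 2 and no padding on a 2-D list."""
    out_h = len(feature_map) // 2
    out_w = len(feature_map[0]) // 2
    return [
        [
            max(
                feature_map[2 * i][2 * j], feature_map[2 * i][2 * j + 1],
                feature_map[2 * i + 1][2 * j], feature_map[2 * i + 1][2 * j + 1],
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 4, 8],
]
pooled = max_pool_2x2_list(feature_map)
print(pooled)  # [[4, 5], [6, 8]]
```

Each output value is the maximum of one non-overlapping 2*2 block, so both height and width are halved.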
# Create model
def multilayer_perceptron(x, weights, biases):
    # now, we want to change this to a CNN network
    # first reshape the data to 4-D
    x_image = tf.reshape(x, [-1, 28, 28, 1])
    # then apply cnn layers
    # ReLU is the non-linear activation function
    h_conv1 = tf.nn.relu(conv2d(x_image, weights['conv1']) + biases['conv_b1'])
    h_pool1 = max_pool_2x2(h_conv1)
    h_conv2 = tf.nn.relu(conv2d(h_pool1, weights['conv2']) + biases['conv_b2'])
    h_pool2 = max_pool_2x2(h_conv2)
    # flatten the 64 feature maps of 7*7 into one vector per image
    h_pool2_flat = tf.reshape(h_pool2, [-1, 7*7*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, weights['fc1']) + biases['fc1_b'])
    # Output layer with linear activation
    out_layer = tf.matmul(h_fc1, weights['out']) + biases['out_b']
    return out_layer
# Store layers weight & biases
# The first kernel is 5*5 with 1 input channel and 32 output channels; with zero padding it outputs 32 feature maps of 28*28
# The second kernel is 5*5 with 32 input channels and 64 output channels; with zero padding it outputs 64 feature maps of 14*14
weights = {
    'conv1': tf.Variable(tf.random_normal([5, 5, 1, 32])),
    'conv2': tf.Variable(tf.random_normal([5, 5, 32, 64])),
    'fc1': tf.Variable(tf.random_normal([7*7*64, 256])),
    'out': tf.Variable(tf.random_normal([256, n_classes]))
}
biases = {
    'conv_b1': tf.Variable(tf.random_normal([32])),
    'conv_b2': tf.Variable(tf.random_normal([64])),
    'fc1_b': tf.Variable(tf.random_normal([256])),
    'out_b': tf.Variable(tf.random_normal([n_classes]))
}
# Construct model
pred = multilayer_perceptron(x, weights, biases)
# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost)
# Initializing the variables
init = tf.global_variables_initializer()
# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_x, batch_y = mnist.train.next_batch(batch_size)
            # Run optimization op (backprop) and cost op (to get loss value)
            _, c = sess.run([optimizer, cost], feed_dict={x: batch_x,
                                                          y: batch_y})
            # Compute average loss
            avg_cost += c / total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print("Epoch:", '%04d' % (epoch+1), "cost=",
                  "{:.9f}".format(avg_cost))
    print("Optimization Finished!")
    # Test model
    correct_prediction = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print("Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))
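The loss and the prediction check used above (softmax cross-entropy on the logits, then argmax against the one-hot label) can be sketched for a single example in plain Python. The functions `softmax` and `cross_entropy` and the sample values are hypothetical and chosen only for illustration. This is the standard textbook formula, not TensorFlow's internal implementation.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, one_hot):
    """Cross-entropy between the softmax of the logits and a one-hot label."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


logits = [2.0, 1.0, 0.1]   # made-up network outputs for 3 classes
label = [1, 0, 0]          # one-hot label: the true class is 0

probs = softmax(logits)
loss = cross_entropy(logits, label)
predicted = probs.index(max(probs))       # same role as tf.argmax(pred, 1)
is_correct = predicted == label.index(1)  # same role as tf.equal(...)
print(probs, loss, is_correct)
```

Averaging `loss` over a batch corresponds to the `tf.reduce_mean` around the cross-entropy, and averaging `is_correct` over the test set corresponds to the accuracy.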
#http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
#http://blog.csdn.net/mao_xiao_feng/article/details/53444333
#http://blog.csdn.net/lujiandong1/article/details/53728053