Implementing Convolution, Transposed Convolution, and Dilated Convolution in TensorFlow
TensorFlow already implements convolution (tf.nn.conv2d), transposed convolution, also called deconvolution (tf.nn.conv2d_transpose), and dilated/atrous convolution (tf.nn.atrous_conv2d). The parameters of these three functions are well documented elsewhere; the hard part is computing the output dimensions. This post wraps each of the three operations in a helper function so they are convenient to call:
1. Convolution

Given
- an input image of size W × W
- a filter of size F × F
- stride S
- padding of P pixels

the output size is

N = ⌊(W − F + 2P) / S⌋ + 1

and the output image is N × N.
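The formula can be checked with a tiny helper (the function name `conv_output_size` is my own, not part of TensorFlow):

```python
def conv_output_size(w, f, p, s):
    """N = floor((W - F + 2P) / S) + 1 for a square W x W input."""
    return (w - f + 2 * p) // s + 1

# 100x100 input, 3x3 filter, padding 1, stride 2 -> 50x50 output
print(conv_output_size(100, 3, 1, 2))  # 50
```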
You can use TensorFlow's high-level slim API, slim.conv2d:
```python
net = slim.conv2d(inputs=inputs,
                  num_outputs=num_outputs,
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=reg,
                  kernel_size=[kernel, kernel],
                  activation_fn=activation_fn,
                  stride=stride,
                  padding=padding,
                  trainable=True,
                  scope=scope)
```
In some special cases you may want to pad the feature map yourself:
```python
def slim_conv2d(inputs, num_outputs, stride, padding, kernel, activation_fn, reg, scope):
    if padding == "VALID":
        # reflect-pad kernel // 2 pixels on each side, then convolve with VALID
        padding_size = int(kernel / 2)
        inputs = tf.pad(inputs,
                        paddings=[[0, 0], [padding_size, padding_size],
                                  [padding_size, padding_size], [0, 0]],
                        mode='REFLECT')
        print("pad.inputs.shape:{}".format(inputs.get_shape()))
    net = slim.conv2d(inputs=inputs,
                      num_outputs=num_outputs,
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                      weights_regularizer=reg,
                      kernel_size=[kernel, kernel],
                      activation_fn=activation_fn,
                      stride=stride,
                      padding=padding,
                      trainable=True,
                      scope=scope)
    print("net.{}.shape:{}".format(scope, net.get_shape()))
    return net
```
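For stride 1 and an odd kernel, reflect-padding by kernel // 2 pixels on each side before a VALID convolution reproduces a SAME-sized output; the arithmetic can be sketched like this (the helper name is mine):

```python
def padded_valid_output_size(w, kernel, stride=1):
    # pad kernel // 2 pixels on each side, then apply a VALID convolution
    padded = w + 2 * (kernel // 2)
    return (padded - kernel) // stride + 1

print(padded_valid_output_size(100, 3))  # 100
print(padded_valid_output_size(100, 5))  # 100
```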
Below is a hand-rolled convolution wrapper built directly on TensorFlow, with functionality similar to the high-level slim.conv2d API:
```python
def conv2D_layer(inputs, num_outputs, kernel_size, activation_fn, stride, padding, scope, weights_regularizer):
    '''
    A convolution-layer API modeled on TensorFlow's slim module: convolution plus
    activation function, but no pooling layer.
    :param inputs:
    :param num_outputs:
    :param kernel_size: kernel size, typically [1, 1], [3, 3] or [5, 5]
    :param activation_fn: activation function
    :param stride: e.g. [2, 2]
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return:
    '''
    with tf.variable_scope(name_or_scope=scope):
        in_channels = inputs.get_shape().as_list()[3]
        # kernel = [height, width, in_channels, output_channels]
        kernel = [kernel_size[0], kernel_size[1], in_channels, num_outputs]
        strides = [1, stride[0], stride[1], 1]
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        inputs = tf.nn.conv2d(inputs, filter_weight, strides, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs
```
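For reference, tf.nn.conv2d itself computes output sizes as ceil(W / S) for SAME padding and ceil((W − F + 1) / S) for VALID; a quick check (the helper name is mine):

```python
import math

def tf_conv_output_size(w, f, s, padding):
    """Spatial output size of tf.nn.conv2d for a W x W input."""
    if padding == "SAME":
        return math.ceil(w / s)
    return math.ceil((w - f + 1) / s)  # VALID

print(tf_conv_output_size(100, 3, 2, "SAME"))   # 50
print(tf_conv_output_size(100, 3, 2, "VALID"))  # 49
```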
2. Transposed convolution (deconvolution)

TensorFlow's high-level APIs already wrap transposed convolution: tf.layers.conv2d_transpose and slim.conv2d_transpose, whose usage is essentially the same. If you want to implement the operation with tf.nn.conv2d_transpose instead, you must compute the output dimensions yourself for padding='VALID' and 'SAME'. The deconv_output_length function below computes the output dimension automatically from the input dimension, filter_size, padding, and stride.
```python
# -*-coding: utf-8 -*-
"""
@Project: YNet-python
@File : myTest.py
@Author : panjq
@E-mail : [email protected]
@Date : 2019-01-10 15:51:23
"""
import tensorflow as tf
import tensorflow.contrib.slim as slim

def deconv_output_length(input_length, filter_size, padding, stride):
    """Determines output length of a transposed convolution given input length.
    Arguments:
        input_length: integer.
        filter_size: integer.
        padding: one of SAME, VALID or FULL.
        stride: integer.
    Returns:
        The output length (integer).
    """
    if input_length is None:
        return None
    # default: SAME
    input_length *= stride
    if padding == 'VALID':
        input_length += max(filter_size - stride, 0)
    elif padding == 'FULL':
        input_length -= (stride + filter_size - 2)
    return input_length
```
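Plugging the demo's own values (input 100, kernel 10, stride 2) into this formula gives 200 for SAME and 208 for VALID. A standalone sanity check, duplicating the same arithmetic so it runs on its own:

```python
def deconv_out(input_length, filter_size, padding, stride):
    # same arithmetic as deconv_output_length above
    length = input_length * stride
    if padding == 'VALID':
        length += max(filter_size - stride, 0)
    elif padding == 'FULL':
        length -= (stride + filter_size - 2)
    return length

print(deconv_out(100, 10, 'SAME', 2))   # 200
print(deconv_out(100, 10, 'VALID', 2))  # 208
```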
```python
def conv2D_transpose_layer(inputs, num_outputs, kernel_size, activation_fn, stride, padding, scope, weights_regularizer):
    '''
    A transposed-convolution API: transposed convolution plus activation function,
    but no pooling layer.
    :param inputs: input Tensor = [batch, in_height, in_width, in_channels]
    :param num_outputs:
    :param kernel_size: kernel size, typically [1, 1], [3, 3] or [5, 5]
    :param activation_fn: activation function
    :param stride: e.g. [2, 2]
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return:
    '''
    with tf.variable_scope(name_or_scope=scope):
        # shape = [batch_size, height, width, channel]
        in_shape = inputs.get_shape().as_list()
        # compute the transposed convolution's output dimensions
        output_height = deconv_output_length(in_shape[1], kernel_size[0], padding=padding, stride=stride[0])
        output_width = deconv_output_length(in_shape[2], kernel_size[1], padding=padding, stride=stride[1])
        output_shape = [in_shape[0], output_height, output_width, num_outputs]
        strides = [1, stride[0], stride[1], 1]
        # kernel = [kernel_height, kernel_width, output_channels, input_channels]
        kernel = [kernel_size[0], kernel_size[1], num_outputs, in_shape[3]]
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        inputs = tf.nn.conv2d_transpose(value=inputs, filter=filter_weight, output_shape=output_shape,
                                        strides=strides, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs
```
```python
if __name__ == "__main__":
    inputs = tf.ones(shape=[4, 100, 100, 3])
    stride = 2
    kernel_size = 10
    padding = "SAME"
    net1 = tf.layers.conv2d_transpose(inputs=inputs,
                                      filters=32,
                                      kernel_size=kernel_size,
                                      strides=stride,
                                      padding=padding)
    net2 = slim.conv2d_transpose(inputs=inputs,
                                 num_outputs=32,
                                 kernel_size=[kernel_size, kernel_size],
                                 stride=[stride, stride],
                                 padding=padding)
    net3 = conv2D_transpose_layer(inputs=inputs,
                                  num_outputs=32,
                                  kernel_size=[kernel_size, kernel_size],
                                  activation_fn=tf.nn.relu,
                                  stride=[stride, stride],
                                  padding=padding,
                                  scope="conv2D_transpose_layer",
                                  weights_regularizer=None)
    print("net1.shape:{}".format(net1.get_shape()))
    print("net2.shape:{}".format(net2.get_shape()))
    print("net3.shape:{}".format(net3.get_shape()))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
```
3. Dilated convolution: enlarging the receptive field

Dilated convolution, also known as atrous convolution, introduces a new parameter to the convolution layer called the "dilation rate", which defines the spacing between the values the kernel samples. Its purpose is to provide a larger receptive field without pooling (pooling layers lose information) and at comparable computational cost.

The key parameter of a dilated convolution is rate, which sets the size of the holes. The dilation can be understood from two angles:
1) From the input image's perspective, dilation means subsampling the input. The sampling frequency is set by rate: with rate = 1 every pixel is sampled and the operation is a standard convolution; with rate > 1, say 2, the input is sampled every (rate − 1) pixels. Picture the sampled pixels as the red points in figure (b); convolving the sampled image with the kernel effectively enlarges the receptive field.
2) From the kernel's perspective, dilation enlarges the kernel: rate − 1 zeros are inserted between adjacent kernel values, and the enlarged kernel is convolved with the original image, which again enlarges the receptive field.
3) A standard convolution can enlarge its receptive field through pooling, which downsamples the image, but pooling is not learnable and discards a lot of detail. A dilated convolution achieves a large receptive field, and thus sees more context, without pooling.
4) Enlarging the kernel also increases the receptive field, but at a higher computational cost.
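Inserting rate − 1 zeros between kernel taps gives an effective kernel size of F + (F − 1)(rate − 1); for example a 3×3 kernel with rate 2 covers the same span as a 5×5 kernel. A small sketch (the helper name is mine):

```python
def effective_kernel_size(f, rate):
    # a dilated kernel spans f + (f - 1) * (rate - 1) input pixels
    return f + (f - 1) * (rate - 1)

print(effective_kernel_size(3, 1))  # 3 (standard convolution)
print(effective_kernel_size(3, 2))  # 5
print(effective_kernel_size(3, 4))  # 9
```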
Standard convolution:
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLiYWan5SO2kTOyY2MwcTZ0IDNlVDMzYzX4ATM0YTM0IzLchDMyIDMy8CXn9Gbi9CXzV2Zh1WavwVbvNmLvR3YxUjLyM3Lc9CX6MHc0RHaiojIsJye.gif)
Dilated convolution:
```python
def dilated_conv2D_layer(inputs, num_outputs, kernel_size, activation_fn, rate, padding, scope, weights_regularizer):
    '''
    A dilated-convolution API built on TensorFlow: dilated convolution plus
    activation function, but no pooling layer.
    :param inputs:
    :param num_outputs:
    :param kernel_size: kernel size, typically [1, 1], [3, 3] or [5, 5]
    :param activation_fn: activation function
    :param rate: dilation rate
    :param padding: SAME or VALID
    :param scope: scope name
    :param weights_regularizer: regularizer, e.g. weights_regularizer = slim.l2_regularizer(scale=0.01)
    :return:
    '''
    with tf.variable_scope(name_or_scope=scope):
        in_channels = inputs.get_shape().as_list()[3]
        # kernel = [height, width, in_channels, output_channels]
        kernel = [kernel_size[0], kernel_size[1], in_channels, num_outputs]
        filter_weight = slim.variable(name='weights',
                                      shape=kernel,
                                      initializer=tf.truncated_normal_initializer(stddev=0.1),
                                      regularizer=weights_regularizer)
        bias = tf.Variable(tf.constant(0.01, shape=[num_outputs]))
        inputs = tf.nn.atrous_conv2d(inputs, filter_weight, rate=rate, padding=padding) + bias
        if activation_fn is not None:
            inputs = activation_fn(inputs)
        return inputs
```
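With tf.nn.atrous_conv2d, SAME padding preserves the spatial size, while VALID shrinks it by (F − 1) × rate, i.e. the effective kernel extent minus one. A shape sketch (the helper name is mine):

```python
def atrous_output_size(w, f, rate, padding):
    """Spatial output size of a dilated convolution on a W x W input."""
    if padding == "SAME":
        return w
    # VALID: subtract the effective kernel extent minus one
    return w - (f - 1) * rate

print(atrous_output_size(100, 3, 2, "SAME"))   # 100
print(atrous_output_size(100, 3, 2, "VALID"))  # 96
```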