極智AI | Explaining the TensorRT Activation Operator

Follow my WeChat official account [極智視界] for more of my notes.

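Hi everyone, I'm 極智視界. This post explains the TensorRT Activation operator.

In a neural network, activation functions add non-linearity and can also normalize data or adjust its distribution. They show up in both classification and object detection tasks, for example relu and sigmoid. This post covers how the Activation operator is implemented in TensorRT.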

Contents

1 Introduction to the TensorRT Activation operator
2 Implementing the TensorRT Activation operator

1 Introduction to the TensorRT Activation operator

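TensorRT Activation ships with a rich set of built-in activation functions that can be called directly. The members of

trt.ActivationType

show which activation functions are supported.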

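No discussion of activation functions is complete without the classic figure of their curves (not because TensorRT supports every function in it, but simply because it is vivid and intuitive):

[Figure: curves of common activation functions]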
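To put some numbers behind a few of these activation functions, here is a minimal NumPy sketch. The formulas follow my reading of the TensorRT documentation for the RELU, SIGMOID, TANH and SOFTSIGN activation types, so treat them as assumptions. The helper names are hypothetical and are not TensorRT APIs.

```python
import numpy as np

# Hypothetical NumPy helpers for illustration only (not TensorRT APIs).
# The formulas are assumed from the trt.ActivationType documentation.

def relu(x):
    # max(0, x)
    return np.maximum(x, 0)

def sigmoid(x):
    # 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def softsign(x):
    # x / (1 + |x|)
    return x / (1.0 + np.abs(x))

x = np.array([-2.0, 0.0, 2.0], dtype=np.float32)
print("x       :", x)
print("relu    :", relu(x))
print("sigmoid :", sigmoid(x))
print("tanh    :", np.tanh(x))
print("softsign:", softsign(x))
```

None of these four takes extra parameters, which is why they are the easiest ones to check by hand.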

2 Implementing the TensorRT Activation operator

So how do you build an Activation operator in TensorRT? Take a look:

# Add an activation layer via add_activation
activationLayer = network.add_activation(inputT0, trt.ActivationType.RELU)

# Change the activation function type afterwards
activationLayer.type = trt.ActivationType.CLIP

# Some activation functions take 1 or 2 parameters; .alpha and .beta both default to 0
activationLayer.alpha = -2
activationLayer.beta = 2
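To make the role of alpha and beta more concrete, the sketch below emulates two parameterized activations in NumPy on the same 3x3 ramp of values from -4 to 4 that the full example in this post uses. It assumes CLIP computes max(alpha, min(beta, x)) and LEAKY_RELU computes x for x >= 0 and alpha * x otherwise, which is my reading of the TensorRT documentation. The function names are hypothetical, not part of TensorRT.

```python
import numpy as np

# Input values -4 ... 4 in NCHW layout (1, 1, 3, 3)
data = np.arange(-4, 5, dtype=np.float32).reshape(1, 1, 3, 3)

def clip_reference(x, alpha, beta):
    # Assumed CLIP definition: max(alpha, min(beta, x))
    return np.maximum(alpha, np.minimum(beta, x))

def leaky_relu_reference(x, alpha):
    # Assumed LEAKY_RELU definition: x if x >= 0 else alpha * x
    return np.where(x >= 0, x, alpha * x)

# alpha = -2, beta = 2, as set on the layer in the snippet
clipped = clip_reference(data, alpha=-2.0, beta=2.0)
# alpha = beta = 0 mimics leaving both parameters at their defaults
clipped_default = clip_reference(data, alpha=0.0, beta=0.0)
# LEAKY_RELU only uses alpha
leaky = leaky_relu_reference(data, alpha=0.1)

print("CLIP(-2, 2):")
print(clipped.reshape(3, 3))
print("CLIP with default parameters:")
print(clipped_default.reshape(3, 3))
print("LEAKY_RELU(alpha=0.1):")
print(leaky.reshape(3, 3))
```

Under these assumptions, CLIP with both parameters left at 0 collapses every value to 0, which is a good reason to set alpha and beta explicitly after switching the layer type.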

Here is a complete, practical example:

import numpy as np
from cuda import cudart
import tensorrt as trt

# Input tensor dimensions (NCHW)
nIn, cIn, hIn, wIn = 1, 1, 3, 3

# Input data: values -4 ... 4
data = np.arange(-4, 5, dtype=np.float32).reshape(nIn, cIn, hIn, wIn)

np.set_printoptions(precision=8, linewidth=200, suppress=True)
cudart.cudaDeviceSynchronize()

logger = trt.Logger(trt.Logger.ERROR)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()
inputT0 = network.add_input('inputT0', trt.DataType.FLOAT, (nIn, cIn, hIn, wIn))
#-------------------------------------------------------------------------------# section to replace
# ReLU is used here as the demo activation function
# Swap in whichever activation function you want to try
activationLayer = network.add_activation(inputT0, trt.ActivationType.RELU)
#-------------------------------------------------------------------------------# section to replace
network.mark_output(activationLayer.get_output(0))
engineString = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(engineString)
context = engine.create_execution_context()
_, stream = cudart.cudaStreamCreate()

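# Host buffers: binding 0 is the input, binding 1 is the output.
# Note: the binding-index calls used below (get_binding_shape, get_binding_dtype,
# execute_async_v2) belong to the TensorRT 8.x API; newer releases use name-based tensor APIs.
inputH0 = np.ascontiguousarray(data.reshape(-1))
outputH0 = np.empty(context.get_binding_shape(1), dtype=trt.nptype(engine.get_binding_dtype(1)))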
_, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
_, outputD0 = cudart.cudaMallocAsync(outputH0.nbytes, stream)

cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
context.execute_async_v2([int(inputD0), int(outputD0)], stream)
cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes, cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
cudart.cudaStreamSynchronize(stream)

print("inputH0 :", data.shape)
print(data)
print("outputH0:", outputH0.shape)
print(outputH0)

cudart.cudaStreamDestroy(stream)
cudart.cudaFree(inputD0)
cudart.cudaFree(outputD0)

Input tensor shape: (1, 1, 3, 3)
Output tensor shape: (1, 1, 3, 3)
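As a sanity check on the values and not just the shapes, a NumPy reference for ReLU on the same input can be compared against the engine output. This is only a sketch: matches_reference is a hypothetical helper, and a copy of the reference stands in for outputH0 so the snippet runs without a GPU.

```python
import numpy as np

# Same input as the TensorRT example
data = np.arange(-4, 5, dtype=np.float32).reshape(1, 1, 3, 3)

# ReLU reference: negative values become 0, the rest pass through
expected = np.maximum(data, 0)
print("expected shape:", expected.shape)
print(expected)

def matches_reference(output, reference, atol=1e-6):
    """Hypothetical helper: True if shapes agree and values match within atol."""
    return output.shape == reference.shape and np.allclose(output, reference, atol=atol)

# In the real script, pass outputH0 here instead of this stand-in copy.
stand_in_output = expected.copy()
print("matches reference:", matches_reference(stand_in_output, expected))
```

The same pattern works for any activation you swap into the replaceable section, as long as you trust the reference formula you compare against.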

That's it for this walkthrough of the TensorRT Activation operator. I hope it helps a little with your learning.
