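Nvidia-Docker: Building a GPU-Enabled Container with cuda9.0 + cudnn7.0.5 + Tensorflow-GPU + Opencv-Python + Dlib
An earlier document, "Docker Images: Centos7 + Python3.6 + Tensorflow + Opencv + Dlib", built a CPU-based Docker image containing a common development environment for image processing. GPUs have since developed rapidly: they offer high internal memory bandwidth and compute throughput and can take over part of the CPU's workload, so GPU-based development environments are now common as well. This document builds a container that can use GPUs, containing centos7 + cuda 9.0 + cudnn 7.0.5 + Python 3.6 + Tensorflow-GPU 1.5.0 + Opencv-Python + Dlib, and records the problems encountered during the build.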
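- Choosing the base image
- Base image and author information
- Setting the time zone and installing Chinese language support
- Installing cudnn7.0.5
- Installing python3.6
- Installing tensorflow-gpu
- Installing opencv-python
- Installing dlib
- Installing other python dependencies
- Other settings
- The complete dockerfile
- Building and testing the image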
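Choosing the base image
Before building a GPU-enabled container with Nvidia-Docker, first settle the version correspondence of the environment to be built, chiefly between Tensorflow-GPU, CUDA, and cuDNN.
Here, for example, Tensorflow-GPU 1.5.0 is used, which requires CUDA 9.0 and cuDNN 7.0.x. Mismatched versions fail as follows:
- Tensorflow-GPU 1.5.0 requires CUDA 9.0; otherwise importing it fails with ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory ;
- it also requires cuDNN 7.0.x; otherwise running Tensorflow-GPU code fails with an error reporting a cuDNN version mismatch (cuDNN 7.3.1 was installed at first; after downgrading to cuDNN 7.0.5 the tests passed).
For other versions, see "Tested build configurations" on the Tensorflow website (Chinese or English edition).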
![](https://img.laitimes.com/img/__Qf2AjLwojIjJCLyojI0JCLiAzNfRHLGZkRGZkRfJ3bs92YsYTMfVmepNHL9QzVZNXOGpVdGdkY1ZlMMBjVtJWd0ckW65UbM5WOHJWa5kHT20ESjBjUIF2X0hXZ0xCMx81dvRWYoNHLrdEZwZ1Rh5WNXp1bwNjW1ZUba9VZwlHdssmch1mclRXY39CXldWYtlWPzNXZj9mcw1ycz9WL49zZuBnL0kzN5IzNxUTM5IzNwkTMwIzLc52YucWbp5GZzNmLn9Gbi1yZtl2Lc9CX6MHc0RHaiojIsJye.png)
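This document therefore uses nvidia/cuda:9.0-devel-centos7 as the base image. That image does not include cuDNN, so the matching cuDNN version has to be installed separately. nvidia/cuda also provides an official base image that includes cuDNN, nvidia/cuda:9.0-cudnn7-devel-centos7, but at the time of writing it shipped cuDNN 7.3.1.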
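The version rule above can be written down as a small check. This is a minimal sketch, not an official tool: the function name `check_versions` and its parameters are hypothetical, and the defaults encode only the pairing tested in this document (CUDA 9.0 with cuDNN 7.0.x for Tensorflow-GPU 1.5.0).

```python
def check_versions(cuda, cudnn, required_cuda="9.0", required_cudnn="7.0"):
    """Return a list of problems; an empty list means the versions match.

    Only major.minor is compared, so "9.0.176" satisfies "9.0" and
    "7.0.5" satisfies "7.0". Hypothetical helper; defaults come from this article.
    """
    def major_minor(version):
        return ".".join(version.split(".")[:2])

    problems = []
    if major_minor(cuda) != required_cuda:
        problems.append("CUDA %s found, %s.x required" % (cuda, required_cuda))
    if major_minor(cudnn) != required_cudnn:
        problems.append("cuDNN %s found, %s.x required" % (cudnn, required_cudnn))
    return problems


print(check_versions("9.0.176", "7.0.5"))  # the combination that passed here
print(check_versions("9.0.176", "7.3.1"))  # the cuDNN version that failed here
```

For another Tensorflow release, the two `required_*` arguments would have to be taken from the tested build configurations table.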
Base image and author information
The base image is nvidia/cuda:9.0-devel-centos7 and the author is ELN; an e-mail address can be added as well.
FROM nvidia/cuda:9.0-devel-centos7
MAINTAINER ELN
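Setting the time zone and installing Chinese language support
With python3.6 installed directly in the base image, printing Chinese characters inside python3.6 raised the error below. python3.6's default encoding was checked and is utf-8; the cause turned out to be that the base image itself garbles Chinese text in docker.
[root@container /]# python3.6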
Python 3.6.5 (default, Apr 10 2018, 17:08:37)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-16)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print(u"������") # print(u"中文")
File "<stdin>", line 0
^
SyntaxError: 'ascii' codec can't decode byte 0xe4 in position 8: ordinal not in range(128)
>>> import sys
>>> print(sys.getdefaultencoding())
utf-8
>>> exit()
[root@container /]# echo "������"
中文
Inspecting /etc/localtime shows lrwxrwxrwx. 1 root root 25 Oct 6 19:15 localtime -> ../usr/share/zoneinfo/UTC , so the time zone needs to be changed, Chinese language support installed, and Chinese display configured.
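In the dockerfile, set the time zone, install Chinese language support, and configure Chinese display:
# Set the time zone, install Chinese language support, configure Chinese display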
RUN rm -rf /etc/localtime && \
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
yum -y install kde-l10n-Chinese && \
yum -y reinstall glibc-common && \
localedef -c -f UTF-8 -i zh_CN zh_CN.utf8 && \
yum clean all && rm -rf /var/cache/yum
ENV LC_ALL zh_CN.utf8
# In a terminal, run: export LC_ALL=zh_CN.utf8
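On Python 3, sys.getdefaultencoding() always returns utf-8, so it cannot reveal a locale problem like the one above; the encodings derived from the locale are the ones to look at. The following is a minimal sketch for checking them inside the container. The helper names `encoding_report` and `looks_utf8` are hypothetical.

```python
import locale
import sys


def encoding_report():
    """Collect the encodings that affect reading and printing non-ASCII text."""
    return {
        # Always "utf-8" on Python 3, regardless of the locale
        "default": sys.getdefaultencoding(),
        # Encoding of the output stream; normally follows LC_ALL / LANG
        "stdout": getattr(sys.stdout, "encoding", None),
        # Encoding reported by the locale layer
        "preferred": locale.getpreferredencoding(False),
    }


def looks_utf8(name):
    """True for spellings such as "UTF-8", "utf8" or "utf_8"."""
    if not name:
        return False
    return name.lower().replace("-", "").replace("_", "") == "utf8"


for key, value in sorted(encoding_report().items()):
    status = "ok" if looks_utf8(value) else "not UTF-8"
    print("%-9s %-16s %s" % (key, value, status))
```

If the "stdout" or "preferred" row is not UTF-8, the LC_ALL setting has probably not taken effect in that shell.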
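Installing cudnn7.0.5
Installing cudnn7.0.5 requires first downloading the matching archive, which is large and slow to download. Alternatively, download it on the host beforehand (268M), copy it into the docker container, and install it from there.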
# Install cudnn7.0.5
RUN curl -fsSL https://developer.download.nvidia.com/compute/redist/cudnn/v7.0.5/cudnn-8.0-linux-x64-v7.tgz -O && \
tar --no-same-owner -xzf cudnn-8.0-linux-x64-v7.tgz -C /usr/local && \
rm -rf cudnn-8.0-linux-x64-v7.tgz && \
ldconfig
Installing python3.6
In the dockerfile, install a clean python3.6 environment along with some basic python packages:
# Install Python 3.6
RUN yum -y install https://centos7.iuscommunity.org/ius-release.rpm && \
yum -y install python36 && \
yum -y install python36-pip && \
yum -y install vim && \
yum clean all && rm -rf /var/cache/yum && \
# ln /usr/bin/python3.6 /usr/bin/python3 && \
# ln /usr/bin/pip3.6 /usr/bin/pip3 && \
mkdir ~/.pip/ && \
echo -e "[global]\nindex-url = http://mirrors.aliyun.com/pypi/simple\n\n[install]\ntrusted-host=mirrors.aliyun.com" > ~/.pip/pip.conf
RUN pip3.6 --no-cache-dir install \
Pillow \
h5py \
ipykernel \
jupyter \
matplotlib==2.1.1 \
numpy==1.15.4 \
pandas \
scipy==1.1.0 \
scikit-learn \
&& \
python3.6 -m ipykernel.kernelspec
Installing tensorflow-gpu
Installing it with pip is all that is needed:
# Install TensorFlow GPU version from central repo
RUN pip3.6 --no-cache-dir install tensorflow-gpu==1.5.0
Note: under numpy 1.17.0+, import tensorflow prints the FutureWarning messages shown below. To avoid them, pin numpy==1.15.4 (any version below 1.17.0 works). Related reports:
- FutureWarning: Deprecated numpy API calls in tf.python.framework.dtypes #30427
- Fix numpy warning with numpy 1.17.0+ #30559
[user@host docker]$ sudo docker run -it --rm --runtime=nvidia 1e3fc1854e8c /bin/bash
[root@container test]# python3
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:493: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:494: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:495: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:496: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:497: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/usr/local/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:502: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> exit()
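A guard like the following could catch the numpy problem before tensorflow is imported. It is a minimal sketch with hypothetical helper names (`version_tuple`, `numpy_ok_for_tf15`); the 1.17 threshold is the one given in the note above.

```python
def version_tuple(version):
    """Turn "1.15.4" into (1, 15, 4); trailing suffixes such as "rc1" are ignored."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_ok_for_tf15(numpy_version):
    """True when the numpy version is below 1.17, as this article recommends."""
    return version_tuple(numpy_version) < (1, 17)


for candidate in ("1.15.4", "1.16.6", "1.17.0"):
    print(candidate, numpy_ok_for_tf15(candidate))
```

In the container, the function would be called with numpy.__version__ after importing numpy.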
Querying the CUDA and cuDNN versions:
[user@host docker]$ sudo docker run -it --rm --runtime=nvidia 1e3fc1854e8c /bin/bash
# The container has CUDA 9.0 installed; the host has CUDA 10.1 (nvidia-smi reports the host driver's CUDA version)
[root@container /]# nvidia-smi
Mon Jul 29 17:26:42 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:01:00.0 On | Off |
| 26% 29C P8 11W / 250W | 472MiB / 24447MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
# Query the CUDA version
[root@container /]# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176
[root@container /]# cat /usr/local/cuda/version.txt
CUDA Version 9.0.176
# Query the cuDNN version
[root@container /]# cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
#include "driver_types.h"
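The same information can be read programmatically, for example in an automated check of the image. The sketch below parses the three version macros out of the header text; `parse_cudnn_version` is a hypothetical helper, and the sample string reproduces the grep output shown above.

```python
import re


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of cudnn.h."""
    values = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match only the line that defines the macro itself, not lines that use it
        match = re.search(r"^#define\s+%s\s+(\d+)" % name, header_text, re.MULTILINE)
        if match is None:
            raise ValueError("%s not found in header" % name)
        values.append(int(match.group(1)))
    return tuple(values)


sample = """#define CUDNN_MAJOR 7
#define CUDNN_MINOR 0
#define CUDNN_PATCHLEVEL 5
--
#define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
"""

print(parse_cudnn_version(sample))  # (7, 0, 5)
```

In the container, the header text would come from reading /usr/local/cuda/include/cudnn.h.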
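Installing opencv-python
This step draws on the problems first encountered when installing opencv-python in a docker container, and simplifies the installation commands from the earlier document "Docker Images: Centos7 + Python3.6 + Tensorflow + Opencv + Dlib".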
# Install opencv-python
RUN yum -y install libSM.x86_64 \
libXrender.x86_64 \
libXext.x86_64 && \
yum clean all && rm -rf /var/cache/yum
RUN pip3.6 --no-cache-dir install opencv-python==3.4.1.15
Installing dlib
Problems encountered while installing dlib, and their solutions:
- Installing dlib directly fails with
CMake must be installed to build the following extensions: dlib
- After installing cmake with yum -y install cmake , installing dlib fails with
subprocess.CalledProcessError: Command '['cmake', '/tmp/pip-build-g_ptsyo_/dlib/tools/python', '-DCMAKE_LIBRARY_OUTPUT_DIRECTORY=/tmp/pip-build-g_ptsyo_/dlib/build/lib.linux-x86_64-3.6', '-DPYTHON_EXECUTABLE=/usr/bin/python3.6', '-DCMAKE_BUILD_TYPE=Release']' returned non-zero exit status 1.
- After yum install -y python36u-devel.x86_64 , the same error still appears
- After yum -y groupinstall "Development tools" , installing dlib succeeds
# Install dlib
RUN yum -y groupinstall "Development tools" && \
yum -y install cmake && \
yum clean all # && rm -rf /var/cache/yum
RUN yum install -y python36-devel.x86_64 && \
yum clean all # && rm -rf /var/cache/yum
# yum search python3 | grep devel
RUN pip3.6 --no-cache-dir install dlib
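The three failures above all come from missing build prerequisites, so it can help to check for them before running pip. The following is a minimal sketch with hypothetical helper names; the assumption that cmake, gcc, g++ and make are the relevant tools is an inference from the steps above, not something dlib's build documents here.

```python
import os
import shutil
import sysconfig


def missing_tools(tools=("cmake", "gcc", "g++", "make")):
    """Return the tools from the list that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_python_headers():
    """True when Python.h exists in this interpreter's include directory."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.isfile(os.path.join(include_dir, "Python.h"))


print("missing build tools:", missing_tools() or "none")
print("Python.h present:", has_python_headers())
```

Run with python3.6 inside the container, an empty list and True would suggest that cmake, the compiler toolchain and the python devel package are all in place.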
Installing other python dependencies
# Install keras ...
RUN pip3.6 --no-cache-dir install Cython
RUN pip3.6 --no-cache-dir install \
keras \
flask \
flask_cors \
flask_socketio \
scikit-image \
mrcnn \
imgaug \
pycocotools
Other settings
RUN mkdir /test
WORKDIR /test
CMD ["/bin/bash"]
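The complete dockerfile
This dockerfile simply follows the build steps above in order; the layering of the image can be adjusted as needed. Note that it uses ADD with a cuDNN archive downloaded beforehand, in place of the curl download, which is left commented out.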
FROM nvidia/cuda:9.0-devel-centos7
MAINTAINER ELN
# Set the time zone, install Chinese language support, configure Chinese display
RUN rm -rf /etc/localtime && \
ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && \
yum -y install kde-l10n-Chinese && \
yum -y reinstall glibc-common && \
localedef -c -f UTF-8 -i zh_CN zh_CN.utf8 && \
yum clean all && rm -rf /var/cache/yum
ENV LC_ALL zh_CN.utf8
# In a terminal, run: export LC_ALL=zh_CN.utf8
# Install cudnn7.0.5
#RUN curl -fsSL https://developer.download.nvidia.com/compute/redist/cudnn/v7.0.5/cudnn-8.0-linux-x64-v7.tgz -O && \
# tar --no-same-owner -xzf cudnn-8.0-linux-x64-v7.tgz -C /usr/local && \
# rm -rf cudnn-8.0-linux-x64-v7.tgz && \
# ldconfig
ADD cudnn-8.0-linux-x64-v7.tgz /usr/local/
RUN ldconfig
# Install Python 3.6
RUN yum -y install https://centos7.iuscommunity.org/ius-release.rpm && \
yum -y install python36 && \
yum -y install python36-pip && \
yum -y install vim && \
yum clean all && rm -rf /var/cache/yum && \
# ln /usr/bin/python3.6 /usr/bin/python3 && \
# ln /usr/bin/pip3.6 /usr/bin/pip3 && \
mkdir ~/.pip/ && \
echo -e "[global]\nindex-url = http://mirrors.aliyun.com/pypi/simple\n\n[install]\ntrusted-host=mirrors.aliyun.com" > ~/.pip/pip.conf
RUN pip3.6 --no-cache-dir install \
Pillow \
h5py \
ipykernel \
jupyter \
matplotlib==2.1.1 \
numpy==1.15.4 \
pandas \
scipy==1.1.0 \
scikit-learn \
&& \
python3.6 -m ipykernel.kernelspec
# Install TensorFlow GPU version from central repo
RUN pip3.6 --no-cache-dir install tensorflow-gpu==1.5.0
# Install opencv-python
RUN yum -y install libSM.x86_64 \
libXrender.x86_64 \
libXext.x86_64 && \
yum clean all && rm -rf /var/cache/yum
RUN pip3.6 --no-cache-dir install opencv-python==3.4.1.15
# Install dlib
RUN yum -y groupinstall "Development tools" && \
yum -y install cmake && \
yum clean all # && rm -rf /var/cache/yum
RUN yum install -y python36-devel.x86_64 && \
yum clean all # && rm -rf /var/cache/yum
# yum search python3 | grep devel
RUN pip3.6 --no-cache-dir install dlib
# Install keras ...
RUN pip3.6 --no-cache-dir install Cython
RUN pip3.6 --no-cache-dir install \
keras \
flask \
flask_cors \
flask_socketio \
scikit-image \
mrcnn \
imgaug \
pycocotools
RUN mkdir /test
WORKDIR /test
CMD ["/bin/bash"]
Building and testing the image
Write the content above into a dockerfile, then build and test the image:
[user@host docker]$ vim dockerfile
[user@host docker]$ sudo docker build -t="test" .
[user@host docker]$ sudo docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
test latest eb7674684afa 2 seconds ago 4.18GB
nvidia/cuda 9.0-devel-centos7 ff358ea56625 3 months ago 1.9GB
[user@host docker]$ sudo docker run -it --rm --runtime=nvidia test
Add --runtime=nvidia to the docker run command. With multiple GPUs, use -e to choose which one to use, for example -e NVIDIA_VISIBLE_DEVICES=0 .
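When containers are launched from a script, the same command line can be assembled as an argument list. This is a minimal sketch; `docker_run_args` is a hypothetical helper, and it only builds the list without running docker.

```python
def docker_run_args(image, gpus=None, command=None):
    """Build the argv list for a GPU-enabled `docker run`.

    gpus: optional iterable of GPU indices, joined with commas into
    NVIDIA_VISIBLE_DEVICES; None leaves the variable unset.
    """
    args = ["docker", "run", "-it", "--rm", "--runtime=nvidia"]
    if gpus is not None:
        devices = ",".join(str(index) for index in gpus)
        args += ["-e", "NVIDIA_VISIBLE_DEVICES=" + devices]
    args.append(image)
    if command:
        args += list(command)
    return args


print(" ".join(docker_run_args("test", gpus=[0])))
print(" ".join(docker_run_args("test", gpus=[0, 1], command=["/bin/bash"])))
```

The resulting list could be passed to subprocess.run on a host where docker and the nvidia runtime are installed.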
[root@container test]# echo "中文"
中文
[root@container test]# pip3 --version
pip 8.1.2 from /usr/lib/python3.6/site-packages (python 3.6)
[root@container test]# python3
Python 3.6.8 (default, Apr 25 2019, 21:02:35)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow
>>> import cv2
>>> import dlib
>>> print("中文")
中文
>>> exit()
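The interactive import check above can also be scripted so that it runs unattended after each build. The following is a minimal sketch; `check_imports` is a hypothetical helper, and the module list simply mirrors the imports tried in the session above.

```python
import importlib


def check_imports(names):
    """Try to import each module; map its name to a version string or an error."""
    results = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            results[name] = str(getattr(module, "__version__", "unknown"))
        except Exception as exc:  # ImportError also covers missing shared libraries
            results[name] = "FAILED: %s" % exc
    return results


for name, status in check_imports(["tensorflow", "cv2", "dlib"]).items():
    print("%-12s %s" % (name, status))
```

Outside the container, the three modules will most likely be reported as FAILED, which is the behaviour the check is meant to surface.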
Test the GPU's compute capability and verify that the tensorflow-gpu build is installed correctly.
Run the following tensorflow-gpu test code in the container:
[root@container test]# vim testgpu.py
[root@container test]# cat testgpu.py
# -*- coding: utf-8 -*-
"""
Test the GPU's compute capability and check that the tensorflow-GPU build is installed correctly.
"""
import time

import numpy as np
import tensorflow as tf

value = np.random.randn(5000, 1000)
a = tf.constant(value)
b = a * a  # element-wise square, evaluated on every sess.run call

c = 0
tic = time.time()
with tf.Session() as sess:
    for i in range(1000):
        sess.run(b)
        c += 1
        if c % 100 == 0:
            d = c / 10
            # print(d)
            print("計算進行%s%%" % d)  # progress message: "computation at d%"
toc = time.time()
t_cost = toc - tic
print("測試所用時間%s" % t_cost)  # "time taken by the test", in seconds
[root@container test]# python3 testgpu.py
2019-07-29 20:18:54.026579: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-07-29 20:18:54.290955: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: Quadro P6000 major: 6 minor: 1 memoryClockRate(GHz): 1.645
pciBusID: 0000:01:00.0
totalMemory: 23.87GiB freeMemory: 23.26GiB
2019-07-29 20:18:54.291030: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Quadro P6000, pci bus id: 0000:01:00.0, compute capability: 6.1)
計算進行10.0%
計算進行20.0%
計算進行30.0%
計算進行40.0%
計算進行50.0%
計算進行60.0%
計算進行70.0%
計算進行80.0%
計算進行90.0%
計算進行100.0%
測試所用時間14.024679899215698
While the test code runs, open two more terminals and check GPU usage in the container and on the host:
[user@host docker]$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7da5e0966487 test "/bin/bash" 2 minutes ago Up 2 minutes quirky_kapitsa
[user@host docker]$ docker exec -it 7da5e0966487 /bin/bash
[root@container test]# nvidia-smi
Mon Jul 29 20:19:00 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:01:00.0 On | Off |
| 26% 26C P8 19W / 250W | 23260MiB / 24447MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
[root@container test]# exit
exit
[user@host docker]$ nvidia-smi
Mon Jul 29 20:19:05 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56 Driver Version: 418.56 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro P6000 Off | 00000000:01:00.0 On | Off |
| 26% 27C P8 19W / 250W | 23260MiB / 24447MiB | 95% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 1099 G /usr/lib/xorg/Xorg 36MiB |
| 0 1734 C python3 22779MiB |
| 0 2569 G fcitx-qimpanel 36MiB |
| 0 3667 G /usr/lib/xorg/Xorg 39MiB |
| 0 4497 G fcitx-qimpanel 36MiB |
| 0 4521 G unity-control-center 4MiB |
| 0 5564 G /usr/lib/xorg/Xorg 51MiB |
| 0 6430 G /usr/lib/xorg/Xorg 107MiB |
+-----------------------------------------------------------------------------+
# Monitor GPU usage in real time
[user@host docker]$ watch -n 0.1 -d nvidia-smi
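For logging rather than watching, nvidia-smi also has a query mode, for example nvidia-smi --query-gpu=index,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits , whose output is easier to process than the table above. The sketch below parses that CSV form. `parse_gpu_stats` is a hypothetical helper, and the sample line is assembled from the values in the tables above rather than captured from a real run.

```python
def parse_gpu_stats(csv_text):
    """Parse `index, utilization.gpu, memory.used, memory.total` CSV rows."""
    stats = []
    for line in csv_text.strip().splitlines():
        index, util, used, total = [field.strip() for field in line.split(",")]
        stats.append({
            "index": int(index),
            "util_percent": int(util),
            "memory_used_mib": int(used),
            "memory_total_mib": int(total),
        })
    return stats


# Illustrative input built from the figures shown above (not real command output)
sample = "0, 95, 23260, 24447\n"

for gpu in parse_gpu_stats(sample):
    share = 100.0 * gpu["memory_used_mib"] / gpu["memory_total_mib"]
    print("GPU %d: %d%% busy, %.1f%% memory in use" % (gpu["index"], gpu["util_percent"], share))
```

On a machine with the driver installed, the text could be obtained by running the command through subprocess.check_output and decoding the result.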