
Notes on deploying tensorflow_model_server with Docker

Check the CUDA version:

cat /usr/local/cuda/version.txt
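If that file does not exist (newer CUDA releases ship version.json instead), the toolkit and driver versions can also be checked with the standard tools:

nvcc --version   # CUDA toolkit version (requires the toolkit bin directory on PATH)

nvidia-smi       # driver version and the highest CUDA version the driver supports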

Check where a library symlink points:

ls -al libcuda.so.1

Deploying tensorflow_model_server for T2T (tensor2tensor) under Docker

1. Install Docker

See the Runoob tutorial: http://www.runoob.com/docker/docker-tutorial.html

2. Pull the serving image:

docker pull tensorflow/serving:latest-devel (the image is over 3 GB, so the download takes a while)
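After the pull finishes, confirm the image is available locally:

docker images tensorflow/serving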

3. Create a container from the serving image:

  • docker run -it --privileged=true -p 9000:9000 tensorflow/serving:latest-devel (--privileged=true gives the container access to the GPU; note that docker run options must come before the image name)

4. Copy the model into the container (open a new terminal window):

docker cp [model directory on the host] [container ID]:/[directory inside the container]
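The container ID of the container started in step 3 can be looked up from the new terminal with:

docker ps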

For example:

  • docker cp E:/model/export 0f087sdf8sf:/model  
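To confirm the files arrived, list the target directory inside the container (same example container ID as above; substitute your own):

docker exec 0f087sdf8sf ls /model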

5. Run the tensorflow_model_server service inside the container

tensorflow_model_server --port=9000 --model_name=nmt --model_base_path=/model
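Before wiring up t2t, it can help to confirm that the model actually loaded. A minimal sketch of a gRPC model-status query, assuming the tensorflow-serving-api Python package is installed on the client and using the model name nmt from the command above:

import grpc
from tensorflow_serving.apis import get_model_status_pb2, model_service_pb2_grpc

# connect to the gRPC port published by the container (9000 in this setup)
channel = grpc.insecure_channel('localhost:9000')
stub = model_service_pb2_grpc.ModelServiceStub(channel)

# ask the server for the status of the 'nmt' model
request = get_model_status_pb2.GetModelStatusRequest()
request.model_spec.name = 'nmt'
print(stub.GetModelStatus(request, 10.0))  # state should be AVAILABLE once the model is loaded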

6. Query the server from t2t

  • t2t-query-server --server=*.*.*.*:9000 --servable_name=nmt --problem=nmt_zhen --data_dir=/home/data --t2t_usr_dir=/home/script
    (the *.*.*.*:9000 address is the server's address, or the default address Docker assigns when the container starts)

GPU version issues:

1) tensorflow_model_server: error while loading shared libraries: libcuda.so.1: cannot open shared object file: No such file or directory

Fix: make the libcuda.so.1 library available in a directory the dynamic loader searches. Three approaches:

1)

cp /usr/local/cuda-10.0/compat/libcud* /usr/local/cuda/lib64/

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"

export CUDA_HOME=/usr/local/cuda
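From the same shell, you can then check whether the loader resolves the CUDA libraries (tensorflow_model_server is invoked directly above, so it should be on the PATH):

ldd $(which tensorflow_model_server) | grep libcuda   # should show a resolved path instead of 'not found'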

2)

Refresh the package lists: apt-get update

Install vim: apt-get install vim

vi /root/.bashrc

Add:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:/usr/local/cuda-10.0/compat:/usr/local/cuda-10.0/lib64/stubs"

export CUDA_HOME=/usr/local/cuda

Then run:

source /root/.bashrc

3)

Install nvidia-docker, which resolves the issue

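A common smoke test after installing nvidia-docker is to run nvidia-smi in a throwaway container (assuming the nvidia/cuda:10.0-base image; any CUDA base image matching the driver works):

docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi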

Loading a single model:

docker run -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model/,target=/models/my_model \
  -e MODEL_NAME=my_model -t tensorflow/serving

docker run -p 9098:8500 --mount type=bind,source=/opt/data/D_NMT/translate_enzh/export/V1.0/,target=/models/nmt_enzh -e MODEL_NAME=nmt_enzh -t tensorflow/serving
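When the REST port 8501 is published as in the template above, the model status can be checked over HTTP (the second command maps 9098 to the gRPC port 8500, which curl cannot talk to):

curl http://localhost:8501/v1/models/my_model   # reports the version state, e.g. AVAILABLE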

Loading multiple models:

docker run --runtime=nvidia -p 8500:8500 -p 8501:8501 \
  --mount type=bind,source=/path/to/my_model/,target=/models/my_model \
  --mount type=bind,source=/path/to/my/models.config,target=/models/models.config \
  -t tensorflow/serving:latest-gpu --model_config_file=/models/models.config &
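models.config uses TensorFlow Serving's model_config_list text-proto format. A minimal sketch for the two NMT models used in the later example (names and paths taken from that example):

model_config_list {
  config {
    name: 'nmt_enzh'
    base_path: '/models/nmt_enzh'
    model_platform: 'tensorflow'
  }
  config {
    name: 'nmt_zhen'
    base_path: '/models/nmt_zhen'
    model_platform: 'tensorflow'
  }
}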

Issues when installing nvidia-docker:

Installation guide: https://github.com/NVIDIA/nvidia-docker#quick-start

version `XZ_5.1.2alpha' not found (required by /lib64/librpmio.so.3)

Download liblzma.so.5.2.2 into the /opt/anaconda3/envs/py36/lib directory

Create the symlink (inside that directory): sudo ln -s -f liblzma.so.5.2.2 liblzma.so.5
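To confirm the link now points at the new library:

ls -al /opt/anaconda3/envs/py36/lib/liblzma.so.5   # should show -> liblzma.so.5.2.2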

That fixes the problem.

Loading multiple models:

sudo docker run -d -p 8500:8500 \
  --mount type=bind,source=/path/to/source_models/model1/,target=/models/model1 \
  --mount type=bind,source=/path/to/source_models/model2/,target=/models/model2 \
  --mount type=bind,source=/path/to/source_models/model3/,target=/models/model3 \
  --mount type=bind,source=/path/to/source_models/model.config,target=/models/model.config \
  -t --name ner tensorflow/serving --model_config_file=/models/model.config

docker run --runtime=nvidia -p 9000:8500 \
  --mount type=bind,source=/opt/data/models/nmt_enzh,target=/models/nmt_enzh \
  --mount type=bind,source=/opt/data/models/nmt_zhen/,target=/models/nmt_zhen \
  --mount type=bind,source=/opt/data/models/model.config,target=/models/model.config \
  -t tensorflow/serving:latest-gpu --model_config_file=/models/model.config

For reference, checking where libcuda.so.1 points on this machine:

# cd /usr/local/cuda-10.0/lib64
# ls -al libcuda.so.1
lrwxrwxrwx 1 root root 43 Mar 15 14:24 libcuda.so.1 -> /usr/local/cuda-10.0/lib64/stubs/libcuda.so