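I use a 32-bit system, so the steps below focus on the 32-bit setup (on a 64-bit system, download the matching 64-bit versions instead).
Note: if you chose FULL mode when flashing, steps 1, 5, and 6 can all be skipped.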
I. Environment setup: you need cuDNN v4, CUDA 7.0.73, and the JetPack for L4T 2.2.1 image package.
1. Flash the r24.1 image onto the TX1. (Note which version of the system you flash: 32-bit or 64-bit.)
2. Download cuDNN v4 (armv7 build). Official site: https://developer.nvidia.com/rdp/cudnn-download
32-bit download: https://developer.nvidia.com/rdp/assets/cudnn-70-linux-armv7-v40
64-bit download: https://developer.nvidia.com/rdp/assets/cudnn-70-linux-aarch64-v40
3. Download the CUDA 7.0.73 package. Official site: https://developer.nvidia.com/embedded/linux-tegra
32-bit download: http://developer.nvidia.com/embedded/dlc/cuda-7-toolkit-32-bit-l4t-24-1
64-bit download: http://developer.nvidia.com/embedded/dlc/cuda-7-toolkit-64-bit-l4t-24-1
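If you are not sure which of the two CUDA links applies to your board, the choice can be scripted. The following is only a sketch: `pick_cuda_url` is a helper name I made up, and it assumes the Debian/Ubuntu architecture names `armhf` (32-bit userspace) and `arm64` (64-bit userspace) as printed by `dpkg --print-architecture`. The URLs are the ones listed above.

```shell
#!/bin/sh
# Map a Debian userspace architecture to the CUDA 7.0 toolkit URL from step 3.
# pick_cuda_url is a hypothetical helper, not an NVIDIA tool.
pick_cuda_url() {
    case "$1" in
        armhf) echo "http://developer.nvidia.com/embedded/dlc/cuda-7-toolkit-32-bit-l4t-24-1" ;;
        arm64) echo "http://developer.nvidia.com/embedded/dlc/cuda-7-toolkit-64-bit-l4t-24-1" ;;
        *)     echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# On the TX1 itself you would call it like this:
#   pick_cuda_url "$(dpkg --print-architecture)"
```

I ask dpkg rather than `uname -m` here because `uname -m` reports the kernel's architecture, which is not necessarily the same as the userspace the packages have to match.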
II. Cross-compilation environment setup:
- $ sudo add-apt-repository universe
- $ sudo apt-get update
- $ sudo apt-get install cmake git aptitude screen g++ libboost-all-dev \
- libgflags-dev libgoogle-glog-dev protobuf-compiler libprotobuf-dev \
- bc libblas-dev libatlas-dev libhdf5-dev libleveldb-dev liblmdb-dev \
- libsnappy-dev libatlas-base-dev python-numpy python-skimage \
- python-protobuf python-pandas libopencv-dev
III. Download Caffe:
- $ git clone https://github.com/BVLC/caffe.git
IV. Edit Makefile.config:
First cd into your caffe directory. In this article caffe sits directly in the user's home directory.
- $ cd caffe
- $ cp Makefile.config.example Makefile.config
- $ vim Makefile.config
- Line 5: uncomment "USE_CUDNN := 1"
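If you would rather not open an editor, the same change can probably be made with sed. This is a minimal sketch, assuming the line in your Makefile.config looks like `# USE_CUDNN := 1`; `enable_cudnn` is a name I picked for illustration. It matches on the text of the line instead of the line number, so it should still work if the template shifts.

```shell
#!/bin/sh
# Uncomment the USE_CUDNN switch in a Caffe Makefile.config.
# enable_cudnn is a hypothetical helper; $1 is the path to Makefile.config.
enable_cudnn() {
    # Write to a temporary file first so this works with any POSIX sed.
    sed 's/^#[[:space:]]*USE_CUDNN[[:space:]]*:=[[:space:]]*1/USE_CUDNN := 1/' "$1" > "$1.tmp" &&
        mv "$1.tmp" "$1"
}

# Usage, from inside the caffe directory:
#   enable_cudnn Makefile.config
#   grep -n 'USE_CUDNN' Makefile.config
```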
V. Install the CUDA toolkit
- $ sudo dpkg -i cuda-repo-l4t-r23.1-7-0-local_7.0-73_armhf.deb
- $ sudo apt-get update
- $ sudo apt-get install cuda-toolkit-7-0
- $ export LD_LIBRARY_PATH=/usr/local/cuda/lib:$LD_LIBRARY_PATH
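Once the installation finishes, create a cuda.conf file in the /etc/ld.so.conf.d directory from the command line:
- $ cd /etc/ld.so.conf.d
- $ sudo vim cuda.conf
In vim, press i, type /usr/local/cuda/lib, press Esc to leave insert mode, then type :wq to save and quit. CUDA is now configured.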
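The vim step can also be done without an editor. Below is a small sketch: `write_ld_conf` is a helper name I invented, and I am assuming that you want the dynamic linker cache refreshed with `ldconfig` right after writing the file.

```shell
#!/bin/sh
# Register a library directory with the dynamic linker.
# write_ld_conf is a hypothetical helper:
#   $1 = library directory to register, $2 = .conf file to write
write_ld_conf() {
    if [ ! -d "$1" ]; then
        echo "no such directory: $1" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$2"
}

# On the TX1, as root:
#   write_ld_conf /usr/local/cuda/lib /etc/ld.so.conf.d/cuda.conf && ldconfig
```

The directory check is there because a typo in the path would otherwise go unnoticed until something fails to load.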
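VI. Install cuDNN. First extract the cudnn-7.0-linux-ARMv7-v4.0-prod.solitairetheme8 file you downloaded earlier, then copy the cuDNN files into place from the command line. Do not try to copy and paste them in a file manager; you will get a permission error.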
- $ sudo tar xvf cudnn-7.0-linux-ARMv7-v4.0-prod.solitairetheme8
- $ cd cuda/include
- $ sudo cp *.h /usr/local/include/
- $ cd ../lib64
- $ sudo cp lib* /usr/local/lib/
- $ cd /usr/local/lib
- $ sudo chmod +r libcudnn.so.4.0.7
- $ sudo ln -sf libcudnn.so.4.0.7 libcudnn.so.4
- $ sudo ln -sf libcudnn.so.4 libcudnn.so
- $ sudo ldconfig
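The symlink commands above hard-code version 4.0.7. If your file carries a different version, something like the following sketch could derive the link names for you. `link_cudnn` is a name I made up, and it assumes the library file is called `libcudnn.so.<version>` as in my case.

```shell
#!/bin/sh
# Create the libcudnn.so.<major> and libcudnn.so symlinks for a given version.
# link_cudnn is a hypothetical helper:
#   $1 = directory holding the library, $2 = full version, e.g. 4.0.7
link_cudnn() {
    dir=$1
    ver=$2
    major=${ver%%.*}    # 4.0.7 -> 4
    if [ ! -f "$dir/libcudnn.so.$ver" ]; then
        echo "missing $dir/libcudnn.so.$ver" >&2
        return 1
    fi
    # Work inside a subshell so the caller's working directory is untouched.
    ( cd "$dir" &&
      chmod +r "libcudnn.so.$ver" &&
      ln -sf "libcudnn.so.$ver" "libcudnn.so.$major" &&
      ln -sf "libcudnn.so.$major" libcudnn.so )
}

# On the TX1, as root:
#   link_cudnn /usr/local/lib 4.0.7 && ldconfig
```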
(I installed libcudnn.so.4.0.7; adjust the file names in the commands above to match your version.)
VII. Build Caffe
- $ cd caffe
- $ make
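VIII. Test. Once the build succeeds you are ready to go. With the CPU, GPU, and EMC clocks not yet raised to their maximum, one AlexNet iteration takes about 66 ms on my board, which is decent.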
- caffe/build/tools/caffe_fp16 time --model=caffe/models/bvlc_alexnet/deploy.prototxt -gpu 0 -iterations 30
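To turn a per-iteration time into a throughput figure, a bit of arithmetic is enough. The sketch below assumes the batch size is 10, which is what I believe the AlexNet deploy.prototxt uses as its input dimension; check your own prototxt before trusting the number. `images_per_second` is a helper name I invented.

```shell
#!/bin/sh
# Convert a per-iteration time into images per second.
# images_per_second is a hypothetical helper:
#   $1 = milliseconds per iteration, $2 = images per iteration (batch size)
images_per_second() {
    awk -v ms="$1" -v batch="$2" 'BEGIN { printf "%.1f\n", batch * 1000 / ms }'
}

# 66 ms per iteration with an assumed batch size of 10:
images_per_second 66 10
```

Under that assumption, 66 ms per iteration works out to roughly 151.5 images per second.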
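IX. Raise the CPU, GPU, and EMC clocks to maximum. This appears to make things run faster, but power consumption will certainly go up, so decide based on your situation. Run the commands below as root.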
- echo "Set Tegra CPUs to max freq"
- echo userspace > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
- echo userspace > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
- echo userspace > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
- echo userspace > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
- cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu0/cpufreq/scaling_min_freq
- cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu1/cpufreq/scaling_min_freq
- cat /sys/devices/system/cpu/cpu2/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu2/cpufreq/scaling_min_freq
- cat /sys/devices/system/cpu/cpu3/cpufreq/scaling_max_freq > /sys/devices/system/cpu/cpu3/cpufreq/scaling_min_freq
- echo "Disable Tegra cpuquiet and set current governor to runnable"
- echo 0 > /sys/devices/system/cpu/cpuquiet/tegra_cpuquiet/enable
- echo runnable > /sys/devices/system/cpu/cpuquiet/current_governor
- echo "Set Max GPU rate"
- echo 844800000 > /sys/kernel/debug/clock/override.gbus/rate
- echo 1 > /sys/kernel/debug/clock/override.gbus/state
- # burst EMC freq to top
- echo "Set Max EMC rate"
- echo 1 > /sys/kernel/debug/clock/override.emc/state
- cat /sys/kernel/debug/clock/emc/max > /sys/kernel/debug/clock/override.emc/rate
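The four CPU blocks in the script above repeat the same two writes, so they can be folded into a loop. This is a sketch only: `set_cpus_to_max` is a name I made up, and the optional argument that overrides the sysfs root exists purely so the function can be tried on a scratch directory first. Writing to the real /sys tree requires root.

```shell
#!/bin/sh
# Pin cpu0..cpu3 to their maximum frequency, as the script above does by hand.
# set_cpus_to_max is a hypothetical helper:
#   $1 = sysfs CPU root (optional, defaults to /sys/devices/system/cpu)
set_cpus_to_max() {
    root=${1:-/sys/devices/system/cpu}
    for cpu in cpu0 cpu1 cpu2 cpu3; do
        freq_dir="$root/$cpu/cpufreq"
        # Skip cores that are offline or have no cpufreq directory.
        [ -d "$freq_dir" ] || continue
        echo userspace > "$freq_dir/scaling_governor"
        # Raising the minimum to the maximum leaves only one allowed frequency.
        cat "$freq_dir/scaling_max_freq" > "$freq_dir/scaling_min_freq"
    done
}

# On the TX1, as root:
#   set_cpus_to_max
```

The GPU and EMC overrides are single writes each, so I left them as they are in the original script.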