Installing CUDA on Ubuntu 16.04 and Using NVIDIA GPUs in Docker

Author: sysit | Category: d | Published: 2020-05-16

> This article uses NVIDIA GPUs. The CUDA runfile installer bundles an NVIDIA driver (version 440.33.01 for CUDA 10.2), so the driver does not need to be installed separately.

## 1. Installing CUDA on the host

### 1.1 Install gcc

```
root@gpu:~# apt-get install build-essential
```

### 1.2 Download CUDA

Download the "runfile (local)" installer for Linux / x86_64 / Ubuntu 16.04. This article uses `cuda_10.2.89_440.33.01_linux.run`.

```
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1604&target_type=runfilelocal
```

### 1.3 Install

```
# Run the installer
root@gpu:~# sh cuda_10.2.89_440.33.01_linux.run

# Type "accept", then select "Install"
+------------------------------------------------------------------------------+
| End User License Agreement                                                   |
| --------------------------                                                   |
|                                                                              |
|                                                                              |
| Preface                                                                      |
| -------                                                                      |
|                                                                              |
| The Software License Agreement in Chapter 1 and the Supplement               |
| in Chapter 2 contain license terms and conditions that govern                |
| the use of NVIDIA software. By accepting this agreement, you                 |
| agree to comply with all the terms and conditions applicable                 |
| to the product(s) included herein.                                           |
|                                                                              |
|                                                                              |
| NVIDIA Driver                                                                |
|                                                                              |
|                                                                              |
| Description                                                                  |
|                                                                              |
| This package contains the operating system driver and                        |
|------------------------------------------------------------------------------|
| Do you accept the above EULA? (accept/decline/quit):                         |
| accept                                                                       |
+------------------------------------------------------------------------------+

+------------------------------------------------------------------------------+
| CUDA Installer                                                               |
| - [X] Driver                                                                 |
|      [X] 440.33.01                                                           |
| + [X] CUDA Toolkit 10.2                                                      |
|   [X] CUDA Samples 10.2                                                      |
|   [X] CUDA Demo Suite 10.2                                                   |
|   [X] CUDA Documentation 10.2                                                |
|   Options                                                                    |
|   Install                                                                    |
|                                                                              |
|                                                                              |
| Up/Down: Move | Left/Right: Expand | 'Enter': Select | 'A': Advanced options |
+------------------------------------------------------------------------------+

# Summary printed after the installation finishes
===========
= Summary =
===========

Driver:   Installed
Toolkit:  Installed in /usr/local/cuda-10.2/
Samples:  Installed in /home/bbders/, but missing recommended libraries

Please make sure that
 -   PATH includes /usr/local/cuda-10.2/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-10.2/lib64, or, add /usr/local/cuda-10.2/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-10.2/bin
To uninstall the NVIDIA Driver, run nvidia-uninstall

Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.2/doc/pdf for detailed information on setting up CUDA.
Logfile is /var/log/cuda-installer.log
```

### 1.4 Environment variables

As the installer summary asks, add the CUDA `bin` directory to `PATH` and the `lib64` directory to the dynamic linker configuration:

```
root@gpu:~# vi /etc/profile
# Append at the end:
export PATH=$PATH:/usr/local/cuda-10.2/bin

root@gpu:~# vi /etc/ld.so.conf
# Append at the end:
/usr/local/cuda-10.2/lib64

# Check the result
root@gpu:~# cat /etc/profile
# /etc/profile: system-wide .profile file for the Bourne shell (sh(1))
# and Bourne compatible shells (bash(1), ksh(1), ash(1), ...).

if [ "$PS1" ]; then
  if [ "$BASH" ] && [ "$BASH" != "/bin/sh" ]; then
    # The file bash.bashrc already sets the default PS1.
    # PS1='\h:\w\$ '
    if [ -f /etc/bash.bashrc ]; then
      . /etc/bash.bashrc
    fi
  else
    if [ "`id -u`" -eq 0 ]; then
      PS1='# '
    else
      PS1='$ '
    fi
  fi
fi

if [ -d /etc/profile.d ]; then
  for i in /etc/profile.d/*.sh; do
    if [ -r $i ]; then
      . $i
    fi
  done
  unset i
fi
export PATH=$PATH:/usr/local/cuda-10.2/bin

root@gpu:~# cat /etc/ld.so.conf
include /etc/ld.so.conf.d/*.conf

/usr/local/cuda-10.2/lib64

# Load the new PATH into the current shell
root@gpu:~# source /etc/profile
# Rebuild the dynamic linker cache
root@gpu:~# ldconfig
```

### 1.5 Verify

```
root@gpu:~# nvidia-smi
Fri May 15 00:21:37 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 24%   29C    P0    42W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   32C    P0    44W / 180W |      0MiB /  8119MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

## 2. Using the GPU in Docker

* `nvidia-container-toolkit`: starting with `docker-ce` `19.03`, Docker supports NVIDIA GPUs natively through the `docker run --gpus` flag. `nvidia-docker2` is no longer recommended for these versions; use `nvidia-container-toolkit` instead.
* `nvidia-docker2`: `docker-ce` versions earlier than `19.03` still use `nvidia-docker2`.
* GitHub repository of `nvidia-docker`: `https://github.com/NVIDIA/nvidia-docker`

### 2.1 Install Docker

```
# Add Docker's GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add the package repository (Tsinghua University mirror)
sudo add-apt-repository \
   "deb [arch=amd64] https://mirrors.tuna.tsinghua.edu.cn/docker-ce/linux/ubuntu \
   $(lsb_release -cs) \
   stable"

# Install
sudo apt-get update
sudo apt-get install docker-ce

# Install a specific version
# At the time of writing, the latest docker-ce release is 19.03. To install another version:
apt-cache madison docker-ce
sudo apt-get -y install docker-ce=5:18.09.9~3-0~ubuntu-xenial
```

### 2.2 nvidia-docker2

`nvidia-docker2` is for `docker-ce` versions earlier than `19.03`. Reference: `https://github.com/NVIDIA/nvidia-docker/wiki/Installation-(version-2.0)`

* Package repository

```
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
```

* Install nvidia-docker2

```
# Install
sudo apt-get install nvidia-docker2
# Tell dockerd to reload its configuration
sudo pkill -SIGHUP dockerd
```

> Note: `/etc/docker/daemon.json` will be overwritten.

* Verify

```
# Docker image
# Available tags: https://hub.docker.com/r/nvidia/cuda/tags
docker pull nvidia/cuda:10.2-base

# Run nvidia-smi inside a container
root@gpu:~# docker run --runtime=nvidia --rm nvidia/cuda:10.2-base nvidia-smi
Fri May 15 08:27:51 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 24%   31C    P0    41W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:82:00.0 Off |                  N/A |
| 23%   34C    P0    41W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

### 2.3 Install nvidia-container-toolkit

For `docker-ce` `19.03` and later.

* Package repository

```
# Add the package repositories
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
```

* Install

```
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
```

> Note: `/etc/docker/daemon.json` will be overwritten.

* Verify

```
# Docker image
# Available tags: https://hub.docker.com/r/nvidia/cuda/tags
docker pull nvidia/cuda:10.2-base

# Test
root@gpu:~# docker run --gpus all nvidia/cuda:10.2-base nvidia-smi
Fri May 15 08:05:52 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:02:00.0 Off |                  N/A |
| 24%   31C    P0    41W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 00000000:82:00.0 Off |                  N/A |
| 22%   34C    P0    40W / 180W |      0MiB /  8119MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```

* Usage

The host in this article only has GPUs `0` and `1`, so adjust device indices such as `device=1,2` to your machine. `UUID-ABCDEF` is a placeholder for a real GPU UUID, as listed by `nvidia-smi -L`.

```
# Test nvidia-smi with the latest official CUDA image
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

# Start a GPU enabled container on two GPUs
docker run --gpus 2 nvidia/cuda:10.0-base nvidia-smi

# Starting a GPU enabled container on specific GPUs
docker run --gpus '"device=1,2"' nvidia/cuda:10.0-base nvidia-smi
docker run --gpus '"device=UUID-ABCDEF,1"' nvidia/cuda:10.0-base nvidia-smi

# Specifying a capability (graphics, compute, ...) for my container
# Note this is rarely if ever used this way
docker run --gpus all,capabilities=utility nvidia/cuda:10.0-base nvidia-smi
```
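Because the GPU flag depends on the Docker version (`--runtime=nvidia` with `nvidia-docker2` before 19.03, `--gpus` with `nvidia-container-toolkit` from 19.03 on), the choice can be scripted. The following is only a minimal sketch: the function name `gpu_run_flags` is hypothetical, and it assumes the version string looks like `19.03.8` or the apt form `5:18.09.9~3-0~ubuntu-xenial`, and that the matching NVIDIA package is already installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Docker or the NVIDIA tooling):
# print the docker run GPU flag for a given docker-ce version string.
# Assumption taken from this article: docker-ce >= 19.03 uses --gpus
# (nvidia-container-toolkit); older versions use --runtime=nvidia (nvidia-docker2).
gpu_run_flags() {
  local version="$1" major minor
  version="${version#*:}"   # drop an apt epoch such as "5:"
  major="${version%%.*}"    # "18.09.9~3-0~ubuntu-xenial" -> "18"
  minor="${version#*.}"     # -> "09.9~3-0~ubuntu-xenial"
  minor="${minor%%.*}"      # -> "09"
  # 10# forces base 10 so that "09" is not parsed as an invalid octal number
  if (( 10#$major > 19 || (10#$major == 19 && 10#$minor >= 3) )); then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}
```

A possible invocation, assuming the Docker daemon is running: `docker run $(gpu_run_flags "$(docker version --format '{{.Server.Version}}')") --rm nvidia/cuda:10.2-base nvidia-smi`. The outer command substitution is deliberately left unquoted so that `--gpus all` is split into two arguments.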