NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver.

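Friendly note: the method below does not require reinstalling the driver, and it is quick and simple. It applies to Ubuntu systems where the driver was installed previously but has since stopped working.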

If this method still does not solve the problem, refer to "Installing the NVIDIA graphics driver on Ubuntu" and reinstall the driver.

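Recently, when I went to run a model on the GPU, I was told that CUDA was not available. I had installed the driver only a short while ago, so how could it be missing?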

  • Step 1: open a terminal and run nvidia-smi, which reports the following error:

    $ nvidia-smi 
    NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
    
    
  • Step 2: run nvcc -V to check that the CUDA toolkit is still installed (nvcc reports the toolkit version, not the driver).

    $ nvcc -V
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2019 NVIDIA Corporation
    Built on Wed_Oct_23_19:24:38_PDT_2019
    Cuda compilation tools, release 10.2, V10.2.89
    

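    nvcc still reports the CUDA toolkit (release 10.2), so the installation has not disappeared. Move on to the next step.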

  • Step 3: check the version of the installed driver

    $ ls /usr/src | grep nvidia
    nvidia-470.63.01
    

    In my case the directory is nvidia-470.63.01, so the driver version is 470.63.01.

  • Step 4: run the following commands in order, replacing 470.63.01 with the version found in step 3

    $ sudo apt-get install dkms
    $ sudo dkms install -m nvidia -v 470.63.01
    

    Once the installation finishes, run nvidia-smi again to check the GPU status:

    ...
    DKMS: install completed.
    (base) ocr@ocr-desktop:~$ nvidia-smi 
    Tue Dec  7 09:39:49 2021       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 470.63.01    Driver Version: 470.63.01    CUDA Version: 11.4     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
    | 34%   44C    P0    28W / 175W |      0MiB /  7979MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    
    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
    (base) ocr@ocr-desktop:~$
    

    And finally, the familiar screen is back. Problem solved!
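
Steps 3 and 4 can be combined into a small script so that the version number does not have to be copied by hand. The following is only a minimal sketch, assuming the driver sources live in a directory named nvidia-<version> under /usr/src, as in the listing above. The function names (nvidia_version_from_dir, find_nvidia_src, rebuild_nvidia_module) are hypothetical helpers made up for this example and are not part of dkms or the NVIDIA driver. If several nvidia-* directories exist, the sketch simply picks the highest version, which may not be the one you want.

```shell
#!/usr/bin/env bash
# Sketch only: detect the driver version from /usr/src and rebuild it with DKMS.
# All function names below are hypothetical helpers for this example.

# Extract the version from a directory name such as "nvidia-470.63.01".
nvidia_version_from_dir() {
    local name="${1##*/}"      # strip any leading path
    echo "${name#nvidia-}"     # strip the "nvidia-" prefix
}

# Print the highest-versioned nvidia-* directory under a root (default: /usr/src).
# Assumption: GNU sort with -V (version sort) is available, as on Ubuntu.
find_nvidia_src() {
    local root="${1:-/usr/src}"
    ls -d "$root"/nvidia-[0-9]* 2>/dev/null | sort -V | tail -n 1
}

# Run the same two commands as step 4, using the detected version.
rebuild_nvidia_module() {
    local src version
    src="$(find_nvidia_src)"
    if [ -z "$src" ]; then
        echo "No nvidia-* directory under /usr/src; the driver may need reinstalling." >&2
        return 1
    fi
    version="$(nvidia_version_from_dir "$src")"
    echo "Detected driver version: $version"
    sudo apt-get install -y dkms
    sudo dkms install -m nvidia -v "$version"
}
```

To use it, source the file and call rebuild_nvidia_module, then run nvidia-smi again as in step 4. Nothing runs until the function is called, so the detected version can be checked first with find_nvidia_src.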