Hi there,
I have two GPUs (both Titan Xps, I love them): one for TrueNAS SCALE Dragonfish, and an isolated Titan Xp passed through to a dedicated Rocky 9 VM for CUDA development in Python and Fortran.
The Rocky 9 VM boots fine, so I installed the NVIDIA and CUDA packages with a normal "dnf" with no issues:
[nynros@rocky9entw ~]$ dnf list installed |grep nvidia-driver |grep x86
nvidia-driver.x86_64 3:560.35.03-1.el9 @cuda-rhel9-x86_64
nvidia-driver-cuda.x86_64 3:560.35.03-1.el9 @cuda-rhel9-x86_64
nvidia-driver-cuda-libs.x86_64 3:560.35.03-1.el9 @cuda-rhel9-x86_64
nvidia-driver-libs.x86_64 3:560.35.03-1.el9 @cuda-rhel9-x86_64
[nynros@rocky9entw ~]$ dnf list installed |grep cuda-driver |grep x86
cuda-driver-devel-12-6.x86_64 12.6.68-1 @cuda-rhel9-x86_64
[nynros@rocky9entw ~]$
Everything seems fine, including IOMMU and the rest of the setup:
IOMMU Group * 00:00.0 Host bridge [0600]: Intel Corporation 440FX - 82441FX PMC [Natoma] [8086:1237] (rev 02)
00:01.0 ISA bridge [0601]: Intel Corporation 82371SB PIIX3 ISA [Natoma/Triton II] [8086:7000]
00:01.1 IDE interface [0101]: Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] [8086:7010]
00:01.3 Bridge [0680]: Intel Corporation 82371AB/EB/MB PIIX4 ACPI [8086:7113] (rev 03)
00:02.0 VGA compatible controller [0300]: Red Hat, Inc. QXL paravirtual graphic card [1b36:0100] (rev 05)
00:03.0 Ethernet controller [0200]: Red Hat, Inc. Virtio network device [1af4:1000]
00:04.0 USB controller [0c03]: NEC Corporation uPD720200 USB 3.0 Host Controller [1033:0194] (rev 03)
00:05.0 Communication controller [0780]: Red Hat, Inc. Virtio console [1af4:1003]
00:06.0 SCSI storage controller [0100]: Red Hat, Inc. Virtio block device [1af4:1001]
00:07.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1)
00:08.0 Unclassified device [00ff]: Red Hat, Inc. Virtio memory balloon [1af4:1002]
[root@rocky9entw ~]# grubby --info=DEFAULT
index=0
kernel="/boot/vmlinuz-5.14.0-427.33.1.el9_4.x86_64"
args="ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=UUID=6bdb4f8c-e8ff-4b5a-9d72-b03d59687b41 rhgb quiet $tuned_params rd.driver.blacklist=nouveau modprobe.blacklist=nouveau intel_iommu=on iommu=pt"
root="UUID=cbd44d25-e5be-4a46-8c4f-32d448d386c9"
initrd="/boot/initramfs-5.14.0-427.33.1.el9_4.x86_64.img $tuned_initrd"
title="Rocky Linux (5.14.0-427.33.1.el9_4.x86_64) 9.4 (Blue Onyx)"
id="e1ea827cb91d4815ba486c4d2b405f62-5.14.0-427.33.1.el9_4.x86_64"
[root@rocky9entw ~]#
[root@rocky9entw ~]# lspci -vnnn | grep -i nvidia
00:07.0 VGA compatible controller [0300]: NVIDIA Corporation GP102 [TITAN Xp] [10de:1b02] (rev a1) (prog-if 00 [VGA controller])
Subsystem: NVIDIA Corporation Device [10de:123f]
Kernel modules: nouveau, nvidia_drm, nvidia
[root@rocky9entw ~]#
[root@rocky9entw ~]# dkms status
nvidia-open/560.35.03, 5.14.0-427.33.1.el9_4.x86_64, x86_64: installed
[root@rocky9entw ~]#
[root@rocky9entw ~]# sestatus
SELinux status: disabled
[root@rocky9entw ~]#
[root@rocky9entw ~]# dnf repolist
repo id repo name
appstream Rocky Linux 9 - AppStream
baseos Rocky Linux 9 - BaseOS
crb Rocky Linux 9 - CRB
cuda-rhel9-x86_64 cuda-rhel9-x86_64
epel Extra Packages for Enterprise Linux 9 - x86_64
epel-cisco-openh264 Extra Packages for Enterprise Linux 9 openh264 (From Cisco) - x86_64
extras Rocky Linux 9 - Extras
nvhpc NVIDIA HPC SDK
[root@rocky9entw ~]#
[nynros@rocky9entw ~]$ lsmod|grep nv
nvidia 9760768 1
libnvdimm 245760 1 nfit
drm 741376 7 drm_kms_helper,qxl,nvidia,drm_ttm_helper,ttm
[nynros@rocky9entw ~]$
But the driver can't communicate with the card, and I don't know why:
[Sat Aug 31 15:49:49 2024] nvidia: probe of 0000:00:07.0 failed with error -1
[Sat Aug 31 15:49:49 2024] nvidia-nvlink: Unregistered Nvlink Core, major device number 236
[Sat Aug 31 15:49:49 2024] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[Sat Aug 31 15:49:49 2024] nvidia 0000:00:07.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
NVRM: nvidia.ko because it does not include the required GPU
NVRM: www.nvidia.com.
[Sat Aug 31 15:49:49 2024] nvidia: probe of 0000:00:07.0 failed with error -1
[Sat Aug 31 15:49:49 2024] nvidia-nvlink: Unregistered Nvlink Core, major device number 236
[Sat Aug 31 15:49:49 2024] nvidia-nvlink: Nvlink Core is being initialized, major device number 236
[Sat Aug 31 15:49:49 2024] nvidia 0000:00:07.0: vgaarb: VGA decodes changed: olddecodes=none,decodes=none:owns=io+mem
[nynros@rocky9entw ~]$ nvidia-smi
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
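The NVRM line in the log above is cut off, but it says nvidia.ko does not include the required GPU, so which kernel-module flavor DKMS built may be relevant. Below is a minimal sketch for reading that off the `dkms status` output. The function name `nvidia_module_flavor` is made up for illustration, and the assumption that the proprietary modules register as `nvidia/<version>` is not confirmed by anything in this post.

```shell
#!/bin/sh
# Classify the NVIDIA kernel-module flavor from `dkms status` text on stdin.
# Assumption: the open modules register as "nvidia-open/<version>" (as in the
# dkms output above) and the proprietary ones as "nvidia/<version>".
nvidia_module_flavor() {
    flavor="none"
    while IFS= read -r line; do
        case "$line" in
            nvidia-open/*) flavor="open" ;;
            nvidia/*)      flavor="proprietary" ;;
        esac
    done
    printf '%s\n' "$flavor"
}

# On a live system: dkms status | nvidia_module_flavor
# Here, fed with the dkms line from this post:
printf '%s\n' 'nvidia-open/560.35.03, 5.14.0-427.33.1.el9_4.x86_64, x86_64: installed' \
    | nvidia_module_flavor
```

If several NVIDIA entries are listed, the sketch reports the last one it sees, so the raw `dkms status` output is still worth reading alongside it.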
Please help, thanks.