Device /dev/"Device IDs" causing slow I/O on pool ZFSPool

Hi,

Here are the specs of the system, running TrueNAS 13.0-U6.1:

Model: SYS-520P-WTR
Memory: 64 GiB
Serial Number: A421480X3704158

System Drive: 2 x SAMSUNG MZ7L3480HCHQ-00A 445Gbyte

ZFS Pool created by:

2 x ATA SAMSUNG MZ7L31T9 1.75 TiB;

6 x ATA TOSHIBA MG07ACA1 10.91 TiB.

ZFS Pool

(screenshot attachment: image-20240620-095926.png)

Steps to Reproduce

The system normally runs fine, but sometimes it raises errors like these:

These alerts have been cleared:

  • Device /dev/gptid/ea54d164-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

  • Device /dev/gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

When that happens, pings to the TrueNAS box hang and the NFS share gets disconnected. The NFS share is also a backup repository for a Veeam server, where the backups reside.

Expected Result

No disconnect. I've run several tests:

root@truenas[~]# zpool iostat -v
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
ZFSPool                                         23.0T  42.5T     71    584  12.9M  14.7M
  raidz2-0                                      23.0T  42.5T     71    584  12.9M  14.7M
    gptid/ea093413-5877-11ee-b974-3cecef0ff484      -      -     12     96  2.16M  2.45M
    gptid/e9f532b1-5877-11ee-b974-3cecef0ff484      -      -     11     98  2.15M  2.45M
    gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484      -      -     11     97  2.15M  2.45M
    gptid/ea1ac84a-5877-11ee-b974-3cecef0ff484      -      -     12     96  2.16M  2.45M
    gptid/ea132c35-5877-11ee-b974-3cecef0ff484      -      -     11     98  2.15M  2.45M
    gptid/ea54d164-5877-11ee-b974-3cecef0ff484      -      -     11     97  2.14M  2.45M
cache                                               -      -      -      -      -      -
  gptid/e8b76b2a-5877-11ee-b974-3cecef0ff484    1.18T   580G     41     10   249K  1.17M
  gptid/e8bc28f8-5877-11ee-b974-3cecef0ff484    1.18T   578G     41     10   249K  1.17M
----------------------------------------------  -----  -----  -----  -----  -----  -----
boot-pool                                       3.03G   425G      0      0  14.2K    438
  mirror-0                                      3.03G   425G      0      0  14.2K    438
    ada0p2                                          -      -      0      0  7.12K    219
    ada1p2                                          -      -      0      0  7.10K    219
----------------------------------------------  -----  -----  -----  -----  -----  -----

root@truenas[~]# zpool status -v
  pool: ZFSPool
 state: ONLINE
  scan: scrub repaired 0B in 09:15:12 with 0 errors on Sun May 26 09:15:12 2024
config:

    NAME                                            STATE     READ WRITE CKSUM
    ZFSPool                                         ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/ea093413-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/e9f532b1-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea1ac84a-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea132c35-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea54d164-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
    cache
      gptid/e8b76b2a-5877-11ee-b974-3cecef0ff484    ONLINE       0     0     0
      gptid/e8bc28f8-5877-11ee-b974-3cecef0ff484    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:06 with 0 errors on Sat Jun 8 03:45:06 2024
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p2  ONLINE       0     0     0
        ada1p2  ONLINE       0     0     0

errors: No known data errors

root@truenas[~]# glabel status
                                      Name  Status  Components
gptid/2acc72a5-5875-11ee-8691-3cecef0ff484     N/A  ada0p1
gptid/2ad1ae0d-5875-11ee-8691-3cecef0ff484     N/A  ada1p1
gptid/e8b76b2a-5877-11ee-b974-3cecef0ff484     N/A  da0p1
gptid/e8bc28f8-5877-11ee-b974-3cecef0ff484     N/A  da1p1
gptid/e9f532b1-5877-11ee-b974-3cecef0ff484     N/A  da3p2
gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484     N/A  da4p2
gptid/ea093413-5877-11ee-b974-3cecef0ff484     N/A  da2p2
gptid/ea1ac84a-5877-11ee-b974-3cecef0ff484     N/A  da5p2
gptid/ea132c35-5877-11ee-b974-3cecef0ff484     N/A  da6p2
gptid/ea54d164-5877-11ee-b974-3cecef0ff484     N/A  da7p2
gptid/2ad34b86-5875-11ee-8691-3cecef0ff484     N/A  ada1p3
gptid/2ace12d3-5875-11ee-8691-3cecef0ff484     N/A  ada0p3

I have also verified that the drives we were sold are not SMR drives (see: CMR e SMR: caratteristiche, differenze, vantaggi e svantaggi (recuperodatipersi.it)):

root@truenas[/tmp/truenas-smr-check]# ./smr-check.sh
No known SMR drives detected.
That’s a good sign, but it isn’t a guarantee. Double-check using other means.

Actual Result

Disconnection from NFS and the system hangs; no hardware reset is needed to return to normal operation.

Veeam error:

Processing srvradionew Error: Storage not initialized. Failed to open storage for read access. Storage: [10.1.100.32:/mnt/ZFSPool/veeam/Replicas/CloudReplica_Meloni_new_ef9a13cd-29ae-47f9-9284-3fc39437a93c_vm-90327/2024-06-08T225756.vbk].
Processing SRV-SSIS Error: Storage not initialized. Failed to open storage for read access. Storage: [10.1.100.32:/mnt/ZFSPool/veeam/Replicas/CloudReplica_Meloni_new_ef9a13cd-29ae-47f9-9284-3fc39437a93c_vm-268245/2024-06-03T225151.vbk].
Job finished with error at 12/06/2024 07:12:42

Environment

Veeam server on a Windows Server 2019 VM (latest patches), 4 vCPUs, 16 GB RAM, NFS mount on TrueNAS.

Hardware Health

No crash

Error Message (if applicable)

Device /dev/gptid/ea54d164-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

Can’t see the png you attached. Can you show the pool layout, either from the UI or with zpool status from the CLI? Running glabel status from the CLI might also help for the next step, linking your gptid and ada/da numbers.

Hi,
here’s the glabel status:
root@truenas[/tmp]# glabel status
Name Status Components
gptid/2acc72a5-5875-11ee-8691-3cecef0ff484 N/A ada0p1
gptid/2ad1ae0d-5875-11ee-8691-3cecef0ff484 N/A ada1p1
gptid/e8b76b2a-5877-11ee-b974-3cecef0ff484 N/A da0p1
gptid/e8bc28f8-5877-11ee-b974-3cecef0ff484 N/A da1p1
gptid/ea093413-5877-11ee-b974-3cecef0ff484 N/A da2p2
gptid/e9f532b1-5877-11ee-b974-3cecef0ff484 N/A da3p2
gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484 N/A da4p2
gptid/ea1ac84a-5877-11ee-b974-3cecef0ff484 N/A da5p2
gptid/ea132c35-5877-11ee-b974-3cecef0ff484 N/A da6p2
gptid/ea54d164-5877-11ee-b974-3cecef0ff484 N/A da7p2
gptid/2ad34b86-5875-11ee-8691-3cecef0ff484 N/A ada1p3
gptid/2ace12d3-5875-11ee-8691-3cecef0ff484 N/A ada0p3

here's the zpool status output:
root@truenas[/tmp]# zpool status
pool: ZFSPool
state: ONLINE
scan: scrub repaired 0B in 09:15:12 with 0 errors on Sun May 26 09:15:12 2024
config:

    NAME                                            STATE     READ WRITE CKSUM
    ZFSPool                                         ONLINE       0     0     0
      raidz2-0                                      ONLINE       0     0     0
        gptid/ea093413-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/e9f532b1-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea1ac84a-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea132c35-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
        gptid/ea54d164-5877-11ee-b974-3cecef0ff484  ONLINE       0     0     0
    cache
      gptid/e8b76b2a-5877-11ee-b974-3cecef0ff484    ONLINE       0     0     0
      gptid/e8bc28f8-5877-11ee-b974-3cecef0ff484    ONLINE       0     0     0

errors: No known data errors

pool: boot-pool
state: ONLINE
scan: scrub repaired 0B in 00:00:06 with 0 errors on Wed Jun 26 03:45:06 2024
config:

    NAME        STATE     READ WRITE CKSUM
    boot-pool   ONLINE       0     0     0
      mirror-0  ONLINE       0     0     0
        ada0p2  ONLINE       0     0     0
        ada1p2  ONLINE       0     0     0

errors: No known data errors
and also:
root@truenas[/tmp]# dmesg | grep mpr
mpr0: <Avago Technologies (LSI) SAS3008> port 0xd000-0xd0ff mem 0xe6640000-0xe664ffff,0xe6600000-0xe663ffff irq 18 at device 0.0 numa-domain 0 on pci9
mpr0: Firmware: 16.00.10.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 0
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000a> enclosureHandle<0x0001> slot 1
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000b> enclosureHandle<0x0001> slot 2
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000c> enclosureHandle<0x0001> slot 3
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000d> enclosureHandle<0x0001> slot 4
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000e> enclosureHandle<0x0001> slot 5
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000f> enclosureHandle<0x0001> slot 6
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0010> enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name ( )
da0 at mpr0 bus 0 scbus12 target 0 lun 0
da1 at mpr0 bus 0 scbus12 target 1 lun 0
da3 at mpr0 bus 0 scbus12 target 3 lun 0
da4 at mpr0 bus 0 scbus12 target 4 lun 0
da2 at mpr0 bus 0 scbus12 target 2 lun 0
da5 at mpr0 bus 0 scbus12 target 5 lun 0
da6 at mpr0 bus 0 scbus12 target 6 lun 0
da7 at mpr0 bus 0 scbus12 target 7 lun 0
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: mpr_user_pass_thru: user reply buffer (64) smaller than returned buffer (68)
mpr0: Reinitializing controller
mpr0: Firmware: 16.00.12.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Unfreezing SIM queue
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 0
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000a> enclosureHandle<0x0001> slot 1
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000b> enclosureHandle<0x0001> slot 2
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000c> enclosureHandle<0x0001> slot 3
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000d> enclosureHandle<0x0001> slot 4
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000e> enclosureHandle<0x0001> slot 5
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000f> enclosureHandle<0x0001> slot 6
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0010> enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name ( )
mpr0: <Avago Technologies (LSI) SAS3008> port 0xd000-0xd0ff mem 0xe6640000-0xe664ffff,0xe6600000-0xe663ffff irq 18 at device 0.0 numa-domain 0 on pci9
mpr0: Firmware: 16.00.12.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 0
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000a> enclosureHandle<0x0001> slot 1
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000b> enclosureHandle<0x0001> slot 2
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000c> enclosureHandle<0x0001> slot 3
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000d> enclosureHandle<0x0001> slot 4
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000e> enclosureHandle<0x0001> slot 5
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000f> enclosureHandle<0x0001> slot 6
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0010> enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name ( )
da0 at mpr0 bus 0 scbus12 target 0 lun 0
da1 at mpr0 bus 0 scbus12 target 1 lun 0
da2 at mpr0 bus 0 scbus12 target 2 lun 0
da3 at mpr0 bus 0 scbus12 target 3 lun 0
da4 at mpr0 bus 0 scbus12 target 4 lun 0
da5 at mpr0 bus 0 scbus12 target 5 lun 0
da6 at mpr0 bus 0 scbus12 target 6 lun 0
da7 at mpr0 bus 0 scbus12 target 7 lun 0
mpr0: Reinitializing controller
mpr0: Firmware: 16.00.14.00, Driver: 23.00.00.00-fbsd
mpr0: IOCCapabilities: 7a85c<ScsiTaskFull,DiagTrace,SnapBuf,EEDP,TransRetry,EventReplay,MSIXIndex,HostDisc,FastPath,RDPQArray>
mpr0: Unfreezing SIM queue
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0009> enclosureHandle<0x0001> slot 0
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000a> enclosureHandle<0x0001> slot 1
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000b> enclosureHandle<0x0001> slot 2
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000c> enclosureHandle<0x0001> slot 3
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000d> enclosureHandle<0x0001> slot 4
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000e> enclosureHandle<0x0001> slot 5
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x000f> enclosureHandle<0x0001> slot 6
mpr0: At enclosure level 0 and connector name ( )
mpr0: Found device <881<SataDev,Direct>,End Device> <6.0Gbps> handle<0x0010> enclosureHandle<0x0001> slot 7
mpr0: At enclosure level 0 and connector name ( )

Cool. So this is the ‘SLOW’ drive that TrueNAS is talking about:
gptid/ea54d164-5877-11ee-b974-3cecef0ff484 N/A da7p2
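As an aside, you can pull that mapping straight from the alert without scanning the whole listing, for example:

glabel status | grep ea54d164

which should return just the da7p2 line shown above.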

Have you had this error before, and if so, was it the same drive, or is this the first time?

I would have a look at smartctl -a /dev/da7 and make sure everything looks good in there, and if you are not doing so already, schedule short SMART tests on all your drives, perhaps every week or month.
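For example, something along these lines from the shell (da7 being the device the alert's gptid maps to in your glabel output):

smartctl -a /dev/da7             # full SMART report for the flagged disk
smartctl -t short /dev/da7       # kick off a short self-test immediately
smartctl -l selftest /dev/da7    # review the self-test log once it finishes

The recurring tests themselves are easiest to schedule from the TrueNAS UI (the S.M.A.R.T. test tasks).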

What sort of networking do you have set up on this system, and how hard is it being hit?

Do you have periodic scrubs scheduled, and have you noticed any overlap between the issues and scrubs running?
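The scan: line in zpool status shows when the last scrub ran, and you can also kick one off manually to see whether the slow I/O alerts reappear under that load:

zpool scrub ZFSPool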

I think this is also old firmware and you should be using 16.00.12.00. It's mentioned in the link below:

“you may experience some performance issues causing the controller to reset when using SATA HDDs. After working with Broadcom, we’ve come up with a firmware update that is not available on their website that should resolve these controller reset issues. To resolve this issue for FreeNAS systems affected in the field, here are the instructions for updating the firmware to version 16.00.12.00.”
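To confirm what the card is actually running before and after flashing, sas3flash should report it, for example:

sas3flash -list

or just grep the boot messages with dmesg | grep -i firmware, which is where the Firmware: lines in your dmesg output above come from.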

Hi @Johnny_Fartpants, and thanks for the reply. No, there were no scheduled scrub jobs running at the time TrueNAS sent me those alerts.
The alerts often come from different disks, for example:

  • Device /dev/gptid/ea5c7d87-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

or

New alerts:

  • Device /dev/gptid/ea093413-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

Current alerts:

  • Device /dev/gptid/ea093413-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

New alerts:

  • Device /dev/gptid/ea54d164-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

  • Device /dev/gptid/ea132c35-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

Current alerts:

  • Device /dev/gptid/ea093413-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

  • Device /dev/gptid/ea54d164-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

  • Device /dev/gptid/ea132c35-5877-11ee-b974-3cecef0ff484 is causing slow I/O on pool ZFSPool.

To answer your question:

root@truenas[/tmp]# smartctl -a /dev/da7
smartctl 7.2 2021-09-14 r5236 [FreeBSD 13.1-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family: Toshiba MG07ACA… Enterprise Capacity HDD
Device Model: TOSHIBA MG07ACA12TE
Serial Number: 33H0A0HSF95G
LU WWN Device Id: 5 000039 c68c99fac
Firmware Version: 0104
User Capacity: 12,000,138,625,024 bytes [12.0 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Mon Jul 1 08:53:27 2024 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84) Offline data collection activity
                                        was suspended by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 120) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1177) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0
3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 7090
4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 17
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0
9 Power_On_Hours 0x0032 084 084 000 Old_age Always - 6667
10 Spin_Retry_Count 0x0033 100 100 030 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 17
23 Helium_Condition_Lower 0x0023 100 100 075 Pre-fail Always - 0
24 Helium_Condition_Upper 0x0023 100 100 075 Pre-fail Always - 0
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 16
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 21
194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 31 (Min/Max 17/34)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
220 Disk_Shift 0x0002 100 100 000 Old_age Always - 1703936
222 Loaded_Hours 0x0032 084 084 000 Old_age Always - 6659
223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0
224 Load_Friction 0x0022 100 100 000 Old_age Always - 0
226 Load-in_Time 0x0026 100 100 000 Old_age Always - 590
240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%             6658  -
# 2  Short offline       Completed without error       00%             6634  -
# 3  Short offline       Completed without error       00%             6610  -
# 4  Short offline       Completed without error       00%             6586  -
# 5  Short offline       Completed without error       00%             6562  -
# 6  Short offline       Completed without error       00%             6538  -
# 7  Short offline       Completed without error       00%             6514  -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Last week I also updated the firmware to the latest version (as far as I know, from the Broadcom support site):
root@truenas[/tmp]# sas3flash -o -f 3008IT16.ROM
Avago Technologies SAS3 Flash Utility
Version 16.00.00.00 (2017.05.02)
Copyright 2008-2017 Avago Technologies. All rights reserved.

    Advanced Mode Set

    Adapter Selected is a Avago SAS: SAS3008(C0)

    Executing Operation: Flash Firmware Image

            Firmware Image has a Valid Checksum.
            Firmware Version 16.00.14.00
            Firmware Image compatible with Controller.

            Valid NVDATA Image found.
            NVDATA Major Version 0e.01
            Checking for a compatible NVData image...

            NVDATA Device ID and Chip Revision match verified.
            NVDATA Versions Compatible.
            Valid Initialization Image verified.
            Valid BootLoader Image verified.

            Beginning Firmware Download...
            Firmware Download Successful.

            Verifying Download...

            Firmware Flash Successful.

            Resetting Adapter...
            Adapter Successfully Reset.

            NVDATA Version 0e.01.30.28
    Finished Processing Commands Successfully.
    Exiting SAS3Flash.

I'm not familiar with the latest version 16.00.14.00, so I would be interested to hear from other community members, as for a while now it has been recommended to use 16.00.12.00.

I presume after updating your firmware you are still seeing these errors?

I guess this is happening when the pool is being hit hard? Are you just using one instance of Veeam and streaming data to TrueNAS over NFS? What sort of networking, 1Gb or 10Gb? What sort of performance do you normally see? Do you know if your writes are sync or async?
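On the sync question: it's a per-dataset ZFS property, so you can check it (along with compression and dedup) with something like:

zfs get sync,compression,dedup ZFSPool
zfs get sync ZFSPool/veeam       # assuming the Veeam share is its own dataset under the pool

sync=standard leaves the decision to the client/application, always forces synchronous writes, and disabled turns them off.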

After upgrading the firmware, no new alerts have been received… Yes, I'm backing up data over NFS from a Veeam server; the network is 10GbE and read/write performance is really good. I've also run several tests against the TrueNAS appliance over SMB, with no performance degradation. Would you suggest disabling dedup on the ZFS pool?
Best regards…

Dedupe is a whole different conversation. Personally I have kept away from it, so I have zero experience, but I can't see that having any impact on your slow I/O disk errors.
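For what it's worth, you can at least check whether it's actually enabled before worrying about it:

zfs get dedup ZFSPool

Bear in mind that setting dedup=off only affects newly written data; blocks that were already deduplicated stay in the dedup table until they are rewritten.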

Good to hear that the alerts have stopped since the firmware upgrade. It would still be nice to hear from other members regarding that specific firmware version as, like I say, 16.00.12.00 has been the suggested one for some time now.

Are you able to saturate your 10Gb connection? With a single 6-disk RAIDZ2 vdev I'd be very surprised if you could write faster than 4Gb/s. This may well be what's resulting in the errors, i.e. you are pushing your pool to its limit.
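As a rough back-of-envelope (assuming something like 125 MB/s sustained per spindle, which is an assumption rather than a measured figure): a 6-wide RAIDZ2 has 4 data disks, so large sequential writes top out around 4 × 125 MB/s ≈ 500 MB/s ≈ 4 Gb/s, well short of a 10Gb link.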