LSI 9300-16i (with v16.12 firmware) suffers many CKSUM, read, and write errors

Running CrystalDiskMark 8.0.5 should not cause CKSUM, read, or write errors. I always see some CKSUM errors, evenly spread across the 3 disks. About half of the time, I also get read and write errors. Hardware details are below, but in short: ESXi 7.0u3 with a TrueNAS v25.04.2.6 guest VM, and the LSI card set up for PCI passthrough to TrueNAS.

Most Important Notes:

  • Failing firmware versions (all IT mode): 14, 15, 16.10, 16.12
  • When the disks are connected to the motherboard SATA ports, all tests pass without any errors.

Test 1:

  1. Client PC: Windows 10 Enterprise 22H2 19045.6456
  2. Mount the TrueNAS share via SMB/CIFS as Z:
  3. Run CrystalDiskMark 8.0.5 x64
  4. Set test size to 32GB
  5. Run it
  6. Expected: 0 errors reported by zpool status -v
  7. Actual: CKSUM (and sometimes read and write) errors reported by zpool status -v

Test 2: Same as Test 1, but mount via iSCSI

Things I’ve already tried:

  • The existing hardware has been running fine for years, using the motherboard SATA ports and attaching disks directly to their VMs.
  • Memtest runs clean for 30 hours
  • Disk temperatures, reallocated sectors, and other SMART data are good
  • Extra 80mm fan attached to the heatsink on the 9300-16i
  • I cleaned/replaced the thermal paste on the LSI chips

LSI 9300-16i Details

root@truenas[/home/truenas_admin]# sas3flash -listall
Avago Technologies SAS3 Flash Utility Version 16.00.00.00 (2017.05.02)
Num   Ctlr            FW Ver        NVDATA        x86-BIOS         PCI Addr
----------------------------------------------------------------------------
0  SAS3008(C0)  16.00.12.00    0e.01.00.03      No Image      00:0b:00:00
root@truenas[/home/truenas_admin]# sas3flash -c 0 -list
Avago Technologies SAS3 Flash Utility Version 16.00.00.00 (2017.05.02)
        Adapter Selected is a Avago SAS: SAS3008(C0)
        Controller Number              : 0
        Controller                     : SAS3008(C0)
        PCI Address                    : 00:0b:00:00
        SAS Address                    : 500062b-2-02a7-3b00
        NVDATA Version (Default)       : 0e.01.00.03
        NVDATA Version (Persistent)    : 0e.01.00.03
        Firmware Product ID            : 0x2221 (IT)
        Firmware Version               : 16.00.12.00
        NVDATA Vendor                  : LSI
        NVDATA Product ID              : SAS9300-16i
        BIOS Version                   : N/A
        UEFI BSD Version               : N/A
        FCODE Version                  : N/A
        Board Name                     : SAS9300-16i
        Board Assembly                 : N/A
        Board Tracer Number            : N/A
        

Errors

root@truenas[/home/truenas_admin]# zpool status -v Pool1_12TB
  pool: Pool1_12TB
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 00:03:46 with 0 errors on Tue Jan  6 16:47:30 2026
config:

        NAME                                      STATE     READ WRITE CKSUM
        Pool1_12TB                                DEGRADED     0     0     0
          raidz1-0                                DEGRADED     0     0     0
            49e2e554-e443-4120-9291-2d74ae40baf7  ONLINE       0     0    24
            a7af91a6-2fcc-4113-a22c-9d742217423b  DEGRADED     0     0    24  too many errors
            f1c489d1-f9de-4777-be48-d9caff039436  ONLINE       0     0    24

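The pass/fail check in steps 6 and 7 of Test 1 can be scripted so every benchmark run is judged the same way. This is a minimal sketch, assuming a POSIX shell with awk and the NAME/STATE/READ/WRITE/CKSUM table layout shown in the output above; `check_zpool_errors` is a hypothetical helper name, not a TrueNAS command. It prints every row with a non-zero counter and returns 1 if any were found.

```shell
# Sketch: report any vdev row in `zpool status` output (read on stdin) whose
# READ, WRITE or CKSUM counter is non-zero. Returns 1 if any were found.
# Assumes the NAME/STATE/READ/WRITE/CKSUM table layout shown in the thread.
# check_zpool_errors is a hypothetical helper name, not a TrueNAS command.
check_zpool_errors() {
    awk '
        BEGIN { bad = 0 }
        # The config table starts after its header row
        $1 == "NAME" && $2 == "STATE" && $5 == "CKSUM" { in_cfg = 1; next }
        # A blank line ends the table (the "errors:" summary follows it)
        in_cfg && NF == 0 { in_cfg = 0; next }
        # Rows whose third field starts with a digit carry vdev counters
        in_cfg && NF >= 5 && $3 ~ /^[0-9]/ {
            if ($3 != 0 || $4 != 0 || $5 != 0) {
                printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

For example, run `zpool clear Pool1_12TB` before the benchmark and `zpool status -p Pool1_12TB | check_zpool_errors` afterwards; `-p` asks zpool for exact integer counts rather than abbreviated ones such as `1.2K`.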
Hardware:

MB        : Asus ROG Strix Z490-F
CPU       : Intel 10th-gen i9
RAM       : 64GB DDR4
Controller: LSI 9300-16i
BIOS ROM  : None  (or 08.37.00.00 when tried)
EFI ROM   : None  (or 18.00.00.00 when tried)
Disks     : 3 Western Digital Ultrastar DC HC530 14 TB disks.
ESXi      : 7.0u3
TrueNAS VM: Community v25.04.2.6, 4 CPUs, 16GB RAM

Disks
1 @ WDC WUH721414ALE6L4
2 @ WDC WUH721414ALE604
Temperatures, reallocated sectors, and other SMART data are good

Firmware update process

  • Boot to the UEFI shell
    sas3flash.efi -o -c 1 -e 7
    sas3flash.efi -o -c 1 -f SAS9300_16i_IT_v16.12.bin
    sas3flash.efi -o -c 1 -sasadd 500062b202a73b00
  • All commands succeed, with no warnings or errors
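After flashing, it may be worth confirming that each controller really reports the intended version, which is what the `sas3flash -listall` output earlier in the thread shows by eye. This is a minimal sketch, assuming a POSIX shell with awk and the "Num Ctlr FW Ver ..." column layout shown above; `check_hba_firmware` is a hypothetical helper name, not a sas3flash option. When both SAS3008 controllers of the 9300-16i are visible (for example on bare metal rather than through passthrough), every row is checked.

```shell
# Sketch: check that every controller row in `sas3flash -listall` output
# (read on stdin) reports the firmware version given as $1.
# Assumes the "Num Ctlr FW Ver ..." column layout shown in the thread.
# check_hba_firmware is a hypothetical helper name, not a sas3flash option.
check_hba_firmware() {
    awk -v want="$1" '
        BEGIN { rows = 0; bad = 0 }
        # Controller rows start with the controller number and a SAS chip name
        $1 ~ /^[0-9]+$/ && $2 ~ /^SAS/ {
            rows++
            if ($3 != want) {
                printf "controller %s (%s): firmware %s, expected %s\n", $1, $2, $3, want
                bad = 1
            }
        }
        # Fail if no controller was listed or any controller differs
        END { if (rows == 0 || bad) exit 1; exit 0 }
    '
}
```

Usage would look like `sas3flash -listall | check_hba_firmware 16.00.12.00`; a non-zero exit means a controller is missing from the listing or runs a different version.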

Seems like you’ve done everything I would have thought of. Any chance you tried swapping the cables from the HBA? If that also produces the same results, I’d argue it is time for a replacement…

If it’s not the cables, I would suspect the card itself.

I have bought a total of 3 cards and one of them ended up having one dud port causing checksum errors. Have you already tried the other ports on the card?

Check cabling, ports… and cooling: a 9300-16i is power hungry.

Yeah, he said this:

But I’ve heard that particular card (the 9300-16i specifically) runs really hot. It draws something like 20W all on its own.

Maybe it’s a used card and the previous owner didn’t handle the cooling needs well.

If possible, could you verify whether the same checksum errors occur when running the LSI HBA on bare metal (without ESXi and PCI passthrough)?
This would help determine whether the issue is caused by the HBA / cabling / disks themselves or by the virtualization and passthrough layer.

Another thing that comes to mind: maybe try one of the other PCIe slots on the MoBo?

Seems a good excuse to replace that hotplate 9300 with a 9305 16i. :grin:

@Fleshmauler @etorix I have swapped PCIe slots and HBA cables: no difference.

@neofusion It is supposed to be new, and to all appearances is actually new, but I’m not 100% positive of this Amazon seller.

@kricka-kracka I guess I could try that – do you have experience that indicates this could actually be the issue?

@Bradslinux Yeah, so some additional details… I picked up the 9300-16i because it is 2 cards in 1. The plan was to use card 0 for RAID-1 to boot ESXi with redundancy, and pass card 1 through to TrueNAS. FWIW, I have a 9400-8i in the mail to compare against. I’ll give up redundancy on ESXi in that case, but it’s a home setup, which is less critical than a prod install (depending on who you ask… the kids think Plex and battlebox are essential).

Ahh, if you had a free slot, maybe a second HBA for passthrough? I don’t think any HBA can be used by a host and a passthrough guest simultaneously. I could be wrong… I have never tried it. I have extra HBAs lying around, so it never came up.
I agree with the kids :wink: I don’t run VMs, no need in my use case. Plex and *arr stack for me.

Not in particular…
But in my experience it always ends up being the one thing I initially ruled out (“nah, that can’t be it…”).

Murphy has been a long-time companion of mine…he seems to like me a lot… :grinning_face_with_smiling_eyes:

I would test and proceed in whatever way is fastest and safest.
Using an elimination approach usually gets you to the root cause quicker.

Ingenious… but probably overingenious.
Since you’d be using a HBA, the best plan for redundant ESXi boot would be to use “RAID mode” on the motherboard SATA controller.
Try that 9400-8i. As a bonus, if it works it will use less power than a 9300-8i or a 9305-16i for the same result.

So I had checksum errors whenever I moved a lot of data at once between 2 servers with HBA cards. I tried replacing my card and cables, but it had no effect. I ended up adding a 120mm fan to each case, pointed directly at the HBA cards, and I don’t have issues anymore. The LSI cards just run hot and are notorious for overheating. I wish the card manufacturers would add bigger heatsinks and possibly fans to their cards.

Agreed, but they ain’t making them with us in mind. These are for enterprise customers who run racks with jet-engine levels of cooling noise.

Thanks, everybody, for your help.

Solution: Return the LSI 9300-16i and replace it with an LSI 9400-8i.

While I suspect better cooling may have helped, even with full server-level turbine airflow I was still getting errors (the heatsink was cool to the touch at this point). So “IF” it is a heat issue, it’s simply a bad design: the board can’t move enough heat from the chip to the heatsink.

Or maybe I simply got a lemon.

Either way, the 9400 has worked flawlessly and cool (without any extra cooling) for a week now.

It could have been a lemon. We forgot to check whether you had plugged in the auxiliary power for the 9300-16i: it’s so power hungry that it cannot run from the PCIe slot alone.

Isn’t the 9300-16i sort of like some W motors where they fuse two V blocks into one motor and then hope nothing breaks?

IIRC, the 9300-16i is made up of two 9300-8i chipsets crammed into one PCIe card? It was later replaced by the 9305, which delivered 16 ports from a single chip, right?

If I got that right, no wonder that it runs hot, it’s literally two cards crammed into one PCIe slot.
