LSI 9206-16e: HDDs keep dropping off

Hello,
I have a problem.
Disks connected through an LSI 9206-16e drop off after a while. If I connect the same disks directly to the motherboard, there is no problem. I suspect the problem is in the settings, but it is not clear which ones. I have tried two different 9206 controllers and the behavior is identical. Can you tell me what the problem might be?

root@rpc-nas[~]# uname -a
FreeBSD rpc-nas.local 13.1-RELEASE-p9 FreeBSD 13.1-RELEASE-p9 n245432-de4561397a1 TRUENAS amd64

root@rpc-nas[~]# sas2flash -listall
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

    Adapter Selected is a LSI SAS: SAS2308_2(D1)

Num   Ctlr            FW Ver        NVDATA        x86-BIOS      PCI Addr

0     SAS2308_2(D1)   20.00.07.00   14.01.00.06   07.39.02.00   00:84:00:00
1     SAS2308_2(D1)   20.00.07.00   14.01.00.06   No Image      00:86:00:00

    Finished Processing Commands Successfully.
    Exiting SAS2Flash.

root@rpc-nas[~]# sas2flash -c 1 -list
LSI Corporation SAS2 Flash Utility
Version 16.00.00.00 (2013.03.01)
Copyright (c) 2008-2013 LSI Corporation. All rights reserved

    Adapter Selected is a LSI SAS: SAS2308_2(D1)

    Controller Number              : 1
    Controller                     : SAS2308_2(D1)
    PCI Address                    : 00:86:00:00
    SAS Address                    : 5000d31-0-0050-fadd
    NVDATA Version (Default)       : 14.01.00.06
    NVDATA Version (Persistent)    : 14.01.00.06
    Firmware Product ID            : 0x2214 (IT)
    Firmware Version               : 20.00.07.00
    NVDATA Vendor                  : LSI
    NVDATA Product ID              : SAS9206-16e
    BIOS Version                   : N/A
    UEFI BSD Version               : N/A
    FCODE Version                  : N/A
    Board Name                     : SAS9206-16E
    Board Assembly                 : H3-25553-01A
    Board Tracer Number            : SV42822840

    Finished Processing Commands Successfully.
    Exiting SAS2Flash.

dmesg

(da0:mps1:0:0:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
(da0:mps1:0:0:0): Retrying command (per sense data)
(da0:mps1:0:0:0): READ(6). CDB: 08 00 00 28 01 00
(da0:mps1:0:0:0): CAM status: SCSI Status Error
(da0:mps1:0:0:0): SCSI status: Check Condition
(da0:mps1:0:0:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
(da0:mps1:0:0:0): Error 5, Retries exhausted
GEOM_PART: da0 was automatically resized.
Use gpart commit da0 to save changes or gpart undo da0 to revert them.
GEOM_PART: integrity check failed (da0, GPT)
mps1: Controller reported scsi ioc terminated tgt 0 SMID 783 loginfo 31170000
mps1: Controller reported scsi ioc terminated tgt 0 SMID 782 loginfo 31170000
mps1: mpssas_prepare_remove: Sending reset for target ID 0
da0 at mps1 bus 0 scbus13 target 0 lun 0
da0: <ATA Netac SSD 1TB 915a> s/n AA202410111T21444225 detached
mps1: No pending commands: starting remove_device
(da0:mps1:0:0:0): Periph destroyed
da0 at mps1 bus 0 scbus13 target 0 lun 0
da0: <ATA Netac SSD 1TB 915a> Fixed Direct Access SPC-4 SCSI device
da0: Serial Number AA202410111T21444225
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 953869MB (1953525168 512 byte sectors)
mps1: Controller reported scsi ioc terminated tgt 0 SMID 913 loginfo 31110d00
(da0:mps1:0:0:0): WRITE(10). CDB: 2a 00 14 37 4e 10 00 00 08 00
(da0:mps1:0:0:0): CAM status: SCSI Status Error
(da0:mps1:0:0:0): SCSI status: Check Condition
(da0:mps1:0:0:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da0:mps1:0:0:0): Retrying command (per sense data)
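The `loginfo` values in the `mps1: Controller reported scsi ioc terminated` lines carry controller-side detail that the SCSI sense data does not. A minimal sketch for pulling them out of dmesg and splitting them into bit fields follows. The helper names `extract_loginfo` and `decode_loginfo` are hypothetical (not part of mpsutil, mps(4) or TrueNAS), and the field layout is an assumption taken from LSI's MPI header `mpi_log_sas.h`: bits 31-28 type, 27-24 originator (0 = IOP, 1 = PL, 2 = IR), 23-16 code, 15-0 code-specific. It only splits the fields; naming the codes would still need LSI's tables.

```shell
#!/bin/sh
# Hypothetical helpers (not part of mpsutil, mps(4) or TrueNAS).
# Assumed MPI loginfo layout (from LSI's mpi_log_sas.h):
#   bits 31-28 type, 27-24 originator, 23-16 code, 15-0 code-specific.

# Read dmesg-style text on stdin and print each distinct loginfo value.
extract_loginfo() {
    sed -n 's/.*loginfo \([0-9a-fA-F]\{1,8\}\).*/\1/p' | sort -u
}

# Split one hex loginfo value (given without the 0x prefix) into fields.
decode_loginfo() {
    v=$((0x$1))
    type=$(( (v >> 28) & 0xF ))
    orig=$(( (v >> 24) & 0xF ))
    code=$(( (v >> 16) & 0xFF ))
    spec=$(( v & 0xFFFF ))
    case $orig in
        0) oname=IOP ;;
        1) oname=PL ;;
        2) oname=IR ;;
        *) oname=unknown ;;
    esac
    printf 'loginfo %s: type=0x%x originator=%s code=0x%02x specific=0x%04x\n' \
        "$1" "$type" "$oname" "$code" "$spec"
}
```

Usage: `dmesg | extract_loginfo | while read -r li; do decode_loginfo "$li"; done`. For the log above this prints `loginfo 31110d00: type=0x3 originator=PL code=0x11 specific=0x0d00` and `loginfo 31170000: type=0x3 originator=PL code=0x17 specific=0x0000`, which shows that both values come from the PL (link layer) originator rather than from the disk.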

I replaced the thermal paste on the controller, and the cables too.

After clearing the errors, the disk works for a while, then drops out of the pool again.

root@rpc-nas[~]# zpool status system-VM
  pool: system-VM
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: resilvered 931M in 00:07:33 with 0 errors on Fri Feb 28 23:09:29 2025
config:

    NAME                                            STATE     READ WRITE CKSUM
    system-VM                                       DEGRADED     0     0     0
      mirror-0                                      DEGRADED     0     0     0
        gptid/a215696c-89e8-11ec-9e9f-f46d043c82a2  ONLINE       0     0     0
        da0p1                                       FAULTED      4 2.57K     0  too many errors

errors: No known data errors
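To see at a glance which pool member is accumulating errors again after each `zpool clear`, the status output can be filtered for rows whose READ, WRITE or CKSUM counter is non-zero. A minimal sketch; `errored_devices` is a hypothetical helper (not a zpool feature) and assumes the `NAME STATE READ WRITE CKSUM` layout shown above.

```shell
#!/bin/sh
# Hypothetical helper (not a zpool feature). Reads `zpool status` text on
# stdin and prints config rows whose READ, WRITE or CKSUM counter is not 0.
# Assumes the NAME/STATE/READ/WRITE/CKSUM column layout shown above.
errored_devices() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }   # config header
        in_cfg && NF == 0             { in_cfg = 0 }         # blank line ends it
        in_cfg && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s %s read=%s write=%s cksum=%s\n", $1, $2, $3, $4, $5
        }
    '
}
```

Usage: `zpool status system-VM | errored_devices`. For the output above it prints `da0p1 FAULTED read=4 write=2.57K cksum=0`; the healthy gptid member and the DEGRADED parent rows are skipped because their counters are all 0.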

What disk model(s) do you have attached? Is this an internal or external HBA, and if external, how are the disks attached? Are the errors only happening on a single drive no matter where you place it, or are they limited to a single port or cable? Do you have very good cooling on the HBA?

You need to list your hardware in detail; without it, most of these questions can't be answered and the problem can't be narrowed down.


I am using a single disk for testing: a Netac 1 TB SSD.
Temperature is OK.

root@rpc-nas[~]# mpsutil -u 1 show all
Adapter:
mps1 Adapter:
Board Name: SAS9206-16E
Board Assembly: H3-25553-01A
Chip Name: LSISAS2308
Chip Revision: ALL
BIOS Revision: 7.39.02.00
Firmware Revision: 20.00.07.00
Integrated RAID: no
SATA NCQ: ENABLED
PCIe Width/Speed: x8 (8.0 GB/sec)
IOC Speed: Full
Temperature: 61 C

PhyNum  CtlrHandle  DevHandle  Disabled  Speed  Min  Max  Device
0                              N                1.5  6.0  SAS Initiator
1                              N                1.5  6.0  SAS Initiator
2                              N                1.5  6.0  SAS Initiator
3                              N                1.5  6.0  SAS Initiator
4                              N                1.5  6.0  SAS Initiator
5                              N                1.5  6.0  SAS Initiator
6       0001        0009       N         6.0    1.5  6.0  SAS Initiator
7                              N                1.5  6.0  SAS Initiator

Devices:
B____T  SAS Address       Handle  Parent  Device       Speed  Enc   Slot  Wdt
00 00   4433221106000000  0009    0001    SATA Target  6.0    0001  06    1

Enclosures:
Slots  Logical ID        SEPHandle  EncHandle  Type
08     5000d3100050fadd             0001       Direct Attached SGPIO

Expanders:
NumPhys SAS Address DevHandle Parent EncHandle SAS Level
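When moving the disk between legs of the breakout cable to check whether the errors follow a port (as asked above), it helps to confirm which PHY the disk actually negotiated on. A minimal sketch that filters the PHY table for rows carrying a device handle; `linked_phys` is a hypothetical helper (not an mpsutil option), and it assumes the layout shown above, where unlinked PHYs leave both handle columns empty so the second field is the Disabled flag (N or Y).

```shell
#!/bin/sh
# Hypothetical helper (not an mpsutil option). Reads `mpsutil show all`
# text on stdin and prints PHYs that have a device handle, i.e. a link.
# Assumption: on unlinked PHYs both handle columns are empty, so the
# second field is the Disabled flag (N or Y).
linked_phys() {
    awk '
        $1 == "PhyNum"                   { in_phy = 1; next }  # table header
        in_phy && NF == 0                { in_phy = 0 }        # blank line ends it
        in_phy && $2 != "N" && $2 != "Y" {
            printf "phy %s: DevHandle %s, link %s Gb/s\n", $1, $3, $5
        }
    '
}
```

Usage: `mpsutil -u 1 show all | linked_phys`. For the output above it prints `phy 6: DevHandle 0009, link 6.0 Gb/s`; repeating it after each cable or port swap shows whether the drops follow the PHY or the disk.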

To narrow down the problem, I removed the junk backplane. I am currently using an SFF-8644 to 4x SATA breakout cable.
The problem occurs on any disk, but I have only tried SSDs.
Now I'm looking for a spinning HDD; maybe the problem is specific to modern SSDs.

SSD details:

root@rpc-nas[~]# camcontrol identify da0
pass9: <Netac SSD 1TB H230915a> ACS-4 ATA SATA 3.x device
pass9: 600.000MB/s transfers, Command Queueing Enabled

protocol              ACS-4 ATA SATA 3.x
device model          Netac SSD 1TB
firmware revision     H230915a
serial number         AA202410111T21444225
WWN                   5000000000000001
additional product id
cylinders             16383
heads                 16
sectors/track         63
sector size           logical 512, physical 512, offset 0
LBA supported         268435455 sectors
LBA48 supported       1953525168 sectors
PIO supported         PIO4
DMA supported         WDMA2 UDMA6
media RPM             non-rotating
Zoned-Device Commands no

Feature                         Support  Enabled  Value    Vendor
read ahead                      yes      yes
write cache                     yes      yes
flush cache                     yes      yes
Native Command Queuing (NCQ)    yes               32 tags
NCQ Priority Information        no
NCQ Non-Data Command            no
NCQ Streaming                   no
Receive & Send FPDMA Queued     no
NCQ Autosense                   no
SMART                           yes      yes
security                        yes      no
power management                yes      yes
microcode download              yes      yes
advanced power management       no       no
automatic acoustic management   no       no
media status notification       no       no
power-up in Standby             no       no
write-read-verify               no       no
unload                          no       no
general purpose logging         yes      yes
free-fall                       no       no
sense data reporting            no       no
extended power conditions       no       no
device statistics notification  no       no
Data Set Management (DSM/TRIM)  yes
DSM - max 512byte blocks        yes               8
DSM - deterministic read        yes               zeroed
Trusted Computing               no
encrypts all user data          no
Sanitize                        no
Host Protected Area (HPA)       no
Accessible Max Address Config   no

I gave up on this adapter and decided to install an ASM1166 instead.