Pool Errors and crashes on reboot

On reboot, the disks attached to the LSI 9207-8e HBA, which connects to the NetApp shelf (in dumb JBOD mode), throw errors, and the pools on that 24-disk shelf often get suspended and fail. Disk Sync runs like crazy in the top-right task view on each reboot while the errors are being thrown.

Current Resolution:

  1. Power down the disk shelf
  2. Force a cold boot of the Dell server
  3. Export the pools on the powered-down disk shelf (after the server has finished its cold boot)
  4. Power up the shelf (wait for all disks to show up)
  5. Import the pools
    No errors at this point, but we can't reboot without the errors occurring again.
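
For anyone who wants to keep the order of operations straight, here is a minimal sketch of the workaround as a dry-run planner. It assumes the two shelf pools are named `media` and `working` (hypothetical names, substitute your own), and it only prints the steps rather than running anything. Whether a plain `zpool export` succeeds while the shelf is powered off may depend on the pool state, and on TrueNAS it is generally safer to export and import through the web UI, so treat this as an illustration rather than a script to run as-is.

```python
# Sketch only: prints the order of operations for the workaround above.
# Pool names are hypothetical placeholders.
SHELF_POOLS = ["media", "working"]


def plan_workaround(pools):
    """Return the ordered steps: manual actions as text, ZFS actions as commands."""
    steps = [
        "MANUAL: power down the disk shelf",
        "MANUAL: cold boot the server and wait for it to finish",
    ]
    # Export each shelf pool while the shelf is still powered off.
    steps += [f"zpool export {pool}" for pool in pools]
    steps.append("MANUAL: power up the shelf and wait for all disks to appear")
    # Import the pools once every disk is visible again.
    steps += [f"zpool import {pool}" for pool in pools]
    return steps


if __name__ == "__main__":
    for number, step in enumerate(plan_workaround(SHELF_POOLS), start=1):
        print(f"{number}. {step}")
```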

Any ideas on how to troubleshoot this?
I'm considering replacing the HBA.

Hardware

Dell PowerEdge 720
LSI 9207-8e HBA → to NetApp
NetApp DS4246 with 24 WD Red 3.5" SATA spinners on interposers (functioning purely as a JBOD)

Software
TrueNAS SCALE Dragonfish (current release)

Glad to see this isn’t just me at least…

Just curious, how many pools are there on your system, and how many are in the JBOD?

I am experiencing this with 4 pools total on the system (including boot), and 3 of those pools are on the JBOD.

I have 4 pools total, including the boot pool.

The boot pool is in the Dell's front panel, NOT in the JBOD (it has ZERO issues).
The app pool is also in the front panel, NOT in the JBOD (it has ZERO issues).

The media pool has 3 RAIDZ2 vdevs, 18 disks total (it is in the JBOD and has problems).
The working pool has 2 RAIDZ1 vdevs, 6 disks total (it is in the JBOD and has problems).

All 24 disks in the NetApp DS4246 JBOD are affected.
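
As a sanity check on that layout, here is a quick sketch of the disk arithmetic. It assumes the vdevs within each pool are all the same width, which I have not stated above, and the pool names are just labels for this example.

```python
# Layout as described in the post; equal-width vdevs are an assumption.
POOLS = {
    "media": {"vdevs": 3, "disks": 18, "parity": 2},    # RAIDZ2
    "working": {"vdevs": 2, "disks": 6, "parity": 1},   # RAIDZ1
}


def summarize(pools):
    """Return per-pool vdev width and data-disk count, plus total disks."""
    summary = {}
    for name, p in pools.items():
        width = p["disks"] // p["vdevs"]
        summary[name] = {
            "width": width,
            "data_disks": p["vdevs"] * (width - p["parity"]),
        }
    total = sum(p["disks"] for p in pools.values())
    return summary, total


summary, total = summarize(POOLS)
print(summary)
print("total disks on shelf:", total)
```

The total comes to 24, so the two pools account for every bay in the shelf.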

Interesting… Your total number of pools is consistent with my issue. I was thinking that having them in the front panel might resolve this, but it appears that the third non-boot pool is the trigger in general, irrespective of the JBOD…

When you look at the job history on the borked reboot, do you see several disk_sync.all jobs? And what do the disk labels look like in the web GUI?

Yes, the disk sync jobs go crazy, and my logs look like yours in the other thread.

The disk sync keeps running, taking about 5 minutes each time, over and over after reboot.

How many power supplies are in use in your NetApp? I don't think it really matters, I'm just trying to find out how many commonalities there are.

Two power supplies.

Have you found a solution to this issue? I haven’t been able to do much of use lately on my end.

Out of curiosity, what is the output of zpool status?
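
If it helps, here is a small sketch for capturing that after a bad reboot. It runs `zpool status -x`, which only reports pools that have problems and prints `all pools are healthy` otherwise. The helper names are made up for this example, and it assumes `zpool` is on the PATH and the script is run with enough privileges.

```python
import subprocess

HEALTHY_MESSAGE = "all pools are healthy"


def pools_need_attention(status_output):
    """Return True if `zpool status -x` output reports anything but healthy pools."""
    return status_output.strip() != HEALTHY_MESSAGE


def read_zpool_status():
    """Run `zpool status -x` and return its standard output as text."""
    result = subprocess.run(
        ["zpool", "status", "-x"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    output = read_zpool_status()
    if pools_need_attention(output):
        print("At least one pool is reporting a problem:")
        print(output)
    else:
        print(HEALTHY_MESSAGE)
```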

I’m not an IT guy, but do I understand you both correctly: you have a bunch of drives in a JBOD configuration and are trying to use ZFS? If true, I’m fairly certain that will end in disaster. Maybe you could explain a bit more.

JBOD as in a disk enclosure, not a JBOD disk configuration.