Scale changes the disk position after a reboot

B52 · June 5, 2024, 10:39am

In my simple test setup (a Dell T330 with 4 spinners in 2 striped mirrors on an HBA in IT mode, and a boot SSD on the onboard SATA controller), I observed that the device names assigned to the disks change between boots. For example, the boot SSD was sdb before a reboot and appeared as sdc afterwards. The disks never change their physical position!

After the next boot, it showed up as sde.

The reporting shows, for example, sde, and after a reboot it shows the data of whichever disk is now assigned that name, so one graph ends up containing data from 2 different disks. This seems a little strange to me; I never saw this with other OSes. Is it a bug or a feature, or does it not matter? Anyhow, when replacing a disk, take care to format the right one, since formatting seems to be recommended.

WiteWulf · June 5, 2024, 11:45am

I can’t help, but I can confirm I observe this behaviour with my system, too (41x SAS SSDs on an HBA and expander). It doesn’t happen to all disks, and not on every boot, so it seems somewhat unpredictable.

FWIW, I don’t believe ZFS works using device names, but rather the disk IDs. Theoretically you could take the disks out (except your boot disk!), shuffle them and put them back in and when it boots it will happily form all the vdevs and pools by ID. The exception is your boot pool.

And yes, this means that when replacing disks you’re going to need to do it by serial number, and know which disk is in which slot, or else shut the system down and search for the disk by eye. I had to do this recently.

william · June 5, 2024, 12:26pm

The fact that the device name changes is an artifact of Linux probing devices asynchronously, and to my knowledge there is no way to change this behavior.

The best thing is to always refer to disks by serial or /dev/disk/* instead of /dev/sd*.
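To see which persistent names currently point at which sdX name, something like the following Python sketch may help. It assumes the usual udev layout, where the entries in /dev/disk/by-id are symlinks to the kernel device nodes; the function name `stable_names` is made up for illustration.

```python
import os
from collections import defaultdict


def stable_names(directory="/dev/disk/by-id"):
    """Map each kernel device name (e.g. 'sdb') to the persistent names linking to it.

    Assumption: every entry in `directory` is a symlink to a device node,
    as udev normally sets up under /dev/disk/.
    """
    if not os.path.isdir(directory):
        return {}
    names = defaultdict(list)
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.islink(path):
            # Resolve the symlink and keep only the final component, e.g. 'sdb'.
            kernel_name = os.path.basename(os.path.realpath(path))
            names[kernel_name].append(entry)
    return dict(names)


def print_stable_names(directory="/dev/disk/by-id"):
    """Print one line per kernel device with all of its persistent names."""
    for kernel_name, ids in sorted(stable_names(directory).items()):
        print(kernel_name, "->", ", ".join(ids))
```

Running `print_stable_names()` before and after a reboot should show the same persistent names attached to different sdX names whenever the probing order changes.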

Like WiteWulf said, it shouldn’t matter. ZFS puts labels on the disks, but more importantly TrueNAS creates partitions on the disks and creates the ZFS pools using GPT IDs, speeding up disk tasting and import time.

EDIT: This behavior changed/started with kernel 5.3.

B52 · June 5, 2024, 12:41pm

Thank you for the explanation. For the reporting to have a more consistent graph, it would be great if it also hooked on /dev/disk/* if possible because, after reboot, it’s a mix of different disks at the moment.

dan · June 5, 2024, 12:46pm

ZFS works with whatever you give it–device names, names from /dev/disk/by-id/, or anything else. But this is one reason TrueNAS uses UUIDs for pool members–they may be a pain to read, but they’re guaranteed to be static.

B52 · June 5, 2024, 1:00pm

Thank you. Now I understand better. Unfortunately, it seems we have to live with it as long as there is no translation table from UUID to something readable. Whether that is feasible, hmm, I have no clue.

WiteWulf · June 5, 2024, 2:09pm

When TrueNAS tells you a disk has failed it will include the disk’s serial number. That’s how you ensure you’re pulling the correct disk.
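For the translation table asked about above, here is a rough Python sketch that builds one from `lsblk` output. It assumes the UUIDs shown for pool members are GPT partition UUIDs (the PARTUUID column in lsblk), and that `lsblk -J` nests partitions under their disk in a `children` list; the function names are hypothetical.

```python
import json
import subprocess


def partuuid_table(lsblk_json):
    """Map partition UUID -> (partition name, disk name, disk serial).

    `lsblk_json` is the text printed by: lsblk -J -o NAME,PARTUUID,SERIAL
    Assumption: the serial is reported on the disk, not on its partitions.
    """
    table = {}
    for disk in json.loads(lsblk_json).get("blockdevices", []):
        for part in disk.get("children") or []:
            if part.get("partuuid"):
                table[part["partuuid"]] = (
                    part["name"],
                    disk["name"],
                    disk.get("serial"),
                )
    return table


def read_lsblk():
    """Run lsblk and return its JSON output as text."""
    result = subprocess.run(
        ["lsblk", "-J", "-o", "NAME,PARTUUID,SERIAL"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def print_table():
    """Print the UUID, current partition name, current disk name and serial."""
    for partuuid, (part, disk, serial) in sorted(partuuid_table(read_lsblk()).items()):
        print(partuuid, part, disk, serial)
```

Calling `print_table()` lists each UUID next to the current sdX name and the serial. The sdX column can change after a reboot, while the UUID and serial columns should stay the same.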

neofusion · June 5, 2024, 3:51pm

TrueNAS still uses device names when the installer creates the boot-pool.

Are there any plans to change that to the aforementioned better practice?

dan · June 5, 2024, 4:05pm

That question was discussed, somewhat heatedly, on the announcement thread for Dragonfish. I think the consensus was that the juice just wasn’t worth the squeeze.