After several reboots all drives in my running vdev are unassigned

Hi everybody,

I’m new here and also new to TrueNAS, so greetings to everyone.

My problem looks a little strange, but there are no damaged or degraded pools on my system, which is why I’m not too worried. Still, as the title says, after several reboots my HDDs have become unassigned one by one. Now all of the big-pool HDDs are unassigned, but the pool itself is fine and running as a working RAIDZ2.
So my question is: should I be worried about this, or is it not a problem?
I searched a lot, but found no post with the same problem.
I think it could be caused by not using the HDD IDs; since I was new, I kept it simple and used sda through sdf. Ready and running :slight_smile:
But now when I look at zpool status, sda through sdf have changed to the IDs, reboot by reboot. At first only 2 of the 6 HDDs had changed in zpool status.
Now this has happened to all HDDs; the NVMe drives are unaffected.

Thanks a lot for any response
Martin

How are you connecting them to the motherboard?

It’s a Terramaster NAS, so I think they’re connected directly via the SATA backplane.
All 6 HDDs, that is.

You have a mirror of two 1 TB SSDs that you’ve partitioned to make a boot pool, a storage pool, SLOG, and L2ARC? That’s a strikingly bad configuration, though it’s unlikely to be your present problem. What’s the output of zpool status?


Please read here, it might be the same issue:

Bug in TrueNAS SCALE on the Reporting Graphs for Disks?

It looks like dan is on the right path!

Far too early to say that. But I’ve seen a few other SCALE systems here where for some reason ZFS is identifying pool members by their kernel names (sda, sdb, etc.) rather than the partition UUIDs (2c854638-212c-11e6-881c-002590caf340) it should be using. And one symptom of that is that the drives show as “unassigned” in the GUI. Easy enough to fix if that’s what’s going on.
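
If you want to double-check which identifiers a pool was imported with, something like this should do it (big-pool is just the pool name from this thread; substitute your own):

    zpool status big-pool           # the vdev names shown here are what ZFS imported the pool with
    ls -l /dev/disk/by-partuuid/    # shows which kernel device each partition UUID points to

If the names in zpool status don’t show up under /dev/disk/by-partuuid/, the pool isn’t being tracked by partition UUID.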

root@truenas[/home/admin]# zpool status
  pool: big-pool
 state: ONLINE
  scan: scrub repaired 0B in 04:35:38 with 0 errors on Sat Jun 29 04:32:39 2024
config:

        NAME                                          STATE     READ WRITE CKSUM
        big-pool                                      ONLINE       0     0     0
          raidz2-0                                    ONLINE       0     0     0
            ata-ST4000VN006-xxxxxx_xxxxxxxx           ONLINE       0     0     0
            ata-WDC_WD40EFRX-xxxxxxx_WD-xxxxxxxxxxxx  ONLINE       0     0     0
            ata-WDC_WD40EFRX-xxxxxxx_WD-xxxxxxxxxxxx  ONLINE       0     0     0
            ata-ST4000VN006-xxxxxx_xxxxxxxx           ONLINE       0     0     0
            ata-WDC_WD40EFRX-xxxxxxx_WD-xxxxxxxxxxxx  ONLINE       0     0     0
            ata-WDC_WD40EFRX-xxxxxxx_WD-xxxxxxxxxxxx  ONLINE       0     0     0
        logs
          mirror-1                                    ONLINE       0     0     0
            nvme0n1p6                                 ONLINE       0     0     0
            nvme1n1p6                                 ONLINE       0     0     0
        cache
          nvme0n1p5                                   ONLINE       0     0     0
          nvme1n1p5                                   ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:04 with 0 errors on Fri Jun 28 03:45:05 2024
config:

        NAME           STATE     READ WRITE CKSUM
        boot-pool      ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme1n1p3  ONLINE       0     0     0
            nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: fast-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:03 with 0 errors on Fri Jun 28 23:56:32 2024
config:

        NAME           STATE     READ WRITE CKSUM
        fast-pool      ONLINE       0     0     0
          mirror-0     ONLINE       0     0     0
            nvme0n1p7  ONLINE       0     0     0
            nvme1n1p7  ONLINE       0     0     0

Yeah, your boot device partitioning is a very bad idea, and you don’t have any use for SLOG (any workload that would call for SLOG would also call for mirrors, not RAIDZ2) or L2ARC (too little RAM) anyway. You should at some point fix that, but it isn’t your immediate problem.

While your pool isn’t using the kernel names (sda, sdb, etc.), it also isn’t using the UUIDs it should be. You did create this pool through the TrueNAS GUI, right? To make sure the UUIDs are present, can you show the output of lsblk -o name,size,type,partuuid?

In TrueNAS all HDDs are already shown as sda to sdf.
First I built up the pools; in the console all HDDs were shown with kernel names.
Now I don’t know why it changed in zpool status?!?
The only thing in between was rebooting (because of bonding the network interfaces); that couldn’t be the issue, could it?

The TrueNAS GUI hides the UUIDs–it always has. But that’s still what it (tries to) use internally.
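
For what it’s worth, you can also ask zpool directly for full device paths; on a pool that is using the UUIDs you’d expect to see /dev/disk/by-partuuid/… entries (this is plain OpenZFS, nothing TrueNAS-specific):

    zpool status -P big-pool    # -P prints full vdev paths instead of just the last path component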

So if I understand you right, this looks really bad, right? No UUIDs on the HDDs.

root@truenas[/home/admin]# lsblk -o name,size,type,partuuid
NAME          SIZE TYPE  PARTUUID
sda           3.6T disk
sdb           3.6T disk
sdc           3.6T disk
sdd           3.6T disk
sde           3.6T disk
sdf          29.3G disk
└─sdf1       29.2G part  0157f031-590d-4bbe-845d-36f0147580c3
sdg           3.6T disk
nvme1n1     931.5G disk
├─nvme1n1p1     1M part  c15fdd60-d3d1-4e4c-ba4d-f4f57afc802e
├─nvme1n1p2   512M part  8ac20550-15e3-4c8e-afea-c69c32452c3d
├─nvme1n1p3  15.9G part  1be87a0c-00c7-49fe-8035-8260a07b77df
├─nvme1n1p4    16G part  8a5bedd5-ca21-4e87-831a-3a7e4eefe8f8
│ └─md127      16G raid1
│   └─md127    16G crypt
├─nvme1n1p5    50G part  78a00275-0b16-4a88-8677-e4ac9541d89b
├─nvme1n1p6     8G part  7064c1a7-9e5f-464b-9a0b-fd6d94e700c2
└─nvme1n1p7 841.1G part  5ed40445-5b14-4c2e-97de-05623030b4e3
nvme0n1     931.5G disk
├─nvme0n1p1     1M part  d90b75b3-3c55-4a92-b087-ef9c70f144b2
├─nvme0n1p2   512M part  c17d075a-f0f1-4bbb-bfb0-6bd6b3655f48
├─nvme0n1p3  15.9G part  3df59468-8721-4793-b519-ce4860eb7e2b
├─nvme0n1p4    16G part  4803c9bb-b59e-4b10-a06e-500e7fc6e461
│ └─md127      16G raid1
│   └─md127    16G crypt
├─nvme0n1p5    50G part  fbb00606-6faa-4684-af43-d79e908b5f2f
├─nvme0n1p6     8G part  4e5d5013-8742-4515-b995-61304b94108d
└─nvme0n1p7 841.1G part  ccdd2dbf-914c-4ea1-a52f-8cd150b07a9b

Yes, it does–not only because of the lack of UUIDs, but also because of the lack of partition tables. You could offline a disk, create a partition table on it, and try to replace the offline disk, but I’m concerned that the partition table would take enough space that ZFS would refuse to do the replacement. This is, in part, why doing anything ZFS-related at the shell is discouraged.

If you want to try it anyway, here’s how I’d proceed (adapting from my instructions here):

  • Figure out which of the drives in your pool corresponds to ata-ST4000VN006-xxxxxx_xxxxxxxx. For the remaining steps, I’ll assume it’s sda.
  • zpool offline big-pool ata-ST4000VN006-xxxxxx_xxxxxxxx
  • parted /dev/sda
  • mklabel gpt
  • mkpart "" zfs 1 -1s
  • quit
  • Run zpool status big-pool. You’ll notice that ata-ST4000VN006-xxxxxx_xxxxxxxx has been replaced by a number; make a note of that number (I’ll refer to this as bignum below).
  • Run lsblk -o name,size,type,partuuid and note the partuuid for /dev/sda1 (I’ll refer to this as partuuid below).
  • zpool replace big-pool bignum /dev/disk/by-partuuid/partuuid

If the new partition is too small, you’ll get an error almost immediately. If not, it will resilver this partition into your pool. Wait for that to finish (you can monitor its progress by running zpool status, or under the tasks in the GUI). You’ll then repeat the above steps with each of the remaining disks.
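
In case it helps, the number in the zpool status output should be the vdev GUID, and you can read it and the new PARTUUID from the command line instead of eyeballing them. This is only a sketch with placeholders (<guid>, <partuuid>, and /dev/sda1 all need to be substituted per disk):

    zpool status -g big-pool                # -g shows vdev GUIDs instead of device names
    blkid -s PARTUUID -o value /dev/sda1    # prints the PARTUUID of the newly created partition
    zpool replace big-pool <guid> /dev/disk/by-partuuid/<partuuid>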


Okay, thanks a lot for your help. I’ll give it a try.
But for me it’s no problem to destroy all the pools; I have all the data backed up.
It’s my first try with TrueNAS and RAIDZ2 on this new Terramaster NAS, so I’m free to rebuild everything. In Germany it’s called LDS :slight_smile: Learning with pain!

But before I kill everything, maybe I can get some more information from you.
Why do you think it’s a bad idea to partition the boot device?
I did this based on a reference about L2ARC and SLOG to speed up the HDDs for iSCSI for VMs. I don’t need a 1 TB pool just for boot, and I don’t want the boot on a single USB stick. Is this idea totally wrong?
Maybe you can give me some advice on how to do it right.
As I read in your guidelines for beginners, you prefer a metadata sVDEV if you don’t need sync writes, right?

You really shouldn’t be using RAIDZ as storage for VMs. See:

There are a few reasons:

  • It creates a configuration TrueNAS doesn’t expect and therefore can’t manage–this is probably the biggest issue.
  • Trying to do L2ARC and SLOG on the same device is almost always a bad idea
  • SLOG has very specific requirements, which your SSD almost certainly doesn’t meet (power loss protection foremost among them)
  • You don’t have enough RAM to make effective use of L2ARC in any event; you’d need at least 64 GB.

Probably others, but that’s what comes most readily to mind. If you dropped L2ARC and SLOG, and just partitioned those into a small pool for boot and a larger one for fast storage, most of those issues would go away, but you’d still be left with a configuration TrueNAS doesn’t support.

The way TrueNAS is designed to be used is with a dedicated boot device (or a pair of them if you’re paranoid). Get a small SSD–USB-attached if you have to–and use that as your boot device. It’ll still be overkill–the one I bought for my UGREEN NAS was 128 GB, which is easily 8x the size it needs to be–but they’re cheap enough that it hardly matters (mine was $18).


Okay, I think I need much more background. Thanks for everything.
So I’ll keep it simple :slight_smile: for now.