ZFS Pool went OFFLINE immediately after manual scrub task started. ElectricEel 24.10.2.4

I ran a scrub task manually today and the pool went offline. I only have one pool, pool0. The dashboard doesn’t show anything. The Storage screen shows 3 disks with an exported pool. pool0 is visible on the disks, but when I click the “Add to pool” option, the drop-down under existing pools is empty.
Before running the manual scrub task, I saw checksum errors (9 on Disk1 and 11 on Disk2) on the HDDs and ran the command below to clear them:

zpool clear pool0

I have the configuration exported from around a month ago, when I updated EE from 24.10.2.3 to 24.10.2.4. Would importing it restore functionality? I haven’t changed anything since the upgrade.

The way I set it up: the 2x 12TB HDDs are mirrored in pool0 and the 1TB SSD is a cache on the same pool. The most important thing is the SMB shares that hold all of the family images, which I would really hate to lose. I also have 5 apps running on Docker, which I don’t mind losing.

I would really appreciate some guidance on how to recover the pool with minimal data loss, if that’s at all possible. I now know that I should have backed up my data properly instead of relying on this single copy. I will set that up as soon as I am able to resolve this.

Thank you in advance.

The cache part makes me a touch worried; is it SLOG, L2ARC, or an sVDEV?

Mind giving hardware details & how these drives are connected? (Directly to the motherboard, an HBA, something less recommended?)

Output of:

lsblk

zpool status

would help (sudo as needed).

Any chance zpool import does the needful?
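
Just as a sketch of what I mean, assuming the pool is still named pool0 (adjust as needed):

# list pools that are visible to the system but not currently imported
sudo zpool import

# if pool0 shows up there, try importing it by name
sudo zpool import pool0

# then check its state
sudo zpool status pool0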

Really appreciate you taking the time to reply. I had shut down the system out of fear so nothing would get overwritten. Upon starting it up, everything seems to be up and running again. The pool is detected and this is the output:

root@truenas[/home/admin]# zpool status
  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:41 with 0 errors on Thu Dec 25 03:45:43 2025
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p3  ONLINE       0     0     0

errors: No known data errors

  pool: pool0
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub in progress since Mon Dec 29 22:14:46 2025
        1.72T / 2.10T scanned at 8.10G/s, 17.5G / 2.10T issued at 82.8M/s
        0B repaired, 0.82% done, 07:18:56 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        pool0                                     ONLINE       0     0     0
          mirror-0                                ONLINE       0     0     0
            5e201dbe-5173-473e-b1f8-328271d916ec  ONLINE       0     0     0
            c3e14f98-7b0d-4afd-9924-daf7676aea4c  ONLINE       0     0     0
        cache
          f7671dfd-3e8c-48d3-bb5f-b4748afa21ea    ONLINE       0     0     0

errors: 2 data errors, use '-v' for a list

I have a Terramaster 4-bay disk station. I flashed it with TrueNAS about a year ago.
The 2 HDDs are in 2 of the disk bays. The SSD is in the extra slot that comes with it.

zpool status -v will give you more info on the errors.

Scrubs put pressure on the system. An overheating controller can produce checksum errors.

A 2-way mirror is not very safe, and RAID is not a backup.

L2ARC generally only makes sense with more than 64GB of RAM.
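
If you want to check whether that cache device is actually getting hits before deciding anything, the kernel’s ARC counters give a quick read (standard ZFS-on-Linux path, nothing TrueNAS-specific assumed here):

# L2ARC hit/miss/size counters; near-zero hits means the cache is doing little
grep -E '^l2_(hits|misses|size)' /proc/spl/kstat/zfs/arcstats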

Output of zpool status -v

errors: Permanent errors have been detected in the following files:

        /mnt/pool0/media/Movies/Dune Part Two (2024)/Dune Part Two (2024) Bluray-2160p.mkv
        /mnt/.ix-apps/docker/overlay2/ec651abca7ab85f868f6c25413fa3d2a4de0c40b980401bfe96d39102895dcd7/diff/opt/venv/lib/python3.11/site-packages/onnx/backend/test/data/pytorch-converted/test_MaxPool2d_stride_padding_dilation/test_data_set_0/input_0.pb

I can delete the movie if that would help.
I don’t know how to fix that 2nd file, though.
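
For what it’s worth, the usual sequence for permanent errors is to restore or delete the affected files and let the next scrub drop them from the error list. A sketch only, with the path copied from the status output above; the second file sits inside a Docker image layer, so removing and re-pulling that image is one way to replace it (figuring out which image owns the layer is left out here):

# remove (or restore from another copy) the damaged movie file
sudo rm "/mnt/pool0/media/Movies/Dune Part Two (2024)/Dune Part Two (2024) Bluray-2160p.mkv"

# clear the logged errors, then scrub again so ZFS re-verifies the pool
sudo zpool clear pool0
sudo zpool scrub pool0
sudo zpool status -v pool0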

I will set up some other backup for the data on the NAS. This was the scare I needed to get it done.

I have 32GB of RAM. Should I remove the SSD from the Cache VDEV of pool0? Is that even possible?

…Is this a USB enclosure?

Removing an L2ARC or SLOG should be possible via the GUI, but an sVDEV cannot be removed; it is an integral part of the pool and should therefore be mirrored. If that SSD is an sVDEV and it goes down, so does the pool.

The SSD is not mirrored. It’s set up in the pool as follows:

Data VDEVs:     1 x MIRROR | 2 wide | 10.91 TiB
Metadata VDEVs: VDEVs not assigned
Log VDEVs:      VDEVs not assigned
Cache VDEVs:    1 x 931.51 GiB
Spare VDEVs:    VDEVs not assigned
Dedup VDEVs:    VDEVs not assigned

When I go to the pool’s devices, I don’t see any way to edit or remove the cache (nvme1), which is the 1TB SSD.

No, it’s like a pre-built, entry-level storage solution for home use. It’s got a CPU, RAM and 4 HDD bays, with a couple of USB gigabit ports and 1 RJ-45 2.5GbE network port. I don’t have permission to paste a link / image here. Something like this: terra-master dot com/products/f4-425

If there is a way to remove it, I would like to set it up as its own pool and have the Docker apps running off of it, with data stored in the HDD pool.

Storage → View VDEVs → Select the cache VDEV and select “Remove”

This is a non-destructive operation. Cache and LOG VDEVs can be added and removed at any time.
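
If the GUI option ever refuses, the CLI equivalent is a single zpool remove, using the cache device name exactly as it appears in zpool status (the GUID below is copied from the output earlier in this thread; double-check it against your own):

# detach the L2ARC device from pool0; the pool's data is untouched
sudo zpool remove pool0 f7671dfd-3e8c-48d3-bb5f-b4748afa21ea

# the SSD can then be wiped and used to create a separate pool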


Really appreciate it. Thank you!

I would like to set it up as its own pool and have the Docker apps running off of it, with data stored in the HDD pool.

Is this a good idea? My thinking is that since the apps are set up via Docker, I can easily do that again.


It’s exactly what I do. I have a pool dedicated to apps on a pair of mirrored NVMe drives, and for any large storage requirements I use host paths to datasets on my disk pool.

There is a certain benefit too, in that I have a daily replication task configured from my NVMe pool to the considerably larger disk pool, so it’s easier to restore in case of a hardware failure.
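
Under the hood that replication is just snapshots plus zfs send/receive. A minimal manual sketch, with made-up dataset names (TrueNAS normally drives this through a scheduled Replication Task):

# snapshot the apps dataset on the fast pool
sudo zfs snapshot ssd-pool/apps@daily-2025-12-29

# send it to a backup dataset on the larger HDD pool
# (-F rolls the target back to match; assumes pool0/backups already exists)
sudo zfs send ssd-pool/apps@daily-2025-12-29 | sudo zfs recv -F pool0/backups/apps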


Thank you sir. I’ll try to set it up in a similar fashion after I take backups of the images :grinning_face: