ZFS pool KO after filling to 100%

Hello,

I have a ZFS volume that has been overfilled: only 6 GB left out of 1.5 TB (I am pretty sure dedup was enabled).
The TrueNAS Community 25.04.2.1 system crashes and reboots too quickly, so I cannot install the pending update.

I was able to cancel the scrub that was automatically launched after the import. I thought it was responsible for the issue, as it stopped at about 600 MB; I left it running for more than 24 hours without any progress.

I was able to import the volume on a Debian live system and tried to remove some data. The removal freezes quite quickly: the live CD keeps running, but the pool becomes inaccessible (every command on the pool just hangs without any feedback). After rebooting the live CD, the removed data is still there and the space usage has not changed.
I tried the TRIM command; the freeze still occurs.

A dmesg shows errors like “task zpool blocked for more than 240 seconds” and “task z_fr_iss blocked for more than 300 seconds”.
These are more or less the same messages I see under TrueNAS before the automatic reboot.

This is a test machine, so it is not a big deal, but I would like to see if it is possible to recover from such a situation (and this thread can serve as a reference for other people).

I chose ZFS because it appears to be a robust solution, so I cannot imagine that filling a volume to 100% can permanently kill the pool; and if that is the case, why not raise a “disk full” condition at, for example, 90%? I did receive the warnings, but as it happened overnight, I did not have time to cancel the task that was filling the volume.

I hope I have provided all necessary information, feel free to ask for anything.
Let me know what I can try and I will post the results back to this thread.

Thanks in advance

Going to start with the useful advice first: see if there are any snapshots on the pool that you can delete. It’ll likely be the easiest & quickest way to free up some space.

Now for the rest of the advice:

I’ve seen production systems at 100% usage become completely unresponsive to any & all commands even after reboot - so, this is well within the realm of possibility.

There are multiple alerts at various fill levels:

System → Alert Settings:

1 Like

Which drive models do you use?

First of all, thanks for taking the time to try to help me.

Unfortunately, no snapshots, and even if I had some, I guess I would have hit the same issue as when trying to delete data: a freeze.

I received the alerts, but during the night, while the script was still running and I was… sleeping. Once I woke up, it was too late.

If you have seen such behavior on production systems, wouldn't it be an improvement to prevent it?
I can try to copy my data to another NAS and put it back, but it will take quite a long time; I cannot imagine the downtime in a production environment.

The hard disks are mechanical ones; as I said, this is a test environment: a desktop computer with desktop hard disks. It is slow, but I have been playing with FreeNAS and then TrueNAS for years without issues.

Is there anything I can try, or should we consider the pool unusable for good and leave other people in such a situation without a solution?

I asked because I thought that TRIM only applies to SSDs. However, a quick search showed that it can also apply to SMR HDDs, and ZFS doesn't “like” SMR drives. It is perhaps unrelated to your issue, though…

Unfortunate that there aren't any snapshots; are you sure, though? What's the output of:
zfs list -t snapshot
Maybe one was created at some point & has been eating up untold amounts of space. It might explain why deleting doesn’t do the needful, as the space wouldn’t free up until the snapshot is gone.
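If one does turn up, something like this should show the biggest offender and reclaim its space (the dataset and snapshot names below are placeholders, so double-check before destroying anything):

# list snapshots sorted by the space they exclusively hold
zfs list -t snapshot -o name,used,referenced -s used
# destroy the largest one once you're sure you don't need it
zfs destroy yourpoolname/yourdataset@old-snapshot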

Edit: things aren’t hopeless, but we need to find a way to clear up some space & then life will be good again.

1 Like

Only “system” snapshots, and they are very small.

I reinstalled 25.10 and left the system alone: the pool is not mounted and the computer does not restart.
As soon as I import the pool, it freezes again and I am unable to delete data.

Unless someone has a magic idea/command, I guess the pool is definitively lost.

Meanwhile, I will check if I can at least copy the data; I will also see if I need to import it read-only to avoid the freeze…
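For reference, the read-only import I have in mind would be something like this (using my pool name and mounting it under /mnt on the live system):

zpool import -o readonly=on -R /mnt nas0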

@HoneyBadger sorry to bug, but what’d be the next steps if trying to delete files hangs the system & no snapshots available to prune on a 100% full pool?

Maybe try a pull replication to another, bigger-capacity HDD? (It's a 1.5 TB HDD, so something like a 2 TB drive would cost less than 100 USD.)
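Roughly, pulled from the receiving machine it would look like this (host, pool, and dataset names are only placeholders, and note that creating the snapshot itself needs a little free space, so it may fail on a completely full pool):

# run on the machine with the bigger disk
ssh root@oldnas zfs snapshot -r sourcepool/data@evacuate
ssh root@oldnas zfs send -R sourcepool/data@evacuate | zfs recv -u bigpool/data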

The underlying problem here is that due to ZFS's COW* architecture your pool is in a state where it can do almost nothing. If the system were up and the pool imported, I would suggest trying to add a device, just to create some space to work in. Unfortunately, I suspect that when the pool is being imported some data needs to be written out (ZIL playback, maybe) and there is insufficient space to do so. Until that is resolved I don't think the import can succeed.

I am not aware of any way out of this situation. In the early days of ZFS I would always create a dataset of 1 GiB (small pools) or 4 GiB (large pools) that remained unmounted (no mount point set) with both quota and reservation set. This gave me some buffer if I did happen to fill a pool to 100%. I believe that OpenZFS does reserve a certain amount of space for this, but I have not run a pool out of space hard enough to need it recently.
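Recreated today, that buffer dataset would look roughly like this (pool name and size are just examples):

# unmounted dataset whose reservation keeps a few GiB permanently free
zfs create -o mountpoint=none -o quota=4G -o reservation=4G tank/reserve
# in an emergency, drop the reservation to give the pool breathing room again
zfs set reservation=none tank/reserve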

A good rule of thumb for space is when you get above 80% used you need to grow the pool or destroy/delete some data (snapshots).
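A quick way to keep an eye on that threshold (pool name is just an example):

zpool list -o name,size,allocated,free,capacity,health tank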

*COW: Copy on Write. ZFS never overwrites any data in place; it always allocates new space for any writes. This means that it needs space to write that new data (which may be user data or internal ZFS metadata). If you fill a pool completely (no free space), then ZFS cannot allocate any new space and (virtually) all operations come to a halt looking for free space. At that point the only option to recover the pool is to grow it so that there is free space available.

I will be very interested to learn whether a read-only import succeeds. I do not know how read-only a read-only import actually is; for example, is a history record written logging the read-only import?

1 Like

For this test environment I cannot add space; maybe that is a clue for someone who runs into such a situation and needs to solve it?

As the machine crashes very quickly (less than 15 minutes), I doubt a replication would work.
The import succeeds, but it is what causes the machine to crash.

I have imported the pool under my Debian live system; I did not set it to read-only, and I am copying some data to an external drive. It has been running for several hours without a crash. Once I have finished the copy, I will check the data a little (I will probably not detect any corruption), then destroy the pool and recreate it before putting the data back.

I am aware of the 80% good practice, but as I said, this happened unexpectedly because of a script running during the night, something that can happen in a production environment as well.
Based on the “reservation” suggestion, I think this could become an internal behavior of ZFS, or maybe TrueNAS could create such a reservation that only the system is allowed to write into, with a size depending on the volume size, as suggested.

I will continue to investigate after Monday and will let you know the results; meanwhile, if you want me to try some things before I destroy the pool, feel free to ask.

Thanks to all contributors to my thread.

6 GB free is not 100% full, so I think we're actually seeing a dedup-related problem if this is correct:

The symptom of “imports okay, but freezes on delete” is pretty indicative of blocking on DDT records. More RAM might be able to partially mitigate this if you’ve crossed the max RAM boundary, but it’s more likely binding up on the disk IO itself.

@HomeBoy if you can get the pool imported and you aren’t actively copying data off, please try running

zpool status -D yourpoolname

It should show a deduplication histogram at the bottom if you have it enabled, and that’s what we want to see.

(You likely shouldn’t enable dedup on the newly created pool either.)
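To double-check on the rebuilt pool (names are illustrative), keeping in mind that setting dedup=off only affects new writes and existing DDT entries stay until the deduplicated data is destroyed:

zfs get -r dedup newpool
zfs set dedup=off newpool/somedataset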

2 Likes

Yeah, I’ve hit that before. When ZFS runs completely out of space it just locks up. What worked for me was adding a small temporary drive with zpool add, freeing a few gigs, then removing it once the pool was stable again. Not pretty but it saved the pool.
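What I ran was roughly this (pool and device names are placeholders; adding a lone disk to a redundant pool needs -f because the redundancy levels don't match):

# temporary single-disk vdev just to get some free space
zpool add -f tank /dev/sdx
# after deleting data, evacuate and detach the temporary vdev
zpool remove tank /dev/sdx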

That’s not possible if there are any RAIDZ vdevs in the pool.

2 Likes

list.txt (2.9 KB)

Please find the result of the command attached; I also included a df to show the 100% usage.

I cannot start 25.10 anymore; the kernel panics while loading the pool (this is why I ran the Debian live system again to get the requested data). I am close to destroying the pool; I will just wait for your feedback.

Pasting here for readability:

user@debian:~$ sudo zpool status -D
  pool: nas0
 state: ONLINE
  scan: scrub canceled on Mon Oct 27 15:44:24 2025
config:

	NAME        STATE     READ WRITE CKSUM
	nas0        ONLINE       0     0     0
	  raidz1-0  ONLINE       0     0     0
	    sda2    ONLINE       0     0     0
	    sdb2    ONLINE       0     0     0
	    sdc2    ONLINE       0     0     0
	    sdd2    ONLINE       0     0     0

errors: No known data errors

 dedup: DDT entries 15603722, size 10.7G on disk, 2.38G in core

bucket              allocated                       referenced          
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    12.0M   1.45T   1.03T   1.03T    12.0M   1.45T   1.03T   1.03T
     2    2.31M    296G    189G    189G    4.76M    608G    386G    387G
     4     260K   32.5G   13.2G   13.4G    1.22M    156G   63.0G   63.7G
     8     100K   12.4G   5.49G   5.55G    1.08M    138G   62.5G   63.1G
    16     254K   31.3G   17.2G   17.2G    4.99M    629G    346G    347G
    32    6.47K    749M    325M    330M     261K   29.2G   12.3G   12.5G
    64      642   75.2M   14.0M   14.9M    47.5K   5.47G   1.11G   1.18G
   128       88   5.48M   1.35M   1.62M    14.7K    863M    236M    282M
   256       43   4.76M   4.76M   4.77M    17.4K   1.94G   1.94G   1.94G
   512       26   2.27M   2.20M   2.23M    16.4K   1.32G   1.28G   1.30G
    1K        8    402K    275K    291K    11.8K    722M    522M    543M
    2K        1    512B    512B   5.81K    2.23K   1.11M   1.11M   12.9M
 Total    14.9M   1.81T   1.25T   1.25T    24.4M   2.98T   1.88T   1.89T

user@debian:~$ df
Filesystem             1K-blocks       Used Available Use% Mounted on
udev                     8041144          0   8041144   0% /dev
tmpfs                    1623284       1388   1621896   1% /run
/dev/mapper/ventoy       3650064    3650064         0 100% /run/live/medium
/dev/loop0               2958976    2958976         0 100% /run/live/rootfs/filesystem.squashfs
tmpfs                    8116420     357788   7758632   5% /run/live/overlay
overlay                  8116420     357788   7758632   5% /
tmpfs                    8116416          0   8116416   0% /dev/shm
tmpfs                       5120          8      5112   1% /run/lock
tmpfs                       1024          0      1024   0% /run/credentials/systemd-journald.service
tmpfs                    8116416          4   8116412   1% /tmp
tmpfs                    1623280         96   1623184   1% /run/user/1000
tmpfs                       1024          0      1024   0% /run/credentials/getty@tty1.service
nas0                     6405120        256   6404864   1% /mnt/nas0
nas0/xcpng-backup-nfs  869669120  863264256   6404864 100% /mnt/nas0/xcpng-backup-nfs
nas0/dedup            1164708224 1158303360   6404864 100% /mnt/nas0/dedup

You do appear to at least be getting some value out of deduplication, but the updates to your DDT are likely what is killing your performance and causing the syncio/deadman timer to trip, as it’s small/sub-4K IO on a RAIDZ1.

Is the pool presently mounted read-only on the Debian instance? That’s likely what lets it remain conscious here.

I am not sure I understood your answer, as I am not able to read the table.

I did not import with any readonly flag so I am pretty sure it is RW.

Is there anything I can try (still for educational purposes), or is destroying the pool the only remaining path?

Thanks

Look at the DSIZE columns for allocated (actually used on disk) vs referenced (logically written) - you’re squashing what would’ve taken 1.89T of disk space into 1.25T instead, yielding approximately a 1.5:1 dedup ratio.

However, that’s costing you ~2.38G of memory to index, and more crucially 10.7G on disk to store. That’s a small amount of space, yes, but that 10.7G is composed of tiny sub-4K sized records. Deleting data causes ZFS to have to read through those deduplication tables (in-RAM, quick) and then have to update and decrement the counters (on-disk, very slow) because you’re asking a RAIDZ1 of spinning disks to do lots of little I/Os.

You can try to delete just a single large file - don’t do a recursive delete - and see if it completes the free.
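Something along these lines, using the dataset path from your df output (the file name is just a placeholder):

# locate a single large file to sacrifice
du -ax /mnt/nas0/xcpng-backup-nfs | sort -n | tail -20
rm /mnt/nas0/xcpng-backup-nfs/one-large-file
# watch whether free space actually comes back
zpool list nas0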

Start a second terminal/SSH and run tail /proc/spl/kstat/zfs/dbgmsg on a watch or cycle to keep an eye on whether or not it’s doing frees/deletes, or if it’s just a flurry of metaslab_load and metaslab_unload as it tries to juggle the dedup table in and out of memory.
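For example, something like:

watch -n 2 'tail -n 20 /proc/spl/kstat/zfs/dbgmsg'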

4 Likes