I have a fairly basic question. I recently deleted a large zvol, around 16TB, along with its snapshots, so `zfs list` no longer shows it. When I run `zpool get freeing` I see there is 15.4TB of data still waiting to be freed. The problem is that the to-be-freed number hasn't gotten any smaller in days.
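For reference, this is how I'm checking it (the pool name `tank` below is just a stand-in for my actual pool):

```
# How much space ZFS still has queued for deletion
zpool get freeing tank

# Compare against what the pool reports as allocated/free
zpool list tank
```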
I updated to 25.04, tried the export/import approach, and ran several scrubs. I think I've tried everything I could find online. Is there a way to force the space to be freed?
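The export/import attempt was basically just this (again, `tank` standing in for my pool name):

```
zpool export tank
zpool import tank
```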
I don't mind losing a few bytes of data if this is tied to some corruption. The disks are in a RAID 10-style layout (striped mirrors) and their SMART status looks normal.
I uploaded the outputs of those commands. I can scrub again if needed. One problem I have is that the errors say the list of errors is unavailable, because I already deleted the affected files and their snapshots. I did the export/import and tried many other things. The NAS itself works perfectly; I just want the free space back.
Those ticks are the forum's 'Preformatted text' backticks. This is my first post here and I made a mistake.
I started adding vdevs like this. I've been using FreeNAS/TrueNAS for a long time, and back in the day I added vdevs to the pool as mirrored pairs. The first two vdevs were recently 'changed': I removed one disk, replaced it with a higher-capacity one, let it resilver, and then did the same with the other low-capacity disk. I expected the replacement HDDs to show up with names like 'sda' or 'sdb'; I was wrong about that, but I don't mind the current state. The data corruption problem existed before the HDD upgrade, and I have since deleted those files.
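The in-place upgrade of each mirror disk was roughly this sequence (device names below are placeholders, not my real ones):

```
# Swap the old disk for the larger one and let it resilver
zpool replace tank old-disk-id /dev/disk/by-id/new-disk-id

# Watch the resilver progress
zpool status tank
```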
I'm willing to do another scrub, but every time I do, something weird happens. It scrubs the populated space (obviously), but when it reaches the last 15.4TB it finishes abruptly. A disk may then show 2 or 4 checksum warnings. I've tried clearing the errors before the scrub, and I can clear them after, but the warnings keep coming back.
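What I do around the scrubs is roughly this (`tank` again standing in for my pool):

```
# Reset the per-device error counters
zpool clear tank

# Start a fresh scrub and watch it
zpool scrub tank
zpool status -v tank
```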
Errors like this, with a pool refusing to free space, can point to metadata corruption. Did you have any hex codes in your "list of errors", in a 0x0c7a-style format?
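If you're not sure, they show up at the bottom of a verbose status, e.g.:

```
zpool status -v yourpool
```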
Yes, exactly. The scrub sometimes clears those errors; I think each one is a reference to a file that no longer exists. I know those errors can appear after deleting a file.
I have routine long self-test jobs, and the reports say the drives are fine, but I know what you're saying. The drives are exposed to high heat in a server room, and I've particularly noticed that I/O errors can occur after copying very large amounts of data. I'm well aware that I need a smarter cooling setup. On the other hand, there's no problem with the current state, so I'm trying to avoid copying all the data elsewhere and then recopying it onto a new pool.
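The routine jobs are essentially long SMART self-tests plus a report check, something like this (the drive path is a placeholder):

```
# Kick off a long self-test on a drive
smartctl -t long /dev/sda

# Later, review the self-test log and health summary
smartctl -a /dev/sda
```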
I worked with an AI assistant and it identified `zfs_free_leak_on_eio` as the tunable to tweak. I set it to 1 and the pool has now started freeing space. After that finishes I'll do a scrub. Do you have any other recommendations for after the scrub?
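For the record, since 25.04 is Linux-based the tunable is a ZFS module parameter; this is roughly what I did (note it won't persist across reboots unless made permanent):

```
# Permanently leak blocks that hit I/O errors during freeing,
# instead of letting the background destroy stall on them
echo 1 > /sys/module/zfs/parameters/zfs_free_leak_on_eio

# Confirm the current value
cat /sys/module/zfs/parameters/zfs_free_leak_on_eio
```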
Can you post full system specs here? I'd like to make sure we're not dealing with failed or failing hardware, especially drive or storage controller/HBA faults. I/O errors under load are not normal or expected.
Can you show a `zpool get leaked`, please, to make sure it didn't just toss that space off into the ether?
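Something like this, with your pool name substituted:

```
zpool get leaked yourpool
```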
Sure, my specs are as follows:
Motherboard: ASUS Rampage V Extreme (X99); it has a lot of SATA ports
CPU: Intel Core i7-5960X, the first consumer 8-core CPU, I believe
RAM: 8 x 16GB = 128GB DDR4
Disks: The mirrored disks were deliberately chosen to differ from each other in brand or product line, but not in size. This is to reduce the chance of both sides of a mirror failing at the same time.
Cache and log: 4 NVMe cache drives and 1 NVMe log drive.
I checked the leaked value as you suggested and it reports 125MB, so it isn't a tiny number, I believe.
Since there are many drives in the system, I first thought the I/O faults could be due to shared SATA power cables, so I spread the drives across more SATA power lanes. Then I suspected bad SATA cables or loose connections, so I replaced some of the cables. I also added small fans on the RAM sticks to cool them further, in case that was the cause.
In short, I believe either the onboard SATA controllers or the disks are causing this under high-volume traffic, or the heat that comes with it. I'm not particularly concerned about a few corrupted files; of course, I regularly back up the most important data to an offline drive. I just want to get rid of the error messages when this sort of thing happens. Then I can try adding more fans somewhere, or try other changes, and see whether it happens again.