Storage Pool Suspended

Hello - I currently run TrueNAS Scale 23.10.2 on a PowerEdge R720xd and everything has been great until this past weekend. It looks like the free space on my storage pool has been depleted with only 66.4GB/37.1TB free. This left the storage pool in a suspended state. I also noticed other critical alerts such as:

Screenshot 2024-05-06 113518

…and the jobs are stuck in a starting state:

Screenshot 2024-05-06 113502

Storage Dashboard shows:

I have tried rebooting the system, as well as powering it down, unplugging the power cables, waiting 5 minutes and booting back up, with no success. I also tried powering down the server, removing the drives one by one, reattaching them to the controller and booting back up, but that didn’t work either. I’m at a loss right now. Do I just let these two jobs attempt to finish starting? Am I being impatient? Any help is appreciated!!

Post the output of zpool status

Are you currently able to access the pool? I’m not sure judging from the output you posted.

You need to identify the faulted drive and replace it. Furthermore, the pool should never get 100% full; if you still have access, you need to free up space immediately.

I can access the pool but no data displays as shown below.

If I go to Datasets from the UI, it says “No Datasets”.

Here is the output of zpool status - half of the output gets cut off unfortunately.

It looks like catalog.sync and catalog.sync_all are now running.

On the 720xd console window it shows this:

Can you copy the output and paste it into code tags so that we can actually see everything?
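One way to get the complete output instead of a cropped screenshot is to capture it as text and wrap it for pasting. This is only a rough sketch: it assumes Python 3.9+ is available on the box, that `zpool` is on the PATH of the user running it, and that the forum accepts triple-backtick code fences. The helper names `capture_output` and `as_code_block` are made up for illustration.

```python
import subprocess
import sys


def capture_output(cmd: list[str]) -> str:
    """Run a command and return everything it printed (stdout, then stderr)."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    except FileNotFoundError:
        # The binary is not installed or not on PATH for this user.
        return f"command not found: {cmd[0]}"
    return result.stdout + result.stderr


def as_code_block(text: str) -> str:
    """Wrap text in a triple-backtick fence (assumed forum markup)."""
    fence = "`" * 3
    return f"{fence}\n{text.rstrip()}\n{fence}"


if __name__ == "__main__":
    # -v asks zpool status for verbose error information as well.
    print(as_code_block(capture_output(["zpool", "status", "-v"])))
```

Redirecting the script's output to a file and copying from there avoids the terminal cutting off long lines.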

I assume you use encrypted datasets?

How much space do you have free according to the web GUI? Still 66.4GB/37.1TB?

Did you try clearing the browser cache?

Since you seem to use raidz3 and your pool is online, I assume (until I see the whole output of zpool status) that the problem is that your pool is almost 100% full…
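To put a number on "almost full", here is a quick back-of-the-envelope calculation using the figures from the first post (66.4GB free out of 37.1TB). It is just arithmetic, not anything read from the pool; the function name is made up, and treating 1 TB as 1000 GB is an assumption about how the GUI reports sizes (using 1024 changes the result only in the second decimal).

```python
def percent_used(free_gb: float, total_tb: float, gb_per_tb: float = 1000.0) -> float:
    """Return the percentage of the pool that is already allocated."""
    total_gb = total_tb * gb_per_tb
    return 100.0 * (1.0 - free_gb / total_gb)


# Figures quoted in the first post of this thread.
used = percent_used(free_gb=66.4, total_tb=37.1)
print(f"{used:.1f}% used")  # prints "99.8% used"
```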

Raidz3-3 is way too wide while raidz3-4 is too narrow (2 data / 3 parity???). For a meagre 37 TB total, this suggests a wide collection of ridiculously small drives.
A satisfactory solution will involve building a new pool, with a sane layout.
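The "2 data / 3 parity" remark can be made concrete with a small calculation of how much of a raidz vdev's raw capacity goes to data. This is a simplified sketch: it ignores padding and allocation overhead, the function name is made up, and the 11-wide comparison is a hypothetical width, not something taken from this pool.

```python
def raidz_efficiency(width: int, parity: int) -> float:
    """Fraction of raw capacity left for data in a raidz vdev (overhead ignored)."""
    if parity not in (1, 2, 3):
        raise ValueError("raidz parity level must be 1, 2 or 3")
    if width <= parity:
        raise ValueError("vdev must be wider than its parity level")
    return (width - parity) / width


# 5-wide raidz3 = 2 data + 3 parity, as described for raidz3-4 above.
print(f"{raidz_efficiency(5, 3):.1%}")  # prints "40.0%"

# Hypothetical 11-wide raidz3 (8 data + 3 parity) for comparison.
print(f"{raidz_efficiency(11, 3):.1%}")
```

In other words, a 5-wide raidz3 spends more disks on parity than on data.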

Another question is how the pool got 99% full? Mistake, negligence or ransomware attack?


I did encrypt the datasets.
Clearing my browser cache did not affect it.
Yeah, I was afraid of that. There is no way to free up space when the dataset(s) aren’t showing in the UI I guess?

Not my area of expertise, but the fact that the datasets can’t be unlocked doesn’t make it easier I guess.
I’m not even sure the issue is related to the full pool, although my money would be on it right now.

Potentially, if you don’t have backups and need the data, and the full pool is indeed the culprit, I guess you should be able to expand the pool further with new drives to free up space.

As @etorix pointed out, the whole layout is not optimal. Once the pool is operational again and you have a backup, you should look into rearranging the pool layout.

Sorry I can’t be more helpful here

Due to CoW (copy-on-write), even a delete has to write new metadata first, so there’s no way to free space… without having enough free space first.

If you have an up-to-date backup, destroy the pool and create another.
Else, get some more drives and add a vdev to have more space and make the pool functional again.