Hello,
I have an iXsystems Mini 3.0 X+ running TrueNAS SCALE 24.10.1 (upgraded in December, no problems).
I was moving large files (10+ terabytes in total) between data mounts in the single large pool when I ran out of space (even though I should have lots of space).
I found out that snapshots were eating up all the space, so I deleted them for the relevant mount points (/mnt/Spinner/Media and /mnt/Spinner/Media_Jellyfin) in the web UI.
The web UI wouldn’t do anything (yes, I held down Shift before hitting the OK button after ticking the “confirm” checkbox), so I restarted from the web UI.
When I woke up this morning it was back up and running, so I continued transferring files but noticed it was VERY slow (it would transfer a large chunk, then drop to 0 bytes/second for a while).
I restarted again from the web UI, and since then none of the following work:
Web UI: won’t load the Storage tab, and the Data tab shows nothing (which makes sense if the pool isn’t loading)
Web UI: if I click on Disks, it can’t load the plugin
Web UI: the only widgets that will load on the front page are “CPU Model” and “System Information”
CLI: zpool status hangs
CLI: zfs list hangs
I am getting a lot of messages on bootup (see attachments) along the lines of
“task xyz blocked for more than xxx seconds”, naming middlewared, zpool, txg_sync, etc. Because of these, a restart now takes about 20 minutes to boot.
I’ve left the system running for 6 hours and I can see activity on the hard drives. During boot it finds all 7 drives (5 spinners, 2 SSDs). I also notice a rather large iowait when I run sar 1 1 (I’m watching it continuously on the display, and it climbs to 70% disk I/O wait, drops to 11%, then climbs again).
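As an aside for anyone who wants to watch the same iowait figure without sysstat: the sketch below computes it from two samples of the aggregate "cpu" line in /proc/stat. It is only an illustration, not what sar does internally; the function name iowait_percent is made up for this example, and it assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
import time


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two 'cpu' lines from /proc/stat."""

    def ticks(line: str) -> list[int]:
        parts = line.split()
        # parts[0] is the "cpu" label. Keep only the first 8 counters
        # (user, nice, system, idle, iowait, irq, softirq, steal);
        # guest time is already included in user, so it is left out.
        return [int(x) for x in parts[1:9]]

    deltas = [a - b for b, a in zip(ticks(before), ticks(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    # Index 4 is the iowait counter.
    return 100.0 * deltas[4] / total


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the aggregate 'cpu' line (first line of /proc/stat)."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    # Print one sample per second until interrupted with Ctrl+C.
    previous = read_cpu_line()
    while True:
        time.sleep(1)
        current = read_cpu_line()
        print(f"iowait: {iowait_percent(previous, current):.1f}%")
        previous = current
```

This reads only /proc/stat, so it does not touch zfs or zpool and should not hang on the pool.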
I’ve checked dmesg and nothing stands out, also /var/log/messages doesn’t show much.
I’m seeing messages in either daemon.log or net_api.log about trying to start services.container.
As mentioned, when I woke up this morning I could access the shares and was able to delete the large snapshots, but after another reboot from the web UI (a clean one, not forced), the pool is having issues.
Anyone have any suggestions or are there other logs besides /var/log/messages and dmesg I should check to see what exactly the disk i/o is doing?
Is there a log or a command I can run to see what it is doing on the hard drives? (Obviously nothing from zfs or zpool, as those just hang; even zpool history and zpool status hang.) Is it trying to recover the RAID for some reason? Is it running fsck on the large pool?
Any assistance is appreciated.