It is 0720. Around the time for that coffee. Unfortunately I can't stand the taste. I'll settle for slowly eating my homemade breakfast quesadillas while I wait.
Alright, so my login timed out. Back in, I see two tasks running: the reconfiguring-dataset job and systemdataset.update. Still 40% on the first and 0% on the dataset.update. Not sure what it's doing or where it's getting stuck.
Ok, I think I'll leave things for now and let it untwist its knickers.
Do you have a high degree of confidence it will eventually fix itself?
I've not used dRAID before, so that's the variable here. But as I've mentioned, I've been in a right mess with 90 disks plus 4 hot spares, and ZFS had no right to survive, but it did; it just needed a bit of time to sort itself out.
I'd leave the export overnight and hopefully it will complete between now and then. If it's no better by then, you could try a reboot and see how things look.
It’s hard to tell what the system is doing from here. I’d love to see zpool status and htop tbh.
Still stuck this morning, so I issued a restart and am waiting to see what happens. The zpool status didn't even show an estimate for the resilver anymore. It had dropped to 6 MB/s with no actual progress according to its own counter.
Ok fair enough. Let’s see how things look after a restart.
The key question is: can you see your datasets within the UI? If not, but you can still see your pool, it may be worth trying zpool export <poolname> from the CLI, and if that exports cleanly, try importing via the UI (sketched below).
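A minimal sketch of that sequence, assuming the pool is named FS01 (substitute your actual pool name):

# export the pool from the shell; it must not be in use
zpool export FS01

# then re-import it via the UI import dialog, or from the CLI:
zpool import FS01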
I see the datasets now after the restart, and the resilver appears to be moving in zpool status. I'll wait a few and make sure it's still moving.
You guys are taking your time, and that is the best way to move with a problem like this.
Not having the data from the commands available does limit some of the help, and that leads to assumptions, something I try to limit. Assumptions = mistakes.
Maybe you could run the command zpool status -v > /mnt/poolname/zpoolstatus.txt
where that /mnt path can be on any pool that is operational on the system (a second pool, if you have one). Then post the created file.
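For example, assuming a healthy second pool named tank is mounted (the name is just a placeholder):

# dump the full status of all pools to a text file on the working pool
zpool status -v > /mnt/tank/zpoolstatus.txt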
That's great to hear.
Like @joeschmuck said, we would be very interested to see the output of zpool status if possible, so we can be more sure about the advice we are giving.
Johnny/joeschmuck,
I would love to be able to provide information when/if I can. Due to the nature of the network this system sits on in our environment, it's not easy to pull information off the network. USB storage of any kind is an absolute no-no, and burning media like a CD, while possible, is a process.
Obviously nothing sensitive is in a zpool status output like that, so I may be able to get it. I'll see what I can do.
I appreciate all the help!
Ok, so a new error: “WARNING: POOL ‘FS01’ has encountered an uncorrectable I/O error and has been suspended.”
Ok, that could suggest a hardware issue between your disks and your HBA, which may have been causing the issues yesterday.
What are your hardware specs?
Intel Xeon Gold 6230R
256GB of RAM
and 4x 9305-16i HBAs
What's the cooling and airflow like in the chassis / room? Is it in a proper temperature-controlled DC? I'm only guessing here, but the dRAID will be working very hard during the rebuild, and that could possibly be causing one or more of your HBAs to overheat, thus causing I/O errors. ZFS only suspends the pool as a failsafe to protect your data.
The room itself is not a datacenter, but the semi-walled-off area is: raised floors, chiller units. The HBAs have also had an additional fan installed directly over them inside the chassis to keep the temps under control.
Do you monitor drive temps? If so, do they look ok?
Drives that I've queried over the past few days have been relatively cool, in the mid-to-upper 20s. The last one I looked at today was 28°C.
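For reference, one way to spot-check a drive's temperature from the shell (/dev/sda is just a placeholder device name):

# print SMART data and pull out the temperature line
# (Temperature_Celsius on SATA drives, Current Drive Temperature on SAS)
smartctl -a /dev/sda | grep -i temperature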
Fair enough.
Do you have a backup of this data? If not, you could export the pool, import it from the CLI in read-only mode, and copy the data off before trying anything else: zpool import -o readonly=on -o altroot=/mnt FS01 (a rough sketch follows).
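A rough sketch of that recovery path, assuming the data is copied to some other reachable storage (the destination path is only an example):

# export the suspended pool first
zpool export FS01

# re-import it read-only so nothing further is written to the damaged pool
zpool import -o readonly=on -o altroot=/mnt FS01

# copy the data off, e.g. with rsync
rsync -avh /mnt/FS01/ /path/to/other/storage/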
Do you have any other suitable hardware you could move these drives over to?
I have a second identical unit I could use. I also have plenty of spare drives. The problem I'm seeing is that TrueNAS CORE and TrueNAS SCALE don't seem to play well together? I tried to set up a replication task to pull data off, but it didn't want to connect.