Hi all. My TrueNAS server had been up for around 200 days when everything came to a halt recently. I am now unable to import the pool anywhere, and I get the following error any time I run the import command:
WARNING: Pool has encountered an uncorrectable I/O failure and has been suspended.
This locks up the TrueNAS instance and requires a hard reset at this point. I’ve followed hours of AI-driven tests to try and gain access, but I’m struggling and now need some help.
I can import the pool in single user mode, but only with readonly=on. Any attempt to import it read-write crashes with the same message. I’m running an LSI 3008 HBA passed through in ESXi, with TrueNAS as a VM. I’ve installed a fresh TrueNAS instance and attached the HBA, but it exhibits the same issue: it won’t boot normally and ends in this error and a system halt.
When I import the pool read-only in single user mode and run zpool status, all disks show as online, but at the end I see “3 data errors”, which with -v gives this:
I’m 24 hours in now and have no idea what to try next. Can anyone point me in the right direction, even if it is just to extract the data at this point? (I can’t access anything, as I can’t make changes to the filesystem to enable SSH access etc.)
What ESXi version? Please post full details of your hardware and how you have the TrueNAS VM set up, along with your pool info. Do you have a current backup, or somewhere to put all the data in case of recovery?
ESXi version 6.7.0 Update 3 (Build 15160138), LSI 3008 SAS3 HBA in passthrough to the VM, running on a Tyan 7012 with two older-generation X5670 CPUs. I can mount read-only in single user mode, but getting the data off is proving tricky as I can’t make any changes to the config or create any shares to use over the network. It’s non-critical data but will be a pain to replace should I need to. I will buy a large HDD to temporarily house the data if I can get it off somehow. I don’t understand the I/O failure message, or whether it’s related to a single disk, the HBA, or the cabling.
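Since the open question here is whether the I/O failure points at one disk, the HBA, or the cabling, a few read-only checks may help narrow it down. This is only a rough sketch, assuming a FreeBSD-based TrueNAS install where the LSI 3008 attaches through the mpr driver and disks show up as /dev/daN. The device names in DISKS are placeholders, not taken from this thread. The script prints the commands by default and only runs them with RUN=1.

```shell
#!/bin/sh
# Read-only hardware checks; nothing here writes to the pool.
# DISKS is a placeholder list; replace it with the devices camcontrol reports.
DISKS="${DISKS:-da0 da1}"

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Which disks does the HBA currently present to the VM?
run camcontrol devlist

# Boot-time messages from the mpr driver (assumed driver for the LSI 3008).
run grep -i mpr /var/run/dmesg.boot

# SMART health and error counters for each disk.
for d in $DISKS; do
    run smartctl -a "/dev/$d"
done
```

Errors that show up across several disks at once tend to point at shared hardware (HBA, cable, backplane) rather than at a single drive.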
The output of zpool status -v is in the original post: all disks online with no read/write/checksum errors, although the pool has already been through a zpool clear to try and resolve these with some AI guidance. There were some errors here originally, and the original errors pointed to “State is faulted. The pool metadata is corrupted”. It managed to import itself and do a resilver (of the metadata, I guess), but from then on I’ve had trouble with this I/O failure message, which seems to crash TrueNAS and require a hard reset.
Those files are in the .system dataset, so it should be possible to take a config backup (hopefully?) - blow away the .system dataset manually (or move it to a temporary virtual-disk backed pool) and then try to import the pool again.
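A rough sketch of the config-backup idea above, assuming TrueNAS CORE, where the configuration database normally lives at /data/freenas-v1.db with its secret seed in /data/pwenc_secret. POOL and DEST are hypothetical placeholders (the thread never names the pool), and DEST must be somewhere writable that is not on the affected pool. The script only prints the commands unless RUN=1 is set, and it deliberately stops short of removing anything.

```shell
#!/bin/sh
# Back up the TrueNAS config and inspect the system dataset.
POOL="${POOL:-tank}"                # hypothetical pool name
DEST="${DEST:-/mnt/rescue/config}"  # hypothetical writable location off the pool

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Copy the config database and the secret seed (paths assumed for TrueNAS CORE).
run cp -p /data/freenas-v1.db "$DEST/"
run cp -p /data/pwenc_secret "$DEST/"

# Look at the system dataset before deciding whether to move or remove it.
run zfs list -r -o name,used,mountpoint "$POOL/.system"
```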
I’m on 13.3 U1.2. Thanks for the help here, much appreciated. The problem I’ve got at the moment is that I can’t make any changes to anything at all in single user mode, and the pool is online as read-only too. I’m not sure how to get from here to a point where I can change the TrueNAS config or the pool itself, as booting into standard TrueNAS just hangs when it attempts to import the pool that’s faulting with this I/O error.
Are you able to boot without the pool connected (un-passthrough the LSI card?) just to see if it attempts the proper configuration (shares, users, etc) which you could then back up to a separate location?
OK, not sure how, but I’ve managed to get from single user mode to the interface kicking in, and the pool is online (read-only). I still can’t create any shares to access the data, but it’s a step in the right direction!
Hi guys. I’ve got a workaround for this horrible issue: boot into single user mode and import the pool from the command line with the readonly=on flag. Then type ‘exit’, which forces the usual load process to kick in; it avoids importing my pool, which avoids the crash and allows the services and web UI to load. I can then enable SSH and use WinSCP to connect and copy files off. The files are intact; I have just tested 3. This doesn’t solve my issue with the corrupt .system files, which appear to be completely unfixable, but it will allow me to retrieve my data.
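For anyone following along, the workaround above can be sketched roughly as below, as typed at the single user mode shell. The pool name tank and the /mnt alternate root are assumptions (the thread never gives the pool name), so substitute your own. The script prints the commands by default and only runs them with RUN=1.

```shell
#!/bin/sh
# Sketch of the read-only import from single user mode.
POOL="${POOL:-tank}"        # hypothetical pool name
ALTROOT="${ALTROOT:-/mnt}"  # assumed mount root for pools on TrueNAS

# Print each command by default; set RUN=1 to actually execute it.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# List importable pools without importing anything.
run zpool import

# Import read-only under the alternate root.
run zpool import -o readonly=on -R "$ALTROOT" "$POOL"

# Confirm the pool state and list files with permanent errors.
run zpool status -v "$POOL"
```

Once the import succeeds, typing exit at the single user prompt continues the normal boot, as described above.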