Seeing what the server had done, after restarting without asking it to import the HDDs pool, I decided to try it again.
After the reboot the drives were assigned new device names, i.e. sdb, sdc, sdd, and each query returned the response txg: 85837.
Every time I ran the query it gave me a concrete answer straight away. Zero freezing. Evidently, it was the request to import the pool that was doing something ‘inappropriate’.
The query for each of the different drives gave me the same result, with only the “guid” parameter changing accordingly.
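(For reference, a per-drive label query of this sort would look roughly like the line below; the device letter and partition number are placeholders and depend on how TrueNAS partitioned the disks:)

zdb -l /dev/sdb2 | grep -E 'txg|guid'

Reading the label with zdb never touches pool state, which fits the behaviour described: it answers instantly even while a full import hangs.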
Well, if all 3 of the HDDs pool drives are showing the same TXG, 85837, then that is GOOD.
Yes, Linux feels free to renumber disks after reboot. Annoying but we have to live with it.
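(As an aside, if the renumbering bothers you: every disk also has stable names under /dev/disk/by-id that survive reboots. Purely for reference, something like the line below lists them, and zpool import can be pointed at that directory with -d /dev/disk/by-id so the pool records those names instead:)

ls -l /dev/disk/by-id/ | grep -v part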
Back to why the pool won’t import and just hangs: I’m not sure. Others might be able to look at the various logs and troubleshoot further.
It is possible that you have some pool corruption in the most recent writes. This can be overcome by throwing out those writes and rolling back the pool to before those writes occurred. You can give this a try, again with R/O just in case:
zpool import -fF -R /mnt -o readonly=on HDDs
This may take a bit of time while ZFS looks for a prior TXG that seems good; then it will attempt to import the pool.
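If you would rather see whether such a rollback is even possible before changing anything, OpenZFS also accepts -n together with -F; it only reports whether discarding the last few transactions would make the pool importable, and does not modify the pool. A sketch:

zpool import -fFn HDDs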
If the pool is imported again, I’d start with backing up any critical files ASAP. Afterwards, I’d go into more detail about the exact equipment you’re running: CPU, motherboard, etc. And how the drives are connected (I think you mentioned a cage & SATA cables before, but is it direct to the motherboard, or is there anything else in between?). This could help identify why this happened.
I also noticed that you have Seagate drives; I’m going to ask for similar output as before, but this time we’re going to translate the raw values into something a human can read:
Seagate translate (replace sdX as needed):
smartctl -a -v 1,raw48:54 -v 7,raw48:54 -v 195,raw48:54 /dev/sdX
Run that for each Seagate drive so we can make sure the drives aren’t actively dying. This command translates specific attributes (i.e. 1, 7, & 195) in the SMART output.
Once solid work has been done to try & rule out any failures that could cause this again, and critical files exist in multiple places… then it’d be a good idea to look into how to set up automatic SMART short/long tests & scrubs.
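On TrueNAS those schedules are normally set up in the web UI (the Data Protection screen has the scrub tasks and periodic S.M.A.R.T. tests), so the lines below are only a generic reference sketch of what the equivalent looks like on a plain Linux box; device names and times are placeholders to adjust to taste:

# /etc/smartd.conf - short self-test daily at 02:00, long self-test Saturdays at 03:00
/dev/sdb -a -s (S/../.././02|L/../../6/03)

# crontab - scrub the pool every Sunday at 03:00 (cron may need the full path to zpool)
0 3 * * 0 zpool scrub HDDs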
So drive ‘sdd’ had a CRC error; other than that, things are looking fairly clean. CRC errors are generally due to a controller and/or cabling fault (whether power or data). If you’ve already reseated/changed connections, then you may have taken care of the fault.
Now I’d make sure critical data is backed up. Afterwards, run `smartctl -t long /dev/sdX` on each drive so it can test itself fully, then run the previous command to check the SMART outputs. Then, set up scheduled SMART tests and scrubs.
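As a sketch (assuming the drives still come up as sdb, sdc and sdd, which can change after a reboot, so double-check first), kicking off the long tests and later reading the results could look like:

for d in sdb sdc sdd; do smartctl -t long /dev/$d; done
# several hours later, once the tests have had time to finish:
for d in sdb sdc sdd; do smartctl -a /dev/$d | grep -A8 'Self-test log'; done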
If you’re still paranoid then describe your system in full detail so we can check for any other possible points of failure.
Afterwards? Live happy & hope it doesn’t happen again
Correct me if I’m wrong, but am I right in thinking that the 195,raw48:54 attribute did not appear in any of the results under SMART Attributes?
Again: should I upload to TrueNAS the ***.tar file I downloaded after this crash, to restore all my previous settings? Right now I do not have access to the disks from Windows, so it is difficult for me to move the data somewhere safer.
Let us not get ahead of ourselves. If you imported the pool Read Only, then you can’t do anything with the pool other than verify it looks good and copy any important data off.
If you imported the pool without the R/O option, but with both the -fF options, then the pool was able to roll back to what could be a usable TXG. After copying important data off, you should run a scrub which validates all the data, metadata and critical metadata. It can even perform repairs if the damage is not too bad.
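From the shell, starting the scrub and watching it would just be the standard ZFS commands (the TrueNAS GUI can also kick one off):

zpool scrub HDDs
zpool status -v HDDs

The status output shows the scrub’s progress and, once it finishes, lists any files with unrecoverable errors.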
Once you’ve done the import without the R/O option, copied important data off, run a scrub and checked its output, then:
From the Linux shell, zpool export HDDs
From the GUI, import the pool
Check things out again
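For that last “check things out” step, a read-only listing is a reasonable sanity check, e.g. confirming the datasets are mounted where you expect (the dataset layout is whatever you originally created):

zfs list -r -o name,used,avail,mountpoint HDDs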
Where you go after that depends on your goal: perhaps figuring out what could have happened (which @Fleshmauler has started with you), or continuing to use this NAS.
@julko - You can turn off R/O mode by leaving off the R/O option:
zpool export HDDs
zpool import -fF -R /mnt HDDs
But, both for you and future readers: any ZFS writes that the -F option caused to be thrown away are lost permanently. That is why we say to back up any important files first.
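If copying the important files off means pushing them to another machine over the network, a generic sketch would be something like the line below, where the source dataset path, the remote user/host and the destination directory are all made-up placeholders:

rsync -avh --progress /mnt/HDDs/important-dataset/ user@other-box:/some/backup/dir/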
And again. With “readonly” it took just a few seconds, but now (without it) the import has been running for 5 hours. Am I doing something wrong, or should I just leave it for a few more hours until it finds the right parameters?
Then… how can I set up access from Windows in a relatively painless way?
Unfortunately, I don’t have an empty hard drive with me that I could quickly connect via a SATA cable.
When I want to add a group of users who will be allowed to connect from Windows, I get this:
Error: Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/middlewared/job.py", line 509, in run
await self.future
File "/usr/lib/python3/dist-packages/middlewared/job.py", line 556, in __run_body
rv = await self.middleware.run_in_thread(self.method, *args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1367, in run_in_thread
return await self.run_in_executor(io_thread_pool_executor, method, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1364, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 183, in nf
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 55, in nf
res = f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem_/acl.py", line 890, in setacl
return self.setacl_nfs4(job, data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem_/acl.py", line 602, in setacl_nfs4
self.setacl_nfs4_internal(path, data['dacl'], do_canon, verrors)
File "/usr/lib/python3/dist-packages/middlewared/plugins/filesystem_/acl.py", line 554, in setacl_nfs4_internal
raise CallError(setacl.stderr.decode())
middlewared.service_exception.CallError: [EFAULT]
Well, you just have to wait and see. And no, you did nothing wrong as far as I can tell.
This time delay is ZFS finding the appropriate point in time to go back to, AND throwing out any writes that occurred afterwards. (That is why I said to back it up while mounted Read Only, BEFORE trying Read/Write…)