50 TB fileserver, RAIDZ2 with spare (14 data drives + 2 parity + 1 spare)
Adaptec ASR-72405 24-port, in HBA mode
TrueNAS-22.12.0 (waiting to upgrade until after the pool repair)
Pool has approx. 26 TB of data used.
A few months ago a friend was over and noticed a hard drive cable was being hit by an internal fan (not damaging, just making a slight noise). He decided to wiggle and bend the cables so they didn't touch (WHILE THE SERVER WAS ON!! Idiot!). He did get it quiet, but when I got home later and tried to access the file server, the share was not connected. Checking on the server, it showed that 2 drives were disconnected. I reseated the drives, but 1 showed errors, so I replaced it. During the rebuild the other one reported failed, so I replaced that one too. I had to replace both of those drives 5 times due to errors. I finally got the pool to come back online and was able to see and access all files with no data loss. It did show that my spare was in use, as 1 drive was still showing unavailable, but at least everything was working. I took all the drives that had been reported bad and had them checked; only 1 was actually bad. So I suspected the cables had been damaged when he bent them.

I ordered 2 new sets of cables (1x4 SATA breakout). Shipping time was 12 days, so I backed up a couple of small folders, as I didn't have enough room to back up all 24 TB; I did get some family pictures and documents backed up. I left the machine running for 3 days but started to worry that any further issue would put us in trouble, since the spare was already in use, so I shut the system down properly. 3 days later I needed a file, so I booted it back up; it showed 3 drives with errors. After 2 more reboots it all seemed to come back fine. I got the file I needed and, just to be on the safe side, exported the pool. I left it on that screen, and when I came back 2 hours later my cat was sleeping on the keyboard. When I logged back into the GUI, the server showed no pools. I checked the drives: it showed 16 drives (1 short). After checking all the cables again, the missing one showed up. So I tried re-importing the pool; it failed with an I/O error. I checked the drives again and now 2 were missing. I figured it had to be those cables, as they were the 2 cable sets my friend had interfered with, so I shut the machine off until the new cables arrived.

The cables arrived and I was going to work on it over the weekend, but I ended up in the hospital and was unable to work on it for a month. Now I am trying to continue, and after replacing the cables I am still having issues importing this pool: same error, same weird effect of a couple of drives coming and going. I hooked the 2 drives that were intermittent to the motherboard instead of the RAID card, and they stay online with no disconnects, but then 2 to 6 other drives that had never had an issue became intermittent. I can mess with them and get all 17 online, but they won't all stay online. None show any errors in the drive list; they just disappear. I think maybe my RAID card is having issues, or it has a connection issue with the cables, as this system had been running fine for over a year until he messed with the cables.
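For what it's worth, the only checks I know to run when a drive drops are from the CLI, something like the below (I'm not sure I'm looking in the right place, and /dev/sdX is just a placeholder for whichever drive disappeared):

server…# dmesg | tail -100      # look for link reset / disconnect messages around the time a drive vanishes
server…# smartctl -a /dev/sdX   # SMART health for the drive that dropped (replace sdX with the actual device)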
Now in my drives section it shows the drive name, serial, size, and pool (ZFS2-50TB (Exported)) for each drive.
I tried running this command on the actual server CLI (not the GUI):
server…# zpool import
I got what looks like the pool list as it appeared before it was removed. It shows all the drives, with 2 paired together (the spare) and one other that says unavailable; the pool is degraded but it does not say faulted. Now, as I said, it's very hard to keep all the drives online. The drive names are long letter/number combos that don't correspond to anything in the GUI, so I cannot identify in the GUI which one is unavailable or which 2 are paired.
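Side note: I believe those long letter/number combos are the GPT partition IDs that ZFS labels the disks with, so I'm assuming they can be matched back to serial numbers with something like the commands below (which one applies probably depends on whether the system is SCALE or CORE; please correct me if that's wrong):

server…# lsblk -o NAME,MODEL,SERIAL,PARTUUID   # Linux/SCALE: match PARTUUID against the names zpool import shows
server…# glabel status                         # FreeBSD/CORE: maps gptid/... labels to daX/adaX devices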
I am assuming I can force the import, with some chance of data corruption, but I would like to try any other alternative before that since the pool only shows as degraded. I am trying to buy another RAID card (same model) to swap in and test; if that's not the issue, I can use it in a different server.
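The import options I've seen suggested, which I have NOT run yet and would like a sanity check on before trying, look like this (ZFS2-50TB is my pool name):

server…# zpool import -o readonly=on -f ZFS2-50TB   # read-only import, so nothing gets written to the pool
server…# zpool import -Fn ZFS2-50TB                 # dry run of the recovery/rewind option; reports what -F would discard without changing anything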
Any ideas, clues, etc… would be helpful.
Here is an example of the error in the GUI when I try importing (this is very similar to what I see and may be the same, as I can't generate the error right now with only 12 disks online). It looks like this:
Error: concurrent.futures.process._RemoteTraceback:
“”"
Traceback (most recent call last):
File “/usr/local/lib/python3.9/concurrent/futures/process.py”, line 243, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File “/usr/local/lib/python3.9/site-packages/middlewared/worker.py”, line 94, in main_worker
res = MIDDLEWARE._run(*call_args)
File “/usr/local/lib/python3.9/site-packages/middlewared/worker.py”, line 45, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File “/usr/local/lib/python3.9/site-packages/middlewared/worker.py”, line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 979, in nf
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 371, in import_pool
self.logger.error(
File "libzfs.pyx", line 391, in libzfs.ZFS.__exit__
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 365, in import_pool
zfs.import_pool(found, new_name or found.name, options, any_host=any_host)
File "libzfs.pyx", line 1095, in libzfs.ZFS.import_pool
File "libzfs.pyx", line 1123, in libzfs.ZFS.__import_pool
libzfs.ZFSException: I/O error
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 367, in run
await self.future
File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 403, in __run_body
rv = await self.method(*([self] + args))
File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 975, in nf
return await f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/pool.py", line 1421, in import_pool
await self.middleware.call('zfs.pool.import_pool', pool['guid'], {
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1256, in call
return await self._call(
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1221, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1227, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1128, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
libzfs.ZFSException: ('I/O error',)