(maybe) Faulty disks, suspended pool, but can't do anything

I am experimenting with a Beelink Me Mini PC (16 GB version). I have 5 x 2 TB KIOXIA Exceria Pro NVMe drives in five of the slots (0, 1, 2, 3 and 5) and a 1 TB Crucial P310 in slot 4. The Crucial is the boot drive and the KIOXIAs form the ZFS pool.

I was copying some data from another (network) device when the operation just halted. After it hadn’t progressed for a while (it stopped on a rather large file) I checked what was going on, and found the system reporting disk degraded errors. After some reading, I think the problem is that the Me Mini just doesn’t have enough power to drive the KIOXIAs, and this probably led to the problems. I haven’t yet tested the drives in depth to see whether they are healthy or really messed up, but my problem is that I can’t seem to do anything with the pool at this point.
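When I do get around to checking them, I’m planning something along these lines (a rough sketch only; smartmontools being installed and /dev/nvme0 as the device name are both assumptions, not something from my actual setup):

```shell
# Rough NVMe health-check sketch. smartctl and /dev/nvme0 are assumptions;
# repeat for each drive on the real system.
if command -v smartctl >/dev/null 2>&1 && [ -e /dev/nvme0 ]; then
    # -H prints only the overall SMART health verdict; use -a for full detail
    health=$(smartctl -H /dev/nvme0 2>&1 || echo "smartctl reported a problem")
else
    health="smartctl or /dev/nvme0 not present on this machine"
fi
echo "$health"
```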

I first ran sudo zpool status -v, which returned:

pool:   SSDs
state:  SUSPENDED
status: One or more devices are faulted in response to IO failures.
action: Make sure the affected devices are connected, then run ‘zpool clear’.
see:    https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-JQ
config:

    NAME                                      STATE     READ WRITE CKSUM
    SSDs                                      UNAVAIL      0     0     0  insufficient replicas
      raidz1-0                                UNAVAIL      0     0     0  insufficient replicas
        4fd37e28-2045-44f5-bde8-0d006be8018b  FAULTED      9     0     0  too many errors
        f0560505-e1d5-404b-8845-9298e2dec2f4  ONLINE       0     0     0
        bccef3e6-ac0a-4b6e-bd01-b166a8b52319  ONLINE       0     0     0
        11f3be1a-8bf5-4d58-9850-054f476bdef1  FAULTED      6     0     0  too many errors
        112eb41e-56a6-484a-ac35-8d63634053b6  ONLINE       0     0     0

errors: List of errors unavailable: pool I/O is currently suspended

pool: boot-pool
state: ONLINE
config:

    NAME         STATE     READ WRITE CKSUM
    boot-pool    ONLINE       0     0     0
      nvme3n1p3  ONLINE       0     0     0

errors: No known data errors

As the message suggested, and also thinking that the drives aren’t really faulty (yet) but maybe just suffered a power issue during the copy, I tried to clear the errors with:

sudo zpool clear SSDs, which resulted in:

cannot clear errors for SSDs: I/O error
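From what I’ve read since, when zpool clear fails on a suspended pool the usual next step is a forced export and re-import from the shell, often with a reboot in between. A sketch of that sequence (not something I’ve run yet; the export can reportedly hang while pool I/O is suspended):

```shell
# Sketch of the export/re-import sequence for a suspended pool
# (pool name SSDs from above). Guarded so it is harmless without ZFS.
if command -v zpool >/dev/null 2>&1; then
    zpool export -f SSDs   # force-export; can hang while pool I/O is suspended
    zpool import           # scan for pools that can be imported
    zpool import SSDs      # re-import, then retry: zpool clear SSDs
    result="attempted export/re-import"
else
    result="zpool not available on this machine"
fi
echo "$result"
```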

So I thought, OK, I’ll recreate the pool. I stopped the SMB service and then tried to disconnect the pool, but I got a Validation Error with this trace:

Error Name: EINVAL
Error Code: 22
Reason: [EZFS_POOLUNAVAIL]: zfs_open() failed - cannot open ‘SSDs/Applications’: pool I/O is currently suspended
Error Class: ZFSException
Trace: Traceback (most recent call last):
File “/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py”, line 360, in process_method_call
result = await method.call(app, id_, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py”, line 57, in call
result = await self.middleware.call_with_audit(self.name, self.serviceobj, methodobj, params, app,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 954, in call_with_audit
result = await self._call(method, serviceobj, methodobj, params, app=app,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/decorator.py”, line 108, in wrapped
result = await func(*args)
^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool/info.py", line 53, in attachments
return await self.middleware.call(‘pool.dataset.attachments_with_path’, pool[‘path’])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool/dataset_attachments.py", line 48, in attachments_with_path
for attachment in await delegate.query(path, True, options):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/common/attachment/init.py”, line 114, in query
for resource in await self.middleware.call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in _call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/decorator.py”, line 108, in wrapped
result = await func(*args)
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/service/crud_service.py”, line 163, in query
result = await self.middleware.call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in _call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/datastore/read.py”, line 156, in query
result = await self._queryset_serialize(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/datastore/read.py”, line 212, in _queryset_serialize
result = [
^
File “/usr/lib/python3/dist-packages/middlewared/plugins/datastore/read.py”, line 213, in <listcomp>
await self._extend(data, extend, extend_context, extend_context_value, select)
File “/usr/lib/python3/dist-packages/middlewared/plugins/datastore/read.py”, line 249, in _extend
data = await self.middleware.call(extend, data, extend_context_value)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in _call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/service/sharing_service.py”, line 100, in sharing_task_extend
data[self.locked_field] = await self.middleware.call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 771, in _call
return await methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/service/sharing_service.py”, line 88, in sharing_task_determine_locked
return await self.middleware.call(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1051, in call
return await self._call(
^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 782, in call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 665, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/concurrent/futures/thread.py”, line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3/dist-packages/middlewared/plugins/pool/dataset_encryption_info.py", line 238, in path_in_locked_datasets
crypto = tls.lzh.open_resource(name=i).crypto()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
truenas_pylibzfs.ZFSException: [EZFS_POOLUNAVAIL]: zfs_open() failed - cannot open ‘SSDs/Applications’: pool I/O is currently suspended

Interestingly, despite the error, I was still presented with the option to disconnect the pool. So I tried it, but was (of course) hit with the same error again…

OK, so it seemed the Applications service was now stopping me. I went there to check the state of things, and guess what? Yep, another error…

Failed to sync catalog. Please try clicking “Refresh Catalog” manually.

The trace

Error Name: EINVAL
Error Code: 22
Reason: [Errno 6] No such device or address
Error Class: OSError
Trace: Traceback (most recent call last):
File “/usr/lib/python3/dist-packages/middlewared/api/base/server/ws_handler/rpc.py”, line 360, in process_method_call
result = await method.call(app, id_, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/server/method.py”, line 57, in call
result = await self.middleware.call_with_audit(self.name, self.serviceobj, methodobj, params, app,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 954, in call_with_audit
result = await self._call(method, serviceobj, methodobj, params, app=app,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 782, in _call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 665, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3.11/concurrent/futures/thread.py”, line 58, in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/decorator.py”, line 116, in wrapped
result = func(*args)
^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/catalog/apps.py”, line 46, in available
for train, train_data in self.middleware.call_sync(‘catalog.apps’, {}).items():
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/main.py”, line 1084, in call_sync
return methodobj(*prepared_call.args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/api/base/decorator.py”, line 116, in wrapped
result = func(*args)
^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/catalog/apps_details.py”, line 103, in apps
trains = self.get_trains(catalog, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/catalog/apps_details.py”, line 121, in get_trains
return self.retrieve_trains_data_from_json(catalog, options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/lib/python3/dist-packages/middlewared/plugins/catalog/apps_details.py”, line 133, in retrieve_trains_data_from_json
catalog_data = json.loads(f.read())
^^^^^^^^
OSError: [Errno 6] No such device or address

Now, that one was expected, since the pool is suspended (although… why were the apps using the RAID pool and not the system disk? Anyway, I’ll read about that another time).

So, I attempted to unset the pool and was again reminded the pool is inaccessible:

[ENOENT] Dataset ‘SSDs/ix-apps’ not found

OK, but now I feel I’m in a Catch-22 situation. I can’t disconnect the pool because of the Applications service error, and I can’t stop the applications from looking for the suspended pool… Any ideas how to solve this? I really don’t understand what is going on :frowning:

According to the manufacturer, these SSDs have a “typical” power consumption of 8.9 W. You are probably exceeding the power budget of the system with five of them present. Check whether you can force them into a lower power state, or use different or fewer devices.
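If nvme-cli happens to be installed, you can at least see which power states each controller advertises and their maximum wattage (a sketch; /dev/nvme0 is an example device name):

```shell
# List the advertised NVMe power states (ps0..psN, max power shown as "mp:").
# nvme-cli and /dev/nvme0 are assumptions about the target system.
if command -v nvme >/dev/null 2>&1 && [ -e /dev/nvme0 ]; then
    ps_info=$(nvme id-ctrl /dev/nvme0 2>/dev/null | grep -i "^ps " \
        || echo "could not read power states")
else
    ps_info="nvme-cli or /dev/nvme0 not present on this machine"
fi
echo "$ps_info"
```

Whether the drives actually drop into the deeper states is a separate question; on Linux the nvme_core.default_ps_max_latency_us kernel parameter is the usual knob for controlling how deep APST is allowed to go.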

I would love to attempt to do that, after I solve the bloody pickle I’m in with the current state :smiley:

How do I reset the pool?!

Try searching the Beelink Me Mini posts first to see whether anyone has documented how to lower the power draw. We would need to get more of your devices online in a stable state.

Beelink support may be another option.