Hi All,
After a recent power outage (I believe the UPS safely shut down my TrueNAS server and JBOD), a mirrored NVMe pool (which only contains iX Applications) has exported itself and I'm having great difficulty re-importing it. For clarity, these are NVMe drives connected directly to the motherboard, not in the JBOD enclosure (which is connected via HBA and is rock solid).
So far I’ve tried:
- Importing via the GUI. Initially the pool didn't appear in the GUI's import list at all; manually exporting the pool has now made it show up for import, but the import still fails with "I/O Error".
- Forcing an import via the CLI with:

zpool import -f -R /mnt fasttank

zpool import shows:
root@truenas[~]# zpool import
pool: fasttank
id: 2982151281266948475
state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

        fasttank       ONLINE
          mirror-0     ONLINE
            nvme1n1p2  ONLINE
            nvme2n1p2  ONLINE
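For completeness, here are the additional import variants I'm planning to try next (flags are from the zpool-import man page; readonly should avoid writing anything to the pool while I investigate, and -n makes the recovery-mode rewind a dry run):

```shell
# Read-only import: avoids any writes to the pool while poking around
zpool import -o readonly=on -f -R /mnt fasttank

# Dry-run recovery-mode import: reports whether discarding the last few
# transactions would allow the import, without actually doing it
zpool import -F -n fasttank

# If the dry run looks sane, attempt the actual rewind import
zpool import -F -f -R /mnt fasttank
```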
smartctl shows the same for both disks:
root@truenas[~]# smartctl -a /dev/nvme1n1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.32-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Samsung SSD 960 EVO 500GB
Serial Number: S3EUNB0J533939H
Firmware Version: 3B7QCXE7
PCI Vendor/Subsystem ID: 0x144d
IEEE OUI Identifier: 0x002538
Total NVM Capacity: 500,107,862,016 [500 GB]
Unallocated NVM Capacity: 0
Controller ID: 2
NVMe Version: 1.2
Number of Namespaces: 1
Namespace 1 Size/Capacity: 500,107,862,016 [500 GB]
Namespace 1 Utilization: 498,540,187,648 [498 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 002538 5571b180c2
Local Time is: Thu Sep 26 09:49:00 2024 AWST
Firmware Updates (0x16): 3 Slots, no Reset required
Optional Admin Commands (0x0007): Security Format Frmw_DL
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x03): S/H_per_NS Cmd_Eff_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 77 Celsius
Critical Comp. Temp. Threshold: 79 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.04W - - 0 0 0 0 0 0
1 + 5.09W - - 1 1 1 1 0 0
2 + 4.08W - - 2 2 2 2 0 0
3 - 0.0400W - - 3 3 3 3 210 1500
4 - 0.0050W - - 4 4 4 4 2200 6000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 45 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 9%
Data Units Read: 12,135,312 [6.21 TB]
Data Units Written: 25,157,633 [12.8 TB]
Host Read Commands: 56,532,998
Host Write Commands: 481,064,541
Controller Busy Time: 799
Power Cycles: 1,352
Power On Hours: 14,325
Unsafe Shutdowns: 283
Media and Data Integrity Errors: 0
Error Information Log Entries: 2,304
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 45 Celsius
Temperature Sensor 2: 49 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS Message
0 2304 0 0x0015 0x4004 - 0 1 - Invalid Field in Command
1 2303 0 0x3013 0x4004 - 0 1 - Invalid Field in Command
2 2302 0 0x100c 0x4004 - 0 1 - Invalid Field in Command
3 2301 0 0x100b 0x4004 - 0 1 - Invalid Field in Command
4 2300 0 0x0011 0x4004 - 0 1 - Invalid Field in Command
5 2299 0 0x0004 0x4004 - 0 1 - Invalid Field in Command
6 2298 0 0x0001 0x4004 - 0 1 - Invalid Field in Command
7 2297 0 0x0019 0x4004 - 0 1 - Invalid Field in Command
8 2296 0 0x0008 0x4016 0x004 0 1 - Invalid Namespace or Format
9 2295 0 0x0001 0x4004 - 0 1 - Invalid Field in Command
10 2294 0 0x100c 0x4004 - 0 1 - Invalid Field in Command
11 2293 0 0x0009 0x4004 - 0 1 - Invalid Field in Command
12 2292 0 0x0016 0x4004 - 0 1 - Invalid Field in Command
13 2291 0 0x1004 0x4004 - 0 1 - Invalid Field in Command
14 2290 0 0x1007 0x4004 - 0 1 - Invalid Field in Command
15 2289 0 0x000b 0x4004 - 0 1 - Invalid Field in Command
... (48 entries not read)
Self-tests not supported
The error from the GUI is:
Error: concurrent.futures.process.RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs/pool_actions.py", line 227, in import_pool
    zfs.import_pool(found, pool_name, properties, missing_log=missing_log, any_host=any_host)
  File "libzfs.pyx", line 1369, in libzfs.ZFS.import_pool
  File "libzfs.pyx", line 1397, in libzfs.ZFS.__import_pool
libzfs.ZFSException: cannot import 'fasttank' as 'fasttank': I/O error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.11/concurrent/futures/process.py", line 256, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 112, in main_worker
    res = MIDDLEWARE._run(*call_args)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 46, in _run
    return self._call(name, serviceobj, methodobj, args, job=job)
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 34, in _call
    with Client(f'ws+unix://{MIDDLEWARE_RUN_DIR}/middlewared-internal.sock', py_exceptions=True) as c:
  File "/usr/lib/python3/dist-packages/middlewared/worker.py", line 40, in _call
    return methodobj(*params)
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 191, in nf
    return func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs/pool_actions.py", line 207, in import_pool
    with libzfs.ZFS() as zfs:
  File "libzfs.pyx", line 529, in libzfs.ZFS.__exit__
  File "/usr/lib/python3/dist-packages/middlewared/plugins/zfs/pool_actions.py", line 231, in import_pool
    raise CallError(f'Failed to import {pool_name!r} pool: {e}', e.code)
middlewared.service_exception.CallError: [EZFS_IO] Failed to import 'fasttank' pool: cannot import 'fasttank' as 'fasttank': I/O error
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 469, in run
    await self.future
  File "/usr/lib/python3/dist-packages/middlewared/job.py", line 511, in _run_body
    rv = await self.method(*args)
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 187, in nf
    return await func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/schema/processor.py", line 47, in nf
    res = await f(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/plugins/pool/import_pool.py", line 113, in import_pool
    await self.middleware.call('zfs.pool.import_pool', guid, opts, any_host, use_cachefile, new_name)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1564, in call
    return await self._call(
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1425, in _call
    return await self._call_worker(name, *prepared_call.args)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1431, in _call_worker
    return await self.run_in_proc(main_worker, name, args, job)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1337, in run_in_proc
    return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
  File "/usr/lib/python3/dist-packages/middlewared/main.py", line 1321, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
middlewared.service_exception.CallError: [EZFS_IO] Failed to import 'fasttank' pool: cannot import 'fasttank' as 'fasttank': I/O error
The error from the CLI (same with the force argument) is unhelpful:
root@truenas[~]# zpool import fasttank
cannot import 'fasttank': I/O error
Destroy and re-create the pool from
a backup source.
My speculation:
I'm too ignorant to interpret the error, but the pool is visible, so the 'header' (labels) appears intact; I'm hoping this is some kind of caching error caused by the power loss. The pool had started throwing very minor checksum errors over the last 3 months (1 or 2 a month), so perhaps the NVMe controller on the motherboard is nearing end of life and something didn't get written before an unplanned shutdown (i.e. the UPS didn't trigger a controlled shutdown). Both NVMe disks show SMART health OK, but both also log the same "Invalid Namespace or Format" and "Invalid Field in Command" errors, so something has been corrupted.
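To test that theory, these are the checks I intend to run next (device paths taken from the zpool import output above; I'm assuming zdb can read the labels even while the pool is exported):

```shell
# Dump the ZFS labels on each mirror member; all four label copies
# per device should exist and agree on pool name, GUID, and txg
zdb -l /dev/nvme1n1p2
zdb -l /dev/nvme2n1p2

# Re-scan a specific device directory instead of relying on the
# (possibly stale) zpool.cache file
zpool import -d /dev/disk/by-id
```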
Anyway, I'm hoping someone has a fix that doesn't involve wiping the disks and recreating the pool, although from reading the forums it seems these types of failures take out ZFS pools with some regularity…
System specs:
Threadripper 1950X
ASUS ROG Zenith Extreme
128GB Non-ECC Gskill 3466MHz
ASUS 1080Ti
Pool 1 - 2 x 512GB Samsung 960 NVMe - motherboard
Pool 2 - 8 x 16TB Seagate Ironwolf NAS HDD JBOD
Boot - 1 x 240GB WD Blue SSD - SATA to motherboard
LSI SAS9300-8e HBA (pool 2)
10GbE Aquantia NIC
1000W Seasonic Prime
UPS