Hello everyone!
A few days ago, my home NAS suddenly crashed.
No matter how many times I restarted it, it eventually got stuck at the boot screen with this message: “sd? data cmplt err -32 uas-tag 2 inflight cmd”.
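(That message appears to come from the Linux UAS — USB Attached SCSI — driver, which is what handles the boot SSD's USB enclosure. If you want to check how your own enclosure is attached, `lsusb -t` shows the bound driver; the output below is illustrative, not from my machine.)

```
# Show the USB topology and the driver bound to each device;
# "Driver=uas" means the enclosure is using USB Attached SCSI
root@x[~]# lsusb -t
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/10p, 10000M
    |__ Port 2: Dev 2, If 0, Class=Mass Storage, Driver=uas, 10000M
```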
Here is the information for the affected machine:
| Key | Value |
|---|---|
| System | HPE ProLiant MicroServer Gen10 Plus v2 |
| CPU | Intel Xeon E-2314 Processor |
| Memory | SK Hynix 64 GB (32 GB × 2) DDR4-3200 ECC UDIMM |
| Network Card | Intel I350 Quad 1GbE (built-in) |
| Boot Pool | WD SN740 1TB 2230 NVMe SSD (in an ITGZ JMS583 USB enclosure, on a USB 3.2 Gen 2 port) |
| HDD Pool | 16 TB 2-way mirror × 2 |
| HDD Pool Storage Devices | WD Ultrastar DC HC550 16TB SATA × 4 |
| HDD Pool SLOG Device | N/A |
| UPS | APC SPM1K online UPS (apcsmart via USB) |
| OS | TrueNAS SCALE 24.10.2.1 |
| Applications | Intranet services: SMB, Time Machine, Vaultwarden, Wiki, etc. |
| VM | One for development |
| Uptime | More than 600 days (running 24/7) |
After the problem occurred, I shut down the machine, removed the 4 hard drives, and installed them in another server, a Dell PowerEdge R730xd running the same version of TrueNAS normally.
When I tried to import the problematic data pool through the GUI, that server also crashed and restarted immediately.
Following forum suggestions, I tried importing from the shell.
Below are the outputs of the various commands (sensitive information has been redacted).
```
root@x[~]# zpool import
   pool: tank_x
     id: xxxxxxxxxxxxxxxxxxx
  state: ONLINE
 status: The pool was last accessed by another system.
 action: The pool can be imported using its name or numeric identifier and
         the '-f' flag.
    see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-EY
 config:

        tank_x                                    ONLINE
          indirect-0                              ONLINE
          mirror-1                                ONLINE
            xxxxxxxx-d937-41c0-b751-xxxxxxxxxxxx  ONLINE
            xxxxxxxx-662b-4ef7-bb89-xxxxxxxxxxxx  ONLINE
          mirror-2                                ONLINE
            xxxxxxxx-2bf0-47f5-ad4a-xxxxxxxxxxxx  ONLINE
            xxxxxxxx-d1f6-4357-853b-xxxxxxxxxxxx  ONLINE
```
Forcibly importing the problem pool read-write from the command line also crashed and restarted the system, so I used the read-only import method suggested on the forum.
```
root@x[~]# zpool import -o readonly=on tank_x -R /mnt
cannot import 'tank_x': pool was previously in use from another system.
Last accessed by truenas (hostid=xxxxxxxx) at Sun May 4 22:22:53 2025
The pool can be imported, use 'zpool import -f' to import the pool.
root@x[~]# zpool import -o readonly=on tank_x -R /mnt -f
root@x[~]# zpool status -v
...
  pool: tank_x
 state: ONLINE
  scan: scrub repaired 0B in 06:10:31 with 0 errors on Sun Apr 27 06:10:33 2025
remove: Removal of vdev 0 copied 754G in 1h6m, completed on Thu Oct 19 11:31:49 2023
        6.08M memory used for removed device mappings
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank_x                                    ONLINE       0     0     0
          mirror-1                                ONLINE       0     0     0
            xxxxxxxx-d937-41c0-b751-xxxxxxxxxxxx  ONLINE       0     0     0
            xxxxxxxx-662b-4ef7-bb89-xxxxxxxxxxxx  ONLINE       0     0     0
          mirror-2                                ONLINE       0     0     0
            xxxxxxxx-2bf0-47f5-ad4a-xxxxxxxxxxxx  ONLINE       0     0     0
            xxxxxxxx-d1f6-4357-853b-xxxxxxxxxxxx  ONLINE       0     0     0

errors: No known data errors
```
After the problem pool was imported, I immediately used `zfs send | zfs recv` to transfer the datasets one by one to the healthy server.
Note that if you use TrueNAS SCALE's default app storage, remember to also back up the datasets under ix-apps!
Use `zfs list` to see all the datasets.
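For reference, each transfer looked roughly like the sketch below. The dataset and snapshot names are placeholders; note that a pool imported read-only cannot take new snapshots, so you can only send snapshots that already exist (for example, from periodic snapshot tasks):

```
# List the datasets and the most recent existing snapshots on the read-only pool
zfs list -r tank_x
zfs list -t snapshot -o name,creation -s creation tank_x/documents | tail -n 3

# Send the latest existing snapshot to the healthy server's pool;
# "tank_x/documents", the snapshot name, and "backup_pool" are placeholders
zfs send -v tank_x/documents@auto-2025-05-04_00-00 | zfs recv -u backup_pool/documents
```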
Fortunately, all of the data could be read back (a lot of family memories).
After completing the backup, I rebooted into WinPE and ran Victoria to scan the four hard drives.
The quick scan found no errors; the slow full scan has not finished yet, but judging by its progress so far, the drives themselves are probably not at fault.
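For anyone without a Windows machine at hand, a similar health check can be done from Linux with smartmontools (this is an alternative to Victoria, not what I ran; the device path is a placeholder):

```
# Start the drive's built-in long self-test (runs inside the drive's firmware)
smartctl -t long /dev/sda

# After it finishes, review the self-test log, error counters, and health status
smartctl -a /dev/sda
```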
The recovery method I have found on the forum so far is: after backing up the data, destroy the original pool, create a new pool, and write the data back to it.
I will rebuild the pool this way once the disk scans are complete.
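A rough sketch of what that rebuild would look like from the shell is below; on TrueNAS SCALE the new pool should really be created through the GUI so the middleware knows about it, and the device paths, dataset, and snapshot names here are placeholders:

```
# Release the read-only import first (only proceed once the backup is verified!)
zpool export tank_x

# Recreate the same 2-way mirror x2 layout; -f overwrites the old labels
zpool create -f tank_x \
    mirror /dev/disk/by-id/ata-HC550_A /dev/disk/by-id/ata-HC550_B \
    mirror /dev/disk/by-id/ata-HC550_C /dev/disk/by-id/ata-HC550_D

# Write each dataset back from the backup server
zfs send -R backup_pool/documents@migrated | zfs recv tank_x/documents
```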
I'm posting this record for other TrueNAS users who run into the same problem.
For this failure I still have several questions, and I'd appreciate any advice:
- After importing the problem pool in read-only mode, what do the “remove” and “vdev 0” lines in the `zpool status -v` output mean?
  ```
  remove: Removal of vdev 0 copied 754G in 1h6m, completed on Thu Oct 19 11:31:49 2023
          6.08M memory used for removed device mappings
  ```
- Is there a situation in which even a read-only import fails?
- As I recall, when there is no SLOG device the ZIL is striped across the data pool's disks. If the boot pool crashes because the USB SSD overheats and becomes unstable, and a write operation happens to be in flight at that moment, can the ZIL be damaged badly enough to take down the whole pool? Would mirroring two separate USB SSDs as the boot pool avoid this?
- If I install an Optane drive in the machine's free PCIe 4.0 x16 slot as a SLOG device for the data pool, would that avoid this problem? And if a pool with a SLOG device hits this problem, do I have to install the SLOG device in the working server as well before importing the pool read-only?
- Are there other possible causes of this problem, and are there any suggestions for avoiding it?