I’m currently experiencing an issue with my TrueNAS system. I have two Seagate Exos X14 10TB drives configured as a mirror, and after booting up, the system shows a “degraded” status. Interestingly, after performing a reboot, the problem seems to disappear, and the drives operate normally.
However, I noticed another issue when moving large files — the drives tend to disappear, and the degraded status reappears. I’m unsure whether this indicates a hardware failure or a configuration issue.
Has anyone else encountered this kind of behavior with Seagate Exos or other large-capacity drives on TrueNAS? I would greatly appreciate any suggestions on troubleshooting steps or potential fixes.
zpool status
  pool: nas_pool
 state: DEGRADED
status: One or more devices could not be used because the label is missing or
        invalid. Sufficient replicas exist for the pool to continue
        functioning in a degraded state.
action: Replace the device using 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-4J
  scan: resilvered 9.10M in 00:00:01 with 0 errors on Thu Mar 27 16:49:21 2025
config:

        NAME                                      STATE     READ WRITE CKSUM
        nas_pool                                  DEGRADED     0     0     0
          mirror-0                                DEGRADED     0     0     0
            2744419453325156401                   UNAVAIL      0     0     0  was /dev/disk/by-partuuid/b9801db4-4892-431a-98ab-c53e23e3d819
            e487408c-2ae0-43ed-9a13-d3ada4caa9e1  ONLINE       0     0     0
        logs
          mirror-5                                ONLINE       0     0     0
            nvme0n1p5                             ONLINE       0     0     0
            nvme1n1p5                             ONLINE       0     0     0
        cache
          nvme0n1p6                               ONLINE       0     0     0
          nvme1n1p6                               ONLINE       0     0     0

errors: No known data errors
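It would help to first check which physical drive the missing partuuid maps to; something like this (standard lsblk columns) should show it:

lsblk -o NAME,SIZE,SERIAL,PARTUUID

Then post the SMART data for the affected drive: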
smartctl -a -v 1,raw48:54 -v 7,raw48:54 -v 195,raw48:54 /dev/sda
Seagate drives need these additional -v parameters; otherwise the raw values smartctl reports aren’t easy to read.
Otherwise, rebooting or running zpool clear will remove the errors/warnings, but it isn’t actually fixing anything. Have you tried running a scrub? Have you tried reseating the cables? Any hardware details would also help: motherboard, how the drives are connected (directly to the motherboard or through an HBA, etc.), and so on. Expand my signature for an example of what would generally be helpful.
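If you do want to kick one off manually, something like this should work from the shell (pool name taken from your zpool status output):

zpool scrub nas_pool
zpool status -v nas_pool

The second command just lets you watch the scrub progress and see whether any new read/write/checksum errors show up.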
I have a QNAP TS-253D with two Seagate Exos X14 10 TB drives and a Crucial T500 500 GB SSD that is partitioned into multiple sections for different purposes: one for the OS, one for SLOG, one for L2ARC Cache, and another for containers, along with 16 GB DDR4 RAM.
I’ve already tried rebooting and running zpool clear, but that only temporarily resolves the issue. The problem comes back regularly, suggesting it’s not just a display issue.
The cables are securely connected, and there are no loose connections. A scrub is run weekly, but it doesn’t fix the problem permanently.
This isn’t your problem, but it’s a disastrously poor configuration, even leaving aside that you probably don’t have any use for SLOG and can’t effectively use L2ARC. And that SSD is completely unsuitable for SLOG.
I understand that the configuration may not be ideal, and I appreciate your feedback, but I’m not focused on the SSD setup at the moment since it’s not directly related to the issue I’m experiencing. My primary concern is the persistent errors that keep appearing in my ZFS pool, which zpool clear and a reboot only temporarily fix, but don’t resolve the underlying issue.
I’m aware that using an SSD for SLOG might not be the best choice and that L2ARC may not be fully optimized with my 16 GB of RAM (I realize that L2ARC typically requires more memory for optimal performance). However, for now, my goal is to address the errors and performance issues within the ZFS pool. The issues I’m facing seem to be hardware or disk-related rather than configuration-related.
I’ll look into improving the SSD setup later, but for now, any advice on fixing the ZFS errors would be greatly appreciated. Specifically, I’m looking for guidance on interpreting the SMART data, checking the disk health, and investigating if there’s any underlying hardware issue causing these warnings.
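If it helps, I can post the full SMART output from both drives using the command you suggested (assuming they show up as /dev/sda and /dev/sdb):

smartctl -a -v 1,raw48:54 -v 7,raw48:54 -v 195,raw48:54 /dev/sda
smartctl -a -v 1,raw48:54 -v 7,raw48:54 -v 195,raw48:54 /dev/sdb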
The CPU is running at 50°C, and the hard drive is at 45°C. The controller for the HDD is the standard one provided by QNAP.
It does seem very likely that this issue could be hardware or driver-related, as you mentioned. However, it’s worth noting that the drives ran without any issues in a mirror setup under Proxmox prior to this. If anyone has suggestions on how to further troubleshoot this, I’d appreciate it!
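Next time the drives drop out I’ll also try to capture the kernel log from around that moment; something like this should show any SATA link resets or disconnects (exact messages will vary):

dmesg -T | grep -iE 'ata|reset|link'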
Not a cause of the problem but you should schedule weekly SMART short tests and monthly long tests, and implement @joeschmuck’s Multi-Report script so you get immediate warning of errors.
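Those are best scheduled through the TrueNAS UI, but for reference you can also start them manually from the shell (device names assumed, run for each disk):

smartctl -t short /dev/sda
smartctl -t long /dev/sda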
But I do note that you have never had a successful SMART long test run to completion.
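You can check the self-test history yourself with:

smartctl -l selftest /dev/sda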
The drive itself looks fine to me, but it could benefit from fully completing a long test to confirm. No QNAP experience here - I don’t know what the internals look like, or whether it uses port multipliers or other things that would be considered jank. Maybe the drive just needs a reseat?
Personally, I’d investigate this as a hardware fault rather than an HDD fault.
If you want to confirm beyond doubt and have a spare PC lying around, see if you can replicate the issue by connecting the drives to it and doing a temporary TrueNAS boot. That will at least confirm what needs to be looked at.
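Once it’s booted on the spare machine, something like this should show whether the pool and both disks are visible, and a read-only import keeps things safe while you test (pool name from your output above):

zpool import
zpool import -o readonly=on nas_pool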