After updating to Dragonfish a couple of weeks ago I’ve seen some errors popping up. Today I noticed the network shares were inaccessible and logged in to find a number of errors that had appeared since last Friday (4 days ago).
Failed to sync TRUENAS catalog: [EFAULT] Failed to clone ‘GitHub - truenas/charts: TrueNAS SCALE Apps Catalogs & Charts’ repository at ‘/mnt/pool-01/ix-applications/catalogs/github_com_truenas_charts_git_master’ destination: [EFAULT] Failed to clone ‘GitHub - truenas/charts: TrueNAS SCALE Apps Catalogs & Charts’ repository at ‘/mnt/pool-01/ix-applications/catalogs/github_com_truenas_charts_git_master’ destination: fatal: destination path '/mnt/pool-01/ix-…
2024-09-14 12:34:13
In the Alerts section of the web GUI there are also a few errors about “cannot open pool” because the pool is suspended.
In the CLI, I see a bunch of similar lines saying:
[904147.311288] systemd-journald[644]: Data hash table of /var/log/journal/blahblah/system.journal has a fill level at 75.0 (8544 of 11377 items, 6553600 file size, 786 bytes per hash table item), suggesting rotation.
Then further down, it has a bunch of similar lines saying:
[1035710.287565] sd 2:0:4:0 Power-on or device reset occurred.
Not sure if all these issues are related, or coincidental?
If I run zpool status, it reports the state is SUSPENDED, status is "One or more devices are faulted in response to IO failures", and scan is "scrub repaired 0B in 16:02:09 with 0 errors on Mon Sep 2".
All the drives are online, but most have a read error count of 3, and one has 6. They all have write error counts between 35 and 70.
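In case it helps to compare the per-drive counts, here is a minimal Python sketch that picks the rows with non-zero READ/WRITE/CKSUM counters out of zpool status text. The sample text, the vdev name, and the device names (sda, sdb, sdc) are hypothetical placeholders in an approximated layout, not my actual output, and the function name is just something made up for illustration.

```python
import re

# Hypothetical sample approximating the config table of `zpool status`.
# The vdev and device names are placeholders, not real output.
SAMPLE = """\
  pool: pool-01
 state: SUSPENDED
config:

    NAME        STATE     READ WRITE CKSUM
    pool-01     ONLINE       0     0     0
      raidz2-0  ONLINE       0     0     0
        sda     ONLINE       3    35     0
        sdb     ONLINE       6    70     0
        sdc     ONLINE       0     0     0
"""

# An indented row: name, upper-case state, then three counter columns.
ROW = re.compile(r"^\s+(\S+)\s+([A-Z]+)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")


def devices_with_errors(status_text):
    """Return (name, state, read, write, cksum) for rows with any non-zero count."""
    rows = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if not match:
            continue
        name, state, *counts = match.groups()
        # Skips the header row and abbreviated counts such as "1.2K",
        # which this simple sketch does not try to interpret.
        if not all(c.isdigit() for c in counts):
            continue
        read, write, cksum = (int(c) for c in counts)
        if read or write or cksum:
            rows.append((name, state, read, write, cksum))
    return rows


if __name__ == "__main__":
    for row in devices_with_errors(SAMPLE):
        print(row)
```

To use it on real data, the SAMPLE string would be replaced with the text captured from zpool status. Drives that share a pattern of errors (for example, all on the same controller or cable) would then be easier to spot.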
I did see an Alert in the gui a couple of weeks ago after the update to Dragonfish that said there were 7 or 8 errors after a scan or scrub, but I can’t remember the specifics.
How do I find out where the real problem is? What should the next steps be?