Scrub Task not starting the Scrub

Hello,

On my TrueNAS SCALE Dragonfish-24.04.2.1 system, a Scrub Task is configured as shown in the attached screenshot.

In /var/log/cron.log I can see that “midclt call pool.scrub.run tank_titan_001 35 > /dev/null 2> /dev/null” was called at 00:00:01 on Sunday. My system with the configured Scrub Task has now been running for roughly two months, but the TrueNAS dashboard still shows “Last Scrub: Never”. So why is the Scrub not started? What could I check to find out why the Scrub is not starting?
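
For reference, the configured task itself can be queried through the middleware client; a minimal sketch in Python (pool.scrub.query is assumed to exist as on SCALE, and the printed entry fields like pool_name and threshold are assumptions that may differ by version):

    # List the configured scrub tasks to verify pool name, threshold,
    # schedule and enabled state. Field names are assumptions.
    import json
    import subprocess

    raw = subprocess.check_output(["midclt", "call", "pool.scrub.query"])
    for task in json.loads(raw):
        print(task.get("pool_name"), task.get("threshold"),
              task.get("enabled"), task.get("schedule"))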

The documentation at https://www.truenas.com/docs/scale/scaleuireference/dataprotection/scrubtasksscreensscale/ states for “Threshold days”: “Enter the number of days before a completed scrub is allowed to run again.” Does this mean that the very first Scrub will start even if there was no Scrub before, or does it wait for the threshold days since the previous Scrub, which might not exist?

Thanks a lot in advance,

Thomas

Hi

Take this with a grain of salt.
On Core I have had a problem with the threshold calculation, with the opposite result (scrubs triggered every time despite the threshold). Maybe, somehow related, the absence of previous scrubs in your case is preventing the job from triggering correctly.
I would try launching a manual scrub to see if the dashboard still shows the same result, and then check whether the scrub triggers again after 35 days, according to your job setup.

What is the output of zpool status? Please use the “</>” tags to maintain the format of the data.

  pool: boot-pool
 state: ONLINE
  scan: scrub repaired 0B in 00:00:17 with 0 errors on Fri Oct 18 03:45:18 2024
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sdh3    ONLINE       0     0     0
            sdi3    ONLINE       0     0     0

errors: No known data errors

  pool: tank_titan_001
 state: ONLINE
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank_titan_001                            ONLINE       0     0     0
          raidz3-0                                ONLINE       0     0     0
            9d81d9b3-512d-45b6-bc45-1065ec4e5272  ONLINE       0     0     0
            16fff3e8-cf8d-4006-a154-5a7383df3373  ONLINE       0     0     0
            63e9bd52-bc52-425b-9e48-d2412d29b66a  ONLINE       0     0     0
            cb510ff5-bc32-43d6-848a-81b4f1faf149  ONLINE       0     0     0
            7813ee4a-ed79-40cc-97e8-90f0620a8c22  ONLINE       0     0     0
            6fae3949-5492-45d8-bdb8-15830a26abe3  ONLINE       0     0     0
            91e4b791-593b-48b1-a2de-a7655510c190  ONLINE       0     0     0

errors: No known data errors

I am wondering about the “scrub repaired 0B in 00:00:17 with 0 errors on Fri Oct 18 03:45:18 2024” while the TrueNAS dashboard says “Last Scrub: Never”. It is also strange that this scrub ran on Friday at 03:00 and not on Sunday at 00:00.

Any ideas?

Well, not the result I was expecting. Let’s try something else:

  1. Run the command zpool scrub tank_titan_001
  2. Wait 10 minutes (or a little longer)
  3. Run the command zpool status tank_titan_001 and post those results.

We are looking for a report similar in format to:

 state: ONLINE
status: Some supported and requested features are not enabled on the pool.
        The pool can still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: scrub in progress since Sun Oct 20 11:08:45 2024
        2.18T / 6.04T scanned at 36.6G/s, 199G / 6.04T issued at 3.27G/s
        0B repaired, 3.22% done, 00:30:32 to go
config:

Hopefully the scrub will start.

Here it is.
The Scrub started.

What’s the next step to find out why the “Scrub Task” did not work?

 zpool status tank_titan_001
  pool: tank_titan_001
 state: ONLINE
  scan: scrub in progress since Sun Oct 20 18:01:34 2024
        47.7T / 47.7T scanned, 915G / 47.7T issued at 1.28G/s
        0B repaired, 1.87% done, 10:24:41 to go
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank_titan_001                            ONLINE       0     0     0
          raidz3-0                                ONLINE       0     0     0
            9d81d9b3-512d-45b6-bc45-1065ec4e5272  ONLINE       0     0     0
            16fff3e8-cf8d-4006-a154-5a7383df3373  ONLINE       0     0     0
            63e9bd52-bc52-425b-9e48-d2412d29b66a  ONLINE       0     0     0
            cb510ff5-bc32-43d6-848a-81b4f1faf149  ONLINE       0     0     0
            7813ee4a-ed79-40cc-97e8-90f0620a8c22  ONLINE       0     0     0
            6fae3949-5492-45d8-bdb8-15830a26abe3  ONLINE       0     0     0
            91e4b791-593b-48b1-a2de-a7655510c190  ONLINE       0     0     0

errors: No known data errors

Addition:
The Scrub finished, and the TrueNAS dashboard now shows “Last Scrub: 2024-10-21 07:56:53”.

zpool status gives:

zpool status tank_titan_001
  pool: tank_titan_001
 state: ONLINE
  scan: scrub repaired 0B in 13:55:19 with 0 errors on Mon Oct 21 07:56:53 2024
config:

        NAME                                      STATE     READ WRITE CKSUM
        tank_titan_001                            ONLINE       0     0     0
          raidz3-0                                ONLINE       0     0     0
            9d81d9b3-512d-45b6-bc45-1065ec4e5272  ONLINE       0     0     0
            16fff3e8-cf8d-4006-a154-5a7383df3373  ONLINE       0     0     0
            63e9bd52-bc52-425b-9e48-d2412d29b66a  ONLINE       0     0     0
            cb510ff5-bc32-43d6-848a-81b4f1faf149  ONLINE       0     0     0
            7813ee4a-ed79-40cc-97e8-90f0620a8c22  ONLINE       0     0     0
            6fae3949-5492-45d8-bdb8-15830a26abe3  ONLINE       0     0     0
            91e4b791-593b-48b1-a2de-a7655510c190  ONLINE       0     0     0

errors: No known data errors

Isn’t there something of a conflict between a threshold of 35 days and a weekly scrub schedule?


Why do you see a conflict? The cron job runs once every week and checks whether at least 35 days have passed since the last run of the Scrub. Only if that time has passed is the Scrub started; otherwise the cron job waits another week before checking again.
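
Put differently, the decision is just a date comparison. A minimal sketch of that logic in Python (hypothetical, not the actual middleware code; how a missing last-scrub date is handled is exactly the open question in this thread):

    from datetime import datetime, timedelta

    def should_run_scrub(last_scrub, threshold_days, now):
        # Decide whether the weekly cron invocation should start a scrub.
        if last_scrub is None:
            # Pool has never been scrubbed. Returning True here is the
            # expected behaviour; if the real check instead bails out,
            # that would reproduce what this thread describes.
            return True
        return now - last_scrub >= timedelta(days=threshold_days)

    now = datetime(2024, 10, 20)
    print(should_run_scrub(now - timedelta(days=20), 35, now))  # False: only 20 of 35 days passed
    print(should_run_scrub(None, 35, now))                      # True: first scrub should start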


So, if the scrub triggers correctly from now on (after the 35-day threshold), it really seems that the absence of a previous scrub somehow affects the scheduler.
In that case I would open a bug report.

I have re-configured my Scrub Task to see whether it will trigger correctly over the next days. If it does, I will file the bug report.

Thanks a lot for all your help.

When my newly created auto scrub didn’t run, I ran a manual scrub and set the threshold to 5; the scheduled weekly auto scrub then executed, and after that I experimented with the threshold value.

After manually triggering the initial scrub as proposed by @joeschmuck, the Scrub Task seems to work now.

I have created a bug ticket regarding this issue:
https://ixsystems.atlassian.net/browse/NAS-132007

Hint:
There is another thread with the same issue:
https://forums.truenas.com/t/scrub-tasks-not-working-for-one-pool/21615

There’s a bug in TrueNAS Core 13.3 that issues a scrub on the boot-pool every day, no matter what it’s set to.

This is the thread I was referring to. I’m launching the scrub manually. -.-
It doesn’t seem to happen to everyone.


When you build a new system, there is no need to run a scrub - right?

So for a fresh system it is my understanding that the first scrub will occur after the set threshold of days has been reached.

That’s absolutely correct.

This is also the understanding of the other users, but TrueNAS SCALE behaves differently and does nothing until a first scrub has been triggered manually. Even after the threshold has elapsed, the scrub is not started by the Scrub Task.


I also have RC2 on a fresh system here which is just 14 days old, so it has not met the threshold yet.
I have now adjusted the threshold so that it should scrub in 6 hours. I will get back to you - maybe I can confirm the issue.

I cannot confirm the behaviour you have seen - but I am on 24.10 RC2.

I have a fresh 24.10 RC2 system here which has not yet met the threshold of 35 days to start the scrub.

I have 2 pools:
1x SSDs
1x HDDs

I changed the threshold of the scrub job for the SSD pool to “0” days
and the threshold of the scrub job for the HDD pool to “10” days.

Both were then started as planned/scheduled.
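
For anyone who wants to run the same check, the last-scrub date that the threshold is compared against can be read from the scan line of zpool status. A rough sketch in Python (the parsing is illustrative only and assumes the English output format shown above):

    # Days since the last completed scrub, parsed from `zpool status`.
    import re
    import subprocess
    from datetime import datetime

    out = subprocess.check_output(["zpool", "status", "tank_titan_001"], text=True)
    m = re.search(r"with \d+ errors on (\w{3} \w{3}\s+\d+ [\d:]+ \d{4})", out)
    if m:
        # Normalise the padded day-of-month before parsing the date.
        last = datetime.strptime(" ".join(m.group(1).split()), "%a %b %d %H:%M:%S %Y")
        print("days since last scrub:", (datetime.now() - last).days)
    else:
        print("no completed scrub found")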