I have a scrub task on the data pool that has not run since the update from Dragonfish to EE. According to multi-report, it has been 57 days since the last scrub, which was before the update. I don’t recall (and could not find) any errors or emails saying that the scrub task on Pool1 has started, is running, has finished, or hit an error. I do get emails when the boot-pool’s built-in default scrub runs and finishes, so that one is still working properly. Looking around the forums, previously set up scrub tasks no longer running after an update seems to be a recurring issue, with updates breaking established scrubs.
The scrub task on the data pool (Pool1) is set up under Data Protection > Scrub Tasks and is configured to run Monthly (0 0 1 * *), on the first day of the month at 00:00 (12:00 AM), with a threshold of 35 days. It has been configured and enabled since Bluefin and had always run until recently.
As can be seen below, the scrub task on Pool1 was running regularly on schedule and then stopped. This corresponds to when the update to EE was released and I performed the update. Any thoughts on how to go about fixing this issue?
Here is the result of zpool history Pool1 | grep scrub
showing the last run was on 1/26/2025 before the upgrade to EE.
root@neo[/home/admin]# zpool history Pool1 | grep scrub
2023-09-24.00:00:03 py-libzfs: zpool scrub Pool1
2023-10-29.07:00:03 py-libzfs: zpool scrub Pool1
2023-12-03.07:00:03 py-libzfs: zpool scrub Pool1
2024-01-07.07:00:04 py-libzfs: zpool scrub Pool1
2024-02-11.07:00:04 py-libzfs: zpool scrub Pool1
2024-03-17.07:00:03 py-libzfs: zpool scrub Pool1
2024-04-21.07:00:04 py-libzfs: zpool scrub Pool1
2024-05-26.07:00:03 py-libzfs: zpool scrub Pool1
2024-06-30.07:00:03 py-libzfs: zpool scrub Pool1
2024-08-04.07:00:03 py-libzfs: zpool scrub Pool1
2024-09-08.07:00:03 py-libzfs: zpool scrub Pool1
2024-10-13.07:00:03 py-libzfs: zpool scrub Pool1
2024-11-17.07:00:03 py-libzfs: zpool scrub Pool1
2024-12-22.07:00:04 py-libzfs: zpool scrub Pool1
2025-01-26.07:00:04 py-libzfs: zpool scrub Pool1
I manually started the scrub using:
root@neo[/home/admin]# zpool scrub Pool1
and after a couple minutes ran:
root@neo[/home/admin]# zpool status -v
This shows that the data pool scrub does run if started manually:
pool: Pool1
state: ONLINE
scan: scrub in progress since Mon Mar 24 08:27:26 2025
2.22T / 35.4T scanned at 42.1G/s, 0B / 35.4T issued
0B repaired, 0.00% done, no estimated completion time
config:
NAME STATE READ WRITE CKSUM
Pool1 ONLINE 0 0 0
raidz2-0 ONLINE 0 0 0
8db329f4-d786-4001-8cb1-28d0f89cd098 ONLINE 0 0 0
ed9d15d4-7afb-4df8-89bd-a432b378bae7 ONLINE 0 0 0
a9d6e8f1-b225-42ae-9bd7-6cd27abe3e5e ONLINE 0 0 0
7aef911f-391a-4db2-a5d8-22f90be8891a ONLINE 0 0 0
f42d6d27-4ae3-4523-bc52-5032784ad505 ONLINE 0 0 0
7096e247-2bfc-4a4c-a77b-6ff6a1d52c4b ONLINE 0 0 0
589122a7-541b-4cde-968c-60a1dc8d9b85 ONLINE 0 0 0
a5b21347-6782-49b1-979e-17a436c01c4a ONLINE 0 0 0
raidz2-1 ONLINE 0 0 0
0fb1462f-5da3-4c0f-9910-185af741992d ONLINE 0 0 0
49fd2487-23ae-451f-9ccd-849638c5a8d8 ONLINE 0 0 0
c281cc01-3eb5-4d55-bca0-c0247ab19697 ONLINE 0 0 0
51e225af-7b20-4a3f-b1c1-87adbd8e1e47 ONLINE 0 0 0
5b3abdb2-fcbc-4304-a57e-54297cec262e ONLINE 0 0 0
26a9f149-c620-4917-a3e5-05f15a3f2b2a ONLINE 0 0 0
79d65cbe-5408-4252-9f4e-c2a8bb645430 ONLINE 0 0 0
dc3de50a-790a-4a3a-b692-3446513a64bd ONLINE 0 0 0
errors: No known data errors
Any thoughts on why this happened (or keeps happening), whether it’s still a bug, and whether the task has to be recreated from scratch to make it work again?
You forgot that February has only 28 days. 26 Jan (your last scrub) to 1 March is 34 days, which is less than the 35-day threshold you have set. See what April brings you. I suspect it will skip April as well, as it will only have been 33 days, still not exceeding 35 days.
My advice is to use the default, which is once a month on Sunday; that way it is more regular. Or drop the threshold to 27 days or less if you want it to run on the 1st of each month.
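The day count is easy to check. Here is a minimal Python sketch, assuming the threshold is compared against whole days elapsed since the last recorded scrub (2025-01-26 in the Neo history above). The names `last_scrub`, `threshold_days` and `days_since` are made up for illustration and are not TrueNAS identifiers.

```python
from datetime import date

last_scrub = date(2025, 1, 26)  # last entry in Neo's zpool history output
threshold_days = 35             # threshold configured on the scrub task

def days_since(last, today):
    """Whole days elapsed between two calendar dates."""
    return (today - last).days

# Scheduled run dates for a "first day of the month" task after the last scrub.
for run_date in (date(2025, 2, 1), date(2025, 3, 1)):
    elapsed = days_since(last_scrub, run_date)
    # Assumption: the task only starts when elapsed >= threshold_days.
    print(run_date.isoformat(), elapsed, elapsed >= threshold_days)
# 2025-02-01 6 False
# 2025-03-01 34 False
```

Under that assumption, both the 1 February and the 1 March slots fall short of the 35-day threshold.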
That’s not holding much water. The default threshold when adding a scrub task is 35 days and, to my knowledge, always has been. It is pre-filled when you add a scrub task. If a task is set to a 35-day period, then it should run every 35 days regardless of the number of days in a month.
That said, I had a hunch, so I checked the Owen server, since multi-report is not throwing an error there. It shows that the task is running just fine with the 35-day default. The issue is with the default pre-filled threshold of 35 days. If a premade option is selected for the schedule, as on Neo, and the 35 days is left in, then there is a conflict over when to run and the task does not run. If a custom schedule is selected, then the task runs every 35 days on the day and time selected, as shown for Owen below.
So if a premade schedule is selected from the dropdown, the 35 days should either automatically go away, as it is not needed, or change to a period that works with the selected premade schedule.
Threshold days = 35
Custom (00 00 * * 7) At 12:00 AM, only on Sunday
Next run 5 days
root@owen:/home/admin# zpool history Pool1 | grep scrub
2023-12-31.00:00:16 py-libzfs: zpool scrub Pool1
2024-02-04.00:00:20 py-libzfs: zpool scrub Pool1
2024-03-12.07:36:58 py-libzfs: zpool scrub Pool1
2024-03-13.07:56:39 py-libzfs: zpool scrub Pool1
2024-04-21.00:00:21 py-libzfs: zpool scrub Pool1
2024-05-26.00:00:14 py-libzfs: zpool scrub Pool1
2024-06-26.16:16:43 py-libzfs: zpool scrub Pool1
2024-08-04.00:00:15 py-libzfs: zpool scrub Pool1
2024-09-08.00:00:17 py-libzfs: zpool scrub Pool1
2024-10-13.00:00:22 py-libzfs: zpool scrub Pool1
2024-11-17.00:00:22 py-libzfs: zpool scrub Pool1
2024-12-22.00:00:18 py-libzfs: zpool scrub Pool1
2025-01-26.00:00:24 py-libzfs: zpool scrub Pool1
2025-03-02.00:00:12 py-libzfs: zpool scrub Pool1
and Pool2 on Owen
Threshold days = 35
Custom (15 0 * * sun) At 12:15 AM, only on Sunday
Next run in 5 days
root@owen:/home/admin# zpool history Pool2 | grep scrub
2023-12-31.00:00:03 py-libzfs: zpool scrub Pool2
2024-02-04.00:00:03 py-libzfs: zpool scrub Pool2
2024-03-11.18:20:51 py-libzfs: zpool scrub Pool2
2024-03-11.18:21:08 py-libzfs: zpool scrub Pool2
2024-04-17.14:42:02 py-libzfs: zpool scrub Pool2
2024-05-26.00:15:02 py-libzfs: zpool scrub Pool2
2024-06-30.00:15:02 py-libzfs: zpool scrub Pool2
2024-08-04.00:15:03 py-libzfs: zpool scrub Pool2
2024-09-08.00:15:04 py-libzfs: zpool scrub Pool2
2024-10-13.00:15:03 py-libzfs: zpool scrub Pool2
2024-11-17.00:15:05 py-libzfs: zpool scrub Pool2
2024-12-22.00:15:04 py-libzfs: zpool scrub Pool2
2025-01-26.00:15:05 py-libzfs: zpool scrub Pool2
2025-03-02.00:15:04 py-libzfs: zpool scrub Pool2
As can be seen, both pools on Owen run properly every 5 weeks on a Sunday, whereas on Neo the task did not run. The only difference is that Neo is scheduled to run on a Monday using a premade schedule.
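To check the 5-week cadence without counting by hand, the gaps between entries can be computed from the history output. This is a small Python sketch; `scrub_gaps` is a hypothetical helper, and the sample text is just the last four Pool1 lines from Owen pasted in.

```python
from datetime import datetime

# Last four Pool1 entries from Owen's `zpool history Pool1 | grep scrub` output.
history = """\
2024-11-17.00:00:22 py-libzfs: zpool scrub Pool1
2024-12-22.00:00:18 py-libzfs: zpool scrub Pool1
2025-01-26.00:00:24 py-libzfs: zpool scrub Pool1
2025-03-02.00:00:12 py-libzfs: zpool scrub Pool1
"""

def scrub_gaps(text):
    """Return the day gaps between consecutive 'zpool scrub' history entries."""
    stamps = []
    for line in text.splitlines():
        if "zpool scrub" not in line:
            continue
        # The first whitespace-separated field is the timestamp.
        stamps.append(datetime.strptime(line.split()[0], "%Y-%m-%d.%H:%M:%S"))
    return [(b.date() - a.date()).days for a, b in zip(stamps, stamps[1:])]

print(scrub_gaps(history))  # [35, 35, 35]
```

Feeding it the full history would also make the manually started scrubs stand out as irregular gaps.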
I guess Owen and Neo are the names of your two servers?
Yes those are the names.
Reading more about the issue in various forum posts, including some from the old forum, they point to the default pre-filled threshold of 35 days combined with (at least) the premade monthly dropdown selection. The combination causes the task to randomly not run, depending on the number of days in the month and the threshold period.
I think this is because the schedule and the threshold are ANDed together instead of ORed, at least for the monthly schedule. It may also cause issues with the other premade schedules, but maybe not (at least based on the tooltip explanation), depending on how things are programmed internally.
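The ANDed reading can be tried out on paper. The Python sketch below is only a model of that hypothesis, not the actual middleware code: it assumes a scrub starts on a day only if the cron schedule matches and at least the threshold number of days has passed since the last run. `simulate_and` and the two schedule lambdas are hypothetical names, and the manual scrub on 24 March is ignored.

```python
from datetime import date, timedelta

def simulate_and(last_run, start, end, schedule_matches, threshold_days):
    """Dates on which a scrub starts if schedule AND threshold must both hold."""
    runs = []
    day = start
    while day <= end:
        if schedule_matches(day) and (day - last_run).days >= threshold_days:
            runs.append(day)
            last_run = day  # a started scrub resets the threshold clock
        day += timedelta(days=1)
    return runs

monthly_first = lambda d: d.day == 1        # like cron 0 0 1 * *
weekly_sunday = lambda d: d.weekday() == 6  # like cron 0 0 * * 7 (Sunday)

last = date(2025, 1, 26)
window = (date(2025, 1, 27), date(2025, 6, 30))

print([d.isoformat() for d in simulate_and(last, *window, monthly_first, 35)])
# ['2025-04-01', '2025-06-01']
print([d.isoformat() for d in simulate_and(last, *window, weekly_sunday, 35)])
# ['2025-03-02', '2025-04-06', '2025-05-11', '2025-06-15']
```

In this model the weekly Sunday schedule lands exactly every 5 weeks, as on Owen, while the monthly schedule skips months whenever the previous run was fewer than 35 days earlier.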
If the monthly schedule is selected in the schedule dropdown, then the threshold should be automatically removed or automatically set to 28 or some lesser multiple of 7, to ensure the task runs on the scheduled day each month.
Right now, if the threshold is left at the default 35 days and the schedule says to run on the first day of every month, the code apparently reads the values as “run every month on day 1 + 35 days”. That cannot happen, as no month has that many days, and it would not land on day 1 of the month in most cases anyway, so the check exits with an “ok, task checked, nothing to do”. This does not generate a fault, because the task has not faulted: it ran and it has valid parameters, just not what was intended. Unless some external program notes that the task has not run within the threshold period, a person may never know the task is not running as expected.
I think what may have been intended all along is to run on the selected schedule and, if the task has not recorded a successful run within the threshold period, to run the task once the threshold is reached, thus ensuring the task gets run at some point instead of not at all.
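That "schedule, with the threshold as a safety net" idea can be written down as a sketch too. It is purely an illustration of the behaviour described above, not how TrueNAS works today: `simulate_fallback` is a hypothetical name, and the never-matching schedule stands in for a schedule that keeps getting missed.

```python
from datetime import date, timedelta

def simulate_fallback(last_run, start, end, schedule_matches, threshold_days):
    """Dates on which a scrub starts if it runs on schedule OR when overdue."""
    runs = []
    day = start
    while day <= end:
        overdue = (day - last_run).days >= threshold_days
        if schedule_matches(day) or overdue:
            runs.append(day)
            last_run = day  # any run resets the overdue clock
        day += timedelta(days=1)
    return runs

last = date(2025, 1, 26)
window = (date(2025, 1, 27), date(2025, 6, 30))

# Monthly schedule: every 1st of the month runs, the threshold never has to step in.
monthly = simulate_fallback(last, *window, lambda d: d.day == 1, 35)
print([d.isoformat() for d in monthly])
# ['2025-02-01', '2025-03-01', '2025-04-01', '2025-05-01', '2025-06-01']

# Schedule that never fires: the 35-day threshold forces a run anyway.
missed = simulate_fallback(last, *window, lambda d: False, 35)
print([d.isoformat() for d in missed])
# ['2025-03-02', '2025-04-06', '2025-05-11', '2025-06-15']
```

With these semantics a monthly task could never silently go unrun for longer than the threshold.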
I also think this is the ‘bug’ people want fixed or mention in posts about scrub tasks not working.
Using a custom schedule as the task schedule and leaving the threshold at 35 at least seems to allow the task to work satisfactorily.