I am trying to figure out the best frequency/schedule for running SMART tests and scrubs. I found a few older posts that recommended a scrub every two weeks and a long test every two weeks, but not in the weeks when the scrub is running. Is that a good recommendation? Should I also have offline tests scheduled?
I am also trying to find the best way to define a schedule. This is what I currently have:
Scrub
Threshold Days: 14
Schedule: Weekly (0 0 * * 0) On Sundays at 00:00 (12:00 AM)
SMART Tests
All Disks LONG At 23:00, only on Thursday (I believe this to be weekly)
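To see how much room the two tasks above leave each other, here is a rough Python sketch that measures the gap between the Thursday 23:00 long test and the Sunday 00:00 scrub. The helper name `hours_until` and the reference date are only illustrative; nothing here talks to TrueNAS.

```python
from datetime import datetime, timedelta

def hours_until(start: datetime, weekday: int, hour: int) -> float:
    """Hours from `start` to the next occurrence of weekday at hour:00.

    weekday follows Python's convention: Monday=0 ... Sunday=6.
    """
    candidate = start.replace(hour=hour, minute=0, second=0, microsecond=0)
    candidate += timedelta(days=(weekday - start.weekday()) % 7)
    if candidate <= start:
        # Same weekday but the time has already passed: go to next week.
        candidate += timedelta(days=7)
    return (candidate - start).total_seconds() / 3600

# Any Thursday works as a reference point; 2024-01-04 was a Thursday.
long_test_start = datetime(2024, 1, 4, 23, 0)
scrub_start = datetime(2024, 1, 7, 0, 0)  # the following Sunday

gap_long_to_scrub = hours_until(long_test_start, weekday=6, hour=0)
gap_scrub_to_long = hours_until(scrub_start, weekday=3, hour=23)
print(f"long test start -> scrub start: {gap_long_to_scrub:.0f} h")
print(f"scrub start -> long test start: {gap_scrub_to_long:.0f} h")
```

If the long test on the largest disk takes less than the first gap, and the scrub less than the second, the two should not overlap even in weeks where both run.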
In my case I don’t scrub so often; I have a daily schedule check with a threshold of 28 days.
For the SMART tests, I run a daily short test from Monday to Saturday and a long test on Sunday.
The multi alert script helps a lot: I run it on a daily schedule after the SMART tests have completed, plus the config file weekly.
That’s why I have the long test scheduled for 11 PM: it should be fine to run overnight when the system won’t be in use, even though they are 12TB drives. TrueNAS seems to store a fair number of logs (I currently have 42 logged SMART tests), so I don’t think I will lose the tests too easily.
One thing I’ve seen here and there from what others have done is to alternate between scrub tasks and long SMART tests on a bi-weekly to monthly basis, ensuring they never run anywhere near each other in the schedule.
The time between isn’t the focus there but just ensuring they are on an alternating schedule and don’t overlap. Just food for thought.
I think scrubs are run by pool, and time taken depends on read speeds and pool size and % used (but I may be wrong about this).
SMART tests are run by disk, and time taken depends on read speeds and disk size and not on % used.
Assuming that your disks were 100% full (which of course they never are) a scrub should take about as long as SMART tests on all the drives run sequentially, though scrubs will be a bit slower because SMART tests are internal to the drive whilst scrubs require SATA transfers and some CPU processing - and of course you can probably run all the SMART tests in parallel if you wish.
So a scrub / SMART test of a small (say) 250GB mirrored SSD is going to take a lot less time than the scrub of a RAIDZ2 6x 18TB HDD.
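As a back-of-the-envelope check on the sizes mentioned above, this Python sketch computes the minimum time to read a whole disk once at a steady rate. The 200 MB/s average is purely an assumption for a large HDD and not a measured figure; treat the result as a floor for a long SMART test, not as a prediction of scrub time.

```python
def full_read_hours(capacity_tb: float, avg_mb_per_s: float) -> float:
    """Lower-bound hours to read an entire disk once at a steady rate."""
    capacity_bytes = capacity_tb * 1e12   # drive vendors quote decimal TB
    bytes_per_second = avg_mb_per_s * 1e6
    return capacity_bytes / bytes_per_second / 3600

# ASSUMPTION: average sequential read speed; real drives slow down
# toward the inner tracks, so actual tests usually take longer.
ASSUMED_MB_PER_S = 200.0

for size_tb in (12, 18):
    hours = full_read_hours(size_tb, ASSUMED_MB_PER_S)
    print(f"{size_tb} TB at {ASSUMED_MB_PER_S:.0f} MB/s: about {hours:.1f} h")
```

The drive also reports its own estimate for the extended self-test (shown by `smartctl -c` as the recommended polling time), which is a better number to plan around than this estimate.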
I’ve personally had my disks get pretty dang hot after doing tests and such. It may be worth checking disk temperatures after SMART tests to make sure the disks aren’t baking one another. I had tests running across 24 disks at a time when I noticed they were getting pretty toasty, so I switched to doing tests in groups of 4 disks to try and spread the thermal load over time.
Though I think that says more about my terrible cooling situation as opposed to following good testing schedules.
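The grouping idea above could be planned with something like the following Python sketch, which splits 24 disks into groups of 4 and assigns each group its own day. The disk names and the choice of weekdays are hypothetical; each resulting group would correspond to a separate SMART test task covering only those disks.

```python
from itertools import islice

def batches(items, size):
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

# Hypothetical disk names; substitute the real device names.
disks = [f"disk{n:02d}" for n in range(1, 25)]

# 24 disks in groups of 4 gives 6 groups, one per weekday below.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]
plan = dict(zip(WEEKDAYS, batches(disks, 4)))

for day, group in plan.items():
    print(day, ", ".join(group))
```

Picking groups so that physically adjacent disks land on different days would spread the heat further, but that depends on the chassis layout.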
That’s what I have read too, which is why I am hoping they alternate (as far as I can tell, there is no easy way to do that). At least one starts on Sunday and the other on Thursday.
I have the SMART tests being done on all disks at one time. I believe you are correct with the scrub running per pool. My drives are 12TB, but only 2 being mirrored.
I think I will have worse issues with the CPU overheating, since I have had problems with that (it’s an SBC with no active cooling, and a very small form factor at that). I have 2 larger fans pushing air over everything, so I think it will be fine (hopefully).
Under heavy load my CPU got to 91 Celsius, so I had to get the fans. It’s basically an Intel Celeron. I think I’d probably max out the CPU before the drives would really have to work too hard. I will look into testing the drives.
I would not say that. Scrubs depend on all the necessary metadata reads, which can result in much lower throughput than a simplistic “read the whole disk” workload. I wouldn’t infer any real timeline for one based on the other.
So my original comment was right? If a disk was 100% full, pretty much all blocks would be read and the scrub would indeed take about as long as a sequential set of SMART long tests?
Ah - I hadn’t realised that scrubs were metadata only.
Scrubs read the metadata, then read the blocks the metadata refers to, verify them, and possibly correct them.
The point that eric and I are trying to get across is that scrubs are going to be slower once the disk is full, as the scrub process has to bounce between reading metadata and reading blocks.
What do you mean? You can set up a fine-grained schedule for SMART tests: instead of All Disks, select the disks you want to test. You then need to create more than one task per test type.
Scrubs can be scheduled per pool, so you have freedom in alternating/offsetting them too.
As for the schedule, I’m pretty much doing the same schedule mentioned in the OP.
Daily shorts (except for the days I run a long test)
Long test every week
2 scrubs per month
I set fixed dates (it’s a bit more trouble to set up, but in the end it’s easier for me to manage and make sure that scrubs and long tests don’t overlap).
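A fixed-date plan like the one described above can be sanity-checked with a small Python sketch. The specific days of the month are assumptions for illustration only (the post does not say which dates are used); the function refuses any plan where a scrub and a long test share a day.

```python
import calendar

def month_plan(year, month, long_days=(7, 14, 21, 28), scrub_days=(1, 15)):
    """Map each day of the month to the list of tasks that run on it.

    ASSUMPTION: the default day numbers are illustrative, not the
    poster's actual dates. Short tests run every day except long-test days.
    """
    overlap = set(long_days) & set(scrub_days)
    if overlap:
        raise ValueError(f"scrub and long test share day(s): {sorted(overlap)}")
    last_day = calendar.monthrange(year, month)[1]
    plan = {}
    for day in range(1, last_day + 1):
        tasks = ["long"] if day in long_days else ["short"]
        if day in scrub_days:
            tasks.append("scrub")
        plan[day] = tasks
    return plan

plan = month_plan(2024, 2)
for day, tasks in plan.items():
    if tasks != ["short"]:
        print(f"day {day:2d}: {' + '.join(tasks)}")
```

Using day numbers up to 28 keeps the plan valid in every month, at the cost of a slightly longer gap at the end of longer months.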