How do I properly test SAS drives?

kls · March 3, 2025, 9:29am

Hello there!

I’m on TrueNAS SCALE with a bunch of SAS drives, and I know SAS doesn’t have the same SMART testing and reporting capabilities as SATA.

Here is the question:
How do I properly test a SAS drive before putting it into production, to be reasonably sure it’s okay?

The test can be performed either in TrueNAS SCALE or, even better, on my bench test PC running Arch Linux (I have an HBA that can handle 8 drives).

Thanks for your inputs.

dan · March 3, 2025, 9:44am

SAS drives have fewer SMART capabilities than SATA, but “fewer” isn’t the same as “none.” Here’s output from one of mine:

```
root@truenas[~]# smartctl -a /dev/sdt
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.32-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST33000650SS
Revision:             0005
Compliance:           SPC-4
User Capacity:        3,000,592,982,016 bytes [3.00 TB]
Logical block size:   512 bytes
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c5005597274b
Serial number:        Z2951Z0X00009314VEE6
Device type:          disk
Transport protocol:   SAS (SPL-4)
Local Time is:        Mon Mar  3 04:42:18 2025 EST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     31 C
Drive Trip Temperature:        68 C

Accumulated power on time, hours:minutes 23706:05
Manufactured in week 42 of year 2012
Specified cycle count over device lifetime:  10000
Accumulated start-stop cycles:  162
Specified load-unload count over device lifetime:  300000
Accumulated load-unload cycles:  172
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 843850433
  Blocks received from initiator = 2463895239
  Blocks read from cache and sent to initiator = 1552837504
  Number of read and write commands whose size <= segment size = 440817208
  Number of read and write commands whose size > segment size = 805767

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 23706.08
  number of minutes until next internal SMART test = 48

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3251118873        0         0  3251118873          0     178200.748           0
write:         0        0         0         0          0      50194.526           0

Non-medium error count:     3704


[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -   23700                 - [-   -    -]
# 2  Background short  Completed                   -   23676                 - [-   -    -]
# 3  Background short  Completed                   -   23652                 - [-   -    -]
# 4  Background short  Completed                   -   23628                 - [-   -    -]
# 5  Background short  Completed                   -   23604                 - [-   -    -]
# 6  Background short  Completed                   -   23579                 - [-   -    -]
# 7  Background long   Completed                   -   23564                 - [-   -    -]
# 8  Background short  Completed                   -   23555                 - [-   -    -]
# 9  Background short  Completed                   -   23531                 - [-   -    -]
#10  Background short  Completed                   -   23507                 - [-   -    -]
#11  Background short  Completed                   -   23483                 - [-   -    -]
#12  Background short  Completed                   -   23458                 - [-   -    -]
#13  Background short  Completed                   -   23434                 - [-   -    -]
#14  Background short  Completed                   -   23410                 - [-   -    -]
#15  Background long   Completed                   -   23395                 - [-   -    -]
#16  Background short  Completed                   -   23386                 - [-   -    -]
#17  Background short  Completed                   -   23362                 - [-   -    -]
#18  Background short  Completed                   -   23337                 - [-   -    -]
#19  Background short  Completed                   -   23313                 - [-   -    -]
#20  Background short  Completed                   -   23289                 - [-   -    -]

Long (extended) Self-test duration: 27600 seconds [7.7 hours]
```

So SMART attributes may not be available, but the self-tests definitely are. And, of course, badblocks works as well. I’d do the same thing I’d do with a SATA spinner: long SMART test, full run of badblocks, another long SMART test.
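To make “pass” concrete for a SAS drive, here is a minimal bash sketch that reads `smartctl -a` output on stdin and checks the fields shown in the report above: SMART Health Status, Elements in grown defect list, the Total uncorrected errors column of the error counter log, and the self-test log. The function name `check_sas_report` and the pass/fail rules (any grown defect, any uncorrected error, any failed self-test) are my own assumptions, not a standard. It deliberately ignores the corrected-error columns, which can be huge on a healthy drive (see the read row above).

```bash
#!/usr/bin/env bash
# Sketch: flag warning signs in `smartctl -a` output from a SAS drive.
# Usage: smartctl -a /dev/sdX | check_sas_report
# Prints PASS (exit 0) or one FAIL line per problem found (exit 1).
# Field labels are assumed to match the smartctl 7.x SAS report above.
check_sas_report() {
  awk '
    /^SMART Health Status:/ {
      seen_health = 1
      status = $0
      sub(/^SMART Health Status:[ \t]*/, "", status)   # keep e.g. "OK"
      if (status != "OK") { print "FAIL: health status " status; bad = 1 }
    }
    /^Elements in grown defect list:/ {
      # Defects the drive has added since leaving the factory
      if ($NF + 0 > 0) { print "FAIL: grown defect list has " $NF " entries"; bad = 1 }
    }
    /^(read|write|verify):/ {
      # Last column of the error counter log = total uncorrected errors;
      # the (often huge) corrected-error columns are ignored on purpose.
      if ($NF + 0 > 0) { print "FAIL: " $1 " uncorrected errors = " $NF; bad = 1 }
    }
    /^# *[0-9]+ / && tolower($0) ~ /fail|unknown error/ {
      # Self-test log entries such as "Completed, segment failed"
      print "FAIL: self-test log entry: " $0; bad = 1
    }
    END {
      if (!seen_health) { print "FAIL: no SMART Health Status line"; bad = 1 }
      if (!bad) print "PASS"
      exit bad
    }
  '
}
```

Run it before and after the burn-in, e.g. `smartctl -a /dev/sdX | check_sas_report` (with /dev/sdX as a placeholder), and compare the two results.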

Jorsher · March 3, 2025, 12:41pm

I start dumping data on it and hope for the best.

I don’t recommend my method to others.

joeschmuck · March 3, 2025, 1:52pm

It depends on what you want to do. For “destructive” burn-in testing, use badblocks, as @dan said. For non-destructive testing such as SMART tests, smartctl -t long /dev/sda (for example) is your go-to command to launch a SMART long test. If it is a new drive, though, I prefer to run badblocks on it for a complete pass (that is, four different test patterns), and on a large-capacity drive that could take a week. If you have more than one drive, use tmux to run them all at once. If you do not know what tmux is, Google it. There are a lot of posts in our forum about this; you just need to search for them.
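For several drives at once, here is a minimal bash sketch of the badblocks-plus-tmux approach: one detached tmux session per drive, each running a destructive badblocks write test. The session names (`bb-sdb`, …), the bad-block list location (`LOG_DIR`, default /var/tmp), `BB_BLOCK_SIZE` and the `DRY_RUN` switch are my own choices for illustration. With `DRY_RUN=1` (the default) it only prints the tmux commands, so nothing is wiped until you set `DRY_RUN=0`.

```bash
#!/usr/bin/env bash
# Sketch: one destructive badblocks run per drive, each in its own detached
# tmux session. WARNING: with DRY_RUN=0 this ERASES every listed drive.
# DRY_RUN=1 (default) only prints the tmux commands it would run.
start_badblocks_sessions() {
  local dev name cmd
  # badblocks counts blocks in 32 bits, so with 4096-byte blocks it tops out
  # at 16 TiB; bigger drives need a larger block size (e.g. 8192).
  local bs="${BB_BLOCK_SIZE:-4096}"
  for dev in "$@"; do
    name="bb-$(basename "$dev")"      # e.g. bb-sdb
    # -w: write-mode test, patterns 0xaa 0x55 0xff 0x00, each written and read back
    # -s: show progress, -v: verbose, -o: save the bad-block list to a file
    cmd="badblocks -wsv -b ${bs} -o ${LOG_DIR:-/var/tmp}/${name}.txt ${dev}"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "tmux new-session -d -s ${name} '${cmd}'"
    else
      tmux new-session -d -s "$name" "$cmd"
    fi
  done
}
```

After starting them, `tmux ls` lists the sessions and `tmux attach -t bb-sdb` shows the progress of one drive (Ctrl-b d detaches again without stopping the test).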

My advice: run a daily SMART short test and a weekly SMART long test, if that makes sense for your setup. It doesn’t make sense with 20+ drives unless you break them up into groups; in that situation a monthly long test is fine. The point is to run a daily short test and periodic long tests.
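On a plain Linux box such as the Arch bench PC, that schedule can be expressed as smartd.conf lines. Here is a minimal bash sketch that prints them: a daily short test at 02:00 for every drive, plus either a weekly long test on Saturdays at 03:00 or, with `STAGGER=1`, a monthly long test spread over days 01 to 28 so a large batch of drives is not all busy at once. The times and the `smartd_schedule` helper are illustrative assumptions; the `-s T/MM/DD/d/HH` syntax is smartd’s own.

```bash
#!/usr/bin/env bash
# Sketch: print smartd.conf lines for the schedule described above.
# smartd's -s regex is T/MM/DD/d/HH: test type (S=short, L=long), month,
# day of month, day of week (1=Mon..7=Sun), hour; "." matches any character.
smartd_schedule() {
  local i=0 dev day
  for dev in "$@"; do
    if [ "${STAGGER:-0}" = "1" ]; then
      # Daily short test at 02:00; monthly long test at 03:00, one drive per
      # day of the month (01..28, valid in every month), wrapping after 28.
      day=$(printf '%02d' $(( i % 28 + 1 )))
      echo "${dev} -a -s (S/../.././02|L/../${day}/./03)"
    else
      # Daily short test at 02:00, weekly long test on Saturdays at 03:00
      echo "${dev} -a -s (S/../.././02|L/../../6/03)"
    fi
    i=$(( i + 1 ))
  done
}
```

For example, `STAGGER=1 smartd_schedule /dev/sdb /dev/sdc >> /etc/smartd.conf`, then restart the daemon (on Arch, `systemctl restart smartd`).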

nasbdh9 · March 3, 2025, 1:56pm

Format the disk once using sg_format, then run a SMART long test. This is how I have tested thousands of disks.

Of course, this is very time-consuming: a 16 to 22 TB disk takes 2 to 3 days to get through these two steps.
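A minimal bash sketch of that two-step routine for one drive, assuming sg3_utils and smartmontools are installed. `sg_format --format` issues a low-level FORMAT UNIT and, by default, keeps polling the drive until the format finishes; `smartctl -t long` then starts the extended self-test, which a SAS drive runs in the background. The `DRY_RUN` switch and the function names are illustrative; with `DRY_RUN=1` (the default) it only prints the commands.

```bash
#!/usr/bin/env bash
# Sketch: sg_format, then a SMART long test, for ONE drive.
# WARNING: with DRY_RUN=0 this ERASES the drive. DRY_RUN=1 (default) only prints.
run_step() {
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

format_then_long() {
  local dev="$1"
  [ -n "$dev" ] || { echo "usage: format_then_long /dev/sdX" >&2; return 2; }
  if [ "${DRY_RUN:-1}" != "1" ] && [ ! -b "$dev" ]; then
    echo "not a block device: $dev" >&2; return 2
  fi
  # FORMAT UNIT; sg_format polls the drive and prints progress until done
  run_step sg_format --format "$dev" || return 1
  # Extended self-test; check the result later with: smartctl -l selftest DEV
  run_step smartctl -t long "$dev" || return 1
}
```

Once the long test has finished, `smartctl -l selftest /dev/sdX` should show a new “Background long  Completed” entry at the top of the log.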

dan · March 3, 2025, 2:21pm

Not as time-consuming as running badblocks. And the time is part of the goal: stress the drives for a while; if they’re going to break, you want them to do it before you put them in production.
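To put rough numbers on that, here is a bash sketch that treats the drive’s reported long self-test duration as the cost of one full-surface pass. Assumptions (mine, not measured): badblocks -w makes 8 passes (4 patterns, each written then read back), each long test is 1 pass, and sg_format is about 1 write pass. With the 27600 s that dan’s 3 TB drive reports above, that comes to about 77 h for long SMART, badblocks, long SMART versus about 15 h for sg_format plus a long SMART test.

```bash
#!/usr/bin/env bash
# Sketch: rough burn-in time estimates from the drive's long-test duration.
# Assumption: one full write or read pass takes about as long as the long test.

# Pull the seconds out of a line like
# "Long (extended) Self-test duration: 27600 seconds [7.7 hours]"
long_test_seconds() {  # reads `smartctl -a` output on stdin
  sed -n 's/^Long (extended) Self-test duration: *\([0-9]*\) seconds.*/\1/p'
}

passes_to_hours() {  # $1 = seconds per pass, $2 = number of passes
  echo $(( ($1 * $2 + 1800) / 3600 ))   # rounded to the nearest hour
}

estimate_burn_in() {  # $1 = seconds per pass
  echo "long + badblocks -w + long: ~$(passes_to_hours "$1" 10) h"  # 1 + 8 + 1 passes
  echo "sg_format + long:           ~$(passes_to_hours "$1" 2) h"   # 1 + 1 passes
}
```

Usage: `estimate_burn_in "$(smartctl -a /dev/sdX | long_test_seconds)"`. Real throughput varies across the platter, so treat the result as an order of magnitude.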

kls · March 3, 2025, 2:27pm

Yes, I should have used better wording. I stand corrected.

Sounds reasonable.
Long SMART > badblocks > Long SMART

Exactly my point. We’re talking about a pre-deployment set of tests: destructive, since the drives have nothing on them to care about, and time (a week? no problem) is not a factor, because the tests are performed on a dedicated PC running Arch.
If a drive passes, it will be placed into the production NAS.
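For the bench PC, here is a minimal bash sketch of that exact sequence for one drive: long SMART test, wait, badblocks -w, long SMART test, wait, then a final `smartctl -a` to compare against the starting report. Assumptions: smartmontools and badblocks (from e2fsprogs) are installed, and a running SAS self-test shows up as “in progress” in `smartctl -l selftest`, which is what the wait loop polls for. `LOG_DIR`, `DRY_RUN` and the function names are illustrative; with `DRY_RUN=1` (the default) it only prints the steps.

```bash
#!/usr/bin/env bash
# Sketch: long SMART test -> badblocks -> long SMART test for ONE drive.
# WARNING: with DRY_RUN=0 this ERASES the drive. DRY_RUN=1 (default) only prints.

step() {  # print the command in dry-run mode, otherwise run it
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi
}

wait_for_selftest() {  # $1 = device
  if [ "${DRY_RUN:-1}" = "1" ]; then echo "# wait for self-test on $1"; return 0; fi
  sleep 60                       # give the drive time to log the running test
  # Assumption: a running SAS test is listed as "Self test in progress ..."
  while smartctl -l selftest "$1" | grep -qi 'in progress'; do
    sleep 600                    # a long test on a big drive takes hours
  done
}

burn_in() {  # $1 = device, e.g. /dev/sdX
  local dev="$1" list
  [ -n "$dev" ] || { echo "usage: burn_in /dev/sdX" >&2; return 2; }
  if [ "${DRY_RUN:-1}" != "1" ] && [ ! -b "$dev" ]; then
    echo "not a block device: $dev" >&2; return 2
  fi
  list="${LOG_DIR:-/var/tmp}/badblocks-$(basename "$dev").txt"

  step smartctl -t long "$dev" || return 1
  wait_for_selftest "$dev"
  # Destructive write test, 4 patterns; use a larger -b for drives over 16 TiB
  step badblocks -wsv -b 4096 -o "$list" "$dev" || return 1
  if [ "${DRY_RUN:-1}" != "1" ] && [ -s "$list" ]; then
    echo "badblocks found bad blocks, see $list" >&2; return 1
  fi
  step smartctl -t long "$dev" || return 1
  wait_for_selftest "$dev"
  # Compare health status, grown defect list and uncorrected errors with before
  step smartctl -a "$dev"
}
```

Several drives can run this in parallel, one tmux session each, as suggested earlier in the thread.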