Jumping in on the BMS/background scan conversation.
Background Media Scan (BMS) is more common on SAS drives, or at least more commonly controllable there, but it’s also present on some SATA devices. BMS is an always-running background job, a bit like a low-priority patrol read. In that sense it resembles a SMART long self-test: it checks the entire disk surface, not just allocated data, so it can catch a potentially marginal or bad sector. If the error is correctable with the drive’s on-platter ECC, the sector is simply rewritten (if it holds data) and Hardware_ECC_Recovered ticks up on the disk. If there’s no data there and the sector can’t be refreshed in place, you might instead see Reallocated_Event_Count increase without a matching Reported_Uncorrect. BMS runs in the drive’s firmware. You can exercise some control over it with sdparm and by probing the relevant SCSI mode and log pages, but it’s largely left to the firmware to handle, and ZFS doesn’t get involved.
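To make the counter logic above concrete, here’s a rough Python sketch that compares two snapshots of SMART attribute raw values and maps the changes onto those outcomes. The snapshot format (a dict of attribute name to raw value, e.g. copied by hand from `smartctl -A` output) and the function names are assumptions for illustration. Raw-value encoding varies by vendor, and Hardware_ECC_Recovered also counts ordinary corrections during host reads, so treat the result as a hint rather than proof of BMS activity.

```python
from typing import Dict, List

# SMART attribute names as smartctl prints them for ATA drives.
WATCHED = ("Hardware_ECC_Recovered", "Reallocated_Event_Count", "Reported_Uncorrect")


def counter_deltas(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """How much each watched raw value grew between two snapshots (0 if absent)."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}


def interpret(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Heuristically map counter growth onto the outcomes described above."""
    d = counter_deltas(before, after)
    notes: List[str] = []
    if d["Hardware_ECC_Recovered"] > 0:
        notes.append("sectors corrected with on-platter ECC")
    if d["Reallocated_Event_Count"] > 0 and d["Reported_Uncorrect"] == 0:
        notes.append("sectors reallocated with no uncorrectable error reported")
    if d["Reported_Uncorrect"] > 0:
        notes.append("uncorrectable errors reported to the host")
    return notes


# Hypothetical snapshot values, for illustration only.
before = {"Hardware_ECC_Recovered": 100, "Reallocated_Event_Count": 0, "Reported_Uncorrect": 0}
after = {"Hardware_ECC_Recovered": 105, "Reallocated_Event_Count": 2, "Reported_Uncorrect": 0}
for note in interpret(before, after):
    print(note)
```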
- Fairly common on modern SAS and SATA drives.
- Runs in the background whenever the drive has been idle for longer than a vendor-defined period. Brand-new drives use an accelerated schedule and scan aggressively after even single-digit milliseconds of idle time; once that first pass is done, around 500 ms of idle time is more typically required.
- Sometimes its status can be fished out of the drive’s control page for the scan job specifically. Otherwise, it’s just an always-on job whose results show up as increases in the counters mentioned above.
- It walks the entire LBA range, so after an interruption it “resumes” from where it left off, and when it reaches the end it loops around again.
- It’s testing each LBA to see if it’s readable. No write testing is performed.
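The scheduling behaviour in the list above can be modelled as a small state machine. This is a toy simulation, not firmware: the 5 ms constant stands in for “single-digit milliseconds”, 500 ms is the typical figure quoted above, and `is_readable` is a hypothetical callback standing in for the drive’s internal read check.

```python
from dataclasses import dataclass, field
from typing import Callable, Set

# Hypothetical idle thresholds: real values are vendor-defined.
NEW_DRIVE_IDLE_MS = 5    # stands in for "single-digit milliseconds"
NORMAL_IDLE_MS = 500     # typical threshold once the first pass is done


@dataclass
class BackgroundScan:
    total_lbas: int
    is_readable: Callable[[int], bool]   # read-only check; nothing is written
    position: int = 0                    # next LBA to scan, kept across idle windows
    first_pass_done: bool = False
    suspect: Set[int] = field(default_factory=set)

    def idle_threshold_ms(self) -> int:
        # Accelerated schedule until the whole surface has been scanned once.
        return NORMAL_IDLE_MS if self.first_pass_done else NEW_DRIVE_IDLE_MS

    def on_idle(self, idle_ms: int, budget: int) -> int:
        """Scan up to `budget` LBAs if the drive has been idle long enough.

        Returns how many LBAs were checked in this idle window."""
        if idle_ms < self.idle_threshold_ms():
            return 0
        checked = 0
        while checked < budget:
            if not self.is_readable(self.position):
                self.suspect.add(self.position)
            checked += 1
            self.position += 1
            if self.position == self.total_lbas:  # end of the LBA range: loop around
                self.position = 0
                self.first_pass_done = True
        return checked


scan = BackgroundScan(total_lbas=10, is_readable=lambda lba: lba != 7)
print(scan.on_idle(idle_ms=10, budget=4))   # new drive: 10 ms idle is enough -> 4
print(scan.on_idle(idle_ms=10, budget=8))   # resumes at LBA 4, wraps to 0 -> 8
print(scan.on_idle(idle_ms=10, budget=4))   # first pass done: 10 ms too short -> 0
print(scan.suspect, scan.position)          # {7} 2
```

Keeping `position` on the object rather than restarting each window is what gives the “resume, then loop around” behaviour.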
Related to BMS is Media Pre-Scan, which keeps a table tracking which sectors have been written so the drive can tell whether a given write is the first one to that sector. If a sector is getting a write for the first time, a drive with Media Pre-Scan turns it into a “write and verify” by immediately reading the data back.
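As a sketch of the first-write idea, the toy disk below keeps a set of LBAs that have been written at least once and only does the read-back on the first write to each one. The class, its in-memory “media” and the `marginal` sector injection are all hypothetical; real drives do this in firmware and handle reallocation through their own spare area.

```python
from typing import Dict, Iterable, Set


class MediaPreScanDisk:
    """Toy disk: the first write to each LBA is read back and verified."""

    def __init__(self, marginal: Iterable[int] = ()) -> None:
        self.media: Dict[int, bytes] = {}
        self.written: Set[int] = set()            # stands in for the first-write table
        self.marginal: Set[int] = set(marginal)   # hypothetical sectors that don't hold data
        self.reallocated: Set[int] = set()
        self.verified_writes = 0

    def _store(self, lba: int, data: bytes) -> None:
        if lba in self.marginal and lba not in self.reallocated:
            self.media[lba] = bytes(len(data))    # marginal sector: data reads back zeroed
        else:
            self.media[lba] = data

    def write(self, lba: int, data: bytes) -> None:
        first_write = lba not in self.written
        self._store(lba, data)
        if not first_write:
            return                                # already in the table: plain write
        self.written.add(lba)
        self.verified_writes += 1
        if self.media[lba] != data:               # read back and compare
            self.reallocated.add(lba)             # map the sector out...
            self._store(lba, data)                # ...and rewrite to a spare


disk = MediaPreScanDisk(marginal={3})
for lba in range(5):                              # write every LBA once
    disk.write(lba, b"pattern")
disk.write(0, b"again")                           # second write: no read-back
print(disk.verified_writes, sorted(disk.reallocated), disk.media[3])
```

Writing every LBA once, as a full-surface burn-in does, sends the whole drive through the verified path; later writes to the same LBAs skip the read-back.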
We’ve always advocated for burn-in testing of drives before they’re put into use in a TrueNAS system, and we do this ourselves for our own Enterprise gear as part of the build process. Because the burn-in writes across the whole drive, the entire surface will have been through Media Pre-Scan, and any marginal sectors it finds will be mapped out and reallocated before data hits them.
A sector “going bad” after passing a self-test and a BMS pass is possible, but even in the scenario where it can still be written but not read back, the redundancy of ZFS will protect the data on it.
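For that last point, here’s a simplified illustration of why redundancy covers a sector that accepts writes but can’t be read back: a two-way mirror that checks every read against a checksum stored separately from the data, falls back to the other copy, and rewrites the bad one. This is not ZFS code. SHA-256 via `hashlib` stands in for ZFS’s configurable checksums, `ToyMirror` is a hypothetical name, and an unreadable sector is modelled as `None`.

```python
import hashlib
from typing import Dict, List, Optional


class ToyMirror:
    """Toy two-way mirror with per-block checksums and self-healing reads."""

    def __init__(self) -> None:
        self.sides: List[Dict[int, Optional[bytes]]] = [{}, {}]
        self.checksums: Dict[int, str] = {}   # kept apart from the data itself
        self.repairs = 0

    def write(self, block: int, data: bytes) -> None:
        self.checksums[block] = hashlib.sha256(data).hexdigest()
        for side in self.sides:
            side[block] = data

    def read(self, block: int) -> bytes:
        expected = self.checksums[block]
        good: Optional[bytes] = None
        for side in self.sides:
            data = side.get(block)
            # Accept the first copy that is readable and matches its checksum.
            if data is not None and hashlib.sha256(data).hexdigest() == expected:
                good = data
                break
        if good is None:
            raise OSError(f"block {block}: no intact copy")
        # Rewrite any copy that was unreadable or failed the checksum.
        for side in self.sides:
            if side.get(block) != good:
                side[block] = good
                self.repairs += 1
        return good


m = ToyMirror()
m.write(0, b"data")
m.sides[0][0] = None          # written OK earlier, now unreadable
print(m.read(0), m.repairs)   # served from the other side, bad copy rewritten
```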