How to Properly ReMAP LBAs on a failing Hard Disk Drive (on Linux/SCALE)

How to map out LBAs on a HDD using Linux (TrueNAS SCALE)

If you are unable to map out bad LBAs using dd or similar commands, I have found that a secure erase of the entire HDD might work. This is a last-resort operation.

NOTE: This procedure is specific to a conventional HDD, not an SSD or NVMe drive. Also, do not use dd to try to fix LBAs on SSD/NVMe drives, and this may apply to hybrid HDDs as well. Flash memory operates differently.

This will not remove most errors already recorded in the logs; however, a SMART Long test should pass and no new LBA error messages should occur unless the drive has more failures. If more LBAs are reported as failed after this repair operation (whether hours or months later), assume the drive will continue to have media issues and just throw it away.

I cannot stress enough the importance of ensuring the drive you are erasing has the correct 'drive_id' and 'serial number'. There is no shame in verifying this several times before you reach steps 10 and 11.
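
One way to make the serial number check less error-prone is to look up the drive ID from the serial number rather than the other way around. The following is a minimal sketch, not part of the original procedure: `dev_for_serial` is a hypothetical helper name, and it assumes input in the two-column form printed by `lsblk -dn -o SERIAL,NAME` (where `-d` hides partitions and `-n` hides the header).

```shell
#!/usr/bin/env bash
# dev_for_serial SERIAL
# Hypothetical helper: reads "SERIAL NAME" pairs on stdin and prints the
# device name whose serial matches exactly. Rows without a serial
# (only one column) are ignored.
dev_for_serial() {
    awk -v want="$1" 'NF == 2 && $1 == want { print $2 }'
}

# Intended use (assumes lsblk is installed; the serial is the example
# from this post):
#   lsblk -dn -o SERIAL,NAME | dev_for_serial ZR13JRL0
```

If the function prints nothing, the serial was not found and nothing should be erased until you know why.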

READ THIS ENTIRE PROCEDURE BEFORE TRYING IT OUT ON YOUR SYSTEM !

  • You are expected to have some general knowledge about Linux and how to read the drive SMART data.

  • You need to be root or use sudo.

  • You will need SSH access to the server to open an SSH session.

  • You will be using an SSH window and using a tmux session.

  • Do not attempt to use these instructions without using a tmux session.

  • Your drive data will be destroyed!

  • This will consume a lot of time, and the system with your hard drive must remain powered on the whole time, or the HDD may be bricked. It is highly recommended that the NAS system is running on a UPS in case of a power failure.

  • You will likely need to unplug the drive's power connection rather than power cycle or reboot the system.

  • In my example below, my drive starts as drive ID sdc, then after power is removed and reapplied it becomes sde. Be certain you are aware of your drive IDs. A blunder on your part could be costly.

  • tmux usage:

    • To create a session, enter tmux new -s tsdrive to create a session named 'tsdrive'.
    • To detach from the tmux session, press CTRL+B and then the D key.
    • To list your sessions, enter tmux ls.
    • To attach to a session, enter tmux attach -t tsdrive, which will attach to session 'tsdrive' if it exists.
    • This is not a tutorial on how to use tmux.
    • If your SSH window goes away while the erase is occurring, reopen an SSH connection to the server and use the tmux attach command to restore the tmux session.

    While I could have made this a script, I feel that keeping full control is prudent. If you do not understand these commands, feel free to look them up before using them.
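
For the tmux part only, the create-or-reattach choice can be wrapped in a small function. This is a sketch under assumptions: `attach_or_create` is a hypothetical name, and the session name 'tsdrive' is the one used in this post.

```shell
#!/usr/bin/env bash
# attach_or_create NAME
# Hypothetical helper: attaches to tmux session NAME if it already
# exists, otherwise creates a new session with that name.
attach_or_create() {
    local name="$1"
    if tmux has-session -t "$name" 2>/dev/null; then
        tmux attach -t "$name"
    else
        tmux new -s "$name"
    fi
}

# Intended use after logging in over SSH:
#   attach_or_create tsdrive
```

Running the same call after a dropped SSH connection should bring back the session in which the erase is still running.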

Steps:

  1. Open an SSH window and log in as root.

  2. Write down the drive ID of your suspect HDD (sda, sdb, sdc...): ________ Example: sdc

  3. We are using tmux to ensure the SSH session remains open, even if we lose our SSH window by accident. It is not absolutely necessary, however it is prudent to use it.

    Enter tmux new -s tsdrive, which will open a session titled 'tsdrive'.

  4. Enter smartctl -x /dev/drive_id (replace drive_id with the value from step 2).

    Note: You can use a pipe to output to a file as well as the screen for later reference. Ensure you use a valid path. The example below will create a file called drive_sdc_step_4.txt in the path /mnt/pool/scripts

    Example: smartctl -x /dev/sdc | tee -a /mnt/pool/scripts/drive_sdc_step_4.txt

  5. Gather the following data if present:

    a: Drive Serial Number ____________________ (Example: ZR13JRL0)

    b: List of bad/defective LBAs. This list is not needed for this procedure, but it is good to have the information if you ever need it. If no list is present, record the LBA reported as failed by a SMART Short or Long test. Start _____________, End _____________

    c: Time for a secure erase to complete. _________________

    d: ATA Security is: ________________________ (Enabled/Disabled, Frozen/Not Frozen)

  6. If security is Enabled, Frozen, you must perform these steps:

    a: Access the physical drive and confirm it is the correct drive using the serial number printed on it.

    b: Unplug the drive power. This should work if you have a removable drive bay as well.

    c: Connect the drive power.

    d: The drive ID will likely have changed. Use lsblk -o SERIAL,NAME and then locate the serial number you wrote down in step 5a.

    Example Return:

     SERIAL            NAME
     K1JUMW4D          sda
                       └─sda1
     2F4920060284      sdb
                       ├─sdb1
                       ├─sdb2
                       └─sdb3
     ZR13JRL0          sde
                       └─sde1
     K1JRSWLD          sdd
                       └─sdd1
    

    e: Write down the new drive ID ________ and enter smartctl -x /dev/drive_id again.
    Example: smartctl -x /dev/sde

    f: At this time look for ATA Security is: Disabled, Not Frozen. If this is not correct, then the drive may not be recoverable on this computer system.

  7. If security is now Disabled, Not Frozen, then continue. If not, you need to figure out how to unlock this drive. Seek further advice; booting from a live CD may help.

  8. While this step may not be absolutely required, I use it just to ensure the drive does not time out early. The values are in tenths of a second, so 300 sets the read and write error recovery limits to 30 seconds.

    Enter smartctl -l scterc,300,300 /dev/drive_id (where drive_id is your current drive ID, from step 6e if it changed).

    Example: smartctl -l scterc,300,300 /dev/sde

  9. Verify the drive is Disabled and Not Frozen:

    a: Enter smartctl -g security /dev/drive_id and verify:

    Example: smartctl -g security /dev/sde

     ATA Security is:  Disabled, NOT FROZEN [SEC?]
    

    b: Enter smartctl -x /dev/drive_id > original_drive.txt to save the original condition of the drive.

  10. Set a password. In this example we are using the lowercase letter p. This password only lives with the drive until the secure erase is completed. If the drive does not complete the secure erase, this password may still be enabled.

    a: Enter hdparm --user-master u --security-set-pass p /dev/drive_id and verify a return value of:

    security_password: "p"
    

    Example: hdparm --user-master u --security-set-pass p /dev/sde

  11. WARNING !!! The next step will commence a secure erase operation. The drive power MUST NOT be interrupted until after the erase operation has been completed or you may end up with a BRICK.

    a: Enter hdparm --user-master u --security-erase p /dev/drive_id

    Example: hdparm --user-master u --security-erase p /dev/sde

    b: You should see:

    hdparm --user-master u --security-erase p /dev/sde
    security_password: "p"
    /dev/sde:
    Issuing SECURITY_ERASE command, password="p", user=user
    
  12. If you had a value from step 5c, then you will wait at least this long before the prompt returns. My 6TB drive took roughly 10 hours, YMMV.

  13. After the prompt returns, you should not see any error messages from hdparm.

  14. Let's look at the drive SMART data. Enter smartctl -x /dev/drive_id and you should have zero values for IDs 5, 197, and 198.

  15. Run a SMART Short Test, Enter smartctl -t short /dev/drive_id for a quick check, wait 5+ minutes, Enter smartctl -a /dev/drive_id and verify the Short test passed.

  16. Run a SMART Long Test, Enter smartctl -t long /dev/drive_id and wait for it to complete.

  17. Examine the SMART data to ensure the Long test completed properly and without errors, Enter smartctl -a /dev/drive_id and verify the Long/Extended test passed.

  18. If the tests passed, your drive has been healed, for now.

  19. Before using this drive, reboot your system to 'enable' the security features again.

  20. Now let's record the final result:
    Enter smartctl -x /dev/drive_id > fixed_drive.txt
    or, if the drive was not fixed, smartctl -x /dev/drive_id > not_fixed_drive.txt
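
The two pass/fail checks in the steps above (the security state in step 9a and the self-test result in steps 15 to 17) can be sketched as small text filters. This is a minimal sketch under assumptions: both function names are hypothetical, the first matches the "Disabled, NOT FROZEN" line shown in step 9a, and the second assumes the usual self-test log layout in which the most recent entry starts with "# 1" and a passing test reads "Completed without error".

```shell
#!/usr/bin/env bash
# security_ready: reads smartctl output on stdin and succeeds only if
# the drive reports "Disabled, NOT FROZEN" (the state step 9a asks for).
security_ready() {
    grep -Eq 'ATA Security is:[[:space:]]+Disabled, NOT FROZEN'
}

# last_selftest_passed: reads "smartctl -l selftest" output on stdin and
# succeeds only if the most recent entry (# 1) completed without error.
# The log layout is an assumption; check it against your own output.
last_selftest_passed() {
    grep -E '^#[[:space:]]*1[[:space:]]' | grep -q 'Completed without error'
}

# Intended use (drive ID is the example from this post):
#   smartctl -g security /dev/sde | security_ready && echo "OK to continue"
#   smartctl -l selftest /dev/sde | last_selftest_passed && echo "Test passed"
```

Neither function runs the erase; they only read text, so they are safe to try against saved smartctl output first.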

Recovery from a power outage

This is purely experimental, as I have not had a power outage while performing this task. However, it is possible that if the power drops while the secure erase is occurring, the drive could be BRICKED. If this happens, repeat the steps above. The security password should still be "p".

Perform:
Steps 1 through 9
Step 10 should not be required
Step 11 will hopefully result in:
	Issuing SECURITY_ERASE command, password="p", user=user

If Step 11 fails the first time, run step 10 and then step 11 again.

If Step 11 fails a second time, the drive may be bricked.

If you suspect your drive has been bricked, move it to a different computer and try to perform a secure erase of this drive. There are programs that will issue this command and maybe one of those will work but I will not discuss it here.

EDIT (2/12/2026): Updated beginning text and added steps 9b and 20 to record the starting point and ending point of the drive SMART data for later comparison.

Joe, that was a bit long-winded.

I thought the SATA standard simply needed a write to a bad block's LBA, which forces the HDD (and probably SSDs) to remap the bad block to a spare. Thus, the newly written data can now be read back just fine.

So, in the case of SATA interfaced devices, writing zeros, (or random if desired), to the entire storage device, HDD or SSD, would cause any pending bad blocks to be spared out. Of course, a targeted write to just the LBA(s) affected would be faster, yet sometimes harder to arrange.

SAS / SCSI uses a more complex sparing-out process, which I don't remember. But the host needs to send the appropriate SCSI command, with the block LBA, to cause the block to be spared out.

Arwen, that is my take also: writing across the entire drive would force the firmware to remap the bad blocks. I think Joe may have found a drive in a mirror that didn't behave that way and was successful in fixing the drive. So edge cases may be where it is best used, I don't know.

Unfortunately right now I don't have any bad drives to test (recently tossed two) or a system not already full to test with.

As far as the instructions go, I don't see any reason they couldn't be used if they are followed step by step. I generally did manual testing of drives a few at a time with tmux sessions before putting them in a system build.

I originally thought the same thing: rewrite the data to the suspect LBA and the drive would remap it. However, that did not happen while the drive was part of a pool. Either ZFS or the drive protected the LBA in question. I wrote data to 10 LBAs repeatedly, for over 500 iterations (each test is 50 iterations in the script I wrote). All that did, after a while, was increase my Pending Reallocated Sector count. I was trying to force the 10 LBAs to reallocate, but it was not happening. (See the attached log file; the first sector is good, the second sector is bad.)

I removed the drive from the pool and tried again, same result. No forward movement at all.

Then I looked harder and found this method, which is a secure erase. What this does is remove the computer from the equation; it is all internal to the drive. I am pretty sure I was hitting some wall where the computer was trying to do it all and the drive would not comply.

My initial looping script would write an LBA and check the return value: 0 = pass, nonzero = fail. I could get good LBAs to pass, however the bad LBAs would always fail. I also used a timer, and the average timeout for the bad LBAs was 3 to 7 seconds. Sometimes I'd see a 9 second value, but these were not always on the same LBA.
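
For readers who want to try the dd approach first, a write-and-check loop of the kind described above might look like the sketch below. This is not the script from this post: `write_lba` and `retry_lba` are hypothetical names, the 512-byte sector size is an assumption (check your drive's logical sector size), and the write destroys whatever is stored in that sector.

```shell
#!/usr/bin/env bash
# write_lba TARGET LBA [SECTOR_SIZE]
# Overwrites one sector with zeros and returns dd's exit status.
# conv=fsync makes dd flush to the device before reporting success;
# conv=notrunc keeps a regular file from being truncated.
write_lba() {
    local target="$1" lba="$2" size="${3:-512}"
    dd if=/dev/zero of="$target" bs="$size" seek="$lba" count=1 \
       conv=notrunc,fsync status=none
}

# retry_lba TARGET LBA TRIES
# Repeats the write TRIES times and prints how many attempts failed.
retry_lba() {
    local target="$1" lba="$2" tries="$3" failed=0 i=0
    while [ "$i" -lt "$tries" ]; do
        write_lba "$target" "$lba" || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "$failed"
}

# Intended use (destructive; drive ID and LBA are placeholders):
#   retry_lba /dev/drive_id LBA 50
```

A result of 0 means every write returned success; any other number is the count of failed attempts, which matches the pass/fail convention described above.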

Long story short, I was unable to use dd to overwrite the bad LBAs. I had nothing to lose with a secure erase, so I tried it. I was very surprised that it worked, better than I expected. I had no reallocated or pending reallocated sectors listed.

One possible problem with my drive could have been that the LBA address was not readable, thus the drive would always protect that area due to not knowing exactly where the head was. I don't know if this was my situation or not, but it is very possible. Maybe I just needed the entire disk reinitialized? My drive was a Seagate Ironwolf 6TB drive.

So I thought I'd share this info. I would much prefer to just dd a few LBAs and then scrub the pool, which is much faster than a secure erase and then adding the drive back to the pool.

Any thoughts would be appreciated.
disk_test.txt (18.1 KB)

Hmm, weird and a good workaround.

Perhaps iX can step in with what should happen...

Well, I don't think the data is available now as the disk is remapped already, but next time would you please record the specific SMART attribute IDs (numerical) and their values? I ask because I don't think "Pending Reallocated Sector" is a standard name. I am used to seeing "Current Pending Sector Count" and "Reallocated Sector Count", and these are two different things.

@Alexey Attached are two text outputs, one before and one after.
Before:
ZR13JRL0_2026_02_08.txt (20.6 KB)
After:
HDD_Fixed.txt (11.7 KB)

So, the "before" file

  4 Start_Stop_Count        -O--CK   100   100   020    -    177
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
197 Current_Pending_Sector  -O--C-   100   100   000    -    16
198 Offline_Uncorrectable   ----C-   100   100   000    -    16

and "after" is

  4 Start_Stop_Count        0x0032   100   100   020    -       179
  5 Reallocated_Sector_Ct   0x0033   100   100   010    -       0
197 Current_Pending_Sector  0x0012   100   100   000    -       0
198 Offline_Uncorrectable   0x0010   100   100   000    -       0

So sectors were corrected by the overwrite, not reallocated.
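
The comparison above only needs the raw values of attributes 5, 197 and 198. A minimal sketch for pulling them out of a saved attribute table follows; `raw_value` is a hypothetical helper, and it assumes the raw value is the last column of the row, which holds for the rows quoted in this thread but may not hold for attributes whose raw value contains spaces.

```shell
#!/usr/bin/env bash
# raw_value ID
# Hypothetical helper: reads a SMART attribute table on stdin and prints
# the last column (the raw value) of the row whose first column is ID.
raw_value() {
    awk -v id="$1" '$1 == id { print $NF }'
}

# Intended use (assumes smartctl -A prints only the attribute table):
#   for id in 5 197 198; do
#       echo "$id: $(smartctl -A /dev/sde | raw_value "$id")"
#   done
```

Feeding it the "before" rows quoted above gives 0, 16 and 16 for IDs 5, 197 and 198.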

Next time I wonder if you just dd something onto the entire drive, would that fix the problem. Theoretically, the write should have fixed it. Never seen that failing to work, at least on WDs. On the other hand, never had many Seagates.

Interesting data point, thanks.

I can't say that. The drive could have mapped the bad sectors out. But I actually hope it just reformatted the sectors and all is good again. Time will tell.

Nope, the dd command was not actually writing/reading the drive. I had a critical medium error, the sector could not be written to, read from, something like that. I could write and read good sectors. I am under the impression that if the drive cannot verify the LBA to where it is to write, it will not write anything at all. This may have been my problem. But is it due to failing media or head alignment or something else?

I think the secure erase performed a low level format of the drive and mapped out bad sectors, just like the factory would have. That is what I think, but I don't know this for certain.

And I will say, I have never had this kind of failure before. In the past dd would do the job but this time, it didn't. It is odd.

But you can.

The thing goes (if simplified) like this - there are two types of problems:

  • Permanent, as in "there is a mechanical hole in the surface of this sector".
  • Correctable by rewrite, as in "torn write": the write was for whatever reason interrupted halfway, and now we have the first half of the new sector and the second half of the old sector. No way to ECC out of this.

Let's say for whatever reason the read failure is discovered (either on the actual read, or during the patrol read, or self-test, whatever). Now the drive marks this sector as "Pending" or "Offline Uncorrectable" (I think depending on which mechanism discovered the read failure).

The drive cannot write to the sector to find out if it is correctable or not without the explicit command. Trying to do so would cause all sorts of strange side effects for filesystems, so no. So the sector remains in the state where it "certainly can't be read but maybe can be written to". Next time the write command comes in for this sector, the drive does write-with-verify.

  • If the error is permanent, the sector is remapped. "Reallocated Sector Count" clicks up one, and "Current Pending Sector Count" clicks down one.
  • If the error is correctable by write, the data is written, verification is successful, "Current Pending Sector Count" clicks down one, and "Reallocated Sector Count" is unchanged to reflect the fact that remap did not happen (was not required).

The latter one is your case exactly. I only find it unusual that the regular write failed to achieve the desired effect.
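
The two outcomes described above can be told apart from the before and after raw values of attributes 197 (pending) and 5 (reallocated). A minimal sketch, with a hypothetical function name and a deliberate simplification: a drive where some sectors were remapped and others were merely rewritten would be labelled "remapped".

```shell
#!/usr/bin/env bash
# classify_repair PENDING_BEFORE REALLOC_BEFORE PENDING_AFTER REALLOC_AFTER
# Hypothetical helper: labels the outcome of a rewrite using the raw
# values of attribute 197 (pending) and attribute 5 (reallocated).
classify_repair() {
    local pend_before="$1" realloc_before="$2" pend_after="$3" realloc_after="$4"
    if [ "$pend_before" -eq 0 ]; then
        echo "nothing was pending"
    elif [ "$pend_after" -ge "$pend_before" ]; then
        echo "still pending"
    elif [ "$realloc_after" -gt "$realloc_before" ]; then
        echo "remapped"
    else
        echo "corrected by rewrite"
    fi
}

# Values from the before/after files posted in this thread:
classify_repair 16 0 0 0   # prints: corrected by rewrite
```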

I would say it is important to note that anyone trying to "repair" a drive with reallocated or pending sector errors should use dd first to try to force the drive to remap the problematic LBAs.

If that fails and you have no place to go, try the secure erase. And I will not say that a secure erase will fix the drive, it may not, but it also might.

Providing you have good redundancy in your pool, this drive could be put back into a pool, but I would not put critical data on a repaired drive unless there was redundancy. This is a decision the end user has to make. Maybe you will toss it anyway but just wanted to test this procedure out.

If anyone ends up in this type of situation, and the secure erase procedure works, or even if it does not work, please add an entry to this thread and include a few details, the entire output of smartctl -x /dev/sd? > fixed_drive.txt or smartctl -x /dev/sd? > not_fixed_drive.txt.

Perfect, yes.