Reused HDD Crashes Whole System Even After Sanitizing etc

Help!!

One of my Pools became corrupted. I have the data backed up and want to re-create it. I’ve bought 3 new HDDs and want to use one (any one) of the 4 original HDDs to make a 4th for a RAIDZ1 array. But every time I insert any of the original 4 HDDs it causes the 3 new HDDs to disappear from ‘Disks’, and leads to errors and eventual loss of access to my other Pool, which is otherwise perfectly healthy. It’s as if inserting any of the original 4 HDDs is ‘poison’ to the TrueNAS.

I’ve tried all the TrueNAS Wipe options, putting the HDDs in an external USB housing connected to my PC and running Diskpart’s Clean command, and using SeaTools to sanitize them by overwriting with zeros. None of those has worked.

When I ran <sudo zpool wipefs -a> nothing happened.

When I ran <sudo zpool labelclear -f /dev/sdh> I got a message saying ‘Failed to Clear Label’.

I think there’s some legacy information recorded on all 4 original HDDs that makes TrueNAS recognise them, perhaps as part of the old corrupted and over-formatted Pool, and that’s preventing TrueNAS from treating them as clean, new HDDs.

Any suggestions would be very much appreciated.

I tend to put them into a Windows box and run diskpart.

If you just insert the original HDD alone, with 3 new disks disconnected, what happens?

Hi Alexey. If I do that, the HDD is seen in Storage and listed as 1 unassigned disk. My other Pool stays operating normally and is not taken down, which is good news.
But if I then insert any or all of the 3 new disks they appear briefly as unassigned disks, making 4 in total, but only for a few seconds, after which all 4 disappear and the other working pool starts reporting errors and shuts down. When that happens I get an error message telling me disks in the other pool were removed by the administrator - even though I’ve not removed anything.

My next question would then be, if you have a known-good power supply, swap the power supply and retry. That definitely sounds like “not enough something to serve all the disks at the same time”, and that “something” is often power.


Alexey,

I have had 3 pools, each of 4 HDDs, all working happily together on the same power supply, which is a known-good 750 Watt unit. Now I’m only trying to use 2 pools, the disks of the third having been temporarily removed whilst I try to sort this issue out.

So I don’t think it can be a shortage of power. The only power supply related thing I can think of is that I had to use a Molex/4-way SATA power adapter cable in order to have enough SATA drive power connections. But everything worked happily together, including the 3rd pool, until the pool comprising the original 4 troublesome HDDs got corrupted.

What happened was that I tried to stream a movie from the pool and it stuttered and then stopped. The TrueNAS then sent me an email telling me one of the drives had been removed by the administrator - once again even though I hadn’t removed anything. The affected pool wouldn’t resilver and eventually the disks started to disappear from Storage>Disks.

I tried everything I could find online to repair the pool and finally gave up, removed the HDDs, formatted them and tried to create the pool again. But it simply wouldn’t work. I decided to replace all 4 HDDs, but they’re in short supply (Seagate IronWolf 10TB) and I was only able to purchase 3. Hence wanting to reuse one of the original 4. (They’re all Seagate IronWolf 10TB drives too.)

Motherboard, CPU, RAM etc are all fine.

I’ve just tried running <sudo zpool labelclear -f /dev/sde> (Disks now lists the suspect HDD as sde) and got the same message - failed to clear label.

I agree with Alexey that this sounds like a hardware problem somewhere. Power supplies can fail, cables as well. I would have guessed a cable or connection issue. How are you connecting your disks? Native SATA ports, HBA, Port Replicator/Multiplier, …?


Well then,

What is the exact hardware?

Something wasn’t fine in the original system because we know the pool became corrupted (arguably in the same manner it does now - it ejected a drive).

I still think there is a common faulty component. If Seagate tools say the drives are fine, the drives probably are.

Currently using an HBA card. I’ve tried 2 different HBA cards, each time with a new set of cables (because I thought one might be faulty), and also connecting directly to the SATA ports on the motherboard. So I’m sure it’s very unlikely to be a cable or SATA connection issue. The problem persists no matter how the HDDs are connected.

There’s good ventilation in the case and an extra, dedicated cooling fan for the HBA card, the heat-sink of which only gets slightly warm to the touch, so I know it’s not overheating. The HDDs all run at around 30-35 degrees C, whether new or original, in the working pool or the pool I’m trying to re-create.

I’ve also tried changing the order in which the HDDs are connected, whether to the motherboard or an HBA card, by swapping round the SATA data cables. And for good measure I’ve tried swapping the SATA power cables round too. The issue always follows the original problematic HDD, irrespective of which SATA or power cable is connected to it and irrespective of which of the 4 original HDDs I try.

Could TrueNAS have written something into the boot or end sectors of these HDDs that a full wipe or sanitizing format hasn’t been able to erase - so that TrueNAS is still recognizing the formatted HDDs as belonging to a legacy pool in some way? When I insert any one of those disks next to the 3 new ones, TrueNAS initially lists it in Storage>Disks, briefly, showing 4 Unallocated Disks. Then it disappears, the 3 new HDDs progressively disappear and then my otherwise working second pool starts to break down.

If I completely remove the TrueNAS boot drive, which is a Samsung 500GB SSD, replace it with a new SSD and install a fresh download of TrueNAS onto it, might that sort things out? The new TrueNAS installation wouldn’t have any legacy memory of any of the pools or HDDs. I’ve not tried that yet because I don’t want to risk compromising my working pool (which is a RAIDZ1 pool).

Because that’s an invalid command.
Drop the “zpool” bit and add the device you want to wipe, so for example:
sudo wipefs -a /dev/sdX
Mind you, be careful what device you point it at, it’s a destructive command.
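
On your question about boot or end sectors: ZFS keeps four copies of its vdev label, two at the start of the device and two at the end, so a wipe that only touches the front of the disk can leave the trailing labels intact. A rough sketch for zeroing both ends with dd (assuming the disk is /dev/sdX - double-check the device name, this is destructive too):

sudo dd if=/dev/zero of=/dev/sdX bs=1M count=10
sudo dd if=/dev/zero of=/dev/sdX bs=1M count=10 seek=$(( $(sudo blockdev --getsz /dev/sdX) / 2048 - 10 ))

The second command writes the last 10MiB of the disk: blockdev --getsz reports the size in 512-byte sectors, and dividing by 2048 converts that to MiB.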

Sorry, maybe I didn’t express myself clearly. I’d actually done what you suggest a few times and still have the problem. With 3 new HDDs in the TrueNAS there’s no problem. They get recognised as 3 Unallocated Disks. When I then add one of the original, sanitized HDDs it briefly gets recognised as a 4th Unallocated Disk. But if I try to create a pool it always fails, all the Unallocated Disks disappear and shortly afterwards I start to get errors in my working Pool. Remove the original HDD and normal service is restored within a minute, with the errors on the working Pool having corrected themselves.

I’m assuming TrueNAS may have logged the serial numbers of the original 4 HDDs that were used in the Pool that got corrupted and is perhaps recognising them and blocking their re-use even after Windows>command prompt>diskpart>clean, SeaTools>Erase>Sanitize, TrueNAS>Shell>wipefs etc.

I still think it’s hardware or firmware. Even assuming hypothetically it does persistently hate a disk (which I am pretty sure it does not), there is no sense in crashing the working (the other) pool.


A test you can try. Power off, disconnect the data cables only from your good pool. Leave the power connectors in place. Power up with all 4 of the drives you want for the new pool. What happens?

If it fails again, power down, disconnect the power to your good pool drives, and power the system back up. What happens now to your new pool drives? Are they all recognized?

The first scenario tells you if there is a pool name conflict of some sort. The second scenario tells you if it is power related.

Hardware fails so do not assume it is all working properly until you have verified it.

And yes, the older drive could have some data on it telling it that it belongs to another pool, if you didn’t wipe it correctly.
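
If you want to see whether any ZFS label actually survived your wipes, zdb can dump what is on the disk. A quick sketch, assuming the suspect drive currently shows up as /dev/sdX (labels usually sit on a partition rather than the bare disk, so try the partition too if one exists):

sudo zdb -l /dev/sdX
sudo zdb -l /dev/sdX1

If all four label slots report something like ‘failed to unpack label’, the disk is clean as far as ZFS is concerned; if any slot prints a pool configuration, there is leftover metadata.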

Hi again, sorry for the delay responding. Life gets in the way sometimes.

Okay:- With the data cables disconnected from my good Pool and the power cables for the good Pool remaining connected I get exactly the same symptoms from the 3 new and one re-used (wiped, formatted and sanitized) drive. If I boot with the re-used drive installed none of the HDDs are recognised. If I boot with just the 3 new drives installed they get recognised as Unallocated Disks. If I then add the re-used drive (luckily my case has hot-swap drive bays) I see 4 Unallocated Disks listed. But if I then try to create a pool it immediately fails and the 3 new drives plus the re-used drive all disappear from Storage>Disks.

If I understand you correctly that may indicate some sort of Pool name conflict.

If I then disconnect the power cables from my good Pool exactly the same thing happens all over again.

So that would tend to eliminate a power supply problem. Is that right?

And just for completeness, it doesn’t matter which of the 4 original HDDs I use to run these tests. The failure to create a new Pool using 3 brand new HDDs and any one of the originals is exactly the same. And after each attempt the new HDDs all disappear from Storage>Disks. But if I remove the re-used disk all 3 new ones return and are listed as Unallocated Disks.

To clear the re-used HDDs I’ve tried using the Storage>Wipe (Full with zeros) command in TrueNAS, putting them in a USB case and running Windows Diskpart>List Disk>Select (disk3)>Clean and also Using SeaTools>Erase> Sanitize.

Do you have your old pool name in use currently? You may have to work with the command line: zpool list and zpool destroy.

You can check the GUI to see if any pools are shown for IMPORT or anything odd when you look at the EXPORT/DISCONNECT options.
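
For example (a sketch, assuming the suspect disk is still /dev/sde as in your earlier post; note that ZFS labels normally live on a partition rather than the whole disk, which may be why labelclear against /dev/sde failed):

sudo zpool list          # pools currently imported
sudo zpool import        # with no arguments, lists pools visible on disk but not imported
sudo zpool labelclear -f /dev/sde1

Check lsblk first to see what partitions actually exist on the disk.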

If that fails to work, try this.

  1. Leave ONLY your suspect drive (and boot drive) installed in the system, disconnect everything else since those appear to be working correctly.
  2. Boot TrueNAS.
  3. Log in as root on a CLI or the GUI System → Shell (I prefer the CLI via SSH).
  4. Type lsblk and press Enter.
  5. You should have a list, sda, sdb more than likely. The 3 TB drive would be the suspect drive, the boot drive hopefully a lot smaller. Disregard the partitions if any are listed.

I have not used this command but I think it will be faster than dd, and it’s been around for a while.
6. Enter blkdiscard -f -s -v /dev/sda (where sda= the problem child)
7. What the switches mean: -f = force the operation even if the drive is mounted, -s = Secure Erase, meaning any duplicated blocks of the same data are erased as well, -v = verbose so you can see it in progress. This Secure Erase is not the same as a Secure Erase on an SSD.
8. Once completed, reboot and see if the drive remains an available drive. If it looks fixed, add the three other drives, see if all still looks good. Of course power down and on as needed.
9. If the problem is NOT fixed, run the command again but add the -z switch to zero out the blocks. This will take longer of course. Here is what that command looks like: blkdiscard -f -s -z -v /dev/sda.
10. If this fails to work, you are going to need to resort to the dd command to wipe the entire drive (see the sketch below). This takes a fair amount of time. With 3TB it should not be terrible. Those who have 20+TB drives would hate it.
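
For reference, the full-drive wipe in step 10 would look something like this (a sketch, assuming the problem drive is /dev/sda; it destroys everything on the disk):

sudo dd if=/dev/zero of=/dev/sda bs=1M status=progress

It will end with a ‘No space left on device’ message, which is expected when zeroing a whole disk.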

Please post the results, I am very curious how the blkdiscard command works. Now I’m thinking about tossing in a spare hard drive that came from a Mirror that is active on my server and seeing what happens. My problem is I would not be able to recreate your specific issue.

Good Luck.

Hi Again. No luck I’m afraid.

I can enter <blkdiscard -f -s -v /dev/sda> with or without sudo as a prefix, and with the ‘problem child’ (which is actually a 10TB drive) connected via my HBA card or directly to a SATA connection on the motherboard. I get asked for my password, which is normal. But then it fails with the following message:

BLKDISCARD ioctl failed: Operation not supported.

After trying each way, the ‘problem child’ HDD is listed as a single Unallocated Disk. But when I add the 3 new HDDs I hear what sounds like a bunch of HDDs initialising. This goes on for about 10 minutes, during which I get 1, 2 and eventually 3 Unallocated Disks shown in Storage. But when I click on <Storage, Disks> only the three new HDDs are listed. The ‘problem child’ is not listed.

And I’ve tried the same thing with each of the 4 original 10TB HDDs, so it’s not just a single faulty HDD that’s the ‘problem child’.

I’m not sure how to run the dd command. Is it simply <dd /dev/sda> or should I have further ‘switches’ in the command line? I know that with a 10TB drive it’ll take ages. Before I start and have to let it run its course, will dd do anything different from SeaTools Erase, Full Sanitize overwriting with zeroes?

Thanks.

I think your system configuration is messed up. It might be time to export all your pools, do a fresh install and then import your pools.
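
From the CLI that would look roughly like this (a sketch - ‘pool_2’ is a placeholder for your good pool’s actual name, and the GUI’s Export/Disconnect and Import Pool options do the same job more safely):

sudo zpool export pool_2
# ...fresh TrueNAS install, then from the new system:
sudo zpool import pool_2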

What are the details of your system? Apps, etc? I don’t know how to look at your System Configuration database for conflicts. Maybe other posters will have ideas too?

@joeschmuck and others, for reference you can use dd to wipe the first 10MB of a drive. This is fast, about as fast as wipefs, on any drive size, and erases any superblocks, partitions, or other data that would interfere with reusing a drive, without having to dd the entire drive.
sudo dd if=/dev/zero of=/dev/sdX bs=1M count=10 status=progress

I have used it in the past with success when I needed a bunch of used drives cleared for reuse.

If you wish to do the entire drive, do it in a tmux session and remove count=10 from the command. Be aware that on big drives it may take days to run, so other methods are recommended for a full drive wipe.
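
As an aside, the ‘BLKDISCARD ioctl failed: Operation not supported’ error you hit earlier is what I would expect on spinning drives: blkdiscard relies on the device supporting discard/TRIM, which is a flash feature that most HDDs simply don’t have. You can confirm with lsblk’s discard report:

sudo lsblk --discard /dev/sdX

All-zero DISC-GRAN and DISC-MAX columns mean the device doesn’t support discard at all, so dd is the right tool here.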

There is a possibility the drives came from a Dell or EMC server and were formatted to a special sector size. Such drives are often formatted with a non-standard sector size, such as 520, 524, or 528 bytes per sector, rather than the standard 512 bytes or 4Kn (4096 bytes). They can be reused but have to be low-level formatted using sg_format (from sg3_utils), which is best done in a live Linux environment and not in TrueNAS.

The Linux command in bash would be:
sudo sg_format --format --size=512 /dev/sdX to change it back to standard 512.
or
sudo sg_format --format --size=4096 /dev/sdX to change to 4Kn (4096 bytes).

So if you bought cheap drives, they may need to be reformatted at a low level.
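
Before going down that road, it’s worth checking what sector size the drives actually report. A quick check (assuming the drive is /dev/sdX):

sudo lsblk -o NAME,LOG-SEC,PHY-SEC /dev/sdX

LOG-SEC should read 512 or 4096; anything else (or a drive that refuses to enumerate at all) would point to the non-standard formatting described above.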


Hi, Thanks for the suggestion(s).
I tried running <sudo dd if=/dev/zero of=/dev/sdX bs=1M count=10 status=progress> and it appeared to run okay. Afterwards, however, the symptoms were exactly the same as before.

Specifically, with any of the ‘problem child’ disks as the only HDD connected in the TrueNAS, whether via the HBA card or directly to the motherboard, that HDD appeared in Storage and was shown as 1 Unallocated Disk. When I then added the 3 brand new HDDs I got the sound of initialising drives for at least 10 minutes, before ending up with the 3 new HDDs listed as Unallocated Disks and the original having disappeared from Storage>Disks again.

I bought the 4 ‘problem child’ HDDs new in 2018 and ran them until around July 2025 in a Netgear ReadyNAS RN104. (They were not cheap drives. They were brand new and warrantied by Seagate etc.) As the Netgear ReadyNAS was no longer receiving service or security updates and had become an obsolete model, I decided to build a TrueNAS using components from an HTPC.

Motherboard is an Asus Z97A, CPU an Intel Core i5-4670K running at 3.4 GHz, 16GB of Corsair RAM, and a SAS HBA card compatible with the LSI 9305-16i in IT mode (with an extra HBA cooling fan); the power supply is a Corsair 750 Watt modular unit. TrueNAS is installed on a Samsung 500GB SSD connected directly to the motherboard and there’s a second Samsung 500GB SSD for apps, on which only Plex is installed just now.

After building the TrueNAS, the Pool created using the 4 original HDDs worked perfectly for 5-6 months. Then I started to get error messages from the TrueNAS, by email, telling me the Pool was ‘degraded’ because one of the disks had been removed by the administrator - even though I’d done nothing of the sort. But there were sufficient disks remaining to keep the Pool working. No matter what I tried I couldn’t reverse the position, but I did note that it was different HDDs that had allegedly been removed by the administrator. I tried everything I could find online to recover/restore things but nothing worked, usually ending with a message telling me there was an I/O error. Finally the data just disappeared. But fortunately I’d backed it all up onto a JBOD and can repopulate my Pool if I can get it working again.

Eventually I concluded that all I could do was fully format all 4 of the disks from the troublesome Pool and then try to rebuild it. But I’ve never been able to get the disks to work in the TrueNAS, even when replacing 3 of the 4 with brand new ones. (They’re in short supply and I could only get 3 at the time.) As soon as one of the originals is fitted the three new ones are taken down, then the original disappears from Storage>Disks and then the 3 new ones return as Unallocated Disks.

I’ve tried Wipe in TrueNAS, including writing zeroes, which took days, Windows DiskPart>Clean, SeaTools Erase, Full Sanitize with zeroes etc. Absolutely nothing seems to clean any of the 4 original drives so that they can work alongside the 3 new drives. In SeaTools all 4 of the original drives pass both the short and long tests.

I have no exported Pools to Import and I remember that I ‘Destroyed’ the old Pool (which was pool_1) in TrueNAS.

I’m coming to the conclusion that it may be TrueNAS itself that’s the problem. It may have logged and recorded somewhere the serial numbers of the 4 original HDDs, knowing they came from a Pool that became corrupted and failed. And it may, in some way, be preventing me from re-using any of the 4 original HDDs, rather than the HDDs themselves having some sort of irremovable problem.