Hard drive disappeared after replacing Proxmox with Truenas

Proxmox was previously installed, with a ZFS pool on 4 x 8TB HDDs. The OS was hosted on an NVMe SSD.

Put TrueNAS on a bootable USB, installed over the top of Proxmox and wiped it. There was no data in the ZFS pool.

Upon successful install of the TrueNAS OS, only 3 of the 4 drives show up. I followed instructions from a post in these forums to run the "lsblk" command to check devices, except when I do this in the shell it says "command not found".

There are no answers that I can find on Google or YouTube that explain what it means if the lsblk command is not found or how to fix it (other than one post where a user points to a custom script for checking devices, with no explanation of what it is or how to install or use it).

Any help would be greatly appreciated

I have tried looking at the devices in the BIOS, but as they are attached to an LSI HBA card rather than the chipset's own SATA controller, the BIOS cannot see any of them.

Try

sudo lsblk

Hasn’t worked. I have tried pretty much everything and I cannot seem to get it to show up for some reason.

I got an 8TB RMA'd. Bought a new 14TB because that died. Now I have another separate 14TB which won't show up on this, but will show up on another Ubuntu server (22.04).

I tried wiping and formatting the drive, with a partition, without a partition, swapping power cables, data cables, and different cable positions on the HBA card. Nothing. Neither camcontrol rescan, camcontrol devlist, gpart rescan, nor any other command seems to show the drive; however, all the other drives are there. It can't be the drive being incompatible, because 2 of the others are identical in brand and size. I don't think it's the HBA, as that sees 6 other drives fine in various configurations. I have swapped PSUs and PSU cables.
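
For reference, these are roughly the FreeBSD-side commands for re-checking what the system sees, run as root from the TrueNAS shell (output and device names will obviously vary per box):

camcontrol devlist       # every disk the CAM layer currently sees
camcontrol rescan all    # force a rescan of all buses, including the HBA
gpart show               # partition tables on the disks that did attach
geom disk list           # per-disk details (model, serial number, size)
dmesg | grep -E "^(da|ada)[0-9]"    # kernel attach/detach messages for SAS and SATA disks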

This is the final step of a long server build, so I am somewhat pulling my hair out over what is going on here.

Weirder still, I now can’t do SMART tests on the existing drives? What on earth lol

Error: Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 141, in call_method
    result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1242, in _call
    return await methodobj(*prepared_call.args)
  File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/smart.py", line 411, in manual_test
    verrors.check()
  File "/usr/local/lib/python3.9/site-packages/middlewared/service_exception.py", line 62, in check
    raise self
middlewared.service_exception.ValidationErrors: [EINVAL] disks.6.identifier: {serial_lunid}FYC5N01051340565Y_ace42e003a5dd5072ee4ac0000000001 is not valid. Please provide a valid disk identifier.

I took it out again, ran a SMART test on it, no errors, all OK. Back in the TrueNAS box, still not visible. Maybe it’s the motherboard or HBA being fuckey :face_with_raised_eyebrow:
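
When the middleware rejects the identifier like that, smartctl can still be run against the disk directly from the shell (the /dev/da0 below is only a placeholder; match it to whatever camcontrol devlist reports for the suspect disk):

smartctl -t short /dev/da0    # start a short self-test on the suspect disk
smartctl -a /dev/da0          # once it finishes, dump the SMART attributes and self-test log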

So it turns out lsblk did not work because I am on the free version, TrueNAS CORE, which does not include this command.

I was able to use the camcontrol command instead to troubleshoot. It is looking like it may be a fault in the LSI HBA card, which seems to only see 6 of the 8 hard drives, even with cables or hard drives swapped. Plugging 4 of them into the mainboard SATA ports allows all to be visible. No idea why. The HBA is flashed to IT mode, version 20. No idea if there's anything else I can do at this point.
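
On CORE (FreeBSD 13) the 9200-8i is handled by the mps(4) driver, so the kernel log and mpsutil, which should be in the FreeBSD base system, can show what the card itself thinks is attached; this is only a sketch and assumes a single controller (mps0):

dmesg | grep -i mps      # driver attach messages and any PHY/link errors from the 9200-8i
mpsutil show adapter     # controller model, firmware and BIOS versions as the driver sees them
mpsutil show devices     # drives the HBA reports per PHY; missing disks should stand out here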

TrueNAS Community Edition (formerly SCALE) is also free, and will include (most of) the Linux troubleshooting commands you’re more familiar with.

Assuming this is a SAS2008-based card, given the P20 reference. Check the cables for sharp bends, fraying, or corrosion on the connectors and retry. Also ensure that the card is being sufficiently cooled.

It would probably help others here if you could give full details on what hardware is being used, as well as the software version. The reason lsblk didn't work is that, in the absence of other information, it was assumed you were running TrueNAS SCALE (which is the recommendation nowadays; it's also free).

Your HBA being version 20 doesn't say much, because P20 is just the overarching release number; the relevant detail comes later in the version string. It would also help to know exactly which model you have, but that goes back to the very first point: give a full description of the hardware involved.

It would probably help others here if you could give full details on what hardware is being used

True, sorry.

TrueNAS-13.0-U6.4 on an AMD Ryzen 3 3200G, Gigabyte B450 AM4 board with a 9200-8i HBA card, FW ver 20.00.07.00, NVDATA 14.01.00.08, x86 BIOS 07.39.02.00.

For all 8 hard drives to be seen, I have plugged 4 in via SATA to the mainboard, which is working. If I plug them back into the spare HBA port, with either cable, in either port, only 3 show up.

The 4x Mobo SATA are 14TB, Exos
The 4x HBA are 8TB, Seagate Barracuda.

I'll upgrade to SCALE as soon as the resilvering process is done. I got this version as I was a bit confused about what each version of TrueNAS did, and then I never saw the updates because the TrueNAS server couldn't reach them due to a DNS address issue, but that's fixed now at least. The problem is I don't want to update until the resilvering is done, as I'm worried it'll further risk something going wrong with the data. At the moment it's looking like well over a week until it's done.

I read elsewhere that it was fine to shut down and restart during a resilver, but now TrueNAS gives me no ETA and starts over from 0% if I reboot, so unfortunately it seems I have lost a couple of days of resilvering time. I have a pool of two 4-disk RAIDZ1 VDEVs that's 25% full; I know that's a slow topology choice. I also only have 8GB of RAM, which doesn't help. I saw someone mention turning off hard drive power saving in the BIOS, but I haven't been able to find that setting.
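
For tracking the resilver outside the GUI, the shell at least reports progress directly once ZFS has re-estimated it (a minimal check; it lists all pools if no pool name is given):

zpool status -v    # the "scan:" line shows resilver progress, speed, and estimated completion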

Expand the Servers section in neofusion's post right above. That is the level of detail you need to post so we know how you have your system set up. Your Seagate Barracuda drives are probably SMR and not suited for use with ZFS and TrueNAS. You need to check the specifications for the model numbers of your hard drives.

SMR vs CMR ServeTheHome
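
A quick way to pull the exact model numbers off the running system, rather than from the drive labels, is smartctl (on CORE the disks appear as da0, da1, and so on; the name below is a placeholder):

smartctl -i /dev/da0 | grep -i model    # repeat per disk, then check the model against the SMR/CMR lists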


lsblk is not included because it’s a Linux command and TrueNAS CORE is not based on Linux.

camcontrol devlist
gpart show

are available as you already found out.
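
glabel status is also handy alongside those two, since pools on CORE usually reference gptid labels rather than daX names (a small sketch):

glabel status    # maps gptid/... labels back to daXpY partitions, so pool members can be matched to physical disks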


Ok sure:

MB: Gigabyte B450-M
CPU: AMD Ryzen 3 3200g
RAM: 8GB DDR4 3200 C18
PSU: Corsair 850iax
Boot: SK hynix BC711 HFM256GD3JX013N
Storage: WD Ultrastar HC550 WUH721414ALE6L4 x2
Storage: Seagate EXOS X16 14TB ST14000NM001G x2
Storage: Seagate Barracuda ST8000DM004 x6 (2 dead, 2 degraded)
HBA: LSI 6Gbps SAS HBA 9200-8i

The weird thing, though, is that it wasn't the SMR drives which weren't showing up. If SMR is what killed the other two, is there a way to revive them? Neither showed any errors; they just one day weren't visible to the system at all. I have tried them in other Linux and Windows systems, but neither sees them.

Frustratingly, after a 10-day resilvering process, two of the SMR drives are also now somehow degraded. I have used the command zpool iostat -v 1 to watch the drives during a new scrub, which I hope will fix the degraded status, as a SMART test of all drives showed no errors. The output of iostat showed, if I am understanding it correctly, that surprisingly the SMR drives are writing at ~80-100, while the 4 drives not on the HBA (the CMR Ultrastar drives) are writing at between 3 and 6. Unless the 3-6 is Gbps and the 80-100 is Mbps or something? The units aren't labelled, so I'm not sure.
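
For reference on the units, zpool iostat prints three column groups per device, which should disambiguate the bare numbers; values with no K/M/G suffix are most likely from the operations columns rather than MB/s:

# Column groups per pool/vdev/disk, refreshed every second (Ctrl+C to stop):
#   capacity   alloc / free
#   operations read / write   (I/O operations per second, plain numbers)
#   bandwidth  read / write   (throughput, printed with K/M/G suffixes)
zpool iostat -v 1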

I will follow the steps for helping SMR drives, but I don't have a couple of thousand dollars sitting around to drop on a full replacement with brand-new non-SMR drives. So if anyone has any tips on how to deal with that, I'd appreciate it. The majority of the data will, once written, only be read.

Ok, so the pool scrub has finished after ~4 days. Two of the 8 drives (the SMR ones) remain degraded, even after a scrub. No SMART errors on a short test.

Any suggestions on how I can fix the degradation, and any settings I need to change to make sure I don't kill the SMR drives? Will they be OK if I just leave them until I start getting SMART errors, or what is likely to happen? I probably won't have money to replace any drives for some time.

I also, unfortunately, am unable to update. Another forum thread said my version has driver issues due to being FreeBSD-based, and I was hoping that switching would fix the network drop-out issues I'm having. However, when I attempt to apply the latest update I get this error:

Error: [EFAULT] Traceback (most recent call last):
  File "/usr/local/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/local/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/var/db/system/update/squashfs-root/truenas_install/__main__.py", line 21, in <module>
    from .utils import getmntinfo, get_pids
  File "/var/db/system/update/squashfs-root/truenas_install/utils.py", line 9, in <module>
    @dataclass(frozen=True, kw_only=True)
TypeError: dataclass() got an unexpected keyword argument 'kw_only'
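
The kw_only argument to dataclass() only exists from Python 3.10 onward, so the installer script in the newer update image apparently cannot run under the Python 3.9 bundled with this release; a quick check of the interpreter on the box (a sketch) makes that visible:

python3 -V    # reports 3.9.x here, while the installer in the newer update image expects 3.10+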

Edit: I was getting that error because I was trying to upgrade straight to either the latest stable or the latest beta. Instead I needed to choose 22.04 first, apply that update, and then, once that was done, update to the latest release. Great! The performance in the new version is noticeably better. I also no longer get dropouts across big (multi-TB) transfers. Now that the ethernet issues and the resulting sluggish UI are no longer masking what is happening during transfers, I can see that the transfer speed fluctuation is coming from the drives themselves, which I assume wasn't playing nice with the old drivers and was causing all the dropouts and weirdness during transfers.

I am now much more urgently going to try to replace the 8TB drives, as I think they will noticeably impact users' ability to access the server reliably. I wish the guy who sold them to me had mentioned it when I bought 6 of them, but what can you do? I guess that's life.

Bump as I’m still not sure what to do about the degraded disks. If anyone has any info it would be greatly appreciated.

Please post back using Preformatted text (</>)

sudo zpool status -v
Linux JacPool 6.6.44-production+truenas #1 SMP PREEMPT_DYNAMIC Tue Jan 28 03:14:06 UTC 2025 x86_64

        TrueNAS (c) 2009-2025, iXsystems, Inc.
        All rights reserved.
        TrueNAS code is released under the LGPLv3 and GPLv3 licenses with some
        source files copyrighted by (c) iXsystems, Inc. All other components
        are released under their own respective licenses.

        For more information, documentation, help or support, go here:
        http://truenas.com

Warning: the supported mechanisms for making configuration changes
are the TrueNAS WebUI, CLI, and API exclusively. ALL OTHERS ARE
NOT SUPPORTED AND WILL RESULT IN UNDEFINED BEHAVIOR AND MAY
RESULT IN SYSTEM FAILURE.

Welcome to TrueNAS
root@JacPool[~]# sudo zpool status -v
  pool: JacPool
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
  scan: scrub repaired 0B in 1 days 01:10:34 with 0 errors on Fri Mar  7 12:45:42 2025
config:

        NAME        STATE     READ WRITE CKSUM
        JacPool     DEGRADED     0     0     0
          raidz1-0  DEGRADED     0     0     0
            sdb2    ONLINE       0     0     0
            sda2    DEGRADED     0     0     0  too many errors
            sdf2    DEGRADED     0     0     0  too many errors
            sdc2    ONLINE       0     0     0
          raidz1-1  ONLINE       0     0     0
            sdd2    ONLINE       0     0     0
            sde2    ONLINE       0     0     0
            sdg2    ONLINE       0     0     0
            sdh2    ONLINE       0     0     0

errors: No known data errors

  pool: boot-pool
 state: ONLINE
status: One or more features are enabled on the pool despite not being
        requested by the 'compatibility' property.
action: Consider setting 'compatibility' to an appropriate value, or
        adding needed features to the relevant file in
        /etc/zfs/compatibility.d or /usr/share/zfs/compatibility.d.
  scan: scrub repaired 0B in 00:00:08 with 0 errors on Tue Mar 18 03:45:09 2025
config:

        NAME         STATE     READ WRITE CKSUM
        boot-pool    ONLINE       0     0     0
          nvme0n1p2  ONLINE       0     0     0

errors: No known data errors
root@JacPool[~]# 

If those two drives pass the SMART long tests and are SMR models, you can try 'zpool clear' to get rid of the errors.
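
As a sketch, with the pool and device names taken from the zpool status output above (double-check them against your current output before running anything):

smartctl -t long /dev/sda    # long self-test on each flagged disk (repeat for /dev/sdf)
smartctl -a /dev/sda         # review the self-test log once it completes
zpool clear JacPool          # reset the error counters if the tests come back clean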

If you have a spare hard drive port, I would suggest doing an in-place drive replacement of the SMR drives, one at a time, as you can afford replacements. You don't offline the old drive; you choose Replace instead, allow the data to resilver, and then offline the old SMR drive being replaced.
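
Roughly what that looks like at the pool level, which is what the GUI's Replace action drives under the hood (the by-id path is a placeholder for whatever the new disk enumerates as; the GUI remains the supported route on TrueNAS):

zpool replace JacPool sda2 /dev/disk/by-id/NEW_DISK    # old member keeps serving data while the new disk resilvers in
zpool status -v JacPool                                # watch the resilver; the replaced device drops out when it finishes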