Boot pool cksum error


My server started showing some strange errors a few days after updating to TrueNAS-SCALE-23.10.2. Maybe it was just a coincidence. Unfortunately the boot SSD is connected over USB 3.0 (low-power server without enough SATA ports), and the SMART test doesn’t help much.
It seems to be a problem related to checksum errors:
//usr/lib/modules/6.1.74-debug+truenas/kernel/drivers/scsi/qla2xxx/qla2xxx.ko

Thanks in advance for your help

My best bet would be: the CRC errors are a direct result of the USB connection. USB is not made for the sustained IO that a boot pool experiences. CRC is usually power / cable related so in that case I’d assume it’s the USB controller burning out.

The remedy: get an NVMe drive (if you have ports for that) or get an HBA to get more SATA ports.

Back up your configuration, reinstall, then restore the config.

You can list your complete hardware for suggestions.

Thanks for the reply.
I use 1x 2 TB NVMe and 3x 2 TB on SATA, everything in RAIDZ1, plus 1x 256 GB for the boot pool in a caddy. i7-9700, 40 GB of RAM.
The complete server runs under 20 watts at idle. If I add an HBA, it will double the power consumption :slight_smile:
Can I try replacing the caddy and reuse the same SSD for boot?

I don’t know about the power consumption of an HBA, but I doubt it will draw 20 W for a single SSD.

I use a cheap one, not the often recommended LSI HBAs, and I would say the additional power draw was < 5 W, if noticeable at all.

You can buy them used, too. Maybe cheaper than dealing with a failing boot drive every now and then, it will not get better over time.

By caddy you mean the USB connection? No, your boot pool is damaged. You can reuse the SSD; I’m fairly confident the SSD itself is okay. But you absolutely need to reinstall the OS. Without redundancy there is no way for ZFS to repair the errors that now exist.
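To see which device is actually reporting the errors, the device table printed by `zpool status` can be scanned for non-zero READ/WRITE/CKSUM counters. The following is only a minimal sketch: the sample text, the device name `sdf3` and the count `14` are made up for illustration and are not taken from this thread, and the parser assumes the usual `NAME STATE READ WRITE CKSUM` table layout.

```python
# Hypothetical sample in the usual `zpool status` layout (not real output from this server).
SAMPLE = """\
  pool: boot-pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        boot-pool   ONLINE       0     0     0
          sdf3      ONLINE       0     0    14

errors: Permanent errors have been detected in the following files:
"""


def devices_with_errors(status_text):
    """Return {name: (read, write, cksum)} for table rows with any non-zero counter.

    Counters are kept as strings because zpool may abbreviate large values.
    """
    bad = {}
    in_table = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_table = True  # header row found, device rows follow
            continue
        if in_table:
            if not fields:
                in_table = False  # a blank line ends the device table
                continue
            if len(fields) >= 5:
                counters = tuple(fields[2:5])
                if counters != ("0", "0", "0"):
                    bad[fields[0]] = counters
    return bad


print(devices_with_errors(SAMPLE))
```

On a real system the text would come from running `zpool status -v` on the boot pool (named `boot-pool` on SCALE, as far as I know) instead of the sample string.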

You didn’t mention your mainboard, do you even have a free PCI slot?

No, that’s not likely. Things can fail over USB or any interface, but data corruption is not something to expect just “because USB”.

What are you using, exactly?

This is important. It’s hard to recommend something without knowing the details of the situation. Obviously, an HBA would be the shotgun approach, but perhaps we can avoid going that far.

It’s a Crucial SSD inside a Sabrent caddy.

HP ProDesk 400 G6 MT. I have 3 PCI slots free: one x16 and two x4, if I remember correctly.
The motherboard will not boot from storage connected to PCI slots. At least that’s what I found on Google. Also, going with a cheap 4- or 6-port SATA expansion card based on the ASM1166 (for low power consumption) is, I believe, not the best idea. A good LSI HBA will draw at least 7-8 watts at idle (this is what I read on forums).
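For a rough sense of scale, the figures quoted so far (about 20 W idle for the whole server, under 5 W for a cheap SATA card, 7-8 W for an LSI HBA) can be put side by side. This is only a back-of-the-envelope sketch using those forum numbers as assumptions; none of them were measured here.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years


def added_load(base_watts, extra_watts):
    """Return (percent increase over base, extra kWh per year) for an always-on card."""
    percent = 100.0 * extra_watts / base_watts
    kwh_per_year = extra_watts * HOURS_PER_YEAR / 1000.0
    return percent, kwh_per_year


# Draw figures are the ones quoted in the thread, not measurements.
for label, extra in [
    ("cheap SATA card (upper bound)", 5.0),
    ("LSI HBA, low estimate", 7.0),
    ("LSI HBA, high estimate", 8.0),
]:
    pct, kwh = added_load(20.0, extra)
    print(f"{label}: +{pct:.0f}% idle power, about {kwh:.1f} kWh/year")
```

Under these assumptions none of the options doubles the idle draw; that would take an extra 20 W.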

Sorry, maybe that wasn’t clear: I was talking about a failing USB interface, not normal operating conditions. The paragraph continues:

USB is not made for the sustained IO that a boot pool experiences. CRC is usually power / cable related so in that case I’d assume it’s the USB controller burning out.

Unless you don’t believe a failing USB interface can introduce CRC errors; in that case I didn’t quite catch your point.

Have you tested something like this?
Maybe I can take one SSD that is now connected to a motherboard SATA port, move it to a PCI slot with an adapter card, keep using it in the storage pool with the other SSDs in RAIDZ1, and use the freed motherboard SATA port for the boot SSD?

I read about this. It’s not recommended to run storage over USB, but it was my only option at the time :blush:

Did I just get redirected because of the EU domain or are you from Germany?
I could send you the specific model I’m using (from Delock) if you are interested. However, I would not recommend it; I purchased it when I didn’t know better, although until now it has worked without problems. You will probably find an appropriate LSI HBA on the used market in a similar price range. That is what I would recommend. Important note here: it needs to be flashed to IT mode.
As I don’t own one myself yet I can’t really give you details on that, but the old forums are full of information on HBAs and their setup.

I’m in Germany. An LSI in IT mode draws 7-10 watts just from being plugged into the PCI slot :blush: It’s not the best solution for me (I know it’s the best way for performance).
Please send me the exact model you use. I will try to find it here.

Thank you

Again, I do not recommend going the same route:

I didn’t measure the power draw, I didn’t really notice it though.

Thank you. I will check the reviews and order one. Anyway, it is better than USB. It should draw less than 5 W.
Now I don’t know whether I can mix the storage pool SSDs: one SSD on the HBA, one NVMe SSD, two SSDs on motherboard SATA, everything in RAIDZ.
The free SATA port on the motherboard would then be used for the boot pool.

But for the moment, is there any way to fix the corrupted files without reinstalling TrueNAS?

Actually, on CORE I am using an MLC USB stick and it’s giving me no issues; many more users run their boot SSD via USB, and we very rarely see this kind of issue.

@JustMe please read “Multiply your problems with SATA Port Multipliers and cheap SATA controllers | TrueNAS Community”; if you are using proper hardware, TrueNAS does not care whether the drives are connected to an HBA, a PCIe slot, or the motherboard’s controller.

Thanks for the advice.
Also, my server ran with the caddy for almost 2 years without any problem. Until this happened :grin:

Yeah, there are reliable USB flash drives, you just have to pay the substantial premium. Still less than the added cost of an HBA.

Sounds to me like it’s the SSD, then. SATA bridges have gotten surprisingly good in recent years.

Not in my experience so far, but time will tell :slight_smile:
It’s just a boot drive for my easily accessible NAS after all, I can just bend over the deck and replace it if needed.

Do you believe the SSD is failing? I can replace it, that’s not a problem. I’ll go there on vacation next month (the server is running 2000 km away from me, at my parents’ house, and that’s why I must keep its power consumption low).
What’s your recommendation? Caddy or HBA?

Also, at this point, can I do something to fix the file corruption?

You just need to reinstall TN and upload your system’s config backup. Only the boot drive got corrupted, everything else is fine.

Reinstalling TrueNAS is an easy job. The big problem is adding and configuring all the apps again :pleading_face: