Server Hardware Upgrade Check Request

OK, my NAS hardware is very old, and I basically want to do an upgrade for the reasons below. I’m hoping I can get a few eyes on this to make sure there’s nothing I’m missing. Also, is there some rundown of all the different SuperMicro X12 motherboard variants, like the old one for X11?

Reasons for wanting to upgrade:

  • Increased RAM/Processing cores/power
  • 2.5GbE networking (ISP 2 Gb coax connection)
  • Updated IPMI
  • Possible future virtualization of TrueNAS through Proxmox and expansion of server capabilities
  • Future addition of graphics card (cloud gaming PC/hardware encoding)

Current build:

  • Fractal Design Define R5 ATX Case
  • ASRock E3C226D2I Mini ITX LGA1150
  • Intel Xeon E3-1275
  • 2x Crucial CT102464BD160B 8 GB DDR3-1600 CL11

Current upgrade plan build:

  • 45Drives HL15
  • SuperMicro X12SCA-F
  • Intel Xeon W-1290P LGA-1200
  • A-Tech 128GB 4x 32GB 2Rx4 PC4-21300R DDR4 2666 ECC REG RDIMM

I’m very open to any and all feedback!

Why specifically X12? There are X13 and X14, and AMD is getting more and more interesting in the home server space, with “server Ryzen” boards from AsRock Rack (first and foremost), but also some Supermicro and Gigabyte boards (MC12, MC13), and now EPYC 4004 to take on Xeon E directly…

LGA1200 uses ECC UDIMM, not RDIMM.
The X12SCA is, nominally, a “workstation” board. The genuine server equivalent would be an X12STH (or X12STL) with Xeon E-2300.

Go straight to 10G. An Intel X550 or X710 NIC with a 10GBase-T port can still negotiate down to 2.5G/5G copper if you need it to.

Beware here…


I’m certainly not married to X12. X13 seemed to get much more expensive. I run AMD on my desktop, so I’d happily go with an AMD server build! My current build is ASRock Rack, and it’s been solid. I’m trying to find a middle ground between a solid upgrade and not shelling out a ton of money for more recent hardware with power/features I don’t really need. I’m looking at incremental upgrades, but I have an open budget if the added cost is justified. Is there a clean resource that I can use to, say, review all the MC13 motherboard variants? I think this is the main roadblock: the inability to do comparative research across a large set of motherboard options.

Ah, I see that now. Thanks for the heads up.

Understood, thank you.

What’s the concern here? I have an HBA. My understanding is that it’s fairly straightforward to pass the HBA directly to a VM, install TrueNAS, import the config, then the pools, and carry on. What are the pitfalls?
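For context, my understanding is that the passthrough step itself is roughly a one-liner on the Proxmox side (VM ID and PCI address below are placeholders, not my actual values):

qm set 100 -hostpci0 0000:01:00.0   # hand the whole HBA at 01:00.0 to VM 100 (example values)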

The pitfall is that you must also blacklist the HBA to prevent Proxmox from ever trying to mount your pools. Some have reported here losing their pools to virtualising TrueNAS under Proxmox.
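A minimal sketch of what that blacklisting can look like, assuming a common LSI/Broadcom HBA driven by mpt3sas - substitute your own driver name and the vendor:device IDs reported by lspci -nn (the 1000:0087 below is only an example):

# /etc/modprobe.d/vfio-hba.conf (illustrative; IDs and driver depend on your HBA)
options vfio-pci ids=1000:0087     # vendor:device of the HBA, from lspci -nn
softdep mpt3sas pre: vfio-pci      # let vfio-pci claim the card before mpt3sas can
blacklist mpt3sas                  # and keep the host driver from binding at all

# then rebuild the initramfs and reboot
update-initramfs -u -k all

With the host never attaching a storage driver to the HBA, it cannot see, let alone import, the pools behind it.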

The solution to this is typically to go for refurbished server parts.
Not knowing your full requirements (apparently at least x8/x8 for HBA + GPU) and use case, it’s not clear whether you’re a candidate for some second-hand Xeon Scalable. But you appear to be sizing for lots of drives and at least 128 GB RAM, which would be a pointer to RDIMM.

MC13 is Gigabyte nomenclature. There are only two variants of their AM5 server board, MC13-LE0 and MC13-LE1, with 1 GbE and 10 GbE networking respectively.
The previous generation MC12-LE0 is known to European tinkerers for having been available for as low as 50 € some months ago, but it seems that stocks have eventually been cleared.

AsRock Rack parts are the B550D4U (and previous X470D4U, X570D4U) and B650D4U families, plus the EPYC4000D4U, which may be more appropriate to bifurcate the CPU PCIe lanes.
The Supermicro H13SAE-MF is nominally a “workstation” board which, like the X12SCA, is certainly capable enough as a server. Its slots run x16/x0 or x8/x8.

That’s the round-up. The catch is that there were relatively few options (AsRock Rack + MC12-LE0) with AM4 and DDR4 UDIMM, while the newer and more diverse supply of AM5 boards uses DDR5, and genuine DDR5 ECC UDIMM is uncommon.


On Proxmox, it looks like you have to worry about boot also.

Virtualize TrueNAS


I use a virtual disk in Proxmox to boot TrueNAS and PCIe pass through for NVMe storage drives. No problem with that setup so far.
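For illustration, such a VM ends up with a config along these lines (an excerpt only; VM ID, storage name and addresses here are illustrative):

# /etc/pve/qemu-server/100.conf (illustrative excerpt; comments added here for readability)
# small virtual disk, used only as the TrueNAS boot pool:
scsi0: local-zfs:vm-100-disk-0,size=32G
# the two NVMe controllers passed through:
hostpci0: 0000:01:00.0
hostpci1: 0000:02:00.0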

A virtual disk strictly for the boot pool is considered OK, also under other hypervisors such as ESXi.


Ahhhh. OK, yes this seems like a huge risk. Just the thought of the blacklist going wrong and Proxmox corrupting the ZFS Array has me rethinking the virtualization process…

My main concern is noise, and my secondary concern is power draw. From the pretty extensive research I’ve already done, it seems near impossible to run old server parts without massive noise. My rack is in my basement, and I can’t really have server-grade noise without causing a problem. I’ve also heard power draw on old parts is really high. If there are options that aren’t crazy loud/power hungry, I’m all ears!

Thanks for that rundown. If I wanted to dive into a deeper comparison, where can I go to get into the rabbit hole of mATX/ATX/eATX motherboard options from the last ~5 years from ASRock/SuperMicro/Gigabyte?

This was my plan, but I’m second-guessing now because of the risk of corrupting the ZFS array if Proxmox accidentally tries to boot from any of the pool drives…

Rackmount chassis are typically very noisy. Though our former Resident Grinch suggested one could take a large 4U storage chassis, fill it only 50% in a checkered pattern to leave large airflow channels, and then (only then) use quieter fans behind the array.
You can also put a server motherboard in a supposedly quieter consumer-style case.

Spinning drives make noise. Spinning drives draw current. Lots of spinners will make noise and draw some significant current even at idle.
Xeon E/Core/Ryzen/Xeon D-1500/Atom C3000 idle low or very low (< 10 W). But these are limited in PCIe lanes, and the lowest consuming embedded platforms are also limited in compute if you want to run heavy workloads on your storage.
Xeon Scalable (and Xeon D-2100) idle around 60 W, which is a lot for idling but possibly still less than the spinners spinning. And then these provide lots of PCIe lanes, can take lots of RAM, and may provide lots of computing power.

Your use case, your choice.

For AM4/AM5 that’s in my above post, minus networking variations on AsRock Rack boards and the odd mini-ITX board, which is not the direction you’re taking. If you mean comparing across ALL server platforms, I’m afraid you have to dig through the website of each manufacturer, one by one…
Or maybe you know of a price engine with extensive filtering options and can take it from the “supply” side. For the European market, Geizhals.at is very useful, BUT the way it is structured, you’d still have to do separate searches by socket/processor kind.

Could someone knowledgeable please elaborate a bit on that or post a link?

Proxmox running with a ZFS pool as its main storage will import arbitrary other ZFS pools at boot, even when they are not configured as Proxmox storage? Don’t you have to explicitly set up every storage backend and also assign what type of objects (virtual disks, ISO images, backups, …) it is supposed to hold?

Assuming it will do that in some unpredictable way - why would the pool be destroyed? Just export it again, no?

I have two Proxmox VE 8.3 environments currently, each booting from SATA SSDs with ZFS and each with two NVMe SSDs passed through to TrueNAS. It worked out of the box. I still don’t quite understand the risk.
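For what it’s worth, one way to look at the exposure from the host side, using standard tools (nothing Proxmox-specific):

zpool list     # pools the host currently has imported
zpool import   # pools the host *could* import, i.e. what a stray scan or "zpool import -a" would pick up
lsblk          # whether the passed-through disks are visible as block devices at all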

Thanks and kind regards,
Patrick

Excellent points. The hardware list in my original post is looking at 4U “consumer” cases to avoid that noise. When I speak of noise, I’m basically referring to the jet-turbine fans that I understand to be in essentially all refurb 1U/2U servers. My research basically shows that if you’re going the refurb server route, this is unavoidable? I already have eight spinners in a desktop case, that noise is essentially nothing, and I’m comfortable with the power draw. My power concern is more what you were getting at with server-grade CPUs. I really appreciate the info on the processor options!

Thank you!

I’m not the knowledgeable one you’re asking about, but the risk as I understand it is that, say, the Proxmox boot drives suddenly become inaccessible and the BIOS decides to look for another boot drive. It grabs the first disk from the HBA and writes to it, potentially corrupting the ZFS array. For me as an example, I didn’t even consider this issue because I’m a beginner, so the risk of me messing around in Proxmox and accidentally exposing the array outside of the TrueNAS environment is less than zero. It sounds like as long as you know what you’re doing and have experience, passing the HBA to the virtualized NAS should work without issues.

Anyone please feel free to tell me if I’m misunderstanding this issue.

I’m not personally knowledgeable in Proxmox, but there have been quite a few reports here of lost pools under Proxmox virtualisation, some with dubious settings, but some even with a properly passed-through HBA. @Honeybadger discussed disabling the Proxmox scanning service here

before blacklisting the HBA (as per the link to the Proxmox forum) appeared to emerge as the preferred solution.
I could not find a more authoritative post here in a quick search.
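If it helps, the scanning in question appears to be handled by the standard OpenZFS systemd units on the host rather than anything Proxmox-specific, so on a stock install one could at least check them and turn off the scan-based import (leave zfs-import-cache alone if the host itself boots from or uses ZFS pools):

systemctl status zfs-import-scan.service zfs-import-cache.service   # see what is enabled
systemctl disable --now zfs-import-scan.service                     # stop scanning all disks for importable pools at boot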

It would certainly be useful if someone knowledgeable could put out a validated Resource.

Simultaneous mounting by Proxmox and TrueNAS corrupts pools beyond repair.

Simultaneously? :scream:

Of course. But with TrueNAS running and proper PCIe passthrough, the drives are completely out of reach of the host.

Aren’t they? (insert Anakin and Padme meme)

I do feel like I effectively asked a few different questions, so I want to start by asking one to help narrow my upgrade path considerably:

Is it true that the overwhelming majority of refurb enterprise servers are “very loud” due to the use of small, high-RPM fans?

Basically if this is true, I’d like to focus on purchasing my own components and building into a 4U consumer case. If this is untrue, I’d like to explore purchasing older enterprise rack servers.

Yeah. Somehow Proxmox does it.

Needless to say, once it happens your pool is toast.

This has never been a problem with other hypervisors, as they don’t understand ZFS.

It’s possible that correct PCIe passthrough does resolve it, but I’m fairly confident we’ve seen reports of this failure mode even when passing HBAs through.

The best solution seems to be a belt-and-braces approach of passthrough + blacklisting… but the next time you add a PCIe device, you risk renumbering all your devices…
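One partial mitigation (a sketch, not a guarantee): key the host-side vfio binding to vendor:device IDs rather than bus addresses, since the IDs survive renumbering, and re-check the VM’s hostpci entry after any hardware change (the vendor ID and VM ID below are only examples):

lspci -nn -d 1000:                            # list LSI/Broadcom devices with their current bus addresses
grep hostpci /etc/pve/qemu-server/100.conf    # confirm the passthrough entry still points at the HBA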

Refurbished or new doesn’t matter. The race for standard rack-mount servers, until recently when power consumption became the main focus, has been to cram as much power, memory and storage as possible into a single rack unit.

Because you rented colocation space by the rack unit. And electricity was cheap.

That’s why servers became ridiculously deep. And standard 800 mm racks will not fit many current servers anymore. We are at 1000 or even 1200 mm rack depth currently.

And this is combined with high-performance, reliable 40 mm fans to cool all that stuff in that single-unit chassis.

In a data centre noise is irrelevant.

So, yes, most available rack-mount systems are not really fit for a home lab. Unless you live in a house instead of an apartment and can dedicate a room in the cellar or the attic to your IT infrastructure.

But running serious enterprise equipment in an apartment or even a “server room” in an office - in most cases just “no”.


Thank you. This is how I understand the environment; I just wanted to make sure I wasn’t dismissing others’ suggestions to go with older enterprise equipment without being certain I was doing it based on the right premise.

Still a puzzle … :slight_smile:

These are the devices I am passing through - no special configuration, just used the UI to create the VM:

01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
	Subsystem: Samsung Electronics Co Ltd SSD 970 EVO [144d:a801]
	Kernel driver in use: vfio-pci
	Kernel modules: nvme
02:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983 [144d:a808]
	Subsystem: Samsung Electronics Co Ltd SSD 970 EVO [144d:a801]
	Kernel driver in use: vfio-pci
	Kernel modules: nvme

So after reading the fine manual to be 100% safe I added:

root@pve-ka:~# cat /etc/modprobe.d/vfio-pci.conf 
options vfio-pci ids=144d:a808
blacklist nvme
root@pve-ka:~# update-initramfs -u -k all
root@pve-ka:~# shutdown -r now

Odd thing is: the lspci output is absolutely identical.

I’ll leave it at that for now and just make sure we have not only snapshots on the device but proper backups. :wink:

Improving confidence a bit: lsblk does not show the NVMe SSDs, even when the VM is shut down.

Kind regards,
Patrick


I don’t know whether the TrueNAS hypervisor has been improved or not, but I would not use TrueNAS as a hypervisor.
If VMs are planned, I prefer running TrueNAS in a VM on top of Proxmox.
(In that case the pain is mostly handing over the HBA plus drives to TrueNAS.) If you use TrueNAS as the hypervisor, you will have problems with memory ballooning, which is buggy: you MUST assign the planned RAM amount fully to the VM, since differing min/max values will not work (they might have fixed this, but I am not sure, since I have not updated to the new release yet; that is planned soon though). Remote desktopping is also a pain ever since TrueNAS dropped VNC support and moved to SPICE: it is laggy, buggy and really inconvenient to use (for a simple Ubuntu install I had to recreate the full VM 6+ times), whereas the built-in console of Proxmox works really conveniently for me.