5x / 6x / 8x SATA M.2 PCIe Cards

HoneyBadger · April 25, 2024, 7:03pm

Allow me to jump in here.

We can certainly encourage/discourage certain component choices, but ultimately each user is responsible for their own decisions, their own risk tolerance, and their desired system form factor/budget.

It’s important to note that M.2 is only a form-factor decision (bite-sized PCIe is still PCIe). The crux of the problem rests on three major issues:

The chipset used to provide SATA connectivity (including driver support)
Manufacturing quality of the device itself
Presence or absence of a port multiplier

While SCALE has expanded the playing field relative to CORE (where “Just Buy LSI” was the standing rule of thumb), there are still definite “tiers” of driver support and functionality within the Linux kernel, and the vast majority of M.2 SATA controllers use a “lower-tier” chipset. Intel and AMD are preferred over all others, Marvell and ASMedia share the next tier, and JMicron brings up the rear (though it has improved recently).

A poorly manufactured or improperly cooled device can of course contribute to instability; whether the chip responds to heat by temporarily throttling commands or by running all the way to thermal failure is another question. M.2 slots typically get less airflow than full-size PCIe slots, and the “cable spaghetti” that comes off these cards makes it worse. Prefer cards with heatsinks where possible.

Finally, port multipliers - the thorniest piece of this argument. Quite simply, they are to be avoided for multiple reasons.

The first is the bandwidth argument. The most common multipliers are 1:5, so one SATA port’s worth of bandwidth is spread across five downstream devices. That means your 600MB/s of theoretical SATA3 bandwidth is cut to 120MB/s per device: a definite limitation for SATA SSDs, and it can also cap sequential I/O to spinning disks. With sequential resilver and scrub, this is a very real limit to hit.
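To put numbers on the bandwidth argument, here is a rough sketch. It assumes an even split of the theoretical 600MB/s across the downstream devices and ignores protocol overhead, so real figures will be lower. The drive speeds and the 8 TB capacity are hypothetical round numbers, not measurements, and reading a full disk end to end is only a crude stand-in for a scrub or resilver pass.

```python
# Rough ceiling on per-disk bandwidth behind a SATA port multiplier.
# Assumption: the upstream link is split evenly; protocol overhead is ignored.

SATA3_MB_PER_S = 600  # theoretical SATA3 (6 Gb/s) payload bandwidth, per the text


def per_device_bandwidth(link_mb_s: float, devices: int) -> float:
    """Even share of one upstream SATA link across `devices` drives."""
    if devices < 1:
        raise ValueError("need at least one downstream device")
    return link_mb_s / devices


def effective_disk_rate(disk_mb_s: float, link_mb_s: float, devices: int) -> float:
    """A disk runs no faster than its own media rate or its share of the link."""
    return min(disk_mb_s, per_device_bandwidth(link_mb_s, devices))


def hours_to_read(capacity_tb: float, rate_mb_s: float) -> float:
    """Hours to stream a full disk sequentially (1 TB = 1,000,000 MB)."""
    return capacity_tb * 1_000_000 / rate_mb_s / 3600


if __name__ == "__main__":
    share = per_device_bandwidth(SATA3_MB_PER_S, 5)
    print(f"1:5 multiplier share: {share:.0f} MB/s per device")
    # Hypothetical drives: a SATA SSD (~550 MB/s) and a large HDD (~250 MB/s sequential).
    for name, native in [("SATA SSD", 550), ("HDD", 250)]:
        capped = effective_disk_rate(native, SATA3_MB_PER_S, 5)
        print(f"{name}: {native} MB/s native -> {capped:.0f} MB/s behind the PM")
    # Hypothetical 8 TB HDD read end to end, direct-attached vs behind a 1:5 PM.
    print(f"8 TB direct: {hours_to_read(8, 250):.1f} h")
    print(f"8 TB behind PM: {hours_to_read(8, share):.1f} h")
```

Under these assumptions both the SSD and the HDD land on the same 120MB/s ceiling, and the full-disk pass roughly doubles in length for the hypothetical HDD.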

The second is the nature of the port multiplier itself. If it uses Command-Based Switching, the host can only have commands outstanding against one of the devices behind the port multiplier at a time, and all the others are blocked. Performance then tanks even further than the bandwidth cap alone would suggest, because the devices share a common medium, like a “hub” rather than a “switch” in the old networking sense. If it uses FIS-based (hardware) switching, it can queue commands against multiple devices at once, but it’s still sharing the bandwidth.
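The hub-versus-switch difference can be illustrated with a toy timing model. This is a deliberate simplification, not how any real controller schedules work: command-based switching is modeled as fully serialized requests, and FIS-based switching as an optimistic lower bound in which drive latencies overlap but all payload still crosses the one shared link. The `Request` type, the 10 ms drive latency and the 1 MB request size are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Request:
    device: int       # which drive behind the port multiplier
    latency_s: float  # hypothetical time the drive spends before data flows (seek, rotation)
    mb: float         # payload that must cross the shared upstream link


def cbs_time(reqs: list[Request], link_mb_s: float) -> float:
    """Command-based switching (toy model): one device served at a time,
    so every request's drive latency and transfer run back to back."""
    return sum(r.latency_s + r.mb / link_mb_s for r in reqs)


def fbs_time_lower_bound(reqs: list[Request], link_mb_s: float) -> float:
    """FIS-based switching (optimistic): drives work in parallel, but the
    total payload still has to squeeze through the one shared link."""
    per_device: defaultdict[int, float] = defaultdict(float)
    for r in reqs:
        per_device[r.device] += r.latency_s + r.mb / link_mb_s
    link_busy = sum(r.mb for r in reqs) / link_mb_s
    return max(max(per_device.values(), default=0.0), link_busy)


if __name__ == "__main__":
    # One 1 MB read to each of five drives, each needing ~10 ms before data is ready.
    reqs = [Request(device=d, latency_s=0.010, mb=1.0) for d in range(5)]
    print(f"CBS: {cbs_time(reqs, 600) * 1000:.1f} ms")
    print(f"FBS (best case): {fbs_time_lower_bound(reqs, 600) * 1000:.1f} ms")
```

With small, latency-bound requests the serialized model is about five times slower here. With large transfers and negligible latency both models converge on total payload divided by link bandwidth, which is the "still sharing the bandwidth" point.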

And finally, if a single device behind a SATA port multiplier hangs or fails to respond to a command, it could (“will”, in the command-based switching model) make every device behind the PM unresponsive. Not exactly a good thing, as RAIDZ6 doesn’t exist yet.
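That last point can be checked mechanically for a planned layout. The sketch below is a hypothetical helper, not a TrueNAS feature: it treats every disk behind the same port multiplier as one failure domain and flags any vdev where a single hung PM would make more disks unavailable than its parity (RAIDZ1 = 1, RAIDZ2 = 2, RAIDZ3 = 3) can absorb. The vdev and port names are made up for illustration.

```python
from collections import Counter


def pm_failures_exceeding_parity(
    vdevs: dict[str, list[str]], parity: dict[str, int]
) -> list[tuple[str, str, int]]:
    """vdevs maps a vdev name to one upstream-port label per member disk.
    Give each directly attached disk its own unique label; disks behind the
    same port multiplier share one. Returns (vdev, port, disks_lost) for every
    port whose loss would exceed that vdev's parity."""
    risky = []
    for vdev, ports in vdevs.items():
        for port, lost in Counter(ports).items():
            if lost > parity[vdev]:
                risky.append((vdev, port, lost))
    return risky


if __name__ == "__main__":
    # Hypothetical 6-wide vdev: five disks on a 1:5 PM ("pm0"), one on a motherboard port.
    layout = {"tank-z": ["pm0"] * 5 + ["sata0"]}
    print(pm_failures_exceeding_parity(layout, {"tank-z": 2}))  # as RAIDZ2
    print(pm_failures_exceeding_parity(layout, {"tank-z": 3}))  # as RAIDZ3
    # Same width, one disk per directly attached port.
    spread = {"tank-z": [f"sata{i}" for i in range(6)]}
    print(pm_failures_exceeding_parity(spread, {"tank-z": 2}))
```

With five members of one vdev behind a single 1:5 multiplier, even RAIDZ3 gets flagged, which is exactly why the RAIDZ6 quip lands.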