M.2 NVMe Recommendation

As stated above, the rendered data from the MacPro will go to this NAS, and it will also hold frequently needed data such as footage and archives.

Up to 100GbE, I would say, as there would be 6-8 people accessing the NAS simultaneously. The client end needs to be 25GbE (if supported and feasible) and, if not, 10GbE minimum.

Because, to saturate 100GbE, I would need fewer PCIe 4.0 NVMe drives than PCIe 3.0 drives, since PCIe 4.0 has almost double the bandwidth of PCIe 3.0.
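
A rough back-of-envelope, using theoretical per-drive maxima (real drives, ZFS overhead and the network stack will all land lower):

100 GbE              ≈ 12.5 GB/s before protocol overhead
PCIe 3.0 x4 NVMe     ≈  3.9 GB/s  ->  roughly 4 drives to cover 12.5 GB/s
PCIe 4.0 x4 NVMe     ≈  7.9 GB/s  ->  roughly 2 drives to cover 12.5 GB/s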

This is the AIC I’m planning to buy:

If there’s any better option, whether in terms of price, cooling and/or performance, I’m open to it.

Oh dang!

Oh, I only learnt that recently. But then I have a question: why are those Tri-Mode adapters used by OEM partners who build U.2 NVMe servers and JBODs?

Indeed. They are damn expensive. Plus, I have no idea whether they have a PLX switch or whether they need bifurcation. I asked HighPoint about the chip model on the AIC I selected, but they did not provide the info. Another sad story ;(

I’m looking to buy a Supermicro X12SPM-LN6TF motherboard and at least 512GB of DDR4 ECC RAM, paired with the highest-clock-speed CPU I can get for fast SMB performance.

Any particular reason? The thing is, Gen4 U.2 drives are not cheap, and to fully saturate 100GbE I would need several of them, and I don’t have an NVMe backplane chassis. So that’s another sad thing. But if I do plan to use U.2 drives, I have a few questions:

  1. Do U.2 (Intel) drives have higher sustained read/write speeds, including random read/write?

  2. Let’s say I somehow arrange a backplane chassis or a backplane. How do I connect it? You already mentioned that it’s bad to use a Tri-Mode adapter like the LSI 9400-16i, as it adds latency and reduces performance to an extent.

So, what’s the best way to connect U.2 NVMe?


Wow! 25 GbE on the Mac client side should be easy with an Intel XXV710 or Mellanox CX 4/5 card in a MacPro or in a Thunderbolt enclosure.
Mikrotik CRS-510 switch or the like I suppose?

Because Broadcom is very good at locking Dell/HP/Lenovo into the design and business model which suits Broadcom…

Then forget about PCIe switches anyway.

Higher capacity, better cooling, and you’re looking at data centre drives anyway: Plenty to pick from Kioxia (former Toshiba), Micron, Samsung, Solidigm (former Intel/SK Hynix)…
Without a U.2 backplane (no U.3!), you’d use cables.
First an adapter for your PCIe slot (simple, or with retimers for better signal integrity), and then appropriate cables.
Simpler solutions from the PCIe 3.0 generation might be able to work at 4.0 speeds, but I would not take the chance for a professional setting.
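
Whichever adapter and cables you end up with, it’s worth checking what the drives actually negotiated once everything is wired up. On a Linux/TrueNAS SCALE shell, something along these lines (the PCI address below is just a placeholder) shows the link speed and width per NVMe controller:

# Find the NVMe controllers and their PCI addresses
lspci | grep -i "non-volatile"
# Inspect one of them; 16GT/s x4 = PCIe 4.0 x4, 8GT/s x4 means it fell back to 3.0
lspci -vv -s 0000:41:00.0 | grep -E "LnkCap|LnkSta"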

Would you consider refurbished NVMe drives?

If it’s at the scale of 8 people simultaneously accessing it for work, why don’t you just give our friends at iXsystems a call? Assuming it’s a system for yourself, they might take it on themselves or refer you to an authorized partner. They also usually have more detailed performance numbers on their systems and can set them up for you before the system even arrives.


OMG. I’ve already got a couple of XXV710s, as the MacPro does not have PCIe 3.0. I’ve heard of a lot of issues with Mellanox, so I went with Intel for now. I have a couple of Chelsio NICs too. I’ll compare which works best, and if Chelsio comes out on top, I’ll upgrade all the systems to Chelsio NICs. For the client end, my Chelsio choice would be the T6225-CR, and for the NAS end, the T62100.

Yeah, right! I’ve selected the CRS518-16XS-2XQ-RM. The reason is that they do not involve any vendor lock-in and have no license requirements.

Would it not impact the performance then?

What do you mean? Which Switch are you talking about?

I fully agree with you on this!

I initially thought this, but on the old forum there were a lot of people who said it would impact performance, would have noise issues, etc. I was too afraid, so I dropped that plan, and since then I’ve planned to go with M.2 NVMe.

Also, if I’m not wrong, I can use 4 M.2 NVMe drives in an x16 slot (CPU lanes), and that’s the maximum NVMe per slot at full speed. So if I get a card like this, I think it will be just the same, NVMe-wise, with no need for cables either. What do you think?

I trust 10Gtek more than Linkreal. xD

What is the difference between the simple cards and the retimers? Do the retimers have a PLX switch?

Of course, I can go with that if the drives are all healthy, in good condition, and available at a decent price. Why not? Do you happen to have any for sale? :grin:

Oh damn, I never thought about that. I’ll write to them about the requirements and see how they work; I mean a consultation, system building, etc.

Thank you for the suggestion!


MacPro7,1 (the last Intel model) does have PCIe 3.0 slots. Intel X500, X700 and Mellanox CX4/5/6 series NICs should work out of the box with Ventura or Sonoma, using the ixgbe, ixl and mlx5 drivers from DriverKit (the minimum OS version depends on the card).
For Chelsio, you’ll have to install the manufacturer’s driver, whose last version was for Catalina. The driver still works, but the signing certificate has expired. With a Hackintosh, it is easy to put the kext driver into the EFI folder and let Clover or OpenCore inject it “under” macOS and its safety checks. With a real Mac, it may be difficult to force an old kext to load.

Broadcom is in the business of extracting maximal value for Broadcom, not in the business of allowing end users to get maximal performance out of their NVMe drives… Otherwise U.3 would just not exist.

In this context, PCIe switches, aka “PEX” or “PLX” chips. Sorry for the confusion.

My links were examples, not necessarily recommendations (though I certainly trust C-Payne).
This adapter might work, but like the equivalent Linkreal product I listed, it is a PCIe 3.0 design and the traces may or may not be able to handle PCIe 4.0 signals.
Short and good quality cables should work for PCIe 4.0. And if the wiring is not quite good enough, that’s when you add retimers.
Simple adapters are just traces. Retimers “refresh” the signal to ensure integrity, but do not manage bifurcation. Switches do everything.
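
And if a vendor won’t tell you what is on a card, the PCI topology gives it away once the card is installed: behind a switch, the drives hang off an extra bridge; behind a plain adapter or a retimer (which is electrically transparent and does not appear as a device), they sit directly on the root ports. On a Linux host, roughly:

# Show the PCI tree; a PLX/PEX switch shows up as an additional bridge layer
lspci -tv
# Or look for the switch itself (often listed as PLX Technology or Broadcom PEX)
lspci | grep -iE "plx|pex"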

Yes, right, right. So I already have the XXV710, but I’m just saying that if it does not work well or performs worse than the Chelsio (I’ll be testing Chelsio too), then I’ll upgrade all systems to Chelsio NICs.

:joy: :joy: :joy:

I still have no idea why U.3 drives exist, or what the difference between U.2 and U.3 is.

Ah ok. Got it. Not a problem!

Got it, got it!

Oh, so in general, the cards with retimers are more suitable? Is that what you mean?

Hmm. Do retimers also act as a switch?

What about the refurbished U.2 drives you talked about? Do you have any for sale?

Ironically, “AIC” was specifically a form factor/connection for SSDs that was distinct from M.2 and U.2.

I guess what is being referred to here is a carrier card for M.2s.


Yes, exactly!

U.3 disks are fine and work fine with U.2 backplanes. U.3 backplanes are a scam, stay away.


Mind explaining a bit please?

U.2 drives are pinned for NVMe only.
U.3 drives use pins which are shared with SAS signalling so that a U.3 backplane can take either SAS/SATA or NVMe drives. Drawbacks: U.2 drives are not compatible with U.3 backplanes; U.3 drives on a U.3 backplane have to go through a Tri-Mode controller… which drives them through the SCSI bus as if they were spinning rust.
Conversely, U.3 drives can work with a U.2 backplane.

tl;dr Don’t buy U.3 backplanes or Tri-mode controllers. Use pure SAS backplanes with regular SAS HBAs, and U.2 backplanes with U.2/U.3 drives, directly wired to the CPU (or to a PCIe switch).

Retimers are always better/safer for signal integrity. But they add cost.
Retimers “retime”, that is re-synchronise signals on different wires, correcting for drift if some traces are longer than others. Re-timers do not switch or bifurcate lanes.

I do not have drives to sell. But you may find used U.2/U.3 drives from small ads (the forum at ServeTheHome, or whatever local site may be of relevance) or from professional refurbishers.


@etorix Got it!

Any feedback on the setup? Will it bottleneck, or be unable to fully saturate 100GbE?

Also, I asked whether Intel U.2 drives have better IOPS and sustained read/write speeds compared to other enterprise/client M.2 NVMe drives.

Also, as this will be in a server chassis, do you think I should install fans on the NICs to keep them cool?

I doubt it. Even with 8 parallel connections at 25 gigabit, I don’t think you would get close.

You can get more or less any Gen3/4 U.2 drive from Samsung, Intel, Solidigm or Kioxia. The only thing that will really matter is endurance, and depending on your workload, even that may not matter.

U.2 drives do come in various SKUs with differing performance, but at this point they are generally a commodity. IMHO, if you’re not buying a complete solution and are piecing together a system yourself, you’re better off buying used enterprise drives on eBay or something.

Someone mentioned iXsystems here earlier. The TrueNAS F series is what you are interested in.

What’s the budget of your build here?


Even with the NVMe?

Yes, got it!

I would say something like 10K for all the hardware, including SSDs.

Even with NVMe. I have 4 Optane 960GB NVMe drives, which are/were some of the best money can buy. I also have an NVDIMM device. The CPU is a bit slow for this, though: a Xeon Silver 4114.

Absolute BEST case scenario, mirrors (no parity calculations), no networking or file sharing involved… just dd’ing some zeroes around… I get 3.6 GB/s reads, which is roughly 30 gigabit.

Scaling will be less than linear, so you’d need a bare minimum of 12 drives. But we’re bottlenecked by system RAM at that point. Shuffling the data around RAM and the PCIe bus is a hardware problem: with enough of these disks, you are approaching system RAM speeds and that becomes your bottleneck.
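
As a rough sanity check on those numbers (treat this as a back-of-envelope only):

100 GbE target                           ≈ 12.5 GB/s
Measured here, 2 mirror vdevs (4 drives) ≈  3.6 GB/s (~29 Gbit/s)
12.5 / 3.6 ≈ 3.5x  ->  12-14 of these drives IF scaling were linear (it won't be)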

Let alone getting back out through the network stack to your sharing protocol, plus client-side bottlenecks and limitations…

root@prod[/mnt/optane_vm/no_compression]# zpool list -v optane_vm
NAME                                       SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
optane_vm                                 1.73T   441G  1.30T        -         -    54%    24%  1.00x    ONLINE  /mnt
  mirror-0                                 888G   222G   666G        -         -    57%  25.0%      -    ONLINE
    0a3ec38c-61bc-4c92-989e-bb886b22c241   892G      -      -        -         -      -      -      -    ONLINE
    e33c70aa-009f-4032-8ae6-0465156ad9ca   892G      -      -        -         -      -      -      -    ONLINE
  mirror-2                                 888G   219G   669G        -         -    52%  24.6%      -    ONLINE
    8a08efe1-8f61-4577-9180-0ed287029443   892G      -      -        -         -      -      -      -    ONLINE
    a9db22ec-c458-4e1e-b1f9-45b8667e07ec   892G      -      -        -         -      -      -      -    ONLINE
logs                                          -      -      -        -         -      -      -      -         -
  d596d454-233f-4667-b258-ddd435b9daf8    16.0G  9.90M  15.5G        -         -     0%  0.06%      -    ONLINE

root@prod[/mnt/optane_vm/no_compression]# smartctl -a /dev/nvme1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.6.16-production+truenas] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       INTEL SSDPE21D960GA
Serial Number:                      PHM2913000QM960CGN
Firmware Version:                   E2010480
PCI Vendor/Subsystem ID:            0x8086
IEEE OUI Identifier:                0x5cd2e4
Controller ID:                      0
NVMe Version:                       <1.2
Number of Namespaces:               1
Namespace 1 Size/Capacity:          960,197,124,096 [960 GB]
Namespace 1 Formatted LBA Size:     512
Local Time is:                      Wed Apr 24 02:42:18 2024 EDT
Firmware Updates (0x02):            1 Slot
Optional Admin Commands (0x0007):   Security Format Frmw_DL
Optional NVM Commands (0x0006):     Wr_Unc DS_Mngmt
Log Page Attributes (0x0a):         Cmd_Eff_Lg Telmtry_Lg
Maximum Data Transfer Size:         32 Pages

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +    18.00W       -        -    0  0  0  0        0       0

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        47 Celsius
Available Spare:                    100%
Available Spare Threshold:          0%
Percentage Used:                    0%
Data Units Read:                    91,245,424 [46.7 TB]
Data Units Written:                 256,488,872 [131 TB]
Host Read Commands:                 1,452,241,049
Host Write Commands:                5,014,963,382
Controller Busy Time:               1,645
Power Cycles:                       85
Power On Hours:                     12,103
Unsafe Shutdowns:                   23
Media and Data Integrity Errors:    0
Error Information Log Entries:      5

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0          5     7  0xa180  0xc008  0x000     69872217     1     -  Data Transfer Error
  1          4     1       -  0xc00c      -            0     -     -  Internal Error
  2          3    16       -  0xc00c      -            0     -     -  Internal Error
  3          2    13  0x6000  0x4008  0x000   1665700056     1     -  Data Transfer Error
  4          1     1  0x0011  0x4008  0x000          576     1     -  Data Transfer Error

Self-tests not supported
root@prod[/mnt/optane_vm/no_compression]# time dd if=/dev/zero of=./zeros bs=1M count=50000
50000+0 records in
50000+0 records out
52428800000 bytes (52 GB, 49 GiB) copied, 52.3619 s, 1.0 GB/s
dd if=/dev/zero of=./zeros bs=1M count=50000  0.13s user 42.00s system 80% cpu 52.377 total

root@prod[/mnt/optane_vm/no_compression]# time dd if=/dev/zero of=./zeros bs=1M count=50000
50000+0 records in
50000+0 records out
52428800000 bytes (52 GB, 49 GiB) copied, 57.9322 s, 905 MB/s
dd if=/dev/zero of=./zeros bs=1M count=50000  0.15s user 48.97s system 84% cpu 57.935 total

root@prod[/mnt/optane_vm/no_compression]# time dd if=./zeros of=/dev/null bs=1M count=50000
50000+0 records in
50000+0 records out
52428800000 bytes (52 GB, 49 GiB) copied, 14.3996 s, 3.6 GB/s
dd if=./zeros of=/dev/null bs=1M count=50000  0.10s user 14.29s system 99% cpu 14.403 total

root@prod[/mnt/optane_vm/no_compression]# time dd if=./zeros of=/dev/null bs=1M count=50000
50000+0 records in
50000+0 records out
52428800000 bytes (52 GB, 49 GiB) copied, 14.5187 s, 3.6 GB/s
dd if=./zeros of=/dev/null bs=1M count=50000  0.13s user 14.39s system 99% cpu 14.521 total
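
Side note: dd from /dev/zero is a crude test; it says nothing about random I/O and, on a compressed dataset, wildly over-states throughput. If you want a more representative number, something like the fio run below is a better sketch (assuming fio is installed; adjust the directory and sizes to your pool):

# Sequential read test with several parallel jobs
fio --name=seqread --directory=/mnt/optane_vm/no_compression \
    --rw=read --bs=1M --size=10G --numjobs=4 --iodepth=32 \
    --ioengine=libaio --direct=1 --group_reporting
# Note: direct=1 may be ignored (or rejected) on ZFS datasets; drop it if fio complains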

OMG, I’m scared now. It seems like you’re using the Intel Optane 905P 960GB variant. As per the datasheet, it has a read speed of 2600 MB/s and a write speed of up to 2200 MB/s. I know Optanes are the fastest thing next to RAM, but they’re still PCIe 3.0 based, hence the lower read/write speeds. That’s the reason for going with Gen4 NVMe, either M.2 or U.2: this way I need fewer drives and get more speed.

What exactly is an NVDIMM device? Are you referring to Optane memory?

What’s the current memory configuration of the system you provided the test results for?
Also, what’s the NIC?

Thank you for sharing the results. I really wanted to see how these Optane drives perform!

Probably not going to make much difference, not unless you go to Gen5.

256GB DDR4-2666, which IIRC was standard JEDEC for the time.

The only way you might be able to get what you want in performance would probably be Threadripper. But then you’re probably going to be over budget.

AMD Introduces New AMD Ryzen Threadripper 7000 Series Processors and Ryzen Threadripper PRO 7000 WX-Series Processors for the Ultimate Workstation
Pro WS WRX90E-SAGE SE|Motherboards|ASUS USA

16GB (x72, ECC, SR) 288-Pin DDR4 Nonvolatile RDIMM (micron.com)

Gotcha!

In my experience homelabbing, the thing that matters most to me (especially for VMs) is sustained sync write speed, and for sustained sync writes, what matters far, far more than anything else is PLP (power-loss protection), not marketing gimmicks like PCIe 4.0. In general, any enterprise SSD (even SATA) will smoke any consumer-level SSD in this category. In fact, some of the ultra-cheap QLC ones can perform worse than a high-end HDD once the cache runs out (i.e. under sustained writes).
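
If you want to see that PLP gap yourself, a quick sketch (directory and size are placeholders; adjust to your own pool) is to force a flush after every write, which roughly mimics sync-write behaviour:

# Sync-heavy write test: fsync after each write exposes drives without power-loss protection
fio --name=syncwrite --directory=/mnt/tank/test \
    --rw=write --bs=128k --size=2G --numjobs=1 \
    --ioengine=libaio --fsync=1 --group_reporting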
