Planning a new build with spinning drives that tries to be quiet, by reducing random I/O on the HDDs

The intent is to build a quiet NAS providing around 80TB (up to 96TB) of storage space, capable of driving a 10-56Gbe network, with features like snapshots, compression, and de-duplication.

The machine has to be placed only 3-5 meters away from my desk in a rather open space, and is not far enough from the bedroom, so keeping the noise level down is a major challenge.

NVMe disks are silent and fast, but the per-TB cost is more than I can bear.

So this build will be a compromise between quietness and cost. I’ll try to eliminate, or at least reduce, the intensity of random I/O, especially on small files and hot data, using an sVDEV, L2ARC, and SLOG, sized adequately and cost-effectively.

To build around 80TB of space with raidz2, we can use either:

  • 7 x 16TB HDD
  • 12 x 8TB HDD

Chassis:
The chassis of choice is the Fractal Design Define 7 (or XL): it’s built with thick metal plates, has a front door, and its side and top panels are lined with sound-absorbing material. It also has enough front bays for all the HDDs, and supports 2.5" SSD/U.2 drives cooled by the front fans. It can definitely reduce some noise, but I’d rather not be too optimistic.

In addition, it has slots for a vertical GPU mount, which can be used to install a carrier for 3x80mm or 2x120mm fans blowing air down onto the PCIe region. That makes cooling the HBA, NIC, and directly mounted NVMe drives easier, and is quieter than attaching multiple small 40mm fans to them. I’m unsure about its effectiveness, but it’s worth a try.

My only concern with this chassis is the rather shaky-looking three-point mount of the drive bays, i.e. the 8 o’clock corner is left dangling. But this chassis has been widely used for storage builds, so I guess it is stable enough and won’t cause damage to the spinning drives.

Further, I’ve heard talk that the XL has better build quality than the non-XL.

So if you know this case, I’d love to hear your advice.

HDDs:
The much quieter NAS-designated HDDs like the IronWolf Pro cost 1.5-2 times as much as normal drives and are not very affordable.

8TB disks like the Seagate 7E10 or WD HA340 are both less noisy than, say, a 16T Seagate Exos X18, but they are definitely audible (a random I/O test measured about 50+dB). So it’s unclear whether 7x16T would be louder, or whether 12 less-noisy 8T drives could resonate and produce a larger noise overall. Further, the 16T drives are helium-sealed, so they register higher decibels but produce deep, low-pitched sounds that may be better absorbed by the chassis and be less annoying at midnight.

Further, knowing that

  • the major noise comes from intensive random IO,
  • idle and sequential IO are less noticeable,
  • these disks inevitably click periodically, and little can be done about that,

therefore, the goal is to reduce intensive random I/O as much as possible, which happens to coincide with the goal of achieving high throughput.

The usage:

  • about 50TB out of the 80TB are cold media files larger than 1GB;
    access to these files alone is sequential.
  • about 10TB are medium files, ranging from 1+MB to well under a GB,
    including packages, documents, photos, etc.;
    after a while of usage, a large portion can sit in the L2ARC.
  • 1 or 2 TB of hot VM images, virtual disks, etc.;
    the one in use is definitely in the L2ARC; if not, reads/writes are likely to be sequential.
  • less than 1TB of tens of thousands of files smaller than 1MB that can be stored
    on the NVMe sVDEVs.

by “small files” we specifically mean the files that are stored on the sVDEVs;

then “medium files” are those larger than the sVDEV small-file threshold but likely able to stay in the L2ARC; the number of medium files can be large, and they generate some random I/O;

“large files” are too big to always stay in the L2ARC and are likely to be evicted; reading or writing a large file is usually a sequential operation.

The NAS itself will host one or two VMs running Docker with very few lightweight services; I don’t expect to use the NAS as a hypervisor. Then there will be a hypervisor machine with 5-10 VMs connecting to this NAS; only a few have short bursts of heavy I/O and they are unlikely to be highly concurrent. Then there are a workstation and three Macs, which are rarely used at the same time.

Although I’ll install a 56Gbe NIC, I’m not expecting this machine to saturate that bandwidth; as long as it can sustain 10Gbe, the extra headroom is left for reads hitting the ARC/L2ARC and small files that can be written directly to the NVMe sVDEV. Maybe later I’ll add a small pool of 3-4 NVMes.

Cases that can cause random I/O:

  1. metadata lookups, e.g. by rsync, cause heavy random I/O;
    easily mitigated using sVDEVs, which store all the metadata
  2. the majority of random reads of small files can be handled by a pre-heated ARC/L2ARC,
    warmed by e.g. find . -size -10M -exec wc {} +
  3. the majority of random writes to small files can be handled by a large sVDEV and a large small-file threshold (special_small_blocks), adjustable per dataset, e.g. store entire source-code and git-repo datasets on the sVDEV.
  4. medium files like VM images, applications in use are often in the L2ARC
  5. ! concurrent reads of many large files can make the hard-disk arm move rapidly between the file locations, similar to random I/O;
    this cannot be mitigated
  6. ! a number of concurrent writes to large files that must be committed to the HDDs;
    likely cannot be mitigated.
  7. ? a number of concurrent writes to medium files that need to be committed to the HDDs;
    I’m not sure how ZFS handles these writes. If ZFS serializes these medium files and writes them to the HDDs sequentially, the noise is low; otherwise it’s similar to random writes.
  8. ! ZFS routine housekeeping such as scrubs, especially at midnight, can cause heavy random reads/writes;
    we could reschedule it to run in the daylight, but it cannot be eliminated.
  9. TBD

A summary of a primitive build:

  • M.board: Supermicro X11SPL-i
  • CPU: 2nd Gen Xeon Scalable 6230, 20 cores, 2.1-3.9GHz, with 27.5MB L3 and only 125W TDP
  • Memory: total 512GB, from 8 x 64GB 2666MHz or 2400Mhz DDR4 RDIMM ECC
  • HBA: Broadcom LSI 9305-16i PCIe3.0x8 (which has only a single chip compared to 9300-16i)
  • NIC: Mellanox dual port 40/56Gbe MCX354A-FCBT PCIe3.0x8
  • NIC: optional Mellanox dual port 10/25Gbe MCX4121A-ACAT PCIe3.0x8
  • NIC: optional dual port 10Gbe X550-T2 PCIe3.0x4
  • CPU Cooler: a Noctua LGA3647 model like the NH-U12S DX-3647 should be more than enough
  • Chassis: Fractal Design Define 7(or XL)
  • PSU: 850W Platinum (model TBD, probably some Seasonic model)
  • UPS: SANTAK UPS TG-BOX850 850VA/510W, 200W~10min, 500W~5min
  • HDD: 7 x 16TB or 12 x 8TB
  • boot drive: mirrored
    • mboard SATADOM: innodisk 128G SATADOM-ML 3IE2-P 8pin compatible with supermicro superdom
    • mboard PCH SATA 0: Micron BX500 240G
  • L2ARC: [TBD] 1 x U.2 of 4/8TB, likely bought used
  • sVDEVs: 2 x 2T U.2 of different brands, mirrored, connected to different PCIe adapters
    • [TBD] 2T U.2
    • [TBD] 2T U.2
  • slog: 1 or 2 x 120GB Optane [model TBD], m.2, U.2, or AIC PCIe mount
    • one m.2 can be installed in the board’s PCH m.2 slot
    • one U.2 can be installed via an adapter on the PCH PCIe3.0x4 slot
    • an AIC PCIe card can go into the PCH PCIe3.0x4 slot
    • ? is the increased latency of the PCH m.2/PCIe slot critical?

notice the CPU only provides 6 memory channels at 2933MHz, but as per the X11SPL-i board manual, installing 8 memory sticks will still run at 2666MHz.

de-duplication of 80T of storage requires at least 5 x 80 = 400GB of memory, so 512G is indeed tight, but a single 128G stick costs 3-4 times the price of a 64G stick. Therefore, if dedup performance drops too much due to the low memory, I’ll have to turn this feature off. However, since ZFS dedup is online (is it eager?), it may reduce random I/O in some scenarios: the L2ARC can cache more files at higher density, and some concurrent or random writes are absorbed, for example when multiple VMs from the hypervisor operate on the same datasets, or write backups that share high similarity. If this turns out to be effective, then upgrading to single 128G sticks is a cost worth paying; 1T is the max memory this CPU supports.
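As a quick sanity check on that figure (the 5GB-per-TB rule of thumb is a rough community estimate; the real DDT footprint depends on the average block size):

```python
def dedup_ram_gb(pool_tb, gb_per_tb=5):
    """Rough rule of thumb: ~5 GB of RAM per TB of deduplicated data."""
    return pool_tb * gb_per_tb

# deduplicating the full 80 TB pool -> ~400 GB of RAM for the DDT alone,
# which is why 512 GB total feels tight
print(dedup_ram_gb(80))  # 400
```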

metadata consumes at least about 0.3% of total pool capacity; e.g. for a 96T pool, the metadata is roughly 288G. To be safe, we could plan for 500G. The default sVDEV metadata/small-file ratio is 25/75, so for a 2TB sVDEV, 25% is 500G, leaving about 1.5T for small files.
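The arithmetic above as a small sketch (the 0.3% metadata ratio and the 25/75 split are the estimates used in this post; decimal units assumed):

```python
def metadata_gb(pool_tb, ratio=0.003):
    """Estimated metadata size: ~0.3% of pool capacity (decimal units)."""
    return pool_tb * 1000 * ratio

def svdev_split(svdev_gb, meta_share=0.25):
    """Default sVDEV split: 25% metadata, 75% small files."""
    meta = svdev_gb * meta_share
    return meta, svdev_gb - meta

print(metadata_gb(96))    # ~288 GB for a 96T pool
print(svdev_split(2000))  # (500.0, 1500.0) for a 2TB sVDEV
```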

A 4T L2ARC is a 1:8 ratio to 512G of main memory, with about an 88GB footprint in ARC if all blocks in the L2ARC are 4KiB blocks.

stick[GB]  N      mem[GB]   :8[TB]     meta[GB]   :16[TB]    meta[GB]  
64         8      512       4.096      88.0       8.192      176.0     
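The 88GB figure follows from the 88-byte per-block ARC header, in the worst case of an L2ARC filled entirely with 4KiB blocks (decimal units assumed):

```python
def l2arc_header_gb(l2arc_tb, block_bytes=4096, header_bytes=88):
    """ARC memory consumed by L2ARC headers, assuming every cached
    block is 4 KiB (the worst case used throughout this post)."""
    n_blocks = l2arc_tb * 1e12 / block_bytes
    return n_blocks * header_bytes / 1e9

print(round(l2arc_header_gb(4.096), 1))  # 88.0
```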

the motherboard layout:

slot7 is the closest to the CPU. slot1-4 should sit underneath the fans mounted on the chassis’s vertical GPU slots.

the two mirrored sVDEV U.2 drives should be installed via different PCIe slots, in case one of the adapters fails.

X11SPL-F

PCH 2 x 1Gbe RJ45                          
2 x satadom     satadom#0: boot
PCH 8 sata      sata#0: boot 2.5 ssd, the rest are not used
PCH m2 pcie3.0x4    TBD

slot7 CPU PCIe 3.0 x8,              TBD
slot6 CPU PCIe 3.0 x8 (in x16),     2 x PCIe to U2 cable
slot5 CPU PCIe 3.0 x8,              2 x PCIe to U2 cable
slot4 CPU PCIe 3.0 x8 (in x16),     dual port 56G NIC
slot3 CPU PCIe 3.0 x8,              2 x directly attached NVMe, m.2 or U2
slot2 CPU PCIe 3.0 x8,              HBA
slot1 PCH PCIe 3.0 x4 (in x8)       dual 10G NIC

Appendix:

just for a rough estimation

                raidz memory and l2arc sizing table

for every data block in the L2ARC, the primary ARC needs an 88-byte header entry.
we assume the L2ARC is filled with 4KiB blocks, which is the worst case.

the recommended ratio is 1:4/5/8/10, and systems with very large memory (TBs) can go to 1:16/20.
we only list 1:4/8/16 for quick reference.

stick[GB]  N      mem[GB]    :4[TB]     meta[GB]   :8[TB]     meta[GB]   :16[TB]    meta[GB]  
16         2      32         0.128      2.75       0.256      5.5        0.512      11.0      
32         2      64         0.256      5.5        0.512      11.0       1.024      22.0      
64         2      128        0.512      11.0       1.024      22.0       2.048      44.0      
128        2      256        1.024      22.0       2.048      44.0       4.096      88.0      

16         4      64         0.256      5.5        0.512      11.0       1.024      22.0      
32         4      128        0.512      11.0       1.024      22.0       2.048      44.0      
64         4      256        1.024      22.0       2.048      44.0       4.096      88.0      
128        4      512        2.048      44.0       4.096      88.0       8.192      176.0     

16         6      96         0.384      8.25       0.768      16.5       1.536      33.0      
32         6      192        0.768      16.5       1.536      33.0       3.072      66.0      
64         6      384        1.536      33.0       3.072      66.0       6.144      132.0     
128        6      768        3.072      66.0       6.144      132.0      12.288     264.0     

16         8      128        0.512      11.0       1.024      22.0       2.048      44.0      
32         8      256        1.024      22.0       2.048      44.0       4.096      88.0      
64         8      512        2.048      44.0       4.096      88.0       8.192      176.0     
128        8      1024       4.096      88.0       8.192      176.0      16.384     352.0     

16         12     192        0.768      16.5       1.536      33.0       3.072      66.0      
32         12     384        1.536      33.0       3.072      66.0       6.144      132.0     
64         12     768        3.072      66.0       6.144      132.0      12.288     264.0     
128        12     1536       6.144      132.0      12.288     264.0      24.576     528.0     

16         16     256        1.024      22.0       2.048      44.0       4.096      88.0      
32         16     512        2.048      44.0       4.096      88.0       8.192      176.0     
64         16     1024       4.096      88.0       8.192      176.0      16.384     352.0     
128        16     2048       8.192      176.0      16.384     352.0      32.768     704.0
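For reference, the rows above can be reproduced with a short script (same assumptions as the table: decimal GB/TB, and an 88-byte header per 4KiB L2ARC block):

```python
def table_row(stick_gb, n_sticks, ratios=(4, 8, 16)):
    """One row of the sizing table: for each memory:L2ARC ratio, the
    L2ARC size in TB and its worst-case header footprint in ARC (GB)."""
    mem_gb = stick_gb * n_sticks
    row = [stick_gb, n_sticks, mem_gb]
    for r in ratios:
        l2arc_tb = mem_gb * r / 1000        # e.g. 512 GB at 1:8 -> 4.096 TB
        meta_gb = mem_gb * r * 88 / 4096    # 88 B header per 4 KiB block
        row += [l2arc_tb, meta_gb]
    return row

# matches the "64 GB sticks x 8" row above
print(table_row(64, 8))  # [64, 8, 512, 2.048, 44.0, 4.096, 88.0, 8.192, 176.0]
```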

Noise is related to the number of drives, so I’d rather aim for 6 x 20TB in raidz2. (A Node 304 is the limit of what I personally regard as “quiet enough”. YMMV.)
No HBA.

Best avoided in a home NAS. 512 GB RAM? Really?

In this size class, I prefer the Nanoxia Deep Silence 8 Pro: it already comes with all the (tool-less!) drive trays, and is much easier to wire up than a Define 7 in “storage layout”.


The biggest reduction in noise will be by avoiding seeks, and the biggest reduction in seeks will be by having large memory.

I don’t normally recommend any sort of specialised vDevs, but in this case:

  • svDevs might make sense by reducing seeks by storing both metadata and small files
  • L2ARC you would have to experiment to see if it actually helps (given that you have an svDev)
  • synchronous writes (and file system syncs after async writes) use the ZIL, and ZIL use means frequent small writes - so they are definitely likely to cause seeks, and an SLOG will eliminate them
  • dedup has a TERRIBLE reputation and cannot be removed once you enable it - think twice about this. As an alternative, you can dedup with a script that stores file sizes and hashes in an sqlite database, compares contents when these match, and, if the files are identical, does a block-clone copy to overwrite one file with the original, saving all the blocks. No performance overhead or memory required. And you can run it incrementally to check only new files.

So this does seem like an opportunity to use the specialised vDevs. However a couple more points:

  • Given the cost of mitigating the noise, would it be cheaper to build a small IT cabin out back to house your server? Or perhaps just build the server into a soundproof box?

  • You might also want to look at the ZFS tunable parameters to see how they might help reduce seek noise (e.g. by extending the TXG time from 5s to, say, 60s).

  • You might want to research which manufacturer’s drives to use based on noise levels - and whether there are firmware settings to reduce seek noise at the expense of some performance.

  • Most noise IMO is from fans and airflow - and that is due to heat - so minimise power usage (because it ends up as heat) and think carefully about both airflow and the type of fans you use.


OK, my recommendations:

  • Deduplication: You should split up your drive space and create a separate dataset for the data that MIGHT need deduplication. (Movies and music are NOT in this category!) So, for example, use 10TB for backups with dedup enabled and 70TB for multimedia files without deduplication. This way you can easily abandon your plan for 512GB of RAM and spend that money on HDDs. (You don’t need 400GB of RAM for the dedup data, but only about 10GB; this means if you don’t run any VMs you can use 32-64GB, and if you use VMs then something like 32-64GB per VM. This will save you hundreds of dollars on your budget.)
  • HDDs: I also recommend moving to a smaller number of bigger HDDs. (I see from your plan that money is not a hard obstacle for you.) So go for, say, 5x22TB in RAIDZ1 (4 data + 1 redundancy) or 6x22TB in RAIDZ2 (4 data + 2 redundancy). I’ll write it down again: PLEASE AVOID SMR HDDs at any cost! They may be cheaper, but they are highly discouraged for ZFS! Also, with this lower number of drives you don’t have to buy an HBA. That frees up a PCIe slot and a huge cost! With 6 drives you might even be able to get away with the SATA ports on the MoBo. (I did not check how many this MoBo has, but I suspect it has at least 6.)
  • Noise: Please be aware that TrueNAS WILL set your fan speeds to 100% by default, overriding even the BIOS control mechanism. This means your machine needs 140mm fans to get to a lower noise level. (If you are lucky, you MIGHT be able to override this behavior through the IPMI interface of your MoBo. Otherwise, you cannot eliminate this mechanism by any SW means; it is done on purpose by the developers, so you must use some HW solution to override this SW behavior.)
  • In general, I think this requirement for a silent machine will be a challenge. You should consider putting the machine into some kind of box or furniture to further reduce the audible noise level. And I recommend picking a case that can use 140mm fans.

I personally have a Node 304 with 6x 22TB drives. This does actually deliver 80TB of usable space.

It’s in my living room.

After Noctuaizing, the loudest part is the CPU Fan, which if it bothered me, I’d replace with a duct and noctua fan.


Why? Is this some kind of dorm? Ethernet over fiber can span several hundred meters without any problems. I solved my problem by putting my noise sources down in the basement. My work PC is fanless and noiseless (zero dB).
