I have been taking a look at my servers and thinking I may move some things around. I currently have my setup (full details below) on a dual-socket EPYC system and it runs well, but I am thinking of expanding my storage and taking the opportunity to shuffle some things. I have an HP DL580 G9, and its one standout feature is that it supports a massive amount of RAM. I had originally used it as a GPU server, but it was replaced with a newer EPYC-based server, so it is in kind of an odd spot. As currently equipped it has 2TB of RAM, but the memory is limited by the Jordan Creek controller to 1333 MHz. While much slower than my current system's 3200 MHz, it has 4x the capacity installed and the ability to scale out past 6TB if RAM prices come back to earth.
So the idea is maybe moving my storage over to this server, trading speed for extra RAM capacity, as I don't really feel that the ARC needs that much speed compared to the sheer volume the server could provide.
The goal with the expansion would be to end up at ~1.5PB of storage, but it would be a slower build-up rather than a day-one implementation.
Also, the power difference is negligible. I know on paper the DL580 is much more inefficient, but in an apples-to-apples comparison the DL580 is within 200W of the current TrueNAS EPYC server, with most of the power budget going to the RAM and memory controllers rather than the CPUs.
What is your current server used for, and what are your ARC statistics? You can run `arc_summary` to get an idea. It's really hard to give any feedback without a lot of details in your post as to how all your servers are used and what 'problem' you are trying to solve by scaling up RAM and ARC. We can't guess at your current bottlenecks.
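For example, the relevant numbers can be pulled out like this (the grep patterns match recent OpenZFS `arc_summary` output and may need adjusting for your version; these commands only work on a host with ZFS loaded):

```shell
# Quick look at ARC size and overall hit/miss ratios.
arc_summary | grep -E "ARC size|Total hits|Total misses"

# On Linux, the raw kstat counters are also available directly:
awk '$1 ~ /^(hits|misses|size|c_max)$/' /proc/spl/kstat/zfs/arcstats
```

A sustained hit ratio already in the high 90s suggests more ARC won't buy much; lots of demand-data misses on a working set larger than RAM is the case where more capacity helps.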
It's less that I am trying to solve a "problem" and more a question about ARC capacity vs. speed.
The main use case for my server is file storage. Lots of cold data, but a good chunk of hot data. For hot data I have my video editing workflow with my ProRes RAW files. These projects are usually about a TB but are compute-bound on the editing side. The other source of hot data is LLMs. These vary drastically in size and get spun up and down based on testing needs. The top end on size is about 100GB, but they are definitely more sensitive to transfer speed, as I need to swap them in and out repeatedly for comparison testing.
My gut tells me that the size vs. speed of ARC doesn't really make much difference, but I was curious what others thought.
What are you seeing in your current `arc_summary` over the standard workflows? I could see the extra capacity helping the LLM workload if it does a lot of repeated reads of the same dataset once loaded, but you would have to figure out whether the rest of your workflows would be helped more by planning for better read, write, and IOPS performance from your pools.
Have you looked at whether SLOG devices would help smooth out sync writes to the HDD pools? Are you getting good use out of your L2ARC? Look at the `arc_summary` output and see if that is helpful. L2ARC can also be set for metadata-only use, with the advantage that it can be added to and removed from pools without having to recreate the pools/VDEVs, unlike special VDEVs (sVDEVs) used just for metadata. You should be able to find some info on L2ARC and metadata by searching the forum.
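A sketch of that add/remove flexibility, assuming a pool named `tank` and a spare NVMe device at `/dev/nvme0n1` (both placeholders; run against a real pool at your own risk):

```shell
# Add an L2ARC (cache) device to an existing pool -- no pool rebuild needed.
zpool add tank cache /dev/nvme0n1

# Restrict the cache to metadata only (valid values: all | none | metadata).
zfs set secondarycache=metadata tank

# Unlike a special VDEV, a cache device can be removed again at any time.
zpool remove tank /dev/nvme0n1
```

Since `secondarycache` is an ordinary ZFS property, you can also set it per-dataset, e.g. metadata-only on the cold archive datasets while leaving `all` on the hot ones.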