putting stuff in a proliant dl325
I started with the HP Proliant DL325 for about $1800. (Prices seem to vary, everything is approximate.) This line is, according to some HP advertising copy, optimized for fast nvme storage. Sounds perfect for me. I’ve got, like, four sqlite databases over here and I really need them to vroom.
Also, several words of caution. I’m coming at this from the perspective of some dude flying yolo. HP would probably prefer you buy all the parts direct from them. And pay HP prices. If you’re the type to tell people to always obey the manufacturer, maybe stop reading.
The CPU is the 7402P model of Epyc. I knew I wanted something in the 7002 line, which gets the improved Zen 2 core, and also, for Epyc and Threadripper, reorganizes the chiplets to eliminate some of the NUMA. It has 24 cores, which is more than I will ever know what to do with, but it was only slightly more expensive than the 16 core 7302P. I figure, provision like it’s 16 cores, and get an insurance policy just in case. Moving to 32 cores or beyond increases the cost substantially. (If you fill the thing up, the added cost is probably inconsequential, but for me, the next step up in CPU is about equal to total system cost.)
On the subject of NUMA, the L3 cache still isn’t unified, so it’s not a completely flat 24 core system. Some more details on the consequences of that here on the vSphere scheduler. (And also this post.)
By default, the system is in a power efficient mode that keeps the CPU clocked down. Unlike most systems I’m familiar with, it doesn’t clock up immediately. So a short benchmark like md5 -t takes 0.35 seconds, 10x the work takes 2.4 seconds, and 100x takes 19.0 seconds. The explanation is that most loads are ephemeral, so instead of racing to idle, it’s better to keep clocks, voltage and power low. Squares, cubes, math stuff. This isn’t ideal for latency, however. There are about a dozen options in the BIOS, each of which tunes about a dozen other dials, which you can spend a few hours investigating.
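To put numbers on the slow clock ramp, here is a small sketch that divides each quoted timing by its run length. The only inputs are the three md5 -t figures above; the function names are made up for illustration.

```python
# Timings quoted above for md5 -t at 1x, 10x and 100x the work (seconds).
runs = {1: 0.35, 10: 2.4, 100: 19.0}

def per_unit_seconds(runs):
    """Seconds spent per 1x unit of work, for each run length."""
    return {n: total / n for n, total in runs.items()}

def rate_vs_shortest(runs):
    """Throughput of each run relative to the shortest (1x) run."""
    per_unit = per_unit_seconds(runs)
    base = per_unit[min(per_unit)]
    return {n: base / t for n, t in per_unit.items()}

per_unit = per_unit_seconds(runs)
rates = rate_vs_shortest(runs)
for n in sorted(runs):
    print(f"{n:>3}x: {per_unit[n]:.2f} s per unit, {rates[n]:.2f}x the 1x rate")
```

By this arithmetic the long run gets through each unit of work in a bit over half the time of the short one, which is the part that hurts latency for brief bursts.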
The CPU has eight memory channels, which is quite a lot. HP naturally insists you fill them all, with various dire warnings. Of course, they ship the system in a danger configuration, with only two DIMMs. I got 2x 32GB for 64GB total. Adding two more brings me to 128GB, and I’m not sure I can find the motivation to imagine I need 256GB.
Due to a quirk of memory pricing, 32GB DIMMs are barely more than 16GB? Not even twice the price of 8GB. I looked for a second at tossing the 64GB and actually filling all 8 channels, but that’d be criminally wasteful. It was cheaper by far to upgrade to 128GB with 2x 32GB than to swap to 8x 8GB. I’ll eat the tiny hit from leaving channels idle.
For reference, it shipped with Samsung M393A4K40CB2-CVFBY DIMMs. (Thank you HP for at least not peeling the sticker off.) If you search, it’s common to find that part with just a CVF suffix (2933 MT/s). The BY appears to mean it has an extra buffer?
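For a rough sense of what the idle channels cost, here is a back of the envelope sketch of theoretical peak bandwidth. It assumes one DIMM per populated channel, that the DIMMs actually run at their rated 2933 MT/s, and the standard 64 bit (8 byte) DDR4 channel width. Real throughput will be lower, and whether any of my workloads would notice is a separate question.

```python
BYTES_PER_TRANSFER = 8  # one DDR4 channel is 64 bits wide

def peak_bandwidth_gbs(channels, mega_transfers_per_s=2933):
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s).

    Assumes one DIMM per populated channel running at the rated speed.
    """
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9

# 2 channels as shipped, 4 after the upgrade, 8 fully populated.
for channels in (2, 4, 8):
    print(f"{channels} channels: {peak_bandwidth_gbs(channels):.1f} GB/s")
```

On paper, the four DIMM configuration gets half of what the socket could do.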
Alright, finally coming to the fun part. If you like off book experimentation anyway.
The Epyc CPU has a fantastic 128 PCIe lanes. So you can load up with tons of peripherals and GPUs and network cards and whatnot, and still have a few dozen left over. What to do with them, except attach piles of nvme storage direct to the CPU? That is indeed how HP advertises this system. 10x nvme drives with no lane sharing or switching? Yes, please. I can stripe my sqlite databases across two drives each.
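The lane math is simple enough to sketch. This assumes each nvme drive takes a x4 link, which is typical but not something HP's copy spells out, and it treats all 128 lanes as usable, which a given motherboard may not allow. The 32 lanes of "other stuff" in the second example is a made up figure.

```python
TOTAL_LANES = 128     # lane count quoted above
LANES_PER_NVME = 4    # assumption: a typical x4 link per nvme drive

def lanes_left(nvme_drives, other_lanes=0):
    """Lanes remaining after direct-attaching drives plus other devices."""
    used = nvme_drives * LANES_PER_NVME + other_lanes
    if used > TOTAL_LANES:
        raise ValueError(f"{used} lanes requested, only {TOTAL_LANES} available")
    return TOTAL_LANES - used

print(lanes_left(10))                  # ten drives, nothing else
print(lanes_left(10, other_lanes=32))  # ten drives plus a hypothetical x16 GPU and x16 NIC
```

Ten drives use 40 lanes, so there is plenty of room before anything needs sharing or a switch.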
Except, near as I can tell, HP doesn’t sell the system in that configuration to regular peons. You need to know the magic knock to get into the secret club and let the sales rep bribe you with steak and martinis first. All in all, doesn’t sound that bad, but I prefer to buy stuff by just clicking a button. All the nvme ports are there on my motherboard, but the drive backplane is wired for SAS/SATA drives.
There’s a RAID card with 2GB cache and battery included. I’m kinda at the point where hardware RAID may be a net negative, as likely to fail and eat your data as anything else, but there it is. The cache is configured to 90% write cache, so if you stripe across some SATA (oh, I mean SAS, good heavens) SSDs you can probably get decent performance.
I wasn’t quite ready to give up on the dream of nvme, and fortunately there are other options. There are PCI slots in the back. For $10, you can get a cheap converter. For $60, you can get this ASUS 4x nvme adapter. For the price, this is quite the card. The heat sink is nice and heavy, and on the whole well constructed. The only issue, initially, is that the screws holding the PCB to the PCI bracket extend a little too far, and snag on the HP case. Easy to swap.
So the new plan is maybe a boot drive in the front for hypervisor, another drive or two dedicated to block storage for OS guests, maybe a few more for bulk data, and some nvme drives riding in the back for fast data storage. Mullet storage. Business in the front, speed in the back. (HP sells a two port M.2 adapter as well, but it’s SATA only. Totally missing the point.)
Load the expander up with 3 M.2 nvme drives (didn’t have a fourth handy at the time of testing). And at first only one drive appears. Not unexpected, you have to bifurcate the x16 slot. The HP BIOS has two settings for bifurcation, auto and bifurcate. Auto seems like a strange way to spell off. Flip that to bifurcate and reboot, and... hangs in POST. Shortly after memory test, while configuring devices. Somebody doesn’t like my new card as much as I do.
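If the machine does boot, one quick way to check how many drives made it through is to count nvme controllers from the OS. This is a sketch that assumes a Linux system, where controllers show up as nvme0, nvme1 and so on under /sys/class/nvme; other systems expose this differently. The function name is made up for illustration.

```python
import os
import re

def list_nvme_controllers(sysfs_root="/sys/class/nvme"):
    """Return NVMe controller names (nvme0, nvme1, ...) found under sysfs_root.

    Returns an empty list if the directory does not exist, for example on a
    non-Linux system or one with no nvme driver loaded.
    """
    if not os.path.isdir(sysfs_root):
        return []
    names = [n for n in os.listdir(sysfs_root) if re.fullmatch(r"nvme\d+", n)]
    return sorted(names, key=lambda n: int(n[4:]))

found = list_nvme_controllers()
print(f"{len(found)} nvme controller(s) visible: {found}")
```

With three drives on a x16 card, seeing only one controller is the usual sign that the slot is still running as a single x16 link instead of being split into four x4 links.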
The lazy answer is unsupported hardware is unsupported, but I do think that’s really lazy. I don’t expect to receive a lot of assistance from HP for unqualified hardware, but I still expect it to work. And worst case, hanging in the BIOS probe is just shitty engineering. At this point, you’re locked out of the BIOS and can’t change the setting back, either, requiring you to remove the card. Imagine accidentally doing this remotely. (I’ve never been impressed with HP software, from any division, for any product, at any level. This system also spits out ACPI errors with some frequency because the AML is busted.)
So my dream is dead, but fortunately in the time it took for all these parts to arrive, a new dream was hatched. I found some 6.4TB Intel nvme drives on ebay for around $700. Lightly used but no longer loved. This does kinda mess up my plan because now I have 1TB of SATA and 6.4TB of nvme and I’m going to run out of bulk storage before fast storage, but such is life. Also, if you want to go mad scientist, there are U.2 drive to PCI slot adapters. Just screw the drive right onto the card and slide it in.
At this point, though, I gave in and crawled back to HP. There’s a two drive nvme enablement kit which is never in stock, but you just buy it anyway, and eventually the elves will see your order, and you get it when you get it. So now everything lives in the front, like a proper grown up server. And yes, for $250 list, you get a nail file sized PCB with no active components and about two feet of cabling. All for the privilege of having two nvme drives.
Unfortunately, I was unable to find a reason to believe I needed to install some Optane cards, so that went untested.
HP likes to talk about the great configurability of this system. It’s mostly false flexibility though.
With the 7002 series systems, they removed the ethernet ports from the motherboard, moving them to the network flex module. First of all, this is a PCI slot by any other name, except you can only put HP networking cards in it. It lives under the big x16 slot, but if it weren’t there, that space could have been a dual slot riser. Second, if you want 10G ports, now you have to remove the 4x 1G ports. (The space where the 4x ports used to live on the motherboard is permanently vacant now.)
Similarly, the RAID card blocks the third PCI riser. But because it’s not a standard form factor, good luck using it or selling it.
Obviously depends on peripheral load, but my system idles at about 80W to 90W. Add about 5-6W per active core. So make -j 6 pushed it to 110W from the socket. I think that’s pretty good.
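Those figures make for an easy estimate. The sketch below just adds the per core cost to the idle draw, using the low and high ends of both ranges. Extending the line past the six cores I actually measured is an assumption; per core power probably doesn't stay constant once clocks start dropping under heavier load.

```python
IDLE_WATTS = (80, 90)     # observed idle range
WATTS_PER_CORE = (5, 6)   # observed cost per active core

def estimate_watts(active_cores):
    """Return (low, high) estimated draw in watts for a number of busy cores.

    Assumes power grows linearly with active cores, which was only
    observed up to six cores.
    """
    low = IDLE_WATTS[0] + active_cores * WATTS_PER_CORE[0]
    high = IDLE_WATTS[1] + active_cores * WATTS_PER_CORE[1]
    return low, high

for cores in (6, 16, 24):
    low, high = estimate_watts(cores)
    print(f"{cores:>2} cores: {low}W to {high}W")
```

The six core case lands at 110W on the low end, matching what make -j 6 showed.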
It’s kinda noisy booting up, but once the system settles down, the idle fans are fairly quiet. I would not want to live with it, and will be happy to have it out of my house, but unlike some systems, it’s entirely reasonable to do software setup in person, then rack it later.
HP allows remote access via iLO, but they charge a few hundred dollars extra for remote console post POST. Really? Fortunately, I guess, the lockout mechanism is imperfect, and you can use the HTML5 console for a minute or two before it tells you no and quits. Then just start it again for another minute.
HP also sells a 2U model of this server, the DL385, which comes with two CPU sockets. Dell and Lenovo also sell very similar servers, in both 1U and 2U options.
I picked the DL325 because, uh well, it was the cheapest by far. If money were no object, I quite like the look of the Lenovo system. And their web store will let literally anybody configure it for all nvme. But I’m not planning on buying one of each for the shootout.
This section is mostly just idle musing.
Of course, you can always build your own instead of buying factory built. That’s kinda how I started down this road, even. My Ryzen build was working great, let’s make another and rack it. For $650, you can get a 1U Ryzen barebones server. But no CPU or RAM. By the time you add a 3950X (if you can find one) and fill it in, the budget ends up pretty similar. Also, I didn’t find a 1U AM4 heatsink I felt comfortable with. There’s simply not much market for such.
Moving up a step, the Threadripper does have the same physical socket as Epyc. How about a 3970X server? Well... I was looking at this picture of a high end Epyc server and noticed, it has two heatsinks. There’s one on the CPU, and then a heatpipe connecting it to another. Where do you buy those? And that’s just Epyc, not even Threadripper, which can throw off even more heat.
The reason I’d even consider such a build is the much, much higher peak frequency. If the system is mostly idle, heat won’t actually be much of a problem. And thus you can run those md5 benchmarks, er I mean TLS handshakes, much quicker. Worst case, there’s always thermal throttling when you find yourself running a continuous load, though your CPU may not like living at a fixed 90C.
The downside here is again, cost. The DL325 I got for $1800 is possibly below parts cost? ATM, newegg has 3960X for $1800. CPU alone. And while I have my complaints about how HP put some things together, the system integration is pretty solid, with better fit than I’d get with boxed parts.
It’s still sitting on my dining room table, so I can’t say too much about performance or long term reliability. For value, it’s really hard to beat used gear off ebay. For a new server, though, assuming you can find it below list price, the DL325 looks pretty good. This comes with the caveat that you have a couple thousand to spend, but not too many thousands to spend. For real deal serious business, I might move up to other options.