Hi there, I’m running TrueNAS SCALE in Proxmox with one all-HDD pool and one all-SSD pool. The HDD pool has been running great, with dozens of terabytes read and written so far.
However, the SSD pool continues to behave extremely poorly. After about 1 or 2 minutes of reading or writing, the speed drops from about 250MB/s to 300KB/s and does not recover.
I have six HDDs and four SSDs in their respective pools. The SSDs are currently in raidz2 (no, that isn’t necessary at all, but it’s not what this topic is about).
My six HDDs are currently connected directly to the motherboard’s SATA ports, and the SSDs run through my LSI 9300-16i HBA. I’ve tried different arrangements, such as splitting the drives half and half between the motherboard and the HBA, but I’ve seen no difference in performance.
Frankly, I’m at a loss; I don’t know what else to try or reconfigure. I’d love to hear what I could try or which metrics I should look at to figure this out.
What makes you think we are going to guess anything without even knowing what these drives are?
Well, if you’re using this SSD pool for iSCSI (we don’t know that either), raidz2 is an extremely poor choice, and that IS squarely a performance topic.
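For block storage the usual recommendation is striped mirrors rather than raidz2. Purely as an illustration (the pool and device names below are placeholders, not your actual layout):

```bash
# striped mirrors: two 2-way mirror vdevs striped together
zpool create ssdpool mirror /dev/sdb /dev/sdc mirror /dev/sdd /dev/sde
```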
Good recipe for losing your data, if not done properly.
The last sentence suggests drive passthrough from Proxmox rather than passing through the controllers.
Looking at the reporting graphs, I noticed one of my CPU cores running at 100% iowait once I started writing data to the pool. Scrolling down further to the individual drives, I found one drive standing out:
After physically unplugging this SSD, write speeds immediately returned to normal.
I’ve tried swapping which ports the SSDs were connected to, but the problem persists as long as this particular SSD is connected. Guess I’ll be filing an RMA.
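I spotted it in the reporting graphs, but for anyone who hits this later, something along these lines from a shell should surface the same culprit (pool and device names below are placeholders):

```bash
# per-disk throughput and average latency for the pool; one drive with huge
# wait times while the others sit nearly idle is the usual giveaway
zpool iostat -vl ssdpool 5

# extended per-device stats; watch %util and await on each SSD
iostat -x 5

# check the suspect drive's SMART data before filing the RMA
smartctl -a /dev/sdX
```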
It doesn’t sound like you have a correct configuration for running TrueNAS under Proxmox. You need to pass through the controllers rather than just the individual disks, and you have to blacklist the controller’s driver on the host.
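On the Proxmox host that looks roughly like this; the vendor:device ID, PCI address and VM ID below are examples for a SAS3008-based HBA like the 9300-16i, so check yours with lspci first:

```bash
# find the HBA's PCI address and vendor:device ID
lspci -nn | grep -i -e lsi -e sas

# bind the controller to vfio-pci and keep the host's mpt3sas driver off it
echo "options vfio-pci ids=1000:0097" > /etc/modprobe.d/vfio.conf
echo "blacklist mpt3sas" > /etc/modprobe.d/blacklist-hba.conf
update-initramfs -u -k all   # then reboot the host

# hand the whole controller to the TrueNAS VM
qm set 100 -hostpci0 0000:01:00.0
```

This assumes IOMMU is already enabled on the host (e.g. intel_iommu=on or amd_iommu=on on the kernel command line).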
I’d be happy to provide additional details, but unfortunately you haven’t told me what you want to know.
There’s a reason I said in my original post that it wasn’t important. There is no production data on this pool, since a 300KB/s write speed is unusable. I’ve simply tried various configurations (to no avail, which is why I’m posting here) and have not changed it since.
Regarding the protocol used to access it: the problem occurred regardless of which protocol I used. I’ve tried iSCSI, SMB and NFS, and all of them ground to a halt after a few minutes.
Okay. What constitutes “doing it properly”?
I did originally pass through the individual drives, as the IOMMU groups made it impossible to pass through the controllers without bricking the host, but I’ve since worked that out and am now passing through the controllers.
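For anyone fighting the same thing, a generic way to see how the groups fall out on the Proxmox host (plain sysfs, nothing Proxmox-specific) is something like:

```bash
# list every PCI device by IOMMU group; the HBA should sit in its own group
# (or only alongside its own functions) for a clean passthrough
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done
```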