Deleting a pool in a production system to reformat from RAIDZ2 to mirrors. What could go wrong?

We are a small film post-production house running two Supermicro servers (32-core Xeon, 512 GiB RAM, 10G Ethernet, 24 spinning HDDs), one on TrueNAS Core and one on Scale. The Scale server is our work server with active projects, and the Core server is for backup and archive. The work dataset on the Scale server is backed up onto the Core server daily via replication.

While the Scale server delivers OK performance, mostly saturating the 10G network with single video files (e.g. ProRes 4444), it struggles with image sequences, which we have to deal with often. After much research and tuning we believe this is inherent to our setup of 3 vdevs of 8 disks each in RAIDZ2, since for random I/O that only gives us roughly the IOPS of 3 single disks, which is just not enough.

The plan is to reformat the pool as 12 mirrored vdevs to increase performance.

My checklist (a rough CLI sketch of the equivalent steps follows the list):

  • Make sure the main work dataset has been successfully replicated
  • Replicate other datasets (IX-Applications, etc.) to the Core server
  • Stop all apps
  • Stop all sharing
  • Export / Disconnect Pool. Select: Destroy data. Uncheck: Delete Saved configuration
  • Reformat Pool as mirrors
  • Restore datasets from Core server
  • Start sharing
  • Start apps
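
For anyone following along, here is a rough CLI sketch of what those steps amount to under the hood. It is only a sketch under assumptions: the work pool is called `tank`, the main work dataset is `tank/work`, the backup target on the Core server is `backup/tank-migration`, `core-server` is the backup host, and the disk device names are placeholders. In practice you would use the GUI export dialog and replication tasks rather than these raw commands.

```sh
# Final recursive snapshot of the work pool and a last replication pass
zfs snapshot -r tank@migration
zfs send -R tank@migration | ssh core-server zfs recv -u backup/tank-migration

# Verify the snapshots actually arrived before destroying anything
ssh core-server zfs list -t snapshot -r backup/tank-migration

# Destroy the old pool (GUI: Export/Disconnect with "Destroy data" selected)
zpool destroy tank

# Recreate the pool as 12 two-way mirrors (device names are placeholders)
zpool create -o ashift=12 tank \
  mirror da0 da1   mirror da2 da3   mirror da4 da5   mirror da6 da7 \
  mirror da8 da9   mirror da10 da11 mirror da12 da13 mirror da14 da15 \
  mirror da16 da17 mirror da18 da19 mirror da20 da21 mirror da22 da23

# Pull the work dataset back from the Core server
ssh core-server zfs send -R backup/tank-migration/work@migration | zfs recv -u tank/work
```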

Did I miss something? Am I overlooking any potentially catastrophic mistakes? My main worry is that I somehow mess up the backup and lose all the data in the process.

Any hints welcome.

I would suggest maybe stopping apps and sharing prior to replicating the datasets. I tend to do this when moving datasets between servers.


My suggestion, for comment by others: if you have one or more spare drive slots, set up a new pool for any containers (you mention IX-Applications, so I assume you have containers running) and then migrate them to that pool. I believe (but have not tried) that this is possible.
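
A minimal sketch of what that migration might look like, assuming a spare pair of devices, an apps pool named `apps`, and the SCALE apps dataset living at `tank/ix-applications` (all of these names are placeholders; the SCALE GUI also lets you re-point apps at a different pool, which may be the safer route):

```sh
# Create a small dedicated pool for apps/containers (placeholder device names)
zpool create apps mirror nvme0n1 nvme1n1

# With the apps stopped, snapshot and copy the apps dataset over
zfs snapshot -r tank/ix-applications@move
zfs send -R tank/ix-applications@move | zfs recv -u apps/ix-applications
```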

Consider SSDs rather than HDDs: you would get far more IOPS (but probably less space).


Thank you, that’s actually a good suggestion. I don’t have any spare slots, but I have 2 TB of NVMe on a PCIe card (we used it as L2ARC, but that turned out to be counterproductive). I will try to put all the small datasets for apps and VMs on that and keep them running through the migration of the main pool.

And yes, we are considering upgrading to an all-NVMe server, but we wanted to see first whether we can get more performance out of spinning disks.


That’s for sure.

Do note that for maximum performance you want your working pool to stay below 50% of space utilization; if you work with large media files, be sure to set the dataset’s recordsize to at least 1M.
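
For example (a minimal sketch; `tank/work` is a placeholder for your media dataset, and recordsize only affects newly written data):

```sh
# Large records suit big sequential media files; applies only to data written after the change
zfs set recordsize=1M tank/work

# Keep an eye on pool capacity and free-space fragmentation
zpool list -o name,size,allocated,free,capacity,fragmentation tank
```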


The 50% guideline is for block storage, based on frequent rewrites of small blocks. It should not apply to SMB/NFS sharing of large media files.

I believe it is general advice to address free-space fragmentation, which is especially important for block storage. Am I wrong?

Set the datasets back to not read-only after the restore.
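
A minimal sketch of that, assuming the restored datasets live under `tank` (dataset names are placeholders; replicated datasets are often received with `readonly=on`):

```sh
# Check which of the restored datasets came back read-only
zfs get -r readonly tank

# Flip the work dataset back to writable
zfs set readonly=off tank/work
```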

Manually back up your app configs (i.e. screenshots).

We’re in a similar situation and upgraded from RAIDZ2 to mirrors to saturate 10 Gbit.

And the future plan is an all-NVMe build 🙂

Hi Everyone
Just a quick update: the reformat and restore of the main pool went flawlessly, and the few apps and VMs are working again. The restore of the 77 TB pool took around 3 days and went through without any hiccups.
But performance is not great. I have a weird issue where read speed seems to be capped at around 250 MB/s, while writes are above 1 GB/s as expected.
I will probably start another thread about that problem.
Thanks for everyone’s advice.