Recommended way to migrate my main pool from RAID1 (2 disks) to RAIDZ1 (4 disks)

Does it matter if the config data is saved or not?

Logically, it seems to me I can delete the data AND the config data so I can check DELETE on BOTH when I do the Export/Disconnect.

I’m not going to import the pool after the backup. I’m creating a brand new Z1 pool with 4 disks: 2 existing and 2 new.

Whether those are 4 clean disks or dirty disks shouldn’t matter, it should work either way.

So why would you recommend keeping the configs? Wouldn't that just confuse TrueNAS when I try to create a new pool with the 4 disks, 2 of which have config info on them?

If you keep the configs, name your new pool the same as the old pool, and restore the datasets, then your shares, snapshot tasks, etc., and configured services will just come back by themselves.

Oh, so the configs are associated with the pool name itself and not stored on any of the disks in the pool?

If the answer is yes, then your answer makes perfect sense.

Yes, they’re stored in your configuration, not on the pool, and as far as I can tell, they’re just dumb paths.


Final (?) question. I was going to use RAIDZ1 but dRAID1 looks more appropriate.

You trade off ~50% of capacity, but everything else is better (performance, rebuild time, …).

So if you have large disks (mine are 12 TB), this seems appropriate for me.

Am I missing anything (e.g., is there something I should know that was not in the tooltip)?

dRAID only makes sense if you have multiple raidz# vdevs with hot spares, for more than 20 drives in total. And flexibility is even worse than raidz#.


Well, it almost worked!

But I lost the ability to run apps.

I tried rebooting and no app service is running.

Relevant screens are:

And kubectl cluster-info said it couldn't connect to the server on port 6443.

I've attached to this message the error output from running k3s server.

Note: you can’t upload a log file so you have to give it a .sh extension.
k3s.sh (38.0 KB)

Output of k3s server ends with:

E0425 03:13:06.366461   52694 kubelet.go:1466] "Failed to start ContainerManager" err="invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"

Help appreciated. I'm sooo close to getting this working again.

So I'm stuck. It says "Error in Apps Service" and "Application(s) have failed to start:
Command ('mount', '-t', 'zfs', 'main/ix-applications/k3s/kubelet', '/var/lib/kubelet') returned non-zero exit status 1."

Doing this manually fails:

root@truenas[~stk]# mount -t zfs main/ix-applications/k3s/kubelet /var/lib/kubelet
filesystem 'main/ix-applications/k3s/kubelet' cannot be mounted using 'mount'.
Use 'zfs set mountpoint=legacy' or 'zfs mount main/ix-applications/k3s/kubelet'.
See zfs(8) for more information.
root@truenas[~stk]#    
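
For what it's worth, the zfs error above spells out the two candidate workarounds itself. As a sketch only (the linked fix below is what I actually followed), restoring the legacy mountpoint is what lets a plain mount -t zfs call succeed:

zfs set mountpoint=legacy main/ix-applications/k3s/kubelet
mount -t zfs main/ix-applications/k3s/kubelet /var/lib/kubelet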

The solution to the apps not starting problem is described in this post. So that worked like a charm.
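
Separately, for the earlier k3s server failure: judging purely by the kubelet error quoted above ("invalid kernel flag: vm/overcommit_memory, expected value: 1, actual value: 0"), the flag it complains about is an ordinary sysctl. I'm only inferring this from the error message, not from the linked post, but a quick manual test looks like this (it can be persisted as a Sysctl tunable under System > Advanced if your version exposes that):

sysctl -w vm.overcommit_memory=1    # kubelet expects this to be 1
sysctl vm.overcommit_memory         # verify the new value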


Here is the final process with all the details and gotchas!

Be sure to follow all the steps, especially moving the system dataset before you export the main pool!

  1. I decided to use dRAID1 instead of RAIDZ1 due to higher read performance, slightly slower writes, but way faster rebuilds. My disks are huge, so a Z1 rebuild would take a long time. The price I pay is giving up 50% of the capacity, so I get around 21.68 TiB from my four 12 TB disks (see the capacity note after this list).

  2. Add the NVMe card to the slot near the 4 Ethernet ports and the Chelsio 10G Ethernet card to the Supermicro slot opposite the HBA. Put a 1G Ethernet cable into the IPMI port. Route the 10G SFP+ DAC cable to the UDM Pro 10G port. Note: the Supermicro documentation is terrible. To remove the riser, you have to lift the two black tabs in the back, and then you can pull straight up on the riser. There are NO screws you need to remove. So now 10G is the only way in/out of my system. Note: unluckily for me, changing Ethernet cards like this can cause the Apps not Starting bug later, but I found a fix for that.

  3. Configure the NVMe disk in TrueNAS as a new pool named SSD. I used this to back up my existing data (which is modest at this point).

  4. Insert the 2 new disks into the Supermicro, but don't create a pool yet. Make sure they are recognized. They were. Flawless. To pre-test the newly added disks, see: Hard Drive Burn-in Testing | TrueNAS Community. Just make sure you are NOT specifying the old disks… WARNING: the system assigns different drive letters on EVERY boot.

  5. Save the TrueNAS config (just to be safe). Always good to checkpoint things.

  6. This is a good time to stop all the apps and then unset the app pool on the Apps page. This is going to make it easier when you detach the pool later.

  7. Make sure there is a periodic snapshot task that snaps main recursively. You can make the frequency once a week, for example. You need this for the next step. I used the custom setting for timing so that the first one would run 5 minutes after I created the task. This makes the next step easier since I will have an up-to-date snapshot. You must check "Allow taking empty snapshots." The reason you need to check this is that the full filesystem replication makes sure every snapshot is there. If you recently snapshotted a dataset which didn't change, an empty snapshot would otherwise be skipped, the sanity check would fail, and you wouldn't be able to replicate the filesystem.

  8. Use the GUI to copy everything in the main pool to the SSD pool using the ADVANCED REPLICATION button in Data Protection > Replication Tasks. Transport is LOCAL. "Full Filesystem Replication" is checked. Source and destination are the pool names: main and SSD. Read-only policy is set to IGNORE. You do NOT want to set this to READ-ONLY. Replication schedule: leave at Automatically. Pick the periodic snapshot task you just created. Set retention to be the same as the source. Leave everything else at the default. Automatically means the replication job will start right after the snapshot job above finishes, so less work for you. (A rough CLI equivalent is sketched after this list.)

  9. While it is doing a full filesystem replica, it will unmount the SSD pool, so if you ssh to the system you will find that /mnt/SSD is gone. If you look at the Datasets page and refresh, it will give you an error about not being able to load quotas from SSD. The pool is only unavailable while the full filesystem replica is running.

  10. If you click on the Running button, you can see the progress. You can also go to the Datasets page and expand the SSD pool and see that it is populating and that the sizes match up with the originals. You can also click on the Jobs icon in the upper right corner of the screen (left of the bells); that screen updates every 30 seconds to show you the total amount transferred.

  11. Migrate the system dataset from the main pool to the SSD pool (so the system dataset survives when I disconnect the pool) using System > Advanced > System Dataset, then select the SSD pool and save.

  12. Use GUI (Storage>Export/Disconnect) to disconnect MAIN pool (do not disconnect the SSD pool). If you get a warning like “This pool contains the system dataset that stores critical data like debugging core files, encryption keys for pools…” then you picked the wrong pool. Select to delete the data but do NOT select “delete the config information.” The config info is associated with the pool name, not the pool. You will get a warning with a list of all the services that will be “disrupted.” That’s to be expected.

  13. You will need to type “main” (i.e., whatever your pool name is) below the checkboxes to confirm the destroy and disconnect… Make sure the middle selection is NOT checked. You want it just like above.

  14. The export will take a little while.

  15. When I first tried it, I got: "[EFAULT] cannot unmount '/mnt/main/user': pool or dataset is busy" and "libzfs.ZFSException: cannot unmount '/mnt/main/user': pool or dataset is busy".

  16. I tried it again and it said Kubernetes is still referencing main. So I did the same process a second time and it failed: [EFAULT] cannot unmount '/mnt/main/user': pool or dataset is busy. This is because I was ssh'ed into the TrueNAS system and sitting in my home directory (see the note after this list on finding what's holding a dataset busy).

  17. So I killed my ssh session and tried the export a third time. Remember to adjust the options as above each time!! The system will NOT default to this (it defaults to the last 2 options).

  18. It worked! “Successfully exported/disconnected main. All data on that pool was destroyed.”

  19. Storage now shows I have 4 unassigned disks!!! Perfecto!

  20. Now click that "Add to pool" button!

  21. Use the GUI ("New Pool") to create a brand new main pool with all four disks and the same name as my original main pool. Select a Data > Layout of dRAID1, since I have large 12 TB disk drives and I want performance. Use the exact same name as before (including correct case). Data devices should be 2 (which uses 3 disks), hot spares = 1, and so children = 4, vdevs = 1. It will show in the summary as: "Data: 1 × DRAID1 | 4 × 10.91 TiB (HDD)". Then it says: "In order for dRAID to overweight its benefits over RaidZ the minimum recommended number of disks per dRAID vdev is 10." But that's not what I learned. (A zpool-level sketch of this layout is after this list.)

  22. Some people believe "dRAID only makes sense if you have multiple raidz# vdevs with hot spares, for more than 20 drives in total. And flexibility is even worse than raidz#." This is a topic for another thread.

  23. At this point, look at your SMB shares. They are preserved!!! So are the snapshot tasks, replication tasks, etc. on the Data Protection page. So looking great!!! It's all downhill from here!

  24. Just for safety, let’s export the config at this point. (at 12:25am on 4/25)

  25. This is the time to set up a periodic snapshot of the SSD pool. Do the snapshot process we did above, just like before. This sets you up for transferring the data that was originally on main back to the main pool. So: recursive, custom start time (weekly, starting in 15 minutes) to give us time to create the Replication Task that will start automatically after the full recursive SSD snapshot.

  26. Use the GUI again to copy the data from the SSD pool to the new main pool. Same steps as before, except we reverse the source and destination and use the SSD snapshots. I just loaded the previous replication task and started with that, but when I went to Advanced, it blanked everything out! So I had to re-enter everything from scratch!

  27. It failed this time. I got the message: middlewared.service_exception.ValidationErrors: [EINVAL] replication_create.source_datasets: Item#0 is not valid per list types: [dataset] Empty value not allowed, which sounds like it thinks I'm trying to replicate from an empty source. I'm going to bring up the menu again, but this time I'm not going to try to load from the OLD job; I'll enter everything from scratch. I think this is a bug in retrieving the old replication task. So bring up the dialog, go straight to Advanced, and fill it out from scratch. Do not try to edit the old job.

  28. I was right!!! Whew, what a relief. This is just a bug which I’ve reported.

  29. Hit reload to see your SSD to main Replication Task. Its status will be pending. It will run right after the SSD recursive snapshot is done. Hit reload and the periodic snapshot task will be "Finished" and the "SSD to main" replication task will be running.

  30. You can monitor progress as before.

  31. After the replication finishes, it's time to move the system dataset to the new main pool in the GUI, using the same process as above.

  32. It says "Apps Service Stopped". I need to restart the Apps service, but I'm not sure how; it isn't in System > Services. No clue. Hopefully a reboot will fix it.

  33. It wasn’t serving any SMB shares (the switch was flipped to off for all the shares) so I had to flip all the shares to enabled, and that did the trick!

  34. Done.

  35. Now disable that replication task (from SSD → main), and possibly the main → SSD one if you want to put the SSD to a different use.

  36. Reboot just to make sure there are no issues. There shouldn’t be, but I’d rather find out now than later.

  37. Reboot didn’t fix k3s not starting. I covered the fix for Apps not starting here.

  38. Take a manual full system recursive snapshot of your new pool and save the system config.

  39. Have a beer. You did it!
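
A CLI footnote on the replication in steps 8 and 26, for readers who like to see roughly what the GUI is doing. Under the hood it amounts to a recursive snapshot plus a replication-stream send; the snapshot name below is made up, and the GUI task is still the safer route since it handles retention and incrementals for you:

zfs snapshot -r main@migrate-1                   # recursive snapshot of the whole pool
zfs send -R main@migrate-1 | zfs recv -F SSD     # replicate every dataset and snapshot into SSD

Step 26 is the same thing with source and destination swapped (zfs send -R SSD@... | zfs recv -F main).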
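
On the dRAID layout from step 21: the GUI choices (1 parity, 2 data, 1 distributed spare, 4 children) map onto a zpool vdev spec like the one below. The device names are placeholders and the GUI builds this for you, so this is only to show how the knobs relate. It also explains the capacity from step 1: you keep two data disks' worth, 2 × 12 TB ≈ 24 TB ≈ 21.8 TiB, which lands near the reported 21.68 TiB once overhead is taken out.

zpool create main draid1:2d:4c:1s sdb sdc sdd sde    # draid<parity>:<data>d:<children>c:<spares>s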
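
On the "pool or dataset is busy" errors from steps 15-17: anything with an open file or a working directory under the pool (an ssh session parked in a home directory, a share, an app) will block the unmount. If the tools are present on your install, either of these will show the culprit before you retry the export (the path comes straight from the error message):

fuser -vm /mnt/main/user    # processes with open files or a cwd on that dataset
lsof /mnt/main/user         # alternative view of open files there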

In a few days, check SMART errors on the new disks just to be safe, using the GUI in Storage > Disks. But beware that the OS will not name the disks in the same order as they sit in the server. Use lsblk before you put the disks into a pool to see which disks are in use and which are the two new ones. In my case, the new disks were sdb and sdc. You can also tell because if you look at the SMART test results, there will be none for the new disks (though I think it's a bug that I'm not seeing any SMART results)! Sadly, manual SMART tests fail because SMART isn't enabled, even though the GUI says it is. Reported as a bug.

It seems the GUI is broken when it comes to enabling SMART. Go to the CLI and type:
smartctl -s on /dev/sda
and do that for each disk. Then go to the GUI to start the tests. You can close the window after you start the test.
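
If you would rather do the whole SMART check from the shell, serial numbers are the reliable way to tell the new disks apart (the device letters move around between boots), and smartctl can run and report the tests directly. The device name below is just an example:

lsblk -d -o NAME,SIZE,SERIAL,MODEL    # match serial numbers to the drive labels
smartctl -s on /dev/sdb               # enable SMART (repeat per new disk)
smartctl -t long /dev/sdb             # start an extended self-test
smartctl -a /dev/sdb                  # later: review attributes and test results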

That is, almost certainly, a terrible idea. dRAID has significant gotchas, most notably no support for smaller allocations, which means a lot of additional wasted space.
In terms of resilver times, they're not going to be meaningfully faster with just four drives, regardless of how large the disks are.


Thanks. Easy enough to fix now. I can just redo the process.

It would be nice if there were performance numbers posted somewhere.

If performance and fast resilvering are arguments for you and you are willing to trade off 50% of capacity for redundancy, striped mirrors are an option as well.
They are not as resilient as RAIDZ2 though, i.e. two drives may fail, but only one drive failure is guaranteed not to kill the pool.


“Faster resilver” comes from pre-filling the “spares” (actually: using all drives). Without any spare in the mix, dRAID will resilver just like raidz.


Thanks for this sir!! Going to try this very soon.

A member from the Old World. :open_mouth:

:raised_hands:


Also a question: I have around 30 TB to send to a new pool. I am going to do this with the system still running; once the initial send is complete, what do I need to do to send another snapshot to capture all the changes? I have read that the -I switch with zfs send in the CLI is a good option, but if I want to stick with the GUI, is the way to do this to use a scheduled snapshot and replication task?

I assume if you do so, you'd set the schedule to a long period for the initial send and then adjust the timing after the bulk send is complete? Then I would shut down all the services, do a final send, and then follow the above steps?

Just waiting for my 4 new 20 TB HDDs to finish their burn-in, so I'm trying to prep as much as possible.

Thank you,

Drop the “scheduled” part. You can also initiate a one-off replication in the GUI.

I see that - just want to make sure I understand that the GUI will be able to do an incremental as well. Does it just sort that out if you re-run the same 1-off replication recursively?

Burning in big drives is painfully sloooooooooooooooowwwwwwwwww

One-off or scheduled, replication is based on snapshots. As long as the replication task uses the right set of snapshots, based on snapshot naming convention, it will sort out the incremental part.
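
For anyone who does want the CLI picture behind that: an incremental follow-up is just a send between two snapshots of the same source, which is what the GUI task arranges through its snapshot naming. A rough sketch with made-up pool and snapshot names:

zfs snapshot -r tank@migrate-2                                        # take the "final" snapshot after stopping services
zfs send -R -I tank@migrate-1 tank@migrate-2 | zfs recv -F newpool    # send only the changes since migrate-1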
