Migration from CORE to SCALE "failed", now running CORE with pool status "30"

I tried to migrate from CORE 13.0-U6.2 to SCALE 24.04.2 but ran into several issues:

Hardware:
X11SCL-F, small Xeon, 2x 32 GB DDR4-2666 ECC RAM, 16 GB M.2 SSD, 4x 12 TB drives (raidz2), Intel X520 NIC with two 10 Gbit/s SFP+ ports.

Setup:
The NAS is connected to the local network via two fibre connections (ix0 & ix1), link aggregated to lagg0 with a local address 10… .
One onboard NIC has another local address (192.168…), the other onboard NIC is not in use.
In System → General, the Web Interface IPv4 Address is bound to that 10… address (lagg0).
The NAS was built some years ago, starting with CORE 10.x.
A second system for replication is not ready yet.

Preparation:
I saved the config and wrote down all the configurations before attempting to upgrade.
While doing this I saw all the old boot environments in System → Boot but left everything unchanged.

First attempt:
The upgrade to SCALE was not successful because of insufficient space on the SSD.
The upgrade attempt must have deleted older boot environments: when I looked into System → Boot, almost everything was already gone, which is OK. Deleting older boot environments would have been my next step anyway.

Second attempt:
The upgrade was “successful”. The machine booted but I was unable to reach it.
In the remote console of the BMC everything looked OK: it showed “the web user interface is at: …” and the login prompt.
So I walked to the machine but was unable to connect to the UI in any way.
The SFP+ ports on the switch didn’t even show a link anymore. Obviously SCALE didn’t use the NIC in the way it was configured before for whatever reason.
The 10… address was not reachable, but due to the configuration it was the only one that would answer HTTP(S) requests.
I tried to log in on the remote console but it doesn’t support copy/paste to enter the password. Also, the password is too long and some of the special characters can’t even be entered there.
Maybe the login would work locally with keyboard and monitor attached but I don’t have a monitor with a VGA port anymore.
I thought I had to reinstall CORE from scratch but thankfully the boot environment of CORE was still there so I gave it a shot.
Booting into CORE worked after some time; it rebooted automatically at some point along the way.
After CORE was running again I had two Alerts (both the same):

Besides these alerts everything seemed to work fine and the machine is back online.
I changed the config so the web interface will now be available on all interfaces (0.0.0.0) and after a reboot I tried the upgrade again with almost the same(?) error:

zpool status:

Is PoolStatus “30” a serious problem? As mentioned above the machine is working without problems. Maybe this status is somewhat normal for a SCALE system and just unknown for CORE.
The next step could be a fresh install of SCALE and restoring the CORE configuration. Can this work?

The whole pool is encrypted and the secret seed is included in the config (I know. I wouldn’t do that again and would encrypt individual datasets instead of everything).

It seems like networking is the biggest issue… do you agree?

Better to have the webUI on the simple non-LAGG IP address. Then you can resolve the LAGG or NIC issues.

Exactly. Maybe something like this could be mentioned in the migration instructions?

Anyway: Can I just install a fresh SCALE system and load the configuration from CORE?

Your ZFS data is safe.
It is probably best to describe the critical configuration information.
Let’s see if anyone else has tested.
Are you on CORE 13.0-U6 now?

I’m on CORE 13.0-U6.2 right now.
The hardware is nothing special for TrueNAS environments + the Intel X520 NIC.
The system is connected to the local network but no gateway is configured, so there is no internet connection; I have always updated the system manually.
There are periodic SMART tests (long and short) and scrubs.
4 groups and 16 users are configured.
The system is running SMB shares for the users and some other purposes (e.g. file exchange). All clients back up their data to the corresponding SMB shares.
The SMB home feature is not used; the datasets and their ACLs are configured manually.
Most datasets are encrypted, some are not.
Snapshots of most datasets are created daily.
NFS and FTP services are also running: NFS for testing and FTP for one legacy scanner (will be replaced later).
There is a Plex plugin running for one device that is not used very often.
Every night a script copies the configuration and deletes older files.
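
For readers who want something similar, here is a minimal sketch of such a nightly copy-and-prune job in Python. It is not the script used on this NAS. The function name, the timestamped file naming, and the retention count are all assumptions made for illustration, and the paths have to be supplied by the caller.

```python
import shutil
import time
from pathlib import Path


def backup_config(source: Path, backup_dir: Path, keep: int = 14) -> Path:
    """Copy `source` into `backup_dir` under a timestamped name, then prune.

    Only the newest `keep` copies are retained. All names and the default
    retention count are illustrative assumptions.
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")

    backup_dir.mkdir(parents=True, exist_ok=True)

    # A YYYYMMDD-HHMMSS stamp sorts chronologically when sorted as text.
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = backup_dir / f"{source.stem}-{stamp}{source.suffix}"
    shutil.copy2(source, target)

    # Collect all copies of this file and delete everything but the newest.
    copies = sorted(backup_dir.glob(f"{source.stem}-*{source.suffix}"))
    for old in copies[:-keep]:
        old.unlink()

    return target
```

It could be called once per night from cron, for example as `backup_config(Path("/path/to/config.db"), Path("/mnt/pool/config-backups"), keep=30)`, where both paths are placeholders. Since the exported configuration here contains the secret seed, the backup directory should be protected accordingly.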

The configuration isn’t very complex and I wrote down what I configured and why. Worst case: If I can import the pool everything else can be done in less than one hour.

You are well organized.
I’d suggest running the UI from the simple IP interface and then sidegrading again.

Thank you!
The UI is already configured to work on all interfaces (0.0.0.0); before upgrading again I need to test this.
What do you mean by sidegrading? I already tried to upgrade again from the UI and got that error message because of the pool status.
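
Testing whether the UI really answers on every interface can be done from any client on the network. The following is only a rough sketch: it checks whether a TCP connection to the HTTP and HTTPS ports succeeds, which shows that something is listening there but not that the web UI itself works. The function names are made up for this example, and the addresses in the usage note are placeholders, not the real ones of this NAS.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_web_ui(addresses, ports=(80, 443), timeout: float = 3.0):
    """Map each (address, port) pair to whether it accepts TCP connections."""
    return {
        (addr, port): is_port_open(addr, port, timeout)
        for addr in addresses
        for port in ports
    }
```

A call such as `check_web_ui(["192.0.2.10", "192.0.2.20"])` (documentation-range placeholder addresses) returns a dictionary that shows at a glance which interface answers on which port.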

The HDD for the backup server will arrive next week so here is my plan:

  • test the HDD and backup the whole NAS
  • install a fresh SCALE system on a bigger SSD (16 GB → 512 GB)**
  • restore CORE configuration and import pool

** the bigger SSD is not necessary but it’s left over from my old laptop AND if something goes wrong I can switch back to the other SSD.

I will report if that works.
In any case I will build a new pool, because initially I created it encrypted and then created a mixed set of encrypted and non-encrypted sub-datasets underneath it. This is not recommended and not supported, so I will take this as a chance to change it.