Migration from CORE to SCALE "failed", now running CORE with pool status "30"

fosaq · September 14, 2024, 7:34am

I tried to migrate from CORE 13.0-U6.2 to SCALE 24.04.2 but ran into several issues:

Hardware:
X11SCL-F, small Xeon, 2x 32 GB DDR4-2666 ECC RAM, 16 GB M.2 SSD, 4x 12 TB drives (raidz2), Intel X520 NIC with two 10 Gbit/s SFP+ ports.

Setup:
The NAS is connected to the local network via two fibre connections (ix0 & ix1), aggregated into lagg0, which has the local address 10….
One onboard NIC has another local address (192.168…); the other onboard NIC is not in use.
In System → General, the Web Interface IPv4 Address is bound to that 10… address (lagg0).
The NAS was built some years ago, starting with CORE 10.x.
A second system for replication is not ready yet.

Preparation:
I saved the config and wrote down all the configurations before attempting to upgrade.
While doing this I saw all the old boot environments in System → Boot but left everything unchanged.

First attempt:
The upgrade to SCALE was not successful because of insufficient space on the SSD.
The upgrade attempt must have deleted older boot environments: when I looked into System → Boot, almost everything was already gone, which is OK. Deleting older boot environments would have been my next step anyway.

Second attempt:
The upgrade was “successful”. The machine booted, but I was unable to reach it.
In the remote console of the BMC everything looked OK: it showed “The web user interface is at: …” and the login prompt.
So I walked to the machine but was unable to connect to the UI in any way.
The SFP+ ports on the switch didn’t even show a link anymore. Obviously SCALE didn’t use the NIC the way it was configured before, for whatever reason.
The 10… address was not reachable, but due to the configuration it was the only one that would answer HTTP(S).
I tried to log in on the remote console, but it doesn’t support copy/paste for entering the password. Also, the password is too long and some of its special characters can’t even be entered there.
Maybe the login would work locally with a keyboard and monitor attached, but I don’t have a monitor with a VGA port anymore.
I thought I would have to reinstall CORE from scratch, but thankfully the CORE boot environment was still there, so I gave it a shot.
Booting into CORE eventually worked, after the machine had rebooted automatically at some point.
After CORE was running again I had two alerts (both the same):

Failed to check for alert ZpoolCapacity: concurrent.futures.process._RemoteTraceback: """
Traceback (most recent call last):
File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 111, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in query
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "libzfs.pyx", line 402, in libzfs.ZFS.__exit__
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in query
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in <listcomp>
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "libzfs.pyx", line 2489, in libzfs.ZFSPool.__getstate__
File "libzfs.pyx", line 2693, in libzfs.ZFSPool.healthy.__get__
File "libzfs.pyx", line 2675, in libzfs.ZFSPool.status_code.__get__
File "/usr/local/lib/python3.9/enum.py", line 384, in __call__
return cls.__new__(cls, value)
File "/usr/local/lib/python3.9/enum.py", line 702, in __new__
raise ve_exc
ValueError: 30 is not a valid PoolStatus
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/alert.py", line 740, in __run_source
alerts = (await alert_source.check()) or []
File "/usr/local/lib/python3.9/site-packages/middlewared/alert/source/zpool_capacity.py", line 48, in check
for pool in await self.middleware.call("zfs.pool.query"):
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
return await self._call(
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1248, in _call
return await self._call_worker(name, *prepared_call.args)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1254, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1173, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
ValueError: 30 is not a valid PoolStatus

Besides these alerts, everything seemed to work fine and the machine is back online.
I changed the config so the web interface is now available on all interfaces (0.0.0.0), and after a reboot I tried the upgrade again, which failed with almost the same(?) error:

Error: concurrent.futures.process._RemoteTraceback: """
Traceback (most recent call last):
File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
r = call_item.fn(*call_item.args, **call_item.kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 111, in main_worker
res = MIDDLEWARE._run(*call_args)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 45, in _run
return self._call(name, serviceobj, methodobj, args, job=job)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.9/site-packages/middlewared/worker.py", line 39, in _call
return methodobj(*params)
File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 985, in nf
return f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in query
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "libzfs.pyx", line 402, in libzfs.ZFS.__exit__
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in query
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/zfs.py", line 87, in <listcomp>
pools = [i.__getstate__(**state_kwargs) for i in zfs.pools]
File "libzfs.pyx", line 2489, in libzfs.ZFSPool.__getstate__
File "libzfs.pyx", line 2693, in libzfs.ZFSPool.healthy.__get__
File "libzfs.pyx", line 2675, in libzfs.ZFSPool.status_code.__get__
File "/usr/local/lib/python3.9/enum.py", line 384, in __call__
return cls.__new__(cls, value)
File "/usr/local/lib/python3.9/enum.py", line 702, in __new__
raise ve_exc
ValueError: 30 is not a valid PoolStatus
"""
The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
await self.future
File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 391, in __run_body
rv = await self.method(*([self] + args))
File "/usr/local/lib/python3.9/site-packages/middlewared/schema.py", line 981, in nf
return await f(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update.py", line 389, in file
await self.middleware.call('update.install_manual_impl', job, destfile, dest_extracted)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1283, in call
return await self._call(
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1251, in _call
return await self.run_in_executor(prepared_call.executor, methodobj, *prepared_call.args)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
result = self.fn(*self.args, **self.kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update/install_freebsd.py", line 66, in install_manual_impl
return self.install_scale(job, path)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update/install_freebsd.py", line 95, in install_scale
return self.middleware.call_sync(
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1310, in call_sync
return methodobj(*prepared_call.args)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update/install.py", line 24, in install_scale
self.middleware.call_sync("update.ensure_free_space", boot_pool_name, manifest["size"])
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1310, in call_sync
return methodobj(*prepared_call.args)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update/install.py", line 75, in ensure_free_space
space_left = self._space_left(pool_name)
File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/update/install.py", line 104, in _space_left
pool = self.middleware.call_sync("zfs.pool.query", [["name", "=", pool_name]], {"get": True})
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1303, in call_sync
return self.run_coroutine(self._call_worker(name, *prepared_call.args))
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1339, in run_coroutine
return fut.result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
return self.__get_result()
File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
raise self._exception
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1254, in _call_worker
return await self.run_in_proc(main_worker, name, args, job)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1173, in run_in_proc
return await self.run_in_executor(self.__procpool, method, *args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1156, in run_in_executor
return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
ValueError: 30 is not a valid PoolStatus

zpool status:

pool: boot-pool
state: ONLINE
status: This pool has a compatibility list specified, but it could not be
read/parsed at this time. The pool can still be used, but this
should be investigated.
action: Check the value of the 'compatibility' property against the
appropriate file in /usr/local/etc/zfs/compatibility.d or /usr/local/share/zfs/compatibility.d.
scan: scrub repaired 0B in 00:00:13 with 0 errors on Wed Sep 11 03:45:13 2024
config:

NAME        STATE     READ WRITE CKSUM
boot-pool   ONLINE       0     0     0
  nvd0p2    ONLINE       0     0     0

errors: No known data errors

pool: trunk
state: ONLINE
scan: scrub repaired 0B in 05:15:03 with 0 errors on Thu Sep 12 06:15:03 2024
config:

NAME                                            STATE     READ WRITE CKSUM
trunk                                           ONLINE       0     0     0
  raidz2-0                                      ONLINE       0     0     0
    gptid/e8945f71-e32b-11ed-a375-90e2ba9d2798  ONLINE       0     0     0
    gptid/59fa1cd4-0078-11ed-bd1f-90e2ba9d2798  ONLINE       0     0     0
    gptid/c08febc6-0552-11ed-bd1f-90e2ba9d2798  ONLINE       0     0     0
    gptid/d0e969e4-101b-11ed-bd1f-90e2ba9d2798  ONLINE       0     0     0

errors: No known data errors
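The boot-pool status above asks for the `compatibility` property to be checked against files in two directories. A minimal Python sketch of that check follows, assuming the property value is read with `zpool get -H -o value compatibility boot-pool` and that the two directories named in the `action:` text are the only search locations; the function names are hypothetical. Whether this warning is connected to the PoolStatus error is not confirmed in this thread.

```python
import subprocess
from pathlib import Path

# Search locations taken from the `action:` text of `zpool status` on CORE.
COMPAT_DIRS = (
    "/usr/local/etc/zfs/compatibility.d",
    "/usr/local/share/zfs/compatibility.d",
)


def read_compat_property(pool):
    """Return the raw 'compatibility' property of `pool` (runs `zpool get`)."""
    out = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "compatibility", pool],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()


def missing_compat_files(value, dirs=COMPAT_DIRS):
    """Return the names listed in `value` that exist as a file in none of `dirs`."""
    value = value.strip()
    # "off" and "legacy" are keywords, not file names.
    if value in ("", "-", "off", "legacy"):
        return []
    names = [part.strip() for part in value.split(",") if part.strip()]
    return [n for n in names if not any(Path(d, n).is_file() for d in dirs)]
```

On the NAS, `missing_compat_files(read_compat_property("boot-pool"))` would list any referenced file that cannot be found; an empty list would instead point to a file that exists but cannot be parsed.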

Is PoolStatus “30” a serious problem? As mentioned above, the machine is working without problems. Maybe this status is somewhat normal for a SCALE system and just unknown to CORE.
The next step could be a fresh install of SCALE and restoring the CORE configuration. Can this work?
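Both tracebacks end in Python's `enum` module: libzfs reads an integer status code for a pool, and converting it to the `PoolStatus` enum raises `ValueError` when no member has that value. The second traceback also shows why the upgrade stops: `update.ensure_free_space` calls `zfs.pool.query`, which performs the same conversion. The sketch below reproduces the mechanism with a hypothetical, abbreviated member list (not the real binding's values) and a tolerant variant built on the `_missing_` hook; it says nothing about what code 30 actually means.

```python
import enum


class PoolStatus(enum.IntEnum):
    # Hypothetical, abbreviated members for illustration only.
    OK = 0
    RESILVERING = 1


class TolerantPoolStatus(enum.IntEnum):
    OK = 0
    RESILVERING = 1
    UNKNOWN = -1

    @classmethod
    def _missing_(cls, value):
        # Enum calls this hook when no member matches `value`;
        # returning a member replaces the ValueError.
        return cls.UNKNOWN


def strict_lookup(code):
    """Return the member name, or the ValueError message for unknown codes."""
    try:
        return PoolStatus(code).name
    except ValueError as exc:
        return str(exc)


print(strict_lookup(0))             # OK
print(strict_lookup(30))            # 30 is not a valid PoolStatus
print(TolerantPoolStatus(30).name)  # UNKNOWN
```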

The whole pool is encrypted and the secret seed is included in the config (I know. I wouldn’t do that again; I would encrypt single datasets instead of just everything).

Captain_Morgan · September 14, 2024, 10:58am

It seems like networking is the biggest issue… do you agree?

Better to have the webUI on the simple non-LAGG IP address. Then you can resolve the LAGG or NIC issues.

fosaq · September 14, 2024, 5:48pm

Exactly. Maybe something like this could be mentioned in the migration instructions?

Anyway: Can I just install a fresh SCALE system and load the configuration from CORE?

Captain_Morgan · September 15, 2024, 12:51am

Your ZFS data is safe.
It is probably best to describe the critical configuration information.
Let’s see if anyone else has tested.
Are you on CORE 13.0-U6 now?

fosaq · September 15, 2024, 7:33am

I’m on CORE 13.0-U6.2 right now.
The hardware is nothing special for TrueNAS environments, apart from the Intel X520 NIC.
The system is connected to the local network, but no gateway is configured, so there is no internet connection; I have always updated the system manually.
Periodic SMART tests (long and short) and scrubs are scheduled.
4 groups and 16 users are configured.
The system runs SMB shares for the users and for some other purposes (e.g. file exchange). All clients back up their data to the corresponding SMB shares.
The SMB home feature is not used; the datasets and their ACLs are configured manually.
Most datasets are encrypted, some are not.
Snapshots of most datasets are created daily.
NFS and FTP services are also running: NFS for testing and FTP for one legacy scanner (will be replaced later).
There is a Plex plugin running for one device that is not used very often.
Every night a script copies the configuration and deletes older files.
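A nightly job like this could look roughly like the following Python sketch. The source path (on CORE the configuration database is usually `/data/freenas-v1.db`), the destination directory and the retention count are assumptions; the actual script from this thread is not shown.

```python
import shutil
import time
from pathlib import Path


def backup_config(src, dest_dir, keep=30, now=None):
    """Copy `src` to `dest_dir` with a timestamp; keep only the newest `keep` copies."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    src, dest_dir = Path(src), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    # time.localtime(None) means "now"; `now` exists so the function is testable.
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    target = dest_dir / f"{src.stem}-{stamp}{src.suffix}"
    shutil.copy2(src, target)

    # Zero-padded timestamps sort chronologically, so everything except the
    # last `keep` entries is older and gets deleted.
    copies = sorted(dest_dir.glob(f"{src.stem}-*{src.suffix}"))
    for old in copies[:-keep]:
        old.unlink()
    return target
```

Called from a cron job as, for example, `backup_config("/data/freenas-v1.db", "/mnt/trunk/backup/config", keep=30)`, where the dataset path is hypothetical.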

The configuration isn’t very complex and I wrote down what I configured and why. Worst case: if I can import the pool, everything else can be done in less than one hour.

Captain_Morgan · September 15, 2024, 7:37pm

You are well organized.
I’d suggest getting the UI working on the simple IP interface and then sidegrading again.

fosaq · September 16, 2024, 3:57am

Thank you!
The UI is already configured to work on all interfaces (0.0.0.0); before upgrading again I need to test this.
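A quick way to run that test from another machine on the LAN: a minimal Python sketch that reports which TCP ports accept a connection on a given address. Ports 80 and 443 assume the default HTTP/HTTPS ports for the web UI, and the function name is hypothetical.

```python
import socket


def reachable(host, ports=(80, 443), timeout=2.0):
    """Return {port: bool} telling which TCP ports on `host` accept a connection."""
    result = {}
    for port in ports:
        try:
            # A completed TCP handshake is enough; nothing is sent.
            with socket.create_connection((host, port), timeout=timeout):
                result[port] = True
        except OSError:  # refused, timed out, or no route to host
            result[port] = False
    return result
```

Running it against each NAS address (for example `reachable("192.0.2.10")`, a placeholder rather than the real 10… or 192.168… address) before and after the change shows whether the UI really listens on every interface.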
What do you mean by sidegrading? I already tried to upgrade again from the UI and got that error message because of the pool status.

fosaq · September 20, 2024, 5:09am

The HDD for the backup server will arrive next week so here is my plan:

test the HDD and back up the whole NAS
install a fresh SCALE system on a bigger SSD (16 GB → 512 GB)**
restore CORE configuration and import pool

** the bigger SSD is not necessary, but it’s left over from my old laptop, AND if something goes wrong I can switch back to the other SSD.

I will report if that works.
Either way, I will build a new pool, because initially I created it encrypted and then created a mixed set of encrypted and non-encrypted sub-datasets underneath it. This is not recommended and not supported, so I take this as a chance to change it.

fosaq · November 19, 2024, 7:54am

It took quite long to do all this because:

it took several attempts to complete the backup, and I backed up everything twice
the SSDs didn’t work, so I ordered another one; now an Intel EX900 is running
some settings were removed in SCALE, e.g. the auxiliary parameters of the SMB service

I like the UI of SCALE, the way the dashboard can be customized, and other little things that feel smoother and faster now.
What I didn’t expect was that the aux parameters of SMB shares and of the service had been removed. With the command

midclt call sharing.smb.update

it is possible to add aux parameters to shares. I use the parameter “force user” on some shares, and this works for me.
As a replacement for the SMB service aux parameter “map to guest”, I just added a guest user.
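The `midclt` call needs a share ID and a JSON payload. A minimal Python sketch that builds the argument list, assuming that share-level auxiliary parameters live in an `auxsmbconf` string field with one smb.conf line per row; the share ID 3 and the user name `scanner` are hypothetical. Share IDs can be listed with `midclt call sharing.smb.query`.

```python
import json
import shlex


def smb_aux_update_args(share_id, aux_lines):
    """Build argv for `midclt call sharing.smb.update` setting auxiliary parameters.

    Assumption: the share field is named `auxsmbconf` and holds
    newline-separated smb.conf lines.
    """
    payload = {"auxsmbconf": "\n".join(aux_lines)}
    return ["midclt", "call", "sharing.smb.update", str(share_id), json.dumps(payload)]


# Hypothetical share ID and user name.
args = smb_aux_update_args(3, ["force user = scanner"])
print(shlex.join(args))
```

Passing `args` to `subprocess.run(args, check=True)` on the NAS avoids shell quoting problems with the JSON. Because the field is a single string, an update replaces its whole value, so any existing lines have to be included again.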

In the end the migration is done and everything is running well with one odd exception:
With SCALE I have a performance issue when watching a movie via SMB on the TV.
I tested with a second machine (less powerful, less RAM) running CORE and didn’t get these problems.

The behavior is a bit strange: watching on a PC with VLC, everything runs smoothly from both machines, so the network speed itself is not the problem. Backing up and restoring the data ran at the expected speed for the PC’s 1GbE connection; the older machine with CORE was a bit slower (108 vs. 114 MB/s).
On the TV I can watch movies smoothly without lag from the CORE machine, but from SCALE it is absolutely impossible: it lags every 5-6 seconds. The TV does not have much hardware power, so it sometimes lags with 4K material, but at the moment only DVD-resolution material runs smoothly; HD resolution or higher shows these periodic lags.
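Assuming the 108 and 114 figures are megabytes per second (the usual unit for file-copy speeds), a quick calculation shows that both machines come close to what a 1 Gbit/s link can carry; the remainder is mostly Ethernet, IP, TCP and SMB overhead. This is consistent with the network not being the bottleneck for the PC.

```python
def link_utilization(megabytes_per_s, link_mbit_per_s=1000):
    """Fraction of the nominal link rate used by a transfer measured in MB/s."""
    return megabytes_per_s * 8 / link_mbit_per_s


for name, rate in (("CORE", 108), ("SCALE", 114)):
    print(f"{name}: {rate} MB/s = {link_utilization(rate):.1%} of 1 Gbit/s")
```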

Maybe I should open a new thread for this.

P.S.: I ordered a new TV, so let’s see if it works better with a newer client.

Captain_Morgan · February 9, 2025, 8:03pm

Did you resolve this or start a new thread?

fosaq · February 10, 2025, 6:04am

Everything solved, it wasn’t a problem with the NAS.

The TV had two problems: 100 Mbit/s is too slow for some movies (bitrate up to 80 Mbit/s), and VLC for Android has several problems in version 3.x.
Now I use Kodi on an external media player with 1GbE connection.
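A little arithmetic shows why the 100 Mbit/s port was too tight. With a 1500-byte MTU, TCP payload is roughly 94% of the Ethernet line rate (1448 of 1538 bytes per full-size frame with TCP timestamps), so a 100 Mbit/s port leaves only about 14 Mbit/s above an 80 Mbit/s peak, before SMB overhead and any other traffic. A minimal sketch, where the 0.94 efficiency factor is that approximation rather than a measured value:

```python
def headroom_mbit(link_mbit_per_s, peak_bitrate_mbit_per_s, efficiency=0.94):
    """Usable link capacity left over after a stream's peak bitrate, in Mbit/s.

    `efficiency` approximates the TCP payload share of the line rate
    (1448 / 1538 bytes per full-size frame with a 1500-byte MTU).
    """
    return link_mbit_per_s * efficiency - peak_bitrate_mbit_per_s


for name, link in (("TV, Fast Ethernet", 100), ("media player, 1GbE", 1000)):
    print(f"{name}: {headroom_mbit(link, 80):.0f} Mbit/s spare at an 80 Mbit/s peak")
```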

Thank you, thread can be closed.