Dead boot pool - NAS stuck on error code A02

A couple of days ago my boot pool SSD threw an I/O error, and shortly after that the server became unresponsive. Unfortunately I still didn’t have the boot pool set up in RAID.
Due to some other problem the IPMI KVM on my TrueNAS server isn’t working (I still need to figure that out), so I couldn’t tell 100% what the problem was.
After a restart, the server remains stuck on the boot page with code A02; trying to either get into the BIOS or change the boot device doesn’t do anything.
I have now managed to reinstall SCALE on a new SSD and I have another one on order to set the boot pool up as RAID. The data is all fine and the apps are working, but I lost my general settings and my VM.

I followed some other discussions and found that TrueNAS saves a config backup every day at 3:45 am. So, as per title, I’m trying to locate the config backups TrueNAS makes as part of the System dataset. I located the folder within /var/db/system but I have no idea how to access those files.
Can anyone help me? Or if I got this completely wrong and there is another way of going at this, just let me know, I’m all ears! I really want to recover the VM with HAOS.

In /var/db/system/configs-<long string of hex> you can find copies sorted by version.

The thing is, how do I access/download them?
I can navigate to them in the shell but they look empty. I’m not sure if I’m doing something wrong.

For the future implement @joeschmuck’s Multi Report script and have a backup of your system configuration emailed to you once per week.

I’m not sure why that would be.
I only have a single config folder, and that contains directories with names like “TrueNAS-SCALE-24.04.2.5”. Those in turn contain a row of db files named after the date.

If your config-directories are empty that suggests they were stored on the now failed boot-pool, maybe? I’m also not sure why you would have three directories.

If I go at it with zfs list I get this:


So it does look like there is something inside those folders? Would I have any chance of looking inside the old boot pool with another PC? For whatever reason, as soon as I hook up the old boot pool to the server it gets stuck and doesn’t even reach the BIOS.
If this doesn’t work, is there any way of linking TrueNAS to the previous zvol of my VM? It does seem to see it in the pool.

I suppose it could be that they are not currently mounted in the normal /var/db/system location.
What does mount | grep system show?

Not exactly sure what you mean, but

Your previous System Dataset (before the boot drive failure) was probably housed in your boot-pool instead of a storage pool.

Any new configs that get saved will be of your current system. (This won’t be helpful to you.)

So it’s either:

  1. Recover a config db file from the former boot drive
  2. Accept your loss, and next time make it a habit to regularly export your TrueNAS config file
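
For option 2, a scheduled copy is one way to make it a habit. What follows is only a rough sketch, not an official TrueNAS method: it copies a config db into a directory on a storage pool with a date stamp and prunes old copies. The function name backup_config, the destination path and the retention count are all made up for illustration, and /data/freenas-v1.db is assumed to be where the live config db sits. A plain cp of a database that is being written to can catch it mid-update, so the export from the web UI remains the safer route.

```shell
#!/bin/sh
# backup_config SRC DEST_DIR [KEEP]
# Copy SRC to DEST_DIR/config-YYYYMMDD.db and keep only the newest KEEP copies.
# All names here are hypothetical; adjust to your own pool and dataset.
backup_config() {
    src="$1"
    dest_dir="$2"
    keep="${3:-30}"                 # default: keep the 30 newest copies
    stamp="$(date +%Y%m%d)"

    mkdir -p "$dest_dir" || return 1
    cp -p "$src" "$dest_dir/config-$stamp.db" || return 1

    # Date-stamped names sort chronologically as text, so everything after
    # the first KEEP entries of the reverse-sorted list is an old copy.
    ls -1 "$dest_dir"/config-*.db | sort -r | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}

# Example call (assumed paths, e.g. from a weekly cron job):
#   backup_config /data/freenas-v1.db /mnt/arcadia/backups/truenas-config 30
```

Keeping the copies on a storage pool rather than the boot pool means they survive a dead boot drive, which is the situation in this thread.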

EDIT: My mistake. @neofusion pointed out that the other two datasets are not mounted, and may in fact have recent config files.

Please note that Multi Report does not run on TrueNAS 24.10.1 or greater, YET. Email issue is still kicking me in the jewels.

Only the config directory starting with ae is actually mounted at /var/db/system.
The other two, which happen to be the ones with megabytes of contents, are not mounted.

I am in unfamiliar territory and not sure if there are caveats to simply mounting them temporarily yourself to get a db-copy out. Perhaps someone else can say.
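
To double-check which of the config datasets are mounted without eyeballing the mount output, something like the helper below could be used. It is a small sketch under my own assumptions: is_mounted is a made-up name, and it reads the kernel mount table from /proc/mounts (Linux, so SCALE), matching on the dataset name in the first column and zfs in the third.

```shell
#!/bin/sh
# is_mounted DATASET [MOUNTS_FILE]
# Succeeds (exit 0) if DATASET appears as a zfs mount in the mount table.
# MOUNTS_FILE defaults to /proc/mounts; the second argument exists mainly
# so the function can be tried against a saved copy of the table.
is_mounted() {
    dataset="$1"
    mounts="${2:-/proc/mounts}"
    awk -v ds="$dataset" '
        $1 == ds && $3 == "zfs" { found = 1 }
        END { exit !found }
    ' "$mounts"
}

# Example use with the dataset names from this thread:
#   for ds in arcadia/.system/configs-0a2b9b8243a747409c8014877982836b \
#             arcadia/.system/configs-342111fac902458e8468ba3ccee40d16; do
#       if is_mounted "$ds"; then echo "$ds: mounted"; else echo "$ds: not mounted"; fi
#   done
```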

Question to those more knowledgeable here: would these work in this scenario?
mount -F zfs arcadia/.system/configs-0a2b9b8243a747409c8014877982836b
mount -F zfs arcadia/.system/configs-342111fac902458e8468ba3ccee40d16

Do you need to specify the mount point in the command or will it deduce that based on other information?

Edit:
If you need to specify the mountpoint, my uneducated guesses would be these:
mount -F zfs arcadia/.system/configs-0a2b9b8243a747409c8014877982836b /var/db/system/configs-0a2b9b8243a747409c8014877982836b
mount -F zfs arcadia/.system/configs-342111fac902458e8468ba3ccee40d16 /var/db/system/configs-342111fac902458e8468ba3ccee40d16

Edit2:
I should say that I am not so sure about that -F flag. I only see a reference to it that makes sense in the ancient Oracle documentation.

Newer documentation suggests the right flag would be mount -t zfs <source> <mountpoint>. Again, best wait for someone who knows how you actually do this.
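
For what it’s worth, here is a minimal sketch of how a temporary read-only mount might look with the -t form. It only prints the commands, so they can be reviewed before anything is run as root. The scratch directory /tmp/old-configs and the function name print_mount_cmds are made up, and the approach assumes the dataset has mountpoint=legacy, which is what mount -t zfs expects; if it doesn’t, zfs mount would be the tool to look at instead. Treat it as untested guidance from someone in the same unfamiliar territory.

```shell
#!/bin/sh
# print_mount_cmds DATASET SCRATCH_DIR
# Print (do not run) the commands for a temporary read-only mount of
# DATASET at SCRATCH_DIR, a look inside, and the unmount afterwards.
# Mounting somewhere other than /var/db/system keeps the live system
# dataset layout untouched; "-o ro" avoids writing to the old configs.
print_mount_cmds() {
    dataset="$1"
    scratch="$2"      # hypothetical scratch directory, not a TrueNAS default
    printf 'mkdir -p %s\n' "$scratch"
    printf 'mount -t zfs -o ro %s %s\n' "$dataset" "$scratch"
    printf 'ls -lR %s\n' "$scratch"
    printf 'umount %s\n' "$scratch"
}

print_mount_cmds "arcadia/.system/configs-0a2b9b8243a747409c8014877982836b" \
                 "/tmp/old-configs"
```

Printing first is a deliberate choice here: nobody in the thread has confirmed the exact invocation yet, so nothing is mounted until a human has read the lines and decided to run them.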

Thank you for the input! Hopefully someone else can help with this as well.
In the meantime, I just checked and I think TrueNAS creates a backup tonight.

In the meantime, if I want to simply test out the process of getting the backup, how do I access this file?

Edit:

Never mind?! Before posting I did try to run a replication task, as I’ve seen someone else suggest in another discussion, but for me it just copied empty folders for some reason.
I tried again just now, and the folders are no longer empty: they have all the db files, and there is one from the 21st of December.

Is it now just a matter of restoring this db file to get everything back as it was before the boot pool failure?


Edit2:

I think I’m almost there, but I’m not sure how to access the file in question. I don’t have the permission to do so

I would use the shell or ssh to copy the file you want to the home directory of your main user.
Assuming this directory doesn’t use ACLs you can change ownership of the db-file with:
chown kinga:kinga 20241218.db
(replace kinga with your username and group)

Then either scp the file to your computer or put the file on a user accessible SMB-share and grab it from there.
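
The copy-and-chown steps above can be sketched as a couple of small shell functions. This is just an illustration: newest_db and fetch_config are names I made up, and it assumes the db files are named by date (like 20241218.db), so the last one in sorted order is the most recent. It needs to run as root (or with sudo) to read the system dataset and to chown to another user.

```shell
#!/bin/sh
# newest_db DIR
# Print the path of the most recent date-named .db file in DIR.
# The shell expands the glob in sorted order, and names like 20241218.db
# sort chronologically, so the last match is the newest one.
newest_db() {
    latest=""
    for f in "$1"/*.db; do
        [ -e "$f" ] && latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}

# fetch_config DIR DEST OWNER
# Copy the newest .db from DIR into DEST and hand it to OWNER
# (for example kinga:kinga) so a normal user can scp it or reach it over SMB.
fetch_config() {
    src="$(newest_db "$1")" || { echo "no .db files in $1" >&2; return 1; }
    cp "$src" "$2"/ || return 1
    chown "$3" "$2/$(basename "$src")"
}

# Example (the version directory and username come from this thread,
# the source path is left elided on purpose):
#   fetch_config "/var/db/system/configs-<long string of hex>/TrueNAS-SCALE-24.04.2.5" \
#                /mnt/arcadia/home/kinga kinga:kinga
```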

Thank you! I did manage to follow this process, although after restoring the latest db file I cannot reach the GUI. The TrueNAS console says there is a bridge preventing me from connecting. If I wait long enough my apps start to spin up and seem to be reachable, but I’m locked out of the TrueNAS GUI.
I tried with 4 other db files and all of them seem to complain about that. Not that long ago I managed to set up a working bridge inside TrueNAS to make the system reachable from the Home Assistant VM, and everything was working fine.
Any idea why the restore will not make the GUI reachable anymore (IPs and everything seem to be the same as usual)?
I can keep rolling back to older dbs until I find one without the bridge, but if I can make the latest work that’d be better, so if in a year’s time I need to restore a db I can do it with the bridge in place. Is there any way to just remove the bridge?

You could try connecting a keyboard and monitor to the server, to see what the screen shows.
That would also let you change the network settings, possibly removing the bridge temporarily so that you can get in.

You can also get to the control menu by ssh-ing in and running cli, if you don’t already use that as your shell.

As I mentioned previously, my KVM connection hasn’t been working for a while, so I connected a monitor and keyboard to the server.
When I upload the db to TrueNAS the system restarts, and on the console I see a message after it posts saying “bridge firewalling registered”.
I tried to mess around with the settings in those menus but I can’t see/understand how to remove the bridge.
I also tried some dbs going from the 21st back to the 14th, and then tried the oldest one from this version of TrueNAS, from the 5th (definitely before my time messing around with bridges), but the error remains the same.
I never set up SSH, but wouldn’t that reset with the new config upload as well?

You should be able to configure the bridge settings from the CLI.
If you didn’t set up SSH previously it’s not going to be enabled.

Cool, that does make sense, so technically that backup is a bit of a dead end.
I managed to get everything back up by simply resetting the configuration, recreating the user and creating a new VM using the existing zvol. Home Assistant OS is now up and running, working as it was previously.

Thank you all for the help, it took a bit, but in the end it was really cool being able to restore pretty much everything. Now I just have to reenable the email notifications, tests and so on :wink: