Help recovering from a TrueNAS SCALE install stuck at a GRUB error

I am having an issue where my TrueNAS SCALE 23 box will not boot. It is stuck at a grub rescue screen saying “error: symbol ‘grub_disk_native_sectors’ not found”.

Two things have happened recently:

  1. I upgraded from SCALE 22 to 23. The upgrade was completely successful and the system has been running for about a month. I have not rebooted at all in that time. I also did not do anything that would trigger a feature flag upgrade on any of my pools. I was giving it some stable time before going to version 24.

  2. A couple of weeks ago one of my boot disks died. It was in a two-disk boot pool mirror, and I replaced the disk using the UI (system → boot → find disk → replace). As far as I knew, the replacement was successful.

Earlier today I got an alert from my monitoring system saying the agent was unavailable on the NAS. Following that, all of my virtual machines whose storage is backed by iSCSI on the NAS stopped responding. The web UI for the NAS was frozen at something like “waiting to connect to the controller”. I shut down my hypervisors and rebooted the NAS, and since it came back up it goes directly to the GRUB error screen and I am stuck.

Any help is appreciated. I’m having difficulty searching because I’m stuck on just my phone right now. I am finding a lot of people with issues during the system install, but not issues like this on a running system.

The most common reason I’ve seen for a user to be stuck at a GRUB error is a failed or failing boot device. Install to a fresh device, upload a saved config file, and Robert’s your father’s brother.

So let’s say, right now, to my embarrassed realization, my backup is on my fileserver, which is backed by a LUN on this NAS…

If at all possible, I’m looking for a grub rescue solution, or something that can be done from a live USB.

I may have solved it? The BIOS was set to legacy; I changed it to dual and it came up. Did the move from 22->23 change it to UEFI or something? I have no idea how it happened, but I’m in and have a fresh OFFLINE backup.
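
For anyone who wants to confirm which firmware mode the running system actually booted in: on Linux the kernel exposes the directory /sys/firmware/efi only when it was started through UEFI. Here is a minimal sketch of that check. The function name boot_mode and its efi_dir parameter are just illustrative names of mine, not anything that ships with TrueNAS.

```python
from pathlib import Path


def boot_mode(efi_dir="/sys/firmware/efi"):
    """Report the firmware mode the running Linux system booted in.

    The kernel creates /sys/firmware/efi only on a UEFI boot, so the
    presence of that directory is used as the signal. efi_dir is a
    parameter only so the check can be pointed at another path.
    """
    return "UEFI" if Path(efi_dir).is_dir() else "legacy BIOS"


if __name__ == "__main__":
    print(f"Booted in {boot_mode()} mode")
```

Run it from a shell on the NAS itself; a live USB would report the mode the USB was booted in, which is not necessarily the mode the installed system uses.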

I ran into an issue like this too. I was using Rufus to create the bootable media, and when it asked what “mode” to use I selected the default one instead of dd. This caused the GRUB unknown filesystem issue.

I went back and used dd and it worked.
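
As I understand it, the difference is that dd mode writes the installer image to the stick byte for byte, while the default mode rebuilds the contents on a new filesystem. Below is a rough sketch of what a raw, dd-style write plus a read-back check looks like. The function names raw_write and read_back are my own, and target_path stands in for whatever device node your USB stick gets; writing to it overwrites everything on that device, so treat this as an illustration rather than a tool.

```python
import hashlib
import os

CHUNK = 4 * 1024 * 1024  # copy in 4 MiB pieces


def raw_write(image_path, target_path):
    """Copy the image to the target unchanged, like a plain dd.

    Returns (bytes_written, sha256_hex) for the data that was written.
    """
    digest = hashlib.sha256()
    written = 0
    with open(image_path, "rb") as src, open(target_path, "wb") as dst:
        while chunk := src.read(CHUNK):
            dst.write(chunk)
            digest.update(chunk)
            written += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # make sure the data has left the cache
    return written, digest.hexdigest()


def read_back(target_path, length):
    """Hash the first `length` bytes of the target for comparison."""
    digest = hashlib.sha256()
    remaining = length
    with open(target_path, "rb") as dev:
        while remaining > 0:
            chunk = dev.read(min(CHUNK, remaining))
            if not chunk:
                break
            digest.update(chunk)
            remaining -= len(chunk)
    return digest.hexdigest()
```

If the hash returned by raw_write matches read_back(target_path, bytes_written), the stick holds an exact copy of the image, which is what the installer's boot loader expects.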

I’ve run into GRUB issues caused by other disks… try (temporarily) disconnecting some disks and see if your system is able to get to the boot screen.

I used to get “Out of Memory” errors when GRUB would read the header of every disk (on a server with 64 HDDs, 8 SSDs, and 512GB RAM)… I could boot the server by just turning on the JBOD after GRUB was loaded. This issue fixed itself during an upgrade at some point, so… if you are stepping through upgrades, I wonder if your issue will resolve itself too…?