Data Pool Causing Kernel Panic -- Options?

Hello,

Thanks in advance for any help. Short version – a bad memory module has caused corruption that is leading to kernel panics when a data pool is present at boot in the original TN Scale install or imported into a new instance of TN Scale. When I unplug the physical drives so the pool cannot be mounted, the og TN instance will boot up with the pool showing as offline. A new install will boot without difficulty until I attempt to import the bad pool (mediatank).

Having Googled the issue, I’ve tried forcing the import which causes the same panic. Importing as read-only is successful but only the root dataset is visible. Sub datasets where the files I care about actually live cannot be mounted. Efforts to mount them manually lead to the same kernel panic.

“zpool import” shows the following pool as importable but it doesn’t actually work.

I would post an image showing the kernel panic but apparently I can’t post images or links to images. How would I show that information to the community in order to receive help?

Any ideas on how to mount the sub datasets as read-only or scrub the pool?

Thanks!

I don’t have a complete answer, but here are some thoughts:

  • New users to the forums can’t post pictures or links. Try running through the forum training using the link provided when you joined.
  • You cannot scrub a pool without importing it read-write
  • You can try and import the pool Read Only or roll back some recent write transactions

All serious ZFS troubleshooting will be from the Unix command line. For example, if you could copy and paste the output from the following command, in CODE tags, it would help:

sudo zpool import

Note that I’ve left off the pool name, as we are looking for the pool layout, not attempting to import.

You can try a Read Only import using this, but it may crash again:

sudo zpool import -R /mnt -o readonly=on POOL

Replace POOL with the name of the pool. Even if this works, it won’t solve the problem because the GUI / TrueNAS Middleware won’t know about the pool yet.

Note that sudo is not needed if you are running the command(s) from user root.

Thanks, I appreciate the suggestions. The output from the two commands is visible in the attached picture. I had to add -fR because I’d imported the pool in a second instance of TN as readonly.

The read-only import works but only the root dataset is visible. There should be other child datasets. The pool “mediatank” does not appear in the storage GUI but from your post it sounds like that is expected behavior.

“sudo zpool status” returns:

Any further suggestions for troubleshooting, mounting or rolling back the pool would be welcome.

Thanks

Mediatank appears to be a root dataset and should not be shared. You are supposed to make a child dataset and share that. Your Datasets GUI doesn’t show that set up.

Correct. Some things done from the Unix shell command line won’t be known by the TrueNAS GUI or its Middleware.

Next, you can try and see if rolling back some ZFS write transactions will allow importing the pool Read / Write. However, this will permanently throw out more recent writes to the pool. Normally I would advise doing a non-changing attempt with the -n option. But, since the pool will import R/O, that seems like it would work anyway.

sudo zpool import -fFX -R /mnt mediatank
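Before committing to a rollback, it may be worth asking ZFS whether a rewind would work at all. The sketch below wraps the dry-run form of the import; `rewind_dry_run` is a made-up helper name, and it assumes the pool has been exported first (`zpool export mediatank`) and that the shell is running as root. With -n added to -F, zpool only reports whether the pool could be made importable and does not perform the recovery.

```shell
# Hypothetical helper: check whether a rewind import (-F) could succeed.
# -n (used together with -F) reports the outcome without changing the pool.
# Assumes the pool is exported and the shell is running as root.
rewind_dry_run() {
    pool=$1
    zpool import -f -F -n -R /mnt "$pool"
}

# Example (commented out so nothing runs by accident):
# rewind_dry_run mediatank
```

If the dry run reports that the pool cannot be recovered this way, the real -F import is unlikely to behave differently.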

You may want to wait and see if someone else has a better suggestion.

Where and how did it cause corruption? And how did you remove the corruption?

Is this a bare metal install or a virtual machine? Pls give us full hardware details.

Mediatank appears to be a root dataset and should not be shared. You are supposed to make a child dataset and share that. Your Datasets GUI doesn’t show that set up.

Are you suggesting that the child datasets may exist and just not be visible in the GUI? Is there some way to see the whole tree of datasets for a given pool through the command line?

Would I run this command with the pool in its current state (e.g. mounted as R/O)? Or should I export it first and then attempt this as a re-import?

It’s a bare metal install on an Asus z370 board, Intel 8400 and 32GB of RAM. The offending pool is a RAID Z2 with 5x8TB and 1x12TB drives. The 12TB drive was added through expansion but that was previous to my present problem.

I was seeing errors from TN warning of possible data loss so I tested the RAM and found one bad stick. After removing that, I deleted a couple of files identified as corrupt by:

sudo zpool status mediatank -v

I thought I’d dodged a bullet but sometime fairly shortly after that, it crashed and I got stuck in a boot loop with the kernel panic. I unplugged enough drives so the pool couldn’t be loaded at boot, exported it and then replugged the drives. Then I could import it as R/O which is where we are at the moment.

I don’t think the current problem is hardware related, though I recognize that’s possible. The remaining RAM has tested good and the kernel panic was in exactly the same spot in the boot loop. It also panics the same way when I attempt to mount the pool as R/W. I’d expect a more random problem if the RAM or motherboard were bad. My guess is something in the pool itself is corrupt.

Thanks for any suggestions.

The documentation for the NFS and SMB sections has the following warning:
"When creating a share, do not attempt to set up the root or pool-level dataset for the share. Instead, create a new dataset under the pool-level dataset for the share. Setting up a share using the root dataset leads to storage configuration issues."

Understood. I’m not attempting to share the root dataset. Before this all went sideways, there were child datasets underneath the root dataset. So:

mediatank (pool)

  media (root dataset)

        movies (child dataset)

        tv (child dataset)

Now, when I mount the pool as R/O, I’m only seeing:

mediatank (pool)

  media (root dataset)

If I could see the child datasets, I assume I could pull the data off them and then rebuild the pool as R/W. I’m wondering if the child datasets aren’t mounted or if they just aren’t visible in the GUI.

Try checking using the command line. TrueNAS is funny if you don’t use the GUI for almost everything.
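One way to see the whole tree for a pool is a recursive `zfs list` that also prints the mount state. This is a small sketch rather than anything TrueNAS-specific; `dataset_tree` is a made-up helper name, while the columns are standard ZFS properties.

```shell
# Hypothetical helper: list every filesystem dataset under a pool,
# with whether it is mounted, whether it may be mounted, and where.
dataset_tree() {
    zfs list -r -t filesystem -o name,mounted,canmount,mountpoint "$1"
}

# Example:
# dataset_tree mediatank
```

Anything that is a plain directory rather than a dataset will not appear in this output, so a missing name here does not by itself mean the data is gone.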

sudo zfs list only shows me the following for mediatank:

I can see child datasets for the other pools so they must not be mounted properly under mediatank.

I’ll give Arwen’s command a try and see if it can roll further back to properly mount. I’m not concerned with losing some of the most recent files.

Well…no luck with the forced import. It just triggers the same kernel panic I was getting in the boot loop. The command to mount as read-only succeeds but the datasets I need aren’t actually mounted so I can’t recover any data. I’ll attach an image of the kernel panic in case there are any final suggestions before I move in another direction.

I would try the following:

Disconnect one drive and try to import the pool in a degraded state. Repeat with a different drive each time until every drive has been left out once.

Also you could check all your disks with

sudo zdb -l /dev/sdX

To see if your transaction groups are in line.
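To compare the drives side by side, something like the following could summarize each drive's label txg and count the labels that fail to unpack. It is only a sketch: `compare_labels` and its two helpers are hypothetical names, the device paths in the example are placeholders, and the parsing assumes `zdb -l` prints a top-level `txg:` line plus a `failed to unpack label N` line for each unreadable label.

```shell
# Print the distinct top-level txg values found in `zdb -l` output (stdin).
# Nested fields such as create_txg are skipped because the first word must be "txg:".
label_txg() {
    awk '$1 == "txg:" { print $2 }' | sort -u | paste -sd, -
}

# Count the labels that zdb could not unpack in `zdb -l` output (stdin).
failed_labels() {
    grep -c 'failed to unpack label'
}

# Print one summary line per device given as an argument. Run as root.
compare_labels() {
    for dev in "$@"; do
        out=$(zdb -l "$dev" 2>&1)
        printf '%s txg=%s failed=%s\n' "$dev" \
            "$(printf '%s\n' "$out" | label_txg)" \
            "$(printf '%s\n' "$out" | failed_labels)"
    done
}

# Example with placeholder device names (the ZFS member may be a
# partition rather than the whole disk):
# compare_labels /dev/sda1 /dev/sdb1 /dev/sdc1
```

ZFS keeps four copies of the label on each drive, so a drive that reports a different txg from the others, or four failed labels, is the one to look at first.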

Based on the zfs list output, I would guess that movies and tv are simple directories. You DO have the media dataset, so the share part should be good. You would share out “media” not “mediatank”.

To be clear, I seriously doubt ZFS deleted your movie or tv directories. It’s just that they are oddly listed as being under /mnt/mnt/mediatank/media.
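If the directories are where that path suggests, copying them off while the pool is still read-only could look roughly like this. `copy_off` is a hypothetical helper, the destination in the example is a placeholder for a location on a healthy pool, and the source is the path mentioned above.

```shell
# Hypothetical helper: copy the contents of a directory to a destination,
# preserving permissions and timestamps. It only reads from the source.
copy_off() {
    src=$1
    dst=$2
    if [ ! -d "$src" ]; then
        echo "source not found: $src" >&2
        return 1
    fi
    mkdir -p "$dst" && cp -a "$src"/. "$dst"/
}

# Example (destination is a placeholder):
# copy_off /mnt/mnt/mediatank/media/movies /mnt/otherpool/rescue/movies
```

Copying one directory at a time makes it easier to tell which part of the tree triggers a problem, if any does.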

Sorry the -X did not help.

They are simple directories. Can you clarify what you mean by share out? Share via SMB/NFS or share within TN for copying files to another pool? Thanks

Interesting idea, but I ended up with the same kernel panics when trying to import it degraded (one disk at a time).

sudo zdb -l /dev/sdX is showing failed to unpack / bad label cksum errors on all the drives. One in particular fails to unpack all of the labels. However, I already tried importing the pool with that drive removed and still got the kernel panic. If a drive can unpack at least one of its labels, does that mean this isn’t the problem? Are the 4 labels per drive there for redundancy?

Well, that doesn’t sound good.
I think we have to call in @HoneyBadger for help!

We can worry about that after the pool is fixed.