Updated to Dragonfish and disaster ensued

I updated to Dragonfish from Cobia 23.10.2 today and the system is borked. The things I have seen so far:

  1. SMB will show the shares, and I can open and see them, but then nothing. I can’t open files, and when I try to change directories I get a spinning wheel with “Loading”.

  2. An alert in the GUI indicates dumped core files. The cores directory is full of files that start with ‘core.smbd’, followed by a long alphanumeric sequence, and end with ‘.zst’.

  3. In SSH, I can’t scroll or copy text as usual: I just get a bunch of codes and semicolons on the screen.

  4. A script I use to control fans worked perfectly in Cobia, but now every time it calls ipmitool it gets "IANA PEN registry open failed: No such file or directory ". This is a Debian Bookworm bug that was fixed in mid-2023. https://groups.google.com/g/linux.debian.bugs.dist/c/ukUAcfnm280

I need help figuring out how to fix it. Please! I tried rebooting; no change.

Edit: Submitted a bug report with a debug file:

After filing the bug with the debug report, I reverted to SCALE 23.10.2. All problems were resolved except, strangely, the SSH problem: I can’t scroll, select text, copy, etc., as before. I have to hit ^C to get control back.

Glad to hear that you’re back to a stable state. Someone will have a look at the debug that you supplied (can confirm we did receive it) as soon as possible and try to make some sense of what the trouble might be. Have a good weekend.

Thank you @patrickkeane !

Apparently the SSH problem that persisted after reverting to 23.10.2 was on my end; maybe something triggered by the server. When I rebooted the laptop, that too was back to normal.

Great news. We’re going to focus on the SMB core files and see where that leads. Thanks for the update.

This is a duplicate of [NAS-128410] - iXsystems TrueNAS Jira

SMB access to shadow copies via the ZFS snapdir .zfs/snapshot is currently broken if NFSv4 ACLs are used. There is a replacement samba package in the ticket, but most people who are using this feature can probably just wait until 24.04.1.

Thanks for looking at it @awalkerix. But I did not do anything through .zfs or snapshots. The SMB access was a plain vanilla dataset share. And as far as I can tell there are no ACLs on that dataset.

And then there are the ipmitool errors.

You may have created a symlink to it then (I saw another user who did this). Corefile is clear that your client was inside it.

Does Samba use the snapdir internally to provide the shadow copies functionality or is this something about navigating to the snapdir?

We use it internally. The problem is when it’s externally exposed. The ZFS ctldir and snapdir are special directories that don’t have internal ZFS ACLs. Unfortunately, this trips a general assert I wrote for the case where we try to get an ACL and fail unexpectedly (since this is a security-sensitive area, we assert rather than reject access so that I can investigate the coredump and fix a potential bug).

Wow, that is very surprising. So when I open the share on my client, it is actually going to a snapshot? How can I detect that and make sure it is not happening? I never intentionally made a symlink to a snapshot. Also, I never made .zfs visible. I don’t even know what shadow copies are.

I mean, when I go to that dataset via SMB and delete a file, I can see via SSH that it is deleted. They couldn’t both be looking at a snapshot, could they?

Here is the pool top-level content. The dataset I accessed over SMB is Jim. I don’t see any symlink to another place.

Tabernacle:/mnt/Ark$ ll
total 190K
drwxr-xr-x 10 root    root    10 Apr 27 05:38 ./
drwxr-xr-x  4 root    root     4 Mar 25 13:23 ../
drwxrwx---  8 jim     attic    9 Jan 16  2023 Attic/
drwxr-x---  3 bill    root     3 Apr 20  2016 Bill/
drwxr-x--- 12 jim     root    31 Apr 27 11:19 Jim/
drwxr-x--- 11 jim     movtv   12 Mar 28 12:42 Media/
drwxrwxr-x  9 shuling dialout 11 Feb 27  2023 Shuling/
drwxrwx---  4 jim     dialout  4 Jan 16  2023 Time/
drwxr-xr-x  7 root    root    10 Apr 26 13:26 ix-applications/
drwxr-xr-x  3 root    root     4 Apr 26 12:27 jailmaker/

And I verified that the share itself is pointing to /mnt/Ark/Jim

You made snapdir visible on Ark/Jim dataset, ArkBak/Shuling, and ArkBak/Attic

You’re right! That must have been from long ago. Thank you for helping me see the light. So now I think I understand that having .zfs visible is enough to trigger the problem, even if I don’t enter that directory in SMB.

I made them all invisible again with
sudo zfs set snapdir=hidden <pool>/<dataset>
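
For anyone wanting to check which datasets still have the snapdir exposed before upgrading, here is a minimal sketch. It assumes the tab-separated output of `zfs get -H -o name,value snapdir`; the helper name `visible_snapdirs` is made up for illustration, and `Ark` is just the pool from this thread.

```shell
# Hypothetical helper: reads "name<TAB>value" lines on stdin, as produced by
#   zfs get -H -r -t filesystem -o name,value snapdir <pool>
# and prints only the datasets whose snapdir property is "visible".
visible_snapdirs() {
    awk -F '\t' '$2 == "visible" { print $1 }'
}

# Example use on a live system (not run here):
#   sudo zfs get -H -r -t filesystem -o name,value snapdir Ark | visible_snapdirs
```

Each dataset it prints can then be hidden with the `zfs set snapdir=hidden` command above.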

I guess I’m safe to go back into Dragonfish waters again? Hopefully that will fix the ipmitool issue also?

Thank you for your patience.

This bug has no impact on ipmitool. It’s SMB-only.

Once we have a release or two of DragonFish under our belts I may convert more of these asserts into simple failures with error messages. This particular bug will be fixed in 24.04.1 (or earlier for those willing to install replacement packages). Otherwise, simply making it hidden should be enough.

I went back to Dragonfish and the SMB issues are resolved. Also no problem with SSH. But the ipmitool error persists. Should I do a new bug report? That was reported in the previous one, but maybe it’s cleaner to do one on just that?

Make a new bug report.

The ipmitool bug report reported that every call to ipmitool returns "IANA PEN registry open failed: No such file or directory ". The bug report was quickly closed, saying it is not a bug, it is cosmetic, and “Fixing it, last we looked, required downloading random files off internet during build process.” Well it’s causing a bug in my script.

@Stux , I saw you investigated and found a fix in 23.10.2 by downloading a file to /usr/share/misc/enterprise-numbers.txt. Did that solution persist in Dragonfish?

Dragonfish won’t let me place the file:

Tabernacle:~$ wget -O /usr/share/misc/enterprise-numbers.txt https://www.iana.org/assignments/enterprise-numbers.txt
/usr/share/misc/enterprise-numbers.txt: Read-only file system

Yes. Same issue persists in Dragonfish. I filed a feature suggestion, but it’s in the “awaiting likes” purgatory.

I found the best fix for a user was to download to ~/.local/usr/share/misc/enterprise-numbers.txt. Then ipmitool will find it, and it will survive system updates:

wget -O ~/.local/usr/share/misc/enterprise-numbers.txt https://www.iana.org/assignments/enterprise-numbers.txt

BUT, ipmitool is part of the OS, and I think it should be correctly installed with a copy of the enterprise numbers table.

Thanks @Stux, that’s it. It didn’t work at first for me because I was doing it as my own user; the file has to be in root’s home directory. More explicitly:

mkdir -p /root/.local/usr/share/misc/
wget -O /root/.local/usr/share/misc/enterprise-numbers.txt https://www.iana.org/assignments/enterprise-numbers.txt

I wish ipmitool would only emit that ‘error’ when the command requires that registry (OR TrueNAS would include it). As it stands, to share a script that uses ipmitool, one would have to explain all this and get people to install the registry, or festoon every ipmitool call with 2>/dev/null
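
A middle ground between installing the registry and a blanket 2>/dev/null might be a small wrapper that drops only that one message and passes every other error through. This is a rough sketch, not anything shipped with TrueNAS or ipmitool: the function name `ipmi_quiet` is made up, and it assumes `mktemp` is available. Because it buffers stderr in a temporary file, error output only appears after the command finishes.

```shell
# Hypothetical wrapper: runs the given command, removes only the
# "IANA PEN registry open failed" line from its stderr, and returns
# the command's own exit status.
ipmi_quiet() {
    _ipmi_err=$(mktemp) || return 1
    "$@" 2>"$_ipmi_err"
    _ipmi_status=$?
    # Re-emit everything except the cosmetic registry warning.
    grep -v 'IANA PEN registry open failed' "$_ipmi_err" >&2
    rm -f "$_ipmi_err"
    return "$_ipmi_status"
}

# Example use in a script (not run here):
#   ipmi_quiet ipmitool sensor
```

Real errors from ipmitool still reach stderr, so a script using the wrapper can keep its normal error handling.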