Active Directory corruption

This system has been on Cobia for a while and would see occasional ACL corruption (an entry shows up as just a number in the ACL editor, with a warning icon in the field).

After updating to Dragonfish it mostly worked, but I lost my AD configuration entirely and had to set it back up exactly as before. Now, after running that way for a day or so, the same ACL corruption I was seeing on Cobia is happening on all the shares.

The AD servers are Windows Server 2016 and 2022 (a VM); we tend to upgrade these in stair steps every few years.

We have almost entirely Windows clients on Windows 10 and 11, and a couple of servers. We use BackupChain and Azure cloud backups within TrueNAS itself. The configuration for that appears unaffected. The system itself is an AM4 motherboard + 5950X + 128GB of ECC UDIMMs, a 2TB NVMe mirror, and a 4-wide raidz2 of SATA HDDs. The ACL corruption occurred on both vdevs and multiple pools.

That’s not ACL corruption. Seeing a UID or GID simply means that we could not resolve it into an AD SID. This may mean that the user / group was deleted from AD. It may also mean that there was a local user / group that has since been deleted.

There have been no local groups created.
And the AD SIDs were not deleted.

So yes, there is some kind of corruption going on. In the past I would go in, delete the “corrupted” entry, and remap it back to the user or group that was previously there. There have been no changes on the AD side at all; it’s been running with only a handful of new groups added in the past decade and none removed.

The presets I had previously saved also do not map to AD correctly anymore.

When I saw this get messed up on Cobia… the presets still worked fine.

No. It’s not corruption (the term has a specific and useful meaning that is not relevant here). As mentioned, it’s winbindd failing to resolve a unix UID / GID to a SID. There are various reasons for this to happen. I gave two likely scenarios.
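To make the failure mode concrete, here is a rough sketch of how one might ask winbindd directly whether a numeric ID from the ACL editor still resolves. It assumes the standard Samba wbinfo tool is on the PATH; the helper names wbinfo_command and resolve_to_sid are made up for illustration and are not part of TrueNAS or Samba.

```python
import subprocess


def wbinfo_command(kind, unix_id):
    """Build the wbinfo call that asks winbindd to map a Unix ID to a SID.

    Hypothetical helper: `kind` is "uid" or "gid".
    """
    flags = {"uid": "--uid-to-sid", "gid": "--gid-to-sid"}
    if kind not in flags:
        raise ValueError("kind must be 'uid' or 'gid'")
    return ["wbinfo", f"{flags[kind]}={int(unix_id)}"]


def resolve_to_sid(kind, unix_id):
    """Return the SID string, or None if wbinfo reports a failure.

    A None result here corresponds to the bare number shown in the ACL editor.
    """
    result = subprocess.run(
        wbinfo_command(kind, unix_id),
        capture_output=True,
        text=True,
        timeout=30,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

Running resolve_to_sid("gid", n) for each bare number from the ACL editor, a few times over the course of a day, would help tell apart an ID that never resolves from one that resolves only intermittently.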

That doesn’t make any sense. The only thing that has occurred here is an upgrade from Cobia to Dragonfish, without any changes to any of the AD users or groups.

Why would winbind fail if everything is still valid? Both of the situations you described are DEFINITELY not the case here.

I’d have to see specific server configuration details (testparm -s output) and log messages to say precisely why those particular accounts didn’t resolve. If the connection to AD is flapping regularly then we’ll have erratic results for non-cached entries.

If the actual mappings are changing then that either indicates that the results are for an improperly configured trusted domain or that our idmap settings have been changed since you first joined AD.
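As a worked illustration of why changed idmap settings matter, here is a small sketch that assumes the domain uses the idmap_rid backend (an assumption, the thread does not say which backend is configured). That backend documents the mapping as ID = RID - BASE_RID + LOW_RANGE_ID. The RID and range values below are invented for the example.

```python
def rid_to_unix_id(rid, range_low, range_high, base_rid=0):
    """Map an AD RID to a Unix ID the way idmap_rid documents it."""
    unix_id = rid - base_rid + range_low
    if not range_low <= unix_id <= range_high:
        return None  # falls outside the configured range, so no mapping
    return unix_id


def unix_id_to_rid(unix_id, range_low, range_high, base_rid=0):
    """Reverse mapping: which RID does a stored Unix ID correspond to?"""
    if not range_low <= unix_id <= range_high:
        return None  # an ID outside the range cannot be resolved
    return unix_id + base_rid - range_low


# Hypothetical: the same group (RID 1105) under two different range settings.
old_id = rid_to_unix_id(1105, 100000001, 200000000)
new_id = rid_to_unix_id(1105, 100001, 200000)

# The ACL on disk still stores old_id; under the new range it has no RID.
stale = unix_id_to_rid(old_id, 100001, 200000)
```

With these made-up numbers the group gets a different Unix ID under the second range, and the ID already written into the ACL no longer maps back to anything, which would show up as a bare number.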

[image] and [image]

It looks like the idmap did change, I think?

Nope. You’re looking at wrong idmap config. Find the one for your domain.
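For reference, a quick sketch of pulling the idmap entries out of testparm -s output and grouping them by domain. In smb.conf the "idmap config *" lines are the default domain, while the entry for an AD domain is named after its workgroup. The sample text is invented (MYDOMAIN, the rid backend and the ranges are placeholders, not taken from this system), and idmap_settings is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Matches lines such as: idmap config MYDOMAIN : range = 100000001 - 200000000
IDMAP_LINE = re.compile(
    r"^\s*idmap config\s+(?P<domain>\S+)\s*:\s*(?P<option>[^=]+?)\s*=\s*(?P<value>.*?)\s*$",
    re.IGNORECASE,
)


def idmap_settings(testparm_output):
    """Group 'idmap config' options by the domain they apply to."""
    settings = defaultdict(dict)
    for line in testparm_output.splitlines():
        match = IDMAP_LINE.match(line)
        if match:
            option = match["option"].lower()
            settings[match["domain"]][option] = match["value"]
    return dict(settings)


# Invented sample; real output comes from running `testparm -s`.
SAMPLE = """\
[global]
    realm = MYDOMAIN.COM
    workgroup = MYDOMAIN
    idmap config * : range = 90000001 - 100000000
    idmap config MYDOMAIN : backend = rid
    idmap config MYDOMAIN : range = 100000001 - 200000000
    idmap config * : backend = tdb
"""

by_domain = idmap_settings(SAMPLE)
```

Here by_domain["*"] holds the default settings and by_domain["MYDOMAIN"] holds the ones that apply to the joined domain. If the output only contains a "*" entry, that by itself is worth mentioning.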

That is the one that appears under [global], where workgroup = mydomain and REALM = mydomain.com for that entry (obviously not my real domain name).

None of the other entries from testparm -s say anything about idmap or the domain.

Please send me a debug file via private message.

It appears I do not yet have PM rights on this discourse instance.

You can email it to me at awalker@ixsystems.com

I sent you that email a bit ago. Thanks