High throughput low latency ACL setup on ZFS+NFS share

Hello!

I'd like to ask the experts for help to achieve a high-throughput, low-latency ACL setup on a ZFS+NFS share.
Let me tell you my environment, setup and use case first:

I have 5 storage servers in my DMZ based on TrueNAS SCALE (drives: 4TB 870 Evos in raidz1 pools):

home : 23.10.2 # 10T
storage-1: 23.10.2 # 15T
storage-2: 23.10.2 # 65T
storage-3: 22.12.0 # 72T + 22T
storage-4: 22.12.1 # 105T

I have 80+ Ubuntu 22.04 Linux clients on 10Gbit networking, and all the storage servers are mounted via NFSv4.2.
My TrueNAS servers and clients are joined to FreeIPA LDAP.

I have 50 users, each using one or several clients at the same time, and they access their same "home" and "userdata" folders concurrently (rwx). I need fast, low-latency locking and sync enabled to avoid any corruption.
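
For context, the clients mount the exports roughly like this (hostname, paths and options here are illustrative, not my exact fstab):

# illustrative client-side mount; NFSv4.2 gives built-in locking, "hard" makes the
# client retry instead of erroring out during a server hiccup
mount -t nfs -o vers=4.2,hard,proto=tcp storage-1:/mnt/tank/userdata /mnt/userdata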

On all the training nodes (clients), processes are running all the time (mostly read operations, plus write operations for results into the same or similar folders, but never to the same file).

So I have constant, endless read operations (~80%) and write operations (~20%) from all the clients to all the storage servers.

To manage user access I need permissions, and basic Linux permissions are not enough, so I'm using ACLs via setfacl.
Each folder carries 3-5 different LDAP groups in its ACL, configured as "r-x" or "rwx", with no access for others (750).
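
A typical folder ends up looking like the getfacl dump further down; as a rough sketch, this is how I apply it (the GIDs stand in for our LDAP groups):

# sketch of how the ACLs get applied; GIDs are placeholders for LDAP groups
chmod 750 /mnt/tank/folder1
setfacl -m g:29000:rwx,g:29001:rwx,g:29100:r-x,g:29101:r-x /mnt/tank/folder1
# matching default entries so new files and directories inherit the same access
setfacl -d -m g:29000:rwx,g:29001:rwx,g:29100:r-x,g:29101:r-x /mnt/tank/folder1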

I don't use extended attributes such as hidden, do-not-delete, etc. I don't need SMB attributes or Windows support; all of my clients are Linux.

My current configuration is very old (about 6 years) and I believe it causes server-side and client-side overheads because of:

  1. ZFS dataset config
     • aclmode = discard
     • aclinherit = passthrough
     • acltype = posix
     • xattr = on
  2. NFS server config (v3 ownership model for v4)
     (screenshot)
  3. POSIX ACL setup on the v4 protocol in v3 ACL style:

root@stest[/mnt/tank]# getfacl folder1
# file: folder1
# owner: root
# group: 29101
# flags: -s-
user::rwx
group::---
group:29000:rwx
group:29001:rwx
group:29100:r-x
group:29101:r-x
mask::rwx
other::---
default:user::rwx
default:group::---
default:group:29000:rwx
default:group:29001:rwx
default:group:29100:r-x
default:group:29101:r-x
default:mask::rwx
default:other::---
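
For completeness, the dataset-side properties from point 1 can be checked and changed like this (the pool/dataset name is a placeholder):

# current ACL-related dataset properties (dataset name is a placeholder)
zfs get aclmode,aclinherit,acltype,xattr tank/userdata
# the values listed above were originally set along these lines
zfs set acltype=posix xattr=on aclinherit=passthrough aclmode=discard tank/userdata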

To have the best possible setup, I’m researching these key questions:

  1. A ZFS (zfs-2.2.3-1) dataset configuration with no ACL or POSIX conversion: the fewest layers possible while still using ACLs.
  2. For my use case, which NFS version is the logical choice with minimum overhead, v3 or v4?
  3. What gives the lowest-latency, fastest ACL handling between ZFS and NFSv4 or v3?
  4. Which ACL tooling is logical and faster for my use case: setfacl, nfs4_setfacl or nfs4xdr_setfacl?
  5. Instead of creating folders in a dataset and setting ACLs on each folder, how can I create ZFS datasets and set the ACLs on the datasets directly? (Rough sketch of what I mean after this list.)
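
For question 5, the idea is roughly this (dataset and group names are placeholders); as far as I understand, the ACL then lives on the dataset's root directory, i.e. its mountpoint:

# sketch for question 5: a dataset per project/user instead of a plain folder
zfs create tank/userdata/project1        # inherits acltype/xattr from the parent
chmod 750 /mnt/tank/userdata/project1
setfacl -m g:29000:rwx,g:29100:r-x /mnt/tank/userdata/project1
setfacl -d -m g:29000:rwx,g:29100:r-x /mnt/tank/userdata/project1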

Based on your advice I will build a test bench, run benchmarks and share the results.
If you help me, I believe this topic will turn into an awesome guide.

Thank you.

Unless you are dealing with very small file sizes, I would doubt ACLs are an issue when the data is stored on HDDs.

Making sure there is enough RAM for ARC is probably the much bigger issue. Having a SLOG is also useful for NFS if data integrity is required.
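
If in doubt, the reporting tool bundled with OpenZFS gives a quick view of ARC size and hit rates on the storage side:

# quick look at ARC size and hit rate
arc_summary | less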

Hello @Captain_Morgan.

I always have enough RAM and NVMe, and all of my drives are SSDs.

RAM: 256G+
Write cache (SLOG): 2x 280GB Optane 900P
Read cache (L2ARC): 2x 1.6TB MZPLK1T6HCHP-00003 NVMe

But when I run "ls -l" it takes 2-3 seconds for a basic listing when LDAP group ownerships and ACLs are involved.
I have a heavy load with lots of files between 16K and 1GB, and the cluster is used for AI training. I'm just trying to understand all the overheads and how to improve them.
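
As a quick way to separate name resolution from the filesystem itself, I compare a normal listing with a numeric one; "ls -ln" skips the uid/gid-to-name lookups entirely (the path is a placeholder):

# same directory, with and without LDAP/NSS name lookups (path is a placeholder)
time ls -l  /mnt/tank/folder1 > /dev/null
time ls -ln /mnt/tank/folder1 > /dev/null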

So, is the issue that the LDAP server is taking too long?

Is nscd enabled and caching results… is the second "ls -l" faster?

We don't use nscd for caching user/group info. The user can probably upgrade to 24.10 if he's not on it already. We've switched to SSSD for LDAP there, and it has its own internal caching for users/groups that can help performance, assuming that's the relevant bottleneck.
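
For reference, these are the kind of generic sssd.conf knobs involved (on TrueNAS the file is generated by the middleware, so the section name and values below are only illustrative of what the internal caching does):

# illustrative generic sssd.conf excerpt, not TrueNAS-specific
[domain/example.org]
# how long resolved user/group entries are served from cache, in seconds
entry_cache_timeout = 5400

[nss]
# fast in-memory cache consulted by NSS clients before going to the backend
memcache_timeout = 600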

The LDAP part is a bit off-topic, but it's nice to talk about.

root@sd-storage[/mnt/sd-pool]# while true; do time ls -l > /dev/null; sleep 2; done
ls -l > /dev/null 0.02s user 0.03s system 0% cpu 13.480 total
ls -l > /dev/null 0.02s user 0.03s system 0% cpu 13.500 total
ls -l > /dev/null 0.00s user 0.04s system 0% cpu 13.478 total
ls -l > /dev/null 0.01s user 0.04s system 0% cpu 13.483 total
ls -l > /dev/null 0.00s user 0.04s system 0% cpu 13.500 total
ls -l > /dev/null 0.01s user 0.03s system 0% cpu 13.471 total

I think there is no nsswitch cache at all. To speed up LDAP, I'm building a new LDAP server on the storage network, but caching must be handled better, as you said.
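
As a rough check of raw group resolution time, timing getent twice shows whether any NSS-level cache is helping at all (the GID is just one of ours as an example):

# raw LDAP/NSS group lookup, run twice to see caching effects (GID is an example)
time getent group 29000
time getent group 29000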

@awalkerix
It's hard to upgrade 5 production storage servers right now; do you recommend any other tuning for 23.10?

The best tuning for 23.10 is to move to 24.10.
We fix performance issues with new software releases.

Thanks for the LDAP part, @Captain_Morgan.

But the LDAP part is not related to this topic, and it is not a problem beyond cosmetics. Listing is barely used, and my real workload is path-based read and write operations, so it won't affect me at all.

My main research is into the best possible low-latency ACL setup for multiple NFS clients accessing the same files.

There's not really enough information here to give concrete recommendations or tuning advice. Honestly, performance tuning / help is beyond the scope of my involvement in the forums. My primary focus when I'm engaged here is to look for unreported software issues or to give low-time-cost help to community members :slight_smile:

If you have TrueNAS hardware and a support contract, you could reach out to our support team and get some help / questions answered in a more direct way.

You probably need to do testing to figure out what your bottlenecks are, and then see whether a newer version helps. If you have spare hardware you can test the newer version. There was an NFS-related performance fix that went into 24.10.1.
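
To get a repeatable baseline for that kind of testing, something like a fio run over the NFS mount with the 80/20 read/write mix mentioned above could work (paths, sizes and job counts here are only placeholders, not a recommended profile):

# illustrative fio baseline over an NFS mount; all values are placeholders
fio --name=acl-baseline --directory=/mnt/userdata/benchtest \
    --rw=randrw --rwmixread=80 --bs=128k --size=2G \
    --numjobs=8 --iodepth=16 --ioengine=libaio --direct=1 \
    --time_based --runtime=120 --group_reporting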
