High-throughput, low-latency ACL setup on a ZFS+NFS share
Hello!
I'm asking for expert help to achieve a high-throughput, low-latency ACL setup on a ZFS+NFS share.
Let me describe my environment, setup, and use case first:
I have 5 storage servers in my DMZ, all running TrueNAS SCALE (drives: 4 TB 870 EVOs in raidz1 pools). Name, TrueNAS SCALE version, and capacity:
home: 23.10.2 # 10T
storage-1: 23.10.2 # 15T
storage-2: 23.10.2 # 65T
storage-3: 22.12.0 # 72T + 22T
storage-4: 22.12.1 # 105T
I have 80+ Ubuntu 22.04 clients on 10 Gbit links, and every storage server is mounted on them over NFSv4.2.
My TrueNAS servers and clients are all connected to FreeIPA LDAP.
I have 50 users. Each of them works on one or several clients at once and accesses the same “home” and “userdata” folders from all of those clients at the same time (rwx). (I need fast, low-latency locking, with sync enabled, to avoid any corruption.)
On all the training nodes (clients), processes run around the clock. They mostly read, and they write their results into the same or similar folders, but never to the same file.
So I have a constant, endless stream of read operations (80%) and write operations (20%) from all the clients to all the storage servers.
To manage user access I need permissions, and basic Linux permissions are not enough. Because of this I’m using ACLs set with “setfacl”.
Each folder has ACL entries for 3-5 different LDAP groups, each either “r-x” or “rwx”, and no access for others (750).
I don’t use extended attributes such as hidden, do-not-delete, etc. I don’t need SMB attributes or Windows support. All of my clients are Linux.
My current configuration is very old (6 years), and I believe it adds server-side and client-side overhead because of:
- ZFS dataset config
  - aclmode = discard
  - aclinherit = passthrough
  - acltype = posix
  - xattr = on
- NFS server config (v3 ownership model for v4)
- POSIX ACL setup on v4 protocol with v3 ACL style:
root@stest[/mnt/tank]# getfacl folder1
# file: folder1
# owner: root
# group: 29101
# flags: -s-
user::rwx
group::---
group:29000:rwx
group:29001:rwx
group:29100:r-x
group:29101:r-x
mask::rwx
other::---
default:user::rwx
default:group::---
default:group:29000:rwx
default:group:29001:rwx
default:group:29100:r-x
default:group:29101:r-x
default:mask::rwx
default:other::---
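For anyone who wants to rebuild this folder on a test bench, the listing above could be recreated roughly as follows. This is a minimal sketch, not the exact commands used on the production servers: the group IDs, owner, and setgid flag are read off the getfacl output, while the `run` and `apply_acl` helpers and the `APPLY` switch are hypothetical names introduced only for this example. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: print (or, with APPLY=1, execute) the commands that would recreate
# the ACL from the getfacl listing. Helper names are illustrative only.

# Entries copied from the getfacl output; the mask is left for setfacl to compute.
ACL_SPEC='u::rwx,g::---,g:29000:rwx,g:29001:rwx,g:29100:r-x,g:29101:r-x,o::---'

# Dry-run wrapper: echo the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

apply_acl() {
    dir=$1
    run chown root:29101 "$dir"            # "# owner: root", "# group: 29101"
    run chmod g+s "$dir"                   # "# flags: -s-" is the setgid bit
    run setfacl -m "$ACL_SPEC" "$dir"      # access ACL
    run setfacl -d -m "$ACL_SPEC" "$dir"   # default ACL, inherited by new entries
}

apply_acl "${1:-folder1}"
```

Running it with `APPLY=1` as root on a POSIX-ACL dataset and then calling `getfacl folder1` should give a listing to compare against the one above.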
To have the best possible setup, I’m researching these key questions:
- Which ZFS (zfs-2.2.3-1) dataset configuration avoids any ACL or POSIX conversion, so that the ACL layer is as thin as possible?
- For my use case, which NFS version makes sense and has the least overhead: v3 or v4?
- What is the fastest, lowest-latency way to handle ACLs between ZFS and NFSv4 or v3?
- Which ACL type and tool makes sense and is faster for my use case: setfacl, nfs4_setfacl, or nfs4xdr_setfacl?
- Instead of creating folders inside a dataset and setting ACLs on those folders, how can I create ZFS datasets and set the ACLs on the datasets directly?
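To make the last question concrete, here is a sketch of what "one dataset per folder" might look like with POSIX ACLs. Everything specific in it is an assumption to be benchmarked, not a known-good answer: the parent dataset `tank/userdata`, the `/mnt` mount root, the dataset name `project1`, the helper names, and the choice of `xattr=sa` instead of the current `xattr=on`. On TrueNAS SCALE datasets are normally created through the UI or API, so creating them from the shell is also an assumption. It prints the commands unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: create one ZFS dataset per project and put the ACL on its mountpoint.
# All names below are hypothetical placeholders.

POOL_PARENT=${POOL_PARENT:-tank/userdata}  # parent dataset, must already exist
MOUNT_ROOT=${MOUNT_ROOT:-/mnt}             # assumes datasets mount under /mnt/<dataset>

# Dry-run wrapper: echo the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

create_acl_dataset() {
    name=$1; rw_group=$2; ro_group=$3
    ds="$POOL_PARENT/$name"
    mnt="$MOUNT_ROOT/$ds"
    spec="g::---,g:$rw_group:rwx,g:$ro_group:r-x,o::---"

    # acltype=posix matches the current setup; xattr=sa is the variant to test.
    run zfs create -o acltype=posix -o xattr=sa "$ds"
    run setfacl -m "$spec" "$mnt"          # access ACL on the dataset root
    run setfacl -d -m "$spec" "$mnt"       # default ACL for everything created below
}

create_acl_dataset "${1:-project1}" "${2:-29000}" "${3:-29100}"
```

Each dataset is a separate filesystem, so it would also have to be exported (or otherwise made visible) to the NFS clients, which this sketch does not cover.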
Based on your advice, I will build a test bench, run benchmarks, and share the results.
With your help, I believe this topic can become an awesome guide.
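As a starting point for the test bench, the 80% read / 20% write mix described earlier could be approximated with fio on a client, run against an NFS-mounted directory. This is only a sketch under assumptions: the target path, block size, file size, job count, and runtime are placeholders, not measured values from my workload, and the `run` and `mixed_io_bench` helpers are hypothetical names. It prints the fio command unless `APPLY=1` is set.

```shell
#!/bin/sh
# Sketch: 80/20 random read/write mix with fio against an NFS-mounted directory.
# All sizes and paths are placeholders to be tuned for the real workload.

# Dry-run wrapper: echo the command unless APPLY=1 is set in the environment.
run() {
    if [ "${APPLY:-0}" = "1" ]; then "$@"; else printf '%s\n' "$*"; fi
}

mixed_io_bench() {
    target_dir=$1
    # --rw=randrw with --rwmixread=80 gives 80% reads and 20% writes.
    run fio --name=acl-mix \
        --directory="$target_dir" \
        --rw=randrw --rwmixread=80 \
        --bs=128k --size=1g --numjobs=4 \
        --runtime=60 --time_based --group_reporting
}

mixed_io_bench "${1:-/mnt/userdata/bench}"
```

Running the same job against folders with and without ACLs, and over both NFS versions, would give comparable numbers to share here.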
Thank you.