This is also part of my confusion about making the right decisions on this topic: we often receive external data with special characters, so would it be necessary to run convmv every day on all my data ;-( ?
I have many macOS and Windows clients at my site. When I created my ZFS pool I set utf8only=on and normalization=formD.
I am unsure of what your question is? Files with umlauts are valid and are part of the UTF-8 specification. I have many SMB users that create files with umlauts and haven’t experienced an issue. We are running the latest versions of macOS and TrueNAS 24-4.
Are you confusing non-ascii characters (special characters) and normalization (formD vs formC)?
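To make the difference concrete, here is a minimal Python sketch using the standard `unicodedata` module. The file name `Bär.pdf` is just a made-up example. Both strings are valid UTF-8 and look identical on screen, but they are different sequences of code points, which is why a client that expects one form may fail to match a name stored in the other.

```python
import unicodedata

# Hypothetical file name; \u00e4 is the precomposed "ä".
name_nfc = unicodedata.normalize("NFC", "B\u00e4r.pdf")  # "ä" as one code point
name_nfd = unicodedata.normalize("NFD", "B\u00e4r.pdf")  # "a" + combining diaeresis

def describe(name):
    """Return the code points of a name, e.g. ['U+0042', ...]."""
    return [f"U+{ord(ch):04X}" for ch in name]

print(name_nfc == name_nfd)          # False
print(len(name_nfc), len(name_nfd))  # 7 8
print(describe(name_nfc))
print(describe(name_nfd))
```

So "special characters" (non-ASCII) and "normalization" are separate questions: a name can be perfectly valid UTF-8 in either form.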
I also have to deal with Windows and Mac clients here, using default ZFS pool settings and SMB datasets for the SMB shares on Core. Right now my Windows and Mac users can usually create and work with special characters in file and folder names (e.g. umlauts such as ä and ü), in both directions. In particular, since the most recent macOS version, 14.6.1 (23G93), I have been unable to reproduce the issue between Mac and Windows, so maybe they finally fixed it?
I still have two challenges on my table. The first is old data: not all of it, but a lot, created before our first Mac client was present. No Mac user can work with these files (rename, copy, delete, or even view them), but Windows users can. If a Windows user renames such a problematic file to a name without, for example, umlauts, the issue is gone for Mac users as well.
Exactly the behaviour as described here:
I thought: OK, let's run convmv -r -f UTF-8 -t UTF-8 --nfc on the problematic files and the issue is gone. That seems to work, but, coming back to my first post here, I realized that it will also (not always) rename the path itself, which has a huge impact on production data. I don't get why convmv also converts and renames folders, or which strategy is best for dealing with that, so that I avoid disrupting the production data workflow and the users won't hate me.
The second is new data that is problematic for Mac users, e.g. external data. One exotic example I was able to reproduce: a user receives a PDF (with umlauts in the name) by mail and saves it to the SMB share; Windows and Mac can both deal with it. Then an iPad opens this PDF over SMB with PDF Expert, makes, for example, a drawing, and saves it back to the share. From that point on no Mac user can work with the file anymore, not even the iPad; again only Windows can. If Windows renames it without special characters, it comes back to life for Mac users as well, or I use convmv -r -f UTF-8 -t UTF-8 --nfc and it is fine too.
For the second issue the ZFS normalization setting could help; I need to test that.
But for the first, I fear I have to use convmv and the users will have to deal with it?
My challenge is to find and decide on the best strategy….
Lastly, even if we hate Mac, I wonder why there is no documentation or knowledge base article on this, because it is relevant, and I would love to read a statement from IX.
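On the question above of why convmv also renames folders: as far as I understand it, with -r it looks at every name in the tree, directories included, so any folder whose own name is stored in decomposed form gets renamed too. The Python sketch below is not what convmv does internally; it only illustrates the idea by checking each component of a path against its NFC form. The path and the helper name `components_changed_by_nfc` are made up for illustration.

```python
import unicodedata
from pathlib import PurePosixPath

def components_changed_by_nfc(path):
    """Return the path components whose NFC form differs from the stored name."""
    changed = []
    for part in PurePosixPath(path).parts:
        if unicodedata.normalize("NFC", part) != part:
            changed.append(part)
    return changed

# Hypothetical path: both the folder and the file use "a" + combining diaeresis.
example = "/mnt/tank/share/Pla\u0308ne/Geba\u0308ude.pdf"
print(components_changed_by_nfc(example))
```

Here both the folder and the file are reported, so an NFC conversion would change the path itself, not just the file name. Also worth knowing: unless you pass --notest, convmv only prints what it would rename, so you can review the list of affected folders before touching production data.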
Second, decide how you’re going to stop this from happening any longer. Maybe create a new, temporary dataset with your new normalization settings applied? Copy some of the ‘offending’ files to it and test with a mac and a windows box. Once you know you’ve got a solid solution, apply it to the rest of your zfs pools.
Third, maybe write a bash or perl script to identify the ‘offending’ files on your pool with the output saved to a text file or something. It’s unclear how many files we’re talking about here, but for me, I would do something non-destructive in finding these files.
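Something along these lines would do it. I've sketched it in Python rather than bash or perl because the standard library already has Unicode normalization built in. It assumes NFC is the form you want to end up with (matching the --nfc option used earlier in the thread) and it only reads directory listings and writes a report; it never renames anything. The function names are my own.

```python
import os
import unicodedata

def find_non_nfc(root):
    """Yield (kind, path) for every entry under root whose name is not in NFC."""
    for dirpath, dirnames, filenames in os.walk(root):
        for kind, names in (("dir", dirnames), ("file", filenames)):
            for name in names:
                if unicodedata.normalize("NFC", name) != name:
                    yield kind, os.path.join(dirpath, name)

def write_report(root, report_path):
    """Write one tab-separated line per offending entry; return the count."""
    count = 0
    # surrogateescape lets names that are not valid UTF-8 be written back out
    # byte-for-byte instead of raising an encoding error.
    with open(report_path, "w", encoding="utf-8", errors="surrogateescape") as out:
        for kind, path in find_non_nfc(root):
            out.write(f"{kind}\t{path}\n")
            count += 1
    return count
```

You would call it as, say, `write_report("/mnt/tank/share", "non_nfc_report.txt")` (both paths are placeholders). The report separates directories from files, which gives you an idea of how many paths would change before you decide on a conversion.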
My zfs pool was created back in 2015 with the ‘zpool create -O utf8only=on -O normalization=formD…’ options enabled. I did this specifically and deliberately. We are primarily a macOS shop, with a few Windows users. We’ve got quite a few files with umlauts, accent marks, etc., and haven’t had any problems with files in the way that you’ve seen. macOS has always been ‘special’ with standards and interoperability with other OSes. Thanks Apple.
I decided to convert my “problem” files via convmv, which seems to be the most pragmatic way to deal with it.
Further, I decided that it is not my way to abandon the appliance idea and create custom pools and/or datasets that force utf8only and NFD or whatever, without properly knowing the impact on other functions of the system.
Also, I still do not know whether I could create just the (SMB) datasets with these settings applied rather than the whole pool.
I simply do not have time for this to investigate and to test, especially if I run an appliance in production.
How to prevent these issues in the future is still kind of unsolved here, but I will solve it the dirty way with a cron task that runs convmv.
But I must admit that I miss documentation, interest, and support on such a big topic for an appliance like this, but maybe that is not for you > IX