SMB mangles long file names to DOS 8.3! Why? Have I fixed it?

My file server, running TrueNAS CORE 13.0-U6.2 and delivering files to macOS clients via SMB, was mangling a number of filenames to DOS 8.3 format when presenting them to macOS clients. When I listed the same names on the same server connected via AFP, the names were not mangled. Likewise, when I listed the same names directly from the TrueNAS shell, the names were not mangled.

Taking a hint from another post, I modified the configuration of the SMB share. In the TrueNAS admin UI, tab “Sharing” > “Windows Shares (SMB)”, I selected the share in question, from its 3-dot menu selected “Edit”, clicked the “Advanced Options” button, then checked the box labelled “Use Apple-style Character Encoding”. This was successful; the TrueNAS server started displaying full filenames instead of mangled 8.3 filenames to the macOS clients.

So this seems to work. OK, why? What I would really like is some explanation that the behaviour I saw was expected Samba behaviour, and that the TrueNAS checkbox definitively stops Samba from serving my Macs 8.3 filenames.

The help for the “Use Apple-style Character Encoding” checkbox says, “By default, Samba uses a hashing algorithm for NTFS illegal characters. Enabling this option translates NTFS illegal characters to the Unicode private range.” The TrueNAS documentation for this option does not add anything more.

But the filenames which got presented as 8.3 on macOS did not include NTFS illegal characters. Most were moderately long, 50-100 characters. Many had ASCII quote marks. Some had non-ASCII accented Latin script letters. I don’t see how illegal character handling would be relevant.

There is Samba documentation for smb.conf options controlling name mangling. They include mangled names and preserve case. Does the TrueNAS option set some of these smb.conf options?

Is there source code implementing this TrueNAS checkbox which I can examine? I took a look at the TrueNAS repos in GitHub and did not find a clear answer.

Thanks for your help! - Jim

There are some characters (that don’t have SFU equivalencies) and strings that are illegal over the SMB protocol and still trigger the mangling algorithm.

Thank you for the lead.

Which characters and strings trigger the mangling algorithm? Could you please point me to the documentation for this? I have not yet found it.

Part of what puzzles me is that I was able to stop the mangling by changing the file name on the server, from the macOS client, from the mangled name to the original name (matching the name on the server’s file system). If characters in the file name were illegal and triggered mangling, surely they would be illegal if imposed from the client also?

Also, what do you mean by “SFU”? I tried doing a web search for it, but all my results were a certain local Canadian university. I couldn’t get past that to a Samba-related definition.

I appreciate the help.

After a bit more searching, perhaps you mean “Windows Services for UNIX (SFU)”, which, says Wikipedia, is “a discontinued software package produced by Microsoft which provided a Unix environment on Windows NT…”?

It apparently included a way to map filename characters permitted on Linux but forbidden on NTFS. I see a mention of SFU mapping in what looks like LinuxCIFSKernel 3.18 release notes. I don’t know if that is relevant to the Samba code which TrueNAS CORE uses, however.

I would still appreciate pointers to documentation which will give me the full story.

Thank you! The Naming Conventions section in this document explains the Windows file name rules well. For instance, it says that double quote (") characters are forbidden in file names. Some of my mangled names had double quotes.
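To check my own file names against those rules, here is a rough sketch in Python. It covers only the parts of the Windows naming conventions I am sure of (the reserved characters, control characters, trailing space or period, and device names such as AUX); it is my paraphrase of the rules, not anything Samba or TrueNAS runs, and the function name is made up for illustration.

```python
# Partial paraphrase of the Windows naming rules; not Samba's own logic.
RESERVED_CHARS = set('<>:"/\\|?*')
RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def windows_name_problems(name: str) -> list:
    """Return a list of reasons why `name` would be rejected on Windows."""
    problems = []
    # Reserved punctuation plus control characters below 0x20.
    bad = sorted({ch for ch in name if ch in RESERVED_CHARS or ord(ch) < 32})
    if bad:
        problems.append("reserved characters: " + " ".join(repr(ch) for ch in bad))
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    # Device names are reserved with or without an extension.
    if name.split(".")[0].upper() in RESERVED_NAMES:
        problems.append("reserved device name")
    return problems


print(windows_name_problems('Notes on "Hamlet" (draft).txt'))
print(windows_name_problems("Notes on Hamlet (draft).txt"))
```

The first call reports the double quote; the second returns an empty list, since parentheses and spaces inside a name are allowed.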

Now to figure out the Samba file naming rules! And what TrueNAS CORE really means with the “Use Apple-style Character Encoding” checkbox.

I have found some more fragments of information.

The Samba source code for samba/source3/modules/vfs_fruit.c, lines 55-58, has this comment:

 * The OS X client maps all NTFS illegal characters to the Unicode
 * private range. This module optionally stores the characters using
 * their native ASCII encoding using vfs_catia. If you're not enabling
 * this feature, you can skip catia from vfs modules.

The code uses a data structure macos_string_replace_map. This appears to be a mapping from characters in filenames to alternative characters for Samba to use in the underlying storage. I think it is defined in samba/source3/lib/string_replace.c, lines 182-192:

const char *macos_string_replace_map =
	"0x01:0xf001,0x02:0xf002,0x03:0xf003,0x04:0xf004,"
	"0x05:0xf005,0x06:0xf006,0x07:0xf007,0x08:0xf008,"
	"0x09:0xf009,0x0a:0xf00a,0x0b:0xf00b,0x0c:0xf00c,"
	"0x0d:0xf00d,0x0e:0xf00e,0x0f:0xf00f,0x10:0xf010,"
	"0x11:0xf011,0x12:0xf012,0x13:0xf013,0x14:0xf014,"
	"0x15:0xf015,0x16:0xf016,0x17:0xf017,0x18:0xf018,"
	"0x19:0xf019,0x1a:0xf01a,0x1b:0xf01b,0x1c:0xf01c,"
	"0x1d:0xf01d,0x1e:0xf01e,0x1f:0xf01f,"
	"0x22:0xf020,0x2a:0xf021,0x3a:0xf022,0x3c:0xf023,"
	"0x3e:0xf024,0x3f:0xf025,0x5c:0xf026,0x7c:0xf027";

I read this to mean that a character with the hexadecimal value before the colon is replaced by the character with the hexadecimal value after the colon, and that there is a list of these replacements separated by commas. Thus a double quote character ("), with character code 0x22, is replaced by a character with code 0xF020, and that modified file name is what Samba writes to the TrueNAS server’s file system.
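To make sure I was reading the format correctly, I sketched the parsing in Python. This is only my interpretation of the string, not Samba's parser; the function names are mine, and the sample uses just two entries copied from the map above.

```python
# Two entries copied from macos_string_replace_map quoted above.
SAMPLE_MAP = "0x22:0xf020,0x2a:0xf021"


def parse_replace_map(spec: str) -> dict:
    """Turn "0x22:0xf020,..." into {0x22: 0xF020, ...}."""
    table = {}
    for pair in spec.split(","):
        src, dst = pair.split(":")
        table[int(src, 16)] = int(dst, 16)
    return table


def apply_replace_map(name: str, table: dict) -> str:
    """Replace every mapped character in `name` with its counterpart."""
    return name.translate(table)


table = parse_replace_map(SAMPLE_MAP)
mapped = apply_replace_map('My "quoted" file.txt', table)
# Show the code points that ended up in the private range.
print([hex(ord(ch)) for ch in mapped if ord(ch) >= 0xF000])  # ['0xf020', '0xf020']
```

Under this reading, both double quotes in the sample name become the character with code 0xF020, and everything else is left alone.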

And indeed, I looked at one of the affected files from the TrueNAS server’s shell. The filename on macOS had double-quote characters. On the server’s file system, those characters were instead stored as three bytes, 0xEF 80 A0. That is the UTF-8 encoding of 0xF020.
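The byte arithmetic is easy to double-check in Python; this only confirms the encoding and says nothing about which piece of software wrote those bytes.

```python
# Encode the private-range character U+F020 as UTF-8.
private_char = chr(0xF020)
encoded = private_char.encode("utf-8")
print(encoded.hex(" "))  # ef 80 a0

# Decoding the three bytes seen on disk gives the same code point back.
decoded = bytes([0xEF, 0x80, 0xA0]).decode("utf-8")
print(hex(ord(decoded)))  # 0xf020
```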

So, is vfs_fruit active in the Samba code deployed as part of TrueNAS CORE? How can I find out?

Another fragment: a man page for the vfs_catia module mentions a parameter catia:mappings, which has a similar format to macos_string_replace_map.

Is vfs_catia active in the Samba code deployed as part of TrueNAS CORE? How can I find out?

A third fragment: Samba source code also has a name mangling module samba/source3/smbd/mangle_hash.c. This includes code to accept or reject various characters, as well as special names like AUX. However, if I am reading it right, I don’t see that it rejects double-quote characters or parentheses.

These fragments don’t quite answer my TrueNAS CORE administration questions. What, concretely, in my existing filenames caused the SMB sharing service to deliver mangled filenames to my clients? Can I avoid having 0xF020 characters in filenames on my TrueNAS filesystem, and have the native 0x22 characters in their place? How?
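As a first step toward the second question, here is a rough sketch for finding which names on the server currently contain those private-range characters. The range 0xF001 to 0xF027 is taken from the targets in macos_string_replace_map above; the function names and the dataset path in the comment are placeholders of mine. It only lists names and does not rename anything.

```python
import os

# Targets of macos_string_replace_map, as quoted earlier in this thread.
PRIVATE_FIRST, PRIVATE_LAST = 0xF001, 0xF027


def has_private_range_char(name: str) -> bool:
    """True if any character of `name` falls in the mapped private range."""
    return any(PRIVATE_FIRST <= ord(ch) <= PRIVATE_LAST for ch in name)


def find_private_range_names(root: str) -> list:
    """Walk `root` and collect paths whose last component has such a character."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if has_private_range_char(name):
                hits.append(os.path.join(dirpath, name))
    return hits


# Example use (placeholder path, substitute your own dataset):
#   for path in find_private_range_names("/mnt/tank/myshare"):
#       print(ascii(path))   # ascii() makes the \uf020 escapes visible
```

Printing with ascii() is deliberate: private-range characters usually render as blanks or boxes in a terminal, so the escaped form is easier to read.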

I am perhaps not the first person with these questions. If anyone knows where the answers are written down, I would appreciate a pointer. Thanks!

We already do catia and fruit automatically. There is a checkbox to do the services-for-mac mappings via catia. It doesn’t fix every possible problem with naming files via MacOS. At the end of the day, the onus is on the administrator to understand issues of cross-platform compatibility and implement sensible naming standards for directories and files. Basically, if you have enabled the character translations and are still being mangled, it is because the filename must be mangled (otherwise Windows and potentially other clients won’t be able to read it). Doing otherwise creates problems for a future admin that are basically impossible to diagnose.

OK. More links, but not yet a complete chain of understanding.

By “there is a checkbox”, do you mean the “Use Apple-style Character Encoding” checkbox in the dialogue which appears in the TrueNAS admin UI, tab “Sharing” > “Windows Shares (SMB)” > select the share in question > from its 3-dot menu select “Edit” > the “Advanced Options” button?

Is there documentation which explains exactly which Samba features are turned on as a result of this checkbox? Or, could someone perhaps point me to the relevant part of the TrueNAS CORE code, so that I could look it up?

Where can I see what Samba features are actually enabled in my instance of Samba? I found /usr/local/etc/smb4.conf. It mentions a few things relevant to my instance, such as the server string of my server, but it does not mention catia or the file-name-related configuration names of fruit. Is there someplace else I should look?

“Use Apple-style Character Encoding”

This enables catia with the MS-SFM mappings, which operate independently of whether vfs_fruit is in play. There’s a global checkbox to enable SMB2/3 apple protocol extensions. This enables fruit globally. testparm -s shows your configuration (which you shouldn’t be touching via shell).

Generally speaking, the exact samba features getting toggled by particular checkboxes are mostly irrelevant for end-users (you shouldn’t be required to have this degree of backend knowledge).

Thank you for explaining that “Use Apple-style Character Encoding” “enables catia with the MS-SFM mappings”. Now I need to find out what exactly the “MS-SFM mappings” are.

And thank you for the testparm -s tip! I did not know about testparm, and its output is illuminating.
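For anyone else following along, this is roughly how I narrowed the testparm -s output down to the lines I cared about. It is a read-only sketch: it runs testparm -s and filters its standard output, and it changes nothing. The keyword list is my own guess at what is relevant to this thread, and the function names are mine.

```python
import subprocess

# Substrings I considered relevant to name handling; adjust as needed.
KEYWORDS = ("vfs objects", "catia", "fruit", "mangl")


def interesting_lines(testparm_output: str) -> list:
    """Keep only the lines that mention one of KEYWORDS (case-insensitive)."""
    return [
        line.strip()
        for line in testparm_output.splitlines()
        if any(keyword in line.lower() for keyword in KEYWORDS)
    ]


def dump_samba_config() -> str:
    """Run `testparm -s` and return what it prints on standard output."""
    result = subprocess.run(
        ["testparm", "-s"], capture_output=True, text=True, check=False
    )
    return result.stdout


# Example use on the server's shell:
#   for line in interesting_lines(dump_samba_config()):
#       print(line)
```

Section headers such as the share name are dropped by this filter, so with more than one share it is worth reading the full output as well.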

I applaud that sentiment. However, I am not having luck answering my original question without that degree of backend knowledge. Why were my filenames mangled? Why did checking the box labelled “Use Apple-style Character Encoding” fix it, except for one case? If there were higher level docs suitable for users or admins which explained that, I would have been satisfied.