Hey all! Like the header says, I’m having some bizarre, sporadic, yet quite frustrating and intrusive issues with my newish TrueNAS setup. I’m new here and new to TrueNAS, so apologies if I break or misunderstand conventions here. I can’t find anyone else on the internet with this problem, so I figured I would see if anyone here can help. This issue is pretty weird and I’ve done a bunch of my own testing, so there’s a lot to say. I’ll give as much detail as I can.
Preemptive TL;DR: Random Windows errors on batch file transfers. They seem related to periodic speed drops to 0B/s, but a drop doesn’t trigger an error every time. Error code is 0x8007003b; it says it can’t transfer the file, except the file exists on the NAS but is corrupted somehow and is a major pain in the neck to delete and/or replace. About a 1-in-2 chance this happens during a batch [50-500] file transfer, and then it’s ages of trying to delete the corrupted file and start over. Basic troubleshooting didn’t work. Disabling Windows Firewall for private networks didn’t help. Connecting a laptop to the same SMB share didn’t help. No red flags apparent in Task Manager or the TrueNAS dashboard. Not even sure what to tinker with next.
My Setup
Main Rig:
- Ryzen 5900X
- Asus TUF Gaming X570 Plus [Mark 1 version, I think they made an updated version] [No onboard Wi-Fi]
- Onboard 1Gb/s [See below for routing]
- 32GB Ram
- NVIDIA 3060 12GB
- PCIe Wi-Fi/Bluetooth card [only for Wi-Fi, NOT connected to NAS network]
- Windows 11 23H2 [24H2 apparently hasn’t rolled around to me yet, “Check for Updates” says “up to date”]
NAS
- Ryzen 5500GT w/ Radeon Graphics
- Asus Prime B550 Plus
- 32GB Ram
- TrueNAS CORE 13.0-U6.7 [Problem began on 13.0-U6.2, updated manually to U6.7 but problem persists]
- 1 ZFS Pool set up with SMB share for my editing rig, strictly following the TrueNAS guide. No plugins, VMs, or anything other than the storage pool and SMB configuration.
Network Routing:
Dedicated TP-Link gigabit router [ER605 V2], no internet connection for the NAS, separate link to the internet for my Main Rig.
NAS Onboard Ethernet ↔ TP Link Router ↔ Main Rig Onboard Ethernet -|- PCIe Wi-Fi/Bluetooth Card ↔ Home Router ↔ Web
Background
I’m a freelance event videographer/video editor, so my job is mostly shooting and editing weddings. I’m reasonably techy [AKA I watch a lot of LTT and I’m the primary tech volunteer for a church that’s almost big enough to hire me to do it], but I’m not “tinker with Linux” kinda tech savvy, and networking that goes beyond basic stuff is probably too much for me.
I’ve been working from home on my main rig forever. It’s built like a gaming machine except that it’s packed full of drives: one SATA SSD for my OS and two NVMe drives for editing, plus several high capacity hard drives managed by Windows. I had filled up enough hard drives that I was sick of having them in my main rig, so I built a NAS maybe a month ago and put TrueNAS CORE on it.
Now I’m finally getting around to moving 20+ TB of footage, project files, photos, etc. out of my main machine and into my NAS so I can pull those drives out of my Windows machine. And that’s when I started running into problems:
The Problem
Periodically, but with no rhyme or reason I can discern, my transfer speeds drop from the gigabit-saturated ~100-110 MB/s all the way to 0B/s, and then shoot back up. That’s not too much of an issue for me, except when the dropout is significant enough that it throws up an error code [0x8007003b, which as far as I can tell is a 10 digit way to say “idk check your network or something lol”, courtesy of Microsoft] and asks me what to do [Try again/Skip/Cancel]. That makes it hard to transfer overnight, and it’s causing some other problems too, namely that whatever file is being transferred when the interruption hits basically becomes a parasite after that error comes up. Here’s an example:
If I transfer three clips–
Clip A.mp4 - 734 MB
Clip B.mp4 - 241 MB
Clip C.mp4 - 401 MB
–at the same time, and it throws up the error during Clip B, it doesn’t matter which of the three options I choose, Clip B will show up with the exact same file size as the original. So if I hit “Try Again”, it will actually upload Clip C next, and anything else in the list [potentially failing again on some other files along the way], and then at the end it will give me another dialogue box that tells me that Clip B already exists. Replacing is not an option [you’ll see why] so if I have it save a new copy instead, I’m left with:
Clip A.mp4 - 734 MB
Clip B.mp4 - 241 MB
Clip C.mp4 - 401 MB
Clip B(1).mp4 - 241 MB
And the biggest problem is that Clip B is now super corrupted or something. Trying to do anything with it is a nightmare. It won’t open in any player [I’ve had it happen to both MP4 and BRAW clips, and VLC and Blackmagic Raw Player have each frozen, crashed, or simply failed to open them]. Right-clicking the file causes Explorer to lag for a few seconds, and clicking delete causes it to hang for like 2 minutes, after which it finally gives me the dialogue box to confirm deletion. Then it shows the file transfer graph and proceeds to do nothing [except that File Explorer stops responding once or twice along the way], and then, sometimes after literally 20+ minutes, it throws up another error, now saying that the file can’t be deleted because it’s in use by another program.
That’s a big issue because I need my project files to point back to the same filename. I can’t have a broken Clip B reference interfere with a project file if I need to recall a project. I need to rename “Clip B(1).mp4” to exactly “Clip B.mp4” again, and that means I need the corrupted one with the original name to be gone. Besides, I certainly don’t want to store duplicates of files. Some of these are 100GB+ RAW files.
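For reference, the error code itself can be unpacked by hand. Below is a minimal Python sketch, assuming the standard Windows HRESULT bit layout [top bit = failure, 11-bit facility, 16-bit code]; the function name decode_hresult is made up for this post. Facility 7 is FACILITY_WIN32 and Win32 error 59 is ERROR_UNEXP_NET_ERR [“An unexpected network error occurred”], so the code only confirms a generic network failure and says nothing about the cause.

```python
def decode_hresult(hresult: int) -> dict:
    """Split a 32-bit HRESULT into its failure bit, facility, and code.

    Hypothetical helper for illustration; assumes the standard layout:
    bit 31 = severity, bits 16-26 = facility, bits 0-15 = code.
    """
    hresult &= 0xFFFFFFFF  # treat the value as unsigned 32-bit
    return {
        "failure": bool(hresult >> 31),
        "facility": (hresult >> 16) & 0x7FF,
        "code": hresult & 0xFFFF,
    }


print(decode_hresult(0x8007003B))
# → {'failure': True, 'facility': 7, 'code': 59}
```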
Troubleshooting so far
- Task Manager shows no obvious red flags for what could be using the file. I’ve done this without trying to open it, so it’s not BRAW Player or VLC. Could it be some under-the-hood process of Explorer itself? Windows Defender? Heck if I know.
- Same with the stats in the TrueNAS GUI: RAM usage is high compared to what I’m used to on Windows but not full, and most of it is ZFS cache, not system services; CPU is barely doing anything; pool is only at 19% capacity; and the network transfer rate is basically nothing while trying to delete.
- Restarting Explorer from Task Manager frees it up from being frozen and laggy, but if I try the delete again I get the same results.
- Temporarily shutting off SMB via ui/sharing/SMB/edit in the TrueNAS browser UI doesn’t stop the problem from happening either, even though I assumed that would force Windows to re-initiate its connection with the NAS without a full restart.
- Restarting my main rig allowed me to delete the file normally in 3/4 cases. No idea what was different that 4th time.
- Restarting my NAS also allowed the file to be deleted normally in 1/1 cases.
- Hooking up my Laptop [Asus ROG Zephyrus G15, Ryzen 5800H, onboard networking] to the same router [different cable] and transferring the files didn’t prevent disconnections from happening. When a disconnection happened during the test transfers from my laptop, the TrueNAS GUI Dashboard stats would hang on my Main Rig. In one out of three cases, it actually booted the GUI back to the login page. I think this might narrow the dropouts down to router and/or NAS configuration and/or onboard Ethernet on the NAS end. The weirdness around these corrupted files after a connection has been reestablished still confuses me though.
- I turned off Wi-Fi on both Main Rig and Laptop to rule out Windows weirdness about prioritizing one network. Even with only Ethernet connected, dropouts still occurred on both systems.
- Having a file fail to transfer from one system and then deleting it from the other via SMB worked as intended. It seems deleting/altering is only a problem from the same system that failed to transfer the file.
- Multiple forum posts that had the same Windows error code reported that the fix was to disable Windows Defender’s Firewall. I disabled it for private networks [and verified that my Ethernet connection to my TP-Link router was set to private in Windows] but to no avail; I still suffered dropouts. Again, the fact that the GUI hangs on a separate machine when a file transfer dropout occurs makes me feel like it can’t be the transferring system that’s the issue.
- I use Surfshark VPN on most of my devices. I wasn’t connected through it, but I eventually uninstalled it from my laptop entirely, and after a restart, still suffered dropouts that led to a corrupted file. Not it.
- Discovered during testing that files report their full size to File Explorer the moment they begin to transfer, regardless of whether they have finished transferring. If I start to transfer a single 28GB file on my laptop, it immediately looks like a full 28GB file in Explorer, even on the other system. This is unlike what I’m used to in Windows, with .part files and stuff. Is this a ZFS characteristic? It makes it more difficult to identify files that failed to transfer properly. I don’t like this behavior.
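Since file size can’t be trusted to spot a failed copy, comparing content hashes between the source and the share is one way to find them. This is only a rough Python sketch of that idea, not something I’ve run against the real data: the function names and the example paths are made up for illustration, it reads every file in full on both sides [slow over gigabit for 20+ TB], and reading one of the stuck files might hang just like Explorer does.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(source_dir, dest_dir):
    """List files under source_dir that are missing or different in dest_dir.

    Returns (relative_path, reason) tuples. Sizes are deliberately not
    used, because a partial copy can report the full size.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = dest_dir / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "content differs"))
    return problems
```

Usage would be something like find_bad_copies(r"D:\Footage\Wedding", r"\\NAS\share\Footage\Wedding"), where both paths are placeholders, not my actual folder or share names.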
Conclusion and plea
Basically I’m completely lost. I have no idea what I can even test now. I’ve spent a day and a half on this and gotten nowhere. I don’t normally ask for help on forums because I like to figure things out on my own and benefit from the experience, but I have real work to do and eager newlyweds to please, and I don’t know what else to do beyond what I’ve already done.
It’s not like I have a second NAS I can test, nor can I even test on my home router [it’s a Starlink because I can only get DSL to my house, and it has literally no accessible wired connections and it’s two floors below my other systems]. So are there some wizards out there that can tell me that I just configured something wrong? Do I just need to order some dedicated NICs or something? Any help or ideas are appreciated.
I either need to:
- Fix the dropouts issue and thereby prevent the corruption issue from ever happening
Or at least
- Find out how to deal with the corruption issue without having to hook in a different system or restart the Main Rig or NAS.
Does anybody have any ideas?