SCALE and ESXi iSCSI Dead Status

scharbag · June 26, 2025, 5:42am

So my battle with SCALE has hit another snag. I have a test system that is using the hardware from my second ESXi host (running 1 host currently). I have gutted this system and all it has are a pair of M.2 boot disks and 2 SAS controllers. I put 2 SSDs in there just to be able to make a test pool.

As virtualization is borked on 25.04.01, I thought I would do some testing of other things. One such thing was iSCSI. After restoring my CORE config to SCALE and re-arranging all of the networking to work properly, I tried to set up a new iSCSI share to test if it worked as expected. I have used the wizard and I have done it manually, and I keep getting a Dead status:

I added the dynamic Portal connection in ESXi:

and the static links show up:

(I renamed the portal during testing - made no difference)

On the SCALE side, I have a simple setup:

(yes, it is nested in a dataset - tried it this way and in the root and it made no difference - it is a zvol)

You can see in TrueNAS that the ESXi system is connecting to the target:

but in ESXi it is showing as dead:

But TrueNAS thinks things are swell…

I am so stumped currently. I have rebooted SCALE (a number of times) and tried over and over with no luck. The iSCSI networks can ping between ESXi/SCALE of course. 2 NIC ports, all 10G through a 3850X. Each subnet is on a separate VLAN. This setup works fine with CORE:

and here is SCALE’s network:

Finally, here is what SCALE is reporting:

So, yeah, I am stumped. I really cannot reboot ESXi right now without an outage. May punt this SCALE back to default settings and start over - see if there is something coming from the CORE restore.
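One thing ping does not prove is that the iSCSI TCP port itself is reachable on each portal. Below is a minimal sketch of that check, assuming the target listens on the default iSCSI port 3260; the function name `portal_reachable` is made up for illustration. It only tests that a TCP connection opens and says nothing about login or LUN state.

```python
import socket

ISCSI_PORT = 3260  # default iSCSI target port; assumed unchanged on the target


def portal_reachable(host: str, port: int = ISCSI_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, unreachable network and timeouts
        return False
```

Run from any machine on each iSCSI VLAN, it would be called once per portal, e.g. `portal_reachable("192.0.2.10")`, where the address is a placeholder and not one of the real portal IPs.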

Any help at all would be FANTASTIC - I sure hope that I did not make some rookie mistake here…

Cheers,

NickF1227 · June 26, 2025, 6:16am

So I have seen some weird edge cases trigger bugs in older VMware versions. All Paths Down (APD) is triggered when the storage is rebooted outside of something like a TrueNAS Enterprise HA context. If the outage lasts longer than 140 seconds, an APD is triggered in VMware. This can happen accidentally during maintenance, for example because of the wrong order of operations. VMware should be able to handle that either manually or automatically, but I have seen cases in certain 6.x versions where this does not work. The only solution I have ever found is this. I never ran into it personally on 7… And I have not used 8 in any meaningful capacity. Their docs actually scope 7 and 8 though.

All affected ESXi hosts may require a reboot to remove any residual references to the affected devices that are in an APD state.

https://knowledge.broadcom.com/external/article/318850/all-paths-down-for-a-storage-device.html

I don’t think your issue is related but this is interesting because your problem sounds like it’s the opposite problem?

The vmk log files might be more interesting, I think. Can you share those, @scharbag?

scharbag · June 26, 2025, 6:28am

I am running ESXi 7.0.3…

All my CORE paths are fine. Just new SCALE paths are being stupid. Set back to default and now ESXi will not even find a target on the SCALE machine. Research suggests I need an ESXi reboot but not right now… I am tired and crabby.

This has eaten my lunch for sure. booo.

But thanks for the reply and I will look into this when I can.

Cheers,
