TrueNAS Scale 24.04.0 scrub is taking all the resources

Oh, you may be able to see the speed the Thunderbolt 3 controller is running at using lspci. I don’t remember the exact options beyond “-vvv -s PCIe_device”, but one of my NVMe drives reports this for speed:

LnkSta:	 Speed 8GT/s, Width x4

This shows that the NVMe drive can do about 32 Gbit/s maximum transfer speed over its link (not including flash actions).
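
If you want to check your own setup, a minimal sketch (the 4d:00.0 address is just an example taken from this thread; substitute whatever address lspci reports for your Thunderbolt controller or NVMe drives):

# list candidate devices and note their PCI addresses
lspci | grep -i -e thunderbolt -e nvme
# compare what the link can do (LnkCap) with what it actually negotiated (LnkSta)
lspci -vvv -s 4d:00.0 | grep -e LnkCap -e LnkSta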

There are no claims of Thunderbolt support from TrueNAS. There has been no development or testing.

It probably works because the Linux kernel has some support for it. It may not work, because it has never been tested as part of a system. In general, nothing works unless it has been tested and fixed. To expect otherwise is irrational.

So, it’s an experiment. We generally recommend that these experiments be done by people with developer-level skills. It would be useful to get testing and proposals on how to fix the issues found.


Here is the layout:

root@truenas[~]# zpool status tank
  pool: tank
 state: ONLINE
  scan: scrub repaired 76K in 2 days 08:34:41 with 0 errors on Tue Jul 30 08:36:24 2024
config:

	NAME                                      STATE     READ WRITE CKSUM
	tank                                      ONLINE       0     0     0
	  raidz1-0                                ONLINE       0     0     0
	    c4bc0d6d-7d73-4526-b7d9-6bac63b9a819  ONLINE       0     0     0
	    2594b2a3-adbf-4d16-8360-a490bf0f2d78  ONLINE       0     0     0
	    236e35e0-b520-42db-bd66-a2bc387f7490  ONLINE       0     0     0
	    e61e3680-623c-4529-9b42-be73fc0a30c2  ONLINE       0     0     0
	    f1fa006e-4c8d-4852-8d4e-9c8ab33d15be  ONLINE       0     0     0
	    52e50ba5-ad80-4e46-8be0-2e7781673393  ONLINE       0     0     0
	    d47992a2-bc75-4f2c-a5ea-b7831854f869  ONLINE       0     0     0
	    f7811e87-49a8-42f7-b182-53fe6daf4059  ONLINE       0     0     0
	  raidz1-2                                ONLINE       0     0     0
	    e14907d2-60b5-4499-b3c8-c08a11763af7  ONLINE       0     0     0
	    dc89d06a-2365-429f-811f-1745f1b5e388  ONLINE       0     0     0
	    20681199-ab14-4aa8-aa3d-ff289d108ba0  ONLINE       0     0     0
	    1bc37bdb-a60f-4c62-a930-7b5d57a6c47f  ONLINE       0     0     0
	    92735d96-bb73-48e6-b36c-d0dbdacc67f0  ONLINE       0     0     0
	    cad717ef-6ed8-4551-94c5-3f9d285be605  ONLINE       0     0     0
	    19b442ae-8313-4a26-9b65-61cb25e7601b  ONLINE       0     0     0
	    f10c4ab4-85dd-4877-9bdf-6e39ba768c2e  ONLINE       0     0     0
	  raidz1-3                                ONLINE       0     0     0
	    4c184508-5ddc-4d19-a4f6-0b9f6de71a7e  ONLINE       0     0     0
	    ef0e54af-d078-4bb1-9946-378bae948642  ONLINE       0     0     0
	    acabaf6e-672e-4329-9281-6df3fd334d43  ONLINE       0     0     0
	    ff50181d-58aa-4c01-826d-e062ec125b84  ONLINE       0     0     0
	    adc0c5f6-4875-4d7b-aa82-b64db7415e43  ONLINE       0     0     0
	    cb7b5f3e-a18c-4d64-b588-07785ce1903f  ONLINE       0     0     0
	    606ad78a-6728-40e0-9e1e-16c6dda7da41  ONLINE       0     0     0
	    801a6f27-a790-49a5-a355-e984e2d7fabd  ONLINE       0     0     0
	  raidz1-6                                ONLINE       0     0     0
	    a4a38347-b6d3-4e8b-80fd-60db0edb7a04  ONLINE       0     0     0
	    34317f56-1a89-4203-9573-ffa93bbc61b4  ONLINE       0     0     0
	    777fa9c0-b266-4dc5-b573-c68be9a0f55f  ONLINE       0     0     0
	    a44188c0-7938-436e-a74f-8456f6a80e8d  ONLINE       0     0     0
	    88a4c08a-1d57-4ede-b224-36d3919428bd  ONLINE       0     0     0
	    cdb84b2a-0e4c-4ba7-9d43-26dc1b92ad67  ONLINE       0     0     0
	    1cd3b26f-5ee3-4d81-ae13-895472326ed4  ONLINE       0     0     0
	    fdde0f25-5597-41c5-bff1-12e8601b40b1  ONLINE       0     0     0
	logs	
	  d75ee0e6-d4d2-4f6e-b7d7-712bd4ba84ea    ONLINE       0     0     0

errors: No known data errors

root@truenas[~]# lspci -vvv -s 4d:00.0    
4d:00.0 PCI bridge: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge DD 2018] (rev 06) (prog-if 00 [Normal decode])
	Subsystem: Device 1c7a:de78
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 19
	IOMMU group: 45
	Bus: primary=4d, secondary=4e, subordinate=71, sec-latency=0
	I/O behind bridge: 0000f000-00000fff [disabled] [32-bit]
	Memory behind bridge: 5b900000-71ffffff [size=359M] [32-bit]
	Prefetchable memory behind bridge: 6025300000-604a0fffff [size=590M] [32-bit]
	Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
	BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16- MAbort- >Reset- FastB2B-
		PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
	Capabilities: [80] Power Management version 3
		Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA PME(D0+,D1+,D2+,D3hot+,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [88] MSI: Enable- Count=1/1 Maskable- 64bit+
		Address: 0000000000000000  Data: 0000
	Capabilities: [ac] Subsystem: Device 1c7a:de78
	Capabilities: [c0] Express (v2) Upstream Port, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ SlotPowerLimit 0W
		DevCtl:	CorrErr- NonFatalErr- FatalErr- UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
			MaxPayload 128 bytes, MaxReadReq 512 bytes
		DevSta:	CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
		LnkCap:	Port #3, Speed 8GT/s, Width x4, ASPM L1, Exit Latency L1 <2us
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; Disabled- CommClk-
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 2.5GT/s (downgraded), Width x4
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis- NROPrPrP- LTR+
			 10BitTagComp- 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS-
			 AtomicOpsCap: Routing-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled,
			 AtomicOpsCtl: EgressBlck-
		LnkCap2: Supported Link Speeds: 2.5-8GT/s, Crosslink- Retimer- 2Retimers- DRS-
		LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
		LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
			 EqualizationPhase2- EqualizationPhase3- LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [50] Capability ID 0x15 [0000]
	Capabilities: [100 v1] Device Serial Number 62-de-68-80-d1-e9-07-00
	Capabilities: [200 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP+ SDES- TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
		CEMsk:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		AERCap:	First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [300 v1] Virtual Channel
		Caps:	LPEVC=0 RefClk=100ns PATEntryBits=1
		Arb:	Fixed- WRR32- WRR64- WRR128-
		Ctrl:	ArbSelect=Fixed
		Status:	InProgress-
		VC0:	Caps:	PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
			Arb:	Fixed+ WRR32- WRR64- WRR128- TWRR128- WRR256-
			Ctrl:	Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
			Status:	NegoPending- InProgress-
	Capabilities: [400 v1] Power Budgeting <?>
	Capabilities: [500 v1] Vendor Specific Information: ID=1234 Rev=1 Len=100 <?>
	Capabilities: [600 v1] Vendor Specific Information: ID=8086 Rev=2 Len=04c <?>
	Capabilities: [700 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [800 v1] Latency Tolerance Reporting
		Max snoop latency: 3145728ns
		Max no snoop latency: 3145728ns
	Capabilities: [a00 v1] L1 PM Substates
		L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
			  PortCommonModeRestoreTime=0us PortTPowerOnTime=10us
		L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
			   T_CommonMode=0us LTR1.2_Threshold=20480ns
		L1SubCtl2: T_PwrOn=10us
	Capabilities: [b00 v1] Precision Time Measurement
		PTMCap: Requester:+ Responder:+ Root:-
		PTMClockGranularity: Unimplemented
		PTMControl: Enabled:- RootSelected:-
		PTMEffectiveGranularity: Unknown
	Kernel driver in use: pcieport

This one: OWC Thunderbolt (USB-C) Cables

Yes and yes - I took the risk knowing it well, hence my request. I am trying to move away from power-hungry and noisy Supermicro servers, and nothing was scalable until OWC came up with these very nice TB3 enclosures. Anything I can do - test or sponsor - I am happy to participate.

@shwet - Your pool layout looks fine. (Well, except for the 76 KB of corrected bit rot… that can be problematic for larger disks in RAID-Z1.)

On rare occasions here in the forums we see users with far too many devices in a single RAID-Zx vDev. That absolutely can impact performance. But, you are good in that regard.

The “Speed 2.5GT/s (downgraded)” line seems to indicate that your Thunderbolt link is running slower than it can. I am no expert on either performance or Thunderbolt. However, running at PCIe version 1 speed (2.5GT/s) really impacts overall throughput. You might see if you can get a better cable. Someone mentioned active cables; again, I don’t know if that will make a difference.
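
As a rough sanity check on what that downgrade costs (a back-of-the-envelope sketch, assuming a clean x4 link and ignoring everything except line encoding):

# PCIe gen1 uses 8b/10b encoding: 2.5 GT/s per lane carries 2.0 Gbit/s of data
# PCIe gen3 uses 128b/130b encoding: 8 GT/s per lane carries ~7.9 Gbit/s of data
awk 'BEGIN { printf "gen1 x4: %.1f Gbit/s\n", 2.5 * (8/10) * 4;
             printf "gen3 x4: %.1f Gbit/s\n", 8 * (128/130) * 4 }'
# -> gen1 x4: 8.0 Gbit/s   (roughly 1 GB/s for everything behind that bridge)
# -> gen3 x4: 31.5 Gbit/s  (roughly 3.9 GB/s)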

With 32 disks wanting about 38.4 Gbit/s in aggregate, having only about 10 Gbit/s available would point to over-subscription of your Thunderbolt connection.
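
That 38.4 Gbit/s works out to about 1.2 Gbit/s per drive, i.e. roughly 150 MB/s of sequential throughput each, which is a reasonable ballpark assumption for spinning disks during a scrub. A quick sketch of the over-subscription using those assumed per-disk numbers:

# assuming ~150 MB/s (~1.2 Gbit/s) of sequential throughput per HDD
awk 'BEGIN { disks = 32; per_disk = 1.2;
             printf "disks want %.1f Gbit/s; the downgraded link offers ~8-10 Gbit/s\n", disks * per_disk }'
# -> disks want 38.4 Gbit/s; the downgraded link offers ~8-10 Gbit/s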

Now, if it were 4 separate Thunderbolt host connections, instead of daisy-chained, that would be different.

Good luck.
