Hey, is it normal that scrubbing makes my NVMe drives much hotter than even high-load normal use? On a typical day, the temperatures don’t go above 55°C, but during scrubbing, the disks can reach up to 70°C (which is probably their factory limit). I’m using three Lexar 790 drives in RAID 3. I understand that scrubbing is a read-intensive task, but the difference is huge. Is thermal throttling a risk for the scrub, the disks, or data integrity?
NVMe drives run hot in general. Scrubs are nonstop reads. It’s normal to see those temps.
I have a heatsink on my NVMes and it has helped keep the temperatures from climbing too high.
I have the same issue with my SN850 drives.
Wow! There’s no such thing in ZFS, and actual RAID3 has not been seen in the wild for quite some time (it was already considered obsolete and superseded by RAID5 thirty years ago), although Unraid does something quite similar to it.
Check what you really have, and whether it is appropriate for your use case.
Yes there is. This is the early alpha release of AnyRaid™ RAID3.
With 3 NVMe drives, if @szymon loses 3 of them, their data is still safe. It only becomes a problem if you lose 4 drives in your 3-drive vdev.
Yeah, it was a mistake, but there is RAIDZ3; it depends on your failure tolerance. I’m actually using RAIDZ in a 3-drive vdev, so 1 drive for parity.
Buy a few m.2 heatsinks. The slim ones are easier to fit if your drives are near other components, such as a GPU, CPU, or RAM. The thicker ones will keep your temps lower and do better at spreading heat.
I only ever use heatsinks with my NVMe drives, and I recommend everyone do the same, if their chassis has enough room.
You don’t need the ones that have small fans. A passive heatsink works just fine, especially if you have proper airflow in the case.
Nope, not a factory limit, but rather a warranty limit. They can get hotter, I’m fairly certain of that. If you hit 71C, your warranty is toast. (see what I did there). Some drives like mine can operate at
RAID3, I had to look that up.
To solve your problem you will either need to improve airflow, add a heat sink to each NVMe, or both.
To see your NVMe temperature data, try: smartctl -x /dev/nvme0
and look for Warning Comp. Temp. Threshold and Critical Comp. Temp. Threshold as well. These are the absolute maximum values you can have. Now look for the matching Warning Comp. Temperature Time and Critical Comp. Temperature Time to see how long you were out of spec. If that does not work, try nvme smart-log /dev/nvme0
and then you can look for “temperature” and “Thermal” for the values. You may be able to find out what your maximum temperature was for each device.
If you cannot keep those drives cool, you are in for trouble down the road, sooner than you think.
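If you would rather not eyeball the whole report, here is a minimal Python sketch that pulls those fields out of the text that smartctl -x prints. The function name parse_thermal_fields and the dictionary keys are my own invention for illustration, and the patterns only assume the labels look like the ones in the outputs posted in this thread; other drives or smartmontools versions may word them differently.

```python
import re

# Labels as they appear in the smartctl outputs quoted in this thread.
# \s+ tolerates variable spacing between the words of a label.
# The key names on the left are hypothetical, chosen for this sketch only.
PATTERNS = {
    "temperature_c": r"^Temperature:\s*(\d+)\s*Celsius",
    "warning_threshold_c": r"^Warning\s+Comp\.\s+Temp\.\s+Threshold:\s*(\d+)\s*Celsius",
    "critical_threshold_c": r"^Critical\s+Comp\.\s+Temp\.\s+Threshold:\s*(\d+)\s*Celsius",
    "warning_time_min": r"^Warning\s+Comp\.\s+Temperature\s+Time:\s*([\d,]+)",
    "critical_time_min": r"^Critical\s+Comp\.\s+Temperature\s+Time:\s*([\d,]+)",
}


def parse_thermal_fields(smartctl_text):
    """Return a dict of the thermal fields found in `smartctl -x` text output.

    Fields the drive does not report are simply absent from the result.
    """
    found = {}
    for line in smartctl_text.splitlines():
        line = line.strip()
        for name, pattern in PATTERNS.items():
            match = re.match(pattern, line)
            if match:
                # Strip thousands separators such as "1,392" before converting.
                found[name] = int(match.group(1).replace(",", ""))
    return found


# Lines copied from the nvme1 output posted in this thread.
SAMPLE = """\
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 94 Celsius
Temperature: 46 Celsius
Warning Comp. Temperature Time: 7
Critical Comp. Temperature Time: 0
"""

if __name__ == "__main__":
    for key, value in parse_thermal_fields(SAMPLE).items():
        print(f"{key}: {value}")
```

To use it on a live system you would feed it the saved output of smartctl -x for each drive, for example from a file you redirected the command into.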
Scrub is the most throughput-intensive workload ZFS can run. It is highly parallel and optimized. So not surprising.
I get 0. Is this normal for SN850 or could there be something wrong?
That’s good. It means your NVMe never exceeded and sustained temperatures over the “warning” or “critical” levels.
EDIT: Where do you see 0? I assumed you were looking at Warning Comp. Temperature Time.
You should see 4 values if you run this:
smartctl -x /dev/nvme0 | grep "Comp\. Temp"
You should be running smartmontools version 7.4 or better, or use the nvme command to talk properly with the NVMe drive. I am using version 7.5 (as seen in my example) since it was released in April 2025.
There are always some inconsistencies between manufacturers but a “0” value is likely to be okay in this situation. Here is one of my outputs for comparison.
root@truenas:~# smartctl -x /dev/nvme0
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: Nextorage SSD NEM-PA4TB
Serial Number: 5112308181500000
Firmware Version: EIFS51.3
PCI Vendor/Subsystem ID: 0x1f31
IEEE OUI Identifier: 0x7cef40
Total NVM Capacity: 4,000,787,030,016 [4.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 1
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 4,000,787,030,016 [4.00 TB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 7cef40 813a300063
Local Time is: Sat Jun 7 08:36:08 2025 EDT
Firmware Updates (0x12): 1 Slot, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d): Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0c): Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size: 512 Pages
Warning Comp. Temp. Threshold: 84 Celsius
Critical Comp. Temp. Threshold: 89 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 8.80W - - 0 0 0 0 0 0
1 + 7.10W - - 1 1 1 1 0 0
2 + 5.20W - - 2 2 2 2 0 0
3 - 0.0620W - - 3 3 3 3 2500 7500
4 - 0.0440W - - 4 4 4 4 10500 65000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 2
1 - 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning: 0x00
Temperature: 27 Celsius
Available Spare: 100%
Available Spare Threshold: 5%
Percentage Used: 0%
Data Units Read: 253,223,547 [129 TB]
Data Units Written: 16,420,340 [8.40 TB]
Host Read Commands: 624,362,026
Host Write Commands: 193,180,514
Controller Busy Time: 1,392
Power Cycles: 293
Power On Hours: 5,376
Unsafe Shutdowns: 124
Media and Data Integrity Errors: 0
Error Information Log Entries: 3,827
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num Test_Description Status Power_on_Hours Failing_LBA NSID Seg SCT Code
0 Short Completed without error 5369 - - - - -
1 Short Completed without error 5357 - - - - -
2 Short Completed without error 5357 - - - - -
3 Short Completed without error 5345 - - - - -
4 Short Completed without error 5321 - - - - -
5 Extended Completed without error 5297 - - - - -
6 Short Completed without error 5273 - - - - -
7 Short Completed without error 5266 - - - - -
8 Short Completed without error 5249 - - - - -
9 Extended Completed without error 5130 - - - - -
10 Short Completed without error 5081 - - - - -
11 Extended Completed without error 4962 - - - - -
12 Short Completed without error 4913 - - - - -
13 Extended Completed without error 4814 - - - - -
14 Extended Completed without error 4799 - - - - -
15 Extended Completed without error 4794 - - - - -
16 Short Completed without error 4745 - - - - -
17 Extended Completed without error 4626 - - - - -
18 Short Completed without error 4577 - - - - -
19 Extended Completed without error 4476 - - - - -
Now look for Warning Comp. Temp. Threshold and Warning Comp. Temperature Time. The first value tells you what the warning limit is, the second value tells you how many minutes you exceeded that limit.
Hope this helps some. If you do not see those values, post the entire output of the command for one of your NVMe drives. You can redact the serial number if desired.
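To make the relationship between those two values concrete, here is a small Python sketch that turns them, plus the current temperature, into a quick summary. The function thermal_summary and its key names are hypothetical, and the numbers in the example are the ones from my Nextorage output above. I am assuming the counter starts ticking once the temperature is at or above the warning threshold.

```python
def thermal_summary(temperature_c, warning_threshold_c, warning_time_min):
    """Summarize how a drive stands relative to its warning threshold.

    temperature_c:        current "Temperature" value in Celsius
    warning_threshold_c:  "Warning Comp. Temp. Threshold" in Celsius
    warning_time_min:     "Warning Comp. Temperature Time" in minutes
    """
    headroom_c = warning_threshold_c - temperature_c
    return {
        # Degrees left before the drive reaches its warning threshold.
        "headroom_c": headroom_c,
        # Assumption: at or above the threshold counts as over the limit.
        "at_or_over_warning_now": headroom_c <= 0,
        # A non-zero counter means the drive has spent time over the limit.
        "ever_over_warning": warning_time_min > 0,
    }


if __name__ == "__main__":
    # Values from the Nextorage output above: 27 C now, 84 C limit, 0 minutes.
    print(thermal_summary(27, 84, 0))
```

A drive that shows plenty of headroom at idle but a non-zero time counter is one that only overheats under sustained load, such as a scrub.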
I should have been a bit more clear. I don’t have a threshold. I have 2 NVMEs, one of them has no warnings:
vader:~# smartctl -x /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: WD_BLACK SN850X 2000GB
Serial Number: 24171U801864
Firmware Version: 620361WD
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 8224
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 4096
Namespace 1 IEEE EUI-64: 001b44 8b472a93dd
Local Time is: Sat Jun 7 17:56:59 2025 EEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 94 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W 9.00W - 0 0 0 0 0 0
1 + 6.00W 6.00W - 0 0 0 0 0 0
2 + 4.50W 4.50W - 0 0 0 0 0 0
3 - 0.0250W - - 3 3 3 3 5000 10000
4 - 0.0050W - - 4 4 4 4 3900 45700
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 - 512 0 2
1 + 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 40 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 22,204,587 [11.3 TB]
Data Units Written: 22,395,311 [11.4 TB]
Host Read Commands: 109,998,391
Host Write Commands: 407,435,444
Controller Busy Time: 1,729
Power Cycles: 57
Power On Hours: 319
Unsafe Shutdowns: 13
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged
and the other has a warning:
vader:~# smartctl -x /dev/nvme1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: WD_BLACK SN850X 2000GB
Serial Number: 24435C4A4111
Firmware Version: 620361WD
PCI Vendor/Subsystem ID: 0x15b7
IEEE OUI Identifier: 0x001b44
Total NVM Capacity: 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity: 0
Controller ID: 8224
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size: 4096
Namespace 1 IEEE EUI-64: 001b44 4a41de9082
Local Time is: Sat Jun 7 17:58:04 2025 EEST
Firmware Updates (0x14): 2 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e): Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 94 Celsius
Namespace 1 Features (0x02): NA_Fields
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 9.00W 9.00W - 0 0 0 0 0 0
1 + 6.00W 6.00W - 0 0 0 0 0 0
2 + 4.50W 4.50W - 0 0 0 0 0 0
3 - 0.0250W - - 3 3 3 3 5000 10000
4 - 0.0050W - - 4 4 4 4 3900 45700
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 - 512 0 2
1 + 4096 0 1
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 46 Celsius
Available Spare: 100%
Available Spare Threshold: 10%
Percentage Used: 0%
Data Units Read: 11,766,180 [6.02 TB]
Data Units Written: 6,170,648 [3.15 TB]
Host Read Commands: 58,149,382
Host Write Commands: 123,131,561
Controller Busy Time: 473
Power Cycles: 20
Power On Hours: 62
Unsafe Shutdowns: 6
Media and Data Integrity Errors: 0
Error Information Log Entries: 0
Warning Comp. Temperature Time: 7
Critical Comp. Temperature Time: 0
Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged
Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged
You have that correct. nvme0 has no issues; nvme1 was too hot for 7 minutes. Above 90C is damn hot! That’s about 194F, don’t touch this.
You might be able to get more specific data using the nvme command provided earlier, but that is up to the manufacturer if they recorded the event.
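For anyone who thinks in Fahrenheit, here is a quick Python sketch of the conversion for the two SN850X thresholds reported above (90 and 94 Celsius). The helper name celsius_to_fahrenheit is just for illustration.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from Celsius to Fahrenheit."""
    return celsius * 9 / 5 + 32


if __name__ == "__main__":
    # Thresholds taken from the SN850X outputs in this thread.
    for label, celsius in [("warning", 90), ("critical", 94)]:
        fahrenheit = celsius_to_fahrenheit(celsius)
        print(f"{label}: {celsius} C = {fahrenheit:.1f} F")
```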