NVMe scrub, high temps

Hey, is it normal that scrubbing makes my NVMe drives run much hotter than even high-load normal use? On a typical day the temperatures don’t go above 55°C, but during a scrub the disks can reach 70°C (which is probably their factory limit). I’m using three Lexar 790 drives in RAID 3. I understand that scrubbing is a read-intensive task, but the difference is huge. Is thermal throttling a risk for the scrub, the disks, or data integrity?

NVMe drives run hot in general. Scrubs are nonstop reads. It’s normal to see those temps.

I have a heatsink on my NVMes and it has helped keep the temperatures from climbing too high.

I have the same issue with my SN850 drives.

Wow! There’s no such thing in ZFS, and actual RAID3 has not been seen in the wild for quite some time (it was already considered obsolete and superseded by RAID5 thirty years ago), although Unraid does something quite similar to it.
Check what you really have, and whether it is appropriate for your use case.

Yes there is. This is the early alpha release of AnyRaid™ RAID3.

With 3 NVMe drives, if @szymon loses 3 of them, their data is still safe. It only becomes a problem if you lose 4 drives in your 3-drive vdev.

Yeah, that was a mistake, but there is RAIDZ3; it depends on your failure tolerance. I’m actually using RAIDZ in a 3-drive vdev, so 1 drive for parity.

Buy a few m.2 heatsinks. The slim ones are easier to fit if your drives are near other components, such as a GPU, CPU, or RAM. The thicker ones will keep your temps lower and do better at spreading heat.

I only ever use heatsinks with my NVMe drives, and I recommend everyone do the same, if their chassis has enough room.

You don’t need the ones that have small fans. A passive heatsink works just fine, especially if you have proper airflow in the case.

Yeah, I’ve already ordered an ICY BOX. I’ve used one in another setup and it was great.

Nope, not a factory limit, but rather a warranty limit. They can get hotter, I’m fairly certain of that. If you hit 71C, your warranty is toast. (see what I did there). Some drives like mine can operate at

RAID3, I had to look that up.

To solve your problem you will either need to improve airflow, add a heat sink to each NVMe, or both.

To see your NVMe temperature data, try smartctl -x /dev/nvme0 and look for Warning Comp. Temp. Threshold and Critical Comp. Temp. Threshold. These are the absolute maximum temperatures for the drive. Then look for the matching Warning and Critical Comp. Temperature Time values to see how long you were out of spec. If that does not work, try nvme smart-log /dev/nvme0 and look for “temperature” and “Thermal” in the output. You may be able to find out what your maximum temperature was for each device.

If you cannot keep those drives cool, you are in for trouble down the road, sooner than you think.

Scrub is the most throughput-intensive workload ZFS can run. It is highly parallel and optimized. So not surprising.

I get 0. Is this normal for SN850 or could there be something wrong?

That’s good. It means your NVMe never spent any time above the “warning” or “critical” temperature levels.

EDIT: Where do you see 0? I assumed you were looking at Warning Comp. Temperature Time.

You should see 4 values if you run this:
smartctl -x /dev/nvme0 | grep "Comp\. Temp"

You should be running smartmontools version 7.4 or better, or use the nvme command to talk properly with the NVMe drive. I am using version 7.5 (as seen in my example) since it was released in April 2025.
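A quick way to check the version requirement above is to parse the first line of the smartctl banner. This is only a minimal sketch: the function name smartctl_at_least_7_4 is made up for illustration, and it assumes the banner starts with "smartctl MAJOR.MINOR" as it does in the outputs posted in this thread.

```shell
# Hypothetical helper: succeeds if the smartctl banner on stdin reports
# version 7.4 or later. Assumes the first line looks like
# "smartctl 7.4 2023-08-01 r5530 [...]" (format taken from this thread).
smartctl_at_least_7_4() {
    awk 'NR == 1 && $1 == "smartctl" {
             # Split "7.4" into major and minor parts and compare numerically.
             split($2, v, ".")
             ok = (v[1] + 0 > 7) || (v[1] + 0 == 7 && v[2] + 0 >= 4)
         }
         END { exit (ok ? 0 : 1) }'
}

# Usage (not run here; needs smartmontools installed):
#   smartctl -V | smartctl_at_least_7_4 && echo "smartctl is new enough"
```

If the banner on your build looks different, the check simply fails, so treat a failure as "verify by hand" rather than proof of an old version.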

There are always some inconsistencies between manufacturers but a “0” value is likely to be okay in this situation. Here is one of my outputs for comparison.

root@truenas:~# smartctl -x /dev/nvme0
smartctl 7.5 2025-04-30 r5714 [x86_64-linux-6.12.15-production+truenas] (local build)
Copyright (C) 2002-25, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Nextorage SSD NEM-PA4TB
Serial Number:                      5112308181500000
Firmware Version:                   EIFS51.3
PCI Vendor/Subsystem ID:            0x1f31
IEEE OUI Identifier:                0x7cef40
Total NVM Capacity:                 4,000,787,030,016 [4.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          4,000,787,030,016 [4.00 TB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            7cef40 813a300063
Local Time is:                      Sat Jun  7 08:36:08 2025 EDT
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x005d):     Comp DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0c):         Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         512 Pages
Warning  Comp. Temp. Threshold:     84 Celsius
Critical Comp. Temp. Threshold:     89 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     8.80W       -        -    0  0  0  0        0       0
 1 +     7.10W       -        -    1  1  1  1        0       0
 2 +     5.20W       -        -    2  2  2  2        0       0
 3 -   0.0620W       -        -    3  3  3  3     2500    7500
 4 -   0.0440W       -        -    4  4  4  4    10500   65000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         2
 1 -    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02, NSID 0xffffffff)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    0%
Data Units Read:                    253,223,547 [129 TB]
Data Units Written:                 16,420,340 [8.40 TB]
Host Read Commands:                 624,362,026
Host Write Commands:                193,180,514
Controller Busy Time:               1,392
Power Cycles:                       293
Power On Hours:                     5,376
Unsafe Shutdowns:                   124
Media and Data Integrity Errors:    0
Error Information Log Entries:      3,827
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 63 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06, NSID 0xffffffff)
Self-test status: No self-test in progress
Num  Test_Description  Status                       Power_on_Hours  Failing_LBA  NSID Seg SCT Code
 0   Short             Completed without error                5369            -     -   -   -    -
 1   Short             Completed without error                5357            -     -   -   -    -
 2   Short             Completed without error                5357            -     -   -   -    -
 3   Short             Completed without error                5345            -     -   -   -    -
 4   Short             Completed without error                5321            -     -   -   -    -
 5   Extended          Completed without error                5297            -     -   -   -    -
 6   Short             Completed without error                5273            -     -   -   -    -
 7   Short             Completed without error                5266            -     -   -   -    -
 8   Short             Completed without error                5249            -     -   -   -    -
 9   Extended          Completed without error                5130            -     -   -   -    -
10   Short             Completed without error                5081            -     -   -   -    -
11   Extended          Completed without error                4962            -     -   -   -    -
12   Short             Completed without error                4913            -     -   -   -    -
13   Extended          Completed without error                4814            -     -   -   -    -
14   Extended          Completed without error                4799            -     -   -   -    -
15   Extended          Completed without error                4794            -     -   -   -    -
16   Short             Completed without error                4745            -     -   -   -    -
17   Extended          Completed without error                4626            -     -   -   -    -
18   Short             Completed without error                4577            -     -   -   -    -
19   Extended          Completed without error                4476            -     -   -   -    -

Now look for Warning Comp. Temp. Threshold: and Warning Comp. Temperature Time:. The first value tells you what the warning limit is, the second value tells you how many minutes you exceeded that limit.
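If you have several drives, the comparison described above can be scripted. The following is a rough sketch, not an official tool: nvme_thermal_summary is a made-up helper name, and it assumes the field labels match the smartctl -x outputs posted in this thread (other smartctl versions or drives may label them differently).

```shell
# Hypothetical helper: reads `smartctl -x` output on stdin and prints one
# summary line with the current temperature, both thresholds, and the
# minutes spent above each threshold.
nvme_thermal_summary() {
    awk -F': *' '
        # Strip thousands separators (e.g. "1,392") before converting.
        function num(s) { gsub(/,/, "", s); return s + 0 }
        /^Temperature:/                        { temp  = num($2) }
        /^Warning +Comp\. Temp\. Threshold:/   { warn  = num($2) }
        /^Critical +Comp\. Temp\. Threshold:/  { crit  = num($2) }
        /^Warning +Comp\. Temperature Time:/   { wtime = num($2) }
        /^Critical +Comp\. Temperature Time:/  { ctime = num($2) }
        END {
            status = (wtime > 0 || ctime > 0) ? "exceeded" : "ok"
            printf "temp=%dC warn=%dC crit=%dC over_warn=%dmin over_crit=%dmin status=%s\n",
                   temp, warn, crit, wtime, ctime, status
        }'
}

# Usage (not run here; needs smartmontools and root):
#   for dev in /dev/nvme0 /dev/nvme1; do
#       printf '%s: ' "$dev"; smartctl -x "$dev" | nvme_thermal_summary
#   done
```

A status of "exceeded" only says the drive has been over a threshold at some point in its life; the counters are cumulative, so compare them before and after a scrub to see whether the scrub itself is the cause.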

Hope this helps. If you do not see those values, post the entire output of the command for one of your NVMe drives. You can redact the serial number if desired.

I should have been a bit more clear. I don’t have a threshold. I have 2 NVMEs, one of them has no warnings:

vader:~# smartctl -x /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN850X 2000GB
Serial Number:                      24171U801864
Firmware Version:                   620361WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8224
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     4096
Namespace 1 IEEE EUI-64:            001b44 8b472a93dd
Local Time is:                      Sat Jun  7 17:56:59 2025 EEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     94 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W    9.00W       -    0  0  0  0        0       0
 1 +     6.00W    6.00W       -    0  0  0  0        0       0
 2 +     4.50W    4.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000   10000
 4 -   0.0050W       -        -    4  4  4  4     3900   45700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         2
 1 +    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        40 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    22,204,587 [11.3 TB]
Data Units Written:                 22,395,311 [11.4 TB]
Host Read Commands:                 109,998,391
Host Write Commands:                407,435,444
Controller Busy Time:               1,729
Power Cycles:                       57
Power On Hours:                     319
Unsafe Shutdowns:                   13
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

and the other has a warning:

vader:~# smartctl -x /dev/nvme1
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.13-amd64] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       WD_BLACK SN850X 2000GB
Serial Number:                      24435C4A4111
Firmware Version:                   620361WD
PCI Vendor/Subsystem ID:            0x15b7
IEEE OUI Identifier:                0x001b44
Total NVM Capacity:                 2,000,398,934,016 [2.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      8224
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,000,398,934,016 [2.00 TB]
Namespace 1 Formatted LBA Size:     4096
Namespace 1 IEEE EUI-64:            001b44 4a41de9082
Local Time is:                      Sat Jun  7 17:58:04 2025 EEST
Firmware Updates (0x14):            2 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x00df):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp Verify
Log Page Attributes (0x1e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg Pers_Ev_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     94 Celsius
Namespace 1 Features (0x02):        NA_Fields

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     9.00W    9.00W       -    0  0  0  0        0       0
 1 +     6.00W    6.00W       -    0  0  0  0        0       0
 2 +     4.50W    4.50W       -    0  0  0  0        0       0
 3 -   0.0250W       -        -    3  3  3  3     5000   10000
 4 -   0.0050W       -        -    4  4  4  4     3900   45700

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         2
 1 +    4096       0         1

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    11,766,180 [6.02 TB]
Data Units Written:                 6,170,648 [3.15 TB]
Host Read Commands:                 58,149,382
Host Write Commands:                123,131,561
Controller Busy Time:               473
Power Cycles:                       20
Power On Hours:                     62
Unsafe Shutdowns:                   6
Media and Data Integrity Errors:    0
Error Information Log Entries:      0
Warning  Comp. Temperature Time:    7
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 256 entries)
No Errors Logged

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged


You have that correct. nvme0 has no issues; nvme1 was too hot for 7 minutes. Above 90C is damn hot! That’s about 194F, don’t touch this.

You might be able to get more specific data using the nvme command provided earlier, but whether the event was recorded is up to the manufacturer.
