Supermicro FAN Control!

What that “voodoo” does is set the min and max fan speed thresholds to 100 rpm and 25K rpm. This keeps the BMC from ramping fan speeds up to max unless the speed falls below 100 rpm (unlikely if the fan hasn’t failed).

Now that you have done that, you should keep lowering the PWM duty cycle for the fans to see what the actual minimum speed is, then re-run the command to set a min value that reflects reality. The low threshold isn’t an “annoyance”…it lets the BMC keep your system from overheating when the fan RPM drops too low to cool the system.
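A rough sketch of that stepping-down procedure, for illustration only. It assumes the raw command `0x30 0x70 0x66 0x01 <zone> <duty>`, which is commonly reported to set the duty cycle on X10/X11-era Supermicro boards (check that it applies to your board), that the fan mode is already Full, and that the fan under test is on FAN1 in zone 0. The `run` helper and the `DRY_RUN` variable are made up for this example; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

ZONE=0x00      # assumption: the fan under test is in zone 0
HEADER=FAN1    # assumption: the fan under test is on header FAN1

for duty in 50 40 30 20 10; do
    hex=$(printf '0x%02x' "$duty")
    # ASSUMPTION: Supermicro raw command that sets a zone's PWM duty cycle
    run ipmitool raw 0x30 0x70 0x66 0x01 "$ZONE" "$hex"
    run sleep 10                          # give the fan time to settle
    run ipmitool sensor get "$HEADER"     # read back the RPM at this duty
done
```

Note the lowest duty cycle at which the fan still spins reliably, and the RPM it reports there; that RPM is what the lower thresholds should sit below.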

ipmitool and ipmiutil are the same thing with a slightly different interface depending on whether the tool acts locally or remotely.
The upper thresholds are irrelevant in practice, but the lower thresholds should all differ by at least 100 rpm. Setting inconsistent values may cause the BMC to reject the change.

As said, the values should be calculated from the specifications of the actual fan:
Take the minimal allowed rotational speed (e.g. 400 +/- 20% would give 320 rpm); round down to a multiple of 100, this is your lnc value (here 300).
Then set lcr and lnr from there (here 200 and 100).
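The arithmetic above can be sketched as a small shell function. The function name `fan_thresholds` is made up for this example; the `ipmitool sensor thresh <id> lower <lnr> <lcr> <lnc>` form is the standard ipmitool syntax, but the command is only printed here, not executed.

```shell
#!/bin/sh
# fan_thresholds MIN_RPM TOLERANCE_PERCENT
# Prints "lnr lcr lnc" derived from the fan's specified minimum speed.
fan_thresholds() {
    min_rpm=$1
    tol_pct=$2
    worst=$(( min_rpm * (100 - tol_pct) / 100 ))   # e.g. 400 - 20% = 320
    lnc=$(( worst / 100 * 100 ))                   # round down to a multiple of 100
    lcr=$(( lnc - 100 ))
    lnr=$(( lcr - 100 ))
    if [ "$lcr" -lt 0 ]; then lcr=0; fi            # never go below zero
    if [ "$lnr" -lt 0 ]; then lnr=0; fi
    echo "$lnr $lcr $lnc"
}

set -- $(fan_thresholds 400 20)
# Print the command instead of running it; FAN1 is just an example header.
echo "ipmitool sensor thresh FAN1 lower $1 $2 $3"
# prints: ipmitool sensor thresh FAN1 lower 100 200 300
```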

You can use zeros as well. Helpful when your fans run at 300 rpm at 30% :wink:

I.e. my lcr and lnr are zero.

Hi @Stux, I remember you had a very nice script written for the fans in a Node 304 housing. I am still using that case with the very same SuperMicro board you used in that install. I actually bought it because of your detailed post :wink: But now it seems that I need a new way of having the system automatically adapt fan speeds under Dragonfish. Any idea how to tackle this?

My Node 304 system is still my main server at home too. I did have to replace the motherboard after a lightning strike, but insurance covered it.

You could use my updated version that works on scale (tested on Cobia/Dragonfish) and Core :wink:

Don’t forget to copy your config settings in and it should just work.

(It’s minimally edited so just do a diff between the two versions to see your changes)

Needs to be run using tmux at post init.
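As a rough illustration of what a post-init command could look like: the sketch below starts the script in a detached tmux session. The session name `fanctl`, the script path, and the `run`/`DRY_RUN` helper are all hypothetical; by default the command is only printed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints commands unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

SESSION=fanctl                               # hypothetical session name
SCRIPT=/path/to/hybrid_fan_controller.pl     # hypothetical path, use your own

start_fan_script() {
    # -d: start detached, -s: name the session so it can be found later
    run tmux new-session -d -s "$SESSION" "$SCRIPT"
}

start_fan_script
```

To see what the script is printing later, attach to the session with `tmux attach -t fanctl` and detach again with Ctrl-b followed by d.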

OK, I am not quite sure I am getting it right. Here in your post you write “post-init” but on the Github site you mention “pre-init”:

And in your Build for the Node 304 you matched the CPU to FAN4 and the three Noctua fans to FAN1-3 so it looks like this in my changed script:

Do I need to make some other changes in the file (apart from entering “Linux” as the OS)? E.g. because I switched around the fan zones?

And with which command can I see the logs?

Any help? :slight_smile:

Yeah. Pre-init is too early. I found this the hard way. And the GitHub hasn’t been updated.

I think you shouldn’t be listing 3 fans for the hd fan header (and technically you’re not). The script needs just a single fan. That’s the one it will test.

I’ll post my settings later.

Meanwhile, the script auto detects the platform. You shouldn’t have to set it to “Linux”

This is the config section from my version of the hybrid fan control script that is running on my Node 304.

###############################################################################################
## CONFIGURATION
################

## DEBUG LEVEL
## 0 means no debugging. 1,2,3,4 provide more verbosity
## You should run this script in at least level 1 to verify it's working correctly on your system
$debug = 1;

## CPU THRESHOLD TEMPS
## A modern CPU can heat up from 35C to 60C in a second or two. The fan duty cycle is set based on this
$high_cpu_temp = 70;		# will go HIGH when we hit
$med_cpu_temp = 60;	 	# will go MEDIUM when we hit, or drop below again
$low_cpu_temp = 50;		# will go LOW when we fall below this again

## HD THRESHOLD TEMPS
## HDs change temperature slowly. 
## This is the temperature that we regard as being uncomfortable. The higher this is the
## more silent your system.
## Note, it is possible for your HDs to go above this... but if your cooling is good, they shouldn't.
$hd_max_allowed_temp = 40;	# celsius. you will hit 100% duty cycle when your HDs hit this temp.

## CPU TEMP TO OVERRIDE HD FANS
## when the CPU climbs above this temperature, the HD fans will be overridden
## this prevents the HD fans from spinning up when the CPU fans are capable of providing 
## sufficient cooling.
$cpu_hd_override_temp = 75;

## CPU/HD SHARED COOLING
## If your HD fans contribute to the cooling of your CPU you should set this value.
## It will mean when your CPU heats up your HD fans will be turned up to help cool the
## case/cpu. This would only not apply if your HDs and fans are in a separate thermal compartment.
$hd_fans_cool_cpu = 1;		# 1 if the hd fans should spin up to cool the cpu, 0 otherwise


#######################
## FAN CONFIGURATION
####################

## FAN SPEEDS
## You need to determine the actual max fan speeds that are achieved by the fans
## Connected to the cpu_fan_header and the hd_fan_header.
## These values are used to verify high/low fan speeds and trigger a BMC reset if necessary.
$cpu_max_fan_speed 	= 6500;
$hd_max_fan_speed 	= 1800;


## CPU FAN DUTY LEVELS
## These levels are used to control the CPU fans
$fan_duty_high	= 100;		# percentage on, ie 100% is full speed.
$fan_duty_med 	= 70; # was 70
$fan_duty_low 	= 30;

## HD FAN DUTY LEVELS
## These levels are used to control the HD fans
$hd_fan_duty_high 	= 100;	# percentage on, ie 100% is full speed.
$hd_fan_duty_med_high 	= 85;
$hd_fan_duty_med_low	= 50;
$hd_fan_duty_low 	= 30;	# some 120mm fans stall below 30.


## FAN ZONES
# Your CPU/case fans should probably be connected to the main fan sockets, which are in fan zone zero
# Your HD fans should be connected to FANA which is in Zone 1
# You could switch the CPU/HD fans around, as long as you change the zones and fan header configurations.
#
# 0 = FAN1..5
# 1 = FANA
$cpu_fan_zone = 1;
$hd_fan_zone = 0;


## FAN HEADERS
## these are the fan headers which are used to verify the fan zone is high. FAN1+ are all in Zone 0, FANA is Zone 1.
## cpu_fan_header should be in the cpu_fan_zone
## hd_fan_header should be in the hd_fan_zone
$cpu_fan_header = "FAN4";	
$hd_fan_header = "FAN1";



################
## MISC
#######

## PLATFORM
## The platform is either "FreeBSD" or "Linux", and is determined by calling uname
$platform = `/usr/bin/uname`; # "FreeBSD" when on CORE or "Linux" on SCALE.
chomp $platform;

## IPMITOOL PATH
## ipmitool is used to invoke the IPMI tool to access the SuperMicro BMC
$ipmitool = "ipmitool";

## uncomment the following line, and replace the IP address, <username> and <password> with your IPMI host and credentials to access IPMI over the network 
#$ipmitool = "$ipmitool -I lanplus -H 192.168.1.209 -U <username> -P <password>";	# network access, necessary when running in a VM

## HD POLLING INTERVAL
## The controller will only poll the hard drives periodically. Since hard drives change temperature slowly
## this is a good thing. 180 seconds is a good value.
$hd_polling_interval = 180;	# seconds

## FAN SPEED CHANGE DELAY TIME
## It takes the fans a few seconds to change speeds, we allow a grace before verifying. If we fail the verify
## we'll reset the BMC
$fan_speed_change_delay = 10; # seconds

## BMC REBOOT TIME
## It takes the BMC a number of seconds to reset and start providing sensible output. We'll only
## Reset the BMC if it's still providing rubbish after this time.
$bmc_reboot_grace_time = 120; # seconds

## BMC RETRIES BEFORE REBOOTING
## We verify high/low of fans, and if they're not where they should be we reboot the BMC after so many failures
$bmc_fail_threshold	= 1; 	# will retry n times before rebooting

# edit nothing below this line
########################################################################################################################

If you have the same build as me, with the same fans, connected to the same headers, this should work fairly well.

Meanwhile, these are my current IPMI thresholds as per sensor list all

root@chronus[/mnt/tank/server/scripts]# ipmitool sensor list all | grep FAN
FAN1             | 1000.000   | RPM        | ok    | 100.000   | 200.000   | 300.000   | 25300.000 | 25400.000 | 25500.000 
FAN2             | 800.000    | RPM        | ok    | 0.000     | 100.000   | 200.000   | 25300.000 | 25400.000 | 25500.000 
FAN3             | 1000.000   | RPM        | ok    | 100.000   | 200.000   | 300.000   | 25300.000 | 25400.000 | 25500.000 
FAN4             | 3500.000   | RPM        | ok    | 300.000   | 500.000   | 700.000   | 25300.000 | 25400.000 | 25500.000 

and it shows you the current fan speeds too :stuck_out_tongue:
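If you want those numbers in a script rather than by eye, the table can be split on the `|` separators. A minimal sketch, assuming the usual `ipmitool sensor list` column order (name, reading, unit, status, lnr, lcr, lnc, unc, ucr, unr); the function name `parse_fans` is made up, and the sample input is copied from the output above.

```shell
#!/bin/sh
# Reads `ipmitool sensor list all` output on stdin and prints one summary
# line per FAN sensor: current RPM plus the three lower thresholds.
parse_fans() {
    awk -F'|' '/^FAN/ {
        # trim the padding around every column
        for (i = 1; i <= NF; i++) gsub(/^ +| +$/, "", $i)
        printf "%s rpm=%d lnr=%d lcr=%d lnc=%d\n", $1, $2, $5, $6, $7
    }'
}

# Sample rows taken from the listing above
parse_fans <<'EOF'
FAN1             | 1000.000   | RPM        | ok    | 100.000   | 200.000   | 300.000   | 25300.000 | 25400.000 | 25500.000 
FAN2             | 800.000    | RPM        | ok    | 0.000     | 100.000   | 200.000   | 25300.000 | 25400.000 | 25500.000 
EOF
# prints:
# FAN1 rpm=1000 lnr=100 lcr=200 lnc=300
# FAN2 rpm=800 lnr=0 lcr=100 lnc=200
```

On a live system the same function would be fed with `ipmitool sensor list all | parse_fans`.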

And this is after I added the heatsink to my boot m.2

Also, I added support in my version on this platform for monitoring the m.2 slot’s nvme.

This is a modification to the get_cpu_temp_direct function

sub get_cpu_temp_direct
{
    # the following command needs to return a list of temps for the cores, output is something like "50.0\n51.0\n"
        my $core_temps = $platform eq "FreeBSD" ?
                `sysctl -a dev.cpu | egrep -E \"dev.cpu\.[0-9]+\.temperature\" | awk '{print \$2}' | sed 's/.\$//'`
        :
                `sensors -A coretemp-isa-0000 | egrep 'Package id [0-9]:' | awk '{print \$4}' | sed 's/[^0-9\.]*//g'`
        ;

        # the below line adds the temp sensors from nvme0 (boot drive) to the list. The CPU fan cools these drives
        # so I want them to go into the core temps list too.
        my $nvme_temps = `smartctl -a /dev/nvme0 | grep "Temperature Sensor" |  awk '{print \$4}'`;
        $core_temps = $core_temps.$nvme_temps;

        chomp($core_temps);

        dprint(3,"core_temps:\n$core_temps\n");

Which basically makes it so the nvme temps count as a cpu core temp for the purpose of controlling the CPU fan… which is what cools the m.2.

Works quite well when the heatsink is installed.

just updated it.

Does that work for you? Every time I try to launch the command on a pre-init or post-init basis, the command seems to fail. Is it perhaps a permissions issue?

I’ve made a small change to the version of the script I’m running so that it discounts the hottest drive and uses the second hottest for control. My boot SSD sits in a plastic case with no real thermal transfer, so it invariably runs hotter than the normal hard drives and ends up forcing the fans on full 24/7.


sub get_hd_temp
{
        my $max_temp = 0;
        my $second_temp = 0;

        foreach my $item (@hd_list)
        {
                my $disk_dev = "/dev/$item";

                my $command = "smartctl -A $disk_dev | grep Temperature_Celsius";
                dprint( 3, "$command\n" );

                my $output = `$command`;
                dprint( 2, "$output");

                my @vals = split(" ", $output);

                # grab 10th item from the output, which is the hard drive temperature (on Seagate NAS HDs)
                my $temp = "$vals[9]";
                chomp $temp;
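                dprint( 3, "temp: $temp\n" );


                if( $temp )
                {
                        dprint( 1, "$disk_dev: $temp\n");

                        # track the two highest temperatures seen so far
                        if( $temp > $max_temp )
                        {
                                # new hottest drive: the old maximum becomes second
                                $second_temp = $max_temp;
                                $max_temp = $temp;
                        }
                        elsif( $temp > $second_temp )
                        {
                                # cooler than the hottest, but hotter than the runner-up
                                $second_temp = $temp;
                        }
                }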
        }

        dprint(0, "Maximum HD Temperature: $max_temp\n");
        dprint(0, "Second Max HD Temp: $second_temp\n");
        #return $max_temp;
        return $second_temp;


}
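For anyone who wants to sanity-check the "second hottest" idea outside the Perl script, here is a small shell sketch of the same selection. The function name `second_hottest` is made up, and falling back to the only reading when there is a single drive is an assumption of this example, not something the script above does.

```shell
#!/bin/sh
# Reads one temperature per line on stdin and prints the second highest.
# If only one reading is supplied, that reading is printed instead.
second_hottest() {
    sort -rn | awk '
        NR == 1 { first = $1 }
        NR == 2 { print $1; found = 1 }
        END     { if (!found && NR == 1) print first }
    '
}

printf '%s\n' 31 45 33 | second_hottest
# prints: 33
```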

The command doesn’t work on Scale; it gives you an empty result:

admin@Orac[/mnt/Orac/Scripts]$ sudo smartctl -a /dev/nvme0 | grep "Temperature Sensor"
admin@Orac[/mnt/Orac/Scripts]$

It needs to be


admin@Orac[/mnt/Orac/Scripts]$ sudo smartctl -a /dev/nvme0 | grep "Temperature"       
Temperature:                        29 Celsius
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

So you need a


smartctl -a /dev/nvme0 | grep "Temperature" |  awk '{print \$2}'

Instead

It’s device specific I guess. It works on my scale box.

My SSD has two temperature sensors.
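Since the output differs between drives, one option is a filter that accepts both the plain `Temperature:` line and any `Temperature Sensor N:` lines. This is only a sketch: the function name `nvme_temps` is made up, and the two "Temperature Sensor" rows in the sample are illustrative values, not output from either poster's drive.

```shell
#!/bin/sh
# Reads `smartctl -a /dev/nvmeX` output on stdin and prints every Celsius
# reading, one per line, whichever of the two line formats the drive reports.
nvme_temps() {
    awk '/^Temperature( Sensor [0-9]+)?:/ {
        # the number sits in the field just before the word "Celsius"
        for (i = 2; i <= NF; i++) if ($i == "Celsius") print $(i - 1)
    }'
}

# First three rows are from the post above; the sensor rows are made-up samples.
nvme_temps <<'EOF'
Temperature:                        29 Celsius
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               35 Celsius
Temperature Sensor 2:               41 Celsius
EOF
```

On a live system this would be used as `smartctl -a /dev/nvme0 | nvme_temps`, and the "Comp. Temperature Time" counters are skipped because they do not start with `Temperature`.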

How horribly helpful! In any case, I was grateful for the pointer and it’s helped me.

Thanks for this mod! SATA DOM was running hot and spinning the fans, now, not. :slight_smile:

Has anyone seen IPMI reset to default fan settings? I set the fan mode to Full and used ipmitool init scripts to configure the fans to stay at an acceptable speed.

This worked great until something caused my motherboard to revert to the optimal fan speed mode and start pulsing. It’s been up for a little over a month with no changes so I don’t know what would have caused it to suddenly switch fan modes.
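This does not explain the cause, but a periodic check could at least catch the change. The sketch below decodes the BMC's fan mode byte. It assumes the widely circulated Supermicro raw commands (`0x30 0x45 0x00` to read the mode, `0x30 0x45 0x01 <mode>` to set it) and the usual mode values; neither is confirmed in this thread, so verify them against your board. The function name `decode_fan_mode` is made up.

```shell
#!/bin/sh
# Turns the reply of the (assumed) "get fan mode" raw command into a name.
decode_fan_mode() {
    # strip padding from the raw reply, e.g. " 01" -> "01"
    byte=$(printf '%s' "$1" | tr -d ' \n')
    case "$byte" in
        00) echo "Standard" ;;
        01) echo "Full" ;;
        02) echo "Optimal" ;;
        04) echo "HeavyIO" ;;
        *)  echo "unknown ($byte)" ;;
    esac
}

decode_fan_mode " 02"
# prints: Optimal

# On a live system (ASSUMPTION, not executed here), a cron job could do:
#   mode=$(decode_fan_mode "$(ipmitool raw 0x30 0x45 0x00)")
#   [ "$mode" = "Full" ] || ipmitool raw 0x30 0x45 0x01 0x01
```

Logging the decoded mode with a timestamp each time the check runs would also narrow down when the switch from Full to Optimal happens.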

I’d check to see via tmux what the fan script is blurting back at you. It could be worthwhile to have cron kill the fan script job on occasion and restart it on a monthly basis also.

The more important thing is that the rig reverted to a safe condition, ie your drives didn’t get roasted inadvertently.

I’m not using the fan script. I just set the fans to a fixed 80%. Temps have been fine. The optimal actually ends up with less airflow because it keeps oscillating.

What concerns me is what caused the fan mode to change from Full to Optimal.