Problem/Justification
Many of us run TrueNAS on repurposed servers or computers and use it to host a home media streaming server, so many of us have a GPU installed for transcoding.
The request is for a dashboard GPU widget that shows, at a minimum, the usage and temperature of the GPU.
Some (like me) not only have a GPU for transcoding (a GTX 1060 in my case) but also use server-grade hardware (a Tesla P40 in my case) to run a virtual machine -nay- an ‘instance’ with AI for image generation, LLMs or training purposes.
These GPUs can run hot, and once a thermal limit is reached the system shuts down to protect itself.
A widget would make it possible to monitor the GPUs in a server/computer and see how they are doing.
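As a rough illustration of the data such a widget would need, the sketch below parses usage and temperature from `nvidia-smi`. It assumes NVIDIA GPUs with the driver's `nvidia-smi` tool available on the host; `GpuReading`, `parse_gpu_readings` and `read_gpus` are hypothetical names for this example and not part of TrueNAS.

```python
import subprocess
from dataclasses import dataclass
from typing import List, Optional

# Ask nvidia-smi for one plain CSV row per GPU, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,utilization.gpu,temperature.gpu",
    "--format=csv,noheader,nounits",
]


@dataclass
class GpuReading:
    index: int
    name: str
    usage_percent: Optional[int]   # None if the GPU does not report it
    temperature_c: Optional[int]   # None if the GPU does not report it


def _to_int(field: str) -> Optional[int]:
    """Return the field as an int, or None for values such as '[N/A]'."""
    field = field.strip()
    return int(field) if field.isdigit() else None


def parse_gpu_readings(csv_text: str) -> List[GpuReading]:
    """Turn nvidia-smi CSV rows into GpuReading objects."""
    readings = []
    for line in csv_text.strip().splitlines():
        # Split from both ends so a comma inside the name cannot shift columns.
        index, rest = line.split(",", 1)
        name, usage, temp = rest.rsplit(",", 2)
        readings.append(
            GpuReading(int(index), name.strip(), _to_int(usage), _to_int(temp))
        )
    return readings


def read_gpus() -> List[GpuReading]:
    """Run nvidia-smi and parse its output (requires the NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_readings(result.stdout)
```

Calling `read_gpus()` on a machine with two cards would return one `GpuReading` per GPU, which is the minimum a widget would need to display.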
Bonus points for colour coding the widget with a green border if the temps are below 60°C, orange between 60 and 75°C, and red above 75°C. Even more bonus points if all relevant info also fits in a half-widget, since I have 2 GPUs…
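The colour coding suggested above could be sketched like this. The function name `border_colour` is hypothetical, and treating exactly 60°C and exactly 75°C as orange is my own reading of the thresholds.

```python
def border_colour(temperature_c: float) -> str:
    """Map a GPU temperature in °C to a widget border colour."""
    if temperature_c < 60:
        return "green"    # below 60°C
    if temperature_c <= 75:
        return "orange"   # 60°C up to and including 75°C
    return "red"          # above 75°C


for temp in (48, 60, 75, 82):
    print(temp, border_colour(temp))
```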
Impact
This lets all users monitor their servers more closely and avoid unforeseen, hardware-driven forced reboots due to thermals (which can lead to data loss or corruption). And if the system is doing fine, at least all users have a cool widget to impress friend and foe alike.
User Story
User Dennis runs ComfyUI on a server with a (fanless) Tesla P40 GPU. To monitor the temperature, Dennis has to go down to his basement and feel the hot air blowing out of the back of the server. The server is an ML350p G8 with 4 system fans designed for soundproofed server rooms, since they can reach 98 decibels, so he can already hear the screaming machine from the stairs on his way down.
TL;DR - I need a monitoring widget for my GPUs so that I don't have to put on a noise-cancelling headset, go downstairs and feel the hot air blowing out of the back of the server to know whether I can continue my work or need to give the system time to chill out.
Thank you for your time and consideration.