SCALE constantly crashing (I suspect the latest Plex update)

Every time it crashes, I see this in /var/log/messages:

The only container I run now is the latest version of Plex.

This has happened before, where a new container or a new version of a container causes the server to crash. I fixed it previously by simply removing the TrueCharts container that was causing it, and the crashing stopped.

Server:

The end result is that Plex gets stuck deploying, as all the rebooting corrupts the DB.

Is there a way to revert to an older version of Plex, presuming you deleted your previous container and started from scratch? (i.e. rollback is not an option)

Trying out Chart Version:
1.7.60

Wish me luck

Data corruption in a pool is not an apps-update issue. It’s often hardware-related.

Please provide hardware details.

You can see the file; it’s mentioned in the zpool status output.
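For anyone following along, the check looks roughly like this (SOLID is the pool name taken from the path in the output below; substitute whichever pool is reporting errors):

# Show pool health, per-device error counters and, with -v,
# the paths of any files affected by permanent errors:
zpool status -v SOLID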

root@truenas:/mnt/SOLID/plex# lsblk
NAME          MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sda             8:0    0  14.6T  0 disk
├─sda1          8:1    0     2G  0 part
└─sda2          8:2    0  14.6T  0 part
sdb             8:16   0  14.6T  0 disk
├─sdb1          8:17   0     2G  0 part
└─sdb2          8:18   0  14.6T  0 part
sdc             8:32   0  14.6T  0 disk
├─sdc1          8:33   0     2G  0 part
└─sdc2          8:34   0  14.6T  0 part
sdd             8:48   0  14.6T  0 disk
├─sdd1          8:49   0     2G  0 part
└─sdd2          8:50   0  14.6T  0 part
sde             8:64   0  14.6T  0 disk
├─sde1          8:65   0     2G  0 part
└─sde2          8:66   0  14.6T  0 part
nvme1n1       259:0    0   1.9T  0 disk
├─nvme1n1p1   259:1    0     2G  0 part
└─nvme1n1p2   259:2    0   1.9T  0 part
nvme0n1       259:3    0 931.5G  0 disk
├─nvme0n1p1   259:4    0     1M  0 part
├─nvme0n1p2   259:5    0   512M  0 part
├─nvme0n1p3   259:6    0   915G  0 part
└─nvme0n1p4   259:7    0    16G  0 part
  └─nvme0n1p4 253:0    0    16G  0 crypt

SCALE crashing should have nothing to do with your Plex app. You have 207 checksum errors on your pool; that’s 207 too many. That’s very bad, and not a Plex issue.


I deleted all the Plex config files and restarted.

Ran a scrub and checked again; I don’t see errors anymore.
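For reference, that sequence in command form (pool name assumed from the path earlier; zpool clear is optional and just resets the counters):

zpool scrub SOLID        # kick off a full scrub of the pool
zpool status -v SOLID    # check progress and whether any errors remain
zpool clear SOLID        # optional: reset the per-device error counters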

You seem to be confusing cause and effect. A Plex data file has data errors resulting from the lack of redundancy on your single-disk pool; the Plex file didn’t cause those data errors.


So the SSD is pear-shaped, you guys reckon?

With no redundancy, ZFS cannot fix the errors; most likely, the drive is toast. Not being able to fix errors is bad, as who knows what else might be corrupted now.

Is the SSD the cause of the “blocking state” entries in messages?

And if so, why does the server stay up with zero issues if I stop and remove the Plex container?

i.e. no more blocked-state entries, no more crashes for HOURS. Spin up Plex? Scan media, BOOM, crash, with those blocking-state messages. Every single time, like clockwork.

Checksum errors can, among other things, be due to a failing drive, a defective cable or a bad motherboard/HBA controller.

Wouldn’t surprise me if memory errors could cause them as well (but don’t quote me on that).

Your first priority ought to be to find what’s causing the checksum errors.
It’s not Plex; the issues with Plex are, as others have stated here, just another symptom of whatever is causing the checksum errors.

Try another cable, try another port on your motherboard/HBA, try a different drive. Memtest your RAM.
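If it helps, a rough version of that checklist in commands (device names are examples from the lsblk output above; memtest86+ has to be booted from separate media rather than run from the OS):

smartctl -a /dev/sda         # SMART health overview and error log for a SATA/SAS drive
smartctl -a /dev/nvme1n1     # same for the NVMe drive in question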

OK, I’ll give that a shot. Thanks for the reply.

Could I just dd this drive to a new SSD and swap them out, or do I have to reinstall the OS again?

ZFS knows. Run a scrub :wink:

You can use Replace in the GUI.

A better option is to use “Extend” on the drive in question to turn it into a mirror with redundancy.
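For reference, the CLI equivalent of that “Extend” operation is a zpool attach; a minimal sketch, assuming a hypothetical second SSD nvme2n1 and a placeholder pool name for whichever pool lives on nvme1n1 (the GUI is the safer route on TrueNAS):

# Attach a second device to the existing single-disk vdev, turning it
# into a two-way mirror; ZFS resilvers onto the new disk automatically.
# Pool name and new device below are placeholders:
zpool attach apps-pool nvme1n1p2 nvme2n1p2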

Explained here:


Sure, but it can’t do a thing about it. He’s likely been ignoring the errors for some time. It could have been corrupted on write, maybe due to memory or other issues; who knows.

I would reload that pool myself.

I just realized that I have a boot-pool on another drive. Doesn’t that mean the OS is running off of nvme0n1?

That one never got any errors, so if that’s the case, how would errors on nvme1n1 cause the server to reboot? I DO have the applications on nvme1n1.
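An easy way to confirm which physical device backs which pool (boot-pool is the name from above; zpool list -v shows every pool broken down by member device):

zpool status boot-pool   # should show a partition on nvme0n1
zpool list -v            # maps every pool to its member devices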

UPDATE

I just created a new pool to install the ix-applications in and removed the troublesome drive.

So far - no crashes

It surprises me that an SSD with errors would cause the server to reboot. Go figure!

Thanks for your help guys!!!

It’s a bit surprising to me too. But only a bit.

NVMe is directly connected to the PCIe bus.

If the drive is failing, it could be doing bad things to that bus.

At least when using SATA or SAS there is an HBA between the device and the PCIe bus.

Whereas the PCIe bus is directly connected to your CPU.

<sci-fi>
Imagine a neural chip implant directly connected to your brain… that went bad… vs a smart watch on your wrist.
</sci-fi>

Yeah. I guess that could cause a reboot.