Nextcloud Database Issue

RegularA · August 27, 2024, 1:25pm

Hi all!

Since updating the Nextcloud app, I can no longer run it. It stopped deploying correctly, and the logs point to a problem with the database, showing:

PANIC: could not locate a valid checkpoint record

The database pod would then stop running.

From some research, it seems likely that I need to run the pg_resetwal command to get the database running again. However, I cannot for the life of me work out how to keep the database pod running long enough to run the command inside it, as the pod stops as soon as it hits this PANIC.

Can anyone advise how I can keep the pod running long enough to run the PostgreSQL commands I need, or alternatively offer any other advice?

I am running TrueNAS Scale 24.04.2, which is currently the latest version.

Thank you for any pearls of wisdom offered!

oxyde · August 27, 2024, 2:36pm

I’m not very confident, so take what I say with a grain of salt, but according to the PostgreSQL documentation:

This command must not be used when the server is running. pg_resetwal will refuse to start up if it finds a server lock file in the data directory. If the server crashed then a lock file might have been left behind; in that case you can remove the lock file to allow pg_resetwal to run. But before you do so, make doubly certain that there is no server process still alive.
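The lock-file check described in that quote can be sketched in Python. This is a minimal, hypothetical helper (`lock_file_status` is an illustrative name, not part of PostgreSQL). It assumes a Linux host, that the first line of `postmaster.pid` in the data directory holds the server's PID, and that it runs in the same PID namespace as the server that wrote the file (for example, inside the same container). Across namespaces the liveness answer is meaningless.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def lock_file_status(data_dir: str) -> Optional[Tuple[int, bool]]:
    """Report on a leftover postmaster.pid lock file (hypothetical helper).

    Returns None when there is no lock file, otherwise (pid, alive).
    Assumption: Linux, and the same PID namespace as the server that
    wrote the file; across containers the result is meaningless.
    """
    lock = Path(data_dir) / "postmaster.pid"
    if not lock.exists():
        return None
    lines = lock.read_text().splitlines()
    # The first line of postmaster.pid is the postmaster's PID.
    if not lines or not lines[0].strip().isdigit():
        raise ValueError(f"unexpected contents in {lock}")
    pid = int(lines[0].strip())
    try:
        # Signal 0 only checks that the process exists; nothing is delivered.
        os.kill(pid, 0)
    except ProcessLookupError:
        return pid, False  # no such process: the lock file looks stale
    except PermissionError:
        return pid, True   # process exists but is owned by another user
    return pid, True
```

Deleting the lock file would only fit the quoted advice when this reports `alive` as `False` and you have otherwise confirmed no server process is running. A recycled PID can make a stale file look alive, which at least errs on the safe side.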

pg_resetwal works only with servers of the same major version.

I think you could try to recover the database before trying to deploy the app again. It’s a good idea to work on a copy rather than the original (the risk of data loss is high), then safely swap it in.
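Working on a copy might look like this minimal sketch. It assumes the database's data directory is reachable as a plain path from wherever this runs and that no server is using it; `prepare_copy` and the paths are hypothetical. It copies the directory and builds, but does not run, a `pg_resetwal --dry-run -D <copy>` command. `--dry-run` only prints the values pg_resetwal would use without modifying anything. The command still has to be run as the data directory's owner with binaries of the same major version, and because `shutil` does not preserve file ownership, the copy may need to be re-owned first.

```python
import shutil
from pathlib import Path
from typing import List


def prepare_copy(data_dir: str, scratch_dir: str) -> List[str]:
    """Copy a stopped PostgreSQL data directory and return a dry-run command.

    Nothing is executed here; the returned argv is meant to be reviewed and
    run by hand (as the data directory owner) against the copy only.
    """
    src = Path(data_dir)
    dst = Path(scratch_dir) / (src.name + "-copy")
    # copytree refuses to overwrite an existing destination, so an earlier
    # copy cannot be clobbered by accident. With the default symlinks=False,
    # symlinked contents (e.g. a relocated pg_wal) are copied, not shared,
    # so later work on the copy cannot touch the original files.
    shutil.copytree(src, dst)
    # --dry-run prints what pg_resetwal would do without writing anything.
    return ["pg_resetwal", "--dry-run", "-D", str(dst)]
```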

(And I hope you have backups to restore from in case the database does lose data.)

RegularA · September 9, 2024, 12:20pm

Thank you, oxyde, and sorry for the delayed response.

I’m still a bit annoyed that I cannot access the pod while it is failing. It limits my troubleshooting options and makes it a pain to do anything, including restoration!

Thankfully I did have a backup. In the end, I created a new app instance, deleted the database in the new app’s working database pod, created a blank database and restored my backup into it from an .sql file. I then killed the new app and pointed my original app at this folder for its database. That got it working with a few hitches, but after a few OCC commands to upgrade, fix indices and so on, it seems to be back to normal.
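The restore described above can be outlined as a list of commands. This is a hypothetical sketch, not what was actually run: the database name, role and dump path are placeholders. It assumes a plain-SQL dump, the standard PostgreSQL client tools `dropdb`, `createdb` and `psql`, and Nextcloud’s `occ` (`upgrade` and `db:add-missing-indices`). The function only builds the argument lists so each one can be reviewed before running it by hand, in the right container and as the right user.

```python
from typing import List


def restore_plan(db: str, user: str, dump_file: str) -> List[List[str]]:
    """Build (but do not run) a drop/recreate/restore command sequence.

    All names are placeholders. The first three commands target the
    database container; the occ commands target the Nextcloud container
    and are normally run as the web-server user.
    """
    return [
        # 1. Remove the existing database (fails while clients are connected).
        ["dropdb", "--if-exists", "-U", user, db],
        # 2. Recreate it empty.
        ["createdb", "-U", user, db],
        # 3. Load the plain-SQL dump, stopping at the first error.
        ["psql", "-U", user, "-d", db, "-v", "ON_ERROR_STOP=1", "-f", dump_file],
        # 4. Nextcloud housekeeping once the database is back.
        ["php", "occ", "upgrade"],
        ["php", "occ", "db:add-missing-indices"],
    ]
```

Each entry could then be passed to `subprocess.run(cmd, check=True)` once you are sure it targets the right database; building the list first makes it harder to run a `dropdb` against the wrong one.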

But it was an absolute pain and took an entire evening!

Unfortunately, the same thing happened with my Immich app this week. It seems Immich was in the middle of some activity when the apps service on TrueNAS decided to restart itself. I did the same thing to get it back up and running from a backup, as once again I could not get into the database pod because it kept closing with this error, and I still can’t find a way of keeping it alive when this happens.

I’m hoping the move to Docker Compose in the next version will alleviate some of these issues!