Any idea why /usr/sbin/daemon is not starting processes?

I recently (~1 month ago) set up a jail with Cryptpad inside. It worked flawlessly and was actually very easy to configure.
Today I upgraded from U5 to U6, then upgraded the jail from 13.2 to 13.3 with the following commands:

iocage exec cryptpad_1 'pkg update -f; pkg upgrade; pkg update; pkg upgrade'
iocage stop cryptpad_1
iocage upgrade -r 13.3-RELEASE cryptpad_1
iocage start cryptpad_1
iocage exec cryptpad_1 'pkg update -f; pkg upgrade; pkg update; pkg upgrade'
iocage restart cryptpad_1

The upgrade was smooth, but Cryptpad is not up. When I run service cryptpad start it says Started, yet nothing is actually running (it does generate the pid file).

This gist includes my very simple rc.d file, .sh file, and the two commands I used for debugging: Cryptpad on TrueNAS CORE jail not starting through daemon, but starting through raw command · GitHub
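
For context, the rc.d script boils down to the standard daemon(8) wrapper pattern. A simplified sketch (the real file is in the gist; the daemon flags are the same ones as in the command further down):

#!/bin/sh
#
# PROVIDE: cryptpad
# REQUIRE: LOGIN NETWORKING
# KEYWORD: shutdown

. /etc/rc.subr

name="cryptpad"
rcvar="cryptpad_enable"

load_rc_config $name
: ${cryptpad_enable:="NO"}

# daemon supervises the script, restarts it on exit (-r), logs to syslog (-S),
# and drops privileges to the cryptpad user (-u)
pidfile="/var/run/cryptpad/supervisor.pid"
command="/usr/sbin/daemon"
command_args="-t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh"

run_rc_command "$1"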

I would have said it’s my fault, or the upgrade’s fault, except that if I run su - cryptpad -c '/usr/local/opt/cryptpad.sh' it works perfectly fine, which suggests the problem is in daemon or the rc script. At this point (as root) I executed:

/usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh

This command only creates the child/supervisor pid files, but the processes die immediately.
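
For anyone debugging the same thing: instead of silencing everything with -f and -S, daemon's -o flag appends the child's stdout/stderr to a file, which should surface any startup error. A sketch, with an arbitrary log path:

/usr/sbin/daemon -r -u cryptpad -o /tmp/cryptpad-debug.log /usr/local/opt/cryptpad.sh
tail -f /tmp/cryptpad-debug.log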

I had this problem before, when I created a new jail and updated/upgraded packages, and I was not able to solve it then. I recreated the jail and suddenly daemon behaved properly with an identical setup, as far as I can tell.

Any idea what’s going on or how to debug it?
I don’t have a problem recreating this jail, but I have another jail that’s pretty convoluted to set up, and I don’t have an Ansible script for that one, so I would really appreciate not having to restart from scratch.

EDIT: Running the daemon command under truss didn’t bring up anything interesting; all I see is a fork and an exit at the end:

sigaction(SIGHUP,{ SIG_IGN SA_RESTART ss_t },{ SIG_DFL 0x0 ss_t }) = 0 (0x0)
sigaction(SIGTERM,{ SIG_IGN SA_RESTART ss_t },{ SIG_DFL SA_RESTART ss_t }) = 0 (0x0)
socket(PF_LOCAL,SOCK_DGRAM|SOCK_CLOEXEC,0)       = 3 (0x3)
getsockopt(3,SOL_SOCKET,SO_SNDBUF,0x7fffffffe7ac,0x7fffffffe7a8) = 0 (0x0)
setsockopt(3,SOL_SOCKET,SO_SNDBUF,0x7fffffffe7ac,4) = 0 (0x0)
connect(3,{ AF_UNIX "/var/run/logpriv" },106)    = 0 (0x0)
openat(AT_FDCWD,"/var/run/cryptpad",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,00) = 4 (0x4)
openat(4,"child.pid",O_WRONLY|O_NONBLOCK|O_CREAT|O_CLOEXEC,0600) = 5 (0x5)
flock(5,LOCK_EX|LOCK_NB)                         = 0 (0x0)
fstatat(4,"child.pid",{ mode=-rw------- ,inode=657536,size=0,blksize=131072 },0x0) = 0 (0x0)
fstat(5,{ mode=-rw------- ,inode=657536,size=0,blksize=131072 }) = 0 (0x0)
ftruncate(5,0x0)                                 = 0 (0x0)
fstat(5,{ mode=-rw------- ,inode=657536,size=0,blksize=131072 }) = 0 (0x0)
cap_rights_limit(4,{ CAP_UNLINKAT })             = 0 (0x0)
cap_rights_limit(5,{ CAP_PWRITE,CAP_FTRUNCATE,CAP_FSTAT,CAP_EVENT }) = 0 (0x0)
openat(AT_FDCWD,"/var/run/cryptpad",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,00) = 6 (0x6)
openat(6,"supervisor.pid",O_WRONLY|O_NONBLOCK|O_CREAT|O_CLOEXEC,0600) = 7 (0x7)
flock(7,LOCK_EX|LOCK_NB)                         = 0 (0x0)
fstatat(6,"supervisor.pid",{ mode=-rw------- ,inode=657537,size=5,blksize=4096 },0x0) = 0 (0x0)
fstat(7,{ mode=-rw------- ,inode=657537,size=5,blksize=4096 }) = 0 (0x0)
ftruncate(7,0x0)                                 = 0 (0x0)
fstat(7,{ mode=-rw------- ,inode=657537,size=0,blksize=4096 }) = 0 (0x0)
cap_rights_limit(6,{ CAP_UNLINKAT })             = 0 (0x0)
cap_rights_limit(7,{ CAP_PWRITE,CAP_FTRUNCATE,CAP_FSTAT,CAP_EVENT }) = 0 (0x0)
open("/dev/null",O_RDWR,00)                      = 8 (0x8)
sigaction(SIGHUP,{ SIG_IGN 0x0 ss_t },{ SIG_IGN SA_RESTART ss_t }) = 0 (0x0)
fork()                                           = 10312 (0x2848)
exit(0x0)
process exit, rval = 0

This seems logical, even the exit (exit 0 after the fork), but pid 10312 died immediately (I was never able to actually find it).
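
A possible next step would be following the fork, so the trace continues into the child; truss does that with -f (output file path arbitrary):

truss -f -o /tmp/daemon.truss /usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh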

I wonder if the problem is related to Node.js. But the other jail runs one Node.js process and also a non-Node process, so it’s not Node-specific.

EDIT 2: To further eliminate variables, I wrote this very simple script:

#!/bin/sh

echo "start" > /var/log/foo.log
while true
do
  sleep 1
  date -Iseconds >> /var/log/foo.log
done

Then I executed it through daemon (yes, the script is in a weird spot):

/usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /var/log/foo.sh

Nothing really happens, but if I do su - cryptpad -c /var/log/foo.sh, it works fine.
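
To double-check that nothing is left running, one can look for the child and for the log:

pgrep -fl foo.sh      # is the child process still alive?
cat /var/log/foo.log  # did the script ever get to write anything?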

I think this is your problem:

You may also wish to validate your script in a FreeBSD 13.3 VM. /usr/sbin/daemon was significantly rewritten between 13.1 and 13.3 (and we’ve seen some associated breakage).

Oh damn, ok thanks, this is definitely my problem!

Yeah, rolling back fixed it.

Thank you, I’ll make sure of that

I had to deal with this issue because Sonarr/Radarr/Bazarr/Prowlarr were not starting after updating from 13.2 to 13.4. The problem is with /usr/sbin/daemon in 13.4: the new version does not work as expected when the -p and -P options are used. The pid file for the service is created but stays at zero bytes, and the process never starts.

Because I could not find any other solution, I copied /usr/sbin/daemon from 13.2 into my 13.4 jail, put it somewhere else (/usr/local/bin/daemon.old), and changed the rc.d scripts for those services to use the old daemon.old binary. I know this is a big hack, but it is the only way I found.

Staying on 13.2 was not possible for me because there is already an incompatibility with libinotify and some Mono-based services won’t run; for example, Jackett is currently broken on both 13.2 and 13.4. But as long as Prowlarr works, it is not a big problem.
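
Concretely, the change in each rc.d script is a one-line swap; using Sonarr as an example (a sketch, the exact variable depends on the service's script):

# in /usr/local/etc/rc.d/sonarr
# command="/usr/sbin/daemon"          # the broken 13.4 daemon
command="/usr/local/bin/daemon.old"   # binary copied over from 13.2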

When preparing the 13.3 release we also had to revert daemon to an earlier version. It was substantially broken by some refactoring to use kqueue. There is an upstream bug report about it.

Well, I faced the same problem; my solution was to write an Ansible script and recreate the whole jail from scratch. It did work, but it makes sense now that there was no way to recover the jail in place. Your hack was probably the right solution.