Any idea why /usr/sbin/daemon is not starting processes?

I recently (~1 month ago) set up a jail with Cryptpad inside. It worked flawlessly and was actually very easy to configure.
Today I upgraded from U5 to U6, then upgraded the jail from 13.2 to 13.3 with the following commands:

iocage exec cryptpad_1 'pkg update -f; pkg upgrade; pkg update; pkg upgrade'
iocage stop cryptpad_1
iocage upgrade -r 13.3-RELEASE cryptpad_1
iocage start cryptpad_1
iocage exec cryptpad_1 'pkg update -f; pkg upgrade; pkg update; pkg upgrade'
iocage restart cryptpad_1

The upgrade was smooth, but Cryptpad is not up. When I run service cryptpad start it says Started, yet nothing is actually running (it does generate the pid file).

This gist includes my very simple rc.d file, .sh file, and the two commands I used for debugging: Cryptpad on TrueNAS CORE jail not starting through daemon, but starting through raw command · GitHub
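
For context, the rc.d script boils down to the standard daemon(8) wrapper pattern. A simplified sketch (the real file is in the gist; the daemon flags are the same ones as in the command further down):

#!/bin/sh
#
# PROVIDE: cryptpad
# REQUIRE: LOGIN NETWORKING
# KEYWORD: shutdown

. /etc/rc.subr

name="cryptpad"
rcvar="cryptpad_enable"

load_rc_config $name
: ${cryptpad_enable:="NO"}

# daemon supervises the script, restarts it on exit (-r), logs to syslog (-S),
# and drops privileges to the cryptpad user (-u)
pidfile="/var/run/cryptpad/supervisor.pid"
command="/usr/sbin/daemon"
command_args="-t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh"

run_rc_command "$1"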

I would have said it’s my fault, or the upgrade’s fault, except that if I run su - cryptpad -c '/usr/local/opt/cryptpad.sh' it works perfectly fine, which suggests the problem is in daemon or the rc script. At this point (as root) I executed:

/usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh

This command only creates the child/supervisor pid files, but the processes die immediately.
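
For anyone debugging the same thing: instead of silencing everything with -f and -S, daemon's -o flag appends the child's stdout/stderr to a file, which should surface any startup error. A sketch, with an arbitrary log path:

/usr/sbin/daemon -r -u cryptpad -o /tmp/cryptpad-debug.log /usr/local/opt/cryptpad.sh
tail -f /tmp/cryptpad-debug.log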

I had this problem before, when I created a new jail and updated/upgraded packages, and I was not able to solve it then. I recreated the jail and suddenly daemon behaved properly with an identical setup, as far as I can tell.

Any idea what’s going on or how to debug it?
I don’t have a problem recreating this jail, but I have another jail that’s pretty convoluted to set up, and I don’t have an Ansible script for that one, so I would really appreciate not having to restart from scratch.

EDIT: Running the daemon command under truss didn’t bring up anything interesting; all I see is a fork and an exit at the end:

sigaction(SIGHUP,{ SIG_IGN SA_RESTART ss_t },{ SIG_DFL 0x0 ss_t }) = 0 (0x0)
sigaction(SIGTERM,{ SIG_IGN SA_RESTART ss_t },{ SIG_DFL SA_RESTART ss_t }) = 0 (0x0)
socket(PF_LOCAL,SOCK_DGRAM|SOCK_CLOEXEC,0)       = 3 (0x3)
getsockopt(3,SOL_SOCKET,SO_SNDBUF,0x7fffffffe7ac,0x7fffffffe7a8) = 0 (0x0)
setsockopt(3,SOL_SOCKET,SO_SNDBUF,0x7fffffffe7ac,4) = 0 (0x0)
connect(3,{ AF_UNIX "/var/run/logpriv" },106)    = 0 (0x0)
openat(AT_FDCWD,"/var/run/cryptpad",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,00) = 4 (0x4)
openat(4,"child.pid",O_WRONLY|O_NONBLOCK|O_CREAT|O_CLOEXEC,0600) = 5 (0x5)
flock(5,LOCK_EX|LOCK_NB)                         = 0 (0x0)
fstatat(4,"child.pid",{ mode=-rw------- ,inode=657536,size=0,blksize=131072 },0x0) = 0 (0x0)
fstat(5,{ mode=-rw------- ,inode=657536,size=0,blksize=131072 }) = 0 (0x0)
ftruncate(5,0x0)                                 = 0 (0x0)
fstat(5,{ mode=-rw------- ,inode=657536,size=0,blksize=131072 }) = 0 (0x0)
cap_rights_limit(4,{ CAP_UNLINKAT })             = 0 (0x0)
cap_rights_limit(5,{ CAP_PWRITE,CAP_FTRUNCATE,CAP_FSTAT,CAP_EVENT }) = 0 (0x0)
openat(AT_FDCWD,"/var/run/cryptpad",O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC,00) = 6 (0x6)
openat(6,"supervisor.pid",O_WRONLY|O_NONBLOCK|O_CREAT|O_CLOEXEC,0600) = 7 (0x7)
flock(7,LOCK_EX|LOCK_NB)                         = 0 (0x0)
fstatat(6,"supervisor.pid",{ mode=-rw------- ,inode=657537,size=5,blksize=4096 },0x0) = 0 (0x0)
fstat(7,{ mode=-rw------- ,inode=657537,size=5,blksize=4096 }) = 0 (0x0)
ftruncate(7,0x0)                                 = 0 (0x0)
fstat(7,{ mode=-rw------- ,inode=657537,size=0,blksize=4096 }) = 0 (0x0)
cap_rights_limit(6,{ CAP_UNLINKAT })             = 0 (0x0)
cap_rights_limit(7,{ CAP_PWRITE,CAP_FTRUNCATE,CAP_FSTAT,CAP_EVENT }) = 0 (0x0)
open("/dev/null",O_RDWR,00)                      = 8 (0x8)
sigaction(SIGHUP,{ SIG_IGN 0x0 ss_t },{ SIG_IGN SA_RESTART ss_t }) = 0 (0x0)
fork()                                           = 10312 (0x2848)
exit(0x0)
process exit, rval = 0

This seems logical, even the exit (exit 0 after the fork), but pid 10312 died immediately (I was never able to actually find it).
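
A possible next step would be following the fork, so the trace continues into the child; truss does that with -f (output file path arbitrary):

truss -f -o /tmp/daemon.truss /usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /usr/local/opt/cryptpad.sh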

I wonder if the problem is related to Node.js. But the other jail runs one Node.js process and also a non-Node process, so it’s not Node-specific.

EDIT 2: To further eliminate variables, I wrote this very simple script:

#!/bin/sh

echo "start" > /var/log/foo.log
while true
do
  sleep 1
  date -Iseconds >> /var/log/foo.log
done

Then I executed it through daemon (yes, the script is in a weird spot):

/usr/sbin/daemon -t cryptpad -T cryptpad -P /var/run/cryptpad/supervisor.pid -p /var/run/cryptpad/child.pid -r -S -f -u cryptpad /var/log/foo.sh

Nothing really happens, but if I do su - cryptpad -c /var/log/foo.sh, it works fine.
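
To double-check that nothing is left running, one can look for the child and for the log:

pgrep -fl foo.sh      # is the child process still alive?
cat /var/log/foo.log  # did the script ever get to write anything?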

I think this is your problem:

You may also wish to validate your script in a FreeBSD 13.3 VM. /usr/sbin/daemon was significantly rewritten between 13.1 and 13.3 (and we’ve seen some associated breakage).

Oh damn, ok thanks, this is definitely my problem!

Yeah, rolling back fixed it.

Thank you, I’ll make sure of that

I had to deal with this issue because Sonarr/Radarr/Bazarr/Prowlarr were not starting after updating from 13.2 to 13.4. The problem is with /usr/sbin/daemon in 13.4: the new version does not work as expected when the -p and -P options are used. The pid file for the service is created but stays at zero bytes, and the process never starts.

Because I could not find any other solution, I copied /usr/sbin/daemon from 13.2 into my 13.4 jail, put it somewhere else (/usr/local/bin/daemon.old), and changed the rc.d scripts for those services to use the old daemon.old binary. I know this is a big hack, but it is the only way I found.

Staying on 13.2 was not possible for me because there is already an incompatibility with libinotify and some Mono-based services won’t run; for example, Jackett is currently broken on both 13.2 and 13.4. But as long as Prowlarr works, it is not a big problem.
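
Concretely, the change in each rc.d script is a one-line swap; using Sonarr as an example (a sketch, the exact variable depends on the service's script):

# in /usr/local/etc/rc.d/sonarr
# command="/usr/sbin/daemon"          # the broken 13.4 daemon
command="/usr/local/bin/daemon.old"   # binary copied over from 13.2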

When preparing the 13.3 release we also had to revert daemon to an earlier version. It was substantially broken by some refactoring to use kqueue. There is an upstream bug report about it.

Well, I faced the same problem; my solution was to write an Ansible script and recreate the whole jail from scratch. It did work, but it makes sense now that there was no way to recover the jail in place. Your hack was probably the right solution.