Periodic (as in at intervals) hitches due to asyncio_loop taking 100% CPU

Not expecting much here since there are a few somewhat similar topics with little resolution, and the issue seems rather context specific, but hey…

I’m not sure when this started exactly, but I’m fairly certain this only cropped up sometime in the past 1-3 months.

At a shockingly consistent interval, the middleware will hitch for a second or two, for example:

  • When holding a key in the terminal, the echoed characters will lock up and then catch up a second later
  • The widgets on the main dashboard will stop updating for that time
  • Pages will take longer to load (if caught on the interval)
  • etc

Funnily enough, a terminal over SSH does not seem to be affected, nor does a terminal in, say, a Docker container that’s accessible via HTTP. It’s mainly the middleware that gets bogged down.

What I’ve been able to determine is that the hitches are occurring due to (or at least at the same time as) the asyncio_loop process suddenly swelling all the way up to 100%+ usage.
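For anyone wanting to put a number on the interval rather than eyeball it, here is a minimal Python sketch. It assumes you already have timestamped CPU readings for the asyncio_loop process (say, logged from top once a second). The function name, the 90% threshold, and the sample values are all made up for illustration; they are not measurements from my system.

```python
from typing import Iterable, List, Tuple


def spike_intervals(
    samples: Iterable[Tuple[float, float]], threshold: float = 90.0
) -> List[float]:
    """Return the seconds between successive CPU spike onsets.

    samples: (timestamp_seconds, cpu_percent) pairs in time order.
    A spike onset is the first sample at or above `threshold`
    that follows a sample below it (or the start of the data).
    """
    onsets = []
    above = False
    for timestamp, cpu in samples:
        if cpu >= threshold and not above:
            onsets.append(timestamp)
        above = cpu >= threshold
    # Pair each onset with the next one and take the difference.
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]


# Hypothetical readings: (seconds since start, CPU % of the process).
samples = [
    (0, 3.0), (1, 101.5), (2, 99.0), (3, 4.0),
    (18, 2.0), (19, 100.2), (20, 5.0),
    (37, 104.0), (38, 3.0),
]
print(spike_intervals(samples))
```

If the intervals come out nearly identical, that points at something periodic (a timer or polling job) rather than random load.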

I run about 20 docker stacks and a single HAOS VM (on a bridge with the host if it matters). As I spin these services down one after another, the interval at which this happens lengthens at an almost linear rate, until eventually when all services are stopped the issue appears to cease. But even with just the VM or one or two containers up it will happen every 90 seconds or so.

With everything up it’s about every 18 seconds, though I’ve seen it be as bad as every 10.

I have another X570 system with a 3700X that doesn’t seem to do this even with a good number of docker containers (though no VM running).

I imagine it might be storage related, but I’m not really sure what to make of this. I checked the audit log, and there’s nothing odd in there (especially not anything that matches the period) other than me breaking things /s.

System is in sig.

Hmm… a hard reboot fixed the issue, but I’ve definitely rebooted before since I noticed this (though maybe they were all soft resets?).

The only thing I can think of is that, of course, ARC is now cleared, so maybe RAM is involved somehow?

The CPU recent usage widget used to do this weird thing where, instead of just jumping to a new position on each update tick, it would kind of stutter-slide into the new spot, even between the hitch intervals. When I noticed it wasn’t doing that anymore I figured something might be different.

Ok, it turns out (and thank god for the email alert I got about this recently, which gave me the context I was missing, though I’m a little confused as to why I hadn’t received this notification earlier, unless I just missed it):

linuxserver/jackett segfaults on my system for some reason when its auto-update feature is enabled. This was causing daily core dumps into the system/.system/cores dataset, which would fill it up and push it against its default quota of 1GB. Datasets close to their quotas or pools close to capacity are known to cause ZFS to thrash.
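In case it helps someone else spot this sooner, here is a rough Python sketch for checking how close a dataset is to its quota. It only parses text in the tab-separated shape that `zfs get -Hp -o name,property,value used,quota <dataset>` prints, so you would paste or pipe that output in yourself. The helper name and the byte counts in the example are hypothetical, not values from my pool.

```python
def quota_usage(zfs_get_output: str) -> dict:
    """Map dataset name -> fraction of its quota in use.

    Expects tab-separated lines of name, property, value, as printed by
    `zfs get -Hp -o name,property,value used,quota`.
    Datasets without a quota are skipped.
    """
    props = {}
    for line in zfs_get_output.splitlines():
        if not line.strip():
            continue
        name, prop, value = line.split("\t")
        props.setdefault(name, {})[prop] = value

    usage = {}
    for name, values in props.items():
        quota = values.get("quota", "0")
        # "0", "none", or "-" are all treated as "no quota set".
        if quota in ("0", "none", "-") or "used" not in values:
            continue
        usage[name] = int(values["used"]) / int(quota)
    return usage


# Illustrative output only: a 1 GiB quota that is about 95% full.
example = (
    "system/.system/cores\tused\t1020054733\n"
    "system/.system/cores\tquota\t1073741824\n"
)
for dataset, fraction in quota_usage(example).items():
    print(f"{dataset}: {fraction:.0%} of quota used")
```

Anything sitting in the high 90s is worth a look, given how badly ZFS behaves when a dataset is pinned against its quota.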

Turned off the feature while I troubleshoot with linuxserver.io, deleted the dumps, wiped the snapshots of that dataset to ensure it actually emptied out, restarted, and the problem went away.