Advice sought please, regarding this quota management script

Hello all

My questions:

  • In ZFS, would it be much faster to use (pseudocode) mv /* /dev/null rather than rm /*?
  • What other approach would be better than “mine”?

Background:
I asked ChatGPT to write me a Bash script (because my script/programming skills border on non-existent). I checked the script myself and ran it through shellcheck and some online equivalents to satisfy myself that it does what it claims to do (it does).

Goal:
Once an hour, via cron, the script recursively scans a set of directories and deletes the oldest files until the total size of the files and directories in the dataset drops below the associated quota value.

Results:
The script works, but it is so slow that a race condition occurs: I write to the dataset faster than the script can scan and delete files and directories, so the dataset quota is frequently exceeded.

The script itself lives in /mnt/mainraid/scripts and is called from a cron job (set in the UI) with: cd /mnt/mainraid/scripts && ./frigate_prune.sh
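
(For reference, if the same job were set directly in a crontab rather than through the UI, my understanding is an hourly run would look roughly like this:)

0 * * * * cd /mnt/mainraid/scripts && ./frigate_prune.sh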

Here’s the script:

#!/bin/bash

# Directory to prune and the size ceiling (in GB) to prune down to.
DATASET_PATH="/mnt/mainraid/frigate_media/frigate/recordings"
QUOTA_LIMIT=80 # in GB

# du -sBG reports total usage rounded up to whole gigabytes.
CURRENT_USAGE=$(du -sBG "$DATASET_PATH" | cut -f1 | sed 's/G//')

if [ "$CURRENT_USAGE" -ge "$QUOTA_LIMIT" ]; then
    # Delete the oldest files one at a time until usage drops below the limit.
    while [ "$CURRENT_USAGE" -ge "$QUOTA_LIMIT" ]; do
        # Walk the whole tree, sort by modification time, keep the oldest path.
        OLDEST_FILE=$(find "$DATASET_PATH" -type f -printf '%T+ %p\n' | sort | head -n 1 | cut -d' ' -f2-)
        if [ -n "$OLDEST_FILE" ]; then
            rm "$OLDEST_FILE"
            echo "Deleted: $OLDEST_FILE"
        else
            # No files left to delete; stop rather than loop forever.
            break
        fi
        # Re-measure the whole dataset after every single deletion.
        CURRENT_USAGE=$(du -sBG "$DATASET_PATH" | cut -f1 | sed 's/G//')
    done
fi

I can just about follow what’s happening, but this is the extreme limit of my knowledge (I looked up how sed works, the various switches for du, etc.). If someone has a finger-puppet/crayon explanation of how to make it a lot faster, or ideas for other ways of keeping the data in the dataset trimmed via a “FIFO” process, I’d be very pleased to learn!
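
To make it clearer what I mean by “FIFO”, here is the sort of single-pass approach I have vaguely in mind: measure the dataset once, list the files oldest-first once, then delete down that list until enough space has been freed. This is only a sketch I cobbled together and have not tested; it assumes GNU find/sort/du and filenames without newlines, and it estimates freed space from apparent file sizes, which won’t exactly match ZFS’s on-disk accounting (compression and so on).

#!/bin/bash
# Untested sketch: prune oldest files in one pass instead of re-scanning per deletion.

DATASET_PATH="/mnt/mainraid/frigate_media/frigate/recordings"
QUOTA_BYTES=$((80 * 1024 * 1024 * 1024))   # 80 GB ceiling

# Measure the dataset once, in bytes (disk usage, like the original du -sBG).
current_bytes=$(du -s --block-size=1 "$DATASET_PATH" | cut -f1)
excess=$((current_bytes - QUOTA_BYTES))
[ "$excess" -le 0 ] && exit 0

# One find pass: "mtime size path", sorted oldest first.
find "$DATASET_PATH" -type f -printf '%T@ %s %p\n' | sort -n |
while read -r _mtime size path; do
    rm -f -- "$path"
    echo "Deleted: $path"
    # Subtract the apparent size; only an estimate of what ZFS actually frees.
    excess=$((excess - size))
    [ "$excess" -le 0 ] && break
done

Is that roughly the right direction, or is there a cleaner/safer way to do it?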

Final thought: there may be an optimum cron timing that allows the script to keep on top of the number of files, but it seems like a task best suited to Sisyphus!

Or someone brainier than me.