Hello,
I’m trying to automate my replication task from my TNS-Fileserver to my TNS-Backup.
Both servers have IPMI and run TNS 25.10.0.
With the help of AI I created the script below, which I want to run as a cron job on the main fileserver.
The idea: every two weeks, wake up the backup server, replicate the whole storage pool to it, and then shut it down again.
Unfortunately, I get the following error message when I start the cron job:
CallError
[EFAULT] CronTask “/mnt/Storage/_Apps/Scripts/replication_orchestrator.sh > /dev/null 2> /dev/null” exited with 137 (non-zero) exit status.
| Field | Value |
|---|---|
| Error Name | EFAULT |
| Error Code | 14 |
| Reason | [EFAULT] CronTask “/mnt/Storage/_Apps/Scripts/replication_orchestrator.sh > /dev/null 2> /dev/null” exited with 137 (non-zero) exit status. |
| Error Class | CallError |
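A side note on the status itself: by shell convention, an exit status above 128 usually means the process was terminated by a signal (status minus 128), so 137 would point to signal 9 (SIGKILL) rather than to an error returned by the script's own `exit 1` paths. A minimal sketch for decoding such a status; `describe_exit_status` is a hypothetical helper name, and it only handles statuses that map to a valid signal:

```shell
#!/bin/bash
# Hypothetical helper: turn an exit status into a human-readable description.
describe_exit_status() {
    local status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 conventionally mean "terminated by signal (status - 128)".
        # `kill -l STATUS` prints the matching signal name without the SIG prefix.
        echo "killed by SIG$(kill -l "$status")"
    else
        echo "exited with status $status"
    fi
}

describe_exit_status 137
```

This does not say what sent the signal; it only narrows the question down to "something killed the job" instead of "the script reported a failure".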
What am I doing wrong here? I would expect this to be a very common scenario. Are there other ways to achieve this automation?
#!/bin/bash
# Script: replication_orchestrator.sh
# Runs on TrueNAS SCALE Server A (Source) via Cron Job
# --------------------------------------------------------
# 0. CONFIGURATION (Edit These Values)
# --------------------------------------------------------
# IPMI Configuration for Server B (Target)
IPMI_HOST="xxx.xxx.xxx.xxx" # Server B's dedicated IPMI IP
IPMI_USER="zzz"
IPMI_PASS="zzz"
# Server B (Target) Network IP (for ping check and replication)
TARGET_NAS_IP="yyy.yyy.yyy.yyy"
TARGET_NAS_USER="root"
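# SSH Key Path (Private Key for Server A accessing Server B)
# NOTE: ssh -i expects the PRIVATE key file; this file name suggests a public key, so double-check it.
SSH_KEY_PATH="/mnt/Storage/_Apps/Scripts/TNS-Backup Key_public_key_rsa"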
# Replication Parameters
SOURCE_DATASET="Storage/00_Storage" # Dataset on Server A to send
TARGET_DATASET="Storage/00_Storage" # Dataset on Server B to receive
WAIT_TIMEOUT="300" # Max time (seconds) to wait for Server B to boot
# TrueNAS Shutdown Command (Executed remotely on Server B)
# NOTE: Requires full path and mandatory 'reason' argument for SCALE 25.04+ [13-15].
REMOTE_SHUTDOWN_CMD="/usr/bin/midclt call system.shutdown \"Automated post-replication shutdown\""
# --------------------------------------------------------
# 1. WAKE SERVER B (TARGET) VIA IPMI
# --------------------------------------------------------
wake_server_b() {
echo "$(date): Attempting to wake Server B (${IPMI_HOST}) via IPMI..."
# Using the required ipmitool command structure [5]
if ! /usr/bin/ipmitool -I lanplus -H "$IPMI_HOST" -U "$IPMI_USER" -P "$IPMI_PASS" chassis power on; then
echo "$(date): ERROR: IPMI command failed. Check connectivity or credentials."
exit 1
fi
}
# --------------------------------------------------------
# 2. WAIT FOR SERVER B TO BOOT (PING LOOP)
# --------------------------------------------------------
wait_for_server_b() {
echo "$(date): Waiting up to ${WAIT_TIMEOUT} seconds for Server B (${TARGET_NAS_IP}) to become reachable..."
local end_time=$((SECONDS + WAIT_TIMEOUT))
while [ "$SECONDS" -lt "$end_time" ]; do
# Ping the target IP to confirm the network is up [16, 17]
# NOTE: a ping reply does not guarantee that SSH and the middleware are ready yet.
if ping -c 1 -W 2 "$TARGET_NAS_IP" &> /dev/null; then
echo "$(date): Server B is reachable. Continuing."
return 0
fi
sleep 10
done
echo "$(date): ERROR: Server B did not become reachable. Aborting replication."
exit 1
}
# --------------------------------------------------------
# 3. PERFORM ZFS REPLICATION (Example Placeholder)
# --------------------------------------------------------
perform_replication() {
echo "$(date): Starting ZFS Replication..."
# --- IMPORTANT: Custom snapshot management logic should be implemented here ---
# This example assumes snapshots are managed separately and finds the latest local one.
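# Find the latest snapshot on the source dataset
# (-H: no header line, -s creation: sort oldest first, so the last line is the newest snapshot)
local LATEST_SNAP
LATEST_SNAP=$(/usr/sbin/zfs list -H -t snapshot -o name -s creation -d 1 "$SOURCE_DATASET" | tail -n 1)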
if [ -z "$LATEST_SNAP" ]; then
echo "$(date): ERROR: No snapshots found on $SOURCE_DATASET."
return 1
fi
echo "$(date): Found latest snapshot: $LATEST_SNAP. Sending..."
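# Execute ZFS send/receive over SSH using the private key [9, 10]
# NOTE: The -I (incremental basis) argument is omitted for simplicity/initial send,
# but highly recommended for subsequent runs.
# Without pipefail only the exit status of ssh is checked and a failed zfs send goes unnoticed.
set -o pipefail
# -d is not used on the receive side: with -d the stream would land in a child dataset
# below $TARGET_DATASET (e.g. Storage/00_Storage/00_Storage) instead of in $TARGET_DATASET itself.
if /usr/sbin/zfs send -R "$LATEST_SNAP" | \
/usr/bin/ssh -i "$SSH_KEY_PATH" "$TARGET_NAS_USER@$TARGET_NAS_IP" \
"/usr/sbin/zfs receive -F '$TARGET_DATASET'" ; then
echo "$(date): ZFS Replication completed successfully."
return 0 # Success
else
# Capture the status first; the $(date) substitution below would overwrite $?.
local rc=$?
echo "$(date): ZFS Replication FAILED (Exit Code: $rc)."
return 1 # Failure
fi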
}
# --------------------------------------------------------
# 4. SHUTDOWN SERVER B (TARGET)
# --------------------------------------------------------
shutdown_server_b() {
echo "$(date): Replication complete. Initiating graceful shutdown of Server B (Target)."
# Use SSH to remotely execute the middleware shutdown command on Server B [9, 10, 18]
if /usr/bin/ssh -i "$SSH_KEY_PATH" "$TARGET_NAS_USER"@"$TARGET_NAS_IP" "$REMOTE_SHUTDOWN_CMD"; then
echo "$(date): Server B shutdown sequence initiated successfully."
else
echo "$(date): WARNING: Failed to initiate graceful shutdown on Server B."
fi
}
# --------------------------------------------------------
# MAIN EXECUTION FLOW
# --------------------------------------------------------
# Step 1: Power On Target Server
wake_server_b
# Step 2: Wait for Target Server to be ready
wait_for_server_b
# Step 3: Perform replication and continue to the shutdown only if it succeeded [19]
if perform_replication; then
# Step 4: If replication succeeded, shut down Server B
shutdown_server_b
else
echo "$(date): Orchestration finished with errors. Target server remains running for manual investigation."
exit 1
fi
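One possible refinement of the wait step in section 2: a ping reply only shows that the backup server's network is up, not that SSH is already accepting logins, so the replication could start too early. A minimal sketch of a generic retry helper that could poll an SSH login instead; `wait_until` is a hypothetical name that is not part of the script above, and the `ssh` call in the usage comment is an untested assumption about this setup:

```shell
#!/bin/bash
# wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Re-runs COMMAND every INTERVAL seconds until it succeeds or TIMEOUT seconds have passed.
# Returns 0 as soon as COMMAND succeeds, 1 if the timeout is reached first.
wait_until() {
    local timeout=$1 interval=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    while [ "$(date +%s)" -lt "$deadline" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$interval"
    done
    return 1
}

# Possible use in place of the ping loop (assumes key-based login already works):
# wait_until "$WAIT_TIMEOUT" 10 /usr/bin/ssh -o BatchMode=yes -o ConnectTimeout=5 \
#     -i "$SSH_KEY_PATH" "$TARGET_NAS_USER@$TARGET_NAS_IP" true
```

Because the helper takes an arbitrary command, the same loop could first wait for ping and then for SSH without duplicating the timeout logic.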
Thank you