Problem/Justification
You cannot safely take Periodic Snapshots of a running Virtual Machine because the backed-up data cannot be relied on.
You should only ever back up an offline VM, otherwise you risk data corruption in the backup.
Impact
I cannot back up my Virtual Machines safely using the automatic functionality in TrueNAS.
This issue prevents an administrator from implementing a fully automatic backup strategy.
User Story
I have a webserver running on my TrueNAS and I have to manually shut it down, take a snapshot and then let the replication task run.
Because this is manual, it does not always get run on a regular basis which is bad.
I am surprised this issue has not been addressed before, because VMs are a large part of TrueNAS now and data integrity is very important.
Proposed Solution
Add a feature to Periodic Snapshots that gracefully shuts down specified Virtual Machines, takes a snapshot, and then restarts the Virtual Machines. The replication task will then run either immediately afterwards or on its specified schedule.
VMs can be selected individually or by Dataset, which allows groups of VMs to be shut down without having to re-add every new VM.
Exclusion options might be useful, as some people virtualize their router and there might be VMs in the group that should not be shut down.
Also add an option (on by default) that prevents a VM from being backed up if it is running. It is better to get a warning message than to have a potentially broken backup.
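To make the selection rules above concrete, here is a minimal sketch in Python of how they could fit together. Everything in it is hypothetical and for illustration only: the `VM` record, `select_vms`, `plan_action` and the option names are not existing TrueNAS settings or APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VM:
    """Hypothetical record describing one VM (not a TrueNAS type)."""
    name: str
    dataset: str   # dataset that holds the VM's disks
    running: bool


def select_vms(vms, names=(), datasets=(), exclude=()):
    """Pick VMs listed by name or stored under a selected dataset, minus exclusions."""
    selected = []
    for vm in vms:
        by_name = vm.name in names
        # A VM belongs to a dataset group if it lives in that dataset or below it,
        # so newly created VMs in the group are picked up automatically.
        by_dataset = any(
            vm.dataset == d or vm.dataset.startswith(d + "/") for d in datasets
        )
        if (by_name or by_dataset) and vm.name not in exclude:
            selected.append(vm)
    return selected


def plan_action(vm, may_shut_down, skip_if_running=True):
    """Decide what the snapshot task would do with one VM."""
    if not vm.running:
        return "snapshot"
    if may_shut_down:
        return "shut down, snapshot, restart"
    if skip_if_running:
        # Proposed default: warn instead of taking a backup that might be unreliable.
        return "skip and warn"
    return "snapshot while running"
```

For example, selecting the dataset `tank/vms` and excluding a VM called `router` would cover every other VM stored under `tank/vms`, and a running VM that may not be shut down would be skipped with a warning rather than snapshotted.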
The technology for this is ACPI (see the Wikipedia article on ACPI).
It will need a QEMU Guest Agent in the VM to respond to the ACPI shutdown commands.
There is ongoing work to add this capability.
My Ubuntu server VM recognises an ACPI broadcast and performs a soft shutdown when the Stop button is pressed in TrueNAS. Have I missed something?
Stop: This sends an “ACPI power down command” to the VM. This will start a graceful shutdown of the guest OS. This is the same as briefly pressing the power button.
Good that this is being looked into though.
Perhaps the Agent is for more advanced services?
So, you could create a script that shuts down the VM nightly and takes a snapshot, using the current tools. Have you tried that?
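A rough sketch of what such a script might look like in Python follows. It is untested against a real system and rests on several assumptions: that the TrueNAS middleware client `midclt` exposes `vm.stop`, `vm.status` and `vm.start` on your release, that `vm.status` prints JSON containing a `state` field with the value `STOPPED` once the guest is off, and that the VM's disks live under the dataset you pass in. The function name, the snapshot naming scheme and the timeout values are made up for illustration. Please check the method names against your own version before relying on it.

```python
import json
import subprocess
import time
from datetime import datetime


def run_cmd(args):
    """Run a command and return its stdout; raises if the command fails."""
    return subprocess.run(args, check=True, capture_output=True, text=True).stdout


def snapshot_vm_offline(vm_id, dataset, run=run_cmd, sleep=time.sleep,
                        timeout_s=300, poll_s=5):
    """Stop a VM, snapshot its dataset recursively, then start the VM again."""
    snap = f"{dataset}@offline-{datetime.now():%Y%m%d-%H%M%S}"

    # Assumption: vm.stop asks the guest to shut down (ACPI power down).
    run(["midclt", "call", "vm.stop", str(vm_id)])

    # Assumption: vm.status prints JSON such as {"state": "STOPPED", ...}.
    waited = 0
    while json.loads(run(["midclt", "call", "vm.status", str(vm_id)])).get("state") != "STOPPED":
        if waited >= timeout_s:
            # Refuse to snapshot a VM that never confirmed it had stopped.
            raise TimeoutError(f"VM {vm_id} did not stop within {timeout_s} s")
        sleep(poll_s)
        waited += poll_s

    try:
        run(["zfs", "snapshot", "-r", snap])
    finally:
        # Restart the VM even if the snapshot command failed.
        run(["midclt", "call", "vm.start", str(vm_id)])
    return snap
```

Calling `snapshot_vm_offline(1, "tank/vms/web")` from a nightly cron job would be one way to schedule it; the VM id `1` and the dataset name are placeholders. The `run` and `sleep` parameters exist only so the logic can be exercised without touching a real system.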
This is outside my skillset. I will just have to do this manually for the time being until it is added to the GUI.
Just to share my own experience with doing periodic snapshots on a running TrueNAS VM: I have a small VM running Ubuntu server with Docker and a few small apps, which I’ve backed up with periodic snapshots for about 9 months while the VM was running. When I was using Cobia, a power outage broke the VM, but I was able to restore it from the Cobia snapshots without any issues. Like I said, it’s a small VM and not exposed to the internet, so these snapshots have certainly worked for me and I have continued to use them on Dragonfish.
Snapshots by default are crash-consistent and should behave like this.
If the application is more complex and includes databases, then the crash can happen halfway through a transaction. The database should recover, but it may not have completed a transaction and may take longer to recover. With a graceful shutdown, the database can complete any transactions before shutting down.