OK, so we have two TrueNAS arrays, with truenas1 replicating to truenas2 on a regular basis. I have shares (NFS and SMB) set up on truenas1, and I can see those datasets on truenas2, along with all the snapshots, just in read-only mode. In the event of a true DR scenario, I think the failover process would be:
1. Make sure truenas1 is no longer replicating to truenas2.
2. Edit each replicated dataset on truenas2 and set it to RW (sketched below).
3. Create shares on truenas2 that point to the now-RW datasets.
4. Repoint all users at the shares on truenas2.
Then, in the event that truenas1 is repaired and back online, I assume I will need to destroy all data on it (provided there is still data after the failure), set up replication jobs from truenas2 to truenas1, and repeat the process in a downtime window to fail back. Is this basically the process? Am I missing anything?
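For concreteness, a minimal sketch of what step 2 might look like at the ZFS level, assuming a hypothetical pool named tank and dataset named shares. On TrueNAS you would normally flip this in the web UI, but these are the underlying commands:

```sh
# Optionally roll back to the last replicated snapshot first, if the
# dataset drifted after replication stopped (destructive to newer data):
# zfs rollback -r tank/shares@latest-snap

# Flip the replicated dataset from read-only to read/write.
zfs set readonly=off tank/shares

# Confirm the change took effect.
zfs get readonly tank/shares
```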
I would use a logical IP address rather than repoint clients.
Have a production IP that is an extra IP on truenas1, and if it fails, once you break replication and make truenas2 read/write, move the extra IP from truenas1 to truenas2. Once the clients' ARP caches time out they will automatically hit truenas2.
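As a rough illustration, moving the floating IP by hand on TrueNAS SCALE (Linux) might look like the following, assuming the shared address is 192.168.200.10/24 on interface eth0 (both assumptions; on CORE/FreeBSD you would use an ifconfig alias instead, and in practice the TrueNAS UI should own the address):

```sh
# On truenas1 (if still reachable): release the shared IP.
ip addr del 192.168.200.10/24 dev eth0

# On truenas2: claim the shared IP as a second address on the existing NIC.
ip addr add 192.168.200.10/24 dev eth0

# Send gratuitous ARP replies so clients update their ARP caches right
# away instead of waiting for the cache entries to time out.
arping -c 3 -A -I eth0 192.168.200.10
```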
This part is a bit more complicated. It starts with your hardware and the design of your system. I personally opt for a separate server head and JBOD for precisely this reason. If my server head fails, I can relatively quickly introduce a new head, connect it to the JBOD, upload my config, and we are back in business. If, however, my JBOD fails (which, believe it or not, has been known to happen), I move the 90 drives over to a spare JBOD I have and connect it back to the head. Both of these scenarios are fairly quick to action, with probably no need to make your backup system RW.

I use Microsoft DFS to act as a global namespace and have two entries per share (primary and backup), so in the event the primary vanishes, users auto-redirect to the backup server in RO mode. It's also unlikely that I have lost the pool or any data, but if that day ever comes then, like you say, activate your backup system.

I like to keep my pools confined to a single JBOD so I can move them around a bit like Lego blocks if needed. Things can get messy with all-in-one systems or when your pool spans multiple chassis.
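For the JBOD-swap scenario, the ZFS side of that drive move is essentially an export/import cycle. A rough sketch, again assuming a hypothetical pool named tank (TrueNAS would normally do this via the Export/Disconnect and Import Pool actions in the UI):

```sh
# On the head, before pulling the drives from the failed JBOD:
zpool export tank

# After the drives are cabled into the spare JBOD:
zpool import        # with no arguments, lists pools found on attached disks
zpool import tank   # import the pool under its original name
```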
In this scenario, is the extra IP the one you point your DNS at then? So I add an extra IP to truenas1, say 192.168.200.10, and then truenas1 is bound to 192.168.200.11 and truenas2 is bound to 192.168.200.12. All jobs, replications, and whatnot use the .11/.12 combo, but clients map to shares via .10? Then in a failure, move .10 to truenas2 and remount shares?
Each TN has a "local" address for managing it, and there is one "shared" address used by only one of the two TN at a time.
You want the users to use the shared address (and make sure it is configured on only ONE of the two TN at a time).
But you want things like TN1 to TN2 replication and SSH keys to use the individual IP addresses. You do not want TN2 trying to replicate from TN1 via the shared IP address since that address will be assigned to TN2 during a failure (or even maintenance outage) of TN1.
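To make that concrete with the hypothetical addresses from above (.11 for truenas1, .12 for truenas2): the replication task's SSH connection should be pinned to the per-box IPs. A hand-rolled sketch of roughly what the transport does underneath, with tank/shares again a placeholder dataset name:

```sh
# From truenas1, push an incremental snapshot to truenas2's OWN address
# (192.168.200.12), never the floating 192.168.200.10 -- after a failover
# the floating IP would point the replication job at the wrong box.
zfs send -i tank/shares@snap1 tank/shares@snap2 | \
    ssh root@192.168.200.12 zfs recv tank/shares
```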
Yep… this makes sense. Thanks for the advice, I will implement it this way. As a follow-up, do you configure your SMB service to only listen on the shared IP address, or do you bother with this step at all?
Right, I would leave the SMB service disabled on the DR site. Just thinking that in the event of a DR, I would move the logical IP from the production TN to the DR TN, and then user shares which were mapped to the prod TN would still work on the same IP once it's on the DR TN; nothing would have to change as far as user config or GPO-mapped shares. But I agree with not binding SMB to only that IP; it wouldn't really matter anyway, since if the prod TN is dead it's not listening on that IP anymore.
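If it helps, a quick post-cutover sanity check from any client, assuming the shared 192.168.200.10 from earlier and that smbclient/showmount are installed:

```sh
# List shares anonymously via the floating IP; getting a share listing
# (or an access-denied error rather than a connection refusal) means
# SMB is answering on the shared address.
smbclient -L //192.168.200.10 -N

# Confirm the NFS exports are visible on the same address.
showmount -e 192.168.200.10
```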
One more clarification. When adding this logical IP address, do I just add it to the existing physical interface that has the local IP address on it? So that interface would have two IPs, and then in DR I add the logical IP to the DR TN's existing physical interface?