4.4. Disaster Recovery / Cold Failover¶
Ngenea Hub can be configured to cold-failover to another node if it is running on a PixStor.
4.4.1. Setup¶
4.4.1.1. Configure datastore¶
Configure Ngenea Hub to store its persistent data on the GPFS filesystem so it can be read by multiple nodes. This is done by setting the following in /etc/sysconfig/ngeneahub:
DATA_DIR=/mmfs1/.arcapix/ngeneahub/data
4.4.1.2. Configure Networking¶
It's strongly recommended to configure a floating IP for the Ngenea Workers to connect to. This allows cold failover without having to reconfigure the workers.
This can be done by setting the following in /etc/sysconfig/ngeneahub:
SERVICE_CIDR. Set this to the IP and netmask of the IP you want to be managed by ngeneahub, e.g. 192.168.2.3/24 for the IP 192.168.2.3 on a network with a netmask of 255.255.255.0.
SERVICE_INTERFACE. Set this to the name of the interface the IP address should be added to, e.g. man0.
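As an illustration, the two settings could look like the fragment below. The values are the examples used in the text; the KEY=value form is assumed to match the DATA_DIR line shown earlier in the same file.

```shell
# /etc/sysconfig/ngeneahub (fragment; example values only)
# Floating IP and prefix length to be managed by ngeneahub
SERVICE_CIDR=192.168.2.3/24
# Interface the floating IP should be added to
SERVICE_INTERFACE=man0
```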
Configure the workers to use this IP by editing /etc/ngenea/ngenea-worker.conf on each worker node and modifying broker_url and result_backend.
4.4.1.3. Install Ngenea Hub¶
Install Ngenea Hub on multiple nodes as usual. Make sure /etc/sysconfig/ngeneahub is in sync across these nodes. Enable and start the service on one node only. Leave the service disabled and stopped on the other nodes.
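One way to check that the configuration is in sync is to compare the local file against a copy fetched from another node. The sketch below is only an illustration: configs_in_sync is a hypothetical helper, not part of Ngenea Hub, and the hostname and temporary path in the usage comment are placeholders.

```shell
# Hypothetical helper: succeeds only if the two files are byte-identical.
configs_in_sync() {
    local_copy="$1"
    peer_copy="$2"
    # cmp -s prints nothing and reports the result via its exit status
    cmp -s "$local_copy" "$peer_copy"
}

# Example usage ("standby-node" is a placeholder hostname):
#   scp standby-node:/etc/sysconfig/ngeneahub /tmp/ngeneahub.peer
#   if configs_in_sync /etc/sysconfig/ngeneahub /tmp/ngeneahub.peer; then
#       echo "configuration is in sync"
#   else
#       echo "configuration differs between nodes"
#   fi
```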
4.4.2. Performing failover¶
In the case of a node failure, after confirming the services are no longer running on the failed node, the following steps can be performed to bring the service up on another node:
Important: You must be certain the service is not running anywhere else before continuing, otherwise data loss can occur.
1. Remove the lock file from ${DATA_DIR}/.lock.
2. Start the Ngenea Hub service.
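The two steps above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions: clear_hub_lock is a hypothetical function, the data directory is the example value from the datastore section, and the systemd unit name ngeneahub in the usage comment is an assumption to be checked against your installation.

```shell
# Hypothetical helper: remove the stale lock file under the given data directory.
clear_hub_lock() {
    data_dir="$1"
    if [ -z "$data_dir" ]; then
        echo "usage: clear_hub_lock DATA_DIR" >&2
        return 2
    fi
    # -f: do not fail if the lock file is already gone
    rm -f "${data_dir}/.lock"
}

# Example usage on the standby node, ONLY after confirming the service
# is not running anywhere else:
#   clear_hub_lock /mmfs1/.arcapix/ngeneahub/data
#   systemctl start ngeneahub   # unit name is an assumption
```

Refusing an empty argument guards against the path collapsing to /.lock if DATA_DIR is unset in the calling shell.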
4.4.3. Migration from local datastore¶
After setting DATA_DIR in /etc/sysconfig/ngeneahub and restarting the service, data will automatically be migrated from the local datastore to the new location. This is a one-way operation.