4.4. Disaster Recovery / Cold Failover

Ngenea Hub can be configured for cold failover to another node when it is running on PixStor.

4.4.1. Setup

4.4.1.1. Configure datastore

Configure Ngenea Hub to store its persistent data on the GPFS filesystem so that it can be read by multiple nodes. To do this, set the following in /etc/sysconfig/ngeneahub:

DATA_DIR=/mmfs1/.arcapix/ngeneahub/data

4.4.1.2. Configure networking

It is strongly recommended to configure a floating IP address for the Ngenea Workers to connect to. This allows cold failover without having to reconfigure the workers.

To do this, set the following in /etc/sysconfig/ngeneahub:

  • SERVICE_CIDR. Set this to the IP address and netmask, in CIDR notation, of the address you want ngeneahub to manage, e.g. 192.168.2.3/24 for the IP 192.168.2.3 on a network with a netmask of 255.255.255.0.

  • SERVICE_INTERFACE. Set this to the name of the interface the IP address should be added to, e.g. man0.
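For illustration, using the example values from the list above, the two settings might appear in /etc/sysconfig/ngeneahub as shown below. The address and interface name are examples only, not defaults; substitute the values for your own network. Comments are kept on their own lines, on the assumption that the file may be read as a plain KEY=value environment file.

```shell
# /etc/sysconfig/ngeneahub (excerpt, example values only)

# Floating IP and prefix length to be managed by ngeneahub
SERVICE_CIDR=192.168.2.3/24

# Interface the floating IP is added to
SERVICE_INTERFACE=man0
```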

Configure the workers to use this IP by editing /etc/ngenea/ngenea-worker.conf on each worker node and updating broker_url and result_backend to point to it.

4.4.1.3. Install Ngenea Hub

Install Ngenea Hub on multiple nodes as usual. Make sure /etc/sysconfig/ngeneahub is in sync across these nodes. Enable and start the service on one node only. Leave the service disabled and stopped on the other nodes.
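One way to check that the configuration files match is to copy the file from another node to a temporary path and compare it with the local copy. The sketch below is a minimal, hypothetical helper: the function name hub_config_in_sync and the temporary path in the usage comment are illustrative assumptions and are not part of Ngenea Hub.

```shell
# Hypothetical helper: succeeds (exit status 0) only if the two files
# are byte-for-byte identical. cmp -s compares without printing anything.
hub_config_in_sync() {
    cmp -s "$1" "$2"
}

# Example usage (the second path is an illustrative copy from another node):
#   hub_config_in_sync /etc/sysconfig/ngeneahub /tmp/ngeneahub.standby \
#       || echo "configuration differs between nodes" >&2
```

A byte-for-byte comparison is deliberately strict: it also flags differences in comments or whitespace, which is usually acceptable for a file that is meant to be identical on every node.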

4.4.2. Performing failover

In the case of a node failure, after confirming that the services are no longer running on the failed node, the following steps can be performed to bring the service up on another node:

Important: You must be certain the service is not running anywhere else before continuing, otherwise data loss can occur.

  • Remove the lock file at ${DATA_DIR}/.lock.

  • Start the Ngenea Hub service.
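The two steps above can be sketched as a small shell function. This is a minimal sketch under stated assumptions, not an official tool: the function name hub_failover is hypothetical, and the default start command assumes the service is a systemd unit named ngeneahub, which may differ on your installation. It does not check whether the service is running elsewhere; that must still be confirmed by hand first.

```shell
# Hypothetical helper: remove the stale lock file, then start the service.
# Usage: hub_failover DATA_DIR [START_COMMAND...]
# Run only after confirming the service is stopped on every other node.
hub_failover() {
    if [ "$#" -lt 1 ]; then
        echo "usage: hub_failover DATA_DIR [START_COMMAND...]" >&2
        return 2
    fi
    data_dir="$1"
    shift

    # Refuse to continue if the shared data directory is not available.
    if [ ! -d "$data_dir" ]; then
        echo "data directory not found: $data_dir" >&2
        return 1
    fi

    # Step 1: remove the lock file left behind by the failed node.
    rm -f "$data_dir/.lock" || return 1

    # Step 2: start the service. The default command is an assumption.
    if [ "$#" -eq 0 ]; then
        set -- systemctl start ngeneahub
    fi
    "$@"
}

# Example usage, with the DATA_DIR value from the setup section:
#   hub_failover /mmfs1/.arcapix/ngeneahub/data
```

Passing the start command as optional trailing arguments makes it possible to substitute a different command on systems where the assumed unit name does not apply.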

4.4.3. Migration from local datastore

After setting DATA_DIR in /etc/sysconfig/ngeneahub and restarting the service, existing data is migrated automatically from the local datastore. This is a one-way operation.