This document describes three possibilities for EMG multi-node setups.
The main reason for running more than one EMG node is redundancy, so that the service can continue if one EMG node fails.
To handle IP address failover in a multi-node setup, the keepalived utility can be very useful.
EMG uses two different data stores: indexed files for message queues, open delivery reports, etc., and MySQL for user authentication, connector configuration, and more.
EMG needs to be licensed for each node in the multi-node setup. However, licenses for the second and following nodes in a redundant setup are subject to a discount.
1. Active-passive with MySQL replication
If you have a primary standalone EMG node, you can quite easily add a secondary node that is ready to take over if the primary node fails.
The secondary node is configured identically to the primary node, except that MySQL on the secondary server is configured as a replication slave that replicates the “emg” database from the primary server.
Pros:
- It is relatively easy to add a secondary node to an existing primary node.
- No additional components are needed (only MySQL built-in replication is used).
- Can be suitable even when latency between the primary and secondary node is substantial (geographic redundancy).
Cons:
- Queues and open delivery reports (in indexed files) will not be available on the secondary node (unless they are made available on shared disk storage).
- Once the primary node has failed and failover to the secondary node has occurred, the secondary node is considered the primary and a new secondary node needs to be set up.
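Before relying on the secondary node for failover, it helps to confirm that replication is actually running and reasonably up to date. The following is a minimal sketch, not part of EMG: it assumes the caller has already fetched one row of MySQL's SHOW SLAVE STATUS output as a dictionary (newer MySQL versions use SHOW REPLICA STATUS with differently named columns), and the function name and the 30-second lag threshold are hypothetical.

```python
from typing import Any, Mapping


def replica_is_healthy(status: Mapping[str, Any], max_lag_seconds: int = 30) -> bool:
    """Decide whether a replication slave looks fit to take over.

    `status` is one row of SHOW SLAVE STATUS as a column-name -> value mapping.
    `max_lag_seconds` is an assumed threshold; tune it to your own setup.
    """
    # Both replication threads must be running.
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False

    # MySQL reports NULL (None here) when the lag cannot be determined.
    lag = status.get("Seconds_Behind_Master")
    if lag is None:
        return False

    return int(lag) <= max_lag_seconds


# Example rows, written by hand for illustration.
ok_row = {"Slave_IO_Running": "Yes", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": 2}
stopped_row = {"Slave_IO_Running": "No", "Slave_SQL_Running": "Yes", "Seconds_Behind_Master": None}

print(replica_is_healthy(ok_row))       # True
print(replica_is_healthy(stopped_row))  # False
```

A check like this only covers the MySQL data; as noted above, queues and open delivery reports in indexed files are not replicated by this setup.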
2. Active-active with Percona XtraDB Cluster and MongoDB
A more complete high-availability solution would be to use Percona XtraDB Cluster (PXC, a cluster replacement for MySQL) and MongoDB (replaces the indexed files) to create an active-active setup.
Both PXC and MongoDB are replicating data stores where all data will be replicated to all data nodes in the cluster.
Normally, three nodes are required for both PXC and MongoDB, but it is possible to use two nodes as data (and EMG) nodes and the third node as an arbiter.
Since all data is replicated between the data nodes, it is important that latency between nodes is kept to a minimum. Regardless of latency, replication carries a performance penalty.
Pros:
- All data nodes in the cluster are equivalent and no data is lost if one node goes down.
Cons:
- More complex setup and administration of the components in the cluster.
- Not possible to extend an existing single-node setup to a multi-node cluster setup.
- It is necessary to have three nodes (even though one node can run on modest hardware).
- Not suitable when latency between nodes is high.
- Regardless of latency, there is a performance penalty.
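The three-node requirement follows from majority voting: a cluster partition may only keep accepting writes while it can see more than half of all voting members. The sketch below illustrates that arithmetic in simplified form. It assumes one vote per member and ignores weighted votes and other product-specific details; the function names are hypothetical and not part of PXC or MongoDB.

```python
def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable members form a strict majority of all voters."""
    return reachable_votes > total_votes // 2


def tolerated_failures(total_votes: int) -> int:
    """Number of voters that can fail while a majority still remains."""
    majority = total_votes // 2 + 1
    return total_votes - majority


# Two voters: losing either one leaves 1 of 2, which is not a majority.
print(has_quorum(1, 2), tolerated_failures(2))  # False 0

# Two data nodes plus an arbiter: losing one voter leaves 2 of 3.
print(has_quorum(2, 3), tolerated_failures(3))  # True 1
```

This is why the third node can be a lightweight arbiter: it contributes a vote without needing to hold a full copy of the data.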
3. Active-active using standalone nodes and DLR failover
To achieve the performance of a standalone node while still running multiple active nodes, two or more nodes can operate standalone but connect to operators using the same account credentials on each node.
When a delivery report is received by the “wrong” node, it is forwarded to the next node for lookup.
Pros:
- Highest performance.
- Simple setup.
Cons:
- Nodes are not equivalent, and message information is stored on a specific node.
- MO messages may need some special handling.
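The delivery report forwarding described above can be pictured as a lookup that walks from node to node until it finds the one that sent the original message. The following is an in-memory sketch of that idea only, not EMG's actual implementation: the Node class, the route_dlr function, the message IDs, and the assumption that nodes are tried in a fixed order that wraps around are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    # Message ID -> sender, known only to the node that sent the message.
    sent: Dict[str, str] = field(default_factory=dict)


def route_dlr(nodes: List[Node], receiving_index: int, message_id: str) -> Optional[str]:
    """Return the name of the node that owns the message, or None.

    The receiving node is checked first; on a miss the report is passed to
    the next node, wrapping around, until every node has been tried once.
    """
    for step in range(len(nodes)):
        node = nodes[(receiving_index + step) % len(nodes)]
        if message_id in node.sent:
            return node.name
    return None


nodes = [
    Node("emg1", {"msg-100": "customerA"}),
    Node("emg2", {"msg-200": "customerB"}),
]

# The operator delivers the report for msg-100 to emg2, the "wrong" node.
print(route_dlr(nodes, receiving_index=1, message_id="msg-100"))  # emg1
# A report for an unknown message is not matched on any node.
print(route_dlr(nodes, receiving_index=0, message_id="msg-999"))  # None
```

Each extra hop adds latency to report handling, and a report cannot be matched while the node holding the message information is down, which is the practical meaning of the nodes not being equivalent.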