Please note that this tutorial is valid for the ArangoDB 3.3 milestone 1 version of DC to DC replication!

This milestone release contains data-center to data-center replication as an enterprise feature. This is a preview of the upcoming 3.3 release and is not considered production ready.

In order to prepare for a major disaster, you can set up a backup data center that takes over operations if the primary data center goes down. Single server failures are covered by the resilience features of ArangoDB; data center to data center replication handles the failure of a complete data center.

Data is transported between data centers using a message queue. The current implementation uses Apache Kafka as its message queue. Apache Kafka is a commonly used open source message queue which is capable of handling multiple data centers. However, the ArangoDB replication is not tied to Apache Kafka; we plan to support other message queue systems in the future.

The following is a high-level description of how to set up data center to data center replication. Detailed instructions for specific operating systems will follow shortly.

The main components are:

  • Apache Kafka
  • Mirror Maker
  • ArangoDB Sync
  • ArangoDB Cluster

Installation

Kafka

Kafka is the main data transport channel between the data centers. Please follow the instructions on https://kafka.apache.org/ and https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330 to set up Apache Kafka in both data centers and connect the two using Mirror Maker.

Kafka operates on port 9092, and each data center must be able to connect to all brokers of the other data center on this port.

Furthermore, the brokers need to be able to reach each other within each data center.
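As an illustration of these connectivity requirements, the relevant settings in each broker's server.properties could look like the following sketch (broker ID, hostnames, and the ZooKeeper address are placeholder values, not part of this tutorial):

```properties
# server.properties (illustrative values)
broker.id=1

# Listen on the standard Kafka port used throughout this tutorial
listeners=PLAINTEXT://:9092

# Advertise an address that brokers and mirror makers in the
# other data center can resolve and reach
advertised.listeners=PLAINTEXT://kafka1.dc-a.example.com:9092

# ZooKeeper ensemble of the local data center
zookeeper.connect=zk1.dc-a.example.com:2181
```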

Besides brokers, a mirror maker process is used to transport Kafka messages between different data centers. The mirror maker processes need access to all Kafka brokers in both data centers (port 9092).
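A Mirror Maker process ships with the Kafka distribution. Sketched with placeholder config file names (consumer.properties pointing at the brokers of the source data center, producer.properties at those of the target data center), an invocation could look like:

```shell
# Run from the Kafka installation directory; config file names are examples.
# The whitelist is a regular expression selecting which topics to mirror.
bin/kafka-mirror-maker.sh \
  --consumer.config config/consumer.properties \
  --producer.config config/producer.properties \
  --whitelist ".*"
```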

ArangoDB Cluster

Install the enterprise release on your system using the normal install procedure. Then stop and disable the default single server using the appropriate Linux commands (for example, systemctl stop arangodb3.service and systemctl disable arangodb3.service on RedHat).

ArangoDB Sync

The two components of ArangoDB Sync are part of the enterprise package.

Syncmaster(s)

The syncmasters are the main control components; they handle the administrative work between the data centers.

They use TCP port 8629, and each data center must allow both incoming traffic from and outgoing traffic to the other data center on this port. Inside a data center there will be multiple syncmasters (typically 3) for fault tolerance. Direct communication between them is not required and can be turned off.

Instead, they coordinate themselves using the normal ArangoDB agency. That means they have to be able to reach all possible agents in their data center on port 8531 and all possible coordinators on port 8529.

The Syncmasters will coordinate the work and delegate it to workers. These workers listen on port 8729 and need to be reachable from the Syncmasters.

Furthermore, they have to contact the datacenter local Kafka brokers on port 9092. They do not need to contact the brokers of the other datacenter directly.

Syncworker

These will execute the actual synchronization work.

They listen on port 8729 and must be able to talk to all coordinators & DBServers in the cluster (ports 8529 & 8530).

They also have to be able to contact the Syncmasters of their datacenter directly on port 8629.

Furthermore, they have to contact the datacenter local Kafka brokers on port 9092. They do not need to contact the brokers of the other datacenter directly.

Preparation

Create Certificates

The synchronization system needs 2 CA certificates, which must be
shared on all machines.

To create them (on one machine only), run the following (in the folder containing the Makefile) as root:
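The Makefile itself is not reproduced in this text. As an illustration, the two CAs (one for TLS between the sync components, one for client-certificate authentication) can also be created directly with the arangosync binary; the file names and target directory are example values:

```shell
# CA for TLS connections between syncmasters and syncworkers
arangosync create tls ca \
  --cert=/etc/arango/certificates/tls-ca.crt \
  --key=/etc/arango/certificates/tls-ca.key

# CA for client-certificate authentication against the syncmasters
arangosync create client-auth ca \
  --cert=/etc/arango/certificates/client-auth-ca.crt \
  --key=/etc/arango/certificates/client-auth-ca.key
```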

Then distribute the generated files to all machines in all data centers (in /etc/arango/certificates).

Start the ArangoDB Clusters

Create an ArangoDB cluster in each data center. For example, follow the instructions given in https://www.arangodb.com/2016/12/starting-arangodb-cluster-easy-way/ to start a cluster using the ArangoDB starter. The cluster must use RocksDB and must run on the standard ports, i.e. coordinators on port 8529, DBservers on port 8530, and agents on port 8531. Therefore it is important that the default single server instances are stopped.
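With the ArangoDB starter, bringing up a three-machine cluster can be sketched as follows (IP addresses and the data directory are placeholders; --server.storage-engine is passed through to the servers to select RocksDB):

```shell
# On the first machine of the data center:
arangodb --starter.data-dir=/var/lib/arangodb-cluster \
  --server.storage-engine=rocksdb

# On the other two machines, join the first one:
arangodb --starter.data-dir=/var/lib/arangodb-cluster \
  --server.storage-engine=rocksdb \
  --starter.join 10.0.0.1
```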

Configuration Files

Create /etc/arangodb.env. This file contains environment variables with the following semantics (values are examples).

Create /etc/arangodb.env.local with the following content:
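The exact variables are deployment specific and not reproduced here. As a purely illustrative sketch (all variable names and values below are hypothetical), the two files could look like:

```shell
# /etc/arangodb.env -- settings shared by all machines in this
# data center (hypothetical variable names)
CLUSTERENDPOINTS=https://10.0.0.1:8529,https://10.0.0.2:8529,https://10.0.0.3:8529
KAFKAENDPOINTS=10.0.0.10:9092,10.0.0.11:9092,10.0.0.12:9092
CERTIFICATEDIR=/etc/arango/certificates

# /etc/arangodb.env.local -- settings specific to this one
# machine (hypothetical variable name)
PRIVATEIP=10.0.0.1
```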

Start the Syncmaster(s)

For systemd based systems, use a unit file along the following lines. The ExecStart options shown are illustrative only; consult the arangosync documentation for the full set required by your deployment (certificates, agency endpoints, Kafka brokers):

[Unit]
Description=ArangoDB syncmaster
After=network.target

[Service]
# All paths and endpoints below are example values; adjust them to your setup.
ExecStart=/usr/sbin/arangosync run master \
    --server.port=8629 \
    --server.keyfile=/etc/arango/certificates/tls.keyfile \
    --server.client-cafile=/etc/arango/certificates/client-auth-ca.crt \
    --cluster.endpoint=https://10.0.0.1:8529 \
    --mq.type=kafka
Restart=on-failure

[Install]
WantedBy=multi-user.target

to start the Syncmaster on at least one server.

Start the Syncworkers

Again for systemd based systems, use a unit file like the following (options are illustrative, as above):

[Unit]
Description=ArangoDB syncworker
After=network.target

[Service]
# Example values; each worker registers itself with a local syncmaster.
ExecStart=/usr/sbin/arangosync run worker \
    --server.port=8729 \
    --master.endpoint=https://10.0.0.1:8629
Restart=on-failure

[Install]
WantedBy=multi-user.target

to start the Syncworkers on all servers.

Starting the Synchronization

Once you’ve completed the setup of the 2 data centers using the above
instructions, you have to connect them to each other.

(Note: replace all IP addresses, DNS names, usernames & passwords with appropriate ones)

To initiate the sync process, you first have to create a client certificate using the cluster CA that we created earlier:
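Assuming the client-authentication CA from the preparation step lives in /etc/arango/certificates, creating such a client certificate (keyfile) with arangosync could look like this (file names are examples):

```shell
arangosync create client-auth keyfile \
  --cacert=/etc/arango/certificates/client-auth-ca.crt \
  --cakey=/etc/arango/certificates/client-auth-ca.key \
  --keyfile=/etc/arango/certificates/dc-a-client.keyfile
```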

This certificate can be used to authenticate against the sync master in
this datacenter. Note that this is for example needed by the sync master
in the other datacenter (see below).

Please make sure that at this point the firewalls are configured such
that the following connections are available:

  • The sync masters in the two DCs have to be able to reach each other.
  • The mirror makers have to be able to reach all Kafka brokers in the other data center.

We now want to set up synchronization from datacenter A to datacenter B. To this end, we contact the sync master in datacenter B and tell it to start synchronization from datacenter A. The following command can be executed anywhere that can reach the sync master in datacenter B (option --master.endpoint). Note that we give all three instances, because we do not know which one is currently in charge. The endpoints given in the --source.endpoint options are the endpoints of the sync masters in datacenter A. The CLI tool authenticates itself with the sync master in datacenter B via user name and password:
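Sketched with placeholder addresses, keyfile paths, and credentials, the command could look like:

```shell
arangosync configure sync \
  --master.endpoint=https://b1.example.com:8629 \
  --master.endpoint=https://b2.example.com:8629 \
  --master.endpoint=https://b3.example.com:8629 \
  --master.keyfile=/etc/arango/certificates/dc-b-client.keyfile \
  --source.endpoint=https://a1.example.com:8629 \
  --source.endpoint=https://a2.example.com:8629 \
  --source.endpoint=https://a3.example.com:8629 \
  --source.cacert=/etc/arango/certificates/tls-ca.crt \
  --auth.user=root \
  --auth.password=secret
```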

During this initial sync activation the generated client certificate
will be sent to the master of datacenter B, who in turn uses it to
authenticate itself with the sync master of datacenter A.

After that you should be able to get the status:
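With the same placeholder endpoint and credentials as above, querying the status could look like:

```shell
arangosync get status \
  --master.endpoint=https://b1.example.com:8629 \
  --auth.user=root \
  --auth.password=secret \
  --verbose
```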

Finally, if you want to stop synchronization, use:
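Again with placeholder values, stopping the synchronization could look like:

```shell
arangosync stop sync \
  --master.endpoint=https://b1.example.com:8629 \
  --auth.user=root \
  --auth.password=secret
```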