Set up the ArangoSync Workers for Datacenter-to-Datacenter Replication

The ArangoSync Worker is responsible for executing synchronization tasks.

For optimal performance, at least one worker instance should be placed on every machine that runs an ArangoDB DB-Server. This ensures that tasks can be executed with minimal network traffic outside of the machine.

Since sync workers automatically stop once their TLS server certificate expires (its lifetime is 2 years by default), it is recommended to run at least 2 worker instances on every machine in the datacenter. That way, tasks can still be assigned optimally, even when one worker is temporarily down for a restart.

To start an ArangoSync Worker using a systemd service, use a unit like this:

[Unit]
Description=Run ArangoSync in worker mode
After=network.target

[Service]
Restart=on-failure
EnvironmentFile=/etc/arangodb.env
EnvironmentFile=/etc/arangodb.env.local
Environment=PORT=8729
LimitNOFILE=1000000
ExecStart=/usr/sbin/arangosync run worker \
    --log.level=debug \
    --server.port=${PORT} \
    --server.endpoint=https://${PRIVATEIP}:${PORT} \
    --master.endpoint=${MASTERENDPOINTS} \
    --master.jwtSecret=${MASTERSECRET}
TimeoutStopSec=60

[Install]
WantedBy=multi-user.target

The ArangoSync Worker must be reachable on TCP port ${PORT} (the value passed to the --server.port option). This port must be reachable from inside the datacenter (by the sync masters).
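
To verify the reachability requirement from a machine that runs a sync master, a plain TCP connection attempt is usually enough. The following Python sketch is not part of ArangoSync; the function name port_is_reachable and the address in the usage comment are hypothetical. It only confirms that the port accepts TCP connections and does not check TLS or authentication.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper: it checks network reachability only and does
    not perform a TLS handshake or talk to the worker's API.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False


# Example usage (address is a placeholder, 8729 is the PORT from the unit):
#   port_is_reachable("10.0.0.5", 8729)
```

If the check fails from a sync master's machine, the firewall rules between the two machines and the PRIVATEIP value used by the worker are reasonable first things to inspect.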

Note the large file descriptor limit (LimitNOFILE). The sync worker requires about 30 file descriptors per shard. If you use hardware with huge resources and still run out of file descriptors, you can run multiple sync workers on each machine to spread the tasks across them.
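
As a rough illustration of the figure above, the following Python sketch estimates how many file descriptors a worker needs for a given number of shards, and how many workers that implies for a given limit. The function names are hypothetical, and the calculation assumes that shards are spread evenly across workers and ignores the descriptors the process needs for anything other than shards.

```python
import math

FDS_PER_SHARD = 30        # approximate figure quoted in the text
LIMIT_NOFILE = 1_000_000  # LimitNOFILE value from the unit file


def estimated_fds(shards: int, fds_per_shard: int = FDS_PER_SHARD) -> int:
    """Approximate number of file descriptors needed for `shards` shards."""
    return shards * fds_per_shard


def workers_needed(shards: int,
                   limit: int = LIMIT_NOFILE,
                   fds_per_shard: int = FDS_PER_SHARD) -> int:
    """Smallest worker count that keeps each worker under `limit`,
    assuming shards are distributed evenly (an assumption of this sketch)."""
    return max(1, math.ceil(estimated_fds(shards, fds_per_shard) / limit))


def max_shards_per_worker(limit: int = LIMIT_NOFILE,
                          fds_per_shard: int = FDS_PER_SHARD) -> int:
    """Largest shard count a single worker can handle within `limit`."""
    return limit // fds_per_shard
```

With the limit from the unit file, one worker covers roughly 33,000 shards under these assumptions, so running multiple workers for this reason should only be necessary on very large deployments.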

Run sync workers on all machines that also host an ArangoDB DB-Server. A sync worker can be memory intensive when there are many databases and collections.