Foxx in the Cluster

This tutorial explains how Foxx services are distributed within a cluster and how the cluster self-heals the environment, so that a consistent state of the services is always present.

This tutorial is not about how Foxx services work or how to write them; it assumes you already have a developed service and want to deploy it to your production cluster.

The setup

For this tutorial we will start with a classic ArangoDB cluster setup:

  • 3 Agents
  • 3 DBServers
  • 3 Coordinators

However, the Agents and DBServers are not too relevant for this case; we will focus on the Coordinators, which host the Foxx services.

Furthermore, we will add an additional, fresh Coordinator during this tutorial.

The whole cluster is started with the ArangoDB Starter.
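
For reference, bringing up such a cluster with the ArangoDB Starter could look roughly like the following. This is only a sketch: the host names are placeholders and the exact flags may differ between Starter versions, so please consult the Starter documentation for your release.

# Run this on each of the three machines (hostA, hostB, hostC are placeholders).
# Each Starter then launches an Agent, a DBServer and a Coordinator on its machine.
arangodb --starter.data-dir=./cluster-data --starter.join hostA,hostB,hostC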

The testing script

During this tutorial we will repeatedly execute the following Unix shell script, which tries to access the Foxx service we are going to install (if you want to try it yourself, please adjust the URLs accordingly):

#!/bin/bash
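# Call the /aztecgod/random route of the Foxx service on every Coordinator and print each response.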
 
echo 'Coordinator1:'
curl -w "\n" -X GET http://example1.com:8530/aztecgod/random
 
echo 'Coordinator2:'
curl -w "\n" -X GET http://example2.com:8530/aztecgod/random
 
echo 'Coordinator3:'
curl -w "\n" -X GET http://example3.com:8530/aztecgod/random
 
echo 'Coordinator4:'
curl -w "\n" -X GET http://example4.com:8530/aztecgod/random

This script calls the route of the Foxx service on all four Coordinators and reports their output. If you are not using a Unix system, you can simply open the URLs in your favorite browser and reload them every time we run the script to see the results.

Installing the service

In this tutorial, we will use a simple service that is available from the Foxx Store.
Feel free to use any other service to test this out; it works independently of the service code.

The service we are going to use is itzpapalotl, a service that offers a route to generate random Aztec god names.
We will call this route to verify that the service is online and serving properly.
But first, let's make sure we have a clean state on our cluster and run the test script.
The output should look like this:

Coordinator1:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator2:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator3:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator4:
 
curl: (7) Failed to connect to example4.com port 8530: Connection refused

The first three Coordinators are up and running, but they return 404 because the service is not installed yet.
The fourth Coordinator has not been started yet, so we cannot connect to it.

Now we open the web interface of one of these Coordinators, e.g. http://example1.com:8530.

Then we navigate to the Services tab, click on Add Service, fill in /aztecgod as the mount point,
and select itzpapalotl from the list (a command line alternative is sketched below the screenshots):

(Screenshot: the Add Service dialog with the /aztecgod mount point and the itzpapalotl service selected)

After a short period of time the service is displayed in the UI and is up and running:

(Screenshot: the installed service shown as running in the UI)
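
If you prefer the command line over the web interface, the installation can also be scripted against the Foxx management HTTP API. The following is only a sketch: the source URL is a placeholder for a zip archive of the service, and you may need to add authentication depending on your deployment.

# Install the service at /aztecgod via one Coordinator; the cluster takes care of the rest.
# The zip URL is a placeholder, point it at the service's source archive.
curl -X POST "http://example1.com:8530/_db/_system/_api/foxx?mount=/aztecgod" \
     -H "Content-Type: application/json" \
     -d '{"source": "https://example.org/itzpapalotl.zip"}'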

Now let's check which servers respond:

Coordinator1:
{"name":"Coyolxauhqui"}
Coordinator2:
{"name":"Tezcatlipoca"}
Coordinator3:
{"name":"Huitzilopochtli"}
Coordinator4:
 
curl: (7) Failed to connect to example4.com port 8530: Connection refused

As you can see, we installed the service on only one of the Coordinators,
yet afterwards all three running Coordinators respond successfully to requests on this service.
Note: the names are random, so if you test this yourself you may get different names.

Scaling up

Now that we have our service running, our business is growing and we need to increase the number of servers.
Let us add an additional Coordinator, again using the ArangoDB Starter, this time with the --cluster.start-dbserver=false flag, as we only want to add a Coordinator (a possible invocation is sketched below).
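
A possible Starter invocation for such a Coordinator-only node could look like this (again only a sketch; the host names are placeholders and the join list has to point at the existing cluster):

# Start only a Coordinator on this node; the flag prevents a DBServer from being
# launched, and with an agency of size 3 already running no additional Agent is started.
arangodb --starter.data-dir=./coordinator4 \
         --starter.join hostA,hostB,hostC \
         --cluster.start-dbserver=false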

After the coordinator has booted we directly try the test script again:

Coordinator1:
{"name":"Cihuacoatl"}
Coordinator2:
{"name":"CentzonTotochtin"}
Coordinator3:
{"name":"Mictlantecuhtli"}
Coordinator4:
{"name":"Mictlantecuhtli"}

All four Coordinators now serve the service; we did not have to do any bootstrapping on the new Coordinator.

Server Outage

Next, let us simulate a server failure or an unexpected reboot of one of the Coordinators.
First we need to suspend the ArangoDB Starter process, otherwise it will immediately restart the Coordinator.
On Unix we can simply send it a stop signal (SIGTSTP) with:

kill -TSTP <starter-pid>

Then we kill the Coordinator process in a rather unfriendly way (simulating that the machine just powered off):

kill -9 <coordinator-pid>
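
If you are unsure which process IDs to use, something along these lines helps to find them; the grep pattern is an assumption about how the processes show up on your machine, so double-check the output before killing anything:

# List the Starter (arangodb) and the arangod instances; the Coordinator is the
# arangod process listening on port 8530 in this setup.
ps aux | grep arango | grep -v grep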

After a couple of seconds we get a red flag in the UI, so the cluster is now aware that the server is gone:

(Screenshot: the cluster overview marking the killed Coordinator as offline)
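
Besides the red flag in the web interface, the cluster state can also be queried over HTTP. A hedged example (the endpoint may require authentication in your deployment):

# Ask any reachable Coordinator for an overview of the cluster's health.
curl -X GET http://example1.com:8530/_admin/cluster/health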

Now let's run the test script again:

Coordinator1:
{"name":"XipeTotec"}
Coordinator2:
 
curl: (7) Failed to connect to example2.com port 8530: Connection refused
Coordinator3:
{"name":"Coyolxauhqui"}
Coordinator4:
{"name":"Tlauixcalpantecuhtli"}

So in this test we have killed Coordinator2.
This is not too exciting by itself, but while the server is offline, let us uninstall the service:

(Screenshot: uninstalling the service in the web interface)
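
Like the installation, uninstalling can also be done through the Foxx management HTTP API instead of the web interface; a sketch, again assuming no authentication is required:

# Uninstall the service mounted at /aztecgod via any reachable Coordinator.
curl -X DELETE "http://example1.com:8530/_db/_system/_api/foxx/service?mount=/aztecgod"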

And again the test script:

Coordinator1:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator2:
 
curl: (7) Failed to connect to example2.com port 8530: Connection refused
Coordinator3:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator4:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}

So all three running Coordinators have uninstalled the service successfully.
Now we bring Coordinator2 back and let it reconnect to the cluster.
On Unix this can be done by resuming the Starter with kill -CONT <starter-pid>, which then restarts the Coordinator.
Then we try to access the service again (note that Coordinator2 simply reboots; it has missed
the uninstall command):

Coordinator1:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator2:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator3:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}
Coordinator4:
{"error":true,"code":404,"errorNum":404,"errorMessage":"unknown path '/aztecgod/random'"}

Conclusion

As you can see, Foxx comes with a completely self-healing environment. We can add or remove any Coordinator (and DBServer) as we like, and the Foxx services will stay in a consistent state. There is no need to worry about keeping them all in sync; ArangoDB does that for you.

The self-healing also survives server isolation and a number of network split scenarios, such as isolating a group of Coordinators from the other parts of the cluster.

As long as the database itself is operational, all Coordinators will eventually sync their services.