Please consider monitoring your production ArangoDB installation as part of your best-practices strategy. This is easily done using the established services Prometheus for data collection and Grafana for visualisation and alerting.

ArangoDB Monitoring Dashboard

Best practices

Many of us have come to regard database infrastructure as a solid rock to rely upon when the going gets rough. The mission statement of any database that comes to my mind right now is: Trust us with the data and … – and what, exactly?

It turns out that running a single instance, let alone a whole cluster of database servers, demands a good deal of attention, just as operating a car or an airplane does. Every professional appliance comes with sensors and instrumentation, and these need monitoring and alertness to guarantee the safety of investment and service.

In this article, I propose some best practices for running ArangoDB installations. The guide is easily applied to any sort of ArangoDB deployment, including single-instance, active-failover and cluster installations inside or outside Kubernetes.

Metrics export

Some obvious parameters that spell trouble for a database daemon include overall and ArangoDB-specific memory consumption, CPU utilisation, and network utilisation. But one can do much better by monitoring a host of further metrics, which ArangoDB instances export directly, starting with version 3.6, through the _admin/metrics API. The API exports metrics ranging from IO in bytes all the way to RocksDB file sizes, in the Prometheus text exposition format.
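To get a feel for what such an export looks like, here is a minimal Python sketch that reads text in the Prometheus exposition format into a dictionary. The metric names and values in the sample are made up for illustration and are not taken from a real ArangoDB server; the parser also assumes that sample lines carry no trailing timestamp.

```python
from typing import Dict

# Illustrative sample only: these metric names and values are invented
# for this sketch and do not come from a real ArangoDB instance.
SAMPLE = """\
# HELP example_resident_set_size Resident set size in bytes
# TYPE example_resident_set_size gauge
example_resident_set_size 123456789
# HELP example_requests_total Number of requests
# TYPE example_requests_total counter
example_requests_total{role="coordinator"} 42
"""


def parse_metrics(text: str) -> Dict[str, float]:
    """Map each sample line ("name{labels} value") to its numeric value."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last field.
        name, _, value = line.rpartition(" ")
        samples[name.strip()] = float(value)
    return samples


if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(name, value)
```

In a real setup Prometheus does this parsing for you; the sketch is only meant to show how little structure the format has.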

Actual working setup

In the following, and for simplicity's sake, I will walk us through a cluster monitoring setup. I will assume a monitoring machine, monitor, running Ubuntu 18.04, and a 3-node ArangoDB cluster, c01 through c03, with 3 agents (port 8531), 3 coordinators (port 8529) and 3 db servers (port 8530).

We’ll start by installing the Prometheus node exporter on each of the cluster nodes running our cluster instances.

This service automatically activates and starts. It is not a minimum requirement for monitoring, but it adds valuable extra information about the general health of the operating system hosting your ArangoDB instances. Now change to monitor to install the Prometheus collection and Grafana server daemons.

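These services now need to be configured, of course. Edit /etc/prometheus/prometheus.yml and append scrape sections for the node exporters (default port 9100) and for the ArangoDB instances; the latter are scraped at the metrics path /_admin/metrics.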
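As a rough sketch of what those sections could look like for the example cluster, the following Python snippet renders two scrape jobs as YAML text. The job names cluster_nodes and arangodb are my own choices, and the indentation assumes that the output is appended directly below an existing scrape_configs: list at the end of the file; adjust both to your installation.

```python
from typing import Iterable, List, Optional

HOSTS = ["c01", "c02", "c03"]        # cluster nodes of the example setup
ARANGODB_PORTS = [8529, 8530, 8531]  # coordinators, db servers, agents
NODE_EXPORTER_PORT = 9100            # default port of the node exporter


def targets(hosts: Iterable[str], ports: Iterable[int]) -> List[str]:
    """Build "host:port" strings for every host/port combination."""
    host_list = list(hosts)
    return [f"{host}:{port}" for port in ports for host in host_list]


def render_job(name: str, target_list: List[str],
               metrics_path: Optional[str] = None) -> str:
    """Render one scrape job as an entry of the scrape_configs list."""
    lines = [f"  - job_name: {name}"]
    if metrics_path:
        lines.append(f"    metrics_path: {metrics_path}")
    lines.append("    static_configs:")
    lines.append("      - targets:")
    lines += [f"          - '{t}'" for t in target_list]
    return "\n".join(lines)


def render_scrape_sections() -> str:
    """Render the node exporter job and the ArangoDB job (plain HTTP)."""
    node_job = render_job("cluster_nodes",
                          targets(HOSTS, [NODE_EXPORTER_PORT]))
    arango_job = render_job("arangodb",
                            targets(HOSTS, ARANGODB_PORTS),
                            metrics_path="/_admin/metrics")
    return node_job + "\n\n" + arango_job + "\n"


if __name__ == "__main__":
    print(render_scrape_sections())
```

Running the script prints the two sections, which can then be reviewed and pasted into prometheus.yml by hand.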

Now restart the Prometheus daemon so that it picks up the new configuration.

Now direct your browser to monitor:9090/targets to check whether all targets show a green UP in the State column.
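The same check can be scripted against the Prometheus HTTP API, which reports target health under /api/v1/targets. The sketch below only evaluates a response that has already been fetched; the sample response is hand-written for illustration and shows just the two fields the function relies on.

```python
import json
from typing import List

# Hand-written sample in the shape of a /api/v1/targets response;
# real responses carry more fields per target.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://c01:8529/_admin/metrics", "health": "up"},
            {"scrapeUrl": "http://c02:8529/_admin/metrics", "health": "down"},
        ]
    },
})


def unhealthy_targets(api_response: str) -> List[str]:
    """Return the scrape URLs of all active targets that are not "up"."""
    data = json.loads(api_response)["data"]
    return [t["scrapeUrl"] for t in data["activeTargets"]
            if t["health"] != "up"]


if __name__ == "__main__":
    print(unhealthy_targets(SAMPLE_RESPONSE))
```

An empty list corresponds to the all-green view in the browser.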

Let’s now turn to the Grafana integration. Use your browser to access monitor:3000. You will initially be asked to log in with admin as both user name and password, and then to specify a new password for the admin user. Subsequently, choose Configuration|Data Sources from the left icon column to add a new Prometheus data source. The only thing that needs to be set here is the URL field, which should point to the Prometheus server above, namely http://monitor:9090.

Next, hover over the + sign in the navigation column and click on Import. Paste the content of my dashboard into the field Or paste JSON and click on Load. Assign the name ArangoDB on the following page and click on Import. Et voilà, habemus dashboard! Click on the ArangoDB dashboard to inspect it.

Security, TLS and Authentication

Of course, production environments, and arguably development and staging environments likewise, should only operate hardened services, including the monitoring infrastructure. This implies that one should only operate ArangoDB deployments that use encryption and authentication.

The necessary adjustments for Prometheus and Grafana can be made through a host of configuration options. Keep in mind that authenticated ArangoDB clusters do not allow username/password-authenticated access to agents or db servers. Instead, use the JWT secret file of your cluster to generate an access token.
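For readers who want to see what such a token consists of, here is a minimal Python sketch that signs a JWT with HS256 using only the standard library. The claims iss=arangodb and server_id=prometheus are assumptions on my part about what an ArangoDB cluster accepts, and the secret in the usage example is a placeholder; check both against the documentation of your ArangoDB version.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_jwt(secret: bytes, claims: dict) -> str:
    """Build a compact HS256-signed JWT: header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"),
                         hashlib.sha256).digest()
    return signing_input + "." + b64url(signature)


if __name__ == "__main__":
    # Placeholder secret and assumed claims, for illustration only.
    token = make_jwt(b"not-a-real-secret",
                     {"iss": "arangodb", "server_id": "prometheus"})
    print("Authorization: bearer " + token)
```

The signature is only valid if the secret bytes match exactly what the cluster was started with, so take care with trailing newlines when reading the secret from a file.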

The token, that is, the string after bearer in the resulting authorization header, should be copied to the file /etc/prometheus/prometheus.token on monitor and then be referenced as the bearer token file in the arangodb section of /etc/prometheus/prometheus.yml.

Make sure that the token file is readable only by the prometheus service and root. Likewise, guard your Prometheus and Grafana installations so that they allow only encrypted and authenticated access. Their respective documentation offers good guidance on getting that done.
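One way to enforce the file permissions is sketched below in Python: the token is written with mode 0600, and a small helper checks that neither group nor others have any access. Handing the file over to the prometheus user (for example with chown) requires root and is deliberately left out; the function names are mine.

```python
import os
import stat


def write_token_file(path: str, token: str) -> None:
    """Write the token so that only the file's owner can read or write it."""
    # Create the file with mode 0600 right away instead of fixing it later.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    # A pre-existing file keeps its old mode on open, so tighten it as well.
    os.chmod(path, 0o600)


def is_private(path: str) -> bool:
    """True if neither group nor others have any permission on the file."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

On monitor this would be called as root with /etc/prometheus/prometheus.token as the path, followed by changing the owner to the user the Prometheus service runs as.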

Oftentimes, while valid TLS certificates are created for a service, using them within the cluster (among the nodes, for monitoring, etc.) is not as easy. This is because certificates are bound to a set of FQDNs and/or fixed IP addresses. Cluster-internal IP addresses and names are, however, subject to change in many reasonable deployments, and thus strict enforcement of TLS is not always possible. If, however, the setup is such that safe operation is guaranteed through encapsulation from the outside, it is possible to relax the constraints to allow monitoring, for example, to work.

Prometheus and Grafana both allow one to work under such relaxed constraints. In prometheus.yml, setting insecure_skip_verify: true in the tls_config of the arangodb section lets Prometheus access ArangoDB metrics endpoints with otherwise invalid TLS certificates.
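Put together, a hardened arangodb section might look like what the following Python sketch renders. It is an illustration under assumptions rather than a drop-in configuration: the target list is a parameter, the token path is the one used in this article, and skipping certificate verification is off unless explicitly requested.

```python
from typing import List


def render_secure_arangodb_job(
        targets: List[str],
        token_file: str = "/etc/prometheus/prometheus.token",
        skip_verify: bool = False) -> str:
    """Render an HTTPS scrape job that authenticates with a bearer token."""
    lines = [
        "  - job_name: arangodb",
        "    scheme: https",
        "    metrics_path: /_admin/metrics",
        f"    bearer_token_file: {token_file}",
    ]
    if skip_verify:
        # Only for setups that are shielded from the outside (see the text).
        lines += ["    tls_config:", "      insecure_skip_verify: true"]
    lines += ["    static_configs:", "      - targets:"]
    lines += [f"          - '{t}'" for t in targets]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_secure_arangodb_job(["c01:8529", "c01:8530", "c01:8531"],
                                     skip_verify=True))
```

Keeping skip_verify as an explicit switch makes the relaxed setting a visible, deliberate decision in whatever tooling generates the configuration.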

Of course, both Grafana and Prometheus are configurable for TLS access themselves and should only be operated secured in this manner in production environments. Their respective documentation describes the TLS configuration of these services.

Outlook

The dashboard is by no means a solution that fits all needs; please adjust it to your own. Also expect official and community dashboards to show up over time.