How to Monitor ArangoDB using collectd, Prometheus and Grafana

Information on how to set up a monitoring system for ArangoDB (standalone or cluster)

Introduction

ArangoDB provides several statistics via HTTP/JSON APIs. Such statistics can be used to monitor ArangoDB, when collected, stored and then visualized.

In this article we present an approach to monitoring ArangoDB on Linux with the tools collectd, Prometheus and Grafana. We start with an overview of how to install and configure the required tools. Then we walk you through the steps needed to get some data through the pipeline and visualize it. A more complete example follows. Finally, we show how to monitor the health of an ArangoDB Cluster.

Required Software Tools and Components

The following is the list of tools used in this setup:

  • ArangoDB
  • collectd
  • Prometheus
  • Grafana

The data flow between the above tools is as follows:

  1. collectd fetches data from ArangoDB, using its curl_json plugin
  2. Prometheus fetches data from collectd, which exposes it via its write_prometheus plugin (available since collectd 5.7)
  3. Grafana queries Prometheus to visualize the data

Installing the software

We assume you have already installed ArangoDB.

For this setup to work, you will need at least one instance of collectd. Please use version 5.7 or higher, so the required write_prometheus plugin is included. You may prefer to install collectd on every server in your setup, as it can feed lots of valuable information about those systems into your Prometheus database, like CPU, memory or disk usage, which can complement the data from ArangoDB nicely. However, one installation suffices to get the information provided by ArangoDB and you may want to start with that.

Finally, you need to install Prometheus and Grafana.

Basic configuration

In the following examples, we use these names for the different installations:

  • coordinator.arangodb.local for one ArangoDB coordinator
  • collectd.local for your collectd instance
  • prometheus.local for your Prometheus instance

These may also be installed on the same machine. Just replace the names used here with the actual names (or plain IP addresses) of your installations.

collectd

Assuming you are using a default collectd installation, /etc/collectd/collectd.conf should already contain the following lines, which include additional *.conf files from the directory /etc/collectd/collectd.conf.d:
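As a rough illustration, the sketch below prints the Include block that Debian-style collectd packages are assumed to ship and checks whether a given configuration file contains it. The exact wording of the block and the helper names expected_include_block and has_conf_d_include are assumptions; compare the output with your own collectd.conf.

```shell
# Print the Include block that stock Debian/Ubuntu packages are assumed to ship.
expected_include_block() {
  cat <<'EOF'
<Include "/etc/collectd/collectd.conf.d">
  Filter "*.conf"
</Include>
EOF
}

# Hypothetical helper: succeed if the file given as $1 includes the conf.d directory.
has_conf_d_include() {
  grep -q '^[[:space:]]*<Include "/etc/collectd/collectd.conf.d">' "$1"
}
```

A possible use is `has_conf_d_include /etc/collectd/collectd.conf || expected_include_block`, which prints the block only when it appears to be missing.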

You may want to set/add a line to specify the time interval in seconds after which collectd fetches another set of data:

However, this can also be set for each plugin separately.

Now add the following file to configure the write_prometheus plugin:

with the following content:
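A minimal sketch of such a file, wrapped in a small shell helper so the target path stays up to you. The helper name write_prometheus_conf and any file name you pass to it are assumptions; the port 9103 is taken from the URLs used in this article.

```shell
# Write a minimal write_prometheus configuration to the path given as $1.
write_prometheus_conf() {
  cat > "$1" <<'EOF'
LoadPlugin write_prometheus

<Plugin write_prometheus>
  # 9103 matches the port used in the URLs of this article
  Port "9103"
</Plugin>
EOF
}
```

For example, run `write_prometheus_conf /etc/collectd/collectd.conf.d/write_prometheus.conf` as root (the file name is hypothetical; any *.conf name in that directory should be picked up).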

After (re)starting collectd, the Prometheus interface should already be available. To check if it works, open the address http://collectd.local:9103/metrics in your browser. Do not forget to replace collectd.local with your actual collectd server. You should see something like this:
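The same check can be done from a terminal. The sketch below fetches the endpoint with curl and reduces the exposition text to the distinct metric names; the function names fetch_metrics and metric_names are made up for this example, and which metrics you see depends on the collectd plugins you have loaded.

```shell
# Fetch the raw exposition text from the write_prometheus endpoint.
# $1 = collectd host (defaults to the placeholder name used in this article)
fetch_metrics() {
  curl --silent --show-error --fail "http://${1:-collectd.local}:9103/metrics"
}

# Read exposition text on stdin and print the distinct metric names.
metric_names() {
  grep -v '^#' | grep -v '^[[:space:]]*$' | sed 's/[{ ].*$//' | sort -u
}
```

Usage: `fetch_metrics collectd.local | metric_names`.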

Now we are ready to connect collectd to Prometheus.

Prometheus

A minimal working configuration file looks like this:

If you already have a configuration file, you only need to add the line - collectd.local:9103 to an existing job node, or add your own job. You may also add multiple targets here if you chose to install multiple collectd instances. Later you will be able to distinguish the metrics of different targets, as Prometheus enriches your time series with the labels instance="collectd.local:9103" and job="node".

You may also want to configure how often Prometheus fetches data from collectd (taking into account also the Interval setting of collectd):

The default setting for scrape_interval is 1m. More information can be found in the Prometheus documentation on configuration.
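The two Prometheus settings discussed in this section can be sketched as one file: a job called node with the collectd target, plus a global scrape_interval. The helper name write_prometheus_yml is an assumption, as is the layout of any existing file you may already have; merge by hand in that case.

```shell
# Write a minimal Prometheus configuration to the path given as $1.
# $2 = scrape interval (optional, defaults to Prometheus' own default of 1m)
write_prometheus_yml() {
  cat > "$1" <<EOF
global:
  scrape_interval: ${2:-1m}

scrape_configs:
  - job_name: node
    static_configs:
      - targets:
          - collectd.local:9103
EOF
}
```

For example, `write_prometheus_yml ./prometheus.yml 30s` writes a file that scrapes collectd every 30 seconds. Picking a scrape interval shorter than collectd's Interval would mostly re-read unchanged values.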

After (re)starting Prometheus, visit http://prometheus.local:9090/targets in your browser. There should be a table node containing your endpoint, and its State should be UP: this means Prometheus is already scraping data from your collectd instance. It may take a minute (depending on the scrape_interval you have used) until the status changes from UNKNOWN to UP.

Prometheus is now set up.

Grafana

After logging into your Grafana installation, you should arrive at the Home Dashboard, where there is a link to Create your first data source. Alternatively, navigate to Configuration → Data sources and from there to Add data source.

Fill out the field Name for your Prometheus data source (choose freely). You probably want to check the box Default to set it as your default data source. As Type, choose Prometheus.

Add your Prometheus server under HTTP → URL: http://prometheus.local:9090.

Finally, click on Save & Test. If everything is configured correctly, you should get the message Data source is working.
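If you prefer scripting over clicking, the data source can probably also be created through Grafana's HTTP API. The following is only a sketch: it assumes the /api/datasources endpoint with basic authentication, and the Grafana base URL, the credentials and the helper names are placeholders that do not come from this article.

```shell
# Build the JSON body for the data source.
# $1 = data source name, $2 = Prometheus URL
# Note: both values are inserted verbatim, so they must not contain double quotes.
datasource_payload() {
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' "$1" "$2"
}

# Hypothetical wrapper: POST the payload to a Grafana server.
# $1 = Grafana base URL, $2 = user:password, $3 = data source name, $4 = Prometheus URL
create_datasource() {
  datasource_payload "$3" "$4" |
    curl --silent --show-error --fail \
      --user "$2" \
      --header 'Content-Type: application/json' \
      --data @- \
      "$1/api/datasources"
}
```

Usage, with placeholder values: `create_datasource http://grafana.local:3000 admin:secret Prometheus http://prometheus.local:9090`.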

Step-by-step example: Adding data to the pipeline

In this example, we add two metrics to our setup:

  1. The total physical memory in the ArangoDB Cluster (the sum of the physical memory of all Coordinators)
  2. The total resident set size, i.e. the amount of memory used by the ArangoDB instances

Other metrics can be added the same way.

Initial configuration of collectd / curl_json

This step has to be done only once. You can extend the configuration later as needed.
Add a config file for the curl_json collectd plugin:

Optionally, you may override the Interval setting to specify how often, in seconds, curl_json fetches data from ArangoDB. Please note that a very low value may generate load and therefore reduce the performance of the database.

Also optionally, you may add an Instance parameter. If you do set it, for example to arango_coordshort, the label curl_json="arango_coordshort" will be added to all metrics configured in the <URL> block. Otherwise, the label curl_json="default" will be used.

Set User and Password to the credentials you use to log in to http://coordinator.arangodb.local:8529/.

Also, please create the file /etc/collectd/arangodb_types.db. It may initially be empty.
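Putting the options above together, an initial curl_json configuration might look like the sketch below. Several details are assumptions: the helper name write_curl_json_conf, the location of the stock types.db (/usr/share/collectd/types.db), the credentials, and the single placeholder Key using the built-in gauge type, which is there because curl_json is expected to reject a URL block without any Key.

```shell
# Write an initial curl_json configuration to the path given as $1.
write_curl_json_conf() {
  cat > "$1" <<'EOF'
LoadPlugin curl_json

# Setting TypesDB replaces the built-in default, so the stock file is listed too.
# The first path is an assumption; adjust it to your distribution.
TypesDB "/usr/share/collectd/types.db"
TypesDB "/etc/collectd/arangodb_types.db"

<Plugin curl_json>
  <URL "http://coordinator.arangodb.local:8529/_admin/aardvark/statistics/coordshort">
    # Optional: becomes the label curl_json="arango_coordshort"
    Instance "arango_coordshort"
    # Optional: fetch interval in seconds for this URL only
    Interval 60
    # Placeholders: replace with your own credentials
    User "root"
    Password "changeme"
    # Placeholder metric with a built-in type, to be refined later
    <Key "data/physicalMemory">
      Type "gauge"
    </Key>
  </URL>
</Plugin>
EOF
}
```

For example, `write_curl_json_conf /etc/collectd/collectd.conf.d/curl_json.conf` (hypothetical file name), followed by `touch /etc/collectd/arangodb_types.db`.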

Getting data from ArangoDB to collectd with curl_json

The URL http://coordinator.arangodb.local:8529/_admin/aardvark/statistics/coordshort may be visited with a browser to get an overview of the available data. The response looks something like this:

So the data we’re looking for is available under data/physicalMemory and data/residentSizeCurrent, respectively. These need to be added in the curl_json configuration above.
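To look at just these two fields from a terminal, a sketch along the following lines may help. It assumes curl and jq are installed and that the response has a top-level data object as described above; the function names and the credentials you pass in are placeholders.

```shell
# Fetch the coordinator statistics.
# $1 = coordinator host, $2 = user, $3 = password
fetch_coordshort() {
  curl --silent --show-error --fail --user "$2:$3" \
    "http://$1:8529/_admin/aardvark/statistics/coordshort"
}

# Read the statistics JSON on stdin and print the two memory fields.
memory_fields() {
  jq -r '.data | "physicalMemory=\(.physicalMemory) residentSizeCurrent=\(.residentSizeCurrent)"'
}
```

Usage: `fetch_coordshort coordinator.arangodb.local root changeme | memory_fields`.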

First we add two new types:

Using these types, curl_json will use the names collectd_curl_json_coordshort_physicalMemory and collectd_curl_json_coordshort_residentSizeCurrent for the metrics. You may choose your own names for the types. If you just use built-in types (e.g. gauge) instead, all data will be fed into the same metric (e.g. collectd_curl_json_coordshort_gauge) and can only be distinguished using labels.
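A sketch of what the two type definitions could look like. It assumes that write_prometheus builds the metric name as collectd_<plugin>_<type>, so the type names carry the coordshort_ prefix to produce the metric names mentioned above; the GAUGE data source with a minimum of 0 and no maximum is likewise an assumption. The helper name append_arangodb_types is made up.

```shell
# Append two custom types to the types database.
# $1 = path of the types file (defaults to the file named in this article)
append_arangodb_types() {
  cat >> "${1:-/etc/collectd/arangodb_types.db}" <<'EOF'
coordshort_physicalMemory       value:GAUGE:0:U
coordshort_residentSizeCurrent  value:GAUGE:0:U
EOF
}
```

Each line follows the types.db layout of a type name followed by one or more data sources in the form name:type:min:max, where U means unbounded.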

Now replace the lines

in your <URL> block with:

The Key is the path to the data in the JSON document above, while the Type is the one we added to /etc/collectd/arangodb_types.db.
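As an illustration of that pairing, the sketch below prints two Key blocks for the paths found in the JSON document. The type names coordshort_physicalMemory and coordshort_residentSizeCurrent are assumptions chosen to match the metric names used in this article; use whatever names you added to /etc/collectd/arangodb_types.db. The helper name print_memory_keys is hypothetical.

```shell
# Print Key blocks for the two memory metrics, ready to paste into the <URL> block.
print_memory_keys() {
  cat <<'EOF'
    <Key "data/physicalMemory">
      Type "coordshort_physicalMemory"
    </Key>
    <Key "data/residentSizeCurrent">
      Type "coordshort_residentSizeCurrent"
    </Key>
EOF
}
```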

After a restart of collectd and a minute (or whatever Interval is configured) of waiting, corresponding lines similar to the following should appear in the endpoint of write_prometheus:

A minute or so (depending on scrape_interval) later the first values should arrive in Prometheus. This can be checked by executing, for example, the query collectd_curl_json_coordshort_physicalMemory in the Prometheus GUI under Graph. It should yield some results in either the Console or the Graph tab. If the message No datapoints found. appears, the metrics weren’t scraped (yet).
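The same check can be scripted against the Prometheus HTTP API. This sketch assumes a Prometheus 2.x server with the /api/v1/query endpoint and that curl and jq are installed; the function names prom_query and series_values are made up for the example.

```shell
# Run an instant query against the Prometheus HTTP API.
# $1 = Prometheus host, $2 = PromQL expression
prom_query() {
  curl --silent --show-error --fail --get \
    --data-urlencode "query=$2" \
    "http://$1:9090/api/v1/query"
}

# Read a query response on stdin and print "<instance> <value>" per series.
# Prints nothing if the metric has not been scraped yet.
series_values() {
  jq -r '.data.result[] | "\(.metric.instance) \(.value[1])"'
}
```

Usage: `prom_query prometheus.local collectd_curl_json_coordshort_physicalMemory | series_values`.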

Creating a graph in Grafana

Now that the metrics on physical memory and resident set size, named collectd_curl_json_coordshort_physicalMemory and collectd_curl_json_coordshort_residentSizeCurrent, respectively, have arrived in Prometheus, graphs to visualize them can be added in Grafana.

First, create a new dashboard (unless you created one already): either click on Create your first dashboard on Grafana’s Home Dashboard, or navigate to Create → Dashboard. You have to save all changes made to a dashboard explicitly, either by pressing Ctrl+S, or by clicking on the floppy disk symbol in the upper right.

Then, a New panel dialog should be open. You can add more panels to the dashboard with the Add panel button in the upper right. Select the Graph visualization.

Navigate to Panel title and Edit.

In the General tab, you can set the panel’s Title, e.g. ArangoDB cluster: total memory. In the Metrics tab, set query A to collectd_curl_json_coordshort_physicalMemory and set the Legend format to Physical memory. Now add another query B, set it to collectd_curl_json_coordshort_residentSizeCurrent and its Legend format to Resident set size. Switch to the Axes tab, and set Left Y’s Unit to Data (IEC) → bytes. Close the panel by clicking on the X to the right.

If you are satisfied with the result, do not forget to save the dashboard!

More complete configurations

Add the following lines to

and the following lines in the <URL> block in

Then restart collectd.

Grafana dashboard

In the Grafana GUI, navigate to Create → Import and paste the following JSON to get a dashboard with some cluster graphs. You only need to select your data source to configure it. The dashboard was created with Grafana 4.6.3, the current stable version at the time of writing this article. If there are problems importing it, check your version first.

Adding ArangoDB Cluster Health info to collectd/Prometheus/Grafana

To perform this step we assume you already have a working setup of ArangoDB, collectd, Prometheus and Grafana (see previous sections).

The Cluster Health information, which is used to show the number of Coordinators and DBServers on the Dashboard of the ArangoDB Web Interface, is available as JSON via HTTP but is not suitable for direct consumption by the curl_json plugin in collectd. However, it is possible to get around this limitation using the exec plugin and a small script.

Requirements

The packages curl and jq need to be installed on your system.

Adding and configuring the plugin in collectd

Create the following bash script:

Make the script above executable:

Add the following types to the types database:

Register it with the exec plugin by creating this file:

The address coordinator.arangodb.local:8529 needs to be set to a coordinator of the Cluster to monitor. If needed, username and password can be provided in the URL for HTTP basic auth, i.e. replace http://coordinator.arangodb.local:8529 with http://USERNAME:PASSWORD@coordinator.arangodb.local:8529. Note that the password can be read by users on the same system using ps. User and group (nobody and nogroup) can be chosen freely, as long as they have permission to execute the script /etc/collectd/arango_cluster_health.plugin.bash.
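To make the moving parts concrete, here is a sketch of what such a plugin script could look like. It is not the original script. It assumes the health data is served at /_admin/cluster/health as a top-level Health object whose entries carry Role and Status fields, that healthy servers report the status GOOD, and that the types are named health_coordinatorsTotal, health_coordinatorsGood, health_dbserversTotal and health_dbserversGood so that the resulting metrics are called collectd_exec_health_*.

```shell
#!/bin/bash
# Sketch of /etc/collectd/arango_cluster_health.plugin.bash (assumptions, see above).
#
# Assumed types database entries, one per metric:
#   health_coordinatorsTotal  value:GAUGE:0:U
#   (likewise health_coordinatorsGood, health_dbserversTotal, health_dbserversGood)
#
# Assumed registration with the exec plugin:
#   LoadPlugin exec
#   <Plugin exec>
#     Exec "nobody:nogroup" "/etc/collectd/arango_cluster_health.plugin.bash" "http://coordinator.arangodb.local:8529"
#   </Plugin>

# Read the cluster health JSON on stdin and print one PUTVAL line per metric.
# $1 = host name to report, $2 = interval in seconds
health_to_putval() {
  jq -r --arg host "$1" --arg interval "$2" '
    [.Health[]] as $s
    | {
        coordinatorsTotal: ([$s[] | select(.Role == "Coordinator")] | length),
        coordinatorsGood:  ([$s[] | select(.Role == "Coordinator" and .Status == "GOOD")] | length),
        dbserversTotal:    ([$s[] | select(.Role == "DBServer")] | length),
        dbserversGood:     ([$s[] | select(.Role == "DBServer" and .Status == "GOOD")] | length)
      }
    | to_entries[]
    | "PUTVAL \"\($host)/exec/health_\(.key)\" interval=\($interval) N:\(.value)"
  '
}

# Poll the coordinator forever; collectd's exec plugin reads the PUTVAL lines from stdout.
# $1 = base URL of a coordinator
main() {
  host="${COLLECTD_HOSTNAME:-localhost}"
  interval="${COLLECTD_INTERVAL:-60}"
  while true; do
    curl --silent --fail "$1/_admin/cluster/health" | health_to_putval "$host" "$interval"
    sleep "$interval"
  done
}

# Only start polling when a coordinator URL is passed, as the exec plugin would do.
if [ "$#" -eq 1 ]; then
  main "$1"
fi
```

Splitting the JSON handling into health_to_putval keeps it testable without a running cluster: pipe a saved health document into the function and inspect the PUTVAL lines.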

Adding useful dashboards

The following JSON documents can be added to the rows array of the Grafana dashboard example shared above.

You can alternatively add them manually, by adding a panel of type Singlestat. Add one each for the total number of Coordinators and DBServers, using the metrics collectd_exec_health_coordinatorsTotal and collectd_exec_health_dbserversTotal, respectively. Go to the Options tab, and under Value, set Stat to Current. Then, add one each for the number of faulty Coordinators and DBServers.

As queries, use collectd_exec_health_coordinatorsTotal - collectd_exec_health_coordinatorsGood and collectd_exec_health_dbserversTotal - collectd_exec_health_dbserversGood, respectively.

Under Options, also set Stat to Current. Check the box Coloring → Background, set Thresholds to 1,1 and choose an all-clear color (e.g. green) as the first color and a warning color (e.g. red) as the second and third. That way, as soon as one server goes down, the panel turns red.

The following is a screenshot of a possible Grafana dashboard:

[Screenshot: Grafana dashboard]