arangobench

arangobench is ArangoDB’s benchmark and test tool. It can be used to issue test requests to the database system for performance and server function testing. It supports parallel querying and batch requests.

arangobench is a client tool that makes network connections to an ArangoDB server in much the same way as a client application would via an ArangoDB client driver. Its results are therefore often good enough as throughput and performance estimates. It provides several test cases that reflect a broad set of use cases. It is useful to pick and run the test cases that most closely resemble typical or expected workloads.

General configuration

arangobench can be run on the same host as the ArangoDB server, or on a different host. When using it against a cluster, it must be connected to one of the cluster’s Coordinators. It can communicate over normal (unencrypted) TCP connections and encrypted SSL/TLS connections.

The most important general arangobench options are:

  • --server.endpoint: the server endpoint to connect to. This can be a remote server or a server running on the same host. The endpoint also specifies whether encryption in transit (TLS) should be used. Multiple endpoints can be provided. Example:

    arangobench \
      --server.endpoint tcp://[::1]:8529 \
      --server.endpoint tcp://[::1]:8530 \
      --server.endpoint tcp://[::1]:8531 \
      ...
    
  • --server.username and --server.password: these can be used to authenticate with an existing ArangoDB installation.
  • --test-case: selects the test case to be executed by arangobench. A list of the available test cases can be retrieved by running arangobench with the --help option. For detailed descriptions see Test Cases.
  • --requests: total number of requests to be executed by arangobench in the selected test case. If batching is used, multiple operations will still be counted individually, even though they may be sent together in a single request.
  • --runs: number of test case runs to perform. This option defaults to 1, but it can be increased so that result outliers have less influence on the test results.
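
As a rough sketch of how these options combine, the following shell function wraps a single invocation. The function name run_basic_bench, the ARANGO_PASSWORD environment variable, the endpoint, the user name, and the request and run counts are placeholder assumptions, not recommended values; only the option names are taken from the list above.

```shell
# Hypothetical wrapper around one arangobench invocation.
# Endpoint, credentials and counts are placeholders; adjust them to your setup.
run_basic_bench() {
  # Optional first argument: endpoint to connect to (placeholder default).
  arangobench \
    --server.endpoint "${1:-tcp://127.0.0.1:8529}" \
    --server.username "root" \
    --server.password "${ARANGO_PASSWORD:-}" \
    --test-case "document" \
    --requests 100000 \
    --runs 3
}
```

Calling run_basic_bench "tcp://[::1]:8529" would execute the document test case three times with 100000 requests each against that endpoint.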

General options that can affect test case performance and throughput:

  • --concurrency: number of parallel threads used by arangobench. Increasing this value should normally increase throughput, unless some saturation or congestion is reached in arangobench itself, the network layer, or the ArangoDB instance. If increasing --concurrency does not improve throughput or even decreases it, it should be determined which part of the setup is the bottleneck.
  • --wait-for-sync: whether or not all write operations performed by test cases should be executed with the waitForSync flag. If this is true, write operations are blocking and only return once they have been fully acknowledged by the server’s disk subsystem(s). Setting --wait-for-sync to true will have a large negative impact on write performance and is thus not recommended for most use cases.
  • --async: if set to true, arangobench sends fire-and-forget requests. The server responds to these requests directly after adding them to its request processing queue, and arangobench does not wait for the operation to be fully executed on the server side. All it checks is whether the server was able to queue the operation(s). The arangobench test case may therefore complete before the queued operations have been fully processed on the server. In addition, sending more requests than the server can handle for a prolonged time may lead to the server’s scheduler queue filling up. The server-side scheduler queue has a limited capacity, and once it is full, any further incoming requests will be rejected by the server with HTTP 503 “Service unavailable” until there is again some capacity in the queue.
  • --batch-size: by default, arangobench will send one HTTP request per test case operation. This is often appropriate for test cases that execute a single AQL query per operation, where there is naturally no other request to batch the query with. However, in some use cases multiple operations can actually be sent together in a single HTTP request. The prime example for this is bulk-inserting documents, which are normally sent to the server in batches by client programs anyway. Any value greater than 1 will make arangobench send batch requests. Using batching should normally increase the throughput.
  • --complexity: some test cases can be adjusted via the --complexity parameter, which often controls the number of document attributes that are inserted in document-centric test cases.
  • --keep-alive: whether or not arangobench should use HTTP keep-alive connections. This should always be turned on.
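
One way to find the point at which additional threads stop helping is to repeat the same test case at several concurrency levels and compare the reported throughput. The sketch below assumes a hypothetical helper named sweep_concurrency; the test case, request count, and batch size are arbitrary illustration values.

```shell
# Hypothetical helper: run the same test case at several concurrency levels
# so that the throughput reported by arangobench can be compared per level.
# Usage: sweep_concurrency ENDPOINT THREADS...
sweep_concurrency() {
  endpoint="$1"
  shift
  for threads in "$@"; do
    echo "== concurrency: $threads =="
    arangobench \
      --server.endpoint "$endpoint" \
      --test-case "document" \
      --requests 100000 \
      --concurrency "$threads" \
      --batch-size 100 \
      --keep-alive true
  done
}
```

For example, sweep_concurrency "tcp://127.0.0.1:8529" 1 2 4 8 16 performs five runs. If throughput flattens or drops between two levels, the bottleneck is likely reached somewhere in that range.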

Important cluster-specific options are:

  • --number-of-shards: number of shards for collections created by arangobench. This option is only meaningful for test cases that create collections.
  • --replication-factor: number of replicas for each shard in collections created by arangobench. This option is only meaningful for test cases that write into collections. The larger the replication factor is, the more expensive write operations will become.
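
To get a feel for how much the replication factor costs on writes, the same insert workload can be repeated with different values. This is only a sketch: compare_replication is a hypothetical helper, the shard count and request count are placeholders, and a replication factor of 3 assumes that the cluster has at least three DB-Servers.

```shell
# Hypothetical helper: repeat an insert-only test case with increasing
# replication factors. The endpoint must point at a Coordinator.
compare_replication() {
  coordinator="$1"
  for rf in 1 2 3; do
    echo "== replication factor: $rf =="
    arangobench \
      --server.endpoint "$coordinator" \
      --test-case "document" \
      --requests 100000 \
      --number-of-shards 3 \
      --replication-factor "$rf"
  done
}
```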

Test Cases

arangobench provides the following predefined test cases. The test case to be executed can be selected via the --test-case startup option.

Note that these test cases have been added over time, and not all of them may be fully appropriate for a given workload test. Some test cases are deprecated and will be removed in a future version.

In order to benchmark custom AQL queries, the appropriate test case to run is custom-query.
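
A minimal sketch of such a run is shown below. The function name run_custom_query, the default endpoint, the request count, and the query itself are illustrative assumptions; the query was chosen because it does not depend on any existing collection.

```shell
# Hypothetical wrapper: benchmark one AQL query with the custom-query test case.
# The query string is passed via --custom-query and executed --requests times.
run_custom_query() {
  arangobench \
    --server.endpoint "${1:-tcp://127.0.0.1:8529}" \
    --test-case "custom-query" \
    --custom-query "FOR i IN 1..1000 RETURN i" \
    --requests 10000
}
```

For longer queries, the --custom-query-file option can be used instead to read the query from a file.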

The available test cases are:

  • aqlinsert: performs AQL queries that insert one document per query. The --complexity parameter controls the number of attributes per document. The attribute values for the inserted documents will be hard-coded, except _key. The total number of documents to be inserted is equal to the value of --requests.
  • aqltrx (deprecated): creates 3 empty collections and then performs different AQL read queries on these collections, partially using joins. This test was once used to test shard locking, but is now largely obsolete. In a cluster, it still provides a little value because it effectively measures query setup and shutdown times for concurrent AQL queries.
  • aqlv8 (deprecated): performs AQL queries that insert one document per query. The --complexity parameter controls the number of attributes per document. The attribute values for the inserted documents are generated using AQL functions RAND() and RANDOM_TOKEN(). The total number of documents to be inserted is equal to the value of --requests.
  • collection: creates as many separate (empty) collections as provided in the value of --requests.
  • counttrx (deprecated): executes JavaScript Transactions that each insert 50 (empty) documents into a collection and validates that collection counts are as expected. There will be 50 times the number of --requests documents inserted in total. The --complexity parameter is not used.
  • custom-query: executes a custom AQL query, that can be specified either via the --custom-query option or be read from a file specified via the --custom-query-file option. The query will be executed as many times as the value of --requests. The --complexity parameter is not used.
  • crud: will perform a mix of insert, update, get and remove operations for documents. 20% of the operations will be single-document inserts, 20% of the operations will be single-document updates, 40% of the operations are single-document read requests, and 20% of the operations will be single-document removals. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted and updated documents.
  • crud-append: will perform a mix of insert, update and get operations for documents. 25% of the operations will be single-document inserts, 25% of the operations will be single-document updates, and 50% of the operations are single-document read requests. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted and updated documents.
  • crud-write-read: will perform a 50-50 mix of insert and retrieval operations for documents. 50% of the operations will be single-document inserts, 50% of the operations will be single-document read requests. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted documents.
  • deadlocktrx (deprecated): creates two collections and executes JavaScript Transactions that first access one collection, and then the other. This test was once used as a means to detect deadlocks caused by collection locking, but is obsolete nowadays. The --complexity parameter is not used.
  • document: performs single-document insert operations via the specialized insert API (in contrast to performing inserts via generic AQL). The --complexity parameter controls the number of attributes per document. The attribute values for the inserted documents will be hard-coded. The total number of documents to be inserted is equal to the value of --requests.
  • edge-crud: will perform a mix of insert, update and get operations for edges. 25% of the operations will be single-edge inserts, 25% of the operations will be single-edge updates, and 50% of the operations are single-edge read requests. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted and updated edges.
  • hash: will perform a mix of insert, update and get operations for documents. The collection created by this test does have an extra, non-unique, non-sparse persistent index on one attribute. 25% of the operations will be single-document inserts, 25% of the operations will be single-document updates, and 50% of the operations are single-document read requests. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted and updated documents. This test case can be used to determine the effects on write throughput caused by adding a secondary index to a collection. It originally tested a hash index, but both the in-memory hash and skiplist index types were removed in favor of the RocksDB-based persistent index type.
  • import-document: performs multi-document imports using the specialized import API (in contrast to performing inserts via generic AQL). Each inserted document will have two attributes. The --complexity parameter controls the number of documents per import request. The total number of documents to be inserted is equal to the value of --requests times the value of --complexity.
  • multi-collection (deprecated): creates two collections and then executes JavaScript Transactions that first write into one and then the other collection. The documents written into both collections are identical, and the number of their attributes can be controlled via the --complexity parameter. There will be as many JavaScript Transactions as --requests, and twice the number of documents inserted.
  • multitrx (deprecated): creates two collections and then executes JavaScript Transactions that read from and write to both collections. There will be as many JavaScript Transactions as --requests. The --complexity parameter is ignored.
  • random-shapes (deprecated): will perform a mix of insert, get and remove operations for documents with randomized attribute names. 33% of the operations will be single-document inserts, 33% of the operations will be single-document reads, and 33% of the operations are single-document removals. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted documents.
  • shapes (deprecated): will perform a mix of insert, get and remove operations for documents with different, but predictable attribute names. 33% of the operations will be single-document inserts, 33% of the operations will be single-document reads, and 33% of the operations are single-document removals. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted documents.
  • shapes-append (deprecated): will perform a mix of insert and get operations for documents with randomized attribute names. 50% of the operations will be single-document inserts, and 50% of the operations will be single-document reads. There will be a total of --requests operations. The --complexity parameter can be used to control the number of attributes for the inserted documents.
  • skiplist (deprecated): identical to the hash test case nowadays.
  • stream-cursor (deprecated): creates 500 documents in a collection, and then performs a mix of AQL update queries (all on the same document) and a streaming AQL query that returns all documents from the collection. The --complexity parameter can be used to control the number of attributes for the inserted documents and the update queries. This test will trigger a lot of write-write conflicts with --concurrency bigger than 2.
  • version: queries the server version and then instantly returns. In a cluster, this means that Coordinators instantly respond to the requests without ever accessing DB-Servers. This test can be used to establish a baseline for single server or Coordinator throughput. The --complexity parameter is not used.
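
For several insert-oriented test cases, the number of documents written differs from the value of --requests. The following sketch only restates the arithmetic from the descriptions above; expected_documents is a hypothetical helper and not part of arangobench.

```shell
# Hypothetical helper: expected number of inserted documents for a test case,
# derived from the test case descriptions.
# Usage: expected_documents TEST_CASE REQUESTS [COMPLEXITY]
expected_documents() {
  test_case="$1"
  requests="$2"
  complexity="${3:-1}"
  case "$test_case" in
    counttrx)         echo $((requests * 50)) ;;          # 50 documents per transaction
    import-document)  echo $((requests * complexity)) ;;  # --complexity documents per import
    multi-collection) echo $((requests * 2)) ;;           # one document in each of two collections
    aqlinsert|aqlv8|document) echo "$requests" ;;         # one document per request
    *) echo "unknown"; return 1 ;;
  esac
}
```

For example, expected_documents import-document 1000 50 prints 50000, which can be compared against the collection count after the run.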

Troubleshooting

The test cases provided by arangobench vary significantly in how they perform operations. For example, inserting documents into ArangoDB can be achieved by either:

  • single-document inserts (one document per request to /_api/document)
  • multi-document inserts (multiple documents per request to /_api/document or /_api/import)
  • single-document AQL insert queries (one document per AQL query to /_api/cursor)
  • multi-document AQL insert queries (multiple documents per AQL query to /_api/cursor)
  • JavaScript Transactions (one or multiple documents per transaction to /_api/transaction)
  • other ways omitted here, such as Stream Transactions, because none of the current test cases makes use of them

Especially for insert operations, AQL queries and JavaScript Transactions will have higher setup and teardown costs than plain Document API operations. Thus, it is likely that higher throughput can be achieved by using the specialized Document APIs in throughput tests, rather than AQL queries or JavaScript Transactions.

Many test cases can benefit from request batching, which can be turned on in arangobench via the --batch-size option. Batching makes sense in cases where a client application would also send multiple operations in a single request, e.g. when inserting documents in bulk. Batching is often the easiest way to improve throughput.

If increasing --concurrency for a given benchmark does not increase throughput or even decreases it, it is likely that some saturation or congestion is occurring somewhere in the stack. On Linux systems, running top during the tests on the participating hosts should reveal details about CPU usage (user, system, iowait) and memory usage. This can be used as a quick first probe to see if any of the hosts is maxed out on a particular resource (CPU power, available RAM, I/O throughput). The ArangoDB servers also provide detailed metrics about CPU usage, I/O wait, memory usage etc., which can be monitored during the benchmarks. The most useful way of analyzing metrics is to have them scraped automatically by a Prometheus instance and to make them available via Grafana. This allows metrics to be collected over time and compared across multiple test runs with different configurations. If Prometheus/Grafana cannot be used, the ArangoDB web interface also provides a way to access the current values of all metrics for privileged accounts.

A few useful metrics on the ArangoDB server side include:

  • arangodb_process_statistics_resident_set_size: resident memory usage for an arangod process, in bytes. Can be used to determine how much memory was used during the tests. If the memory usage is close to the capacity limit, this indicates that adding more memory could speed things up.
  • arangodb_process_statistics_resident_set_size_percent: resident memory usage for an arangod process, in percent of available RAM. Can be used to determine how much memory was used during the tests. If the memory usage is close to the capacity limit, this indicates that adding more memory could speed things up.
  • arangodb_scheduler_queue_full_failures_total: contains the number of rejected requests because the scheduler’s queue capacity was exceeded. If this contains values other than 0, it indicates overload (i.e. more requests coming in than can be handled by the server).
  • arangodb_scheduler_queue_length: number of requests in the scheduler queue at a given point. This is expected to deviate from 0 if requests are queued, but the closer it is to the configured maximum queue length (default value: 4096), the closer the server is to its processing capacity limit.
  • arangodb_server_statistics_system_percent: on Linux, provides total %sys CPU time for a host (most likely used by arangod or arangobench).
  • arangodb_server_statistics_user_percent: on Linux, provides total %user CPU time for a host (most likely used by arangod or arangobench).
  • arangodb_server_statistics_iowait_percent: on Linux, provides total %iowait CPU time for a host (most likely used by arangod or arangobench).
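
For a quick look without a Prometheus setup, the metrics listed above can be picked out of a metrics dump in Prometheus text format. The sketch below assumes a hypothetical helper named filter_bench_metrics that reads such a dump on standard input.

```shell
# Hypothetical helper: keep only the metrics discussed above from a
# Prometheus-format metrics dump read on stdin. Comment lines (# HELP, # TYPE)
# and all other metrics are dropped.
filter_bench_metrics() {
  grep -E '^arangodb_(process_statistics_resident_set_size(_percent)?|scheduler_queue_full_failures_total|scheduler_queue_length|server_statistics_(system|user|iowait)_percent)([ {]|$)'
}
```

A possible usage, assuming the server exposes its metrics at /_admin/metrics/v2 on the given host and port and that the account is allowed to read them, is: curl -s -u "root:$ARANGO_PASSWORD" http://127.0.0.1:8529/_admin/metrics/v2 | filter_bench_metrics. The URL path, host, and credentials are assumptions here and may differ by version and deployment.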

Another thing to check is the arangobench location: if arangobench runs on the same host as the ArangoDB server, arangobench and arangod may compete for the same resources. This may be appropriate if the production use case will also use a localhost setup. If this is not the case, arangobench should be executed on a separate host and send its requests over the network, in the same way as the client applications are expected to do later.