ArangoDB can be scaled horizontally by using many servers, typically based on commodity hardware. This approach not only delivers performance and capacity improvements, but also achieves resilience through replication and automatic failover. Furthermore, one can build systems that scale their capacity dynamically up and down according to demand. ArangoDB can also be scaled vertically by using ever larger servers. This has the disadvantage that costs grow faster than linearly with the size of the server, and neither the resilience nor the dynamic scaling capabilities can be achieved in this way.
ArangoDB supports the key/value, document and graph data models; for documents one can additionally perform joins. These data models offer different opportunities for scalability. Broadly, scalability decreases from key/value over documents (and documents with joins) to graphs. In more detail, the different data models behave as follows:
The key/value store data model is the easiest to scale. In ArangoDB, this is implemented in the sense that a document collection always has a primary key attribute, and in the absence of further secondary indexes the collection behaves like a simple key/value store. The only operations that are sensible in this context are single key lookups and key/value pair insertions and updates. If the primary key is the only sharding attribute, then the sharding is done with respect to the primary key and all of these operations scale linearly. If the sharding is done using different shard keys, then a lookup of a single key involves asking all shards and thus does not scale linearly. Further improvements to the key/value part of ArangoDB are planned for the near future.
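As a minimal arangosh sketch (the collection name, keys and values are purely illustrative, and exact options may vary between versions), such a key/value workload looks like this:

```js
// Create a collection sharded by its primary key (_key is the default shard key).
db._create("kvstore", { numberOfShards: 8, shardKeys: ["_key"] });

// Single key/value insert, lookup and update: each operation is routed to
// exactly one shard, so throughput grows linearly with the number of DBservers.
db.kvstore.insert({ _key: "user:1234", value: { name: "Alice" } });
db.kvstore.document("user:1234");
db.kvstore.update("user:1234", { value: { name: "Alice B." } });
```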
Graph databases are particularly good at queries on graphs that involve paths of a priori unknown length. Finding the shortest path between two vertices in a graph, or finding all paths that match a certain pattern starting at a given vertex, are examples of such queries. If the vertices and edges along the occurring paths are distributed across the cluster, then more communication between nodes is necessary, and performance is worse than on a single node. If your use case allows it, you can configure the collections such that the vertices and their outgoing edges are stored on the same shard. All graph queries and functions are already optimized to take advantage of this, which can be helpful in some use cases. There are certainly more optimization opportunities, and they will be added during the 3.x development phase.
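For example, a shortest path traversal in AQL follows a path whose length is not known in advance. A sketch in arangosh (the verts and edges collections and vertex keys are illustrative):

```js
// Find one shortest path from vertex A to vertex B following outgoing edges.
// If the vertices and edges on the path live on different DBservers, the
// traversal needs network round trips between them.
db._query(`
  FOR v, e IN OUTBOUND SHORTEST_PATH 'verts/A' TO 'verts/B' edges
    RETURN v._key
`).toArray();
```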
For the document store case, essentially the same arguments apply even in the presence of secondary indexes, since an index of a sharded collection is simply the collection of local indexes, one for each shard. Each shard only holds the part of an index that is needed for the documents in that shard. Therefore, single document operations still scale linearly with the size of the cluster.
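A small arangosh sketch (collection and attribute names are illustrative): the index is declared once on the collection, and each shard maintains its local part of it.

```js
// A sharded document collection with a non-unique secondary index.
db._create("users", { numberOfShards: 8 });
db.users.ensureIndex({ type: "hash", fields: ["email"], unique: false });

// The insert goes to exactly one shard; the indexed filter is answered
// by each shard using its local part of the index.
db.users.insert({ _key: "u1", email: "alice@example.com" });
db._query("FOR u IN users FILTER u.email == 'alice@example.com' RETURN u").toArray();
```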
For a deeper analysis of this topic see this blog post in which good linear scalability of ArangoDB for single document operations is demonstrated.
The AQL query language allows complex queries, using multiple collections, secondary indexes as well as joins. In particular with the latter, scaling can be a challenge, since if the data to be joined resides on different machines, a lot of communication has to happen. The AQL query execution engine organises a data pipeline across the cluster to put together the results in the most efficient way. The query optimizer is aware of the cluster structure and knows what data is where and how it is indexed. Therefore, it can arrive at an informed decision about what parts of the query ought to run where in the cluster.
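As a hedged arangosh sketch (the customers and orders collections and their attributes are illustrative), db._explain() can be used to inspect how the optimizer distributes such a join across the coordinator and the DBservers before running it:

```js
var query = `
  FOR c IN customers
    FOR o IN orders
      FILTER o.customerId == c._key
      RETURN { customer: c.name, total: o.total }
`;

// Inspect the cluster-aware execution plan, then run the query.
db._explain(query);
db._query(query).toArray();
```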
The cluster architecture of ArangoDB is a CP master/master model with no single point of failure. With “CP” we mean that in the presence of a network partition, the database prefers internal consistency over availability. With “master/master” we mean that clients can send their requests to an arbitrary node, and experience the same view on the database regardless. “No single point of failure” means that the cluster can continue to serve requests, even if one machine fails completely.
Structure of a cluster
An ArangoDB cluster consists of a number of ArangoDB instances which talk to each other over the network. They play different roles, which will be explained in detail below. The current configuration of the cluster is held in the “Agency”, which is a highly available, resilient key/value store based on an odd number of ArangoDB instances running the Raft Consensus Protocol.
For the various instances in an ArangoDB cluster there are 4 distinct roles: Agents, Coordinators, Primary and Secondary DBservers. In the following sections we will shed light on each of them. Note that the tasks for all roles run the same binary from the same Docker image.
One or multiple Agents form the Agency in an ArangoDB cluster. The Agency is the central place to store the configuration of a cluster. It performs leader elections and provides other synchronisation services for the whole cluster. While generally invisible to the outside, it is the heart of the cluster. As such, fault tolerance is of course a must-have for the Agency. To achieve that, the Agents use the Raft Consensus Algorithm. The algorithm formally guarantees conflict-free configuration management within the ArangoDB cluster.
Coordinators are the ones the clients talk to. They will coordinate cluster tasks like executing queries and running Foxx services. They know where the data is stored and will optimize where to run user supplied queries or parts thereof. Coordinators are stateless and can thus easily be shut down and restarted as needed.
Primary DBservers are the ones where the data is actually hosted. They host shards of data, and with synchronous replication a Primary may be either leader or follower for a shard. They may also execute queries, in part or as a whole, when asked by a Coordinator.
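As an illustrative arangosh sketch (the collection name and numbers are arbitrary, and availability of the option may depend on the version), the number of synchronously kept copies per shard is chosen when a collection is created:

```js
// Each of the 4 shards gets one leader and one synchronous follower,
// placed on distinct Primary DBservers.
db._create("orders", { numberOfShards: 4, replicationFactor: 2 });
```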
Secondary DBservers are asynchronous replicas of primaries. If you are using only synchronous replication, you don’t need secondaries at all. For each primary, there can be one or more secondaries. Since the replication works asynchronously (eventual consistency), the replication does not impede the performance of the primaries. The secondaries are perfectly suitable for backups as they don’t interfere with the normal cluster operation.
Many sensible configurations
This architecture is very flexible and thus allows many configurations, which are suitable for different usage scenarios:
- The default configuration is to run exactly one coordinator and one primary DBserver on each machine. This achieves the classical master/master setup, since there is a perfect symmetry between the different nodes: clients can talk to any one of the coordinators equally well, and all of them expose the same view of the datastore.
- One can deploy more coordinators than DBservers. This is a sensible approach if one needs a lot of CPU power for the Foxx services, because they run on the coordinators.
- One can deploy more DBservers than coordinators if more data capacity is needed and the query performance is the lesser bottleneck.
- One can deploy a coordinator on each machine where an application server runs, and the Agents and DBservers on a separate set of machines elsewhere. This avoids a network hop between the application server and the database and thus decreases latency. Essentially, this moves some of the database distribution logic to the machine where the client runs.
The important piece here is that the coordinator layer can be scaled and deployed independently from the DBserver layer.