When your usage of ArangoDB begins to exceed what your hardware can support, it’s time to scale. While you can scale vertically, moving ArangoDB onto a better server, there are definite advantages to scaling horizontally, by linking several database servers into a cluster.
Clustering ArangoDB not only delivers better performance and capacity improvements, it also provides resilience through replication and automatic failover. Moreover, you can deploy systems that dynamically scale up and down according to demand.
ArangoDB supports three data models: key/value, document and graph. There are different ways to scale these models. The Key/Value store can scale more easily than the Document store or documents with join. The Document store in turn scales better than graphs.
ArangoDB implements the Key/Value store as a document collection that always has a primary key attribute. Absent secondary indexes, the collection always behaves as a simple key/value store. It is the easiest data model to scale.
The only sensible operations in this context are single key lookups and key/value pair insertions and updates. If the key attribute is the only sharding attribute, then sharding is done with the primary key and all operations scale linearly. If sharding is done using different shard keys, then a lookup of a single key involves asking all the shards and thus does not scale linearly.
Databases using Graph store are particularly good at queries that follow paths of an unknown length. For instance, finding the shortest path between two vertices in a graph or finding all paths that match a certain pattern starting at a given vertex.
When the vertices and edges along the path are distributed across a cluster, then the query requires more communication between the nodes and performance is worse than when the path is limited to a single node. If your use case allows it, you can configure the collection to store the vertices and their outgoing edges on the same shard. All graph queries and functions are optimized to handle this for you.
Using the Document Store is very similar to using the Key/Value Store. Since the index for a sharded collection is the same as the local index for each shard. Each shard holds only the parts of the index that it needs. Therefore, single document operations scale linearly with the size of the cluster.
The AQL query language allows for complex queries using multiple collections, secondary indexes as well as joins. In particular, with joins, scaling can be a challenge. Since the joined data can reside on different machines, it can require a lot of communication between the nodes. The AQL execution engine organizes a data pipeline across the cluster to put together the results in the most efficient manner. The query optimizer is aware of the cluster structure and knows what data is where and how it’s indexed. Therefore, it can arrive at an informed decision about what parts of the query ought to run where in the cluster.
For a deeper dive into the linear scalability of document operations in ArangoDB, see this blog post.
Following the CAP theorem, clusters in ArangoDB use the CP master/master model with no single point of failure. When cluster encounters a network partition, ArangoDB prefers to maintain its internal consistency over availability. Clients experience the same view of the database regardless of which node they connect to. And, the cluster continues to serve requests even when one machine fails.
Structure of a Cluster
ArangoDB clusters consist of a number of ArangoDB instances that talk to each other over the network. These instances play different roles in the cluster, as agents, primary and secondary database servers, and coordinators. ArangoDB holds the current cluster configuration in the Agency, a highly available and resilient key/value store kept on an odd number of ArangoDB instances running the Raft Consensus Protocol.
The Agency key/value store consists of one or more ArangoDB instances in the cluster serving as agents. It is a central place used to store the configuration of the cluster. It performs leader elections and provides other synchronization services.
Although it is generally unseen to the outside world, the Agency is the heart of the cluster. To achieve the necessary fault tolerance, agents use the Raft Consensus algorithm to formally guarantee conflict free configuration management within the ArangoDB cluster.
When clients communicate with the ArangoDB cluster, they talk to coordinators. These instances coordinate tasks such as executing queries and running Foxx services. They know where the data is stored and optimize where to run user supplied queries. Coordinators remain stateless, so you can easily shut them down and restart as needed.
Primary Database Servers
ArangoDB instances that actually host data are the Primary database servers. Each of these hosts shards of data and uses synchronous replication to function as either a leader or follower for the shard. They then execute queries in part or as a whole when asked by a coordinator.
Secondary Database Servers
Where primaries handle replication synchronously, Secondary database servers provide asynchronous replicas of the primaries. Each primary in your cluster can have one or more secondaries.
Since replication works asynchronously with eventual consistency, the process doesn’t impede the performance of the primary servers. Also, given that they don’t otherwise interfere with normal cluster operations, the secondaries are perfectly suitable for backups.
The architecture of ArangoDb clusters is very flexible, allowing for many different configuration and deployment variations to suit your particular needs.
- Default Configuration: You might start by running one coordinator and one primary database server on each machine in your cluster. This provides the classical master/master setup. With perfect symmetry between nodes, clients can talk equally well to any of the coordinators and all serve the same view of the data store.
- Coordinator Emphasis: You can deploy more coordinators than database servers. This approach works best in cases where you need more CPU power for Foxx services. Foxx services run on coordinators. By adding machines that only host coordinators, you allow for greater CPU usage for these services.
- Database Server Emphasis: You can deploy more database servers than coordinators.This approach is useful in cases where you need to increase data capacity more than you need to improve query performance. It introduces a lesser bottleneck as there are fewer coordinators to receive client connections.
- Application Coordinators: You can deploy coordinators on the application server.This approach moves the database distribution logic from a remote host to the same server that initiates the client connection, separating it from the agents and the database servers. It also avoids a network hop between the application server and the database, decreasing latency.
The takeaway from these various examples is that you can scale and deploy the coordinator layer of an ArangoDB cluster independent from the database server layer.