UPDATE: ArangoDB 2 introduced sharding! 🙂
Original blog post:
In ArangoDB’s google group there was recently an interesting discussion on what ArangoDB should offer in terms of replication and sharding. For the rest of you who does not follow the posts in the group, I have copied Frank Celler’s answer into this post:
We will start with a master-slave, asynchronous replication for 1.4. This has at least the following advantages:
- It is a good fit for most use cases.
- It will allow us to implement backup as “slave”.
- It easily gives you redundancy by setting up multiple instances.
- It gives you read-scaling.
There are also drawbacks. For example, you need to manually select and switch masters in case of fail-over. However, restricting to a simple solution (which is still hard enough to implement) should allow us to release V1.4 this summer. If you think about MySQL, you will see that in most case a master-slave replication is sufficient.
The next step will be master-master replication. This, however, requires more complex protocols like Paxos to elect a master and at least three nodes. We have to decide, if this will be in version 1.5 or maybe already 2.0. We have to see how much has to be changed.
We do not want to go the road Riak has chosen, to scale out unlimitedly.
In our experience there really is a trade-off between “scaling” and “querying”. If you scale massively, than you restrict yourself basically to key/value queries. That is what Riak is extremely good at. That is not what we are aiming at. We want to replicate to a moderate number of computers and assume that most of the time the whole dataset fits onto a single computer. With SSD and RAM getting cheaper and cheaper, this is not a totally unreasonable assumption. However, there will be cases where we want to shard the data. If we do this, it will restrict the possible queries for that collection in some way. Maybe restricting complex queries to one shard or only allowing very simple queries on sharded collections. The same is true for transactions.
As Frank Meyer has pointed out, there will problems, if for instance you implement something based on the graph features and you become the next Twitter. However, if this happens no automagic sharding or replication will help you. Amazon changed their view of the world and throw away landmarks like consistency, so that they can put their vast amount of data into Dynamo. But this does require carefully planning of the application. If you have so much data, then you end up with Dynamo or Riak or some-other massively distributed key-value store.
With our replication and sharding feature, we would like to implement something useful for a lot of cases, but not for extreme ones. For instance, master-master replication makes automatic fail-over much easier. The setup will be more complex, because you need at least three nodes and you have to deal with inconsistencies. Various approaches exist. CouchDB keeps all the document versions, CouchBase only the newest, Riak has vector-clocks and CFRD (Conflict-Free Replicated Data) for some types. If we use conflict documents, then the applications need to be aware of them and handle them accordingly. Our idea here is, again, to start with something straight forward (e.g. last-write wins) and later give an alternative (e.g. conflict documents). The simple solution should be helpful in most cases; harder cases can be dealt with; extreme cases however require support from the application.
We are really looking forward to hearing your thoughts on this – let us know in the comments, via twitter, in the Google group or via good old email.