
Open Source DC/OS: The modern way to run a distributed database

The mission of ArangoDB is to simplify the complexity of data work. ArangoDB is a distributed, native multi-model NoSQL database that supports JSON documents, graphs, and key-value pairs in one database engine with one query language. Its cluster management is based on Apache Mesos, a battle-hardened technology. With the launch of DC/OS by a community of more than 50 companies, all ArangoDB users can now scale easily.

Until recently, the setup, management, and maintenance of a database cluster were a world of pain. Everybody who has put effort into getting automatic failover to work, or who has updated a database cluster, knows what I am talking about. Many of us have experienced calls at 4 am notifying us that something within the cluster just went bad. Say hello to the Fail Whale.


We have now entered a new era in which open-source technology is all you need to run distributed applications at scale. With DC/OS you can put your data center on autopilot and scale automatically to current needs. Best of all, this technology is backed by a huge community and by reputable enterprises. The whole ArangoDB team is thrilled about Mesosphere going open source with DC/OS!

DC/OS is the easiest way to run an ArangoDB cluster in production. Deploying ArangoDB is as easy as typing

dcos package install arangodb

or clicking on a button in the Mesosphere UI. If you need to edit parameters, you can either edit a JSON file or use the GUI.
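
As a rough illustration of the JSON route, here is a minimal Python sketch that writes an options file and assembles the matching install command without running it. It assumes the standard `--options=<file>` flag of `dcos package install`. The file name `arangodb-options.json` and the key names inside `overrides` are hypothetical placeholders, not the real ArangoDB package parameters, so check the package's own configuration description for the actual names.

```python
import json
import shlex
from pathlib import Path

# Hypothetical overrides: these key names are placeholders only and are
# NOT taken from the real ArangoDB package configuration schema.
overrides = {
    "arangodb": {
        "example-parameter": "example-value",
    }
}


def write_options(path, options):
    """Write the overrides to a JSON file and return the file's path."""
    path.write_text(json.dumps(options, indent=2) + "\n", encoding="utf-8")
    return path


def install_command(options_file):
    """Build, but do not run, the install command that uses the options file."""
    return ["dcos", "package", "install", "arangodb", f"--options={options_file}"]


options_file = write_options(Path("arangodb-options.json"), overrides)
command = install_command(options_file)

# Print the command so it can be reviewed before running it by hand.
print(shlex.join(command))
```

The sketch only prints the command, so the options file can be reviewed before anything is deployed.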


ArangoDB is a distributed and stateful application. There are many reasons for using distributed systems: scaling out for performance or data size, and adding resilience and fault tolerance, are probably the most common ones. Massive amounts of computing and storage power in the form of farms of commodity hardware have become relatively cheap and universally available. All this needs to be managed, and “cluster operating systems” are a logical consequence.

When we started to look for a new basis for our cluster management some months ago, we wanted to realize two ideas. First and foremost, we needed a really simple way for our users to deploy, manage, and maintain an ArangoDB cluster. Second, we strongly believe that it does not make sense for a database to implement its own low-level cluster management.

The following and similar questions should not be the business of the distributed database:

  • Has a machine or a task been lost?
  • Which machine or task has been replaced automatically by another one?
  • Where and with which resources do they run?

Rather, these details should be handled automatically by the infrastructure. Other issues are genuine duties of the database, such as distributing the execution of a query across the cluster, handling replication of data, and deciding when a transaction commit was successful. In our terminology, we call this “high-level cluster management”.

In summary, we need an infrastructure that makes it easy to administer a database cluster and that provides the low-level cluster management. Furthermore, it needs to be a solution that has demonstrated that it works at enterprise level. The latter is crucial, since relying on another system for the low-level cluster management is a large leap of faith. We see the future of distributed computing in systems like DC/OS. We view Apache Mesos with its persistence primitives as a solid basis for our first version of a modern cluster management for databases. DC/OS extends this basis with a multitude of useful capabilities and thus provides a solid enterprise-level foundation.

Teaming up with Mesosphere was an exciting and valuable time for us, because we could draw on their broad and in-depth experience with large distributed environments, which they gained at Twitter and Airbnb. The ArangoDB team worked hard, and ArangoDB became the first fully certified operational database for DC/OS, including the persistence primitives. It is currently the only multi-model database available for the DC/OS environment.

As we are open source ourselves, the ArangoDB team is more than happy to see Mesosphere going open source as well and contributing even more to the community.

For users of ArangoDB, this means that they can now:

  • Deploy ArangoDB to clusters with a single command-line installation and scale it horizontally with ease.
  • Follow a straightforward path to making ArangoDB available on all major cloud platforms that DC/OS supports (AWS, Azure, Google Cloud Platform) without relying on closed, proprietary, or expensive software.
  • Use the persistent volume primitives of Mesos, which solve the problem of persisting data in clusters and provide predictable restoring of data across failures.
  • Leverage service discovery to connect distributed applications with the distributed ArangoDB cluster instances.

admin
