Static binaries for a C++ application

00GeneralTags: , ,
TL;DR; This describes how to generate a completely static binary for a complex C++ application which runs on all variants of Linux without any library dependency.

ArangoDB is a multi-model database written in C++. It is a sizable application with an executable size of 38MB (stripped) and quite some library dependencies. We provide binary packages for Linux, Windows and MacOS, and for Linux we cover all major distributions and their different versions, which makes our build and delivery pipeline extremely cluttered and awkward. At the beginning of this story, we needed approximately 12 hours just to build and publish a release, if everything goes well. This is the beginning of a tale to attack this problem. Read more

Performance Impact of Meltdown and Spectre V1 Patches on ArangoDB

01GeneralTags: , ,

To investigate the impact of the Meltdown and Spectre patches on the performance of ArangoDB, we ran benchmark tests with the two storage engines available in ArangoDB (MMFiles & RocksDB). We used the arangobench benchmark and test tool for these tests.

The tests include 10 different test cases with changing test parameters like concurrency, batch requests and asynchronous execution. Read more

Milestone 1 ArangoDB 3.3: Datacenter to Datacenter Replication

02General, ReleasesTags: ,

Every company needs a disaster recovery plan for all important systems. This is true from small units like single processes running in some container to the largest distributed architectures. For databases in particular this usually involves a mixture of fault-tolerance, redundancy, regular backups and emergency plans. The larger a data store, the more difficult is it to come up with a good strategy.

Therefore, it is desirable to be able to run a distributed database in one datacenter and replicate all transactions to another datacenter in some way. Often, transaction logs are shipped over the network to replicate everything in another, identical system in the other datacenter. Some distributed data stores have built-in support for multiple datacenter awareness and can replicate between datacenters in a fully automatic fashion.

This post gives an overview over the first evolutionary step of ArangoDB towards multi-datacenter support, which is asynchronous datacenter to datacenter replication.

Read more

Sorting number strings numerically

00GeneralTags:

Recently I gave a talk about ArangoDB in front of a community of mathematicians. I advertised that nearly arbitrary data can “easily” be stored in a JSON based document store. The moment I had uttered the word “easily”, one of them asked about long integers. And if a mathematician says “long integer” they do not mean 64bit but “properly long”. He actually wanted to store orders of finite groups. I said one should use a JSON UTF-8 string for this but I should have seen the next question coming because he then wanted that a sorted index would actually sort the documents by the numerical value stored in the string. But most databases – and ArangoDB is no exception here – will compare UTF-8 strings lexicographically (dictionary order). Read more

The new Satellite Collections Feature of ArangoDB

01GeneralTags: ,

With the new Version 3.2 we have introduced a new feature called Satellite Collections. This post explains what this is all about, how it can help you, and explains a concrete use case for which it is essential.

Background and Overview

Join operations are very useful but can be troublesome in a distributed database. This is because quite often, a join operation has to bring together different pieces of your data that reside on different machines. This leads to cluster internal communication and can easily ruin query performance. As in many contexts nowadays, data locality is very important to avoid such headaches. There is no silver bullet, because there will be many cases in which one cannot do much to improve data locality.

One particular case in which one can achieve something, is if you need a join operation between a very large collection (sharded across your cluster) and a small one, because then one can afford to replicate the small collection to every server, and all join operations can be executed without network communications.

Read more

Starting an ArangoDB cluster the easy way

01cluster, GeneralTags: ,

Recently, we have got a lot of feedback about the fact that standing up an ArangoDB cluster “manually” is an awkward and error-prone affair. We have been aware of this for some time, but always expected that most users running ArangoDB clusters would do so on Apache Mesos or DC/OS, where deployment is a breeze due to our ArangoDB framework.

However, for various valid reasons people do not want to use Apache Mesos and thus are back to square one with the problem of deploying an ArangoDB cluster without Apache Mesos.

Manual cluster set-up

So we have listened to this, and have looked what other distributed databases offer, and have put together a tool called arangodb (as opposed to arangod) to help you. It essentially gives you the following experience: Read more

Deploying an ArangoDB 3 Cluster with 2 Clicks

00cluster, DC/OSTags: ,

Hurray! Last week finally saw the release of ArangoDB 3.0 with lots of new features and in particular various improvements for ArangoDB clusters. In this blog post, I want to talk about one aspect of this, which is deployment.

DC/OS

As of last Wednesday, deploying an ArangoDB 3.0 cluster on DC/OS has become even simpler, because the new version of our framework scheduler has been accepted to the DC/OS Universe. Therefore, deployment is literally only two clicks: Read more

Running ArangoDB 3.0.0 on a DC/OS cluster

00Architecture, clusterTags: ,

As you surely recognized we´ve released ArangoDB 3.0 a few days ago. It comes with great cluster improvements like synchronous replication, automatic failover, easy up- and downscaling via the graphical user interface and with lots of other improvements. Furthermore, ArangoDB 3 is even better integrated with Apache Mesos and DC/OS. Read more

Read the latest NoSQL Performance Benchmark 2018: MongoDB, PostgreSQL, OrientDB, Neo4j and ArangoDB