The latest edition of the NoSQL Performance Benchmark (2018) has been released.

It’s time for another update of my NoSQL performance blog series. This hopefully concludes the first part of this series with the initial databases ArangoDB, MongoDB, Neo4J and OrientDB and I can now start to check out other databases. I’m getting a lot of requests to test others as well and I’ll try to add them as soon as possible. Pull requests to my repository are also more than welcome. Remember it is all open-source.

The first set of benchmarks was started as a proof that multi-model can compete with specialized solutions and I started with the corresponding top dogs (Neo4J and MongoDB) for graphs and documents. After the first blog post, we were asked by the community to include OrientDB as the other multi-model database, too, which makes sense and therefore I expanded the initial lineup.

Concluding the tests did take a bit longer than expected, because vendors took up the challenge and improved their products with impressive results – as we asked them to do. Still, for each iteration we needed some time to run all tests, see below. However, on the upside, everyone can benefit from the improvements, which is an awesome by-product of the benchmark tests.

For the impatient awaiting new test results: running the tests takes a lot of time, especially with cloud machines. But cloud machines are the only way to allow everyone to verify the results. In my experience, server performance can easily fluctuate by more than 10% from day to day, and the underlying hardware sometimes gets upgraded from one day to the next, rendering old results useless. Therefore, in order to get comparable results, I needed to rerun all tests for all databases. Each run of the tests consists of 5 cycles, which takes a lot of wall-clock time. We tried to react to improvements suggested by others as fast as possible given the above restrictions (and restrictions imposed by our other projects).
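As a rough illustration of why old and new numbers cannot simply be mixed, here is a minimal Python sketch that treats two runs as different only if their medians differ by more than the 10% day-to-day fluctuation mentioned above. The function names and the choice of the median are assumptions made for this example, not the way the benchmark itself aggregates its cycles.

```python
from statistics import median

# 10% day-to-day fluctuation, taken from the text; everything else is illustrative.
NOISE_THRESHOLD = 0.10


def summarize(cycle_seconds):
    """Return (median, relative spread) for the cycle timings of one run."""
    mid = median(cycle_seconds)
    spread = (max(cycle_seconds) - min(cycle_seconds)) / mid
    return mid, spread


def is_meaningful_difference(run_a, run_b, threshold=NOISE_THRESHOLD):
    """True if the medians of two runs differ by more than the noise threshold."""
    a, _ = summarize(run_a)
    b, _ = summarize(run_b)
    return abs(a - b) / min(a, b) > threshold
```

With a check like this, a 5% change between a run from last week and a run from today would not count as an improvement, which is why all databases have to be rerun together.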

The tests should be as reproducible as possible. We want people to use and check the results for themselves. Therefore we needed to create scripts to set up the data files. Sometimes this is a simple call to an import program provided by the database, sometimes it is an ETL definition file. There have been complaints that we do not react immediately. For instance, with OrientDB we started out with a node importer, which was faulty when restarted. With the help of Luca we got an ETL script on the 3rd of July, but we still needed to sort out import issues with UTF-8 and reports about missing nodes (aka Profiles) that came in on the 5th of July. Again, it is not a trivial task to check 30 million edges. On the other hand, even a single missing edge can have an enormous impact if it changes a shortest path. Therefore these inconsistencies needed to be sorted out. It turned out that the original dataset is clean, that our parallel node importer created too many edges (we did not notice this at first, which is a shame; we therefore paid extra attention when creating them anew and updated the data files as soon as they became stable), and that the parallel ETL script lost nodes when running. The final solution is now to run the ETL script single-threaded. This solution has been published in my repository as a convenient import script. Note that a single run takes 1.5 hours.
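The kind of consistency check described above can be sketched in a few lines of Python. This is only an illustration: it assumes both the source dataset and the imported data can be read back as (from, to) pairs, and the name compare_import is hypothetical, not part of the published import scripts.

```python
from collections import Counter


def compare_import(source_edges, imported_edges):
    """Compare two edge lists given as iterables of (from_id, to_id) pairs.

    Reports edges the importer lost, edges it duplicated or invented,
    and nodes that no longer appear in any imported edge.
    """
    source = Counter(source_edges)
    imported = Counter(imported_edges)

    # Counter subtraction keeps only positive counts, so each difference
    # lists exactly the surplus on one side.
    missing = source - imported
    extra = imported - source

    source_nodes = {node for edge in source for node in edge}
    imported_nodes = {node for edge in imported for node in edge}

    return {
        "missing_edges": sorted(missing.elements()),
        "extra_edges": sorted(extra.elements()),
        "missing_nodes": sorted(source_nodes - imported_nodes),
    }
```

Note that this sketch only sees nodes that take part in at least one edge. Isolated profiles would need a separate comparison of the node lists.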

The discussion regarding the OrientDB node.js driver escalated into a flame war. In retrospect, the discussion about the driver was a complete waste of time, because the only change to the driver has been that the name of the original author was replaced by a different name, which feels a bit eerie for an open-source project. All the impressive improvements have been made inside the server. You can think what you like about benchmark tests; they must always be taken with a grain of salt. But even with this simple test, the newest version of OrientDB is now a factor of a thousand faster than the old one for the shortest path.

In the meantime a blog by someone else has been published. We feel a bit like Project Voldemort, because any mention of ArangoDB has been anonymised in this new blog. I’m not that afraid to mention the name: OrientDB, OrientDB, OrientDB 😉 But the results are a bit strange. I tried to reproduce the results, using both our freshly created database and the database provided for download in that blog, as well as the versions 2.1Rc5 and 2.2alpha provided there and the current version 2.2 from the git repository. The results are not fully understandable. The aggregation time given in the blog is 5ms. Achieving such a result without any query cache would be hard, even in assembler. With 8 cores and 8 hyper-cores at 2.3 GHz (GCE spec), you get roughly 37 million instructions per millisecond, or about 184 million instructions in 5ms. With 1.6 million profiles, that is roughly 115 instructions per profile entry. This figure ignores the cost of loading data into the L1 cache and the limited parallelism of hyper-cores, and it assumes an optimal distribution of the work across all 16 cores.
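The back-of-the-envelope calculation above can be spelled out as a short Python sketch. It uses only the figures quoted in the text and assumes, optimistically, that each of the 16 hardware threads retires one instruction per clock cycle. The constant and function names are made up for this illustration.

```python
# Figures quoted in the text above.
HARDWARE_THREADS = 16   # 8 cores + 8 hyper-cores
CLOCK_HZ = 2.3e9        # 2.3 GHz (GCE spec)
PROFILES = 1.6e6        # profile entries to aggregate
CLAIMED_MS = 5          # aggregation time published in the other blog


def instructions_per_ms(threads, clock_hz):
    """Upper bound per millisecond, assuming one instruction per cycle per thread."""
    return threads * clock_hz / 1000.0


def budget_per_profile(threads, clock_hz, profiles, elapsed_ms):
    """Instructions available per profile entry within the claimed time."""
    return instructions_per_ms(threads, clock_hz) * elapsed_ms / profiles


if __name__ == "__main__":
    per_ms = instructions_per_ms(HARDWARE_THREADS, CLOCK_HZ)
    budget = budget_per_profile(HARDWARE_THREADS, CLOCK_HZ, PROFILES, CLAIMED_MS)
    print(f"{per_ms / 1e6:.1f} million instructions per ms")
    print(f"{budget:.0f} instructions per profile entry")
```

Even under these generous assumptions, the whole budget for reading, grouping and counting one profile entry is on the order of a hundred instructions.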

Therefore I used the OrientDB console to verify the result for the aggregation:

This matches the number of items expected and the timings of the node driver, not the 5ms published in the blog. Unfortunately, the OrientDB blog does not allow comments, so we cannot ask what the magic trick is, nor are we allowed to post to the Google group. Therefore we currently stick to the runtimes observed on the Google machine. We would love to hear from you if you can confirm the 5ms for the aggregation using either the console or the benchmark. Our blog is open for comments, but moderated to avoid flame wars and pure marketing articles. Feel free to publish your results here.

Further optimizations of ArangoDB

We have optimized our shortest path in ArangoDB 2.7.0 alpha 1 as well. The results make us very proud. We believe this shows how much improvement is possible with C++. The calculation of the shortest paths needs only 34ms in ArangoDB 2.7. The neighbors test has also become 20% faster. We have not yet exploited all possibilities. For us, as for all the others, the NodeJS driver is currently the bottleneck, and we are working on a new solution here. Other areas will also show that many improvements are still possible.

We do not want to fall behind other offerings. If someone has performance problems with ArangoDB, we will be happy to help, and I am sure that together we will find a solution. I do not advertise this like a shopping channel solution with a money-back guarantee. It comes free of charge for ArangoDB if it is a bug; that is what we expect of ourselves.

Finally a short disclaimer

We are mostly in memory: you need at least 14 GByte of free memory to run the tests. If you use a smaller machine, the results will be different and worse for us. Neo4J has a very small memory footprint when running the tests and will “benefit” from less memory. We use a typical client/server setup. If your application allows you to embed a Java-based database, you might get a performance boost by avoiding network communication. On the other hand, if you can embed a C++ driver, ArangoDB will be much faster than with the node driver. So you have to decide what your target architecture will look like before basing any decision on this benchmark.

Updated Results

We used ArangoDB 2.7.0 Alpha, OrientDB 2.2 Alpha, Neo4J 2.3 Snapshot (provided by Michael Hunger) and MongoDB 3.0.3.

Multi-Model benchmark chart

Multi-Model benchmark result


Update

OrientDB has revealed the secret behind the unbelievable performance improvements in 2.2 alpha. For computing the aggregation they need no more than 5ms, which is faster than any analytic engine. The secret is that with 2.2 alpha OrientDB has implemented a kind of query cache (see the update at the bottom of “Our Take on NoSQL DBMS Benchmarks”).

We did not test the query caches, and we explained why at the start of our performance series: we wanted to test the performance of the databases and their algorithms, not the efficiency of the query cache implementations (see the first blog post, “Native multi-model can compete with pure document and graph databases”).

Other tested products have also implemented such functionality, namely Neo4J and ArangoDB 2.7. We have explicitly not used these caches. If only one product uses a query cache and then advertises it as an incredible performance improvement, this is comparing apples and oranges. If one adds such a feature and uses it for a benchmark, it should of course be communicated clearly to all users and potential users. Or even better, run two comparisons: one with and one without the cache enabled in all products.
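To make the apples-and-oranges point concrete, here is a toy Python sketch of what a query cache does to a repeated benchmark query. It is not the cache of any of the tested products. The class CachedAggregator, the "age" field and the query key are all hypothetical, chosen only to show that after the first call a cached run measures a lookup rather than the aggregation algorithm.

```python
import time
from collections import Counter


def aggregate(profiles):
    """Uncached aggregation: scans every record on every call."""
    return Counter(p["age"] for p in profiles)


class CachedAggregator:
    """Toy query cache: stores the result of a query under its key."""

    def __init__(self, profiles):
        self._profiles = profiles
        self._cache = {}
        self.scans = 0  # number of full scans actually performed

    def query(self, key="count_by_age"):
        if key not in self._cache:
            self.scans += 1
            self._cache[key] = aggregate(self._profiles)
        return self._cache[key]


def time_ms(fn, repeats=5):
    """Run fn several times and return the wall-clock time of each call in ms."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000)
    return timings
```

Timing `aggregate` five times performs five full scans, while timing `CachedAggregator.query` five times performs one scan and four dictionary lookups. A fair comparison would report both variants for every product.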