TL;DR: Our initial benchmark has raised a lot of interest. Initially we wanted to show that a multi-model database can compete with other solutions. Because we conducted the benchmark in an open and competitive way, the discussions around it have led to improvements in all products: better algorithms, faster drivers and better ways to use the databases.
From the outset we published all code and data and asked the vendors of all tested products as well as the general public not only to run the tests on their own machines, but also to suggest improvements in the data models, test code, database configuration, driver usage and server configuration. This led to a lively discussion, lots of pull requests and even to the release of improved versions of the database products themselves!
This process exceeded all our expectations and is yet another great example of community collaboration not only for fact finding but also for product improvements. Obviously, the same benchmark code will always show slightly different results when run on different hardware, operating systems, network setups and with more or less RAM. Therefore, a reliable result of a benchmark can essentially only be achieved by allowing everybody to run it on their own machines.
The technical setup is described in the above blog post. Let me briefly repeat the key facts.
We wanted to test a client/server setup, where the client is implemented in node.js. The server and the client run on different machines.
We took realistic data from a social network that allowed us to run document-based as well as graph queries. For more details, see here.
For some databases it is possible to define a schema on the profile data set. As we wanted to test the schema-less implementation of the database engines, we have not defined a schema – with the exception of _key, which is defined as a string and has a unique hash index for fast lookup.
The test cases assume that there is enough main memory available. The test machine has 60GB of memory. If you run the tests on a machine with only a few GBs of RAM, the results will look different. But that is not the test case we had in mind, because in a production environment you normally want to avoid swapping at all costs. That is why we also measured the memory usage.
We have received many requests to test a particular database. The testing framework is open source and available on GitHub at
If you have a database DatabaseDB that you want to test, please create a directory called databasedb and within this directory provide a description.js file, which implements the database calls used in the tests. If possible, create an import.sh to generate the database – for example, see the script import.sh in neo4j. Then issue a pull request and we will run the tests on the GCE used for the initial tests.
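As a rough illustration of what such a file might look like, here is a minimal sketch of a description.js for the hypothetical DatabaseDB. The function names (startup, getDocument, saveDocument) and their callback signatures are assumptions made for this example, not the framework's actual interface, and an in-memory Map stands in for a real driver. The description.js of a database that is already in the repository remains the authoritative template.

```javascript
// databasedb/description.js: hypothetical sketch, NOT the framework's real interface.
// An in-memory Map stands in for the DatabaseDB driver so the file runs on its own.
'use strict';

const description = {
  name: 'DatabaseDB',

  // Open a "connection" and hand it to the callback (node.js style: error first).
  startup(host, cb) {
    const db = { host, collections: new Map() };
    cb(null, db);
  },

  // Single read: fetch one document by its _key.
  getDocument(db, collectionName, key, cb) {
    const coll = db.collections.get(collectionName);
    const doc = coll ? coll.get(key) : undefined;
    if (doc === undefined) {
      return cb(new Error('document not found: ' + key));
    }
    cb(null, doc);
  },

  // Single write: store one document under its _key.
  saveDocument(db, collectionName, doc, cb) {
    if (!db.collections.has(collectionName)) {
      db.collections.set(collectionName, new Map());
    }
    db.collections.get(collectionName).set(doc._key, doc);
    cb(null, doc);
  }
};

module.exports = description;
```

A real implementation would replace the Map operations with calls to the database's node.js driver and add the graph and aggregation calls used by the tests.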
New Products / Versions
New versions of Neo4J and OrientDB are available. Therefore we reran the tests. Michael Hunger has pointed out that the single write test compares apples and oranges, as Neo4J guarantees durability. We have therefore split this test into two use cases: single write and single write sync. The latter waits until the write has been synced to disk.
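The difference between the two use cases can be illustrated outside any database with plain file I/O in node.js. The sketch below is an analogy under assumptions, not how any of the tested databases implements writes: writeFast returns once the operating system has accepted the data, whereas writeSynced additionally calls fsync, so it only returns after the operating system reports the data as flushed to the storage device. The function names and the scratch file are made up for the example.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// "single write": hand the data to the OS and return; it may still sit in the page cache.
function writeFast(fd, line) {
  fs.writeSync(fd, line + '\n');
}

// "single write sync": also wait until the OS reports the data flushed to the device.
function writeSynced(fd, line) {
  fs.writeSync(fd, line + '\n');
  fs.fsyncSync(fd);
}

// Usage: append two records to a scratch file in the temp directory, then clean up.
const file = path.join(os.tmpdir(), 'write-sync-demo-' + process.pid + '.log');
const fd = fs.openSync(file, 'w');
writeFast(fd, 'record 1');
writeSynced(fd, 'record 2');
fs.closeSync(fd);
const contents = fs.readFileSync(file, 'utf8');
fs.unlinkSync(file);
```

Both functions leave the same bytes in the file. They differ only in when the caller gets control back, which is why the two variants are measured separately.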
Changes in the OrientDB Test
A new version 2.1 RC 4 of OrientDB is available. This version implements the shortest path algorithm in a two-sided way (look at the remark).
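To make the term concrete, here is a minimal sketch of a two-sided (bidirectional) breadth-first search. It is a generic textbook version, not OrientDB's implementation, and it assumes an undirected, unweighted graph given as a Map from vertex id to an array of neighbor ids. For directed edges, the search from the target would have to follow edges in reverse. The function name and the sample graph are made up for the example.

```javascript
// Bidirectional breadth-first search on an undirected, unweighted graph.
// `graph` is a Map from vertex id to an array of neighbor ids.
// Returns the number of edges on a shortest path, or -1 if none exists.
function shortestPathLength(graph, source, target) {
  if (source === target) return 0;

  // The distance maps double as "visited" sets for each side.
  const distFromSource = new Map([[source, 0]]);
  const distFromTarget = new Map([[target, 0]]);
  let sourceFrontier = [source];
  let targetFrontier = [target];

  while (sourceFrontier.length > 0 && targetFrontier.length > 0) {
    // Always grow the smaller frontier by one full level.
    const growSource = sourceFrontier.length <= targetFrontier.length;
    const frontier = growSource ? sourceFrontier : targetFrontier;
    const mine = growSource ? distFromSource : distFromTarget;
    const theirs = growSource ? distFromTarget : distFromSource;
    const next = [];

    for (const u of frontier) {
      for (const v of graph.get(u) || []) {
        // The two searches meet: join both partial paths through edge (u, v).
        if (theirs.has(v)) return mine.get(u) + 1 + theirs.get(v);
        if (!mine.has(v)) {
          mine.set(v, mine.get(u) + 1);
          next.push(v);
        }
      }
    }

    if (growSource) sourceFrontier = next; else targetFrontier = next;
  }
  return -1; // one side ran out of vertices without meeting the other
}

// Usage on a small made-up graph; F is isolated.
const sampleGraph = new Map([
  ['A', ['B', 'C']],
  ['B', ['A', 'D']],
  ['C', ['A', 'D']],
  ['D', ['B', 'C', 'E']],
  ['E', ['D']],
  ['F', []]
]);
const lengthAE = shortestPathLength(sampleGraph, 'A', 'E'); // 3 (A-B-D-E)
const lengthAF = shortestPathLength(sampleGraph, 'A', 'F'); // -1 (no path)
```

Searching from both ends means each side only has to explore roughly half the path length, which usually visits far fewer vertices than a one-sided search in a densely connected social graph.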
Another major change is a different data-model. OrientDB provides so-called lightweight edges, which need to be turned on when creating the database, as you can see in their documentation. It is possible to use ALTER DATABASE to enable lightweight edges, but this will not change existing edges. Unfortunately, that meant we had to recreate the database from scratch – which took a while. Therefore we have created a new database dump using lightweight edges for your convenience, if you want to rerun the tests yourself. You can find the dump on S3.
An official node.js driver, orientjs, is available. It is a fork of the oriento driver with minor changes; see here for details. There is also a new version of the oriento driver. Both drivers show the same performance, therefore we have now switched to the official fork.
It is possible to define a schema. As mentioned above, we wanted to test the schema-less implementation in all databases. Therefore we have not enabled a fixed schema in OrientDB for the final tests. Defining a schema in OrientDB reduces the average resident memory from 18GB to 15GB and speeds up the aggregation, but on the other hand slows down the single reads and neighbors.
Changes in Neo4J
A new version Enterprise 2.3 SNAPSHOT of Neo4J is available. We have upgraded to this version.
Michael Hunger has provided a much better warmup phase. This is now used in the tests and it has improved the shortest path dramatically.
We have switched from node-neo4j to neo4j for the node.js driver as suggested here. As mentioned in the blog, we have observed some glitches in the driver when doing a lot of single reads and writes in parallel. Following Michael Hunger’s suggestions we have used the async library to limit the outstanding requests issued to 32 concurrent requests for Neo4j. However, doing a direct test with Apache Bench shows a much higher throughput. Therefore we assume that there still are some improvements possible within the driver.
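The idea of capping the number of outstanding requests can be sketched without any library. The helper below is a hand-rolled stand-in written with plain Promises, not the async library's API and not the code used in the benchmark. The names mapWithLimit and fakeRequest are made up, and a timer simulates the database round trip.

```javascript
// Run `task(item)` for every item, keeping at most `limit` calls in flight.
// `task` must return a Promise. Resolves to the results in input order.
function mapWithLimit(items, limit, task) {
  return new Promise((resolve, reject) => {
    const results = new Array(items.length);
    let nextIndex = 0; // next item to start
    let finished = 0;  // tasks that have completed
    let failed = false;

    if (items.length === 0) return resolve(results);

    function startNext() {
      if (failed || nextIndex >= items.length) return;
      const i = nextIndex++;
      Promise.resolve()
        .then(() => task(items[i]))
        .then((value) => {
          results[i] = value;
          finished++;
          if (finished === items.length) resolve(results);
          else startNext(); // a slot freed up: start the next pending item
        })
        .catch((err) => { failed = true; reject(err); });
    }

    // Fill the initial window of `limit` outstanding requests.
    for (let k = 0; k < Math.min(limit, items.length); k++) startNext();
  });
}

// Usage: 100 simulated requests with a window of 32, tracking the peak in flight.
let inFlight = 0;
let peak = 0;
function fakeRequest(id) {
  inFlight++;
  peak = Math.max(peak, inFlight);
  return new Promise((done) => setTimeout(() => { inFlight--; done(id * 2); }, 1));
}
const demo = mapWithLimit([...Array(100).keys()], 32, fakeRequest);
```

Once demo resolves, peak shows how many requests were outstanding at the same time, which should never exceed the window of 32.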
The dbms.pagecache.memory parameter has been set to 10GB.
We have also created a new database dump for the 2.3 version.
Changes in ArangoDB
It is possible to configure the durability on a per-collection or per-write-request basis. For the durable-write test, the durability has been enabled on a collection basis.
Changes in MongoDB
It is possible to wait after a write request until the data has been saved to the journal file, see under journaled. This option is used for the durable-write test.
The Hard Path
As mentioned in the original blog post, we originally started with 20 pairs – one, however, blew up the tests. We are proud to report that now OrientDB and Neo4J are capable of finishing the search for the missing path – maybe thanks to our tests:
- ArangoDB: 4 ms
- Neo4J: 254 ms
- OrientDB: 282.233 ms
The throughput measurements on the test machine for ArangoDB define the baseline (100%) for the comparisons. Lower percentages indicate higher throughput; accordingly, higher percentages indicate lower throughput.
Overall test results:
For our tests we ran each workload 5 times (from scratch) and averaged the results. For details about the hardware see the original blog post.
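The averaging and the comparison against the baseline can be sketched as follows. Since lower percentages are said to mean higher throughput, the sketch assumes that the percentages compare elapsed times; that reading is an assumption, as are the function names. The timings are made up for the example and are not measured results.

```javascript
// Mean of the measurements from repeated runs of one workload.
function mean(values) {
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Express each database's mean elapsed time as a percentage of the baseline's.
// 100 = same as baseline; below 100 = faster (higher throughput); above 100 = slower.
function relativeToBaseline(runsByDatabase, baselineName) {
  const baseline = mean(runsByDatabase[baselineName]);
  const percentages = {};
  for (const [name, runs] of Object.entries(runsByDatabase)) {
    percentages[name] = (mean(runs) * 100) / baseline;
  }
  return percentages;
}

// Made-up timings in seconds for five runs each; NOT measured results.
const example = relativeToBaseline({
  baselineDB: [10, 10, 10, 10, 10],
  otherDB: [12, 13, 12, 11, 12]
}, 'baselineDB');
// example.baselineDB is 100, example.otherDB is 120
```

In this made-up case otherDB needs 20% more time than the baseline for the same workload, so it would be reported at 120%.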