In this blog post – which is a roundup of the performance blog series – I want to complete the picture of our NoSQL performance test and include some of the supportive feedback from the community. First of all, thanks for all your comments, contributions and suggestions to improve this open source NoSQL performance test (GitHub). This blog post describes a complete overhaul of the test, with no need to read all the previous articles to get the picture – have a look at the appendix below for all the details on the hardware and software, the dataset and the tests used in this NoSQL performance comparison.

In response to many requests, I have now added PostgreSQL to the comparison, a popular RDBMS that supports a JSON data type. The relational data model is a perfect addition to our test suite, which now covers common project use cases (reads/writes and ad-hoc queries) as well as some social-network-related queries – implemented in tables, documents and/or graphs. How does a multi-model approach perform against its specialized counterparts?

For this edition of the performance test I have also updated the software sources, replacing the custom preview/snapshot versions with the latest available products (releases or release candidates) of the respective databases, and bumped node.js to version 4.1.1. In response to user feedback I have also added another test – returning the whole profile data when requesting neighbors of neighbors – and increased the number of test cases for shortest path (40 instead of 19) and aggregation (1,000 instead of 500 vertices), which was possible due to performance improvements of all databases in the test field.

Before I dig into performance numbers and test details:

This is a vendor-initiated test that – of course – wants to show that its database is competitive, and that sets the scene and chooses the weapons: here, node.js as the client of choice and an in-memory-friendly setting, using a 16-core machine on GCE with 60 GB of RAM. Nevertheless, the setup was not chosen just to benefit ArangoDB, but to provide a comparable basis for the tests, with basic use cases and node.js as a (not that uncommon) client that is supported by every vendor (see the appendix).

ArangoDB currently works best when the data fits completely into memory. Performance will suffer if the dataset is much bigger than the available memory. We are working on this issue and will provide an improved storage engine in the near future.

I want to be as trustworthy as possible, so I have published all the data, settings and test scripts in a public GitHub repository, nosql-tests. No magic, no tricks – check the code and make your own tests!

Brief Test Description and Results

The following performance tests compare the same types of queries in different databases.

For these tests, I’ve used a dataset that lets us test basic database operations as well as graph-related queries: a social network with user profiles and a friendship relation – Pokec from Stanford University's SNAP collection. I won’t measure every possible database operation. Rather, we focus on queries that make sense for nearly every project and on some that are typical for a social network. We perform single reads and writes of profiles, we compute an ad-hoc aggregation to get an overview of the age distribution, we ask for friends of friends, and we ask for shortest friendship paths. These queries are run against all tested databases, irrespective of the data model they use internally. As a result, we get a performance comparison between specialized solutions and multi-model databases.

The results on the test machine for ArangoDB define the baseline (100%) for the comparisons: lower percentages indicate higher throughput and, accordingly, higher percentages indicate lower throughput.

I performed the following tests, all implemented in JavaScript running in node.js 4.1.1:

  • single read: single document reads of profiles (100,000 different documents)
  • single write: single document writes of profiles (100,000 different documents)
  • aggregation: ad-hoc aggregation over a single collection (1,632,803 documents).
    Here, we compute statistics about the age distribution for everyone in the network by simply counting how often each age occurs.
  • neighbors: finding (distinct) direct neighbors plus the neighbors of the neighbors, returning IDs (for 1,000 vertices)
  • neighbors with data: finding (distinct) direct neighbors plus the neighbors of the neighbors, returning their profiles (for 100 vertices)
  • shortest path: finding 40 shortest paths (in a highly connected social graph). This answers the question of how close two people are to each other in the social network.

For our tests we run each workload 5 times, averaging the results. Each test starts with an individual warm-up phase that allows the databases to load the data into memory, and every test iteration starts from scratch so that we don't end up merely comparing caches.

Overall Results

[Chart: overall results – chart_v207]

The tests show that multi-model databases can compete with single-model databases. MongoDB is faster at single document reads but can't compete when it comes to aggregations or second-degree neighbor queries. Note: the shortest path query was not tested for MongoDB, as it would have had to be implemented completely on the client side.

[Table: overall results – table_v207]

Let's go a step further and look at what exactly I tested in each use case so that you can understand what happens. The single read / single write tests are not that difficult to understand, so I concentrate on aggregation and graph functionality here.

What does the age-distribution look like in the social network?

Test: Aggregation

In this test we aggregate over a single collection (1,632,803 documents). We compute statistics about the age distribution for everyone in the network by simply counting how often each age occurs. We did not create a secondary index on this attribute in any of the databases, so they all have to perform a full collection scan and count on the fly – a typical ad-hoc query.

[Chart: aggregation – aggregation_v207]

The aggregation in ArangoDB is efficient, taking 1.25 seconds on average for the 1.6M documents, which defines the baseline of 100%. Only an explicit table column age in PostgreSQL is – as expected – much faster, processing the aggregation in 0.61 seconds. Of course, that's a good use case for an RDBMS. Since PostgreSQL offers the JSON data type as well, you might want to check the performance there: nope, 17.5 seconds is beyond anything you want to accept. All other databases are much slower than ArangoDB, ranging from a factor of 2.5 for MongoDB to a factor of 20 in the case of OrientDB.
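
To give an impression of what such an ad-hoc aggregation looks like, here is a minimal sketch of roughly equivalent queries in AQL and SQL, written as the strings a node.js client would send. The collection, table and attribute names are assumptions for illustration; the exact statements used in the benchmark live in the nosql-tests repository.

```js
// Hypothetical sketch of the age aggregation – the names (profiles,
// profiles_json, AGE) are assumptions, not necessarily the repo's schema.

// AQL: full collection scan, grouping by age and counting occurrences.
const aqlAggregation = `
  FOR p IN profiles
    COLLECT age = p.AGE WITH COUNT INTO amount
    RETURN { age: age, amount: amount }
`;

// SQL against an explicit age column: the classical relational variant.
const sqlColumnAggregation = `
  SELECT age, COUNT(*) FROM profiles GROUP BY age;
`;

// SQL against a JSON column: the much slower variant discussed above.
const sqlJsonAggregation = `
  SELECT data->>'AGE' AS age, COUNT(*) FROM profiles_json GROUP BY 1;
`;
```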

Who is part of my extended Friend network?

Test: Neighbors Search

Finding the direct neighbors plus the neighbors of the neighbors for 1,000 vertices.

[Chart: neighbors of neighbors – neighbor_v207]

This looks like a case for graph databases, but it isn't necessarily. At least Neo4j and OrientDB can't stand out in this test, even though it is a simple graph traversal. ArangoDB is really fast, taking just 464 ms on average – no graph database comes close. That's because a traversal of known, fixed depth can be answered faster with an index lookup on the edges than by following outbound links stored in every vertex.
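
For illustration, here is a minimal sketch of how such a fixed-depth lookup over an edge collection can be expressed in AQL for a single start vertex. The collection name (relations) and the exact formulation are assumptions; the benchmark's actual queries are in the repository.

```js
// Hypothetical AQL for the friends and friends-of-friends of one start
// vertex; 'relations' and the @start bind parameter are assumptions.
const aqlNeighbors = `
  LET direct = (
    FOR e IN relations
      FILTER e._from == @start
      RETURN e._to
  )
  LET second = (
    FOR n IN direct
      FOR e IN relations
        FILTER e._from == n
        RETURN e._to
  )
  RETURN UNIQUE(APPEND(direct, second))
`;
```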

How many people are between [me] and [Barack Obama]?

Test: Shortest Path

Finding 40 shortest paths (in a highly connected social graph). This answers the question of how close two people are to each other in the social network.

[Chart: shortest path – shortest-path_v207]

Shortest path is a specialty of graph databases, so I didn't even try to implement something similar in PostgreSQL or MongoDB. ArangoDB needs 61 ms on average to process the 40 shortest paths.
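
To illustrate the kind of query involved, here is a minimal Cypher sketch of a single shortest-path lookup, written as the string a node.js client would send to Neo4j. The node label, relationship type and property name are assumptions for illustration, not the exact query from the repository.

```js
// Hypothetical Cypher shortest-path query – the label (:PROFILE), the
// relationship type (:FRIEND) and the _key property are assumptions.
const cypherShortestPath = `
  MATCH (a:PROFILE { _key: {from} }), (b:PROFILE { _key: {to} }),
        p = shortestPath((a)-[:FRIEND*]->(b))
  RETURN p
`;
```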

Conclusion

The test results show that ArangoDB can compete with the leading databases in their fields and also with the other multi-model database, OrientDB. Memory is our pain point and it will be addressed in the next major release. With a flexible data model, you can use a multi-model database in many different situations without the need to learn new technologies. A short selection of real-life tasks has been given here.

Please have a look at our repository, do your own tests, and share the results. Different hardware – different results: Your mileage may vary and your requirements differ – so use this repo as a boilerplate and extend it with your own tests. If you want to verify our results, please use the same hardware configuration.

I appreciate your contribution and trust in open-source benchmarks.

Learn more around ArangoDB and its cluster capabilities in the Cluster Performance white paper.

Appendix – Details about data, machines, software and tests

The data

Pokec is the most popular online social network in Slovakia. I used a snapshot of its data provided by Stanford University's SNAP collection. It contains profile data from 1,632,803 people. The corresponding friendship graph has 30,622,564 edges. The profile data contain gender, age, hobbies, interests, education, etc., but the individual JSON documents are very diverse, because many fields are empty for many people. Profile data are in the Slovak language. Friendships in Pokec are directed. The uncompressed JSON data for the vertices needs around 600 MB and the uncompressed JSON data for the edges requires around 1.832 GB. The diameter of the graph (longest shortest path) is 11, but the graph is highly connected, as is normal for a social network. This makes the shortest path problem particularly hard.

The hardware

All benchmarks were done on a virtual machine of type n1-standard-16 in Google Compute Engine with 16 virtual cores (each virtual core is implemented as a single hardware hyper-thread on a 2.3 GHz Intel Xeon E5 v3, Haswell) and altogether 60 GB of RAM. The data was stored on a 256 GB SSD drive, directly attached to the server. The client was an n1-standard-8 (8 vCPUs, 30 GB RAM) in the same network.

The software

I wanted to use a client/server model, so I needed a language in which to implement the tests, and I decided that it had to fulfill the following criteria:

  • Each database in the comparison must have a reasonable driver.
  • It must not be one of the native languages our contenders are implemented in, because that could give some of them an unfair advantage. This ruled out C++ and Java.
  • The language must be reasonably popular and relevant in the market.
  • The language should be available on all major platforms.

This essentially left JavaScript, PHP, Python and Ruby. I decided to use JavaScript with node.js 4.1.1, because it’s popular and known to be fast, in particular with network workloads.

For each database I used the most up-to-date JavaScript driver that was recommended by the respective database vendor.

I have used:

  • ArangoDB V2.7.0 RC2 for x86_64 (Driver: arangojs@3.9.1)
  • MongoDB V3.0.6 for x86_64, using the WiredTiger storage engine (Driver: mongodb@2.0.45)
  • Neo4j Enterprise Edition V2.3.0 M3 running on JDK 1.7.0_79 (Driver: neo4j@2.0.0-RC2)
  • OrientDB 2.2 alpha – Community Edition (Driver: orientjs@2.1.0)
  • PostgreSQL 9.4.4 (Driver: pg-promise@1.11.0)

All databases were installed on the same machine. I did my best to tune the configuration parameters; for example, I switched off transparent huge pages and configured up to 40,000 open file descriptors for each process. Furthermore, I adapted the community- and vendor-provided configuration parameters from Michael Hunger (Neo4j) and Luca Garulli (OrientDB) to improve individual settings.

The tests

I have made sure for each experiment that the database has a chance to load all relevant data into RAM. Some databases allow explicit load commands for collections, others do not. Therefore, I increased cache sizes where relevant and used full collection scans as a warm-up procedure.

I don't want to benchmark query caches or the like – a database might need a warm-up phase, but you can't compare databases based on cache size or cache efficiency. Whether a cache is useful depends highly on the individual use case, namely whether a certain query is executed multiple times.

For the single document tests, I use individual requests for each document but use keep-alive and allow multiple simultaneous connections, since I wanted to test throughput rather than latency.

Whenever the driver allowed it to be configured, I used a TCP/IP connection pool of up to 25 connections. Note that the ArangoDB driver does not use HTTP pipelining, whereas the MongoDB driver seems to do something similar for its binary protocol, which can help to increase throughput. For more detailed information about each individual database, see below.
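
As an illustration of the client-side setup, here is a minimal sketch of how a pool of up to 25 keep-alive connections can be configured in plain node.js. The actual drivers expose this through their own options where available, and the host, port and path below are placeholders, so treat this only as an indication of the intended behavior.

```js
// Minimal sketch: up to 25 persistent (keep-alive) connections per target.
// The real tests configure pooling via each driver's own options instead.
const http = require('http');

const agent = new http.Agent({
  keepAlive: true,   // reuse TCP connections instead of opening new ones
  maxSockets: 25     // cap the pool at 25 simultaneous connections
});

// Example request using the shared agent (host, port and path are placeholders).
http.get({ host: '10.0.0.1', port: 8529, path: '/_api/version', agent: agent }, (res) => {
  res.resume(); // drain the response so the socket is returned to the pool
});
```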

I discuss each of the tests separately:

single document reads (100,000 different documents)

In this test we store 100,000 IDs of people in the node.js client and try to fetch the corresponding profiles from the database, each in a separate query. In node.js, everything happens in a single thread but asynchronously. To fully load the database connections we first submit all queries to the driver and then await all the callbacks using the node.js event loop. We measure the wallclock time from just before we start sending queries until the last answer has arrived. Obviously, this measures throughput of the driver/database combination and not latency; therefore, we report the complete wallclock time for all requests as the result.
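
The following is a minimal sketch of this measurement pattern. The fetchProfile(id, callback) function is a hypothetical stand-in for the driver-specific single-document read; the actual test code is in the repository.

```js
// Sketch of the throughput measurement: submit all requests up front, then
// wait for all callbacks and take the wallclock time. fetchProfile() is a
// placeholder for the driver-specific single-document read.
function measureReads(ids, fetchProfile, done) {
  let pending = ids.length;
  const start = Date.now();

  ids.forEach((id) => {
    fetchProfile(id, (err) => {
      if (err) { console.error(err); }
      if (--pending === 0) {
        done(Date.now() - start); // total wallclock time in milliseconds
      }
    });
  });
}
```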

single document writes (100,000 different documents)

For this test we proceed similarly: We load 100,000 different documents into the node.js client and then measure the wallclock time needed to send all of them to the database, using individual queries. We again first schedule all requests to the driver and then wait for all callbacks using the node.js event loop. As above, this is a throughput measurement.

single document writes sync (100,000 different documents)

Same as before, but this time each write waits until the data has been synced to disk – which is the default behavior of Neo4j. To be fair, we introduced this additional test into the comparison.
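
As a rough illustration of the difference, here is a sketch of a synced single-document write against ArangoDB's HTTP document API, where the waitForSync flag makes the call return only after the data has been written to disk. The host, collection name and document are placeholders, and the other databases use their own equivalents of such a durability option.

```js
// Sketch: a synced single-document write via ArangoDB's HTTP API – the
// waitForSync=true query parameter delays the response until the write has
// been synced to disk. Host, collection and document are placeholders.
const http = require('http');

const body = JSON.stringify({ _key: 'P1000001', AGE: 26 });
const req = http.request({
  host: '10.0.0.1',
  port: 8529,
  method: 'POST',
  path: '/_api/document?collection=profiles&waitForSync=true',
  headers: { 'Content-Type': 'application/json' }
}, (res) => res.resume());
req.end(body);
```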

aggregation over a single collection (1,632,803 documents)

In this test we do an ad-hoc aggregation over all 1,632,803 profile documents and count how often each value of the AGE attribute occurs. We did not create a secondary index on this attribute in any of the databases, so they all have to perform a full collection scan and count on the fly. We only measure a single request, since this is enough work to get an accurate measurement. The amount of data scanned should be more than any CPU cache can hold, so we should see real RAM accesses but usually no disk accesses because of the warm-up procedure described above.

finding the neighbors and the neighbors of the neighbors (distinct, for 1,000 vertices)

This is the first test related to the social network use case. For each of altogether 1,000 vertices we find all neighbors and all neighbors of these neighbors, which amounts to finding the friends and the friends of the friends of a person, and we return a distinct set of friend IDs. This is a typical graph matching problem, considering paths of length 1 or 2. For the non-graph database MongoDB, we can use the aggregation framework to compute the result. In PostgreSQL we can use a relational table with from-ID / to-ID columns, backed by an index. In the Pokec dataset we get 18,972 neighbors and 852,824 neighbors of neighbors for our 1,000 queried vertices.
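
For the relational variant, a minimal sketch of such a fixed-depth lookup is a self-join over the friendship table, as in the hypothetical statement below. The table and column names are assumptions; the repository contains the statements actually used.

```js
// Hypothetical SQL for the distinct 1st- and 2nd-degree neighbors of one
// start profile; the table (relations) and columns (id_from, id_to) are
// assumptions, and $1 is the start profile's ID.
const sqlNeighbors = `
  SELECT r1.id_to
    FROM relations r1
   WHERE r1.id_from = $1
  UNION
  SELECT r2.id_to
    FROM relations r1
    JOIN relations r2 ON r2.id_from = r1.id_to
   WHERE r1.id_from = $1;
`;
```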

finding the neighbors and the neighbors of the neighbors with profile data (distinct, for 100 vertices)

As there was a complaint that a real use case would need to return more than IDs, I added a test case – neighbors with profile data – that addresses this concern and returns the complete profiles. In this test case we retrieve 84,972 profiles for the first 100 vertices we query. The complete set of 853k profiles (for 1,000 vertices) would have been too much for node.js.

finding 40 shortest paths (in a highly connected social graph)

This is a pure graph test with a query that is particularly suited to a graph database. We ask the database in 40 different requests to find a shortest path between two given vertices in our social graph. Due to the high connectivity of the graph, such a query is hard, since the neighborhood of a vertex grows exponentially with the radius. Shortest path queries are notoriously slow in more traditional databases, because the answer involves an a priori unknown number of steps in the graph, usually leading to an a priori unknown number of joins.

Originally we picked 20 random pairs of vertices, but it turned out that for one of the pairs there is no path in the graph at all. We excluded that pair from the first measurements because Neo4j, which altogether did quite well at shortest paths, was exceedingly slow to notice that there is no such path. After the first published performance test the vendors improved their tools, so we could increase the number of shortest paths to 40 – which is enough to get an accurate measurement. Note, however, that the time for different pairs varies considerably, because it depends on the length of the shortest path as well as, sometimes, on the order in which edges are traversed.

We finish the description with a few more detailed comments for each individual database:

ArangoDB:

ArangoDB allows you to specify the value of the primary key attribute _key, as long as the unique constraint is not violated. It automatically creates a primary hash index on that attribute, as well as an edge index on the _from and _to attributes in the friendship relation (edge collection). No other indexes were used.

MongoDB:

Since MongoDB treats the edges just as documents in another collection, I helped it with the graph queries by creating two more indexes on the _from and _to attributes of the friendship relation. Due to the absence of graph operations, I implemented neighbors of neighbors using the aggregation framework, as suggested by Hans-Peter Grasl, and did not even try to do shortest paths.
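
As a small illustration (the collection name and connection URL are assumptions, and with the 2.x driver used here the connect callback yields the database handle directly), the extra indexes boil down to something like:

```js
// Sketch: secondary indexes on the edge attributes to speed up the
// graph-style lookups in MongoDB; 'relations' and the URL are placeholders.
const MongoClient = require('mongodb').MongoClient;

MongoClient.connect('mongodb://10.0.0.1:27017/pokec', (err, db) => {
  if (err) throw err;
  const relations = db.collection('relations');
  relations.createIndex({ _from: 1 }, () => {
    relations.createIndex({ _to: 1 }, () => db.close());
  });
});
```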

Please note: the write performance of MongoDB 3.0.6 declined significantly. I re-validated the test with MongoDB 3.0.3 and measured the fast results known from the previous tests (103 sec vs. 324 sec for 100,000 synced single writes). Nevertheless, as there is no indication that this is a bug in one of those versions, I stayed with the latest release.

Neo4j:

In Neo4j the attribute values of the profile documents are stored as properties of the vertices. For a fair comparison, I created an index on the _key attribute. Neo4j claims to use “index-free adjacency” for the edges, so I did not add another index on edges.

I got the configuration parameters from the vendor (thanks to Michael Hunger) and added the "writes with sync to disk" test, as this is the default (and only) behavior Neo4j supports. After the first performance test I also received a custom-built Neo4j 2.3 snapshot from Michael Hunger that improved Neo4j's performance. With the Enterprise Edition 2.3 (M3) it looks like the improvements have found their way into the official releases, so that everyone can benefit. Open source is such a cool thing.

OrientDB:

OrientDB 2.0.9 was the fourth-best database in most disciplines of the first test. The developers used the published results to analyze some bottlenecks and improved the performance of OrientDB within two weeks of the first published blog post (2.1 RC4). I could now switch from the provided 2.2 preview snapshot to the current 2.2 alpha, which seems to include all the performance improvements of that snapshot.

Please note: there is an OrientDB blog post in response, but it compares apples with oranges by activating/implementing query caches – in OrientDB only – to improve the results.

Postgres:

I have used PostgreSQL with the user profiles stored in a table with two columns: the profile ID and a JSON data type for the whole profile data. In a second approach, I used classical relational data modelling with all profile attributes as columns in a table – just for comparison.
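
A minimal sketch of the two table layouts might look like the following; the table and column names are assumptions for illustration, and the import scripts in the repository define the actual schema.

```js
// Hypothetical DDL for the two PostgreSQL variants used in the comparison.
// Names (profiles_json, profiles, ...) are assumptions, not the repo's schema.
const createJsonTable = `
  CREATE TABLE profiles_json (
    id   BIGINT PRIMARY KEY,
    data JSON NOT NULL          -- the whole profile document as JSON
  );
`;

const createRelationalTable = `
  CREATE TABLE profiles (
    id     BIGINT PRIMARY KEY,
    age    INT,
    gender TEXT
    -- ... one column per profile attribute
  );
`;
```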

Resources and Contribution

All code used in this test can be downloaded from my GitHub repository, and all the data is published in a public Amazon S3 bucket. The tar file consists of two folders: data (database) and import (source files).

Everybody is welcome to contribute by testing other databases and sharing the results.