My recent blog post “Native multi-model can compete” has sparked considerable interest on HN and other channels. As expected, the community has immediately suggested improvements to the published code base and I have already published updated results several times (special thanks go to Hans-Peter Grahsl, Aseem Kishore, Chris Vest and Michael Hunger).
Here are the latest figures and diagrams:
The aim of the exercise was to show that a multi-model database can successfully compete with specialized players on their own turf with respect to performance and memory consumption. Therefore it is not surprising that quite a few interested readers have asked whether I could include OrientDB, the other prominent native multi-model database.
Eager to make our case for multi-model even stronger, my colleagues and I immediately set out to give OrientDB a spin. There is an officially supported node.js driver, Oriento, which uses the fast binary protocol, and thanks to its SQL dialect and its multi-model nature OrientDB offers everything we need for the comparison at hand.
To our great disappointment the results were not as good as expected. It seems obvious that there is something wrong in the way I am using OrientDB. So I have asked OrientDB users to check the implementation but they could not immediately spot the problem. Therefore I would love to see suggestions and will publish updated results as soon as they come in.
Enough talk. Here are the results, this time including OrientDB, as many readers suggested.
Clearly something is wrong here. Any help with improving things would be much appreciated. Discuss on HN.
Details of how we use OrientDB
For OrientDB, we import the profiles into one class and the relations into another, but the relations are also stored as direct links between nodes in the objects of the Profile class (“index free adjacency”). All queries are sent to the database as SQL queries. The Oriento driver uses promises in the usual node.js style. Single document queries and aggregations are native to SQL and therefore straightforward. For the neighbor queries we use OrientDB’s graph extensions, which follow the direct links. Finding the second-degree neighbors with unique results was a bit of a challenge, but with some outside help we managed to find an SQL query that works. ShortestPath is also directly supported as a function call in SQL.
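To make the “second degree neighbors with unique results” requirement concrete, here is a minimal in-memory sketch in node.js. It is not the SQL query we run against OrientDB. The `profiles` map, the `out` field and both function names are made up for illustration, and we assume here that the result contains first and second degree neighbors, each exactly once, without the start vertex itself.

```javascript
// Hypothetical stand-in for Profile objects that carry direct links to
// their neighbors ("index free adjacency"). Not real benchmark data.
const profiles = new Map([
  ["P1", { out: ["P2", "P3"] }],
  ["P2", { out: ["P3", "P4"] }],
  ["P3", { out: ["P4", "P5", "P1"] }],
  ["P4", { out: [] }],
  ["P5", { out: ["P2"] }],
]);

// Follow the direct links of one profile; unknown ids have no neighbors.
function neighbors(id) {
  const profile = profiles.get(id);
  return profile ? profile.out : [];
}

// Collect first and second degree neighbors of `start`.
// The Set removes duplicates that are reachable along several paths.
function neighborsOfNeighbors(start) {
  const seen = new Set();
  for (const first of neighbors(start)) {
    seen.add(first);
    for (const second of neighbors(first)) {
      seen.add(second);
    }
  }
  seen.delete(start); // assumption: the start vertex is not part of the result
  return [...seen].sort();
}

console.log(neighborsOfNeighbors("P1"));
```

The deduplication step is what makes the query awkward to express: P3 and P4 are each reachable along more than one path from P1 but must be reported only once.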
Oriento currently only supports one connection to the database. In order to parallelize the requests, we used 25 instances of the driver – giving us 25 parallel connections to the server. This improved performance considerably.
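The idea of spreading requests over several driver instances can be sketched as follows. This is only an illustration of the pattern, not the code from our repository: `createPool`, `createConnection` and the fake connection objects are hypothetical, and the round-robin rotation is an assumption about how requests might be distributed. No Oriento call is shown, since the stand-in connections only mimic a promise-returning `query` method.

```javascript
// Build a pool of `size` connections. `createConnection` is a hypothetical
// factory; in a real setup it would open one driver instance per call.
function createPool(size, createConnection) {
  const connections = Array.from({ length: size }, (_, i) => createConnection(i));
  let next = 0;
  return {
    size,
    // Hand out the connections in strict rotation (round robin).
    acquire() {
      const connection = connections[next];
      next = (next + 1) % size;
      return connection;
    },
  };
}

// Stand-in connection that resolves immediately instead of talking to a server.
const pool = createPool(25, (i) => ({
  id: i,
  query: (sql) => Promise.resolve(`connection ${i} ran: ${sql}`),
}));

// Each request goes to the next connection, so up to 25 requests
// can be in flight at the same time.
function runQuery(sql) {
  return pool.acquire().query(sql);
}

Promise.all([runQuery("query A"), runQuery("query B")]).then((results) => {
  results.forEach((line) => console.log(line));
});
```

With a single connection, every request waits for the previous one on that connection; with a pool, the waiting is spread over as many connections as the pool holds.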
Some conjectures about reasons
As mentioned above, we do not really understand what is going on here. We had a short look at the (open source version of the) code. It seems that OrientDB uses an implementation of Dijkstra’s algorithm for the ShortestPath that proceeds only from the source, contrary to ArangoDB and Neo4j. This essentially explains the bad performance in the shortest path test, since our social graph is typical in that it is highly connected and essentially shows an exponential growth of the neighborhood of each vertex with the distance.
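To illustrate why searching only from the source hurts when neighborhoods grow exponentially, here is a small sketch that compares a one-sided and a two-sided search on a toy graph. It is not the implementation of OrientDB, ArangoDB or Neo4j. Since the toy graph is unweighted, plain breadth-first search stands in for Dijkstra’s algorithm, and the graph itself (a binary tree built by the hypothetical helper `buildBinaryTree`) is only a stand-in for a social graph. Both searches assume undirected edges.

```javascript
// Undirected complete binary tree with heap numbering: vertex i has parent floor(i / 2).
function buildBinaryTree(depth) {
  const graph = new Map();
  const last = 2 ** (depth + 1) - 1;
  for (let i = 1; i <= last; i++) graph.set(i, []);
  for (let i = 2; i <= last; i++) {
    const parent = Math.floor(i / 2);
    graph.get(i).push(parent);
    graph.get(parent).push(i);
  }
  return graph;
}

// Breadth-first search that proceeds only from the source.
function bfsOneSided(graph, source, target) {
  const dist = new Map([[source, 0]]);
  const queue = [source];
  for (let head = 0; head < queue.length; head++) {
    const v = queue[head];
    if (v === target) return { distance: dist.get(v), visited: dist.size };
    for (const w of graph.get(v) || []) {
      if (!dist.has(w)) {
        dist.set(w, dist.get(v) + 1);
        queue.push(w);
      }
    }
  }
  return { distance: -1, visited: dist.size };
}

// Breadth-first search from both ends; always expands the smaller frontier
// by one complete level and stops as soon as the two sides touch.
function bfsTwoSided(graph, source, target) {
  if (source === target) return { distance: 0, visited: 1 };
  const distA = new Map([[source, 0]]);
  const distB = new Map([[target, 0]]);
  let frontierA = [source];
  let frontierB = [target];
  while (frontierA.length > 0 && frontierB.length > 0) {
    const expandA = frontierA.length <= frontierB.length;
    const [frontier, mine, other] = expandA
      ? [frontierA, distA, distB]
      : [frontierB, distB, distA];
    const nextFrontier = [];
    for (const v of frontier) {
      for (const w of graph.get(v) || []) {
        if (mine.has(w)) continue;
        const d = mine.get(v) + 1;
        if (other.has(w)) {
          // The other side already knows w, so the two searches have met.
          return { distance: d + other.get(w), visited: distA.size + distB.size };
        }
        mine.set(w, d);
        nextFrontier.push(w);
      }
    }
    if (expandA) frontierA = nextFrontier;
    else frontierB = nextFrontier;
  }
  return { distance: -1, visited: distA.size + distB.size };
}

const tree = buildBinaryTree(12);
console.log("one-sided:", bfsOneSided(tree, 8, 15));
console.log("two-sided:", bfsTwoSided(tree, 8, 15));
```

Both searches report the same distance, but the two-sided one only has to explore two neighborhoods of roughly half the radius, which is far fewer vertices when each additional step multiplies the size of the neighborhood.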
We can only repeat that any hints about how to improve this situation would be more than welcome, and that we will immediately update this post when new findings arise.
We perform exactly the same tests as before; please read the full description in the earlier post.
Resources and Contribution
All code used in this test can be downloaded from my GitHub repository, and all the data is published in a public Amazon S3 bucket. The tar file consists of two folders: data (database) and import (source files).
Everybody is welcome to contribute by testing other databases and sharing the results.