TL;DR: Our initial benchmark attracted a lot of interest. Initially, we wanted to show that a multi-model database can compete with specialized solutions. Thanks to the open and competitive way in which we conducted the benchmark, the discussions around it have led to improvements in all of the products: better algorithms, faster drivers and better ways to use the databases.
General Setup
From the outset we published all code and data, and we asked the vendors of all tested products, as well as the general public, not only to run the tests on their own machines but also to suggest improvements to the data models, test code, database configuration, driver usage and server configuration. This led to a lively discussion, lots of pull requests and even to the release of improved versions of the database products themselves!
This process exceeded all our expectations and is yet another great example of community collaboration, not only for fact-finding but also for product improvement. Naturally, the same benchmark code will show slightly different results when run on different hardware, operating systems and network setups, or with more or less RAM. Therefore, reliable benchmark results can essentially only be achieved by allowing everybody to run the tests on their own machines.
The technical setup is described in the blog post linked above. Let me briefly repeat the key facts here.