
Bulk Inserts: MongoDB vs CouchDB vs ArangoDB (Dec 2014)

More than two years ago, we compared the bulk insert performance of ArangoDB, CouchDB and MongoDB in a blog post.

The original blog post dates back to the times of ArangoDB 1.1-alpha. We have been asked several times to re-run the tests with the current versions of the databases. So here we go.

Test setup

We have again used the PHP bulk insert benchmark tool to generate results for MongoDB, CouchDB, and ArangoDB. The benchmark tool uses the HTTP bulk documents APIs for CouchDB and ArangoDB, and the binary protocol for MongoDB (as MongoDB does not have an HTTP bulk API). The benchmark tool was run on the same machine as the database servers, so network latency can be ruled out as an influencing factor. Neither replication nor sharding was used.
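For illustration, here is a minimal sketch of the three bulk-insert paths (written in Python rather than the PHP of the actual benchmark tool; hosts, database and collection names are placeholders, not the ones used in the tests):

```python
# Minimal sketch of the three bulk-insert paths; not the actual benchmark tool.
# Hosts, database and collection names below are placeholders.
import json
import requests
from pymongo import MongoClient

docs = [{"value": i} for i in range(1000)]

# CouchDB: HTTP bulk documents API (_bulk_docs)
requests.post("http://127.0.0.1:5984/bench/_bulk_docs", json={"docs": docs})

# ArangoDB: HTTP bulk import API, one JSON document per line
payload = "\n".join(json.dumps(d) for d in docs)
requests.post("http://127.0.0.1:8529/_api/import",
              params={"collection": "bench", "type": "documents",
                      "createCollection": "true"},
              data=payload)

# MongoDB: binary wire protocol via the official driver (no HTTP bulk API)
client = MongoClient("mongodb://127.0.0.1:27017")
client.bench.docs.insert_many(docs)
```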

The test machine specifications are:

  • Linux Kernel, cfq scheduler
  • 64 bit OS
  • 8x Intel(R) Core(TM) i7 CPU, 2.67 GHz
  • 12 GB total RAM
  • SATA II hard drive (7,200 RPM, 32 MB cache)

The total “net insert time” is reported for several datasets in the following charts. This is the time spent in the benchmark tool sending the requests to the database and waiting for the database responses, i.e. excluding the time needed to create the chunks that are sent to the database.
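To make the measurement boundary concrete, here is a rough sketch (again Python, with a hypothetical CouchDB endpoint standing in for any of the three databases) of what is and isn’t counted:

```python
# Sketch of the "net insert time" measurement: only the request/response
# round-trip is timed; building the chunks is excluded from the total.
import time
import requests

def net_insert_time(docs, chunk_size=5000):
    total = 0.0
    for start in range(0, len(docs), chunk_size):
        # chunk preparation: not counted towards the net insert time
        body = {"docs": docs[start:start + chunk_size]}

        # only sending the request and waiting for the response is measured
        t0 = time.monotonic()
        requests.post("http://127.0.0.1:5984/bench/_bulk_docs", json=body)
        total += time.monotonic() - t0
    return total
```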

The database versions used in the tests were the current stable versions:

  • MongoDB 2.6.6, with defaults
  • CouchDB 1.6.1, with defaults (i.e. delayed_commits), but without compression
  • ArangoDB 2.3.3, with defaults (i.e. waitForSync=false); see the durability sketch below
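Both delayed_commits in CouchDB and waitForSync=false in ArangoDB relax durability: writes are acknowledged before they are guaranteed to have been synced to disk, roughly in line with MongoDB’s default of acknowledged but not journal-confirmed writes. For anyone wanting to repeat the test with stricter durability, here is a hedged sketch of how these defaults could be tightened (endpoints and names are placeholders; details may differ in newer versions of the databases):

```python
# Sketch only: tightening the relaxed-durability defaults mentioned above.
# Endpoints, database and collection names are placeholders.
import requests
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

# CouchDB 1.x: disable delayed_commits at runtime via the configuration API
requests.put("http://127.0.0.1:5984/_config/couchdb/delayed_commits",
             data='"false"')

# ArangoDB: create the target collection with waitForSync enabled
requests.post("http://127.0.0.1:8529/_api/collection",
              json={"name": "bench", "waitForSync": True})

# MongoDB: request journal-confirmed writes via the write concern
client = MongoClient("mongodb://127.0.0.1:27017")
coll = client.bench.get_collection("docs", write_concern=WriteConcern(j=True))
coll.insert_many([{"value": i} for i in range(3)])
```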

Different artificial and real-world datasets of various sizes were imported. The datasets are all included in the benchmark tool’s repository.

Test results

Here’s an overview of the results containing most datasets in a single chart:

[Chart: results-overall-1]

Here’s a closer look, with the x-axis scale limited to 0 – 200 to eliminate the CouchDB outlier:

[Chart: results-overall-2]

The following charts each contain a few datasets only, providing a better detail view:

[Chart: results-detail-1]

[Chart: results-detail-2]

[Chart: results-detail-3]

Conclusion

As can be seen from the results, the insertion times for ArangoDB and MongoDB are still in the same ballpark, and little has changed compared to the 2012 edition of this test. ArangoDB is still a bit slower than MongoDB for most of the tests, but the difference is not big. This is good to see: several features have been added to ArangoDB since 2012 that could have had a negative impact on performance (transactions, write-ahead log, replication, sharding – to name only the most important). It looks like adding these features hasn’t messed up the performance.

And CouchDB is still trailing: its insertion times remain significantly higher than those of ArangoDB and MongoDB.

Caveats

These are benchmarks for specific datasets. The dataset volumes and types might or might not be realistic, depending on what you plan to do with a database. Results might look completely different for other datasets or for other test clients.

In addition, the benchmarks compare the HTTP APIs of CouchDB and ArangoDB against the binary protocol of MongoDB, which gives MongoDB a slight efficiency advantage. However, real-world applications will also use MongoDB’s binary protocol, so this is an advantage MongoDB does have in real life (though it comes with the disadvantage that the protocol is not human-readable). Furthermore, there are of course other aspects that would deserve attention, e.g. CPU and memory usage; these haven’t been looked at in this post. So please be sure to run your own tests in your own environment before adopting the results.

We have even more performance tests between various databases.

Jan Steemann


After more than 30 years of playing around with 8-bit computers, assembler and scripting languages, Jan decided to move on to database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, having been there since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

7 Comments

  1. danielwertheim on December 18, 2014 at 8:32 pm

    Would be interesting to see a comparison with random reads and writes, and more interestingly, durable writes.

  2. Ted Wood on January 2, 2015 at 10:07 pm

    I only learned of ArangoDB today. I’m currently a MongoDB user, which has replaced many years of primarily using MySQL. Based on the above performance, it looks like there’s no huge advantage for me to adopt Arango, but the query language is definitely interesting. I’ll be keeping my eyes on this project in case it brings an advantage to the table that Mongo doesn’t have. Good work folks.

    • Thomas Schmidts on January 5, 2015 at 2:02 pm

      Thank you 🙂

      If you have any questions regarding ArangoDB, don’t hesitate to ask us on Google Groups or GitHub.

    • Steve Howe on February 5, 2015 at 4:12 am

      Only to name a few: joins, graphs, transactions, query language, HTTP API, data compression (to be added in Mongo 3.0, I’ve heard), MVCC. But my benchmarks show greater differences than the numbers above (MongoDB being faster), and MongoDB has GridFS (which I like) and better sharding support. Oh well. Can’t be happy on *everything*.

  3. CoDEmanX on May 29, 2015 at 4:40 pm

    MongoDB 3.0 is out and comes with a new storage engine, I think it’s a good time to benchmark ArangoDB 2.6 against it!

    • fceller on June 1, 2015 at 9:27 am

      We will definitely do this.
