Many ArangoDB users rely on our arangodump and arangorestore tools as an integral part of their backup and recovery procedures. As such, we want to make the use of these tools, especially arangodump, as fast as possible. We’ve been working hard toward this goal in preparation for the upcoming 3.4 release.

We’ve made a number of low-level server-side changes to significantly reduce overhead and improve throughput. Additionally, we’ve put some work into rewriting much of the code for the client tools to allow dumping and restoring collections in parallel, using a number of worker threads specified by --threads n.
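As a rough illustration of how the thread count might be passed to the client tools, here is a minimal Python sketch that assembles the command lines. The endpoint, database name, and directory names are placeholder assumptions, authentication options are omitted, and the helper function names are hypothetical; only the `--threads` option comes from the text above.

```python
import subprocess


def build_dump_command(output_dir, threads,
                       endpoint="tcp://127.0.0.1:8529", database="_system"):
    """Assemble an arangodump command line (endpoint/database are assumed defaults)."""
    return [
        "arangodump",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--output-directory", output_dir,
        "--threads", str(threads),  # number of worker threads
    ]


def build_restore_command(input_dir, threads,
                          endpoint="tcp://127.0.0.1:8529", database="_system"):
    """Assemble the matching arangorestore command line."""
    return [
        "arangorestore",
        "--server.endpoint", endpoint,
        "--server.database", database,
        "--input-directory", input_dir,
        "--threads", str(threads),
    ]


def run_dump(output_dir, threads):
    """Run arangodump; raises CalledProcessError if the tool exits with an error."""
    subprocess.run(build_dump_command(output_dir, threads), check=True)
```

Calling `run_dump("dump", 4)` would dump into a local `dump` directory using four worker threads, assuming `arangodump` is on the `PATH` and the server accepts the connection.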

Virtually all datasets should see reduced time to dump or restore, thanks to the server-side changes. Additionally, datasets which are distributed over multiple collections should see further speedup thanks to the parallelization. Let’s consider a few scenarios.

In the case where we have lots of collections and the data is distributed evenly across them, we expect the client tools to benefit quite a bit from using an increased thread count, at least up to the point where we saturate our system, especially with respect to disk I/O. To test this, we used a randomly generated dataset with 200 collections, each with 200,000 documents. We tested using 3.3.8 (the latest 3.3 release at the time) against our then-current devel branch. Further, we tested with both the RocksDB and MMFiles engines. The numbers in the table below give the speedup of the devel version relative to 3.3.8, so values above 1x mean devel was faster.

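| Engine  | Thread Count | Dump Speedup | Restore Speedup |
|---------|--------------|--------------|-----------------|
| RocksDB | 4            | 10.58x       | 2.92x           |
| RocksDB | 16           | 8.27x        | 4.30x           |
| MMFiles | 4            | 4.79x        | 2.79x           |
| MMFiles | 16           | 13.27x       | 0.55x           |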

While most of the figures are quite impressive, you may notice two strange entries there. First, the dump speedup with RocksDB is about 22% lower with 16 threads than with 4 threads (8.27x versus 10.58x). Second, restoring with MMFiles is actually significantly slower with 16 threads than with either 4 threads or the single-threaded 3.3.8 version (a 0.55x "speedup" means it took nearly twice as long as 3.3.8). What’s going on?

The tests were performed with both the client and the server running on the same machine and using the same disk. We hit a system-specific bottleneck, where we were fully saturating the disk. In the case of MMFiles restore with 16 threads, we hammered the disk way, way too hard, and it brought everything to a grinding halt.

Lesson: every system will have its own sweet spot depending on the hardware and I/O characteristics. For our system, we could not handle a full 16 threads for MMFiles restore, but we got a decent speedup using 4 threads. On the other hand, MMFiles dump took pretty full advantage of those same 16 threads to get more than a 13x speedup!
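One way to look for that sweet spot on your own system is to time the same operation at several thread counts and keep the fastest. The sketch below is only an illustration: `find_best_thread_count` and its `run_with_threads` callback are hypothetical names, and the candidate thread counts are arbitrary. In practice the callback would invoke arangodump or arangorestore with the given `--threads` value against a representative dataset.

```python
import time


def find_best_thread_count(run_with_threads, candidates=(1, 2, 4, 8, 16)):
    """Time run_with_threads(n) for each candidate n.

    Returns (best_n, timings), where timings maps each thread count
    to its elapsed wall-clock time in seconds.
    """
    timings = {}
    for n in candidates:
        start = time.perf_counter()
        run_with_threads(n)  # e.g. launch a dump with --threads n
        timings[n] = time.perf_counter() - start
    best = min(timings, key=timings.get)  # thread count with the lowest time
    return best, timings
```

A single run per thread count is noisy, so repeating each measurement and clearing the output directory between runs would give more trustworthy numbers.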

Now let’s consider a case that’s less easily parallelized: the Pokec social network dataset from SNAP. This consists of a single vertex collection with about 1.6 million entries, and a single edge collection with about 30 million entries. Since we can dump and restore the two collections independently in parallel, we expect to see a slight speedup from this factor. Realistically though, the edge collection is so much larger than the vertex collection that we should expect most of our gains to come from our other server-side improvements.
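To put a rough number on "slight", we can estimate an upper bound on the speedup from parallelizing across collections alone. The sketch below assumes, purely for illustration, that each collection is handled by a single thread, that the cost of a collection is proportional to its document count, and that threads do not contend for I/O. None of these assumptions is exact (edge and vertex documents differ in size, for one), so treat the result as a ballpark figure.

```python
def max_parallel_speedup(collection_sizes, threads):
    """Upper bound on speedup from per-collection parallelism alone.

    Wall time cannot drop below the largest single collection, nor below
    an even split of the total work across all threads.
    """
    total = sum(collection_sizes)
    lower_bound = max(max(collection_sizes), total / threads)
    return total / lower_bound


# Pokec: about 1.6 million vertices and 30 million edges
pokec = max_parallel_speedup([1_600_000, 30_000_000], threads=4)  # roughly 1.05

# Synthetic dataset: 200 collections of 200,000 documents each
even = max_parallel_speedup([200_000] * 200, threads=16)  # 16.0
```

Under these assumptions, parallelism alone can buy at most about a 5% improvement on Pokec, while the evenly distributed dataset can in principle scale with the thread count until the hardware is saturated.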

| Engine  | Thread Count | Dump Speedup | Restore Speedup |
|---------|--------------|--------------|-----------------|
| RocksDB | 4            | 3.85x        | 1.02x           |
| MMFiles | 4            | 2.85x        | 1.21x           |

We see that the dump performance is quite improved, even for this two-collection case. Restore performance also sees a slight speedup, but it is less dramatic.

So the good news is that pretty much regardless of your dataset and system configuration, these changes should help speed up your backups considerably. If you’d like to test out the changes for yourself and help us get ready for the official 3.4 release, feel free to play around with one of our recent nightly builds! Keep in mind that this is pre-release software which may still contain some bugs, and as such should not be used in production. Let us know if you run into any weird performance issues or encounter any bugs. Enjoy!