
ArangoDB 3.4: Enhancements in RocksDB Storage Engine

With ArangoDB 3.4 we finally made the RocksDB storage engine the default. This decision followed a year of constant improvements to the engine to make it suitable for all of our customers' use cases.

Improved Read/Write Performance

We massively improved the binary on-disk storage format with the 3.4 release. The new format allows inserting new documents in an order that RocksDB prefers. This reduces the number of compactions RocksDB has to perform on the stored ArangoDB documents, which improves long-term insertion performance. Sustained insertion performance should be about 30% better than in 3.3!
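One general way to get such an order is to encode keys so that RocksDB's default bytewise comparison sorts newer documents after older ones. The sketch below is plain JavaScript for Node.js 12 or later and only illustrates that principle; it is not ArangoDB's actual key layout. It assumes that each new key embeds a monotonically increasing 64-bit id and that keys are compared byte by byte, and it checks whether ids encoded little-endian or big-endian come out already sorted in insertion order. The function names are hypothetical.

```javascript
// Illustration only: how the byte order of an embedded id affects whether
// new keys arrive in sorted order under bytewise (memcmp-style) comparison.

// Encode a numeric id as an 8-byte key (hypothetical key layout).
function encodeKey(id, bigEndian) {
  const buf = Buffer.alloc(8);
  if (bigEndian) {
    buf.writeBigUInt64BE(BigInt(id));
  } else {
    buf.writeBigUInt64LE(BigInt(id));
  }
  return buf;
}

// True if keys for ids 0..n-1, produced in insertion order, are strictly
// increasing under bytewise comparison (i.e. every insert is an append).
function insertsAreAppends(n, bigEndian) {
  let prev = null;
  for (let id = 0; id < n; id++) {
    const key = encodeKey(id, bigEndian);
    if (prev !== null && Buffer.compare(prev, key) >= 0) {
      return false; // this key sorts before a key inserted earlier
    }
    prev = key;
  }
  return true;
}

console.log("little-endian keys sorted:", insertsAreAppends(1000, false)); // → false
console.log("big-endian keys sorted:", insertsAreAppends(1000, true)); // → true
```

With little-endian encoding, id 256 (`00 01 …`) sorts before id 255 (`ff 00 …`), so new keys land in the middle of existing key ranges. With big-endian encoding every new key is appended at the end, a write pattern that tends to keep compaction work low.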

This should also reduce the number of I/O operations needed for compactions, which, for example, lowers the IOPS you need when running ArangoDB on AWS EBS volumes or similar storage.

Please note that this optimization only takes effect when an ArangoDB 3.4 server is started with a new database directory.

Improved RocksDB Geo Index Performance

The rewritten geo index implementation in 3.4 speeds up the RocksDB-based geo index functionality by a factor of 3 to 6 for many common cases, compared with the RocksDB-based geo index in 3.3.

A notable implementation detail of previous versions of ArangoDB was that accessing a RocksDB collection with a geo index acquired a collection-level lock.

This severely limited concurrent access to RocksDB collections with geo indexes in previous versions. This requirement is now gone and no extra locks need to be acquired when accessing a RocksDB collection with a geo index.

Improved Memory Management

A frequent issue in database systems is allocating and limiting your system's RAM effectively. With ArangoDB 3.4 it is now possible to cap the amount of RAM that RocksDB uses for its in-memory write buffers at a fixed size via the `--rocksdb.total-write-buffer-size` option.

This is especially important for low-memory systems and embedded applications.
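To make the effect of a total write-buffer limit more concrete, here is a small simulation in plain JavaScript. It is not ArangoDB or RocksDB code: it assumes that writes are collected in per-collection in-memory buffers and that exceeding the shared limit forces buffers to be flushed to disk. The class name `WriteBufferBudget` and the "flush the largest buffer first" policy are hypothetical simplifications; RocksDB applies its own selection policy.

```javascript
// Simplified model of a shared write-buffer budget (hypothetical, not
// ArangoDB/RocksDB source). Writes accumulate in per-collection memory
// buffers; whenever their combined size exceeds the limit, buffers are
// flushed "to disk" until memory usage is back under the limit.
class WriteBufferBudget {
  constructor(limitBytes) {
    this.limitBytes = limitBytes; // plays the role of the total write-buffer size
    this.buffers = new Map();     // collection name -> bytes held in memory
    this.flushedBytes = 0;        // bytes moved out of memory by flushes
  }

  totalBuffered() {
    let total = 0;
    for (const bytes of this.buffers.values()) total += bytes;
    return total;
  }

  write(collection, bytes) {
    this.buffers.set(collection, (this.buffers.get(collection) || 0) + bytes);
    // Enforce the budget. Picking the largest buffer is a sketch-only policy.
    while (this.totalBuffered() > this.limitBytes) {
      let victim = null;
      for (const [name, size] of this.buffers) {
        if (victim === null || size > this.buffers.get(victim)) victim = name;
      }
      this.flushedBytes += this.buffers.get(victim);
      this.buffers.set(victim, 0);
    }
  }
}

const budget = new WriteBufferBudget(64 * 1024 * 1024); // 64 MiB cap
for (let i = 0; i < 1000; i++) {
  // 256 KiB writes alternating between two collections
  budget.write(i % 2 === 0 ? "users" : "orders", 256 * 1024);
}
console.log("in memory:", budget.totalBuffered(), "flushed:", budget.flushedBytes);
```

However much data is written, the in-memory total never exceeds the cap; the excess is pushed out by flushes. In a real deployment, the limit is set in bytes as a startup option of the ArangoDB server.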

Improved Replication Speed

With a new ArangoDB 3.4 setup, synchronization between servers (i.e. Cluster, Active-Failover, Master-Slave) is now up to 10x faster than before.

This was achieved by reducing the number of disk operations needed during the crucial incremental synchronization step. Additionally, we optimized the write performance of replication operations.
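The general idea behind incremental synchronization is that a follower compares compact checksums of key ranges with the leader and only transfers the ranges that differ, instead of re-reading every document. The following plain-JavaScript sketch is a simplified, hypothetical model of that idea: chunk size, the SHA-1 checksum, and all function names are our own assumptions. It does not reproduce ArangoDB's replication protocol or the disk-level optimizations described above.

```javascript
const crypto = require("crypto");

// Checksum over a sorted list of [key, revision] pairs.
function hashRange(entries) {
  const h = crypto.createHash("sha1");
  for (const [key, rev] of entries) h.update(`${key}:${rev};`);
  return h.digest("hex");
}

// Leader side: split its sorted entries into chunks, each described by its
// lowest key, highest key and a checksum over the entries in between.
function leaderChunks(entries, chunkSize) {
  const chunks = [];
  for (let i = 0; i < entries.length; i += chunkSize) {
    const part = entries.slice(i, i + chunkSize);
    chunks.push({ low: part[0][0], high: part[part.length - 1][0], sum: hashRange(part) });
  }
  return chunks;
}

// Follower side: hash its own entries inside each leader-defined key range
// and keep only the ranges whose checksums differ. (Simplification: keys
// outside the leader's overall key range are ignored here.)
function rangesToSync(leaderEntries, followerEntries, chunkSize) {
  return leaderChunks(leaderEntries, chunkSize).filter(({ low, high, sum }) => {
    const mine = followerEntries.filter(([key]) => key >= low && key <= high);
    return hashRange(mine) !== sum;
  });
}

// 100 documents with zero-padded keys so that string order matches numeric order.
const leader = [];
for (let i = 0; i < 100; i++) leader.push([`k${String(i).padStart(3, "0")}`, 1]);
const follower = leader.map(([k, r]) => [k, r]);
follower[42] = ["k042", 2]; // one document differs on the follower

const stale = rangesToSync(leader, follower, 10);
console.log(stale.map((c) => `${c.low}..${c.high}`)); // → [ 'k040..k049' ]
```

Only one of the ten chunks needs a closer look, so only about a tenth of the key range has to be read and compared document by document.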

New Collection-Level Document Caches

As you may know, RocksDB has always offered a block-level cache. This cache is shared by the entire database, which is good for controlling overall memory usage, but sometimes you need finer-grained control over caching.

The new collection-level caches let you turn on caching for particularly read-intensive workloads on specific collections.

The new per-collection property `cacheEnabled` enables in-memory caching of documents and primary index entries. This can speed up point lookups significantly, especially if a collection has a subset of frequently accessed documents.

The option can be enabled for a collection as follows:

```js
// Replace <collection> with the name of your collection
db.<collection>.properties({ cacheEnabled: true });
```

If the cache is enabled, it will be consulted when reading documents and primary index entries for the collection. If there is a cache miss and the document or primary index entry has to be looked up from the RocksDB storage engine, the cache will be populated.

Memory for the documents and primary index entries cache is provided by ArangoDB's central cache facility, whose maximum size can be configured via the startup option `--cache.size`.
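The read path described above is a read-through cache: check the cache first, and on a miss load from storage and populate the cache. The sketch below models that behavior in plain JavaScript, with one shared cache standing in for the central cache facility and its capacity playing the role of `--cache.size`. The class name `CentralCache`, the entry-count capacity (the real option is a size in bytes) and the LRU eviction policy are assumptions for illustration; ArangoDB's cache is implemented in C++ with its own eviction rules.

```javascript
// Hypothetical model of a shared read-through cache (not ArangoDB code).
class CentralCache {
  constructor(capacity) {
    this.capacity = capacity; // max entries; stands in for the cache size limit
    this.entries = new Map(); // a Map keeps insertion order, giving a simple LRU
    this.hits = 0;
    this.misses = 0;
  }

  // Consult the cache first; on a miss, load from storage and populate.
  get(collection, key, loadFromStorage) {
    const id = `${collection}/${key}`;
    if (this.entries.has(id)) {
      const doc = this.entries.get(id);
      this.entries.delete(id); // re-insert to mark as most recently used
      this.entries.set(id, doc);
      this.hits++;
      return doc;
    }
    this.misses++;
    const doc = loadFromStorage(key); // the storage-engine lookup in a real system
    this.entries.set(id, doc);
    if (this.entries.size > this.capacity) {
      // Evict the least recently used entry (the first key in the Map).
      this.entries.delete(this.entries.keys().next().value);
    }
    return doc;
  }
}

const storage = (key) => ({ _key: key, value: key.length }); // fake storage lookup
const cache = new CentralCache(2);
cache.get("users", "alice", storage); // miss, populates the cache
cache.get("users", "alice", storage); // hit
cache.get("users", "bob", storage);   // miss
cache.get("orders", "o1", storage);   // miss, evicts users/alice (least recently used)
cache.get("users", "alice", storage); // miss again
console.log(cache.hits, cache.misses); // → 1 4
```

Because all cache-enabled collections share one budget, a frequently read collection can crowd out entries from others; collections without `cacheEnabled` would simply bypass `get()` and read from storage directly.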

Jan Steemann

After more than 30 years of playing around with 8-bit computers, assembler and scripting languages, Jan decided to move on to database engineering. Jan is now a senior C/C++ developer on the ArangoDB core team and has been there since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB's query language).
