
RocksDB Integration in ArangoDB – FAQs

The new release of ArangoDB 3.2 is just around the corner and will include major improvements such as distributed graph processing with Pregel and a powerful export tool. Most importantly, we have integrated Facebook’s RocksDB as the first pluggable storage engine in ArangoDB. With RocksDB you will be able to work with as much data in ArangoDB as fits on your disk.

As this is an important change and many questions have reached us from the community, we want to share answers to the most common ones. Please find them below.

Will I be able to go beyond the limit of RAM?

Yes. By choosing RocksDB as your storage engine you will be able to work with as much data as fits on your disk.

What is the locking behaviour with RocksDB in ArangoDB?

With RocksDB as your storage engine, locking is done at the document level for writes, and reads require no locks. Concurrent writes of the same documents will cause write-write conflicts that will be propagated to the calling code, so users can retry the operations when required.

… when you say “Concurrent writes of the same documents will cause write-write conflicts that will be propagated to the calling code”, does it mean that the behavior will differ from currently? Won’t writes try to acquire a lock on the document first?

Yes, the behavior will differ from the current one. The current (MMFiles) engine uses collection-level locks, so write-write conflicts are not possible. The RocksDB engine uses document-level locks, so write-write conflicts are possible.

Consider the following example of two transactions T1 and T2 both trying to write a document in collection “c”.

In the old (MMFiles) engine, these transactions would be serialized, for example:

T1 begins
T1 writes document “a” in collection “c”
T1 commits
T2 begins
T2 writes document “a” in collection “c”
T2 commits

so no write conflicts here.

In the RocksDB engine, the transactions can run in parallel, but as they modify the same document, it needs to be locked to prevent lost updates. The following scheduling will cause a write-write conflict:

T1 begins
T2 begins
T1 writes document “a” in collection “c”
T2 writes document “a” in collection “c”

Here, one of the transactions (T2) will be aborted to prevent an unnoticed lost update. The resulting write-write conflict is propagated to the calling code, so users can retry the operation when required.
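On the application side, handling such a conflict boils down to catching the error and re-running the operation. Here is a minimal sketch in Python; `WriteConflict` and `flaky_write` are hypothetical stand-ins for whatever your ArangoDB driver raises and does, not real API names:

```python
class WriteConflict(Exception):
    """Hypothetical stand-in for the driver's write-write conflict error."""


def retry_on_conflict(operation, max_retries=3):
    """Run `operation`, retrying it on write-write conflicts.

    Re-raises the conflict if all `max_retries` attempts fail.
    """
    for attempt in range(max_retries):
        try:
            return operation()
        except WriteConflict:
            if attempt == max_retries - 1:
                raise


# Demo: an operation that conflicts twice, then succeeds on the third try.
attempts = {"n": 0}

def flaky_write():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise WriteConflict("conflict writing document 'a' in collection 'c'")
    return "written"

result = retry_on_conflict(flaky_write)  # succeeds on the third attempt
```

Whether a blind retry is appropriate depends on the operation: for idempotent updates it is usually fine, while read-modify-write cycles should re-read the document before retrying.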

When using RocksDB as a storage engine, will I need a fast disk/SSD, given that the indexes are disk-based?

It will be beneficial to use fast storage. This is true for the memory-mapped files storage engine as well as for the RocksDB-based storage engine.

Will I be able to choose how different collections are stored, or will it be a per-database choice?

It is a per-server/cluster choice. It is not yet possible to mix modes or to use different storage engines in the same ArangoDB instance or cluster.

Can I switch from RocksDB to memory-mapped files with a collection or a database?

It is a per-server/cluster choice, and it must be made before the first server start. The first start stores the storage engine selection in a file on disk, and this file is validated on all subsequent restarts. If the storage engine must be changed after the initial choice, the data can be dumped from the ArangoDB instance with arangodump, arangod can then be restarted with an empty database directory and the other storage engine, and the dumped data can be loaded back with arangorestore.
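As a sketch, such a migration could look like the following on a single-server setup. The endpoint and paths are placeholders to adapt to your deployment:

```shell
# Dump the data from the running instance
arangodump --server.endpoint tcp://127.0.0.1:8529 --output-directory dump

# Stop the server, then start it again with an empty database directory
# and the RocksDB engine selected
arangod --server.storage-engine rocksdb --database.directory /path/to/empty-dir

# Load the dump into the new instance
arangorestore --server.endpoint tcp://127.0.0.1:8529 --input-directory dump
```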

Do indexes always store on disk now? Or only persisted type of index?

If you choose RocksDB as your storage engine, all indexes will be persisted on disk.

I’m using Microsoft Azure where virtual machines have very fast local SSD disks that are unfortunately “temporary” (meaning they may not survive a reboot), compared to slower but persistent network-attached disks (that can be SSD as well). Would there be any way to leverage the local disk? I’m thinking about something like, using the local disk for fast queries but having the data persisted to the network-attached disk?

RocksDB in general allows specifying different data directories for the different levels of the database. Data on lower levels is newer data, so it would in principle be possible to write low-level data to SSD first and have RocksDB move it to slower HDDs or network-attached disks as it is compacted to higher levels. Note that this is an option that RocksDB offers but that ArangoDB does not yet exploit. In general we don’t think the “read from fast SSD vs. read from slow disks” decision can be made on a per-query basis, because a query may touch arbitrary data. But recent data, or data that is accessed often, will likely sit in RocksDB’s in-memory block cache anyway.


If you would like to dig a bit deeper into the improvements in 3.2, you can find more information in our release notes. If you would like to take the latest technical preview including RocksDB for a spin, you can download ArangoDB 3.2alpha4.

We hope to have covered all important questions. Please let us know if we missed something important via hackers@arangodb.com.

Julie Ferrario
