Improved Deadlock Detection

The upcoming ArangoDB version 2.8 (currently in devel) will provide a much better deadlock detection mechanism than its predecessors.

The new deadlock detection mechanism will kick in automatically when it detects operations that are mutually waiting for each other. If it finds such a deadlock, it will abort one of the operations so that the others can continue and overall progress can be made.

In previous versions of ArangoDB, deadlocks could make operations wait forever, requiring the server to be stopped and restarted.

How deadlocks can occur

Here’s a simple example of how to get into a deadlock state:

Transaction A wants to write to collection c1 and to read from collection c2. In parallel, transaction B wants to write to collection c2 and read from collection c1. If the sequence of operations is interleaved as follows, then the two transactions prevent each other from making progress:

  • transaction A successfully acquires write-lock on c1
  • transaction B successfully acquires write-lock on c2
  • transaction A tries to acquire read-lock on c2 (and must wait for B)
  • transaction B tries to acquire read-lock on c1 (and must wait for A)
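One common way to detect this kind of situation is to keep a "wait-for" graph and look for cycles in it. The following is only a minimal sketch of that general idea, not ArangoDB's actual implementation: the class `WaitForGraph` and its methods `addWait` and `inDeadlock` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical wait-for graph (illustration only, not ArangoDB code):
// an edge X -> Y means "transaction X waits for a lock held by Y".
class WaitForGraph {
 public:
  // Record that `waiter` is blocked on a lock currently held by `holder`.
  void addWait(const std::string& waiter, const std::string& holder) {
    edges_[waiter].insert(holder);
  }

  // True if `start` transitively waits for itself, i.e. it is part of a cycle.
  bool inDeadlock(const std::string& start) const {
    std::set<std::string> visited;
    return reaches(start, start, visited);
  }

 private:
  // Depth-first search: can `target` be reached by following edges from `from`?
  bool reaches(const std::string& from, const std::string& target,
               std::set<std::string>& visited) const {
    auto it = edges_.find(from);
    if (it == edges_.end()) return false;  // `from` is not waiting for anyone
    for (const auto& next : it->second) {
      if (next == target) return true;
      if (visited.insert(next).second && reaches(next, target, visited)) {
        return true;
      }
    }
    return false;
  }

  std::map<std::string, std::set<std::string>> edges_;
};

// Replays the interleaving from the list above.
void deadlockExample() {
  WaitForGraph graph;
  graph.addWait("A", "B");         // A wants a read-lock on c2, held by B
  assert(!graph.inDeadlock("A"));  // B is not waiting yet, so no cycle
  graph.addWait("B", "A");         // B wants a read-lock on c1, held by A
  assert(graph.inDeadlock("B"));   // cycle A -> B -> A: abort one of them
}
```

Once a cycle is found, aborting any one transaction in it removes an edge and lets the remaining transactions proceed.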

Here are two such transactions being started from two ArangoShell instances in parallel (left is A, right is B):


(note that this screenshot is from 2.8 and the automatic deadlock detection had already detected the deadlock and aborted one of the transactions) (more…)

Posted in Architecture | 3 Comments

Lockfree protection of data structures that are frequently read


In multi-threaded applications running on multi-core systems, there are often data structures that are frequently read but seldom changed. An example of this would be a database server that has a list of databases that changes rarely, but needs to be consulted for every single query hitting the database. In such situations one needs to guarantee fast read access as well as protection against inconsistencies, use-after-free and memory leaks.

Therefore we seek a lock-free protection mechanism that scales to lots of threads on modern machines and uses only C++11 standard library methods. The mechanism should be easy to use and easy to understand and prove correct. This article presents a solution to this, which is probably not new, but which we still did not find anywhere else.

The concrete challenge at hand

Assume a global data structure on the heap and a single atomic pointer P to it. If (fast) readers access this completely unprotected, then a (slow) writer can create a completely new data structure and then change the pointer to the new structure with an atomic operation. Since writing is not time-critical, one can easily use a mutex to ensure that there is only a single writer at any given time. The only problem is deciding when it is safe to destroy the old value, because the writer cannot easily know that no reader is still accessing it. The challenge is aggravated by the fact that, without thread synchronization, it is unclear when a reader actually sees the new pointer value, in particular on a multi-core machine with a complex system of caches.
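To make the setup concrete, here is a minimal C++11 sketch of the situation described above. It only illustrates the problem, not the solution the article goes on to present. The names `DatabaseList`, `Registry`, `read` and `addDatabase` are hypothetical. The writer deliberately hands the old version back instead of deleting it, because it cannot know whether a reader is still using it.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical read-mostly structure, e.g. the list of database names.
struct DatabaseList {
  std::vector<std::string> names;
};

class Registry {
 public:
  Registry() : current_(new DatabaseList()) {}
  ~Registry() { delete current_.load(); }

  // Fast path: readers take no lock, they only load the atomic pointer P.
  const DatabaseList* read() const {
    return current_.load(std::memory_order_acquire);
  }

  // Slow path: build a modified copy and publish it with one atomic store.
  // Returns the old version, because deciding when it is safe to delete it
  // is exactly the open problem.
  const DatabaseList* addDatabase(const std::string& name) {
    std::lock_guard<std::mutex> guard(writerMutex_);  // one writer at a time
    const DatabaseList* old = current_.load(std::memory_order_relaxed);
    DatabaseList* fresh = new DatabaseList(*old);
    fresh->names.push_back(name);
    current_.store(fresh, std::memory_order_release);
    return old;  // NOT deleted here: a reader may still be using it
  }

 private:
  std::atomic<DatabaseList*> current_;
  std::mutex writerMutex_;
};

void registryExample() {
  Registry registry;
  const DatabaseList* before = registry.read();
  const DatabaseList* old = registry.addDatabase("_system");
  assert(old == before);
  assert(before->names.empty());  // the old version is still intact
  assert(registry.read()->names.size() == 1);
  delete old;  // safe only because this single-threaded demo has no readers
}
```

In a real multi-threaded server the final `delete` is the unsafe step: some mechanism has to tell the writer that no reader still holds the old pointer.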

If you want to see our solution directly, scroll down to “Source code links”. We first present a classical good approach and then try to improve on it. (more…)

Posted in Architecture, Security | 4 Comments

Running V8 isolates in a multi-threaded ArangoDB database

ArangoDB allows running user-defined JavaScript code in the database. This can be used for more complex, stored procedures-like database operations. Additionally, ArangoDB’s Foxx framework can be used to make any database functionality available via an HTTP REST API. It’s easy to build data-centric microservices with it, using the scripting functionality for tasks like access control, data validation, sanitation etc.

We often get asked how the scripting functionality is implemented under the hood. Additionally, several people have asked how ArangoDB’s JavaScript functionality relates to node.js.

This post tries to explain that in detail.


Posted in API, Architecture, C++, Documentation, nodejs | Leave a comment

Fulltext Index Enhancements

This post is about improvements for the fulltext index in ArangoDB 2.6. The improvements address the problem that non-string attributes were ignored when fulltext-indexing.

Effectively this prevented string values inside arrays or objects from being indexed. Though this behavior was documented, it considerably limited the usefulness of the fulltext index. Several users requested that the fulltext index be able to index array and object attributes, too.

Finally this has been accomplished, so the fulltext index in 2.6 supports indexing arrays and objects!

Read on in Jan’s blog post about Fulltext Index Enhancements.

Posted in API, Architecture | Leave a comment

Agile development vs. schema enforcement – a paradox resolved

The fans of modern and agile software development usually propose to use schemaless database engines to allow for greater flexibility, in particular during the early rapid prototyping phase of IT projects. The more traditionally minded insist that having a strict schema that is enforced by the persistence layer throughout the lifetime of a project is necessary to ensure quality and security.
In this post I would like to explain briefly why I believe that both groups are completely right and why this is not as paradoxical as it sounds at first glance. (more…)

Posted in Architecture | Leave a comment

Building Hypermedia APIs – a Design Approach using Statecharts

This is the second blog post on building hypermedia APIs with the focus on API design. In part 1 Lucas describes the concept of links in JSON.

Imagine we have an API where people can like books and other people can then see who likes a certain book. We want this API to be highly connected: we don’t want to look up URLs in documentation, we want to follow links as we know them from the World Wide Web. All we want to do as the author of the API is give our users a single URL from which they can then follow links to all other resources. This is similar to the way we would do this with a website. Leonard Richardson and Mike Amundsen refer to this as the billboard URL for this reason: if you put this URL on a billboard, people know everything they need to get started with your API. (more…)

Posted in API, Architecture, Foxx | Leave a comment

Modeling Data in MongoDB vs ArangoDB

MongoDB is a document DB whereas ArangoDB is a multi-model DB supporting documents, graphs and key/values within a single database. When it comes to data modeling and data querying, they pursue somewhat different approaches.

In a Nutshell: In MongoDB, data modeling is “aggregate-oriented”, avoiding relations and joins. On the other hand, everybody has probably used relational databases, which organize the data in tables with relations and try to avoid as much redundancy as possible. Both approaches have their pros and cons. ArangoDB is somewhat in-between: You can model and query your data in a “relational way” but also in an “aggregate-oriented way”, depending on your use case. ArangoDB offers joins, nesting of sub-documents and multi-collection graphs. (more…)

Posted in Architecture, Documentation, Graphs, Query Language | 5 Comments

FullStack London

I recently had the chance to visit FullStack London, a well-organized conference. Thanks a lot to Skills Matter. FullStack was opened by Douglas Crockford with a talk about “The Better Parts” of ES6. I cannot wait to start using them. Douglas was followed by Isaac Schlueter talking about open source in companies. Although this talk was not technical, I learned a lot and it was very inspiring.

The remainder of the conference was all about using JavaScript, mostly on the server side using Node.js or in robotics. As robotics is not my kind of topic, I visited the talks about server-side JS. They confirmed my impression of where JS development is heading: microservices. (more…)

Posted in Architecture, Conferences, Foxx, General, nodejs | Leave a comment

Disk space usage with different journal sizes

A while ago we wrote a blog article that explained how ArangoDB uses disk space. That article compared the disk usage of ArangoDB, CouchDB, and MongoDB for loading some particular datasets. In this post, we’ll show in more detail the disk usage of ArangoDB for insert, update, and delete operations. We’ll also compare it to CouchDB for reference.

Posted in Architecture, Performance | Leave a comment

Disk space usage in ArangoDB

In this post we’ll explain how ArangoDB stores collection data on disk and look at its storage space requirements, compared to other popular NoSQL databases such as CouchDB and MongoDB.

How ArangoDB allocates disk space

ArangoDB stores documents in collections. The collection data is persisted on disk so it does not get lost in case of a server restart.

Posted in Architecture, General, Performance | 2 Comments