Index Speedups in 2.8

The upcoming 2.8 version of ArangoDB will provide several improvements in the area of index usage and query optimization.

First of all, hash and skiplist indexes can now index individual array values. A dedicated post on this will follow shortly. Second, the query optimizer can make use of multiple indexes per collection for queries with OR-combined filter conditions. This again is a subject for another post. Third, there have been some speed improvements due to changes in the general index handling code. This is what this post is about.

To assess the speedups in 2.8, I re-ran some existing performance tests that I first used when comparing ArangoDB 2.5 with 2.6. The test cases and methodology are detailed in this earlier blog post.

To measure the index-related performance improvements, I simply re-ran the index-related tests in 2.7 and in 2.8 / devel. I did not bother re-running all tests from the original blog article because only some of them are index-related. In particular, I only ran these tests again:

  • join-key: for each document in the collection, perform a join on the _key attribute on the collection itself (i.e. FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1)
  • join-id: ditto, but perform the join using the _id attribute
  • join-hash-number and join-hash-string: ditto, but join using a hash index on a numeric or string attribute
  • join-skiplist-number and join-skiplist-string: ditto, but join using a skiplist index on a numeric or string attribute
  • lookup-key, lookup-hash-number, lookup-hash-string, lookup-skiplist-number, lookup-skiplist-string: compile an IN-list of 10,000 lookup values and search these 10,000 documents in the collection using either the primary index (_key attribute), a hash index or a skiplist index. The latter two are tested on numeric and string attributes.
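
As a rough illustration of the lookup tests, the following sketch compiles an IN-list of lookup values and pairs it with an AQL query and its bind parameters. The helper name `buildLookupTest`, the key format (`"test" + i`) and the collection name are assumptions made for this example; the actual test suite may construct its values differently.

```javascript
// Hypothetical helper: builds the AQL query and bind parameters for a
// lookup-style test. The key format ("test" + i) is an assumption made
// for illustration and is not taken from the original test suite.
function buildLookupTest(collectionName, attribute, count) {
  const values = [];
  for (let i = 0; i < count; ++i) {
    values.push("test" + i);
  }
  return {
    // the attribute name is a fixed, trusted test parameter, so it is
    // placed directly into the query string
    query: "FOR c IN @@c FILTER c." + attribute + " IN @values RETURN c",
    // "@c" binds the collection name, "values" binds the IN-list
    bindVars: { "@c": collectionName, values: values }
  };
}

// lookup-key variant: 10,000 values looked up via the _key attribute
const lookupKey = buildLookupTest("test10000", "_key", 10000);
```

Using a different attribute name in place of `_key` would give the hash and skiplist variants, provided the corresponding index exists on that attribute.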

The test queries were run 5 times each on collections containing 10,000, 100,000 and 1,000,000 documents.

Posted in Performance

Bi-Weekly #39 | VPack, MongoDB/Neo4j comparison, Case Study AboutYou

Our team used the last two weeks to push things forward, mainly the next 2.8 release, which comes with graph pattern matching, array indexes, performance improvements and deadlock detection, to name a few, and maybe not even the biggest ones yet. :) Furthermore, we’ve just published VelocyPack (VPack), a fast and compact format for serialization and storage. It’s open-source, freaking fast and gained a lot of attention on Twitter, and btw was even trending on GitHub. We appreciate contributions and marked some issues with the tag help wanted.

We’ve put a bit of work into comparisons of ArangoDB with MongoDB and with Neo4j to point out the main differences and some advantages of a broader multi-model approach.

…and as extra icing: the guys from Germany’s leading fashion site AboutYou (eCommerce | Otto Group) wrote a case study on “Data-Driven Personalization with ArangoDB”, showing some of the magic they create with our technology.

Happy Thanksgiving!

ArangoDB Releases

The maintenance releases ArangoDB 2.7.1 and 2.6.10 are available for download. Changes include a switch from readline to linenoise NG (a fork of linenoise by @antirez), more detailed output in arango-dfdb and fixes for replication issues. More details in our changelog.

Amazon Marketplace AMI 2.7.1: The marketplace image has been updated to the current version 2.7.1 of ArangoDB, so it takes just a few clicks to run ArangoDB on AWS.

The Official Docker Hub Image of ArangoDB supports the latest 2.7.1 as well.

Articles and Presentations

Posted in Newsletter

Foxx: Module resolution will change in 2.8

The implementation of the JavaScript require function will be adjusted to improve compatibility with npm modules. The current implementation in 2.7 and earlier versions of ArangoDB strictly adheres to the CommonJS module standard, which deviates from the behaviour implemented in Node and browser bundlers.

Module paths will now be resolved in the following ways:

  • relative paths (e.g. ./hello) will be resolved relative to the current file
  • absolute paths (e.g. /hello) will be resolved relative to the file system root
  • global names (e.g. hello) will be resolved by looking in the following places:
    1. In a node_modules folder in the current directory
    2. In a node_modules folder in any ancestor of the current directory
    3. In the js/node, js/server/modules or js/server/common folders of ArangoDB
    4. In the internal _modules collection
    5. In the base folder of the current Foxx service or module
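
The lookup order for global names can be sketched as a small function that lists the candidate locations in the order given above. This is an illustration only, not ArangoDB's resolver: the function name `candidateLocations`, the `<arangodb>/` prefix and the `_modules:` notation are placeholders invented for this example, and the relative order of the three folders in step 3 is assumed to follow the order in which they are listed.

```javascript
// Sketch only: returns the places a global module name would be looked up,
// in the order described in the list above. Not the actual resolver code.
function candidateLocations(name, currentDir, serviceBase) {
  const locations = [];
  // Steps 1 and 2: node_modules in the current directory, then in each
  // ancestor directory up to the file system root
  const parts = currentDir.split("/").filter(function (p) { return p !== ""; });
  for (let i = parts.length; i >= 0; --i) {
    const dir = "/" + parts.slice(0, i).join("/");
    locations.push((dir === "/" ? "" : dir) + "/node_modules/" + name);
  }
  // Step 3: ArangoDB's own module folders ("<arangodb>" is a placeholder
  // for the installation path)
  ["js/node", "js/server/modules", "js/server/common"].forEach(function (d) {
    locations.push("<arangodb>/" + d + "/" + name);
  });
  // Step 4: the internal _modules collection (placeholder notation)
  locations.push("_modules:" + name);
  // Step 5: the base folder of the current Foxx service or module
  locations.push(serviceBase + "/" + name);
  return locations;
}

const candidates = candidateLocations("hello", "/apps/shop/lib", "/apps/shop");
```

The point of the sketch is the position of the service's base folder: it is now the last candidate, whereas before 2.8 it was tried first.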

Prior to 2.8, global names and absolute paths were treated interchangeably and resolved against the service’s (or module’s) base folder first. This broke compatibility with some dependencies like babel-runtime (which has both a submodule core-js and an npm dependency in node_modules/core-js).

Note that Foxx services generated with the web admin interface in 2.7 and earlier use global names instead of relative paths and may need to be adjusted in order to work with ArangoDB 2.8 and later.

Posted in Documentation, Foxx

Improved Deadlock Detection

The upcoming ArangoDB version 2.8 (currently in devel) will provide a much better deadlock detection mechanism than its predecessors.

The new deadlock detection mechanism will kick in automatically when it detects operations that are mutually waiting for each other. If it finds such a deadlock, it will abort one of the operations so that the others can continue and overall progress can be made.

In previous versions of ArangoDB, deadlocks could make operations wait forever, requiring the server to be stopped and restarted.

How deadlocks can occur

Here’s a simple example of getting into a deadlock state:

Transaction A wants to write to collection c1 and to read from collection c2. In parallel, transaction B wants to write to collection c2 and read from collection c1. If the sequence of operations is interleaved as follows, then the two transactions prevent each other from making progress:

  • transaction A successfully acquires write-lock on c1
  • transaction B successfully acquires write-lock on c2
  • transaction A tries to acquire read-lock on c2 (and must wait for B)
  • transaction B tries to acquire read-lock on c1 (and must wait for A)
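
The situation above can be modelled as a "wait-for" graph in which each blocked transaction points to the transaction it is waiting for; a deadlock is a cycle in that graph. The post does not describe how ArangoDB implements its detection, so the following is only a sketch of this textbook approach, with the function name `findDeadlock` and the data layout chosen for illustration.

```javascript
// Sketch of deadlock detection on a wait-for graph (textbook approach,
// not ArangoDB's actual implementation).
// waitsFor maps a transaction name to the transaction it is waiting for.
function findDeadlock(waitsFor) {
  for (const start of Object.keys(waitsFor)) {
    const seen = [start];
    let current = waitsFor[start];
    while (current !== undefined) {
      if (current === start) {
        return seen; // followed the chain back to the start: a cycle
      }
      if (seen.indexOf(current) !== -1) {
        break; // a cycle that does not include start; found from another start
      }
      seen.push(current);
      current = waitsFor[current];
    }
  }
  return null; // no transaction is waiting on itself, directly or indirectly
}

// A waits for B (read-lock on c2), B waits for A (read-lock on c1)
const deadlocked = findDeadlock({ A: "B", B: "A" });
// A waits for B, but B is not blocked
const notDeadlocked = findDeadlock({ A: "B" });
```

Once such a cycle is found, aborting any one transaction in it breaks the cycle and lets the remaining ones proceed.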

Here are two such transactions being started from two ArangoShell instances in parallel (left is A, right is B):

[Screenshot: two ArangoShell instances running transactions A and B in parallel]

(Note that this screenshot is from 2.8, where the automatic deadlock detection had already detected the deadlock and aborted one of the transactions.)

Posted in Architecture

Using Bind Parameters in the AQL Editor

The AQL editor in the web interface is useful for running ad hoc AQL queries and trying things out. It provides a feature to explain the query and inspect its execution plan. This can be used to check whether the query uses indexes, and which ones.

So far the AQL editor only supported literal query strings and lacked support for bind parameters. Queries issued by application code, however, will often use bind parameters for security reasons. This frequently prevented copying and pasting queries between application code and the AQL editor without making manual adjustments.

This has been fixed in the upcoming ArangoDB version 2.8 (currently in development). Bind parameters can now be used in the AQL editor as well. Bind parameters can be entered as JSON values, the same format that is used for bind parameters in the HTTP REST API and in (JavaScript) application code.
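
As a small illustration, here is a query with bind parameters together with its values as a JSON object, and the request body that application code might send to the HTTP cursor API (POST /_api/cursor) with the `query` and `bindVars` attributes. The collection name `users`, the attribute `age` and the parameter names are made up for this example.

```javascript
// Query text with two bind parameters: @@collection (a collection name)
// and @minAge (a value). All names here are hypothetical.
const query = "FOR u IN @@collection FILTER u.age >= @minAge RETURN u";

// Bind parameter values as JSON. Note that the key for a collection
// parameter keeps one leading "@".
const bindVars = { "@collection": "users", "minAge": 21 };

// Request body as it could be sent to POST /_api/cursor
const requestBody = JSON.stringify({ query: query, bindVars: bindVars });
```

The `bindVars` object is the part that would go into the editor's bind parameter input, while the query text stays unchanged.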

The queries can also be saved in the AQL editor along with their bind parameter values for later reuse.

Screenshot from the feature in 2.8:


Posted in API, General

ArangoDB 2.7.1 Maintenance release available

In the first maintenance release of ArangoDB 2.7 we have switched from readline to linenoise NG for line-editing and history capabilities on the command line.

Furthermore, the AQL function MERGE() has been improved and now also works on a single array parameter. This allows combining the attributes of multiple objects from an array into a single object.
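
To illustrate the effect, the following JavaScript sketch emulates what calling MERGE() with a single array of objects produces. It is not the AQL implementation, the helper name `mergeAll` and the sample data are invented for this example, and it assumes the usual MERGE() behaviour that a later object wins when attribute names collide.

```javascript
// JavaScript emulation of AQL's MERGE() applied to one array of objects.
// Roughly corresponds to: RETURN MERGE([ {...}, {...}, ... ])
function mergeAll(objects) {
  // later objects overwrite attributes of earlier ones
  return Object.assign({}, ...objects);
}

const merged = mergeAll([
  { name: "first" },
  { age: 42 },
  { name: "second" }
]);
```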

Other changes in 2.7.1 address replication issues and various other bugs. You can find the complete changelog on GitHub.

Posted in Releases

ArangoDB Bi-Weekly #38 | Great Things To Come

Over the past two weeks we focused on some major improvements and a big, sweet partnership. With the upcoming release of ArangoDB 2.8 we will take another big step forward in performance and ease the handling of graphs by implementing sophisticated pattern matching.

Currently we are receiving a lot of feedback about ArangoDB being used in production. Enterprise and research teams from Germany, USA, France, Denmark, India, Japan, Finland, Brazil, Italy and many more are writing to us about their use cases. We are so freaking happy to see what these great guys are doing.

We will start a case study series soon to show you some of the magic these teams create!

Drivers and Integrations

Posted in Newsletter

ArangoDB Bi-Weekly #37 | ArangoDB 2.7, PostgreSQL added to NoSQL Benchmark

The next stable version, ArangoDB 2.7, is now available for download. Again we ran our intensive performance benchmark against other well-known databases. This time we added PostgreSQL to our list and gathered many insights about where we are and what we have to do. Find our analyses and the results here.

Our architect Max was invited to join #MesosCon in Dublin and talked about persistence primitives with our close friend Jörg from Mesosphere.

The new release ArangoDB 2.7 is available for download. After fixing the last issues we are happy to release the next stable version of ArangoDB with these improvements:

  • Index Buckets (reducing loading time for collections and enabling faster resizing)
  • Throughput Enhancements (real-world tests showed a 25-75% increase in throughput compared to 2.6)
  • Enhancements for AQL like “return distinct”, “template query strings” or the brand new “AQL query result cache”

You can find a full list of changes in our change-log (2.7)

Articles and Presentations

Posted in Newsletter

Benchmark: PostgreSQL, MongoDB, Neo4j, OrientDB and ArangoDB

In this blog post, which is a roundup of the performance blog series, I want to complete the picture of our NoSQL performance test and include some of the supportive feedback from the community. First of all, thanks for all your comments, contributions and suggestions to improve this open source NoSQL performance test (GitHub). This blog post describes a complete overhaul of the test, so there is no need to read all the previous articles to get the picture. Have a look at the appendix below for all the details on the hardware and software, the dataset and the tests used in this NoSQL performance comparison.

In response to many requests, I have now added PostgreSQL to the comparison, a popular RDBMS that supports a JSON data type. The relational data model is a perfect addition to our test suite, which now covers common project use cases (read/write and ad-hoc queries) as well as some social-network-related ones, implemented in tables, documents and/or graphs. How does a multi-model approach perform against its generic counterparts?

For this edition of the performance test I have also updated the software sources, replacing the custom preview/snapshot versions with the latest available products (releases or release candidates) of the particular databases and bumping the NodeJS version to 4.1.1. In response to user feedback I have also added another test (returning the whole profile data when requesting neighbors of neighbors) and increased the number of test cases for shortest path (40 instead of 19) and aggregation (1,000 instead of 500 vertices) due to performance improvements of all databases in the test field.

Posted in Performance

GA of ArangoDB 2.7 – Big + for Indexes, Throughput, AQL and Foxx

Long awaited, and now we’ve finished it! The new major release ArangoDB 2.7 is ready for download. First of all, a big thanks to our community for your great support! We’ve implemented a lot of your ideas! After your feedback on RC1 and RC2, we are happy to bring a new major release to the world. With ArangoDB 2.7 we increased our performance even further and improved query handling a lot.

What big improvements are in for you?

Index buckets

  • The primary indexes and hash indexes of collections can now be split into multiple index buckets.

Throughput Enhancements

  • A lot is not enough. Throughput is another key requirement for a premium database. Again we pushed our throughput a big step forward with 2.7.

AQL Improvements – Ease of Use and Performance

  • Our goal was to further shorten and ease the writing of statements. AQL has always been an efficient and intuitive query language similar to SQL but with ArangoDB 2.7 AQL got even better.

Find a detailed overview in our blogpost about RC1.

Furthermore, we fixed some issues and enabled Foxx apps to be installed underneath the URL path /_open/, so they can be (intentionally) accessed without authentication. The extensibility for your data-centric microservices got even bigger.


Posted in General