home shape

Using Multiple Indexes per Collection in ArangoDB

The query optimizer in ArangoDB 2.8 has been improved in terms of how it can make use of indexes. In previous versions of ArangoDB, the query optimizer could use only one index per collection used in an AQL query. When using a logical OR in a FILTER condition, the optimizer did not use any index for the collection in order to ensure the result is still correct.

This is much better in 2.8. Now the query optimizer can use multiple indexes on the same collection for FILTER conditions that are combined with a logical OR.

For all following queries, I have set up a collection named test, which has two isolated hash indexes on the attributes value1 and value2, and a skiplist index on attribute value3.

Let’s first try an AQL queries that uses a logical OR on two different attributes of the collection:

FOR doc IN test
  FILTER doc.value1 == 11 || doc.value2 == 19 
  RETURN doc

The execution plan for this query in 2.7 reveals that query will perform a full collection scan and cannot use indexes because of the logical OR on two different attributes:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      0     - FOR doc IN test   /* full collection scan */
  3   CalculationNode              0       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19
  4   FilterNode                   0       - FILTER #1
  5   ReturnNode                   0       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 none

Running the same query in 2.8 / devel will produce a much better execution plan:

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  6   IndexNode            2     - FOR doc IN test   /* hash index scan, hash index scan */
  3   CalculationNode      2       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19  
  4   FilterNode           2       - FILTER #1
  5   ReturnNode           2       - RETURN doc

Indexes used:
 By   Type   Collection   Unique   Sparse   Selectivity   Fields         Ranges
  6   hash   test         false    false       100.00 %   [ `value1` ]   doc.`value1` == 11
  6   hash   test         false    false       100.00 %   [ `value2` ]   doc.`value2` == 19

Optimization rules applied:
 Id   RuleName
  1   use-indexes

Multiple indexes will also be used if different index types are accessed, or for non-equality filter conditions. For example, the following query will make use of the two hash indexes and also the skiplist index:

FOR doc IN test 
  FILTER doc.value1 == 11 || doc.value2 == 19 || doc.value3 > 42 
  RETURN doc

Here is its execution plan from 2.8:

Execution plan:
 Id   NodeType          Est.   Comment
  1   SingletonNode        1   * ROOT
  6   IndexNode            3     - FOR doc IN test   /* hash index scan, hash index scan, skiplist index scan */
  3   CalculationNode      3       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42 
  4   FilterNode           3       - FILTER #1
  5   ReturnNode           3       - RETURN doc

Indexes used:
 By   Type       Collection   Unique   Sparse   Selectivity   Fields         Ranges
  6   hash       test         false    false       100.00 %   [ `value1` ]   doc.`value1` == 11
  6   hash       test         false    false       100.00 %   [ `value2` ]   doc.`value2` == 19
  6   skiplist   test         false    false            n/a   [ `value3` ]   doc.`value3` > 42

Optimization rules applied:
 Id   RuleName
  1   use-indexes

For comparison, here is the non-optimized plan from 2.7 for the same query:

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  2   EnumerateCollectionNode      0     - FOR doc IN test   /* full collection scan */
  3   CalculationNode              0       - LET #1 = doc.`value1` == 11 || doc.`value2` == 19 || doc.`value3` > 42
  4   FilterNode                   0       - FILTER #1
  5   ReturnNode                   0       - RETURN doc

Indexes used:
 none

Optimization rules applied:
 none

Still the query optimizer will not be able to use any indexes on a collection when there are multiple FILTER conditions combined with logical OR and at least one of them is not satisfiable by an index of the collection. In this case it has no other choice but to do a full collection scan.

For queries that combine multiple FILTER conditions with a logical AND, the optimizer will still try to pick the most selective index for the query and use it for the collection.

Jan Steemann

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

3 Comments

  1. Haseb Ansari on January 14, 2016 at 3:49 pm

    how to do full text search in a collection with multiple attributes ? plsssssss some expert answer ???

    • jsteemann on January 18, 2016 at 9:31 am

      A fulltext index can be created on a single attribute. However, if the given attribute value is an array or object, the fulltext index will index the direct subvalues, e.g.

      { “text” : “some value” } => will index “some value”
      { “text” : [ “some value”, “foobar” ] } => will index “some value” and “foobar”
      { “text” : { “what” : “some value”, “else” : “foobar” } } => will index “some value” and “foobar”

      • Haseb Ansari on February 18, 2016 at 3:12 pm

        how will it handle indexing if I have an array and in that array I have another array which have my desired key:value……that means I have to index the value which lies inside a nested array……?????

Leave a Comment





Get the latest tutorials, blog posts and news: