home shape

An introduction to index hints and named indices

In previous releases, index usage in queries was governed entirely by internal decisions in the AQL query optimizer. If a query could theoretically use one of two indices, the optimizer would make a decision based on estimated cost, and there was no way to externally influence that decision. In ArangoDB 3.5 Community and Enterprise, we have introduced index hints as a way to override the optimizer in cases where the optimizer may make a less than optimal decision.

In order to facilitate index hints, and to generally make index management a bit easier, we have added support for naming indices. Previously, indices were only identified numerically, or by their full definition. First, let’s explore how to work with index names.

scroll down line

Working with index names

Suppose we create a test collection.

localhost:8529@test> const test = db._create("test")

Every collection starts with a primary index. Since this is created without any user input, it is instantiated with the fixed name primary.

localhost:8529@test> test.getIndexes()[0].name
primary

Similarly, for edge collections, the edge index is instantiated with the fixed name edge.

Now let’s see what happens with a secondary index.

localhost:8529@test> test.ensureIndex({type: 'hash', fields: ['value']})
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/246",
  "isNewlyCreated" : true,
  "name" : "idx_1635058787485220864",
  "selectivityEstimate" : 1,
  "sparse" : false,
  "type" : "hash",
  "unique" : false,
  "code" : 201
}

We find an auto-generated name idx_1635058787485220864. This isn’t necessarily the easiest name to remember, and unfortunately, we currently do not support renaming indices. So let’s drop the index, and recreate it with a name of our choosing, by specifying the name attribute.

localhost:8529@test> test.ensureIndex({type: 'hash', name: 'byValue', fields: ['value']})
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/332",
  "isNewlyCreated" : true,
  "name" : "byValue",
  "selectivityEstimate" : 1,
  "sparse" : false,
  "type" : "hash",
  "unique" : false,
  "code" : 201
}

That’s better.

The values used for index names must be unique with respect to the collection. So each collection may have an index with the name foo, but an individual collection may not have two indices each named foo.

Once we know a collection name, we may use it to look up the index definition.

localhost:8529@test> test.index("byValue")
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/332",
  "name" : "byValue",
  "sparse" : false,
  "type" : "hash",
  "unique" : false,
  "code" : 200
}
 
localhost:8529@test> db._index("test/byValue")
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/332",
  "name" : "byValue",
  "sparse" : false,
  "type" : "hash",
  "unique" : false,
  "code" : 200
}

We can also drop an index by name.

localhost:8529@test> test.dropIndex("byValue")
true
localhost:8529@test> db._dropIndex("test/byValue")
true

In this way, index names offer an alternative to working with numeric IDs, which we hope will be semantically useful and help make index management easier and more intuitive. Armed with this tool, let’s move on to index hints.

Providing index hints to the AQL query optimizer

Suppose that we have both a hash index and a skiplist index defined on the same field.

localhost:8529@test> test.ensureIndex({type: 'hash', name: 'byValueUnsorted', fields: ['value']})
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/3456",
  "isNewlyCreated" : true,
  "name" : "byValueUnsorted",
  "selectivityEstimate" : 1,
  "sparse" : false,
  "type" : "hash",
  "unique" : false,
  "code" : 201
}
localhost:8529@test> test.ensureIndex({type: 'skiplist', name: 'byValueSorted', fields: ['value']})
{
  "deduplicate" : true,
  "fields" : [
    "value"
  ],
  "id" : "test/3475",
  "isNewlyCreated" : true,
  "name" : "byValueSorted",
  "selectivityEstimate" : 1,
  "sparse" : false,
  "type" : "skiplist",
  "unique" : false,
  "code" : 201
}

If we want to run a simple equality lookup on the value field, the optimizer will generally prefer the hash index.

localhost:8529@test> db._explain('FOR doc IN test FILTER doc.value == "foo" RETURN doc')
Query String:
 FOR doc IN test FILTER doc.value == "foo" RETURN doc
 
Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode          1     - FOR doc IN test   /* hash index scan */
  5   ReturnNode         1       - RETURN doc
 
Indexes used:
 By   Name              Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  6   byValueUnsorted   hash   test         false    false       100.00 %   [ `value` ]   (doc.`value` == "foo")
 
Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2

We can use the OPTIONS syntax for the FOR loop to provide an indexHint to tell the optimizer we want to use the skiplist index instead

localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "byValueSorted"} FILTER doc.value == "foo" RETURN doc')
Query String:
 FOR doc IN test OPTIONS {indexHint: "byValueSorted"} FILTER doc.value == "foo" RETURN doc
 
Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode          1     - FOR doc IN test   /* skiplist index scan */
  5   ReturnNode         1       - RETURN doc
 
Indexes used:
 By   Name            Type       Collection   Unique   Sparse   Selectivity   Fields        Ranges
  6   byValueSorted   skiplist   test         false    false       100.00 %   [ `value` ]   (doc.`value` == "foo")
 
Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2

If we provide a hint that cannot be used, such as a non-existent index, or simply a non-compatible one, the optimizer will fall back to default behavior.

localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "primary"} FILTER doc.value == "foo" RETURN doc')
Query String:
 FOR doc IN test OPTIONS {indexHint: "primary"} FILTER doc.value == "foo" RETURN doc
 
Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode          1     - FOR doc IN test   /* hash index scan */
  5   ReturnNode         1       - RETURN doc
 
Indexes used:
 By   Name              Type   Collection   Unique   Sparse   Selectivity   Fields        Ranges
  6   byValueUnsorted   hash   test         false    false       100.00 %   [ `value` ]   (doc.`value` == "foo")
 
Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2

If we want to enforce that the index is used, or the query fails, for instance to guard against the case of a full collection scan in case an index creation fails, we can specify the forceIndexHint flag.

localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: "foo", forceIndexHint: true} FILTER doc.value == "foo" RETURN doc')
JavaScript exception in file '/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arangosh.js' at 99,7: ArangoError 1577: could not use index hint to serve query; {"indexHint":{"forced":true,"type":"simple","hint":["foo"]}} (while optimizing plan)
!      throw error;
!      ^
stacktrace: ArangoError: could not use index hint to serve query; {"indexHint":{"forced":true,"type":"simple","hint":["foo"]}} (while optimizing plan)
    at Object.exports.checkRequestResult (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arangosh.js:97:21)
    at ArangoStatement.explain (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arango-statement.js:123:12)
    at Object.explain (/home/dan/src/arangodb/arangodb/js/common/modules/@arangodb/aql/explainer.js:1791:21)
    at Proxy.ArangoDatabase._explain (/home/dan/src/arangodb/arangodb/js/client/modules/@arangodb/arango-database.js:981:38)
    at :1:4
 

In case there are multiple indices which could be used to serve a query, we can specify the hint as an array of index names, such that the optimizer will attempt to use them in the order specified. If none can be used, it will fall back to default behavior, or fail if the hint is forced.

localhost:8529@test> db._explain('FOR doc IN test OPTIONS {indexHint: ["foo", "byValueSorted"]} FILTER doc.value == "foo" RETURN doc')
Query String:
 FOR doc IN test OPTIONS {indexHint: ["foo", "byValueSorted"]} FILTER doc.value == "foo" RETURN doc
 
Execution plan:
 Id   NodeType        Est.   Comment
  1   SingletonNode      1   * ROOT
  6   IndexNode          1     - FOR doc IN test   /* skiplist index scan */
  5   ReturnNode         1       - RETURN doc
 
Indexes used:
 By   Name            Type       Collection   Unique   Sparse   Selectivity   Fields        Ranges
  6   byValueSorted   skiplist   test         false    false       100.00 %   [ `value` ]   (doc.`value` == "foo")
 
Optimization rules applied:
 Id   RuleName
  1   use-indexes
  2   remove-filter-covered-by-index
  3   remove-unnecessary-calculations-2
 

Limitations

As mentioned previously, we currently do not support renaming indices. In order to change the name of an index, we must drop and recreate it. Furthermore, for existing installations which will be upgraded to 3.5, existing indices will be automatically named based on their numeric identifier.

Index hints currently may only be used with FOR loops which iterate over collections. Traversals do not support index hints at this time.