An introduction to index hints and named indices

An introduction to index hints and named indices

In previous releases, index usage in queries was governed entirely by internal decisions in the AQL query optimizer. If a query could theoretically use one of two indices, the optimizer would make a decision based on estimated cost, and there was no way to externally influence that decision. In ArangoDB 3.5 Community and Enterprise, we have introduced index hints as a way to override the optimizer in cases where the optimizer may make a less than optimal decision.

In order to facilitate index hints, and to generally make index management a bit easier, we have added support for naming indices. Previously, indices were only identified numerically, or by their full definition. First, let’s explore how to work with index names.

Working with index names

Suppose we create a test collection.

Every collection starts with a primary index. Since this is created without any user input, it is instantiated with the fixed name primary.

Similarly, for edge collections, the edge index is instantiated with the fixed name edge.

Now let’s see what happens with a secondary index.

We find an auto-generated name idx_1635058787485220864. This isn’t necessarily the easiest name to remember, and unfortunately, we currently do not support renaming indices. So let’s drop the index, and recreate it with a name of our choosing, by specifying the name attribute.

That’s better.

The values used for index names must be unique with respect to the collection. So each collection may have an index with the name foo, but an individual collection may not have two indices each named foo.

Once we know a collection name, we may use it to look up the index definition.

We can also drop an index by name.

In this way, index names offer an alternative to working with numeric IDs, which we hope will be semantically useful and help make index management easier and more intuitive. Armed with this tool, let’s move on to index hints.

Providing index hints to the AQL query optimizer

Suppose that we have both a hash index and a skiplist index defined on the same field.

If we want to run a simple equality lookup on the value field, the optimizer will generally prefer the hash index.

We can use the OPTIONS syntax for the FOR loop to provide an indexHint to tell the optimizer we want to use the skiplist index instead.

If we provide a hint that cannot be used, such as a non-existent index, or simply a non-compatible one, the optimizer will fall back to default behavior.

If we want to enforce that the index is used, or the query fails, for instance to guard against the case of a full collection scan in case an index creation fails, we can specify the forceIndexHint flag.

In case there are multiple indices which could be used to serve a query, we can specify the hint as an array of index names, such that the optimizer will attempt to use them in the order specified. If none can be used, it will fall back to default behavior, or fail if the hint is forced.

Limitations

As mentioned previously, we currently do not support renaming indices. In order to change the name of an index, we must drop and recreate it. Furthermore, for existing installations which will be upgraded to 3.5, existing indices will be automatically named based on their numeric identifier.

Index hints currently may only be used with FOR loops which iterate over collections. Traversals do not support index hints at this time.

Do you like ArangoDB?
icon-githubStar this project on GitHub.
close-link