Arango Query Language - Query Optimization with Query Profiling


AQL Query Optimization with Query Profiling

Sometimes, when you have a complex query, it can be very unclear where time is spent during its execution. Depending on the actual number of documents that have to be processed, it might be worthwhile to spend some time optimizing certain parts of a query, while others barely matter.

For example, the collection scan and filtering might take up most of the time, and adding an index to your collection would be worth it.

To give you more insight into your query, ArangoDB now allows you to execute it with special instrumentation code enabled. This prints the query plan with detailed execution statistics. To use this in an interactive fashion on the shell, call db._profileQuery(..) in arangosh. Alternatively, there is a new Profile button in the Query tab of the web interface.

The printed execution plan then contains three additional columns:

  • Call: The number of times this query stage was executed
  • Items: The number of temporary result rows at this stage
  • Runtime: The total time spent in this stage

Below the execution plan there are additional sections for the overall runtime statistics and the query profile.

Simple AQL query

Assume we have a collection named collection and insert 100,000 documents via for (let i = 0; i < 100000; i++) db.collection.insert({value: i}). Then a simple query filtering for value < 10 should return 10 results:
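The query itself is missing above; based on the description, a minimal version might look like this (the collection name and value field follow the setup snippet; run it via db._profileQuery() in arangosh to get the instrumented plan):

```aql
FOR doc IN collection
  FILTER doc.value < 10
  RETURN doc
```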

An AQL query is essentially executed in a pipeline that chains together different functional execution nodes. Each block gets its input rows from its parent, does some processing, and then outputs a certain number of output rows.

Without any indexes this query should have to perform the following operations:

  1. Perform a full collection scan via an EnumerateCollectionNode and output a row containing the document in doc.
  2. Calculate the boolean expression LET #1 = doc.value < 10 from all inputs via a CalculationNode.
  3. Filter out all input rows where #1 is false via the FilterNode.
  4. Put the doc variable of the remaining rows into the result set via the ReturnNode.

Without any detailed insight into the query execution, it is impossible to tell how many results each stage had to work on and how long that took. By executing the query with the query profiler (db._profileQuery or the Profile button in the web interface), you can now tell exactly how much work each stage had to do:

The EnumerateCollectionNode processed and returned all 100k rows (documents), as did the CalculationNode. Because the AQL execution engine also uses an internal batch size of 1000 these blocks were also called 100 times each. The FilterNode as well as the ReturnNode however only ever returned 10 rows and only had to be called once, because the result size fits within a single batch.

Now let's add a skiplist index on value to speed up the query:

db.collection.ensureIndex({type: "skiplist", fields: ["value"]})

This results in the collection scan and filter blocks being replaced by an IndexNode. The execution pipeline of the AQL query has become much shorter. Also, the number of rows processed by each pipeline block is only 10, because we no longer need to look at all documents.

AQL with Subquery

Just for the sake of completeness, let's also consider a query containing a subquery:
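The original query is not shown here; a hypothetical example that produces a SubqueryNode, reusing the collection from above, might look like this:

```aql
FOR doc IN collection
  FILTER doc.value < 10
  LET count = LENGTH(
    FOR doc2 IN collection
      FILTER doc2.value == doc.value
      RETURN 1
  )
  RETURN { value: doc.value, count }
```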

The resulting query profile contains a SubqueryNode which has the runtime of all its children combined.

Actually, I cheated a little. The optimizer would have removed the subquery completely if I had not deactivated it. The optimized version would take longer in the “optimizing plan” stage, but should perform better with a lot of results.

AQL with aggregation

Let’s try a more advanced example using a COLLECT statement. Assume we have a users collection where each document has a city, a username, and an age value:
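The setup snippet is missing here; a hypothetical way to create such data in AQL could be the following (the document count and value ranges are made up for illustration):

```aql
FOR i IN 1..20
  INSERT {
    username: CONCAT("user", i),
    city: CONCAT("city", i % 5),
    age: 18 + (i % 60)
  } INTO users
```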

Now let's try a query that groups all users into age buckets (0-9, 10-19, 20-29, …):
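A sketch of such a query, matching the execution steps described below (the aggregate expressions are assumptions, since the original query is not shown):

```aql
FOR u IN users
  COLLECT ageGroup = FLOOR(u.age / 10) * 10
  AGGREGATE minAge = MIN(u.age), maxAge = MAX(u.age)
  SORT ageGroup
  RETURN { ageGroup, minAge, maxAge }
```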

Without any indexes this query should have to perform the following operations:

  1. Perform a full collection scan via an EnumerateCollectionNode and output a row containing the document in u.
  2. Calculate expressions like LET #1 = FLOOR(u.age / 10) * 10 from all inputs via multiple CalculationNodes.
  3. Perform the aggregation via the CollectNode.
  4. Sort the resulting aggregated rows via a SortNode.
  5. Build the result value via another CalculationNode.
  6. Put the result variable into the result set via the ReturnNode.

As in the example above, you can see that after the CollectNode stage, only a handful of the original 20 rows remained.

Typical AQL Performance Mistakes

With the new query profiler, you should be able to spot typical performance mistakes that we see quite often:

  • Not employing indexes to speed up queries with common filter expressions
  • Not using the shard key in filter statements when it is known (a cluster-only problem)
  • Using subqueries to calculate an intermediary result, but only using a few results

Bad example:
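The original snippet is missing here; a hypothetical query exhibiting this mistake might be the following, where the subquery computes and sorts many values although only the first one is used:

```aql
LET values = (
  FOR doc IN collection
    FILTER doc.value >= 10
    SORT doc.value
    RETURN doc.value
)
RETURN FIRST(values)
```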

Adding a LIMIT 1 inside the subquery would probably have gained some performance.

  • Starting a Graph Traversal from the wrong side (if both ends are known)

Assuming you have two vertex collections users, products as well as an edge collection purchased. The graph model would look like this (users) <--[purchased]--> (products), i.e. every user is connected with an edge in purchased to zero or more products.

Assume we want to find all users that have purchased the product playstation as well as products of type legwarmer. We could use this query:
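The query is not shown above; a hypothetical reconstruction, assuming the playstation product is stored with the _key "playstation", could look like this (starting from the legwarmer side and traversing two steps to reach the playstation via a common user):

```aql
FOR prod IN products
  FILTER prod.type == "legwarmer"
  FOR v, e, p IN 2..2 ANY prod purchased
    FILTER v._key == "playstation"
    RETURN p.vertices[1]
```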

This query first finds all legwarmer products and then performs a traversal for each of them. But we could also invert the traversal by starting from the known playstation product. This way we need only a single traversal to achieve the same result.
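An inverted version of the hypothetical query above, starting from the single known playstation product instead (same document-key assumption as before):

```aql
FOR v, e, p IN 2..2 ANY "products/playstation" purchased
  FILTER v.type == "legwarmer"
  RETURN p.vertices[1]
```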