Making Subqueries fast
Subqueries in AQL are an expressive and useful tool when writing queries. And now with a rewrite of the query execution for release 3.6, they are also pretty fast.
After laying the foundation with a major rework of the AQL internals in 3.5, we now improved the way subqueries are executed. This was achieved by allowing batching beyond the boundaries of a single subquery. We call it subquery splicing.
How much of a benefit this is depends, of course, a lot on the actual query. If a subquery produces several hundred or thousands of result rows, the effect will probably be negligible. On the other hand, for several subqueries that produce smaller amount of result rows, the difference is pretty significant. This is particularly true in a cluster, because a lot of HTTP requests and round trips can be saved by effective batching.
Benchmarks
One very common example for using subqueries are (left outer) joins like this one:
1 2 3 4 5 6 7 8 9 |
FOR doc IN someCollection LET others =( FOR x IN otherCollection FILTER doc.foreignKey == x._key RETURN x ) RETURN {doc, others} |
Such queries are now noticeably faster on a single server because the subquery can be inlined. In a cluster, it can be up to 30 times faster! For some nested subqueries, this can be even higher, though we’d like to restrain ourselves to more common query patterns for this test.
In the following, three examples are presented that we benchmarked which show how significantly the new execution scheme improves subquery performance. The same queries are run with and without the optimization rule splice-subqueries
. Disabling the rule forces the pre-3.6 execution scheme.
Self-join
The following query iterates over a collection, and for every document does a subquery which looks up the documents with the same attribute value via an index. For each document, the number of documents with a matching value is returned.
1 2 3 4 5 |
FOR c IN coll LET sub = (FOR s IN coll FILTER s.value1 == c.value1 RETURN s) RETURN LENGTH(sub) |
Side-by-side joins
This is similar to the above query, but additionally looks up all documents where another field matches for each document, and returns, per document, the summed number of matches.
1 2 3 4 5 6 |
FOR c IN coll LET sub1 = (FOR s IN coll FILTER s.value1 == c.value1 RETURN s) LET sub2 = (FOR s IN coll FILTER s.value2 == c.value1 RETURN s) RETURN LENGTH(sub1) + LENGTH(sub2) |
Nested joins
The last query also iterates over a whole collection, and for each document executes a subquery, which in turn does five different lookups in separate subqueries.
1 2 3 4 5 6 7 8 9 10 |
FOR c IN coll LET sub = ( FOR s IN 1..5 LET subsub = (FOR t IN coll FILTER t.value1 == c.value1 + s RETURN t) FILTER LENGTH(subsub) > 0 RETURN s ) RETURN LENGTH(sub) |
Results
To put the results into perspective, note that each query returns one result per document in the test collection. The three queries make 2, 3, and 6 document lookups times the collection size, respectively – that’s e.g. 200k, 300k and 600k lookups for a collection with 100k documents. The result set will, for all three queries, be of the same size as the whole collection, and the collection will be read multiple times in full. Thus the execution times are relatively high; this is so we get reliable and comparable results for the performance test. In principle, speedups are similar for smaller result sets as well.
Single server tests
All single server tests are run against a collection with 100.000 documents.
testname | collection | runs | avg (s) | avg speedup |
Self join, without splicing | 100k | 5 | 0.5680 | |
Self join, with splicing | 100k | 5 | 0.4875 | 1.17 |
Side-by-side joins, without splicing | 100k | 5 | 1.6362 | |
Side-by-side joins, with splicing | 100k | 5 | 0.9170 | 1.78 |
Nested joins, without splicing | 100k | 5 | 3.2399 | |
Nested joins, with splicing | 100k | 5 | 2.7808 | 1.17 |
Cluster tests
The cluster tests are done on a cluster with three machines, each running one Agent, Coordinator and DB-Server. The test collection contains 10.000 documents. Note that the lookups done in the collection cannot make use of any document locality, so every lookup has to be done on every DB-Server.
testname | collection | runs | avg (s) | avg speedup |
Self join, without splicing | 10k | 5 | 36.4326 | |
Self join, with splicing | 10k | 5 | 1.2663 | 28.8 |
Side-by-side joins, without splicing | 10k | 5 | 76.1388 | |
Side-by-side joins, with splicing | 10k | 5 | 2.7961 | 27.2 |
Nested joins, without splicing | 10k | 5 | 196.8784 | |
Nested joins, with splicing | 10k | 5 | 7.7322 | 25.5 |
Performance tests
If you would like to verify the performance yourself, the tests quoted above are
part of the ArangoDB simple performance tests repository. These tests were done with ArangoDB v3.6.0-rc1.
What do I have to do as a user
Nothing, apart from upgrading to 3.6. The optimizer will enable the new subqueries automatically (but see caveats below). Look out for the optimizer rule splice-subqueries
in the explain or profile output.
As always, make sure to take a backup before upgrading!
Caveats
The new execution scheme for subqueries cannot yet be used in some cases, particularly in conjunction with LIMIT
or COLLECT WITH COUNT INTO
operations. See the description of the optimizer rule splice-subqueries
in the AQL optimizer rules documentation for details.
We are already working on lifting this restriction, and you can expect it to be publicly available in the next minor version of ArangoDB.