For last week's ArangoDB 2.6 release we've put some performance tests together. The tests compare AQL query execution times in 2.5 and 2.6.
The results look quite promising: 2.6 outperformed 2.5 for all tested queries, mostly by factors of 2 to 5. A few dedicated AQL features got boosted even more, with query execution times reduced by 90 % or more. Finally, the tests also revealed one particular case for which 2.6 provides a several-hundredfold speedup.
Equally good news: not a single test query ran slower in 2.6 than in 2.5.
What was tested?
The tests execute several read-only AQL queries on datasets of different sizes and measure the query execution times. The tests were conducted in both ArangoDB 2.5 and 2.6, so the results of the two ArangoDB versions can be compared.
Though the tests do not cover every possible AQL operation, feature and function, they do cover a wide range, e.g. lookups, joins, `COLLECT` operations, sorting, subqueries, and some AQL functions. Overall, the test suite contains 33 different cases.
All queries were run on datasets of three different sizes to validate that the results are relevant for datasets of various sizes. The dataset sizes are 10,000 documents, 100,000 documents, and 1,000,000 documents. Each query was repeated a few times so outliers in execution time can be identified.
There is full disclosure of the test methodology and the test script below, so anyone interested can repeat the tests locally and verify the results.
The combined test results from 2.5 and 2.6 can be found in this PDF file. There is also an ods version of the same file here. A description of the columns and test cases used in these files can be found further below.
For the detail-loving folks, here are the raw results for both versions in isolation:
The results show that ArangoDB 2.6 was consistently faster for all AQL queries included in the tests.
The queries that improved most in 2.6 over 2.5 include:
- `FILTER` conditions as used in the tests are 3 to 5 times faster
- simple joins using the primary index (`_key` attribute), hash index or skiplist index are 2 to 3.5 times faster
- sorting on a string attribute is 2.5 to 3 times faster
- extracting the `_key` or other top-level attributes from documents is 4 to 5 times faster
- `COLLECT` statements like the ones in the tests are 7 to 15 times faster
- looking up documents using `IN` lists containing a substantial number of values is 250 to 700 times faster
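To get a feeling for where a speedup of that magnitude can come from: checking an `IN` list by scanning it linearly costs up to one comparison per list value for every candidate, while a sorted list allows a binary search. The following plain JavaScript sketch illustrates the difference in principle; it is not ArangoDB's actual implementation:

```javascript
// Linear scan: O(m) comparisons per lookup, as if the IN-list
// were checked value by value for every candidate.
function linearContains(values, x) {
  for (const v of values) {
    if (v === x) return true;
  }
  return false;
}

// Binary search on a sorted copy: O(log m) comparisons per lookup.
function sortedContains(sortedValues, x) {
  let lo = 0, hi = sortedValues.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (sortedValues[mid] === x) return true;
    if (sortedValues[mid] < x) lo = mid + 1;
    else hi = mid - 1;
  }
  return false;
}

// Example data: 10,000 values, even numbers only.
const values = [];
for (let i = 0; i < 10000; i++) values.push(i * 2);
const sorted = values.slice().sort((a, b) => a - b);
```

With 10,000 values, a binary search needs about 14 comparisons per lookup instead of up to 10,000, which is the right order of magnitude for a several-hundredfold speedup.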
The one thing that did not change much when comparing 2.6 with 2.5 is iterating over a collection and returning all its documents unmodified. The speedups observed for this type of query are between 18 and 25 %, which is the lowest speedup measured by the tests. Still, 18 to 25 % seems okay as a free take-away.
Speedups were observed for all three test dataset sizes alike. In some cases, the speedups varied a bit with the dataset sizes, but they were still in the same ballpark for all three datasets. The conclusion is thus that the speedups did not depend much on the dataset sizes.
Reasons for speedups
There are several reasons why the 2.6 performance is better than in previous versions. The main reason is that we spent a lot of time optimizing some of the critical AQL code paths. We also worked on optimizations for specific features used by some of the tested queries.
If you’re interested in the details, here they are:
- COLLECTing with a hash table
- AQL functions improvements
- Subquery optimizations
- Return value optimization for AQL
- IN-list improvements
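The hash-table-based `COLLECT` can be illustrated with a plain JavaScript sketch: instead of sorting the whole input by the group criterion first, groups are accumulated in a hash table in a single pass over the data. This is a simplified illustration, not the actual AQL executor code:

```javascript
// Group an array of documents by one attribute using a hash map.
// One pass over the data, no up-front sort of the whole input required.
function collectByAttribute(docs, attr) {
  const groups = new Map();
  for (const doc of docs) {
    const key = doc[attr];
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(doc);
  }
  return groups;
}

// Example: group by a repeating attribute, as the collect-* tests do.
const docs = [];
for (let i = 0; i < 100; i++) {
  docs.push({ _key: "test" + i, value7: i % 10 });
}
const grouped = collectByAttribute(docs, "value7");
```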
Additionally, UTF-8 string comparisons were boosted by the upgrade from ICU 52 to ICU 54. The latter version contains a rewritten and much faster UTF-8-aware strcmp, which we heavily rely on.
Test methodology
Each query was run five times on each dataset, so execution time outliers can be identified. The results contain the minimum, maximum and average execution times for each query.
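The aggregation over the five runs boils down to a small helper like the following (a sketch of the statistics calculation, not the test script itself):

```javascript
// Reduce a list of execution times (in seconds) to min/max/avg,
// as shown in the result tables.
function summarize(times) {
  let min = Infinity, max = -Infinity, sum = 0;
  for (const t of times) {
    if (t < min) min = t;
    if (t > max) max = t;
    sum += t;
  }
  return { min: min, max: max, avg: sum / times.length };
}

// Example with five hypothetical timings:
const stats = summarize([0.0760, 0.1638, 0.0943, 0.0801, 0.0912]);
```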
Queries were run in isolation on an otherwise idle server. The queries were all run inside the server, so there was no HTTP/network traffic involved for shipping query results (note: this was also vastly improved in 2.6 but this is not the subject of this post).
All tests were run on my local machine, which has 4 cores, 8 CPUs (though the number of CPUs will not matter for any of the tests), 12 GB of physical memory, a Linux 3.16 kernel and an Ubuntu 15 OS. All datasets fit into the main memory, so tests were not I/O-bound.
The ArangoDB versions tested were 2.5.4 and 2.6.0-alpha2. Both versions were hand-compiled with g++ 4.9.1 with options
CC='gcc' CXX='g++' CFLAGS='-O3 -Wall' CXXFLAGS='-O3 -Wall'.
The complete test script, including the setup of the test data, is contained in this file. It can be run inside arangod by typing the following in the server console:
Note that this needs an arangod instance started with the `--console` option. Also note that running the script only tests the current arangod instance, so the script needs to be run once in a 2.5 instance and once in a 2.6 instance.
Running the script will set up the test collections, run all queries on them (you will need some patience for this) and finally print a table like the following:
test name            | collection | runs | min (s) | max (s) | avg (s)
---------------------+------------+------+---------+---------+--------
collect-number       | 10k        |    5 |  0.0760 |  0.1638 |  0.0943
collect-number       | 100k       |    5 |  0.8697 |  0.8966 |  0.8803
collect-number       | 1000k      |    5 | 10.4320 | 10.6597 | 10.5314
collect-string       | 10k        |    5 |  0.1211 |  0.1319 |  0.1250
collect-string       | 100k       |    5 |  1.5406 |  1.5974 |  1.5641
collect-string       | 1000k      |    5 | 19.0708 | 19.0966 | 19.0825
collect-count-number | 10k        |    5 |  0.0763 |  0.0792 |  0.0778
These result columns have the following meanings:
- test name: name of test
- collection: name of collection. 10k is a collection with 10,000 documents, 100k contains 100,000 documents, and 1000k contains 1,000,000 documents.
- runs: number of times the query was run
- min (s): minimum query execution time (in seconds)
- max (s): maximum query execution time (in seconds)
- avg (s): average query execution time (in seconds)
The test datasets for the three collections are filled with artificial data. Test documents are created like this:

```javascript
{
  _key: "test" + i,
  value1: i,
  value2: "test" + i,
  value3: i,
  value4: "test" + i,
  value5: i,
  value6: "test" + i,
  value7: i % g,
  value8: "test" + (i % g)
}
```
Each document has a `_key` attribute and 8 other attributes, `value1` to `value8`. The attributes `value1`, `value3`, `value5` and `value7` are numeric attributes, the other attributes contain string values. The attributes `value1` to `value6` contain unique values. The attributes `value7` and `value8` contain repeating values; they are used for the `COLLECT` tests. The attributes `value1` and `value2` are each indexed with a hash index. The attributes `value3` and `value4` are each indexed with a skiplist index. The attributes `value5` to `value8` are not indexed. This way queries can be run on the same values, but with different indexes and even without indexes.
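Under these assumptions, the dataset setup can be sketched as a small generator function (a hypothetical helper written for illustration; the real setup is in the test script):

```javascript
// Build n artificial test documents following the pattern described above.
// g controls how many distinct values the repeating attributes take.
function makeTestDocs(n, g) {
  const docs = [];
  for (let i = 0; i < n; i++) {
    docs.push({
      _key: "test" + i,
      value1: i,               // numeric, unique
      value2: "test" + i,      // string, unique
      value3: i,               // numeric, unique
      value4: "test" + i,      // string, unique
      value5: i,               // numeric, unique
      value6: "test" + i,      // string, unique
      value7: i % g,           // numeric, repeating
      value8: "test" + (i % g) // string, repeating
    });
  }
  return docs;
}

const docs = makeTestDocs(10000, 10);
```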
The test cases cover the following queries:
- collect-number and collect-string: run `COLLECT` on a repeating attribute, which is either numeric or a string
- collect-count-number and collect-count-string: ditto, but also calculate the group lengths using `WITH COUNT INTO`
- subquery: run a single-document subquery for each document of the original collection
- concat: for each document in the collection, concatenate the document's `_key` attribute with another document attribute using `CONCAT()`
- merge: for each document in the collection, merge the document with another object using `MERGE()`
- keep: for each document in the collection, remove all but a few named attributes from it using `KEEP()`
- unset: for each document in the collection, remove a few named attributes from it using `UNSET()`
- min-number and min-string: return the minimum value of a specific attribute from all documents in the collection, which is either numeric or a string. This uses `MIN()`
- max-number and max-string: ditto, but using `MAX()`
- sort-number and sort-string: sort all documents in the collection by a non-indexed attribute, which is either numeric or a string
- filter-number and filter-string: filter all documents in the collection using a non-indexed attribute, which is either numeric or a string
- extract-doc: return all documents in the collection unmodified
- extract-key: return the `_key` attribute of all documents in the collection
- extract-number and extract-string: return an attribute from all documents in the collection, which is either numeric or a string
- join-key: for each document in the collection, perform a join on the `_key` attribute on the collection itself (i.e. `FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1`)
- join-id: ditto, but perform the join using the `_id` attribute
- join-hash-number and join-hash-string: ditto, but join using a hash index on a numeric or string attribute
- join-skiplist-number and join-skiplist-string: ditto, but join using a skiplist index on a numeric or string attribute
- lookup-key, lookup-hash-number, lookup-hash-string, lookup-skiplist-number, lookup-skiplist-string: compile an `IN` list of 10,000 lookup values and search for these 10,000 documents in the collection using either the primary index (`_key` attribute), a hash index or a skiplist index. The latter two are tested on numeric and string attributes.
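For example, compiling the lookup values for the lookup-key case can be sketched like this (assuming the `"test" + i` key pattern of the test data; the exact query text is in the test script):

```javascript
// Compile an IN-list of 10,000 _key lookup values matching the
// "test" + i key pattern of the test documents.
const keys = [];
for (let i = 0; i < 10000; i++) {
  keys.push("test" + i);
}

// The values would then be bound to an AQL query roughly like:
// FOR doc IN @@collection FILTER doc._key IN @keys RETURN doc
const query = "FOR doc IN @@collection FILTER doc._key IN @keys RETURN doc";
```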
Further implementation details can be checked in the test script.