The upcoming 2.8 version of ArangoDB will provide several improvements in the area of index usage and query optimization.
First of all, hash and skiplist indexes can now index individual array values. A dedicated post on this will follow shortly. Second, the query optimizer can make use multiple indexes per collection for queries with OR-combined filter conditions. This again is a subject for another post. Third, there have been some speed improvements due to changes in the general index handling code. This is what this post is about.
In order to assess the speedups in 2.8, I have run some already existing performance tests that I initially ran when comparing ArangoDB 2.5 with 2.6. The test cases and methodology are detailed in this earlier blog post.
For measuring the index-related performance improvements, I simply re-ran the index related tests in 2.7 and in 2.8 / devel. I did not bother re-running all tests from the original blog article because only some are index-related. In particular, I only ran these tests again:
- join-key: for each document in the collection, perform a join on the _key attribute on the collection itself (i.e.
FOR c1 IN @@c FOR c2 IN @@c FILTER c1._key == c2._key RETURN c1
) - join-id: ditto, but perform the join using the
_id
attribute - join-hash-number and join-hash-string: ditto, but join using a hash index on a numeric or string attribute
- join-skiplist-number and join-skiplist-string: ditto, but join using a skiplist index on a numeric or string attribute lookup-key, lookup-hash-number, lookup-hash-string, lookup-skiplist-number, lookup-skiplist-string: compile an IN-list of 10,000 lookup values and search these 10,000 documents in the collection using either the primary index (
_key
attribute), a hash index or a skiplist index. The latter two are tested on numeric and string attributes.
The test queries were run 5 times each on collections containing 10,000, 100,000 and 1,000,000 documents.
Here are the query execution times from 2.7 and 2.8 for the individual tests, in seconds (less is better):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 |
test name collection avg 2.7 (s) avg 2.8 (s) -------------------------------------------------------------------- join-key 10k 0.0997 0.0612 join-key 100k 1.0611 0.6538 join-key 1000k 10.2975 6.2507 join-id 10k 0.1088 0.0606 join-id 100k 1.1126 0.6694 join-id 1000k 10.8180 6.5336 join-hash-number 10k 0.1044 0.0679 join-hash-number 100k 1.1137 0.7430 join-hash-number 1000k 10.7923 7.1534 join-hash-string 10k 0.1193 0.0712 join-hash-string 100k 1.2656 0.7915 join-hash-string 1000k 12.3075 7.6667 join-skiplist-number 10k 0.1387 0.1030 join-skiplist-number 100k 1.6693 1.3062 join-skiplist-number 1000k 18.0406 15.5203 join-skiplist-string 10k 0.1928 0.1469 join-skiplist-string 100k 2.3166 1.8997 join-skiplist-string 1000k 27.1513 23.2058 lookup-key 10k 1.4996 1.5245 lookup-key 100k 1.4951 1.5189 lookup-key 1000k 1.5545 1.5662 lookup-hash-number 10k 1.5572 1.5526 lookup-hash-number 100k 1.5595 1.5435 lookup-hash-number 1000k 1.5436 1.5648 lookup-hash-string 10k 1.6023 1.5623 lookup-hash-string 100k 1.5892 1.5741 lookup-hash-string 1000k 1.5841 1.5770 lookup-skiplist-number 10k 1.5978 1.6145 lookup-skiplist-number 100k 1.5782 1.6269 lookup-skiplist-number 1000k 1.5891 1.6258 lookup-skiplist-string 10k 1.6443 1.6840 lookup-skiplist-string 100k 1.6787 1.6985 lookup-skiplist-string 1000k 1.7319 1.7562 in-key 10k 0.1076 0.0754 in-key 100k 0.1083 0.0775 in-key 1000k 0.1104 0.0773 in-hash-number 10k 0.0898 0.0696 in-hash-number 100k 0.0889 0.0674 in-hash-number 1000k 0.0877 0.0675 in-hash-string 10k 0.1174 0.0867 in-hash-string 100k 0.1188 0.0878 in-hash-string 1000k 0.1174 0.0853 in-skiplist-number 10k 0.1095 0.0849 in-skiplist-number 100k 0.1110 0.0873 in-skiplist-number 1000k 0.1106 0.0910 in-skiplist-string 10k 0.1744 0.1315 in-skiplist-string 100k 0.1990 0.1594 in-skiplist-string 1000k 0.2369 0.1968 |
It looks like joins and IN list lookups got significantly faster in 2.8, whereas the performance for point lookups is more or less the same as in 2.7.
Note that the changes to the index code in 2.8 only affect how indexes are accessed from within AQL queries and how filtering works. No changes have been made for other index operations such as insert, updates, and removals.
One Comment
multiple attributes indexing ???? HOWWWW??????