AQL Functions Enhancements: Boosting ArangoDB Query Capabilities

Waiting for a git pull to complete over an 8 KiB/s internet connection is boring. So I thought I’d rather use the idle time and quickly write about some performance improvements for certain AQL functions that were recently completed and that will become available with ArangoDB 2.6.

The improvements affect the following AQL functions:

  • UNSET(): remove specified attributes from an object/document
  • KEEP(): keep only specified attributes of an object/document
  • MERGE(): merge the attributes of multiple objects/documents
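
As a quick illustration of what the three functions do, here is a minimal sketch. The sample document and its attribute names are made up for this example and do not come from the benchmark queries in the full post.

```aql
/* hypothetical sample document, for illustration only */
LET doc = { name: "test", a: 1, b: 2 }

RETURN {
  /* copy of doc without attribute "a" */
  withoutA: UNSET(doc, "a"),
  /* copy of doc containing only attribute "name" */
  onlyName: KEEP(doc, "name"),
  /* doc combined with the attributes of a second object */
  merged: MERGE(doc, { c: 3 })
}
```

None of the three functions modifies `doc` itself. Each one returns a new object, so they can be combined freely within one query.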

The full blog post shows a few example queries whose execution times drop by 50 % to more than 60 % due to the changes made to these functions.

Please read the full blog post AQL Functions Improvements.

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, having been there since version 0.1. He mostly works on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

7 Comments

  1. coltbg on May 13, 2015 at 2:32 pm

    Hi, I’m new to ArangoDB and looking for a way to compute the diff between two documents: something that works like MERGE_RECURSIVE, but returns only the differences. How can that be done? Thanks in advance.

    • jsteemann on May 26, 2015 at 11:05 am

      Sorry for the delay. For some reason the “new comments” notifications did not seem to work properly.

      Regarding your question:
      There is no AQL function that creates a diff of two documents. However, it can be done by combining several AQL expressions. It is not clear to me what the result should look like, but one possible solution would be:

      /* input document 1 */
      LET doc1 = { foo: 'bar', a: 1, b: 2 }

      /* input document 2 */
      LET doc2 = { foo: 'baz', a: 2, c: 3 }

      /* collect all attributes present in doc1, but missing in doc2 */
      LET missing = (
        FOR key IN ATTRIBUTES(doc1)
          FILTER ! HAS(doc2, key)
          RETURN {
            [ key ]: doc1[key]
          }
      )

      /* collect all attributes present in both docs, but with different values */
      LET changed = (
        FOR key IN ATTRIBUTES(doc1)
          FILTER HAS(doc2, key) && doc1[key] != doc2[key]
          RETURN {
            [ key ]: {
              old: doc1[key],
              new: doc2[key]
            }
          }
      )

      /* collect all attributes present in doc2, but missing in doc1 */
      LET added = (
        FOR key IN ATTRIBUTES(doc2)
          FILTER ! HAS(doc1, key)
          RETURN {
            [ key ]: doc2[key]
          }
      )

      /* return final result */
      RETURN {
        missing: missing,
        changed: changed,
        added: added
      }

      • coltbg on May 26, 2015 at 6:47 pm

        Thanks a lot for your help!
        What is the fastest way to get the range of documents between two dates? I’m trying to convert all dates to integer values, but I’m not sure that this is the fastest way to do it.

        • jsteemann on May 26, 2015 at 7:04 pm

          To get all documents in a given date range, you can probably use something like the following AQL query:

          FOR doc IN collection
          FILTER doc.dt >= minDate && doc.dt <= maxDate
          RETURN doc

          provided your collection is named `collection` and your date attribute is named `dt`. You have to insert the date range bounds for `minDate` and `maxDate` of course. Numeric date values (e.g. UNIX timestamps) are normally faster to compare and take less space than string date values (e.g. "2015-05-26 19:03:00").

          If the query will return many documents or the documents are quite big, it might be sensible to restrict the return value to just the attributes actually needed (about the same as avoiding `SELECT *` in SQL).
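
          As a rough sketch of such a restricted query, the variant below passes the bounds as bind parameters and returns only two attributes. The bind parameter names `@minDate` and `@maxDate` and the attribute `title` are assumptions made for this example. Only `collection` and `dt` come from the query above.

          ```aql
          /* @minDate and @maxDate are bind parameters supplied when running the query */
          FOR doc IN collection
            FILTER doc.dt >= @minDate && doc.dt <= @maxDate
            /* return just the needed attributes; "title" is a hypothetical attribute */
            RETURN KEEP(doc, "dt", "title")
          ```

          Writing `RETURN { dt: doc.dt, title: doc.title }` would be an alternative way to restrict the result. `KEEP()` tends to be more convenient once the list of attributes gets longer.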

          Note additionally that if you run that query often, it may be sensible to put a (sorted) skiplist index on the `dt` attribute. The query will then use the index and avoid a full collection scan.

          • coltbg on May 26, 2015 at 7:10 pm

            I’m using this approach with numeric date values (UNIX timestamps), but as I understand it, without an index the query will examine all documents to find those in the range. Does that mean the skiplist index works with numeric values?



          • jsteemann on May 26, 2015 at 8:22 pm

            Yes, the skiplist index works fine with numeric values. Without an index there is no choice but to scan the entire collection and run the FILTER statement on all documents in it. With the index, this is avoided: the scan will start at the first document in the target range and end at the last document of the target range. That’s the minimum work the server can do.



          • coltbg on May 26, 2015 at 10:13 pm

            Thanks a lot. I really appreciate your help …


