home shape

AQL Improvements for ArangoDB 2.7: Enhanced Query Capabilities

With ArangoDB 2.6 being in beta already, it’s time to look at some features scheduled for 2.7. Today I’ll showcase a few AQL parser improvements that are present in the devel branch already, which will be the foundation for the 2.7 release.

Star operator

The already existing star operator ([*]) is much more flexible in 2.7 than in previous ArangoDB versions. It now allows filtering the values it iterates over, and optional projections.

These features will be demonstrated using the following example member data:

[
  { "name" : "sir alfred", "age" : 60, "likes" : [ "lettuce", "tortoises" ] },
  { "name" : "mozquito", "age" : 15, "likes" : [ "skateboards", "music" ] },
  { "name" : "murphy", "age" : 28, "likes" : [ "juice", "tarts", "cakes" ] },
  { "name" : "helga", "age" : 52, "likes" : [ "home", "garden", "tortoises", "cakes" ] }
]

To return all members with an age of 40 or higher and that also like tortoises, we can now combine the star operator with an inline FILTER expression:

RETURN @members[* FILTER CURRENT.age >= 40 && "tortoises" IN CURRENT.likes]

The inline FILTER expression has access to the current array element via the pseudo-variable CURRENT. The FILTER expression can thus access the element’s attributes and sub-attributes, and also use them in function expressions.

The above query will return only two array members as expected:

[
  { "name" : "sir alfred", "age" : 60, "likes" : [ "lettuce", "tortoises" ] },
  { "name" : "helga", "age" : 52, "likes" : [ "home", "garden", "tortoises", "cakes" ] }
]

It’s also possible to extract just a specific sub-attribute of each result value:

RETURN @members[* FILTER CURRENT.age >= 40 && "tortoises" IN CURRENT.likes].name

This will return:

[
  "sir alfred",
  "helga"
]

If we don’t want to return the whole match but a single attribute is not enough, the star operator can be used in combination with arbitrary projections, too.

The following query extracts the matching members a FILTER as above, but now only returns each member’s age attribute and the number of values in the member’s likes attribute:

RETURN @members[* FILTER CURRENT.age >= 40 && "tortoises" IN CURRENT.likes RETURN {
  name: CURRENT.name,
  likes: LENGTH(CURRENT.likes)
}]

This will produce the following result:

[
  { "name" : "sir alfred", "likes" : 2 },
  { "name" : "helga", "likes" : 4 }
]

If only a certain number of values is required, the star operator can be combined with a LIMIT clause, too. This is useful for testing whether at least one of the array members satisfies a FILTER condition:

RETURN @members[* FILTER "garden" IN CURRENT.likes LIMIT 1]

Overall, the star operator is now much more powerful than before, so in many queries it can replace full-blown FOR loops and subqueries when the goal simply is to iterate over sub-parts of a result.

Multi-star operator

In 2.7 there is now also a multi-star operator ([**]). This operator can be used to iterate over an array, too. In addition it will also flatten its input, so it can be used for collapsing nested array structures.

This is useful in cases where a flat result is required but the single star operator would return a nested array.

Consider this query with the single star operator:

RETURN @members[* FILTER CURRENT.age >= 40].likes[*]

This will produce:

[
  [ "lettuce", "tortoises" ],
  [ "home", "garden", "tortoises", "cakes" ]
]

To get a collapsed array, we can employ the double star operator:

RETURN @members[* FILTER CURRENT.age >= 40].likes[**]

Then we’ll get:

[
  "lettuce",
  "tortoises",
  "home",
  "garden",
  "tortoises",
  "cakes"
]

Note: the result of this query can also be made unique using the standard AQL function UNIQUE().

The star operator in 2.7 can have any number of levels. Using it with a single star will simply iterate over the input array, using it with two stars will collapse one level of the input, using it with three stars will collapse two levels of the input etc.

Subquery result usage

While working on the AQL parser anyway, the grammar has been modified so it allows a few more types of expressions.

For example, the result of a subquery can now be used as an array and the subquery results can be accessed by position directly:

RETURN (
  FOR i IN [ 1, 2, 3 ]
   RETURN i
)[0]

The trailing [0] led to a parse error in previous versions. To make this work in previous versions, the subquery result needs to be captured in an extra variable using LET before accessing the 0th member of that variable:

LET temp = (
  FOR i IN [ 1, 2, 3 ]
   RETURN i
)
RETURN temp[0]

The parser generalizations now also allow to use the star operator directly on a subquery result and access its individual members:

RETURN (
  FOR member IN [
    { name: "sir alfred" },
    { name: "mozquito" }
  ]
   RETURN member
)[*].name
Jan Steemann

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

2 Comments

  1. CoDEmanX on June 18, 2015 at 7:28 pm

    Wow, great new features! Some questions:
    1. Will inline filters be the same speed as their full-blown equivalents?
    2. Will something like [***********] flatten levels up to 10 without errors if there are less than that many levels, and what if they have different number of levels?
    3. RETURN i in the last snippet is supposed to be RETURN member, isn’t it?

    • jsteemann on June 19, 2015 at 5:18 pm

      Re 1: This will depend on the case. In case the inline filters are replacing a subquery (e.g. in case `FOR doc IN collection RETURN { name: doc.name, filteredSubAttributes: (FOR sub IN doc.attr FILTER sub.whatever == 30 RETURN sub.name) }`), they may be faster, because the overhead of a full-blown subquery can be avoided. In other cases, they may be on par with other constructs, and some cases they may be sub-par. We don’t yet have good heuristics yet, but I am pretty sure these operations can be optimized if required.
      Re 2: [***********] will flatten up to 10 levels without errors, regardless of how deeply nested the value is. It will go depth-first into the value up to 10 levels deep and pull each member of an array up one level. It won’t recurse further than 10 levels, so nested structures beyond level 10 would remain. Only arrays and sub-arrays will be flattened, so if an array has less than 10 levels of nesting, then this won’t be a problem, and all nested values will be pulled up to the top level.
      Re 3: yes, you’re right. The article is adjusted now. Thanks!

Leave a Comment





Get the latest tutorials, blog posts and news: