Best Practices for AQL Graph Queries

Estimated reading time: 8 minutes

The ArangoDB Query Language (AQL) was designed to accomplish a few important goals, including:

  • Be a human-readable query language
  • Be client independent
  • Support complex query patterns
  • Support all ArangoDB data models with one language

The goal of this guide is to help you get the most out of the above design goals by providing some suggested best practices. Like many programming languages, AQL allows the developer to format their code in whatever way comes naturally to them. However, just as with programming languages, there are style guides and best practices that the community agrees upon. Following them can make queries easier to read and maintain, and sometimes even faster.

By the end of this guide you will be aware of what ArangoDB considers best practices for both formatting and performance, allowing you to write fast and clean graph queries in AQL.

This guide is split into two sections:

  • Formatting:
    • Syntax formatting
    • Styling conventions
  • IMDB Dataset Example Notebook

This guide mostly focuses on best practices, styling, and considerations when writing queries and assumes some prior knowledge of AQL. As such, it will be beneficial if you are already familiar with AQL and performing graph queries. 
If you would like to get up to speed with graphs, AQL, and running queries, take a look at the Graph Course for Freshers, which takes you from beginner to advanced graph queries in AQL.

Formatting

The first section of the guide will contain:

  • A review of the graph traversal syntax
  • Terms used throughout the guide
  • AQL styling conventions

I like your style

As was mentioned in the introduction, AQL has a flexible formatting schema, similar to most popular programming languages. Not requiring strict formatting in AQL was a conscious choice, meant to allow developers the flexibility to write queries in an easy and natural way.

This flexibility is especially helpful for developers new to AQL and removes some of the barriers to learning the query language. However, as your AQL queries start to become a part of your application’s business logic, maintainability and readability gain importance. It becomes important to have rules set up that allow other developers to review and contribute changes as needed. When business needs change and queries have to be updated or new ones added, a well-thought-out formatting guide can save time and effort.

This section will lay the foundation for basic styling decisions and then we will expand upon these guidelines in the syntax and examples sections.

The following statement is the first line of a typical graph query in AQL. There are already a few things worth pointing out, along with some insight into why we chose to format it this way.

FOR v, e, p IN 1..1 OUTBOUND

Guideline #1: Capitalization

In AQL we often capitalize the keywords and functions used in a query. AQL keywords are case-insensitive, so this capitalization is a convention and not a requirement. Variable names declared in queries, however, are case-sensitive and are typically written in lowercase.

✘ for v, e, p in 1..1 outbound
✔ FOR v, e, p IN 1..1 OUTBOUND

These guidelines are just that: guidelines, not requirements. What matters is that you think about your queries in terms of readability. If the lowercase version works for you and your team, use it.
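
To make the difference between keywords and variable names concrete, here is a small self-contained sketch that runs against an inline array, so no collections are needed. The variable names `title` and `Title` are made up for this illustration.

```aql
/* Keywords work in any case, but variable names are case-sensitive:
   `title` and `Title` are two distinct variables here. This is shown only
   to make the point; avoid names that differ by case alone in real queries. */
for title in ["Alien", "Heat"]
  LET Title = UPPER(title)
  RETURN { original: title, upper: Title }
```

The lowercase `for` and `in` behave exactly like their capitalized forms, which is why capitalization is purely a readability choice.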

Guideline #2: Naming

In the following FOR loop we have supplied names for the three possible variables emitted by a graph traversal.

FOR v, e, p IN 1..1 OUTBOUND

This guideline will most likely be a review for many developers, but choosing descriptive and explicit names is just as important in AQL queries as it is in any other code. In fact, one could argue that instead of choosing nondescript letters for our variable names, we should spell out what each variable holds:

FOR vertex, edge, path IN 1..1 OUTBOUND

This comes down to personal preference and what you decide is most readable. 

You can use variable names or reference document attributes whose names match AQL keywords by enclosing them in backticks, for example:

FOR `filter` IN collection
  RETURN `filter`

This functionality exists to help in situations where this conflict cannot be avoided. The ArangoDB recommended guideline is to instead avoid using names that conflict with any AQL keyword.
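
The same escaping applies when a stored document already has an attribute named after a keyword. The sketch below uses an inline document with made-up attribute names (`filter`, `sort`) and shows two ways to read them: backticks, and bracket notation with a quoted string.

```aql
/* The inline document stands in for a collection whose attribute names
   happen to collide with the AQL keywords FILTER and SORT. */
FOR doc IN [ { "filter": "genre", "sort": "year" } ]
  RETURN {
    viaBackticks: doc.`filter`,
    viaBrackets: doc["sort"]
  }
```

Either form works; picking non-conflicting attribute names in the first place remains the simpler option.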

Guideline #3: Next Line

When writing AQL queries you are free to add whitespace and line breaks wherever it makes sense to you. For example, if we were to add a FILTER statement:

FOR vertex, edge, path IN 1..1 OUTBOUND
  startVertex GRAPH "graphName"
  FILTER vertex._key == "KeyValue"
  RETURN path

It is convention to indent the lines following each FOR statement by two spaces. This uses white space to show where the bulk of the query statements are happening, and it resembles the body of a function declaration in programming languages.
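
The indentation convention becomes more useful once loops are nested. A minimal sketch, using inline values rather than real collections so it can be run anywhere (the genre names and years are arbitrary):

```aql
/* Each FOR opens a new level; its statements sit two spaces further in. */
FOR genre IN ["Drama", "Comedy"]
  FOR year IN 2000..2003
    FILTER year % 2 == 0
    RETURN { genre: genre, year: year }
```

At a glance, the indentation shows that the FILTER and RETURN belong to the inner loop.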

Guideline #4: Commenting

Putting comments in your AQL queries is an easy and inexpensive way to provide clarity to potentially complex queries. AQL supports two styles of commenting:

Single line commenting:

// Your comment here

Multi-line comment (recommended):

/* Your multi-line
     comment here. */

There are certain situations where using the single line commenting format causes issues when attempting to copy-paste queries between systems. Either style will work properly with AQL but using the multi-line style of commenting is more portable and is our recommended style for comments.
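
As an illustration of the recommended style, here is a sketch of a commented traversal. The collection names `persons` and `movies`, the edge collection `actedIn`, the start vertex, and the `year` and `title` attributes are all hypothetical.

```aql
/* Find the movies a person acted in.
   All collection and attribute names are placeholders. */
WITH persons, movies
FOR vertex, edge, path IN 1..1 OUTBOUND
  "persons/alice" actedIn
  /* Only keep movies released this century. */
  FILTER vertex.year >= 2000
  RETURN vertex.title
```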

Graph Syntax

This section serves two purposes:

  • Highlight style and formatting decisions with graph traversals
  • Review basic graph syntax

This is the graph syntax example pulled from our documentation and it shows all of the possible options available for graph queries. In this section we will go through each line and clarify the terms being used and the style decisions made.

[WITH vertexCollection1[, vertexCollection2[, ...vertexCollectionN]]]
FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  GRAPH graphName || edgeCollection1, ..., edgeCollectionN  
  [PRUNE pruneCondition]
  [OPTIONS options]

WITH

[WITH vertexCollection1[, vertexCollection2[, ...vertexCollectionN]]]

The first line in the query starts with the WITH keyword. WITH is a versatile keyword in AQL, and in this context it declares the vertex collections a traversal may touch, which is required for traversals in a cluster. While the WITH keyword is only required for cluster queries, we recommend that you use it with all graph queries. There are a few advantages to this:

  • Makes it clear which collections are used in the query.
  • Makes your queries future ready, when/if you move from single server to cluster.
  • Read-locks the collections up front, which helps avoid deadlocking queries.
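
A short sketch of what this looks like in practice, assuming hypothetical vertex collections `persons` and `movies` connected by an edge collection `actedIn`:

```aql
/* The traversal starts in `persons` and ends in `movies`,
   so both vertex collections are declared up front. */
WITH persons, movies
FOR vertex IN 1..1 OUTBOUND
  "persons/alice" actedIn
  RETURN vertex
```

On a single server the query would also run without the WITH line, but keeping it documents the collections involved and avoids a rewrite if the deployment later becomes a cluster.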

FOR

FOR vertex[, edge[, path]]

There is nothing new to point out here: we capitalize FOR, and in this representation we use the full names for the variables. Remember, these letters or names are not required, just a convention. You could instead use a, b, c or node, line, route, or whatever works best for you and your team.
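
Since the edge and path variables are optional, you can declare only the ones a query actually uses and name them after what they represent. A sketch with hypothetical collections (`persons`, `movies`, `actedIn`) and attributes (`title`, `character`):

```aql
/* Only the vertex and edge are needed here, so no path variable is declared.
   `movie` and `role` are simply more descriptive names for vertex and edge. */
WITH persons, movies
FOR movie, role IN 1..1 OUTBOUND
  "persons/alice" actedIn
  RETURN { title: movie.title, character: role.character }
```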

IN

IN [min[..max]]

Here again, we capitalize IN and supply the min..max traversal depth. We put this on a separate line to keep the depth apart from the other parts of the query, which we find improves readability. As a general rule, starting a new line for each portion of the query adds whitespace that makes the query easier to read and easier to change.
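
The sketch below places the depth on its own line, as described above. It assumes a hypothetical vertex collection `persons` and edge collection `knows`. With `2..3`, vertices one hop away are traversed but not returned.

```aql
/* Return friends-of-friends and one level beyond (depth 2 to 3).
   LENGTH(path.edges) reports how many hops each result is away. */
WITH persons
FOR vertex, edge, path
  IN 2..3
  OUTBOUND "persons/alice"
  knows
  RETURN { name: vertex.name, depth: LENGTH(path.edges) }
```

If you later need a different depth, the change is confined to a single, easy-to-find line.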

Direction

OUTBOUND|INBOUND|ANY startVertex

We place the direction keyword on the same line as the startVertex because they both deal with navigation. This also helps keep the query readable and lets you think about queries in bite-size chunks, which is useful both for new users reading the query and for you when debugging your own queries.
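
To show the direction and start vertex read together as one navigation step, here is a sketch that follows edges backwards. It assumes `actedIn` edges point from hypothetical `persons` documents to `movies` documents, so INBOUND from a movie yields its actors.

```aql
/* Start at a movie and walk the actedIn edges against their direction. */
WITH persons, movies
FOR actor IN 1..1
  INBOUND "movies/heat"
  actedIn
  RETURN actor.name
```

Swapping INBOUND for OUTBOUND or ANY changes the navigation without touching any other line.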

Graph

GRAPH graphName || edgeCollection1, ..., edgeCollectionN

This line of the query determines whether you are using a named graph or an anonymous graph (a list of edge collections). There are some key differences between the two but, style-wise, it is pretty straightforward.

Using the GRAPH keyword followed by the ‘graphName’ provides:

  • Readability
  • Maintainability, since one graph definition can be updated and shared by multiple queries
  • A potential performance decrease when traversing large graphs

Using an anonymous graph provides:

  • Query-time flexibility
  • Reduced readability
  • Performance improvements, due to only traversing the specified collections

As you can see, the decision of which graph type to use in AQL is not clear-cut. With a named graph, you keep queries clean and easily manageable but accept a potential loss in performance when traversing graphs that contain a large number of collections.

With an anonymous graph, you trade readability and maintainability for some flexibility and a possible performance gain. Performance would only improve if your graph contains many large collections that most of your queries do not need to traverse.

In AQL, the decision to use a defined named graph instead of an anonymous graph comes down to how your data is modeled and the queries your application needs to run. This is what we will continue to explore throughout this guide.
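
The two styles can be compared side by side. The sketch below runs the same one-hop traversal twice, once against a hypothetical named graph `movieGraph` and once against a single edge collection `actedIn`. It assumes `movieGraph` includes `actedIn` among its edge definitions, along with the `persons` and `movies` vertex collections.

```aql
WITH persons, movies
/* Named graph: every edge collection in the graph definition is considered. */
LET viaNamedGraph = (
  FOR vertex IN 1..1 OUTBOUND
    "persons/alice" GRAPH "movieGraph"
    RETURN vertex._id
)
/* Anonymous graph: only the listed edge collection is traversed. */
LET viaEdgeCollection = (
  FOR vertex IN 1..1 OUTBOUND
    "persons/alice" actedIn
    RETURN vertex._id
)
RETURN { viaNamedGraph: viaNamedGraph, viaEdgeCollection: viaEdgeCollection }
```

If the named graph contains other edge collections leaving `persons`, the first result can be larger than the second, which is exactly the flexibility and performance trade-off described above.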

Condition and Options

Conditions

The final two lines deal with the conditions for finding the desired documents. You can use FILTER, PRUNE, and any other appropriate statements to narrow down the results of your traversal. These follow the general rules covered previously in this guide, and by convention each new statement goes on its own line.
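
A sketch of PRUNE and FILTER working together, each on its own line. The `persons` and `knows` collections and the `blocked` attribute are hypothetical. PRUNE stops the traversal from continuing past a matching vertex, while that vertex itself is still emitted, so the FILTER is added here to drop it from the results.

```aql
WITH persons
FOR vertex, edge, path IN 1..3 OUTBOUND
  "persons/alice" knows
  /* Do not follow edges beyond a blocked person ... */
  PRUNE vertex.blocked == true
  /* ... and leave the blocked person out of the results as well. */
  FILTER vertex.blocked != true
  RETURN vertex.name
```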

Options

The OPTIONS statement requires you to supply an object, and typically you will see this object written in a JavaScript-like format. This really comes down to what feels most natural to you, but as an example here is how we format it:

OPTIONS {
  bfs: true,
  uniqueVertices: 'path',
  uniqueEdges: 'path'
}
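
For context, here is a sketch of where the OPTIONS object sits in a complete traversal. The option values are the ones shown above, while the `persons` and `knows` collections and the start vertex are hypothetical.

```aql
WITH persons
FOR vertex, edge, path IN 1..2 OUTBOUND
  "persons/alice" knows
  OPTIONS {
    bfs: true,
    uniqueVertices: 'path',
    uniqueEdges: 'path'
  }
  RETURN vertex.name
```

Writing one option per line keeps later changes small: adding or removing an option touches a single line.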

We have covered some of the formatting, styling, and various conventions used in AQL graph traversals in the previous section. While some of it may have seemed obvious, I hope it serves as a good reference for deciding how to structure your queries, from a styling perspective. 

In the next section we will put this styling to use and cover performance considerations when writing graph queries. We will take a look at some example queries and review some common pitfalls when coming from other query languages that can help you keep your queries fast and clean.

Examples

Below is an interactive Google Colab notebook that walks you through some examples of ways to keep your graph queries fast, clean, and readable. We use the IMDB dataset and AQL to solve various queries that benefit from our AQL graph best practice guidelines.

Check it out on GitHub

Christopher Woodward

Chris has over 10 years experience at all angles of technology including service, support, and development. He is also passionate about learning and right now he is focused on improving the learning experience for the ArangoDB community. Chris believes the future is native multi-model and wants to help tell the world.
