Graph traversals in AQL

You can traverse named graphs and anonymous graphs with a native AQL language construct.

Syntax

There are two slightly different syntaxes for traversals in AQL, one for named graphs and another to specify a set of edge collections (anonymous graph).

Working with named graphs

The syntax for AQL graph traversals using named graphs is as follows (square brackets denote optional parts and | denotes alternatives):

FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  GRAPH graphName
  [PRUNE [pruneVariable = ]pruneCondition]
  [OPTIONS options]
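For illustration, here is a sketch of a complete traversal that follows this syntax. It reuses the kShortestPathsGraph graph and the places/Toronto start vertex from the example in the Pruning section, and it assumes that the vertices have a label attribute. The query emits all three variables and returns, for each vertex reached in one or two steps, the edge that led to it and the depth at which it was found:

```aql
// Visit every vertex reachable from Toronto in 1 or 2 outgoing steps
FOR v, e, p IN 1..2 OUTBOUND "places/Toronto" GRAPH "kShortestPathsGraph"
  RETURN {
    place: v.label,          // the current vertex (label assumed to exist)
    via: e._id,              // the edge that led to this vertex
    depth: LENGTH(p.edges)   // number of edges on the current path
  }
```

Writing IN 2 instead of IN 1..2 returns only the vertices at depth 2, because max defaults to min when it is omitted.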

FOR: emits up to three variables:
- vertex (object): the current vertex in a traversal
- edge (object, optional): the current edge in a traversal
- path (object, optional): representation of the current path with two members:
  - vertices: an array of all vertices on this path
  - edges: an array of all edges on this path
IN min..max: the minimal and maximal depth for the traversal:
- min (number, optional): edges and vertices returned by this query start at the traversal depth of min (thus edges and vertices below it are not returned). If not specified, it defaults to 1. The minimal possible value is 0.
- max (number, optional): paths up to a length of max are traversed. If omitted, max defaults to min. Thus, only the vertices and edges at depth min are returned. max cannot be specified without min.
OUTBOUND|INBOUND|ANY: follow outgoing, incoming, or edges pointing in either direction in the traversal. Note that this can’t be replaced by a bind parameter.
startVertex (string|object): the vertex from which the traversal originates. This can be specified as an ID string or as a document with the _id attribute. All other values lead to a warning and an empty result. If the specified document does not exist, the result is empty as well and there is no warning.
GRAPH graphName (string): the name identifying the named graph. Its vertex and edge collections are looked up. Note that the graph name is like a regular string, hence it must be enclosed by quote marks, like GRAPH "graphName".
PRUNE expression (AQL expression, optional): An expression, like in a FILTER statement, which is evaluated in every step of the traversal, as early as possible. The semantics of this expression are as follows:
- If the expression evaluates to false, the traversal continues on the current path.
- If the expression evaluates to true, the traversal does not continue on the current path. However, the paths up to this point are considered as a result (they might still be post-filtered or ignored due to depth constraints). For example, a traversal over the graph (A) -> (B) -> (C) starting at A and pruning on B results in (A) and (A) -> (B) being valid paths, whereas (A) -> (B) -> (C) is not returned because it gets pruned on B.
You can only use a single PRUNE clause per FOR traversal operation, but the prune expression can contain an arbitrary number of conditions, combined with the logical operators AND and OR to form complex expressions. You can use the variables emitted by the FOR operation in the prune expression, as well as all variables defined before the traversal.
You can optionally assign the prune expression to a variable like PRUNE var = <expr> to use the evaluated result elsewhere in the query, typically in a FILTER expression.
See Pruning for details.
OPTIONS options (object, optional): used to modify the execution of the traversal. Only the following attributes have an effect, all others are ignored:
- order (string): optionally specify which traversal algorithm to use
  - "bfs" – the traversal is executed breadth-first. The results first contain all vertices at depth 1, then all vertices at depth 2 and so on.
  - "dfs" (default) – the traversal is executed depth-first. It first returns all paths from min depth to max depth for one vertex at depth 1, then for the next vertex at depth 1 and so on.
  - "weighted" – the traversal is a weighted traversal (introduced in v3.8.0). Paths are enumerated with increasing cost. Also see weightAttribute and defaultWeight. A returned path has an additional attribute weight containing the cost of the path after every step. The order of paths having the same cost is non-deterministic. Negative weights are not supported and abort the query with an error.
- bfs (bool): deprecated, use order: "bfs" instead.
- uniqueVertices (string): optionally ensure vertex uniqueness
  - "path" – it is guaranteed that there is no path returned with a duplicate vertex
  - "global" – it is guaranteed that each vertex is visited at most once during the traversal, no matter how many paths lead from the start vertex to this one. If you start with a min depth > 1, a vertex that was found before min depth might not be returned at all (it might still be part of a path). It is required to set order: "bfs" or order: "weighted" because with depth-first search the results would be unpredictable. Note: Using this configuration, the result is no longer deterministic. If there are multiple paths from startVertex to vertex, one of those is picked. In case of a weighted traversal, the path with the lowest weight is picked, but in case of equal weights it is undefined which one is chosen.
  - "none" (default) – no uniqueness check is applied on vertices
- uniqueEdges (string): optionally ensure edge uniqueness
  - "path" (default) – it is guaranteed that there is no path returned with a duplicate edge
  - "none" – no uniqueness check is applied on edges. Note: Using this configuration, the traversal follows edges in cycles.
- edgeCollections (string|array): Optionally restrict which edge collections the traversal may visit. If omitted, or an empty array is specified, then there are no restrictions.
  - A string parameter is treated as the equivalent of an array with a single element.
  - Each element of the array should be a string containing the name of an edge collection.
- vertexCollections (string|array): Optionally restrict which vertex collections the traversal may visit. If omitted, or an empty array is specified, then there are no restrictions.
  - A string parameter is treated as the equivalent of an array with a single element.
  - Each element of the array should be a string containing the name of a vertex collection.
  - The starting vertex is always allowed, even if it does not belong to one of the collections specified by a restriction.
- parallelism (number, optional):
  Available in the ArangoDB Enterprise Edition and on ArangoGraph only.
  Optionally parallelize traversal execution. If omitted or set to a value of 1, traversal execution is not parallelized. If set to a value greater than 1, then up to that many worker threads can be used for concurrently executing the traversal. The value is capped by the number of available cores on the target machine.
  Parallelizing a traversal is normally useful when there are many inputs (start vertices) that the nested traversal can work on concurrently. This is often the case when a nested traversal is fed with several tens of thousands of start vertices, which can then be distributed randomly to worker threads for parallel execution.
- maxProjections (number, optional):
  Available in the ArangoDB Enterprise Edition and on ArangoGraph only.
  Specifies the number of document attributes per FOR loop to be used as projections. The default value is 5.
- weightAttribute (string, optional): Specifies the name of an attribute that is used to look up the weight of an edge. If no attribute is specified or if it is not present in the edge document then the defaultWeight is used. The attribute value must not be negative.
- defaultWeight (number, optional): Specifies the default weight of an edge. The value must not be negative. The default value is 1.
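As a sketch of how several of these options combine, the following query again assumes the kShortestPathsGraph example graph, whose vertices live in the places collection (as the places/Toronto ID suggests). It visits every reachable place at most once and returns the places in breadth-first order. Note that uniqueVertices: "global" requires order: "bfs" or order: "weighted":

```aql
FOR v IN 1..3 OUTBOUND "places/Toronto" GRAPH "kShortestPathsGraph"
  OPTIONS {
    order: "bfs",                // all depth-1 vertices first, then depth 2, then depth 3
    uniqueVertices: "global",    // each vertex is visited at most once overall
    vertexCollections: "places"  // only enter vertices stored in the places collection
  }
  RETURN v.label
```

Returning only v.label sidesteps the non-determinism of "global" uniqueness, which only affects which of several paths to a vertex is picked.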

Weighted traversals do not support negative weights. If a document attribute (as specified by weightAttribute) with a negative value is encountered during traversal, or if defaultWeight is set to a negative number, then the query is aborted with an error.
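A weighted traversal can be sketched as follows. The query assumes that the edges of kShortestPathsGraph store a non-negative cost in an attribute named travelTime (an assumption for illustration); edges without this attribute fall back to the defaultWeight of 15. Paths are then enumerated in order of increasing total cost:

```aql
FOR v, e, p IN 1..3 OUTBOUND "places/Toronto" GRAPH "kShortestPathsGraph"
  OPTIONS {
    order: "weighted",
    weightAttribute: "travelTime",  // assumed edge attribute holding the cost
    defaultWeight: 15,              // used when an edge has no travelTime attribute
    uniqueVertices: "path"          // no vertex is repeated within a single path
  }
  RETURN CONCAT_SEPARATOR(" -- ", p.vertices[*].label)
```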

Working with collection sets

The syntax for AQL graph traversals using collection sets is as follows (square brackets denote optional parts and | denotes alternatives):

[WITH vertexCollection1[, vertexCollection2[, vertexCollectionN]]]
FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  edgeCollection1[, edgeCollection2[, edgeCollectionN]]
  [PRUNE [pruneVariable = ]pruneCondition]
  [OPTIONS options]

WITH: Declaration of collections. Optional for single server instances, but required for graph traversals in a cluster. Needs to be placed at the very beginning of the query.
- collections (collection, repeatable): list of vertex collections that are involved in the traversal
edgeCollections (collection, repeatable): One or more edge collections to use for the traversal (instead of using a named graph with GRAPH graphName). Vertex collections are determined by the edges in the edge collections.
You can override the default traversal direction by setting OUTBOUND, INBOUND, or ANY before any of the edge collections.
If the same edge collection is specified multiple times, it behaves as if it were specified only once. Specifying the same edge collection multiple times is only allowed if the occurrences do not have conflicting traversal directions.
Views cannot be used as edge collections.
See the named graph variant for the remaining traversal parameters. The edgeCollections restriction option is redundant in this case.
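For comparison, here is a sketch of a traversal over a collection set instead of a named graph. The vertex collection places and the edge collection connections are assumed names used only for illustration; substitute the collections of your own data. Declaring the vertex collections with WITH is optional on a single server, but including it means the same query also runs in a cluster:

```aql
// Declare the vertex collections up front (required in a cluster)
WITH places
// Follow outgoing edges of the assumed "connections" edge collection
FOR v, e, p IN 1..2 OUTBOUND "places/Toronto" connections
  RETURN CONCAT_SEPARATOR(" -- ", p.vertices[*].label)
```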

Traversing in mixed directions

For traversals with a list of edge collections you can optionally specify the direction for some of the edge collections. Say for example you have three edge collections edges1, edges2 and edges3, where in edges2 the direction has no relevance but in edges1 and edges3 the direction should be taken into account. In this case you can use OUTBOUND as general traversal direction and ANY specifically for edges2 as follows:

FOR vertex IN OUTBOUND
  startVertex
  edges1, ANY edges2, edges3

All collections in the list that do not specify their own direction use the direction defined after IN. This lets you use a different direction for each collection in your traversal.
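The same rule works in the other direction. In the following sketch, which uses the hypothetical collections vertices, edges1, and edges2, edges1 is traversed with the general INBOUND direction, while edges2 overrides it with OUTBOUND:

```aql
// List every vertex collection the edges can lead to (hypothetical: vertices)
WITH vertices
FOR v IN 1..2 INBOUND "vertices/start"
  edges1, OUTBOUND edges2  // edges1 inherits INBOUND, edges2 is followed OUTBOUND
  // A vertex can be reached over several paths, so remove duplicates
  RETURN DISTINCT v._id
```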

Graph traversals in a cluster

Due to the nature of graphs, edges may reference vertices from arbitrary collections. Following the paths can thus involve documents from various collections and it is not possible to predict which are visited in a traversal. Which collections need to be loaded by the graph engine can only be determined at run time.

Use the WITH statement to specify the collections you expect to be involved. This is required for traversals using collection sets in cluster deployments.

Pruning

You can define stop conditions for graph traversals to return specific data and to improve the query performance. This is called pruning and works by checking conditions during the traversal as opposed to filtering the results afterwards (post-filtering). This reduces the amount of data to be checked by stopping the traversal down specific paths early.

You can specify one PRUNE expression per graph traversal, but it can contain an arbitrary number of conditions. You can use the vertex, edge, and path variables emitted by the traversal in a prune expression, as well as all other variables defined before the FOR operation. Note that PRUNE is an optional clause of the FOR operation and that the OPTIONS clause needs to be placed after PRUNE.

FOR v, e, p IN 0..10 OUTBOUND "places/Toronto" GRAPH "kShortestPathsGraph"
  PRUNE v.label == "Edmonton"
  OPTIONS { uniqueVertices: "path" }
  RETURN CONCAT_SEPARATOR(" -- ", p.vertices[*].label)
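Building on this example, the sketch below assigns the prune condition to a variable and reuses it in a FILTER, so that only the paths that actually end in Edmonton are returned. It also uses a variable defined before the traversal (target) inside the prune expression:

```aql
LET target = "Edmonton"
FOR v, e, p IN 1..10 OUTBOUND "places/Toronto" GRAPH "kShortestPathsGraph"
  // Stop following a path once it reaches the target, and remember the outcome
  PRUNE reached = v.label == target
  OPTIONS { uniqueVertices: "path" }
  // Keep only the paths whose last vertex is the target
  FILTER reached
  RETURN CONCAT_SEPARATOR(" -- ", p.vertices[*].label)
```

Compared with using the FILTER alone, the PRUNE clause stops the traversal at Edmonton, so paths that would continue beyond it are never explored.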
