Comparing ArangoDB AQL to Neo4j Cypher

ArangoDB is a multi-model database, and one of its supported data models is the graph. If you come from Neo4j, this comparison should help you get started with ArangoDB’s graph-related features, and also show what else you can do with a native multi-model database like ArangoDB.

Language Models

Cypher is a query language solely focused on graphs, created by and primarily used in Neo4j. As you might already know, the pattern you want to find in the full graph is described visually, like ASCII art. Around that, clauses inspired by SQL, such as WHERE, ORDER BY and others, are used to process the data. It also covers data definition with the CREATE keyword. The language can be classified as declarative, but it is less structured than SQL. There are also functions that can be called, like shortestPath().

In comparison, AQL is a full multi-model query language – encompassing document, relational, search and graph query capabilities. It was invented to overcome the limitations of SQL for dealing with schemaless data and the JSON document model. It enables multi-model queries with one language backed by a single database core.

What that means is that you can, for example, do a prefix search over multiple collections and fields (ArangoSearch), then traverse from the found documents to neighbor nodes at a variable depth, then resolve values in the found documents using a join, and all that in a single query at high speed. AQL is declarative, but also borrows concepts from programming languages. A lot of its core functionality is built around the FOR loop construct. There are also plenty of functions. CRUD operations are supported via the INSERT, UPDATE, REPLACE and REMOVE constructs, but collections and indices can’t be created or managed through AQL. This is done instead in the server’s web interface, in arangosh (an interactive shell we ship) or through the HTTP API.
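
The FOR loop is the backbone of most AQL queries. As a rough mental model only, the Python sketch below mimics how a FOR … FILTER … SORT … LIMIT … RETURN query processes documents stage by stage. The function name aql_like_query and the sample documents are hypothetical, and ArangoDB's query optimizer is free to execute these steps differently.

```python
from typing import Any, Callable

Doc = dict  # a JSON-like document, modelled as a plain Python dict


def aql_like_query(
    docs: list,
    keep: Callable[[Doc], bool],
    sort_key: Callable[[Doc], Any],
    offset: int,
    count: int,
    project: Callable[[Doc], Any],
) -> list:
    """Hypothetical model of FOR ... FILTER ... SORT ... LIMIT ... RETURN."""
    # FOR d IN docs FILTER keep(d): iterate and drop documents failing the condition
    filtered = [d for d in docs if keep(d)]
    # SORT sort_key(d): order the remaining documents (ascending)
    ordered = sorted(filtered, key=sort_key)
    # LIMIT offset, count: skip `offset` results, then keep at most `count`
    window = ordered[offset:offset + count]
    # RETURN project(d): build the output value for each remaining document
    return [project(d) for d in window]


# Hypothetical sample documents, not read from a real collection
people = [
    {"name": "Ann", "age": 42},
    {"name": "Tracey", "age": 35},
    {"name": "Josefina", "age": 29},
]

# Roughly: FOR p IN people FILTER p.age > 30 SORT p.age LIMIT 0, 1 RETURN p.name
print(aql_like_query(people, lambda p: p["age"] > 30, lambda p: p["age"], 0, 1, lambda p: p["name"]))  # ['Tracey']
```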

Graph Database Concepts in ArangoDB

Naming convention comparison

Here is a quick overview of terms which describe similar concepts:

AQL                    Cypher
vertex                 node
edge                   relationship
collection             (group of nodes)
document               (node with properties)
document collection    node label
edge collection        relationship type
attribute              property
depth                  hops
array                  list
object                 map

While you can use arbitrary labels and types in Neo4j in an ad-hoc fashion, it is necessary to create collections in ArangoDB before you can insert vertices and edges into them. In ArangoDB you may create secondary indices on collections for faster lookups. Collections can be organized in databases for multi-tenancy.

[Figure: ArangoDB architecture with the Employee and manages collections]

Keyword Comparison

The basic language constructs and their keywords in comparison:

AQL                    Cypher
FOR … IN … RETURN      MATCH … RETURN
FOR … IN               UNWIND
FILTER                 WHERE
SORT                   ORDER BY
LIMIT count            LIMIT count
LIMIT offset, count    SKIP offset LIMIT count
OUTBOUND*              -->
INBOUND*               <--
ANY                    --
INSERT … INTO          CREATE
UPDATE … IN            SET
REPLACE … IN           SET
REMOVE … IN            DELETE

* In Cypher, you express the edge direction as stored, or you use two hyphens to traverse an edge either way. In AQL, you provide a start vertex and control the traversal direction with a keyword: OUTBOUND to follow the edge direction, INBOUND to follow it in reverse, or ANY to traverse the edge regardless of its direction.
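
To illustrate the three direction keywords, here is a minimal Python sketch over a plain edge list. It is a conceptual model only (ArangoDB relies on its edge index instead of scanning all edges); the edge list mirrors the company graph introduced in the next section, and the function name neighbors is hypothetical.

```python
# Hypothetical edge list modelled after the company graph used in this article:
# each pair is (_from, _to), i.e. superior -> subordinate.
EDGES = [
    ("ann", "tracey"), ("ann", "josefina"),
    ("tracey", "sammy"), ("tracey", "eryn"),
    ("josefina", "quinn"), ("josefina", "mark"),
]


def neighbors(start: str, direction: str) -> list:
    """Return the vertices one hop away from `start` for OUTBOUND, INBOUND or ANY."""
    if direction not in ("OUTBOUND", "INBOUND", "ANY"):
        raise ValueError(f"unknown direction: {direction}")
    result = []
    for src, dst in EDGES:
        # OUTBOUND (Cypher -->): follow the edge as stored, from _from to _to
        if direction in ("OUTBOUND", "ANY") and src == start:
            result.append(dst)
        # INBOUND (Cypher <--): follow the edge in reverse, from _to to _from
        if direction in ("INBOUND", "ANY") and dst == start:
            result.append(src)
    return result


print(neighbors("tracey", "OUTBOUND"))  # ['sammy', 'eryn']
print(neighbors("tracey", "INBOUND"))   # ['ann']
print(neighbors("tracey", "ANY"))       # ['ann', 'sammy', 'eryn']
```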

Example Data

We use a simple company graph for our comparison:

[Figure: Employee graph]

As you can see, edges point from superior to subordinate in our demonstration. Besides their names, we also give the employees a job title (role) and an age:

Name       Role                    Age
Ann        Boss                    42
Tracey     Developer Lead          35
Josefina   Marketing Lead          29
Sammy      Programmer              35
Eryn       Frontend Developer      51
Quinn      Graphics Designer       42
Mark       Marketing Operations    35

To store this data in ArangoDB, we use a trivial model:

  • Our employee nodes will be stored in a document collection Employee
  • The relations will be stored in an edge collection manages

[Figure: Data model in ArangoDB]

A few remarks:

  • Collections need to be created before data can be inserted into them. You can use ArangoDB’s web interface to do so.
  • Every collection has a primary index on a special attribute, _key. This index is created automatically and cannot be removed. The _key attribute stores the document key as a string, which is unique within a collection.
  • There is a virtual attribute _id for stored documents, which is the concatenation of the collection name, a forward slash and the document key. It uniquely identifies a document within a database.
  • Edges are also documents in ArangoDB, but with special _from and _to attributes which reference other documents (nodes). Because documents are JSON objects you may store arbitrary attributes on edges, including nested objects.
  • Edge collections have a special edge index built-in, which enables fast graph traversals. It indexes the _from and _to attributes, which reference other documents using _id values.
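
The relationship between _key, _id, _from and _to can be sketched in a few lines of Python. This is a toy model under simple assumptions: the helper make_id is hypothetical, and the two dictionaries merely stand in for the edge index to show the kind of lookup it enables, not how ArangoDB actually stores it.

```python
from collections import defaultdict


def make_id(collection: str, key: str) -> str:
    # Hypothetical helper: _id = collection name + "/" + document key (_key)
    return f"{collection}/{key}"


# Edge documents reference vertices by _id via _from and _to
edges = [
    {"_from": make_id("Employee", "ann"), "_to": make_id("Employee", "tracey")},
    {"_from": make_id("Employee", "ann"), "_to": make_id("Employee", "josefina")},
    {"_from": make_id("Employee", "tracey"), "_to": make_id("Employee", "sammy")},
]

# Toy stand-in for an edge index: two lookup tables keyed by _from and by _to,
# so finding a vertex's edges does not require scanning every edge document.
by_from = defaultdict(list)
by_to = defaultdict(list)
for edge in edges:
    by_from[edge["_from"]].append(edge)
    by_to[edge["_to"]].append(edge)

print([e["_to"] for e in by_from["Employee/ann"]])    # ['Employee/tracey', 'Employee/josefina']
print([e["_from"] for e in by_to["Employee/sammy"]])  # ['Employee/tracey']
```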

[Figure: Graph structure]

To try out the AQL queries presented below, get ArangoDB if you don’t have it already, then open its web interface, go to COLLECTIONS and create a document collection Employee and an edge collection manages. Then click on QUERIES and run the following query:

LET temp = (FOR e IN [
 {"_key":"ann", name:"Ann", "role":"boss", "age":42},
 {"_key":"tracey", name:"Tracey", "role":"lead developer", "age":35},
 {"_key":"josefina", name:"Josefina", "role":"marketing manager", "age":29},
 {"_key":"sammy", name:"Sammy", "role":"programmer", "age":35},
 {"_key":"eryn", name:"Eryn", "role":"frontend developer", "age":51},
 {"_key":"quinn", name:"Quinn", "role":"graphics designer", "age":42},
 {"_key":"mark", name:"Mark", "role":"marketing operator", "age":35}
] INSERT e INTO Employee)
 
FOR m IN [
 {"_from": "Employee/ann", "_to": "Employee/tracey"},
 {"_from": "Employee/ann", "_to": "Employee/josefina"},
 {"_from": "Employee/tracey", "_to": "Employee/sammy"},
 {"_from": "Employee/tracey", "_to": "Employee/eryn"},
 {"_from": "Employee/josefina", "_to": "Employee/quinn"},
 {"_from": "Employee/josefina", "_to": "Employee/mark"}
] INSERT m INTO manages

This will create the regular documents and the edge documents in the two collections.

Basic Traversals

The basic syntax for traversals in AQL is as follows:

FOR vertex[, edge[, path]]
  IN [min[..max]]
  OUTBOUND|INBOUND|ANY startVertex
  edgeCollection1[, edgeCollection2, …]

Let us compare some queries so that you understand how it works.

Get the employees directly managed by Ann:

AQL:
FOR v IN OUTBOUND "Employee/ann" manages
  RETURN v.name

Cypher:
MATCH (:Employee {name:'Ann'})-[:MANAGES]->(e:Employee)
RETURN e.name

Result of AQL query:

[ "Tracey", "Josefina" ]

Find the superior of Tracey:

AQL:
FOR v IN INBOUND "Employee/tracey" manages
  RETURN v.name

Cypher:
MATCH (e:Employee)-[:MANAGES]->(:Employee {name:'Tracey'})
RETURN e.name

Result of AQL query:

[ "Ann" ]

Get the employees managed by Ann, directly and indirectly (up to two levels, which covers everyone below Ann in our example):

AQL:
FOR e IN 1..2 OUTBOUND "Employee/ann" manages
  RETURN e.name

Cypher:
MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]->(e:Employee)
RETURN e.name

Result of AQL query:

[
   "Tracey",
   "Sammy",
   "Eryn",
   "Josefina",
   "Quinn",
   "Mark"
]

Traversals in AQL default to a depth of 1, so FOR … IN OUTBOUND … means the minimum and maximum number of hops are both 1. If you write FOR … IN 2 OUTBOUND …, the minimum as well as the maximum is 2. To specify different values, write a range such as 1..2, as in the query above. Traversals with unlimited depth, as in Cypher with an asterisk (*), are not supported in AQL, but you may set a very high maximum.
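
The depth semantics can be modelled with a small depth-first traversal in Python. This is a sketch under simplifying assumptions: the graph is a hard-coded dictionary, the names traverse and MANAGES are hypothetical, and the rule of never revisiting a vertex on the current path is a simplification of ArangoDB's configurable uniqueness options. Its depth-first visiting order happens to reproduce the order of the AQL result above.

```python
from typing import Optional

# Hypothetical in-memory copy of the outgoing "manages" edges, keyed by _key
MANAGES = {
    "ann": ["tracey", "josefina"],
    "tracey": ["sammy", "eryn"],
    "josefina": ["quinn", "mark"],
}


def traverse(start: str, min_depth: int = 1, max_depth: Optional[int] = None) -> list:
    """Depth-first OUTBOUND traversal returning vertices at depth min_depth..max_depth."""
    if max_depth is None:
        # "IN n OUTBOUND" means n..n; omitting the depth means 1..1
        max_depth = min_depth
    result = []

    def visit(vertex: str, depth: int, on_path: frozenset) -> None:
        if depth >= min_depth:
            result.append(vertex)
        if depth == max_depth:
            return  # do not expand beyond the maximum depth
        for nxt in MANAGES.get(vertex, []):
            if nxt not in on_path:  # simplification: never revisit a vertex on the current path
                visit(nxt, depth + 1, on_path | {nxt})

    visit(start, 0, frozenset({start}))  # depth 0 is the start vertex itself
    return result


print(traverse("ann"))        # like FOR e IN OUTBOUND ...      -> ['tracey', 'josefina']
print(traverse("ann", 2))     # like FOR e IN 2 OUTBOUND ...    -> ['sammy', 'eryn', 'quinn', 'mark']
print(traverse("ann", 1, 2))  # like FOR e IN 1..2 OUTBOUND ... -> ['tracey', 'sammy', 'eryn', 'josefina', 'quinn', 'mark']
```

Emulating "unlimited" depth in this model, as in AQL, simply means passing a large max_depth.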

Pattern Matching

In ArangoDB, we call traversals with conditions pattern matching. Without conditions, it would be a simple traversal, even though in Cypher every search may be considered pattern matching.

Let us extend the previous query with filter conditions. In the example below, we want to find employees who are at least 30 and younger than 40, managed by Ann directly or indirectly:

AQL:
FOR e IN 1..2 OUTBOUND "Employee/ann" manages
  FILTER e.age >= 30 AND e.age < 40
  RETURN {name: e.name, age: e.age}

Cypher:
MATCH (:Employee {name:'Ann'})-[:MANAGES*1..2]->(e:Employee)
WHERE e.age >= 30 AND e.age < 40
RETURN e.name AS name, e.age AS age

Result of AQL query:

[
  { "name": "Tracey", "age": 35 },
  { "name": "Sammy", "age": 35 },
  { "name": "Mark", "age": 35 }
]
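
Conceptually, the FILTER condition is evaluated for each vertex the traversal emits, and only matching vertices reach RETURN. The Python sketch below models that on an in-memory copy of the example data; the names EMPLOYEES, MANAGES and walk are hypothetical, it assumes the hierarchy has no cycles, and it says nothing about how ArangoDB's optimizer actually applies filters during a traversal.

```python
from typing import Iterator, Tuple

# Hypothetical in-memory copy of the example data: _key -> (name, age)
EMPLOYEES = {
    "ann": ("Ann", 42), "tracey": ("Tracey", 35), "josefina": ("Josefina", 29),
    "sammy": ("Sammy", 35), "eryn": ("Eryn", 51), "quinn": ("Quinn", 42),
    "mark": ("Mark", 35),
}
MANAGES = {
    "ann": ["tracey", "josefina"],
    "tracey": ["sammy", "eryn"],
    "josefina": ["quinn", "mark"],
}


def walk(vertex: str, depth: int, max_depth: int) -> Iterator[Tuple[str, int]]:
    """Yield (vertex, depth) pairs depth-first along outgoing edges (assumes no cycles)."""
    for nxt in MANAGES.get(vertex, []):
        yield nxt, depth + 1
        if depth + 1 < max_depth:
            yield from walk(nxt, depth + 1, max_depth)


# FOR e IN 1..2 OUTBOUND "Employee/ann" manages
#   FILTER e.age >= 30 AND e.age < 40
#   RETURN {name: e.name, age: e.age}
# walk() starts emitting at depth 1, matching the minimum of the 1..2 range.
matches = [
    {"name": EMPLOYEES[v][0], "age": EMPLOYEES[v][1]}
    for v, _depth in walk("ann", 0, 2)
    if 30 <= EMPLOYEES[v][1] < 40
]
print(matches)  # [{'name': 'Tracey', 'age': 35}, {'name': 'Sammy', 'age': 35}, {'name': 'Mark', 'age': 35}]
```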

Shortest Path

We can determine the official channel for Quinn to pass a message on to Eryn by finding the shortest path between them. We follow edges in any direction, because the edge orientation changes midway at Ann. If you know that the direction doesn’t change along the paths you are interested in, use a directed traversal instead, i.e. INBOUND or OUTBOUND in AQL.

[Figure: Employee shortest path]

AQL:
FOR e IN ANY SHORTEST_PATH "Employee/quinn" TO "Employee/eryn" manages
  RETURN e.name

Cypher:
MATCH (quinn:Employee {name:'Quinn'}), (eryn:Employee {name:'Eryn'}),
      p = shortestPath((quinn)-[*]-(eryn))
UNWIND nodes(p) AS n
RETURN n.name

Result of AQL query:

[
  "Quinn",
  "Josefina",
  "Ann",
  "Tracey",
  "Eryn"
]
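
For an unweighted graph, a shortest path can be found with a breadth-first search. The following Python sketch applies BFS to the example edges while ignoring edge direction, mirroring the ANY keyword. It illustrates the general technique under that assumption and is not ArangoDB's or Neo4j's actual implementation; shortest_path_any is a hypothetical name.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical copy of the "manages" edges as (_from, _to) pairs
EDGES = [
    ("ann", "tracey"), ("ann", "josefina"),
    ("tracey", "sammy"), ("tracey", "eryn"),
    ("josefina", "quinn"), ("josefina", "mark"),
]


def shortest_path_any(start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search that ignores edge direction, like ANY in AQL or -[*]- in Cypher."""
    # Undirected adjacency list: every edge can be walked both ways
    adjacency: Dict[str, List[str]] = {}
    for src, dst in EDGES:
        adjacency.setdefault(src, []).append(dst)
        adjacency.setdefault(dst, []).append(src)

    previous: Dict[str, Optional[str]] = {start: None}  # predecessor of each visited vertex
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        if vertex == goal:
            # Follow the predecessor links back to the start, then reverse
            path: List[str] = []
            node: Optional[str] = vertex
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(vertex, []):
            if nxt not in previous:
                previous[nxt] = vertex
                queue.append(nxt)
    return None  # no path between start and goal


print(shortest_path_any("quinn", "eryn"))  # ['quinn', 'josefina', 'ann', 'tracey', 'eryn']
```

BFS finds a path with the fewest hops because it explores all vertices at distance d before any at distance d + 1.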

Aggregation

AQL comes with a broad aggregation framework to group by one or multiple values. It can also be used to calculate things like an average value on the fly. The following example could also use a graph traversal, but for simplicity we just use all employee records to calculate the average age, rounded to a whole number:

AQL:
FOR e IN Employee
  COLLECT AGGREGATE avg = AVG(e.age)
  RETURN ROUND(avg)

Cypher:
MATCH (e:Employee)
RETURN ROUND(AVG(e.age))

Result of AQL query:

[ 38 ]
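
The arithmetic behind this result is easy to check: the seven ages sum to 269, and 269 / 7 ≈ 38.43, which rounds to 38. A few lines of Python reproduce it. The round-half-up step is an assumption about how AQL's ROUND() treats ties, which does not matter here because 38.43 is not a tie.

```python
import math

# Ages of the seven employees in the example data
ages = [42, 35, 29, 35, 51, 42, 35]

avg = sum(ages) / len(ages)  # 269 / 7 = 38.428...

# Round half up via floor(x + 0.5). Assumption: this matches AQL's ROUND() on ties;
# Python's built-in round() would instead turn 38.5 into 38 (round half to even).
rounded = math.floor(avg + 0.5)
print(rounded)  # 38
```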

Here is a simple example of how to group by age and count how many employees share the same age:

AQL:
FOR e IN Employee
  COLLECT age = e.age WITH COUNT INTO count
  RETURN {age, count}

Cypher:
MATCH (e:Employee)
RETURN e.age AS age, COUNT(e.age) AS count

Result of AQL query:

[
  { "age": 29, "count": 1 },
  { "age": 35, "count": 3 },
  { "age": 42, "count": 2 },
  { "age": 51, "count": 1 }
]
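
In Python terms, COLLECT … WITH COUNT INTO behaves much like counting group sizes with collections.Counter. The sketch below reproduces the grouped result from a hard-coded list of ages. Sorting by the group key mirrors the order of the result shown above; treat that ordering as an assumption of this model rather than a guarantee stated in this article.

```python
from collections import Counter

# Ages of the seven employees in the example data
ages = [42, 35, 29, 35, 51, 42, 35]

# Rough equivalent of COLLECT age = e.age WITH COUNT INTO count
counts = Counter(ages)

# Emit one object per group, ordered by the group key
result = [{"age": age, "count": n} for age, n in sorted(counts.items())]
print(result)  # [{'age': 29, 'count': 1}, {'age': 35, 'count': 3}, {'age': 42, 'count': 2}, {'age': 51, 'count': 1}]
```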