
What Makes ArangoDB a Graph Database?

When looking for a solution for your project, it is important to understand what sets each technology apart. With ArangoDB, that is its native multi-model approach, which includes full graph database capabilities. In this article, I am going to explain the fundamental pieces of what that means.

Using ArangoDB as a Graph Database

If you are already familiar with the graph database concept, then you know that a graph consists of vertices (or nodes) connected via edges. Graph databases usually store edges connected to vertices directly at the vertex object. In ArangoDB this is handled differently (if you want to take a technical deep dive into ArangoDB’s approach, see this article).

Today, we will take a look at how ArangoDB lets you map graph data natively to the database and how the database provides efficient access to graph datasets with a variety of access patterns such as traversals, shortest path, or pattern matching.

Vertices (or nodes) are stored in normal collections. The key to graph database capabilities comes from two things: the edge collection and the edge index.

Let’s take a quick look at storing vertices and then explore a bit more on edge collections and indices.

Vertex Collections

To showcase the benefits of using graphs with ArangoDB, we will use the example of domestic flights in the USA. The dataset describes the relationship between airports (vertices) and the flights (edges) between them. We use the same dataset in our Graph Course for Beginners.

Here are two example JSON documents from our airports collection:

{
    "_key": "JFK",
    "_id": "airports/JFK",
    "_rev": "_YOO08KG-_T",
    "name": "John F Kennedy Intl",
    "city": "New York",
    "state": "NY",
    "country": "USA",
    "lat": 40.63975111,
    "long": -73.77892556,
    "vip": true
}
{
    "_key": "BIS",
    "_id": "airports/BIS",
    "_rev": "_YOSrLBe--r",
    "name": "Bismarck Municipal",
    "city": "Bismarck",
    "state": "ND",
    "country": "USA",
    "lat": 46.77411111,
    "long": -100.7467222,
    "vip": false
}

The airports collection is a normal collection of JSON documents and requires nothing special or out of the ordinary to work with a graph. Please note the _id attribute, as this will play a crucial role for our graph.

We will explore these documents a bit more in a moment, for now though, just understand that our airports collection contains normal JSON documents that represent airports.

The Edge Collection

To explain what an edge collection is, let’s start with a simple definition: it is a special collection of JSON documents that describe the connection between two other documents.

Pretty simple, right? Well, I have some good news: it actually is that simple. The power of native multi-model in ArangoDB is that edges stored in an edge collection are not tied to vertices stored in another collection, but can be stored and distributed independently. This provides advantages in terms of data modeling flexibility and, most importantly, horizontal scalability.

Let’s go a little deeper here and take a look at what exactly “describing the connection between two other documents” looks like.

A document in an edge collection will always contain at least five attributes: _id, _key, _rev, _to, and _from. The ‘magic’ comes from the _to and _from attributes. These two attributes define the start and end points of the edge; each holds the _id of the vertex at that end of the connection.
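To make that structure concrete, here is a minimal sketch in plain Python that builds a dictionary shaped like an edge document. The helper name make_edge is purely illustrative and is not part of any ArangoDB driver or API. The _rev attribute is left out because the database assigns it.

```python
def make_edge(collection, key, from_id, to_id, **attributes):
    """Build a dict shaped like an edge document.

    Illustrative helper only; it is not an ArangoDB API.
    """
    for endpoint in (from_id, to_id):
        # A document _id has the form "<collection>/<_key>".
        coll, sep, vertex_key = endpoint.partition("/")
        if not (coll and sep and vertex_key):
            raise ValueError(f"not a document _id: {endpoint!r}")
    edge = {
        "_key": key,
        "_id": f"{collection}/{key}",
        "_from": from_id,  # _id of the vertex where the edge starts
        "_to": to_id,      # _id of the vertex where the edge ends
    }
    edge.update(attributes)  # any further attributes are ordinary document data
    return edge


flight = make_edge("flights", "25471", "airports/BIS", "airports/MSP", Distance=386)
print(flight["_id"], flight["_from"], "->", flight["_to"])
```

The point of the sketch is only that _from and _to carry full _id values, including the collection name, rather than bare keys.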

In our airports and flights example, airports are the vertices, and flights ‘connect’ the airports with one another and are therefore the edges of our graph. Here are two edge documents from the flights edge collection:

{
    "_key": "25471",
    "_id": "flights/25471",
    "_from": "airports/BIS",
    "_to": "airports/MSP",
    "_rev": "_YOO8JXG--f",
    "Year": 2008,
    "Month": 1,
    "Day": 2,
    "DayOfWeek": 3,
    "DepTime": 1055,
    "ArrTime": 1224,
    "DepTimeUTC": "2008-01-02T16:55:00.000Z",
    "ArrTimeUTC": "2008-01-02T18:24:00.000Z",
    "UniqueCarrier": "9E",
    "FlightNum": 5660,
    "TailNum": "85069E",
    "Distance": 386
}
{
    "_key": "71374",
    "_id": "flights/71374",
    "_from": "airports/JFK",
    "_to": "airports/DCA",
    "_rev": "_YOO8LYG--N",
    "Year": 2008,
    "Month": 1,
    "Day": 4,
    "DayOfWeek": 5,
    "DepTime": 1604,
    "ArrTime": 1724,
    "DepTimeUTC": "2008-01-04T21:04:00.000Z",
    "ArrTimeUTC": "2008-01-04T22:24:00.000Z",
    "UniqueCarrier": "MQ",
    "FlightNum": 4755,
    "TailNum": "N854AE",
    "Distance": 213
}

In addition to the required attributes, these documents contain all of the information for individual flights. The first flight goes from Bismarck (BIS) to Minneapolis (MSP), while the second goes from John F Kennedy (JFK) to Ronald Reagan (DCA). You can tell this by looking at the _from and _to fields in the documents.

Looking at the rest of the fields, notice that we have all of the information related to those individual flights, including date, departure and arrival times, flight number, and more. Although these documents are part of an edge collection, they can still be queried like documents in standard collections. You could even use nested properties on edges if you wanted to.
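As a rough illustration of treating edges as ordinary documents, the following plain-Python sketch filters the two flights shown above by a regular attribute. It runs in memory rather than against a database, only a few of the fields are copied over, and the function name flights_longer_than is made up for this example.

```python
# A trimmed copy of the two edge documents shown above.
flights = [
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP",
     "UniqueCarrier": "9E", "FlightNum": 5660, "Distance": 386},
    {"_key": "71374", "_from": "airports/JFK", "_to": "airports/DCA",
     "UniqueCarrier": "MQ", "FlightNum": 4755, "Distance": 213},
]


def flights_longer_than(edges, min_distance):
    """Return the edges whose Distance attribute exceeds min_distance."""
    return [edge for edge in edges if edge["Distance"] > min_distance]


long_flights = flights_longer_than(flights, 300)
print([edge["FlightNum"] for edge in long_flights])
```

Nothing in this filter depends on _from or _to; the edge is handled exactly like any other document.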

Now, let’s change our approach slightly. Let’s say we want to get a flight from the Bismarck (BIS) airport to the Denver airport. How could we do that? The edge collection lets us explore connections between airports: we query our flights edge collection for a flight that goes from Bismarck to Denver. We can do this because of the _from and _to attributes. Bismarck appears as "_from": "airports/BIS" in the example document above, and the _to field holds the destination airport of each flight. Our result ends up looking something like this:

[Image: flight edge]

Here we can see that flight 14426 departs from Bismarck and lands in Denver. We also have all of the information we need for each airport, because the edge references the actual airport documents through _from and _to. We were able to find this because the flights edge collection creates a relationship between the airports and the flights to and from them.
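The lookup described above can be sketched in plain Python using only the documents reproduced in this article. The Bismarck to Denver edge itself is not listed here, so the sketch searches for the Bismarck to Minneapolis flight instead. The function names find_flights and with_endpoints are illustrative assumptions, and the data lives in memory rather than in a database.

```python
# Airport documents from the article, keyed by _id (trimmed to a few fields).
airports = {
    "airports/JFK": {"name": "John F Kennedy Intl", "city": "New York"},
    "airports/BIS": {"name": "Bismarck Municipal", "city": "Bismarck"},
}

# Edge documents from the article (trimmed to a few fields).
flights = [
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP", "FlightNum": 5660},
    {"_key": "71374", "_from": "airports/JFK", "_to": "airports/DCA", "FlightNum": 4755},
]


def find_flights(edges, origin_id, destination_id):
    """Return every edge that starts at origin_id and ends at destination_id."""
    return [
        edge for edge in edges
        if edge["_from"] == origin_id and edge["_to"] == destination_id
    ]


def with_endpoints(edge, vertices):
    """Attach the vertex documents that the edge's _from and _to refer to."""
    return {
        "flight": edge,
        "origin": vertices.get(edge["_from"]),
        # None when the vertex is not in our small sample (MSP is not listed above).
        "destination": vertices.get(edge["_to"]),
    }


matches = find_flights(flights, "airports/BIS", "airports/MSP")
result = with_endpoints(matches[0], airports)
print(result["flight"]["FlightNum"], "departs from", result["origin"]["name"])
```

In AQL, the equality filter would compare the same two attributes, for example FILTER f._from == "airports/BIS" AND f._to == "airports/DEN" inside a FOR f IN flights loop.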

Edge Index

In ArangoDB, every collection has a hash index on the document key, which offers a way to quickly look up documents by either their _key or _id attribute. Edge collections have an additional, implicitly created hash edge index that provides quick access to the _to and _from fields of the edge documents. This means our queries can fetch results quickly and with a constant lookup time.

Because the edge index specifically indexes the _to and _from fields, it is most useful for equality lookups, such as looking for a connection to or from a specific airport. For other queries, such as range queries or sorting, the edge index won’t be very beneficial. Although additional edge indexes cannot be explicitly created, you can use the _from and _to fields in your own indexes to improve query performance. We showcase some of the performance benefits of using indexes with graphs in our performance benchmark, and we also offer a performance course with strategies for improving AQL query speed.
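To give a feel for why equality lookups are the sweet spot, here is a toy model of a hash-based edge index in plain Python. It is a conceptual sketch only and says nothing about how ArangoDB implements its index internally. The class name EdgeIndex and its methods are invented for this example.

```python
from collections import defaultdict


class EdgeIndex:
    """Toy hash-based index over the _from and _to attributes of edges."""

    def __init__(self, edges=()):
        self._by_from = defaultdict(list)
        self._by_to = defaultdict(list)
        for edge in edges:
            self.add(edge)

    def add(self, edge):
        # Each edge is registered under both of its endpoints.
        self._by_from[edge["_from"]].append(edge)
        self._by_to[edge["_to"]].append(edge)

    def outbound(self, vertex_id):
        """Edges that start at vertex_id (one dictionary lookup)."""
        return list(self._by_from.get(vertex_id, ()))

    def inbound(self, vertex_id):
        """Edges that end at vertex_id (one dictionary lookup)."""
        return list(self._by_to.get(vertex_id, ()))


index = EdgeIndex([
    {"_key": "25471", "_from": "airports/BIS", "_to": "airports/MSP"},
    {"_key": "71374", "_from": "airports/JFK", "_to": "airports/DCA"},
])
print([edge["_key"] for edge in index.outbound("airports/BIS")])
print([edge["_key"] for edge in index.inbound("airports/DCA")])
```

A dictionary like this answers "which edges touch this exact vertex?" directly, but it has no ordering, which is why it does not help with range queries or sorting.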

Conclusion

As promised, I have shown you just a little piece of what makes ArangoDB a highly flexible graph database, but edge collections are just one of the many features that ArangoDB has to offer.

This example shows just one connection from Bismarck to Denver. What if we weren’t able to find a direct flight to Denver? Using our edge collection and the power of graph traversals, we can run more complex queries that allow for connecting flights, all flights from or to a specific airport, all flights between two airports, and more. If you would like to know more about graph traversals, pattern matching, and more complex queries, take our Graph Course for Freshers, which takes you from zero to advanced with the ArangoDB Query Language (AQL).


Christopher Woodward

Chris has over 10 years of experience in all areas of technology, including service, support, and development. He is also passionate about learning, and right now he is focused on improving the learning experience for the ArangoDB community. Chris believes the future is native multi-model and wants to help tell the world.
