home shape

Alpha3 of ArangoDB 3.2: Support for Distributed Graph Processing

The next alpha release of the upcoming ArangoDB 3.2 is available for testing. You can download and install alpha3 here.

Moving forward

As ArangoDB 3.2 will include several new features and improvements, we realized that the release model that we currently follow has room for improvement. Going forward we will introduce milestone releases with ArangoDB 3.3. For this major release you will see a bit more alphas. You can read detailed info about the new structure model here.

Pregel computing model

In this alpha we introduce support for incremental graph processing algorithms in a single mode server as well as in the cluster.

Internally we implement the pregel computing model, which will enable us to support arbitrary graph algorithms, which will scale with your data (or with the size of your database cluster).

The pregel computing model was developed at Google and published in “Pregel: A System for Large-scale Graph Processing” by Malewicz et al. in 2010 (full paper).

A directed graph in pregel consists of vertices and edges, where each vertex only knows its outgoing edges. Crucially a vertex is not able to see the state of any other vertex. The core idea then is to imagine each vertex as an independent computer program, which is able to send and receive messages from and to all other vertices.

Pregel Messages

The vertex program can process messages and send messages to other machines. To allow for iterative processing there is a concept of Supersteps. A superstep is an iteration where every worker in a computing cluster can process messages and send messages, but sending a message in one superstep guarantees that it will be received in the next superstep. Similarly all messages processed in a superstep were sent during the previous superstep. This guarantee is called the Global Barrier between supersteps, all workers must wait until every other worker is finished before continuing.

Pregel Supersteps

A pregel execution system then becomes a distributed message passing system, which can easily work in a distributed computing cluster.

Example algorithm

To better understand how it works we will describe the model with an example algorithm. The algorithm “Single-Source Shortest Paths” or SSSP calculates the shortest path distance of every vertex from a single source vertex.

The red lines in the image below represent global superstep barriers, the graph is displayed after each superstep. Every superstep a vertex will send its current distance value plus the distance value on each edge to outgoing neighbor vertices.

  1. In superstep 1. the source vertex with value 0 will send the message “1” to his neighbour.
  2. In superstep 2. the second vertex processes the message “1” and updates his local value to “1” (because 1 < ∞). Then he sends the message “4” to his neighbour connected by an edge with value 3.
  3. The third (rightmost) vertex receives the message “4” and updates its local value to “4” (because 4 < ∞).

Afterwards there are no more messages to send, therefore the algorithm ends.

ExampleSSSP (Pregel)

Run a Pregel Algorithm

The first step to running a pregel algorithm is to import a graph, our graph will be named “demo”, and you can just copy and paste it in arangosh.

var graphName = "demo";
var vColl = "demo_v", eColl = "demo_e";
var graph = graph_module._create(graphName);
db._create(vColl, {numberOfShards: 4});
graph._addVertexCollection(vColl);
db._createEdgeCollection(eColl, {
                          numberOfShards: 4,
                          replicationFactor: 1,
                          shardKeys:["vertex"],
                          distributeShardsLike:vColl});
      
var rel = graph_module._relation(eColl, [vColl], [vColl]);
graph._extendEdgeDefinitions(rel);
      
var vertices = db[vColl];
var edges = db[eColl];

var A = vertices.insert({_key:'A'})._id;
var B = vertices.insert({_key:'B'})._id;
var C = vertices.insert({_key:'C'})._id;
var D = vertices.insert({_key:'D'})._id;
var E = vertices.insert({_key:'E'})._id;
var F = vertices.insert({_key:'F'})._id;
var G = vertices.insert({_key:'G'})._id;
var H = vertices.insert({_key:'H'})._id;
var I = vertices.insert({_key:'I'})._id;
var J = vertices.insert({_key:'J'})._id;
var K = vertices.insert({_key:'K'})._id;

edges.insert({_from:B, _to:C, vertex:'B'});
edges.insert({_from:C, _to:B, vertex:'C'});
edges.insert({_from:D, _to:A, vertex:'D'});
edges.insert({_from:D, _to:B, vertex:'D'});
edges.insert({_from:E, _to:B, vertex:'E'});
edges.insert({_from:E, _to:D, vertex:'E'});
edges.insert({_from:E, _to:F, vertex:'E'});
edges.insert({_from:F, _to:B, vertex:'F'});
edges.insert({_from:F, _to:E, vertex:'F'});
edges.insert({_from:G, _to:B, vertex:'G'});
edges.insert({_from:G, _to:E, vertex:'G'});
edges.insert({_from:H, _to:B, vertex:'H'});
edges.insert({_from:H, _to:E, vertex:'H'});
edges.insert({_from:I, _to:B, vertex:'I'});
edges.insert({_from:I, _to:E, vertex:'I'});
edges.insert({_from:J, _to:E, vertex:'J'});
edges.insert({_from:K, _to:E, vertex:'K'});

Then you can start for example PageRank on the “demo” graph:

var pregel = require("@arangodb/pregel");
var handle = pregel.start("pagerank", "graphname", {maxGSS: 25})
pregel.status(handle);
// Results can be seen
db.demo_v.all().toArray();

Or shortest paths:

handle = pregel.start("sssp", "demograph", {source: "vertices/V"});
pregel.status(handle);

Supported Algorithms

In the beginning we will support a number of well-known graph algorithms:

If you are interested in adding your own algorithms have a look at the source. For more info check the documentation.

Your comments, feedback and bug reports are very welcome – get in touch via our Community Slack #feedback32 channel.

Download alpha3 of ArangoDB 3.2

Jan Steemann

Jan Steemann

After more than 30 years of playing around with 8 bit computers, assembler and scripting languages, Jan decided to move on to work in database engineering. Jan is now a senior C/C++ developer with the ArangoDB core team, being there from version 0.1. He is mostly working on performance optimization, storage engines and the querying functionality. He also wrote most of AQL (ArangoDB’s query language).

Leave a Comment





Get the latest tutorials, blog posts and news: