Introduction to the Pregel Module in ArangoDB

This post is outdated, please see more recent infos below.

Please see a technical article about our current Pregel integration in our blog, details about the various Pregel algorithms ArangoDB supports in our documentation and a tutorial about Community Detection with real data in our training center.

 

 

Ever since Google introduced Pregel as a system for large-scale graph processing we thought of a way how to enable this feature in ArangoDB. So we set up an ArangoDB cluster, created some huge graphs and started evaluating the concept. We came up with a new ArangoDB module (called pregelRunner) that enables the user to write own algorithms, pass them to ArangoDB and have them executed in a Pregel-like fashion.

This means that the user’s algorithm is executed step wise on each server in the cluster in parallel for all its local vertices. In each step the vertices can send messages to each other to distribute information. These messages can be received by the other vertex in the next step. The algorithm terminates when there are no more active vertices left and no message has been sent.

We started to implement an experimental version of Pregel in ArangoDB. You need to check-out the pregel branch of ArangoDB in order to play with the following examples. Please be advised that the implementation is still in an early phase and very like to change. In this post we will provide a brief introduction to ArangoDB’s Pregel module by guiding you through the implementation of an example algorithm.

(more…)

More info...

ArangoDB Java Driver: Graph Data Manipulation & Queries

With ArangoDB 2.2 the new graph API was released featuring multi collection graphs (see blog). With the new version (2.2.1) of arangodb-java-driver the new graph API is supported. In the following you can find a small example of creating a graph with Java.

For the import via maven and configuring the driver, please read the Basics and Driver Setup. For the following we assume, that arangodbDriver is a configured instance of the driver.

So let’s start the whole thing…

(more…)

More info...

New feature: Multi Collection Graphs

This version is deprecated. Download the new version of ArangoDB

With the 2.2 release we proudly announce that we now offer the feature of multi collection graphs. This allows you to group an arbitrary set of ArangoDB collections into one graph. In order to do this grouping you execute the following steps in arangosh or Foxx: 1. Require the general-graph module and create a graph.

var graph_module = require("org/arangodb/general-graph");
var graph = graph._create("myGraph");
  1. Add collections to be used as vertex collections. If the collections are not yet created, ArangoDB will create them for you.
graph._addVertexCollection("person");
graph._addVertexCollection("publication");
graph._addVertexCollection("university");
graph._addVertexCollection("company");
  1. Create edge definitions between your vertex collections. These definitions create edge collections and allow you to store edges where source and target vertices come from the collections written in the definition. If you give vertex collections unknown to the graph they will be added to the graph. (more…)
More info...

ArangoDB at FOSDEM 2014: Insights & Presentations

FOSDEM is an absolutely open and free conference in Brussels, Belgium. The conference offers an impressive amount of developer rooms discussing a broad range of technical topics, including NoSQL and graphs.

After a funny and productive ArangoDB hackathon weekend Frank and I arrived at FOSDEM on Sunday noon. We were looking forward to the talks in the graph devroom, but unfortunately for us it was not possible to enter the room (it was overfull, indicating a great quality of talks).
At the next speakers change Frank and I managed to slip into the room and could enjoy two inspiring talks by Neo4J.
Afterwards it was my turn to present the graph visualization interface in ArangoDB in a still crowded room (slides).
After my presentation, we handed out all our ArangoDB t-shirts to our fans.

Michael handing out T-Shirts to the audience

Michael handing out T-Shirts to the audience

All together it was a great experience and a really huge amount of people visiting the same conference which impressed me a lot.
Thanks for the team at the Graph Processing devroom for giving me the opportunity to speak and for organizing such a great devroom.
We will meet again next year.

so long,
Michael

More info...

Graph Visualization Techniques | ArangoDB: Insights & Tutorials

Using ArangoDB as your graph database?

you now have the opportunity to visualize your data graphically, using the “Graphs” tab in the admin interface.
In this post i will explain you how to do this:

New to multi-model and graphs? Check out our free ArangoDB Graph Course.

Step 1) Configure the Interface
At first you have to configure the interface to display exactly the data you desire.
For the dataset you have two options:

  1. Select one vertex and one edge collection directly
  2. Select the graph by name created using the graphs api
  3. Manage your graphs

Then the interface offers you several advanced options:

  1. Decide if the graph is directed (only outbound edges are followed) or undirected (all edges are followed). By default we treat your graphs as undirected
  2. Define if you want to start at a random vertex in the graph. By default we will do that.
  3. Then you can define which of your attributes should be used as the label on the verticies. By default we will use the _key for this.
  4. Now lets bring in some colour. You can either use the same attribute for colouring as for labeling, but you can also use a different one
  5. Finally you can define a grouping of new nodes. This is useful to speed up the performance of the interface and to increase the usability. You can define a priority list of attributes here, which is elaborated from top to bottom: All vertices having the first attribute defined are grouped by its value. All other nodes are grouped by the second attribute if present etc.

Use the filter on top to identify the starting point to explore your graph.

(more…)

More info...

Graph Visualization Screencast | ArangoDB: Advanced Techniques

In the latest version of ArangoDB (1.4) we have introduced a new tab in the Admin Interface: Graphs.

You can use this tab to view and modify your graphs stored in ArangoDB.
In this screencast you will get a short introduction on how to use the new system.

(more…)

More info...

Is UNQL Dead? Future of NoSQL Query Languages | ArangoDB Blog 2012

Note: We changed the name of the database in May 2012. AvocadoDB is now called ArangoDB.

UNQL started with quite some hype last year. However, after some burst of activity the project came to a hold. So it seems, that – at least as a project – UNQL has been a failure. IMHO one of the major issues with the current UNQL is, that it tries to cover everything in NoSQL, from key-value stores to document-stores to graph-database. Basically you end up with greatest common divisor – namely key-value access. But with graph structures and also document-structures you really want to supports joins, paths or some sort of sub-structures.

Apart from all the technical and theoretical benefits of SQL and what advantages the underlying theory has to offer, the major plus from an users point of view is that it is readable. You simple can see an SQL statement – be it in C, Java, Ruby – and understand what is going on. It is declarative, not imperative. With other imperative solution, like a fluent interface or a map-reduce, you need to understand the underlying syntax or language. With SQL you only need to understand English – at least most of the time.

And here I think is where UNQL is totally right. We need something similar for the NoSQL world. But it should not try to be a “fits all situation”. It should be a fit for 80% of the problems. For simple key-values stores a fluent-interface is indeed enough. For very complex graph traversals a traversal program must be written. For very complex map-reduces you might need to write a program – but check out Google’s talk (www.nosql-matters.org/program) about NoNoSQL. There they describe why they are developing a SQL-like interface for Map/Reduce.

In my experience most of the time you have a set of collections holding different “types” of documents with some relations between them. One of the biggest advantages of document stores or graph databases is that you can have lists and sub-objects. The problem with SQL is, that it has no good way to deal with these structures. So I believe UNQL would be quite successful if it would concentrate on these strong advantages of NoSQL, instead of trying to unify everything – especially after hear Jan’s talk about a document query language at the NoSQL Cologne UG (an English version is also available).

More info...

Get the latest tutorials,
blog posts and news: