
Data Modeling: MongoDB vs ArangoDB | ArangoDB Blog

MongoDB is a document database, whereas ArangoDB is a multi-model database supporting documents, graphs and key/value pairs within a single database. When it comes to data modeling and querying, the two pursue somewhat different approaches.


In a Nutshell: In MongoDB, data modeling is “aggregate-oriented”, avoiding relations and joins. On the other hand, most of us have used relational databases, which organize data in tables with relations and try to avoid redundancy as much as possible. Both approaches have their pros and cons. ArangoDB sits somewhere in between: depending on your use case, you can model and query your data both in a “relational way” and in an “aggregate-oriented way”. ArangoDB offers joins, nesting of sub-documents and multi-collection graphs.


Use Case Example: A Product Wish List

Imagine you have a simple web shop application. There are products which have pictures, a price, a description and product-specific attributes. Customers can order those products. Each order must be archived with both the ordered products and the customer data as they were at that specific point in time.

Additionally, customers can put products on a wish list. The shop will also contain products that are not yet released but can be pre-ordered. If a customer puts one of those products on the wish list, the system should notify her some time before the actual release date so that she can place an order. Finally, the wish list should indicate when a product's price has changed and when a product has been removed or updated.

Modeling

We will assume the modeling of the products and customers is already done. Both will benefit from the schema flexibility offered by MongoDB and ArangoDB. For this blog post we will focus on the modeling of the wish list feature.

Each wish list will require at least the following information:

  • The customer this wish list belongs to
  • A list of products on the wish list
  • For each product the price at the point it was put on the wish list
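The last bullet is what makes the price-change indicator from the requirements possible: the remembered price only has to be compared with the product's current price. Below is a minimal Python sketch of that comparison, assuming prices are plain integers and that a product missing from the catalog has been removed; `WishListEntry` and `price_status` are hypothetical names, and the sketch covers only the price and removal cases, not other product updates.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class WishListEntry:
    product_id: int
    price_when_added: int  # assumption: prices stored as integers, e.g. in cents


def price_status(entry: WishListEntry, catalog: Dict[int, int]) -> str:
    """Compare the remembered price with the product's current price.

    `catalog` maps product id -> current price; an id missing from the
    catalog is treated as a product that has been removed from the shop.
    """
    current: Optional[int] = catalog.get(entry.product_id)
    if current is None:
        return "removed"
    if current < entry.price_when_added:
        return "cheaper"
    if current > entry.price_when_added:
        return "more expensive"
    return "unchanged"


catalog = {42: 90, 101: 20}
print(price_status(WishListEntry(42, 100), catalog))  # cheaper
print(price_status(WishListEntry(7, 55), catalog))    # removed
```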

In a relational database this will eventually result in a table similar to the following:

CREATE TABLE `wishlists` (
  `id` INT(11) NOT NULL AUTO_INCREMENT,
  `customer_id` INT(11) DEFAULT NULL,
  `product_id` INT(11) DEFAULT NULL,
  `date_added` DATETIME DEFAULT NULL,
  `price_when_added` INT(11) DEFAULT NULL,
  `note` TEXT,
  PRIMARY KEY (`id`)
)
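To see what "a wish list" means in this schema, here is a small sketch using Python's built-in `sqlite3` module. It is an SQLite adaptation rather than the MySQL table itself (SQLite assigns ids automatically to an `INTEGER PRIMARY KEY` column, so the `AUTO_INCREMENT` definition is simplified), and the sample rows and the `wish_list` helper are hypothetical. The point it illustrates: there is no wish list row as such, only individual rows that happen to share a `customer_id`.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL table above (types simplified).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE wishlists (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        product_id INTEGER,
        date_added TEXT,
        price_when_added INTEGER,
        note TEXT
    )
    """
)

# Every wished-for product is its own row (hypothetical sample data).
rows = [
    (23, 42, "2014-10-15 15:56", 100, None),
    (23, 101, "2014-09-08 11:35", 20, None),
    (24, 42, "2014-10-01 09:00", 100, None),
]
conn.executemany(
    "INSERT INTO wishlists (customer_id, product_id, date_added, price_when_added, note) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)


def wish_list(customer_id):
    """A customer's wish list is just the set of rows sharing their customer_id."""
    cur = conn.execute(
        "SELECT product_id, price_when_added FROM wishlists "
        "WHERE customer_id = ? ORDER BY product_id",
        (customer_id,),
    )
    return cur.fetchall()


print(wish_list(23))  # [(42, 100), (101, 20)]
```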

In MongoDB and ArangoDB, on the other hand, we can take advantage of embedding the list of products in a wish list document instead of defining a wish list as the set of rows sharing the same customer id:

{
  "customer_id": 23,
  "products": [
    { "id": 42, "price": 100, "added": "2014-10-15 15:56", "note": null },
    { "id": 101, "price": 20, "added": "2014-09-08 11:35", "note": null }
  ]
}

This even allows us to have multiple wish lists per customer without much additional effort (compared to the relational approach).
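A quick way to see why multiple lists come almost for free: each list is a self-contained document, so a second list is simply a second document with the same `customer_id`. The Python sketch below uses plain dicts in place of stored documents; the extra `name` attribute and the helper functions are hypothetical and not part of the model described in this post.

```python
# Hypothetical wish list documents; each one embeds its own products.
wishlists = [
    {
        "customer_id": 23,
        "name": "birthday",  # hypothetical attribute to tell lists apart
        "products": [
            {"id": 42, "price": 100, "added": "2014-10-15 15:56", "note": None},
            {"id": 101, "price": 20, "added": "2014-09-08 11:35", "note": None},
        ],
    },
    {
        "customer_id": 23,
        "name": "christmas",
        "products": [
            {"id": 7, "price": 55, "added": "2014-11-01 10:00", "note": "gift"},
        ],
    },
]


def lists_of(customer_id):
    """All wish list documents owned by one customer."""
    return [w for w in wishlists if w["customer_id"] == customer_id]


def add_product(wishlist, product_id, price, added):
    """Adding a product modifies only the single document holding the list."""
    wishlist["products"].append(
        {"id": product_id, "price": price, "added": added, "note": None}
    )


add_product(wishlists[1], 42, 100, "2014-11-02 12:00")
print(len(lists_of(23)))  # 2
```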

On top of the wish list we want to implement a feature that informs customers whenever a product is on the brink of its release, since they may want to order it now. For this purpose we need a list of customers with their email addresses and a list of the products that will be released on a specific date. In the relational model this involves a couple of joins in SQL, and overall the task should be straightforward.

MongoDB, in contrast, offers nothing like joins. We could use the Aggregation Framework to accomplish this task, but while that reduces the number of situations in which a client-side join is required, we would eventually still end up with one. Even then, linking between documents is not recommended; the usual advice is to build on some kind of caching layer instead. That, however, leads to cache invalidation issues, which are famously one of the two hard things in computer science.
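For comparison, the relational solution mentioned above can be sketched with Python's built-in `sqlite3` module. The minimal `customers` and `products` tables, their columns (`email`, `name`, `release_date`) and the sample rows are assumptions made for illustration; the query itself is the straightforward pair of joins: wish list rows to customers (for the email address) and to products (for the release date).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical minimal tables next to a simplified wishlists table.
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT, release_date TEXT);
    CREATE TABLE wishlists (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        product_id  INTEGER REFERENCES products(id),
        price_when_added INTEGER
    );

    INSERT INTO customers VALUES (23, 'alice@example.com'), (24, 'bob@example.com');
    INSERT INTO products  VALUES (42, 'Game X', '2014-11-11'), (101, 'Book Y', '2014-12-01');
    INSERT INTO wishlists (customer_id, product_id, price_when_added)
        VALUES (23, 42, 100), (23, 101, 20), (24, 101, 20);
    """
)

# Two joins, then a filter on the release date of the wished-for product.
RELEASE_QUERY = """
    SELECT c.email, p.name
    FROM wishlists AS w
    JOIN customers AS c ON c.id = w.customer_id
    JOIN products  AS p ON p.id = w.product_id
    WHERE p.release_date = ?
    ORDER BY c.email, p.name
"""


def customers_to_notify(release_date):
    return conn.execute(RELEASE_QUERY, (release_date,)).fetchall()


print(customers_to_notify("2014-11-11"))  # [('alice@example.com', 'Game X')]
```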

ArangoDB supports joins in AQL, so we can build a solution similar to the SQL one. The following query selects all customers to be informed, along with a list of products:

FOR p IN products
  FILTER p.release_date == '2014-11-11'
  LET customer_list = (
    FOR w IN wishlist
      LET product_ids = (
        FOR wp IN w.products
          FILTER wp.id == p.id
          RETURN wp.id
      )
      FILTER LENGTH(product_ids) > 0
      RETURN { customer_id: w.customer_id, products: product_ids }
  )
  RETURN customer_list
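To make the semantics of this query explicit, here is a line-by-line Python mirror of its nested loops over hypothetical in-memory data (the product `name` values are made up). It is also roughly what an application would have to run itself as a client-side join when the database offers no join. Like the query, it returns one inner list per matching product and still fetches neither email addresses nor product details.

```python
# Hypothetical in-memory copies of the two collections.
products = [
    {"id": 42, "name": "Game X", "release_date": "2014-11-11"},
    {"id": 101, "name": "Book Y", "release_date": "2014-12-01"},
]
wishlist = [
    {"customer_id": 23, "products": [{"id": 42, "price": 100}, {"id": 101, "price": 20}]},
    {"customer_id": 24, "products": [{"id": 101, "price": 20}]},
]


def customers_per_product(release_date):
    """Mirror of the AQL query: one customer list per product released that day."""
    result = []
    for p in products:  # FOR p IN products
        if p["release_date"] != release_date:  # FILTER p.release_date == ...
            continue
        customer_list = []
        for w in wishlist:  # FOR w IN wishlist
            # LET product_ids = (FOR wp IN w.products FILTER wp.id == p.id RETURN wp.id)
            product_ids = [wp["id"] for wp in w["products"] if wp["id"] == p["id"]]
            if product_ids:  # FILTER LENGTH(product_ids) > 0
                customer_list.append(
                    {"customer_id": w["customer_id"], "products": product_ids}
                )
        result.append(customer_list)  # RETURN customer_list
    return result


print(customers_per_product("2014-11-11"))  # [[{'customer_id': 23, 'products': [42]}]]
```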

And while this query is already fairly complex, it doesn't even select the customers' email addresses or the product details. Personally, I like AQL a lot better than MongoDB’s JSON-based query system, but that is just personal taste. One could easily argue that neither solution is very elegant or easy to understand. In the end, a document-based approach can only do so much.

The Graph Solution

Luckily, ArangoDB can handle not only documents but graphs too, and modeling this problem as a graph makes life much easier. First of all, we no longer need a collection that only stores relational data: the relation is modeled as edges between customers and products. And since edges are just documents, we can store all required information directly on the edge (i.e., when the product was added and what its price was at that point). But the fun part begins when we start querying the graph:
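Here is a sketch of how the data looks once the wish list becomes edges, using plain Python dicts. ArangoDB edge documents reference their endpoints through `_from` and `_to` document handles of the form `collection/key`; everything else here (the sample vertices, the `price` and `added` edge attributes, the `wished_products` helper) is a hypothetical illustration. Because the price at the time of adding lives on the edge, comparing it with the product's current price needs no extra collection.

```python
# Hypothetical vertex documents, keyed by ArangoDB-style document handles.
customers = {"customers/23": {"email": "alice@example.com"}}
products = {
    "products/42": {"name": "Game X", "release_date": "2014-11-11", "price": 90},
    "products/101": {"name": "Book Y", "release_date": "2014-12-01", "price": 20},
}

# Wish list entries become edges; the relation's own data is stored on the edge.
wishlist_edges = [
    {"_from": "customers/23", "_to": "products/42", "price": 100, "added": "2014-10-15 15:56"},
    {"_from": "customers/23", "_to": "products/101", "price": 20, "added": "2014-09-08 11:35"},
]


def wished_products(customer_handle):
    """Follow a customer's outgoing wishlist edges.

    Returns (product name, price when added, current price) for each edge.
    """
    out = []
    for e in wishlist_edges:
        if e["_from"] == customer_handle:
            product = products[e["_to"]]
            out.append((product["name"], e["price"], product["price"]))
    return out


print(wished_products("customers/23"))  # [('Game X', 100, 90), ('Book Y', 20, 20)]
```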

FOR c IN customers
  LET product_list = GRAPH_NEIGHBORS("shop_graph", c, {edgeCollectionRestriction: "wishlist", neighborExamples: { release_date: "2014-11-11" }})
  FILTER LENGTH(product_list) > 0
  RETURN {
    customer: c,
    products: product_list[*].vertex
  }
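To clarify what the query asks for, here is a rough Python emulation of a neighbor lookup restricted to one edge collection and filtered by an example document. It is only a sketch of the idea under simplifying assumptions (outbound edges only, example matching by attribute equality, hypothetical sample data and function names) and does not reproduce how ArangoDB actually implements `GRAPH_NEIGHBORS`; each neighbor is wrapped in a `{"vertex": ...}` dict so the result mirrors the `product_list[*].vertex` expression in the query.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def matches(doc: Doc, example: Doc) -> bool:
    """True if every attribute of the example has the same value in doc."""
    return all(doc.get(k) == v for k, v in example.items())


def neighbors(edges: List[Doc], vertices: Dict[str, Doc], start: str, example: Doc) -> List[Doc]:
    """Outbound neighbors of `start` over one edge collection, filtered by example."""
    result = []
    for e in edges:
        if e["_from"] != start:
            continue
        v = vertices[e["_to"]]
        if matches(v, example):
            result.append({"vertex": v})
    return result


def notify_via_graph(customers: Dict[str, Doc], edges: List[Doc],
                     products: Dict[str, Doc], release_date: str) -> List[Doc]:
    """Customers with at least one wished-for product released on release_date."""
    out = []
    for handle, c in customers.items():
        product_list = neighbors(edges, products, handle, {"release_date": release_date})
        if product_list:  # FILTER LENGTH(product_list) > 0
            out.append({"customer": c, "products": [n["vertex"] for n in product_list]})
    return out
```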

Not only is the data model more natural this way, but querying the data is simpler as well.

An additional plus: now that you have a graph at hand, you could start implementing recommendation features on top of it 😉
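One classic starting point for such a feature is "customers who wished for this also wished for ...", which amounts to a two-hop walk over the wishlist edges: from a product back to the customers who wished for it, then out to their other products. The Python sketch below works on hypothetical `_from`/`_to` edge dicts; the `also_wished` name and the ranking by simple co-occurrence count are assumptions, and a real implementation would more likely run as a traversal inside the database.

```python
from collections import Counter


def also_wished(edges, product, limit=3):
    """Rank other products by how many of `product`'s fans also wished for them."""
    # Hop 1: customers with a wishlist edge to the product.
    fans = {e["_from"] for e in edges if e["_to"] == product}
    # Hop 2: the other products those customers wished for, counted.
    counts = Counter(
        e["_to"] for e in edges if e["_from"] in fans and e["_to"] != product
    )
    return [p for p, _ in counts.most_common(limit)]


edges = [
    {"_from": "customers/1", "_to": "products/1"},
    {"_from": "customers/1", "_to": "products/2"},
    {"_from": "customers/2", "_to": "products/1"},
    {"_from": "customers/2", "_to": "products/2"},
    {"_from": "customers/2", "_to": "products/3"},
    {"_from": "customers/3", "_to": "products/3"},
]
print(also_wished(edges, "products/1"))  # ['products/2', 'products/3']
```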

Frank Celler


Frank is both an entrepreneur and a backend developer and has been developing mostly in-memory databases for two decades. He is the CTO and co-founder of ArangoDB. Try to challenge Frank by asking him questions on C, C++ and MRuby. Besides that, Frank organizes Cologne’s NoSQL group and is an active member of the NoSQL community.

5 Comments

  1. Jafar on November 7, 2014 at 5:04 pm

    this is great. please make more just like it.

  2. Bryan Livingston on November 8, 2014 at 3:59 am

    Who actually chooses to use all caps keywords in this day and age? It's a relic of the 60s and is considered very bad taste these days.

    • JPatrick Davenport on June 2, 2015 at 5:20 pm

      I presume you are referring to the AQL keywords. I’ve never seen anyone say such a practice is in bad taste. In fact, I’ve seen many people think it’s good since it helps provide quick visual cues about which elements are language-specific versus domain-specific.

      • jsteemann on June 2, 2015 at 6:36 pm

        And the good news is that if anyone thinks upper-case keywords are bad practice, they can use the lower-case variants, too. In AQL, `LET foo = 1 RETURN foo` is equivalent to `let foo = 1 return foo`.

  3. 廖师虎 on November 12, 2014 at 2:47 am

    Good example! Thanks!
