FAQ

What is ArangoDB and what kind of applications is it designed for?

ArangoDB is a multi-model mostly-memory database with a flexible data model for documents and graphs. It is designed as a “general purpose database”, offering all the features you typically need for modern web applications.

ArangoDB is supposed to grow with the application: the project may start as a simple single-server prototype, nothing you couldn’t do equally well with a relational database. After some time, some geo-location features are needed and a shopping cart requires transactions. ArangoDB’s graph data model is useful for the recommendation system. The smartphone app needs a lean API to the back-end, and this is where Foxx, ArangoDB’s integrated JavaScript application framework, comes into play.
The overall idea is: “We want to prevent a deadlock where the team is forced to switch the technology in the middle of the project because it doesn’t meet the requirements any longer.”

How does ArangoDB differ from other NoSQL databases like MongoDB, CouchDB and Neo4j?

ArangoDB’s feature scope is driven by the idea of giving developers everything they need to master typical tasks in a web application, in a way that is both convenient and technically sophisticated.

From our point of view, it is the combination of features and product quality that sets ArangoDB apart: ArangoDB handles not only documents but also graphs.

ArangoDB is extensible via JavaScript. ArangoDB ships with “Foxx”, an integrated application framework that is ideal for lean back-ends and single-page JavaScript applications (SPA).

Multi-collection transactions are useful not only for online banking and e-commerce; they become crucial in any web app in a distributed architecture. Here again, we offer developers a choice. If transactions are needed, developers can use them. If, on the other hand, the problem requires higher performance and less transaction safety, developers are free to ignore multi-collection transactions and use the standard single-document transactions implemented by most NoSQL databases.

Another unique feature is ArangoDB’s query language AQL, which makes querying powerful and convenient. AQL enables you to describe complex filter conditions and joins in a readable format, much in the same way as SQL.
For simple queries, we offer a simple query-by-example interface and specialized low-level APIs.

Is ArangoDB production ready?

Starting with version 1.0 (spring 2012), ArangoDB has been ready for use in production. It is fully tested and documented.

For which use cases is ArangoDB not the perfect choice?

Though ArangoDB takes a universal approach, there are edge cases where we don’t recommend it. In particular, ArangoDB doesn’t compete with massively distributed systems like Cassandra, with thousands of nodes and many terabytes of data.

What license does ArangoDB have?

ArangoDB is published under the Apache 2.0 license. Essentially, this means that you can use it free of charge for both non-commercial and commercial purposes.

What languages can I use to work with ArangoDB?

For the list of programming-language-specific client libraries, have a look at the Drivers page.
Your language of choice is not supported? Use the HTTP API.

We are always happy about new implementations, so if you decide to write something, please let us know and contribute!

Why Foxx and not Node?

Node.js is a single-threaded environment, while ArangoDB is a multi-threaded environment. Foxx was created to run in a multi-threaded environment. Integrating Node.js would therefore be difficult.

If we used Node instead of Foxx, we could not scale alongside ArangoDB, because running multiple instances would create independent databases. We could also put all of the instances into a cluster, but this would create a lot of data exchange and we would lose the advantages of ‘running inside the database’. In Foxx you can configure the number of Foxx threads on each of the ArangoDB coordinators. To add more capacity, you can add a machine and configure a coordinator on it with the required number of threads.

Also, Foxx is more of an extension system than an application server, and it is both possible and reasonable to run Foxx behind Node.js. Foxx does not target all use cases of Node.js.

Which data models does ArangoDB support?

You can model your data in several ways:

  • in key/value pairs
  • as collections of documents
  • as graphs with nodes, edges, and properties for both

ArangoDB as a key-value store

Have you ever used Memcached? Then you are already familiar with the concept of a key-value store: a unique key is assigned to a value, which in its simplest form is a string (or a string with some structure, like a JSON document … you get the idea).

ArangoDB as a document store

In a “document store” the data is encapsulated in text documents. You can roughly compare a document to a row in a table in a relational database though documents are not as rigid. The documents are not required to follow the same schema, e.g. your first document may have the attributes “name” and “hobbies” while your second document only has the “name” attribute. Nevertheless you can easily query all documents for “hobbies”.

Note from the name-hobbies example that you do not have to follow the rules of normalization: in a relational database you would probably create a table “hobbies” and another one for “users”, whereas in a document store you would most likely store both in the same document.

Being schema free does not mean chaos! You can organize your documents into collections: a collection consists of a number of documents, e.g. all documents with user data.

In ArangoDB the documents are encoded in JSON. You can also save binary data base64-encoded. Unlike many other NoSQL databases, ArangoDB allows you to query data across collections (similar to “joins” in SQL).

ArangoDB as a graph database

A graph database uses graph structures with nodes, edges and properties to represent and store data. This means that you can easily model even complex relationships between single documents.

Let’s say you want to implement a feature “people who like product X also like product Y”. You could do that in ArangoDB by creating collections for “people” and “products”, and an additional edges collection to store the relationships between them. Besides linking documents from the other collections, the edge documents can have any properties you like. What you’ll end up with is a so-called property graph, which you can then query.
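
To make the “people who like product X also like product Y” idea more concrete, here is a rough Python sketch of the data layout. It is not ArangoDB code and does not talk to a server: the collections are plain Python structures, all sample data and the "rating" property are made up, and the helper also_liked is a hypothetical name chosen for this illustration. The "_from" and "_to" attributes mirror how edge documents reference the documents they connect.

```python
from collections import Counter

# Edge documents: "_from" and "_to" reference the linked documents,
# "rating" is an arbitrary extra property. All data here is made up.
likes = [
    {"_from": "people/alice", "_to": "products/x", "rating": 5},
    {"_from": "people/alice", "_to": "products/y", "rating": 4},
    {"_from": "people/bob", "_to": "products/x", "rating": 3},
    {"_from": "people/bob", "_to": "products/y", "rating": 5},
    {"_from": "people/bob", "_to": "products/z", "rating": 2},
    {"_from": "people/carol", "_to": "products/z", "rating": 4},
]


def also_liked(product_id, edges):
    """Rank the other products liked by people who like `product_id`."""
    # Step 1: follow the edges backwards to find everyone who likes the product.
    fans = {edge["_from"] for edge in edges if edge["_to"] == product_id}
    # Step 2: follow the edges forwards again and count the other products.
    counts = Counter(
        edge["_to"]
        for edge in edges
        if edge["_from"] in fans and edge["_to"] != product_id
    )
    return counts.most_common()


recommendations = also_liked("products/x", likes)
print(recommendations)
```

In a real deployment this two-step walk would be expressed as a graph query on the server rather than in client code.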

For querying graphs, ArangoDB offers a few possibilities:

  • traversals in JavaScript, running on the server
  • using graph functions from inside ArangoDB’s query language, AQL
  • using the low-level graph REST APIs to access node- or edge-specific data, or modify them
  • from Java: using Gremlin, as there is a Blueprints implementation for ArangoDB

What tools can be used to access data?

You can access data in ArangoDB

  • using the general HTTP REST API via curl/wget, or your browser
  • via the ArangoDB shell (“arangosh”)
  • using a programming-language-specific client library

ArangoDB comes with a web-based user interface and its own HTTP server. Open http://localhost:8529/_admin in your browser and – voilà – there it is.

The ArangoDB shell can be invoked after the server has been started by running the arangosh binary.

Without arguments, it will try to connect to the server on port 8529 on localhost.

For the list of programming-language-specific client libraries, check out the Drivers page.

Does ArangoDB support SQL?

ArangoDB does not support SQL. SQL is not well-suited to cover the different data models in ArangoDB.
For example, think of nested list structures inside a document, graph traversals etc. There is no way to query such structures in standard SQL, and deviating from standard SQL does not make much sense.

ArangoDB brings its own declarative language called AQL (ArangoDB Query Language). If you are familiar with SQL, you will probably feel at home with AQL quickly. For syntax examples see the SQL to AQL comparison.

How do you query ArangoDB?

ArangoDB offers various options for getting data out of the database. It has a REST interface for CRUD operations and also allows “querying by example”. “Querying by example” means that you create a JSON document with the attributes you are looking for. The database returns all documents which look like the “example document”.
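
The matching rule behind “querying by example” can be sketched in a few lines of Python. This is only an illustration of the idea, run against an in-memory list rather than a server; the function names matches and by_example and the sample users are assumptions made for this sketch.

```python
def matches(document, example):
    """True if every attribute of the example appears in the document with the same value."""
    return all(
        name in document and document[name] == value
        for name, value in example.items()
    )


def by_example(collection, example):
    """Return all documents that 'look like' the example document."""
    return [document for document in collection if matches(document, example)]


# Made-up sample data; the documents do not share a schema.
users = [
    {"name": "John", "city": "Cologne", "hobbies": ["chess"]},
    {"name": "Jane", "city": "Cologne"},
    {"name": "Jim", "city": "Berlin"},
]

in_cologne = by_example(users, {"city": "Cologne"})
print([user["name"] for user in in_cologne])
```

Attributes that are not mentioned in the example are simply ignored, which is why documents with different structures can match the same example.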

Expressing complex queries as JSON documents can become a tedious task—and it’s almost impossible to support joins following this approach. We wanted a convenient and easy-to-learn way to execute even complex queries, not involving any programming as would be necessary in a map/reduce-based approach.

As ArangoDB supports multiple data models including graphs, it was neither sufficient to stick to SQL nor to simply implement UNQL (another query language idea that was around when ArangoDB came out). We ended up with the “ArangoDB query language” (AQL), a declarative language similar to SQL and JSONiq. AQL supports joins, graph queries, list iteration, results filtering, results projection, sorting, variables, grouping and aggregation.

Of course, ArangoDB also offers drivers for all major programming languages. The drivers wrap the mentioned query options following the paradigm of the programming language and/or frameworks like Ruby on Rails.

How fast is ArangoDB?

We did some performance tests and published the results in our blog. The test setup is on GitHub so that everyone can reproduce and verify our benchmark results. Use ArangoDB Performance as a starting point.

But to quote Jan Lehnardt from CouchDB: “NoSQL is not about performance, scaling, dropping ACID or hating SQL – it is about choice.” As NoSQL databases are somewhat different, it does not help very much to compare the databases by their throughput and choose the one which is faster. Instead, users should think carefully about their overall requirements and weigh the different aspects. Massively scalable key/value stores or memory-only systems can achieve much higher benchmark results. But our aim is to provide a much more convenient system for a broader range of use cases, one which is fast enough for almost all cases.

What are the server requirements for ArangoDB?

ArangoDB runs on Linux, OS X and Microsoft Windows.
It runs on 32-bit and 64-bit systems, though using a 32-bit system will limit you to approximately 2 to 3 GB of data with ArangoDB.
We thus strongly recommend running ArangoDB on a 64-bit system with SSDs.

ArangoDB is a “mostly memory” database, which means that it appreciates RAM very much and performs best when it is not forced to swap data to the hard disk.

So how much RAM do you need? This depends on the size and structure of your data: your application will access one or many collections (think of collections as denormalized tables for the time being). Once you open a collection, the indexes for this collection are created in RAM and the data is loaded into RAM using memory-mapped files. If your collections are bigger than your RAM, the operating system will be forced to swap data in and out of the swap space.

What language is ArangoDB written in?

ArangoDB is mainly written in C and C++. It also uses Google’s V8 engine to run JavaScript code on the server-side.

The server actions and many of the high level functionalities are written in JavaScript.

Does ArangoDB support transactions?

ArangoDB provides support for user-definable transactions. Transactions in ArangoDB are atomic, consistent, isolated, and durable (ACID).

These ACID properties provide the following guarantees:

  • The atomicity principle makes transactions either complete in their entirety or have no effect at all.
  • The consistency principle ensures that no constraints or other invariants will be violated during or after any transaction.
  • The isolation property will hide the modifications of a transaction from other transactions until the transaction commits.
  • Finally, the durability property makes sure that operations from transactions that have committed will be made persistent. The amount of transaction durability is configurable in ArangoDB, as is the durability on collection level.

ArangoDB transactions are different from transactions in SQL. In SQL, transactions are started with explicit BEGIN or START TRANSACTION commands.
Following any series of data retrieval or modification operations, an SQL transaction is finished with a COMMIT command, or rolled back with a ROLLBACK command. There may be client/server communication between the start and the commit/rollback of an SQL transaction.

In ArangoDB, a transaction is always a server-side operation, and is executed on the server in one go, without any client interaction. All operations to be executed inside a transaction need to be known by the server when the transaction is started. This is achieved by the user shipping the transaction declaration to the server (or having it stored there already if the transaction is going to be run many times) and executing it there.

Transactions in ArangoDB can span multiple operations, even on multiple collections.
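
The “shipped declaration, executed in one go” model can be pictured with the following Python toy. It is a sketch under stated assumptions, not ArangoDB’s implementation: the database is a plain dictionary, the declaration keys "collections" and "action" and the function execute_transaction are names invented for this illustration, and isolation from concurrent transactions is not modelled at all. It only shows atomicity across several collections: either all changes of the action become visible, or none do.

```python
import copy


def execute_transaction(database, declaration):
    """Run the declared action on private copies and commit only on success."""
    # Work on copies of the declared collections so a failure leaves the originals untouched.
    working = {
        name: copy.deepcopy(database[name]) for name in declaration["collections"]
    }
    result = declaration["action"](working)  # an exception here aborts the transaction
    database.update(working)                 # commit: publish all changes together
    return result


def transfer(collections):
    collections["accounts"]["alice"] -= 30
    collections["accounts"]["bob"] += 30
    collections["audit"].append("alice->bob:30")
    return "ok"


def overdraw(collections):
    collections["accounts"]["bob"] -= 500
    collections["audit"].append("bob overdraw")
    if collections["accounts"]["bob"] < 0:
        raise ValueError("insufficient funds")


database = {"accounts": {"alice": 100, "bob": 20}, "audit": []}
outcome = execute_transaction(
    database, {"collections": ["accounts", "audit"], "action": transfer}
)

try:
    execute_transaction(
        database, {"collections": ["accounts", "audit"], "action": overdraw}
    )
except ValueError:
    pass  # rolled back: neither the balance nor the audit entry was committed
```

Because the whole action is known up front, there is no client round trip between the start and the end of the transaction.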

Read more about transactions in our documentation.

What durability guarantees does ArangoDB offer?

ArangoDB stores all data in collections. Collections consist of memory-mapped data-files, so all data will be saved to disk.
How data is synchronized to disk is configurable, though: eventually or immediately.

The choice between eventual or immediate synchronization can be made on a per-collection level, and also on a per operation level:

  • by default, ArangoDB uses the eventual way of synchronization: it will accept any data-modifying operation and return to the caller when the operating system confirms the write operation was successful. That does not guarantee immediate disk synchronization, though ArangoDB permanently synchronizes data to disk in a background thread. In this setting, there is the possibility of a data loss between the disk write operation and the asynchronous synchronization.
  • optionally, collections and individual write operations can be configured to be synchronized immediately. They will only return to the caller after a successful disk synchronization. In this setting, there is full durability (at least the operating system confirmed the data was synchronized to disk – as usual there may be subtleties with filesystem and operating system configuration which are outside of ArangoDB’s reach).

From the durability point of view, immediate synchronization is of course better, but it means performing an extra system call for each operation. On systems with slow sync/msync, this might be a big performance penalty. Thus ArangoDB leaves the choice to the user. There might also be collections of different importance: for example, a collection that works as a cache, with data that can be recalculated when needed, can be configured to have lower durability than collections with more important data. In the end, it’s all up to the user to decide.
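
The difference between the two modes comes down to whether the caller waits for an explicit sync call. The Python sketch below illustrates this with an ordinary file rather than a memory-mapped data-file; append_record and its wait_for_sync flag are hypothetical names for this example, and the background synchronization thread mentioned above is not modelled (in the “eventual” case the sketch simply leaves the data in the operating system’s hands).

```python
import os
import tempfile


def append_record(path, record, wait_for_sync=False):
    """Append one record; `wait_for_sync` is a hypothetical flag for this sketch."""
    with open(path, "ab") as datafile:
        datafile.write(record + b"\n")
        datafile.flush()                 # hand the bytes over to the operating system
        if wait_for_sync:
            os.fsync(datafile.fileno())  # extra system call: wait until the OS reports them on disk


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "collection.db")
    append_record(path, b'{"cached": true}')                  # eventual: returns right away
    append_record(path, b'{"order": 1}', wait_for_sync=True)  # immediate: returns after the sync
    with open(path, "rb") as datafile:
        stored = datafile.read().splitlines()

print(stored)
```

Both records end up in the file; what differs is how much could be lost if the machine failed right after the call returned.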

How do shapes work in ArangoDB?

Documents that have similar structure (i.e., that have the same attribute names and attribute types) can share their structural information. The structure (called “shape”) is saved just once, and multiple documents can re-use it by storing just a pointer to their “shape”.
In practice, documents in a collection are likely to be homogeneous, and sharing the structure data between multiple documents can greatly reduce disk storage space and memory usage for documents.
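
The following Python sketch shows the general idea of sharing structural information. It is a simplified model, not ArangoDB’s actual storage format: the names shape_of and ShapedCollection are invented for this example, only top-level attributes are considered, and Python type names stand in for attribute types.

```python
def shape_of(document):
    """The structure of a document: its attribute names and attribute types."""
    return tuple(sorted((name, type(value).__name__) for name, value in document.items()))


class ShapedCollection:
    def __init__(self):
        self.shapes = []     # every distinct shape is stored exactly once
        self.shape_ids = {}  # shape -> position in self.shapes
        self.rows = []       # per document: (shape id, attribute values only)

    def insert(self, document):
        shape = shape_of(document)
        if shape not in self.shape_ids:
            self.shape_ids[shape] = len(self.shapes)
            self.shapes.append(shape)
        # Store a reference to the shape plus the bare values, not the attribute names.
        values = [document[name] for name, _ in shape]
        self.rows.append((self.shape_ids[shape], values))

    def get(self, index):
        shape_id, values = self.rows[index]
        names = [name for name, _ in self.shapes[shape_id]]
        return dict(zip(names, values))


users = ShapedCollection()
users.insert({"name": "John", "age": 42})
users.insert({"name": "Jane", "age": 37})         # same shape as the first document
users.insert({"name": "Jim", "hobbies": "chess"})  # a new shape
print(len(users.shapes), users.get(1))
```

With many structurally identical documents, the attribute names are kept once per shape instead of once per document.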

Cursors in ArangoDB vs. cursors in MongoDB

Both ArangoDB and MongoDB return data as a cursor after a successful find operation. Yet there is a significant difference between the two databases: Let us assume that you first fetch a large result set from a collection and remove some of the data from the collection afterwards, before you have fully iterated over the cursor.

ArangoDB will fill the cursor with the result of your query and won’t touch the result of it even if you removed the data from the collection in the meantime. MongoDB seems to fetch data into the cursor incrementally so the result set is affected by the change in the collection. Both approaches have their advantages and disadvantages – just make sure that you know how it works.
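
The two behaviours can be mimicked with a small Python model. This is a toy built on an in-memory list, not a description of how either database implements cursors; snapshot_cursor and incremental_cursor are hypothetical names for this illustration.

```python
def snapshot_cursor(collection):
    """Copy the full result up front; later changes to the collection are not seen."""
    return iter(list(collection))


def incremental_cursor(collection):
    """Fetch one item at a time from the live collection."""
    position = 0
    while position < len(collection):
        yield collection[position]
        position += 1


data = [1, 2, 3, 4]
snapshot = snapshot_cursor(data)
live = incremental_cursor(data)

first_snapshot = next(snapshot)  # both cursors have delivered the first item
first_live = next(live)

del data[2:]                     # remove items 3 and 4 while iterating

rest_snapshot = list(snapshot)   # still yields the removed items
rest_live = list(live)           # only yields what is left in the collection
print(rest_snapshot, rest_live)
```

The snapshot costs memory for the complete result but gives a stable view; the incremental variant is cheaper up front but reflects concurrent changes.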

How does authentication work in ArangoDB?

Activating authentication for the server

The ArangoDB server can be configured to require authentication, or to not require it.

What mode you use the server in is up to you:

  • Running ArangoDB without authentication will allow everyone access to all collections and documents in the database, plus all API functions. This is convenient for development, but would be a security risk in production.
  • To run ArangoDB in production, you would enable the authentication feature of the server. The authentication feature will make the server require authentication for every incoming request. Only requests of authenticated users will then be allowed, and all other requests will be answered with an HTTP 401 error (Unauthorized) by the server.

The server authentication can be activated and deactivated using the option “server.disable-authentication”. The option can be passed to arangod on the command-line or be put in the server’s configuration file (arangod.conf).

For example, to start the server with authentication turned on, set the “server.disable-authentication” option to false.

Managing users

By default, ArangoDB comes with a user “root” that has a password of “” (empty string). Before using ArangoDB in production you might want to either remove this user, change its password, or deactivate it.

You can do so in arangosh, the command-line shell that comes with ArangoDB. Please note that you need to use the arangosh binary and not the browser-based admin interface.

In arangosh, you can issue the following commands to manage users:

If you want to remove the root user, the commands would be:

All user-related commands are listed in detail in the manual.

Using authentication with arangosh and arangoimp

arangosh will by default connect to the server using the “root” user. To use a different user with arangosh, use the --server.username option for arangosh, e.g.:

You will then be prompted to enter the user’s password. The same option is available for arangoimp, the import tool.

You can also specify the user password directly on the command-line, though this might also be a security risk (the password might be stored in the shell history file!).

How can I import data from files into ArangoDB?

The most convenient method to import a lot of data into ArangoDB is to use the arangoimp command-line tool. arangoimp allows you to import data records from a file into an existing database collection.

Let’s assume you want to import user records into an existing collection named “users” on the server.

Importing JSON-encoded data

Let’s further assume the import data at hand is encoded in JSON. We’ll be using these example user records to import:

To import these records, all you need to do is to put them into a file (with one line for each record to import) and run the following command:

This will transfer the data to the server, import the records, and print a status summary.

As the import file already contains the data in JSON format, attribute names and data types are fully preserved. As can be seen in the example data, there is no need for all data records to have the same attribute names or types. Records can be heterogeneous.

Importing CSV data

arangoimp also offers the possibility to import data from CSV files. This comes in handy when the data at hand is in CSV format already and you don’t want to spend time converting it to JSON for the import.

To import data from a CSV file, make sure your file contains the attribute names in the first row. All the following lines in the file will be interpreted as data records and will be imported.

The CSV import requires the data to have a homogeneous structure. All records must have exactly the same number of columns as there are headers.

The cell values can have different data types though. If a cell does not have any value, it can be left empty in the file. These values will not be imported, so the attributes will not “be there” in the document created. Values enclosed in quotes will be imported as strings, so to import numeric values, boolean values or the null value, don’t enclose the value in quotes in your file.

We’ll be using the following data for the CSV import:
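
The rules above can be sketched in Python as follows. This is an illustration of the described behaviour, not arangoimp’s implementation: the function names are invented, the sample data is made up, the parser is deliberately minimal (comma separator, double quotes, "" as an escaped quote), and treating any other unquoted text as a string is an assumption of this sketch.

```python
import json


def split_csv_line(line):
    """Split one CSV line into (text, was_quoted) pairs (simplified parser)."""
    fields, buffer, was_quoted, in_quotes = [], [], False, False
    i = 0
    while i < len(line):
        char = line[i]
        if in_quotes:
            if char == '"' and line[i + 1:i + 2] == '"':
                buffer.append('"')  # "" inside quotes is an escaped quote
                i += 1
            elif char == '"':
                in_quotes = False
            else:
                buffer.append(char)
        elif char == '"':
            in_quotes, was_quoted = True, True
        elif char == ",":
            fields.append(("".join(buffer), was_quoted))
            buffer, was_quoted = [], False
        else:
            buffer.append(char)
        i += 1
    fields.append(("".join(buffer), was_quoted))
    return fields


def convert(text, was_quoted):
    if was_quoted:
        return text  # quoted values are imported as strings
    try:
        return json.loads(text)  # unquoted numbers, true, false, null
    except ValueError:
        return text  # assumption: other unquoted text stays a string


def csv_to_documents(lines):
    header = [name for name, _ in split_csv_line(lines[0])]
    documents = []
    for line in lines[1:]:
        cells = split_csv_line(line)
        if len(cells) != len(header):
            raise ValueError("expected %d columns, got %d" % (len(header), len(cells)))
        document = {}
        for name, (text, was_quoted) in zip(header, cells):
            if text == "" and not was_quoted:
                continue  # empty cell: the attribute is omitted
            document[name] = convert(text, was_quoted)
        documents.append(document)
    return documents


documents = csv_to_documents([
    "name,age,active,nickname",
    '"John",42,true,',
    '"42",,false,null',
])
print(documents)
```

Note how the quoted "42" in the second record stays a string, while the unquoted 42 in the first record becomes a number.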

The command line to execute the import then is:

Running the import programmatically

arangoimp uses ArangoDB’s HTTP API to perform the actual import, and so can you.

The HTTP API provides the import action at
/_api/import

You need to send an HTTP POST to this URL and put the import data into the request body. The target collection name needs to be specified in the “collection” URL parameter.
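
As a rough sketch, such a request could be assembled in Python like this. The code only builds the request object and does not send it. The endpoint path and the “collection” URL parameter come from the description above; the extra type=documents parameter and the one-JSON-document-per-line body are assumptions about the HTTP API that should be checked against its documentation, and build_import_request and the base URL are names chosen for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_import_request(base_url, collection, documents):
    """Build (but do not send) an HTTP POST request for the import action."""
    # "collection" names the target collection; "type" is an assumed parameter.
    query = urlencode({"collection": collection, "type": "documents"})
    # Assumed body format: one JSON-encoded document per line.
    body = "\n".join(json.dumps(document) for document in documents)
    return Request(
        base_url + "/_api/import?" + query,
        data=body.encode("utf-8"),
        method="POST",
    )


request = build_import_request(
    "http://localhost:8529",
    "users",
    [{"name": "John"}, {"name": "Jane", "age": 31}],
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running server; authentication, if enabled, would have to be added as well.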

How can I contribute?

As in all open source projects, ArangoDB thrives on the contributions of the user community. You can help make ArangoDB better in a couple of ways:

  • install and use it and report bugs and difficulties on GitHub
  • send us patches using GitHub’s excellent social coding capabilities
  • join the Google group to discuss implementation details and future versions of ArangoDB
  • contribute API client libraries and fancy add-ons

And: We are happy to support students and graduate students interested in writing their bachelor/master thesis or dissertation in the context of alternative databases.

ArangoDB stores indexes in memory only

A database in ArangoDB can be larger than the available main memory.
Although ArangoDB stores all data on disk for durability reasons, it is a “mostly main memory database”. This means that the working set (the set of pages that are frequently accessed) should fit into main memory. It’s left to the operating system to determine the working set and to transfer pages between main memory and secondary storage. The data that are currently not needed are kept only on secondary storage.

This is in principle also true for indexes: a part of the index (the working set) could be in main memory and the remainder could be on secondary storage if not used frequently.

Unused index data may be swapped to the operating system swap space. One consequence of this is that this information cannot be used when the server goes down and is restarted. When the server is restarted all the indexes need to be rebuilt from the raw document data and this may take a while.

Indexes are typically much smaller than the raw document data. So an index typically puts much less pressure on the main memory than even just the raw document data in the working set. It may also be beneficial if the operating system detects that index data are not used much or at all, and may use the costly physical memory for other, more relevant data instead.

How much space does ArangoDB need in the file system?

To ensure data durability, ArangoDB stores all data on disk.
Any data-modification operation will be stored in ArangoDB’s write-ahead log first. Operations that made it into the write-ahead log and have been committed will eventually be moved from there into collection data-files. During this garbage collection process, only those operations will be moved that are still relevant. Operations that have become obsolete at the point of garbage collection will be ignored.

Data in collections are stored more efficiently than in the write-ahead logs. ArangoDB will inspect documents and try to find identical structures in them. Each unique structure is only saved once to save storage space. This also means that document attribute names are not stored over and over again with each document.

After garbage collection is finished, data are contained in the collection data-files and the collected write-ahead log-file will be removed.

Both the write-ahead log and the collection log-files are sequences of memory-mapped files. Files are allocated in blocks of 32 MB by default. ArangoDB will try to keep a few spare log-files by default, so even an empty ArangoDB will have some log-files allocated.

Log-file sizes and the number of spare log-files to keep are configurable parameters. Log-file sizes can also be adjusted on a per-collection level.

The log-files will be mapped into the server’s virtual memory. It is up to the operating system to determine how much of the log-files’ memory-mapped regions actually remains in physical memory. Normally, the operating system will ensure that frequently accessed regions of the files reside in physical memory, too.

Please note that ArangoDB also starts a few V8 threads (V8 is the JavaScript engine in ArangoDB) that also use virtual memory.

In an empty ArangoDB, V8 actually accounts for most of the virtual memory usage. For example, on one 64-bit computer, the V8 threads in total consumed about 4 GB of virtual memory. On a 32-bit computer, the V8 threads used about 600-700 MB in an empty ArangoDB.
This is in contrast to the ArangoDB process only using about 114 MB of physical RAM and about 132 MB of disk space, mostly for the pre-allocated and still empty write-ahead log-files. Overall, the reported virtual memory usage may be much higher than the usage of “real” physical resources.
