Migration from ArangoDB 2.8 to 3.0
I want to use ArangoDB 3.0 from now on but I still have data in ArangoDB 2.8. I need to migrate my data. I am running an ArangoDB 3.0 cluster (and possibly a cluster with ArangoDB 2.8 as well).
The internal data format changed completely from ArangoDB 2.8 to 3.0, so you have to dump all data using arangodump and then restore it to the new ArangoDB instance using arangorestore.
General instructions for this procedure can be found in the manual. Here, we cover some additional details about the cluster case.
Dumping the data in ArangoDB 2.8
Basically, dumping the data works with the following command (use arangodump from your ArangoDB 2.8 distribution!):
arangodump --server.endpoint tcp://localhost:8530 --output-directory dump
or a variation of it; for details see the above-mentioned manual page and arangodump --help.
If your ArangoDB 2.8 instance is a cluster, simply use one of the coordinator endpoints as the above --server.endpoint.
Restoring the data in ArangoDB 3.0
The output consists of JSON files in the output directory, two for each
collection, one for the structure and one for the data. The data format
is 100% compatible with ArangoDB 3.0, except that ArangoDB 3.0 has
an additional option in the structure files for synchronous replication,
namely the attribute replicationFactor, which specifies
how many copies of the data for each shard are kept in the cluster.
Therefore, you can simply use this command (use the arangorestore from your ArangoDB 3.0 distribution!):
arangorestore --server.endpoint tcp://localhost:8530 --input-directory dump
to import your data into your new ArangoDB 3.0 instance. See arangorestore --help for details on the available command line options. If your ArangoDB 3.0 instance is a cluster, then simply use one of the coordinators as --server.endpoint.
That is it, your data is migrated.
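The two steps above can be chained in a small script. The following is a minimal sketch, not an official procedure: the installation paths, the function name migrate and the endpoints in the usage line are assumptions chosen for illustration. Only the --server.endpoint, --output-directory and --input-directory options come from the commands shown above.

```shell
#!/bin/sh
# Sketch only. Both paths are assumptions; point them at your own installs.
# arangodump must come from the 2.8 distribution, arangorestore from 3.0.
ARANGODUMP_28="${ARANGODUMP_28:-/opt/arangodb-2.8/bin/arangodump}"
ARANGORESTORE_30="${ARANGORESTORE_30:-/opt/arangodb-3.0/bin/arangorestore}"

# migrate SOURCE_ENDPOINT TARGET_ENDPOINT DUMP_DIR
# Dumps from the 2.8 instance, then restores into the 3.0 instance.
# For a cluster, pass the endpoint of one of its coordinators.
migrate() {
    src_endpoint="$1"
    dst_endpoint="$2"
    dump_dir="$3"

    # Stop if the dump fails, so that a partial dump is never restored.
    "$ARANGODUMP_28" --server.endpoint "$src_endpoint" \
        --output-directory "$dump_dir" || return 1

    "$ARANGORESTORE_30" --server.endpoint "$dst_endpoint" \
        --input-directory "$dump_dir"
}
```

A call could then look like this, with both host names being placeholders: migrate tcp://old-host:8530 tcp://new-host:8530 dump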
Controlling the number of shards and the replication factor
This procedure works for all four combinations of single server and cluster for source and destination respectively. If the target is a single server, everything simply works.
So it remains to explain how one controls the number of shards and the replication factor if the destination is a cluster.
If the source was a cluster, arangorestore will use the same number of shards as before, unless you tell it otherwise. Since ArangoDB 2.8 does not have synchronous replication, it does not produce dumps with the replicationFactor attribute, and so arangorestore will use replication factor 1 for all collections. If the source was a single server, the same will happen; additionally, arangorestore will always create collections with just a single shard.
There are essentially 3 ways to change this behavior:
- The first is to create the collections explicitly on the ArangoDB 3.0 cluster, and then set the --create-collection false flag. In this case you can control the number of shards and the replication factor for each collection individually when you create them.
- The second is to use arangorestore's options --default-number-of-shards and --default-replication-factor (these options were introduced in Version 3.0.2) respectively to specify default values, which are taken if the dump files do not specify numbers. This means that all such restored collections will have the same number of shards and replication factor.
- If you need more control you can simply edit the structure files in the dump. They are simply JSON files; you can even first use a JSON pretty printer to make editing easier. For the replication factor you simply have to add a replicationFactor attribute to the parameters subobject with a numerical value. For the number of shards, locate the shards subattribute of the parameters attribute and edit it such that it has the right number of attributes. The actual names of the attributes as well as their values do not matter. Alternatively, add a numberOfShards attribute to the parameters subobject; this will override the shards attribute (this possibility was introduced in Version 3.0.2).
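The three approaches can be sketched as shell helpers. This is an illustration under stated assumptions, not a definitive recipe: the function names, the default endpoint and all numbers are made up, the first helper assumes a document collection and uses arangosh to create it, and the third helper assumes python3 is available for editing the JSON. The --default-number-of-shards option and the numberOfShards attribute require ArangoDB 3.0.2 or later.

```shell
#!/bin/sh
# Sketch only: tool names, endpoint and all numbers are placeholder assumptions.
ARANGOSH="${ARANGOSH:-arangosh}"
ARANGORESTORE="${ARANGORESTORE:-arangorestore}"
ENDPOINT="${ENDPOINT:-tcp://localhost:8530}"

# Way 1: create a document collection by hand, then restore only its data.
# (An edge collection would need db._createEdgeCollection instead.)
restore_into_precreated() {
    collection="$1"; shards="$2"; replication="$3"; dump_dir="$4"
    "$ARANGOSH" --server.endpoint "$ENDPOINT" --javascript.execute-string \
        "db._create('$collection', {numberOfShards: $shards, replicationFactor: $replication});" \
        || return 1
    "$ARANGORESTORE" --server.endpoint "$ENDPOINT" --input-directory "$dump_dir" \
        --collection "$collection" --create-collection false
}

# Way 2: one default for every collection whose structure file has no numbers.
restore_with_defaults() {
    shards="$1"; replication="$2"; dump_dir="$3"
    "$ARANGORESTORE" --server.endpoint "$ENDPOINT" --input-directory "$dump_dir" \
        --default-number-of-shards "$shards" \
        --default-replication-factor "$replication"
}

# Way 3: write numberOfShards and replicationFactor into the "parameters"
# subobject of one structure file, and pretty-print the file on the way.
set_sharding() {
    structure_file="$1"; shards="$2"; replication="$3"
    python3 - "$structure_file" "$shards" "$replication" <<'PY'
import json
import sys

path, shards, replication = sys.argv[1], int(sys.argv[2]), int(sys.argv[3])
with open(path) as handle:
    structure = json.load(handle)
structure["parameters"]["numberOfShards"] = shards
structure["parameters"]["replicationFactor"] = replication
with open(path, "w") as handle:
    json.dump(structure, handle, indent=2)
PY
}
```

For example, set_sharding dump/users.structure.json 4 2 followed by a plain arangorestore run would restore a hypothetical users collection with four shards and two copies of each shard, while leaving all other collections as the dump describes them.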
Note that you can remove individual collections from your dump by deleting their pair of structure and data files in the dump directory. In this way you can restore your data in several steps or even parallelize the restore operation by running multiple arangorestore processes concurrently on different dump directories. You should consider using different coordinators for the different arangorestore processes in this case.
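A parallel restore along these lines could be sketched as follows. The helper names and the .partN directory naming are assumptions made for this example, and the script assumes that each collection is stored as a pair of files named NAME.structure.json and NAME.data.json. The original dump directory is copied, not modified.

```shell
#!/bin/sh
# Sketch only: helper names and the ".partN" naming are assumptions.
ARANGORESTORE="${ARANGORESTORE:-arangorestore}"

# split_dump DUMP_DIR PARTS
# Copies the file pairs of the collections round-robin into
# DUMP_DIR.part0, DUMP_DIR.part1, ... (pass DUMP_DIR without trailing slash).
split_dump() {
    src="$1"; parts="$2"; i=0
    for structure in "$src"/*.structure.json; do
        [ -e "$structure" ] || continue   # no collections in the dump
        name=$(basename "$structure" .structure.json)
        target="$src.part$((i % parts))"
        mkdir -p "$target"
        cp "$structure" "$target/"
        if [ -e "$src/$name.data.json" ]; then
            cp "$src/$name.data.json" "$target/"
        fi
        i=$((i + 1))
    done
}

# restore_parallel DUMP_DIR ENDPOINT...
# Starts one arangorestore per coordinator endpoint, each on its own part,
# and waits for all of them. Exit codes of the jobs are not collected here.
restore_parallel() {
    src="$1"; shift
    i=0
    for endpoint in "$@"; do
        "$ARANGORESTORE" --server.endpoint "$endpoint" \
            --input-directory "$src.part$i" &
        i=$((i + 1))
    done
    wait
}
```

With two coordinators, split_dump dump 2 followed by restore_parallel dump tcp://coord1:8530 tcp://coord2:8530 would run two restores side by side; both endpoints are placeholders. The number of parts should match the number of endpoints passed.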
All these possibilities together give you full control over the sharding layout of your data in the new ArangoDB 3.0 cluster.