SmartGraphs & Disjoint SmartGraphs
When the data set for a graph exceeds the limits of what you can host on a single instance of ArangoDB, you need to scale. However, sharding a graph through a cluster introduces new issues. When using standard graphs, traversals can involve many network hops between database servers. As edges carry the traversal onto different machines, performance worsens.
SmartGraphs and Disjoint SmartGraphs solve this issue by optimizing the distribution of data between the shards, reducing the number of edges that require network hops to other servers.
Scaling with Graphs
The Community Edition of ArangoDB can handle large datasets on a single instance, allowing you to scale vertically without issue. It can also scale horizontally to a cluster with all three data models. However, you may begin to encounter performance issues when, in scaling horizontally, you shard a graph across the cluster.
Picture a graph that handles a large dataset, such as you might find in an IoT, finance, communications, healthcare, or genomics application. The natural distribution of data involves a series of highly interconnected communities with many edges running between these communities.
Traversing graphs on this scale can take you through billions or even trillions of vertices. That amount of data is far too much to fit on a single machine, and whenever an edge takes you from one machine to another, the network connection becomes the bottleneck. If an edge on the second machine takes you back to the first or out to a third, it grows worse still. The more network hops the traversal requires, the greater the accumulated network latency, which is very expensive compared to in-memory computation. Eventually, performance degrades to the point where it is no longer suitable for your use case.
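To make the cost of network hops concrete, here is a minimal Python sketch. It is not ArangoDB code: the `placement` map, the vertex names, and the `count_network_hops` helper are hypothetical, and the sketch only counts how many steps of a traversal path cross from one server to another.

```python
from typing import Dict, List


def count_network_hops(path: List[str], placement: Dict[str, int]) -> int:
    """Count the steps in `path` whose two endpoints live on different servers."""
    return sum(
        1
        for src, dst in zip(path, path[1:])
        if placement[src] != placement[dst]
    )


# Hypothetical placement of five vertices on three servers (0, 1, 2).
placement = {"a": 0, "b": 1, "c": 0, "d": 2, "e": 2}

# A traversal that bounces between servers: a -> b -> c -> d -> e.
path = ["a", "b", "c", "d", "e"]

hops = count_network_hops(path, placement)
print(f"{hops} of {len(path) - 1} steps cross the network")
```

In this toy placement, three of the four steps leave the current server. Only the last step, from `d` to `e`, stays local.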
Scaling with SmartGraphs
Performance issues when traversing sharded graphs relate to network latency. The more network hops your traversal requires, the less benefit you get from horizontal scaling. ArangoDB Enterprise Edition provides SmartGraphs, which address the network latency of traversals by using the knowledge your application layer already has about the data.
Graphs know nothing of themselves, but your application knows a lot about the graph. Many datasets contain highly interconnected communities with few connections between them. A grouping by customer, by region, or by any other logic you apply to organize your graph at the application layer can in turn be used to shard the graph across the cluster.
SmartGraphs use this application-level knowledge to optimize how data is sharded across the cluster. The grouping can be a customer ID, a region, or any other logic that fits your main queries. With it, you can shard highly connected communities within your graph to specific instances.
By optimizing the distribution of data, SmartGraphs reduce the number of network hops traversals require. Internal tests show a 40-120x performance gain when traversing sharded graphs.
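The effect of sharding by an application-level grouping can be sketched in a few lines of Python. This is an illustration only and not how ArangoDB is implemented: the `region:name` key format, the CRC32-based shard functions, and the sample edges are all assumptions chosen for the example.

```python
import zlib
from typing import Callable, List, Tuple

Edge = Tuple[str, str]


def shard_by_key(key: str, shards: int) -> int:
    """Naive sharding: hash the whole vertex key (illustrative assumption)."""
    return zlib.crc32(key.encode()) % shards


def shard_by_community(key: str, shards: int) -> int:
    """Community-aware sharding: hash only the grouping prefix before ':'."""
    community = key.split(":", 1)[0]
    return zlib.crc32(community.encode()) % shards


def cross_shard_edges(
    edges: List[Edge], shard_of: Callable[[str, int], int], shards: int
) -> int:
    """Count edges whose endpoints are placed on different shards."""
    return sum(1 for s, d in edges if shard_of(s, shards) != shard_of(d, shards))


# Two hypothetical communities ("emea", "apac") with a single edge between them.
edges: List[Edge] = [
    ("emea:alice", "emea:bob"),
    ("emea:bob", "emea:carol"),
    ("emea:carol", "emea:alice"),
    ("apac:dave", "apac:erin"),
    ("apac:erin", "apac:frank"),
    ("emea:alice", "apac:dave"),  # the only inter-community edge
]

naive = cross_shard_edges(edges, shard_by_key, 3)
smart = cross_shard_edges(edges, shard_by_community, 3)
print(f"cross-shard edges: naive={naive}, community-aware={smart}")
```

With community-aware placement, every edge inside a community stays on one shard, so at most the single inter-community edge can require a network hop. The naive count depends on how the individual keys happen to hash.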
For some use cases, you can optimize even further with Disjoint SmartGraphs. Disjoint SmartGraphs are an optimization for use cases that deal with either large hierarchical graphs or holistic analytics across multiple customer graphs. In both cases, your graph dataset has clearly separated branches.
Disjoint SmartGraphs enable the automatic sharding of these branches and prohibit edges that connect them. This allows the query optimizer to push the whole query execution down to each DB-Server, which greatly improves performance for graph queries such as traversals, pattern matching, shortest paths, and k-shortest paths.
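The disjointness rule can be illustrated with a small Python sketch. It is a conceptual model under stated assumptions, not ArangoDB's implementation: the `tenant:name` key format and the `partition_disjoint` helper are hypothetical. The helper rejects any edge that connects two branches and groups the remaining edges by branch, so that each group could be processed independently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def branch_of(key: str) -> str:
    """Return the branch prefix of a key such as 'tenant1:a' (assumed format)."""
    return key.split(":", 1)[0]


def partition_disjoint(edges: List[Edge]) -> Dict[str, List[Edge]]:
    """Group edges by branch, rejecting any edge that connects two branches."""
    parts: Dict[str, List[Edge]] = defaultdict(list)
    for src, dst in edges:
        if branch_of(src) != branch_of(dst):
            raise ValueError(f"edge {src} -> {dst} connects two branches")
        parts[branch_of(src)].append((src, dst))
    return dict(parts)


# Two hypothetical customer graphs with no edges between them.
parts = partition_disjoint([
    ("tenant1:a", "tenant1:b"),
    ("tenant1:b", "tenant1:c"),
    ("tenant2:x", "tenant2:y"),
])
for branch, branch_edges in parts.items():
    print(branch, len(branch_edges))
```

Because no edge can leave its branch, a traversal that starts in one branch never needs data from another, which is the property that lets a whole query run on a single server in this model.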
ArangoDB Enterprise Edition users can now work on completely new use cases or further optimize current graph-based applications. If you’d like to know more, contact us and book a meeting with our technical experts.