ArangoDB - One database to rule them all for Innoplexus’ KOL discovery and management platform
Akshesh Doshi, Software & Data Engineer, Innoplexus
Requirement - Varied features with high performance
Innoplexus offers a key opinion leader (KOL) discovery and management platform, kPlexus™, which provides real-time analysis of around 10 million profiles and their networks. Because the dataset is large and continuously growing, we needed a fast, distributed data store. The platform's features are also varied, including network visualization, search and profile data, so kPlexus™ depended heavily on several data stores and search engines: MongoDB, Elasticsearch, Neo4j and Redis. Running so many data stores created maintenance overhead, along with the usual problems of data redundancy. We badly needed a simpler solution that could serve our use cases without hurting the product's performance.
The most performance-critical, and probably the most exciting, insight offered by kPlexus™ is how top KOLs are connected with each other, computed in real time! This was initially served from a copy of the data stored as a graph in Neo4j. Because ArangoDB can store graphs together with documents, we no longer had to keep our data in both a document store and a graph database. As the cherry on the cake, our benchmarks of ArangoDB as a graph database against Neo4j showed 4-5x faster performance for a few of our complex queries with the MMFiles storage engine.
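To make the "how are top KOLs connected" question concrete, here is a minimal sketch of the kind of AQL graph traversal it maps to, wrapped in a small Python builder that returns the query text and its bind parameters. This is not Innoplexus' actual query: the graph name `kol_network`, the vertex ids such as `kols/1001`, and the two-hop default are hypothetical. The sketch walks edges in either direction from one KOL and reports which of the given top KOLs it reaches, keeping the shortest hop count per KOL.

```python
from typing import Any, Dict, List, Tuple


def kol_connections_query(
    start_id: str,
    top_kol_ids: List[str],
    graph: str = "kol_network",  # hypothetical named graph
    max_depth: int = 2,
) -> Tuple[str, Dict[str, Any]]:
    """Build an AQL traversal listing which top KOLs are reachable from
    start_id within max_depth hops, following edges in either direction."""
    # The depth bound is spliced into the query text, so accept only real ints.
    if isinstance(max_depth, bool) or not isinstance(max_depth, int) or max_depth < 1:
        raise ValueError("max_depth must be a positive integer")
    query = (
        f"FOR v, e, p IN 1..{max_depth} ANY @start GRAPH @graph\n"
        "  FILTER v._id != @start AND v._id IN @top_kols\n"
        # A KOL can be reached over several paths; keep the shortest hop count.
        "  COLLECT kol = v._id AGGREGATE hops = MIN(LENGTH(p.edges))\n"
        "  RETURN { kol: kol, hops: hops }"
    )
    # Everything caller-supplied travels as a bind parameter, not as query text.
    bind_vars = {"start": start_id, "graph": graph, "top_kols": top_kol_ids}
    return query, bind_vars


query, bind_vars = kol_connections_query("kols/1001", ["kols/2002", "kols/3003"])
print(query)
```

The builder needs no database connection; with the python-arango driver, the pair could be run as `db.aql.execute(query, bind_vars=bind_vars)`.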
And this is what we finally built:
Moreover, ArangoDB's full-text indexing removed the need to index data into Elasticsearch for one of the search features we provide. Thanks also to Jan Steemann, whose code snippets in the official cookbook documentation served as the base we built upon.
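As an illustration of what such a search can look like, the sketch below builds a query around AQL's `FULLTEXT()` function, which requires a full-text index on the searched attribute. It is an assumption-laden example, not the cookbook code mentioned above: the collection `kols` and attribute `name` are hypothetical, and the term cleaning is a simple choice made here. Comma-separated words are combined with AND, and the `prefix:` modifier lets a partial word such as "onco" match "oncology".

```python
import re
from typing import Any, Dict, List, Tuple


def fulltext_search_query(
    collection: str,
    attribute: str,
    terms: List[str],
    limit: int = 20,
) -> Tuple[str, Dict[str, Any]]:
    """Build an AQL FULLTEXT() lookup matching documents that contain
    every term as a word prefix."""
    # Drop characters with meaning in fulltext query syntax (",", "|", "+",
    # "-", ":") so raw user input cannot change the query logic.
    cleaned = [re.sub(r"[^\w]", "", t) for t in terms]
    cleaned = [t for t in cleaned if t]
    if not cleaned:
        raise ValueError("at least one non-empty search term is required")
    # Comma-separated words are ANDed; "prefix:" turns on prefix matching.
    search = ",".join(f"prefix:{t}" for t in cleaned)
    query = "FOR doc IN FULLTEXT(@@coll, @attr, @search, @limit) RETURN doc"
    # "@coll" (with the extra @) is AQL's naming for a collection bind parameter.
    bind_vars = {"@coll": collection, "attr": attribute, "search": search, "limit": limit}
    return query, bind_vars


query, bind_vars = fulltext_search_query("kols", "name", ["onco", "Smith"])
print(bind_vars["search"])
```

Keeping the search string in a bind parameter, rather than formatting it into the query, means only the fulltext syntax itself has to be sanitized.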
In addition, ArangoDB's ability to perform joins in AQL, its query language, let us model the data much more cleanly and eliminate redundancy as far as possible.
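A join of this kind can be sketched as follows, assuming a hypothetical data model: KOL profiles live once in a `kols` collection, and each document in a `publications` collection points back to its author through a `kol_key` attribute. None of these names come from kPlexus™. The subquery plays the role of the join, attaching each KOL's publications at read time instead of copying them into the profile.

```python
from typing import Any, Dict, List, Tuple


def kols_with_publications_query(kol_keys: List[str]) -> Tuple[str, Dict[str, Any]]:
    """Build an AQL join: profiles are stored once, publications reference
    them by key, and the query stitches the two together when reading."""
    query = (
        "FOR kol IN @@kols\n"
        "  FILTER kol._key IN @keys\n"
        # The subquery acts as the join, gathering this KOL's publications.
        "  LET pubs = (\n"
        "    FOR pub IN @@pubs\n"
        "      FILTER pub.kol_key == kol._key\n"
        "      SORT pub.year DESC\n"
        "      RETURN { title: pub.title, year: pub.year }\n"
        "  )\n"
        "  RETURN MERGE(kol, { publications: pubs })"
    )
    bind_vars = {
        "@kols": "kols",          # hypothetical profile collection
        "@pubs": "publications",  # hypothetical collection with a kol_key field
        "keys": kol_keys,
    }
    return query, bind_vars


query, bind_vars = kols_with_publications_query(["1001", "1002"])
print(query)
```

An index on `publications.kol_key` would let the inner `FILTER` use an index lookup instead of scanning the whole collection for every KOL.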
All in all, with the help of ArangoDB, we were able to maintain our application with fewer developers and reduce our server infrastructure costs.
Experience with ArangoDB in production?
No more copies of the data and still a faster response time – a rare combination offered by ArangoDB
Although we initially faced issues with cluster mode, the ArangoDB team's constant support, bug fixes and frequent stability patches soon won our trust. Our application stores around 75 GB of data, made up of tens of millions of nodes and half a billion edges.
Another point worth noting: although ArangoDB with the MMFiles storage engine performed much better than Neo4j, we could not reproduce those results with RocksDB, so we have stuck with MMFiles in production.
Importance of key characteristics of ArangoDB
|Feature||not important||important||very important|
|AQL / JOINs||x|
A very big thanks to Akshesh for taking the time to share this use case with the community!