Meet PrivacyPerfect: Privacy bookkeeping for GDPR
PrivacyPerfect is a SaaS product that helps organisations comply with administrative obligations of the European Union’s General Data Protection Regulation (GDPR), including data protection impact assessments, processing activities, and data breaches. As of mid-2019, PrivacyPerfect has more than 150 customers across a variety of industries, including aviation, banking, government, insurance, retail, and telecom.
The Challenge: Finding the right data model to minimize architectural complexity
The idea for PrivacyPerfect came from a couple of experts in the field of privacy law. When they began building the first version of the product, the software development company they were working with created an event-sourcing system. However, as the company and its customer base grew, this setup soon became too complex — every time a user entered a record into a field, separate events needed to be pushed to various data sources to build the latest view.
This had a domino effect on the company’s ability to execute and grow: the architecture was time-consuming for the development team to maintain, and it also slowed feature development, which hindered customer acquisition and sales efforts.
Despite the progress they had made, PrivacyPerfect CTO Jaco De Vroed made the tough decision to start over. “Complexity can always be introduced if it’s needed, but if I don’t need it, I don’t want it,” he asserts.
De Vroed and his team began to put together a list of requirements for the next iteration of their product. They quickly concluded that their business data fit the graph model well, as it consists of many highly connected data points. “You could do it all in a relational database, but when we started drawing up a model for that, it became quite a headache after an hour because of all the joined tables we would need,” recalls De Vroed.
Other requirements for PrivacyPerfect were:
- A multi-tenant solution to accommodate a separate database per customer, with automatic database provisioning
- Preference for open source software, with an active community
The Solution: An open source database that supports document and edge collections
Once PrivacyPerfect settled on implementing their data in the graph model and had their requirements defined, they began to explore solutions that would fit their needs — which eventually led them to ArangoDB.
At the time, ArangoDB wasn’t very well known, so De Vroed and his team first looked into who else was using it, whether it had a healthy community, and whether the software was actively maintained and developed.
The next step was to prototype their data model in ArangoDB to confirm that it worked, that it performed well, and that their more complicated queries, such as access rights filtering, functioned properly.
“We had to be sure, especially for our customers who have many thousands of privacy records, that when a user requests the list of records he has access to — that this query in particular remains fast, as it needs to traverse a lot of data taking into account the user’s access rights,” explains De Vroed. “We were pleased to discover this worked in a fairly simple AQL query.”
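The kind of access-rights filtering described above can be illustrated with a small in-memory sketch. This is not PrivacyPerfect's AQL query, which the source does not show. It is a plain Python stand-in under one assumed rule: a user may see records owned by their own organisation or by any organisation below it in the hierarchy. All identifiers and data (`org_children`, `user_org`, `record_org`) are hypothetical.

```python
from collections import deque

# Hypothetical organisation hierarchy: parent -> list of child organisations.
org_children = {
    "org/holding": ["org/nl", "org/de"],
    "org/nl": ["org/nl-hr"],
}

# Hypothetical membership and ownership links.
user_org = {"user/anna": "org/nl", "user/bob": "org/holding"}
record_org = {
    "record/1": "org/holding",
    "record/2": "org/nl",
    "record/3": "org/nl-hr",
    "record/4": "org/de",
}


def reachable_orgs(start):
    """Breadth-first traversal: the start organisation plus all descendants."""
    seen = {start}
    queue = deque([start])
    while queue:
        org = queue.popleft()
        for child in org_children.get(org, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def accessible_records(user):
    """Records owned by any organisation the user can reach (assumed rule)."""
    orgs = reachable_orgs(user_org[user])
    return sorted(rec for rec, org in record_org.items() if org in orgs)


print(accessible_records("user/anna"))
```

In a graph database the same idea is expressed as a traversal over edge collections rather than a hand-written loop, which is why such a query can stay short even when it touches a lot of data.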
As PrivacyPerfect continued prototyping its data model, the team was pleased to discover that all of its data could be put into ArangoDB without dozens of many-to-many tables that would become unreadable, using just clear, functional edge collections and document collections. De Vroed in particular also appreciated how logical it was to compose queries in AQL.
It soon became clear they would build the next iteration of PrivacyPerfect using ArangoDB.
“We ultimately picked ArangoDB because our data fits really well in a graph data model, ArangoDB is open source and active, a complete product, and easy to use,” De Vroed concludes.
The Implementation: Mapping a Privacy Register as a Graph (among other things)
Each privacy register has a number of privacy records that belong to an organisation. The privacy register serves as the nucleus of PrivacyPerfect’s graph data model, with all the other data items mapped to it, such as personal data items, data subject category, retention terms, and processing category. Each of these attributes is managed as a separate entity in ArangoDB.
“We see a privacy register basically as a graph, that has things like personal data items, data sources, personal data categories, organisations, and users as vertices,” De Vroed explains. “A privacy record (for instance, the processing of personal data) is also a vertex, that has many edges to the other vertices.”
Beyond the attributes of the privacy register and the records mapped to it, further data is linked to the privacy register through separate edge collections, including organisation hierarchy, user management, and versioning.
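As a rough illustration of this vertex-and-edge layout, the sketch below models one privacy record in plain Python. The collection names, keys, and sample values are hypothetical; the only part borrowed from ArangoDB is the convention that documents are identified as `collection/key` and that each edge document carries `_from` and `_to` attributes pointing at two vertices.

```python
# Hypothetical vertices, keyed by "collection/key" identifiers.
documents = {
    "records/r1": {"name": "Payroll processing"},
    "personal_data_items/email": {"name": "Email address"},
    "personal_data_items/iban": {"name": "Bank account number"},
    "data_sources/hr": {"name": "HR system"},
    "organisations/acme": {"name": "Acme BV"},
}

# Hypothetical edge collections: each edge links a record to another vertex.
edges = {
    "processes": [
        {"_from": "records/r1", "_to": "personal_data_items/email"},
        {"_from": "records/r1", "_to": "personal_data_items/iban"},
    ],
    "uses_source": [{"_from": "records/r1", "_to": "data_sources/hr"}],
    "belongs_to": [{"_from": "records/r1", "_to": "organisations/acme"}],
}


def neighbours(vertex_id, edge_collection):
    """Names of the vertices reached from vertex_id via one edge collection."""
    return [
        documents[edge["_to"]]["name"]
        for edge in edges[edge_collection]
        if edge["_from"] == vertex_id
    ]


def describe_record(record_id):
    """Collect everything linked to a record, grouped by edge collection."""
    return {name: neighbours(record_id, name) for name in edges}


print(describe_record("records/r1"))
```

Each relationship lives in its own edge collection, so adding a new kind of link (for example versioning) means adding another collection of edges rather than another join table.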
[Figure: How PrivacyPerfect models a privacy record.]
[Figure: What a privacy record looks like in the front-end of PrivacyPerfect’s application.]
PrivacyPerfect’s backend has a microservices-based architecture, and is written in Scala. The services communicate with each other using RabbitMQ. Its front end is a single-page application based on React.
The team runs ArangoDB in a Kubernetes cluster in a primary/replica setup and models its data as a graph, consisting of document collections that hold JSON documents and edge collections that connect them.
Faster and more flexible product development
By switching to ArangoDB, PrivacyPerfect was able to greatly simplify its application architecture. This sped up product development considerably and helped the company better meet customer demand.
And because they modeled their data as a graph in a schemaless database system, PrivacyPerfect can now build features that would have been difficult in the past. They are currently exploring how to implement workflow features, as well as how to incorporate more extensive visualizations in their application.
A piece of advice: Explore different data models
When asked what advice he would give to someone just getting started with ArangoDB, De Vroed said, “Try to model your data in such a way that it fits a graph model, and build a prototype with real data to see if AQL can get out everything you need.”
Why does he suggest this?
“There are various ways to model your data in order to solve a business problem. We tried modeling PrivacyPerfect’s data into the standard relational model, and you can do it, but it just didn’t match what we wanted to do with the data as well as the graph model did.
“Try modeling your data in multiple ways — either a relational and/or a graph model, or even another model, because there’s others out there — and see what fits your needs best.
“If it’s a graph model, or also when you have to go with a lot of documents, I would quite confidently go with ArangoDB.”