Advocates of modern, agile software development usually propose using schemaless database engines for greater flexibility, particularly during the early rapid-prototyping phase of IT projects. The more traditionally minded insist that a strict schema, enforced by the persistence layer throughout the lifetime of a project, is necessary to ensure quality and security.
In this post I would like to explain briefly why I believe that both groups are completely right, and why this is not as paradoxical as it sounds at first glance. I am one of the developers of ArangoDB, a multi-model NoSQL database. By that I mean an engine that is a document store, a key/value store and a graph database at once, with a query language that lets you use, and indeed mix, all three data models in a single query.
As a document store, ArangoDB is schemaless. This is usually very convenient at the beginning of a software project, when the actual schema is not yet completely clear and is subject to frequent changes. Of course, at any given time in a project the developers do have a concrete schema in mind. The only problem is that it changes frequently, in particular with a more agile development style. With a schemaless database one can handle these changes in several different ways:
- one can migrate (or indeed erase) the data for every change
- one can make the application client code aware of multiple versions of the schema and teach it to work well with different document types
- one can migrate the data lazily with each update or replacement of a document.
None of these approaches is “right” or “wrong”; each can be the best choice in a different situation.
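To make the third option more concrete, here is a minimal sketch of lazy migration in plain Python. It is not ArangoDB code: the `schemaVersion` field, the two migration steps and the dictionary standing in for a collection are all assumptions made up for illustration. The idea is simply that an old document is upgraded at the moment it is next written.

```python
from typing import Any, Callable, Dict

Document = Dict[str, Any]

CURRENT_VERSION = 3  # hypothetical: the schema version the application expects


def _v1_to_v2(doc: Document) -> Document:
    # Hypothetical change: a single "name" string is split into two fields.
    doc = dict(doc)
    first, _, last = doc.pop("name", "").partition(" ")
    return {**doc, "firstName": first, "lastName": last, "schemaVersion": 2}


def _v2_to_v3(doc: Document) -> Document:
    # Hypothetical change: a single "email" becomes a list of "emails".
    doc = dict(doc)
    email = doc.pop("email", None)
    return {**doc, "emails": [email] if email else [], "schemaVersion": 3}


# Maps a version number to the step that upgrades it to the next version.
MIGRATIONS: Dict[int, Callable[[Document], Document]] = {
    1: _v1_to_v2,
    2: _v2_to_v3,
}


def upgrade(doc: Document) -> Document:
    """Return a copy of doc upgraded step by step to CURRENT_VERSION."""
    doc = dict(doc)
    # Documents written before versioning existed are treated as version 1.
    version = doc.get("schemaVersion", 1)
    while version < CURRENT_VERSION:
        doc = MIGRATIONS[version](doc)
        version = doc["schemaVersion"]
    return doc


def update_document(store: Dict[str, Document], key: str, changes: Document) -> Document:
    """Apply changes to a stored document, migrating it lazily on the way."""
    doc = upgrade(store[key])
    doc.update(changes)
    store[key] = doc
    return doc
```

The same `upgrade` function could also be called on every read, which corresponds to the second option in the list: the application copes with several document versions without touching the stored data until the next write.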
Later in the development cycle of most applications, the schema becomes more and more fixed and undergoes fewer changes. In these later phases the classical arguments for schema validation apply again, and security and stability concerns often outweigh the arguments for flexibility.
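As an illustration of what such validation can look like once the schema has settled, here is a small sketch in plain Python. The `USER_SCHEMA` definition, the field names and the `insert_validated` helper are hypothetical and chosen only for this example; they are not part of any database API.

```python
from typing import Any, Dict, List

Document = Dict[str, Any]

# Hypothetical fixed schema: field name -> required Python type.
USER_SCHEMA: Dict[str, type] = {
    "firstName": str,
    "lastName": str,
    "emails": list,
}


def validate(doc: Document, schema: Dict[str, type]) -> List[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    errors: List[str] = []
    for field, expected in schema.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected):
            errors.append(f"wrong type for {field}: expected {expected.__name__}")
    # A strict schema also rejects attributes it does not know about.
    for field in doc:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors


def insert_validated(store: Dict[str, Document], key: str, doc: Document) -> None:
    """Store doc only if it satisfies USER_SCHEMA, otherwise raise ValueError."""
    errors = validate(doc, USER_SCHEMA)
    if errors:
        raise ValueError("; ".join(errors))
    store[key] = dict(doc)
```

Rejecting unknown attributes is a deliberate choice in this sketch: it is what turns a loose convention into an enforced schema, at the cost of the flexibility that was useful earlier in the project.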
In the end, when one has customized the whole API for the app, one can even switch off the standard database API, which further increases security and cleanliness. With this final step one has arrived at a software architecture that implements data-centric microservices in an application-specific way directly in the database server. This helps prevent bugs, improves performance (complex queries can run close to the data), keeps the application design simple and makes the system easier to maintain. Even the DevOps people like this, because the microservices can be deployed and updated independently.
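The shape of such an application-specific API can be sketched in a few lines of plain Python. This only illustrates the idea of exposing a handful of named operations instead of generic read and write access. It does not show how this is done inside ArangoDB, and the `UserService` class and its methods are hypothetical.

```python
from typing import Any, Dict, Optional


class UserService:
    """Hypothetical application-specific API in front of a document collection."""

    def __init__(self) -> None:
        # The underlying collection is private: callers get no generic
        # read or write access, only the operations defined below.
        self._users: Dict[str, Dict[str, Any]] = {}

    def register(self, key: str, first_name: str, last_name: str) -> None:
        """Create a user; the service decides which attributes exist."""
        if key in self._users:
            raise KeyError(f"user already exists: {key}")
        if not first_name or not last_name:
            raise ValueError("first_name and last_name are required")
        self._users[key] = {
            "firstName": first_name,
            "lastName": last_name,
            "passwordHash": None,  # internal attribute, never returned
        }

    def public_profile(self, key: str) -> Optional[Dict[str, str]]:
        """Return only the attributes that are safe to expose, or None."""
        doc = self._users.get(key)
        if doc is None:
            return None
        return {"firstName": doc["firstName"], "lastName": doc["lastName"]}
```

Because every write goes through `register`, the schema is enforced in exactly one place, and because reads go through `public_profile`, internal attributes cannot leak by accident.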