Jan has been a developer at ArangoDB since version 0.0.1. He works on the core components, such as the query language (AQL), the storage engine, and performance optimizations.
He originally intended to work in the field of economics, but somehow signed up for IT jobs and has stuck with them ever since.
Running complex data queries in a distributed system
With the ever-growing amount of data, it is getting increasingly hard to store and retrieve it efficiently. While the first distributed databases put the entire burden of sharding on the application code, there are now smarter solutions that handle most of the data distribution and resilience tasks inside the database.
This poses some interesting questions, e.g.:
- how are queries other than primary-key lookups actually organized and executed in a distributed system, so that they run as efficiently as possible?
- how do contemporary distributed databases actually achieve transactional semantics for non-trivial operations that affect different shards or servers?
This talk will give an overview of these challenges and of the solutions that some open-source distributed databases have chosen to address them.
The challenges of running distributed database queries
Writing a database engine for running queries on a single machine is challenging, but doable. Building a distributed database engine is much harder. It is surprisingly hard to make distributed queries perform efficiently and to make them behave according to the logical semantics of transactions. There are also various trade-offs between performance and consistency.
In this talk we will give an overview of some of the approaches that different database products, e.g. Google Spanner, CockroachDB, ArangoDB and MongoDB, have chosen to tackle this problem.
This talk targets developers who are interested in database technology in general and in running distributed databases in particular.