Max Neunhöffer

Senior Developer and Architect

Bio:

Max Neunhöffer is a mathematician turned database developer. In his academic career he worked for 16 years on the development and implementation of new algorithms in computer algebra, juggling mathematical big data such as group orbits containing trillions of points. He has since returned from St. Andrews to Germany, shifted his focus to NoSQL databases, and now helps to develop ArangoDB. He has spoken at international conferences including O’Reilly Software Architecture London, J On The Beach, and MesosCon Seattle.

Talk proposals:

Building a Graphy Time Machine

Graph databases allow users to analyze highly interconnected datasets and find patterns within these relationships. Social networks, corporate hierarchies, fraud detection, network analytics, and whole knowledge graphs are great use cases for graph databases. However, these datasets of nodes and connecting edges change over time. Whether you are a developer, architect, or data scientist, you may want to time travel to analyze the past or even predict tomorrow.

While your graph database may be lacking built-in support for managing the revision history of graph data, this talk will show you how to manage it in a performant manner for general classes of graphs. Best of all, this won’t require any groundbreaking new ideas. We’ll simply borrow a few tools and tricks from existing persistent data structure literature and adapt them for good performance within the graph database software. This will help enable new ways to manipulate and exploit graph data and hopefully power new and exciting applications.
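The core trick from the persistent data structure literature mentioned above can be sketched in a few lines. This is a hypothetical illustration, not ArangoDB's actual implementation: each revision stores only the edges added or removed relative to its parent, and a lookup walks the revision chain until it finds the most recent delta mentioning the edge.

```go
package main

import "fmt"

// edge and revision are illustrative names for this sketch.
type edge struct{ from, to string }

type revision struct {
	parent  *revision
	added   map[edge]bool
	removed map[edge]bool
}

// commit derives a new revision from r, recording only the deltas.
func (r *revision) commit(add, remove []edge) *revision {
	n := &revision{parent: r, added: map[edge]bool{}, removed: map[edge]bool{}}
	for _, e := range add {
		n.added[e] = true
	}
	for _, e := range remove {
		n.removed[e] = true
	}
	return n
}

// has walks the revision chain; the most recent delta mentioning e wins.
func (r *revision) has(e edge) bool {
	for v := r; v != nil; v = v.parent {
		if v.added[e] {
			return true
		}
		if v.removed[e] {
			return false
		}
	}
	return false
}

func main() {
	e := edge{"alice", "bob"}
	r1 := (&revision{}).commit([]edge{e}, nil) // edge created in r1
	r2 := r1.commit(nil, []edge{e})            // edge deleted in r2
	fmt.Println(r1.has(e), r2.has(e))          // both revisions remain queryable
}
```

Because old revisions are never mutated, any point in the graph's history stays queryable at the cost of storing only the deltas.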

Developing a DevOps-friendly Kubernetes integration for ArangoDB

ArangoDB is a scalable, distributed multi-model database. It can be deployed on many operating systems in various environments.

In recent months we’ve received many requests to support Kubernetes “out of the box”.

This talk is about an ongoing journey from early YAML files to an advanced integration supporting scaling, upgrading, and monitoring in a DevOps-friendly way.

The journey will take us from tiny cloud environments to bespoke hardware, touching on subjects such as persistence performance, session stickiness, access control, and federation.

Our journey has not reached its destination yet, but the destination looks promising, and the journey so far has been an interesting one that we will happily share with you.

Implementing data center to data center replication for a distributed database

ArangoDB is a scalable, distributed multi-model database. However, for this talk it is not necessary to know what this means; the only crucial facts are that it is distributed and written in C++.

Before you stop reading: this talk is about a golang success story.
We had to implement resilient data center to data center (DC2DC) replication for ArangoDB clusters from scratch within six weeks (plus some time for testing and debugging). To this end, we built upon
– ArangoDB’s HTTP-based API for asynchronous replication,
– the existing golang driver,
– the fault-tolerant, scalable message queue system Kafka,
– a lot of existing golang libraries and
– golang’s fantastic capabilities for parallelism, communication and data manipulation

and pulled this task off. This talk is the story of this project, with its many challenges and successes, and it ends with a surprising revelation about which of the above we did not actually need in the end.
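The golang capabilities for parallelism and communication praised above boil down to goroutines and channels. A minimal, hypothetical sketch of the fan-out/fan-in pattern such a replicator needs (all names here are illustrative, not from the actual ArangoDB code):

```go
package main

import (
	"fmt"
	"sync"
)

// squareSum fans values out to several workers over a channel and
// fans the results back in; squaring stands in for the real work of
// applying a change event to a remote data center.
func squareSum(values []int, workers int) int {
	events := make(chan int)
	results := make(chan int)
	var wg sync.WaitGroup

	for w := 0; w < workers; w++ { // fan out
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range events {
				results <- ev * ev
			}
		}()
	}
	go func() { wg.Wait(); close(results) }() // fan in: close when all done

	go func() { // producer: stand-in for reading the replication log
		for _, v := range values {
			events <- v
		}
		close(events)
	}()

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(squareSum([]int{1, 2, 3, 4, 5}, 4)) // prints 55
}
```

Closing a channel to signal completion, plus a WaitGroup to know when all workers are done, is the whole coordination machinery; no locks are needed.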

Fishing Graphs in a Hadoop Data Lake

Hadoop clusters can store nearly everything in your data lake in a cheap and blazingly fast way. Answering questions and gaining insights from this ever-growing stream of data becomes the decisive part for many businesses. Increasingly, data has a natural structure as a graph, with vertices linked by edges, and many questions about the data involve graph traversals or other complex queries for which there is no a priori bound on the length of paths.

Spark with GraphX is great for answering relatively simple graph questions that are worth starting a Spark job for, because they essentially involve the whole graph. But does it make sense to start one for every ad-hoc query, and is it suitable for complex real-time queries?
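The kind of query the abstract alludes to, a traversal with no a priori bound on path length, can be sketched as a plain breadth-first search that stops only when no new vertices are reachable. The graph and names below are illustrative:

```go
package main

import "fmt"

// reachable returns all vertices reachable from start, in BFS order,
// following edges in the adjacency list adj.
func reachable(adj map[string][]string, start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var order []string
	for len(queue) > 0 {
		v := queue[0]
		queue = queue[1:]
		order = append(order, v)
		for _, w := range adj[v] {
			if !seen[w] {
				seen[w] = true
				queue = append(queue, w)
			}
		}
	}
	return order
}

func main() {
	adj := map[string][]string{
		"a": {"b", "c"},
		"b": {"d"},
		"c": {"d"},
	}
	fmt.Println(reachable(adj, "a")) // [a b c d]
}
```

Nothing in the loop bounds the depth up front, which is exactly why such queries fit a graph database better than a fixed number of relational joins.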

The Computer Science behind a modern distributed data store

What we see in the modern data store world is a race between different approaches to achieving distributed and resilient storage of data. Most applications need a stateful layer that holds the data. There are at least three necessary ingredients that are anything but trivial to combine, and it is of course even more challenging when heading for acceptable performance.

Over the past years there has been significant progress in both the science and the practical implementation of such data stores. In his talk Max Neunhöffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay, and present four modern approaches taken by distributed open-source data stores.

Topics are:

  • Challenges in developing a distributed, resilient data store
  • Consensus, distributed transactions, distributed query optimization and execution
  • The inner workings of ArangoDB, Cassandra, CockroachDB and RethinkDB

The talk will touch on complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.

Handling Billions Of Edges in a Graph Database

The complexity and volume of data are rising. Modern graph databases are designed to handle the complexity, but often not the volume. When a graph reaches a certain size, many dedicated graph databases hit their limits in vertical or, most commonly, horizontal scalability. In this talk I’ll provide a brief overview of current approaches and their limits with respect to scalability. Dealing with complex data in a complex system doesn’t make things easier… but it makes finding a solution more fun. Join me on my journey to handle billions of edges in a graph database.

Static binaries, universal packages and Docker in build pipelines

There are a lot of variants of Linux out there and many versions of each.

As the manufacturer of an application written in C++, providing binary packages to customers is a nightmare. Wouldn’t it be nice if one could make a single “Linux package” with one static binary that simply works everywhere? As it turns out, this is essentially possible, but it needs a bit of care.

This is a story about using Alpine Linux and the musl C library to build completely static binaries; about creating universal Debian packages that run on any version of Debian or Ubuntu, and universal RPM packages that run on any version of any RPM-based Linux distribution; and about doing all of this on any variant of Linux using Docker images of various Linux distributions.
For us at ArangoDB, this approach brings the release process down from 10 hours to approximately half an hour.
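The Alpine-plus-musl recipe can be sketched as a multi-stage Docker build. This is a minimal hypothetical example, not ArangoDB's actual build pipeline, and the file names are illustrative:

```dockerfile
# Build stage: Alpine's libc is musl, which is designed to be
# linked statically.
FROM alpine:3.19 AS build
RUN apk add --no-cache g++ make
COPY . /src
WORKDIR /src
# -static links musl and libstdc++ into the binary, so the result
# has no runtime library dependencies at all.
RUN g++ -static -O2 -o myapp main.cpp

# Ship stage: an empty image is enough for a fully static binary.
FROM scratch
COPY --from=build /src/myapp /myapp
ENTRYPOINT ["/myapp"]
```

The same static binary can then be wrapped into a .deb and an .rpm, which is what makes a single "Linux package" per release feasible.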


Recent talks:

  • Devoxx Belgium: The Computer Science behind a modern distributed data store
  • Data Works Summit: Fishing Graphs in a Hadoop Data Lake
  • MesosCon Asia: Handling Billions of Edges in a Graph Database
  • DevOps & Infrastructure NRW
