Joerg Schad

Head of Engineering and Machine Learning

Bio:

Jörg Schad is Head of Machine Learning at ArangoDB. In a previous life, he worked on or built machine learning pipelines in healthcare, distributed systems at Mesosphere, and in-memory databases. He received his Ph.D. for research on distributed databases and data analytics. He’s a frequent speaker at meetups, international conferences, and lecture halls.

Talk proposal:

A Tale of Two Worlds: Canary-Testing for Both ML Models and Microservices

With the rapid and recent rise of data science, organizations are adopting Cloud Native tools for data science, especially Kubeflow. One of the big challenges is how to deploy models in production using practices such as A/B testing and canary releasing, which have proven successful for microservices.

How can you easily test your data models and roll out updates to production without impacting users? These are typical challenges a data scientist encounters when deploying models and managing their lifecycle in production on their own.

In this talk, Vincent Lesierse and Jörg Schad show how lessons learned from releasing microservices on Kubernetes can be applied to the world of ML models, and where the deployment and lifecycle management of ML models differ from those of microservices.

The case for a common Metadata Layer for Machine Learning Platforms

With the rapid and recent rise of data science, the machine learning platforms being built are becoming more complex. Consider, for example, the various Kubeflow components: distributed training, Jupyter notebooks, CI/CD, hyperparameter optimization, feature store, and more. Each of these components produces metadata: different versions of datasets, different versions of Jupyter notebooks, different training parameters, test/training accuracy, different features, model serving statistics, and much more.
For production use, it is critical to have a common view across all this metadata, because we need to answer questions such as: Which Jupyter notebook was used to build model xyz currently running in production? If there is new data for a given dataset, which models (currently serving in production) have to be updated?
As the overall machine learning stack is still changing rapidly, with new tools coming out every month (if not every week), and different companies typically choose different components for their stack, it seems key to first specify a generic API that supports new and different components. Furthermore, data scientists need a simple model and an intuitive interface to query across all metadata.
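
To make the two questions above concrete, here is a minimal sketch of metadata lineage modeled as a graph and queried by traversal. Everything in it is an assumption made for illustration: the MetadataGraph class, the node kinds, the in_production attribute, and the example names are hypothetical and are not the API of Kubeflow, ArangoDB, or any existing metadata store.

```python
from collections import defaultdict, deque


class MetadataGraph:
    """Hypothetical in-memory lineage graph: edges point from a source to what was derived from it."""

    def __init__(self):
        self.nodes = {}                    # name -> {"kind": ..., other attributes}
        self.derived = defaultdict(set)    # source -> artifacts derived from it
        self.sources = defaultdict(set)    # artifact -> its direct sources

    def add(self, name, kind, **attrs):
        self.nodes[name] = {"kind": kind, **attrs}

    def link(self, source, derived):
        self.derived[source].add(derived)
        self.sources[derived].add(source)

    def _reachable(self, start, edges, kind):
        # Breadth-first traversal over the chosen edge direction.
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        seen.discard(start)
        return sorted(n for n in seen if kind is None or self.nodes[n]["kind"] == kind)

    def upstream(self, name, kind=None):
        """Everything the given artifact was (transitively) built from."""
        return self._reachable(name, self.sources, kind)

    def downstream(self, name, kind=None):
        """Everything (transitively) built from the given artifact."""
        return self._reachable(name, self.derived, kind)


# Illustrative data only: one dataset, one notebook, two models.
g = MetadataGraph()
g.add("clicks-2019", "dataset")
g.add("features.ipynb", "notebook")
g.add("model-xyz", "model", in_production=True)
g.add("model-abc", "model", in_production=False)
g.link("clicks-2019", "features.ipynb")
g.link("features.ipynb", "model-xyz")
g.link("features.ipynb", "model-abc")

# Which notebook was used to build model-xyz?
notebooks = g.upstream("model-xyz", kind="notebook")

# New data arrived for clicks-2019: which production models need an update?
stale = [m for m in g.downstream("clicks-2019", kind="model")
         if g.nodes[m].get("in_production")]
```

Both questions reduce to a traversal in one direction or the other, which is why a graph-shaped model with a generic query interface is a natural fit for this kind of metadata.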

Webinar: Scalable Graph Processing on Kubernetes

In recent years, Kubernetes has become the default platform for deploying microservices. A multi-model database such as ArangoDB can provide a scalable, persistent backend for both graph and document data models in such microservice architectures.
ArangoDB is an especially good fit for the combination of unstructured but highly connected data found in a typical recommendation engine, because it can natively store and query both data models. Even better, with the new Kubernetes ArangoDB operator, you can easily deploy, scale, and operate your ArangoDB cluster on top of Kubernetes.

In this webinar, we show how to build an end-to-end recommender system with ArangoDB on Kubernetes. The webinar covers:
  • how a multi-model data model can be used to build a recommendation engine
  • how to deploy, scale, and manage ArangoDB on Kubernetes using the new ArangoDB operator
  • an end-to-end demo of a complete recommendation system on Kubernetes

Want to learn more about multi-model and graphs?
Have a look here:

Joerg Schad
  • Tale of Two Worlds: Canary-Testing for Both ML Models and Microservices
    View Video
  • Scalable Graph Processing on Kubernetes
    View Video
  • Towards Data Science Engineering Principles
    View Video
