From Data to Metadata for Machine Learning Platforms


October 8, 2019

It is well known that data quality and quantity are crucial for building machine learning models, especially when dealing with deep learning and neural networks.

But besides the data required to build the model itself, there is another, often overlooked, type of data required to build a production-grade machine learning platform: metadata. For the DataOps team managing such a production platform, it is critical to have a common view across all of this metadata.

In this webinar, Jörg Schad, our Head of Engineering and Machine Learning, presented ArangoML Pipeline, a common metadata layer for machine learning platforms. ArangoML Pipeline provides an extensible HTTP pipeline, so you can use it together with your favorite machine learning frameworks or tools. Furthermore, because it is built on top of ArangoDB, it can leverage ArangoDB's multi-model capabilities, and queries such as “Which of the models being served right now have been trained on a particular dataset?” turn into a simple graph traversal.
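To illustrate why such a lineage question becomes a graph traversal, here is a minimal sketch in plain Python. It is not the ArangoML Pipeline API and not an ArangoDB query: the node names, the "dataset -> featureset -> model -> deployment" edge schema, and the function `served_models_from` are all hypothetical, chosen only to show the idea. Starting from a dataset, a breadth-first search follows "derived from" edges and collects every reachable model that has an edge to a deployment.

```python
from collections import defaultdict, deque

# Hypothetical metadata graph (illustration only): each edge points from an
# artifact to something derived from it, e.g. dataset -> featureset -> model
# -> deployment. The naming scheme "kind/name" is an assumption of this sketch.
EDGES = [
    ("dataset/housing_2019", "featureset/housing_fs1"),
    ("featureset/housing_fs1", "model/linreg_v1"),
    ("featureset/housing_fs1", "model/linreg_v2"),
    ("model/linreg_v2", "deployment/prod_east"),
    ("dataset/other", "model/other_model"),
    ("model/other_model", "deployment/prod_west"),
]


def build_adjacency(edges):
    """Map each node to the list of nodes derived from it."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[src].append(dst)
    return adj


def served_models_from(dataset, edges):
    """Return models reachable from `dataset` that are currently deployed.

    A model counts as "served" here if it has an outgoing edge to a
    deployment node (a hypothetical convention for this sketch).
    """
    adj = build_adjacency(edges)
    seen = {dataset}
    queue = deque([dataset])
    served = set()
    while queue:  # breadth-first traversal of the lineage graph
        node = queue.popleft()
        for nxt in adj[node]:
            if node.startswith("model/") and nxt.startswith("deployment/"):
                served.add(node)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(served)


print(served_models_from("dataset/housing_2019", EDGES))
# prints ['model/linreg_v2']
```

In ArangoDB itself, the same question would be expressed as a traversal over a graph of metadata collections rather than hand-written BFS; keeping lineage as explicit edges is what makes the question a single traversal instead of a chain of joins.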

Just in case you want to learn even more about multi-model and machine learning, you can also check out our ArangoML page.


About the Presenter:
Jörg Schad is Head of Machine Learning at ArangoDB. In a previous life, he built machine learning pipelines in healthcare and worked on distributed systems at Mesosphere and on in-memory databases. He received his Ph.D. for research on distributed databases and data analytics. He is a frequent speaker at meetups, international conferences, and lecture halls.