Developer Relations Engineer
Chris has over 10 years of experience across many areas of technology, including service, support, and development. He is also passionate about learning, and right now he is focused on improving the learning experience for the ArangoDB community. Chris believes the future is native multi-model and wants to help tell the world.
The Case for Common Metadata Layer for Machine Learning Platforms
With the recent and rapid rise of data science, the Machine Learning Platforms being built are becoming more complex. For example, consider the various Kubeflow components: Distributed Training, Jupyter Notebooks, CI/CD, Hyperparameter Optimization, Feature Store, and more. Each of these components produces metadata: different versions of datasets, different versions of Jupyter notebooks, different training parameters, test/training accuracy, different features, model serving statistics, and many more.
For production use, it is critical to have a common view across all this metadata, as we have to answer questions such as: Which Jupyter notebook was used to build model xyz currently running in production? If there is new data for a given dataset, which models (currently serving in production) have to be updated?
As the overall Machine Learning stack is still changing rapidly, with new tools coming out every month (if not every week), and as different companies typically choose different components for their stack, it seems key to first specify a generic API that supports new and different components. Furthermore, Data Scientists need a simple model and an intuitive interface to query across all metadata.
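To make the two example questions above concrete, here is a minimal sketch of how they could be answered once all metadata sits in one graph. It is only an illustration under stated assumptions: the `MetadataGraph` class, the node ids, and the relation names are hypothetical, and it is a plain in-memory structure, not the API of ArangoDB, Kubeflow, or any other real metadata store.

```python
from collections import defaultdict


class MetadataGraph:
    """Hypothetical in-memory stand-in for a common metadata layer."""

    def __init__(self):
        self.nodes = {}                      # node id -> attribute dict
        self.out_edges = defaultdict(list)   # source id -> [(relation, target id)]
        self.in_edges = defaultdict(list)    # target id -> [(relation, source id)]

    def add_node(self, node_id, kind, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def add_edge(self, source, relation, target):
        # Edges point in the direction the data flows (dataset -> notebook -> model).
        self.out_edges[source].append((relation, target))
        self.in_edges[target].append((relation, source))

    def _walk(self, start, edges, kind, required):
        """Depth-first walk; collect reachable nodes of `kind` matching `required`."""
        seen, stack, found = set(), [start], []
        while stack:
            current = stack.pop()
            for _relation, neighbor in edges[current]:
                if neighbor in seen:
                    continue
                seen.add(neighbor)
                stack.append(neighbor)
                attrs = self.nodes[neighbor]
                if attrs["kind"] == kind and all(
                    attrs.get(key) == value for key, value in required.items()
                ):
                    found.append(neighbor)
        return sorted(found)

    def upstream(self, node_id, kind, **required):
        """Everything of `kind` that the given node was derived from."""
        return self._walk(node_id, self.in_edges, kind, required)

    def downstream(self, node_id, kind, **required):
        """Everything of `kind` that was derived from the given node."""
        return self._walk(node_id, self.out_edges, kind, required)


# Illustrative metadata, as different platform components might report it.
g = MetadataGraph()
g.add_node("dataset/clicks", "dataset", version=3)
g.add_node("dataset/users", "dataset", version=1)
g.add_node("notebook/train_v2", "notebook")
g.add_node("notebook/explore", "notebook")
g.add_node("model/xyz", "model", in_production=True)
g.add_node("model/abc", "model", in_production=False)

g.add_edge("dataset/clicks", "used_by", "notebook/train_v2")
g.add_edge("dataset/clicks", "used_by", "notebook/explore")
g.add_edge("dataset/users", "used_by", "notebook/explore")
g.add_edge("notebook/train_v2", "produced", "model/xyz")
g.add_edge("notebook/explore", "produced", "model/abc")

# Which notebook was used to build model xyz?
print(g.upstream("model/xyz", "notebook"))
# Which production models must be updated when dataset/clicks gets new data?
print(g.downstream("dataset/clicks", "model", in_production=True))
```

Both questions reduce to a traversal in one direction or the other, which is why a graph-shaped view of the metadata is a natural fit here. A real metadata layer would express the same traversals in its own query language rather than in application code.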
Challenges in Building Multi-Cloud-Provider Platform With Managed Kubernetes
Building a cloud-agnostic platform used to be a challenging task, as one had to deal with a large number of different cloud APIs and service offerings. Today, as most cloud providers offer a managed Kubernetes solution (e.g., GKE, AKS, or EKS), it seems like developers could simply build a platform based on Kubernetes and be cloud-agnostic. While this assumption is mostly correct, there are still a number of differences and pitfalls when deploying across those managed Kubernetes solutions.
This talk discusses the experience gained while building the ArangoDB Managed Service offering across GKE, AKS, and EKS.
While the (managed) Kubernetes API is a great abstraction from the actual cloud provider, a number of challenges remain, including, for example, networking, autoscaling, cluster provisioning, and node sizing. This talk provides an overview of those challenges and also discusses how they were solved as part of the ArangoDB Managed Service.