Workshop: Serving AI Models at Scale with Nvidia Triton

July 8th, 2021 - 9am PDT / 12pm EDT / 6pm CEST. More info: https://www.arangodb.com/events/serving-ai-models-at-scale/. A Zoom link will be sent to your inbox after registration.

In this workshop, join Machine Learning Research Engineer Sachin Sharma and learn how to use Nvidia's Triton Inference Server (formerly known as TensorRT Inference Server), which simplifies the deployment of AI models at scale in production. We will host multiple trained models (TensorFlow, PyTorch) on the Triton Inference Server to leverage its full potential. Once the models are deployed, we can send inference requests and get back predictions.

Prerequisites

To keep the workshop flowing smoothly, attendees should have the following installed beforehand:

  1. Install Docker (https://docs.docker.com/get-docker/)
  2. Pull the Triton server Docker image from Nvidia NGC: docker pull nvcr.io/nvidia/tritonserver:21.05-py3
  3. Note the image size: 10.6 GB (10-15 minutes to download, depending on your internet connection)
  4. To view the downloaded Docker image: docker images
  5. The repository we will follow throughout the workshop (optional): https://github.com/sachinsharma9780/AI-Enterprise-Workshop-Building-ML-Pipelines
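Once the image is pulled, the server can be started along these lines. This is a sketch: the model-repository path is a placeholder you must replace, and the port numbers are Triton's defaults (8000 HTTP, 8001 gRPC, 8002 metrics).

```shell
# Start Triton, mounting a local model repository into the container.
# /full/path/to/model_repository is a placeholder - point it at your own models.
docker run --rm \
  -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v /full/path/to/model_repository:/models \
  nvcr.io/nvidia/tritonserver:21.05-py3 \
  tritonserver --model-repository=/models
```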

Agenda:
– Introduction to ArangoDB and Nvidia’s Triton Inference Server (Need, features, applications, etc.)
– Setting up Triton Inference server on a local machine
– Deploy your first trained model (TensorFlow) with an application to image classification on the Triton Inference Server
– Deploy almost any Hugging Face PyTorch model with an application to zero-shot text classification on the Triton Inference Server (here we will convert the given PyTorch models into Triton-acceptable formats)
– Once the models are deployed, we can write a Python client-side script to interact with the Triton server (i.e., sending requests and receiving back the predictions)
– Exploring the Python image_client.py script to make an image classification request
– Writing our own client-side script to interact with NLP models
– Triton Metrics
– Storing inference results in ArangoDB using python-arango
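As a taste of the client-side step above, the sketch below builds and sends a request body following Triton's HTTP/REST (KServe v2) inference protocol using only the standard library. The model name, input name, and shape here are hypothetical placeholders for illustration, not the workshop's actual models; in the workshop itself we use the tritonclient Python package, which wraps this protocol.

```python
import json
from urllib import request


def build_infer_payload(input_name, datatype, shape, data):
    """Build a request body for Triton's v2 HTTP inference endpoint."""
    return {
        "inputs": [
            {"name": input_name, "datatype": datatype, "shape": shape, "data": data}
        ]
    }


def infer(server_url, model_name, payload):
    """POST the payload to /v2/models/<name>/infer and return the parsed reply."""
    req = request.Request(
        f"{server_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    # "my_model" and "input__0" are hypothetical names - substitute your own.
    payload = build_infer_payload("input__0", "FP32", [1, 4], [0.1, 0.2, 0.3, 0.4])
    print(infer("http://localhost:8000", "my_model", payload))
```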

About the Presenter:

Sachin is a Machine Learning Research Engineer at ArangoDB whose aim is to build intelligent products through thorough research and engineering in the area of Graph Machine Learning. He completed his Master's degree in Computer Science with a specialization in Intelligent Systems. He is an AI enthusiast who has conducted research in the areas of Computer Vision, NLP, and Graph Neural Networks at DFKI (German Research Centre for AI) during his academic career. Sachin also built Machine Learning pipelines at Define Media GmbH, where he worked as a Machine Learning Engineer and Scientist.

Sachin Sharma
