Deploying AI Deep Learning Models with NVIDIA Triton Inference Server

In the world of machine learning, models are trained on existing data sets and then deployed to run inference on new data. In a previous post, Simplifying and Scaling Inference Serving with NVIDIA Triton 2.3, we discussed the inference workflow and the need for an efficient inference-serving solution. In that post, we introduced Triton Inference Server and its benefits and looked at the new features in version 2.3. Because model deployment is critical to the success of AI in an organization, this post revisits the key benefits of using Triton Inference Server.

Triton is designed as enterprise-class software that is also open source. It supports the following features:

  • Multiple frameworks: Developers and ML engineers can run inference on models from any supported framework, such as TensorFlow, PyTorch, ONNX, TensorRT, or even a custom framework backend. Triton exposes standard HTTP/gRPC endpoints for communication with the AI application, giving engineers flexibility and giving DevOps and MLOps teams a standardized deployment path.
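Triton's HTTP endpoint implements the KServe v2 inference protocol, in which a request is a JSON body posted to `/v2/models/<model_name>/infer`. Below is a minimal sketch of constructing such a request payload in plain Python; the model name, tensor name, shape, and values are hypothetical, chosen only to illustrate the wire format:

```python
import json


def build_infer_request(input_name, shape, datatype, data):
    """Build a KServe v2-style inference request body, as accepted by
    Triton's HTTP endpoint at POST /v2/models/<model_name>/infer."""
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": shape,          # tensor shape, e.g. [1, 4]
                "datatype": datatype,    # e.g. "FP32", "INT64"
                "data": data,            # flattened row-major values
            }
        ]
    }


# Hypothetical request for a model expecting a 1x4 FP32 input tensor
payload = build_infer_request("input__0", [1, 4], "FP32", [0.1, 0.2, 0.3, 0.4])
body = json.dumps(payload)
print(body)
```

In practice, you would send `body` with an HTTP client to the server (by default on port 8000), or use the official `tritonclient` Python package, which wraps both the HTTP and gRPC protocols behind one API.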