Host ML models on Amazon SageMaker using Triton: TensorRT models
AWS Machine Learning Blog
MAY 8, 2023
With kernel auto-tuning, the engine selects the best-performing algorithm for the target GPU, which improves hardware utilization. Overall, TensorRT’s combination of techniques typically results in faster inference and lower latency than other inference engines on NVIDIA GPUs.

Set up the environment

We begin by setting up the required environment.
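
As a minimal sketch of a first environment check, assuming a Python notebook environment: the snippet below reports which packages are not yet installed. The package names `sagemaker`, `boto3`, and `tritonclient`, and the helper `missing_packages`, are illustrative assumptions rather than a list taken from the post; substitute whatever your own setup requires.

```python
import importlib.util

# Assumed package list, for illustration only; adjust to your own setup.
REQUIRED_PACKAGES = ["sagemaker", "boto3", "tritonclient"]


def missing_packages(names):
    """Return the top-level package names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        # Only report; installation is left to the reader's package manager.
        print("Missing packages:", ", ".join(missing))
    else:
        print("All assumed packages are available.")
```

The check only looks packages up and does not import or install them, so it is safe to run before any dependencies are in place.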