
Host ML models on Amazon SageMaker using Triton: TensorRT models

AWS Machine Learning Blog

TensorRT supports the major deep learning frameworks and provides a high-performance inference optimizer and runtime that delivers low-latency, high-throughput inference for AI applications. During engine building, kernel auto-tuning benchmarks candidate implementations and selects the fastest kernels for the target GPU, maximizing hardware utilization.
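As a rough illustration of that build step, the sketch below uses TensorRT's Python API to compile an ONNX model into a serialized engine (a `.plan` file, which Triton can serve from its model repository). This is a minimal sketch, not the post's exact workflow: the file names are assumptions, and running it requires the `tensorrt` package plus an NVIDIA GPU, since auto-tuning happens on the target hardware.

```python
# Hedged sketch: build a TensorRT engine from an ONNX model.
# "model.onnx" and "model.plan" are hypothetical file names.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Workspace memory available during auto-tuning; a larger pool lets
# TensorRT evaluate more candidate kernels for the target GPU.
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

# build_serialized_network runs the kernel auto-tuning and returns
# an optimized engine ("plan") specific to this GPU.
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(bytes(engine_bytes))
```

Because the resulting plan embeds kernels chosen for the GPU it was built on, engines are generally rebuilt per target device rather than copied between different GPU models.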
