
Host ML models on Amazon SageMaker using Triton: TensorRT models

AWS Machine Learning Blog

With kernel auto-tuning, the engine selects the best algorithm for the target GPU, maximizing hardware utilization. The post covers two steps: generating serialized engines from models, and loading the TensorRT engine into Triton Inference Server, where it is used to perform inference on incoming requests.
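As a rough illustration of those two steps, here is a minimal sketch that builds a serialized engine from an ONNX model with the TensorRT Python API. The file names (model.onnx, model.plan), the 1 GiB workspace limit, and the model repository path in the comments are assumptions for illustration, not details taken from the post.

import tensorrt as trt

# Build a serialized TensorRT engine ("plan") from an ONNX model.
# Kernel auto-tuning happens during this build: TensorRT times
# candidate kernels on the target GPU and keeps the fastest ones.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # assumed input model file
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
# Workspace size is an arbitrary choice here (1 GiB).
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

serialized_engine = builder.build_serialized_network(network, config)

# To serve this with Triton, place the plan file in a model repository,
# e.g. model_repository/<model_name>/1/model.plan, next to a config.pbtxt
# that declares platform: "tensorrt_plan".
with open("model.plan", "wb") as f:
    f.write(serialized_engine)

Because kernel timing runs on the machine doing the build, the engine should be built on the same GPU type that Triton will serve from.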


Training large language models on Amazon SageMaker: Best practices

AWS Machine Learning Blog

Storage – We see data loading and checkpointing done in two ways, depending on skills and preferences: with an Amazon FSx for Lustre file system, or with Amazon Simple Storage Service (Amazon S3) only. Resiliency – At scale, hardware failures can happen; in the case of the SageMaker Training API, the compute fleet can be heterogeneous.
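To make the FSx for Lustre option concrete, here is a minimal sketch using the SageMaker Python SDK to point a training job at a Lustre file system. The file system ID, mount path, role ARN, instance settings, and VPC identifiers are hypothetical placeholders, not values from the post.

from sagemaker.inputs import FileSystemInput
from sagemaker.pytorch import PyTorch

# Training data served from FSx for Lustre instead of plain S3.
# All identifiers below are placeholders.
train_input = FileSystemInput(
    file_system_id="fs-0123456789abcdef0",
    file_system_type="FSxLustre",
    directory_path="/abcdef/train",  # Lustre mount name + path
    file_system_access_mode="ro",
)

estimator = PyTorch(
    entry_point="train.py",
    role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",
    instance_type="ml.p4d.24xlarge",
    instance_count=2,
    framework_version="2.0",
    py_version="py310",
    # A file system input requires the job to run inside your VPC.
    subnets=["subnet-0123456789abcdef0"],
    security_group_ids=["sg-0123456789abcdef0"],
)

estimator.fit({"train": train_input})

For the S3-only approach, the same estimator can instead take s3:// URIs in fit(), and checkpointing to S3 can be enabled via the checkpoint_s3_uri parameter, which SageMaker syncs from the container's local checkpoint directory.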