
NLP News Cypher | 07.26.20

Towards AI

GitHub: Tencent/TurboTransformers. Make transformers serving fast by adding a turbo to your inference engine! Transformer Primus: the Liber Primus is unsolved to this day.
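A minimal sketch of what using TurboTransformers can look like, based on the project README; the set_num_threads and from_torch calls are taken from that README, and the input ids are a toy example, not values from the newsletter:

```python
import torch
import transformers
import turbo_transformers  # built from github.com/Tencent/TurboTransformers

# Pin the CPU thread count (API per the project README).
turbo_transformers.set_num_threads(4)

# Load a stock Hugging Face BERT and convert it for fast serving.
model = transformers.BertModel.from_pretrained("bert-base-uncased")
model.eval()
tt_model = turbo_transformers.BertModel.from_torch(model)

# The converted model is called like its Hugging Face counterpart
# (assumed from the README examples); the ids below are a toy input.
input_ids = torch.tensor([[101, 7592, 2088, 102]])  # "[CLS] hello world [SEP]"
output = tt_model(input_ids)
```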


Host ML models on Amazon SageMaker using Triton: TensorRT models

AWS Machine Learning Blog

TensorRT is an SDK from NVIDIA for high-performance deep learning inference. It is optimized for NVIDIA GPUs and accelerates deep learning inference in production environments. Triton Inference Server supports ONNX as a model format.
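As a hedged sketch of hosting a TensorRT model behind Triton on SageMaker with the Python SDK: the image URI, S3 path, model name, and instance type below are placeholders rather than values from the post, and the tarball is assumed to follow Triton's model-repository layout:

```python
import sagemaker
from sagemaker.model import Model

role = sagemaker.get_execution_role()

# Placeholder Triton serving image and model artifact (fill in your own).
triton_image_uri = (
    "<account-id>.dkr.ecr.<region>.amazonaws.com/sagemaker-tritonserver:<tag>"
)
model_data = "s3://<bucket>/triton-models/model.tar.gz"

# The tarball is expected to contain a Triton model repository, e.g.:
#   resnet_trt/config.pbtxt
#   resnet_trt/1/model.plan   <- serialized TensorRT engine
model = Model(
    image_uri=triton_image_uri,
    model_data=model_data,
    role=role,
    env={"SAGEMAKER_TRITON_DEFAULT_MODEL_NAME": "resnet_trt"},
)

# A TensorRT .plan engine is compiled for a specific GPU architecture,
# so the instance type must match the device the engine was built for.
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```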


Build a personalized avatar with generative AI using Amazon SageMaker

AWS Machine Learning Blog

amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117") print(f"Image going to be used is -> {inference_image_uri}") In addition, we need a serving.properties file that configures serving, including the inference engine to use, the location of the model artifact, and dynamic batching.
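A minimal sketch of generating such a serving.properties file; the keys follow DJL Serving conventions (engine, option.s3url, option.tensor_parallel_degree, and the dynamic-batching settings), and every value is a placeholder, not a value from the post:

```python
# Keys follow DJL Serving conventions; all values here are placeholders.
serving_properties = "\n".join([
    "engine=DeepSpeed",                    # inference engine to use
    "option.s3url=s3://<bucket>/model/",   # location of the model artifact
    "option.tensor_parallel_degree=1",     # GPUs per model replica
    "batch_size=4",                        # dynamic batching: max batch size
    "max_batch_delay=100",                 # dynamic batching: max delay (ms)
])

with open("serving.properties", "w") as f:
    f.write(serving_properties)
```

This file is packaged alongside the model artifact so the container picks up the engine choice and batching behavior at startup.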