Remove how-to-serve-machine-learning-models-with-tensorflow-serving-and-docker
article thumbnail

Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart

AWS Machine Learning Blog

As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers to reduce latency and increase responsiveness of their applications. As our sample workload, we deploy a pre-trained model from Amazon SageMaker JumpStart.

BERT 80
article thumbnail

Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances

AWS Machine Learning Blog

When deploying Deep Learning models at scale, it is crucial to effectively utilize the underlying hardware to maximize performance and cost benefits. In this post we walk you through the process of deploying FastAPI model servers on AWS Inferentia devices (found on Amazon EC2 Inf1 and Amazon EC Inf2 instances).

BERT 73
professionals

Sign Up for our Newsletter

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

article thumbnail

Get started with the open-source Amazon SageMaker Distribution

AWS Machine Learning Blog

Data scientists need a consistent and reproducible environment for machine learning (ML) and data science workloads that enables managing dependencies and is secure. AWS Deep Learning Containers already provides pre-built Docker images for training and serving models in common frameworks such as TensorFlow, PyTorch, and MXNet.

article thumbnail

How Patsnap used GPT-2 inference on Amazon SageMaker with low latency and cost

AWS Machine Learning Blog

This blog post was co-authored, and includes an introduction, by Zilong Bai, senior natural language processing engineer at Patsnap. At the same time, Patsnap is embracing the power of machine learning (ML) to develop features that can continuously improve user experiences on the platform. Patent search is one of them.

article thumbnail

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

AWS Machine Learning Blog

The DJL is a deep learning framework built from the ground up to support users of Java and JVM languages like Scala, Kotlin, and Clojure. With the DJL, integrating this deep learning is simple. Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football.

ML 75
article thumbnail

Accelerate Amazon SageMaker inference with C6i Intel-based Amazon EC2 instances

AWS Machine Learning Blog

Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. In the following example figure, we show INT8 inference performance in C6i for a BERT-base model.

BERT 78
article thumbnail

GPT-NeoXT-Chat-Base-20B foundation model for chatbot applications is now available on Amazon SageMaker

AWS Machine Learning Blog

Today we are excited to announce that Together Computer’s GPT-NeoXT-Chat-Base-20B language foundation model is available for customers using Amazon SageMaker JumpStart. GPT-NeoXT-Chat-Base-20B is an open-source model to build conversational bots. You can easily try out this model and use it with JumpStart.