Blog - Artificial Intelligence Zone

Deploy pre-trained models on AWS Wavelength with 5G edge using Amazon SageMaker JumpStart

AWS Machine Learning Blog

APRIL 7, 2023

As one of the most prominent use cases to date, machine learning (ML) at the edge has allowed enterprises to deploy ML models closer to their end-customers to reduce latency and increase responsiveness of their applications. As our sample workload, we deploy a pre-trained model from Amazon SageMaker JumpStart.

BERT

BERT Metadata Natural Language Processing ML

Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances

AWS Machine Learning Blog

JULY 24, 2023

When deploying deep learning models at scale, it is crucial to use the underlying hardware effectively to maximize performance and cost benefits. In this post, we walk you through the process of deploying FastAPI model servers on AWS Inferentia devices (found on Amazon EC2 Inf1 and Amazon EC2 Inf2 instances).

BERT

BERT Deep Learning Python Machine Learning

Get started with the open-source Amazon SageMaker Distribution

AWS Machine Learning Blog

JUNE 8, 2023

Data scientists need a consistent and reproducible environment for machine learning (ML) and data science workloads that enables managing dependencies and is secure. AWS Deep Learning Containers already provides pre-built Docker images for training and serving models in common frameworks such as TensorFlow, PyTorch, and MXNet.

Data Scientist

Data Scientist ML Machine Learning Deep Learning

How Patsnap used GPT-2 inference on Amazon SageMaker with low latency and cost

AWS Machine Learning Blog

JULY 24, 2023

This blog post was co-authored by, and includes an introduction from, Zilong Bai, senior natural language processing engineer at Patsnap. Patsnap is embracing the power of machine learning (ML) to develop features that can continuously improve user experiences on the platform. Patent search is one of them.

Metadata

Metadata Generative AI Natural Language Processing Deep Learning

How Sportradar used the Deep Java Library to build production-scale ML platforms for increased performance and efficiency

AWS Machine Learning Blog

APRIL 19, 2023

The DJL is a deep learning framework built from the ground up to support users of Java and JVM languages like Scala, Kotlin, and Clojure. With the DJL, integrating deep learning is simple. Since 2018, our team has been developing a variety of ML models to enable betting products for NFL and NCAA football.

ML

ML Deep Learning Python Auto-classification

Accelerate Amazon SageMaker inference with C6i Intel-based Amazon EC2 instances

AWS Machine Learning Blog

MARCH 20, 2023

Customers are always looking for ways to improve the performance and response times of their machine learning (ML) inference workloads without increasing the cost per transaction and without sacrificing the accuracy of the results. In the following example figure, we show INT8 inference performance in C6i for a BERT-base model.

BERT

BERT Deep Learning ML Neural Network
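To make "INT8 inference" concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization using only the standard library. It is an illustrative assumption, not the quantization recipe used in the post: floats are mapped to integer codes in [-127, 127] with one scale factor, the dot product runs on the integer codes, and the result is rescaled once at the end. The function names are hypothetical.

```python
# Minimal sketch of symmetric per-tensor INT8 quantization (illustrative only;
# not the scheme used by any specific SageMaker or Intel toolkit).

def quantize_int8(values):
    """Map floats to int8 codes in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate floats from int8 codes."""
    return [c * scale for c in codes]


def int8_dot(a_codes, a_scale, b_codes, b_scale):
    """Dot product on integer codes (hardware would accumulate in int32),
    rescaled to float once at the end."""
    acc = sum(x * y for x, y in zip(a_codes, b_codes))
    return acc * a_scale * b_scale


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.03, 0.9]
    activations = [1.0, 0.25, -0.5, 2.0]
    w_q, w_s = quantize_int8(weights)
    a_q, a_s = quantize_int8(activations)
    exact = sum(w * a for w, a in zip(weights, activations))
    approx = int8_dot(w_q, w_s, a_q, a_s)
    print(f"exact={exact:.4f} int8={approx:.4f}")
```

The integer multiply-accumulate is where INT8 hardware gains speed; the small gap between the exact and INT8 results is the accuracy cost that calibration tries to keep low.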

GPT-NeoXT-Chat-Base-20B foundation model for chatbot applications is now available on Amazon SageMaker

AWS Machine Learning Blog

MAY 16, 2023

Today we are excited to announce that Together Computer’s GPT-NeoXT-Chat-Base-20B language foundation model is available for customers using Amazon SageMaker JumpStart. GPT-NeoXT-Chat-Base-20B is an open-source model to build conversational bots. You can easily try out this model and use it with JumpStart.

Chatbots

Chatbots Machine Learning Algorithm ML

Deep Learning in the Browser - Exploring TF.js, WebDNN and ONNX.js

Shreyansh Singh

JANUARY 24, 2021

After my last post on deploying Machine Learning and Deep Learning models using FastAPI and Docker, I wanted to explore a bit more on deploying deep learning models. My last post discussed a server-side method for deploying the model. It already has a large collection of pretrained models.

Deep Learning

Deep Learning Neural Network Machine Learning ML

Real-World MLOps Examples: End-To-End MLOps Pipeline for Visual Search at Brainly

The MLOps Blog

MARCH 28, 2023

In this second installment of the series “Real-world MLOps Examples,” Paweł Pęczek, Machine Learning Engineer at Brainly, will walk you through the end-to-end Machine Learning Operations (MLOps) process in the Visual Search team at Brainly. Their user base spans more than 35 countries.

Machine Learning

Machine Learning Automation Data Scientist ML

Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2

AWS Machine Learning Blog

DECEMBER 13, 2023

In this post, we showcase fine-tuning a Llama 2 model using a Parameter-Efficient Fine-Tuning (PEFT) method and deploying the fine-tuned model on AWS Inferentia2. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution.

Auto-complete

Auto-complete Machine Learning Deep Learning Python
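QLoRA builds on LoRA, a PEFT method that freezes the original weight matrix W and trains only a low-rank update B·A. The sketch below shows the LoRA forward pass and the parameter savings with plain Python lists. It assumes tiny dense matrices and omits the 4-bit quantization of W that QLoRA adds; the function names are hypothetical, not a library API.

```python
# Minimal sketch of a LoRA-style adapter forward pass (QLoRA would additionally
# store the frozen W in 4-bit precision; that part is omitted here).

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """y = W x + (alpha / r) * B (A x)
    W: d_out x d_in (frozen), A: r x d_in and B: d_out x r (trainable)."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r adapter."""
    return d_out * d_in, r * (d_in + d_out)


if __name__ == "__main__":
    # With B initialized to zeros, the adapted layer starts out identical to W.
    W = [[1.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    B = [[0.0], [0.0]]
    print(lora_forward(W, A, B, [2.0, 3.0], alpha=2, r=1))
    full, lora = param_counts(4096, 4096, 8)
    print(f"full={full:,} lora={lora:,}")
```

For a 4096 x 4096 projection with rank 8, the adapter trains 65,536 parameters instead of 16,777,216, which is why PEFT methods fit on much smaller hardware.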

How to Save Trained Model in Python

The MLOps Blog

MAY 10, 2023

When working on real-world machine learning (ML) use cases, finding the best algorithm/model is not the end of your responsibilities. It is crucial to save, store, and package these models for their future use and deployment to production. Reusability & reproducibility: Building ML models is time-consuming by nature.

Python

Python Metadata ML Deep Learning
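One way to make a saved model reusable and reproducible is to store its learned parameters together with a version tag and verify a checksum when loading. The sketch below assumes a hypothetical toy `LinearModel` and uses JSON; real projects often use `pickle` or `joblib` for arbitrary Python objects, which should only be loaded from trusted files.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class LinearModel:
    """Hypothetical stand-in for a trained model: y = weight * x + bias."""

    def __init__(self, weight: float, bias: float):
        self.weight = weight
        self.bias = bias

    def predict(self, x: float) -> float:
        return self.weight * x + self.bias

    def state(self) -> dict:
        return {"weight": self.weight, "bias": self.bias}


def save_model(model: LinearModel, path: Path, version: str) -> str:
    """Write parameters plus a version tag; return a SHA-256 digest of the file."""
    payload = json.dumps({"version": version, "params": model.state()}, sort_keys=True).encode()
    path.write_bytes(payload)
    return hashlib.sha256(payload).hexdigest()


def load_model(path: Path, expected_sha256: str) -> LinearModel:
    """Refuse to load a file whose contents changed since it was saved."""
    payload = path.read_bytes()
    if hashlib.sha256(payload).hexdigest() != expected_sha256:
        raise ValueError("model file does not match expected checksum")
    params = json.loads(payload)["params"]
    return LinearModel(**params)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "linear-v1.json"
        digest = save_model(LinearModel(2.0, 1.0), path, version="1.0")
        restored = load_model(path, digest)
        print(restored.predict(3.0))
```

Saving parameters rather than the whole object keeps the file independent of the class's module path, at the cost of having to rebuild the object in code when loading.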

ML Model Packaging [The Ultimate Guide]

The MLOps Blog

APRIL 5, 2023

Have you ever spent weeks or months building a machine learning model, only to later find out that deploying it into a production environment is complicated and time-consuming? Or have you struggled to manage multiple versions of a model and keep track of all the dependencies and configurations required for deployment?

ML

ML Machine Learning Deep Learning Metadata

Deploy large language models on AWS Inferentia2 using large model inference containers

AWS Machine Learning Blog

APRIL 10, 2023

You don’t have to be an expert in machine learning (ML) to appreciate the value of large language models (LLMs). ML practitioners keep improving the accuracy and capabilities of these models.

Large Language Models

Large Language Models Deep Learning Software Development LLM

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

As you delve into the landscape of MLOps in 2023, you will find a plethora of tools and platforms that have gained traction and are shaping the way models are developed, deployed, and monitored. Open-source tools have gained significant traction due to their flexibility, community support, and adaptability to various workflows.

Machine Learning

Machine Learning Metadata Data Quality Data Scientist

Serving With TF and GKE: Stable Diffusion

TensorFlow

APRIL 28, 2023

Posted by Chansung Park and Sayak Paul (ML and Cloud GDEs). Generative AI models like Stable Diffusion, which lets anyone generate high-quality images from natural language text prompts, enable different use cases across different industries. An initial noise sample is then fed to the diffusion model along with the text embeddings.

Machine Learning

Machine Learning ML Deep Learning Linked Data

Announcing provisioned concurrency for Amazon SageMaker Serverless Inference

AWS Machine Learning Blog

MAY 9, 2023

Amazon SageMaker Serverless Inference allows you to serve model inference requests in real time without having to explicitly provision compute instances or configure scaling policies to handle traffic variations. This is called a cold start.

Auto-complete

Auto-complete Machine Learning ML Software Development
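The cold-start effect that provisioned concurrency targets can be illustrated with a toy model: requests in a burst that find a pre-warmed environment pay only inference time, while the rest also pay a start-up delay. Both timing constants are hypothetical, and this is not how SageMaker actually schedules requests.

```python
# Toy model of cold starts vs. provisioned (pre-warmed) capacity.
# COLD_START_S and INFERENCE_S are hypothetical numbers for illustration.

COLD_START_S = 2.0   # assumed time to initialize a new environment
INFERENCE_S = 0.1    # assumed model latency on a warm environment


def burst_latencies(concurrent_requests, provisioned_concurrency):
    """Latency of each request in one simultaneous burst: the first
    `provisioned_concurrency` requests use warm capacity, the rest cold-start."""
    latencies = []
    for i in range(concurrent_requests):
        if i < provisioned_concurrency:
            latencies.append(INFERENCE_S)
        else:
            latencies.append(COLD_START_S + INFERENCE_S)
    return latencies


if __name__ == "__main__":
    for provisioned in (0, 2, 4):
        worst = max(burst_latencies(4, provisioned))
        print(f"provisioned={provisioned} worst-case latency={worst:.1f}s")
```

The trade-off is cost: warm capacity is billed whether or not it serves traffic, so it is sized to the expected burst rather than the peak.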

Learnings From Building the ML Platform at Stitch Fix

The MLOps Blog

AUGUST 3, 2023

This article was originally an episode of the ML Platform Podcast, a show where Piotr Niedźwiedź and Aurimas Griciūnas, together with ML platform professionals, discuss design choices, best practices, example tool stacks, and real-world learnings from some of the best ML platform professionals. Stefan: Yeah. What is DAGWorks?

ML

ML Data Scientist Software Engineer Machine Learning

Host ML models on Amazon SageMaker using Triton: ONNX Models

AWS Machine Learning Blog

JUNE 9, 2023

ONNX (Open Neural Network Exchange) is an open-source standard for representing deep learning models that is widely supported by many providers. ONNX provides tools for optimizing and quantizing models to reduce the memory and compute needed to run machine learning (ML) models.

ML

ML Computer Vision NLP Deep Learning

Large Language Models: A Complete Guide

Heartbeat

MAY 29, 2023

These models have the potential to revolutionize industries ranging from customer service to scientific research, but their capabilities and limitations are still not fully understood. We will explore how to better understand the data that these models are trained on, and how to evaluate and optimize them for real-world use.

Large Language Models

Large Language Models LLM Machine Learning Natural Language Processing

Host ML models on Amazon SageMaker using Triton: CV model with PyTorch backend

AWS Machine Learning Blog

MAY 31, 2023

PyTorch is a machine learning (ML) framework based on the Torch library, used for applications such as computer vision and natural language processing. In this post, we dive deep to see how Amazon SageMaker can serve these PyTorch models using NVIDIA Triton Inference Server.

ML

ML Auto-classification Auto-complete Natural Language Processing

Boost productivity on Amazon SageMaker Studio: Introducing JupyterLab Spaces and generative AI tools

AWS Machine Learning Blog

DECEMBER 14, 2023

Amazon SageMaker Studio offers a broad set of fully managed integrated development environments (IDEs) for machine learning (ML) development, including JupyterLab, Code Editor based on Code-OSS (Visual Studio Code Open Source), and RStudio. Both JupyterLab and Code Editor can be launched using a flexible workspace called Spaces.

Generative AI

Generative AI AI Tools ML AI

End-to-End Pipeline for Segmentation with TFX, Google Cloud, and Hugging Face

TensorFlow

JANUARY 18, 2023

Posted by Chansung Park and Sayak Paul (ML and Cloud GDEs). TensorFlow Extended (TFX) is a flexible framework allowing Machine Learning (ML) practitioners to iterate on production-grade ML workflows faster with reliability and resiliency. Finally, you will see how we implemented CI/CD into the mix by leveraging GitHub Actions.

Data Ingestion

Data Ingestion ML DevOps Automation

Financial text generation using a domain-adapted fine-tuned large language model in Amazon SageMaker JumpStart

AWS Machine Learning Blog

APRIL 18, 2023

Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more.

Large Language Models

Large Language Models LLM NLP Deep Learning

Build high-performance ML models using PyTorch 2.0 on AWS – Part 1

AWS Machine Learning Blog

JUNE 6, 2023

PyTorch is a machine learning (ML) framework that is widely used by AWS customers for a variety of applications, such as computer vision, natural language processing, content creation, and more. The following figure shows a performance benchmark of fine-tuning a RoBERTa model on Amazon EC2 p4d.24xlarge. With the recent PyTorch 2.0

ML

ML Deep Learning BERT Python

Domain-adaptation Fine-tuning of Foundation Models in Amazon SageMaker JumpStart on Financial data

AWS Machine Learning Blog

APRIL 18, 2023

Large language models (LLMs) with billions of parameters are currently at the forefront of natural language processing (NLP). These models are shaking up the field with their incredible abilities to generate text, analyze sentiment, translate languages, and much more.

LLM

LLM NLP Deep Learning ML

How to Build an End-To-End ML Pipeline

The MLOps Blog

MAY 9, 2023

One of the most prevalent complaints we hear from ML engineers in the community is how costly and error-prone it is to manually go through the ML workflow of building and deploying models. Building end-to-end machine learning pipelines lets ML engineers build once, rerun, and reuse many times.

ML

ML Machine Learning Metadata Automation

Definitive Guide to Building a Machine Learning Platform

The MLOps Blog

MARCH 21, 2023

Moving across the typical machine learning lifecycle can be a nightmare. From gathering and processing data to building models through experiments, deploying the best ones, and managing them at scale for continuous value in production—it’s a lot. How to understand your users (data scientists, ML engineers, etc.).

Machine Learning

Machine Learning Data Scientist ML Metadata

Deploying ML Models on GPU With Kyle Morris

The MLOps Blog

DECEMBER 29, 2022

Every episode is focused on one specific ML topic, and during this one, we talked to Kyle Morris from Banana about deploying models on GPU. Who is an expert in today’s topic, which is deploying models on GPU. How would you explain deploying models on GPU in one minute? Kyle, to warm you up a little bit.

ML

ML Auto-complete Machine Learning Python

Run ML inference on unplanned and spiky traffic using Amazon SageMaker multi-model endpoints

AWS Machine Learning Blog

FEBRUARY 19, 2024

Amazon SageMaker multi-model endpoints (MMEs) are a fully managed capability of SageMaker inference that allows you to deploy thousands of models on a single endpoint. In this post, we discuss a solution in which an MME can dynamically adjust the compute power assigned to each model based on the model’s traffic pattern.

ML

ML Auto-complete Deep Learning Software Engineer

Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs

AWS Machine Learning Blog

SEPTEMBER 6, 2023

Multi-model endpoints (MMEs) are a powerful feature of Amazon SageMaker designed to simplify the deployment and operation of machine learning (ML) models. With MMEs, you can host multiple models on a single serving container and host all the models behind a single endpoint.

Generative AI

Generative AI AI Modeling Deep Learning ML
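The core idea behind multi-model endpoints, many models sharing one serving process and loaded only when requested, can be sketched as an in-process router with least-recently-used eviction. SageMaker documents unloading unused models when memory runs low; this toy only mirrors that idea and is not the SageMaker MME or TorchServe implementation. The `loader` function and the capacity limit are hypothetical.

```python
from collections import OrderedDict


class MultiModelHost:
    """Toy multi-model host: loads models on first use and evicts the
    least recently used one when `capacity` models are already resident."""

    def __init__(self, loader, capacity):
        self.loader = loader          # hypothetical: model name -> callable model
        self.capacity = capacity      # max models kept in memory at once
        self.loaded = OrderedDict()   # name -> model, oldest use first
        self.load_count = 0

    def invoke(self, name, payload):
        if name in self.loaded:
            self.loaded.move_to_end(name)        # mark as most recently used
        else:
            if len(self.loaded) >= self.capacity:
                self.loaded.popitem(last=False)  # evict least recently used
            self.loaded[name] = self.loader(name)
            self.load_count += 1
        return self.loaded[name](payload)


if __name__ == "__main__":
    host = MultiModelHost(loader=lambda name: (lambda x: f"{name}:{x}"), capacity=2)
    for model_name in ["a", "b", "a", "c", "b"]:
        print(host.invoke(model_name, 1))
    print(list(host.loaded), host.load_count)
```

Frequently used models stay warm while rarely used ones pay a load delay on their next request, which is the cost-versus-latency trade-off multi-model hosting makes.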

Model hosting patterns in Amazon SageMaker, Part 1: Common design patterns for building ML applications on Amazon SageMaker

AWS Machine Learning Blog

JANUARY 9, 2023

Machine learning (ML) applications are complex to deploy and often require the ability to hyper-scale, and have ultra-low latency requirements and stringent cost budgets. Deploying ML models at scale with optimized cost and compute efficiencies can be a daunting and cumbersome task. Single-model based ML applications.

ML

ML Auto-complete Auto-classification Deep Learning

Deploy thousands of model ensembles with Amazon SageMaker multi-model endpoints on GPU to minimize your hosting costs

AWS Machine Learning Blog

AUGUST 8, 2023

Recent scientific breakthroughs in deep learning (DL), large language models (LLMs), and generative AI are allowing customers to use advanced state-of-the-art solutions with almost human-like performance. However, in addition to model invocation, those DL applications often entail preprocessing or postprocessing in an inference pipeline.

BERT

BERT Deep Learning Auto-classification ML
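An inference pipeline that wraps the model call with preprocessing and postprocessing can be sketched as simple function composition. The tokenizer, the keyword-counting "model", and the label set below are hypothetical stand-ins; on GPU endpoints these stages would typically be separate models in an ensemble rather than Python functions.

```python
from typing import Callable, List


def make_pipeline(*steps: Callable) -> Callable:
    """Chain steps so each one's output feeds the next."""
    def run(x):
        for step in steps:
            x = step(x)
        return x
    return run


def preprocess(text: str) -> List[str]:
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def toy_model(tokens: List[str]) -> List[float]:
    """Hypothetical classifier returning [negative, positive] scores."""
    positive = sum(t in {"good", "great"} for t in tokens)
    negative = sum(t in {"bad", "poor"} for t in tokens)
    total = positive + negative or 1
    return [negative / total, positive / total]


def postprocess(scores: List[float]) -> str:
    """Map the highest-scoring index to a label."""
    labels = ["negative", "positive"]
    return labels[max(range(len(scores)), key=scores.__getitem__)]


classify = make_pipeline(preprocess, toy_model, postprocess)

if __name__ == "__main__":
    print(classify("Great product, good value"))
    print(classify("bad service"))
```

Keeping each stage as its own unit lets the stages be deployed, scaled, or swapped independently, which is the motivation for hosting them as an ensemble.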
