Blog - Artificial Intelligence Zone

LLM Inference Performance Engineering: Best Practices

databricks

OCTOBER 12, 2023

In this blog post, the MosaicML engineering team shares best practices for how to capitalize on popular open source large language models (LLMs).

LLM

LLM Large Language Models

Establishing an AI/ML center of excellence

AWS Machine Learning Blog

MAY 9, 2024

They establish and enforce best practices encompassing design, development, processes, and governance operations, thereby mitigating risks and making sure robust business, technical, and governance frameworks are consistently upheld.

ML

ML Generative AI AI

Guest Post: Evaluating LLM Applications*

TheSequence

MARCH 11, 2024

To successfully build an AI application, evaluating the performance of large language models (LLMs) is crucial. Given the inherent novelty and complexities surrounding LLMs, this poses a unique challenge for most companies. Today, Peter shares his insights on LLM evaluations.

LLM

LLM Categorization Software Development Prompt Engineer

Personalize your generative AI applications with Amazon SageMaker Feature Store

AWS Machine Learning Blog

OCTOBER 6, 2023

Large language models (LLMs) are revolutionizing fields like search engines, natural language processing (NLP), healthcare, robotics, and code generation. The personalization of LLM applications can be achieved by incorporating up-to-date user information, which typically involves integrating several components.

Generative AI

Generative AI LLM Natural Language Processing Metadata

A Guide to LLMOps: Large Language Model Operations

Heartbeat

JANUARY 9, 2024

The smooth deployment, continuous monitoring, and effective maintenance of LLMs within production systems are major concerns in the field of LLMOps. Solving these concerns entails creating procedures and techniques to guarantee that these potent language models perform as intended and provide accurate results in practical applications.

Large Language Models

Large Language Models Natural Language Processing LLM Machine Learning

Pinterest's Text to SQL system through LLMs!

Bugra Akyildiz

APRIL 20, 2024

Now, back to the original programming. Articles: Pinterest wrote a blog post on generating SQL queries from text. The initial version of the Text-to-SQL solution relied on an LLM as the core component. This information, along with the specific SQL dialect used at Pinterest, would then be compiled into a prompt and fed into the LLM.

LLM

LLM Natural Language Processing NLP Big Data

Foundational data protection for enterprise LLM acceleration with Protopia AI

AWS Machine Learning Blog

DECEMBER 5, 2023

New and powerful large language models (LLMs) are changing businesses rapidly, improving efficiency and effectiveness for a variety of enterprise use cases. Speed is of the essence, and adoption of LLM technologies can make or break a business’s competitive advantage.

LLM

LLM AI Generative AI

What Is Retrieval-Augmented Generation?

NVIDIA

NOVEMBER 15, 2023

The paper, with coauthors from the former Facebook AI Research (now Meta AI), University College London and New York University, called RAG “a general-purpose fine-tuning recipe” because it can be used by nearly any LLM to connect with practically any external resource. Another great advantage of RAG is it’s relatively easy.

LLM

LLM Generative AI AI Modeling Neural Network

Beyond prompting: getting production quality LLM performance with Snorkel Flow

Snorkel AI

AUGUST 9, 2023

However, as enterprises begin to look beyond proof-of-concept demos and toward deploying LLM-powered applications on business-critical use cases, they’re learning that these models (often appropriately called “foundation models”) are truly foundations, rather than the entire house.

LLM

LLM Prompt Engineer Prompt Engineering Artificial Intelligence

FMOps/LLMOps: Operationalize generative AI and differences with MLOps

AWS Machine Learning Blog

SEPTEMBER 1, 2023

Furthermore, we deep dive on the most common generative AI use case of text-to-text applications and LLM operations (LLMOps), a subset of FMOps. Data science team – Data scientists need to focus on creating the best model based on predefined key performance indicators (KPIs) working in notebooks.

Generative AI

Generative AI Prompt Engineer Prompt Engineering AI

Deploy large language models for a healthtech use case on Amazon SageMaker

AWS Machine Learning Blog

FEBRUARY 6, 2024

In this solution, we fine-tune a variety of models on Hugging Face that were pre-trained on medical data and use the BioBERT model, which was pre-trained on the PubMed dataset and performs the best out of those tried. We implemented the solution using the AWS Cloud Development Kit (AWS CDK).

Large Language Models

Large Language Models BERT NLP Data Scientist

Zero-shot and few-shot prompting for the BloomZ 176B foundation model with the simplified Amazon SageMaker JumpStart SDK

AWS Machine Learning Blog

AUGUST 14, 2023

With SageMaker JumpStart, ML practitioners can choose from a growing list of best-performing, publicly available foundation models (FMs) such as BLOOM, Llama 2, Falcon-40B, Stable Diffusion, OpenLLaMA, Flan-T5/UL2, or FMs from Cohere and LightOn. You can also access the foundation models through Amazon SageMaker Studio.

NLP

NLP Robotics Prompt Engineer Prompt Engineering

Exploring summarization options for Healthcare with Amazon SageMaker

AWS Machine Learning Blog

AUGUST 1, 2023

Fine-tuning is the process by which a pre-trained model is given another more domain-specific dataset in order to enhance its performance on a specific task. Jurassic-2 Grande Instruct is a large language model (LLM) by AI21 Labs, optimized for natural language instructions and applicable to various language tasks.

ML

ML Large Language Models NLP Natural Language Processing

Optimize AWS Inferentia utilization with FastAPI and PyTorch models on Amazon EC2 Inf1 & Inf2 instances

AWS Machine Learning Blog

JULY 24, 2023

When deploying Deep Learning models at scale, it is crucial to effectively utilize the underlying hardware to maximize performance and cost benefits. Our objective is to achieve highest performance at lowest cost through maximum utilization of the hardware. This allows us to handle more inference requests with fewer accelerators.

BERT

BERT Deep Learning Python Machine Learning

The Full Story of Large Language Models and RLHF

AssemblyAI

MAY 3, 2023

A New Era of Language Intelligence: At its essence, ChatGPT belongs to a class of AI systems called Large Language Models, which can perform an outstanding variety of cognitive tasks involving natural language. As it turns out, the effectiveness of LMs in performing various tasks is largely influenced by the size of their architectures.

Large Language Models

Large Language Models Neural Network LLM Chatbots

Optimize generative AI workloads for environmental sustainability

AWS Machine Learning Blog

SEPTEMBER 21, 2023

In particular, we provide practical best practices for different customization scenarios, including training models from scratch, fine-tuning with additional data using full or parameter-efficient techniques, Retrieval Augmented Generation (RAG), and prompt engineering.

Generative AI

Generative AI Prompt Engineer Prompt Engineering Deep Learning

Fine-tune Llama 2 using QLoRA and Deploy it on Amazon SageMaker with AWS Inferentia2

AWS Machine Learning Blog

DECEMBER 13, 2023

We use the AWS Neuron software development kit (SDK) to access the AWS Inferentia2 device and benefit from its high performance. We then use a large model inference container powered by Deep Java Library (DJLServing) as our model serving solution. In this post, we use the Large Model Inference Container for Neuron.

Auto-complete

Auto-complete Machine Learning Deep Learning Python

Information extraction with LLMs using Amazon SageMaker JumpStart

AWS Machine Learning Blog

MAY 7, 2024

What makes LLMs so transformative, however, is their ability to achieve state-of-the-art results on these common tasks with minimal data and simple prompting, and their ability to multitask. This post walks through examples of building information extraction use cases by combining LLMs with prompt engineering and frameworks such as LangChain.

Prompt Engineer

Prompt Engineer Prompt Engineering Large Language Models LLM

LLMOps: What It Is, Why It Matters, and How to Implement It

The MLOps Blog

MARCH 12, 2024

TL;DR: LLMOps involves managing the entire lifecycle of Large Language Models (LLMs), including data and prompt management, model fine-tuning and evaluation, pipeline orchestration, and LLM deployment. Retrieval Augmented Generation (RAG) enables LLMs to extract and synthesize information like an advanced search engine.

Prompt Engineer

Prompt Engineer Prompt Engineering LLM Large Language Models

Gemma is now available in Amazon SageMaker JumpStart

AWS Machine Learning Blog

MARCH 13, 2024

It achieves better performance compared to other publicly available models of similar or larger scales across different domains, including question answering, commonsense reasoning, mathematics and science, and coding. You will also find a Deploy button, which takes you to a landing page where you can test inference with an example payload.

Large Language Models

Large Language Models LLM Machine Learning Python

Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents

AWS Machine Learning Blog

SEPTEMBER 14, 2023

The following risks and limitations are associated with LLM-based queries that a RAG approach with Amazon Kendra addresses: Hallucinations and traceability: LLMs are trained on large data sets and generate responses based on probabilities. Running LLMs can require substantial computational resources, which may increase operational costs.

Generative AI

Generative AI LLM Large Language Models Software Engineer

This AI newsletter is all you need #97

Towards AI

APRIL 30, 2024

In the popular LMSYS LLM arena and leaderboard, Llama 3 70B scores second only to the latest GPT-4 Turbo on English text-based prompts. For the smaller 3.8B Phi-3 model from Microsoft, feedback has been mixed, with more skepticism about its real-world performance relative to benchmarks.

AI

AI LLM Prompt Engineer

A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction

AWS Machine Learning Blog

SEPTEMBER 27, 2023

The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. These requests range from simple retrieval of baseline design values, to review of value engineering proposals, to analysis of reports and compliance checks.

Generative AI

Generative AI LLM AI

Your guide to generative AI and ML at AWS re:Invent 2023

AWS Machine Learning Blog

NOVEMBER 22, 2023

Hear best practices for using unstructured (video, image, PDF), semi-structured (Parquet), and table-formatted (Iceberg) data for training, fine-tuning, checkpointing, and prompt engineering. Join this session to learn which FM is best suited for your use case. Reserve your seat now!

ML

ML Generative AI Prompt Engineer Prompt Engineering

Intelligent document processing with Amazon Textract, Amazon Bedrock, and LangChain

AWS Machine Learning Blog

OCTOBER 24, 2023

Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) through easy-to-use APIs. You can use LLMs in one or all phases of IDP depending on the use case and desired outcome. In this architecture, LLMs are used to perform specific tasks within the IDP workflow.

IDP

IDP LLM Prompt Engineer Prompt Engineering

How Thomson Reuters developed Open Arena, an enterprise-grade large language model playground, in under 6 weeks

AWS Machine Learning Blog

AUGUST 16, 2023

In this post, we discuss how Thomson Reuters Labs created Open Arena, Thomson Reuters’s enterprise-wide large language model (LLM) playground that was developed in collaboration with AWS. These comprehensive tools were instrumental in ensuring the fast and seamless deployment of our LLMs. Can these models handle long documents?

Large Language Models

Large Language Models Machine Learning Generative AI LLM

Model management for LoRA fine-tuned models using Llama2 and Amazon SageMaker

AWS Machine Learning Blog

NOVEMBER 14, 2023

In this post, we walk through best practices for managing LoRA fine-tuned models on Amazon SageMaker to address this emerging question. Working with FMs on SageMaker Model Registry: In this post, we walk through an end-to-end example of fine-tuning the Llama2 large language model (LLM) using the QLoRA method.

LLM

LLM ML Natural Language Processing Machine Learning

Llama Guard is now available in Amazon SageMaker JumpStart

AWS Machine Learning Blog

DECEMBER 20, 2023

Llama Guard provides input and output safeguards in large language model (LLM) deployment. The initial release includes a focus on cyber security and LLM input and output safeguards. Llama Guard model: Llama Guard is a new model from Meta that provides input and output guardrails for LLM deployments.

Machine Learning

Machine Learning Algorithm LLM ML

Develop and train large models cost-efficiently with Metaflow and AWS Trainium

AWS Machine Learning Blog

APRIL 29, 2024

In 2024, however, organizations are using large language models (LLMs), which require relatively little focus on NLP, shifting research and development from modeling to the infrastructure needed to support LLM workflows. This often means the method of using a third-party LLM API won’t do for security, control, and scale reasons.

ML

ML Python Data Scientist Deep Learning

Advanced RAG patterns on Amazon SageMaker

AWS Machine Learning Blog

MARCH 28, 2024

With the advancements being made with LLMs like Mixtral-8x7B Instruct, a derivative of architectures such as the mixture of experts (MoE), customers are continuously looking for ways to improve the performance and accuracy of generative AI applications while allowing them to effectively use a wider range of closed and open source models.

LLM

LLM Auto-complete Auto-classification Generative AI

Improve performance of Falcon models with Amazon SageMaker

AWS Machine Learning Blog

OCTOBER 11, 2023

Despite the abundance of options for serving LLMs, this is a hard question to answer due to the size of the models, varying model architectures, performance requirements of applications, and more. The LMI container has a powerful serving stack called DJL serving that is agnostic to the underlying LLM.

Auto-complete

Auto-complete LLM Deep Learning Machine Learning

Create a Generative AI Gateway to allow secure and compliant consumption of foundation models

AWS Machine Learning Blog

SEPTEMBER 28, 2023

For example: The state-of-the-art (SOTA) of models, architectures, and best practices are constantly changing. This means companies need loose coupling between app clients (model consumers) and model inference endpoints, which ensures easy switch among large language model (LLM), vision, or multi-modal endpoints if needed.

Generative AI

Generative AI Metadata AI

Optimize deployment cost of Amazon SageMaker JumpStart foundation models with Amazon SageMaker asynchronous endpoints

AWS Machine Learning Blog

SEPTEMBER 5, 2023

For example, the TII Falcon-40B Instruct model requires at least an ml.g5.12xlarge instance to be loaded into memory successfully, but performs best with bigger instances. To learn more about the different deployment options, refer to Deploy models for Inference.

Auto-complete

Auto-complete Python Computer Vision Large Language Models

MLOps Landscape in 2023: Top Tools and Platforms

The MLOps Blog

JUNE 27, 2023

Alignment to other tools in the organization’s tech stack: Consider how well the MLOps tool integrates with your existing tools and workflows, such as data sources, data engineering platforms, code repositories, CI/CD pipelines, monitoring systems, etc.

Machine Learning

Machine Learning Metadata Data Quality Data Scientist

Best prompting practices for using the Llama 2 Chat LLM through Amazon SageMaker JumpStart

AWS Machine Learning Blog

NOVEMBER 15, 2023

Llama 2 demonstrates the potential of large language models (LLMs) through its refined abilities and precisely tuned performance. In this post, we explore best practices for prompting the Llama 2 Chat LLM. We highlight key prompt design approaches and methodologies by providing practical examples.

LLM

LLM Large Language Models Chatbots Generative AI

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

AWS Machine Learning Blog

APRIL 16, 2024

They then use SQL to explore, analyze, visualize, and integrate data from various sources before using it in their ML training and inference. This new feature enables you to perform various functions. For security best practices, it’s recommended to use Secrets Manager to securely store sensitive information such as passwords.

Data Scientist

Data Scientist Generative AI ML Machine Learning

Large Language Models: A Complete Guide

Heartbeat

MAY 29, 2023

LLMs like GPT-3 and T5 have already shown promising results in various NLP tasks such as language translation, question-answering, and summarization. However, LLMs are complex, and training and improving them require specific skills and knowledge. LLMs rely on vast amounts of text data to learn patterns and generate coherent text.

Large Language Models

Large Language Models LLM Machine Learning Natural Language Processing

Intelligent video and audio Q&A with multilingual support using LLMs on Amazon SageMaker

AWS Machine Learning Blog

AUGUST 15, 2023

Whisper is a multitasking speech recognition model that can perform multilingual speech recognition, speech translation, and language identification. To handle large audio data, we adopt transformers.pipeline to run inference with Whisper. You can refer to the following GitHub example when choosing this option.

Chatbots

Chatbots Metadata LLM Generative AI

Simple guide to training Llama 2 with AWS Trainium on Amazon SageMaker

AWS Machine Learning Blog

MAY 1, 2024

Llama2 by Meta is an example of an LLM offered by AWS. In this post, we explore how you can use the Neuron distributed training library to fine-tune, continuously pre-train, and reduce the cost of training LLMs such as Llama 2 with AWS Trainium instances on Amazon SageMaker.

Auto-complete

Auto-complete ML Deep Learning Generative AI

Choosing the Right Prompt for Language Models: A Key to Task-Specific Performance

Heartbeat

DECEMBER 20, 2023

This level of interaction is made possible through prompt engineering, a fundamental aspect of fine-tuning language models. By carefully choosing prompts, we can shape their behavior and enhance their performance in specific tasks. In contrast, general-purpose prompts are versatile and can be applied across various tasks and domains.

Prompt Engineer

Prompt Engineer Prompt Engineering Deep Learning Natural Language Processing

Achieve high performance with lowest cost for generative AI inference using AWS Inferentia2 and AWS Trainium on Amazon SageMaker

AWS Machine Learning Blog

MAY 4, 2023

However, their increasing complexity also comes with high costs for inference and a growing need for powerful compute resources. The high cost of inference for generative AI models can be a barrier to entry for businesses and researchers with limited resources, necessitating the need for more efficient and cost-effective solutions.

Generative AI

Generative AI Deep Learning Machine Learning Python

Automate Amazon SageMaker Pipelines DAG creation

AWS Machine Learning Blog

FEBRUARY 29, 2024

The framework code and examples presented here only cover model training pipelines, but can be readily extended to batch inference pipelines as well. Reproducibility – With a predefined configuration file, data scientists and ML engineers can reproduce the entire workflow, achieving consistent results across multiple runs and environments.

Automation

Automation Python Machine Learning ML

The Shift from Models to Compound AI Systems

BAIR

FEBRUARY 17, 2024

LLM

LLM Neural Network AI
