Emily Webber of AWS on Pretraining Large Language Models

ODSC - Open Data Science
4 min readAug 4, 2023

As new fields emerge within data science and the research remains hard to grasp, it's often best to talk to the experts and pioneers of the field. Recently, we spoke with Emily Webber, Principal Machine Learning Specialist Solutions Architect at AWS. She's the author of "Pretrain Vision and Large Language Models in Python: End-to-end techniques for building and deploying foundation models on AWS." In the interview, we discussed pretraining vision and large language models (LLMs) in Python. You can listen to the full Lightning Interview here, and read the transcript of two interesting questions with Emily Webber below.

Q: LLMs didn’t pick up in popularity until late 2022. What gave you the idea to start writing this book before the rise of LLMs?

Emily Webber: To me, the exciting moment was the scaling laws, more than anything else. Obviously, we care about interacting with LLMs and seeing really high-quality language coming out of these models, but I was really moved by the scaling laws more than anything.

In machine learning, so much of our work is experimental. We try one thing, we get accuracy, we evaluate the results, and then we try and try again. It’s incredibly iterative and experimental, but there’s also this degree of uncertainty where there’s just not really a good way of knowing how well your model will perform after a certain period.

And so when I saw the scaling laws by Jared Kaplan back in early 2020, to me, that was actually the shift, because essentially, the scaling laws give you a way to estimate the performance of your model. It’s literally an equation where you can say, “Oh, here’s how many accelerators I have, here’s how large my data set is, and here’s my model, so what’s my accuracy going to be?” And when you have that equation, you can much more easily experiment and quantify how good your model is going to be.
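For readers who want to see what such an equation looks like, here is a minimal Python sketch of the combined parameter-and-data scaling law from Kaplan et al. (2020). The exponents and constants are approximate values in the spirit of that paper, and the example model sizes are purely illustrative; this is not code from Webber's book.

```python
# Minimal sketch of a Kaplan-style scaling law L(N, D): test loss as a power law
# in model size N (parameters) and dataset size D (tokens).
# The constants below are illustrative approximations, not authoritative values.

def estimated_loss(n_params: float, n_tokens: float,
                   alpha_n: float = 0.076, alpha_d: float = 0.095,
                   n_c: float = 8.8e13, d_c: float = 5.4e13) -> float:
    """L(N, D) = [ (N_c / N)^(alpha_N / alpha_D) + D_c / D ]^alpha_D"""
    return ((n_c / n_params) ** (alpha_n / alpha_d) + d_c / n_tokens) ** alpha_d

# Compare two hypothetical training runs before spending any compute:
# a ~1.3B-parameter model and a ~13B-parameter model on the same 300B-token dataset.
print(f"~1.3B params: estimated loss {estimated_loss(1.3e9, 3.0e11):.3f}")
print(f"~13B params:  estimated loss {estimated_loss(1.3e10, 3.0e11):.3f}")
```

The point is not the specific numbers but the workflow: you can rank candidate model and data budgets on paper before committing GPU hours.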

Basically, I saw that, and it pushed me to reevaluate my machine learning journey and the way I approached ML. Again and again, that came up in a variety of ways as models got larger and optimization stacks got better. And then I spent many years working with customers. At AWS I was working with customers who were doing their own large-scale modeling projects way before it was cool, because they saw the benefits; they could also see this trend. That led me to believe that this really is the future.

Q: What are some other advancements in AI that are worth paying attention to?

Emily Webber: In some ways, if you look at the most interesting, state-of-the-art performance in AI over more than the last decade, honestly a lot of it has to do with scale. A lot of it comes down to building a really great distributed system, using techniques to optimize your data sets at really large scales and to optimize your neural networks and your models at really large scales.

Richard Sutton, who’s obviously considered the father of reinforcement learning, wrote a famous blog post in 2019 that he called The Bitter Lesson. In his bitter lesson, which I discuss in great detail in the book along with the scaling laws, Sutton essentially throws up his hands and says, “What we’ve learned over the last 70 years of AI research is that what’s ultimately most impactful is what uses the most compute.”

To me, more than anything, it’s an efficiency game. Foundation models are powerful because they’re just more efficient. Instead of going after hundreds and thousands of these tiny little trees, or logistic regressions, or boosted models, or RNNs or CNNs, rather than N number of models, let’s just create one massive model that does all the things, that handles all the use cases, that has all of the high accuracy. Let’s front-load it, let’s go big on creating this model, but then we can use it for everything.
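As a concrete, purely illustrative example of that "one model for everything" idea, the snippet below reuses a single public pretrained checkpoint for two tasks that would traditionally each get their own small classifier. It uses the Hugging Face zero-shot classification pipeline with a common public model; this is a sketch of the pattern, not code from Webber's book.

```python
# Illustrative sketch: one pretrained model reused across tasks that would
# otherwise each require training a separate small model.
# Requires: pip install transformers torch
from transformers import pipeline

# A single NLI checkpoint handles arbitrary label sets via zero-shot classification.
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# Task 1: route a support ticket -- no ticket-routing model was ever trained.
print(classifier("My invoice was charged twice this month.",
                 candidate_labels=["billing", "technical issue", "cancellation"]))

# Task 2: tag a news headline with the same model and a different label set.
print(classifier("The central bank raised interest rates again.",
                 candidate_labels=["sports", "finance", "politics"]))
```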

With SageMaker, working with customers at Amazon, it’s tough enough to take a machine learning project all the way from ideation, to scaling, to operationalizing, to product life cycle and management. Foundation models are powerful because they’re a more efficient, more streamlined use of those resources. Once I saw that come to life through my work at AWS, I became convinced that this was just unambiguously the direction forward.

How to learn more about large language models

If you haven’t already gotten started with large language models or you want to further your existing expertise, then ODSC West is the conference for you. This October 30th to November 2nd, you can check out dozens of sessions related to NLP, large language models, and more. Here are a few confirmed sessions with plenty more to come:

  • Personalizing LLMs with a Feature Store: Jim Dowling | CEO | Hopsworks
  • Evaluation Techniques for Large Language Models: Rajiv Shah, PhD | Machine Learning Engineer | Hugging Face
  • Understanding the Landscape of Large Models: Lukas Biewald | CEO and Co-founder | Weights & Biases

Don’t delay getting your ticket! 60% off ends soon! Register here.

Originally posted on OpenDataScience.com

Read more data science articles on OpenDataScience.com, including tutorials and guides from beginner to advanced levels! Subscribe to our weekly newsletter here and receive the latest news every Thursday. You can also get data science training on-demand wherever you are with our Ai+ Training platform. Subscribe to our fast-growing Medium Publication too, the ODSC Journal, and inquire about becoming a writer.

