
Active learning is the future of generative AI: Here’s how to leverage it


Image Credits: Andriy Onufriyenko / Getty Images

Eric Landau

Contributor

Before Eric Landau co-founded Encord, he spent nearly a decade at DRW, where he was lead quantitative researcher on a global equity delta one desk and put thousands of models into production. He holds an S.M. in Applied Physics from Harvard University, an M.S. in Electrical Engineering and a B.S. in Physics from Stanford University.


During the past six months, we have witnessed some incredible developments in AI. The release of Stable Diffusion forever changed the art world, and ChatGPT shook up the internet with its ability to write songs, mimic research papers and provide thorough and seemingly intelligent answers to commonly Googled questions.

These advancements in generative AI offer further evidence that we’re on the precipice of an AI revolution.

However, most of these generative AI models are foundation models: high-capacity, self-supervised systems that train on vast amounts of data and cost millions of dollars in compute to build. Currently, only well-funded institutions with access to massive amounts of GPU power are capable of building these models.

The majority of companies developing the application-layer AI that’s driving the widespread adoption of the technology still rely on supervised learning, using large volumes of labeled training data. Despite the impressive feats of foundation models, we’re still in the early days of the AI revolution, and numerous bottlenecks are holding back the proliferation of application-layer AI.

Downstream of the well-known data labeling problem exist additional data bottlenecks that will hinder the development of later-stage AI and its deployment to production environments.

These problems are why, despite the early promise and floods of investment, technologies like self-driving cars have been just one year away since 2014.

These exciting proof-of-concept models perform well on benchmarked datasets in research environments, but they struggle to predict accurately when released in the real world. A major problem is that the technology struggles to meet the higher performance threshold required in high-stakes production environments and fails to hit important benchmarks for robustness, reliability and maintainability.

For instance, these models often can’t handle outliers and edge cases, so self-driving cars mistake reflections of bicycles for bicycles themselves. They aren’t reliable or robust, so a robot barista makes a perfect cappuccino two out of every five times but spills the cup the other three.

As a result, the AI production gap, the gap between “that’s neat” and “that’s useful,” has been much larger and more formidable than ML engineers first anticipated.

Fortunately, as more and more ML engineers have embraced a data-centric approach to AI development, the implementation of active learning strategies has been on the rise. The most sophisticated companies will leverage this technology to leapfrog the AI production gap and build models capable of running in the wild more quickly.

What is active learning?

Active learning makes training a supervised model an iterative process. The model trains on an initial subset of labeled data from a large dataset. Then, it tries to make predictions on the rest of the unlabeled data based on what it has learned. ML engineers evaluate how certain the model is in its predictions and, by using a variety of acquisition functions, can quantify the performance benefit added by annotating one of the unlabeled samples.

By expressing uncertainty in its predictions, the model is deciding for itself what additional data will be most useful for its training. In doing so, it asks annotators to provide more examples of only that specific type of data so that it can train more intensively on that subset during its next round of training. Think of it like quizzing a student to figure out where their knowledge gaps are. Once you know which problems they’re getting wrong, you can provide them with textbooks, presentations and other materials so that they can target their learning to better understand that particular aspect of the subject.

With active learning, training a model moves from being a linear process to a circular one with a strong feedback loop.
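
To make that feedback loop concrete, here is a minimal, runnable sketch of one round-based active learning loop using scikit-learn on synthetic data. The logistic regression model, the entropy-based acquisition function, the 100-sample labeling budget and the 95% accuracy target are all illustrative assumptions, not a prescription for any particular stack.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic "unlabeled" pool plus a held-out validation set.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_pool, X_val, y_pool, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Start with a small seed of labeled examples.
rng = np.random.default_rng(0)
labeled = np.zeros(len(X_pool), dtype=bool)
labeled[rng.choice(len(X_pool), size=100, replace=False)] = True

model = LogisticRegression(max_iter=1000)
for round_num in range(10):
    model.fit(X_pool[labeled], y_pool[labeled])        # train on labeled subset
    acc = model.score(X_val, y_val)                    # validate
    print(f"round {round_num}: {labeled.sum()} labels, val accuracy {acc:.3f}")
    if acc >= 0.95 or labeled.all():                   # stop when "good enough"
        break
    probs = model.predict_proba(X_pool[~labeled])      # predict on unlabeled pool
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)  # uncertainty score
    candidates = np.flatnonzero(~labeled)
    to_label = candidates[np.argsort(entropy)[::-1][:100]]    # 100 most uncertain
    labeled[to_label] = True                           # "annotate" them for the next round
```

In a production setting, the line that flips `labeled` to True is where real annotators come in: only the samples the model is most uncertain about get sent out for labeling.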

Why sophisticated companies should be ready to leverage active learning

Active learning is fundamental for closing the prototype-production gap and increasing model reliability.

It’s a common mistake to think of AI systems as static pieces of software; these systems must be constantly learning and evolving. If not, they make the same mistakes repeatedly, or, when they’re released in the wild, they encounter new scenarios, make new mistakes and don’t have an opportunity to learn from them. They need to have the ability to learn over time, making corrections based on previous mistakes as a human would. Otherwise, models will have issues with reliability and robustness, and AI systems will not keep working in perpetuity.

Most companies using deep learning to solve real-world problems will need to incorporate active learning into their stack. If they don’t, they’ll lag their competitors. Their models won’t respond to or learn from the shifting landscape of possible scenarios.

However, incorporating active learning is easier said than done. For years, a lack of tooling and infrastructure made it difficult to facilitate active learning. Out of necessity, companies that began taking steps to improve their models’ performance with respect to the data have had to take a Frankenstein approach, cobbling together external tools and building tools in-house.

As a result, they don’t have an integrated, comprehensive system for model training. Instead, they have modular block-like processes that can’t talk to each other. They need a flexible system made up of decomposable components in which the processes communicate with one another as they go along the pipeline and create an iterative feedback loop.

The best ways to leverage active learning

Some companies, however, have implemented active learning to great effect, and we can learn from them. Companies that have yet to put active learning in place can also do a few things to prepare for and make the most of this methodology.

The gold standard for active learning is a stack that functions as a fully iterative pipeline. Every component runs in service of optimizing the performance of the downstream model: data selection, annotation, review, training and validation operate with an integrated logic rather than as disconnected units.
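
One way to picture that integration is as a set of decomposable components wired into a single round. The sketch below is structural only, not a real framework: the `Selector`, `Annotator`, `Reviewer`, `Trainer` and `Validator` interfaces and the `run_round` helper are hypothetical names standing in for whatever tooling a team actually uses.

```python
from typing import Any, Protocol, Sequence

class Selector(Protocol):
    def select(self, model: Any, pool: Sequence[Any], budget: int) -> Sequence[Any]: ...

class Annotator(Protocol):
    def annotate(self, batch: Sequence[Any]) -> list[tuple[Any, Any]]: ...

class Reviewer(Protocol):
    def review(self, labels: list[tuple[Any, Any]]) -> list[tuple[Any, Any]]: ...

class Trainer(Protocol):
    def train(self, labeled: list[tuple[Any, Any]]) -> Any: ...

class Validator(Protocol):
    def validate(self, model: Any) -> dict[str, float]: ...

def run_round(model, pool, labeled, selector: Selector, annotator: Annotator,
              reviewer: Reviewer, trainer: Trainer, validator: Validator,
              budget: int = 100):
    """One pipeline iteration: each stage feeds the next, and validation
    results feed back into the next round's data selection."""
    batch = selector.select(model, pool, budget)              # data selection
    new_labels = reviewer.review(annotator.annotate(batch))   # annotation + review
    labeled = list(labeled) + list(new_labels)
    model = trainer.train(labeled)                            # retraining
    return model, labeled, validator.validate(model)          # validation
```

The point of the interfaces is that each stage stays swappable, yet all of them share the same loop and the same downstream objective rather than running as disconnected units.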

Counterintuitively, the best systems also have the most human interaction. They fully embrace the human-in-the-loop nature of iterative model improvement by opening up entry points for human supervision within each subprocess while also maintaining optionality for completely automated flows when things are working.
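
A minimal sketch of one such entry point, under assumed names: the `auto_accept()` and `queue_for_human_review()` hooks are hypothetical, and the 0.9 confidence threshold is purely illustrative and would be tuned per task.

```python
import numpy as np

def route_predictions(probs: np.ndarray, items: list, threshold: float = 0.9):
    """Auto-accept confident predictions; route uncertain ones to a human."""
    confident = probs.max(axis=1) >= threshold
    accepted = [item for item, ok in zip(items, confident) if ok]
    needs_review = [item for item, ok in zip(items, confident) if not ok]
    return accepted, needs_review

# accepted, needs_review = route_predictions(model.predict_proba(batch), batch)
# auto_accept(accepted)                  # fully automated flow when things work
# queue_for_human_review(needs_review)   # entry point for human supervision
```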

The most sophisticated companies therefore have stacks that are iterative, granular, inspectable, automatable and coherent.

Companies seeking to build neural networks that take advantage of active learning should build their stacks with the future in mind. These ML teams should project the types of problems they’ll have and understand the issues they’re likely to encounter when attempting to run their models in the wild. What edge cases will they encounter? In what unreasonable way is the model likely to behave?

If ML teams don’t think through these scenarios, models will inevitably make mistakes in a way that a human never would. Those errors can be quite embarrassing for companies, and they’re heavily penalized because they’re so misaligned with human behavior and intuition.

Fortunately, for companies just entering the game, there’s now plenty of know-how to be gained from companies that have broken through the production barrier. With more and more companies putting models into production, ML teams can more easily anticipate future problems by studying their predecessors, since they will likely face similar issues when moving from proof of concept to production.

Another way to troubleshoot problems before they occur is to think about what a working model looks like beyond its performance metric scores. By thinking about how that model should operate in the wild and the sorts of data and scenarios it will encounter, ML teams will better understand the kinds of issues that might arise once it’s in the production stage.

Lastly, companies should familiarize themselves with the tools available to support an active learning and training data pipeline. Five or six years ago, companies had to build infrastructure internally and combine those in-house tools with imperfect external ones. Nowadays, every company should think twice before building something internally. New tooling is being developed rapidly, and there’s likely already a tool that will save time and money while requiring no internal resources to maintain.

Active learning is still in its very early days. However, every month, more companies are expressing an interest in taking advantage of this methodology. The most sophisticated ones will put the infrastructure, tooling and planning in place to harness its power.
