Hypernetworks and Long-Form AI: Jason Phang’s Transformative Research in NLP

NYU Center for Data Science
Dec 1, 2023

The quest to refine AI’s understanding of extensive textual data has been advanced by CDS PhD student Jason Phang, who is first author on two recent NLP papers presented at ICML 2023 and EMNLP 2023.

Phang’s first paper, “HyperTuning: Toward Adapting Large Language Models without Back-propagation,” presented at ICML 2023, is a proof of concept for hypernetworks, models designed to generate other models, as a way to fine-tune large language models. The research explores whether a hypernetwork can generate task-specific parameters that are then embedded into a large language model. Phang describes this as a step toward training models that can adapt themselves in a single forward pass, without the need for back-propagation. “The goal is efficiency gains and fast model adaptation,” Phang says.
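To make the idea concrete, here is a minimal, hypothetical sketch in PyTorch of what a hypernetwork does: a small generator network produces the parameters of a low-rank adapter in a single forward pass, and those parameters are then applied inside a frozen language model. The class, dimensions, and usage below are illustrative assumptions for exposition, not Phang’s actual architecture.

```python
import torch
import torch.nn as nn

class HyperNetwork(nn.Module):
    """Toy hypernetwork: maps an encoding of a few task examples to the
    parameters of a low-rank adapter for a frozen downstream model."""
    def __init__(self, task_dim=64, hidden_dim=256, model_dim=768, rank=8):
        super().__init__()
        self.model_dim, self.rank = model_dim, rank
        # The generator emits both factors of a rank-`rank` weight update.
        self.generator = nn.Sequential(
            nn.Linear(task_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 2 * model_dim * rank),
        )

    def forward(self, task_embedding):
        # A single forward pass yields the adapter parameters;
        # no gradient steps on the downstream model are required.
        flat = self.generator(task_embedding)
        a, b = flat.split(self.model_dim * self.rank)
        return a.view(self.model_dim, self.rank), b.view(self.rank, self.model_dim)

# Hypothetical usage: encode a handful of task examples, generate adapter
# parameters, and apply them as a residual update inside the frozen model.
task_embedding = torch.randn(64)          # stand-in for encoded few-shot examples
hyper = HyperNetwork()
A, B = hyper(task_embedding)              # task-specific low-rank factors
hidden_states = torch.randn(10, 768)      # activations from the frozen model
adapted = hidden_states + hidden_states @ (A @ B)   # adapted without back-propagation
```

The point of the sketch is the division of labor: all the gradient-based training happens once, on the hypernetwork, so that adapting to a new task later costs only a forward pass.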

This approach paves a path toward a future where AI can be rapidly tailored to specific contexts or requirements, a stark contrast to the current, resource-intensive model training methods. HyperTuning, as Phang calls it, represents a step toward an ambitious vision for AI’s future, where models can be adapted swiftly and efficiently for various applications, from personalized AI assistants to specialized enterprise tools. This work signals a move towards more agile and accessible AI, breaking down the barriers imposed by the computational demands of traditional model training.

The second paper, “Investigating Efficiently Extending Transformers for Long Input Summarization,” presented at EMNLP 2023, addresses a crucial challenge in NLP: processing and summarizing long-form content. This research stands out for its meticulous approach to evaluating training strategies for models that handle lengthy contexts. Phang’s team focused on six key areas, including efficient attention mechanisms and the optimal pretraining regime. The culmination of this extensive research is a model designed around these findings, which surpasses its contemporaries in summarization tasks. Phang elaborates, “By running a series of ablation experiments, we determined which components are truly impactful, leading to PEGASUS-X, which outperforms existing models in summarization for the same number of parameters.”

PEGASUS-X is an extension of PEGASUS, an existing model introduced in 2019.
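For readers who want to experiment with the model, the sketch below shows one way PEGASUS-X can be loaded through the Hugging Face transformers library, which hosts released checkpoints such as google/pegasus-x-base. The specific model name, input length, and generation settings here are assumptions chosen for illustration, not prescriptions from the paper.

```python
from transformers import AutoTokenizer, PegasusXForConditionalGeneration

# Load a released PEGASUS-X checkpoint (here the base size).
model_name = "google/pegasus-x-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusXForConditionalGeneration.from_pretrained(model_name)

long_document = "..."  # a document far longer than a standard 512-1024 token window

# PEGASUS-X extends the encoder with block-local attention and global tokens,
# so inputs of many thousands of tokens can be summarized.
inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")
summary_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```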

The significance of Phang’s research lies not only in the development of PEGASUS-X but also in the methodical analysis of design decisions in long context models. “This is about rigorously answering what components or tweaks are needed to build a good long input context model,” Phang says. His work provides a clear roadmap for future developments in NLP, emphasizing the importance of a systematic approach in a field often marked by rapid, incremental advances.

One striking aspect of Phang’s research is its adherence to what he describes as ‘slow science.’ He says, “We are saying which of these [long context model] tweaks matter, and which don’t. This will be extremely useful for the academic community.” This meticulous approach contrasts with the prevailing trend of rapidly churning out models with incremental changes.

On the broader implications of his work, Phang underlines the potential of his findings to inform future models, including those at the forefront of the industry like GPT-4. His research on long input summarization, though scoped to this specific task, offers insights that could be pivotal in enhancing the performance of models across various NLP tasks.

Phang’s work also illustrates the collaborative and interdisciplinary nature of cutting-edge AI research. The long input summarization study was conducted while he was a Student Researcher at Google, while the HyperTuning research took place during an internship at Microsoft.

Jason Phang’s work epitomizes the blend of rigorous scientific inquiry and practical application that is pushing the boundaries of natural language processing at CDS, and laying a robust foundation for future explorations in this ever-evolving field.

By Stephen Thomas
