Visual Compositionality as a Case Study for Building AI Models to Produce Human Behavior

Feb 8, 2024

Imagine teaching a child to recognize a creature they’ve never seen before: a dragon, perhaps, with wings borrowed from a bat and the tail of a lizard. The child will quickly grasp the concept, blending familiar elements into a coherent whole. This ability is called compositionality, and humans use it across many domains. A recent paper published in Cognition by CDS PhD student Yanli Zhou investigates how both humans and AI models achieve visual compositionality. The work is part of a research agenda characteristic of CDS, in which human and machine intelligence are explored in tandem.

The paper, written with recent NYU alumnus Reuben Feinman and CDS Assistant Professor of Psychology and Data Science Brenden Lake, is titled “Compositional diversity in visual concept learning.” In a recent interview, Zhou explained that compositionality is a cornerstone of human learning, allowing us to assemble new concepts from known elements. “Humans are very compositional learners,” Zhou said. Her team’s research examines how humans and AI models process and generate novel visual constructs, offering a quantitative analysis of human inductive biases in these tasks.

Humans easily see how the three objects on the left combine to create the object on the right. Historically, AI models have struggled with this kind of task, though newer models are becoming more competent.

The study employs AI models as tools to quantitatively describe and replicate human cognitive processes. The team’s use of a Bayesian program induction model and a generative neuro-symbolic (GNS) model reflects this approach. The Bayesian model, which was designed to simulate the generation and classification of complex visual objects known as “alien figures,” uses parameters estimated from human data. This approach aligns the model closely with human cognitive patterns, showing its capacity to mirror human thought processes in visual compositionality.
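To give a rough sense of what Bayesian concept classification over parts can look like, here is a toy sketch in Python. It is not the model from the paper: the part vocabulary, the uniform prior, and the "size principle" likelihood are all assumptions chosen for illustration, and the paper's model works with programs that generate figures and uses parameters fitted to human data.

```python
from itertools import combinations

# Hypothetical part vocabulary (an assumption for illustration only).
PARTS = ("wing", "tail", "horn", "claw")


def hypotheses(parts=PARTS):
    """Each non-empty set of required parts is one candidate concept."""
    return [frozenset(c)
            for r in range(1, len(parts) + 1)
            for c in combinations(parts, r)]


def likelihood(example, h, n_parts=len(PARTS)):
    """Size-principle likelihood (assumed): examples are drawn uniformly
    from the items that contain every part the hypothesis requires."""
    if not h <= example:
        return 0.0
    extension_size = 2 ** (n_parts - len(h))  # items consistent with h
    return 1.0 / extension_size


def posterior(examples, parts=PARTS):
    """Bayes' rule with a uniform prior over hypotheses (assumed)."""
    scores = {}
    for h in hypotheses(parts):
        score = 1.0
        for ex in examples:
            score *= likelihood(ex, h, len(parts))
        scores[h] = score
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no hypothesis is consistent with the examples")
    return {h: s / total for h, s in scores.items()}


def prob_member(item, post):
    """Posterior-weighted probability that a new item fits the concept."""
    return sum(p for h, p in post.items() if h <= item)


if __name__ == "__main__":
    seen = [frozenset({"wing", "tail"}), frozenset({"wing", "tail", "horn"})]
    post = posterior(seen)
    print(post[frozenset({"wing", "tail"})])
    print(prob_member(frozenset({"wing", "claw"}), post))
```

In this toy setting, after two examples that both have a wing and a tail, the hypothesis "wing and tail" gets about two thirds of the posterior mass, so a new item with a wing but no tail is judged an unlikely member. The sketch covers only classification, not generation.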

The team’s findings illuminate the nuanced interplay between human cognition and computational models. While initial observations suggested humans exhibited more flexible behavior in tasks involving novel visual concepts, further analysis revealed that the GNS model could account for additional behavioral patterns not captured by the Bayesian model alone. Zhou notes, “The neuro-symbolic approach […] showed improvement in terms of capturing these additional behavioral nuances,” emphasizing the synergy between human data and behaviorally guided synthetic data in refining the AI’s performance.

While the purpose of this paper is basic science, the implications of this research touch on potential advancements in AI learning models. By integrating human inductive biases and direct behavioral data into their modeling, Zhou and her team have charted a course for developing AI systems that can learn in a more human-like, efficient manner. This could lead to AI that requires less data and training to understand and create complex visual concepts, echoing the compositional learning prowess of humans.

Reflecting on the collaborative spirit of the project, Zhou shares, “It was a happy collaboration.” This sentiment encapsulates the essence of the research effort at CDS, where interdisciplinary collaboration fosters groundbreaking insights into the workings of both human and artificial intelligence.

Zhou’s paper, co-authored with Feinman and Lake, is a testament to the power of computational models in enhancing our understanding of human cognition. By accurately modeling human behavior in visual concept learning, their work paves the way for future AI systems that more closely mimic the intricacy and efficiency of human thought processes, promising a new era of AI development grounded in a profound understanding of human intelligence.

By Stephen Thomas

Written by NYU Center for Data Science