Inverse Graphics: How the Brain Converts 2D Images to 3D

Summary: Researchers have identified how primate brains convert flat, two-dimensional visual inputs into detailed three-dimensional mental representations. This process, described as “inverse graphics,” essentially reverses the steps of computer graphics: starting from a 2D image, moving through an intermediate, view-dependent stage, and producing a robust, view-tolerant 3D model of the object.

Using a computational model called the Body Inference Network (BIN), the team mapped stages of this inverse-graphics process and found close parallels with activity in primate brain areas specialized for body-shape recognition. The work clarifies neural mechanisms behind depth perception and object understanding and may help guide improvements in machine vision and treatments for visual disorders.

Key Facts:

The inferotemporal cortex in primates builds internal 3D models from 2D images through an “inverse graphics” process.
Researchers trained a neural network (the Body Inference Network) to reconstruct 3D human and monkey bodies from labeled images, then compared its internal stages to macaque brain activity.
The model’s processing stages corresponded closely to responses in macaque body-selective regions, suggesting a shared algorithmic strategy that could inform machine vision design and studies of visual perception disorders.

Source: Yale

Yale researchers report a computational explanation for how primate vision builds three-dimensional object representations from two-dimensional inputs.

Researchers show that the inferotemporal cortex, a region of the temporal lobe crucial for visual processing, transforms visual images into 3D mental models of objects. Credit: Neuroscience News

“This gives us evidence that the goal of vision is to establish a 3D understanding of an object,” said study senior author Ilker Yildirim, assistant professor of psychology at Yale. “When you open your eyes, you perceive 3D scenes. The visual system constructs three-dimensional understanding from a stripped-down two-dimensional view.”

The investigators name this algorithmic strategy “inverse graphics” because it mirrors computer graphics but in reverse: instead of taking a 3D model and producing a 2D image, the brain begins with a 2D image, builds an intermediate representation often called “2.5D,” and then forms a more view-tolerant 3D object model.
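
To make the three stages concrete, here is a minimal sketch of the graphics direction and its reverse for a toy object. Everything in it is an illustrative assumption rather than the study's model: the three-point "body", the function names, and the fact that depth and viewing angle are handed to the inverse steps (in the account above, the visual system has to infer them from the image).

```python
import math

def to_view(points, yaw):
    """Rotate object-centered 3D points about the vertical axis into the viewer's frame.

    The result is view-dependent: image position (x, y) plus depth (z), a "2.5D"-like stage.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def project(view_points):
    """Graphics direction, last step: drop depth to get a flat 2D image."""
    return [(x, y) for x, y, _ in view_points]

def from_view(view_points, yaw):
    """Inverse direction, last step: undo the viewing rotation to recover the object-centered shape."""
    return to_view(view_points, -yaw)

def rounded(points, digits=6):
    """Round coordinates so tiny floating-point differences do not affect comparisons."""
    return [tuple(round(v, digits) + 0.0 for v in p) for p in points]

# Hypothetical three-point "body" in object-centered coordinates.
body = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 1.5, 0.2)]

front = to_view(body, 0.0)          # viewed head-on
side = to_view(body, math.pi / 2)   # viewed after a quarter turn

image_front = project(front)
image_side = project(side)

recovered_from_front = from_view(front, 0.0)
recovered_from_side = from_view(side, math.pi / 2)
```

The two 2D images differ and so do the two view-dependent stages, but both views map back to the same object-centered shape, which is the sense in which the final representation is view-tolerant.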

The team started from BIN, a neural-network model originally designed to produce 2D renderings of bodies from parameters like shape, posture, and orientation, and retrained it to invert that mapping. Trained on 2D images labeled with 3D data, BIN learned to infer the underlying 3D body structure directly from flat images, effectively performing inverse graphics.
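
The training idea, learning the inverse of a renderer from images paired with their 3D labels, can be sketched with a toy example. This is not BIN: the `render` function, its two parameters, the three-"pixel" image, and the single linear layer trained by stochastic gradient descent are all hypothetical stand-ins chosen so the example stays small.

```python
import random

def render(height, tilt):
    """Hypothetical stand-in for a graphics program: two 3D parameters -> three 'pixels'."""
    return [height + tilt, height - tilt, 0.5 * height]

def infer(weights, pixels):
    """Inference direction: map an image back to estimated (height, tilt)."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in weights]

def train_inference(steps=2000, lr=0.1, seed=0):
    """Fit a linear image -> parameters map on rendered images labeled with their 3D parameters."""
    rng = random.Random(seed)
    weights = [[0.0] * 3 for _ in range(2)]  # 2 outputs x 3 pixels
    for _ in range(steps):
        # Sample 3D parameters, render them, and use the parameters as the label.
        target = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        pixels = render(*target)
        guess = infer(weights, pixels)
        # One gradient step on the squared error between guess and label.
        for i, row in enumerate(weights):
            err = guess[i] - target[i]
            for j in range(3):
                row[j] -= lr * err * pixels[j]
    return weights

weights = train_inference()
estimate = infer(weights, render(0.3, -0.6))  # an image the network was not trained on
```

After training, `estimate` should be close to the parameters that produced the image, here a height of 0.3 and a tilt of -0.6. A real inference network would be deep and nonlinear, since real rendering is far from a linear map.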

The researchers then compared BIN’s internal processing stages with neural recordings from macaques shown images of macaque bodies. They found that BIN’s stages aligned with activity in two macaque inferotemporal regions known to process body shape (referred to in the study as MSB and ASB). BIN matched the brain data more closely than several standard artificial intelligence vision models, both supervised and unsupervised.
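
One common way to compare a model stage with a brain area is representational similarity analysis: for each, compute how dissimilar the responses to every pair of stimuli are, then correlate the two sets of dissimilarities. The sketch below assumes that approach for illustration only; this article does not state which procedure the study used, and all response values are made up.

```python
import math
from itertools import combinations

def dissimilarities(responses):
    """Euclidean distance between the response vectors for every pair of stimuli."""
    return [math.dist(a, b) for a, b in combinations(responses, 2)]

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def similarity_score(model_stage, brain_area):
    """Higher values mean the two systems treat the same stimulus pairs as similar or different."""
    return pearson(dissimilarities(model_stage), dissimilarities(brain_area))

# Made-up responses to four stimuli (one vector per stimulus).
brain_area = [(0, 0, 0), (2, 0, 0), (0, 2, 0), (2, 2, 0)]    # three hypothetical neurons
stage_a = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]   # same geometry, different scale
stage_b = [(0.0, 0.0), (3.0, 0.0), (0.0, 0.1), (3.0, 0.1)]   # different geometry

score_a = similarity_score(stage_a, brain_area)
score_b = similarity_score(stage_b, brain_area)
```

Because the comparison uses only pairwise dissimilarities, the model stage and the brain area can have different numbers of units. Here `stage_a` scores higher than `stage_b` because its stimulus geometry matches that of the hypothetical recordings.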

“Our model explained the visual processing in the brain much more closely than other AI models typically do,” Yildirim said. While the primary interest of the team lies in understanding the neuroscience and cognitive science of vision, they also hope this work will inspire improved machine vision systems and guide interventions for visual disorders.

The study has implications for multiple fields: visual neuroscience, computational models of perception, and applied machine vision. By proposing inverse graphics as a computational objective implemented across multiple areas of the inferotemporal cortex, the findings point toward how primates achieve robust recognition and 3D reconstruction in everyday scenes—capabilities that remain challenging for artificial systems.

Other authors include first author Hakan Yilmaz and Aalap Shah, both Ph.D. candidates at Yale, along with collaborators from Princeton University and KU Leuven.

About this visual neuroscience research news

Author: Bess Connolly
Source: Yale
Contact: Bess Connolly – Yale
Image: Image credited to Neuroscience News

Original Research: Closed access. “Multiarea processing in body patches of the primate inferotemporal cortex implements inverse graphics” by Ilker Yildirim et al., Proceedings of the National Academy of Sciences (PNAS).

Abstract

Stimulus-driven, multiarea processing in the inferotemporal (IT) cortex is widely considered essential for transforming sensory input into useful internal representations. But what format do those representations take, and how are they computed across IT’s network nodes?

Motivated by classical theories of vision, the authors posit that inferring 3D object structure may be a core computational objective of IT. They propose that IT implements an algorithm analogous to graphics-based generative models—models that explain how 3D scenes produce 2D images—but running the process in reverse.

Using body perception as a test case, the study shows that inverse graphics naturally emerges in inference networks trained to map images onto 3D objects. Crucially, this inverse-graphics correspondence is not only present in trained networks but also reflected across stages of the macaque IT body-processing network. The inference networks reproduce the feedforward progression observed in IT and do so more faithfully than several prevailing vision models, supervised and unsupervised alike, none of which align with the inverse-graphics account.

These results support the view that inverse graphics operates as a multiarea neural algorithm within the primate IT cortex and suggest pathways for incorporating primate-like 3D inference into machine vision systems.