Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts
The human visual system is remarkably tolerant to degradation in image resolution: human performance in scene categorization remains high whether low-resolution or multimegapixel images are used. This observation raises the question of how many pixels are required to form a meaningful representation of an image and identify the objects it contains. In this article, we show that very small thumbnail images at a spatial resolution of 32 × 32 color pixels provide enough information to identify the semantic category of real-world scenes. Most strikingly, this low resolution permits observers to report, with 80% accuracy, four to five of the objects that the scene contains, even though some of these objects are unrecognizable in isolation. The robustness of the information available at very low resolution for describing the semantic content of natural images could be an important asset in explaining the speed and efficiency with which the human brain comprehends the gist of visual scenes.
(Received July 22 2008)
(Accepted December 05 2008)
Address correspondence and reprint requests to: Antonio Torralba, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, 32-D432, 32 Vassar Street, Cambridge, MA 02139. E-mail: firstname.lastname@example.org