Active situation detection in images

We are developing an architecture, called Situate, that can detect instances of visual situations in images. Situate integrates symbolic knowledge about visual situations, deep neural networks that model object appearance, and probabilistic models that capture spatial and semantic relationships. These different knowledge sources are combined in a perceptual process that actively detects the components of a given situation in a new image.
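
To make this concrete, here is a minimal sketch in Python of what such an active detection process could look like. Everything in it is an illustrative assumption, not Situate's actual models or API: the component names, the uniform location priors, the random appearance_score stub (standing in for a neural-network appearance model), and the hand-written conditioning rule (standing in for the learned probabilistic spatial/semantic models). The real system is linked under "Code" below.

    import random

    # Hypothetical component list for the "dog-walking" situation.
    COMPONENTS = ["dog-walker", "dog", "leash"]

    def appearance_score(image, box, component):
        # Stand-in for a learned appearance model (e.g., a CNN scorer);
        # a random score keeps this sketch self-contained and runnable.
        return random.random()

    def sample_box(prior):
        # Sample a candidate box (x, y, w, h) in normalized coordinates
        # from the component's current location distribution.
        x = random.uniform(*prior["x"])
        y = random.uniform(*prior["y"])
        return (x, y, 0.2, 0.2)

    def condition_priors(priors, detections):
        # Stand-in for the spatial/semantic models: once the dog-walker
        # is grounded, expect the dog and the leash nearby.
        if "dog-walker" in detections:
            x, y, _, _ = detections["dog-walker"]
            for c in ("dog", "leash"):
                if c not in detections:
                    priors[c] = {"x": (max(0.0, x - 0.3), min(1.0, x + 0.3)),
                                 "y": (max(0.0, y - 0.3), min(1.0, y + 0.3))}
        return priors

    def detect_situation(image, threshold=0.9, budget=1000):
        priors = {c: {"x": (0.0, 1.0), "y": (0.0, 1.0)} for c in COMPONENTS}
        detections = {}
        for _ in range(budget):
            remaining = [c for c in COMPONENTS if c not in detections]
            if not remaining:
                break  # every situation component has been grounded
            c = random.choice(remaining)
            box = sample_box(priors[c])
            if appearance_score(image, box, c) >= threshold:
                detections[c] = box
                priors = condition_priors(priors, detections)
        return detections

    print(detect_situation(image=None))

The design point the sketch is meant to convey is that each accepted detection feeds back into the location distributions of the components not yet found, so later search is focused rather than exhaustive.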


Description:

This project investigates a novel approach to building computer systems that can detect visual situations in images. While much effort in computer vision has focused on identifying isolated objects, what people actually do is recognize coherent situations: collections of objects and their interrelations that, taken together, correspond to a known concept, such as "dog-walking", "a fight breaking out", or "a blind person crossing the street". Situation recognition may appear effortless for humans, but it relies on a complex, dynamic interplay among the abilities to perceive objects, to grasp systems of relationships among objects, and to make analogies with stored knowledge and memories. No computer vision system yet comes close to capturing these human abilities. Enabling computers to flexibly recognize visual situations would open up important applications in fields as diverse as autonomous vehicles, medical diagnosis, interpretation of scientific imagery, enhanced human-computer interaction, and personal information organization.

The approach explored in this project integrates brain-inspired neural networks for lower-level vision, probabilistic models that capture spatial and semantic relationships, and symbolic models of concepts and analogy-making, inspired by Hofstadter and Mitchell's Copycat system. In this integrated architecture, recognizing situations is a dynamic process in which bottom-up (perceptual) and top-down (conceptual) influences affect one another as perception unfolds.
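
As a small worked illustration of this bottom-up/top-down interplay (again with invented numbers, not the project's learned parameters): suppose the dog's location relative to the dog-walker's is modeled as a Gaussian offset. A bottom-up detection of the walker then yields a sharpened top-down search distribution for the dog.

    import numpy as np

    # Illustrative only: the offset statistics below are made up for this
    # example; a real system would learn them from training situations.
    mean_offset = np.array([0.15, 0.05])   # dog tends to be beside the walker
    cov_offset = np.array([[0.02, 0.00],
                           [0.00, 0.01]])

    def dog_search_distribution(walker_xy):
        # Top-down step: condition the dog's location model on the
        # bottom-up detection of the dog-walker.
        return walker_xy + mean_offset, cov_offset

    walker_xy = np.array([0.40, 0.60])     # detected walker (normalized coords)
    mu, cov = dog_search_distribution(walker_xy)
    candidates = np.random.multivariate_normal(mu, cov, size=5)
    print(mu)          # where to center the search for the dog
    print(candidates)  # candidate fixation points for the appearance model

In the full dynamic process, such conditioned distributions would themselves be revised as further detections arrive, so perception and conceptual expectations shape each other continuously.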

Code: The source code for a preliminary version of Situate is available at https://github.com/quinnmax/situate

Datasets: Please email mm-AT-pdx-DOT-edu to request the datasets used in this project.

People: Max Quinn, Erik Conser, Anthony Rhodes, Kennedy Hahn, Melanie Mitchell

Publications:

Funding: National Science Foundation, Robust Intelligence program (PI: Melanie Mitchell)