Early Cognitive Vision
There is a semantic gap that separates images (mere arrays of pixels) from a semantic understanding of the world (in terms of objects, trajectories and affordances).
Extraction of Visual Primitives (image from [2])
This split is also present in Computer Vision research: Early Vision systems focus on signal processing techniques to extract features from image sequences, whereas Cognitive Vision approaches are concerned with high-level, complex tasks such as visual navigation, object manipulation and spatial planning. The gap between these two levels, together with the large amount of ambiguity and noise endemic to images, makes many vision-based systems brittle.
The Early Cognitive Vision paradigm holds that the robustness and versatility of human vision are the product of dense interconnections in the early visual cortex, which provide recurrent feedback loops. This is formalized as an intermediary level between Early Vision and Cognitive Vision.
This implies two requirements: first, generic representations of visual information; second, recurrent feedback loops between different visual processes to disambiguate those representations.
Visual Primitives
The generic representation we propose [1,2] is based on local multi-modal symbolic feature descriptors called Visual Primitives.
Previous work focused on extracting primitives at image edges, but there is ongoing work to extend this to different types of local image structure.
An image and the visual primitives extracted.
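As an illustration of what a local multi-modal descriptor might look like as a data structure, here is a minimal Python sketch. The class name, the field names and the choice of modalities (orientation, phase, colour on each side of the edge, optic flow) are assumptions made for this example; they are not the CoViS data structures, and [2] should be consulted for the actual definition.

```python
from dataclasses import dataclass
import math
from typing import Tuple


@dataclass
class VisualPrimitive:
    """Hypothetical container for one local edge descriptor (illustrative only)."""
    position: Tuple[float, float]             # image coordinates (x, y), in pixels
    orientation: float                        # undirected edge orientation, radians in [0, pi)
    phase: float                              # local phase, radians (assumed modality)
    colour_left: Tuple[float, float, float]   # mean RGB on one side of the edge (assumed)
    colour_right: Tuple[float, float, float]  # mean RGB on the other side (assumed)
    flow: Tuple[float, float]                 # local optic flow (dx, dy), pixels per frame (assumed)
    confidence: float = 1.0                   # how reliable the descriptor is taken to be


def orientation_distance(a: VisualPrimitive, b: VisualPrimitive) -> float:
    """Smallest angle between two undirected orientations, in [0, pi/2]."""
    d = abs(a.orientation - b.orientation) % math.pi
    return min(d, math.pi - d)
```

Because an edge orientation is undirected, two primitives with orientations just above 0 and just below pi are nearly parallel; the distance function accounts for that wrap-around.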
Inter-process Feedback
During my PhD, I investigated how recurrent feedback mechanisms can improve the robustness and accuracy of the visual representation [1,3], focusing on the interaction between stereopsis, perceptual grouping and tracking. These feedback loops were shown to improve both considerably.
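The general idea of one process feeding back into the confidence of a representation can be sketched with a toy example. The code below is not the mechanism described in [1,3]: it assumes a simplified element format `(x, y, theta)`, a made-up collinearity measure standing in for perceptual grouping, and hypothetical `radius` and `rate` parameters. It only shows how support from consistent neighbours can reinforce some elements and suppress isolated ones.

```python
import math


def collinearity(p, q):
    """Toy support in [0, 1] that edge element q lends to p.

    Each element is (x, y, theta), theta being an undirected orientation.
    """
    (px, py, pt), (qx, qy, qt) = p, q
    link = math.atan2(qy - py, qx - px)  # direction of the line joining p and q
    # |sin| is unaffected by the pi-ambiguity of undirected orientations.
    misalignment = abs(math.sin(pt - link)) + abs(math.sin(qt - link))
    return max(0.0, 1.0 - misalignment)


def feedback_pass(elements, confidences, radius=5.0, rate=0.5):
    """One feedback pass: move each confidence toward its neighbourhood support."""
    updated = []
    for i, p in enumerate(elements):
        support = sum(
            confidences[j] * collinearity(p, q)
            for j, q in enumerate(elements)
            if j != i and math.hypot(q[0] - p[0], q[1] - p[1]) <= radius
        )
        target = min(1.0, support)  # cap the accumulated support at 1
        updated.append((1.0 - rate) * confidences[i] + rate * target)
    return updated


# Three collinear horizontal elements and one perpendicular outlier.
elements = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (2.0, 2.0, math.pi / 2)]
confidences = feedback_pass(elements, [0.5] * len(elements))
```

In this example the three collinear elements support one another and gain confidence, while the perpendicular outlier receives no support and loses confidence. Repeating the pass widens the difference.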
Cognitive Vision Software (CoViS)
This work constitutes the backbone of CoViS (Cognitive Vision Software), which is now at the core of the Danish research group in Odense and is used in both the PACO-PLUS and DrivSco projects. The library has been released under a BSD license; the project's homepage is here.
This work was conducted at the University of Stirling as part of the EU project ECOVISION.
Acknowledgements: Norbert Krüger and Florentin Wörgötter (see references).
References
[1] Nicolas Pugeault (2008)
Early Cognitive Vision: Feedback Mechanisms for the Disambiguation of Early Visual Representation,
ISBN 978-3-639-09357-5, Verlag Dr. Müller.
(PhD Thesis, also available here)
[2] Pugeault, N., Wörgötter, F., and Krüger, N. (2010).
Visual primitives: Local, condensed, semantically rich visual descriptors and their applications in robotics.
International Journal of Humanoid Robotics, Special Issue on Cognitive Humanoid Vision, 7(3):379–405.
(pdf)
[3] Pugeault, N., Wörgötter, F., and Krüger, N. (2010).
Disambiguating multi-modal scene representations using perceptual grouping constraints.
PLoS ONE 5(6): e10663. doi:10.1371/journal.pone.0010663
[4] Wörgötter, F., Krüger, N., Pugeault, N., Calow, D., Lappe, M., Pauwels, K., Hulle, M. V., Tan, S., and Johnston, A. (2004).
Early cognitive vision: Using gestalt-laws for task-dependent, active image-processing.
Natural Computing, 3(3):293–321.
(link)
[5] Krüger, N., Pugeault, N., Başeski, E., Baunegaard With Jensen, L., Kalkan, S., Kraft, D., Jessen, J.B., Pilz, F., Kjær-Nielsen, A., Popovic, M., Asfour, T., Piater, J., Kragic, D., and Wörgötter, F. (2010).
Early Cognitive Vision as a Front-end for Cognitive Systems.
In Proceedings of the Workshop on Vision for Cognitive Tasks, ECCV 2010.
(pdf)
[6] Jessen, J.B., Pilz, F., Kraft, D., Pugeault, N., and Krüger, N. (2011).
Accumulation of Different Visual Feature Descriptors in a Coherent Framework.
In Proceedings of the Scandinavian Conference on Image Analysis (SCIA) 2011.
(pdf)