Iconic vision

In our work we investigate the problem of automatically learning visual objects and their geometric relationships in the context of an attention-shifting vision system that uses image-based representations (an iconic vision system). We assume approximately planar objects that are related by rigid-body transformations and appear in a cluttered environment.

Log-polar and primal sketch feature extraction

We use an image representation inspired by the human retina. The retina receives input light signals and passes them to layers of neurons, which perform some pre-processing before sending signals to the brain. Each of these neurons collects the outputs of many photoreceptors over an approximately circular area called its receptive field. In the outer retinal region, receptive fields are distributed circularly, with their sampling area and their distance from the retina centre increasing exponentially. If we project the angular displacement of each receptive field and the logarithm of its distance from the retina centre onto a Cartesian coordinate system, we obtain a log-polar image, which converts changes in scale and rotation in the retinal space into translations in the log-polar space. This property can be exploited when matching objects that appear at different orientations and sizes.
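The scale-to-translation property can be illustrated with a minimal sketch of log-polar sampling. The function below (a simplification; the grid sizes, nearest-neighbour sampling and the synthetic radial test pattern are our illustrative assumptions, not the system's actual sensor geometry) samples an image on rings whose radii grow exponentially from the centre; doubling the size of a radial pattern then shifts its response along the log-radius axis by a fixed number of rings while leaving the angular axis untouched.

```python
import numpy as np

def log_polar_sample(image, n_rings=31, n_wedges=64, r_min=1.0):
    """Sample a square image on a log-polar grid centred on the image centre."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r_max = min(cy, cx)
    # ring radii grow exponentially from r_min out to the image border
    radii = r_min * (r_max / r_min) ** (np.arange(n_rings) / (n_rings - 1))
    angles = 2.0 * np.pi * np.arange(n_wedges) / n_wedges
    out = np.empty((n_rings, n_wedges))
    for i, r in enumerate(radii):
        for j, a in enumerate(angles):
            # nearest-neighbour sampling; a real sensor would average a
            # whole receptive field here
            y = int(round(cy + r * np.sin(a)))
            x = int(round(cx + r * np.cos(a)))
            out[i, j] = image[y, x]
    return out

def radial_pattern(size, r0, sigma=2.0):
    """Synthetic test image: a bright ring of radius r0 around the centre."""
    yy, xx = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2.0
    r = np.hypot(yy - c, xx - c)
    return np.exp(-((r - r0) / sigma) ** 2)

# the same pattern at two scales (radius 8 vs. radius 16)
small = log_polar_sample(radial_pattern(65, r0=8))
large = log_polar_sample(radial_pattern(65, r0=16))
row_small = int(small.mean(axis=1).argmax())
row_large = int(large.mean(axis=1).argmax())
# a scale change by factor s becomes a shift of
# (n_rings - 1) * log(s) / log(r_max / r_min) rings along the log-radius axis
print(row_large - row_small)
```

A rotation of the input would, analogously, appear as a cyclic shift along the wedge (angle) axis, so a simple 2-D correlation in log-polar space can match objects across rotations and scales.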

From this image representation, we extract low-level features (edges, bars, blobs and ends) based on Marr's primal sketch hypothesis for the human visual system. The primal sketch provides a more compact representation of the image data and, given the experimental evidence that such low-level features attract visual attention, also supplies cues for an attention mechanism. Extracting these features has proven non-trivial because of the unusual sensor geometry, overlapping receptive fields, the receptive field computation and the contrast coding. We designed a neural network-based feature extractor that has produced better results than the technique used previously (a set of heuristically defined operators).
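One way such a neural feature extractor can be structured is as a small classifier mapping a neighbourhood of receptive-field responses to a distribution over the four primal sketch feature types. The sketch below is a hypothetical, untrained stand-in (the layer sizes, the 3x3 input neighbourhood and the random weights are our assumptions for illustration); in practice the weights would be learned from labelled examples of each feature type.

```python
import numpy as np

FEATURE_TYPES = ("edge", "bar", "blob", "end")

class SketchNet:
    """Tiny MLP mapping a flattened neighbourhood of receptive-field
    responses to a probability distribution over feature types."""

    def __init__(self, n_inputs=9, n_hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # random weights stand in for weights learned from labelled data
        self.W1 = rng.normal(0.0, 0.5, (n_hidden, n_inputs))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (len(FEATURE_TYPES), n_hidden))
        self.b2 = np.zeros(len(FEATURE_TYPES))

    def forward(self, x):
        h = np.tanh(self.W1 @ x + self.b1)       # hidden layer
        z = self.W2 @ h + self.b2                # class scores
        e = np.exp(z - z.max())                  # numerically stable softmax
        return e / e.sum()

net = SketchNet()
# a flattened 3x3 neighbourhood of receptive-field responses
probs = net.forward(np.ones(9))
label = FEATURE_TYPES[int(probs.argmax())]
```

Compared with a fixed set of heuristic operators, a learned classifier of this kind can absorb the irregularities of the sensor (overlap, geometry, contrast coding) directly from training data rather than requiring them to be modelled by hand.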

Iconic model learning

Now, using the image representation described above, which is inspired by the way humans see and attend to things in the environment, is it possible to autonomously classify objects and understand how they relate to each other from a sequence of primal sketch based images? Learning object models is a difficult problem in a fully autonomous setting, in which no one defines training sets of objects already segmented, normalised and separated into classes. Objects may themselves be composed of smaller objects or be part of a larger object, and discovering these relationships early in the modelling process plays an important role in subsequent recognition. We are currently developing an architecture that tackles these problems.