Iconic vision
In our work we investigate the problem of automatically learning
visual objects and their geometric relationships in the context of an
attention-shifting vision system that uses image-based representations
(an iconic vision system). We assume approximately planar objects that
are related by rigid-body transformations and appear in a cluttered
environment.
Log-polar and primal sketch feature extraction
We use an image representation inspired by the human
retina. The retina is responsible for receiving input light signals
and passing them to layers of neurons, which perform some
pre-processing before sending signals to the brain. Each of
these neurons collects the outputs of many photo-receptors over an
approximately circular area called the receptive field. The outer
retinal region is formed by receptive fields circularly distributed
with an exponentially increasing sampling area and distance from the
retina centre. If we project the receptive field angular
displacement and the logarithm of its distance from the retina centre
onto a Cartesian coordinate system, we obtain a log-polar image, which
has the property of converting changes of scale and rotation in
retinal space into translations in log-polar space. This property can
be exploited when matching objects that appear at different
orientations and sizes.
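As a minimal numerical sketch (not the system's actual sensor model), the scale-and-rotation-to-translation property can be checked directly: scaling a point by s and rotating it by a shifts its log-polar coordinates by exactly (log s, a).

```python
import numpy as np

def to_log_polar(x, y, cx=0.0, cy=0.0):
    """Map a Cartesian point to (log-radius, angle) coordinates
    relative to the retina centre (cx, cy)."""
    dx, dy = x - cx, y - cy
    rho = np.log(np.hypot(dx, dy))   # log of distance from centre
    theta = np.arctan2(dy, dx)       # angular displacement
    return rho, theta

# Scale a point by s and rotate it by a in the image plane...
s, a = 2.0, np.pi / 6
x, y = 3.0, 4.0
xs = s * (x * np.cos(a) - y * np.sin(a))
ys = s * (x * np.sin(a) + y * np.cos(a))

r0, t0 = to_log_polar(x, y)
r1, t1 = to_log_polar(xs, ys)
# ...and the change appears as a pure translation in log-polar space:
print(np.isclose(r1 - r0, np.log(s)))  # shift of log(s) along the radial axis
print(np.isclose(t1 - t0, a))          # shift of a along the angular axis
```

(The angular coordinate wraps at plus or minus pi, so in general the angular shift is only a translation modulo 2*pi; the example above stays within range.)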
From this image representation, we extract low-level features
(edges, bars, blobs and ends) based on Marr's primal sketch
hypothesis for the human visual system. These features are believed
to be used at an early stage of human visual processing: they provide
a more compact representation of the image data and, since
experimental evidence suggests that such low-level features attract
visual attention, they also serve as cues for an attention mechanism.
Extracting these features has proven to be a non-trivial task
because of the unusual sensor geometry, receptive field overlap,
receptive field computation and contrast coding. We designed a neural
network-based feature extractor that produces better results than the
technique used previously (a set of heuristically defined operators).
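To illustrate one primal-sketch feature type, the following sketch applies a hand-crafted oriented gradient kernel to a small intensity map and reads off the edge response. This is a heuristic operator of the kind the neural extractor replaces, not the extractor itself, and the kernel and image are illustrative assumptions.

```python
import numpy as np

def edge_response(img):
    """Responses of a simple 3x3 horizontal-gradient (Sobel-like) kernel,
    computed by valid convolution over the image."""
    k = np.array([[-1, 0, 1],
                  [-2, 0, 2],
                  [-1, 0, 1]], dtype=float)
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * k)
    return out

img = np.zeros((6, 8))
img[:, 4:] = 1.0          # a vertical step edge between columns 3 and 4
resp = edge_response(img)
# the response is strongest in the output columns straddling the edge
print(np.abs(resp[0]))
```

A bar detector would use a centre-surround kernel instead, and blob or end detectors isotropic or half-wave variants; the difficulty noted above is that on a log-polar sensor the receptive fields vary in size and overlap, so fixed kernels like this one no longer apply uniformly.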
Iconic model learning
Now, using the image representation described above, which is inspired
by the way humans see and attend to things in the environment, is it
possible to autonomously classify objects and understand how they
relate to each other by looking at a sequence of primal sketch based
images?
Learning object models is a difficult problem in a completely
autonomous situation, in which there is no one to define training sets
with objects already segmented, normalised and separated into
classes. Objects themselves can be made out of a number of smaller
objects or can be part of a larger object, and figuring out these
relationships early in the modelling process plays an important role
in a subsequent recognition process. We are currently developing an
architecture that tackles the above problems.