Theories of perception
· Bottom-up theory of perception
- Theory of direct perception (Ecological view)
Top-down and bottom-up theories of perception
Psychologists
often distinguish between top-down and bottom-up approaches to
information-processing. In top-down approaches, knowledge or expectations are
used to guide processing. Bottom-up approaches, however, are more like the
structuralist approach, piecing together data until a bigger picture is arrived
at. One of the strongest advocates of a bottom-up approach was J.J. Gibson
(1904-1980), who articulated a theory of direct perception. This stated that
the real world provided sufficient contextual information for our visual
systems to directly perceive what was there, unmediated by the influence of
higher cognitive processes. Gibson developed the notion of affordances,
referring to those aspects of objects or environments that allow an individual
to perform an action. Gibson's emphasis on the match between individual and
environment led him to refer to his approach as ecological. Most psychologists
now would argue that both bottom-up and top-down processes are involved in
perception.
Bottom-Up Theories
The four main
bottom-up theories of form and pattern perception are direct perception,
template theories, feature theories, and recognition-by-components theory.
Bottom-up
theories describe approaches where perception starts with the stimuli whose
appearance you take in through your eye. You look out onto the cityscape, and
perception happens when the light
information is transported to your brain. Therefore, they are data-driven (i.e.,
stimulus-driven) theories.
Gibson’s Theory of Direct Perception
How do we connect what we perceive with what we have stored in memory? Gestalt
psychologists referred to this problem as the Hoffding function (Köhler, 1940).
It was named after 19th-century Danish psychologist Harald Hoffding. He
questioned whether perception is such a simple process that all it takes is to
associate what is seen with what is remembered (associationism). An influential
and controversial theorist who questioned associationism is James J. Gibson
(1904–1980).
According to
Gibson’s theory of direct perception, the information in our sensory receptors,
including the sensory context, is all we need to perceive anything. As the
environment supplies us with all the information we need for perception, this
view is sometimes also called ecological perception. In other words, we do not
need higher cognitive processes or anything else to mediate between our sensory
experiences and our perceptions. Existing beliefs or higher-level inferential
thought processes are not necessary for perception.
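For example, consider “THE CAT” printed so that the H of “THE” is physically
identical to the A of “CAT.” We still read both words without effort, because the
context tells us which letter is which.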
Gibson believed
that, in the real world, sufficient contextual information usually exists to
make perceptual judgments. He claimed that we need not appeal to higher-level
intelligent processes to explain perception. Gibson (1979) believed that we use
this contextual information directly. In essence, we are biologically tuned to
respond to it. According to Gibson, we use texture gradients as cues for depth
and distance. These cues help us to perceive directly the relative proximity or
distance of objects and of parts of objects.
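To make the texture-gradient cue concrete, here is a small geometric sketch. It assumes simple pinhole projection, which is a standard optics approximation and not part of Gibson's own account (Gibson rejected the idea that perception is computed): equally sized texture elements of size S at distance Z project to an image size of f × S / Z. The function name, focal length, element size and distances are illustrative values only.

```python
# Texture-gradient sketch under an assumed pinhole-projection model:
# equally sized texture elements (e.g. paving stones) at distance Z
# project to an image size of f * S / Z, so they shrink with distance.
# All numbers below are arbitrary illustrative values.


def projected_size(element_size, distance, focal_length=1.0):
    """Image size of one texture element under pinhole projection."""
    return focal_length * element_size / distance


distances = [2, 4, 8, 16]
sizes = [projected_size(0.5, z) for z in distances]
print(sizes)  # [0.25, 0.125, 0.0625, 0.03125]

# The ratio of two image sizes gives relative distance without knowing
# the true element size: size_near / size_far == Z_far / Z_near.
print(sizes[0] / sizes[2])  # 4.0
```

The steady shrinking of the elements across the image is the gradient; only ratios are needed to judge which surfaces are nearer.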
Therefore, as
noted above, Gibson’s model sometimes is referred to as an ecological model
(Turvey, 2003). This reference is a result of Gibson’s concern with perception
as it occurs in the everyday world (the ecological environment) rather than in
laboratory situations, where less contextual information is available. Direct perception may also play a role in
interpersonal situations when we try to make sense of others’ emotions and
intentions (Gallagher, 2008). After all, we can recognize emotion in faces as
such; we do not see facial expressions that we then piece together into the perception
of an emotion (Wittgenstein, 1980).
Neuroscience
also indicates that direct perception may be involved in person perception.
Mirror neurons are active both when a person acts and when he or she observes
that same act performed by somebody else. Furthermore, studies indicate that
there are separate neural pathways (“what” pathways) in the lateral occipital
area for the processing of form, color, and texture in objects.
Template Theories
Template
theories suggest that we have stored in our minds myriad sets of templates.
Templates are highly detailed models for patterns we potentially might
recognize. We recognize a pattern by comparing it with our set of templates. We then choose the exact template that
perfectly matches what we observe (Selfridge & Neisser, 1960). We see
examples of template matching in our everyday lives. Fingerprints are matched
in this way. Machines rapidly process imprinted numerals on checks by comparing
them to templates. Increasingly, products of all kinds are identified with
universal product codes (UPCs or “bar codes”). They can be scanned and
identified by computers at the time of purchase. Chess players who have
knowledge of many games use a matching strategy in line with template theory to
recall previous games (Gobet & Jackson, 2002). Template matching theories
belong to the group of chunk-based theories that suggest that expertise is
attained by acquiring chunks of knowledge in long-term memory that can later be
accessed for fast recognition. Studies with chess players have shown that the
temporal lobe is indeed activated when the players access the stored chunks in
their long-term memory (Campitelli, Gobet, Head, Buckley, & Parker, 2007).
Template-matching
theories fail to explain some aspects of the perception of letters. We identify
two different letters (A and H) from only one physical form. Hoffding (1891)
noted other problems. We can recognize an A as an A despite variations in the
size, orientation, and form in which the letter is written.
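Template matching, and its weakness with letters that are shifted or drawn differently, can be sketched in a few lines. The 5 × 5 letter bitmaps and the function names are hypothetical illustrations, not a model from the literature. Recognition succeeds only on an exact cell-for-cell match, so the same letter moved one position to the right is no longer recognized.

```python
# Minimal template-matching sketch: each letter is stored as an exact
# 5x5 binary template (hypothetical bitmaps), and an input counts as
# recognized only if it matches a template cell for cell.

TEMPLATES = {
    "A": ("..#..",
          ".#.#.",
          "#####",
          "#...#",
          "#...#"),
    "H": ("#...#",
          "#...#",
          "#####",
          "#...#",
          "#...#"),
    "L": ("#....",
          "#....",
          "#....",
          "#....",
          "#####"),
}


def match_template(pattern):
    """Return the letter whose template matches exactly, or None."""
    for letter, template in TEMPLATES.items():
        if tuple(pattern) == template:
            return letter
    return None


def shift_right(pattern):
    """Shift every row one cell to the right, padding with background."""
    return tuple("." + row[:-1] for row in pattern)


print(match_template(TEMPLATES["H"]))               # H
print(match_template(shift_right(TEMPLATES["L"])))  # None
```

A human reader still sees an L after the shift; a strict template matcher would need a separate stored template for every position, size and orientation.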
The Prototype Theory
Rosch
(1973) and Rosch (1975) proposed that rather than having a number of predefined
templates within our minds, we instead categorise percepts by referencing
prototypes. Prototypes are similar to templates in that they symbolise outlines
or ideas of what an object should look like. However, unlike templates, which
require an exact match, prototypes allow a best-guess match when enough of the
characteristic features are in place.
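A minimal sketch of prototype-based categorisation, assuming each category is summarised by a single prototype made of binary features. The categories, features and similarity measure (a simple count of agreeing features) are hypothetical choices for illustration. The percept is assigned to whichever prototype it most resembles, even when it matches none of them exactly.

```python
# Prototype sketch: each category has one prototype (typical feature
# values); a percept goes to the category whose prototype it resembles
# most. Categories and features are hypothetical illustrations.

PROTOTYPES = {
    "bird": {"has_feathers": 1, "flies": 1, "has_beak": 1, "four_legs": 0},
    "dog":  {"has_feathers": 0, "flies": 0, "has_beak": 0, "four_legs": 1},
}


def similarity(percept, prototype):
    """Count the features on which percept and prototype agree."""
    return sum(percept.get(f) == v for f, v in prototype.items())


def categorize(percept):
    """Return the category with the most similar prototype (best guess)."""
    return max(PROTOTYPES, key=lambda c: similarity(percept, PROTOTYPES[c]))


# A penguin does not fly, so it matches no prototype exactly,
# but it agrees with "bird" on 3 of 4 features.
penguin = {"has_feathers": 1, "flies": 0, "has_beak": 1, "four_legs": 0}
print(categorize(penguin))  # bird
```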
Feature-Matching Theories
Yet another
alternative explanation of pattern and form perception may be found in
feature-matching theories. According to these theories, we attempt to match
features of a pattern to features stored in memory, rather than to match a whole
pattern to a template or a prototype (Stankiewicz, 2003).
The Pandemonium Model
One such
feature-matching model has been called Pandemonium (“pandemonium” refers both to a
very noisy, chaotic place and to hell). In it, metaphorical “demons” with specific
duties receive and analyze the features of a stimulus (Selfridge, 1959).
In Oliver
Selfridge’s Pandemonium Model, there are four kinds of demons: image demons,
feature demons, cognitive demons, and decision demons. Figure 3.12 shows this
model. The “image demons” receive a retinal image and pass it on to “feature
demons.” Each feature demon calls out when there are matches between the
stimulus and the given feature. These matches are yelled out at demons at the next level of the
hierarchy, the “cognitive (thinking) demons.” The cognitive demons in turn
shout out possible patterns stored in memory that conform to one or more of the
features noticed by the feature demons. A “decision demon” listens to the
pandemonium of the cognitive demons. It decides on what has been seen, based on
which cognitive demon is shouting the most frequently (i.e., which has the most
matching features).
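The flow from image demons to the decision demon can be sketched as a small pipeline. This is a simplified illustration, not Selfridge's original program: the "retinal image" is assumed to arrive already described as a list of stroke types, the letter feature lists are hypothetical, and a cognitive demon's loudness is taken to be the number of its features that the feature demons reported.

```python
from collections import Counter

# Pandemonium sketch (after Selfridge, 1959). The image is assumed to be
# pre-described as strokes; pixel-level feature detection is omitted.
# Letter feature lists are hypothetical simplifications.

FEATURES = ("vertical", "horizontal", "oblique", "curve")

COGNITIVE_DEMONS = {
    "A": Counter(oblique=2, horizontal=1),
    "H": Counter(vertical=2, horizontal=1),
    "E": Counter(vertical=1, horizontal=3),
    "T": Counter(vertical=1, horizontal=1),
    "O": Counter(curve=1),
}


def image_demon(strokes):
    """Receive the 'retinal image' and pass it on unchanged."""
    return list(strokes)


def feature_demons(image):
    """Each feature demon calls out how many times its feature occurs."""
    return {f: image.count(f) for f in FEATURES}


def cognitive_demons(shouts):
    """Each letter demon shouts louder the more of its features were found."""
    return {
        letter: sum(min(shouts[f], need) for f, need in required.items())
        for letter, required in COGNITIVE_DEMONS.items()
    }


def decision_demon(loudness):
    """Pick the cognitive demon that is shouting the loudest."""
    return max(loudness, key=loudness.get)


strokes = ["vertical", "vertical", "horizontal"]
loudness = cognitive_demons(feature_demons(image_demon(strokes)))
print(loudness)                  # {'A': 1, 'H': 3, 'E': 2, 'T': 2, 'O': 0}
print(decision_demon(loudness))  # H
```

Counting matched features is only one way to define "loudness"; it favours letters with many features, which a fuller model might correct by normalising.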
Feature-detection
has also been expanded to identify 'local-precedence' (Martin, 1979) and
'global-precedence' (Navon, 1977) effects. A local-precedence effect occurs
when local (smaller or unique) features are detected in an image, whereas
global-precedence takes place when the features form a larger image or a wider
outline is identified. To better demonstrate these effects, look at the image
below. You will notice that the 'T' shapes on the left are spaced so far
apart that they stand out as individual letters, whereas the image on the
right stands out as a larger 'T' even though it is formed of many
smaller 'L's put together. This is because the 'T's on the left trigger a
local-precedence effect, where less detail causes the individual parts to stand
out more, and the 'L's on the right trigger a global-precedence effect, where more
detail comes together to form a larger, overall image.
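Hierarchical letters like the ones described above can be generated as text. The 5 × 5 layout of the large T and the `spacing` parameter are assumptions chosen for illustration: densely packed small L's tend to be seen as one global T, while widely spaced small T's tend to be seen as separate local letters.

```python
# Sketch of Navon-style hierarchical letters (Navon, 1977): a large
# "global" shape drawn with many small "local" letters. The layout of
# the global T is a hypothetical choice.

GLOBAL_T = ("#####",
            "..#..",
            "..#..",
            "..#..",
            "..#..")


def navon(global_shape, local_letter, spacing=1):
    """Draw global_shape with local_letter, `spacing` blanks between cells."""
    gap = " " * spacing
    lines = []
    for row in global_shape:
        cells = [local_letter if c == "#" else " " for c in row]
        lines.append(gap.join(cells).rstrip())
    return "\n".join(lines)


print(navon(GLOBAL_T, "L"))             # a large T made of small L's
print()
print(navon(GLOBAL_T, "T", spacing=5))  # widely spaced small T's
```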
Structural Description Theories
• Objects represented as configurations of parts (features plus relations among features)
• Retinal image used to extract parts
• Object-centered
• Example: Biederman’s Structural Description Theory
Structural Description Theory (Biederman)
• Objects are represented as arrangements of parts
• The parts are basic geometrical shapes or “Geons”
• Object-centered
• Evidence: degraded line drawings
One of the
concepts that we’ve learned about that relates to a lot of my experiences is
the concept of geons. Geons are part of a theory about how we recognize
objects. The Recognition by Components theory, developed by Biederman in 1987,
incorporates the structural description theory and says that there are 36
three-dimensional shapes that all objects are made up of. These shapes are called
geometrical ions, or geons (or primitives). These geons and the idea that all
objects are made up of them is very similar to the basic process of learning
how to draw. I started drawing when I was really young. Like most kids I
started doodling as soon as I was big enough to hold a crayon. But the hobby
stuck with me and developed over the years. I was self-taught for almost my
entire life and only took an actual art class when I entered high school. It
was difficult at first to kind of unlearn the ways I was used to drawing and
relearn some of the basics of sketching. Some aspects didn’t help improve my
art at all so I didn’t use them as much. But the one important skill I learned
that I’ve taken with me throughout the rest of my life was doing your initial
sketching by using what are, essentially, geons. Visually, everything,
including human figures, is composed of basic two- and three-dimensional shapes like
squares, circles, triangles, and cylinders. Once you can visualize how this
works, it makes drawing much easier. Take a human figure: the head is a circle,
the shoulders and all the joints are circles, the arms and legs are rectangles
or cylinders, the torso is an upside down triangle, the pelvic bone is an
upright triangle, the feet and hands are ovals with thin rectangles protruding
from them. Although a theory about how we recognize objects is obviously
different from a skill used for drawing, the similarities made it easier for me
to understand Recognition by Components theory because in a way, I’d been
practicing a rudimentary version of it for years.
Structural Description Theories
Proponents of
structural description theories propose that objects are represented by parts
and their spatial relationships, which together form a structural description
of an object. These descriptions discard an object's color and texture, for
example, because the appearance of surface properties changes with changes in viewing
conditions (e.g., a change in lighting can change how color appears to an
observer). The basic idea is that the same structural description can be
recovered or otherwise derived from different retinal images of the same object.
This robustness remains an appealing aspect of structural description theories
despite the loss of surface information. Structural description theories have
also been referred to as part-based or edge-based theories, given their
reliance on parts and edges.
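The core idea, that the same structural description is recovered from different images of one object, can be sketched as follows. The part shapes, the relation label "side-attached" and the two lighting conditions are hypothetical. Color is recorded for each image part but dropped from the description, so both images yield the same description.

```python
from dataclasses import dataclass

# Structural description sketch: parts plus spatial relations, with
# surface properties (color) discarded. All names are hypothetical.


@dataclass(frozen=True)
class Part:
    shape: str  # geometric part, e.g. "cylinder"
    color: str  # surface property; changes with lighting


def structural_description(parts, relations):
    """Map each (part, relation, part) triple to shapes only, dropping color."""
    return frozenset(
        (parts[a].shape, rel, parts[b].shape) for a, rel, b in relations
    )


relations = [("handle", "side-attached", "body")]

daylight = {"body": Part("cylinder", "white"), "handle": Part("curved tube", "white")}
dim_light = {"body": Part("cylinder", "grey"), "handle": Part("curved tube", "grey")}

same = structural_description(daylight, relations) == structural_description(dim_light, relations)
print(same)  # True
```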
The first viable
structural description theory for human object perception was proposed by David
Marr and Keith Nishihara. According to their theory, object parts (e.g., a
cat's leg) are represented by 3-D primitives called generalized cones, which
specify arbitrary 3-D shapes with a set of parameters. For example, a
cylinder can be produced by taking a circular cross section and sweeping it
along a straight line. The circle traces out a cylinder with the line forming
the main axis of that cylinder. By comparison, a rectangular cross section
sweeps out the surface of a brick. More complex 3-D shapes can similarly be
produced by sweeping different 2-D cross sections along different axes. One of
the challenges faced by Marr and Nishihara was how 3-D generalized cones can be
recovered from 2-D images. They suggested that an object's bounding contour—the
outline of an object in a picture—could be used to find the axes of its main
parts. These axes could then be used to derive generalized cones and their
spatial configuration. Recognition could then proceed by matching the
structural description recovered from the image to those stored in visual
memory. Thus, Marr and Nishihara tried to solve the invariance problem by
recovering view-invariant 3-D models from images.
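The sweeping construction of generalized cones can be sketched numerically. This parameterization is assumed for illustration and is not Marr and Nishihara's own formulation: a 2-D cross section is copied at evenly spaced points along a straight axis, and an optional `scale` function (the sweeping rule) lets the cross section change size, turning a cylinder into a cone.

```python
import math

# Illustrative generalized-cone construction (assumed parameterization):
# a 2-D cross section is swept along a straight axis (the z axis here).


def circle(radius, n=8):
    """Sample n points on a circular cross section."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]


def rectangle(width, depth):
    """Corner points of a rectangular cross section centred on the axis."""
    w, d = width / 2, depth / 2
    return [(-w, -d), (w, -d), (w, d), (-w, d)]


def sweep(cross_section, length, steps=5, scale=lambda t: 1.0):
    """Copy the cross section at `steps` positions along the axis.

    `scale(t)` is the sweeping rule: the size of the cross section at
    relative position t (0 at one end of the axis, 1 at the other).
    """
    points = []
    for i in range(steps):
        t = i / (steps - 1)
        s = scale(t)
        points.extend((s * x, s * y, t * length) for x, y in cross_section)
    return points


cylinder = sweep(circle(1.0), length=3.0)                       # circle -> cylinder
brick = sweep(rectangle(2.0, 1.0), length=3.0)                  # rectangle -> brick
cone = sweep(circle(1.0), length=3.0, scale=lambda t: 1.0 - t)  # tapers to a point

# Every point of the cylinder lies at distance 1 from the axis.
print({round(math.hypot(x, y), 6) for x, y, _ in cylinder})  # {1.0}
```

Changing only the cross section, the axis, or the sweeping rule produces a different shape, which is the sense in which a small set of parameters specifies each generalized cone.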
Following Marr
and Nishihara's seminal 1978 work, Irving Biederman proposed another influential
structural description theory in the mid-1980s: recognition by components (RBC).
Biederman argues that objects are mentally represented as combinations of components
drawn from a set of 36 basic shapes, together with the spatial relationships among them.
He called these components geons, for "geometrical ions." Geons are a subset of the
generalized cones proposed by Marr and Nishihara, three of which are shown on the top
of Figure 1(b). The combination of these geons into structural descriptions can be used
to create familiar objects like a mug, a pail, or a briefcase, as shown in the
bottom of Figure 1(b).
RBC theory
builds on Marr and Nishihara's structural description theory in two innovative
ways. First, unlike generalized cones, geons differ from each other only
qualitatively. For example, a geon's axis can only be straight or curved, whereas
generalized cones can, in principle, have any degree of curvature. Biederman's
second innovation was to propose a more direct means to recover geons from
images. According to RBC theory, geons are recovered from nonaccidental
properties. These are properties of edges in an image (e.g., lines) that are
associated with properties of edges in the world. To understand nonaccidental
properties, consider seeing a box from many different viewpoints. From most
views, observers see three sides of the box, which meet in a
"Y"-junction at a corner. This two-dimensional junction is an example
of a nonaccidental property, and it is associated with a three-dimensional
corner.
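The claim that geons differ only qualitatively can be illustrated by enumerating them as combinations of a few discrete attributes. The four attributes below (cross-section edges, symmetry, size change along the axis, and axis shape) follow the summary commonly given for Biederman (1987); treat the exact labels as an assumption. Their combinations give 2 × 3 × 3 × 2 = 36 geons, matching the figure of 36 cited in the text.

```python
from itertools import product

# Geons as combinations of qualitative (non-metric) attributes.
# Attribute labels are an assumed summary of Biederman (1987).

ATTRIBUTES = {
    "cross_section_edge": ("straight", "curved"),
    "symmetry": ("rotational and reflectional", "reflectional only", "asymmetric"),
    "size_along_axis": ("constant", "expanding", "expanding then contracting"),
    "axis": ("straight", "curved"),
}

# Every combination of attribute values is one geon.
geons = [dict(zip(ATTRIBUTES, values)) for values in product(*ATTRIBUTES.values())]
print(len(geons))  # 36

# A cylinder: curved cross-section edge, fully symmetric, constant size, straight axis.
cylinder = {"cross_section_edge": "curved", "symmetry": "rotational and reflectional",
            "size_along_axis": "constant", "axis": "straight"}
print(cylinder in geons)  # True
```

Because each attribute takes only a few discrete values, there is no "slightly curved" axis: a small change in viewpoint rarely flips an attribute, which is what makes geons easy to recover from nonaccidental properties.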
· Top-down theory of perception
Top-down processing is an important
perceptual theory in cognitive psychology. The theory establishes the paradigm
that sensory information processing in human cognition, such as perception,
recognition, memory, and comprehension, is organized and shaped by our
previous experience, our expectations, and meaningful context (Solso, 1998).
Top-down
processing suggests that we form our perceptions starting with a larger object,
concept, or idea before working our way toward more detailed information. In
other words, top-down processing happens when we work from the general to the
specific; the big picture to the tiny details. In top-down processing, your
abstract impressions can influence the sensory data that you gather.
Top-down
processing is also known as conceptually-driven processing, since your
perceptions are influenced by expectations, existing beliefs, and cognitions.
In some cases you are aware of these influences, but in other instances this
process occurs without conscious awareness.
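Top-down influence can be sketched with the “THE CAT” example from the section on Gibson: bottom-up analysis leaves an ambiguous glyph that could be read as H or A, and stored knowledge of familiar words selects the reading. The tiny lexicon and the use of "?" for the ambiguous glyph are hypothetical simplifications.

```python
from itertools import product

# Top-down sketch: the ambiguous glyph "?" could be H or A (bottom-up
# data alone cannot decide); knowledge of words (a hypothetical tiny
# lexicon) picks the reading.

LEXICON = {"THE", "CAT"}
AMBIGUOUS = {"?": ("H", "A")}  # one physical form, two possible letters


def readings(word):
    """All letter strings the bottom-up data is consistent with."""
    options = [AMBIGUOUS.get(ch, (ch,)) for ch in word]
    return ["".join(chars) for chars in product(*options)]


def read_with_context(word):
    """Keep only the readings that are known words."""
    return [r for r in readings(word) if r in LEXICON]


print(read_with_context("T?E"))  # ['THE']
print(read_with_context("C?T"))  # ['CAT']
```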
In constructive perception, the
perceiver builds (constructs) a cognitive understanding (perception) of a
stimulus. He or she uses sensory information as the foundation for the structure
but also uses other sources of information to build the perception. This
viewpoint also is known as intelligent perception because it states that
higher-order thinking plays an important role in perception. It also emphasizes
the role of learning in perception (Fahle, 2003). Some investigators have
pointed out that not only does the world affect our perception but also the
world we experience is actually formed by our perception (Goldstone, 2003).
These ideas go back to the philosophy of Immanuel Kant. In other words,
perception is reciprocal with the world we experience. Perception both affects
and is affected by the world as we experience it.
· Computational theory of perception
Marr used the
term "computational theory" to describe this aspect of his approach to
visual perception. The term emphatically does not mean a theory that is just
"something to do with computers". Instead, it expresses the specific
and very powerful idea that the first stage in understanding perception is to
identify the information that a perceiver needs from the world, and the regular
properties of the world that can be incorporated into processes for obtaining
that information. In other words, we need to know what computations a
visual system needs to perform, before attempting to understand how it carries
them out. In later chapters, we will see examples of Marr's application of
computational theory to problems such as detecting the edges of surfaces,
perceiving depth, or recognizing objects. The approach has been widely
influential; we saw an example of the same way of thinking in Chapter 3 (p. 57)
when we discussed the possibility that cells in the visual cortex act as
filters tuned to statistical regularities in images of natural scenes. Indeed,
the computational approach even brings some common ground between Marr's
theory and that of Gibson (see Chapter 14, p. 408).
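As a rough illustration of Marr's distinction, the computational question for edge detection is "what" to find (places where image intensity changes sharply, which often mark surface boundaries), separate from "how" a visual system finds them. The first-difference rule below is one simple, assumed answer to the "how" question, not Marr's own algorithm (which used zero-crossings of filtered images); the intensity profile and threshold are made-up values.

```python
# Simplified 1-D edge-finding sketch. Computational level: edges are
# places where intensity changes sharply. The rule used here (absolute
# first difference above a threshold) is an illustrative stand-in.


def find_edges(intensity, threshold):
    """Return indices i where |I[i+1] - I[i]| exceeds threshold."""
    return [i for i in range(len(intensity) - 1)
            if abs(intensity[i + 1] - intensity[i]) > threshold]


profile = [10, 11, 10, 12, 80, 82, 81, 79, 20, 21]  # dark, bright, dark
print(find_edges(profile, threshold=30))  # [3, 7]
```

A different algorithm could replace `find_edges` entirely while answering the same computational question, which is the separation Marr argued for.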
Computational
theories of perception can be applied not only to human vision but also to
other species, by considering what information an animal needs from light in
order to guide its activities.