


Freiwald, W. A., Tsao, D. Y., & Livingstone, M. S. (2009). A face feature space in the
macaque temporal lobe. Nature Neuroscience, 12, 1187-1196.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual
representation by a single neuron in the human brain. Nature, 435, 1102-1107.
The work of Hubel and Wiesel certainly addresses how our brains allow us to perceive
the lines and angles in Figure 2.1, and subsequent studies by others can explain why we
perceive the shapes of triangles that are not actually in the picture. The work of Hubel
and Wiesel, though, brought to the forefront questions of a different sort: namely, how do
we perceive very complex stimuli, like human faces, and how can we recognize and
discriminate faces so quickly and effortlessly? A very influential paper by Barlow
published in 1972 framed this question, and his answer to it, in terms similar to those of
the Nobel Prize-winning neurophysiologist Charles Sherrington. Sherrington (1941) had
posited that the perception of complex stimuli such as faces might be due to a single
neuron firing that represented the face of a single individual. Sherrington had called such
neurons “pontifical cells” as an analogy to the hierarchy of the Roman Catholic Church.
The Catholic Church hierarchy consists of many, many priests, above whom are a
smaller number of cardinals, and above all these stands the Pope, also called the Pontiff.
Thus, the term pontifical cell was meant to convey the idea that the perception of an
individual face by the brain occurs because a single cell, sitting atop several hierarchical
levels, fires when the owner of that brain sees the face in question. Sherrington rejected
this idea in favor of a “democracy” of neurons; that is, he proposed that a given face
activates a very large number of neurons and the pattern of responding by this large
group of cells is what represents the face. The perception of different faces by an
observer would not be due to different pontifical neurons responding to each different
face, but would be caused by different, large groups of neurons responding to each face,
a concept we will call the network approach. The feature-detecting cells discovered by
Hubel and Wiesel could fit into either of these schemes: They could represent one of the
lower hierarchical levels (e.g., the “parish priests”) or they could be part of the great
many cells composing the “democracy” networks.
Barlow (1972) agreed with Sherrington that pontifical cells were unlikely, but he strongly
objected to the network approach and instead proposed that perception is based
on cardinal cells, or very high-level neurons (one step lower than the pope), each of
which represents a complex set of features. The responding of several such cardinal
cells—but not the vast number proposed by the network approach—would represent
each face an observer saw. Different overlapping sets of cardinal cells would represent
different faces.
The two articles described below are a small sample of the vast array of research that
has addressed the issues raised here. The first is a study done with human epileptic
patients by a team from the California Institute of Technology, the Massachusetts
Institute of Technology, and the University of California at Los Angeles. The team was
led by Quian Quiroga, a neurophysiologist, engineer, and mathematician who is now at
the University of Leicester in the United Kingdom. The second set of experiments used
monkeys as research subjects and was carried out by Winrich Freiwald, Doris Tsao, and
Margaret Livingstone at Harvard Medical School.
Halle Berry and Jennifer Aniston Neurons in the Human Brain
Professor Quiroga and his co-authors open their paper by noting how quickly humans
can visually recognize the face of a familiar person. We are able to almost
instantaneously recognize someone from many different angles, in different settings, and
even in photographs where we cannot rely on the person’s voice or characteristic ways
of moving to help us. The primary question the authors are asking in this article is how
such remarkable recognition capabilities are carried out by the human brain. As Hubel
and Wiesel did with cats, and as we will see in the article by Freiwald, Tsao, and
Livingstone, who experimented with monkeys, such investigations are typically carried out
with non-human animals due to the obvious ethical restrictions on drilling into a person’s
skull to implant an electrode. However, as in the work of Penfield and Rasmussen
described in Unit 1, people with diseases and malfunctions of their brains often serve as
participants in “natural experiments” as part of their therapeutic processes. Such was the
case with the work of Quiroga et al. They studied eight patients with “pharmacologically
intractable epilepsy”—meaning patients whose epileptic seizures could not be controlled
by medication—who had electrodes implanted in their brains.
The eight patients ranged in age from seventeen to forty-seven years, all were right-handed, and five were female. Each had electrodes placed in a temporal lobe location. The purpose of these electrodes was to aid in locating the focus of epileptic
seizures, which are caused by what might be called an “electrical storm in the brain.”
Since other types of imaging procedures were unable to locate exactly the place in the
brain where the seizures began, electrodes were implanted in all eight patients for the
purpose of locating the sources of their seizures. These electrodes were used by
Quiroga and his colleagues to record the responses of individual neurons and small
groups of neurons, much as is done with non-human animals.
The procedure used by Professor Quiroga et al. involved each patient viewing several
visual stimuli on a laptop computer while he or she lay in bed. In each thirty-minute
session, several pictures, all 1.5 inches wide, were shown individually for one second
each. The simple task required the participant to press the “Y” key on the computer if the
picture showed a human and the “N” key if it did not. In the first session, called
the screening session, a given participant saw an average of ninety-four images (range =
71–114) of famous people and famous buildings, as well as animals and other objects.
This set of images included some pictures that were suggested in interviews with each
participant. For the one-second duration that each picture was visible, Quiroga and his
colleagues recorded the activity of several temporal lobe neurons through the implanted
electrodes. The data were then examined for neurons that significantly responded to any
of the images. A significant response occurred when a neuron increased its firing rate by
a large amount when a given image was on the screen relative to when no image was
present. Only about 3% of the images shown in the screening sessions elicited a
significant increase in responding from the neurons being recorded. The experimenters
recorded a total of 993 different units across the eight participants, but only 132 units showed a significant response to at least one of the images. Of these 132 units, sixty-four were single neurons and the other sixty-eight were composed of at least two
neurons (that is, an electrode was picking up the firing of at least two neurons that were
each responding to the image).
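The precise statistical test Quiroga et al. used to declare a "significant response" is not spelled out in this chapter. As an illustration only, the general idea (flag a unit whose firing to some image greatly exceeds its baseline variability) can be sketched in Python; the five-standard-deviation threshold below is an assumption for the sketch, not the paper's actual criterion:

```python
import statistics

def significant_units(baseline_rates, image_rates, n_sd=5.0):
    """Flag units whose firing to any image far exceeds baseline.

    baseline_rates: {unit_id: [spike counts across baseline windows]}
    image_rates:    {unit_id: {image_id: spike count during presentation}}
    The n_sd threshold is an illustrative choice, not the paper's test.
    """
    flagged = {}
    for unit, baseline in baseline_rates.items():
        mean = statistics.mean(baseline)
        sd = statistics.pstdev(baseline) or 1e-9  # floor for silent units
        responsive = [img for img, rate in image_rates[unit].items()
                      if rate > mean + n_sd * sd]
        if responsive:
            flagged[unit] = responsive
    return flagged
```

A unit that never varies at baseline gets a tiny variance floor so the comparison stays defined; under a criterion like this, only a small fraction of image-unit pairs would be flagged, consistent with the roughly 3% figure reported above.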
When a unit responded to an image, it typically responded by firing several action
potentials for the one second that the image appeared on the screen and, in some
cases, firing continued up to one second after the image disappeared. Images that
evoked large responses from at least one neuron in each participant were then used in
one or more testing sessions, which occurred as soon as possible after the screening
session was completed. In these testing sessions, three to eight different variations of
the images (different pictures of the same person, animal, or object) that had evoked
significant responses in the screening session were presented. Some of these variations
were not actually pictures but letters that spelled out the name of one of the people or
objects, the pictures of which were also shown (e.g., several different pictures of Jennifer
Aniston might be shown, but her name would be shown during other trials). During these
testing sessions, each participant viewed an average of approximately eighty-nine images.
Many of the units that responded to these images responded to several different objects
and/or people, although fifty-one were found to respond only to a particular person or
famous building. However, Professor Quiroga and his colleagues devote most of their
report to these peculiar single units (i.e., individual neurons) that responded only to
images of a single person or object, but did so to all, or nearly all, of the different
pictures of that person/object. Quiroga et al. pay particular attention to a single unit
recorded from the left hippocampus (a structure deep within the temporal lobe that is
involved with memory) of one of the participants. To most of the eighty-seven images
shown to this participant during this testing session and during the baseline period when
no images were on the screen, this individual neuron was virtually silent, producing an average of less than 0.03 action potentials per second. However, seven of the pictures were of the actress Jennifer Aniston. When one of these pictures was on the screen, the neuron fired at 9–45 times its baseline rate. This
neuron did not respond, however, to pictures of Aniston together with Brad Pitt, the
husband she divorced the same year that this article was published.
Professor Quiroga and his co-authors also focus on a single unit from a different
participant; this neuron was also located in the patient’s hippocampus but in this case it
was on the right side of the brain. Eight of the ninety-six images shown during this testing
session were different poses of the actress Halle Berry. The baseline firing rate for this
unit was also very low, producing an average of less than 0.09 action potentials per
second. The pictures of Halle Berry caused this neuron to fire at 4–18 times this base
rate. One of the pictures was a drawing of Berry and three were of her dressed as
Catwoman, a movie role she played in the early 2000s. This unit did not respond to
pictures of other actresses dressed in Catwoman costumes. Perhaps the most
remarkable response of this unit was to the letter string “HALLE BERRY,” which was not
a picture at all. This unit did not respond to the names of several other well-known
people such as Kobe Bryant and Julia Roberts. Quiroga et al. report that eight of the
units showed similar selective responding to a particular person and to his/her name.
Not all units recorded by the Quiroga team responded to human faces. One unit is
described that responded preferentially to two buildings, the opera house in Sydney,
Australia, and the Baha’i Lotus Temple in New Delhi, India. These two structures, which
do look somewhat alike, were both identified as the Sydney Opera House by the
participant; this neuron also responded to the letters “SYDNEY OPERA” but not to other
strings of letters, even those that named buildings.
After presenting these and other data regarding how extremely selective some of the
recorded neurons are, Professor Quiroga and his co-authors address the issue raised at
the beginning of this chapter: Is the perception of individual persons or objects based on
the responding of single, high-level “pontifical” cells or on the activation of large,
distributed networks of neurons, each of which contributes a part or feature of the greater
whole? The authors reject the network explanation by saying:
In the latter case, recognition would require the simultaneous activation of a large
number of cells and therefore we would expect each cell to respond to many pictures
with similar basic features. This is in contrast to the sparse firing we observe, because
most [medial temporal lobe] cells do not respond to the great majority of images seen by
the patient. (p. 1106)
Another way of expressing their argument is that if any one neuron is only responding to
a single feature—for example, a certain face shape—then that cell should respond to
every image that contains that face shape. This was not what these investigators found.
Rather, they found a small but significant number of units that responded only to a single
individual, as a pontifical cell would.
However, Quiroga et al. explicitly state that they do not feel that their results support the
pontifical cell theory, at least in its extreme form whereby the recognition of any person,
animal, or object requires the activation of a single neuron that identifies that person,
animal, or object. One reason why they reject this extreme view is that many of the
neurons they measured did respond to more than one subject. More importantly, the
authors note that they only presented a very small number of images to the participants.
In theory, in order to prove that a single neuron fires only to a single person, pictures of
every face on the planet would have to be presented and the neuron would respond to
only one. Quiroga and colleagues do, though, say that their findings support a position
closer to that of the pontifical cell concept rather than that of the distributed network:
“These results suggest an invariant, sparse and explicit code, which might be important
in the transformation of complex visual percepts into long-term and more abstract
memories” (p. 1102).
Probing the Brains of Monkeys for Face Features
The article by Quiroga, Reddy, Kreiman, Koch, and Fried is fascinating in that a few of
the neurons that they describe appear to identify an individual person in many different
contexts and in several modalities (i.e., photographs, drawings, and letters). However, as
Quiroga et al. caution, we cannot conclude that these cells respond only to
representations of that one person because the investigators did not—and obviously
cannot—present images of every possible face. Still, the neurons that Professor Quiroga
and his colleagues discovered responded to very complex stimuli, far more complex than
the simple edges that Hubel and Wiesel found. And even though we may never be able
to determine if true pontifical cells exist, understanding more about how we—and our
brains—perceive people and things will always be a goal of researchers from many
different branches of science.
To this end, Freiwald, Tsao, and Livingstone approached the problem from a more
experimental and analytic perspective. What that means in this case is that they (a)
studied monkeys rather than humans, thus allowing the investigators to control virtually
all variables (such as where the electrodes would be placed in the brain); and (b)
reduced the number of features contained within each stimulus image. Freiwald, Tsao,
and Livingstone did this by using cartoon faces. These cartoon faces had only seven
parts (face outline, hair, eyes, irises of the eyes, eyebrows, mouth, and nose) and
nineteen different features of those seven parts (e.g., mouth shape, eye size, hair length,
inter-eye distance, etc.), with each feature only taking on one of eleven different values
(e.g., eleven different sizes of eyes).
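This stimulus space, nineteen features each taking one of eleven values, can be pictured as a simple parameter vector. A minimal sketch in Python (the feature names below are examples drawn from the text; the paper's full list of nineteen is not reproduced in this chapter):

```python
import random

# Each cartoon face is a vector of feature values, one of 11 possible
# values (0-10) per feature. These names are illustrative examples;
# the full set of nineteen features is not listed in this chapter.
FEATURES = ["inter_eye_distance", "iris_size", "face_aspect_ratio",
            "mouth_shape", "eye_size", "hair_length", "feature_height"]

def random_face(features=FEATURES, n_values=11, rng=random):
    """Draw one cartoon face: one of n_values values for every feature."""
    return {f: rng.randrange(n_values) for f in features}
```

With nineteen features and eleven values each, the number of distinct cartoons is 11 to the 19th power, so only a tiny sample of the space can ever be shown.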
Professor Freiwald and his colleagues first used functional magnetic resonance imaging
(fMRI) to locate an area in the brains of the three male rhesus monkeys they studied that
responded predominantly to images of faces rather than to objects. This area, called
the middle face patch, is located in the temporal lobe (which is also the part of the brain
from which the Quiroga team recorded). Electrodes were implanted into the middle face
patch of each monkey and it was confirmed that this area responded almost entirely to
images of faces; responses recorded from individual neurons to cartoon faces were
nearly as strong as responses to actual faces. During recording sessions, each monkey
was tested individually in a small darkened chamber and his head was held in a fixed
position. Unlike in Hubel and Wiesel's cats, the monkeys' eye muscles were not paralyzed
with drugs to keep the eyes from moving. However, the monkeys were given a reward of fruit
juice for keeping their eyes fixated on the screen where the faces were projected.
The first thing that Professor Freiwald and his colleagues examined was how selectively
the sampled neurons responded to the parts of a face. They did this by presenting all
128 possible decompositions of the seven face parts to the monkeys. Of the thirty-three
neurons from which they recorded, Freiwald et al. report that all were influenced by at
least one part of the face, but at most only four parts; none responded to all seven face
parts. This suggested to the authors that these neurons were not simple feature
detectors like those of Hubel and Wiesel because the cells responded to multiple parts
and to combinations of certain parts. The authors go on to say:
Notably, middle face patch neurons did not have a single best stimulus that uniquely
elicited the maximum firing rate . . . As a consequence, the same cell often fired at its
maximum rate to both the whole face and to a variety of partial faces. (p. 1188)
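The 128 decompositions come from showing every possible subset of the seven face parts (2 to the 7th power is 128, counting the blank display and the whole face). A short sketch of how such a stimulus set can be enumerated:

```python
from itertools import combinations

# The seven cartoon face parts named in the text.
PARTS = ["outline", "hair", "eyes", "irises", "eyebrows", "mouth", "nose"]

def all_decompositions(parts):
    """Every subset of face parts, from the blank display to the whole face."""
    subsets = []
    for k in range(len(parts) + 1):
        subsets.extend(combinations(parts, k))
    return subsets
```

With seven parts this yields exactly 128 stimuli, matching the count used in the experiment.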
The next experiment that the authors carried out was to present cartoon faces that
contained all seven parts, but which varied each of the nineteen different features (e.g.,
mouth shape) in eleven ways (e.g., eleven different mouth shapes, such as a slight
smile, a big smile, a slight frown, etc.). In this experiment, 272 neurons were studied,
with 90% of those neurons responding differently to the different variations of at least
one given feature, an effect Freiwald et al. call tuning. For example, one neuron was
tuned to three of the nineteen features, namely inter-eye distance, iris size, and face
aspect ratio, which was how round or narrow the face was. For iris size and face aspect
ratio, the smallest of the eleven possible values evoked minimal responding by the
neuron. As these values increased (e.g., the irises got larger), the rate of responding by
the neuron increased as well. For inter-eye distance, the maximal neuronal response
was elicited by the smallest distance between the eyes; the response rate gradually
decreased as the inter-eye distance increased. Iris size also affected the response rate
of another neuron in a linearly increasing manner, but this neuron was also tuned to the
feature in which the eyes, nose, and mouth were placed high or low within the face.
Here, the neuron responded with a maximum rate of firing when the eyes, nose, and
mouth were lower and close to the chin; the response gradually diminished as these
parts were placed upward toward the forehead. The authors report that 62% of the
neurons that were tuned to variations of a particular feature showed their maximum
responding at either the lowest or highest value for that feature, and 67% showed
minimal responding at one of these two extremes. Two-thirds of these tunings exhibited
a maximum response at one extreme and a minimum response at the other. In general,
a given neuron showed tuning to three features. All nineteen features showed evidence
of this ramp-like tuning, where one extreme value evoked the highest response rate and
the other extreme evoked the lowest. Regarding these findings, Freiwald et al. state:
“Figuratively speaking, these cells measure feature dimensions” (p. 1191).
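The ramp-like tuning described above, in which the firing rate climbs steadily from one extreme value of a feature to the other, amounts to a roughly linear response function. A minimal sketch with made-up parameters (the baseline and gain are illustrative numbers, not fitted values from the paper):

```python
def ramp_response(feature_value, baseline=1.0, gain=0.8):
    """Illustrative ramp tuning: the firing rate rises linearly with the
    feature value, so the minimum response falls at one extreme (value 0)
    and the maximum at the other (value 10). Baseline and gain are
    made-up numbers, not fitted values from the paper."""
    return baseline + gain * feature_value

# Responses across the eleven possible feature values (0 through 10).
rates = [ramp_response(v) for v in range(11)]
```

A neuron tuned this way effectively reports where a feature sits along its dimension, which is what the authors mean by cells that "measure feature dimensions."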
Professor Freiwald and his colleagues next examined forty-nine neurons that were tuned
to a given feature. They varied the values of that feature but removed all other features.
For instance, if a neuron showed tuning to iris size, then only the various sizes of the
irises were shown without any other features, including the outline of the face. Fourteen
of the forty-nine neurons studied stopped showing their tuning effects. The other
thirty-five cells still responded at differing rates depending on the value of the
feature (i.e., they still showed tuning to that feature), but the rates of responding of these cells were
significantly diminished when only that one feature, and not the rest of the face, was
presented. Freiwald et al. interpreted this to mean that these neurons respond to
variations in a single feature, but maximum responding requires that feature to be within
the larger context of an entire face.
The research team further investigated this finding by presenting complete cartoon faces
both in the usual upright position and upside down. When they did this, they found that
tuning to features decreased by 25% and that tuning to the slant of the eyebrows was
lost entirely.
Freiwald, Tsao, and Livingstone draw a number of conclusions from their experiments:
“No cell required the presence of a whole face to respond, indicating
that the detection process is not strictly holistic” (p. 1194). In other
words, no evidence for pontifical cells was found.
“[D]ifferent cells were selective for different face parts and interactions
between parts . . . and even the same cell can respond maximally to
different combinations of face parts” (p. 1194). Although no cell
responded to all the parts or features of a face, these temporal lobe
neurons detected fairly complex features as well as combinations of
features that influenced each other.
“The mechanism for distinguishing between individual faces appears to
rely on a division of labor among cells tuned to different subset of facial
features” (p. 1194).
This thorough study by Freiwald, Tsao, and Livingstone appears in many ways to
generally support the ideas of Barlow (1972) that visual perception of faces involves
neither pontifical cells nor large networks of neurons, but something more like the
cardinal cells he proposed, that is, very high-level neurons, each of which represents a
complex set of features. However, it takes a number of these neurons responding
together to represent the rich, visual perceptual world that we humans enjoy. Freiwald et
al. added more to our understanding: Many neurons are sensitive to multiple features,
not in a manner of simply signaling “yes” or “no” to the presence of those features but
scaling those features to indicate, for example, the size or placement of that feature.
Finally, the authors add the idea that sensitivity to these features is accentuated when
those features are presented within the context of a face. These discoveries have added
tremendously to our understanding of how our brains perceive our visual world. Do they
explain how and why we can recognize almost instantaneously hundreds of different
people? Perhaps not, but we are getting there.
Barlow, H. B. (1972). Single units and sensation: A neuron doctrine for perceptual
psychology? Perception, 1, 371-394.
Sherrington, C. S. (1941). Man on his nature. Macmillan Publishers.
