Sensory Systems/old/Biological Machines/Print version

From Wikibooks, open books for an open world
Jump to navigation Jump to search

The Wikibook of

Sensory Systems

Biological Organisms, an Engineer's Point of View.

From Wikibooks: The Free Library


Biological Machines/Preface

Table of Contents

Sensory Systems

Human Anatomy and Physiology

General Features

Technological Aspects

Other Animals

Additional Information


While the human brain may make us what we are, our sensory systems are our windows and doors to the world. In fact they are our ONLY windows and doors to the world. So when one of these systems fails, the corresponding part of our world is no longer accessible to us. Recent advances in engineering have made it possible to replace sensory systems by mechanical and electrical sensors, and to couple those sensors electronically to our nervous system. While to many this may sound futuristic and maybe even a bit scary, it can work magically. For the auditory system, so called “cochlea implants” have given thousands of patients who were completely deaf their hearing back, so that they can interact and communicate freely again with their family and friends. Many research groups are also exploring different approaches to retinal implants, in order to restore vision to the blind. And in 2010 the first patient has been implanted with a “vestibular implant”, to alleviate defects in his balance system.

The wikibook “Sensory Systems” wants to present our sensory systems from an engineering and information processing point of view. On the one hand, this provides some insight in the sometimes spectacular ingenuity and performance of our senses. On the other hand, it provides some understanding of how our senses transduce external information into signals that our central nervous system can work with, and how – and how well - this process can be replaced by technical components.

Sensory Systems

Biological Machines/Sensory Systems

Visual System

Technological Aspects
In Animals


Generally speaking, visual systems rely on electromagnetic (EM) waves to give an organism more information about its surroundings. This information could be regarding potential mates, dangers and sources of sustenance. Different organisms have different constituents that make up what is referred to as a visual system.

The complexity of eyes range from something as simple as an eye spot, which is nothing more than a collection of photosensitive cells, to a fully fledged camera eye. If an organism has different types of photosensitive cells, or cells sensitive to different wavelength ranges, the organism would theoretically be able to perceive colour or at the very least colour differences. Polarisation, another property of EM radiation, can be detected by some organisms, with insects and cephalopods having the highest accuracy.

Please note, in this text, the focus has been on using EM waves to see. Granted, some organisms have evolved alternative ways of obtaining sight or at the very least supplementing what they see with extra-sensory information. For example, whales or bats, which use echo-location. This may be seeing in some sense of the definition of the word, but it is not entirely correct. Additionally, vision and visual are words most often associated with EM waves in the visual wavelength range, which is normally defined as the same wavelength limits of human vision.

Electromagnetic spectrum

Since some organisms detect EM waves with frequencies below and above that of humans a better definition must be made. We therefore define the visual wavelength range as wavelengths of EM between 300nm and 800nm. This may seem arbitrary to some, but selecting the wrong limits would render parts of some bird's vision as non-vision. Also, with this range of wavelengths, we have defined for example the thermal-vision of certain organisms, like for example snakes as non-vision. Therefore snakes using their pit organs, which is sensitive to EM between 5000nm and 30,000nm (IR), do not "see", but somehow "feel" from afar. Even if blind specimens have been documented targeting and attacking particular body parts.

Firstly a brief description of different types of visual system sensory organs will be elaborated on, followed by a thorough explanation of the components in human vision, the signal processing of the visual pathway in humans and finished off with an example of the perceptional outcome due to these stages.

Sensory Organs

Vision, or the ability to see depends on visual system sensory organs or eyes. There are many different constructions of eyes, ranging in complexity depending on the requirements of the organism. The different constructions have different capabilities, are sensitive to different wave-lengths and have differing degrees of acuity, also they require different processing to make sense of the input and different numbers to work optimally. The ability to detect and decipher EM has proved to be a valuable asset to most forms of life, leading to an increased chance of survival for organisms that utilise it. In environments without sufficient light, or complete lack of it, lifeforms have no added advantage of vision, which ultimately has resulted in atrophy of visual sensory organs with subsequent increased reliance on other senses (e.g. some cave dwelling animals, bats etc.). Interestingly enough, it appears that visual sensory organs are tuned to the optical window, which is defined as the EM wavelengths (between 300nm and 1100nm) that pass through the atmosphere reaching to the ground. This is shown in the figure below. You may notice that there exists other "windows", an IR window, which explains to some extent the thermal-"vision" of snakes, and a radiofrequency (RF) window, of which no known lifeforms are able to detect.

Through time evolution has yielded many eye constructions, and some of them have evolved multiple times, yielding similarities for organisms that have similar niches. There is one underlying aspect that is essentially identical, regardless of species, or complexity of sensory organ type, the universal usage of light-sensitive proteins called opsins. Without focusing too much on the molecular basis though, the various constructions can be categorised into distinct groups:

  • Spot Eyes
  • Pit Eyes
  • Pinhole Eyes
  • Lens Eyes
  • Refractive Cornea Eyes
  • Reflector Eyes
  • Compound Eyes

The least complicated configuration of eyes enable organisms to simply sense the ambient light, enabling the organism to know whether there is light or not. It is normally simply a collection of photosensitive cells in a cluster in the same spot, thus sometimes referred to as spot eyes, eye spot or stemma. By either adding more angular structures or recessing the spot eyes, an organisms gains access to directional information as well, which is a vital requirement for image formation. These so called pit eyes are by far the most common types of visual sensory organs, and can be found in over 95% of all known species.

Pinhole eye

Taking this approach to the obvious extreme leads to the pit becoming a cavernous structure, which increases the sharpness of the image, alas at a loss in intensity. In other words, there is a trade-off between intensity or brightness and sharpness. An example of this can be found in the Nautilus, species belonging to the family Nautilidae, organisms considered to be living fossils. They are the only known species that has this type of eye, referred to as the pinhole eye, and it is completely analogous to the pinhole camera or the camera obscura. In addition, like more advanced cameras, Nautili are able to adjust the size of the aperture thereby increasing or decreasing the resolution of the eye at a respective decrease or increase in image brightness. Like the camera, the way to alleviate the intensity/resolution trade-off problem is to include a lens, a structure that focuses the light unto a central area, which most often has a higher density of photo-sensors. By adjusting the shape of the lens and moving it around, and controlling the size of the aperture or pupil, organisms can adapt to different conditions and focus on particular regions of interest in any visual scene. The last upgrade to the various eye constructions already mentioned is the inclusion of a refractive cornea. Eyes with this structure have delegated two thirds of the total optic power of the eye to the high refractive index liquid inside the cornea, enabling very high resolution vision. Most land animals, including humans have eyes of this particular construct. Additionally, many variations of lens structure, lens number, photosensor density, fovea shape, fovea number, pupil shape etc. exists, always, to increase the chances of survival for the organism in question. These variations lead to a varied outward appearance of eyes, even with a single eye construction category. Demonstrating this point, a collection of photographs of animals with the same eye category (refractive cornea eyes) is shown below.

Refractive Cornea Eyes
Hawk Eye
Sheep Eye
Cat Eye
Human Eye
Crocodile Eye

An alternative to the lens approach called reflector eyes can be found in for example mollusks. Instead of the conventional way of focusing light to a single point in the back of the eye using a lens or a system of lenses, these organisms have mirror like structures inside the chamber of the eye that reflects the light into a central portion, much like a parabola dish. Although there are no known examples of organisms with reflector eyes capable of image formation, at least one species of fish, the spookfish (Dolichopteryx longipes) uses them in combination with "normal" lensed eyes.

Compound eye

The last group of eyes, found in insects and crustaceans, is called compound eyes. These eyes consist of a number of functional sub-units called ommatidia, each consisting of a facet, or front surface, a transparent crystalline cone and photo-sensitive cells for detection. In addition each of the ommatidia are separated by pigment cells, ensuring the incoming light is as parallel as possible. The combination of the outputs of each of these ommatidia form a mosaic image, with a resolution proportional to the number of ommatidia units. For example, if humans had compound eyes, the eyes would have covered our entire faces to retain the same resolution. As a note, there are many types of compound eyes, but delving to deep into this topic is beyond the scope of this text.

Not only the type of eyes vary, but also the number of eyes. As you are well aware of, humans usually have two eyes, spiders on the other hand have a varying number of eyes, with most species having 8. Normally the spiders also have varying sizes of the different pairs of eyes and the differing sizes have different functions. For example, in jumping spiders 2 larger front facing eyes, give the spider excellent visual acuity, which is used mainly to target prey. 6 smaller eyes have much poorer resolution, but helps the spider to avoid potential dangers. Two photographs of the eyes of a jumping spider and the eyes of a wolf spider are shown to demonstrate the variability in the eye topologies of arachnids.

Anatomy of the Visual System

We humans are visual creatures, therefore our eyes are complicated with many components. In this chapter, an attempt is made to describe these components, thus giving some insight into the properties and functionality of human vision.

Getting inside of the eyeball - Pupil, iris and the lens

Light rays enter the eye structure through the black aperture or pupil in the front of the eye. The black appearance is due to the light being fully absorbed by the tissue inside the eye. Only through this pupil can light enter into the eye which means the amount of incoming light is effectively determined by the size of the pupil. A pigmented sphincter surrounding the pupil functions as the eye's aperture stop. It is the amount of pigment in this iris, that give rise to the various eye colours found in humans.

In addition to this layer of pigment, the iris has 2 layers of ciliary muscles. A circular muscle called the pupillary sphincter in one layer, that contracts to make the pupil smaller. The other layer has a smooth muscle called the pupillary dilator, which contracts to dilate the pupil. The combination of these muscles can thereby dilate/contract the pupil depending on the requirements or conditions of the person. The ciliary muscles are controlled by ciliary zonules, fibres that also change the shape of the lens and hold it in place.

The lens is situated immediately behind the pupil. Its shape and characteristics reveal a similar purpose to that of camera lenses, but they function in slightly different ways. The shape of the lens is adjusted by the pull of the ciliary zonules, which consequently changes the focal length. Together with the cornea, the lens can change the focus, which makes it a very important structure indeed, however only one third of the total optical power of the eye is due to the lens itself. It is also the eye's main filter. Lens fibres make up most of the material for the lense, which are long and thin cells void of most of the cell machinery to promote transparency. Together with water soluble proteins called crystallins, they increase the refractive index of the lens. The fibres also play part in the structure and shape of the lens itself.

Schematic diagram of the human eye

Beamforming in the eye – Cornea and its protecting agent - Sclera

Structure of the Cornea

The cornea, responsible for the remaining 2/3 of the total optical power of the eye, covers the iris, pupil and lens. It focuses the rays that pass through the iris before they pass through the lens. The cornea is only 0.5mm thick and consists of 5 layers:

  • Epithelium: A layer of epithelial tissue covering the surface of the cornea.
  • Bowman's membrane: A thick protective layer composed of strong collagen fibres, that maintain the overall shape of the cornea.
  • Stroma: A layer composed of parallel collagen fibrils. This layer makes up 90% of the cornea's thickness.
  • Descemet's membrane and Endothelium: Are two layers adjusted to the anterior chamber of the eye filled with aqueous humor fluid produced by the ciliary body. This fluid moisturises the lens, cleans it and maintains the pressure in the eye ball. The chamber, positioned between cornea and iris, contains a trabecular meshwork body through which the fluid is drained out by Schlemm canal, through posterior chamber.

The surface of the cornea lies under two protective membranes, called the sclera and Tenon’s capsule. Both of these protective layers completely envelop the eyeball. The sclera is built from collagen and elastic fibres, which protect the eye from external damages, this layer also gives rise to the white of the eye. It is pierced by nerves and vessels with the largest hole reserved for the optic nerve. Moreover, it is covered by conjunctiva, which is a clear mucous membrane on the surface of the eyeball. This membrane also lines the inside of the eyelid. It works as a lubricant and, together with the lacrimal gland, it produces tears, that lubricate and protect the eye. The remaining protective layer, the eyelid, also functions to spread this lubricant around.

Moving the eyes – extra-ocular muscles

The eyeball is moved by a complicated muscle structure of extra-ocular muscles consisting of four rectus muscles – inferior, medial, lateral and superior and two oblique – inferior and superior. Positioning of these muscles is presented below, along with functions:

Extra-ocular muscles: Green - Lateral Rectus; Red - Medial Rectus; Cyan - Superior Rectus; Pink - Inferior Rectus; Dark Blue - Superior Oblique; Yellow - Inferior Oblique.

As you can see, the extra-ocular muscles (2,3,4,5,6,8) are attached to the sclera of the eyeball and originate in the annulus of Zinn, a fibrous tendon surrounding the optic nerve. A pulley system is created with the trochlea acting as a pulley and the superior oblique muscle as the rope, this is required to redirect the muscle force in the correct way. The remaining extra-ocular muscles have a direct path to the eye and therefore do not form these pulley systems. Using these extra-ocular muscles, the eye can rotate up, down, left, right and alternative movements are possible as a combination of these.

Other movements are also very important for us to be able to see. Vergence movements enable the proper function of binocular vision. Unconscious fast movements called saccades, are essential for people to keep an object in focus. The saccade is a sort of jittery movement performed when the eyes are scanning the visual field, in order to displace the point of fixation slightly. When you follow a moving object with your gaze, your eyes perform what is referred to as smooth pursuit. Additional involuntary movements called nystagmus are caused by signals from the vestibular system, together they make up the vestibulo-ocular reflexes.

The brain stem controls all of the movements of the eyes, with different areas responsible for different movements.

  • Pons: Rapid horizontal movements, such as saccades or nystagmus
  • Mesencephalon: Vertical and torsional movements
  • Cerebellum: Fine tuning
  • Edinger-Westphal nucleus: Vergence movements

Where the vision reception occurs – The retina

Filtering of the light performed by the cornea, lens and pigment epithelium

Before being transduced, incoming EM passes through the cornea, lens and the macula. These structures also act as filters to reduce unwanted EM, thereby protecting the eye from harmful radiation. The filtering response of each of these elements can be seen in the figure "Filtering of the light performed by cornea, lens and pigment epithelium". As one may observe, the cornea attenuates the lower wavelengths, leaving the higher wavelengths nearly untouched. The lens blocks around 25% of the EM below 400nm and more than 50% below 430nm. Finally, the pigment ephithelium, the last stage of filtering before the photo-reception, affects around 30% of the EM between 430nm and 500nm.

A part of the eye, which marks the transition from non-photosensitive region to photosensitive region, is called the ora serrata. The photosensitive region is referred to as the retina, which is the sensory structure in the back of the eye. The retina consists of multiple layers presented below with millions of photoreceptors called rods and cones, which capture the light rays and convert them into electrical impulses. Transmission of these impulses is nervously initiated by the ganglion cells and conducted through the optic nerve, the single route by which information leaves the eye.

Structure of retina including the main cell components: RPE: retinal pigment epithelium; OS: outer segment of the photoreceptor cells; IS: inner segment of the photoreceptor cells; ONL: outer nuclear layer; OPL: outer plexiform layer; INL: inner nuclear layer IPL: inner plexiform layer; GC: ganglion cell layer; P: pigment epithelium cell; BM: Bruch-Membran; R: rods; C: cones; H: horizontal cell; B: bipolar cell; M: Müller cell; A:amacrine cell; G: ganglion cell; AX: Axon; arrow: Membrane limitans externa.

A conceptual illustration of the structure of the retina is shown on the right. As we can see, there are five main cell types:

  • photoreceptor cells
  • horizontal cells
  • bipolar cells
  • amacrine cells
  • ganglion cells

Photoreceptor cells can be further subdivided into two main types called rods and cones. Cones are much less numerous than rods in most parts of the retina, but there is an enormous aggregation of them in the macula, especially in its central part called the fovea. In this central region, each photo-sensitive cone is connected to one ganglion-cell. In addition, the cones in this region are slightly smaller than the average cone size, meaning you get more cones per area. Because of this ratio, and the high density of cones, this is where we have the highest visual acuity.

Distribution of Cones and Rods on Human Retina

There are 3 types of human cones, each of the cones responding to a specific range of wavelengths, because of three types of a pigment called photopsin. Each pigment is sensitive to red, blue or green wavelength of light, so we have blue, green and red cones, also called S-, M- and L-cones for their sensitivity to short-, medium- and long-wavelength respectively. It consists of protein called opsin and a bound chromphore called the retinal. The main building blocks of the cone cell are the synaptic terminal, the inner and outer segments, the interior nucleus and the mitochondria.

The spectral sensitivities of the 3 types of cones:

  • 1. S-cones absorb short-wave light, i.e. blue-violet light. The maximum absorption wavelength for the S-cones is 420nm
  • 2. M-cones absorb blue-green to yellow light. In this case The maximum absorption wavelength is 535nm
  • 3. L-cones absorb yellow to red light. The maximum absorption wavelength is 565nm
Cone cell structure

The inner segment contains organelles and the cell's nucleus and organelles. The pigment is located in the outer segment, attached to the membrane as trans-membrane proteins within the invaginations of the cell-membrane that form the membranous disks, which are clearly visible in the figure displaying the basic structure of rod and cone cells. The disks maximize the reception area of the cells. The cone photoreceptors of many vertebrates contain spherical organelles called oil droplets, which are thought to constitute intra-ocular filters which may serve to increase contrast, reduce glare and lessen chromatic aberrations caused by the mitochondrial size gradient from the periphery to the centres.

Rods have a structure similar to cones, however they contain the pigment rhodopsin instead, which allows them to detect low-intensity light and makes them 100 times more sensitive than cones. Rhodopsin is the only pigment found in human rods, and it is found on the outer side of the pigment epithelium, which similarly to cones maximizes absorption area by employing a disk structure. Similarly to cones, the synaptic terminal of the cell joins it with a bipolar cell and the inner and outer segments are connected by cilium.

The pigment rhodopsin absorbs the light between 400-600nm, with a maximum absorption at around 500nm. This wavelength corresponds to greenish-blue light which means blue colours appear more intense in relation to red colours at night.

Rod cell structure
The sensitivity of cones and rods across visible EM

EM waves with wavelengths outside the range of 400 – 700 nm are not detected by either rods nor cones, which ultimately means they are not visible to human beings.

Horizontal cells occupy the inner nuclear layer of the retina. There are two types of horizontal cells and both types hyper-polarise in response to light i.e. they become more negative. Type A consists of a subtype called HII-H2 which interacts with predominantly S-cones. Type B cells have a subtype called HI-H1, which features a dendrite tree and an axon. The former contacts mostly M- and L-cone cells and the latter rod cells. Contacts with cones are made mainly by prohibitory synapses, while the cells themselves are joined into a network with gap junctions.

Cross-section of the human retina, with bipolar cells indicated in red.

Bipolar cells spread single dendrites in the outer plexiform layer and the perikaryon, their cell bodies, are found in the inner nuclear layer. Dendrites interconnect exclusively with cones and rods and we differentiate between one rod bipolar cell and nine or ten cone bipolar cells. These cells branch with amacrine or ganglion cells in the inner plexiform layer using an axon. Rod bipolar cells connect to triad synapses or 18-70 rod cells. Their axons spread around the inner plexiform layer synaptic terminals, which contain ribbon synapses and contact a pair of cell processes in dyad synapses. They are connected to ganglion cells with AII amacrine cell links.

Amecrine cells can be found in the inner nuclear layer and in the ganglion cell layer of the retina. Occasionally they are found in the inner plexiform layer, where they work as signal modulators. They have been classified as narrow-field, small-field, medium-field or wide-field depending on their size. However, many classifications exist leading to over 40 different types of amecrine cells.

Ganglion cells are the final transmitters of visual signal from the retina to the brain. The most common ganglion cells in the retina is the midget ganglion cell and the parasol ganglion cell. The signal after having passed through all the retinal layers is passed on to these cells which are the final stage of the retinal processing chain. All the information is collected here forwarded to the retinal nerve fibres and optic nerves. The spot where the ganglion axons fuse to create an optic nerve is called the optic disc. This nerve is built mainly from the retinal ganglion axons and Portort cells. The majority of the axons transmit data to the lateral geniculate nucleus, which is a termination nexus for most parts of the nerve and which forwards the information to the visual cortex. Some ganglion cells also react to light, but because this response is slower than that of rods and cones, it is believed to be related to sensing ambient light levels and adjusting the biological clock.

Signal Processing

As mentioned before the retina is the main component in the eye, because it contains all the light sensitive cells. Without it, the eye would be comparable to a digital camera without the CCD (Charge Coupled Device) sensor. This part elaborates on how the retina perceives the light, how the optical signal is transmitted to the brain and how the brain processes the signal to form enough information for decision making.

Creation of the initial signals - Photosensor Function

Vision invariably starts with light hitting the photo-sensitive cells found in the retina. Light-absorbing visual pigments, a variety of enzymes and transmitters in retinal rods and cones will initiate the conversion from visible EM stimuli into electrical impulses, in a process known as photoelectric transduction. Using rods as an example, the incoming visible EM hits rhodopsin molecules, transmembrane molecules found in the rods' outer disk structure. Each rhodopsin molecule consists of a cluster of helices called opsin that envelop and surround 11-cis retinal, which is the part of the molecule that will change due to the energy from the incoming photons. In biological molecules, moieties, or parts of molecules that will cause conformational changes due to this energy is sometimes referred to as chromophores. 11-cis retinal straightens in response to the incoming energy, turning into retinal (all-trans retinal), which forces the opsin helices further apart, causing particular reactive sites to be uncovered. This "activated" rhodopsin molecule is sometimes referred to as Metarhodopsin II. From this point on, even if the visible light stimulation stops, the reaction will continue. The Metarhodopsin II can then react with roughly 100 molecules of a Gs protein called transducing, which then results in as and ß? after the GDP is converted into GTP. The activated as-GTP then binds to cGMP-phosphodiesterase(PDE), suppressing normal ion-exchange functions, which results in a low cytosol concentration of cation ions, and therefore a change in the polarisation of the cell.

The natural photoelectric transduction reaction has an amazing power of amplification. One single retinal rhodopsin molecule activated by a single quantum of light causes the hydrolysis of up to 106 cGMP molecules per second.

Photo Transduction
Representation of molecular steps in photoactivation (modified from Leskov et al., 2000). Depicted is an outer membrane disk in a rod. Step 1: Incident photon (hν) is absorbed and activates a rhodopsin by conformational change in the disk membrane to R*. Step 2: Next, R* makes repeated contacts with transducin molecules, catalyzing its activation to G* by the release of bound GDP in exchange for cytoplasmic GTP (Step 3). The α and γ subunit G* binds inhibitory γ subunits of the phosphodiesterase (PDE) activating its α and ß subunits. Step 4: Activated PDE hydrolyzes cGMP. Step 5: Guanylyl cyclase (GC) synthesizes cGMP, the second messenger in the phototransduction cascade. Reduced levels of cytosolic cGMP cause cyclic nucleotide gated channels to close preventing further influx of Na+ and Ca2+.
  1. A light photon interacts with the retinal in a photoreceptor. The retinal undergoes isomerisation, changing from the 11-cis to all-trans configuration.
  2. Retinal no longer fits into the opsin binding site.
  3. Opsin therefore undergoes a conformational change to metarhodopsin II.
  4. Metarhodopsin II is unstable and splits, yielding opsin and all-trans retinal.
  5. The opsin activates the regulatory protein transducin. This causes transducin to dissociate from its bound GDP, and bind GTP, then the alpha subunit of transducin dissociates from the beta and gamma subunits, with the GTP still bound to the alpha subunit.
  6. The alpha subunit-GTP complex activates phosphodiesterase.
  7. Phosphodiesterase breaks down cGMP to 5'-GMP. This lowers the concentration of cGMP and therefore the sodium channels close.
  8. Closure of the sodium channels causes hyperpolarization of the cell due to the ongoing potassium current.
  9. Hyperpolarization of the cell causes voltage-gated calcium channels to close.
  10. As the calcium level in the photoreceptor cell drops, the amount of the neurotransmitter glutamate that is released by the cell also drops. This is because calcium is required for the glutamate-containing vesicles to fuse with cell membrane and release their contents.
  11. A decrease in the amount of glutamate released by the photoreceptors causes depolarization of On center bipolar cells (rod and cone On bipolar cells) and hyperpolarization of cone Off bipolar cells.

Without visible EM stimulation, rod cells containing a cocktail of ions, proteins and other molecules, have membrane potential differences of around -40mV. Compared to other nerve cells, this is quite high (-65mV). In this state, the neurotransmitter glutamate is continuously released from the axon terminals and absorbed by the neighbouring bipolar cells. With incoming visble EM and the previously mentioned cascade reaction, the potential difference drops to -70mV. This hyper-polarisation of the cell causes a reduction in the amount of released glutamate, thereby affecting the activity of the bipolar cells, and subsequently the following steps in the visual pathway.

Similar processes exist in the cone-cells and in photosensitive ganglion cells, but make use of different opsins. Photopsin I through III (yellowish-green, green and blue-violet respectively) are found in the three different cone cells and melanopsin (blue) can be found in the photosensitive ganglion cells.

Processing Signals in the Retina

Different bipolar cells react differently to the changes in the released glutamate. The so called ON and OFF bipolar cells are used to form the direct signal flow from cones to bipolar cells. The ON bipolar cells will depolarise by visible EM stimulation and the corresponding ON ganglion cells will be activated. On the other hand the OFF bipolar cells are hyper polarised by the visible EM stimulation, and the OFF ganglion cells are inhibited. This is the basic pathway of the Direct signal flow. The Lateral signal flow will start from the rods, then go to the bipolar cells, the amacrine cells, and the OFF bipolar cells inhibited by the Rod-amacrine cells and the ON bipolar cells will stimulated via an electrical synapse, after all of the previous steps, the signal will arrive at the ON or OFF ganglion cells and the whole pathway of the Lateral signal flow is established.

When the action potential (AP) in ON, ganglion cells will be triggered by the visible EM stimulus. The AP frequency will increase when the sensor potential increases. In other words, AP depends on the amplitude of the sensor's potential. The region of ganglion cells where the stimulatory and inhibitory effects influence the AP frequency is called receptive field (RF). Around the ganglion cells, the RF is usually composed of two regions: the central zone and the ring-like peripheral zone. They are distinguishable during visible EM adaptation. A visible EM stimulation on the centric zone could lead to AP frequency increase and the stimulation on the periphery zone will decrease the AP frequency. When the light source is turned off the excitation occurs. So the name of ON field (central field ON) refers to this kind of region. Of course the RF of the OFF ganglion cells act the opposite way and is therefore called "OFF field" (central field OFF). The RFs are organised by the horizontal cells. The impulse on the periphery region will be impulsed and transmitted to the central region, and there the so-called stimulus contrast is formed. This function will make the dark seem darker and the light brighter. If the whole RF is exposed to light. the impulse of the central region will predominate.

Signal Transmission to the Cortex

As mentioned previously, axons of the ganglion cells converge at the optic disk of the retina, forming the optic nerve. These fibres are positioned inside the bundle in a specific order. Fibres from the macular zone of the retina are in the central portion, and those from the temporal half of the retina take up the periphery part. A partial decussation or crossing occurs when these fibres are outside the eye cavity. The fibres from the nasal halves of each retina cross to the opposite halves and extend to the brain. Those from the temporal halves remain uncrossed. This partial crossover is called the optic chiasma, and the optic nerves past this point are called optic tracts, mainly to distinguish them from single-retinal nerves. The function of the partial crossover is to transmit the right-hand visual field produced by both eyes to the left-hand half of the brain only and vice versa. Therefore the information from the right half of the body, and the right visual field, is all transmitted to the left-hand part of the brain when reaches the posterior part of the fore-brain (diencephalon).

The pathway to the central cortex

The information relay between the fibers of optic tracts and the nerve cells occurs in the lateral geniculate bodies, the central part of the visual signal processing, located in the thalamus of the brain. From here the information is passed to the nerve cells in the occipital cortex of the corresponding side of the brain. Connections from the retina to the brain can be separated into a 'parvocellular pathway' and a "magnocellular pathway". The parvocellular pathways signals color and fine detail, whereas the magnocellular pathways detect fast moving stimuli.

Connections from the retina to the brain can be separated into a "parvocellular pathway" and a "magnocellular pathway". The parvocellular pathway originates in midget cells in the retina, and signals color and fine detail; magnocellular pathway starts with parasol cells, and detects fast moving stimuli.

Signals from standard digital cameras correspond approximately to those of the parvocellular pathway. To simulate the responses of parvocellular pathways, researchers have been developing neuromorphic sensory systems, which try to mimic spike-based computation in neural systems. Thereby they use a scheme called "address-event representation" for the signal transmission in the neuromorphic electronic systems (Liu and Delbruck 2010 [1]).

Anatomically, the retinal Magno and Parvo ganglion cells respectively project to 2 ventral magnocellular layers and 4 dorsal parvocellular layers of the Lateral Geniculate Nucleus (LGN). Each of the six LGN layers receives inputs from either the ipsilateral or contralateral eye, i.e., the ganglion cells of the left eye cross over and project to layer 1, 4 and 6 of the right LGN, and the right eye ganglion cells project (uncrossed) to its layer 2, 3 and 5. From here the information from the right and left eye is separated.

Although human vision is combined by two halves of the retina and the signal is processed by the opposite cerebral hemispheres, the visual field is considered as a smooth and complete unit. Hence the two visual cortical areas are thought of as being intimately connected. This connection, called corpus callosum is made of neurons, axons and dendrites. Because the dendrites make synaptic connections to the related points of the hemispheres, electric simulation of every point on one hemisphere indicates simulation of the interconnected point on the other hemisphere. The only exception to this rule is the primary visual cortex.

The synapses are made by the optic tract in the respective layers of the lateral geniculate body. Then these axons of these third-order nerve cells are passed up to the calcarine fissure in each occipital lobe of the cerebral cortex. Because bands of the white fibres and axons pair from the nerve cells in the retina go through it, it is called the striate cortex, which incidentally is our primary visual cortex, sometimes known as V1. At this point, impulses from the separate eyes converge to common cortical neurons, which then enables complete input from both eyes in one region to be used for perception and comprehension. Pattern recognition is a very important function of this particular part of the brain, with lesions causing problems with visual recognition or blindsight.

Based on the ordered manner in which the optic tract fibres pass information to the lateral geniculate bodies and after that pass in to the striate area, if one single point stimulation on the retina was found, the response which produced electrically in both lateral geniculate body and the striate cortex will be found at a small region on the particular retinal spot. This is an obvious point-to-point way of signal processing. And if the whole retina is stimulated, the responses will occur on both lateral geniculate bodies and the striate cortex gray matter area. It is possible to map this brain region to the retinal fields, or more usually the visual fields.

Any further steps in this pathway is beyond the scope of this book. Rest assured that, many further levels and centres exist, focusing on particular specific tasks, like for example colour, orientations, spatial frequencies, emotions etc.

Information Processing in the Visual System

Equipped with a firmer understanding of some of the more important concepts of the signal processing in the visual system, comprehension or perception of the processed sensory information is the last important piece in the puzzle. Visual perception is the process of translating information received by the eyes into an understanding of the external state of things. It makes us aware of the world around us and allows us to understand it better. Based on visual perception we learn patterns which we then apply later in life and we make decisions based on this and the obtained information. In other words, our survival depends on perception. The field of Visual Perception has been divided into different subfields, due to the fact that processing is too complex and requires of different specialized mechanisms to perceive what is seen. These subfields include: Color Perception, Motion Perception, Depth Perception, and Face Recognition, etc.

Deep Hierarchies in the Primate Visual Cortex

Deep hierarchies in the visual system

Despite the ever-increasing computational power of electronic systems, there are still many tasks where animals and humans are vastly superior to computers – one of them being the perception and contextualization of information. The classical computer, either the one in your phone or a supercomputer taking up the whole room, is in essence a number-cruncher. It can perform an incredible amount of calculations in a miniscule amount of time. What it lacks is creating abstractions of the information it is working with. If you attach a camera to your computer, the picture it “perceives” is just a grid of pixels, a 2-dimensional array of numbers. A human would immediately recognize the geometry of the scene, the objects in the picture, and maybe even the context of what’s going on. This ability of ours is provided by dedicated biological machinery – the visual system of the brain. It processes everything we see in a hierarchical way, starting from simpler features of the image to more complex ones all the way to classification of objects into categories. Hence the visual system is said to have a deep hierarchy. The deep hierarchy of the primate visual system has inspired computer scientists to create models of artificial neural networks that would also feature several layers where each of them creates higher generalizations of the input data.

Approximately half of the human neocortex is dedicated to vision. The processing of visual information happens over at least 10 functional levels. The neurons in the early visual areas extract simple image features over small local regions of visual space. As the information gets transmitted to higher visual areas, neurons respond to increasingly complex features. With higher levels of information processing the representations become more invariant – less sensitive to the exact feature size, rotation or position. In addition, the receptive field size of neurons in higher visual areas increases, indicating that they are tuned to more global image features. This hierarchical structure allows for efficient computing – different higher visual areas can use the same information computed in the lower areas. The generic scene description that is made in the early visual areas is used by other parts of the brain to complete various different tasks, such as object recognition and categorization, grasping, manipulation, movement planning etc.

Sub-cortical vision

The neural processing of visual information starts already before any of the cortical structures. Photoreceptors on the retina detect light and send signals to retinal ganglion cells. The receptive field size of a photoreceptor is one 100th of a degree (a one degree large receptive field is roughly the size of your thumb, when you have your arm stretched in front of you). The number of inputs to a ganglion cell and therefore its receptive field size depends on the location – in the center of the retina it receives signals from as few as five receptors, while in the periphery a single cell can have several thousand inputs. This implies that the highest spatial resolution is in the center of the retina, also called the fovea. Due to this property primates posses a gaze control mechanism that directs the eyesight so that the features of interest project onto the fovea.

Ganglion cells are selectively tuned to detect various features of the image, such as luminance contrast, color contrast, and direction and speed of movement. All of these features are the primary information used further up the processing pipeline. If there are visual stimuli that are not detectable by ganglion cells, then they are also not available for any cortical visual area.

Ganglion cells project to a region in thalamus called lateral geniculate nucleus (LGN), which in turn relays the signals to the cortex. There is no significant computation known to happen in LGN – there is almost a one-to-one correspondence between retinal ganglion and LGN cells. However, only 5% of the inputs to LGN come from the retina – all the other inputs are cortical feedback projections. Although the visual system is often regarded as a feed-forward system, the recurrent feedback connections as well as lateral connections are a common feature seen throughout the visual cortex. The role of the feedback is not yet fully understood but it is proposed to be attributed to processes like attention, expectation, imagination and filling-in the missing information.

Cortical vision

Main areas of the visual system

The visual cortex can be divided into three large parts – the occipital part which receives input from LGN and then sends outputs to dorsal and ventral streams. Occipital part includes the areas V1-V4 and MT, which process different aspects of visual information and gives rise to a generic scene representation. The dorsal pathway is involved in the analysis of space and in action planning. The ventral pathway is involved in object recognition and categorization.

V1 is the first cortical area that processes visual information. It is sensitive to edges, gratings, line-endings, motion, color and disparity (angular difference between the projections of a point onto the left and right retinas). The most straight forward example of the hierarchical bottom-up processing is the linear combination of the inputs from several ganglion cells with center-surround receptive fields to create a representation of a bar. This is done by the simple cells of V1 and was first described by the prominent neuroscientists Hubel and Wiesel. This type of information integration implies that the simple cells are sensitive to the exact location of the bar and have a relatively small receptive field. The complex cells of V1 receive inputs from the simple cells, and while also responding to linear oriented patterns they are not sensitive to the exact position of the bar and have a larger receptive field. The computation present in this step could be a MAX-like operation which produces responses similar in amplitude to the larger of the responses pertaining to the individual stimuli. Some simple and complex cells can also detect the end of a bar, and a fraction of V1 cells are also sensitive to local motion within their respective receptive fields.

Area V2 features more sophisticated contour representation including texture-defined contours, illusory contours and contours with border ownership. V2 also builds upon the absolute disparity detection in V1 and features cells that are sensitive to relative disparity which is the difference between the absolute disparities of two points in space. Area V4 receives inputs from V2 and area V3, but very little is known about the computation taking place in V3. Area V4 features neurons that are sensitive to contours with different curvature and vertices with particular angles. Another important feature is the coding for luminance-invariant hue. This is in contrast to V1 where neurons respond to color opponency along the two principle axis (red-green and yellow-blue) rather than the actual color. V4 further outputs to the ventral stream, to inferior temporal cortex (IT) which has been shown through lesion studies to be essential for object discrimination.

Inferior temporal cortex: object discrimination

Stimulus reduction in area TE

Inferior temporal cortex (IT) is divided into two areas: TEO and TE. Area TEO integrates information about the shapes and relative positions of multiple contour elements and features mostly cells which respond to simple combinations of features. The receptive field size of TEO neurons is about 3-5 degrees. Area TE features cells with significantly larger receptive fields (10-20 degrees) which respond to faces, hands and complex feature configurations. Cells in TE respond to visual features that are a simpler generalization of the object of interest but more complex than simple bars or spots. This was shown using a stimulus-reduction method by Tanaka et al. where first a response to an object is measured and then the object is replaced by simpler representations until the critical feature that the TE neurons are responding to is narrowed down.

It appears that the neurons in IT pull together various features of medium complexity from lower levels in the ventral stream to build models of object parts. The neurons in TE that are selective to specific objects have to fulfil two seemingly contradictory requirements – selectivity and invariance. They have to distinguish between different objects by the means of sensitivity to features in the retinal images. However, the same object can be viewed from different angles and distances at different light conditions yielding highly dissimilar retinal images of the same object. To treat all these images as equivalent, invariant features must be derived that are robust against certain transformations, such as changes in position, illumination, size on the retina etc. Neurons in area TE show invariance to position and size as well as to partial occlusion, position-in-depth and illumination direction. Rotation in depth has been shown to have the weakest invariance, with the exception if the object is a human face.

Object categories are not yet explicitly present in area TE – a neuron might typically respond to several but not all exemplars of the same category (e.g., images of trees) and it might also respond to exemplars of different categories (e.g., trees and non-trees). Object recognition and classification most probably involves sampling from a larger population of TE neurons as well as receiving inputs from additional brain areas, e.g., those that are responsible for understanding the context of the scene. Recent readout experiments have demonstrated that statistical classifiers (e.g. support vector machines) can be trained to classify objects based on the responses of a small number of TE neurons. Therefore, a population of TE neurons in principle can reliably signal object categories by their combined activity. Interestingly, there are also reports on highly selective neurons in medial temporal lobe that respond to very specific cues, e.g., to the tower of Pisa in different images or to a particular person’s face.

Learning in the Visual System

Learning can alter the visual feature selectivity of neurons, with the effect of learning becoming stronger at higher hierarchical levels. There is no known evidence on learning in the retina and also the orientation maps in V1 seem to be genetically largely predetermined. However, practising orientation identification improves orientation coding in V1 neurons, by increasing the slope of the tuning curve. Similar but larger effects have been seen in V4. In area TE relatively little visual training has noticeable physiological effects on visual perception, on a single cell level as well as in fMRI. For example, morphing two objects into each other increases their perceived similarity. Overall it seems that the even the adult visual cortex is considerably plastic, and the level of plasticity can be significantly increased, e.g., by administering specific drugs or by living in an enriched environment.

Deep Neural Networks

Similarly to the deep hierarchy of the primate visual system, deep learning architectures attempt to model high-level abstractions of the input data by using multiple levels of non-linear transformations. The model proposed by Hubel and Wiesel where information is integrated and propagated in a cascade from retina and LGN to simple cells and complex cells in V1 inspired the creation of one of the first deep learning architectures, the neocognitron – a multilayered artificial neural network model. It was used for different pattern recognition tasks, including the recognition of handwritten characters. However, it took a lot of time to train the network (in the order of days) and since its inception in the 1980s deep learning didn’t get much attention until the mid-2000s with the abundance of digital data and the invention of faster training algorithms. Deep neural networks have proved themselves to be very effective in tasks that not so long ago seemed possible only for humans to perform, such as recognizing the faces of particular people in photos, understanding human speech (to some extent) and translating text from foreign languages. Furthermore, they have proven to be of great assistance in industry and science to search for potential drug candidates, map real neural networks in the brain and predict the functions of proteins. It must be noted that deep learning is only very loosely inspired from the brain and is much more of an achievement of the field of computer science / machine learning than of neuroscience. The basic parallels are that the deep neural networks are composed of units that integrate information inputs in a non-linear manner (neurons) and send signals to each other (synapses) and that there are different levels of increasingly abstract representations of the data. The learning algorithms and mathematical descriptions of the “neurons” used in deep learning are very different from the actual processes taking place in the brain. Therefore, the research in deep learning, while giving a huge push to a more sophisticated artificial intelligence, can give only limited insights about the brain.

Example of a neuron with its main components.
Example of a base unit of the neural networks. In the example the activation function is a Rectified Linear Unit (ReLU), but there are also other possibilities, among which the sigmoid or the hyperbolic tangent. The bias changes the threshold of activation of the unit, and as such it is analogous to the value of the threshold for the action potential in the neuron.
Example of a deep neural network. Each square represents one unit as described in the image above.


Papers on the deep hierarchies in the visual system
  • Kruger, N.; Janssen, P.; Kalkan, S.; Lappe, M.; Leonardis, A.; Piater, J.; Rodriguez-Sanchez, A. J.; Wiskott, L. (August 2013). "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?". IEEE Transactions on Pattern Analysis and Machine Intelligence. 35 (8): 1847–1871. doi:10.1109/TPAMI.2012.272.
  • Poggio, Tomaso; Riesenhuber, Maximilian (1 November 1999). Nature Neuroscience. 2 (11): 1019–1025. doi:doi:10.1038/14819. {{cite journal}}: Check |doi= value (help); Missing or empty |title= (help)
Stimulus reduction experiment
Evidence on learning in the visual system
  • Li, Nuo; DiCarlo, James J. (23 September 2010). "Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex". Neuron. 67 (6): 1062–1075. doi:10.1016/j.neuron.2010.08.029.
  • Raiguel, S.; Vogels, R.; Mysore, S. G.; Orban, G. A. (14 June 2006). "Learning to See the Difference Specifically Alters the Most Informative V4 Neurons". Journal of Neuroscience. 26 (24): 6589–6602. doi:10.1523/JNEUROSCI.0457-06.2006.
  • Schoups, A; Vogels, R; Qian, N; Orban, G (2 August 2001). "Practising orientation identification improves orientation coding in V1 neurons". Nature. 412 (6846): 549–53. PMID 11484056.
A recent and accessible overview of the status quo of the deep learning research
  • Jones, Nicola (8 January 2014). "Computer science: The learning machines". Nature. 505 (7482): 146–148. doi:10.1038/505146a.

Motion Perception

Motion Perception is the process of inferring speed and direction of moving objects. Area V5 in humans and area MT (Middle Temporal) in primates are responsible for cortical perception of Motion. Area V5 is part of the extrastriate cortex, which is the region in the occipital region of the brain next to the primary visual cortex. The function of Area V5 is to detect speed and direction of visual stimuli, and integrate local visual motion signals into global motion. Area V1 or Primary Visual cortex is located in the occipital lobe of the brain in both hemispheres. It processes the first stage of cortical processing of visual information. This area contains a complete map of the visual field covered by the eyes. The difference between area V5 and area V1 (Primary Visual Cortex) is that area V5 can integrate motion of local signals or individual parts of an object into a global motion of an entire object. Area V1, on the other hand, responds to local motion that occurs within the receptive field. The estimates from these many neurons are integrated in Area V5.

Movement is defined as changes in retinal illumination over space and time. Motion signals are classified into First order motions and Second order motions. These motion types are briefly described in the following paragraphs.

Example of a "Beta movement".

First-order motion perception refers to the motion perceived when two or more visual stimuli switch on and off over time and produce different motion perceptions. First order motion is also termed "apparent motion,” and it is used in television and film. An example of this is the "Beta movement", which is an illusion in which fixed images seem to move, even though they do not move in reality. These images give the appearance of motion, because they change and move faster than what the eye can detect. This optical illusion happens because the human optic nerve responds to changes of light at ten cycles per second, so any change faster than this rate will be registered as a continuum motion, and not as separate images.

Second order motion refers to the motion that occurs when a moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in luminance or motion energy of the image. Evidence suggests that early processing of First order motion and Second order motion is carried out by separate pathways. Second order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. Second-order motion produces a weaker motion aftereffect. First and second-order signals are combined in are V5.

In this chapter, we will analyze the concepts of Motion Perception and Motion Analysis, and explain the reason why these terms should not be used interchangeably. We will analyze the mechanisms by which motion is perceived such as Motion Sensors and Feature Tracking. There exist three main theoretical models that attempt to describe the function of neuronal sensors of motion. Experimental tests have been conducted to confirm whether these models are accurate. Unfortunately, the results of these tests are inconclusive, and it can be said that no single one of these models describes the functioning of Motion Sensors entirely. However, each of these models simulates certain features of Motion Sensors. Some properties of these sensors are described. Finally, this chapter shows some motion illusions, which demonstrate that our sense of motion can be mislead by static external factors that stimulate motion sensors in the same way as motion.

Motion Analysis and Motion Perception

The concepts of Motion Analysis and Motion Perception are often confused as interchangeable. Motion Perception and Motion Analysis are important to each other, but they are not the same.

Motion Analysis refers to the mechanisms in which motion signals are processed. In a similar way in which Motion Perception does not necessarily depend on signals generated by motion of images in the retina, Motion Analysis may or may not lead to motion perception. An example of this phenomenon is Vection, which occurs when a person perceives that she is moving when she is stationary, but the object that she observes is moving. Vection shows that motion of an object can be analyzed, even though it is not perceived as motion coming from the object. This definition of Motion analysis suggests that motion is a fundamental image property. In the visual field, it is analyzed at every point. The results from this analysis are used to derive perceptual information.

Motion Perception refers to the process of acquiring perceptual knowledge about motion of objects and surfaces in an image. Motion is perceived either by delicate local sensors in the retina or by feature tracking. Local motion sensors are specialized neurons sensitive to motion, and analogous to specialized sensors for color. Feature tracking is an indirect way to perceive motion, and it consists of inferring motion from changes in retinal position of objects over time. It is also referred to as third order motion analysis. Feature tracking works by focusing attention to a particular object and observing how its position has changed over time.

Motion Sensors

Detection of motion is the first stage of visual processing, and it happens thanks to specialized neural processes, which respond to information regarding local changes of intensity of images over time. Motion is sensed independently of other image properties at all locations in the image. It has been proven that motion sensors exist, and they operate locally at all points in the image. Motion sensors are dedicated neuronal sensors located in the retina that are capable of detecting a motion produced by two brief and small light flashes that are so close together that they could not be detected by feature tracking. There exist three main models that attempt to describe the way that these specialized sensors work. These models are independent of one another, and they try to model specific characteristics of Motion Perception. Although there is not sufficient evidence to support that any of these models represent the way the visual system (motion sensors particularly) perceives motion, they still correctly model certain functions of these sensors.

Two different mechanisms for motion detection. Left) A "Reichardt detector" consists of two mirror-symmetrical subunits. In each subunit, the luminance values as measured in two adjacent points become multiplied (M) with each other after one of them is delayed by a low-pass filter with time-constant τ. The resulting output signals of the multipliers become finally subtracted. Right) In the gradient detector, the temporal luminance gradient as measured after one photoreceptor (δI/δt, Left) is divided by the spatial luminance gradient (δI/δx). Here, the spatial gradient is approximated by the difference between the luminance values in two adjacent points.

The Reichardt Detector

The Reichardt Detector is used to model how motion sensors respond to First order motion signals. When an objects moves from point A in the visual field to point B, two signals are generated: one before the movement began and another one after the movement has completed. This model perceives this motion by detecting changes in luminance at one point on the retina and correlating it with a change in luminance at another point nearby after a short delay. The Reichardt Detector operates based on the principle of correlation (statistical relation that involves dependency). It interprets a motion signal by spatiotemporal correlation of luminance signals at neighboring points. It uses the fact that two receptive fields at different points on the trajectory of a moving object receive a time shifted version of the same signal – a luminance pattern moves along an axis and the signal at one point in the axis is a time shifted version of a previous signal in the axis. The Reichardt Detector model has two spatially separate neighboring detectors. The output signals of the detectors are multiplied (correlated) in the following way: a signal multiplied by a second signal that is the time-shifted version of the original. The same procedure is repeated but in the reverse direction of motion (the signal that was time-shifted becomes the first signal and vice versa). Then, the difference between these two multiplications is taken, and the outcome gives the speed of motion. The response of the detector depends upon the stimulus’ phase, contrast and speed. Many detectors tuned at different speeds are necessary to encode the true speed of the pattern. The most compelling experimental evidence for this kind of detector comes from studies of direction discrimination of barely visible targets.

Motion-Energy Filtering

Motion Energy Filter is a model of Motion Sensors based on the principle of phase invariant filters. This model builds spatio-temporal filters oriented in space-time to match the structure of moving patterns. It consists of separable filters, for which spatial profiles remain the same shape over time but are scaled by the value of the temporal filters. Motion Energy Filters match the structure of moving patterns by adding together separable filters. For each direction of motion, two space-time filters are generated: one, which is symmetric (bar-like), and one which is asymmetric (edge-like). The sum of the squares of these filters is called the motion energy. The difference in the signal for the two directions is called the opponent energy. This result is then divided by the squared output of another filter, which is tuned to static contrast. This division is performed to take into account the effect of contrast in the motion. Motion Energy Filters can model a number of motion phenomenon, but it produces a phase independent measurement, which increases with speed but does not give a reliable value of speed.

Spatiotemporal Gradients

This model of Motion sensors was originally developed in the field of computer vision, and it is based on the principle that the ratio of the temporal derivative of image brightness to the spatial derivative of image brightness gives the speed of motion. It is important to note that at the peaks and troughs of the image, this model will not compute an adequate answer, because the derivative in the denominator would be zero. In order to solve this problem, the first-order and higher-order spatial derivatives with respect to space and time can also be analyzed. Spatiotemporal Gradients is a good model for determining the speed of motion at all points in the image.

Motion Sensors are Orientation-Selective

One of the properties of Motion Sensors is orientation-selectivity, which constrains motion analysis to a single dimension. Motion sensors can only record motion in one dimension along an axis orthogonal to the sensor’s preferred orientation. A stimulus that contains features of a single orientation can only be seen to move in a direction orthogonal to the stimulus’ orientation. One-dimensional motion signals give ambiguous information about the motion of two-dimensional objects. A second stage of motion analysis is necessary in order to resolve the true direction of motion of a 2-D object or pattern. 1-D motion signals from sensors tuned to different orientations are combined to produce an unambiguous 2-D motion signal. Analysis of 2-D motion depends on signals from local broadly oriented sensors as well as on signals from narrowly oriented sensors.

Feature Tracking

Another way in which we perceive motion is through Feature Tracking. Feature Tracking consists of analyzing whether or not the local features of an object have changed positions, and inferring movement from this change. In this section, some features about Feature trackers are mentioned.

Feature trackers fail when a moving stimulus occurs very rapidly. Feature trackers have the advantage over Motion sensors that they can perceive movement of an object even if the movement is separated by intermittent blank intervals. They can also separate these two stages (movements and blank intervals). Motion sensors, on the other hand, would just integrate the blanks with the moving stimulus and see a continuous movement. Feature trackers operate on the locations of identified features. For that reason, they have a minimum distance threshold that matches the precision with which locations of features can be discriminated. Feature trackers do not show motion aftereffects, which are visual illusions that are caused as a result of visual adaptation. Motion aftereffects occur when, after observing a moving stimulus, a stationary object appears to be moving in the opposite direction of the previously observed moving stimulus. It is impossible for this mechanism to monitor multiple motions in different parts of the visual field and at the same time. On the other hand, multiple motions are not a problem for motion sensors, because they operate in parallel across the entire visual field.

Experiments have been conducted using the information above to reach interesting conclusions about feature trackers. Experiments with brief stimuli have shown that color patterns and contrast patterns at high contrasts are not perceived by feature trackers but by motion sensors. Experiments with blank intervals have confirmed that feature tracking can occur with blank intervals in the display. It is only at high contrast that motion sensors perceive the motion of chromatic stimuli and contrast patterns. At low contrasts feature trackers analyze the motion of both chromatic patterns and contrast envelopes and at high contrasts motion sensors analyze contrast envelopes. Experiments in which subjects make multiple motion judgments suggest that feature tracking is a process that occurs under conscious control and that it is the only way we have to analyze the motion of contrast envelopes in low-contrast displays. These results are consistent with the view that the motion of contrast envelopes and color patterns depends on feature tracking except when colors are well above threshold or mean contrast is high. The main conclusion of these experiments is that it is probably feature tracking that allows perception of contrast envelopes and color patterns.

Motion Illusions

As a consequence of the process in which Motion detection works, some static images might seem to us like they are moving. These images give an insight into the assumptions that the visual system makes, and are called visual illusions.

A famous Motion Illusion related to first order motion signals is the Phi phenomenon, which is an optical illusion that makes us perceive movement instead of a sequence of images. This motion illusion allows us to watch movies as a continuum and not as separate images. The phi phenomenon allows a group of frozen images that are changed at a constant speed to be seen as a constant movement. The Phi phenomenon should not be confused with the Beta Movement, because the former is an apparent movement caused by luminous impulses in a sequence, while the later one is an apparent movement caused by luminous stationary impulses.

Motion Illusions happen when Motion Perception, Motion Analysis and the interpretation of these signals are misleading, and our visual system creates illusions about motion. These illusions can be classified according to which process allows them to happen. Illusions are classified as illusions related to motion sensing, 2D integration, and 3D interpretation

The most popular illusions concerning motion sensing are four-stroke motion, RDKs and second order motion signals illusions. The most popular motion illusions concerning 2D integration are Motion Capture, Plaid Motion and Direct Repulsion. Similarly, the ones concerning 3D interpretation are Transformational Motion, Kinetic Depth, Shadow Motion, Biological Motion, Stereokinetic motion, Implicit Figure Motion and 2 Stroke Motion. There are far more Motion Illusions, and they all show something interesting regarding human Motion Detection, Perception and Analysis mechanisms. For more information, visit the following link:

Open Problems

Although we still do not understand most of the specifics regarding Motion Perception, understanding the mechanisms by which motion is perceived as well as motion illusion can give the reader a good overview of the state of the art in the subject. Some of the open problems regarding Motion Perception are the mechanisms of formation of 3D images in global motion and the Aperture Problem.

Global motion signals from the retina are integrated to arrive at a 2 dimensional global motion signal; however, it is unclear how 3D global motion is formed. The Aperture Problem occurs because each receptive field in the visual system covers only a small piece of the visual world, which leads to ambiguities in perception. The aperture problem refers to the problem of a moving contour that, when observed locally, is consistent with different possibilities of motion. This ambiguity is geometric in origin - motion parallel to the contour cannot be detected, as changes to this component of the motion do not change the images observed through the aperture. The only component that can be measured is the velocity orthogonal to the contour orientation; for that reason, the velocity of the movement could be anything from the family of motions along a line in velocity space. This aperture problem is not only observed in straight contours, but also in smoothly curved ones, since they are approximately straight when observed locally. Although the mechanisms to solve the Aperture Problem are still unknown, there exist some hypothesis on how it could be solved. For example, it could be possible to resolve this problem by combining information across space or from different contours of the same object.


In this chapter, we introduced Motion Perception and the mechanisms by which our visual system detects motion. Motion Illusions showed how Motion signals can be misleading, and consequently lead to incorrect conclusions about motion. It is important to remember that Motion Perception and Motion Analysis are not the same. Motion Sensors and Feature trackers complement each other to make the visual system perceive motion.

Motion Perception is complex, and it is still an open area of research. This chapter describes models about the way that Motion Sensors function, and hypotheses about Feature trackers characteristics; however, more experiments are necessary to learn about the characteristics of these mechanisms and be able to construct models that resemble the actual processes of the visual system more accurately.

The variety of mechanisms of motion analysis and motion perception described in this chapter, as well as the sophistication of the artificial models designed to describe them demonstrate that there is much complexity in the way in which the cortex processes signals from the outside environment. Thousands of specialized neurons integrate and interpret pieces of local signals to form global images of moving objects in our brain. Understanding that so many actors and processes in our bodies must work in concert to perceive motion makes our ability to it all the more remarkable that we as humans are able to do it with such ease.

Color Perception


Humans (together with primates like monkeys and gorillas) have the best color perception among mammals [1] . Hence, it is not a coincidence that color plays an important role in a wide variety of aspects. For example, color is useful for discriminating and differentiating objects, surfaces, natural scenery, and even faces [2],[3]. Color is also an important tool for nonverbal communication, including that of emotion [4].

For many decades, it has been a challenge to find the links between the physical properties of color and its perceptual qualities. Usually, these are studied under two different approaches: the behavioral response caused by color (also called psychophysics) and the actual physiological response caused by it [5].

Here we will only focus on the latter. The study of the physiological basis of color vision, about which practically nothing was known before the second half of the twentieth century, has advanced slowly and steadily since 1950. Important progress has been made in many areas, especially at the receptor level. Thanks to molecular biology methods, it has been possible to reveal previously unknown details concerning the genetic basis for the cone pigments. Furthermore, more and more cortical regions have been shown to be influenced by visual stimuli, although the correlation of color perception with wavelength-dependent physiology activity beyond the receptors is not so easy to discern [6].

In this chapter, we aim to explain the basics of the different processes of color perception along the visual path, from the retina in the eye to the visual cortex in the brain. For anatomical details, please refer to Sec. "Anatomy of the Visual System" of this Wikibook.

Color Perception at the Retina

All colors that can be discriminated by humans can be produced by the mixture of just three primary (basic) colors. Inspired by this idea of color mixing, it has been proposed that color is subserved by three classes of sensors, each having a maximal sensitivity to a different part of the visible spectrum [1]. It was first explicitly proposed in 1853 that there are three degrees of freedom in normal color matching [7]. This was later confirmed in 1886 [8] (with remarkably close results to recent studies [9], [10]).

These proposed color sensors are actually the so called cones (Note: In this chapter, we will only deal with cones. Rods contribute to vision only at low light levels. Although they are known to have an effect on color perception, their influence is very small and can be ignored here.) [11]. Cones are of the two types of photoreceptor cells found in the retina, with a significant concentration of them in the fovea. The Table below lists the three types of cone cells. These are distinguished by different types of rhodopsin pigment. Their corresponding absorption curves are shown in the Figure below.

Table 1: General overview of the cone types found in the retina.
Name Higher sensitivity to color Absorption curve peak [nm]
S, SWS, B Blue 420
M, MWS, G Green 530
L, LWS, R Red 560
Absorption curves for the different cones. Blue, green, and red represent the absorption of the S (420 nm), M (530 nm), and L (560 nm) cones, respectively.
Absorption curves for the different cones. Blue, green, and red represent the absorption of the S (420 nm), M (530 nm), and L (560 nm) cones, respectively.

Although no consensus has been reached for naming the different cone types, the most widely utilized designations refer either to their action spectra peak or to the color to which they are sensitive themselves (red, green, blue)[6]. In this text, we will use the S-M-L designation (for short, medium, and long wavelength), since these names are more appropriately descriptive. The blue-green-red nomenclature is somewhat misleading, since all types of cones are sensitive to a large range of wavelengths.

An important feature about the three cone types is their relative distribution in the retina. It turns out that the S-cones present a relatively low concentration through the retina, being completely absent in the most central area of the fovea. Actually, they are too widely spaced to play an important role in spatial vision, although they are capable of mediating weak border perception [12]. The fovea is dominated by L- and M-cones. The proportion of the two latter is usually measured as a ratio. Different values have been reported for the L/M ratio, ranging from 0.67 [13] up to 2 [14], the latter being the most accepted. Why L-cones almost always outnumber the M-cones remains unclear. Surprisingly, the relative cone ratio has almost no significant impact on color vision. This clearly shows that the brain is plastic, capable of making sense out of whatever cone signals it receives [15], [16].

It is also important to note the overlapping of the L- and M-cone absorption spectra. While the S-cone absorption spectrum is clearly separated, the L- and M-cone peaks are only about 30 nm apart, their spectral curves significantly overlapping as well. This results in a high correlation in the photon catches of these two cone classes. This is explained by the fact that in order to achieve the highest possible acuity at the center of the fovea, the visual system treats L- and M-cones equally, not taking into account their absorption spectra. Therefore, any kind of difference leads to a deterioration of the luminance signal [17]. In other words, the small separation between L- and M-cone spectra might be interpreted as a compromise between the needs for high-contrast color vision and high acuity luminance vision. This is congruent with the lack of S-cones in the central part of the fovea, where visual acuity is highest. Furthermore, the close spacing of L- and M-cone absorption spectra might also be explained by their genetic origin. Both cone types are assumed to have evolved "recently" (about 35 million years ago) from a common ancestor, while the S-cones presumably split off from the ancestral receptor much earlier[11].

The spectral absorption functions of the three different types of cone cells are the hallmark of human color vision. This theory solved a long-known problem: although we can see millions of different colors (humans can distinguish between 7 to 10 million different colors[5], our retinas simply do not have enough space to accommodate an individual detector for every color at every retinal location.

From the Retina to the Brain

The signals that are transmitted from the retina to higher levels are not simple point-wise representations of the receptor signals, but rather consist of sophisticated combinations of the receptor signals. The objective of this section is to provide a brief of the paths that some of this information takes.

Once the optical image on the retina is transduced into chemical and electrical signals in the photoreceptors, the amplitude-modulated signals are converted into frequency-modulated representations at the ganglion-cell and higher levels. In these neural cells, the magnitude of the signal is represented in terms of the number of spikes of voltage per second fired by the cell rather than by the voltage difference across the cell membrane. In order to explain and represent the physiological properties of these cells, we will find the concept of receptive fields very useful.

A receptive field is a graphical representation of the area in the visual field to which a given cell responds. Additionally, the nature of the response is typically indicated for various regions in the receptive field. For example, we can consider the receptive field of a photoreceptor as a small circular area representing the size and location of that particular receptor's sensitivity in the visual field. The Figure below shows exemplary receptive fields for ganglion cells, typically in a center-surround antagonism. The left receptive field in the figure illustrates a positive central response (know as on-center). This kind of response is usually generated by a positive input from a single cone surrounded by a negative response generated from several neighboring cones. Therefore, the response of this ganglion cell would be made up of inputs from various cones with both positive and negative signs. In this way, the cell not only responds to points of light, but serves as an edge (or more correctly, a spot) detector. In analogy to the computer vision terminology, we can think of the ganglion cell responses as the output of a convolution with an edge-detector kernel. The right receptive field of in the figure illustrates a negative central response (know as off-center), which is equally likely. Usually, on-center and off-center cells will occur at the same spatial location, fed by the same photoreceptors, resulting in an enhanced dynamic range.

The lower Figure shows that in addition to spatial antagonism, ganglion cells can also have spectral opponency. For instance, the left part of the lower figure illustrates a red-green opponent response with the center fed by positive input from an L-cone and the surrounding fed by a negative input from M-cones. On the other hand, the right part of the lower figure illustrates the off-center version of this cell. Hence, before the visual information has even left the retina, processing has already occurred, with a profound effect on color appearance. There are other types and varieties of ganglion cell responses, but they all share these basic concepts.

Antagonist receptive fields (on center)
On center
Antagonist receptive fields (off center)
Off center
Antagonist receptive fields
Spectrally and spatially antagonist receptive fields (on center)
On center
Spectrally and spatially antagonist receptive fields (off center)
Off center
Spectrally and spatially antagonist receptive fields.

On their way to the primary visual cortex, ganglion cell axons gather to form the optic nerve, which projects to the lateral geniculate nucleus (LGN) in the thalamus. Coding in the optic nerve is highly efficient, keeping the number of nerve fibers to a minimum (limited by the size of the optic nerve) and thereby also the size of the retinal blind spot as small as possible (approximately 5° wide by 7° high). Furthermore, the presented ganglion cells would have no response to uniform illumination, since the positive and negative areas are balanced. In other words, the transmitted signals are uncorrelated. For example, information from neighboring parts of natural scenes are highly correlated spatially and therefore highly predictable [18]. Lateral inhibition between neighboring retinal ganglion cells minimizes this spatial correlation, therefore improving efficiency. We can see this as a process of image compression carried out in the retina.

Given the overlapping of the L- and M-cone absorption spectra, their signals are also highly correlated. In this case, coding efficiency is improved by combining the cone signals in order to minimize said correlation. We can understand this more easily using Principal Component Analysis (PCA). PCA is a statistical method used to reduce the dimensionality of a given set of variables by transforming the original variables, to a set of new variables, the principal components (PCs). The first PC accounts for a maximal amount of total variance in the original variables, the second PC accounts for a maximal amount of variance that was not accounted for by the first component, and so on. In addition, PCs are linearly-independent and orthogonal to each other in the parameter space. PCA's main advantage is that only a few of the strongest PCs are enough to cover the vast majority of system variability [19]. This scheme has been used with the cone absorption functions [20] and even with the naturally occurring spectra[21],[22]. The PCs that were found in the space of cone excitations produced by natural objects are 1) a luminance axis where the L- and M-cone signals are added (L+M), 2) the difference of the L- and M-cone signals (L-M), and 3) a color axis where the S-cone signal is differenced with the sum of the L- and M-cone signals (S-(L+M)). These channels, derived from a mathematical/computational approach, coincide with the three retino-geniculate channels discovered in electrophysiological experiments [23],[24]. Using these mechanisms, visual redundant information is eliminated in the retina.

There are three channels of information that actually communicate this information from the retina through the ganglion cells to the LGN. They are different not only on their chromatic properties, but also in their anatomical substrate. These channels pose important limitations for basic color tasks, such as detection and discrimination.

In the first channel, the output of L- and M-cones is transmitted synergistically to diffuse bipolar cells and then to cells in the magnocellular layers (M-) of the LGN (not to be confused with the M-cones of the retina)[24]. The receptive fields of the M-cells are composed of a center and a surround, which are spatially antagonist. M-cells have high-contrast sensitivity for luminance stimuli, but they show no response at some combination of L-M opponent inputs[25]. However, because the null points of different M-cells vary slightly, the population response is never really zero. This property is actually passed on to cortical areas with predominant M-cell inputs[26].

The parvocellular pathway (P-) originates with the individual outputs from L- or M-cone to midget bipolar cells. These provide input to retinal P-cells[11]. In the fovea, the receptive field centers of P-cells are formed by single L- or M-cones. The structure of the P-cell receptive field surround is still debated. However, the most accepted theory states that the surround consists of a specific cone type, resulting in a spatially opponent receptive field for luminance stimuli[27]. Parvocellular layers contribute with about 80 % of the total projections from the retina to the LGN[28].

Finally, the recently discovered koniocellular pathway (K-) carries mostly signals from S-cones[29]. Groups of this type of cones project to special bipolar cells, which in turn provide input to specific small ganglion cells. These are usually not spatially opponent. The axons of the small ganglion cells project to thin layers of the LGN (adjacent to parvocellular layers)[30].

While the ganglion cells do terminate at the LGN (making synapses with LGN cells), there appears to be a one-to-one correspondence between ganglion cells and LGN cells. The LGN appears to act as a relay station for the signals. However, it probably serves some visual function, since there are neural projections from the cortex back to the LGN that could serve as some type of switching or adaptation feedback mechanism. The axons of LGN cells project to visual area one (V1) in the visual cortex in the occipital lobe.

Color Perception at the Brain

In the cortex, the projections from the magno-, parvo-, and koniocellular pathways end in different layers of the primary visual cortex. The magnocellular fibers innervate principally layer 4Cα and layer 6. Parvocellular neurons project mostly to 4Cβ, and layers 4A and 6. Koniocellular neurons terminate in the cytochrome oxidase (CO-) rich blobs in layers 1, 2, and 3[31].

Once in the visual cortex, the encoding of visual information becomes significantly more complex. In the same way the outputs of various photoreceptors are combined and compared to produce ganglion cell responses, the outputs of various LGN cells are compared and combined to produce cortical responses. As the signals advance further up in the cortical processing chain, this process repeats itself with a rapidly increasing level of complexity to the point that receptive fields begin to lose meaning. However, some functions and processes have been identified and studied in specific regions of the visual cortex.

In the V1 region (striate cortex), double opponent neurons - neurons that have their receptive fields both chromatically and spatially opposite with respect to the on/off regions of a single receptive field - compare color signals across the visual space [32]. They constitute between 5 to 10% of the cells in V1. Their coarse size and small percentage matches the poor spatial resolution of color vision [1]. Furthermore, they are not sensitive to the direction of moving stimuli (unlike some other V1 neurons) and, hence, unlikely to contribute to motion perception[33]. However, given their specialized receptive field structure, these kind of cells are the neural basis for color contrast effects, as well as an efficient mean to encode color itself[34],[35]. Other V1 cells respond to other types of stimuli, such as oriented edges, various spatial and temporal frequencies, particular spatial locations, and combinations of these features, among others. Additionally, we can find cells that linearly combine inputs from LGN cells as well as cells that perform nonlinear combination. These responses are needed to support advanced visual capabilities, such as color itself.

(Partial) flow diagram illustrating the many streams of visual information processes that take place in the visual cortex. It is important to note that information can flow in both directions.
Fig. 4. (Partial) flow diagram illustrating the many streams of visual information processes that take place in the visual cortex. It is important to note that information can flow in both directions.

There is substantially less information on the chromatic properties of single neurons in V2 as compared to V1. On a first glance, it seems that there are no major differences of color coding in V1 and V2[36]. One exception to this is the emergence of a new class of color-complex cell[37]. Therefore, it has been suggested that V2 region is involved in the elaboration of hue. However, this is still very controversial and has not been confirmed.

Following the modular concept developed after the discovery of functional ocular dominance in V1, and considering the anatomical segregation between the P-, M-, and K-pathways (described in Sec. 3), it was suggested that a specialized system within the visual cortex devoted to the analysis of color information should exist[38]. V4 is the region that has historically attracted the most attention as the possible "color area" of the brain. This is because of an influential study that claimed that V4 contained 100 % of hue-selective cells[39]. However, this claim has been disputed by a number of subsequent studies, some even reporting that only 16 % of V4 neurons show hue tuning[40]. Currently, the most accepted concept is that V4 contributes not only to color, but to shape perception, visual attention, and stereopsis as well. Furthermore, recent studies have focused on other brain regions trying to find the "color area" of the brain, such as TEO[41] and PITd[42]. The relationship of these regions to each other is still debated. To reconcile the discussion, some use the term posterior inferior temporal (PIT) cortex to denote the region that includes V4, TEO, and PITd[1].

If the cortical response in V1, V2, and V4 cells is already a very complicated task, the level of complexity of complex visual responses in a network of approximately 30 visual zones is humongous. Figure 4 shows a small portion of the connectivity of the different cortical areas (not cells) that have been identified[43].

At this stage, it becomes exceedingly difficult to explain the function of singles cortical cells in simple terms. As a matter of fact, the function of a single cell might not have meaning since the representation of various perceptions must be distributed across collections of cells throughout the cortex.

Color Vision Adaptation Mechanisms

Although researchers have been trying to explain the processing of color signals in the human visual system, it is important to understand that color perception is not a fixed process. Actually, there are a variety of dynamic mechanisms that serve to optimize the visual response according to the viewing environment. Of particular relevance to color perception are the mechanisms of dark, light, and chromatic adaptation.

Dark Adaptation

Dark adaptation refers to the change in visual sensitivity that occurs when the level of illumination is decreased. The visual system response to reduced illumination is to become more sensitive, increasing its capacity to produce a meaningful visual response even when the light conditions are suboptimal[44].

Dark adaptation. During the first 10 minutes (i.e. to the left of the dotted line), sensitivity recovery is done by the cones. After the first 10 minutes (i.e. to the right of the dotted line), rods outperform the cones. Full sensitivity is recovered after approximately 30 minutes.
Fig. 5. Dark adaptation. During the first 10 minutes (i.e. to the left of the dotted line), sensitivity recovery is done by the cones. After the first 10 minutes (i.e. to the right of the dotted line), rods outperform the cones. Full sensitivity is recovered after approximately 30 minutes.

Figure 5 shows the recovery of visual sensitivity after transition from an extremely high illumination level to complete darkness[43]. First, the cones become gradually more sensitive, until the curve levels off after a couple of minutes. Then, after approximately 10 minutes have passed, visual sensitivity is roughly constant. At that point, the rod system, with a longer recovery time, has recovered enough sensitivity to outperform the cones and therefore recover control the overall sensitivity. Rod sensitivity gradually improves as well, until it becomes asymptotic after about 30 minutes. In other words, cones are responsible for the sensitivity recovery for the first 10 minutes. Afterwards, rods outperform the cones and gain full sensitivity after approximately 30 minutes.

This is only one of several neural mechanisms produced in order to adapt to the dark lightning conditions as good as possible. Some other neural mechanisms include the well-known pupil reflex, depletion and regeneration of photopigment, gain control in retinal cells and other higher-level mechanisms, and cognitive interpretation, among others.

Light Adaptation

Light adaptation is essentially the inverse process of dark adaptation. As a matter of fact, the underlying physiological mechanisms are the same for both processes. However, it is important to consider it separately since its visual properties differ.

Light adaptation. For a given scene, the solid lines represent families of visual response curves at different (relative) energy levels. The dashed line represents the case where we would adapt in order to cover the entire range of illumination, which would yield limited contrast and reduced sensitivity.
Fig. 6. Light adaptation. For a given scene, the solid lines represent families of visual response curves at different (relative) energy levels. The dashed line represents the case where we would adapt in order to cover the entire range of illumination, which would yield limited contrast and reduced sensitivity.

Light adaptation occurs when the level of illumination is increased. Therefore, the visual system must become less sensitive in order to produce useful perceptions, given the fact that there is significantly more visible light available. The visual system has a limited output dynamic range available for the signals that produce our perceptions. However, the real world has illumination levels covering at least 10 orders of magnitude more. Fortunately, we rarely need to view the entire range of illumination levels at the same time.

At high light levels, adaptation is achieved by photopigment bleaching. This scales photon capture in the receptors and protects the cone response from saturating at bright backgrounds. The mechanisms of light adaptation occur primarily within the retina[45]. As a matter of fact, gain changes are largely cone-specific and adaptation pools signals over areas no larger than the diameter of individual cones[46],[47]. This points to a localization of light adaptation that may be as early as the receptors. However, there appears to be more than one site of sensitivity scaling. Some of the gain changes are extremely rapid, while others take seconds or even minutes to stabilize[48]. Usually, light adaptation takes around 5 minutes (six times faster than dark adaptation). This might point to the influence of post-receptive sites.

Figure 6 shows examples of light adaptation [43]. If we would use a single response function to map the large range of intensities into the visual system's output, then we would only have a very small range at our disposal for a given scene. It is clear that with such a response function, the perceived contrast of any given scene would be limited and visual sensitivity to changes would be severely degraded due to signal-to-noise issues. This case is shown by the dashed line. On the other hand, solid lines represent families of visual responses. These curves map the useful illumination range in any given scene into the full dynamic range of the visual output, thus resulting in the best possible visual perception for each situation. Light adaptation can be thought of as the process of sliding the visual response curve along the illumination level axis until the optimum level for the given viewing conditions is reached.

Chromatic Adaptation

The general concept of chromatic adaptation consists in the variation of the height of the three cone spectral responsivity curves. This adjustment arises because light adaptation occurs independently within each class of cone. A specific formulation of this hypothesis is known as the von Kries adaptation. This hypothesis states that the adaptation response takes place in each of the three cone types separately and is equivalent to multiplying their fixed spectral sensitivities by a scaling constant[49]. If the scaling weights (also known as von Kries coefficients) are inversely proportional to the absorption of light by each cone type (i.e. a lower absorption will require a larger coefficient), then von Kries scaling maintains a constant mean response within each cone class. This provides a simple yet powerful mechanism for maintaining the perceived color of objects despite changes in illumination. Under a number of different conditions, von Kries scaling provides a good account of the effects of light adaptation on color sensitivity and appearance[50],[51].

The easiest way to picture chromatic adaptation is by examining a white object under different types of illumination. For example, let's consider examining a piece of paper under daylight, fluorescent, and incandescent illumination. Daylight contains relatively far more short-wavelength energy than fluorescent light, and incandescent illumination contains relatively far more long-wavelength energy than fluorescent light. However, in spite of the different illumination conditions, the paper approximately retains its white appearance under all three light sources. This is because the S-cone system becomes relatively less sensitive under daylight (in order to compensate for the additional short-wavelength energy) and the L-cone system becomes relatively less sensitive under incandescent illumination (in order to compensate for the additional long-wavelength energy)[43].


Auditory System

Technological Aspects
In Animals


The sensory system for the sense of hearing is the auditory system. This wikibook covers the physiology of the auditory system, and its application to the most successful neurosensory prosthesis - cochlear implants. The physics and engineering of acoustics are covered in a separate wikibook, Acoustics. An excellent source of images and animations is "Journey into the world of hearing" [52].

The ability to hear is not found as widely in the animal kingdom as other senses like touch, taste and smell. It is restricted mainly to vertebrates and insects.[citation needed] Within these, mammals and birds have the most highly developed sense of hearing. The table below shows frequency ranges of humans and some selected animals:[citation needed]

Humans 20-20'000 Hz
Whales 20-100'000 Hz
Bats 1'500-100'000 Hz
Fish 20-3'000 Hz

The organ that detects sound is the ear. It acts as receiver in the process of collecting acoustic information and passing it through the nervous system into the brain. The ear includes structures for both the sense of hearing and the sense of balance. It does not only play an important role as part of the auditory system in order to receive sound but also in the sense of balance and body position.

Mother and child
Humpback whales in the singing position
Big eared townsend bat
Hyphessobrycon pulchripinnis fish

Humans have a pair of ears placed symmetrically on both sides of the head which makes it possible to localize sound sources. The brain extracts and processes different forms of data in order to localize sound, such as:

  • the shape of the sound spectrum at the tympanic membrane (eardrum)
  • the difference in sound intensity between the left and the right ear
  • the difference in time-of-arrival between the left and the right ear
  • the difference in time-of-arrival between reflections of the ear itself (this means in other words: the shape of the pinna (pattern of folds and ridges) captures sound-waves in a way that helps localizing the sound source, especially on the vertical axis.

Healthy, young humans are able to hear sounds over a frequency range from 20 Hz to 20 kHz.[citation needed] We are most sensitive to frequencies between 2000 and 4000 Hz[citation needed] which is the frequency range of spoken words. The frequency resolution is 0.2%[citation needed] which means that one can distinguish between a tone of 1000 Hz and 1002 Hz. A sound at 1 kHz can be detected if it deflects the tympanic membrane (eardrum) by less than 1 Angstrom[citation needed], which is less than the diameter of a hydrogen atom. This extreme sensitivity of the ear may explain why it contains the smallest bone that exists inside a human body: the stapes (stirrup). It is 0.25 to 0.33 cm long and weighs between 1.9 and 4.3 mg.[citation needed]

The following video provides an overview of the concepts that will be presented in more detail in the next sections.

This animated video illustrates how sounds travel to the inner ear, and then to the brain, where they are interpreted and understood. The cochlea in the inner ear is a spiral-shaped organ that contains hair cells, which sense sound vibrations. Hair cells convert sound vibrations into chemical signals that the auditory nerve can understand.

Anatomy of the Auditory System

Human (external) ear

The aim of this section is to explain the anatomy of the auditory system of humans. The chapter illustrates the composition of auditory organs in the sequence that acoustic information proceeds during sound perception.
Please note that the core information for “Sensory Organ Components” can also be found on the Wikipedia page “Auditory system”, excluding some changes like extensions and specifications made in this article. (see also: Wikipedia Auditory system)

The auditory system senses sound waves, that are changes in air pressure, and converts these changes into electrical signals. These signals can then be processed, analyzed and interpreted by the brain. For the moment, let's focus on the structure and components of the auditory system. The auditory system consists mainly of two parts:

  • the ear and
  • the auditory nervous system (central auditory system)

The ear

The ear is the organ where the first processing of sound occurs and where the sensory receptors are located. It consists of three parts:

  • outer ear
  • middle ear
  • inner ear
Anatomy of the human ear (green: outer ear / red: middle ear / purple: inner ear)

Outer ear

Function: Gathering sound energy and amplification of sound pressure.

The folds of cartilage surrounding the ear canal (external auditory meatus, external acoustic meatus) are called the pinna. It is the visible part of the ear. Sound waves are reflected and attenuated when they hit the pinna, and these changes provide additional information that will help the brain determine the direction from which the sounds came. The sound waves enter the auditory canal, a deceptively simple tube. The ear canal amplifies sounds that are between 3 and 12 kHz. At the far end of the ear canal is the tympanic membrane (eardrum), which marks the beginning of the middle ear.

Middle ear

Micro-CT image of the ossicular chain showing the relative position of each ossicle.

Function: Transmission of acoustic energy from air to the cochlea.
Sound waves traveling through the ear canal will hit the tympanic membrane (tympanum, eardrum). This wave information travels across the air-filled tympanic cavity (middle ear cavity) via a series of bones: the malleus (hammer), incus (anvil) and stapes (stirrup). These ossicles act as a lever and a teletype, converting the lower-pressure eardrum sound vibrations into higher-pressure sound vibrations at another, smaller membrane called the oval (or elliptical) window, which is one of two openings into the cochlea of the inner ear. The second opening is called round window. It allows the fluid in the cochlea to move.

The malleus articulates with the tympanic membrane via the manubrium, whereas the stapes articulates with the oval window via its footplate. Higher pressure is necessary because the inner ear beyond the oval window contains liquid rather than air. The sound is not amplified uniformly across the ossicular chain. The stapedius reflex of the middle ear muscles helps protect the inner ear from damage.

The middle ear still contains the sound information in wave form; it is converted to nerve impulses in the cochlea.

Inner ear

Structural diagram of the cochlea Cross section of the cochlea Cochlea and Vestibular System from an MRI scan

Function: Transformation of mechanical waves (sound) into electric signals (neural signals).

The inner ear consists of the cochlea and several non-auditory structures. The cochlea is a snail-shaped part of the inner ear. It has three fluid-filled sections: scala tympani (lower gallery), scala media (middle gallery, cochlear duct) and scala vestibuli (upper gallery). The cochlea supports a fluid wave driven by pressure across the basilar membrane separating two of the sections (scala tympani and scala media). The basilar membrane is about 3 cm long and between 0.5 to 0.04 mm wide. Reissner’s membrane (vestibular membrane) separates scala media and scala vestibuli.

The scala media contains an extracellular fluid called endolymph, also known as Scarpa's Fluid after Antonio Scarpa. The organ of Corti is located in this duct, and transforms mechanical waves to electric signals in neurons. The other two sections, scala tympani and scala vestibuli, are located within the bony labyrinth which is filled with fluid called perilymph. The chemical difference between the two fluids endolymph (in scala media) and perilymph (in scala tympani and scala vestibuli) is important for the function of the inner ear.

Organ of Corti

The organ of Corti forms a ribbon of sensory epithelium which runs lengthwise down the entire cochlea. The hair cells of the organ of Corti transform the fluid waves into nerve signals. The journey of a billion nerves begins with this first step; from here further processing leads to a series of auditory reactions and sensations.

Transition from ear to auditory nervous system

Section through the spiral organ of Corti

Hair cells

Hair cells are columnar cells, each with a bundle of 100-200 specialized cilia at the top, for which they are named. These cilia are the mechanosensors for hearing. The shorter ones are called stereocilia, and the longest one at the end of each haircell bundle kinocilium. The location of the kinocilium determine the on-direction, i.e. the direction of deflection inducing the maximum hair cell excitation. Lightly resting atop the longest cilia is the tectorial membrane, which moves back and forth with each cycle of sound, tilting the cilia and allowing electric current into the hair cell.

The function of hair cells is not fully established up to now. Currently, the knowledge of the function of hair cells allows to replace the cells by cochlear implants in case of hearing lost. However, more research into the function of the hair cells may someday even make it possible for the cells to be repaired. The current model is that cilia are attached to one another by “tip links”, structures which link the tips of one cilium to another. Stretching and compressing, the tip links then open an ion channel and produce the receptor potential in the hair cell. Note that a deflection of 100 nanometers already elicits 90% of the full receptor potential.


The nervous system distinguishes between nerve fibres carrying information towards the central nervous system and nerve fibres carrying the information away from it:

  • Afferent neurons (also sensory or receptor neurons) carry nerve impulses from receptors (sense organs) towards the central nervous system
  • Efferent neurons (also motor or effector neurons) carry nerve impulses away from the central nervous system to effectors such as muscles or glands (and also the ciliated cells of the inner ear)

Afferent neurons innervate cochlear inner hair cells, at synapses where the neurotransmitter glutamate communicates signals from the hair cells to the dendrites of the primary auditory neurons.

There are far fewer inner hair cells in the cochlea than afferent nerve fibers. The neural dendrites belong to neurons of the auditory nerve, which in turn joins the vestibular nerve to form the vestibulocochlear nerve, or cranial nerve number VIII'

Efferent projections from the brain to the cochlea also play a role in the perception of sound. Efferent synapses occur on outer hair cells and on afferent (towards the brain) dendrites under inner hair cells.

Auditory nervous system

The sound information, now re-encoded in form of electric signals, travels down the auditory nerve (acoustic nerve, vestibulocochlear nerve, VIIIth cranial nerve), through intermediate stations such as the cochlear nuclei and superior olivary complex of the brainstem and the inferior colliculus of the midbrain, being further processed at each waypoint. The information eventually reaches the thalamus, and from there it is relayed to the cortex. In the human brain, the primary auditory cortex is located in the temporal lobe.

Primary auditory cortex

The primary auditory cortex is the first region of cerebral cortex to receive auditory input.

Perception of sound is associated with the right posterior superior temporal gyrus (STG). The superior temporal gyrus contains several important structures of the brain, including Brodmann areas 41 and 42, marking the location of the primary auditory cortex, the cortical region responsible for the sensation of basic characteristics of sound such as pitch and rhythm.

The auditory association area is located within the temporal lobe of the brain, in an area called the Wernicke's area, or area 22. This area, near the lateral cerebral sulcus, is an important region for the processing of acoustic signals so that they can be distinguished as speech, music, or noise.

Auditory Signal Processing

Now that the anatomy of the auditory system has been sketched out, this topic goes deeper into the physiological processes which take place while perceiving acoustic information and converting this information into data that can be handled by the brain. Hearing starts with pressure waves hitting the auditory canal and is finally perceived by the brain. This section details the process transforming vibrations into perception.

Effect of the head

Sound waves with a wavelength shorter than the head produce a sound shadow on the ear further away from the sound source. When the wavelength is longer than the head, diffraction of the sound leads to approximately equal sound intensities on both ears.

Difference in loudness and timing help us to localize the source of a sound signal.

Sound reception at the pinna

The pinna collects sound waves in air affecting sound coming from behind and the front differently with its corrugated shape. The sound waves are reflected and attenuated or amplified. These changes will later help sound localization.

In the external auditory canal, sounds between 3 and 12 kHz - a range crucial for human communication - are amplified. It acts as resonator amplifying the incoming frequencies.

Sound conduction to the cochlea

Sound that entered the pinna in form of waves travels along the auditory canal until it reaches the beginning of the middle ear marked by the tympanic membrane (eardrum). Since the inner ear is filled with fluid, the middle ear is kind of an impedance matching device in order to solve the problem of sound energy reflection on the transition from air to the fluid. As an example, on the transition from air to water 99.9% of the incoming sound energy is reflected. This can be calculated using:

with Ir the intensity of the reflected sound, Ii the intensity of the incoming sound and Zk the wave resistance of the two media ( Zair = 414 kg m-2 s-1 and Zwater = 1.48*106 kg m-2 s-1). Three factors that contribute the impedance matching are:

  • the relative size difference between tympanum and oval window
  • the lever effect of the middle ear ossicles and
  • the shape of the tympanum.
Mechanics of the amplification effect of the middle ear.

The longitudinal changes in air pressure of the sound-wave cause the tympanic membrane to vibrate which, in turn, makes the three chained ossicles malleus, incus and stirrup oscillate synchronously. These bones vibrate as a unit, elevating the energy from the tympanic membrane to the oval window. In addition, the energy of sound is further enhanced by the areal difference between the membrane and the stapes footplate. The middle ear acts as an impedance transformer by changing the sound energy collected by the tympanic membrane into greater force and less excursion. This mechanism facilitates transmission of sound-waves in air into vibrations of the fluid in the cochlea. The transformation results from the pistonlike in- and out-motion by the footplate of the stapes which is located in the oval window. This movement performed by the footplate sets the fluid in the cochlea into motion.

Through the stapedius muscle, the smallest muscle in the human body, the middle ear has a gating function: contracting this muscle changes the impedance of the middle ear, thus protecting the inner ear from damage through loud sounds.

Frequency analysis in the cochlea

The three fluid-filled compartements of the cochlea (scala vestibuli, scala media, scala tympani) are separated by the basilar membrane and the Reissner’s membrane. The function of the cochlea is to separate sounds according to their spectrum and transform it into a neural code. When the footplate of the stapes pushes into the perilymph of the scala vestibuli, as a consequence the membrane of Reissner bends into the scala media. This elongation of Reissner’s membrane causes the endolymph to move within the scala media and induces a displacement of the basilar membrane. The separation of the sound frequencies in the cochlea is due to the special properties of the basilar membrane. The fluid in the cochlea vibrates (due to in- and out-motion of the stapes footplate) setting the membrane in motion like a traveling wave. The wave starts at the base and progresses towards the apex of the cochlea. The transversal waves in the basilar membrane propagate with

with μ the shear modulus and ρ the density of the material. Since width and tension of the basilar membrane change, the speed of the waves propagating along the membrane changes from about 100 m/s near the oval window to 10 m/s near the apex.

There is a point along the basilar membrane where the amplitude of the wave decreases abruptly. At this point, the sound wave in the cochlear fluid produces the maximal displacement (peak amplitude) of the basilar membrane. The distance the wave travels before getting to that characteristic point depends on the frequency of the incoming sound. Therefore each point of the basilar membrane corresponds to a specific value of the stimulating frequency. A low-frequency sound travels a longer distance than a high-frequency sound before it reaches its characteristic point. Frequencies are scaled along the basilar membrane with high frequencies at the base and low frequencies at the apex of the cochlea.

The position x of the maximal amplitude of the travelling wave corresponds in a 1-to-1 way to a stimulus frequency.

Identifying frequency by the location of the maximum displacement of the basilar membrane is called tonotopic encoding of frequency. It automatically solves two problems:

  • It automatically parallelizes the subsequent processing of frequency. This tonotopic encoding is maintained all the way up to the cortex.
  • Our nervous system transmits information with action potentials, which are limited to less than 500 Hz. Through tonotopic encoding, also higher frequencies can be accurately represented.
Action potentials have a stereotyped shape. And since during the refractive period Na-ion channels are actively blocked, the maximum frequency of action potentials is about 500 Hz - significantly lower than the frequencies required for human speach.

Sensory transduction in the cochlea

Most everyday sounds are composed of multiple frequencies. The brain processes the distinct frequencies, not the complete sounds. Due to its inhomogeneous properties, the basilar membrane is performing an approximation to a Fourier transform. The sound is thereby split into its different frequencies, and each hair cell on the membrane corresponds to a certain frequency. The loudness of the frequencies is encoded by the firing rate of the corresponding afferent fiber. This is due to the amplitude of the traveling wave on the basilar membrane, which depends on the loudness of the incoming sound.

Transduction mechanism in auditory or vestibular hair cell. Tilting the hair cell towards the kinocilium opens the potassium ion channels. This changes the receptor potential in the hair cell. The resulting emission of neurotransmitters can elicit an action potential (AP) in the post-synaptic cell.
Auditory haircells are very similar to those of the vestibular system. Here an electron microscopy image of a frog's sacculus haircell.
Additional example of the hair cells of a frog.

The sensory cells of the auditory system, known as hair cells, are located along the basilar membrane within the organ of Corti. Each organ of Corti contains about 16,000 such cells, innervated by about 30,000 afferent nerve fibers. There are two anatomically and functionally distinct types of hair cells: the inner and the outer hair cells. Along the basilar membrane these two types are arranged in one row of inner cells and three to five rows of outer cells. Most of the afferent innervation comes from the inner hair cells while most of the efferent innervation goes to the outer hair cells. The inner hair cells influence the discharge rate of the individual auditory nerve fibers that connect to these hair cells. Therefore inner hair cells transfer sound information to higher auditory nervous centers. The outer hair cells, in contrast, amplify the movement of the basilar membrane by injecting energy into the motion of the membrane and reducing frictional losses but do not contribute in transmitting sound information. The motion of the basilar membrane deflects the stereocilias (hairs on the hair cells) and causes the intracellular potentials of the hair cells to decrease (depolarization) or increase (hyperpolarization), depending on the direction of the deflection. When the stereocilias are in a resting position, there is a steady state current flowing through the channels of the cells. The movement of the stereocilias therefore modulates the current flow around that steady state current.

Let's look at the modes of action of the two different hair cell types separately:

  • Inner hair cells:

The deflection of the hair-cell stereocilia opens mechanically gated ion channels that allow small, positively charged potassium ions (K+) to enter the cell and causing it to depolarize. Unlike many other electrically active cells, the hair cell itself does not fire an action potential. Instead, the influx of positive ions from the endolymph in scala media depolarizes the cell, resulting in a receptor potential. This receptor potential opens voltage gated calcium channels; calcium ions (Ca2+) then enter the cell and trigger the release of neurotransmitters at the basal end of the cell. The neurotransmitters diffuse across the narrow space between the hair cell and a nerve terminal, where they then bind to receptors and thus trigger action potentials in the nerve. In this way, neurotransmitter increases the firing rate in the VIIIth cranial nerve and the mechanical sound signal is converted into an electrical nerve signal.
The repolarization in the hair cell is done in a special manner. The perilymph in Scala tympani has a very low concentration of positive ions. The electrochemical gradient makes the positive ions flow through channels to the perilymph. (see also: Wikipedia Hair cell)

  • Outer hair cells:

In humans' outer hair cells, the receptor potential triggers active vibrations of the cell body. This mechanical response to electrical signals is termed somatic electromotility and drives oscillations in the cell’s length, which occur at the frequency of the incoming sound and provide mechanical feedback amplification. Outer hair cells have evolved only in mammals. Without functioning outer hair cells the sensitivity decreases by approximately 50 dB (due to greater frictional losses in the basilar membrane which would damp the motion of the membrane). They have also improved frequency selectivity (frequency discrimination), which is of particular benefit for humans, because it enables sophisticated speech and music. (see also: Wikipedia Hair cell)

With no external stimulation, auditory nerve fibres discharge action potentials in a random time sequence. This random time firing is called spontaneous activity. The spontaneous discharge rates of the fibers vary from very slow rates to rates of up to 100 per second. Fibers are placed into three groups depending on whether they fire spontaneously at high, medium or low rates. Fibers with high spontaneous rates (> 18 per second) tend to be more sensitive to sound stimulation than other fibers.

Auditory pathway of nerve impulses

Lateral lemniscus in red, as it connects the cochlear nucleus, superior olivary nucleus and the inferior colliculus. Seen from behind.

So in the inner hair cells the mechanical sound signal is finally converted into electrical nerve signals. The inner hair cells are connected to auditory nerve fibres whose nuclei form the spiral ganglion. In the spiral ganglion the electrical signals (electrical spikes, action potentials) are generated and transmitted along the cochlear branch of the auditory nerve (VIIIth cranial nerve) to the cochlear nucleus in the brainstem.

From there, the auditory information is divided into at least two streams:

  • Ventral Cochlear Nucleus:

One stream is the ventral cochlear nucleus which is split further into the posteroventral cochlear nucleus (PVCN) and the anteroventral cochlear nucleus (AVCN). The ventral cochlear nucleus cells project to a collection of nuclei called the superior olivary complex.

Superior olivary complex: Sound localization

The superior olivary complex - a small mass of gray substance - is believed to be involved in the localization of sounds in the azimuthal plane (i.e. their degree to the left or the right). There are two major cues to sound localization: Interaural level differences (ILD) and interaural time differences (ITD). The ILD measures differences in sound intensity between the ears. This works for high frequencies (over 1.6 kHz), where the wavelength is shorter than the distance between the ears, causing a head shadow - which means that high frequency sounds hit the averted ear with lower intensity. Lower frequency sounds don't cast a shadow, since they wrap around the head. However, due to the wavelength being larger than the distance between the ears, there is a phase difference between the sound waves entering the ears - the timing difference measured by the ITD. This works very precisely for frequencies below 800 Hz, where the ear distance is smaller than half of the wavelength. Sound localization in the median plane (front, above, back, below) is helped through the outer ear, which forms direction-selective filters.

There, the differences in time and loudness of the sound information in each ear are compared. Differences in sound intensity are processed in cells of the lateral superior olivary complexm and timing differences (runtime delays) in the medial superior olivary complex. Humans can detect timing differences between the left and right ear down to 10 μs, corresponding to a difference in sound location of about 1 deg. This comparison of sound information from both ears allows the determination of the direction where the sound came from. The superior olive is the first node where signals from both ears come together and can be compared. As a next step, the superior olivary complex sends information up to the inferior colliculus via a tract of axons called lateral lemniscus. The function of the inferior colliculus is to integrate information before sending it to the thalamus and the auditory cortex. It is interesting to know that the superior colliculus close by shows an interaction of auditory and visual stimuli.

  • Dorsal Cochlear Nucleus:

The dorsal cochlear nucleus (DCN) analyzes the quality of sound and projects directly via the lateral lemnisucs to the inferior colliculus.

From the inferior colliculus the auditory information from ventral as well as dorsal cochlear nucleus proceeds to the auditory nucleus of the thalamus which is the medial geniculate nucleus. The medial geniculate nucleus further transfers information to the primary auditory cortex, the region of the human brain that is responsible for processing of auditory information, located on the temporal lobe. The primary auditory cortex is the first relay involved in the conscious perception of sound.

Primary auditory cortex and higher order auditory areas

Sound information that reaches the primary auditory cortex (Brodmann areas 41 and 42). The primary auditory cortex is the first relay involved in the conscious perception of sound. It is known to be tonotopically organized and performs the basics of hearing: pitch and volume. Depending on the nature of the sound (speech, music, noise), is further passed to higher order auditory areas. Sounds that are words are processed by Wernicke’s area (Brodmann area 22). This area is involved in understanding written and spoken language (verbal understanding). The production of sound (verbal expression) is linked to Broca’s area (Brodmann areas 44 and 45). The muscles to produce the required sound when speaking are contracted by the facial area of motor cortex which are regions of the cerebral cortex that are involved in planning, controlling and executing voluntary motor functions.

Lateral surface of the brain with Brodmann's areas numbered.

Pitch Perception

This section reviews a key topic in auditory neuroscience: pitch perception. Some basic understanding of the auditory system is presumed, so readers are encouraged to first read the above sections on the 'Anatomy of the Auditory System' and 'Auditory Signal Processing'.


Pitch is a subjective percept, evoked by sounds that have an approximately periodic nature. For many naturally occurring sounds, periodicity of a sound is the major determinant of pitch. Yet the relationship between an acoustic stimulus and pitch is quite abstract: in particular, pitch is quite robust to changes in other acoustic parameters such as loudness or spectral timbre, both of which may significantly alter the physical properties of an acoustic waveform. This is particularly evident in cases where sounds without any shared spectral components can evoke the same pitch, for example. Consequently, pitch-related information must be extracted from spectral and/or temporal cues represented across multiple frequency channels.

Investigations of pitch encoding in the auditory system have largely focused on identifying neural processes which reflect these extraction processes, or on finding the ‘end point’ of such a process: an explicit, robust representation of pitch as perceived by the listener. Both endeavours have had some success, with evidence accumulating for ‘pitch selective neurons’ in putative ‘pitch areas’. However, it remains debatable whether the activity of these areas is truly related to pitch, or if they simply exhibit selective representation of pitch-related parameters. On the one hand, demonstrating an activation of specific neurons or neural areas in response to numerous pitch-evoking sounds, often with substantial variation in their physical characteristics, provides compelling correlative evidence that these regions are indeed encoding pitch. On the other, demonstrating causal evidence that these neurons represent pitch is difficult, likely requiring a combination of in vivo recording approaches to demonstrate a correspondence of these responses to pitch judgments (i.e., psychophysical responses, rather than just stimulus periodicity), and direct manipulation of the activity in these cells to demonstrate predictable biases or impairments in pitch perception.

Due to the rather abstract nature of pitch, we will not immediately delve into this yet unresolved field of active research. Rather, we begin our discussion with the most direct physical counterparts of pitch perception – i.e., sound frequency (for pure tones) and, more generally, stimulus periodicity. Specifically, we will distinguish between, and more concretely define, the notions of periodicity and pitch. Following this, we will briefly outline the major computational mechanisms that may be implemented by the auditory system to extract such pitch-related information from sound stimuli. Subsequently, we outline representation and processing of pitch parameters in the cochlea, the ascending subcortical auditory pathway, and, finally, more controversial findings in primary auditory cortex and beyond, and evaluate the evidence of ‘pitch neurons’ or ‘pitch areas’ in these cortical regions.

Periodicity and pitch

Pitch is an emergent psychophysical property. The salience and ‘height’ of pitch depends on several factors, but within a specific range of harmonic and fundamental frequencies, called the “existence region”, pitch salience is largely determined by regularity of sound segment repetition; pitch height by the rate of repetition, also called the modulating frequency. The set of sounds capable of evoking pitch perception is diverse and spectrally heterogeneous. Many different stimuli – including pure tones, click trains, iterated ripple noises, amplitude modulated sounds, and so forth – can evoke a pitch percept, while another acoustic signals, even with very similar physical characteristics to such stimuli may not evoke pitch. Most naturally occurring pitch-evoking sounds are harmonic complexes - sounds containing a spectrum of frequencies that are integer multiples of the fundamental frequency, F0.  An important finding in pitch research is the phenomenon of the ‘missing fundamental’ (see below): within a certain frequency range, all the spectral energy at F0 can be removed from a harmonic complex, and still evoke a pitch correlating to F0 in a human listener[53]. This finding appears to generalise to many non-human auditory systems[54][55].

Pitch of the missing fundamental. Audio spectrographs for the melody of 'Mary had a little lamb'. (Left) Melody played with pure tones (fundamental), (middle) melody played with fundamental and first six harmonic overtones, (right) melody played with only harmonic overtones, with the spectral energy at the fundamental frequency removed. As demonstrated in the corresponding audio clips to the left, these three melodies differ in timbre, but pitch is unchanged, despite the missing fundamental and pure tone melodies having no spectral components in common.

The ‘missing fundamental’ phenomenon is important for two reasons. Firstly, it is an important benchmark for assessing whether particular neurons or brain regions are specialised for pitch processing, since such units should be expected to show activity reflective of F0 (and thus pitch), regardless of its presence in the sound and other acoustic parameters. More generally, a ‘pitch neuron’ or ‘pitch centre’ should show consistent activity in response to all stimuli that evoke a particular perception of pitch height. As will be discussed, this has been a source of some disagreement in identifying putative pitch neurons or areas.  Secondly, that we can perceive a pitch corresponding to F0 even in its absence in the auditory stimulus provides strong evidence against the brain implementing a mechanism for ‘selecting’ F0 to directly infer pitch. Rather, pitch must be extracted from temporal or spectral cues (or both)[56].

Mechanisms for pitch extraction: spectral and temporal cues

Resolved and unresolved harmonics. A schematic spectrum, excitation pattern, and simulated basilar membrane (BM) vibration for a complex tone with an F0 of 100 Hz and equal-amplitude harmonics. As can be seen in the excitation pattern and BM vibrations, higher order harmonics are 'unresolved' - that is, there is no effective separation of individual harmonics. (Description adapted from original author. Available at:

These two cues (spectral and temporal) are the bases of two major classes of pitch extraction models[56]. The first of these are the time domain methods, which use temporal cues to assess whether a sound consists of a repetitive segment, and, if so, the rate of repetition. A commonly proposed method of doing so is autocorrelation. An autocorrelation function essentially involves finding the time delays between two sampling points that will give the maximum correlation: for example, a sound wave with a frequency of 100Hz (or period, T=10 milliseconds) would have a maximal correlation if samples are taken 10 milliseconds apart. For a 200Hz wave, the delay yielding maximal correlation would be 5 milliseconds – but also at 10 milliseconds, 15 milliseconds and so forth. Thus if such a function is performed on all component frequencies of a harmonic complex with F0=100Hz (and thus having harmonic overtones at 200Hz, 300Hz, 400Hz, and so forth), and the resulting time intervals giving maximal correlation were summed, they would collectively ‘vote’ for 10 milliseconds – the periodicity of the sound. The second class of pitch extraction strategies are frequency domain methods, where pitch is extracted by analysing the frequency spectra of a sound to calculate F0. For instance, ‘template matching' processes – such as the ‘harmonic sieve’ – propose that the frequency spectrum of a sound is simply matched to harmonic templates – the best match yields the correct F0[57].

There are limitations to both classes of explanations. Frequency domain methods require harmonic frequencies to be resolved – that is, for each harmonic to be represented as a distinct frequency band (see figure, right). Yet higher order harmonics, which are unresolved due to the wider bandwidth in physiological representation for higher frequencies (a consequence of the logarithmic tonotopic organisation of the basilar membrane), can still evoke pitch corresponding to F0. Temporal models do not have this issue, since an autocorrelation function should still yield the same periodicity, regardless of whether the function is performed in one or over several frequency channels. However, it is difficult to attribute the lower limits of pitch-evoking frequencies to autocorrelation: psychophysical studies demonstrate that we can perceive pitch from harmonic complexes with missing fundamentals as low as 30Hz; this corresponds to a sampling delay of over 33 milliseconds – far longer than the ~10 millisecond delay commonly observed in neural signalling[56].    

Sine-phase (left) and alternating phase (right) harmonics. These complexes have the same F0 (125 Hz) and the same harmonic numbers, but the pitch of the complex on the right is an octave higher than the pitch of the complex on the left. Both complexes were filtered between 3900 and 5400 Hz. (Description from original author. Available at:

One strategy to determine which of these two strategies are adopted by the auditory system is the use of alternating-phase harmonics: to present odd harmonics in sine phase, and even harmonics in cosine phase. Since this will not affect the spectral content of the stimulus, no change in pitch perception should occur if the listener is relying primarily on spectral cues. On the other hand, the temporal envelope repetition rate will double. Thus, if temporal envelope cues are adopted, the pitch perceived by listeners for alternating-phase harmonics will be an octave above (i.e., double the frequency of) the pitch perceived for all-cosine harmonic with the same spectral composition. Psychophysical studies have investigated the sensitivity of pitch perception to such phase shifts across different F0 and harmonic ranges, providing evidence that both humans[58] and other primates[59] adopt a dual strategy: spectral cues are used for lower order, resolved harmonics, while temporal envelope cues are used higher order, unresolved harmonics.

Pitch extraction in the ascending auditory pathway

Weber fractions for pitch discrimination in humans has been reported at under 1%[60]. In view of this high sensitivity to pitch changes, and the demonstration that both spectral and temporal cues are used for pitch extraction, we can predict that the auditory system represents both the spectral composition and temporal fine structure of acoustic stimuli in a highly precise manner, until these representations are eventually conveyed explicitly periodicity or pitch-selective neurons.

Electrophysiological experiments have identified neuronal responses in the ascending auditory system that are consistent with this notion. From the level of the cochlea, the tonopically mapped basilar membrane’s (BM) motions in response to auditory stimuli establishes a place code for frequency composition along the BM axis. These representations are further enhanced by a phase-locking of the auditory nerve fibres (ANFs) to the frequency components it responds to. This mechanism for temporal representation of frequency composition is further enhanced in numerous ways, such as lateral inhibition at the hair cell/spiral ganglion cell synapse[61], supporting the notion that this precise representation is critical for pitch encoding.

Thus by this stage, the phase-locked temporal spike patterns of ANFs likely carry an implicit representation of periodicity. This was tested directly by Cariani and Delgutte[62]. By analysing the distribution of all-order inter-spike intervals (ISI) in the ANFs of cats, they showed that the most common ISI was the periodicity of the stimulus, and the peak-to-mean ratio of these distribution increased for complex stimuli evoking more salient pitch perceptions. Based on these findings, these authors proposed the ‘predominant interval hypothesis’, where a pooled code of all-order ISIs ‘vote’ for the periodicity - though of course, this finding is an inevitable consequence of phase-locked responses of ANFs. In addition, there is evidence that the place code for frequency components is also critical. By crossing a low-frequency stimulus with a high-frequency carrier, Oxenham et al transposed the temporal fine-structure of the low frequency sinusoid to higher frequency regions along the BM.[63] This led to significantly impaired pitch discrimination abilities. Thus, both the place and temporal coding represent pitch-related information in the ANFs.

The auditory nerve carries information to the cochlear nucleus (CN). Here, many cell types represent pitch-related information in different ways. For example, many bushy cells appear to have little difference in firing properties of auditory nerve fibres – information may be carried to higher order brain regions without significant modification[56].Of particular interest are the sustained chopper cells in the ventral cochlear nucleus. According Winter and colleagues, the first-order spike intervals in these cells corresponds to periodicity in response to iterated rippled noise stimuli (IRN), as well to cosine-phase and random-phase harmonic complexes, quite invariantly to sound level[64]. While further characterisation of these cells' responses to different pitch-evoking stimuli is required, there is therefore some indication that pitch extraction may begin as early as the level of the CN.

In the inferior colliculus (IC), there is some evidence that the average response rate of neurons is equal to the periodicity of the stimulus[65]. Subsequent studies comparing IC neuron responses to same-phase and alternating-phase harmonic complexes suggest that these cells may be responding to the periodicity of the overall energy level (i.e., the envelope), rather than true modulating frequency, yet it is not clear whether this applies only for unresolved harmonics (as would be predicted by psychophysical experiments) or also for resolved harmonics[56]. There remains much uncertainty regarding the representation of periodicity in the IC.

Pitch coding in the auditory cortex

Thus, there is a tendency to enhance that representations of F0 throughout the ascending auditory system, though the precise nature of this remains unclear. In these subcortical stages of the ascending auditory pathway however, there is no evidence for an explicit representation that consistently encodes information corresponding to perceived pitch. Such representations likely occur in ‘higher’ auditory regions, from primary auditory cortex onward.

Indeed, lesion studies have demonstrated the necessity for auditory cortex in pitch perception. Of course, an impairment in pitch detection following lesions to the auditory cortex may simply be reflect a passive transmission role for the cortex: where subcortical information must ‘pass through’ to affect behaviour. Yet studies such as that by Whitfield have demonstrated that this is likely not the case: while decorticate cats could be re-trained (following an ablation of their auditory cortex) to recognise complex tones comprised of three frequency components, the animals selectively lost the ability generalise these tones to other complexes with the same pitch[66]. In other words, while the harmonic composition could influence behaviour, harmonic relations (i.e. a pitch cue) could not. For example, the lesioned animal could correctly respond to a pure tone at 100Hz, but would not respond to a harmonic complex consisting of its harmonic overtones (at 200Hz, 300Hz, and so forth). This suggests strongly a role for the auditory cortex in further extraction of pitch-related information.

Early MEG studies of the primary auditory cortex had suggested that A1 contained a map of pitch. This was based on the findings that a pure tone and its missing fundamental harmonic complex (MF) evoked stimulus-evoked excitation (called the N100m) in the same location, whereas components frequencies of the MF presented in isolation evoked excitations in different locations[67]. Yet such notions were overcast by the results of experiments using higher spatial-resolution techniques: local field potential (LFP) and multi-unit recording (MUA) demonstrated that the mapping A1 was tonotopic – that is, based on neurons’ best frequency (BF), rather than best ‘pitch’[68]. These techniques do however demonstrate an emergence of distinct coding mechanisms reflective of extracting temporal and spectral cues: phase-locked representation of temporal envelope repetition rate was recorded in the higher BF regions of the tonotopic map, while the harmonic structure of the click train was represented in lower BF regions[69].Thus, the cues for pitch extraction may be further enhanced by this stage.

Schematic illustration of multi-peaked neurons. Blue dotted line shows a classical tuning curve for a 'single-peaked' frequency selective neuron with a best frequency (BF) at around 500Hz, as illustrated by the maximal response of this neuron to frequencies around this BF. The red solid line shows a schematic response of a multi-peaked neuron identified by Kadia and Wang (2003). In addition to a BF at 300Hz, this neuron is also excited by tones at 600Hz and 900Hz - i.e., frequencies in harmonic relation to the principal BF. Although not illustrated here, responses of such neurons to harmonic complexes (in this case, consisting of 300, 600, and 900 Hz for example) often had an additive effect, eliciting responses greater than that of a pure tone at the BF (i.e., 300Hz) alone. See reference [18]

An example of a neuronal substrate that may facilitate such an enhancement was described by Kadia and Wang in primary auditory cortex of marmosets[70]. Around 20% of the neurons here could be classified as ‘multi-peaked’ units: neurons that have multiple frequency response areas, often in harmonic relation (see figure, right). Further, excitation of two of these spectral peaks what shown to have a synergistic effect on the neurons’ responses. This would therefore facilitate the extraction of harmonically related tones in the acoustic stimulus, allowing these neurons to act as a ‘harmonic template’ for extracting spectral cues. Additionally, these authors observed that in the majority of ‘single peaked’ neurons (i.e. neurons with a single spectral tuning peak at its BF), a secondary tone could have a modulatory (facilitating or inhibiting) effect on the response of the neuron to its BF. Again, these modulating frequencies were often harmonically related to the BF. These facilitating mechanisms may therefore accommodate the extraction of certain harmonic components, while rejecting other spectral combinations through inhibitory modulation may facilitate the disambiguation with other harmonic complexes or non-harmonic complexes such as broadband noise.

Putative 'pitch regions' in human supratemporal plane. (A) Lateral view of the left hemisphere, with STG indicated in red. (B–D) Top view of left supratemporal plane, after removal of a large part of the parietal cortex. PP, HG, and PT are indicated in blue, yellow, and green, respectively. Major sulci are outlined in black (FTS, first transverse sulcus; SI, sulcus intermediate; HS, Heschl's sulcus; HS1, first Heschl's sulcus; HS2, second Heschl's sulcus). Panels include hemispheres with one HG, an incomplete separation of HG, and two HG in (B–D), respectively.

However, given that the tendency to enhance F0 has been demonstrated throughout the subcortical auditory system, we might expect have to come closer to a more explicit representation of pitch in the cortex. Neuroimaging experiments have explored this idea, capitalising on the emergent quality of pitch: a subtractive method can identify areas in the brain which show BOLD responses in response to a pitch-evoking stimulus, but not to another sound with very similar spectral properties, but does not evoke pitch perception. Such strategies were used by Patterson, Griffiths and colleagues: by subtracting the BOLD signal acquired during presentation of broad-band noise from the signal acquired during presentation of IRN, they identified a selective activation of the lateral (and to some extent, medial) Heschl’s gyrus (HG) in response to the latter class of pitch-evoking sounds[71]. Further, varying the repetition rate of IRN over time to create a melody led to additional activation in the superior temporal gyrus (STG) and planum polare (PP), suggesting a hierarchical processing of pitch through the auditory cortex. In line with this, MEG recordings by Krumbholz et al showed that, as the repetition rate of IRN stimuli is increased, a novel N100m is detected around the HG as the repetition rate crosses the lower threshold for pitch perception, and the magnitude of this “pitch-onset response” increased with pitch salience[72].

There is however some debate about the precise location of the pitch selective area. As Hall and Plack point out, the use of IRN stimuli alone to identify pitch-sensitive cortical areas is insufficient to capture the broad range of stimuli that can induce pitch perception: the activation of HG may be specific to repetitive broadband stimuli[73]. Indeed, based on BOLD signals observed in response to multiple pitch-evoking stimuli, Hall and Plack suggest that the planum temporale (PT) is more relevant for pitch processing.

Despite ongoing disagreement about the precise neural area specialised for pitch coding, such evidence suggests that regions lying anterolateral to A1 may be specialised for pitch perception. Further support for this notion is provided by the identification of ‘pitch selective’ neurons at the anterolateral border of A1 in the marmoset auditory cortex. These neurons were selectively responsive to both pure tones and missing F0 harmonics with the similar periodicities[74]. Many of these neurons were also sensitive to the periodicity of other pitch-evoking stimuli, such as click trains or IRN noise. This provides strong evidence that these neurons are not merely responding any particular component of the acoustic signal, but specifically represent pitch-related information.

Periodicity coding or pitch coding?

Accumulating evidence thus suggests that there are neurons and neural areas specialised in extracting F0, likely in regions just anterolateral to the low BF regions of A1. However, there are still difficulties in calling these neurons or areas “pitch selective”. While stimulus F0 is certainly a key determinant of pitch, it is not necessarily equivalent to the pitch perceived by the listener.

There are however several lines of evidence suggesting that these regions are indeed coding pitch, rather than just F0. For instance, further investigation of the marmoset pitch-selective units by Bendor and colleagues has demonstrated that the activity in these neurons corresponds well to the animals' psychophysical responses[59]. These authors tested the animals’ abilities to detect an alternating-phase harmonic complex amidst an ongoing presentation of same-phase harmonics at the same F0, in order to distinguish between when animals rely more on temporal envelope cues for pitch perception, rather than spectral cues. Consistent with psychophysical experiments in humans, the marmosets used primarily temporal envelope cues for higher order, unresolved harmonics of low F0, while spectral cues were used to extract pitch from lower-order harmonics of high F0 complexes. Recording from these pitch selective neurons showed that the F0 tuning shifted down an octave for alternating-phase harmonics, compared to same-phase harmonics for neurons tuned to low F0s. These patterns of neuronal responses are thus consistent with the psychophysical results, and suggest that both temporal and spectral cues are integrated in these neurons to influence pitch perception.

Yet, again, this study cannot definitively distinguish whether these pitch-selective neurons explicitly represent pitch, or simply an integration of F0 information that will then be subsequently decoded to perceive pitch. A more direct approach to addressing this issue was taken by Bizley et al, who analysed how auditory cortex LFP and MUA measurements in ferrets could independently be used to estimate stimulus F0 and pitch perception[75]. While ferrets were engaged in a pitch discrimination task (to indicate whether a target artificial vowel sound was higher or lower in pitch than a reference in a 2-alternative forced choice paradigm), receiver operating characteristic (ROC) analysis was used to estimate the discriminability of neural activity in predicting the change in F0 or the actual behavioural choice (i.e. a surrogate for perceived pitch). They found that neural responses across the auditory cortex were informative regarding both. Initially, the activity better discriminated F0 than the animal’s choice, but information regarding the animals’ choice grew steadily higher throughout the post-stimulus interval, eventually becoming more discriminable than the direction of F0 change[75].

Comparing the differences in ROC between the cortical areas studied showed that posterior fields activity better discriminated the ferrets’ choice. This may be interpreted in two ways. Since choice-related activity was higher in the posterior fields (which lie by the low BF border of A1), compared to the primary fields, this may be seen as further evidence for pitch-selectivity near low BF border of A1. On the other hand, the fact the pitch-related information was also observed in the primary auditory fields may suggest that sufficient pitch-related information may already be established by this stage, or that a distributed code across multiple auditory areas code pitch. Indeed, while single neurons distributed across the auditory cortex are in general sensitive to multiple acoustic parameters (and therefore not ‘pitch-selective’), information theoretic or neurometric analyses (using neural data to infer stimulus-related information) indicate that pitch information can nevertheless be robustly represented via population coding, or even by single neurons through temporal multiplexing (i.e., representing multiple sound features in distinct time windows)[76][77]. Thus, in the absence of stimulation or deactivation of these putative pitch-selective neurons or areas to demonstrate that such interventions induce predictable biases or impairments in pitch, it may be that pitch is represented in spatially and temporally distributed codes across the auditory cortex, rather than relying on specialised local representations.

Thus, both electrophysiological recording and neuroimaging studies suggest that there may be an explicit neural code for pitch lies near the low BF border of A1. Certainly, the consistent and selective responses to a wide range of pitch-evoking stimuli suggest that these putative pitch-selective neurons and areas are not simply reflecting any immediately available physical characteristic of the acoustic signal. Moreover, there is evidence that these putative pitch-selective neurons extract information from spectral and temporal cues in much the same way as the animal. However, by virtue of the abstract relationship between pitch and an acoustic signal, such correlative evidence between a stimulus and neural response can only be interpreted as evidence that the auditory system has the capacity to form enhanced representations of pitch-related parameters. Without more direct causal evidence for these putative pitch-selective neurons and neural areas determining pitch perception, we cannot conclude whether animals do indeed rely on such localised explicit codes for pitch, or if the robust distributed representations of pitch across the auditory cortex mark the final coding of pitch in the auditory system.    


  1. a b c d Conway, Bevil R (2009). "Color vision, cones, and color-coding in the cortex". The neuroscientist. 15: 274–290.
  2. Russell, Richard and Sinha, Pawan} (2007). "Real-world face recognition: The importance of surface reflectance properties". Perception. 36 (9).{{cite journal}}: CS1 maint: multiple names: authors list (link)
  3. Gegenfurtner, Karl R and Rieger, Jochem (2000). "Sensory and cognitive contributions of color to the recognition of natural scenes". Current Biology. 10 (13): 805–808.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  4. Changizi, Mark A and Zhang, Qiong and Shimojo, Shinsuke (2006). "Bare skin, blood and the evolution of primate colour vision". Biology letters. 2 (2): 217–221.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  5. a b Beretta, Giordano (2000). Understanding Color. Hewlett-Packard.
  6. a b Boynton, Robert M (1988). "Color vision". Annual review of psychology. 39 (1): 69–100.
  7. Grassmann, Hermann (1853). "Zur theorie der farbenmischung". Annalen der Physik. 165 (5): 69–84.
  8. Konig, Arthur and Dieterici, Conrad (1886). "Die Grundempfindungen und ihre intensitats-Vertheilung im Spectrum". Koniglich Preussischen Akademie der Wissenschaften.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  9. Smith, Vivianne C and Pokorny, Joel (1975). "Spectral sensitivity of the foveal cone photopigments between 400 and 500 nm". Vision research. 15 (2): 161–171.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  10. Vos, JJ and Walraven, PL (1971). "On the derivation of the foveal receptor primaries". Vision Research. 11 (8): 799–818.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  11. a b c Gegenfurtner, Karl R and Kiper, Daniel C (2003). "Color vision". Neuroscience. 26 (1): 181.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  12. Kaiser, Peter K and Boynton, Robert M (1985). "Role of the blue mechanism in wavelength discrimination". Vision research. 125 (4): 523–529.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  13. Paulus, Walter and Kroger-Paulus, Angelika (1983). "A new concept of retinal colour coding". Vision research. 23 (5): 529–540.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  14. Nerger, Janice L and Cicerone, Carol M (1992). "The ratio of L cones to M cones in the human parafoveal retina". Vision research. 32 (5): 879–888.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  15. Neitz, Jay and Carroll, Joseph and Yamauchi, Yasuki and Neitz, Maureen and Williams, David R (2002). "Color perception is mediated by a plastic neural mechanism that is adjustable in adults". Neuron. 35 (4): 783–792.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  16. Jacobs, Gerald H and Williams, Gary A and Cahill, Hugh and Nathans, Jeremy (2007). "Emergence of novel color vision in mice engineered to express a human cone photopigment". Science. 315 (5819): 1723–1725.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  17. Osorio, D and Ruderman, DL and Cronin, TW (1998). "Estimation of errors in luminance signals encoded by primate retina resulting from sampling of natural images with red and green cones". JOSA A. 15 (1): 16–22.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  18. Kersten, Daniel (1987). "Predictability and redundancy of natural images". JOSA A. 4 (112): 2395–2400.
  19. Jolliffe, I. T. (2002). Principal Component Analysis. Springer.
  20. Buchsbaum, Gershon and Gottschalk, A (1983). "Trichromacy, opponent colours coding and optimum colour information transmission in the retina". Proceedings of the Royal society of London. Series B. Biological sciences. 220 (1218): 89–113.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  21. Zaidi, Qasim (1997). "Decorrelation of L-and M-cone signals". JOSA A. 14 (12): 3430–3431.
  22. Ruderman, Daniel L and Cronin, Thomas W and Chiao, Chuan-Chin (1998). "Statistics of cone responses to natural images: Implications for visual coding". JOSA A. 15 (8): 2036–2045.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  23. Lee, BB and Martin, PR and Valberg, A (1998). "The physiological basis of heterochromatic flicker photometry demonstrated in the ganglion cells of the macaque retina". The Journal of Physiology. 404 (1): 323–347.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  24. a b Derrington, Andrew M and Krauskopf, John and Lennie, Peter (1984). "Chromatic mechanisms in lateral geniculate nucleus of macaque". The Journal of Physiology. 357 (1): 241–265.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  25. Shapley, Robert (1990). "Visual sensitivity and parallel retinocortical channels". Annual review of psychology. 41 (1): 635--658.
  26. Dobkins, Karen R and Thiele, Alex and Albright, Thomas D (2000). "Comparison of red--green equiluminance points in humans and macaques: evidence for different L: M cone ratios between species". JOSA A. 17 (3): 545–556.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  27. Martin, Paul R and Lee, Barry B and White, Andrew JR and Solomon, Samuel G and Ruttiger, Lukas (2001). "Chromatic sensitivity of ganglion cells in the peripheral primate retina". Nature. 410 (6831): 933–936.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  28. Perry, VH and Oehler, R and Cowey, A (1984). "Retinal ganglion cells that project to the dorsal lateral geniculate nucleus in the macaque monkey". Neuroscience. 12 (4): 1101--1123. {{cite journal}}: Cite has empty unknown parameter: |month= (help)CS1 maint: multiple names: authors list (link)
  29. Casagrande, VA (1994). "A third parallel visual pathway to primate area V1". Trends in neurosciences. 17 (7): 305–310.
  30. Hendry, Stewart HC and Reid, R Clay (2000). "The koniocellular pathway in primate vision". Annual review of neuroscience. 23 (1): 127–153.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  31. Callaway, Edward M (1998). "Local circuits in primary visual cortex of the macaque monkey". Annual review of neuroscience. 21 (1): 47–74.
  32. Conway, Bevil R (2001). "Spatial structure of cone inputs to color cells in alert macaque primary visual cortex (V-1)". The Journal of Neuroscience. 21 (8): 2768–2783. {{cite journal}}: Cite has empty unknown parameter: |month= (help)
  33. Horwitz, Gregory D and Albright, Thomas D (2005). "Paucity of chromatic linear motion detectors in macaque V1". Journal of Vision. 5 (6).{{cite journal}}: CS1 maint: multiple names: authors list (link)
  34. Danilova, Marina V and Mollon, JD (2006). "The comparison of spatially separated colours". Vision research. 46 (6): 823–836.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  35. Wachtler, Thomas and Sejnowski, Terrence J and Albright, Thomas D (2003). "Representation of color stimuli in awake macaque primary visual cortex". Neuron. 37 (4): 681–691.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  36. Solomon, Samuel G and Lennie, Peter (2005). "Chromatic gain controls in visual cortical neurons". The Journal of neuroscience. 25 (19): 4779–4792.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  37. Hubel, David H (1995). Eye, brain, and vision. Scientific American Library/Scientific American Books.
  38. Livingstone, Margaret S and Hubel, David H (1987). "Psychophysical evidence for separate channels for the perception of form, color, movement, and depth". The Journal of Neuroscience. 7 (11): 3416–3468.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  39. Zeki, Semir M (1973). "Colour coding in rhesus monkey prestriate cortex". Brain research. 53 (2): 422–427.
  40. Conway, Bevil R and Tsao, Doris Y (2006). "Color architecture in alert macaque cortex revealed by fMRI". Cerebral Cortex. 16 (11): 1604–1613.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  41. Tootell, Roger BH and Nelissen, Koen and Vanduffel, Wim and Orban, Guy A (2004). "Search for color 'center(s)'in macaque visual cortex". Cerebral Cortex. 14 (4): 353–363.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  42. Conway, Bevil R and Moeller, Sebastian and Tsao, Doris Y (2007). "Specialized color modules in macaque extrastriate cortex". 560--573. 56 (3): 560–573.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  43. a b c d Fairchild, Mark D (2013). Color appearance models. John Wiley & Sons.
  44. Webster, Michael A (1996). "Human colour perception and its adaptation". Network: Computation in Neural Systems. 7 (4): 587–634.
  45. Shapley, Robert and Enroth-Cugell, Christina (1984). "Visual adaptation and retinal gain controls". Progress in retinal research. 3: 263–346.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  46. Chaparro, A and Stromeyer III, CF and Chen, G and Kronauer, RE (1995). "Human cones appear to adapt at low light levels: Measurements on the red-green detection mechanism". Vision Research. 35 (22): 3103–3118.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  47. Macleod, Donald IA and Williams, David R and Makous, Walter (1992). "A visual nonlinearity fed by single cones". Vision research. 32 (2): 347–363.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  48. Hayhoe, Mary (1991). Adaptation mechanisms in color and brightness. Springer.
  49. MacAdam, DAvid L (1970). Sources of Color Science. MIT Press.
  50. Webster, Michael A and Mollon, JD (1995). "Colour constancy influenced by contrast adaptation". Nature. 373 (6516): 694–698.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  51. Brainard, David H and Wandell, Brian A (1992). "Asymmetric color matching: how color appearance depends on the illuminant". JOSA A. 9 (9): 1443–1448.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  52. NeurOreille and authors (2010). "Journey into the world of hearing".
  53. Schouten, J. F. (1938). The perception of subjective tones. Proceedings of the Koninklijke Nederlandse Akademie van Wetenschappen41, 1086-1093.  
  54. Cynx, J. & Shapiro, M. Perception of missing fundamental by a species of songbird (Sturnus vulgaris). J Comp Psychol 100, 356–360 (1986).
  55. Heffner, H., & Whitfield, I. C. (1976). Perception of the missing fundamental by cats. The Journal of the Acoustical Society of America59(4), 915-919.
  56. a b c d e Schnupp, J., Nelken, I. & King, A. Auditory neuroscience: Making sense of sound. (MIT press, 2011).
  57. Gerlach, S., Bitzer, J., Goetze, S. & Doclo, S. Joint estimation of pitch and direction of arrival: improving robustness and accuracy for multi-speaker scenarios. EURASIP Journal on Audio, Speech, and Music Processing 2014, 1 (2014).
  58. Carlyon RP, Shackleton TM (1994). "Comparing the fundamental frequencies of resolved and unresolved harmonics: Evidence for two pitch mechanisms?" Journal of the Acoustical Society of America 95:3541-3554    
  59. a b Bendor D, Osmanski MS, Wang X (2012). "Dual-pitch processing mechanisms in primate auditory cortex," Journal of Neuroscience 32:16149-61.
  60. Tramo, M. J., Shah, G. D., & Braida, L. D. (2002). Functional role of auditory cortex in frequency processing and pitch perception. Journal of Neurophysiology87(1), 122-139.
  61. Rask-Andersen, H., Tylstedt, S., Kinnefors, A., & Illing, R. B. (2000). Synapses on human spiral ganglion cells: a transmission electron microscopy and immunohistochemical study. Hearing research141(1), 1-11.
  62. Cariani, P. A., & Delgutte, B. (1996). Neural correlates of the pitch of complex tones. I. Pitch and pitch salience. Journal of Neurophysiology76(3), 1698-1716.
  63. Oxenham, A. J., Bernstein, J. G., & Penagos, H. (2004). Correct tonotopic representation is necessary for complex pitch perception. Proceedings of the National Academy of Sciences of the United States of America101(5), 1421-1425.    
  64. Winter, I. M., Wiegrebe, L., & Patterson, R. D. (2001). The temporal representation of the delay of iterated rippled noise in the ventral cochlear nucleus of the guinea-pig. The Journal of physiology, 537(2), 553-566.
  65. Schreiner, C. E. & Langner, G. Periodicity coding in the inferior colliculus of the cat. II. Topographical organization. Journal of neurophysiology 60, 1823–1840 (1988).
  66. Whitfield IC (1980). "Auditory cortex and the pitch of complex tones." J Acoust Soc Am. 67(2):644-7.
  67. Pantev, C., Hoke, M., Lutkenhoner, B., & Lehnertz, K. (1989). Tonotopic organization of the auditory cortex: pitch versus frequency representation.Science246(4929), 486-488.
  68. Fishman YI, Reser DH, Arezzo JC, Steinschneider M (1998). "Pitch vs. spectral encoding of harmonic complex tones in primary auditory cortex of the awake monkey," Brain Res 786:18-30.    
  69. Steinschneider M, Reser DH, Fishman YI, Schroeder CE, Arezzo JC (1998) Click train encoding in primary auditory cortex of the awake monkey: evidence for two mechanisms subserving pitch perception. J Acoust Soc Am 104:2935–2955.    
  70. Kadia, S. C., & Wang, X. (2003). Spectral integration in A1 of awake primates: neurons with single-and multipeaked tuning characteristics. Journal of neurophysiology89(3), 1603-1622.    
  71. Patterson RD, Uppenkamp S, Johnsrude IS, Griffiths TD. (2002) "The processing of temporal pitch and melody information in auditory cortex," Neuron 36:767-776.    
  72. Krumbholz, K., Patterson, R. D., Seither-Preisler, A., Lammertmann, C., & Lütkenhöner, B. (2003). Neuromagnetic evidence for a pitch processing center in Heschl’s gyrus. Cerebral Cortex13(7), 765-772.
  73. Hall DA, Plack CJ (2009). "Pitch processing sites in the human auditory brain," Cereb Cortex 19(3):576-85.    
  74. Bendor D, Wang X (2005). "The neuronal representation of pitch in primate auditory cortex," Nature 436(7054):1161-5.    
  75. a b Bizley JK, Walker KMM, Nodal FR, King AJ, Schnupp JWH (2012). "Auditory Cortex Represents Both Pitch Judgments and the Corresponding Acoustic Cues," Current Biology 23:620-625.
  76. Walker KMM, Bizley JK, King AJ, and Schnupp JWH. (2011). Multiplexed and robust representations of sound features in auditory cortex. Journal of Neurosci 31(41): 14565-76 
  77. Bizley JK, Walker KM, King AJ, and Schnupp JW. (2010). "Neural ensemble codes for stimulus periodicity in auditory cortex." J Neurosci 30(14): 5078-91.    

Vestibular System

Technological Aspects
In Animals


The main function of the balance system, or vestibular system, is to sense head movements, especially involuntary ones, and counter them with reflexive eye movements and postural adjustments that keep the visual world stable and keep us from falling. An excellent, more extensive article on the vestibular system is available on Scholorpedia [1]. An extensive review of our current knowledge about the vestibular system can be found in "The Vestibular System: a Sixth Sense" by J Goldberg et al [2].

Anatomy of the Vestibular System


Together with the cochlea, the vestibular system is carried by a system of tubes called the membranous labyrinth. These tubes are lodged within the cavities of the bony labyrinth located in the inner ear. A fluid called perilymph fills the space between the bone and the membranous labyrinth, while another one called endolymph fills the inside of the tubes spanned by the membranous labyrinth. These fluids have a unique ionic composition suited to their function in regulating the electrochemical potential of hair cells, which are as we will later see the transducers of the vestibular system. The electric potential of endolymph is of about 80 mV more positive than perilymph.

Since our movements consist of a combination of linear translations and rotations, the vestibular system is composed of two main parts: The otolith organs, which sense linear accelerations and thereby also give us information about the head’s position relative to gravity, and the semicircular canals, which sense angular accelerations.

Human bony labyrinth (Computed tomography 3D) Internal structure of the human labyrinth


The otolith organs of both ears are located in two membranous sacs called the utricle and the saccule which primary sense horizontal and vertical accelerations, respectively. Each utricle has about 30'000 hair cells, and each saccule about 16'000. The otoliths are located at the central part of the labyrinth, also called the vestibule of the ear. Both utricle and saccule have a thickened portion of the membrane called the macula. A gelatinous membrane called the otolthic membrane sits atop the macula, and microscopic stones made of calcium carbonate crystal, the otoliths, are embedded on the surface of this membrane. On the opposite side, hair cells embedded in supporting cells project into this membrane.

The otoliths are the human sensory organs for linear acceleration. The utricle (left) is approximately horizontally oriented; the saccule (center) lies approximately vertical. The arrows indicate the local on-directions of the hair cells; and the thick black lines indicate the location of the striola. On the right you see a cross-section through the otolith membrane. The graphs have been generated by Rudi Jaeger, while we cooperated on investigations of the otolith dynamics.

Semicircular Canals

Cross-section through ampulla. Top: The cupula spans the lumen of the ampulla from the crista to the membranous labyrinth. Bottom: Since head acceleration exceeds endolymph acceleration, the relative flow of endolymph in the canal is opposite to the direction of head acceleration. This flow produces a pressure across the elastic cupula, which deflects in response.

Each ear has three semicircular canals. They are half circular, interconnected membranous tubes filled with endolymph and can sense angular accelerations in the three orthogonal planes. The radius of curvature of the human horizontal semicircular canal is 3.2 mm [3].

The canals on each side are approximately orthogonal to each other. The orientation of the on-directions of the canals on the right side are [4]:

Canal X Y Z
Horizontal 0.32269 -0.03837 -0.94573
Anterior 0.58930 0.78839 0.17655
Posterior 0.69432 -0.66693 0.27042

(The axes are oriented such that the positive x-,y-,and z-axis point forward, left, and up, respectively. The horizontal plane is defined by Reid's line, the line connecting the lower rim of the orbita and the center of the external auditory canal. And the directions are such that a rotation about that vector, according to the right-hand-rule, excites the corresponding canal.) The anterior and posterior semicircular canals are approximately vertical, and the horizontal semicircular canals approximately horizontal.

Orientation of the semicircular canals in the vestibular system. "L / R" stand for "Left / Right", respectively, and "H / A / P" for "Horizontal / Anterior / Posterior". The arrows indicate the direction of head movement that stimulates the corresponding canal.

Each canal presents a dilatation at one end, called the ampulla. Each membranous ampulla contains a saddle-shaped ridge of tissue, the crista, which extends across it from side to side. It is covered by neuroepithelium, with hair cells and supporting cells. From this ridge rises a gelatinous structure, the cupula, which extends to the roof of the ampulla immediately above it, dividing the interior of the ampulla into two approximately equal parts.


The sensors within both the otolith organs and the semicircular canals are the hair cells. They are responsible for the transduction of a mechanical force into an electrical signal and thereby build the interface between the world of accelerations and the brain.

Transduction mechanism in auditory or vestibular haircell. Tilting the haircell towards the kinocilium opens the potassium ion channels. This changes the receptor potential in the haircell. The resulting emission of neurotransmittors can elicit an action potential (AP) in the post-synaptic cell.

Hair cells have a tuft of stereocilia that project from their apical surface. The thickest and longest stereocilia is the kinocilium. Stereocilia deflection is the mechanism by which all hair cells transduce mechanical forces. Stereocilia within a bundle are linked to one another by protein strands, called tip links, which span from the side of a taller stereocilium to the tip of its shorter neighbor in the array. Under deflection of the bundle, the tip links act as gating springs to open and close mechanically sensitive ion channels. Afferent nerve excitation works basically the following way: when all cilia are deflected toward the kinocilium, the gates open and cations, including potassium ions from the potassium rich endolymph, flow in and the membrane potential of the hair cell becomes more positive (depolarization). The hair cell itself does not fire action potentials. The depolarization activates voltage-sensitive calcium channels at the basolateral aspect of the cell. Calcium ions then flow in and trigger the release of neurotransmitters, mainly glutamate, which in turn diffuse across the narrow space between the hair cell and a nerve terminal, where they then bind to receptors and thus trigger an increase of the action potentials firing rate in the nerve. On the other hand, afferent nerve inhibition is the process induced by the bending of the stereocilia away from the kinocilium (hyperpolarization) and by which the firing rate is decreased. Because the hair cells are chronically leaking calcium, the vestibular afferent nerve fires actively at rest and thereby allows the sensing of both directions (increase and decrease of firing rate). Hair cells are very sensitive and respond extremely quickly to stimuli. The quickness of hair cell response may in part be due to the fact that they must be able to release neurotransmitter reliably in response to a threshold receptor potential of only 100 µV or so.

Auditory haircells are very similar to those of the vestibular system. Here an electron microscopy image of a frog's sacculus haircell.

Regular and Irregular Haircells

While afferent haircells in the auditory system are fairly homogeneous, those in the vestibular system can be broadly separated into two groups: "regular units" and "irregular units". Regular haircells have approximately constant interspike intervals, and fire constantly proportional to their displacement. In contrast, the inter-spike interval of irregular haircells is much more variable, and their discharge rate increases with increasing frequency; they can thus act as event detectors at high frequencies. Regular and irregular haircells also differ in their location, morphology and innervation.

Signal Processing

Peripheral Signal Transduction

Transduction of Linear Acceleration

The hair cells of the otolith organs are responsible for the transduction of a mechanical force induced by linear acceleration into an electrical signal. Since this force is the product of gravity plus linear movements of the head,

it is therefore sometimes referred to as gravito-inertial force. The mechanism of transduction works roughly as follows: The otoconia, calcium carbonate crystals in the top layer of the otoconia membrane, have a higher specific density than the surrounding materials. Thus a linear acceleration leads to a displacement of the otoconia layer relative to the connective tissue. The displacement is sensed by the hair cells. The bending of the hairs then polarizes the cell and induces afferent excitation or inhibition.

Excitation (red) and inhibition (blue) on utricle (left) and saccule (right), when the head is in a right-ear-down orientation. The displacement of the otoliths was calculated with the finite element technique, and the orientation of the haircells was taken from the literature.

While each of the three semicircular canals senses only one-dimensional component of rotational acceleration, linear acceleration may produce a complex pattern of inhibition and excitation across the maculae of both the utricle and saccule. The saccule is located on the medial wall of the vestibule of the labyrinth in the spherical recess and has its macula oriented vertically. The utricle is located above the saccule in the elliptical recess of the vestibule, and its macula is oriented roughly horizontally when the head is upright. Within each macula, the kinocilia of the hair cells are oriented in all possible directions.

Therefore, under linear acceleration with the head in the upright position, the saccular macula is sensing acceleration components in the vertical plane, while the utricular macula is encoding acceleration in all directions in the horizontal plane. The otolthic membrane is soft enough that each hair cell is deflected proportional to the local force direction. If denotes the direction of maximum sensitivity or on-direction of the hair cell, and the gravito-inertial force, the stimulation by static accelerations is given by

The direction and magnitude of the total acceleration is then determined from the excitation pattern on the otolith maculae.

Transduction of Angular Acceleration

The three semicircular canals are responsible for the sensing of angular accelerations. When the head accelerates in the plane of a semicircular canal, inertia causes the endolymph in the canal to lag behind the motion of the membranous canal. Relative to the canal walls, the endolymph effectively moves in the opposite direction as the head, pushing and distorting the elastic cupula. Hair cells are arrayed beneath the cupula on the surface of the crista and have their stereocilia projecting into the cupula. They are therefore excited or inhibited depending on the direction of the acceleration.

The stimulation of a human semicircular canal is proportional to the scalar product between a vector n (which is perpendicular to the plane of the canal), and the vector omega indicating the angular velocity.

This facilitates the interpretation of canal signals: if the orientation of a semicircular canal is described by the unit vector , the stimulation of the canal is proportional to the projection of the angular velocity onto this canal

The horizontal semicircular canal is responsible for sensing accelerations around a vertical axis, i.e. the neck. The anterior and posterior semicircular canals detect rotations of the head in the sagittal plane, as when nodding, and in the frontal plane, as when cartwheeling.

In a given cupula, all the hair cells are oriented in the same direction. The semicircular canals of both sides also work as a push-pull system. For example, because the right and the left horizontal canal cristae are “mirror opposites” of each other, they always have opposing (push-pull principle) responses to horizontal rotations of the head. Rapid rotation of the head toward the left causes depolarization of hair cells in the left horizontal canal's ampulla and increased firing of action potentials in the neurons that innervate the left horizontal canal. That same leftward rotation of the head simultaneously causes a hyperpolarization of the hair cells in the right horizontal canal's ampulla and decreases the rate of firing of action potentials in the neurons that innervate the horizontal canal of the right ear. Because of this mirror configuration, not only the right and left horizontal canals form a push-pull pair but also the right anterior canal with the left posterior canal (RALP), and the left anterior with the right posterior (LARP).

Central Vestibular Pathways

The information resulting from the vestibular system is carried to the brain, together with the auditory information from the cochlea, by the vestibulocochlear nerve, which is the eighth of twelve cranial nerves. The cell bodies of the bipolar afferent neurons that innervate the hair cells in the maculae and cristae in the vestibular labyrinth reside near the internal auditory meatus in the vestibular ganglion (also called Scarpa's ganglion, Figure 10.1). The centrally projecting axons from the vestibular ganglion come together with axons projecting from the auditory neurons to form the eighth nerve, which runs through the internal auditory meatus together with the facial nerve. The primary afferent vestibular neurons project to the four vestibular nuclei that constitute the vestibular nuclear complex in the brainstem.

Vestibulo-ocular reflex.
Vestibulo-ocular reflex.

Vestibulo-Ocular Reflex (VOR)

An extensively studied example of function of the vestibular system is the vestibulo-ocular reflex (VOR). The function of the VOR is to stabilize the image during rotation of the head. This requires the maintenance of stable eye position during horizontal, vertical and torsional head rotations. When the head rotates with a certain speed and direction, the eyes rotate with the same speed but in the opposite direction. Since head movements are present all the time, the VOR is very important for stabilizing vision.

How does the VOR work? The vestibular system signals how fast the head is rotating and the oculomotor system uses this information to stabilize the eyes in order to keep the visual image motionless on the retina. The vestibular nerves project from the vestibular ganglion to the vestibular nuclear complex, where the vestibular nuclei integrate signals from the vestibular organs with those from the spinal cord, cerebellum, and the visual system. From these nuclei, fibers cross to the contralateral abducens nucleus. There they synapse with two additional pathways. One pathway projects directly to the lateral rectus muscle of eye via the abducens nerve. Another nerve tract projects from the abducens nucleus by the abducens interneurons to the oculomotor nuclei, which contain motor neurons that drive eye muscle activity, specifically activating the medial rectus muscles of the eye through the oculomotor nerve. This short latency connection is sometimes referred to as three-neuron-arc, and allows an eye movement within less than 10 ms after the onset of the head movement.

For example, when the head rotates rightward, the following occurs. The right horizontal canal hair cells depolarize and the left hyperpolarize. The right vestibular afferent activity therefore increases while the left decreases. The vestibulocochlear nerve then carries this information to the brainstem and the right vestibular nuclei activity increases while the left decreases. This makes in turn neurons of the left abducens nucleus and the right oculomotor nucleus fire at higher rate. Those in the left oculomotor nucleus and the right abducens nucleus fire at a lower rate. This results in the fact than the left lateral rectus extraocular muscle and the right medial rectus contract while the left medial rectus and the right lateral rectus relax. Thus, both eyes rotate leftward.

The gain of the VOR is defined as the change in the eye angle divided by the change in the head angle during the head turn

If the gain of the VOR is wrong, that is, different than one, then head movements result in image motion on the retina, resulting in blurred vision. Under such conditions, motor learning adjusts the gain of the VOR to produce more accurate eye motion. Thereby the cerebellum plays an important role in motor learning.

The Cerebellum and the Vestibular System

It is known that postural control can be adapted to suit specific behavior. Patient experiments suggest that the cerebellum plays a key role in this form of motor learning. In particular, the role of the cerebellum has been extensively studied in the case of adaptation of vestibulo-ocular control. Indeed, it has been shown that the gain of the vestibulo-ocular reflex adapts to reach the value of one even if damage occur in a part of the VOR pathway or if it is voluntary modified through the use of magnifying lenses. Basically, there are two different hypotheses about how the cerebellum plays a necessary role in this adaptation. The first from (Ito 1972;Ito 1982) claims that the cerebellum itself is the site of learning, while the second from Miles and Lisberger (Miles and Lisberger 1981) claims that the vestibular nuclei are the site of adaptive learning while the cerebellum constructs the signal that drives this adaptation. Note that in addition to direct excitatory input to the vestibular nuclei, the sensory neurons of the vestibular labyrinth also provide input to the Purkinje cells in the flocculo-nodular lobes of the cerebellum via a pathway of mossy and parallel fibers. In turn, the Purkinje cells project an inhibitory influence back onto the vestibular nuclei. Ito argued that the gain of the VOR can be adaptively modulated by altering the relative strength of the direct excitatory and indirect inhibitory pathways. Ito also argued that a message of retinal image slip going through the inferior olivary nucleus carried by the climbing fiber plays the role of an error signal and thereby is the modulating influence of the Purkinje cells. On the other hand, Miles and Lisberger argued that the brainstem neurons targeted by the Purkinje cells are the site of adaptive learning and that the cerebellum constructs the error signal that drives this adaptation.

Alcohol and the Vestibular System

As you may or may not know from personal experience, consumption of alcohol can also induce a feeling of rotation. The explanation is quite straightforward, and basically relies on two factors: i) alcohol is lighter than the endolymph; and ii) once it is in the blood, alcohol gets relatively quickly into the cupula, as the cupula has a good blood supply. In contrast, it diffuses only slowly into the endolymph, over a period of a few hours. In combination, this leads to a buoyancy of the cupola soon after you have consumed (too much) alcohol. When you lie on your side, the deflection of the left and right horizontal cupulae add up, and induce a strong feeling of rotation. The proof: just roll on the other side - and the perceived direction of rotation will flip around!

Due to the position of the cupulae, you will experience the strongest effect when you lie on your side. When you lie on your back, the deflection of the left and right cupula compensate each other, and you don't feel any horizontal rotation. This explains why hanging one leg out of the bed slows down the perceived rotation.

The overall effect is minimized in the upright head position - so try to stay up(right) as long as possible during the party!

If you have drunk way too much, the endolymph will contain a significant amount of alcohol the next morning - more so than the cupula. This explains while at that point, a small amount of alcohol (e.g. a small beer) balances the difference, and reduces the feeling of spinning.

Somatosensory System

Technological Aspects
In Animals


Anatomy of the Somatosensory System

Our somatosensory system consists of sensors in the skin and sensors in our muscles, tendons, and joints. The receptors in the skin, the so called cutaneous receptors, tell us about temperature (thermoreceptors), pressure and surface texture (mechano receptors), and pain (nociceptors). The receptors in muscles and joints provide information about muscle length, muscle tension, and joint angles. (The following description is based on lecture notes from Laszlo Zaborszky, from Rutgers University.)

Cutaneous receptors


Receptors in the human skin: Mechanoreceptors can be free receptors or encapsulated. Examples for free receptors are the hair receptors at the roots of hairs. Encapsulated receptors are the Pacinian corpuscles and the receptors in the glabrous (hairless) skin: Meissner corpuscles, Ruffini corpuscles and Merkel’s disks.

Sensory information from Meissner corpuscles and rapidly adapting afferents leads to adjustment of grip force when objects are lifted. These afferents respond with a brief burst of action potentials when objects move a small distance during the early stages of lifting. In response to rapidly adapting afferent activity, muscle force increases reflexively until the gripped object no longer moves. Such a rapid response to a tactile stimulus is a clear indication of the role played by somatosensory neurons in motor activity.

The slowly adapting Merkel's receptors are responsible for form and texture perception. As would be expected for receptors mediating form perception, Merkel‘s receptors are present at high density in the digits and around the mouth (50/mm² of skin surface), at lower density in other glabrous surfaces, and at very low density in hairy skin. This innervations density shrinks progressively with the passage of time so that by the age of 50, the density in human digits is reduced to 10/mm². Unlike rapidly adapting axons, slowly adapting fibers respond not only to the initial indentation of skin, but also to sustained indentation up to several seconds in duration.

Activation of the rapidly adapting Pacinian corpuscles gives a feeling of vibration, while the slowly adapting Ruffini corpuscles respond to the lateral movement or stretching of skin.

Rapidly adapting Slowly adapting
Surface receptor / small receptive field Hair receptor, Meissner's corpuscle: Detect an insect or a very fine vibration. Used for recognizing texture. Merkel's receptor: Used for spatial details, e.g. a round surface edge or "an X" in brail.
Deep receptor / large receptive field Pacinian corpuscle: "A diffuse vibration" e.g. tapping with a pencil. Ruffini's corpuscle: "A skin stretch". Used for joint position in fingers.


Nociceptors have free nerve endings. Functionally, skin nociceptors are either high-threshold mechanoreceptors or polymodal receptors. Polymodal receptors respond not only to intense mechanical stimuli, but also to heat and to noxious chemicals. These receptors respond to minute punctures of the epithelium, with a response magnitude that depends on the degree of tissue deformation. They also respond to temperatures in the range of 40–60°C, and change their response rates as a linear function of warming (in contrast with the saturating responses displayed by non-noxious thermoreceptors at high temperatures).

Pain signals can be separated into individual components, corresponding to different types of nerve fibers used for transmitting these signals. The rapidly transmitted signal, which often has high spatial resolution, is called first pain or cutaneous pricking pain. It is well localized and easily tolerated. The much slower, highly affective component is called second pain or burning pain; it is poorly localized and poorly tolerated. The third or deep pain, arising from viscera, musculature and joints, is also poorly localized, can be chronic and is often associated with referred pain.


The thermoreceptors have free nerve endings. Interestingly, we have only two types of thermoreceptors that signal innocuous warmth and cooling respectively in our skin (however, some nociceptors are also sensitive to temperature, but capable of unambiguously signaling only noxious temperatures). The warm receptors show a maximum sensitivity at ~ 45°C, signal temperatures between 30 and 45°C, and cannot unambiguously signal temperatures higher than 45°C , and are unmyelinated. The cold receptors have their maximum sensitivity at ~ 27°C, signal temperatures above 17°C, and some consist of lightly myelinated fibers, while others are unmyelinated. Our sense of temperature comes from the comparison of the signals from the warm and cold receptors. Thermoreceptors are poor indicators of absolute temperature but are very sensitive to changes in skin temperature.


The term proprioceptive or kinesthetic sense is used to refer to the perception of joint position, joint movements, and the direction and velocity of joint movement. There are numerous mechanoreceptors in the muscles, the muscle fascia, and in the dense connective tissue of joint capsules and ligaments. There are two specialized encapsulated, low-threshold mechanoreceptors: the muscle spindle and the Golgi tendon organ. Their adequate stimulus is stretching of the tissue in which they lie. Muscle spindles, joint and skin receptors all contribute to kinesthesia. Muscle spindles appear to provide their most important contribution to kinesthesia with regard to large joints, such as the hip and knee joints, whereas joint receptors and skin receptors may provide more significant contributions with regard to finger and toe joints.

Muscle Spindles

Mammalian muscle spindle showing typical position in a muscle (left), neuronal connections in spinal cord (middle) and expanded schematic (right). The spindle is a stretch receptor with its own motor supply consisting of several intrafusal muscle fibres. The sensory endings of a primary (group Ia) afferent and a secondary (group II) afferent coil around the non-contractile central portions of the intrafusal fibres. Gamma motoneurons activate the intrafusal muscle fibres, changing the resting firing rate and stretch-sensitivity of the afferents.

Scattered throughout virtually every striated muscle in the body are long, thin, stretch receptors called muscle spindles. They are quite simple in principle, consisting of a few small muscle fibers with a capsule surrounding the middle third of the fibers. These fibers are called intrafusal fibers, in contrast to the ordinary extrafusal fibers. The ends of the intrafusal fibers are attached to extrafusal fibers, so whenever the muscle is stretched, the intrafusal fibers are also stretched. The central region of each intrafusal fiber has few myofilaments and is non-contractile, but it does have one or more sensory endings applied to it. When the muscle is stretched, the central part of the intrafusal fiber is stretched and each sensory ending fires impulses.

Numerous specializations occur in this simple basic organization, so that in fact the muscle spindle is one of the most complex receptor organs in the body. Only three of these specializations are described here; their overall effect is to make the muscle spindle adjustable and give it a dual function, part of it being particularly sensitive to the length of the muscle in a static sense and part of it being particularly sensitive to the rate at which this length changes.

  1. Intrafusal muscle fibers are of two types. All are multinucleated, and the central, non-contractile region contains the nuclei. In one type of intrafusal fiber, the nuclei are lined up single file; these are called nuclear chain fiber. In the other type, the nuclear region is broader, and the nuclei are arranged several abreast; these are called nuclear bag fibers. There are typically two or three nuclear bag fibers per spindle and about twice that many chain fibers.
  2. There are also two types of sensory endings in the muscle spindle. The first type, called the primary ending, is formed by a single Ia (A-alpha) fiber, supplying every intrafusal fiber in a given spindle. Each branch wraps around the central region of the intrafusal fiber, frequently in a spiral fashion, so these are sometimes called annulospiral endings. The second type of ending is formed by a few smaller nerve fibers (II or A-Beta) on both sides of the primary endings. These are the secondary endings, which are sometimes referred to as flower-spray endings because of their appearance. Primary endings are selectively sensitive to the onset of muscle stretch but discharge at a slower rate while the stretch is maintained. Secondary endings are less sensitive to the onset of stretch, but their discharge rate does not decline very much while the stretch is maintained. In other words, both primary and secondary endings signal the static length of the muscle (static sensitivity) whereas only the primary ending signals the length changes (movement) and their velocity (dynamic sensitivity). The change of firing frequency of group Ia and group II fibers can then be related to static muscle length (static phase) and to stretch and shortening of the muscle (dynamic phases).
  3. Muscle spindles also receive a motor innervation. The large motor neurons that supply extrafusal muscle fibers are called alpha motor neurons, while the smaller ones supplying the contractile portions of intrafusal fibers are called gamma neurons. Gamma motor neurons can regulate the sensitivity of the muscle spindle so that this sensitivity can be maintained at any given muscle length.

Golgi tendon organ

Mammalian tendon organ showing typical position in a muscle (left), neuronal connections in spinal cord (middle) and expanded schematic (right). The tendon organ is a stretch receptor that signals the force developed by the muscle. The sensory endings of the Ib afferent are entwined amongst the musculotendinous strands of 10 to 20 motor units.

The Golgi tendon organ is located at the musculotendinous junction. There is no efferent innervation of the tendon organ, therefore its sensitivity cannot be controlled from the CNS. The tendon organ, in contrast to the muscle spindle, is coupled in series with the extrafusal muscle fibers. Both passive stretch and active contraction of the muscle increase the tension of the tendon and thus activate the tendon organ receptor, but active contraction produces the greatest increase. The tendon organ, consequently, can inform the CNS about the “muscle tension”. In contrast, the activity of the muscle spindle depends on the “muscle length” and not on the tension. The muscle fibers attached to one tendon organ appear to belong to several motor units. Thus the CNS is informed not only of the overall tension produced by the muscle but also of how the workload is distributed among the different motor units.

Joint receptors

The joint receptors are low-threshold mechanoreceptors and have been divided into four groups. They signal different characteristics of joint function (position, movements, direction and speed of movements). The free receptors or type 4 joint receptors are nociceptors.

Proprioceptive Signal Processing

Feedback loops for proprioceptive signals for the perception and control of limb movements. Arrows indicate excitatory connections; filled circles inhibitory connections.

Gustatory System

Technological Aspects
In Animals


The Gustatory System or sense of taste allows us to perceive different flavors from substances like food, drinks, medicine etc. Molecules that we taste or tastants are sensed by cells in our mouth, which send information to the brain. These specialized cells are called taste cells and can sense 5 main tastes: bitter, salty, sweet, sour and umami (savory). All the variety of flavors that we know are combinations of molecules which fall into these categories.

Measuring the degree by which a substance presents one of the basic tastes is done subjectively by comparing its taste to a taste of a reference substance according to relative indexes of different substances. For the bitter taste quinine (found in tonic water) is used to rate how bitter a substance is. Saltiness can be rated by comparing to a dilute salt solution. The sourness is compared to diluted hydrochloric acid (H+Cl-). Sweetness is measured relative to sucrose. The values of these reference substances are defined as 1.


(Coffee, mate, beer, tonic water etc.)

It is considered by many as unpleasant. In general bitterness is very interesting because a large number of bitter compounds are known to be toxic so the bitter taste is considered to provide an important protective function. Plant leafs often contain toxic compounds. Herbivores have a tendency to prefer immature leaves, which have higher protein content and lower poison levels than mature leaves. It seems that even if the bitter taste is not very pleasant at first, there is a tendency to overcome this aversion because coffee and drinks containing rich amount of caffeine and are widely consumed. Sometimes bitter agents are added to substances to prevent accidental ingestion.


(Table salt)

The salty taste is primarily produced by the presence of cations such as Li+ (lithium ions), K+ (potassium ions) and more commonly Na+ (sodium). The saltiness of substances is compared to sodium chloride, which is typically used as table salt (Na+Cl-). Potassium chloride K+Cl- is the principal ingredient used in salt substitutes and has an index of 0.6 (see bellow part 5) compared to 1 of Na+Cl-.


(Lemon, orange, wine, spoiled milk and candies containing citric acid)

Sour taste can be mildly pleasant and it is linked to salty flavor but more exacerbated. Typically sour are fruits, which are over-riped, spoiled milk, rotten meat, and other spoiled foods, which can be dangerous. It also tastes acids (H+ ions) which taken in large quantities can cause irreversible tissue damage. Sourness is rated compared to hydrochloric acid (H+Cl-), which has a sourness index of 1.


(Sucrose (table sugar), cake, ice cream etc.)

Sweetness is regarded as a pleasant sensation and is produced by the presence of mostly sugars. Sweet substances are rated relative to sucrose, which has an index of 1. Nowadays there are many artificial sweeteners in the market, these include saccharin, aspartame and sucralose but it is still not clear how these substitutes activate the receptors.

Umami (savory or tasty)

(Cheese, soy sauce etc.)

Recently, umami has been added as the fifth taste. This taste signals the presence of L-glutamate and it is a very important for the Eastern cuisines. Monosodium glutamate is commonly used to bring umami to food, but various plants and meats are also sources of glutamates. Umami is further enhanced when glutamate is present with the nucleotides inosinate and guanylate.

Sensory Organs

Tongue and Taste Buds

Human tongue

Taste cells are epithelial and are clustered in taste buds located in the tongue, soft palate, epiglottis, pharynx and the esophagus the tongue being the primary organ of the Gustatory System.

Schematic drawing of a taste bud

Taste buds are located in papillae along the surface of the tongue. There are three types of papillae in human: fungiform located in the anterior part containing approximately five taste buds, circumvallate papillae which are bigger and more posterior than the previous ones and the foliate papillae that are in the posterior edge of the tongue. Circumvallate and foliate papillae contain hundreds of taste buds. In each taste bud there are different types of cells: basal, dark, intermediate and light cells. Basal cells are believed to be the stem cells that give rise to the other types. It is thought that the rest of the cells correspond to different stages of differentiation where the light cells are the most mature type of cells. An alternative idea is that dark, intermediate and light cells correspond to different cellular lineages. Taste cells are short lived and are continuously regenerated. They contain a taste pore at the surface of the epithelium where they extend microvilli, the site where sensory transduction takes place. Taste cells are innervated by fibers of primary gustatory neurons. They contact sensory fibers and these connections resemble chemical synapses, they are excitable with voltage-gated channels: K+, Na+ and Ca+ channels capable of generating action potentials. Although the reaction from different tastants varies, in general tastants interact with receptors or ion channels in the membrane of a taste cells. These interactions depolarize the cell directly or via second messengers and in this way the receptor potential generates action potentials within the taste cells, which lead to Ca2+ influx through Ca2+ voltage-gated channels followed by the release of neurotransmitters at the synapses with the sensory fibers.

Tongue map

The idea that the tongue is most sensitive to certain tastes in different regions was a long time misconception, which has now been proved to be wrong. All sensations come from all regions of the tongue.


An average person has about 5'000 taste buds. A "supertaster" is a person whose sense of taste is significantly more sensitive than average. The increase in the response is thought to be because they have more than 20’000 taste buds, or due to an increased number of fungiform papillae.

Transduction of Taste

As mentioned before we distinguish between 5 types of basic tastes: bitter, salty, sour, sweet and umami. There is one type of taste receptor for each flavor known and each type of taste stimulus is transduced by a different mechanisms. In general bitter, sweet and umami are detected by G protein-coupled receptors and salty and sour are detected via ion channels.

Bitter receptors.
Salty receptors.
Sour receptors.
Sweet receptors.


Bitter compounds act through G protein coupled receptors (GPCR’s) also known as a seven-transmembrane domains, which are located in the walls of the taste cells. Taste receptors of type 2 (T2Rs) which is a group of GPCR’s is thought respond to bitter stimuli. When the bitter-tasting ligand binds to the GPCR it releases the G protein gustducin, its 3 subunits break apart and activate phosphodiesterase, which in turn converts a precursor within the cell into a secondary messenger, closing the K+ channels. This secondary messenger stimulates the release of Ca2+, contributing to depolarization followed by neurotransmitter release. It is possible that bitter substances that are permeable to the membrane are sensed by mechanisms not involving G proteins.


The amiloride-sensitive epithelial sodium channel (ENaC), a type of ion channel in the taste cell wall, allows Na+ ions to enter the cell down an electrochemical gradient, altering the membrane potential of the taste cells by depolarizing the cell. This leads to an opening of voltage-gated Ca2+ channels, followed by neurotransmitter release.


The sour taste signals the presence of acidic compounds (H+ ions) and there are three receptors: 1) The ENaC, (the same protein involved in salty taste). 2) There are also H+ gated channels; one is the K+ channel, which allows K+ outflux of the cell. H+ ions block these so the K+ stays inside the cell. 3) A third channel undergoes a configuration change when a H+ attaches to it leading to an opening of the channel and allowing an influx of Na+ down the concentration gradient into the cell, leading to the opening of a voltage gated Ca2+ channels. These three receptors work in parallel and lead to depolarization of the cell followed by neurotransmitter release.


Sweet transduction is mediated by the binding of a sweet tastant to GPCR’s located in the apical membrane of the taste cell. Saccharide activates the GPCR, which releases gustducin and this in turn activates cAMP (cyclic adenylate monophosphate). cAMP will activate the cAMP kinase that will phosphorylate the K+ channels and eventually inactivate them, leading to depolarization of the cell and followed by neurotransmitter release.

Umami (Savory)

Umami receptors involve also GPCR’s, the same way as bitter and sweet receptors. Glutamate binds a type of the metabotropic glutamate receptor mGlurR4 causing a G-protein complex to activate a secondary receptor, which ultimately leads to neurotransmitter release. In particular how the intermediate steps work, is currently unknown.

Taste transduction of the five main tastes. (Created using

Signal Processing

In humans, the sense of taste is transmitted to the brain via three cranial nerves. The VII facial nerve carries information from the anterior 2/3 part of the tongue and soft palate. The IX nerve or glossopharyngeal nerve carries taste sensations from the posterior 1/3 part of the tongue and the X nerve or vagus nerve carries information from the back of the oral cavity and the epiglottis.

The gustatory cortex is the brain structure responsible for the perception of taste. It consists of the anterior insula on the insular lobe and the frontal operculum on the inferior frontal gyrus of the frontal lobe. Neurons in the gustatory cortex respond to the five main tastes.

Taste cells synapse with primary sensory axons of the mentioned cranial nerves. The central axons of these neurons in the respective cranial nerve ganglia project to rostral and lateral regions of the nucleus of the solitary tract in the medulla. Axons from the rostral (gustatory) part of the solitary nucleus project to the ventral posterior complex of the thalamus, where they terminate in the medial half of the ventral posterior medial nucleus. This nucleus projects to several regions of the neocortex, which include the gustatory cortex.

Gustatory cortex neurons exhibit complex responses to changes in concentration of tastant. For one tastant, the same neuron might increase its firing and for an other tastant, it may only respond to an intermediate concentration.

Taste and Other Senses

In general the Gustatory Systems does not work alone. While eating, consistency and texture are sensed by the mechanoreceptors from the somatosensory system. The sense of taste is also correlated with the olfactory system because if we lack the sense of smell it makes it difficult to distinguish the flavor.

Spicy food

Black pepper.

(black peppers, chili peppers, etc.)

It is not a basic taste because this sensation does not arise from taste buds. Capsaicin is the active ingredient in spicy food and causes “hotness” or “spiciness” when eaten. It stimulates temperature fibers and also nociceptors (pain) in the tongue. In the nociceptors it stimulates the release of substance P, which causes vasodilatation and release of histamine causing hiperalgesia (increased sensitivity to pain).

In general basic tastes can be appetitive or aversive depending on the effect that the food has on us but also essential to the taste experience are the presentation of food, color, texture, smell, previous experiences, expectations, temperature and satiety.

Taste disorders

Ageusia (complete loss of taste)

Ageusia is a partial or complete loss in the sense of taste and sometimes it can be accompanied by the loss of smell.

Dysgeusia (abnormal taste)

Is an alteration in the perception associated with the sense of taste. Tastes of food and drinks vary radically and sometimes the taste is perceived as repulsive. The causes of dysgeusia can be associated with neurologic disorders.

Olfactory System

Technological Aspects
In Animals


Probably the oldest sensory system in nature, the olfactory system concerns the sense of smell. The olfactory system is physiologically strongly related to the gustatory system, so that the two are often examined together. Complex flavors require both taste and smell sensation to be recognized. Consequently, food may taste “different” if the sense of smell does not work properly (e.g. head cold).

Generally the two systems are classified as visceral sense because of their close association with gastrointestinal function. They are also of central importance while speaking of emotional and sexual functions.

Both taste and smell receptors are chemoreceptors that are stimulated by molecules soluted respectively in mucus or saliva. However these two senses are anatomically quite different. While smell receptors are distance receptors that do not have any connection to the thalamus, taste receptors pass up the brainstem to the thalamus and project to the postcentral gyrus along with those for touch and pressure sensibility for the mouth.

In this article we will first focus on the organs composing the olfactory system, then we will characterize them in order to understand their functionality and we will end explaining the transduction of the signal and the commercial application such as the eNose.

Sensory Organs

In vertebrates the main olfactory system detects odorants that are inhaled through the nose where they come to contact with the olfactory epithelium, which contains the olfactory receptors.

Olfactory sensitivity is directly proportional to the area in the nasal cavity near the septum reserved to the olfactory mucous membrane, which is the region where the olfactory receptor cells are located. The extent of this area is a specific between animals species. In dogs, for example, the sense of smell is highly developed and the area covered by this membrane is about 75 – 150 cm2; these animals are called macrosmatic animals. Differently in humans the olfactory mucous membrane cover an area about 3 – 5 cm2, thus they are known as microsmatic animals.

In humans there are about 10 million olfactory cells, each of which have 350 different receptor types composing the olfactory mucous membrane. The 350 different receptors are characteristic for only one odorant type. The bond with one odorant molecule starts a molecular chain reaction, which transforms the chemical perception into an electrical signal.

The electrical signal proceeds through the olfactory nerve’s axons to the olfactory bulbs. In this region there are between 1000 and 2000 glomerular cells which combine and interpret the potentials coming from different receptors. This way it is possible to unequivocally characterise e.g. the coffee aroma, which is composed by about 650 different odorants. Humans can distinguish between about 10.000 odors.

The signal then goes forth to the olfactory cortex where it will be recognized and compared with known odorants (i.e. olfactory memory) involving also an emotional response to the olfactory stimuli.

It is also interesting to note that the human genome has about 600 – 700 genes (~2% of the complete genome) specialized in characterizing the olfactory receptors, but only 350 are still used to build the olfactory system. This is a proof of the evolution change in the necessity of humans in using the olfaction.

Sensory Organ Components

1: Olfactory bulb 2: Mitral cells 3: Bone 4: Nasal Epithelium 5: Glomerulus 6: Olfactory receptor cells
Human skull showing the Cribriform Plate in green and Olfactory nerve in yellow.

Similar to other sensory modalities, olfactory information must be transmitted from peripheral olfactory structures, like the olfactory epithelium, to more central structures, meaning the olfactory bulb and cortex. The specific stimuli have to be integrated, detected and transmitted to the brain in order to reach sensory consciousness. However the olfactory system is different from other sensory systems in three fundamental ways [5]:

  1. Olfactory receptor neurons are continuously replaced by mitotic division of the basal cells of the olfactory epithelium. This is necessary due to the high vulnerability of the neurons, which are directly exposed to the environment.
  2. Due to phylogeny, olfactory sensory activity is transferred directly from the olfactory bulb to the olfactory cortex, without a thalamic relay.
  3. Neural integration and analysis of olfactory stimuli may not involve topographic organization beyond the olfactory bulb, meaning that spatial or frequency axis are not needed to project the signal.

Olfactory Mucous Membrane

The olfactory mucous membrane contains the olfactory receptor cells and in humans it covers an area about 3 – 5 cm^2 in the roof of the nasal cavity near the septum. Because the receptors are continuously regenerated it contains both the supporting cells and progenitors cells of the olfactory receptors. Interspersed between these cells are 10 – 20 millions receptor cells.

Olfactory receptors are neurons with short and thick dendrites. Their extended end is called an olfactory rod, from which cilia project to the surface of the mucus. These neurons have a length of 2 micrometers and have between 10 and 20 cilia of diameter about 0.1 micrometers.

The axons of the olfactory receptor neurons go through the cribriform plate of the ethmoid bone and enter the olfactory bulb. This passage is in absolute the most sensitive of the olfactory system; the damage of the cribriform plate (e.g. breaking the nasal septum) can imply the destruction of the axons compromising the sense of smell.

A further particularity of the mucous membrane is that with a period of a few weeks it is completely renewed.

Olfactory Bulbs

In humans, the olfactory bulb is located anteriorly with respect to the cerebral hemisphere and remain connected to it only by a long olfactory stalk. Furthermore in mammals it is separated into layers and consists of a concentric lamina structure with well-defined neuronal somata and synaptic neuropil.

After passing the cribriform plate the olfactory nerve fibers ramify in the most superficial layer (olfactory nerve layer). When these axons reach the olfactory bulb the layer gets thicker and they terminate in the primary dendrites of the mitral cells and tufted cells. Both these cells send other axons to the olfactory cortex and appear to have the same functionality but in fact tufted cells are smaller and consequently have also smaller axons.

The axons from several thousand receptor neurons converge on one or two glomeruli in a corresponding zone of the olfactory bulb; this suggests that the glomeruli are the unit structures for the olfactory discrimination.

In order to avoid threshold problems in addition to mitral and tufted cells, the olfactory bulb contains also two types of cells with inhibitory properties: periglomerular cells and granule cells. The first will connect two different glomeruli, the second, without using any axons, build a reciprocal synapse with the lateral dendrites of the mitral and tufted cells. By releasing GABA the granule cell on the one side of these synapse are able to inhibits the mitral (or tufted) cells, while on the other side of the synapses the mitral (or tufted) cells are able to excite the granule cells by releasing glutamate. Nowadays about 8.000 glomeruli and 40.000 mitral cells have been counted in young adults. Unfortunately this huge number of cells decrease progressively with the age compromising the structural integrity of the different layers.

Olfactory Cortex

The axons of the mitral and tufted cells pass through the granule layer, the intermediate olfactory stria and the lateral olfactory stria to the olfactory cortex. This tract forms in humans the bulk of the olfactory peduncle. The primary olfactory cortical areas can be easily described by a simple structure composed of three layers: a broad plexiform layer (first layer); a compact pyramidal cell somata layer (second layer) and a deeper layer composed by both pyramidal and nonpyramidal cells (third layer)[5]. Furthermore, in contrast to the olfactory bulb, only a little spatial encoding can be observed; “that is, small areas of the olfactory bulb virtually project the entire olfactory cortex, and small areas of the cortex receive fibers from virtually the entire olfactory bulb” [5].

In general the olfactory tract can be divided in five major regions of the cerebrum: The anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, the anterior cortical nucleus of the amygdala and the entorhinal cortex. Olfactory information is transmitted from primary olfactory cortex to several other parts of the forebrain, including orbital cortex, amygdala, hippocampus, central striatum, hypothalamus and mediodorsal thalamus.

Interesting is also to note that in humans, the piriform cortex can be activated by sniffing, whereas to activate the lateral and the anterior orbitofrontal gyri of the frontal lobe only the smell is required. This is possible because in general the orbitofrontal activation is greater on the right side than on the left side, this directly implies an asymmetry in the cortical representation of olfaction.

Signal Processing

Examples of olfactory thresholds[6].
Substance mg/L of Ari
Ethyl ether 5.83
Chloroform 3.30
Pyridine 0.03
Oil of peppermint 0.02
lodoform 0.02
Butyric acid 0.009
Propyl mercaptan 0.006
Artificial musk 0.00004
Methyl mercaptan 0.0000004

Only substances which come in contact with the olfactory epithelium can excite the olfactory receptors. The right table shows thresholds for some representative substances. These values give an impression of the huge sensitivity of the olfactory receptors.

It is remarkable that humans can recognize more than 10,000 different odors. Many odorant molecules differ only slightly in their chemical structure (e.g. stereoisomers) but can nevertheless be distinguished.

Signal Transduction

An interesting feature of the olfactory system is that a simple sense organ which apparently lacks a high degree of complexity can mediate discrimination of more than 10'000 different odors. On the one hand this is made possible by the huge number of different odorant receptor. The gene family of the olfactory receptor is in fact the largest family studied so far in mammals. On the other hand, the neural net of the olfactory system provides with its 1800 glomeruli a large two dimensional map in the olfactory bulb that is unique to each odorant. In addition, the extracellular field potential in each glomerulus oscillates, and the granule cells appear to regulate the frequency of the oscillation. The exact function of the oscillation is unknown, but it probably also helps to focus the olfactory signal reaching the cortex [5]

Smell measurement

Olfaction consists of a set of transformations from physical space of odorant molecules (olfactory physicochemical space), through a neural space of information processing (olfactory neural space), into a perceptual space of smell (olfactory perceptual space).[7] The rules of these transforms depend on obtaining valid metrics for each of those spaces.

Olfactory perceptual space

As the perceptual space represents the “input” of the smell measurement, its aim is to describe the odors in the most simple possible way. Odors are ordered so that their reciprocal distance in space confers them similarity. This means that the closer two odors are to each other in this space the more are they expected to be similar. This space is thus defined by so called perceptual axes characterized by some arbitrarily chosen “unit” odors.

Olfactory neural space

As suggested by its name the neural space is generated from neural responses. This gives rise to an extensive database of odorant-induced activity, which can be used to formulate an olfactory space where the concept of similarity serves as a guiding principle. Using this procedure different odorants are expected to be similar if they generate a similar neuronal response. This database can be navigated at the Glomerular Activity Response Archive [8].

Olfactory sensory neurons (OSNs) express odorant receptors. The axons of OSNs expressing the same odorant receptors converge onto the same glomerulus at the olfactory bulb, allowing for the organization of olfactory information.

Olfactory physicochemical space

The need to identify the molecular encryption of the biological interaction, makes the physicochemical space the most complex one of the olfactory space described so far. R. Haddad suggest that one possibility is to span this space would to represent each odorant by a very large number of molecular descriptors by use either a variance metric or a distance metric.[7] In his first description single odorants may have many physicochemical features and one expects these features to present themselves at various probabilities within the world of molecules that have a smell. In such metric the orthogonal basis generated from the description of the odorant leads to represent each odorant by a single value. While in the second, the metric represents each odorant with a vector of 1664 values, on the basis of Euclidean distances between odorants in the 1664 physicochemical space. Whereas the first metric enabled the prediction of perceptual attributes, the second enabled the prediction of odorant-induced neuronal response patterns.

Pheromones and Vomeronasal System


Pheromones are a distinct class of species and gender specific chemical cues that provide information about sexual and social status. These airborne chemical signals are released by individuals into the environment. Pheromones can influence the physiology and behavior of other members of the same species, and play a crucial role in various biological processes, including communication, reproduction, territorial marking, and social organization. Indeed, there are alarm pheromones, food trail pheromones and sex pheromones. [9] [10]

For example, in adult male silkmoth (Bombyx mori), the antennae act as a sensing organ both for odors and sex pheromones. When the latter bind, it evokes courtship behaviour. [11].

It is important to note that while pheromones can have significant effects on individuals within a species, they generally do not cross species boundaries. Each species has its own unique set of pheromones that are specific to their reproductive and behavioral needs. [10]

Vomeronasal System

The system allowing the perception of pheromones in mammals is called the vomeronasal system or Jacobson-organ. The sensory neurons found in the vomeronasal organ (VNO) contain cell bodies that house receptors capable of detecting pheromones from the surrounding environment. Although very close and similar, this system is independent from the olfactory system. In fact, it projects to a separate bulb, called the accessory olfactory bulb, and from there to the hypothalamus via the vomeronasal amygdala. It also has a distinct class of genes, the vomeronasal receptors (V1R and V2R), along with Trp receptors. Those genes sustain the production and maintenance of the vomeronasal system receptors and proteins, allowing its functioning as a whole system. [12]

Schematic representation of a sagittal cut through the head of a mouse. Mice olfactory system (left). MOB: main olfactory bulb. MOE: main olfactory epithelium. AON: anterior olfactory nucleus. PC: pyriform cortex. OT: olfactory tract. LA: Lateral part of amygdala. EC: entorhinal cortex. Mice vomeronasal system (right). VNO: vomeronasal organ. AOB: accessory olfactory bulb. VA: vomeronasal amygdala. H: hypothalamus. Adapted from Dulac et al. 2003.

For example, it has been shown that Trp2 knockout mice males do not present the territorial behaviour (urine marking), a normal behavior provoked by pheromones contained in the urine of other males: by inactivating the vomeronasal organ, the mice changed behavior. In addition, this shows that Trp2 channel is required in the vomeronasal organ to detect male specific pheromones and elicit an aggressive, territorial behavior in mice. [13]

In humans it is yet still unclear if the behaviour is affected by pheromones. There is an extensive ongoing debate between experts. There are VR genes in human genome, but they seem to be non-functional. We even have an embryonic structure that resembles the vomeronasal organ and a foetal accessory olfactory bulb (AOB) but it regresses with growth. In addition, primates have responses that can be attributed to pheromone or pheromone-like-hormones. As of today, these issues are still unresolved. [10]


  1. Kathleen Cullen and Soroush Sadeghi (2008). "Vestibular System". Scholarpedia 3(1):3013. {{cite web}}: Text "doi:10.4249/scholarpedia.3013" ignored (help)
  2. JM Goldberg, VJ Wilson, KE Cullen and DE Angelaki (2012). "The Vestibular System: a Sixth Sense"". Oxford University Press, USA.{{cite web}}: CS1 maint: multiple names: authors list (link)
  3. Curthoys IS and Oman CM (1987). "Dimensions of the horizontal semicircular duct, ampulla and utricle in the human". Acta Otolaryngol. 103: 254–261.
  4. Della Santina CC, Potyagaylo V, Migliaccio A, Minor LB, Carey JB (2005). "Orientation of Human Semicircular Canals Measured by Three-Dimensional Multi-planar CT Reconstruction". J Assoc Res Otolaryngol. 6(3): 191–206.{{cite journal}}: CS1 maint: multiple names: authors list (link)
  5. a b c d Paxinos, G., & Mai, J. K. (2004). The human nervous system. Academic Press.
  6. Ganong, W. F., & Barrett, K. E. (2005). Review of medical physiology (Vol. 22). New York: McGraw-Hill Medical.
  7. a b Haddad, R.; Lapid, H.; Harel, D.; Sobel, N. (2008). "Measuring smells". Current Opinion in Neurobiology. 18 (4): 438–444. doi:10.1016/j.conb.2008.09.007.
  8. Glomerular Activity Response Archive
  9. KARLSON, P.; LÜSCHER, M. (1959). "'Pheromones': A New Term for a Class of Biologically Active Substances". Nature. 183 (4653): 55–56. doi:10.1038/183055a0.
  10. a b c Savic, Ivanka (2014). "Pheromone Processing in Relation to Sex and Sexual Orientation.". In CRC Press/Taylor & Francis (ed.). Neurobiology of Chemical Communication. Boca Raton (FL).
  11. Vogt, R. G.; Riddiford, L. M. (1981). "Pheromone binding and inactivation by moth antennae". Nature. 293: 161–163. doi:10.1038/293161a0.
  12. Dulac, C.; Torello, A. T. (2003). "Molecular detection of pheromone signals in mammals: from genes to behaviour". Nature Reviews Neuroscience. 4 (7): 561–562. doi:10.1038/nrn1140.
  13. Leypold, B. G.; Yu, C. R.; Leinders-Zufall, T.; Kim, M. M.; Zufall, F.; Axel, R. (2002). "Altered sexual and social behaviors in trp2 mutant mice". Proceedings of the National Academy of Sciences of the United States of America. 99 (9): 6376–6381. doi:10.1073/pnas.082127599.


This list contains the names of all the authors that have contributed to this text. If you have added, modified or contributed in any way, please add your name to this list.

Name Institution
Thomas Haslwanter Upper Austria University of Applied Sciences / ETH Zurich
Aleksander George Slater Imperial College London / ETH Zurich
Piotr Jozef Sliwa Imperial College London / ETH Zurich
Qian Cheng ETH Zurich
Salomon Wettstein ETH Zurich
Philipp Simmler ETH Zurich
Renate Gander ETH Zurich
Gerick Lee University of Zurich & ETH Zurich
Gabriela Michel ETH Zurich
Peter O'Connor ETH Zurich
Nikhil Biyani ETH Zurich
Mathias Buerki ETH Zurich
Jianwen Sun ETH Zurich
Maurice Göldi University of Zurich
Sofia Jativa ETH Zurich
Salomon Diether ETH Zurich
Arturo Moncada-Torres ETH Zurich
Datta Singh Goolaub ETH Zurich
Stephanie Marquardt University of Zurich & ETH Zurich
Alpha Renner University of Zurich & ETH Zurich
Karlis Kanders University of Zurich & ETH Zurich
Bettina Guebeli ETH Zurich
Yuhuang Hu University of Zurich & ETH Zurich
Sonali Andani ETH Zurich
Isabelle Tan ETH Zurich
Edouard Gence ETH Zurich
Katla Thorvaldsdottir ETH Zurich
Gema Vera Gonzalez ETH Zurich
Monika Evelyn Girr ETH Zurich
Angelina Gurkina ETH Zurich
Laia Serratosa University of Zurich & ETH Zurich
Birte Toussaint University of Zurich & ETH Zurich
Elle Fleur Macartney University of Zurich & ETH Zurich
Cedar Urwyler ETH Zürich
Morio Hamada University of Zurich & ETH Zurich
Jihyun Lee University of Zurich & ETH Zurich
Aeneas Bernardi University of Zurich & ETH Zurich
Sarah Meier ETH Zurich
Maurizio Scandella ETH Zurich
Tamara Gini ETH Zurich
Thomas Denoréaz ETH Zurich
Tatiana Gerth ETH Zurich
Xinyue Yao University of Zurich & ETH Zurich
Viktoria Obermann University of Zurich & ETH Zurich
Elena Bernasconi ETH Zurich
Vanessa Moody University of Pennsylvania
Inês Pereira University of Zurich & ETH Zurich
Pascal Suter ETH Zurich
Lukas Bösiger ETH Zurich
Cyril Schroeder ETH Zurich
Francisco Correia Marques ETH Zurich
Maximillian Fries ETH Zurich
Hrishikesh Ghodki ETH Zurich
Joel Neuner-Jehle ETH Zurich
Marie-Louise Achart ETH Zurich
Roman Krummenacher ETH Zurich
Michelle Mattille ETH Zurich
Paula Wulkop ETH Zurich
En-Yu Jenp ETH Zurich
Philippe Blatter ETH Zurich
Sijamini Baskaralingam ETH Zurich
Luna Bloin-Wibe ETH Zurich
Samuel Ruipérez-Campillo ETH Zurich, U.C. Berkeley & Stanford.
Carla Hetreau ETH Zurich
Dominic Dall'Osto University of Zurich & ETH Zurich
Irene Ruipérez-Campillo UCV Medical School
Igor Martinelli ETH Zurich
Loredana Piazza ETH Zürich
Alain Hügli ETH Zürich
Tobias von Arx ETH Zürich
Anna Schaub ETH Zürich
Shuo Li ETH Zürich
Javier Miragall ETH Zürich
Quillan Favey University of Zurich


Visual System

Auditory System

  • Intraoperative Neurophysiological Monitoring, 2nd Edition, Aage R. Møller, Humana Press 2006, Totowa, New Jersey, pages 55-70
  • The Science and Applications of Acoustics, 2nd Edition, Daniel R. Raichel, Springer Science&Business Media 2006, New York, pages 213-220
  • Physiology of the Auditory System, P. J. Abbas, 1993, in: Cummings Otolaryngology: Head and Neck Surgery, 2nd edition, Mosby Year Book, St. Louis
  • Computer Simulations of Sensory Systems, Lecture Script Ver 1.3 March 2010, T. Haslwanter, Upper Austria University of Applied Sciences, Linz, Austria,

Gustatory System

  • Carleton, Alan; Accolla, Riccardo; Simon, Sidney A. (July 2010). "Coding in the mammalian gustatory system". Trends in Neurosciences. 33 (7): 326–334. doi:10.1016/j.tins.2010.04.002.
  • Dalton, P.; Doolittle, N.; Nagata, H.; Breslin, P.A.S. (1 May 2000). Nature Neuroscience. 3 (5): 431–432. doi:10.1038/74797. {{cite journal}}: Missing or empty |title= (help)
  • Gottfried, J (July 2003). "The Nose Smells What the Eye SeesCrossmodal Visual Facilitation of Human Olfactory Perception". Neuron. 39 (2): 375–386. doi:10.1016/S0896-6273(03)00392-1.
  • Mueller, Ken L.; Hoon, Mark A.; Erlenbach, Isolde; Chandrashekar, Jayaram; Zuker, Charles S.; Ryba, Nicholas J. P. (10 March 2005). "The receptors and coding logic for bitter taste". Nature. 434 (7030): 225–229. doi:10.1038/nature03352.
  • Nitschke, Jack B; Dixon, Gregory E; Sarinopoulos, Issidoros; Short, Sarah J; Cohen, Jonathan D; Smith, Edward E; Kosslyn, Stephen M; Rose, Robert M; Davidson, Richard J (5 February 2006). "Altering expectancy dampens neural response to aversive taste in primary taste cortex". Nature Neuroscience. 9 (3): 435–442. doi:10.1038/nn1645.
  • Okubo, Tadashi; Clark, Cheryl; Hogan, Brigid L.M. (February 2009). "Cell Lineage Mapping of Taste Bud Cells and Keratinocytes in the Mouse Tongue and Soft Palate". Stem Cells. 27 (2): 442–450. doi:10.1634/stemcells.2008-0611.
  • Smith, David V; St John, Steven J (August 1999). "Neural coding of gustatory information". Current Opinion in Neurobiology. 9 (4): 427–435. doi:10.1016/S0959-4388(99)80064-6.
  • Yarmolinsky, David A.; Zuker, Charles S.; Ryba, Nicholas J.P. (October 2009). "Common Sense about Taste: From Mammals to Insects". Cell. 139 (2): 234–244. doi:10.1016/j.cell.2009.10.001.
  • Zhao, Grace Q.; Zhang, Yifeng; Hoon, Mark A.; Chandrashekar, Jayaram; Erlenbach, Isolde; Ryba, Nicholas J.P.; Zuker, Charles S. (October 2003). "The Receptors for Mammalian Sweet and Umami Taste". Cell. 115 (3): 255–266. doi:10.1016/S0092-8674(03)00844-4.
  • Kandel, E., Schwartz, J., and Jessell, T. (2000) Principles of Neural Science. 4th edition. McGraw Hill, New York.



If light passes through a prism, a colour spectrum will be formed at the other end of the prism ranging from red to violet. The wavelength of the red light is from 650nm to 700nm, and the violet light is at around 400nm to 420nm. This is the EM range detectable for the human eye.

Colour spectrum produced by a prism

Colour Models

The colour triangle is often used to illustrate the colour-mixing effect. The triangle entangles the visible spectrum, and a white dot is located in the middle of the triangle. Because of additive colour mixing property of red (700nm), green(546nm) and blue(435nm), every colour can be produced by mixing those three colours.

The RGB color-triangle

History of Sensory Systems

This Wikibook was started by engineers studying at ETH Zurich as part of the course Computational Simulations of Sensory Systems. The course combines physiology with an emphasis on the sensory systems, programming and signal processing. There is a plethora of information regarding these topics on the internet and in the literature, but there's a distinct lack of concise texts and books on the fusion of these 3 topics. The world needs a structured and thorough overview of biology and biological systems from an engineering point of view, which is what this book is trying to correct. We will start off with the Visual System, focusing on the biological and physiological aspects, mainly because this will be used in part to grade our performance in the course. The other part being the programming aspects have already been evaluated and graded. It is the authors' wishes that eventually information on physiology/biology, signal processing AND programming shall be added to each of the sensory systems. Also we hope that more sections will be added to extend the book in ways previously not thought of.

The original title of the Wikibook, Biological Machines, stressed the technical aspects of sensory system. However, as the wikibook evolved it became a comprehensive overview of human sensory systems, with additional emphasis on technical aspects of these systems. This focus is better represented with Sensory Systems, the new wikibook title since December 2011.

In 2015, the content became too big for the original structure. "Neurosensory Implants" and "Computer Models" became separate chapters, and the "Non-Primates" section was split, into "Arthropods" and "Other Animals".