Sensory Systems/Print version

From Wikibooks, open books for an open world

Table of contents


Simulation of Neural Systems

Sensory Systems in Humans

Visual System
Auditory System
Vestibular System
Somatosensory System
Olfactory System
Gustatory System

Sensory Systems in Non-Primates

Sensory Systems in Octopus, Fish, and Flies



The Wikibook of

Sensory Systems

Biological Organisms, an Engineer's Point of View.




In order to survive - at least on the species level - we continually need to make decisions:

  • "Should I cross the road?"
  • "Should I run away from the creature in front of me?"
  • "Should I eat the thing in front of me?"
  • "Or should I try to mate with it?"

To help us make the right decision, and make that decision quickly, we have developed an elaborate system: a sensory system to notice what's going on around us, and a nervous system to handle all that information. And this system is big. VERY big! Our nervous system contains about 10^{11} nerve cells (or neurons), and about 10-50 times as many supporting cells. These supporting cells, called glial cells, include oligodendrocytes, Schwann cells, and astrocytes. But do we really need all these cells?

Keep it simple: Unicellular Creatures

The answer is: "No!", we do not REALLY need that many cells in order to survive. Creatures consisting of a single cell can be large, can respond to multiple stimuli, and can also be remarkably smart!

Xenophyophores are the largest known unicellular organisms, and can get up to 20 cm in diameter!
Paramecium, or "slipper animalcules", respond to light and touch.

We often think of cells as really small things. But Xenophyophores (see image) are unicellular organisms that are found throughout the world's oceans and can get as large as 20 centimetres in diameter.

And even with this single cell, those organisms can respond to a number of stimuli. Take, for example, a creature from the group Paramecium: a group of unicellular ciliate protozoa formerly known as slipper animalcules, after their slipper-like shape. (The corresponding word in German is Pantoffeltierchen.) Despite the fact that these creatures consist of only one cell, they are able to respond to different environmental stimuli, e.g. to light or to touch.

Physarum Polycephalum

And such unicellular organisms can be amazingly smart: the plasmodium of the slime mould Physarum polycephalum is a large amoeba-like cell consisting of a dendritic network of tube-like structures. This single-cell creature manages to connect food sources through the shortest possible connections (Nakagaki et al. 2000), and can even build efficient, robust and optimized network structures that resemble the Tokyo underground system (Tero et al. 2010). In addition, it has somehow developed the ability to read its tracks and tell whether it has been in a place before: this way it can save energy and not forage through locations where effort has already been spent (Reid et al. 2012).

On the one hand, the approach used by the paramecium cannot be too bad, as paramecia have been around for a long time. On the other hand, a single-cell mechanism cannot be as flexible and as accurate in its responses as that of more refined creatures, which use a dedicated, specialized system just for registering the environment: a Sensory System.

Not so simple: Three-hundred-and-two Neurons

While humans have hundreds of millions of sensory nerve cells, and about 10^{11} nerve cells, other creatures get away with significantly less. A famous one is Caenorhabditis elegans, a nematode with a total of 302 neurons.

Crawling C. elegans, a hermaphrodite worm with exactly 302 neurons.

C. elegans is one of the simplest organisms with a nervous system, and it was the first multicellular organism to have its genome completely sequenced. (The sequence was published in 1998.) And not only do we know its complete genome, we also know the connectivity between all 302 of its neurons. In fact, the developmental fate of every single somatic cell (959 in the adult hermaphrodite; 1031 in the adult male) has been mapped out. We know, for example, that only 2 of the 302 neurons are responsible for chemotaxis (“movement guided by chemical cues”, i.e. essentially smelling). Nevertheless, a lot of research is still being conducted, also on its sense of smell, in order to understand how its nervous system works!

General principles of Sensory Systems

Based on the example of the visual system, the general principle underlying our neuro-sensory system can be described as follows:


All sensory systems are based on

  • a Signal, i.e. a physical stimulus that provides information about our surroundings.
  • the Collection of this signal, e.g. by using an ear or the lens of an eye.
  • the Transduction of this stimulus into a nerve signal.
  • the Processing of this information by our nervous system.
  • And the generation of a resulting Action.

While the underlying physiology restricts the maximum frequency of our nerve-cells to about 1 kHz, more than one-million times slower than modern computers, our nervous system still manages to perform stunningly difficult tasks with apparent ease. The trick: there are lots of nerve cells (about 10^{11}), and they are massively connected (one nerve cell can have up to 150'000 connections with other nerve cells).


The role of our "senses" is to transduce relevant information from the world surrounding us into a type of signal that is understood by the next cells receiving that signal: the "Nervous System". (The sensory system is often regarded as part of the nervous system. Here I will try to keep these two apart, with the expression Sensory System referring to the stimulus transduction, and the Nervous System referring to the subsequent signal processing.) Note here that only relevant information is to be transduced by the sensory system! The task of our senses is NOT to show us everything that is happening around us. Instead, their task is to filter out the important bits of the signals around us: electromagnetic signals, chemical signals, and mechanical ones. Our Sensory Systems transduce those environmental variables that are (probably) important to us. And the Nervous System propagates them in such a way that the responses that we take help us to survive, and to pass on our genes.

Types of sensory transducers

  1. Mechanical receptors
    • Balance system (vestibular system)
    • Hearing (auditory system)
    • Pressure:
      • Fast adaptation (Meissner’s corpuscle, Pacinian corpuscle) → movement
      • Slow adaptation (Merkel disks, Ruffini endings) → shape (Comment: these signals are transferred fast)
    • Muscle spindles
    • Golgi organs: in the tendons
    • Joint-receptors
  2. Chemical receptors
    • Smell (olfactory system)
    • Taste
  3. Light-receptors (visual system): here we have light-dark receptors (rods), and three different color receptors (cones)
  4. Thermo-receptors
    • Heat-sensors (maximum sensitivity at ~ 45°C, signal temperatures < 50°C)
    • Cold-sensors (maximum sensitivity at ~ 25°C, signal temperatures > 5°C)
    • Comment: The information processing of these signals is similar to those of visual color signals, and is based on differential activity of the two sensors; these signals are slow
  5. Electro-receptors: for example in the bill of the platypus
  6. Magneto-receptors
  7. Pain receptors (nociceptors): pain receptors are also responsible for itching; these signals are passed on slowly.


Now what distinguishes neurons from other cells in the human body, like liver cells or fat cells? Neurons are unique, in that they:

  • They can switch quickly between two states (which muscle cells can also do).
  • They can propagate this change in a specified direction and over longer distances (which muscle cells cannot).
  • And this state-change can be signalled effectively to other connected neurons.

While there are more than 50 distinctly different types of neurons, they all share the same structure:

a) Dendrites, b) Soma, c) Nucleus, d) Axon hillock, e) Sheathed Axon, f) Myelin Cell, g) Node of Ranvier, h) Synapse
  • An input stage, often called dendrites, as the input-area often spreads out like the branches of a tree. Input can come from sensory cells or from other neurons; it can come from a single cell (e.g. a bipolar cell in the retina receives input from a single cone), or from up to 150’000 other neurons (e.g. Purkinje cells in the Cerebellum); and it can be positive (excitatory) or negative (inhibitory).
  • An integrative stage: the cell body does the household chores (generating the energy, cleaning up, generating the required chemical substances, etc), combines the incoming signals, and determines when to pass a signal on down the line.
  • A conductile stage, the axon: once the cell body has decided to send out a signal, an action potential propagates along the axon, away from the cell body. An action potential is a quick change in the state of a neuron, which lasts for about 1 msec. Note that this defines a clear direction in the signal propagation, from the cell body, to the:
  • An output stage: the output is provided by synapses, i.e. the points where a neuron contacts the next neuron down the line, most often by the emission of neurotransmitters (i.e. chemicals that affect other neurons) which then provide an input to the next neuron.

Principles of Information Processing in the Nervous System

Parallel processing

An important principle in the processing of neural signals is parallelism. Signals from different locations have different meaning. This feature, sometimes also referred to as line labeling, is used by the

  • Auditory system - to signal frequency
  • Olfactory system - to signal different smells
  • Visual system - to signal the location of a visual signal
  • Vestibular system - to signal different orientations and movements

Population Coding

Sensory information is rarely based on the signal of a single nerve. It is typically coded by different patterns of activity in a population of neurons. This principle can be found in all our sensory systems.
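As a toy illustration of this principle (not taken from the book), the direction of a stimulus can be decoded from a hypothetical population of neurons with cosine tuning curves, using the classic population-vector read-out. All tuning parameters below are invented for illustration:

```python
import numpy as np

# A toy population: N neurons, each with a preferred stimulus direction.
N = 8
preferred = np.linspace(0, 2*np.pi, N, endpoint=False)

def responses(stimulus_angle):
    """Half-rectified cosine tuning: each neuron fires most strongly
    when the stimulus matches its preferred direction."""
    return np.maximum(0, np.cos(stimulus_angle - preferred))

def population_vector(rates):
    """Decode the stimulus as the rate-weighted average of the
    preferred directions (the 'population vector')."""
    x = np.sum(rates * np.cos(preferred))
    y = np.sum(rates * np.sin(preferred))
    return np.arctan2(y, x) % (2*np.pi)

stim = 1.0                                    # stimulus direction, in radians
decoded = population_vector(responses(stim))  # close to the true direction
```

No single neuron's activity identifies the stimulus here; only the pattern of activity across the whole population does.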


The structure of the connections between nerve cells is not static. Instead it can be modified, to incorporate experiences that we have had. Thereby nature walks a thin line:

Eskimo Curlew

- If we learn too slowly, we might not make it. One example is the "Eskimo curlew", an American bird which may be extinct by now. In the last century (and the one before), this bird was shot in large numbers. The mistake of the bird was: when some of them were shot, the others turned around, maybe to see what's up. So they were shot in turn - until the birds were essentially gone. The lesson: if you learn too slowly (i.e. to run away when all your mates are killed), your species might not make it.

Female Monarch butterfly

- On the other hand, we must not learn too fast, either. For example, the monarch butterfly migrates. But it takes the butterflies so long to get from "start" to "finish" that the migration cannot be done by one butterfly alone. In other words, no single butterfly makes the whole journey. Nevertheless, the genetic disposition still tells the butterflies where to go, and when they have arrived. If they learned any faster, they could never store the necessary information in their genes. In contrast to other cells, nerve cells in the human body are not regenerated.

Simulation of Neural Systems

Simulating Action Potentials

Action Potential

The "action potential" is the stereotypical voltage change that is used to propagate signals in the nervous system.

Action potential – Time dependence

With the mechanisms described below, an incoming stimulus (of any sort) can lead to a change in the voltage potential of a nerve cell. Up to a certain threshold, that's all there is to it ("Failed initiations" in Fig. 4). But when the threshold of the voltage-gated ion channels is reached, a feedback reaction sets in that almost immediately opens the Na+ ion channels completely ("Depolarization" below): the permeability for Na+ (which in the resting state is about 1% of the permeability of K+) becomes about 20 times larger than that of K+. As a result, the voltage rises from about -60 mV to about +50 mV. At that point internal reactions start to close (and block) the Na+ channels, and open the K+ channels to restore the equilibrium state. During this "refractory period" of about 1 ms, no depolarization can elicit an action potential. Only when the resting state is reached can new action potentials be triggered.

To simulate an action potential, we first have to define the different elements of the cell membrane, and how to describe them analytically.

Cell Membrane

The cell membrane is a water-repelling, almost impermeable lipid double-layer. The real power in processing signals does not come from the membrane itself, but from the ion channels that are embedded in it. Ion channels are proteins embedded in the cell membrane which can selectively be opened for certain types of ions. (This selectivity is achieved by the geometrical arrangement of the amino acids which make up the ion channels.) In addition to the Na+ and K+ ions mentioned above, ions that are typically found in the nervous system are the cations Ca2+ and Mg2+, and the anion Cl-.

States of ion channels

Ion channels can take on one of three states:

  • Open (For example, an open Na-channel lets Na+ ions pass, but blocks all other types of ions).
  • Closed, with the option to open up.
  • Closed, unconditionally.

Resting state

The typical default situation – when nothing is happening – is characterized by K+ channels that are open, while the other channels are closed. In that case two forces determine the cell voltage:

  • The (chemical) concentration difference between the intra-cellular and extra-cellular concentration of K+, which is created by the continuous activity of ion pumps in the membrane.
  • The (electrical) voltage difference between the inside and outside of the cell.

The equilibrium is defined by the Nernst-equation:

E_X = \frac{RT}{zF}\ln \frac{[X]_o}{[X]_i}

where R is the gas constant, T the temperature, z the ion valence, F the Faraday constant, and [X]_{o/i} the ion concentration outside/inside the cell. At 25°C, RT/F is about 25 mV, which leads to

E_X = \frac{58\,\mathrm{mV}}{z}\log \frac{[X]_o}{[X]_i}

With typical K+ concentration inside and outside of neurons, this yields E_{K+} = -75 mV. If the ion channels for K+, Na+ and Cl- are considered simultaneously, the equilibrium situation is characterized by the Goldman-equation

V_m = \frac{RT}{F}\ln \frac{P_K [K^+]_o + P_{Na}[Na^+]_o + P_{Cl}[Cl^-]_i}{P_K [K^+]_i + P_{Na}[Na^+]_i + P_{Cl}[Cl^-]_o}

where P_i denotes the permeability of ion "i", and [i] its concentration. Using typical ion concentrations, the cell in its resting state has a negative polarity of about -60 mV.
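As a numerical sanity check, both equations can be evaluated directly. The concentrations and relative permeabilities below are classic squid-axon textbook values, assumed here purely for illustration:

```python
import numpy as np

def nernst(c_out, c_in, z=1, RT_over_F=25.3):
    """Nernst equilibrium potential in mV for one ion species."""
    return (RT_over_F / z) * np.log(c_out / c_in)

def goldman(P, K, Na, Cl, RT_over_F=25.3):
    """Goldman voltage in mV.
    P: relative permeabilities; K, Na, Cl: (outside, inside) concentrations in mM."""
    num = P['K']*K[0] + P['Na']*Na[0] + P['Cl']*Cl[1]   # note: Cl inside on top
    den = P['K']*K[1] + P['Na']*Na[1] + P['Cl']*Cl[0]
    return RT_over_F * np.log(num / den)

# Squid-axon-like concentrations in mM, as (outside, inside):
K, Na, Cl = (20, 400), (440, 50), (560, 52)
E_K = nernst(*K)                        # roughly -75 mV
P_rest = {'K': 1.0, 'Na': 0.04, 'Cl': 0.45}  # resting-state permeability ratios
V_rest = goldman(P_rest, K, Na, Cl)     # roughly -60 mV
```

With these numbers the Nernst potential for K+ comes out near -75 mV, and the Goldman voltage near -60 mV, matching the values quoted in the text.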

Activation of Ion Channels

The nifty feature of the ion channels is the fact that their permeability can be changed by

  • A mechanical stimulus (mechanically activated ion channels)
  • A chemical stimulus (ligand activated ion channels)
  • Or by an external voltage (voltage gated ion channels)
  • Occasionally ion channels directly connect two cells, in which case they are called gap junction channels.


  • Sensory systems are essentially based on ion channels, which are activated by a mechanical stimulus (pressure, sound, movement), a chemical stimulus (taste, smell), or an electromagnetic stimulus (light), and produce a "neural signal", i.e. a voltage change in a nerve cell.
  • Action potentials use voltage gated ion channels, to change the "state" of the neuron quickly and reliably.
  • The communication between nerve cells predominantly uses ion channels that are activated by neurotransmitters, i.e. chemicals emitted at a synapse by the preceding neuron. This provides the maximum flexibility in the processing of neural signals.

Modeling a voltage dependent ion channel

Ohm's law relates the resistance of a resistor, R, to the current it passes, I, and the voltage drop across the resistor, V:

I = \frac{V}{R} = g\,V
where g=1/R is the conductance of the resistor. If you now suppose that the conductance is directly proportional to the probability that the channel is in the open conformation, then this equation becomes

I = g_{\max}\, n\, V

where g_max is the maximum conductance of the channel, and n is the probability that the channel is in the open conformation.

Example: the K-channel

Voltage gated potassium channels (Kv) can be only open or closed. Let α be the rate at which the channel goes from closed to open, and β the rate at which it goes from open to closed:

(K_v)_{closed} \underset{\beta}{\overset{\alpha}{\rightleftharpoons}} (K_v)_{open}

Since n is the probability that the channel is open, the probability that the channel is closed has to be (1-n), since all channels are either open or closed. Changes in the conformation of the channel can therefore be described by the formula

\frac{dn}{dt}=(1-n)\alpha -n\beta =\alpha -(\alpha +\beta )n

Note that α and β are voltage dependent! With a technique called "voltage clamping", Hodgkin and Huxley determined these rates in 1952, and they came up with something like

\alpha(V) = \frac{0.01\,(V+10)}{\exp\left(\frac{V+10}{10}\right)-1}

\beta(V) = 0.125\,\exp\left(\frac{V}{80}\right)

If you only want to model a voltage-dependent potassium channel, these would be the equations to start from. (For voltage gated Na channels, the equations are a bit more difficult, since those channels have three possible conformations: open, closed, and inactive.)
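A minimal sketch of how these rate equations are used in practice: clamp the voltage, and integrate dn/dt with a forward-Euler scheme until n settles at its steady state α/(α+β). The clamp value, step size, and duration below are arbitrary choices for illustration:

```python
import numpy as np

def alpha_n(V):
    """Opening rate of the K channel (Hodgkin-Huxley 1952 convention)."""
    return 0.01 * (V + 10) / (np.exp((V + 10) / 10) - 1)

def beta_n(V):
    """Closing rate of the K channel."""
    return 0.125 * np.exp(V / 80)

def simulate_n(V, dt=0.01, t_end=10.0):
    """Integrate dn/dt = alpha*(1-n) - beta*n at a clamped voltage V (Euler)."""
    n = alpha_n(0) / (alpha_n(0) + beta_n(0))   # start from the resting steady state
    for _ in range(int(t_end / dt)):
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
    return n

# At any clamped voltage, n relaxes toward the steady state alpha/(alpha+beta):
V_clamp = -50.0   # hypothetical clamp value, in the 1952 voltage convention
n_inf = alpha_n(V_clamp) / (alpha_n(V_clamp) + beta_n(V_clamp))
n_end = simulate_n(V_clamp)   # close to n_inf after a few time constants
```

The same scheme, applied simultaneously to n, m, and h, is the core of the full Hodgkin-Huxley simulation below.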

Hodgkin Huxley equation

The feedback-loop of voltage-gated ion channels mentioned above made it difficult to determine their exact behaviour. In a first approximation, the shape of the action potential can be explained by analyzing the electrical circuit of a single axonal compartment of a neuron, consisting of the following components: 1) membrane capacitance, 2) Na channel, 3) K channel, 4) leakage current:

Circuit diagram of neuronal membrane based on Hodgkin and Huxley model.

The final equations in the original Hodgkin-Huxley model, where the currents of chloride ions and other leakage currents were combined into a single leak term, were as follows:

I_m = C_m\frac{dV}{dt} + \bar{g}_K n^4 (V-V_K) + \bar{g}_{Na} m^3 h\,(V-V_{Na}) + \bar{g}_L (V-V_L)
Spiking behavior of a Hodgkin-Huxley model.

where m, h, and n are time- and voltage-dependent functions which describe the membrane permeability. For example, for the K channels n obeys the equations described above, which were determined experimentally with voltage clamping. These equations describe the shape and propagation of the action potential with high accuracy! The model can be solved easily with open-source tools, e.g. the Python Dynamical Systems Toolbox PyDSTool. A simple solution file is available under [1], and the output is shown below.
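As an alternative to a dedicated ODE toolbox, the model can also be integrated directly with a few lines of NumPy. The sketch below uses the common modern-convention parameter set (absolute membrane voltage in mV, rest near -65 mV), which differs from the 1952 voltage convention of the rate equations above; all numbers are standard textbook values, and forward Euler is the simplest (not the most accurate) integration scheme:

```python
import numpy as np

# Modern-convention HH parameters (V in mV, t in ms):
C_m, g_Na, g_K, g_L = 1.0, 120.0, 36.0, 0.3   # uF/cm^2 and mS/cm^2
E_Na, E_K, E_L = 50.0, -77.0, -54.4           # reversal potentials in mV

def a_n(V): return 0.01*(V+55)/(1 - np.exp(-(V+55)/10))
def b_n(V): return 0.125*np.exp(-(V+65)/80)
def a_m(V): return 0.1*(V+40)/(1 - np.exp(-(V+40)/10))
def b_m(V): return 4.0*np.exp(-(V+65)/18)
def a_h(V): return 0.07*np.exp(-(V+65)/20)
def b_h(V): return 1.0/(1 + np.exp(-(V+35)/10))

def run_hh(I_ext=10.0, t_end=50.0, dt=0.01):
    """Forward-Euler integration of the Hodgkin-Huxley equations."""
    steps = int(t_end/dt)
    V = -65.0
    m, h, n = 0.05, 0.6, 0.32       # near the resting steady-state values
    trace = np.empty(steps)
    for i in range(steps):
        I_ion = g_Na*m**3*h*(V-E_Na) + g_K*n**4*(V-E_K) + g_L*(V-E_L)
        V += dt*(I_ext - I_ion)/C_m
        m += dt*(a_m(V)*(1-m) - b_m(V)*m)
        h += dt*(a_h(V)*(1-h) - b_h(V)*h)
        n += dt*(a_n(V)*(1-n) - b_n(V)*n)
        trace[i] = V
    return trace

V_trace = run_hh()   # with I_ext = 10 uA/cm^2 the model fires repetitively
```

Plotting V_trace reproduces the repetitive spiking shown in the figure: each spike overshoots 0 mV and is followed by a brief after-hyperpolarization below rest.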

Links to full Hodgkin-Huxley model

Modeling the Action Potential Generation: The Fitzhugh-Nagumo model

Phaseplane plot of the Fitzhugh-Nagumo model, with (a=0.7, b=0.8, c=3.0, I=-0.4). Solutions for four different starting conditions are shown. The dashed lines indicate the nullclines, and the "o" the fixed point of the model. I=-0.2 would be a stimulation below threshold, leading to a stationary state. And I=-1.6 would hyperpolarize the neuron, also leading to a - different - stationary state.

The Hodgkin-Huxley model has four dynamical variables: the voltage V, the probability that the K channel is open, n(V), the probability that the Na channel is open given that it was closed previously, m(V), and the probability that the Na channel is open given that it was inactive previously, h(V). A simplified model of action potential generation in neurons is the Fitzhugh-Nagumo (FN) model. Unlike the Hodgkin-Huxley model, the FN model has only two dynamic variables, by combining the variables V and m into a single variable v, and combining the variables n and h into a single variable r

\frac{dv}{dt} = c\left(v - \frac{1}{3}v^3 + r + I\right)

\frac{dr}{dt} = -\frac{1}{c}(v - a + b\,r)

where I is an external current injected into the neuron. Since the FN model has only two dynamic variables, its full dynamics can be explored using phase plane methods (sample solution in Python here [2]).
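A minimal numerical sketch of the FN model, using forward Euler and the parameter values from the phase-plane figure (a=0.7, b=0.8, c=3.0, I=-0.4); the step size and initial condition are arbitrary choices:

```python
import numpy as np

def fitzhugh_nagumo(a=0.7, b=0.8, c=3.0, I=-0.4, v0=1.0, r0=0.0,
                    dt=0.01, t_end=100.0):
    """Forward-Euler integration of the FitzHugh-Nagumo equations."""
    steps = int(t_end/dt)
    v, r = v0, r0
    vs = np.empty(steps)
    for i in range(steps):
        dv = c*(v - v**3/3 + r + I)     # fast "voltage-like" variable
        dr = -(v - a + b*r)/c           # slow "recovery" variable
        v, r = v + dt*dv, r + dt*dr
        vs[i] = v
    return vs

vs = fitzhugh_nagumo()
v_late = vs[len(vs)//2:]   # discard the initial transient
```

For I = -0.4 the fixed point is unstable and v settles into a limit cycle (sustained spiking), while for I = -0.2 the same code relaxes to a stationary state, as described in the figure caption.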

Simulating a Single Neuron with Positive Feedback

The following two examples are taken from [3]. This book provides a fantastic introduction to modeling simple neural systems, and gives a good understanding of the underlying information processing.

Simple neural system with feedback.

Let us first look at the response of a single neuron, with an input x(t), and with feedback onto itself. The weight of the input is v, and the weight of the feedback is w. The response y(t) of the neuron is given by

y(t) = w\,y(t-1) + v\,x(t-1)
This shows how even very simple simulations can capture the signal-processing properties of real neurons.

System output for an input pulse: a “leaky integrator”
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
def oneUnitWithPosFB():
    '''Simulates a single model neuron with positive feedback '''
    # set input flag (1 for impulse, 2 for step)
    inFlag = 1
    cut = -np.inf   # set cut-off
    sat = np.inf    # set saturation
    tEnd = 100      # set last time step
    nTs = tEnd+1    # find the number of time steps
    v = 1           # set the input weight
    w = 0.95        # set the feedback weight
    x = np.zeros(nTs)   # open (define) an input hold vector 
    start = 11          # set a start time for the input     
    if inFlag == 1:     # if the input should be a pulse 
        x[start] = 1    # then set the input at only one time point
    elif inFlag == 2:   # if the input instead should be a step, then
        x[start:nTs] = np.ones(nTs-start) #keep it up until the end 
    y = np.zeros(nTs)   # open (define) an output hold vector 
    for t in range(2, nTs): # at every time step (skipping the first) 
        y[t] = w*y[t-1] + v*x[t-1]  # compute the output 
        y[t] = np.max([cut, y[t]])  # impose the cut-off constraint
        y[t] = np.min([sat, y[t]])  # impose the saturation constraint
    # plot results (no frills)
    tBase = np.arange(tEnd+1)
    plt.plot(tBase, x)
    plt.axis([0, tEnd, 0, 1.1])
    plt.xlabel('Time Step')
    plt.plot(tBase, y)
    plt.xlabel('Time Step')
if __name__ == '__main__':
    oneUnitWithPosFB()
    plt.show()
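Independently of the plotting, the simulated impulse response can be checked against its closed form: for a unit impulse at time step start, the recursion y(t) = w y(t-1) + v x(t-1) yields the geometric decay y(start+1+k) = v w^k. A minimal standalone check:

```python
import numpy as np

# Same parameters as in oneUnitWithPosFB above:
v, w, start, nTs = 1.0, 0.95, 11, 101
x = np.zeros(nTs)
x[start] = 1                    # impulse input
y = np.zeros(nTs)
for t in range(1, nTs):
    y[t] = w*y[t-1] + v*x[t-1]  # leaky-integrator recursion

# The impulse response is the geometric sequence y[start+1+k] = v * w**k:
k = np.arange(nTs - start - 1)
expected = v * w**k
```

This is why the unit is called a "leaky integrator": the feedback weight w < 1 makes the stored input leak away exponentially, with time constant -1/ln(w) time steps.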

Simulating a Simple Neural System

Even very simple neural systems can display a surprisingly versatile set of behaviors. An example is Wilson's model of the locust-flight central pattern generator. Here the system is described by

\vec{y}(t)=\mathbf{W}\cdot \vec{y}(t-1)+\vec{v}\,x(t-1)

where W is the connection matrix describing the recurrent connections between the neurons, and \vec{v} describes the input weights to the system.

Input x connects to units yi (i=1,2,3,4) with weights vi , and units y_l (l = 1,2,3,4) connect to units y_k (k = 1,2,3,4) with weights w_kl . For clarity, the self-connections of y2 and y3 are not shown, and the individual forward and recurrent weights are not labeled. Based on Tom Anastasio's excellent book "Tutorial on Neural Systems Modeling".
The response of units representing motoneurons in the linear version of Wilson’s model of the locust-flight central pattern generator (CPG): a simple input pulse elicits a sustained antagonistic oscillation in neurons 2 and 3.
import numpy as np
import matplotlib.pyplot as plt
def printInfo(text, value):
    print(text)
    print(np.round(value, 2))
def WilsonCPG():
    '''implements a linear version of Wilson's 
    locust flight central pattern generator (CPG) '''
    v1 = v3 = v4 = 0.                   # set input weights
    v2 = 1.
    w11=0.9; w12=0.2; w13 = w14 = 0.    # feedback weights to unit one
    w21=-0.95; w22=0.4; w23=-0.5; w24=0 # ... to unit two
    w31=0; w32=-0.5; w33=0.4; w34=-0.95 # ... to unit three
    w41 = w42 = 0.; w43=0.2; w44=0.9    # ... to unit four
    V=np.array([v1, v2, v3, v4])        # compose input weight matrix (vector)
    W=np.array([[w11, w12, w13, w14],
              [w21, w22, w23, w24],
              [w31, w32, w33, w34],
              [w41, w42, w43, w44]])    # compose feedback weight matrix
    tEnd = 100              # set end time
    tVec = np.arange(tEnd)  # set time vector
    nTs = tEnd              # find number of time steps
    x = np.zeros(nTs)       # zero input vector
    fly = 11                # set time to start flying
    x[fly] = 1              # set input to one at fly time
    y = np.zeros((4,nTs))   # zero output vector
    for t in range(1,nTs):  # for each time step
        y[:,t] =[:,t-1]) + V*x[t-1]; # compute output
    # These calculations are interesting, but not absolutely necessary
    (eVal,eVec) = np.linalg.eig(W); # find eigenvalues and eigenvectors    
    magEVal = np.abs(eVal)          # find magnitude of eigenvalues
    angEVal = np.angle(eVal)*(180/np.pi) # find angles of eigenvalues
    printInfo('Eigenvectors: --------------', eVec)
    printInfo('Eigenvalues: ---------------', eVal)
    printInfo('Angle of Eigenvalues: ------', angEVal)    
    # plot results (units y2 and y3 only)
    plt.rcParams['font.size'] = 14      # set the default fontsize
    plt.plot(tVec, x, 'k-.', tVec, y[1,:],'k', tVec,y[2,:],'k--', linewidth=2.5)
    plt.axis([0, tEnd, -0.6, 1.1])
    plt.xlabel('Time Step',fontsize=14)
    plt.ylabel('Input and Unit Responses',fontsize=14)
    plt.legend(('Input','Left Motoneuron','Right Motoneuron'))
if __name__ == '__main__':
    WilsonCPG()
    plt.show()

The Development and Theory of Neuromorphic Circuits


Neuromorphic engineering uses very-large-scale integration (VLSI) systems to build analog and digital circuits, emulating neuro-biological architecture and behavior. Most modern circuitry primarily utilizes digital circuit components because they are fast, precise, and insensitive to noise. Unlike more biologically relevant analog circuits, digital circuits require higher power supplies and are less amenable to parallel computing. Biological neuron behaviors, such as membrane leakage and threshold constraints, are functions of material substrate parameters, and require analog systems to model and fine-tune beyond digital 0/1. This section briefly summarizes such neuromorphic circuits, and the theory behind their analog circuit components.

Current Events in Neuromorphic Engineering

Recently, the field of neuromorphic engineering has experienced a period of rapid growth, receiving widespread attention from the press and the scientific community. In 2013, after drawing the attention of the EU commission, the Human Brain Project was initiated, funded with 1.2 billion euros over ten years. This project proposes computationally simulating the human brain from the level of molecules and neurons up through neuronal circuits. Shortly after this announcement, the U.S. National Institutes of Health announced the funding of the US\$100 million BRAIN Project, which aims to reconstruct the activity of large populations of neurons. Corporate labs at Hewlett-Packard and IBM are also investing in various neuromorphic projects.

Transistor Structure & Physics

Metal-oxide-semiconductor field-effect transistors (MOSFETs) are common components of modern integrated circuits. MOSFETs are classified as unipolar devices because each transistor utilizes only one carrier type; negative-type MOSFETs (nFETs) have electrons as carriers and positive-type MOSFETs (pFETs) have holes as carriers.

Cross section of an n-type MOSFET. Transistor showing gate (G), body (B), source (S), and drain (D). Positive current flows from the n+ drain well to the n+ source well. Source: Wikipedia

The general MOSFET has a metal gate (G), and two pn junction diodes known as the source (S) and the drain (D), as shown in the figure. An insulating oxide layer separates the gate from the silicon bulk (B). The channel that carries the charge runs directly below this oxide layer. The current is a function of the gate dimensions.

The source and the drain are symmetric and differ only in the biases applied to them. In an nFET device, the wells that form the source and drain are n-type and sit in a p-type substrate. The substrate is biased through the bulk p-type well contact. The positive current flows below the gate in the channel from the drain to the source. The source is so called because it is the source of the electrons. Conversely, in a pFET device, the p-type source and drain are in a bulk n-well that is in a p-type substrate; current flows from the source to the drain.

When the carriers move due to a concentration gradient, this is called diffusion. If the carriers are swept due to an electric field, this is called drift. By convention, the nFET drain is biased at a higher potential than the source, whereas the source is biased higher in a pFET.

In an nFET, when a positive voltage is applied to the gate, positive charge accumulates on the metal contact. This draws electrons from the bulk to the silicon-oxide interface, creating a negatively charged channel between the source and the drain. The larger the gate voltage, the thicker the channel becomes, which reduces the internal resistance and thus increases the current. For small gate voltages, typically below the threshold voltage V_{th} = 0.7V, the channel is not yet fully conducting, and the current from the drain to the source increases exponentially, i.e. linearly on a logarithmic scale. This regime, V_{gs} < V_{th}, is called the subthreshold region. Beyond this threshold voltage, V_{gs} > V_{th}, the channel is fully conducting between the source and drain, and the current is in the superthreshold regime.

Transistor current as a function of V_{g} for a fixed value value of V_{ds}.

For current to flow from the drain to the source, there must be an electric field to sweep the carriers across the channel. The strength of this electric field is a function of the applied potential difference between the source and the drain (V_{ds}), and thus controls the drain-source current. For small values of V_{ds}, the current increases linearly as a function of V_{ds} for constant values of V_{gs}. As V_{ds} increases beyond about 100 mV, the current saturates.
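These two dependencies can be summarized in a first-order subthreshold transistor model, I_ds ≈ I_0 exp(κ V_gs / U_T)(1 - exp(-V_ds / U_T)). The numeric values of I_0 and κ below are assumed, device-dependent illustrations, not measured parameters:

```python
import numpy as np

U_T = 0.0258     # thermal voltage at room temperature, in volts
kappa = 0.7      # gate coupling coefficient -- an assumed, device-dependent value
I_0 = 1e-15      # pre-exponential leakage current in amps (hypothetical)

def i_subthreshold(V_gs, V_ds):
    """First-order subthreshold nFET drain current."""
    return I_0 * np.exp(kappa*V_gs/U_T) * (1 - np.exp(-V_ds/U_T))

# The current grows exponentially with the gate voltage ...
ratio = i_subthreshold(0.4, 0.2) / i_subthreshold(0.3, 0.2)
# ... and saturates once V_ds exceeds a few U_T (around 100 mV):
sat = i_subthreshold(0.4, 0.2) / i_subthreshold(0.4, 1.0)
```

Here a 100 mV increase in gate voltage multiplies the current by exp(0.7·0.1/0.0258), roughly a factor of 15, while increasing V_ds beyond about 100 mV changes the current by well under one percent.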

pFETs behave similarly to nFETs, except that the carriers are holes and the contact biases are negated.

In digital applications, transistors either operate in their saturation region (on) or are off. This large range in potential differences between the on and off modes is why digital circuits have such a high power demand. Analog circuits, in contrast, take advantage of the linear region of transistors to produce continuous signals with a lower power demand. However, because small changes in gate or source-drain voltages can create a large change in current, analog systems are prone to noise.

The field of neuromorphic engineering takes advantage of the noisy nature of analog circuits to replicate stochastic neuronal behavior [4], [5]. Unlike clocked digital circuits, analog circuits are capable of creating action potentials with temporal dynamics similar to biological time scales (approx. 10 \mu sec). The potentials are slowed down and firing rates are controlled by lengthening time constants through leaking biases and variable resistive transistors. Analog circuits have been created that are capable of emulating biological action potentials with varying temporal dynamics, thus allowing silicon circuits to mimic neuronal spike-based learning behavior [6]. Whereas digital circuits can only contain binary synaptic weights [0,1], analog circuits are capable of maintaining synaptic weights within a continuous range of values, making analog circuits particularly advantageous for neuromorphic circuits.

Basic static circuits

With an understanding of how transistors work and how they are biased, basic static analog circuits can now be analyzed. Afterward, these basic static circuits will be combined to create neuromorphic circuits. In the following circuit examples, the source, drain, and gate voltages are fixed, and the current is the output. In practice, the bias gate voltage is fixed to a subthreshold value (0 < V_g < 0.7V), the drain is held in saturation (V_d > 100mV), and the source and bulk are tied to ground (V_s, V_b = 0V). All non-idealities are ignored.

Basic static circuits. (A) Diode-connected transistor. (B) Current mirror. (C) Source follower. (D) Inverter. (E) Current conveyor. (F) Differential Pair.

Diode-Connected Transistor

A diode-connected nFET has its gate tied to the drain. Since the floating drain controls the gate voltage, the drain-gate voltage will self-regulate so that the device always sinks the input current, I_{ds}. Once the drain voltage exceeds roughly 100mV, the transistor runs in saturation. Similarly, a diode-connected pFET has its gate tied to the source. Though this simple device seems to merely function as a short circuit, it is commonly used in analog circuits for copying and regulating current. In neuromorphic circuits in particular, diode-connected transistors are used as slow current sinks, increasing circuit time constants to biologically plausible time regimes.

Current Mirror

A current mirror takes advantage of the diode-connected transistor's ability to sink current. When an input current is forced through the diode-connected transistor, M1, the floating drain and gate are regulated to the voltage that allows the input current to pass. Since the two transistors share a common gate node, M2 will sink the same current, forcing the output transistor to duplicate the input current. The output will mirror the input current as long as:

  1.  V_{s1} = V_{s2}
  2. \frac{W_{M1}}{L_{M1}}=\frac{W_{M2}}{L_{M2}} .

The current mirror gain can be controlled by adjusting these two parameters. When using transistors with different dimensions, otherwise known as a tilted mirror, the gain is:

  Gain = \frac{(\frac{W}{L})_{M2}}{(\frac{W}{L})_{M1}}.
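As a quick numeric illustration of the tilted-mirror gain formula above (the transistor dimensions used here are hypothetical):

```python
def mirror_gain(w1, l1, w2, l2):
    """Gain of a tilted current mirror: (W/L of M2) / (W/L of M1)."""
    return (w2 / l2) / (w1 / l1)

# Hypothetical dimensions (in micrometres): M2 has twice the W/L ratio
# of M1, so the output current is twice the input current.
gain = mirror_gain(1.0, 2.0, 2.0, 2.0)
i_in = 1e-9            # 1 nA input current
i_out = gain * i_in    # mirrored (amplified) output current
```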

A pFET current mirror is simply a flipped nFET mirror, where the diode-connected pFET mirrors the input current, and forces the other pFET to source output current.

Current mirrors are commonly used to copy currents without draining the input current. This is especially essential for feedback loops, such as the one used to accelerate action potentials, and for summing input currents at a synapse.

Source Follower

A source follower consists of an input transistor, M_1, stacked on top of a bias transistor, M_b. The fixed subthreshold (<0.7V) bias voltage controls the gate of M_b, forcing it to sink a constant current, I_b. M_1 is thus also forced to sink the same current (I_1 = I_b), regardless of the input voltage, V_{in}.

A source follower is so called because the output, V_{out}, follows V_{in} with a slight offset described by:

  V_{out} = \kappa \cdot (V_{in} -V_b),
where \kappa is the subthreshold slope factor, typically less than one.
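A small numeric sketch of the follower relation (the value of \kappa is an assumed typical number):

```python
KAPPA = 0.7  # subthreshold slope factor (assumed typical value)

def source_follower_out(v_in, v_b, kappa=KAPPA):
    """Output of an nFET source follower: V_out = kappa * (V_in - V_b).

    The output follows the input linearly with a slope of kappa (< 1)
    and an offset set by the bias voltage V_b.
    """
    return kappa * (v_in - v_b)

# A 100 mV step on the input appears as a ~70 mV step on the output:
step = source_follower_out(0.5, 0.3) - source_follower_out(0.4, 0.3)
```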

This simple circuit is often used as a buffer. Since no current can flow through the gate, this circuit will not draw current from the input, an important trait for low-power circuits. Source followers can also isolate circuits, protecting them from power surges or static. A pFET source follower only differs from an nFET source follower in that the bias pFET has its bulk tied to V_{out}.

In neuromorphic circuits, source followers and the like are used as simple current integrators which behave like post-synaptic neurons collecting current from many pre-synaptic neurons.


Inverter

An inverter consists of a pFET, M_1, stacked on top of an nFET, M_2, with their gates tied to the input, V_{in}, and the output tied to the common drain node, V_{out}. When a high signal is input, the pFET is off but the nFET is on, effectively draining the output node V_{out} and inverting the signal. Conversely, when the input signal is low, the nFET is off but the pFET is on, charging up the V_{out} node.

This simple circuit is effective as a quick switch. The inverter is also commonly used as a buffer, because an output current can be produced without directly sourcing the input current, as no current is allowed through the gate. When two inverters are used in series, they form a non-inverting amplifier. This was used in the original Integrate-and-Fire silicon neuron by Mead et al., 1989 to create a fast depolarizing spike similar to that of a biological action potential [7]. However, when the input fluctuates between high and low signals, both transistors are in superthreshold saturation, draining current, which makes this a very power-hungry circuit.

Current Conveyor

The current conveyor is also commonly known as a buffered current mirror. Consisting of two transistors with the gate of each tied to a node of the other, the current conveyor self-regulates so that the output current matches the input current, much like the current mirror.

The current conveyor is often used in place of current mirrors for large serially repetitious arrays. This is because the current mirror's current is controlled through the gate, whose oxide capacitance results in a delayed output. Though this lag is negligible for a single current mirror, long mirroring arrays accumulate significant output delays. Such delays would greatly hinder large parallel processes, such as those that try to emulate biological neural-network computational strategies.

Differential Pair

The differential pair is a comparative circuit composed of two source followers with a common bias that silences the current of the weaker input. The bias transistor forces I_b to remain constant, tying the common node, V_s, to a fixed voltage. Both input transistors want to drain currents proportional to their input voltages, I_1 and I_2, respectively. However, since the common node must remain fixed, the drains of the input transistors must rise in proportion to the gate voltages. The transistor with the lower input voltage acts as a choke and allows less current through its drain. The losing transistor sees its source voltage increase and thus falls out of saturation.

The differential pair, in the setting of a neuronal circuit, can function as an activation threshold of an ion channel below which the voltage-gated ion channel will not open, preventing the neuron from spiking [8].
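The current split of a subthreshold differential pair is commonly written as a logistic function of the input difference; this is a standard textbook result (e.g. in Liu et al., Analog VLSI: Circuits and Principles), not derived in this text, and the parameter values below are assumed typical numbers:

```python
import math

U_T = 0.025   # thermal voltage, ~25 mV (assumed)
KAPPA = 0.7   # subthreshold slope factor (assumed)

def diff_pair_currents(v1, v2, i_b, kappa=KAPPA, u_t=U_T):
    """Branch currents of a subthreshold differential pair.

    The bias current I_b is split between the two branches according
    to a logistic function of the input difference, so the branch with
    the lower input voltage is progressively silenced.
    """
    e1 = math.exp(kappa * v1 / u_t)
    e2 = math.exp(kappa * v2 / u_t)
    return i_b * e1 / (e1 + e2), i_b * e2 / (e1 + e2)

# With a 100 mV input difference, almost all of I_b flows in branch 1:
i1, i2 = diff_pair_currents(0.50, 0.40, i_b=1e-9)
```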

Silicon neurons


Winner-Take-All

The Winner-Take-All (WTA) circuit, originally designed by Lazzaro et al. [9], is a continuous-time analog circuit. It compares the outputs of an array of cells and allows only the cell with the highest output current to remain on, inhibiting all other competing cells.

A two-input CMOS winner-take-all circuit

Each cell comprises a current-controlled conveyor; it receives input currents and outputs into a common line controlling a bias transistor. The cell with the largest input current will also output the largest current, increasing the voltage of the common node. This forces the weaker cells to turn off. The WTA circuit can be extended to include a large network of competing cells. A soft WTA also has its output current mirrored back to the input, effectively increasing the cell gain. This is necessary to reduce noise and random switching if the cell array has a small dynamic range.

WTA networks are commonly used as a form of competitive learning in computational neural networks that involve distributed decision making. In particular, WTA networks have been used to perform low level recognition and classification tasks that more closely resemble cortical activity during visual selection tasks [10].
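A behavioral sketch of hard and soft WTA selection. This models only the input-output behavior, not the transistor-level circuit, and the gain parameter of the soft variant is illustrative:

```python
import math

def hard_wta(input_currents):
    """Idealized hard winner-take-all: only the cell with the largest
    input current stays on; all competing cells are silenced."""
    winner = max(range(len(input_currents)), key=lambda i: input_currents[i])
    return [c if i == winner else 0.0 for i, c in enumerate(input_currents)]

def soft_wta(input_currents, gain=5.0):
    """Idealized soft WTA: the mirrored feedback increases the effective
    gain, modeled here as a softmax sharing of the total current
    (a behavioral sketch, not a circuit model)."""
    total = sum(input_currents)
    peak = max(input_currents)
    exps = [math.exp(gain * c / peak) for c in input_currents]
    z = sum(exps)
    return [total * e / z for e in exps]

out = hard_wta([0.2, 0.9, 0.4])   # only the second cell survives
```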

Integrate & Fire Neuron

The Integrate & Fire neuron, also known as the Axon-Hillock neuron, is the most commonly used spiking neuron model [11]. Common elements of most Axon-Hillock circuits include: a node with a memory of the membrane potential, V_c; an amplifier; a positive feedback loop, C_f; and a mechanism to reset the membrane potential to its resting state, V_p.

The input current, I_i, charges the membrane node, V_c, whose potential is stored on a capacitor, C. This capacitor is analogous to the lipid cell membrane, which prevents free ionic diffusion, creating the membrane potential from the accumulated charge difference on either side of the lipid membrane. The input is amplified to output a voltage spike. A change in membrane potential is positively fed back through C_f to V_c, producing a faster spike. This closely resembles how a biological axon hillock, densely packed with voltage-gated sodium channels, amplifies the summed potentials to produce an action potential. When a voltage spike is produced, the reset bias, V_p, begins to drain the V_c node. This is similar to the sodium-potassium pumps, which actively transport sodium and potassium ions against their concentration gradients to maintain the resting membrane potential.
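The charge-integrate-threshold-reset cycle described above can be sketched behaviorally. Leak and positive feedback are omitted for clarity, and all parameter values are illustrative rather than taken from the text:

```python
def simulate_if_neuron(i_in, c_mem=1e-12, v_thresh=0.1, v_reset=0.0,
                       dt=1e-6, t_end=1e-3):
    """Minimal behavioral sketch of an integrate-and-fire neuron.

    The input current charges the membrane capacitor (dV = I*dt/C);
    when the membrane voltage crosses the spiking threshold, a spike
    is recorded and the reset mechanism pulls the node back to rest.
    """
    v = v_reset
    spike_times = []
    steps = int(round(t_end / dt))
    for step in range(steps):
        v += i_in * dt / c_mem          # capacitor integrates the input
        if v >= v_thresh:               # threshold crossing -> spike
            spike_times.append(step * dt)
            v = v_reset                 # reset drains the membrane node
    return spike_times

# A constant 1 nA input produces a regular spike train; doubling the
# input current roughly doubles the firing rate.
spikes = simulate_if_neuron(i_in=1e-9)
```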

Spiking neuron circuit. The amplifier consists of two inverting amplifiers that create the characteristic fast upward swing of an action potential. The output spike, V_o, is initiated by the input current, I_i, and its width is modulated by V_p. Source: adapted from Mead et al., 1989

The DPI neuron circuit. (A) Circuit schematic. The input DPI low-pass filter (yellow, ML1 − ML3) models the neuron's leak conductance. A spike event generation amplifier (red, MA1 − MA6) implements current-based positive feedback (modeling both sodium activation and inactivation conductances) and produces address-events at extremely low-power. The reset block (blue, MR1 − MR6) resets the neuron and keeps it in a reset state for a refractory period, set by the Vref bias voltage. An additional DPI filter integrates the spikes and produces a slow after hyper-polarizing current Ig responsible for spike-frequency adaptation (green, MG1 − MG6). (B) Response of the DPI neuron circuit to a constant input current. The measured data was fitted with a function comprising an exponential ∝e−t/τK at the onset of the stimulation, characteristic of all conductance-based models, and an additional exponential ∝e+t/τNa (characteristic of exponential I&F computational models; Brette and Gerstner, 2005) at the onset of the spike Source: Indiveri et al., 2010.

The original Axon Hillock silicon neuron has been adapted to include an activation threshold, with the addition of a differential pair comparing the input to a set threshold bias [12]. This conductance-based silicon neuron utilizes a differential-pair integrator (DPI) with a leaky transistor to compare the input, I_{in}, to the threshold, V_{thr}. The leak bias V_{tau}, refractory period bias V_{rfr}, adaptation bias V_{ahp}, and positive feedback gain all independently control the spiking frequency. Research has focused on implementing spike-frequency adaptation to set refractory periods and modulate thresholds [13]. Adaptation allows the neuron to modulate its output firing rate as a function of its input: given a constant high-frequency input, the neuron becomes desensitized and its output is steadily diminished over time. The adaptive component of the conductance-based neuron circuit is modeled through the calcium flux, and stores the memory of past activity on the adaptive capacitor, C_{ahp}. The advent of spike-frequency adaptation allowed changes on the neuron level to control adaptive learning mechanisms on the synapse level. This model of neuronal learning is taken from biology [14] and will be further discussed in Silicon Synapses.

(A)Current depression mechanism. (B) Adaptive threshold mechanism as a function of V_{mem}(blue). The neuron's spiking threshold (red) increases with every spike, increasing the spiking time constant. Source: Indiveri et al., 2010

Silicon Synapses

The most basic silicon synapse, originally used by Mead et al., 1989 [15], simply consists of a pFET source follower that receives a low-signal pulse input and outputs a unidirectional current, I_o [16].

(A) Basic synapse circuit. (B) Synapse circuit with longer time constant. Sources: adapted from Mead et al., 1989, and Lazzaro et al., 1993, respectively.

The amplitude of the spike is controlled by the weight bias, V_w, and the pulse width is directly correlated with the input pulse width, which is set by V_{\tau}. The capacitor in the Lazzaro et al. (1993) synapse circuit was added to increase the spike time constant to a biologically plausible value. This slows the rate at which the pulse hyperpolarizes and depolarizes, as a function of the capacitance.

Basic synapse circuit. Source: adapted from Lazzaro et al., 1992

For multiple inputs depicting competitive excitatory and inhibitory behavior, the log-domain integrator uses I_1 and I_2 to regulate the output current magnitude, I_o, as a function of the input current, I_i, according to:

I_o = I_i \cdot \sqrt{\frac{I_1}{I_2}}.

I_1 controls the rate at which I_i is able to charge the output transistor gate. I_2 governs the rate at which the output I_o is sunk. This competitive nature is necessary to mimic the biological behavior of neurotransmitters that either promote or depress neuronal firing.
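Using the relation given above, the effect of the two control currents can be computed directly (the current values here are hypothetical):

```python
import math

def log_domain_output(i_i, i_1, i_2):
    """Output of the log-domain integrator, using the relation given
    in the text: I_o = I_i * sqrt(I_1 / I_2).

    Increasing the charging current I_1 (excitation) scales the output
    up; increasing the sinking current I_2 (inhibition) scales it down.
    """
    return i_i * math.sqrt(i_1 / i_2)

# Balanced excitation and inhibition simply relay the input current,
# while a four-fold excitatory imbalance doubles the output:
i_balanced = log_domain_output(1e-9, 2e-12, 2e-12)
i_excited = log_domain_output(1e-9, 8e-12, 2e-12)
```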

Synaptic models have also been developed with first-order linear integrators using log-domain filters capable of modeling the exponential decay of excitatory post-synaptic current (EPSC) [17]. This is necessary to have biologically plausible spike contours and time constants. The gain is also controlled independently of the synapse time constant, which is necessary for spike-rate and spike-timing dependent learning mechanisms.

(A) Data fit for a typical EPSC according to the linear integrator model. (B) A basic log-domain integrator. Source: Mitra et al., 2010

The aforementioned synapses simply relay currents from the pre-synaptic sources, varying the shape of the pulse spike along the way. They do not, however, contain any memory of previous spikes, nor are they capable of adapting their behavior according to temporal dynamics. These abilities, however, are necessary if neuromorphic circuits are to learn like biological neural networks.

An artificial neural network. There are p presynaptic neurons (x) and q postsynaptic neurons (b). x_p is a single presynaptic neuron that synapses upon postsynaptic neuron b_q with synaptic weight w_{pq}, causing the postsynaptic neuron to output y_q. Source: Wikipedia

According to Hebb's postulate, behaviors like learning and memory are hypothesized to occur on the synaptic level [18]. It attributes the learning process to long-term neuronal adaptation, in which pre- and post-synaptic contributions are strengthened or weakened by biochemical modifications. This theory is often summarized in the saying, "Neurons that fire together, wire together." Artificial neural networks model learning through these biochemical "wiring" modifications with a single parameter, the synaptic weight, w_{pq}. A synaptic weight is a parameter state variable that quantifies how a presynaptic neuron spike affects a postsynaptic neuron output. Two models of Hebbian synaptic weight plasticity are spike-rate-dependent plasticity (SRDP) and spike-timing-dependent plasticity (STDP). Since the conception of this theory, biological neuron activity has been shown to exhibit behavior closely modeling Hebbian learning. One such example is the plastic modification of synaptic NMDA and AMPA receptors, which leads to calcium-flux-induced adaptation [19].

Learning and long-term memory of information in biological neurons is attributed to NMDA-channel-induced adaptation. These NMDA receptors are voltage dependent and control intracellular calcium ion flux. It has been shown in animal studies that neuronal desensitization is diminished when extracellular calcium is reduced [20].

(A) Simple synapse consisting of AMPA and NMDA channels, and calcium. (B) Circuit models of individual elements of the synapse. (C) Circuit outputs in response to a presynaptic action potential (AP) input (AP_{PRE}). Source: Rachmuth et al., 2011

Since the calcium concentration decays exponentially, this behavior is easily implemented in hardware using subthreshold transistors. A circuit model demonstrating calcium-dependent biological behavior is shown by Rachmuth et al. (2011) [21]. The calcium signal, I_{Ca_{2+}}, regulates AMPA and NMDA channel activity through the V_{mem} node according to calcium-dependent STDP and SRDP learning rules. The output of these learning rules is the synaptic weight, w, which is proportional to the number of active AMPA and NMDA channels. The SRDP model describes the weight in terms of two state variables: \Omega, which controls the update rule, and \eta, which controls the learning rate.

dw = \eta([Ca_{2+}]) \cdot (\Omega([Ca_{2+}]) - \lambda w)  ,
where w is the synaptic weight, \Omega([Ca_{2+}]) is the update rule, \eta([Ca_{2+}]) is the learning rate, and \lambda is a constant that allows the weight to drift out of saturation in absence of an input.
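One Euler step of this weight-update rule can be sketched as follows. The particular \eta and \Omega functions below are hypothetical placeholders; in the real circuit both are calcium-dependent functions implemented by the \eta and \Omega blocks:

```python
def update_weight(w, ca, eta, omega, lam=0.1, dt=1e-3):
    """One Euler step of the calcium-dependent weight update:

        dw = eta([Ca2+]) * (Omega([Ca2+]) - lambda * w) * dt

    eta and omega are caller-supplied functions of the calcium
    concentration; lambda lets the weight drift out of saturation in
    the absence of an input.
    """
    return w + eta(ca) * (omega(ca) - lam * w) * dt

# Illustrative (hypothetical) rules: a fixed learning rate and a
# step-like update rule that potentiates above a calcium threshold.
eta = lambda ca: 1.0
omega = lambda ca: 1.0 if ca > 0.5 else 0.0

w = 0.0
for _ in range(100):   # sustained high calcium -> potentiation
    w = update_weight(w, ca=0.8, eta=eta, omega=omega)
```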

The NMDA channel controls the calcium influx, I_{Ca}. The NMDA receptor voltage-dependency is modeled by V_{mem}, and the channel mechanics are controlled with a large capacitor to increase the calcium time constant, \tau_{Ca}. The output I_{Ca} is copied via current mirrors into the \Omega and \eta circuits to perform downstream learning functions.

The \Omega circuit compares I_{Ca} to threshold biases, \theta_{LTP} and \theta_{LTD}, that respectively control long-term potentiation and long-term depression through a series of differential pair circuits. The output of the differential pairs determines the update rule. This \Omega circuit has been demonstrated to exhibit various Hebbian learning rules as observed in the hippocampus, as well as anti-Hebbian learning rules used in the cerebellum.

The \eta circuit controls when synaptic learning can occur, allowing updates only when I_{Ca} is above a threshold, \theta_{\eta}, set by a differential pair. The learning rate (LR) is modeled according to:

\tau_{LR} \sim \frac{\theta_{\eta} \cdot C_{\eta}}{I_{\eta} \cdot [Ca_{2+}]}  ,
where \eta is a function of [Ca_{2+}] and controls the learning rate, C_{\eta} is the capacitance of the \eta circuit, and \theta_{\eta} is the threshold voltage of the comparator. This function demonstrates that \theta_{\eta} must be biased to maintain an elevated [Ca_{2+}] in order to simulate SRDP. A leakage current, I_{LEAK}, was included to drain V_{\eta} to \eta_{REST} during inactivity.


  1. T. Haslwanter (2012). "Hodgkin-Huxley Simulations [Python]". private communications.
  2. T. Haslwanter (2012). "Fitzhugh-Nagumo Model [Python]". private communications.
  3. T. Anastasio (2010). "Tutorial on Neural Systems Modeling".
  4. T. Anastasio (2010). "Tutorial on Neural Systems Modeling".
  5. WM Siebert (1965), Some implications of the stochastic behavior of primary auditory neurons 
  6. indiveri2010
  7. mead1989
  8. douglaspatent
  9. lazzaro1989
  10. riesenbuber1999
  11. mead1989
  12. douglaspatent
  13. douglas2003
  14. indiveri
  15. mead1989
  16. lazzaro1993
  17. mitra2010
  18. hebb1949
  19. koplas1997
  20. koplas1997
  21. rachmuth2011

Visual System



Generally speaking, visual systems rely on electromagnetic (EM) waves to give an organism more information about its surroundings. This information might concern potential mates, dangers, or sources of sustenance. Different organisms have different constituents that make up what is referred to as a visual system.

The complexity of eyes ranges from something as simple as an eye spot, which is nothing more than a collection of photosensitive cells, to a fully fledged camera eye. If an organism has different types of photosensitive cells, or cells sensitive to different wavelength ranges, the organism would theoretically be able to perceive colour, or at the very least colour differences. Polarisation, another property of EM radiation, can be detected by some organisms, with insects and cephalopods having the highest accuracy.

Please note that in this text the focus is on using EM waves to see. Granted, some organisms have evolved alternative ways of obtaining sight, or at least of supplementing what they see, with extra-sensory information; whales and bats, for example, use echo-location. This may be "seeing" in some sense of the word, but it is not vision in the strict sense. Additionally, the words "vision" and "visual" are most often associated with EM waves in the visual wavelength range, normally defined by the wavelength limits of human vision. Since some organisms detect EM waves at frequencies below and above those visible to humans, a better definition must be made. We therefore define the visual wavelength range as EM wavelengths between 300nm and 800nm. This may seem arbitrary to some, but selecting the wrong limits would render parts of some birds' vision non-vision. With this range of wavelengths we have also defined, for example, the thermal "vision" of certain organisms such as snakes as non-vision. Snakes using their pit organs, which are sensitive to EM between 5000nm and 30,000nm (IR), therefore do not "see", but somehow "feel" from afar, even though blind specimens have been documented targeting and attacking particular body parts.

First, the different types of visual sensory organs are briefly described; then the components of human vision and the signal processing along the human visual pathway are explained in detail, finishing with an example of the perceptual outcome of these stages.

Sensory Organs

Vision, or the ability to see, depends on visual system sensory organs, or eyes. There are many different constructions of eyes, ranging in complexity depending on the requirements of the organism. The different constructions have different capabilities, are sensitive to different wavelengths and have differing degrees of acuity; they also require different processing to make sense of the input, and different numbers to work optimally. The ability to detect and decipher EM waves has proved to be a valuable asset to most forms of life, leading to an increased chance of survival for organisms that utilise it. In environments without sufficient light, or lacking it completely, lifeforms have no added advantage from vision, which ultimately has resulted in atrophy of visual sensory organs with subsequent increased reliance on other senses (e.g. in some cave-dwelling animals, bats etc.). Interestingly enough, it appears that visual sensory organs are tuned to the optical window, which is defined as the EM wavelengths (between 300nm and 1100nm) that pass through the atmosphere and reach the ground. This is shown in the figure below. You may notice that there exist other "windows": an IR window, which explains to some extent the thermal "vision" of snakes, and a radiofrequency (RF) window, which no known lifeform is able to detect.

Atmospheric electromagnetic opacity.svg

Through time evolution has yielded many eye constructions, and some of them have evolved multiple times, yielding similarities for organisms that have similar niches. There is one underlying aspect that is essentially identical, regardless of species, or complexity of sensory organ type, the universal usage of light-sensitive proteins called opsins. Without focusing too much on the molecular basis though, the various constructions can be categorised into distinct groups:

  • Spot Eyes
  • Pit Eyes
  • Pinhole Eyes
  • Lens Eyes
  • Refractive Cornea Eyes
  • Reflector Eyes
  • Compound Eyes

The least complicated configuration of eyes enables organisms to simply sense the ambient light, letting the organism know whether there is light or not. It is normally simply a collection of photosensitive cells clustered in the same spot, thus sometimes referred to as spot eyes, eye spots or stemmata. By either adding more angular structures or recessing the spot eyes, an organism gains access to directional information as well, which is a vital requirement for image formation. These so-called pit eyes are by far the most common type of visual sensory organ, and can be found in over 95% of all known species.

Pinhole eye

Taking this approach to the obvious extreme leads to the pit becoming a cavernous structure, which increases the sharpness of the image, albeit at a loss of intensity. In other words, there is a trade-off between intensity, or brightness, and sharpness. An example of this can be found in the Nautilus, species belonging to the family Nautilidae, organisms considered to be living fossils. They are the only known species with this type of eye, referred to as the pinhole eye, which is completely analogous to the pinhole camera or the camera obscura. In addition, like more advanced cameras, Nautili are able to adjust the size of the aperture, thereby increasing or decreasing the resolution of the eye at a respective decrease or increase in image brightness. Like the camera, the way to alleviate the intensity/resolution trade-off is to include a lens: a structure that focuses the light onto a central area, which most often has a higher density of photo-sensors. By adjusting the shape of the lens and moving it around, and by controlling the size of the aperture, or pupil, organisms can adapt to different conditions and focus on particular regions of interest in any visual scene. The last upgrade to the eye constructions mentioned so far is the inclusion of a refractive cornea. Eyes with this structure have delegated two thirds of the total optical power of the eye to the high-refractive-index liquid inside the cornea, enabling very high resolution vision. Most land animals, including humans, have eyes of this particular construction. Additionally, many variations of lens structure, lens number, photosensor density, fovea shape, fovea number, pupil shape etc. exist, always increasing the chances of survival for the organism in question. These variations lead to a varied outward appearance of eyes, even within a single eye construction category.
To demonstrate this point, a collection of photographs of animals with the same eye category (refractive cornea eyes) is shown below.

Refractive Cornea Eyes
Hawk Eye
Sheep Eye
Cat Eye
Human Eye

An alternative to the lens approach, called reflector eyes, can be found in, for example, mollusks. Instead of the conventional way of focusing light to a single point in the back of the eye using a lens or a system of lenses, these organisms have mirror-like structures inside the chamber of the eye that reflect the light onto a central portion, much like a parabolic dish. Although there are no known examples of organisms with reflector eyes capable of image formation, at least one species of fish, the spookfish (Dolichopteryx longipes), uses them in combination with "normal" lensed eyes.

Compound eye

The last group of eyes, found in insects and crustaceans, is the compound eyes. These eyes consist of a number of functional sub-units called ommatidia, each consisting of a facet, or front surface, a transparent crystalline cone, and photo-sensitive cells for detection. In addition, each ommatidium is separated from its neighbours by pigment cells, ensuring the incoming light is as parallel as possible. The combined outputs of these ommatidia form a mosaic image, with a resolution proportional to the number of ommatidia. For example, if humans had compound eyes, the eyes would have to cover our entire faces to retain the same resolution. As a note, there are many types of compound eyes, but delving deeper into this topic is beyond the scope of this text.

Not only the type of eye varies, but also the number of eyes. As you are well aware, humans usually have two eyes; spiders, on the other hand, have a varying number of eyes, with most species having 8. The spiders also typically have different sizes among the pairs of eyes, and the differing sizes serve different functions. For example, in jumping spiders the 2 larger, front-facing eyes give the spider excellent visual acuity, which is used mainly to target prey; the 6 smaller eyes have much poorer resolution, but help the spider avoid potential dangers. Two photographs, of the eyes of a jumping spider and the eyes of a wolf spider, demonstrate the variability in the eye topologies of arachnids.

Anatomy of the Visual System

We humans are visual creatures, and our eyes are correspondingly complicated, with many components. In this chapter, an attempt is made to describe these components, giving some insight into the properties and functionality of human vision.

Getting inside of the eyeball - Pupil, iris and the lens

Light rays enter the eye through the black aperture, or pupil, in the front of the eye. The black appearance is due to the light being fully absorbed by the tissue inside the eye. Only through this pupil can light enter the eye, which means the amount of incoming light is effectively determined by the size of the pupil. A pigmented sphincter surrounding the pupil, the iris, functions as the eye's aperture stop. It is the amount of pigment in the iris that gives rise to the various eye colours found in humans.

In addition to this layer of pigment, the iris has 2 layers of ciliary muscles: a circular muscle called the pupillary sphincter, which contracts to make the pupil smaller, and a smooth muscle called the pupillary dilator, which contracts to dilate the pupil. The combination of these muscles can thereby dilate or contract the pupil depending on the requirements or conditions of the person. The ciliary muscles are connected to the ciliary zonules, fibres that also change the shape of the lens and hold it in place.

The lens is situated immediately behind the pupil. Its shape and characteristics reveal a similar purpose to that of camera lenses, but they function in slightly different ways. The shape of the lens is adjusted by the pull of the ciliary zonules, which consequently changes the focal length. Together with the cornea, the lens can change the focus, which makes it a very important structure indeed; however, only one third of the total optical power of the eye is due to the lens itself. It is also the eye's main filter. Lens fibres, long and thin cells void of most of the cell machinery to promote transparency, make up most of the material of the lens. Together with water-soluble proteins called crystallins, they increase the refractive index of the lens. The fibres also play a part in the structure and shape of the lens itself.
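The one-third/two-thirds split of optical power between lens and cornea can be illustrated with typical textbook values (the ~60-dioptre total power of the relaxed human eye is an assumed figure, not given in this text):

```python
TOTAL_POWER_D = 60.0  # typical total optical power of the human eye (assumed)

lens_power = TOTAL_POWER_D / 3.0       # the lens contributes about one third
cornea_power = TOTAL_POWER_D * 2 / 3   # the cornea contributes the other two thirds

# Optical power in dioptres is the reciprocal of focal length in metres,
# so ~60 D corresponds to a focal length of roughly 17 mm:
focal_length_m = 1.0 / TOTAL_POWER_D
```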

Schematic diagram of the human eye

Beamforming in the eye – Cornea and its protecting agent - Sclera

Structure of the Cornea

The cornea, responsible for the remaining 2/3 of the total optical power of the eye, covers the iris, pupil and lens. It focuses the rays that pass through the iris before they pass through the lens. The cornea is only 0.5mm thick and consists of 5 layers:

  • Epithelium: A layer of epithelial tissue covering the surface of the cornea.
  • Bowman's membrane: A thick protective layer composed of strong collagen fibres, that maintain the overall shape of the cornea.
  • Stroma: A layer composed of parallel collagen fibrils. This layer makes up 90% of the cornea's thickness.
  • Descemet's membrane and Endothelium: two layers adjacent to the anterior chamber of the eye, which is filled with aqueous humor produced by the ciliary body. This fluid moisturises the lens, cleans it, and maintains the pressure in the eyeball. The anterior chamber, positioned between cornea and iris, contains a trabecular meshwork through which the fluid, arriving from the posterior chamber, is drained via the canal of Schlemm.

Beyond the cornea lie two protective membranes, called the sclera and Tenon's capsule, which together envelop the rest of the eyeball. The sclera is built from collagen and elastic fibres that protect the eye from external damage; this layer also gives rise to the white of the eye. It is pierced by nerves and vessels, with the largest opening reserved for the optic nerve. Moreover, it is covered by the conjunctiva, a clear mucous membrane on the surface of the eyeball. This membrane also lines the inside of the eyelid. It works as a lubricant and, together with the lacrimal gland, produces tears that lubricate and protect the eye. The remaining protective layer, the eyelid, also functions to spread this lubricant around.

Moving the eyes – extra-ocular muscles

The eyeball is moved by a complicated structure of extra-ocular muscles consisting of four rectus muscles (inferior, medial, lateral and superior) and two oblique muscles (inferior and superior). The positioning of these muscles is presented below, along with their functions:

Extra-ocular muscles: Green - Lateral Rectus; Red - Medial Rectus; Cyan - Superior Rectus; Pink - Inferior Rectus; Dark Blue - Superior Oblique; Yellow - Inferior Oblique.

As you can see, the extra-ocular muscles (2, 3, 4, 5, 6, 8) are attached to the sclera of the eyeball and originate in the annulus of Zinn, a fibrous tendon surrounding the optic nerve. For the superior oblique muscle, a pulley system is formed with the trochlea acting as the pulley and the muscle as the rope; this is required to redirect the muscle force in the correct direction. The remaining extra-ocular muscles have a direct path to the eye and therefore do not form such pulley systems. Using these extra-ocular muscles, the eye can rotate up, down, left and right, and intermediate movements are possible as combinations of these.

Other movements are also very important for us to be able to see. Vergence movements enable the proper function of binocular vision. Unconscious fast movements called saccades are essential for keeping an object fixated. A saccade is a sort of jittery movement performed when the eyes scan the visual field, in order to displace the point of fixation slightly. When you follow a moving object with your gaze, your eyes perform what is referred to as smooth pursuit. Additional involuntary movements called nystagmus are caused by signals from the vestibular system; together these make up the vestibulo-ocular reflexes.

The brain stem controls all of the movements of the eyes, with different areas responsible for different movements.

  • Pons: Rapid horizontal movements, such as saccades or nystagmus
  • Mesencephalon: Vertical and torsional movements
  • Cerebellum: Fine tuning
  • Edinger-Westphal nucleus: Vergence movements

Where the vision reception occurs – The retina

Filtering of the light performed by the cornea, lens and pigment epithelium

Before being transduced, incoming EM radiation passes through the cornea, lens and the macula. These structures also act as filters to reduce unwanted EM radiation, thereby protecting the eye from harmful radiation. The filtering response of each of these elements can be seen in the figure "Filtering of the light performed by cornea, lens and pigment epithelium". As one may observe, the cornea attenuates the lower wavelengths, leaving the higher wavelengths nearly untouched. The lens blocks around 25% of the EM radiation below 400 nm and more than 50% below 430 nm. Finally, the pigment epithelium, the last stage of filtering before the photo-reception, affects around 30% of the EM radiation between 430 nm and 500 nm.
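The filtering figures quoted above can be combined into a simple transmission cascade. This is a minimal sketch: the lens and pigment-epithelium values follow the percentages in the text, while the corneal transmission numbers are illustrative assumptions, not measured data.

```python
# Illustrative cascade of the eye's optical filters. Lens and pigment
# epithelium values follow the text; corneal values are assumptions.

def cornea_transmission(wavelength_nm):
    """Cornea attenuates short wavelengths, passes long ones (assumed values)."""
    if wavelength_nm < 350:
        return 0.1
    if wavelength_nm < 400:
        return 0.6
    return 0.95

def lens_transmission(wavelength_nm):
    """Lens blocks ~25% of EM below 400 nm and >50% below 430 nm."""
    if wavelength_nm < 400:
        return 0.75
    if wavelength_nm < 430:
        return 0.45
    return 1.0

def pigment_epithelium_transmission(wavelength_nm):
    """Pigment epithelium absorbs ~30% of EM between 430 and 500 nm."""
    if 430 <= wavelength_nm <= 500:
        return 0.70
    return 1.0

def retinal_irradiance(wavelength_nm, incident=1.0):
    """Fraction of incident light at a given wavelength reaching the photoreceptors."""
    return (incident
            * cornea_transmission(wavelength_nm)
            * lens_transmission(wavelength_nm)
            * pigment_epithelium_transmission(wavelength_nm))

print(retinal_irradiance(380))  # short-wave light: strongly attenuated
print(retinal_irradiance(550))  # mid-visible light: nearly untouched
```

Multiplying the stage transmissions reflects the fact that the filters act in series along the optical path.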

A part of the eye, which marks the transition from the non-photosensitive region to the photosensitive region, is called the ora serrata. The photosensitive region is referred to as the retina, which is the sensory structure in the back of the eye. The retina consists of multiple layers, presented below, with millions of photoreceptors called rods and cones, which capture the light rays and convert them into electrical impulses. Transmission of these impulses is initiated by the ganglion cells and conducted through the optic nerve, the single route by which information leaves the eye.

Structure of retina including the main cell components: RPE: retinal pigment epithelium; OS: outer segment of the photoreceptor cells; IS: inner segment of the photoreceptor cells; ONL: outer nuclear layer; OPL: outer plexiform layer; INL: inner nuclear layer; IPL: inner plexiform layer; GC: ganglion cell layer; P: pigment epithelium cell; BM: Bruch's membrane; R: rods; C: cones; H: horizontal cell; B: bipolar cell; M: Müller cell; A: amacrine cell; G: ganglion cell; AX: axon; arrow: membrana limitans externa.

A conceptual illustration of the structure of the retina is shown on the right. As we can see, there are five main cell types:

  • photoreceptor cells
  • horizontal cells
  • bipolar cells
  • amacrine cells
  • ganglion cells

Photoreceptor cells can be further subdivided into two main types called rods and cones. Cones are much less numerous than rods in most parts of the retina, but there is an enormous aggregation of them in the macula, especially in its central part, called the fovea. In this central region, each photo-sensitive cone is connected to one ganglion cell. In addition, the cones in this region are slightly smaller than the average cone size, meaning more cones per unit area. Because of this one-to-one ratio and the high density of cones, this is where we have the highest visual acuity.

Density of rods and cones around the eye

There are 3 types of human cones, each responding to a specific range of wavelengths because of three variants of a pigment called photopsin. One pigment is sensitive to red, one to green and one to blue light, so we have blue, green and red cones, also called S-, M- and L-cones for their sensitivity to short, medium and long wavelengths respectively. Each pigment consists of a protein called opsin and a bound chromophore called retinal. The main building blocks of the cone cell are the synaptic terminal, the inner and outer segments, the interior nucleus and the mitochondria.

The spectral sensitivities of the 3 types of cones:

  • S-cones absorb short-wave, i.e. blue-violet, light. The maximum absorption wavelength for the S-cones is 420 nm.
  • M-cones absorb blue-green to yellow light. Their maximum absorption wavelength is 535 nm.
  • L-cones absorb yellow to red light. Their maximum absorption wavelength is 565 nm.
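The three sensitivity curves can be sketched numerically. In this minimal model the peak wavelengths (420, 535 and 565 nm) come from the list above, but the Gaussian shape and the ~40 nm bandwidth are illustrative assumptions; real cone sensitivities are asymmetric.

```python
import math

# Gaussian approximations of the three cone sensitivities.
# Peaks from the text; bandwidth is an illustrative assumption.
CONE_PEAKS_NM = {"S": 420, "M": 535, "L": 565}
BANDWIDTH_NM = 40.0

def cone_response(cone_type, wavelength_nm):
    peak = CONE_PEAKS_NM[cone_type]
    return math.exp(-((wavelength_nm - peak) ** 2) / (2 * BANDWIDTH_NM ** 2))

def relative_excitations(wavelength_nm):
    """Normalised S/M/L excitation triplet for a monochromatic light."""
    resp = {c: cone_response(c, wavelength_nm) for c in CONE_PEAKS_NM}
    total = sum(resp.values())
    return {c: r / total for c, r in resp.items()}

# A 420 nm light drives the S-cones hardest; a 565 nm light the L-cones.
print(relative_excitations(420))
print(relative_excitations(565))
```

The ratio between the three excitations, rather than any single value, is what the later colour-opponent stages work with.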
Cone cell structure

The inner segment contains the cell's nucleus and organelles. The pigment is located in the outer segment, attached to the membrane as trans-membrane proteins within the invaginations of the cell membrane that form the membranous disks, which are clearly visible in the figure displaying the basic structure of rod and cone cells. The disks maximize the reception area of the cells. The cone photoreceptors of many vertebrates contain spherical organelles called oil droplets, which are thought to constitute intra-ocular filters that may serve to increase contrast, reduce glare and lessen the chromatic aberration caused by the mitochondrial size gradient from the periphery to the centre.

Rods have a structure similar to that of cones, but they contain the pigment rhodopsin instead, which allows them to detect low-intensity light and makes them about 100 times more sensitive than cones. Rhodopsin is the only pigment found in human rods; it is located in the membranous disks of the outer segment, which, as in cones, maximise the absorption area. As in cones, the synaptic terminal joins the cell to a bipolar cell, and the inner and outer segments are connected by a cilium.

The pigment rhodopsin absorbs light between 400 and 600 nm, with a maximum absorption at around 500 nm. This wavelength corresponds to greenish-blue light, which means blue colours appear more intense relative to red colours at night.

The sensitivity of cones and rods across visible EM

EM waves with wavelengths outside the range of 400–700 nm are detected by neither rods nor cones, which ultimately means they are not visible to human beings.

Horizontal cells occupy the inner nuclear layer of the retina. There are two types of horizontal cells, and both types hyper-polarise in response to light, i.e. they become more negative. Type A consists of a subtype called HII-H2, which interacts predominantly with S-cones. Type B cells have a subtype called HI-H1, which features a dendritic tree and an axon. The former contacts mostly M- and L-cone cells, the latter rod cells. Contacts with cones are made mainly by inhibitory synapses, while the cells themselves are joined into a network with gap junctions.

Cross-section of the human retina, with bipolar cells indicated in red.

Bipolar cells spread single dendrites in the outer plexiform layer, and their perikarya (cell bodies) are found in the inner nuclear layer. The dendrites interconnect exclusively with cones and rods, and we differentiate between one type of rod bipolar cell and nine or ten types of cone bipolar cell. These cells branch to amacrine or ganglion cells in the inner plexiform layer via an axon. Rod bipolar cells connect via triad synapses to 18-70 rod cells. Their axons spread around the inner plexiform layer synaptic terminals, which contain ribbon synapses and contact a pair of cell processes in dyad synapses. They are connected to ganglion cells via AII amacrine cell links.

Amacrine cells can be found in the inner nuclear layer and in the ganglion cell layer of the retina. Occasionally they are found in the inner plexiform layer, where they work as signal modulators. They have been classified as narrow-field, small-field, medium-field or wide-field depending on their size. However, many classifications exist, leading to over 40 different types of amacrine cells.

Ganglion cells are the final transmitters of the visual signal from the retina to the brain. The most common ganglion cells in the retina are the midget ganglion cells and the parasol ganglion cells. The signal, after having passed through all the retinal layers, is passed on to these cells, the final stage of the retinal processing chain. All the information is collected here and forwarded along the retinal nerve fibres into the optic nerve. The spot where the ganglion axons fuse to create the optic nerve is called the optic disc. This nerve is built mainly from the retinal ganglion axons and supporting glial cells. The majority of the axons transmit data to the lateral geniculate nucleus, which is a termination nexus for most parts of the nerve and which forwards the information to the visual cortex. Some ganglion cells also react to light directly, but because this response is slower than that of rods and cones, it is believed to be related to sensing ambient light levels and adjusting the biological clock.

Signal Processing

As mentioned before, the retina is the main component of the eye, because it contains all the light-sensitive cells. Without it, the eye would be comparable to a digital camera without the CCD (Charge Coupled Device) sensor. This part elaborates on how the retina perceives light, how the optical signal is transmitted to the brain and how the brain processes the signal to form enough information for decision making.

Creation of the initial signals - Photosensor Function

Vision invariably starts with light hitting the photo-sensitive cells found in the retina. Light-absorbing visual pigments, a variety of enzymes and transmitters in retinal rods and cones initiate the conversion from visible EM stimuli into electrical impulses, in a process known as phototransduction. Using rods as an example, the incoming visible EM radiation hits rhodopsin molecules, transmembrane molecules found in the rods' outer disk structure. Each rhodopsin molecule consists of a cluster of helices called opsin that envelop and surround 11-cis retinal, the part of the molecule that changes due to the energy from the incoming photons. In biological molecules, such moieties, parts of molecules that undergo conformational changes due to this energy, are referred to as chromophores. 11-cis retinal straightens in response to the incoming energy, turning into all-trans retinal, which forces the opsin helices further apart, causing particular reactive sites to be uncovered. This "activated" rhodopsin molecule is referred to as Metarhodopsin II. From this point on, even if the visible light stimulation stops, the reaction will continue. The Metarhodopsin II can then activate roughly 100 molecules of a G-protein called transducin, each of which dissociates into its α- and βγ-subunits after its GDP is exchanged for GTP. The activated α-subunit-GTP complex then binds to cGMP-phosphodiesterase (PDE), activating it; the resulting hydrolysis of cGMP closes cation channels, lowering the cytosolic cation concentration and therefore changing the polarisation of the cell.

The natural phototransduction reaction has an amazing power of amplification. One single retinal rhodopsin molecule activated by a single quantum of light causes the hydrolysis of up to 10^6 cGMP molecules per second.
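The amplification figures above can be put into a back-of-the-envelope calculation. This sketch only restates the two numbers given in the text (one activated rhodopsin drives ~100 transducin molecules and up to 10^6 cGMP hydrolyses per second); the linear scaling with photon count and duration is an idealising assumption.

```python
# Back-of-the-envelope amplification of the phototransduction cascade,
# using the figures quoted in the text.

TRANSDUCIN_PER_RHODOPSIN = 100          # from the text
CGMP_HYDROLYSED_PER_SECOND = 1_000_000  # from the text (up to 10^6 / s)

def transducin_activated(photons):
    """Transducin molecules activated by a given number of absorbed photons."""
    return photons * TRANSDUCIN_PER_RHODOPSIN

def cgmp_hydrolysed(photons, duration_s):
    """cGMP molecules hydrolysed over `duration_s` (assumes linear scaling)."""
    return photons * CGMP_HYDROLYSED_PER_SECOND * duration_s

# A single photon sustained for one second of cascade activity:
print(transducin_activated(1))
print(cgmp_hydrolysed(1, 1))
```

The two stages multiply, which is why a single photon can produce a measurable change in the rod's membrane potential.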

Photo Transduction
Representation of molecular steps in photoactivation (modified from Leskov et al., 2000). Depicted is an outer membrane disk in a rod. Step 1: Incident photon (hν) is absorbed and activates a rhodopsin by conformational change in the disk membrane to R*. Step 2: Next, R* makes repeated contacts with transducin molecules, catalyzing their activation to G* by the release of bound GDP in exchange for cytoplasmic GTP (Step 3). The α subunit of G* binds the inhibitory γ subunits of the phosphodiesterase (PDE), activating its α and β subunits. Step 4: Activated PDE hydrolyzes cGMP. Step 5: Guanylyl cyclase (GC) synthesizes cGMP, the second messenger in the phototransduction cascade. Reduced levels of cytosolic cGMP cause cyclic nucleotide gated channels to close, preventing further influx of Na+ and Ca2+.
  1. A light photon interacts with the retinal in a photoreceptor. The retinal undergoes isomerisation, changing from the 11-cis to all-trans configuration.
  2. Retinal no longer fits into the opsin binding site.
  3. Opsin therefore undergoes a conformational change to metarhodopsin II.
  4. Metarhodopsin II is unstable and splits, yielding opsin and all-trans retinal.
  5. The opsin activates the regulatory protein transducin. This causes transducin to dissociate from its bound GDP, and bind GTP, then the alpha subunit of transducin dissociates from the beta and gamma subunits, with the GTP still bound to the alpha subunit.
  6. The alpha subunit-GTP complex activates phosphodiesterase.
  7. Phosphodiesterase breaks down cGMP to 5'-GMP. This lowers the concentration of cGMP and therefore the sodium channels close.
  8. Closure of the sodium channels causes hyperpolarization of the cell due to the ongoing potassium current.
  9. Hyperpolarization of the cell causes voltage-gated calcium channels to close.
  10. As the calcium level in the photoreceptor cell drops, the amount of the neurotransmitter glutamate that is released by the cell also drops. This is because calcium is required for the glutamate-containing vesicles to fuse with cell membrane and release their contents.
  11. A decrease in the amount of glutamate released by the photoreceptors causes depolarization of On center bipolar cells (rod and cone On bipolar cells) and hyperpolarization of cone Off bipolar cells.
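The numbered steps above can be condensed into a toy model that tracks the rod's membrane potential as a function of remaining cGMP. The -40 mV dark potential and -70 mV light response are the figures given in the text; the linear interpolation between them and the normalised glutamate measure are illustrative assumptions.

```python
# Toy model of the rod light response, following the numbered steps:
# light -> PDE -> less cGMP -> channels close -> hyperpolarisation
# -> less glutamate release.

DARK_POTENTIAL_MV = -40.0   # high dark cGMP keeps cation channels open
LIGHT_POTENTIAL_MV = -70.0  # channels closed, K+ current hyperpolarises

def membrane_potential(cgmp_fraction):
    """Potential as a function of remaining cGMP (1.0 = dark level).
    Linear interpolation is an assumption, not real channel kinetics."""
    cgmp_fraction = max(0.0, min(1.0, cgmp_fraction))
    return LIGHT_POTENTIAL_MV + (DARK_POTENTIAL_MV - LIGHT_POTENTIAL_MV) * cgmp_fraction

def glutamate_release(cgmp_fraction):
    """Normalised release: maximal in the dark, falling with hyperpolarisation."""
    return max(0.0, min(1.0, cgmp_fraction))

# Darkness: full cGMP, -40 mV, maximal glutamate release.
print(membrane_potential(1.0), glutamate_release(1.0))
# Bright light: cGMP hydrolysed, cell sits at -70 mV, release stops.
print(membrane_potential(0.0), glutamate_release(0.0))
```

Note the sign convention: unlike most neurons, the photoreceptor signals light by hyperpolarising and *reducing* transmitter release, which is what lets ON bipolar cells depolarise in response.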

Without visible EM stimulation, rod cells, containing a cocktail of ions, proteins and other molecules, have a membrane potential difference of around -40 mV. Compared to other nerve cells, this is quite high (typically -65 mV). In this state, the neurotransmitter glutamate is continuously released from the axon terminals and absorbed by the neighbouring bipolar cells. With incoming visible EM radiation and the previously mentioned cascade reaction, the potential difference drops to -70 mV. This hyper-polarisation of the cell causes a reduction in the amount of released glutamate, thereby affecting the activity of the bipolar cells, and subsequently the following steps in the visual pathway.

Similar processes exist in the cone-cells and in photosensitive ganglion cells, but make use of different opsins. Photopsin I through III (yellowish-green, green and blue-violet respectively) are found in the three different cone cells and melanopsin (blue) can be found in the photosensitive ganglion cells.

Processing Signals in the Retina

Receptive field.png

Different bipolar cells react differently to the changes in released glutamate. The so-called ON and OFF bipolar cells form the direct signal flow from cones to bipolar cells: the ON bipolar cells depolarise under visible EM stimulation and the corresponding ON ganglion cells are activated, while the OFF bipolar cells are hyper-polarised by the stimulation and the OFF ganglion cells are inhibited. This is the basic pathway of the direct signal flow. The lateral signal flow starts from the rods and proceeds to the rod bipolar cells and then to the amacrine cells; the OFF bipolar cells are inhibited by the rod-amacrine cells, while the ON bipolar cells are stimulated via an electrical synapse. After these steps, the signal arrives at the ON or OFF ganglion cells, completing the pathway of the lateral signal flow.

In ON ganglion cells, action potentials (APs) are triggered by the visible EM stimulus. The AP frequency increases when the sensor potential increases; in other words, the AP frequency depends on the amplitude of the sensor potential. The region in which stimulatory and inhibitory effects influence a ganglion cell's AP frequency is called its receptive field (RF). Around the ganglion cells, the RF is usually composed of two regions: a central zone and a ring-like peripheral zone. They are distinguishable during visible EM adaptation. Stimulation of the central zone leads to an increase in AP frequency, while stimulation of the peripheral zone decreases it; when the light source is turned off, excitation occurs. The name ON field (central field ON) refers to this kind of region. The RF of the OFF ganglion cells acts the opposite way and is therefore called an OFF field (central field OFF). The RFs are organised by the horizontal cells: a stimulus on the peripheral region is inverted and transmitted to the central region, where the so-called stimulus contrast is formed. This mechanism makes the dark seem darker and the light brighter. If the whole RF is exposed to light, the response of the central region predominates.
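The centre-surround organisation described above is commonly modelled as a difference of Gaussians: a narrow excitatory centre minus a broad inhibitory surround. This is a standard sketch of that model; the sigma values and the surround weight are illustrative assumptions.

```python
import math

# Difference-of-Gaussians model of an ON-centre receptive field:
# narrow excitatory centre minus broad inhibitory surround.
SIGMA_CENTER, SIGMA_SURROUND = 1.0, 3.0  # assumed widths

def gaussian(r, sigma):
    return math.exp(-(r * r) / (2 * sigma * sigma))

def on_center_response(r):
    """Sensitivity at distance r from the RF centre:
    positive near the centre, negative in the surround ring."""
    return gaussian(r, SIGMA_CENTER) - 0.5 * gaussian(r, SIGMA_SURROUND)

# Light on the centre raises the firing rate...
print(on_center_response(0.0))
# ...while light confined to the surround suppresses it.
print(on_center_response(4.0))
```

An OFF-centre field is simply the sign-flipped profile. Because centre and surround nearly cancel under uniform illumination, the model also reproduces the contrast-enhancing behaviour described in the paragraph above.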

Signal Transmission to the Cortex

As mentioned previously, axons of the ganglion cells converge at the optic disk of the retina, forming the optic nerve. These fibres are positioned inside the bundle in a specific order. Fibres from the macular zone of the retina are in the central portion, and those from the temporal half of the retina take up the periphery. A partial decussation, or crossing, occurs once these fibres are outside the eye cavity. The fibres from the nasal halves of each retina cross to the opposite side and extend to the brain, while those from the temporal halves remain uncrossed. This partial crossover is called the optic chiasma, and the optic nerves past this point are called optic tracts, mainly to distinguish them from single-retina nerves. The function of the partial crossover is to transmit the right-hand visual field produced by both eyes to the left-hand half of the brain only, and vice versa. Therefore the information from the right half of the body, and the right visual field, is all transmitted to the left-hand part of the brain when it reaches the posterior part of the fore-brain (diencephalon).
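The routing rule of the optic chiasma is small enough to state as code. This is a minimal sketch of the rule described above: nasal fibres cross, temporal fibres stay uncrossed, so each visual hemifield ends up in the opposite hemisphere.

```python
# Partial decussation at the optic chiasm: nasal fibres cross,
# temporal fibres remain uncrossed.

def hemisphere_for(eye, retinal_half):
    """Which side of the brain receives a fibre from (eye, retinal half)."""
    if retinal_half == "nasal":
        return "right" if eye == "left" else "left"   # crossed
    return eye                                        # temporal: uncrossed

# The right visual field projects onto the temporal retina of the left
# eye and the nasal retina of the right eye; both routes end up in the
# left hemisphere, exactly as described in the text.
print(hemisphere_for("left", "temporal"))
print(hemisphere_for("right", "nasal"))
```

Running the rule for all four (eye, retinal half) combinations confirms that each hemisphere receives one crossed and one uncrossed input, covering the opposite visual hemifield from both eyes.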

The pathway to the central cortex

The information relay between the fibres of the optic tracts and the nerve cells occurs in the lateral geniculate bodies, the central part of the visual signal processing, located in the thalamus of the brain. From here the information is passed to the nerve cells in the occipital cortex of the corresponding side of the brain.

Connections from the retina to the brain can be separated into a "parvocellular pathway" and a "magnocellular pathway". The parvocellular pathway originates in midget cells in the retina, and signals color and fine detail; magnocellular pathway starts with parasol cells, and detects fast moving stimuli.

Signals from standard digital cameras correspond approximately to those of the parvocellular pathway. To simulate the responses of parvocellular pathways, researchers have been developing neuromorphic sensory systems, which try to mimic spike-based computation in neural systems. These systems use a scheme called "address-event representation" for signal transmission in neuromorphic electronic systems (Liu and Delbruck 2010 [1]).

Anatomically, the retinal Magno and Parvo ganglion cells project to the 2 ventral magnocellular layers and the 4 dorsal parvocellular layers of the Lateral Geniculate Nucleus (LGN), respectively. Each of the six LGN layers receives input from either the ipsilateral or the contralateral eye: the ganglion cells of the left eye cross over and project to layers 1, 4 and 6 of the right LGN, while the right-eye ganglion cells project (uncrossed) to its layers 2, 3 and 5. In this way the information from the right and left eyes is kept separated.
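The layer wiring just described is a fixed lookup, which makes it easy to express as a small table. This sketch restates only what the paragraph says: contralateral eye to layers 1, 4 and 6, ipsilateral eye to layers 2, 3 and 5, with layers 1-2 magnocellular and 3-6 parvocellular.

```python
# LGN layer wiring as described in the text.
CONTRALATERAL_LAYERS = {1, 4, 6}
IPSILATERAL_LAYERS = {2, 3, 5}

def eye_for_layer(lgn_side, layer):
    """Which eye drives a given layer of the left or right LGN."""
    other = {"left": "right", "right": "left"}
    if layer in CONTRALATERAL_LAYERS:
        return other[lgn_side]
    if layer in IPSILATERAL_LAYERS:
        return lgn_side
    raise ValueError("LGN has layers 1-6")

def pathway_for_layer(layer):
    """Layers 1-2 are magnocellular (ventral), 3-6 parvocellular (dorsal)."""
    return "magnocellular" if layer <= 2 else "parvocellular"

# Left-eye ganglion cells cross over to layers 1, 4 and 6 of the right LGN:
print(eye_for_layer("right", 4))
print(pathway_for_layer(4))
```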

Although human vision is combined from the two halves of the retina and the signal is processed by the opposite cerebral hemispheres, the visual field is perceived as a smooth and complete unit. Hence the two visual cortical areas are thought of as being intimately connected. This connection, called the corpus callosum, is made of neurons, axons and dendrites. Because the dendrites make synaptic connections to the related points of the hemispheres, electrical stimulation of a point on one hemisphere indicates stimulation of the interconnected point on the other hemisphere. The only exception to this rule is the primary visual cortex.

The synapses of the optic tract are made in the respective layers of the lateral geniculate body. The axons of these third-order nerve cells then pass up to the calcarine fissure in each occipital lobe of the cerebral cortex. Because bands of white fibres and axons from the nerve cells in the retina pass through it, it is called the striate cortex, which incidentally is our primary visual cortex, sometimes known as V1. At this point, impulses from the separate eyes converge onto common cortical neurons, which enables complete input from both eyes in one region to be used for perception and comprehension. Pattern recognition is a very important function of this particular part of the brain, with lesions causing problems with visual recognition or blindsight.

Because of the ordered manner in which the optic tract fibres pass information to the lateral geniculate bodies and onwards to the striate area, stimulation of a single point on the retina produces an electrical response in a correspondingly small region of both the lateral geniculate body and the striate cortex. This is an obvious point-to-point way of signal processing. If the whole retina is stimulated, responses occur across both lateral geniculate bodies and the grey matter of the striate cortex. It is thus possible to map this brain region to the retinal fields, or more usually the visual fields.

Any further steps in this pathway are beyond the scope of this book. Rest assured that many further levels and centres exist, focusing on particular specific tasks, for example colour, orientation, spatial frequencies, emotions etc.

Information Processing in the Visual System

Equipped with a firmer understanding of some of the more important concepts of signal processing in the visual system, comprehension or perception of the processed sensory information is the last important piece in the puzzle. Visual perception is the process of translating information received by the eyes into an understanding of the external state of things. It makes us aware of the world around us and allows us to understand it better. Based on visual perception we learn patterns which we then apply later in life, and we make decisions based on this and the obtained information. In other words, our survival depends on perception. The field of Visual Perception has been divided into different subfields, because the processing is too complex and requires different specialized mechanisms to perceive what is seen. These subfields include Color Perception, Motion Perception, Depth Perception, Face Recognition, etc.

Deep Hierarchies in the Primate Visual Cortex

Deep hierarchies in the visual system

Despite the ever-increasing computational power of electronic systems, there are still many tasks where animals and humans are vastly superior to computers – one of them being the perception and contextualization of information. The classical computer, either the one in your phone or a supercomputer taking up a whole room, is in essence a number-cruncher. It can perform an incredible number of calculations in a minuscule amount of time. What it lacks is the ability to create abstractions of the information it is working with. If you attach a camera to your computer, the picture it "perceives" is just a grid of pixels, a 2-dimensional array of numbers. A human would immediately recognize the geometry of the scene, the objects in the picture, and maybe even the context of what's going on. This ability of ours is provided by dedicated biological machinery – the visual system of the brain. It processes everything we see in a hierarchical way, starting from simpler features of the image and moving to more complex ones, all the way to classification of objects into categories. Hence the visual system is said to have a deep hierarchy. The deep hierarchy of the primate visual system has inspired computer scientists to create models of artificial neural networks that also feature several layers, where each of them creates higher generalizations of the input data.

Approximately half of the human neocortex is dedicated to vision. The processing of visual information happens over at least 10 functional levels. The neurons in the early visual areas extract simple image features over small local regions of visual space. As the information gets transmitted to higher visual areas, neurons respond to increasingly complex features. With higher levels of information processing the representations become more invariant – less sensitive to the exact feature size, rotation or position. In addition, the receptive field size of neurons in higher visual areas increases, indicating that they are tuned to more global image features. This hierarchical structure allows for efficient computing – different higher visual areas can use the same information computed in the lower areas. The generic scene description that is made in the early visual areas is used by other parts of the brain to complete various different tasks, such as object recognition and categorization, grasping, manipulation, movement planning etc.
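The growth of receptive-field size with depth can be illustrated with a toy pooling hierarchy. This is a minimal sketch, not a model of real cortical computation: each level max-pools over a window of the level below (window size 2 is an arbitrary assumption), so each remaining unit "sees" a progressively larger span of the input.

```python
# Toy hierarchy: each level pools over the level below, so receptive
# field size (in input samples) grows with depth while resolution shrinks,
# loosely analogous to the increase in RF size in higher visual areas.

def pool_layer(values, window=2):
    """Max-pool adjacent values: halves resolution, doubles RF size."""
    return [max(values[i:i + window])
            for i in range(0, len(values) - window + 1, window)]

signal = [0, 1, 0, 3, 0, 0, 2, 0]   # made-up 1-D "retinal" activity
hierarchy = [signal]
while len(hierarchy[-1]) > 1:
    hierarchy.append(pool_layer(hierarchy[-1]))

# After three pooling levels a single unit summarises the whole input.
for level, layer in enumerate(hierarchy):
    print(level, layer)
```

The point of the sketch is the structural property, shared with the cortex, that higher levels compute more global, lower-resolution summaries that many downstream tasks can reuse.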

Sub-cortical vision

The neural processing of visual information starts already before any of the cortical structures. Photoreceptors on the retina detect light and send signals to retinal ganglion cells. The receptive field size of a photoreceptor is about one 100th of a degree (a one-degree receptive field is roughly the size of your thumb when you have your arm stretched out in front of you). The number of inputs to a ganglion cell, and therefore its receptive field size, depends on the location – in the center of the retina it receives signals from as few as five receptors, while in the periphery a single cell can have several thousand inputs. This implies that the highest spatial resolution is in the center of the retina, also called the fovea. Due to this property, primates possess a gaze control mechanism that directs the eyesight so that the features of interest project onto the fovea.
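The convergence figures above (about five receptors per ganglion cell at the fovea, several thousand in the periphery) can be captured in a toy model. Only those two anchor values come from the text; the exponential growth law and its rate constant are illustrative assumptions.

```python
import math

# Toy model of retinal convergence vs. eccentricity. Anchor values
# (~5 inputs at the fovea, thousands in the periphery) are from the
# text; the exponential form is an assumption.

def receptors_per_ganglion(eccentricity_deg):
    """Approximate number of photoreceptors feeding one ganglion cell."""
    return 5 * math.exp(eccentricity_deg / 10.0)

def relative_acuity(eccentricity_deg):
    """More convergence means coarser resolution (simplifying assumption)."""
    return 1.0 / receptors_per_ganglion(eccentricity_deg)

print(receptors_per_ganglion(0))    # foveal centre: ~5 inputs
print(receptors_per_ganglion(60))   # far periphery: thousands of inputs
```

The steep falloff in `relative_acuity` is the reason for the gaze-control mechanism mentioned above: to inspect a detail, the eye must rotate so that the detail lands on the fovea.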

Ganglion cells are selectively tuned to detect various features of the image, such as luminance contrast, color contrast, and direction and speed of movement. All of these features are the primary information used further up the processing pipeline. If there are visual stimuli that are not detectable by ganglion cells, then they are also not available for any cortical visual area.

Ganglion cells project to a region in thalamus called lateral geniculate nucleus (LGN), which in turn relays the signals to the cortex. There is no significant computation known to happen in LGN – there is almost a one-to-one correspondence between retinal ganglion and LGN cells. However, only 5% of the inputs to LGN come from the retina – all the other inputs are cortical feedback projections. Although the visual system is often regarded as a feed-forward system, the recurrent feedback connections as well as lateral connections are a common feature seen throughout the visual cortex. The role of the feedback is not yet fully understood but it is proposed to be attributed to processes like attention, expectation, imagination and filling-in the missing information.

Cortical vision

Main areas of the visual system

The visual cortex can be divided into three large parts: the occipital part, which receives input from the LGN and sends outputs to the dorsal and ventral streams; the dorsal pathway, which is involved in the analysis of space and in action planning; and the ventral pathway, which is involved in object recognition and categorization. The occipital part includes the areas V1-V4 and MT, which process different aspects of visual information and give rise to a generic scene representation.

V1 is the first cortical area that processes visual information. It is sensitive to edges, gratings, line-endings, motion, color and disparity (the angular difference between the projections of a point onto the left and right retinas). The most straightforward example of the hierarchical bottom-up processing is the linear combination of the inputs from several ganglion cells with center-surround receptive fields to create a representation of a bar. This is done by the simple cells of V1 and was first described by the prominent neuroscientists Hubel and Wiesel. This type of information integration implies that the simple cells are sensitive to the exact location of the bar and have a relatively small receptive field. The complex cells of V1 receive inputs from the simple cells, and while also responding to linear oriented patterns they are not sensitive to the exact position of the bar and have a larger receptive field. The computation present in this step could be a MAX-like operation which produces responses similar in amplitude to the larger of the responses pertaining to the individual stimuli. Some simple and complex cells can also detect the end of a bar, and a fraction of V1 cells are also sensitive to local motion within their respective receptive fields.
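The simple-cell/complex-cell scheme just described can be sketched in a few lines: a "simple cell" linearly sums aligned inputs at one fixed position, and a "complex cell" takes the MAX over simple cells at shifted positions, trading positional precision for invariance. The 1-D input and unit weights are illustrative assumptions, not a model of real V1 tuning.

```python
# Sketch of Hubel & Wiesel's hierarchy on a 1-D "image row".

def simple_cell(image_row, position, width=3):
    """Linear sum over a bar-shaped region at one fixed position:
    responds only if the bar lands exactly there."""
    return sum(image_row[position:position + width])

def complex_cell(image_row, width=3):
    """MAX-like pooling over simple cells at all positions:
    responds to the bar wherever it appears."""
    return max(simple_cell(image_row, p, width)
               for p in range(len(image_row) - width + 1))

bar_left  = [1, 1, 1, 0, 0, 0, 0, 0]
bar_right = [0, 0, 0, 0, 0, 1, 1, 1]

# The simple cell at position 0 only fires for the bar at "its" position...
print(simple_cell(bar_left, 0), simple_cell(bar_right, 0))
# ...while the complex cell responds equally to both bars.
print(complex_cell(bar_left), complex_cell(bar_right))
```

The same sum-then-MAX motif, repeated over layers, is the core idea behind the hierarchical models inspired by the visual cortex.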

Area V2 features more sophisticated contour representation, including texture-defined contours, illusory contours and contours with border ownership. V2 also builds upon the absolute disparity detection in V1 and features cells that are sensitive to relative disparity, which is the difference between the absolute disparities of two points in space. Area V4 receives inputs from V2 and area V3, but very little is known about the computation taking place in V3. Area V4 features neurons that are sensitive to contours with different curvature and to vertices with particular angles. Another important feature is the coding of luminance-invariant hue. This is in contrast to V1, where neurons respond to color opponency along the two principal axes (red-green and yellow-blue) rather than the actual color. V4 further outputs to the ventral stream, to the inferior temporal cortex (IT), which has been shown through lesion studies to be essential for object discrimination.

Inferior temporal cortex: object discrimination

Stimulus reduction in area TE

Inferior temporal cortex (IT) is divided into two areas: TEO and TE. Area TEO integrates information about the shapes and relative positions of multiple contour elements and features mostly cells which respond to simple combinations of features. The receptive field size of TEO neurons is about 3-5 degrees. Area TE features cells with significantly larger receptive fields (10-20 degrees) which respond to faces, hands and complex feature configurations. Cells in TE respond to visual features that are a simpler generalization of the object of interest but more complex than simple bars or spots. This was shown using a stimulus-reduction method by Tanaka et al. where first a response to an object is measured and then the object is replaced by simpler representations until the critical feature that the TE neurons are responding to is narrowed down.

It appears that the neurons in IT pull together various features of medium complexity from lower levels in the ventral stream to build models of object parts. The neurons in TE that are selective to specific objects have to fulfil two seemingly contradictory requirements – selectivity and invariance. They have to distinguish between different objects by means of sensitivity to features in the retinal images. However, the same object can be viewed from different angles and distances under different lighting conditions, yielding highly dissimilar retinal images of the same object. To treat all these images as equivalent, invariant features must be derived that are robust against certain transformations, such as changes in position, illumination, size on the retina, etc. Neurons in area TE show invariance to position and size as well as to partial occlusion, position-in-depth and illumination direction. Rotation in depth has been shown to have the weakest invariance, with the notable exception of human faces.

Object categories are not yet explicitly present in area TE – a neuron might typically respond to several but not all exemplars of the same category (e.g., images of trees) and it might also respond to exemplars of different categories (e.g., trees and non-trees). Object recognition and classification most probably involves sampling from a larger population of TE neurons as well as receiving inputs from additional brain areas, e.g., those that are responsible for understanding the context of the scene. Recent readout experiments have demonstrated that statistical classifiers (e.g. support vector machines) can be trained to classify objects based on the responses of a small number of TE neurons. Therefore, a population of TE neurons in principle can reliably signal object categories by their combined activity. Interestingly, there are also reports on highly selective neurons in medial temporal lobe that respond to very specific cues, e.g., to the tower of Pisa in different images or to a particular person’s face.
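The population readout described above can be sketched with a nearest-centroid classifier, a simpler stand-in for the support vector machines used in actual readout studies. The "neuron" response vectors below are invented for illustration only.

```python
# Toy population readout: classify a stimulus from the combined activity of
# a small simulated "TE population" (4 neurons, made-up firing rates).

def centroid(vectors):
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify(response, centroids):
    """Assign the category whose mean population response is closest."""
    return min(centroids, key=lambda label: squared_distance(response, centroids[label]))

# Simulated firing rates of 4 neurons for a few "tree" and "face" stimuli.
trees = [[8, 2, 5, 1], [7, 3, 6, 2], [9, 1, 5, 1]]
faces = [[2, 9, 1, 6], [1, 8, 2, 7], [3, 9, 1, 5]]
centroids = {"tree": centroid(trees), "face": centroid(faces)}

print(classify([8, 2, 6, 1], centroids))  # → tree
print(classify([2, 8, 1, 6], centroids))  # → face
```

Note that no single simulated neuron is a "tree detector"; the category only becomes reliable from the combined activity of the population, which is the point made above.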

Learning in the Visual System

Learning can alter the visual feature selectivity of neurons, with the effect of learning becoming stronger at higher hierarchical levels. There is no known evidence of learning in the retina, and the orientation maps in V1 also seem to be largely genetically predetermined. However, practising orientation identification improves orientation coding in V1 neurons by increasing the slope of the tuning curve. Similar but larger effects have been seen in V4. In area TE, even relatively little visual training has noticeable physiological effects on visual perception, at the single-cell level as well as in fMRI. For example, morphing two objects into each other increases their perceived similarity. Overall it seems that even the adult visual cortex is considerably plastic, and the level of plasticity can be significantly increased, e.g., by administering specific drugs or by living in an enriched environment.

Deep Neural Networks

Similarly to the deep hierarchy of the primate visual system, deep learning architectures attempt to model high-level abstractions of the input data using multiple levels of non-linear transformations. The model proposed by Hubel and Wiesel, in which information is integrated and propagated in a cascade from the retina and LGN to the simple and complex cells of V1, inspired one of the first deep learning architectures, the neocognitron – a multilayered artificial neural network model. It was used for different pattern recognition tasks, including the recognition of handwritten characters. However, training the network took a long time (on the order of days), and after its inception in the 1980s deep learning received little attention until the mid-2000s, with the abundance of digital data and the invention of faster training algorithms.

Deep neural networks have proved very effective in tasks that not so long ago seemed possible only for humans to perform, such as recognizing the faces of particular people in photos, understanding human speech (to some extent) and translating text from foreign languages. Furthermore, they have proven to be of great assistance in industry and science, for instance in searching for potential drug candidates, mapping real neural networks in the brain and predicting the functions of proteins.

It must be noted that deep learning is only very loosely inspired by the brain and is much more an achievement of computer science and machine learning than of neuroscience. The basic parallels are that deep neural networks are composed of units that integrate their inputs in a non-linear manner (neurons) and send signals to each other (synapses), and that successive levels form increasingly abstract representations of the data. The learning algorithms and the mathematical descriptions of the “neurons” used in deep learning are very different from the processes actually taking place in the brain. Therefore, research in deep learning, while giving a huge push to more sophisticated artificial intelligence, can give only limited insights about the brain.
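The two parallels named above – non-linear units and stacked layers – can be shown with a deliberately tiny feed-forward network. The weights are hand-set (no learning involved), and the XOR task is just a classic toy example, not anything from the visual-system literature.

```python
def relu(x):
    """A simple non-linearity, playing the role of a 'neuron' activation."""
    return max(0.0, x)

def xor_net(x1, x2):
    # Layer 1: two hidden units, each a weighted sum passed through relu.
    h1 = relu(x1 + x2)         # fires for any active input
    h2 = relu(x1 + x2 - 1.0)   # fires only when both inputs are active
    # Layer 2: the output unit combines the hidden representations.
    return h1 - 2.0 * h2

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # 0 0 0.0 / 0 1 1.0 / 1 0 1.0 / 1 1 0.0
```

No single layer can compute XOR with a linear weighting; the stacked non-linear layers can, which is the essence of why depth matters.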


Papers on the deep hierarchies in the visual system
  • Kruger, N.; Janssen, P.; Kalkan, S.; Lappe, M.; Leonardis, A.; Piater, J.; Rodriguez-Sanchez, A. J.; Wiskott, L. (August 2013). "Deep Hierarchies in the Primate Visual Cortex: What Can We Learn for Computer Vision?". IEEE Transactions on Pattern Analysis and Machine Intelligence 35 (8): 1847–1871. doi:10.1109/TPAMI.2012.272. 
  • Poggio, Tomaso; Riesenhuber, Maximilian (1 November 1999). "Hierarchical models of object recognition in cortex". Nature Neuroscience 2 (11): 1019–1025. doi:10.1038/14819. 
Stimulus reduction experiment
Evidence on learning in the visual system
  • Li, Nuo; DiCarlo, James J. (23 September 2010). "Unsupervised Natural Visual Experience Rapidly Reshapes Size-Invariant Object Representation in Inferior Temporal Cortex". Neuron 67 (6): 1062–1075. doi:10.1016/j.neuron.2010.08.029. 
  • Raiguel, S.; Vogels, R.; Mysore, S. G.; Orban, G. A. (14 June 2006). "Learning to See the Difference Specifically Alters the Most Informative V4 Neurons". Journal of Neuroscience 26 (24): 6589–6602. doi:10.1523/JNEUROSCI.0457-06.2006. 
  • Schoups, A; Vogels, R; Qian, N; Orban, G (2 August 2001). "Practising orientation identification improves orientation coding in V1 neurons.". Nature 412 (6846): 549-53. PMID 11484056. 
A recent and accessible overview of the status quo of the deep learning research
  • Jones, Nicola (8 January 2014). "Computer science: The learning machines". Nature 505 (7482): 146–148. doi:10.1038/505146a. 

Motion Perception

Motion Perception is the process of inferring the speed and direction of moving objects. Area V5 in humans and area MT (middle temporal) in other primates are responsible for the cortical perception of motion. Area V5 is part of the extrastriate cortex, the region of the occipital lobe next to the primary visual cortex. The function of area V5 is to detect the speed and direction of visual stimuli, and to integrate local visual motion signals into global motion. Area V1, or primary visual cortex, is located in the occipital lobe of both hemispheres; it performs the first stage of cortical processing of visual information and contains a complete map of the visual field covered by the eyes. The difference between area V5 and area V1 is that area V5 can integrate the motion of local signals, or individual parts of an object, into the global motion of an entire object, whereas area V1 responds to local motion occurring within each receptive field. The estimates from these many V1 neurons are integrated in area V5.

Movement is defined as changes in retinal illumination over space and time. Motion signals are classified into First order motions and Second order motions. These motion types are briefly described in the following paragraphs.

Example of a "Beta movement".

First-order motion perception refers to the motion perceived when two or more visual stimuli switch on and off over time and produce different motion percepts. First order motion is also termed "apparent motion", and it is used in television and film. An example of this is the "Beta movement", an illusion in which fixed images seem to move even though they do not move in reality. These images give the appearance of motion because they change faster than the eye can resolve. This optical illusion happens because the human visual system responds to changes of light at about ten cycles per second, so any change faster than this rate is registered as continuous motion, not as separate images.

Second order motion refers to motion that occurs when a moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in the luminance or motion energy of the image. Evidence suggests that the early processing of First order motion and Second order motion is carried out by separate pathways. Second order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. Second-order motion also produces a weaker motion aftereffect. First and second-order signals are combined in area V5.

In this chapter, we will analyze the concepts of Motion Perception and Motion Analysis, and explain why these terms should not be used interchangeably. We will analyze the mechanisms by which motion is perceived, namely Motion Sensors and Feature Tracking. Three main theoretical models attempt to describe the function of neuronal motion sensors, and experimental tests have been conducted to check whether these models are accurate. Unfortunately, the results of these tests are inconclusive: no single one of these models describes the functioning of Motion Sensors entirely, although each simulates certain of their features. Some properties of these sensors are described. Finally, this chapter shows some motion illusions, which demonstrate that our sense of motion can be misled by static external factors that stimulate motion sensors in the same way as motion.

Motion Analysis and Motion Perception

The concepts of Motion Analysis and Motion Perception are often confused as interchangeable. Motion Perception and Motion Analysis are important to each other, but they are not the same.

Motion Analysis refers to the mechanisms by which motion signals are processed. Just as Motion Perception does not necessarily depend on signals generated by the motion of images on the retina, Motion Analysis may or may not lead to motion perception. An example of this is Vection, which occurs when a stationary person perceives herself as moving while it is actually the object she observes that is in motion. Vection shows that the motion of an object can be analyzed even though it is not perceived as motion coming from the object. This definition of Motion Analysis suggests that motion is a fundamental image property: it is analyzed at every point in the visual field, and the results of this analysis are used to derive perceptual information.

Motion Perception refers to the process of acquiring perceptual knowledge about the motion of objects and surfaces in an image. Motion is perceived either by dedicated local sensors in the retina or by feature tracking. Local motion sensors are specialized neurons sensitive to motion, analogous to the specialized sensors for color. Feature tracking is an indirect way to perceive motion, and consists of inferring motion from changes in the retinal position of objects over time. It is also referred to as third order motion analysis. Feature tracking works by focusing attention on a particular object and observing how its position changes over time.
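The feature-tracking idea – inferring motion from the change in a tracked feature's position across frames rather than from a dedicated motion sensor – can be sketched minimally. The observed positions below are made-up sample data.

```python
def track_velocity(positions, dt=1.0):
    """Average displacement per frame of a tracked feature."""
    steps = [(b - a) / dt for a, b in zip(positions, positions[1:])]
    return sum(steps) / len(steps)

# A feature observed at these x-positions over five frames:
observed = [0.0, 2.1, 3.9, 6.0, 8.0]
print(track_velocity(observed))  # ≈ 2.0 units per frame
```

Nothing here measures luminance change directly; motion is inferred purely from identified feature locations, which is why this mechanism has a minimum distance threshold tied to localization precision.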

Motion Sensors

Detection of motion is the first stage of visual processing, and it happens thanks to specialized neural processes, which respond to information regarding local changes of intensity of images over time. Motion is sensed independently of other image properties at all locations in the image. It has been proven that motion sensors exist, and they operate locally at all points in the image. Motion sensors are dedicated neuronal sensors located in the retina that are capable of detecting a motion produced by two brief and small light flashes that are so close together that they could not be detected by feature tracking. There exist three main models that attempt to describe the way that these specialized sensors work. These models are independent of one another, and they try to model specific characteristics of Motion Perception. Although there is not sufficient evidence to support that any of these models represent the way the visual system (motion sensors particularly) perceives motion, they still correctly model certain functions of these sensors.

Two different mechanisms for motion detection. Left) A "Reichardt detector" consists of two mirror-symmetrical subunits. In each subunit, the luminance values measured at two adjacent points are multiplied (M) with each other after one of them is delayed by a low-pass filter with time constant τ. The resulting output signals of the multipliers are then subtracted. Right) In the gradient detector, the temporal luminance gradient measured at one photoreceptor (∂I/∂t) is divided by the spatial luminance gradient (∂I/∂x). Here, the spatial gradient is approximated by the difference between the luminance values at two adjacent points.

The Reichardt Detector

The Reichardt Detector is used to model how motion sensors respond to First order motion signals. When an object moves from point A in the visual field to point B, two signals are generated: one before the movement began and another after the movement has completed. This model perceives the motion by detecting a change in luminance at one point on the retina and correlating it with a change in luminance at a nearby point after a short delay. The Reichardt Detector operates on the principle of correlation (a statistical relation involving dependency). It interprets a motion signal through the spatiotemporal correlation of luminance signals at neighboring points, exploiting the fact that two receptive fields at different points on the trajectory of a moving object receive time-shifted versions of the same signal: as a luminance pattern moves along an axis, the signal at one point on the axis is a time-shifted version of an earlier signal on the axis. The Reichardt Detector model has two spatially separated neighboring detectors. Their output signals are multiplied (correlated) in the following way: one signal is multiplied by a time-shifted version of the other. The same procedure is then repeated for the reverse direction of motion (the signal that was time-shifted becomes the first signal and vice versa). The difference between these two multiplications gives the speed of motion. The response of the detector depends on the stimulus' phase, contrast and speed, so many detectors tuned to different speeds are necessary to encode the true speed of the pattern. The most compelling experimental evidence for this kind of detector comes from studies of direction discrimination of barely visible targets.
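A minimal 1-D Reichardt correlator following the description above can be written in a few lines. As a simplification, the low-pass delay filter is replaced by a one-frame shift, and the stimulus values are illustrative.

```python
def reichardt(left, right):
    """left, right: luminance time series at two adjacent points.
    Positive output: motion from left to right; negative: right to left."""
    response = 0.0
    for t in range(1, len(left)):
        # delayed left signal correlated with the current right signal,
        # minus the mirror-symmetric subunit (delayed right with current left)
        response += left[t - 1] * right[t] - right[t - 1] * left[t]
    return response

# A bright spot passing first the left point, then the right point:
spot_at_left = [0, 1, 0, 0]
spot_at_right = [0, 0, 1, 0]
print(reichardt(spot_at_left, spot_at_right))   # positive: rightward motion
print(reichardt(spot_at_right, spot_at_left))   # negative: leftward motion
```

The same pair of flashes yields opposite signs depending on their temporal order, which is exactly the direction selectivity the correlation scheme is meant to capture.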

Motion-Energy Filtering

The Motion Energy Filter is a model of motion sensors based on the principle of phase-invariant filters. This model builds spatio-temporal filters oriented in space-time to match the structure of moving patterns. It consists of separable filters, whose spatial profiles keep the same shape over time but are scaled by the value of the temporal filters. Motion Energy Filters match the structure of moving patterns by adding together separable filters. For each direction of motion, two space-time filters are generated: one symmetric (bar-like) and one asymmetric (edge-like). The sum of the squares of these filters is called the motion energy, and the difference in this signal between the two directions is called the opponent energy. This result is then divided by the squared output of another filter tuned to static contrast, to take into account the effect of contrast on the motion. Motion Energy Filters can model a number of motion phenomena, but they produce a phase-independent measurement that increases with speed without giving a reliable value of speed.
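The quadrature-pair and opponent-energy ideas can be sketched numerically (assuming NumPy is available). The filter size, frequencies and the drifting-grating stimulus are all arbitrary illustrative choices, and the contrast-normalization stage is omitted for brevity.

```python
import numpy as np

def st_filter(fx, ft, phase, size=32, sigma=6.0):
    """A space-time oriented filter: Gaussian envelope times a drifting cosine."""
    x = np.arange(size) - size / 2
    t = np.arange(size) - size / 2
    X, T = np.meshgrid(x, t)
    envelope = np.exp(-(X**2 + T**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * (fx * X + ft * T) + phase)

def direction_energy(stimulus, fx, ft):
    """Quadrature pair (0 and 90 degree phases), squared and summed:
    the result is phase-invariant 'motion energy' for one direction."""
    even = np.sum(stimulus * st_filter(fx, ft, 0.0))
    odd = np.sum(stimulus * st_filter(fx, ft, np.pi / 2))
    return even**2 + odd**2

# Rightward-drifting grating as an (x, t) luminance pattern:
size, fx, ft = 32, 0.1, 0.1
x = np.arange(size) - size / 2
t = np.arange(size) - size / 2
X, T = np.meshgrid(x, t)
stimulus = np.cos(2 * np.pi * (fx * X - ft * T))

# Opponent energy: rightward-tuned energy minus leftward-tuned energy.
opponent = direction_energy(stimulus, fx, -ft) - direction_energy(stimulus, fx, ft)
print(opponent > 0)  # net energy in the rightward direction
```

Because the filter outputs are squared and summed over a quadrature pair, shifting the grating's phase leaves the energy essentially unchanged — the phase invariance noted above.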

Spatiotemporal Gradients

 v = \frac{dx}{dt} = -\frac{\partial I(x,t)/\partial t}{\partial I(x,t)/\partial x} = -\frac{D_t I}{D_x I}

This model of motion sensors was originally developed in the field of computer vision, and it is based on the principle that the ratio of the temporal derivative of image brightness to the spatial derivative of image brightness gives the speed of motion. It is important to note that at the peaks and troughs of the image this model does not compute an adequate answer, because the derivative in the denominator becomes zero. To handle this problem, higher-order spatial and temporal derivatives can also be analyzed. The spatiotemporal gradient approach is a good model for determining the speed of motion at all points in the image.
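The gradient equation above can be checked numerically with finite differences on a smooth 1-D pattern translating at a known speed. The pattern and the sample point are arbitrary (the sample point is chosen away from a peak, where the spatial derivative would vanish).

```python
import math

def luminance(x, t, v=0.5):
    """A smooth pattern translating at speed v (the ground truth)."""
    return math.sin(0.3 * (x - v * t))

def gradient_speed(x, t, h=1e-4):
    """v = -(dI/dt) / (dI/dx), via central finite differences."""
    dI_dt = (luminance(x, t + h) - luminance(x, t - h)) / (2 * h)
    dI_dx = (luminance(x + h, t) - luminance(x - h, t)) / (2 * h)
    return -dI_dt / dI_dx

print(round(gradient_speed(2.0, 0.0), 3))  # ≈ 0.5, recovering the true speed
```

For I(x, t) = sin(0.3(x − vt)) the ratio is exactly v wherever the spatial derivative is non-zero; at a peak of the sine both the method and the code would divide by (nearly) zero, which is the failure mode noted above.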

Motion Sensors are Orientation-Selective

One of the properties of Motion Sensors is orientation-selectivity, which constrains motion analysis to a single dimension. Motion sensors can only record motion in one dimension along an axis orthogonal to the sensor’s preferred orientation. A stimulus that contains features of a single orientation can only be seen to move in a direction orthogonal to the stimulus’ orientation. One-dimensional motion signals give ambiguous information about the motion of two-dimensional objects. A second stage of motion analysis is necessary in order to resolve the true direction of motion of a 2-D object or pattern. 1-D motion signals from sensors tuned to different orientations are combined to produce an unambiguous 2-D motion signal. Analysis of 2-D motion depends on signals from local broadly oriented sensors as well as on signals from narrowly oriented sensors.

Feature Tracking

Another way in which we perceive motion is through Feature Tracking. Feature Tracking consists of analyzing whether or not the local features of an object have changed positions, and inferring movement from this change. In this section, some features about Feature trackers are mentioned.

Feature trackers fail when a moving stimulus occurs very rapidly. Feature trackers have the advantage over Motion sensors that they can perceive movement of an object even if the movement is separated by intermittent blank intervals. They can also separate these two stages (movements and blank intervals). Motion sensors, on the other hand, would just integrate the blanks with the moving stimulus and see a continuous movement. Feature trackers operate on the locations of identified features. For that reason, they have a minimum distance threshold that matches the precision with which locations of features can be discriminated. Feature trackers do not show motion aftereffects, which are visual illusions that are caused as a result of visual adaptation. Motion aftereffects occur when, after observing a moving stimulus, a stationary object appears to be moving in the opposite direction of the previously observed moving stimulus. It is impossible for this mechanism to monitor multiple motions in different parts of the visual field and at the same time. On the other hand, multiple motions are not a problem for motion sensors, because they operate in parallel across the entire visual field.

Experiments using the properties above have led to interesting conclusions about feature trackers. Experiments with brief stimuli have shown that at high contrasts the motion of color patterns and contrast patterns is perceived by motion sensors rather than by feature trackers, whereas at low contrasts feature trackers analyze the motion of both chromatic patterns and contrast envelopes. Experiments with blank intervals have confirmed that feature tracking can operate across blank intervals in the display. Experiments in which subjects make multiple motion judgments suggest that feature tracking occurs under conscious control, and that it is the only way we have to analyze the motion of contrast envelopes in low-contrast displays. These results are consistent with the view that the motion of contrast envelopes and color patterns depends on feature tracking, except when colors are well above threshold or mean contrast is high. The main conclusion of these experiments is that it is probably feature tracking that allows the perception of contrast envelopes and color patterns.

Motion Illusions

As a consequence of the process in which Motion detection works, some static images might seem to us like they are moving. These images give an insight into the assumptions that the visual system makes, and are called visual illusions.

A famous Motion Illusion related to first order motion signals is the Phi phenomenon, an optical illusion that makes us perceive movement instead of a sequence of images. It is this phenomenon that allows us to watch movies as a continuum and not as separate images: a series of frozen images changed at a constant speed is seen as continuous movement. The Phi phenomenon should not be confused with the Beta movement: the former is a perception of pure motion between successively flashed stimuli, while the latter is the apparent movement of the stationary stimuli themselves.

Motion Illusions happen when Motion Perception, Motion Analysis and the interpretation of the resulting signals are misleading, and our visual system creates illusions about motion. These illusions can be classified according to the process that allows them to happen: illusions related to motion sensing, to 2D integration, and to 3D interpretation.

The most popular illusions concerning motion sensing are four-stroke motion, random-dot kinematograms (RDKs) and second order motion signal illusions. The most popular motion illusions concerning 2D integration are Motion Capture, Plaid Motion and Direct Repulsion. Similarly, the ones concerning 3D interpretation are Transformational Motion, Kinetic Depth, Shadow Motion, Biological Motion, Stereokinetic Motion, Implicit Figure Motion and Two-Stroke Motion. There are far more Motion Illusions, and they all reveal something interesting about human motion detection, perception and analysis mechanisms.

Open Problems

Although we still do not understand most of the specifics regarding Motion Perception, understanding the mechanisms by which motion is perceived, as well as motion illusions, can give the reader a good overview of the state of the art in the subject. Some of the open problems regarding Motion Perception are the mechanisms of formation of 3D images in global motion and the Aperture Problem.

Global motion signals from the retina are integrated to arrive at a two-dimensional global motion signal; however, it is unclear how 3D global motion is formed. The Aperture Problem occurs because each receptive field in the visual system covers only a small piece of the visual world, which leads to ambiguities in perception. The aperture problem refers to the fact that a moving contour, when observed locally, is consistent with many different possibilities of motion. This ambiguity is geometric in origin: motion parallel to the contour cannot be detected, as changes in this component of the motion do not change the image observed through the aperture. The only component that can be measured is the velocity orthogonal to the contour orientation; for that reason, the velocity of the movement could be anything from the family of motions along a line in velocity space. The aperture problem is observed not only with straight contours, but also with smoothly curved ones, since these are approximately straight when observed locally. Although the mechanisms that solve the Aperture Problem are still unknown, there exist some hypotheses about how it could be solved. For example, it might be possible to resolve the ambiguity by combining information across space or from different contours of the same object.
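The geometric core of the aperture problem is just a projection, and can be shown numerically. The contour orientation and velocity vectors below are illustrative.

```python
def normal_component(v, normal):
    """The only locally measurable quantity: the velocity v = (vx, vy)
    projected onto the contour's unit normal."""
    return v[0] * normal[0] + v[1] * normal[1]

# A vertical contour seen through an aperture: its unit normal is horizontal,
# so only the horizontal velocity component is visible.
contour_normal = (1.0, 0.0)

v1 = (1.0, 0.0)   # purely horizontal motion
v2 = (1.0, 2.0)   # same horizontal component, plus motion along the contour

print(normal_component(v1, contour_normal))  # → 1.0
print(normal_component(v2, contour_normal))  # → 1.0: locally indistinguishable
```

Any velocity of the form (1.0, s) produces the same measurement for every s, which is the "family of motions along a line in velocity space" described above.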


In this chapter, we introduced Motion Perception and the mechanisms by which our visual system detects motion. Motion Illusions showed how Motion signals can be misleading, and consequently lead to incorrect conclusions about motion. It is important to remember that Motion Perception and Motion Analysis are not the same. Motion Sensors and Feature trackers complement each other to make the visual system perceive motion.

Motion Perception is complex, and it is still an open area of research. This chapter describes models about the way that Motion Sensors function, and hypotheses about Feature trackers characteristics; however, more experiments are necessary to learn about the characteristics of these mechanisms and be able to construct models that resemble the actual processes of the visual system more accurately.

The variety of mechanisms of motion analysis and motion perception described in this chapter, as well as the sophistication of the artificial models designed to describe them, demonstrate that there is much complexity in the way the cortex processes signals from the outside environment. Thousands of specialized neurons integrate and interpret pieces of local signals to form global images of moving objects in our brain. That so many actors and processes in our bodies must work in concert makes it all the more remarkable that we perceive motion with such ease.

Color Perception


Humans (together with other primates like monkeys and gorillas) have the best color perception among mammals [16]. Hence, it is not a coincidence that color plays an important role in a wide variety of tasks. For example, color is useful for discriminating and differentiating objects, surfaces, natural scenery, and even faces [17],[18]. Color is also an important tool for nonverbal communication, including that of emotion [19].

For many decades, it has been a challenge to find the links between the physical properties of color and its perceptual qualities. Usually, these are studied under two different approaches: the behavioral response caused by color (also called psychophysics) and the actual physiological response caused by it [20].

Here we will only focus on the latter. The study of the physiological basis of color vision, about which practically nothing was known before the second half of the twentieth century, has advanced slowly and steadily since 1950. Important progress has been made in many areas, especially at the receptor level. Thanks to molecular biology methods, it has been possible to reveal previously unknown details concerning the genetic basis for the cone pigments. Furthermore, more and more cortical regions have been shown to be influenced by visual stimuli, although the correlation of color perception with wavelength-dependent physiological activity beyond the receptors is not so easy to discern [21].

In this chapter, we aim to explain the basics of the different processes of color perception along the visual path, from the retina in the eye to the visual cortex in the brain. For anatomical details, please refer to Sec. "Anatomy of the Visual System" of this Wikibook.

Color Perception at the Retina

All colors that can be discriminated by humans can be produced by the mixture of just three primary (basic) colors. Inspired by this idea of color mixing, it has been proposed that color is subserved by three classes of sensors, each having a maximal sensitivity to a different part of the visible spectrum [16]. It was first explicitly proposed in 1853 that there are three degrees of freedom in normal color matching [22]. This was later confirmed in 1886 [23] (with remarkably close results to recent studies [24], [25]).

These proposed color sensors are actually the so-called cones (Note: in this chapter, we will only deal with cones. Rods contribute to vision only at low light levels. Although they are known to have an effect on color perception, their influence is very small and can be ignored here.) [26]. Cones are one of the two types of photoreceptor cells found in the retina, with a significant concentration of them in the fovea. The Table below lists the three types of cone cells. These are distinguished by different types of rhodopsin pigment. Their corresponding absorption curves are shown in the Figure below.

Table 1: General overview of the cone types found in the retina.
Name       | Highest sensitivity | Absorption curve peak [nm]
S, SWS, B  | Blue                | 420
M, MWS, G  | Green               | 530
L, LWS, R  | Red                 | 560
Absorption curves for the different cones. Blue, green, and red represent the absorption of the S (420 nm), M (530 nm), and L (560 nm) cones, respectively.

Although no consensus has been reached for naming the different cone types, the most widely utilized designations refer either to their action spectra peak or to the color to which they are sensitive themselves (red, green, blue)[21]. In this text, we will use the S-M-L designation (for short, medium, and long wavelength), since these names are more appropriately descriptive. The blue-green-red nomenclature is somewhat misleading, since all types of cones are sensitive to a large range of wavelengths.

An important feature of the three cone types is their relative distribution in the retina. The S-cones are present at a relatively low concentration throughout the retina and are completely absent in the most central area of the fovea. Actually, they are too widely spaced to play an important role in spatial vision, although they are capable of mediating weak border perception [27]. The fovea is dominated by L- and M-cones. The proportion of the latter two is usually expressed as the L/M ratio. Different values have been reported for this ratio, ranging from 0.67 [28] up to 2 [29], the latter being the most accepted. Why L-cones almost always outnumber the M-cones remains unclear. Surprisingly, the relative cone ratio has almost no significant impact on color vision. This clearly shows that the brain is plastic, capable of making sense out of whatever cone signals it receives [30], [31].

It is also important to note the overlapping of the L- and M-cone absorption spectra. While the S-cone absorption spectrum is clearly separated, the L- and M-cone peaks are only about 30 nm apart, their spectral curves significantly overlapping as well. This results in a high correlation in the photon catches of these two cone classes. This is explained by the fact that in order to achieve the highest possible acuity at the center of the fovea, the visual system treats L- and M-cones equally, not taking into account their absorption spectra. Therefore, any kind of difference leads to a deterioration of the luminance signal [32]. In other words, the small separation between L- and M-cone spectra might be interpreted as a compromise between the needs for high-contrast color vision and high acuity luminance vision. This is congruent with the lack of S-cones in the central part of the fovea, where visual acuity is highest. Furthermore, the close spacing of L- and M-cone absorption spectra might also be explained by their genetic origin. Both cone types are assumed to have evolved "recently" (about 35 million years ago) from a common ancestor, while the S-cones presumably split off from the ancestral receptor much earlier[26].

The spectral absorption functions of the three different types of cone cells are the hallmark of human color vision. The trichromatic theory solved a long-standing problem: although we can see millions of different colors (humans can distinguish between 7 and 10 million different colors [20]), our retinas simply do not have enough space to accommodate an individual detector for every color at every retinal location.
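The core of the trichromatic idea can be sketched numerically: any light spectrum, no matter how complex, is collapsed by the retina into just three numbers, one per cone class. The Gaussian sensitivity curves below are illustrative assumptions (only the peak wavelengths follow the text), not measured cone fundamentals.

```python
import numpy as np

# Toy Gaussian approximations of cone spectral sensitivities; the peak
# wavelengths follow the text, but the Gaussian shape and the width are
# illustrative assumptions, not measured data.
wl = np.arange(400.0, 701.0, 5.0)        # visible wavelengths in nm

def sensitivity(peak, width=40.0):
    return np.exp(-((wl - peak) ** 2) / (2.0 * width ** 2))

S_cone = sensitivity(420.0)
M_cone = sensitivity(530.0)
L_cone = sensitivity(560.0)              # only ~30 nm from the M-cone peak

def cone_signals(spectrum):
    """Collapse a full spectrum into just three numbers (the trichromatic code)."""
    return np.array([np.sum(c * spectrum) for c in (L_cone, M_cone, S_cone)])

# Any incoming spectrum, however complex, is reduced to an (L, M, S) triplet:
flat_spectrum = np.ones_like(wl)
lms = cone_signals(flat_spectrum)
```

Because only the triplet survives, two physically different spectra that happen to produce the same three values are indistinguishable to the eye (metamerism); note also how strongly the toy L and M curves overlap, foreshadowing the correlation discussed above.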

From the Retina to the Brain

The signals that are transmitted from the retina to higher levels are not simple point-wise representations of the receptor signals, but rather consist of sophisticated combinations of the receptor signals. The objective of this section is to provide a brief overview of the paths that some of this information takes.

Once the optical image on the retina is transduced into chemical and electrical signals in the photoreceptors, the amplitude-modulated signals are converted into frequency-modulated representations at the ganglion-cell and higher levels. In these neural cells, the magnitude of the signal is represented in terms of the number of spikes of voltage per second fired by the cell rather than by the voltage difference across the cell membrane. In order to explain and represent the physiological properties of these cells, we will find the concept of receptive fields very useful.

A receptive field is a graphical representation of the area in the visual field to which a given cell responds. Additionally, the nature of the response is typically indicated for various regions in the receptive field. For example, we can consider the receptive field of a photoreceptor as a small circular area representing the size and location of that particular receptor's sensitivity in the visual field. The figure below shows exemplary receptive fields for ganglion cells, typically exhibiting center-surround antagonism. The left receptive field in the figure illustrates a positive central response (known as on-center). This kind of response is usually generated by a positive input from a single cone, surrounded by a negative response generated from several neighboring cones. Therefore, the response of this ganglion cell is made up of inputs from various cones with both positive and negative signs. In this way, the cell not only responds to points of light, but serves as an edge (or more correctly, a spot) detector. By analogy with computer vision terminology, we can think of the ganglion cell responses as the output of a convolution with an edge-detector kernel. The right receptive field in the figure illustrates a negative central response (known as off-center), which is equally likely. Usually, on-center and off-center cells occur at the same spatial location, fed by the same photoreceptors, resulting in an enhanced dynamic range.
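The convolution analogy can be made concrete with a difference-of-Gaussians kernel, a standard model of center-surround antagonism (the kernel size and widths below are arbitrary illustrative choices):

```python
import numpy as np

def dog_kernel(size=9, sigma_center=1.0, sigma_surround=2.5):
    """Difference-of-Gaussians kernel: positive center, negative surround."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    center = np.exp(-r2 / (2 * sigma_center ** 2)) / (2 * np.pi * sigma_center ** 2)
    surround = np.exp(-r2 / (2 * sigma_surround ** 2)) / (2 * np.pi * sigma_surround ** 2)
    kernel = center - surround
    kernel -= kernel.mean()      # enforce exact center/surround balance
    return kernel

on_center = dog_kernel()
off_center = -on_center          # the off-center field is simply the negation

# Balanced positive and negative areas mean no response to uniform illumination:
uniform_patch = np.full((9, 9), 0.5)
response = np.sum(on_center * uniform_patch)   # effectively zero
```

Convolving an image with `on_center` highlights spots and edges while suppressing flat regions, which is exactly the spot-detector behavior described above; negating the kernel gives the equally likely off-center variant.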

The lower Figure shows that in addition to spatial antagonism, ganglion cells can also have spectral opponency. For instance, the left part of the lower figure illustrates a red-green opponent response with the center fed by positive input from an L-cone and the surrounding fed by a negative input from M-cones. On the other hand, the right part of the lower figure illustrates the off-center version of this cell. Hence, before the visual information has even left the retina, processing has already occurred, with a profound effect on color appearance. There are other types and varieties of ganglion cell responses, but they all share these basic concepts.

Antagonist receptive fields (left: on-center; right: off-center).
Spectrally and spatially antagonist receptive fields (left: on-center; right: off-center).

On their way to the primary visual cortex, ganglion cell axons gather to form the optic nerve, which projects to the lateral geniculate nucleus (LGN) in the thalamus. Coding in the optic nerve is highly efficient, keeping the number of nerve fibers to a minimum (limited by the size of the optic nerve) and thereby also keeping the size of the retinal blind spot as small as possible (approximately 5° wide by 7° high). Furthermore, the ganglion cells described above show no response to uniform illumination, since their positive and negative areas are balanced; in other words, the transmitted signals are decorrelated. Information from neighboring parts of natural scenes is highly correlated spatially and therefore highly predictable [33]. Lateral inhibition between neighboring retinal ganglion cells minimizes this spatial correlation, thereby improving coding efficiency. We can see this as a process of image compression carried out in the retina.

Given the overlap of the L- and M-cone absorption spectra, their signals are also highly correlated. In this case, coding efficiency is improved by combining the cone signals so as to minimize this correlation. We can understand this more easily using Principal Component Analysis (PCA). PCA is a statistical method used to reduce the dimensionality of a given set of variables by transforming the original variables into a set of new variables, the principal components (PCs). The first PC accounts for a maximal amount of the total variance in the original variables, the second PC accounts for a maximal amount of the variance not accounted for by the first component, and so on. In addition, PCs are linearly independent and orthogonal to each other in the parameter space. PCA's main advantage is that only a few of the strongest PCs are enough to cover the vast majority of system variability [34]. This scheme has been applied to the cone absorption functions [35] and even to naturally occurring spectra [36],[37]. The PCs found in the space of cone excitations produced by natural objects are 1) a luminance axis where the L- and M-cone signals are added (L+M), 2) the difference of the L- and M-cone signals (L-M), and 3) a color axis where the sum of the L- and M-cone signals is subtracted from the S-cone signal (S-(L+M)). These channels, derived from a mathematical/computational approach, coincide with the three retino-geniculate channels discovered in electrophysiological experiments [38],[39]. Through these mechanisms, redundant visual information is eliminated in the retina.
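The emergence of a summing (L+M) axis and a differencing (L-M) axis can be reproduced with PCA on synthetic data. The generative model below is an illustrative assumption: L and M share a strong common component (mimicking their overlapping spectra), while S is driven more independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic cone excitations: L and M share a strong common component
# (their overlapping spectra); S is driven more independently.
# The numbers are illustrative, not fitted to real scenes.
n = 10_000
shared = rng.normal(size=n)
L = shared + 0.1 * rng.normal(size=n)
M = shared + 0.1 * rng.normal(size=n)
S = 0.3 * shared + rng.normal(size=n)

X = np.column_stack([L, M, S])
X -= X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(X.T @ X / n)   # ascending eigenvalue order

luminance_axis = eigvecs[:, -1]  # strongest PC: L and M enter with the same sign (~ L+M)
opponent_axis = eigvecs[:, 0]    # weakest PC: L and M with opposite signs (~ L-M)
```

The strongest component weights L and M together (a luminance axis), while a much weaker component carries their difference, mirroring the decorrelated channels found physiologically.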

There are three channels that actually communicate this information from the retina through the ganglion cells to the LGN. They differ not only in their chromatic properties, but also in their anatomical substrate. These channels pose important limitations for basic color tasks, such as detection and discrimination.

In the first channel, the output of L- and M-cones is transmitted synergistically to diffuse bipolar cells and then to cells in the magnocellular layers (M-) of the LGN (not to be confused with the M-cones of the retina)[39]. The receptive fields of M-cells are composed of a center and a surround that are spatially antagonistic. M-cells have high contrast sensitivity for luminance stimuli, but show no response at certain combinations of L-M opponent inputs[40]. However, because the null points of different M-cells vary slightly, the population response is never really zero. This property is passed on to cortical areas with predominant M-cell inputs[41].

The parvocellular pathway (P-) originates with the individual outputs from L- or M-cones to midget bipolar cells, which in turn provide input to retinal P-cells[26]. In the fovea, the receptive field centers of P-cells are formed by single L- or M-cones. The structure of the P-cell receptive field surround is still debated; the most accepted theory, however, states that the surround consists of a specific cone type, resulting in a spatially opponent receptive field for luminance stimuli[42]. The parvocellular layers contribute about 80% of the total projections from the retina to the LGN[43].

Finally, the recently discovered koniocellular pathway (K-) carries mostly signals from S-cones[44]. Groups of these cones project to special bipolar cells, which in turn provide input to specific small ganglion cells. These are usually not spatially opponent. The axons of the small ganglion cells project to thin layers of the LGN (adjacent to the parvocellular layers)[45].

Although the ganglion cells terminate at the LGN (making synapses with LGN cells), there appears to be a one-to-one correspondence between ganglion cells and LGN cells, so the LGN appears to act largely as a relay station for the signals. However, it probably serves some visual function, since there are neural projections from the cortex back to the LGN that could serve as some type of switching or adaptation feedback mechanism. The axons of LGN cells project to visual area one (V1) in the visual cortex in the occipital lobe.

Color Perception at the Brain

In the cortex, the projections from the magno-, parvo-, and koniocellular pathways end in different layers of the primary visual cortex. The magnocellular fibers innervate principally layer 4Cα and layer 6. Parvocellular neurons project mostly to 4Cβ, and layers 4A and 6. Koniocellular neurons terminate in the cytochrome oxidase (CO-) rich blobs in layers 1, 2, and 3[46].

Once in the visual cortex, the encoding of visual information becomes significantly more complex. In the same way the outputs of various photoreceptors are combined and compared to produce ganglion cell responses, the outputs of various LGN cells are compared and combined to produce cortical responses. As the signals advance further up in the cortical processing chain, this process repeats itself with a rapidly increasing level of complexity to the point that receptive fields begin to lose meaning. However, some functions and processes have been identified and studied in specific regions of the visual cortex.

In the V1 region (striate cortex), double-opponent neurons - neurons whose receptive fields are both chromatically and spatially opponent with respect to the on/off regions of a single receptive field - compare color signals across the visual space [47]. They constitute between 5 and 10% of the cells in V1. Their coarse size and small percentage match the poor spatial resolution of color vision [16]. Furthermore, they are not sensitive to the direction of moving stimuli (unlike some other V1 neurons) and are hence unlikely to contribute to motion perception[48]. However, given their specialized receptive field structure, this kind of cell is the neural basis for color contrast effects, as well as an efficient means to encode color itself[49],[50]. Other V1 cells respond to other types of stimuli, such as oriented edges, various spatial and temporal frequencies, particular spatial locations, and combinations of these features, among others. Additionally, we find cells that linearly combine inputs from LGN cells as well as cells that perform nonlinear combinations. These responses are needed to support advanced visual capabilities, such as color perception itself.

Fig. 4. (Partial) flow diagram illustrating the many streams of visual information processes that take place in the visual cortex. It is important to note that information can flow in both directions.

There is substantially less information on the chromatic properties of single neurons in V2 as compared to V1. At first glance, it seems that there are no major differences in color coding between V1 and V2[51]. One exception to this is the emergence of a new class of color-complex cell[52]. It has therefore been suggested that the V2 region is involved in the elaboration of hue. However, this remains very controversial and has not been confirmed.

Following the modular concept developed after the discovery of functional ocular dominance in V1, and considering the anatomical segregation between the P-, M-, and K-pathways (described in Sec. 3), it was suggested that a specialized system devoted to the analysis of color information should exist within the visual cortex[53]. V4 is the region that has historically attracted the most attention as the possible "color area" of the brain, because of an influential study claiming that 100% of the cells in V4 were hue-selective[54]. However, this claim has been disputed by a number of subsequent studies, some even reporting that only 16% of V4 neurons show hue tuning[55]. Currently, the most accepted view is that V4 contributes not only to color, but also to shape perception, visual attention, and stereopsis. Furthermore, recent studies have tried to find the "color area" of the brain in other regions, such as TEO[56] and PITd[57]. The relationship of these regions to each other is still debated. To reconcile the discussion, some use the term posterior inferior temporal (PIT) cortex to denote the region that includes V4, TEO, and PITd[16].

Describing the cortical responses of V1, V2, and V4 cells is already a very complicated task; the complexity of the visual responses in a network of approximately 30 visual zones is enormous. Figure 4 shows a small portion of the connectivity of the different cortical areas (not cells) that have been identified[58].

At this stage, it becomes exceedingly difficult to explain the function of single cortical cells in simple terms. In fact, the function of a single cell might not have meaning on its own, since the representation of various perceptions must be distributed across collections of cells throughout the cortex.

Color Vision Adaptation Mechanisms

Although researchers have been trying to explain the processing of color signals in the human visual system, it is important to understand that color perception is not a fixed process. Actually, there are a variety of dynamic mechanisms that serve to optimize the visual response according to the viewing environment. Of particular relevance to color perception are the mechanisms of dark, light, and chromatic adaptation.

Dark Adaptation

Dark adaptation refers to the change in visual sensitivity that occurs when the level of illumination is decreased. The visual system response to reduced illumination is to become more sensitive, increasing its capacity to produce a meaningful visual response even when the light conditions are suboptimal[59].

Fig. 5. Dark adaptation. During the first 10 minutes (i.e. to the left of the dotted line), sensitivity recovery is done by the cones. After the first 10 minutes (i.e. to the right of the dotted line), rods outperform the cones. Full sensitivity is recovered after approximately 30 minutes.

Figure 5 shows the recovery of visual sensitivity after a transition from an extremely high illumination level to complete darkness[58]. First, the cones become gradually more sensitive, until their curve levels off after a couple of minutes. Then, after approximately 10 minutes, visual sensitivity is roughly constant. At that point, the rod system, with its longer recovery time, has recovered enough sensitivity to outperform the cones and takes over control of the overall sensitivity. Rod sensitivity gradually improves as well, until it becomes asymptotic after about 30 minutes. In other words, cones are responsible for the sensitivity recovery during the first 10 minutes; afterwards, rods outperform the cones and reach full sensitivity after approximately 30 minutes.

This is only one of several mechanisms that help the visual system adapt to dark lighting conditions. Others include the well-known pupil reflex, depletion and regeneration of photopigment, gain control in retinal cells, other higher-level mechanisms, and cognitive interpretation.

Light Adaptation

Light adaptation is essentially the inverse process of dark adaptation. As a matter of fact, the underlying physiological mechanisms are the same for both processes. However, it is important to consider it separately since its visual properties differ.

Fig. 6. Light adaptation. For a given scene, the solid lines represent families of visual response curves at different (relative) energy levels. The dashed line represents the case where we would adapt in order to cover the entire range of illumination, which would yield limited contrast and reduced sensitivity.

Light adaptation occurs when the level of illumination is increased. The visual system must then become less sensitive in order to produce useful perceptions, given that significantly more visible light is available. The visual system has a limited output dynamic range available for the signals that produce our perceptions, yet the real world has illumination levels covering at least 10 orders of magnitude. Fortunately, we rarely need to view the entire range of illumination levels at the same time.

At high light levels, adaptation is achieved by photopigment bleaching. This scales photon capture in the receptors and protects the cone response from saturating on bright backgrounds. The mechanisms of light adaptation occur primarily within the retina[60]. In fact, gain changes are largely cone-specific, and adaptation pools signals over areas no larger than the diameter of individual cones[61],[62]. This points to a localization of light adaptation that may be as early as the receptors. However, there appears to be more than one site of sensitivity scaling. Some of the gain changes are extremely rapid, while others take seconds or even minutes to stabilize[63]. Usually, light adaptation takes around 5 minutes (six times faster than dark adaptation), which might point to the influence of post-receptoral sites.

Figure 6 shows examples of light adaptation [58]. If we used a single response function to map the large range of intensities into the visual system's output, we would only have a very small range at our disposal for a given scene. With such a response function, the perceived contrast of any given scene would be limited, and visual sensitivity to changes would be severely degraded due to signal-to-noise issues. This case is shown by the dashed line. The solid lines, on the other hand, represent families of visual responses. These curves map the useful illumination range of any given scene into the full dynamic range of the visual output, thus resulting in the best possible visual perception for each situation. Light adaptation can be thought of as the process of sliding the visual response curve along the illumination-level axis until the optimum level for the given viewing conditions is reached.
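This curve-sliding can be sketched with a Naka-Rushton-style saturating response function, a standard descriptive model of such curve families. Tying the semi-saturation constant directly to the adapting level, and the exponent value, are simplifying assumptions for illustration, not physiological fits.

```python
import numpy as np

def visual_response(intensity, adapt_level, n=0.7):
    """Naka-Rushton-style response; the semi-saturation constant is tied
    to the current adaptation level (a simplifying assumption)."""
    return intensity ** n / (intensity ** n + adapt_level ** n)

# Sliding the curve: whatever the ambient level, the scene falls on the
# steep, high-contrast part of the response function.
dim = visual_response(1e2, adapt_level=1e2)      # dim scene, dim-adapted eye
bright = visual_response(1e6, adapt_level=1e6)   # bright scene, bright-adapted eye

# A single fixed curve instead compresses a 10-decade world badly:
fixed = [visual_response(i, adapt_level=1e3) for i in (1e-2, 1e3, 1e8)]
```

With adaptation, both the dim and bright scenes land at mid-response (0.5), where contrast sensitivity is highest; the fixed curve pins extreme intensities near 0 or 1, leaving almost no output range for scene contrast.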

Chromatic Adaptation

The general concept of chromatic adaptation is an independent variation of the heights of the three cone spectral responsivity curves. This adjustment arises because light adaptation occurs independently within each class of cone. A specific formulation of this hypothesis is known as von Kries adaptation. It states that the adaptation response takes place in each of the three cone types separately and is equivalent to multiplying their fixed spectral sensitivities by a scaling constant[64]. If the scaling weights (also known as von Kries coefficients) are inversely proportional to the absorption of light by each cone type (i.e. a lower absorption requires a larger coefficient), then von Kries scaling maintains a constant mean response within each cone class. This provides a simple yet powerful mechanism for maintaining the perceived color of objects despite changes in illumination. Under a number of different conditions, von Kries scaling provides a good account of the effects of light adaptation on color sensitivity and appearance[65],[66].

The easiest way to picture chromatic adaptation is by examining a white object under different types of illumination. For example, consider a piece of paper viewed under daylight, fluorescent, and incandescent illumination. Daylight contains relatively far more short-wavelength energy than fluorescent light, and incandescent illumination contains relatively far more long-wavelength energy than fluorescent light. However, in spite of the different illumination conditions, the paper approximately retains its white appearance under all three light sources. This is because the S-cone system becomes relatively less sensitive under daylight (to compensate for the additional short-wavelength energy) and the L-cone system becomes relatively less sensitive under incandescent illumination (to compensate for the additional long-wavelength energy)[58].
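The white-paper example maps directly onto von Kries scaling. The (L, M, S) triplets below are made-up illustrative numbers, not measured cone responses; only the scaling rule itself follows the von Kries hypothesis described above.

```python
import numpy as np

def von_kries_adapt(lms, adapting_white):
    """Scale each cone signal by the inverse of that cone class's response
    to the adapting illumination (the von Kries coefficients)."""
    return lms / adapting_white

# Hypothetical (L, M, S) responses to a white paper under two illuminants;
# the numbers are invented for illustration.
paper_daylight = np.array([0.9, 0.8, 1.2])       # extra short-wavelength energy
paper_incandescent = np.array([1.3, 0.9, 0.5])   # extra long-wavelength energy

# Adapting to each illuminant maps the paper to the same neutral signal:
a = von_kries_adapt(paper_daylight, adapting_white=paper_daylight)
b = von_kries_adapt(paper_incandescent, adapting_white=paper_incandescent)
# a == b == [1, 1, 1]: the paper looks white under both lights
```

Because each cone class is normalized by its own response to the prevailing illumination, the paper comes out as the same neutral triplet under both lights, which is exactly the approximate color constancy described in the text.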

Retinal Implants

Since the late 20th century, restoring vision to blind people by means of artificial eye prostheses has been the goal of numerous research groups and some private companies around the world. Similar to cochlear implants, the key concept is to stimulate the visual nervous system with electric pulses, bypassing the damaged or degenerated photoreceptors of the human retina. In this chapter we describe the basic functionality of a retinal implant, as well as the different approaches that are currently being investigated and developed. The two most common approaches are called "epiretinal" and "subretinal" implants, corresponding to eye prostheses located on top of or behind the retina, respectively. We will not cover non-retinal approaches to restoring vision, such as the BrainPort Vision System, which stimulates the tongue from visual input, cuff electrodes around the optic nerve, or stimulation implants in the primary visual cortex.

Retinal Structure and Functionality

Figure 1 depicts the schematic nervous structure of the human retina, in which we can differentiate three layers of cells. The first, located furthest away from the eye lens, consists of the photoreceptors (rods and cones), whose purpose is to transduce incoming light into electrical signals that are then propagated to the intermediate layer, which is mainly composed of bipolar cells. These bipolar cells, which are connected to photoreceptors as well as to other cell types such as horizontal cells and amacrine cells, pass the electrical signal on to the retinal ganglion cells (RGCs). For a detailed description of the functionality of bipolar cells, specifically with respect to their subdivision into ON- and OFF-bipolar cells, refer to the chapter on the Visual System. The uppermost layer, consisting of RGCs, collects the electric pulses from the bipolar cells and passes them on to the thalamus via the optic nerve. From there, signals are propagated to the primary visual cortex. There are some key aspects worth mentioning about the signal processing within the human retina. First, while bipolar cells, as well as horizontal and amacrine cells, generate graded potentials, the RGCs generate action potentials. Further, the density of each cell type is not uniform across the retina. In the area of the fovea there is an extremely high density of photoreceptors (predominantly cones), with only very few photoreceptors connected to each RGC via the intermediate layer, whereas in the peripheral areas of the retina the photoreceptor density is far lower and many photoreceptors are connected to a single RGC. The latter has direct implications for the receptive field of an RGC, which tends to grow rapidly towards the outer regions of the retina, simply because of the lower photoreceptor density and the increased number of photoreceptors connected to the same RGC.

Schematic overview of the human eye and the location of retinal prostheses. Note the vertical layering of the retina tissue and the distances of the cell types to epiretinal and subretinal implants respectively.


Implant Use Case

Damage to the photoreceptor layer of the human retina can be caused by retinitis pigmentosa, age-related macular degeneration, and other diseases, eventually causing the affected person to become blind. However, the rest of the visual nervous system, both inside the retina and along the visual pathway in the brain, remains intact for several years after the onset of blindness [67] [68]. This allows the remaining, still properly functioning retinal cells to be stimulated artificially through electrodes, restoring visual information for the patient. A retinal prosthesis can be implanted behind the retina, in which case it is referred to as a subretinal implant. This brings the electrodes closest to the damaged photoreceptors and to the still properly functioning bipolar cells, which are the actual stimulation target here. (If the stimulation electrodes penetrate the choroid, which contains the blood supply of the retina, the implants are sometimes called "suprachoroidal" implants.) Alternatively, the implant may be placed on top of the retina, closest to the ganglion cell layer, aiming at stimulation of the RGCs instead. These implants are referred to as epiretinal implants. Both approaches are currently being investigated by several research groups, and both have significant advantages as well as drawbacks. Before we treat them separately in more detail, we describe some key challenges that need consideration in both cases.


A big challenge for retinal implants comes from the extremely high spatial density of nerve cells in the human retina. There are roughly 125 million photoreceptors (rods and cones) and 1.5 million ganglion cells in the human retina, as opposed to only approximately 15,000 hair cells in the human cochlea [69] [70]. In the fovea, where the highest visual acuity is achieved, as many as 150,000 cones are located within one square millimeter. While there are far fewer RGCs in total than photoreceptors, their density in the foveal area is close to the density of cones, imposing a tremendous challenge for addressing the nerve cells at high enough spatial resolution with artificial electrodes. Virtually all current scientific experiments with retinal implants use micro-electrode arrays (MEAs) to stimulate the retinal cells. High-resolution MEAs achieve an inter-electrode spacing of roughly 50 micrometers, resulting in an electrode density of 400 electrodes per square millimeter. Therefore, a one-to-one association between electrodes and photoreceptors or RGCs is impossible in the foveal area with conventional electrode technology. However, the spatial density of both photoreceptors and RGCs decreases quickly towards the outer regions of the retina, making one-to-one stimulation between electrodes and peripheral nerve cells more feasible [71]. Another challenge is operating the electrodes within safe limits: imposing charge densities above 0.1 mC/cm² may damage the nervous tissue [71]. Generally, the further a cell is from the stimulating electrode, the larger the current amplitude required to stimulate it. Furthermore, the lower the stimulation threshold, the smaller the electrodes can be designed and the more densely they can be placed on the MEA, thereby enhancing the spatial stimulation resolution.
The stimulation threshold is defined as the minimal stimulation strength necessary to trigger a nervous response in at least 50% of the stimulation pulses. For these reasons, a primary goal in designing retinal implants is to use as low a stimulation current as possible while still guaranteeing reliable stimulation (i.e. generation of an action potential in the case of RGCs) of the target cell. This can be achieved either by placing the electrode as close as possible to the area of the target cell that is most sensitive to an applied electric field pulse, or by making the cell projections, i.e. dendrites and/or axons, grow on top of the electrode, allowing stimulation of the cell with very low currents even if the cell body is located far away. Furthermore, an implant fixed to the retina automatically follows the movements of the eyeball. While this entails some significant benefits, it also means that any connection to the implant - for adjusting parameters, reading out data, or providing external power for the stimulation - requires a cable that moves with the implant. As we move our eyes approximately three times a second, this exposes the cable and the involved connections to severe mechanical stress. For a device that should remain functional for an entire lifetime without external intervention, this imposes a severe challenge on the materials and technologies involved.
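The densities quoted above can be checked with simple arithmetic. The square electrode grid is an idealizing assumption; real MEA layouts vary.

```python
# Back-of-the-envelope check of the densities quoted in the text,
# assuming an idealized square electrode grid.
spacing_um = 50.0                                    # inter-electrode pitch
electrodes_per_mm2 = (1000.0 / spacing_um) ** 2      # 20 x 20 = 400 per mm^2

cones_per_mm2_fovea = 150_000
cones_per_electrode = cones_per_mm2_fovea / electrodes_per_mm2

print(electrodes_per_mm2)    # 400.0
print(cones_per_electrode)   # 375.0
```

With roughly 375 foveal cones sharing each electrode, the impossibility of a one-to-one electrode-to-cell mapping in the fovea with conventional MEA technology is immediately apparent.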

Subretinal Implants

As the name suggests, subretinal implants are visual prostheses located behind the retina. The implant is thus located closest to the damaged photoreceptors, aiming to bypass the rods and cones and stimulate the bipolar cells in the next nervous layer of the retina. The main advantage of this approach lies in the relatively little visual signal processing that takes place between the photoreceptors and the bipolar cells, and that therefore needs to be imitated by the implant. That is, raw visual information, for example captured by a video camera, may be forwarded directly, or with only relatively rudimentary signal processing, to the MEA stimulating the bipolar cells, rendering the procedure rather simple from a signal-processing point of view. However, this approach has some severe disadvantages. The high spatial density of photoreceptors in the human retina imposes a big challenge for developing and designing an MEA with sufficiently high stimulation resolution and correspondingly low inter-electrode spacing. Furthermore, the stacking of the nervous layers in the z-direction (with the x-y plane tangential to the retinal curvature) adds another difficulty when it comes to placing the electrodes close to the bipolar cells. With the MEA located behind the retina, there is a significant spatial gap between the electrodes and the target cells that needs to be overcome. As mentioned above, an increased electrode-to-target-cell distance forces the MEA to operate with higher currents, enlarging the electrode size, the number of cells within the stimulation range of a single electrode, and the spatial separation between adjacent electrodes. All of this results in a decreased stimulation resolution, as well as exposing the retina to the risk of tissue damage caused by too-high charge densities.
As shown below, one way to overcome large distances between electrodes and the target cells is to make the cells grow their projections over longer distances directly on top of the electrode.

In late 2010, a German research group, in collaboration with the private German company "Retina Implant AG", published results from studies involving tests of subretinal implants in human subjects [67]. A three-by-three-millimeter microphotodiode array (MPDA) containing 1500 pixels, with each pixel consisting of an individual light-sensing photodiode and an electrode, was implanted behind the retina of three patients suffering from blindness due to macular degeneration. The pixels were located approximately 70 micrometers apart from each other, yielding a spatial resolution of roughly 160 electrodes per square millimeter - or, as indicated by the authors of the paper, a visual cone angle of 15 arcmin for each electrode. It should be noted that, in contrast to implants using external video cameras to generate visual input, each pixel of the MPDA itself contains a light-sensitive photodiode, autonomously generating the electric current for its own associated electrode from the light received through the eyeball. Each MPDA pixel thus corresponds in its full functionality to a photoreceptor cell. This has a major advantage: since the MPDA is fixed behind the human retina, it automatically moves along when the eyeball is moved. And since the MPDA itself receives the visual input used to generate the electric currents for the stimulation electrodes, movements of the head or the eyeball are handled naturally and need no artificial processing. In one of the patients, the MPDA was placed directly beneath the macula, leading to superior results in experimental tests compared to the other two patients, whose MPDAs were implanted further away from the center of the retina. The results achieved by the patient with the implant behind the macula were quite extraordinary: he was able to recognize letters (5-8 cm large) and read words, as well as distinguish black-white patterns with different orientations [67].

The experimental results with the MPDA implants have also drawn attention to another visual phenomenon, revealing an additional advantage of the MPDA approach over implants using external imaging devices: repeated stimulation of retinal cells quickly leads to decreased responses, suggesting that retinal neurons become inhibited after being stimulated repeatedly within a short period of time. This entails that a visual input projected onto a MEA fixed on or behind the retina will result in a sensed image that quickly fades away, even though the electric stimulation of the electrodes remains constant. The reason is that fixed electrodes on the retina stimulate the same cells all the time, rendering those cells less and less sensitive to a constant stimulus. The process is reversible, however, and the cells regain their initial sensitivity once the stimulus is removed. So how does an intact visual system handle this effect? Why are healthy humans able to fixate an object over time without it fading out? As mentioned in [72], the human eye continuously performs small, unnoticeable eye movements, so that the same visual stimulus is projected onto slightly different retinal spots over time, even as we fixate a target object. This successfully circumvents the fading cell response phenomenon. With the implant serving both as photoreceptor and electrode stimulator, as is the case with the MPDA, these natural small eye adjustments handle the effect in a straightforward way. Other implant approaches using external visual input (e.g. from video cameras) will suffer from their projected images fading away if stimulation is continuous.
Fast, artificial jittering of the camera images may not solve the problem, as this external movement need not be in accordance with the eye movement; the visual cortex may then interpret it simply as a wiggly or blurry scene instead of the desired steady long-term projection of the fixed image. A further advantage of subretinal implants is the precise correlation between stimulated areas on the retina and the perceived location of the stimulus in the visual field of the human subject. In contrast to RGCs, whose location on the retina may not directly correspond to the location of their individual receptive fields, the stimulation of a bipolar cell is perceived exactly at the point in the visual field that corresponds to the geometric location of that bipolar cell on the retina. A clear disadvantage of subretinal implants is the invasive surgical procedure involved.

Epiretinal Implants

Epiretinal implants are located on top of the retina and are therefore closest to the retinal ganglion cells (RGCs). For that reason, epiretinal implants aim at stimulating the RGCs directly, bypassing not only the damaged photoreceptors, but also any intermediate neural visual processing by the bipolar, horizontal and amacrine cells. This has some advantages: First of all, the surgical procedure for an epiretinal implant is far less critical than for a subretinal implant, since the prosthesis need not be implanted from behind the eye. Also, there are far fewer RGCs than photoreceptors or bipolar cells, allowing either a coarser-grained stimulation with increased inter-electrode distance (at least in the peripheral regions of the retina), or an electrode density exceeding the actual RGC density, allowing for more flexibility and accuracy when stimulating the cells. A study on the epiretinal stimulation of peripheral parasol cells conducted on macaque retina provides quantitative details [71]. Parasol cells are one type of RGC, forming the second densest visual pathway in the retina. Their main purpose is to encode the movement of objects in the visual field, thus sensing motion. The experiments were performed in vitro by placing macaque retinal tissue on a 61-electrode MEA (60 micrometer inter-electrode spacing). 25 individual parasol cells were identified and stimulated electrically, while properties such as stimulation threshold and best stimulation location were analyzed. The threshold current was defined as the lowest current that triggered a spike on the target cell in 50% of the stimulus pulses (pulse duration: 50 milliseconds) and was determined by incrementally increasing the stimulation strength until a sufficient spiking response was registered. Please note two aspects: First, parasol cells, as RGCs, exhibit action potential behavior, as opposed to bipolar cells, which work with graded potentials.
Second, the electrodes on the MEA were used both for the stimulation pulses and for recording the spiking response from the target cells. The 25 parasol cells were located on the 61-electrode MEA with an electrode density significantly higher than the parasol cell density, effectively yielding multiple electrodes within the receptive field of a single parasol cell. In addition to measuring the stimulation thresholds necessary to trigger a reliable cell response, the location of best stimulation was also determined. This refers to the location of the stimulating electrode, relative to the target cell, at which the lowest stimulation threshold was achieved. Surprisingly, this was found not to be on the cell soma, as one would expect, but roughly 13 micrometers further down the axon path. From there on, the experiments showed the expected quadratic increase in stimulation threshold currents with increasing electrode-to-soma distance. The study results also showed that all stimulation thresholds were well below the safety limits (around 0.05 mC/cm², as opposed to 0.1 mC/cm² being a (low) safety limit), and that the cell response to a stimulation pulse was fast (0.2 ms latency on average) and precise (small variance in latency). Further, the electrode density being superior to the parasol cell density allowed reliable addressing of individual cells by stimulating the appropriate electrode, without evoking spikes in neighboring cells.
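The threshold-search procedure described above can be sketched as a simple staircase: increase the current until the cell spikes on at least 50% of the pulses. The sigmoidal spike-probability model and all numeric values below are illustrative assumptions, not data from the cited study:

```python
import math
import random

def spike_probability(current_uA, threshold_uA=1.0, slope=8.0):
    # Assumed sigmoid: spiking probability is 0.5 at the "true" threshold.
    return 1.0 / (1.0 + math.exp(-slope * (current_uA - threshold_uA)))

def find_threshold(n_pulses=100, step_uA=0.05, seed=0):
    # Staircase: raise the current until >= 50% of pulses evoke a spike.
    rng = random.Random(seed)
    current = 0.1
    while current < 5.0:
        spikes = sum(rng.random() < spike_probability(current)
                     for _ in range(n_pulses))
        if spikes >= 0.5 * n_pulses:     # 50% response criterion
            return current
        current += step_uA
    return None

print(find_threshold())   # converges near the assumed 1.0 uA threshold
```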

Overview of Alternative Technical Approaches

In this section, we give a short overview of some alternative approaches and technologies currently under research.

Nanotube Electrode

Classic MEAs contain electrodes made out of titanium nitride or indium tin oxide, exposing the implant to severe issues with long-term biocompatibility [68]. A promising alternative to metallic electrodes are carbon nanotubes (CNTs), which combine a number of very advantageous properties. First, they are fully biocompatible, since they are made from pure carbon. Second, their robustness makes them suited for long-term implantation, a key property for visual prostheses. Further, their good electrical conductivity allows them to operate as electrodes. And finally, their very porous nature leads to extremely large contact surfaces, encouraging the neurons to grow on top of the CNTs, thus improving the neuron-electrode contact and lowering the stimulation currents necessary to elicit a cell response. However, CNT electrodes have only emerged recently, and at this point only few scientific results are available.

Wireless Implant Approaches

One of the main technical challenges with retinal implants relates to the cabling that connects the MEA to the external stimuli, the power supply, and the control signals. The mechanical stress on the cabling affects its long-term stability and durability, imposing a big challenge on the materials used. Wireless technologies could be a way to circumvent any cabling between the actual retinal implant and external devices. The energy of the incoming light through the eye is not sufficient to trigger neural responses, so for a wireless implant to work, extra power must be provided to the implant. An approach presented by the Stanford School of Medicine uses an infrared LCD display to project the scene captured by a video camera onto goggles, reflecting infrared pulses onto the chip located on the retina. The chip also uses a photovoltaic rechargeable battery to provide the power required to transform the IR light into sufficiently strong stimulation pulses. Similar to the subretinal approach, this also allows the eye to naturally fixate and focus on objects in the scene, as the eye is free to move, allowing different parts of the IR image on the goggles to be projected onto different areas of the chip located on the retina. Instead of using infrared light, inductive coils can also be used to transmit electrical power and data signals from external devices to the implant on the retina. This technology has been successfully implemented and tested in the EPIRET3 retinal implant [73]. However, those tests were more a proof of concept, as only the patient’s ability to sense a visual signal upon application of a stimulus to the electrodes was tested.

Directed Neural Growth

One way to allow very precise neural stimulation with extremely low currents, even over longer distances, is to make the neurons grow their projections onto the electrode. Neural growth can be encouraged by applying the right chemical solution onto the retinal tissue, in this case a layer of Laminin applied onto the MEA’s surface. In order to control the neural paths, the Laminin is not applied uniformly across the MEA surface, but in narrow paths forming a pattern corresponding to the connections the neurons should form. This process of applying the Laminin in a precise, patterned way is called “microcontact printing”. A picture of what these Laminin paths look like is shown in Figure 5. The successful directed neural growth achieved with this method allowed applying significantly lower stimulation currents compared to classic electrode stimulation, while still reliably triggering a neural response [74]. Furthermore, the stimulation threshold no longer follows the quadratic increase with electrode-soma distance, but remains constant at the same low level even for longer distances (>200 micrometers).

Other Visual Implants

In addition to the retina, other elements of the visual system can be stimulated as well.

Stimulation of the Optic Nerve

The optic nerve can be stimulated with cuff electrodes, typically with only a few segments.

  • Advantage: little trauma to the eye.
  • Disadvantage: the stimulation is not very specific.

Cortical Implants

Visual cortical implant designed by Mohamad Sawan
The Visual Cortical Implant

Dr. Mohamad Sawan, Professor and Researcher at the Polystim Neurotechnologies Laboratory at the Ecole Polytechnique de Montreal, has been working on a visual prosthesis to be implanted into the human cortex. The basic principle of Dr. Sawan’s technology consists of stimulating the visual cortex by implanting a silicon microchip on a network of electrodes made of biocompatible materials, in which each electrode injects a stimulating electrical current in order to provoke a series of luminous points (an array of pixels) to appear in the field of vision of the sightless person. This system is composed of two distinct parts: the implant and an external controller. The implant, lodged in the visual cortex, wirelessly receives dedicated data and energy from the external controller. This implantable part contains all the circuits necessary to generate the electrical stimuli and to oversee the changing microelectrode/biological tissue interface. The battery-operated external controller, on the other hand, comprises a micro-camera which captures the image, as well as a processor and a command generator which process the imaging data to select and translate the captured images, generate and manage the electrical stimulation process, and oversee the implant. The external controller and the implant exchange data in both directions over a powerful transcutaneous radio frequency (RF) link. The implant is powered the same way. (Wikipedia [2])


  • Much larger area for stimulation: a 2° radius of the central visual field corresponds to 1 mm² on the retina, but to 2100 mm² in the visual cortex.


  • Implantation is more invasive.
  • Parts of the visual field lie in a sulcus and are very hard to reach.
  • Stimulation can trigger seizures.

Computer Simulation of the Visual System

In this section, an overview of the simulation of the processing done by the early levels of the visual system will be given. The implementation reproducing the action of the visual system will be done with MATLAB and its toolboxes. The processing done by the early visual system was discussed in the previous section, and is summarized, together with some of the functions performed, in the following schematic overview. A good description of the image processing can be found in (Cormack 2000).

Schematic overview of the processing done by the early visual system
Structure | Operations | 2D Fourier Plane
World | I(x,y,t,\lambda) | 2D Fourier Plane 01.jpg
Optics | Low-pass spatial filtering | 2D Fourier Plane 02.jpg
Photoreceptor Array | Sampling, more low-pass filtering, temporal low-/bandpass filtering, \lambda filtering, gain control, response compression |
LGN Cells | Spatiotemporal bandpass filtering, \lambda filtering, multiple parallel representations | 2D Fourier Plane 03.jpg
Primary Visual Cortical Neurons: Simple & Complex | Simple cells: orientation, phase, motion, binocular disparity, & \lambda filtering; Complex cells: no phase filtering (contrast energy detection) | 2D Fourier Plane 04.jpg

On the left are some of the major structures to be discussed; in the middle, some of the major operations done at the associated structure; on the right, the 2-D Fourier representations of the world, the retinal image, and the sensitivities typical of a ganglion and a cortical cell. (From Handbook of Image and Video Processing, A. Bovik)

As we can see in the above overview, different stages of image processing have to be considered to simulate the response of the visual system to a stimulus. The next section will therefore give a brief introduction to Image Processing. But first of all, we will be concerned with the Simulation of Sensory Organ Components.

Simulating Sensory Organ Components

Anatomical Parameters of the Eye

The average eye has an anterior corneal radius of curvature of r_C = 7.8 mm , and an aqueous refractive index of 1.336. The length of the eye is L_E = 24.2 mm. The iris is approximately flat, and the edge of the iris (also called limbus) has a radius r_L = 5.86 mm.

Optics of the Eyeball

The optics of the eyeball are characterized by its 2-D spatial impulse response function, the Point Spread Function (PSF)

h(r) = 0.95\cdot \exp\left( -2.6\cdot |r|^{1.36} \right) + 0.05\cdot\exp\left( -2.4\cdot |r|^{1.74} \right) ,

in which r is the radial distance in minutes of arc from the center of the image.
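The PSF above can be evaluated directly; a minimal NumPy sketch:

```python
import numpy as np

def psf(r_arcmin):
    # Point Spread Function of the eye's optics; r in minutes of arc.
    r = np.abs(r_arcmin)
    return (0.95 * np.exp(-2.6 * r**1.36) +
            0.05 * np.exp(-2.4 * r**1.74))

# The PSF is 1 at the image center and falls off rapidly with eccentricity:
print(psf(0.0), psf(1.0), psf(5.0))
```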

Practical implementation

Obviously, the effect on a given digital image depends on the distance of that image from your eyes. As a simple placeholder, substitute this filter with a Gaussian filter with a size of 30, and with a standard deviation of 1.5.

In one dimension, a Gaussian is described by

g(x) = a \cdot \exp \left( -\frac{x^2}{2\sigma^2} \right) .
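A minimal sketch of the suggested placeholder, building a normalized 2-D Gaussian kernel with NumPy (interpreting the filter size as 30 samples, sigma = 1.5); in MATLAB, fspecial('gaussian', 30, 1.5) from the Image Processing Toolbox produces a comparable kernel:

```python
import numpy as np

def gaussian_kernel(size=30, sigma=1.5):
    # 1-D Gaussian sampled symmetrically around zero ...
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax**2 / (2 * sigma**2))
    # ... turned into a 2-D kernel by the outer product.
    kernel2d = np.outer(g, g)
    # Normalize so that filtering preserves the overall image brightness.
    return kernel2d / kernel2d.sum()

kernel = gaussian_kernel()
print(kernel.shape, round(kernel.sum(), 6))
```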

Activity of Ganglion Cells

Mexican Hat function, with sigma1:sigma2 = 1:1.6

Ignoring the

  • temporal response
  • effect of wavelength (especially for the cones)
  • opening of the iris
  • sampling and distribution of photo receptors
  • bleaching of the photo-pigment

we can approximate the response of ganglion cells with a Difference of Gaussians (DOG, Wikipedia [3])

f(x;\sigma) = \frac{1}{\sigma_1\sqrt{2\pi}} \, \exp \left( -\frac{x^2}{2\sigma_1^2} \right)-\frac{1}{\sigma_2\sqrt{2\pi}} \, \exp \left( -\frac{x^2}{2\sigma_2^2} \right).

The source code for a Python implementation is available under [75].

The values of \sigma_1 and \sigma_2 have a ratio of approximately 1:1.6, but vary as a function of eccentricity. For midget cells (or P-cells), the Receptive Field Size (RFS) is approximately

RFS \approx 2 \cdot \text{Eccentricity} ,

where the RFS is given in arcmin, and the Eccentricity in mm distance from the center of the fovea (Cormack 2000).
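The DOG profile above, with the quoted sigma ratio of 1:1.6, can be sketched as follows:

```python
import numpy as np

def dog(x, sigma1=1.0, ratio=1.6):
    # Difference of Gaussians with the sigma ratio 1:1.6 quoted in the text.
    sigma2 = ratio * sigma1
    g1 = np.exp(-x**2 / (2 * sigma1**2)) / (sigma1 * np.sqrt(2 * np.pi))
    g2 = np.exp(-x**2 / (2 * sigma2**2)) / (sigma2 * np.sqrt(2 * np.pi))
    return g1 - g2

x = np.linspace(-5, 5, 1001)
y = dog(x)
# Excitatory center, inhibitory surround ("Mexican hat" profile):
print(y[500] > 0, y.min() < 0)
```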

Activity of simple cells in the primary visual cortex (V1)

Again ignoring temporal properties, the activity of simple cells in the primary visual cortex (V1) can be modeled with the use of Gabor filters (Wikipedia [4]). A Gabor filter is a linear filter whose impulse response is defined by a harmonic function (sinusoid) multiplied by a Gaussian function. The Gaussian function causes the amplitude of the harmonic function to diminish away from the origin, but near the origin, the properties of the harmonic function dominate

g(x,y;\lambda,\theta,\psi,\sigma,\gamma)=\exp\left(-\frac{x'^2+\gamma^2y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda}+\psi\right) ,


x' = x \cos\theta + y \sin\theta\, ,


y' = -x \sin\theta + y \cos\theta\, .

In this equation, \lambda represents the wavelength of the cosine factor, \theta represents the orientation of the normal to the parallel stripes of a Gabor function (Wikipedia [5]), \psi is the phase offset, \sigma is the sigma of the Gaussian envelope and \gamma is the spatial aspect ratio, and specifies the ellipticity of the support of the Gabor function.

The size of simple-cell receptive fields depends on their position relative to the fovea, but less strictly so than for retinal ganglion cells. The smallest fields, in and near the fovea, are about one-quarter degree by one-quarter degree, with the center region as small as a few minutes of arc (the same as the diameter of the smallest receptive-field centers in retinal ganglion cells). In the retinal periphery, simple-cell receptive fields can be about 1 degree by 1 degree. [76]

Gabor-like functions arise naturally, simply from the statistics of everyday scenes [77]. An example how even the statistics of a simple image can lead to the emergence of Gabor-like receptive fields, written in Python, is presented in [78]; and a (Python-)demonstration of the effects of filtering an image with Gabor-functions can be found at [79].

Gabor function, with sigma = 1, theta = 1, lambda = 4, psi = 2, gamma = 1

This is an example implementation in MATLAB:

function gb = gabor_fn(sigma,theta,lambda,psi,gamma)
  sigma_x = sigma;
  sigma_y = sigma/gamma;
  % Bounding box
  nstds = 3;
  xmax = max(abs(nstds*sigma_x*cos(theta)),abs(nstds*sigma_y*sin(theta)));
  xmax = ceil(max(1,xmax));
  ymax = max(abs(nstds*sigma_x*sin(theta)),abs(nstds*sigma_y*cos(theta)));
  ymax = ceil(max(1,ymax));
  xmin = -xmax;
  ymin = -ymax;
  [x,y] = meshgrid(xmin:0.05:xmax,ymin:0.05:ymax);
  % Rotation
  x_theta = x*cos(theta) + y*sin(theta);
  y_theta = -x*sin(theta) + y*cos(theta);
  gb = exp(-.5*(x_theta.^2/sigma_x^2+y_theta.^2/sigma_y^2)).* cos(2*pi/lambda*x_theta+psi);

An equivalent Python implementation would be:

import numpy as np
import matplotlib.pyplot as mp
def gabor_fn(sigma = 1, theta = 1, g_lambda = 4, psi = 2, gamma = 1):
    # Calculates the Gabor function with the given parameters
    sigma_x = sigma
    sigma_y = sigma/gamma
    # Boundingbox:
    nstds = 3
    xmax = max( abs(nstds*sigma_x * np.cos(theta)), abs(nstds*sigma_y * np.sin(theta)) )
    ymax = max( abs(nstds*sigma_x * np.sin(theta)), abs(nstds*sigma_y * np.cos(theta)) )
    xmax = np.ceil(max(1,xmax))
    ymax = np.ceil(max(1,ymax))
    xmin = -xmax
    ymin = -ymax
    numPts = 201    
    (x,y) = np.meshgrid(np.linspace(xmin, xmax, numPts), np.linspace(ymin, ymax, numPts) ) 
    # Rotation
    x_theta =  x * np.cos(theta) + y * np.sin(theta)
    y_theta = -x * np.sin(theta) + y * np.cos(theta)
    gb = np.exp( -0.5* (x_theta**2/sigma_x**2 + y_theta**2/sigma_y**2) ) * \
         np.cos( 2*np.pi/g_lambda*x_theta + psi )
    return gb
if __name__ == '__main__':
    # Main function: calculate the Gabor function for the default parameters and show it
    gaborValues = gabor_fn()
    mp.imshow(gaborValues)
    mp.show()

Image Processing

One major technical tool to understand is the way a computer handles images. We have to know how we can edit images, and what techniques exist to rearrange them.

Image Representation

Representation of graylevel images.

For a computer, an image is nothing more than a huge collection of little squares. These squares are called "pixels". In a grayscale image, each of these pixels carries a number n, typically with 0\leq n \leq 255. This number n represents the exact gray value of that square in the image. This means that in a grayscale image we can use 256 different levels of gray, where 255 means a white spot and 0 means the square is black. In this representation, every pixel uses exactly 1 byte (or 8 bits) of memory (since 2^8 = 256). If you think it is necessary to have more gray levels in your image, this is not a problem: you can simply use more memory to save the picture. But remember that this quickly adds up for large images. Furthermore, quite often your display device (e.g. your monitor) cannot show more than these 256 different gray levels anyway.

File:ImageRepresentation Color.png
Image represented with RGB-notation

Representing a colourful image is only slightly more complicated than the grayscale picture. All you have to know is that the computer works with an additive colour mixture of the three primary colors Red, Green and Blue. These are the so-called RGB colours.

These images are also saved as pixels. But now every pixel stores 3 values between 0 and 255, one for each color. So now we have 256^3 = 16,777,216 different colours which can be represented. Similar to the grayscale images, no color means black, and full intensity in all channels means white. That is, the colour (0,0,0) is black, whereas (0,0,255) is blue and (255,255,255) is white.
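In NumPy, these representations are simply 2-D and 3-D arrays of 8-bit values; a minimal sketch:

```python
import numpy as np

# A grayscale image is a 2-D array of 8-bit values (1 byte per pixel) ...
gray = np.zeros((4, 4), dtype=np.uint8)   # 4x4 black grayscale image
gray[1, 2] = 255                          # one white pixel

# ... an RGB image adds a third axis with one value per color channel.
rgb = np.zeros((4, 4, 3), dtype=np.uint8) # 4x4 black RGB image
rgb[0, 0] = (0, 0, 255)                   # one pure-blue pixel

print(gray.nbytes, rgb.nbytes)            # 16 bytes vs 48 bytes
```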



WARNING - There are two common, but different ways to describe the location of a point in 2 dimensions: 1) The x/y notation, with x typically pointing to the right; 2) The row/column notation, with rows counted from the top. Carefully check which coordinates you are using to describe your data, as the two descriptions are not consistent!

Image Filtering

1D Filter

In many technical applications, we find some primitive basis in which we can easily describe features. In the 1-dimensional case, filters are straightforward, and the same ideas carry over to images. The so-called "Savitzky-Golay filter" allows smoothing of incoming signals. It was described in 1964 by Abraham Savitzky and Marcel J. E. Golay, and is a finite impulse response (FIR) filter.

For better understanding, let's look at an example. In 1-D we usually deal with vectors. One such given vector we call x, with \mathbf{x} = (x_1,x_2,\dots,x_n), \; n \in \mathbb{N}. Our purpose is to smooth that vector x. To do so, all we need is another vector \mathbf{w} = (w_1,w_2,\dots,w_m) with m < n, which we call the weight vector.

Filter 1D Principle.png

With y(k)=\displaystyle \sum_{i=1}^m w(i)\,x(k-m+i) we now obtain a smoothed vector y. This vector is smoother than the one before, because each entry stores a weighted average over a few entries of the original vector: each new entry depends on some entries to the left and right of the entry being smoothed. One drawback of this approach is that the new vector y has only n-m+1 entries, instead of the n entries of the original vector x.

Plotting this new vector yields a curve with the same overall shape as before, just with the fluctuations attenuated. So the essential structure of the data is preserved, but with less noise.
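The smoothing described above can be sketched with a simple 3-point averaging weight vector (a special case of the general weighted filter; proper Savitzky-Golay weights would be obtained from a local polynomial fit instead):

```python
import numpy as np

x = np.array([1.0, 2.0, 6.0, 2.0, 1.0, 5.0, 1.0])
w = np.ones(3) / 3.0       # 3-point averaging weights (symmetric)

# 'valid' keeps only positions where the window fully overlaps the data,
# so the output has n - m + 1 samples.
y = np.convolve(x, w, mode='valid')
print(len(x), len(y))      # 7 -> 5
print(y)                   # first entry: (1 + 2 + 6) / 3 = 3.0
```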

2D Filter

Going from the 1-D case to the 2-D case is done by simply turning the vectors into matrices. As already mentioned, a gray-level image is, for a computer or for a software tool such as MATLAB, nothing more than a huge matrix filled with natural numbers, often between 0 and 255.

Filter 2D Principle.png

The weight vector is now a weight matrix, but we still apply the filter by adding up the element-wise products: y(n,m)=\displaystyle \sum_{i=1}^k \sum_{j=1}^l w_{ij}\, x(n-1+i,\,m-1+j)
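A direct (if slow) sketch of this 2-D filtering formula, applied to a tiny test image:

```python
import numpy as np

def filter2d(img, w):
    # Slide the k x l weight matrix over the image and sum the
    # element-wise products, exactly as in the formula above.
    k, l = w.shape
    n, m = img.shape
    out = np.zeros((n - k + 1, m - l + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(w * img[i:i + k, j:j + l])
    return out

img = np.zeros((5, 5))
img[2, 2] = 9.0               # a single bright pixel
w = np.ones((3, 3)) / 9.0     # 3x3 averaging weights

blurred = filter2d(img, w)
print(blurred)                # the bright pixel is spread over a 3x3 region
```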

Dilation and Erosion

For linear filters, as seen before, it holds that they are commutative. To cite Wikipedia: one says that x commutes with y under ∗ if:

 x * y = y * x \,

In other words, it does not matter how many different linear filters you use, or in which sequence. E.g., if a Savitzky-Golay filter is applied to some data, and then a second Savitzky-Golay filter for calculating the first derivative, the result is the same if the sequence of filters is reversed. It even holds that there exists a single filter which does the same as the two applied in sequence.

In contrast, morphological operations on an image are non-linear operations, and the final result depends on the sequence. If we think of any image, it is defined by pixels with values x_{ij}. Further, this image is assumed to be a black-and-white image, so we have

 x_{ij} \in \{0,\,1\} \quad \forall\, i,j

To define a morphological operation we have to set a structuring element SE, for example a 3x3 matrix covering part of the image.

The definition of erosion E says:

E(M)=\begin{cases} 0, & \text{if } \sum_{i,j=1}^{3} (se)_{ij} < 9 \\ 1, & \text{else} \end{cases} \quad \text{with } (se)_{ij},\,M \in SE

So in words: if any of the pixels covered by the structuring element has value 0, the erosion sets M, the pixel under consideration, to zero; otherwise E(M)=1.

And for the dilation D it holds: if any value within the structuring element is 1, the dilation of M, D(M), is set to 1.

D(M)=\begin{cases} 1, & \text{if } \sum_{i,j=1}^{3} (se)_{ij} \geq 1 \\ 0, & \text{else} \end{cases} \quad \text{with } (se)_{ij},\,M \in SE

Square Morphological.jpg

Compositions of Dilation and Erosion: Opening and Closing of Images

There are two compositions of dilation and erosion: one called opening, the other called closing. It holds:

     \text{opening} = \text{dilation} \circ \text{erosion}
     \text{closing} = \text{erosion} \circ \text{dilation}
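A minimal NumPy sketch of erosion, dilation, and their compositions for a binary image with a 3x3 structuring element (border pixels are simply left at 0 here):

```python
import numpy as np

def erode(img):
    out = np.zeros_like(img)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = img[i-1:i+2, j-1:j+2].min()  # 1 only if all 9 are 1
    return out

def dilate(img):
    out = np.zeros_like(img)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            out[i, j] = img[i-1:i+2, j-1:j+2].max()  # 1 if any of the 9 is 1
    return out

def opening(img):
    return dilate(erode(img))

def closing(img):
    return erode(dilate(img))

img = np.zeros((7, 7), dtype=int)
img[2:5, 2:5] = 1        # a 3x3 white square ...
img[0, 0] = 1            # ... plus an isolated noise pixel

# Opening removes features smaller than the structuring element,
# so the noise pixel disappears while the square survives:
print(opening(img).sum())
```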


  1. E Aydiner, AM Vural, B Ozcelik, K Kiymac, U Tan (2003), A simple chaotic neuron model: stochastic behavior of neural networks 
  2. E Chicca, G Indiveri, R Douglas (2004), An event-based VLSI network of Integrate-and-Fire Neurons 
  3. E Chicca, G Indiveri, R Douglas (2003), An adaptive silicon synapse 
  4. RJ Douglas, MA Mahowald (2003), Silicon Neuron 
  5. DO Hebb (1949), The organization of behavior 
  6. G Indiveri, E Chicca, R Douglas (2004), A VLSI reconfigurable network of integrate-and-fire neurons with spike-based learning synapses 
  7. G Indiveri, F Stefanini, E Chicca (2010), Spike-based learning with a generalized integrate and fire silicon neuron 
  8. PA Koplas, RL Rosenberg, GS Oxford (1997), The role of calcium in the densensitization of capsaisin responses in rat dorsal root ganglion neurons 
  9. a b J Lazzaro, S Ryckebusch, MA Mahowald, CA Mead (1989), Winner-Take-All: Networks of O(N) Complexity 
  10. CA Mead (1989), Analog VLSI and Neural Systems 
  11. S Mitra, G Indiveri, RE Cummings (2010), Synthesis of log-domain integrators for silicon synapses with global parametric control 
  12. G Rachmuth, HZ Shouval, MF Bear, CS Poon (2011), A biophysically-based neuromorphic model of spike rate-timing-dependent plasticity 
  13. M Riesenhuber, T Poggio (1999), Hierarchical models of object recognition in cortex 
  14. SC Liu, J Kramer, T Delbrück, G Indiveri, R Douglas (2002), Analog VLSI: Circuits and Principles 
  15. WM Siebert (1965), Some implications of the stochastic behavior of primary auditory neurons 
  16. a b c d Conway, Bevil R (2009). "Color vision, cones, and color-coding in the cortex". The neuroscientist 15: 274-290. 
  17. Russell, Richard and Sinha, Pawan (2007). "Real-world face recognition: The importance of surface reflectance properties". Perception 36 (9). 
  18. Gegenfurtner, Karl R and Rieger, Jochem (2000). "Sensory and cognitive contributions of color to the recognition of natural scenes". Current Biology 10 (13): 805-808. 
  19. Changizi, Mark A and Zhang, Qiong and Shimojo, Shinsuke (2006). "Bare skin, blood and the evolution of primate colour vision". Biology letters 2 (2): 217-221. 
  20. a b Beretta, Giordano (2000). Understanding Color. Hewlett-Packard. 
  21. a b Boynton, Robert M (1988). "Color vision". Annual review of psychology 39 (1): 69-100. 
  22. Grassmann, Hermann (1853). "Zur theorie der farbenmischung". Annalen der Physik 165 (5): 69-84. 
  23. Konig, Arthur and Dieterici, Conrad (1886). "Die Grundempfindungen und ihre intensitats-Vertheilung im Spectrum". Koniglich Preussischen Akademie der Wissenschaften. 
  24. Smith, Vivianne C and Pokorny, Joel (1975). "Spectral sensitivity of the foveal cone photopigments between 400 and 500 nm". Vision research 15 (2): 161-171. 
  25. Vos, JJ and Walraven, PL (1971). "On the derivation of the foveal receptor primaries". Vision Research 11 (8): 799-818. 
  26. a b c Gegenfurtner, Karl R and Kiper, Daniel C (2003). "Color vision". Neuroscience 26 (1): 181. 
  27. Kaiser, Peter K and Boynton, Robert M (1985). "Role of the blue mechanism in wavelength discrimination". Vision research 125 (4): 523-529. 
  28. Paulus, Walter and Kroger-Paulus, Angelika (1983). "A new concept of retinal colour coding". Vision research 23 (5): 529-540. 
  29. Nerger, Janice L and Cicerone, Carol M (1992). "The ratio of L cones to M cones in the human parafoveal retina". Vision research 32 (5): 879-888. 
  30. Neitz, Jay and Carroll, Joseph and Yamauchi, Yasuki and Neitz, Maureen and Williams, David R (2002). "Color perception is mediated by a plastic neural mechanism that is adjustable in adults". Neuron 35 (4): 783-792. 
  31. Jacobs, Gerald H and Williams, Gary A and Cahill, Hugh and Nathans, Jeremy (2007). "Emergence of novel color vision in mice engineered to express a human cone photopigment". Science 315 (5819): 1723-1725. 

Auditory System


The sensory system for the sense of hearing is the auditory system. This wikibook covers the physiology of the auditory system, and its application to the most successful neurosensory prosthesis - cochlear implants. The physics and engineering of acoustics are covered in a separate wikibook, Acoustics. An excellent source of images and animations is "Journey into the world of hearing" [1].

The ability to hear is not found as widely in the animal kingdom as other senses like touch, taste and smell. It is restricted mainly to vertebrates and insects. Within these, mammals and birds have the most highly developed sense of hearing. The table below shows frequency ranges of humans and some selected animals:

  • Humans: 20-20'000 Hz
  • Whales: 20-100'000 Hz
  • Bats: 1'500-100'000 Hz
  • Fish: 20-3'000 Hz

The organ that detects sound is the ear. It acts as a receiver, collecting acoustic information and passing it on through the nervous system to the brain. The ear includes structures for both the sense of hearing and the sense of balance: it not only receives sound as part of the auditory system, but also contributes to the sense of balance and body position.

Mother and child
Humpback whales in the singing position
Townsend's big-eared bat
Hyphessobrycon pulchripinnis fish

Humans have a pair of ears placed symmetrically on both sides of the head which makes it possible to localize sound sources. The brain extracts and processes different forms of data in order to localize sound, such as:

  • the shape of the sound spectrum at the tympanic membrane (eardrum)
  • the difference in sound intensity between the left and the right ear
  • the difference in time-of-arrival between the left and the right ear
  • the difference in time-of-arrival between the original sound and its reflections off the ear itself: the shape of the pinna (its pattern of folds and ridges) modifies incoming sound-waves in a way that helps localize the sound source, especially along the vertical axis.

Healthy, young humans are able to hear sounds over a frequency range from 20 Hz to 20 kHz. We are most sensitive to frequencies between 2000 and 4000 Hz, which is the frequency range of spoken words. The frequency resolution is about 0.2%, which means that one can distinguish between a tone of 1000 Hz and one of 1002 Hz. A sound at 1 kHz can be detected if it deflects the tympanic membrane (eardrum) by less than 1 Angstrom, which is less than the diameter of a hydrogen atom. This extreme sensitivity of the ear may explain why it contains the smallest bone in the human body: the stapes (stirrup), which is 0.25 to 0.33 cm long and weighs between 1.9 and 4.3 mg.

Anatomy of the Auditory System

Human (external) ear

The aim of this section is to explain the anatomy of the auditory system of humans. The chapter illustrates the composition of auditory organs in the sequence that acoustic information proceeds during sound perception.
Please note that the core information for “Sensory Organ Components” can also be found on the Wikipedia page “Auditory system”, excluding some changes like extensions and specifications made in this article. (see also: Wikipedia Auditory system)

The auditory system senses sound waves - changes in air pressure - and converts these changes into electrical signals. These signals can then be processed, analyzed and interpreted by the brain. For the moment, let's focus on the structure and components of the auditory system. The auditory system consists mainly of two parts:

  • the ear and
  • the auditory nervous system (central auditory system)

The ear

The ear is the organ where the first processing of sound occurs and where the sensory receptors are located. It consists of three parts:

  • outer ear
  • middle ear
  • inner ear
Anatomy of the human ear (green: outer ear / red: middle ear / purple: inner ear)

Outer ear

Function: Gathering sound energy and amplification of sound pressure.

The folds of cartilage surrounding the ear canal (external auditory meatus, external acoustic meatus) are called the pinna. It is the visible part of the ear. Sound waves are reflected and attenuated when they hit the pinna, and these changes provide additional information that will help the brain determine the direction from which the sounds came. The sound waves enter the auditory canal, a deceptively simple tube. The ear canal amplifies sounds that are between 3 and 12 kHz. At the far end of the ear canal is the tympanic membrane (eardrum), which marks the beginning of the middle ear.

Middle ear

Micro-CT image of the ossicular chain showing the relative position of each ossicle.

Function: Transmission of acoustic energy from air to the cochlea.
Sound waves traveling through the ear canal hit the tympanic membrane (tympanum, eardrum). This wave information travels across the air-filled tympanic cavity (middle ear cavity) via a series of bones: the malleus (hammer), incus (anvil) and stapes (stirrup). These ossicles act as a lever, converting the lower-pressure eardrum sound vibrations into higher-pressure sound vibrations at another, smaller membrane called the oval (or elliptical) window, which is one of two openings into the cochlea of the inner ear. The second opening is called the round window; it allows the fluid in the cochlea to move.

The malleus articulates with the tympanic membrane via the manubrium, whereas the stapes articulates with the oval window via its footplate. Higher pressure is necessary because the inner ear beyond the oval window contains liquid rather than air. The sound is not amplified uniformly across the ossicular chain. The stapedius reflex of the middle ear muscles helps protect the inner ear from damage.

The middle ear still contains the sound information in wave form; it is converted to nerve impulses in the cochlea.

Inner ear

Structural diagram of the cochlea
Cross section of the cochlea

Function: Transformation of mechanical waves (sound) into electric signals (neural signals).

The inner ear consists of the cochlea and several non-auditory structures. The cochlea is a snail-shaped part of the inner ear. It has three fluid-filled sections: scala tympani (lower gallery), scala media (middle gallery, cochlear duct) and scala vestibuli (upper gallery). The cochlea supports a fluid wave driven by pressure across the basilar membrane, which separates two of the sections (scala tympani and scala media). The basilar membrane is about 3 cm long and between 0.04 and 0.5 mm wide. Reissner's membrane (vestibular membrane) separates scala media and scala vestibuli.

Strikingly, one section, the scala media, contains endolymph, a fluid similar in composition to the intracellular fluid usually found inside cells. The organ of Corti is located in this duct, and transforms mechanical waves into electric signals in neurons. The other two sections, scala tympani and scala vestibuli, are located within the bony labyrinth, which is filled with a fluid called perilymph. The chemical difference between the two fluids endolymph (in scala media) and perilymph (in scala tympani and scala vestibuli) is important for the function of the inner ear.

Organ of Corti

The organ of Corti forms a ribbon of sensory epithelium which runs lengthwise down the entire cochlea. The hair cells of the organ of Corti transform the fluid waves into nerve signals. This transduction is the first step in a chain of processing that ultimately leads to auditory reactions and sensations.

Transition from ear to auditory nervous system

Section through the spiral organ of Corti

Hair cells

Hair cells are columnar cells, each with a bundle of 100-200 specialized cilia at the top, for which they are named. These cilia are the mechanosensors for hearing. The shorter ones are called stereocilia; the single longest one at the edge of each bundle is the kinocilium. The location of the kinocilium determines the on-direction, i.e. the direction of deflection that induces the maximum hair cell excitation. Lightly resting atop the longest cilia is the tectorial membrane, which moves back and forth with each cycle of sound, tilting the cilia and allowing electric current into the hair cell.

The function of hair cells is not yet fully understood. What is known is already sufficient to replace lost hair-cell function with cochlear implants in cases of hearing loss; further research may someday even make it possible to repair the cells themselves. In the current model, the cilia are attached to one another by "tip links", structures which connect the tip of one cilium to another. Stretching and compression of the tip links open an ion channel and produce the receptor potential in the hair cell. Note that a deflection of only 100 nanometers already elicits 90% of the full receptor potential.


The nervous system distinguishes between nerve fibres carrying information towards the central nervous system and nerve fibres carrying the information away from it:

  • Afferent neurons (also sensory or receptor neurons) carry nerve impulses from receptors (sense organs) towards the central nervous system
  • Efferent neurons (also motor or effector neurons) carry nerve impulses away from the central nervous system to effectors such as muscles or glands (and also the ciliated cells of the inner ear)

Afferent neurons innervate cochlear inner hair cells, at synapses where the neurotransmitter glutamate communicates signals from the hair cells to the dendrites of the primary auditory neurons.

There are far fewer inner hair cells in the cochlea than afferent nerve fibers. The neural dendrites belong to neurons of the auditory nerve, which in turn joins the vestibular nerve to form the vestibulocochlear nerve, or cranial nerve VIII.

Efferent projections from the brain to the cochlea also play a role in the perception of sound. Efferent synapses occur on outer hair cells and on afferent (towards the brain) dendrites under inner hair cells.

Auditory nervous system

The sound information, now re-encoded in form of electric signals, travels down the auditory nerve (acoustic nerve, vestibulocochlear nerve, VIIIth cranial nerve), through intermediate stations such as the cochlear nuclei and superior olivary complex of the brainstem and the inferior colliculus of the midbrain, being further processed at each waypoint. The information eventually reaches the thalamus, and from there it is relayed to the cortex. In the human brain, the primary auditory cortex is located in the temporal lobe.

Primary auditory cortex

The primary auditory cortex is the first region of cerebral cortex to receive auditory input.

Perception of sound is associated with the right posterior superior temporal gyrus (STG). The superior temporal gyrus contains several important structures of the brain, including Brodmann areas 41 and 42, marking the location of the primary auditory cortex, the cortical region responsible for the sensation of basic characteristics of sound such as pitch and rhythm.

The auditory association area is located within the temporal lobe of the brain, in an area called Wernicke's area, or area 22. This area, near the lateral cerebral sulcus, is an important region for the processing of acoustic signals so that they can be distinguished as speech, music, or noise.

Auditory Signal Processing

Now that the anatomy of the auditory system has been sketched out, this topic goes deeper into the physiological processes which take place while perceiving acoustic information and converting this information into data that can be handled by the brain. Hearing starts with pressure waves hitting the auditory canal and is finally perceived by the brain. This section details the process transforming vibrations into perception.

Effect of the head

Sound waves with a wavelength shorter than the head produce a sound shadow at the ear further away from the sound source. When the wavelength is longer than the head, diffraction of the sound leads to approximately equal sound intensities at both ears.

Differences in loudness and timing help us to localize the source of a sound signal.

Sound reception at the pinna

With its corrugated shape, the pinna collects sound waves in air, affecting sound coming from behind and from the front differently. The sound waves are reflected and attenuated or amplified; these changes later help with sound localization.

In the external auditory canal, sounds between 3 and 12 kHz - a range crucial for human communication - are amplified: the canal acts as a resonator for the incoming frequencies.

Sound conduction to the cochlea

Sound that has entered the pinna in the form of waves travels along the auditory canal until it reaches the beginning of the middle ear, marked by the tympanic membrane (eardrum). Since the inner ear is filled with fluid, the middle ear acts as an impedance-matching device, solving the problem of sound energy being reflected at the transition from air to fluid. For example, at a transition from air to water 99.9% of the incoming sound energy is reflected. This fraction can be calculated using:

 \frac{I_r}{I_i} = \left ( \frac {Z_2 - Z_1}{Z_2 + Z_1} \right ) ^2

with I_r the intensity of the reflected sound, I_i the intensity of the incoming sound, and Z_k the wave resistance of the two media (Z_air = 414 kg m^-2 s^-1 and Z_water = 1.48*10^6 kg m^-2 s^-1). Three factors contribute to the impedance matching:

  • the relative size difference between tympanum and oval window
  • the lever effect of the middle ear ossicles and
  • the shape of the tympanum.
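The reflection formula above can be evaluated numerically. A minimal sketch, using the impedance values quoted in the text:

```python
# Fraction of incident sound intensity reflected at the boundary between
# two media, following the wave-resistance formula given above.
def reflected_fraction(z1, z2):
    """Return I_r / I_i for sound passing from medium 1 into medium 2."""
    return ((z2 - z1) / (z2 + z1)) ** 2

Z_AIR = 414        # kg m^-2 s^-1
Z_WATER = 1.48e6   # kg m^-2 s^-1

# About 99.9% of the energy is reflected at an air-water transition,
# which is why the impedance matching of the middle ear is needed.
print(round(reflected_fraction(Z_AIR, Z_WATER), 4))  # 0.9989
```

The same function also shows why matched impedances transmit well: for z1 = z2 the reflected fraction is exactly zero.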
Mechanics of the amplification effect of the middle ear.

The longitudinal changes in air pressure of the sound-wave cause the tympanic membrane to vibrate which, in turn, makes the three chained ossicles malleus, incus and stapes oscillate synchronously. These bones vibrate as a unit, transferring the energy from the tympanic membrane to the oval window. In addition, the sound pressure is further enhanced by the difference in area between the tympanic membrane and the stapes footplate. The middle ear acts as an impedance transformer, changing the sound energy collected by the tympanic membrane into greater force and less excursion. This mechanism facilitates the transmission of sound-waves in air into vibrations of the fluid in the cochlea. The transformation results from the piston-like in- and out-motion of the footplate of the stapes, which is located in the oval window. This movement of the footplate sets the fluid in the cochlea into motion.

Through the stapedius muscle, the smallest muscle in the human body, the middle ear has a gating function: contracting this muscle changes the impedance of the middle ear, thus protecting the inner ear from damage through loud sounds.

Frequency analysis in the cochlea

The three fluid-filled compartments of the cochlea (scala vestibuli, scala media, scala tympani) are separated by the basilar membrane and Reissner's membrane. The function of the cochlea is to separate sounds according to their spectrum and transform them into a neural code. When the footplate of the stapes pushes into the perilymph of the scala vestibuli, Reissner's membrane bends into the scala media as a consequence. This elongation of Reissner's membrane causes the endolymph to move within the scala media and induces a displacement of the basilar membrane. The separation of the sound frequencies in the cochlea is due to the special properties of the basilar membrane. The fluid in the cochlea vibrates (due to the in- and out-motion of the stapes footplate), setting the membrane in motion like a traveling wave. The wave starts at the base and progresses towards the apex of the cochlea. The transversal waves in the basilar membrane propagate with

 c_{trans} = \sqrt{\frac{\mu}{\rho}}

with μ the shear modulus and ρ the density of the material. Since the width and tension of the basilar membrane change along its length, the speed of the waves propagating along the membrane decreases from about 100 m/s near the oval window to 10 m/s near the apex.

There is a point along the basilar membrane where the amplitude of the wave decreases abruptly. At this point, the sound wave in the cochlear fluid produces the maximal displacement (peak amplitude) of the basilar membrane. The distance the wave travels before getting to that characteristic point depends on the frequency of the incoming sound. Therefore each point of the basilar membrane corresponds to a specific value of the stimulating frequency. A low-frequency sound travels a longer distance than a high-frequency sound before it reaches its characteristic point. Frequencies are scaled along the basilar membrane with high frequencies at the base and low frequencies at the apex of the cochlea.

The position x of the maximal amplitude of the travelling wave thus corresponds one-to-one to a stimulus frequency.
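This place-frequency map is commonly approximated by the Greenwood function. The sketch below uses the standard human parameter values (A = 165.4, a = 2.1, k = 0.88, with x the relative distance from the apex); these constants are not taken from this text:

```python
# Greenwood place-frequency function: an empirical model of the cochlear
# tonotopic map. x runs from 0 (apex) to 1 (base of the cochlea).
A, a, k = 165.4, 2.1, 0.88   # commonly used values for the human cochlea

def characteristic_frequency(x):
    """Characteristic frequency (Hz) at relative position x on the basilar membrane."""
    return A * (10 ** (a * x) - k)

# Low frequencies map to the apex, high frequencies to the base:
print(round(characteristic_frequency(0.0)))   # apex: ~20 Hz
print(round(characteristic_frequency(1.0)))   # base: ~20'000 Hz
```

Note how the model reproduces the 20 Hz - 20 kHz hearing range quoted earlier, with the frequency axis laid out logarithmically along the membrane.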

Sensory transduction in the cochlea

Most everyday sounds are composed of multiple frequencies. The brain processes the distinct frequencies, not the complete sounds. Due to its inhomogeneous properties, the basilar membrane performs an approximation to a Fourier transform. The sound is thereby split into its different frequencies, and each hair cell on the membrane corresponds to a certain frequency. The loudness of the frequencies is encoded by the firing rate of the corresponding afferent fiber; this is due to the amplitude of the traveling wave on the basilar membrane, which depends on the loudness of the incoming sound.
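As an illustration of this frequency decomposition, the sketch below splits a synthetic two-tone signal into its components with a discrete Fourier transform (the tone frequencies and sampling rate are arbitrary choices):

```python
import numpy as np

# Decompose a signal containing 440 Hz and 1000 Hz components,
# mimicking the frequency separation performed by the basilar membrane.
fs = 8000                          # Hz, sampling rate
t = np.arange(0, 1.0, 1 / fs)      # 1 second of signal
signal = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)

spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two largest spectral peaks recover the component frequencies.
peaks = sorted(freqs[np.argsort(spectrum)[-2:]].tolist())
print(peaks)   # [440.0, 1000.0]
```

In the cochlea, each of these peaks would excite hair cells at a different place along the basilar membrane, with the louder 440 Hz component driving a higher firing rate.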

Transduction mechanism in auditory or vestibular hair cell. Tilting the hair cell towards the kinocilium opens the potassium ion channels. This changes the receptor potential in the hair cell. The resulting emission of neurotransmitters can elicit an action potential (AP) in the post-synaptic cell.
Auditory hair cells are very similar to those of the vestibular system. Here, an electron microscopy image of a frog's sacculus hair cell.

The sensory cells of the auditory system, known as hair cells, are located along the basilar membrane within the organ of Corti. Each organ of Corti contains about 16'000 such cells, innervated by about 30'000 afferent nerve fibers. There are two anatomically and functionally distinct types of hair cells: the inner and the outer hair cells. Along the basilar membrane these two types are arranged in one row of inner cells and three to five rows of outer cells. Most of the afferent innervation comes from the inner hair cells, while most of the efferent innervation goes to the outer hair cells. The inner hair cells influence the discharge rate of the individual auditory nerve fibers that connect to them, and thereby transfer sound information to higher auditory centers. The outer hair cells, in contrast, amplify the movement of the basilar membrane by injecting energy into the motion of the membrane and reducing frictional losses, but do not themselves transmit sound information. The motion of the basilar membrane deflects the stereocilia (the hairs on the hair cells) and causes the membrane potential of the hair cells to become less negative (depolarization) or more negative (hyperpolarization), depending on the direction of the deflection. When the stereocilia are in the resting position, a steady-state current flows through the channels of the cells; the movement of the stereocilia therefore modulates the current flow around that steady-state current.

Let's look at the modes of action of the two different hair cell types separately:

  • Inner hair cells:

The deflection of the hair-cell stereocilia opens mechanically gated ion channels that allow small, positively charged potassium ions (K+) to enter the cell, causing it to depolarize. Unlike many other electrically active cells, the hair cell itself does not fire an action potential. Instead, the influx of positive ions from the endolymph in the scala media depolarizes the cell, resulting in a receptor potential. This receptor potential opens voltage-gated calcium channels; calcium ions (Ca2+) then enter the cell and trigger the release of neurotransmitters at the basal end of the cell. The neurotransmitters diffuse across the narrow space between the hair cell and a nerve terminal, where they bind to receptors and thus trigger action potentials in the nerve. In this way, the neurotransmitter increases the firing rate in the VIIIth cranial nerve, and the mechanical sound signal is converted into an electrical nerve signal.
The repolarization of the hair cell is accomplished in a special manner: the perilymph in the scala tympani has a very low concentration of positive ions, so the electrochemical gradient drives the positive ions through channels out into the perilymph. (see also: Wikipedia Hair cell)

  • Outer hair cells:

In human outer hair cells, the receptor potential triggers active vibrations of the cell body. This mechanical response to electrical signals, termed somatic electromotility, drives oscillations in the cell's length at the frequency of the incoming sound and provides mechanical feedback amplification. Outer hair cells have evolved only in mammals. Without functioning outer hair cells, sensitivity decreases by approximately 50 dB (due to greater frictional losses in the basilar membrane, which damp the motion of the membrane). Functioning outer hair cells also improve frequency selectivity (frequency discrimination), which is of particular benefit for humans because it enables sophisticated speech and music. (see also: Wikipedia Hair cell)

With no external stimulation, auditory nerve fibres discharge action potentials in a random time sequence. This random time firing is called spontaneous activity. The spontaneous discharge rates of the fibers vary from very slow rates to rates of up to 100 per second. Fibers are placed into three groups depending on whether they fire spontaneously at high, medium or low rates. Fibers with high spontaneous rates (> 18 per second) tend to be more sensitive to sound stimulation than other fibers.

Auditory pathway of nerve impulses

Lateral lemniscus in red, as it connects the cochlear nucleus, superior olivary nucleus and the inferior colliculus. Seen from behind.

So in the inner hair cells the mechanical sound signal is finally converted into electrical nerve signals. The inner hair cells are connected to auditory nerve fibres whose nuclei form the spiral ganglion. In the spiral ganglion the electrical signals (electrical spikes, action potentials) are generated and transmitted along the cochlear branch of the auditory nerve (VIIIth cranial nerve) to the cochlear nucleus in the brainstem.

From there, the auditory information is divided into at least two streams:

  • Ventral Cochlear Nucleus:

One stream is the ventral cochlear nucleus which is split further into the posteroventral cochlear nucleus (PVCN) and the anteroventral cochlear nucleus (AVCN). The ventral cochlear nucleus cells project to a collection of nuclei called the superior olivary complex.

Superior olivary complex: Sound localization

The superior olivary complex - a small mass of gray matter - is believed to be involved in the localization of sounds in the azimuthal plane (i.e. their angle to the left or the right). There are two major cues for sound localization: interaural level differences (ILD) and interaural time differences (ITD). The ILD measures differences in sound intensity between the ears. This works for high frequencies (over 1.6 kHz), where the wavelength is shorter than the distance between the ears, causing a head shadow - which means that high frequency sounds hit the averted ear with lower intensity. Lower frequency sounds don't cast a shadow, since they wrap around the head. However, since the wavelength is then larger than the distance between the ears, there is a phase difference between the sound waves entering the ears - the timing difference measured by the ITD. This works very precisely for frequencies below 800 Hz, where the ear distance is smaller than half of the wavelength. Sound localization in the median plane (front, above, back, below) is helped by the outer ear, which forms direction-selective filters.

There, the differences in time and loudness of the sound information in each ear are compared. Differences in sound intensity are processed in cells of the lateral superior olivary complex, and timing differences (runtime delays) in the medial superior olivary complex. Humans can detect timing differences between the left and right ear down to 10 μs, corresponding to a difference in sound location of about 1 deg. This comparison of sound information from both ears allows the determination of the direction the sound came from. The superior olive is the first node where signals from both ears come together and can be compared. As a next step, the superior olivary complex sends information up to the inferior colliculus via a tract of axons called the lateral lemniscus. The function of the inferior colliculus is to integrate information before sending it on to the thalamus and the auditory cortex. Interestingly, the nearby superior colliculus shows an interaction of auditory and visual stimuli.
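The relation between azimuth and ITD can be sketched with a simple plane-wave model; the ear separation and speed of sound below are typical assumed values, not taken from the text:

```python
import math

# Plane-wave model: the sound reaches the far ear later by the extra
# path length d * sin(azimuth), where d is the distance between the ears.
D_EARS = 0.21     # m, assumed typical distance between human ears
C_SOUND = 343.0   # m/s, speed of sound in air at room temperature

def itd_seconds(azimuth_deg):
    """Interaural time difference for a source at the given azimuth (0 = straight ahead)."""
    return D_EARS * math.sin(math.radians(azimuth_deg)) / C_SOUND

# Near the midline, 1 degree of azimuth corresponds to roughly 10 us,
# matching the detection threshold quoted above.
print(f"{itd_seconds(1.0) * 1e6:.1f} us")   # 10.7 us
```

The model also shows why localization saturates toward the side: the ITD grows with sin(azimuth), so it changes most rapidly near the midline.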

  • Dorsal Cochlear Nucleus:

The dorsal cochlear nucleus (DCN) analyzes the quality of sound and projects directly via the lateral lemniscus to the inferior colliculus.

From the inferior colliculus the auditory information from both the ventral and the dorsal cochlear nucleus proceeds to the auditory nucleus of the thalamus, the medial geniculate nucleus. The medial geniculate nucleus in turn transfers the information to the primary auditory cortex, the region of the human brain responsible for the processing of auditory information, located in the temporal lobe. The primary auditory cortex is the first relay involved in the conscious perception of sound.

Primary auditory cortex and higher order auditory areas

Sound information reaches the primary auditory cortex (Brodmann areas 41 and 42), which is tonotopically organized and performs the basics of hearing: pitch and volume. Depending on the nature of the sound (speech, music, noise), the information is then passed on to higher-order auditory areas. Sounds that are words are processed by Wernicke's area (Brodmann area 22), which is involved in understanding written and spoken language (verbal understanding). The production of sound (verbal expression) is linked to Broca's area (Brodmann areas 44 and 45). The muscles required to produce sound when speaking are controlled by the facial area of the motor cortex, a region of the cerebral cortex involved in planning, controlling and executing voluntary movements.

Lateral surface of the brain with Brodmann's areas numbered.

Human Speech



The intensity of sound is typically expressed in decibels (dB), defined as

 SPL = 20 \cdot \log_{10} \frac{p}{p_0}

where SPL = “sound pressure level” (in dB), and the reference pressure is p_0 = 2 \cdot 10^{-5} N/m^2 . Note that this is much smaller than the atmospheric pressure (ca. 10^5 N/m^2)! Also watch out, because sound levels are often expressed relative to "Hearing Level" (HL) instead of SPL.

  • 0 - 20 dB SPL ... hearing threshold (0 dB for sinusoidal tones from 1 kHz to 4 kHz)
  • 60 dB SPL ... medium loud tone, conversational speech
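As a quick numeric check of the SPL formula above, here is a minimal Python sketch; the pressure value for conversational speech is back-computed from the 60 dB SPL figure:

```python
import numpy as np

def spl(p, p0=2e-5):
    """Sound pressure level in dB, relative to the reference p0 = 2e-5 N/m^2."""
    return 20 * np.log10(p / p0)

# Conversational speech at ~60 dB SPL corresponds to a pressure of
# p = p0 * 10**(60/20) = 0.02 N/m^2 - still tiny compared to atmospheric pressure.
print(spl(2e-2))   # 60.0
print(spl(2e-5))   # 0.0, the reference level
```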

The fundamental frequency, from the vibrations of the vocal cords in the larynx, is about 120 Hz for adult males, 250 Hz for adult females, and up to 400 Hz for children.

Frequency- and loudness-dependence of human hearing loss.


Formants are the dominant frequencies in human speech, and are caused by resonances of the signals from the vocal cords in our mouth etc. Formants show up as distinct peaks of energy in the sound's frequency spectrum. They are numbered in ascending order, starting with the formant at the lowest frequency.

Spectrogram of the German vowels "a,e,i,o,u". These correspond approximately to the vowels in the English words "hut, hat, hit, hot, put". Calculated using the MATLAB command "spectrogram(data, 512,256, 512, fs)". The chapter Power Spectrum of Non-stationary Signals below describes the mathematics behind the spectrogram.
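For readers working in Python rather than MATLAB, a roughly equivalent spectrogram call is sketched below. The 440 Hz test tone is a stand-in for real recorded speech, which is assumed not to be at hand:

```python
import numpy as np
from scipy import signal

fs = 8000                        # assumed sampling rate [Hz]
t = np.arange(0, 1, 1/fs)
data = np.sin(2*np.pi*440*t)     # synthetic stand-in for a recorded vowel

# Rough equivalent of the MATLAB call spectrogram(data, 512, 256, 512, fs):
# 512-sample Hamming window, 256-sample overlap, 512-point FFT
f, tt, Sxx = signal.spectrogram(data, fs, window='hamming',
                                nperseg=512, noverlap=256, nfft=512)
print(Sxx.shape)                 # (frequency bins, time slices)
```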


Speech is often considered to consist of a sequence of acoustic units called phons, which correspond to linguistic units called phonemes. Phonemes are the smallest units of sound that allow different words to be distinguished. The word "dog", for example, contains three phonemes. Changes to the first, second, and third phoneme respectively produce the words "log", "dig", and "dot". English is said to contain 40 different phonemes, specified as in /d/, /o/, /g/ for the word "dog".

Speech Perception

The ability of humans to decode speech signals still easily exceeds that of any algorithm developed so far. While automatic speech recognition has become fairly successful at recognizing clearly spoken speech in environments with a high signal-to-noise ratio, once the conditions become even slightly less than ideal, recognition algorithms tend to perform very poorly compared to humans. It seems from this that our computer speech recognition algorithms have not yet come close to capturing the underlying algorithm that humans use to recognize speech.

Evidence has shown that the perception of speech takes quite a different route in the brain than the perception of other sounds. While studies on non-speech sound responses have generally found the response to be graded with the stimulus, speech studies have repeatedly found a discretization of the response when a graded stimulus is presented. For instance, Lisker and Abramson[2] played a pre-voiced 'b/p' sound. Whether the sound is interpreted as a /b/ or a /p/ depends on the voice onset time (VOT). They found that when smoothly varying the VOT, there was a sharp boundary (at ~20 ms after the consonant is played) where subjects switched their identification from /b/ to /p/. Furthermore, subjects had great difficulty differentiating between two sounds in the same category: a pair of sounds with VOTs of -10 ms and 10 ms, which are both identified as /b/, is much harder to tell apart than a pair with VOTs of 10 ms and 30 ms, which are identified as a /b/ and a /p/. This shows that some type of categorization scheme is at work.

One of the main problems encountered when trying to build a model of speech perception is the so-called 'Lack of Invariance', which could more straightforwardly just be called 'variance'. This term refers to the fact that a single phoneme (e.g. /p/ as in sPeech or Piety) has a great variety of waveforms that map to it, and that the mapping between an acoustic waveform and a phoneme is far from obvious and heavily context-dependent; yet human listeners reliably give the correct result. Even when the context is similar, a waveform will show a great deal of variance due to factors such as the pace of speech, the identity of the speaker, and the tone in which he or she is speaking. While there is no agreed-upon model of speech perception, the existing models can be split into two classes: passive perception and active perception.

Passive Perception Models

Passive perception theories generally describe the problem of speech perception in the same way that most sensory signal-processing algorithms do: some raw input signal goes in, and is processed through a hierarchy where each subsequent step extracts some increasingly abstract signal from the input. One of the early examples of a passive model was distinctive feature theory. The idea is to identify the presence of sets of binary values for certain features, for example 'nasal/oral' or 'vocalic/non-vocalic'. The theory is that a phoneme is interpreted as a binary vector of the presence or absence of these features. These features can be extracted from the spectrogram data. Other passive models, such as those described by Selfridge[3] and Uttley,[4] involve a kind of template-matching, where a hierarchy of processing layers extracts features that are increasingly abstract and invariant to certain irrelevant properties (such as the identity of the speaker when classifying phonemes).

Active Perception Models

An entirely different take on speech perception are active-perception theories. These theories make the point that it would be redundant for the brain to have two parallel systems for speech perception and speech production, given that the ability to produce a sound is so closely tied to the ability to identify it. Proponents of these theories argue that it would be wasteful and complicated to maintain two separate databases: one containing the programs to identify phonemes, and another containing the programs to produce them. They argue that speech perception is actually done by attempting to replicate the incoming signal, thus using the same circuits for phoneme production as for identification. The Motor Theory of speech perception (Liberman et al., 1967) states that speech sounds are identified not by any sort of template matching, but by using the speech-generating mechanisms to try to regenerate a copy of the speech signal. It states that phonemes should not be seen as hidden signals within the speech, but as “cues” that the generating mechanism attempts to reproduce in a pre-speech signal. The theory states that speech-generating regions of the brain learn which speech-precursor signals will produce which sounds through the constant feedback loop of always hearing one's own speech. The babbling of babies, it is argued, is a way of learning how to generate these “cue” sounds from pre-motor signals.[5]

A similar idea is proposed in the analysis-by-synthesis model, by Stevens and Halle.[6] This describes a generative model which attempts to regenerate a similar signal to the incoming sound. It essentially takes advantage of the fact that speech-generating mechanisms are similar between people, and that the characteristic features that one hears in speech can be reproduced by the speaker. As the speaker hears the sound, the speech centers attempt to generate the signal that's coming in. Comparators give constant feedback on the quality of the regeneration. The 'units of perception', are therefore not so much abstractions of the incoming sound, as pre-motor commands for generating the same speech.

Motor theories took a serious hit when a series of studies on what is now known as Broca's aphasia were published. This condition impairs one's ability to produce speech sounds, without impairing the ability to comprehend them, whereas motor theory, taken in its original form, states that production and comprehension are done by the same circuits, so impaired speech production should imply impaired speech comprehension. The existence of Broca's aphasia appears to contradict this prediction.[7]

Current Models

The TRACE model of speech perception. All connections beyond the input layer are bidirectional. Each unit represents some unit of speech, such as a word or a phoneme.

One of the most influential computational models of speech perception is called TRACE.[8] TRACE is a neural-network-like model, with three layers and a recurrent connection scheme. The first layer extracts features from an input spectrogram in temporal order, basically simulating the cochlea. The second layer extracts phonemes from the feature information, and the third layer extracts words from the phoneme information. The model contains feed-forward (bottom-up) excitatory connections, lateral inhibitory connections, and feedback (top-down) excitatory connections. In this model, each computational unit corresponds to some unit of perception (e.g. the phoneme /p/ or the word "preposterous"). The basic idea is that, based on their input, units within a layer compete to have the strongest output. The lateral inhibitory connections result in a sort of winner-takes-all circuit, in which the unit with the strongest input inhibits its neighbors and becomes the clear winner. The feedback connections explain the effect of context-dependent comprehension. For example, suppose the phoneme layer, based on its bottom-up inputs, could not decide whether it had heard a /g/ or a /k/, but that the phoneme was preceded by 'an' and followed by 'ry'. Both the /g/ and /k/ units would initially be equally activated, sending inputs up to the word level, which would already contain excited units corresponding to words such as 'anaconda', 'angry', and 'ankle', activated by the preceding 'an'. The excited word unit for 'angry' would then feed excitation back down to the /g/ unit, tipping the competition in its favor, so that the ambiguous sound is perceived as a /g/.
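The competition step can be illustrated with a minimal sketch of winner-takes-all dynamics through lateral inhibition. This only illustrates the principle, not the actual TRACE update equations; the gain, inhibition strength, and step count are arbitrary choices:

```python
import numpy as np

def compete(inputs, inhibition=1.5, steps=200):
    """Winner-takes-all dynamics: each unit is driven by its input and
    inhibited by the total activity of its competitors; activity is
    clipped at zero, so the loser is eventually silenced."""
    drive = np.asarray(inputs, dtype=float)
    a = drive.copy()
    for _ in range(steps):
        others = a.sum() - a      # activity of the competing units
        a = np.clip(a + 0.1*(drive - inhibition*others - a), 0, None)
    return a

# /g/ and /k/ start nearly tied; a small top-down bias from the word
# layer (here 1.05 vs 1.0) tips the competition toward /g/.
print(compete([1.05, 1.0]))       # /g/ wins, /k/ is driven to zero
```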


  1. NeurOreille and authors (2010). "Journey into the world of hearing". 
  2. Lisker, L.; Abramson (1970). "The voicing dimension: Some experiments in comparative phonetics". in B. Hála, M. Romportl and P. Janota. Proceedings of the 6th International Congress of Phonetic Sciences. Prague: Academia. 
  3. Selfridge, O.C (1959) "Pandemonium: a paradigm for learning". in Proceedings of the Symposium on Mechanisation of Thought Process. National Physics Laboratory.
  4. Uttley, A.M. (July 1966). "The transmission of information and the effect of local feedback in theoretical and neural networks". Brain Research 2 (1): 21–50. doi:10.1016/0006-8993(66)90060-6. 
  5. Liberman, A. M.; Mattingly, I. G.; Turvey (1967). "Language codes and memory codes". in Melton, A. W.; Martin, E.. Coding Processes in Human Memory. V. H. Winston & Sons. pp. 307-334. 
  6. Stevens, K. N.; Halle, M. (1967). "Remarks on analysis by synthesis and distinctive features". in Wathen-Dunn, W.. Models for the perception of speech and visual form: proceedings of a symposium. Cambridge, MA: MIT Press. pp. 88–102. 
  7. Hickok, Gregory (January 2010). "The role of mirror neurons in speech and language processing". Brain and Language 112 (1): 1–2. doi:10.1016/j.bandl.2009.10.006. 
  8. McClelland, James L; Elman, Jeffrey L (January 1986). "The TRACE model of speech perception". Cognitive Psychology 18 (1): 1–86. doi:10.1016/0010-0285(86)90015-0. 

Cochlear Implants

Cochlear implant

A cochlear implant (CI) is a surgically implanted electronic device that replaces the mechanical parts of the auditory system by directly stimulating the auditory nerve fibers through electrodes inside the cochlea. Candidates for cochlear implants are people with severe to profound sensorineural hearing loss in both ears and a functioning auditory nervous system. They are used by post-lingually deaf people to regain some comprehension of speech and other sounds, as well as by pre-lingually deaf children to enable them to gain spoken language skills. (Diagnosis of hearing loss in newborns and infants is done using otoacoustic emissions and/or the recording of auditory evoked potentials.) A more recent development is the use of bilateral implants, which allow recipients basic sound localization.

Parts of the cochlear implant

The implant is surgically placed under the skin behind the ear. The basic parts of the device include:


  • a microphone, which picks up sound from the environment,
  • a speech processor which selectively filters sound to prioritize audible speech and sends the electrical sound signals through a thin cable to the transmitter,
  • a transmitter, which is a coil held in position by a magnet placed behind the external ear, and transmits the processed sound signals to the internal device by electromagnetic induction,


The internal part of a cochlear implant (model Cochlear Freedom 24 RE)
  • a receiver and stimulator secured in bone beneath the skin, which converts the signals into electric impulses and sends them through an internal cable to electrodes,
  • an array of up to 24 electrodes wound through the cochlea, which send the impulses to the nerves in the scala tympani and then directly to the brain through the auditory nerve system

Signal processing for cochlear implants

In normal-hearing subjects, the primary information carrier for speech signals is the envelope, whereas for music it is the fine structure. This is also relevant for tonal languages, like Mandarin, where the meaning of words depends on their intonation. It was also found that the perceived direction of a sound is determined by interaural time delays coded in the fine structure, not by those coded in the envelope, even though it is the speech signal coded in the envelope that is perceived.

The speech processor in a cochlear implant transforms the microphone input signal into a parallel array of electrode signals destined for the cochlea. Algorithms for the optimal transfer function between these signals are still an active area of research. The first cochlear implants were single-channel devices. The raw sound was band-pass filtered to include only the frequency range of speech, then modulated onto a 16 kHz carrier to allow the electrical signal to couple to the nerves. This approach was able to provide very basic hearing, but was extremely limited in that it was completely unable to take advantage of the frequency-location map of the cochlea.

The advent of multi-channel implants opened the door to try a number of different speech-processing strategies to facilitate hearing. These can be roughly divided into Waveform and Feature-Extraction strategies.

Waveform Strategies

These generally involve applying a non-linear gain to the sound (as an input audio signal with a ~30 dB dynamic range must be compressed into an electrical signal with just a ~5 dB dynamic range), and passing it through parallel filter banks. The first waveform strategy to be tried was the Compressed Analog approach. In this system, the raw audio is initially filtered with a gain-controlled amplifier (the gain control reduces the dynamic range of the signal). The signal is then passed through parallel band-pass filters, and the output of these filters goes on to stimulate electrodes at their appropriate locations.

A problem with the Compressed Analog approach was that there was a strong interaction effect between adjacent electrodes. If electrodes driven by two filters happened to be stimulating at the same time, the superimposed stimulation could cause unwanted distortion in the signals of nerve fibers that were within range of both of these electrodes. The solution to this was the Continuous Interleaved Sampling approach, in which the electrodes driven by adjacent filters stimulate at slightly different times. This eliminates the interference effect between nearby electrodes, but introduces the problem that, due to the interleaving, temporal resolution suffers.

Schematic representation of Continuous Interleaved Sampling (CIS). The processing ("Proc") comprises the envelope detection, amplitude compression, digitization, and pulse modulation.

Feature-Extraction Strategies

These strategies focus less on transmitting filtered versions of the audio signal and more on extracting more abstract features of the signal and transmitting them to the electrodes. The first feature-extraction strategies looked for the formants (frequencies with maximum energy) in speech. In order to do this, they would apply wide-band filters (e.g. 270 Hz low-pass for F0 - the base formant, 300 Hz-1 kHz for F1, and 1 kHz-4 kHz for F2), then calculate the formant frequency using the zero-crossings of each of these filter outputs, and the formant amplitude by looking at the envelope of the signal from each filter. Only electrodes corresponding to these formant frequencies would be activated. The main limitation of this approach was that formants primarily identify vowels, and consonant information, which primarily resides in higher frequencies, was poorly transmitted. The MPEAK system later improved on this design by incorporating high-frequency filters which could better simulate unvoiced sounds (consonants) by stimulating high-frequency electrodes, and formant-frequency electrodes at random intervals.[1][2][3]
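The zero-crossing trick for estimating a formant frequency can be sketched as follows. The test signal is a pure tone standing in for a band-passed speech band; the function is an illustration of the idea, not an actual implant algorithm:

```python
import numpy as np

def zero_crossing_freq(x, fs):
    """Estimate the dominant frequency of a band-limited signal from its
    zero-crossing rate: each full period contains two zero crossings."""
    crossings = np.count_nonzero(np.diff(x > 0))  # sign changes
    duration = len(x) / fs
    return crossings / (2 * duration)

fs = 10000
t = np.arange(0, 0.5, 1/fs)
x = np.sin(2*np.pi*700*t)          # stand-in for a band-passed F1 signal
print(zero_crossing_freq(x, fs))   # close to 700 Hz
```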

Current Developments

Block diagram of the SPEAK processing scheme

Currently, the leading strategy is the SPEAK system, which combines characteristics of Waveform and Feature-Detection strategies. In this system, the signal passes through a parallel array of 20 band-pass filters. The envelope is extracted from each of these, several of the most powerful frequencies are selected (how many depends on the shape of the spectrum), and the rest are discarded. This is known as an 'n-of-m' strategy. The amplitudes of the selected channels are then logarithmically compressed, to adapt the wide dynamic range of sound to the much narrower electrical dynamic range of the auditory nerve.
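The n-of-m selection with logarithmic compression can be sketched as below. The channel count, the value of n, and the compression curve are illustrative assumptions, not the actual SPEAK parameters:

```python
import numpy as np

def n_of_m(envelopes, n=6):
    """Keep the n strongest of m channel envelopes, compress them
    logarithmically, and discard (zero) the remaining channels."""
    env = np.asarray(envelopes, dtype=float)
    out = np.zeros_like(env)
    strongest = np.argsort(env)[-n:]                    # indices of the n largest
    out[strongest] = np.log10(1 + 100*env[strongest])   # log compression
    return out

# 20 channel envelopes, as from a 20-band filter bank
env = np.random.rand(20)
stim = n_of_m(env, n=6)
print(np.count_nonzero(stim))   # 6 channels stimulated, the rest discarded
```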

Multiple microphones

On its newest implants, the company Cochlear uses 3 microphones instead of one. The additional information is used for beam-forming, i.e. extracting more information from sound coming from straight ahead. This can improve the signal-to-noise ratio when talking to other people by up to 15 dB, thereby significantly enhancing speech perception in noisy environments.

Integration CI – Hearing Aid

Preservation of low-frequency hearing after cochlear implantation is possible with careful surgical technique and careful attention to electrode design. For patients with remaining low-frequency hearing, the company MedEl offers a combination of a cochlear implant for the higher frequencies and a classical hearing aid for the lower frequencies. This system, called EAS for electric-acoustic stimulation, uses an electrode lead of 18 mm, compared to 31.5 mm for the full CI. (The length of the cochlea is about 36 mm.) This results in a significant improvement of music perception, and improved speech recognition for tonal languages.

Fine Structure

Graph showing how envelope (in red) and phase (black dots, for zero crossings) of a signal can be simply derived with the Hilbert Transform.

For high frequencies, the human auditory system uses only tonotopic coding of information. For low frequencies, however, temporal information is used as well: the auditory nerve fires synchronously with the phase of the signal. In contrast, the original CIs only used the power spectrum of the incoming signal. In its new models, MedEl incorporates the timing information for low frequencies, which it calls fine structure, in determining the timing of the stimulation pulses. This improves music perception, and speech perception for tonal languages like Mandarin.

Mathematically, the envelope and fine structure of a signal can be elegantly obtained with the Hilbert Transform (see Figure). The corresponding Python code is available at [4].
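A minimal sketch of this decomposition, using scipy's hilbert on a synthetic amplitude-modulated tone (the carrier and modulation frequencies are arbitrary choices):

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000
t = np.arange(0, 1, 1/fs)
# 50 Hz carrier (fine structure) with a slow 3 Hz amplitude modulation (envelope)
envelope_true = 1 + 0.5*np.sin(2*np.pi*3*t)
x = envelope_true * np.sin(2*np.pi*50*t)

analytic = hilbert(x)                  # analytic signal x + i*H{x}
envelope = np.abs(analytic)            # instantaneous amplitude (envelope)
phase = np.unwrap(np.angle(analytic))  # instantaneous phase (fine structure)
```

Away from the signal edges, `envelope` recovers the slow modulation, and the slope of `phase` recovers the 50 Hz carrier.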

Virtual Electrodes

The number of electrodes available is limited by the size of the electrodes (and the resulting charge and current densities), and by the current spread along the endolymph. To increase the frequency specificity, one can stimulate two adjacent electrodes simultaneously. Subjects report perceiving this as a single tone at a frequency intermediate between the two electrodes.

Simulation of the stimulation strength of a cochlear implant

Simulation of a cochlear implant

Sound processing in cochlear implants is still the subject of much research, and one of the major product differentiations between the manufacturers. However, the basic sound processing is rather simple, and can be implemented to gain an impression of the quality of sound perceived by patients using a cochlear implant. The first step in the process is to sample some sound and analyze its frequency content. Then a time window is selected, during which we want to find the stimulation strengths of the CI electrodes. There are two ways to achieve that: i) through the use of linear filters (see Gammatone filters); or ii) through the calculation of the power spectrum (see Spectral Analysis).

Cochlear implants and Magnetic Resonance Imaging

With more than 150 000 implantations worldwide, cochlear implants (CIs) have now become a standard method for treating severe to profound hearing loss. As the benefits of CIs have become more evident, payers have become more willing to support them, and due to newborn screening programs in most industrialized nations, many patients receive CIs in infancy and will likely keep them throughout their lives. Some of them may require diagnostic imaging with Magnetic Resonance Imaging (MRI) at some point in their lives. For large segments of the population, including patients suffering from stroke, back pain or headache, MRI has become a standard method for diagnosis. MRI uses pulses of magnetic fields to generate images; devices from 0.2 to 4.0 Tesla are common, current machines typically work with 1.5 Tesla fields, and the radiofrequency power can peak as high as 6 kW in a 1.5 Tesla machine.

Cochlear implants have historically been thought to be incompatible with MRI at magnetic fields higher than 0.2 T. The external parts of the device always have to be removed. There are different regulations for the internal parts of the device. Current US Food and Drug Administration (FDA) guidelines allow limited use of MRI after CI implantation. The Pulsar and Sonata (MED-EL Corp, Innsbruck, Austria) devices are approved for 0.2 T MRI with the magnet in place. The Hi-Res 90K (Advanced Bionics Corp, Sylmar, CA, USA) and the Nucleus Freedom (Cochlear Americas, Englewood, CO, USA) are approved for up to 1.5 T MRI after surgical removal of the internal magnet. Each removal and replacement of the magnet can be done through a small incision under local anesthesia, but the procedure is likely to weaken the pocket of the magnet and risks infection of the patient.

Cadaver studies have shown that there is a risk that the internal magnet may be displaced in a 1.5 T MRI scanner. However, this risk could be eliminated by applying a compression dressing. Nevertheless, the CI produces an artifact that can reduce the diagnostic value of the scan. The size of the artifact will be large relative to the size of the patient’s head, which may be particularly challenging for MRI scans of children. A recent study (Crane et al., 2010) found that the artifact around the area of the CI had a mean anterior-posterior dimension of 6.6 +/- 1.5 cm and a left-right dimension of 4.8 +/- 1.0 cm (mean +/- standard deviation).[5]

Computer Simulations of the Auditory System

Working with Sound

Audio signals can be stored in a variety of formats. They can be uncompressed or compressed, and the encoding can be open or proprietary. On Windows systems, the most common format is the WAV-format. It contains a header with information about the number of channels, sample rate, bits per sample etc. This header is followed by the data themselves. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.

Many programing languages provide commands for reading and writing WAV-files. When working with data in other formats, you have two options:

  • You can convert them into WAV format, and go on from there. A very comprehensive free cross-platform solution to record, convert and stream audio and video is ffmpeg.
  • Or you can obtain special program modules for reading/writing the desired format.

Reminder of Fourier Transformations

To transform a continuous function, one uses the Fourier Integral:

F(k)=\int_{-\infty}^{\infty} {f(t)} \cdot e^{-2 \pi ikt} dt

where k represents frequency. Note that F(k) is a complex value: its absolute value gives us the amplitude of the function, and its phase defines the phase-shift between cosine and sine components.

The inverse transform is given by

f(t)=\int_{-\infty}^{\infty} F(k) \cdot e^{2 \pi ikt} dk
Fourier Transformation: a sum of sine waves can make up any repetitive waveform.

If the data are sampled with a constant sampling frequency and there are N data points,

f(\tau)= \frac{1}{N} \sum_{n=0}^{N-1} F_n e^{2 \pi in \tau /N}

The coefficients Fn can be obtained by

 F_n = \sum_{\tau = 0}^{N-1} f(\tau) \cdot e^{-2 \pi in \tau/N}

Since there is a discrete, limited number of data points, and a discrete, limited number of waves, this transform is referred to as the Discrete Fourier Transform (DFT). The Fast Fourier Transform (FFT) is just a special case of the DFT, where the number of points is a power of 2: N = 2^n .

Note that each F_n is a complex number: its magnitude gives the amplitude of the corresponding frequency component in the signal, and its phase the corresponding phase shift (see illustration). If the signal in the time domain f(t) is real-valued, as is the case with most measured data, this puts a constraint on the corresponding frequency components: in that case we have

 F_n = F_{N-n}^*

A frequent source of confusion is the question: “Which frequency corresponds to F_n?” If there are N data points and the sampling period is T_s, the n^{th} frequency is given by

 f_n = \frac{n}{N \cdot T_s}, 1 \le n \le N (in \; Hz)

In other words, the lowest frequency is \frac{1}{N \cdot T_s} [in Hz], while the highest independent frequency is  \frac{1}{2T_s} , due to the Nyquist-Shannon theorem. Note that in MATLAB, the first return value corresponds to the offset (DC component) of the function, and the second value to n=1!
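The bin-to-frequency mapping, the offset at index 0, and the symmetry constraint F_n = F_{N-n}^* can all be checked numerically; a short numpy sketch with an arbitrarily chosen test tone:

```python
import numpy as np

fs = 1000            # sampling rate -> Ts = 1 ms
N = 1000             # 1 s of data: frequency resolution 1/(N*Ts) = 1 Hz
t = np.arange(N) / fs
x = np.sin(2*np.pi*123*t)      # 123 Hz test tone, exactly on a bin

F = np.fft.fft(x)
# Index 0 is the offset (DC); index n corresponds to n/(N*Ts) Hz.
n_peak = np.argmax(np.abs(F[:N//2]))
print(n_peak)                  # 123, i.e. 123 Hz

# Real-valued input implies the symmetry F_n = conj(F_{N-n}):
print(np.allclose(F[1], np.conj(F[-1])))   # True
```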

Spectral Analysis of Biological Signals

Power Spectrum of Stationary Signals

Most FFT functions and algorithms return the complex Fourier coefficients F_n. If we are only interested in the magnitude of the contribution at the corresponding frequency, we can obtain this information by

 P_n = F_n \cdot F_n^* = |F_n|^2

This is the power spectrum of our signal, and tells us how big the contribution of the different frequencies is.

Power Spectrum of Non-stationary Signals

Often one has to deal with signals whose characteristics change over time. In that case, one wants to know how the power spectrum changes with time. The simplest way is to take only a short segment of data at a time, and calculate the corresponding power spectrum. This approach is called the Short Time Fourier Transform (STFT). However, in that case edge effects can significantly distort the signal, since the Fourier transform assumes the signal to be periodic.

"Hanning window"

To eliminate edge artifacts, the signal can be "windowed", i.e. multiplied with a window function. An example of such a window is shown in the figure above. While some windows provide better frequency resolution (e.g. the rectangular window), others exhibit fewer artifacts such as spectral leakage (e.g. the Hanning window). For a selected section of the signal, the windowed data are obtained by multiplying the signal with the window (left figure):

Effects of windowing a signal.

An example of how cutting out a section of a signal, and applying a window to it, affects the spectral power distribution is shown in the right figure above. (The corresponding Python code can be found at [6].) Note that decreasing the width of the sample window increases the width of the corresponding power spectrum!

Stimulation strength for one time window

To obtain the power spectrum for one selected time window, the first step is to calculate the power spectrum through the Fast Fourier Transform (FFT) of the time signal. The result is the sound intensity in frequency domain, and the corresponding frequencies. The second step is to concentrate those intensities on a few distinct frequencies ("binning"). The result is a sound signal consisting of a few distinct frequencies - the location of the electrodes in the simulated cochlea. Back conversion into the time domain gives the simulated sound signal for that time window.

The following Python functions perform this sound processing for a given signal.

import numpy as np

def pSpect(data, rate):
    '''Calculate the power spectrum and corresponding frequencies, using a Hamming window'''
    nData = len(data)
    window = np.hamming(nData)
    fftData = np.fft.fft(data*window)
    PowerSpect = fftData * fftData.conj() / nData
    freq = np.arange(nData) * float(rate) / nData
    return (np.real(PowerSpect), freq)

def calc_stimstrength(sound, rate=1000, sample_freqs=[100, 200, 400]):
    '''Calculate the stimulation strength for a given sound'''
    # Calculate the power spectrum
    Pxx, freq = pSpect(sound, rate)
    # Generate a matrix that sums the power over the requested frequency bins
    num_electrodes = len(sample_freqs)
    sample_freqs = np.hstack((0, sample_freqs))
    average_freqs = np.zeros([len(freq), num_electrodes])
    for jj in range(num_electrodes):
        average_freqs[(freq > sample_freqs[jj]) & (freq < sample_freqs[jj+1]), jj] = 1
    # Calculate the stimulation strength (the square root has to be taken, to get the amplitude)
    StimStrength = np.sqrt(Pxx).dot(average_freqs)
    return StimStrength

Sound Transduction by Pinna and Outer Ear

The outer ear is divided into two parts: the visible part on the side of the head (the pinna), and the external auditory meatus (outer ear canal) leading to the eardrum, as shown in the figure below. With this structure, the outer ear contributes the 'spectral cues' for sound localization: we can not only detect and identify a sound, but also localize its source. [7]

The Anatomy of the Human Ear

Pinna Function

The pinna’s cone shape enables it to gather sound waves and funnel them into the outer ear canal. On top of that, its various folds make the pinna a resonant cavity which amplifies certain frequencies. Furthermore, the interference effects resulting from sound reflections off the pinna are directionally dependent, and attenuate other frequencies. The pinna can therefore be simulated as a filter function applied to the incoming sound, modulating its amplitude and phase spectra.

Frequency Responses for Sounds from Two Different Directions by the Pinna [8]

The resonance of the pinna cavity can be approximated well by 6 normal modes [9]. Among these normal modes, the first mode, which mainly depends on the concha depth (i.e. the depth of the bowl-shaped part of the pinna nearest the ear canal), is the dominant one.

The cancellation of certain frequencies by pinna reflections is called the “pinna notch”. [9] As shown in the right figure [8], sound transmitted by the pinna travels along two paths: a direct path and a longer reflected path. The paths have different lengths, and thereby produce phase differences. When the path difference is half the sound wavelength, the interference of sounds via the direct and reflected paths is destructive; this is the “pinna notch”. Normally the notch frequency lies between 6 kHz and 16 kHz, depending on the pinna shape. The frequency response of the pinna is also directionally dependent, which makes the pinna contribute spatial cues for sound localization.

Ear Canal Function

The outer ear canal is approximately 25 mm long and 8 mm in diameter, following a tortuous path from the canal entrance to the eardrum. It can be modeled as a cylinder closed at one end, which yields a resonant frequency of around 3 kHz. In this way the outer ear canal amplifies sounds in a frequency range important for human speech. [10]
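The quoted resonance of roughly 3 kHz follows directly from the standard quarter-wavelength formula for a tube closed at one end, f = c/(4L). A quick check with the 25 mm canal length given above (the speed of sound is an assumed standard value, not from the text):

```python
# Sketch: the outer ear canal as a quarter-wavelength resonator.
# A tube closed at one end (the eardrum) resonates at f = c / (4 * L).
c = 343.0   # speed of sound in air at 20 degrees C, in m/s (assumed)
L = 0.025   # canal length of 25 mm, as given in the text

f_resonance = c / (4 * L)
print(f"Resonant frequency: {f_resonance:.0f} Hz")  # Resonant frequency: 3430 Hz
```

The result lands close to the 3 kHz figure cited above, right in the frequency range important for speech.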

Simulation of Outer Ear

Based on the main functions of the outer ear, it is easy to simulate the sound transduction by the pinna and outer ear canal with a filter, or a filter bank, if we know the characteristics of the filter.

Many researchers are working on simulations of the human auditory system, which include the simulation of the outer ear. In the following, a Pinna-Related Transfer Function model is first introduced, followed by two MATLAB toolboxes developed by Finnish and British research groups, respectively.

Model of Pinna-Related Transfer Function by Spagnol

This part is based entirely on the paper published by S. Spagnol, M. Geronazzo, and F. Avanzini. [11] In order to model the functions of the pinna, Spagnol developed a reconstruction model of the Pinna-Related Transfer Function (PRTF), the frequency response characterizing how sound is transduced by the pinna. The model is composed of two distinct filter blocks, accounting for the resonance and reflection functions of the pinna respectively, as shown in the figure below.

General Model for the Reconstruction of PRTFs[11]

There are two main resonances in the frequency range of interest of the pinna [11], which can be represented by two second-order peak filters with fixed bandwidth f_B = 5 kHz [12]:

H_{res} (z)=  \frac{V_0 (1-h)(1-z^{-2})}{1+2dhz^{-1}+(2h-1)z^{-2}}


h=  \frac{1}{1+\tan(\pi\frac{f_B}{f_s})}
 d= -\cos(2\pi \frac{f_C}{f_s} )

where f_s is the sampling frequency, f_C the central frequency, and V_0 = 10^{G/20} the linear gain corresponding to the gain G in dB.

For the reflection part, three second-order notch filters of the form given below [13] are designed, parameterized by the center frequency f_C, the notch depth G, and the bandwidth f_B.

H_{refl}(z)=  \frac{1+(1+k)\frac{H_0}{2}+d(1-k)z^{-1}+(-k-(1+k)\frac{H_0}{2})z^{-2}} {1+d(1-k) z^{-1}-kz^{-2}}

where d is the same as previously defined for the resonance function, and

H_0= V_0-1
k= \frac{\tan(\pi\frac{f_B}{f_s})-V_0}{\tan(\pi\frac{f_B}{f_s})+V_0}

each filter accounting for a different spectral notch.

By cascading the three notch filters in series after the two parallel peak filters, an eighth-order filter is obtained that models the PRTF.
Comparing the synthetic PRTF with the original one, as shown in the figures below, Spagnol concluded that the synthesis model for the PRTF is overall effective. The model may miss notches lying beyond the cutoff frequency, and approximation errors may arise from the possible presence of non-modeled interfering resonances.
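As an illustration, the notch-filter coefficient formulas above can be evaluated numerically. The sketch below (in Python rather than a full MATLAB implementation; the sampling rate and filter parameters are arbitrary assumptions, not values from the paper) builds one second-order notch block and checks its frequency response:

```python
import numpy as np

fs = 48000.0  # sampling frequency in Hz (assumed value, not from the paper)

def notch_coefficients(fC, fB, G):
    """Second-order notch filter (cut case) following the formulas in the
    text: d from the center frequency, k from the bandwidth, V0 = 10^(G/20)."""
    V0 = 10.0 ** (G / 20.0)          # notch depth G given in dB (negative for a cut)
    H0 = V0 - 1.0
    d = -np.cos(2 * np.pi * fC / fs)
    k = (np.tan(np.pi * fB / fs) - V0) / (np.tan(np.pi * fB / fs) + V0)
    b = np.array([1 + (1 + k) * H0 / 2, d * (1 - k), -k - (1 + k) * H0 / 2])
    a = np.array([1.0, d * (1 - k), -k])
    return b, a

def gain_at(b, a, f):
    """Magnitude response at frequency f, evaluated on the unit circle."""
    z = np.exp(-2j * np.pi * f / fs * np.arange(3))  # [1, z^-1, z^-2]
    return abs(np.dot(b, z) / np.dot(a, z))

b, a = notch_coefficients(fC=8000.0, fB=2000.0, G=-12.0)
print(gain_at(b, a, 0.0))     # unity gain away from the notch
print(gain_at(b, a, 8000.0))  # ~0.25, i.e. roughly 10^(-12/20), at the notch center
```

The same pattern applies to the peak filters of the resonance block, with the corresponding coefficient formulas.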

Original vs Synthetic PRTF Plots[11]

HUTear MATLAB Toolbox

Block Diagram of Generic Auditory Model of HUTear

HUTear is a MATLAB toolbox for auditory modeling developed by the Laboratory of Acoustics and Audio Signal Processing at Helsinki University of Technology [14]. This open-source toolbox can be downloaded from here. The structure of the toolbox is shown in the figure on the right.

In this model, there is a block for “Outer and Middle Ear” (OME) simulation. The OME model is based on Glasberg and Moore [15], and the OME filter is usually a linear filter. The auditory filter is generated taking into account the "Equal Loudness Curves at 60 dB" (ELC), "Minimum Audible Field" (MAF), or "Minimum Audible Pressure at the ear canal" (MAP) corrections. This part of the model accounts for the outer ear simulation. By specifying different parameters with the "OEMtool", you can compare the MAP IIR approximation with the MAP data, as shown in the figure below.

UI of OEMtool from HUTear Toolbox

MATLAB Model of the Auditory Periphery (MAP)

MAP is developed by researchers in the Hearing Research Lab at the University of Essex, England [16]. A computer model of the physiological basis of human hearing, MAP is an open-source code package for testing and developing the model, and can be downloaded from here. Its model structure is shown in the figure on the right.

MAP Model Structure

Within the MAP model there is an “Outer Middle Ear (OME)” sub-model, allowing the user to test and create an OME model. In this OME model, the function of the outer ear is modeled as a resonance function composed of two parallel bandpass filters, representing the concha resonance and the outer ear canal resonance, respectively. These two filters are specified by their pass-frequency range, gain, and order. Adding the output of the resonance filters to the original sound pressure wave yields the output of the outer ear model.

To test the OME model, run the function named “testOME.m”. A figure plotting the external ear resonances and the stapes peak displacement will be displayed, as shown in the figure below.

External Ear Resonances and Stapes Peak Displacement from OME Model of MAP


The outer ear, comprising the pinna and the outer ear canal, can be simulated as a linear filter or a filter bank reflecting its resonance and reflection effects on incoming sound. It is worth noting that since the pinna shape varies from person to person, the model parameters, such as the resonant frequencies, depend on the subject.

One aspect not included in the models described above is the Head-Related Transfer Function (HRTF). The HRTF describes how an ear receives a sound from a point source in space. It is not introduced here because it goes beyond the effect of the outer ear (pinna and outer ear canal), as it also includes the effects of the head and torso. There is plenty of literature on the HRTF for the interested reader.

Simulation of the Inner Ear

The shape and organisation of the basilar membrane mean that different frequencies resonate particularly strongly at different points along the membrane. This leads to a tonotopic organisation of the sensitivity to frequency ranges along the membrane, which can be modeled as an array of overlapping band-pass filters known as "auditory filters".[17] The auditory filters are associated with points along the basilar membrane and determine the frequency selectivity of the cochlea, and therefore the listener’s discrimination between different sounds.[18] They are non-linear and level-dependent, and their bandwidth decreases from the base to the apex of the cochlea as the tuning of the basilar membrane changes from high to low frequencies.[18][19] The bandwidth of the auditory filter is called the critical bandwidth, as first suggested by Fletcher (1940). If a signal and a masker are presented simultaneously, only the masker frequencies falling within the critical bandwidth contribute to masking of the signal. The larger the critical bandwidth, the lower the signal-to-noise ratio (SNR) and the more the signal is masked.

ERB related to centre frequency. The diagram shows the ERB versus centre frequency according to the formula of Glasberg and Moore.[18]

Another concept associated with the auditory filter is the "equivalent rectangular bandwidth" (ERB). The ERB shows the relationship between the auditory filter, frequency, and the critical bandwidth. An ERB passes the same amount of energy as the auditory filter it corresponds to and shows how it changes with input frequency.[18] At low sound levels, the ERB is approximated by the following equation according to Glasberg and Moore:[18]

ERB = 24.7 \, (4.37 F + 1) \,

where the ERB is in Hz and F is the centre frequency in kHz.

It is thought that each ERB corresponds to around 0.9 mm on the basilar membrane.[18][19]
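The Glasberg and Moore approximation above is straightforward to evaluate; a minimal sketch:

```python
def erb(F):
    """Equivalent rectangular bandwidth (Hz) at centre frequency F (in kHz),
    using the low-level Glasberg & Moore approximation ERB = 24.7*(4.37*F + 1)."""
    return 24.7 * (4.37 * F + 1)

print(erb(1.0))   # about 132.6 Hz at 1 kHz
print(erb(4.0))   # about 456.5 Hz at 4 kHz
```

As the formula shows, the auditory filter broadens roughly linearly with centre frequency.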

Gammatone Filters

Sample gamma tone impulse response.

One filter type used to model the auditory filters is the "gammatone filter". It provides a simple linear filter describing the movement of one location of the basilar membrane for a given sound input, and is therefore easy to implement. Linear filters are popular for modeling different aspects of the auditory system. In general they are IIR filters (infinite impulse response), incorporating feedforward and feedback, and are defined by

 \sum\limits_{j = 0}^m {{a_{j + 1}}y(k - j)}  = \sum\limits_{i = 0}^n {{b_{i + 1}}x(k - i)}

where a_1 = 1. In other words, the coefficients a_j and b_i uniquely determine this type of filter. The feedback character of these filters becomes more obvious when the equation is rearranged:

 y(k) = {b_1}x(k) + {b_2}x(k - 1) + ... + {b_{n + 1}}x(k - n) - \left( {{a_2}y(k - 1) + ... + {a_{m + 1}}y(k - m)} \right)

(In contrast, FIR filters, or finite impulse response filters, involve only feedforward terms: for them a_i = 0 for i > 1.)
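The rearranged difference equation translates almost directly into code. A minimal, unoptimized Python sketch (using zero-based indexing, so the document's a_1 = 1 becomes a[0] = 1):

```python
def iir_filter(b, a, x):
    """Direct-form IIR filter with a[0] = 1: each output sample combines
    feedforward taps on the input x and feedback taps on past outputs y."""
    y = []
    for k in range(len(x)):
        acc = sum(b[i] * x[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[j] * y[k - j] for j in range(1, len(a)) if k - j >= 0)
        y.append(acc)
    return y

# Impulse response of y(k) = x(k) + 0.5*y(k-1): a geometric decay.
print(iir_filter(b=[1.0], a=[1.0, -0.5], x=[1.0, 0.0, 0.0, 0.0]))
# [1.0, 0.5, 0.25, 0.125]
```

With all feedback coefficients set to zero, the same function computes an FIR filter.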

General description of an "Infinite Impulse Response" filter.

Linear filters cannot account for nonlinear aspects of the auditory system. They are nevertheless used in a variety of models of the auditory system. The gammatone impulse response is given by

g(t) = at^{n-1} e^{-2\pi bt} \cos(2\pi ft + \phi), \,

where f is the frequency, \phi is the phase of the carrier, a is the amplitude, n is the filter's order, b is the filter's bandwidth, and t is time.

This is a sinusoid with an amplitude envelope which is a scaled gamma distribution function.
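The gammatone impulse response can be sampled directly from this formula; the parameter values below (order 4, 1 kHz centre frequency, 125 Hz bandwidth, 16 kHz sampling rate) are illustrative assumptions:

```python
import numpy as np

def gammatone_ir(f, b, n=4, a=1.0, phi=0.0, fs=16000, duration=0.05):
    """Sample the gammatone impulse response
    g(t) = a * t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t + phi)."""
    t = np.arange(0, duration, 1.0 / fs)
    return a * t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f * t + phi)

# A 4th-order gammatone centred at 1 kHz with a 125 Hz bandwidth:
g = gammatone_ir(f=1000.0, b=125.0)
# The gamma envelope rises from zero, peaks, and then decays exponentially.
```

Plotting `g` reproduces the shape shown in the sample impulse response figure above: a tone burst whose envelope is a gamma distribution.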

Variations and improvements of the gammatone model of auditory filtering include the gammachirp filter, the all-pole and one-zero gammatone filters, the two-sided gammatone filter, and filter cascade models, and various level-dependent and dynamically nonlinear versions of these.[20]

For computer simulations, efficient implementations of gammatone models are available for MATLAB and for Python [21].

When working with gammatone filters, Parseval's theorem can be exploited to determine the energy in a given frequency band:

 \int_{-\infty}^{\infty} \left| f(t) \right|^2 dt = \int_{-\infty}^{\infty} \left| F(\omega) \right|^2 d\omega
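In discrete-time simulations the corresponding DFT identity states that the signal energy equals the spectral energy divided by the number of samples; a quick numerical check on a random test signal:

```python
import numpy as np

# Discrete analogue of Parseval's theorem: the energy of a signal equals
# the energy of its DFT divided by the number of samples.
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
X = np.fft.fft(x)

energy_time = np.sum(np.abs(x) ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)
print(np.allclose(energy_time, energy_freq))  # True
```

Applied to the output of a gammatone filter, this gives the energy in that filter's frequency band without an explicit time-domain integration.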


  4. T. Haslwanter (2012). "Hilbert Transformation [Python]". private communications. 
  5. Crane BT, Gottschalk B, Kraut M, Aygun N, Niparko JK (2010) Magnetic resonance imaging at 1.5 T after cochlear implantation. Otol Neurotol 31:1215-1220
  6. T. Haslwanter (2012). "Short Time Fourier Transform [Python]". private communications. 
  7. Semple, M.N. (1998), "Auditory perception: Sounds in a virtual world", Nature (Nature Publishing Group) 396 (6713): 721-724, doi:10.1038/25447 
  8. a b
  9. a b Shaw, E.A.G. (1997), "Acoustical features of the human ear", Binaural and spatial hearing in real and virtual environments (Mahwah, NJ: Lawrence Erlbaum) 25: 47 
  10. Federico Avanzini (2007-2008), Algorithms for sound and music computing, Course Material of Informatica Musicale, pp. 432 
  11. a b c d Spagnol, S. and Geronazzo, M. and Avanzini, F. (2010), "Structural modeling of pinna-related transfer functions", In Proc. Int. Conf. on Sound and Music Computing (SMC 2010) (barcelona): 422-428 
  12. S. J. Orfanidis, ed., Introduction To Signal Processing. Prentice Hall, 1996.
  13. U. Zölzer, ed., Digital Audio Effects. New York, NY, USA: J.Wiley & Sons, 2002.
  15. Glasberg, B.R. and Moore, B.C.J. (1990), "Derivation of auditory filter shapes from notched-noise data", Hearing research (Elsevier) 47 (1-2): 103-138 
  17. Munkong, R. (2008), IEEE Signal Processing Magazine 25 (3): 98--117, doi:10.1109/MSP.2008.918418, Bibcode2008ISPM...25...98M 
  18. a b c d e f Moore, B. C. J. (1998). Cochlear hearing loss. London: Whurr Publishers Ltd.. ISBN 0585122563. 
  19. a b Moore, B. C. J. (1986), "Parallels between frequency selectivity measured psychophysically and in cochlear mechanics", Scand. Audio Suppl. (25): 129–52 
  20. R. F. Lyon, A. G. Katsiamis, E. M. Drakakis (2010). "History and Future of Auditory Filter Models". Proc. ISCAS. IEEE. 
  21. T. Haslwanter (2011). "Gammatone Toolbox [Python]". private communications. 

Vestibular System


The main function of the balance system, or vestibular system, is to sense head movements, especially involuntary ones, and counter them with reflexive eye movements and postural adjustments that keep the visual world stable and keep us from falling. An excellent, more extensive article on the vestibular system is available on Scholarpedia [1]. An extensive review of our current knowledge of the vestibular system can be found in "The Vestibular System: a Sixth Sense" by J. Goldberg et al. [2].

Anatomy of the Vestibular System


Together with the cochlea, the vestibular system is carried by a system of tubes called the membranous labyrinth. These tubes are lodged within the cavities of the bony labyrinth in the inner ear. A fluid called perilymph fills the space between the bone and the membranous labyrinth, while another, called endolymph, fills the inside of the tubes formed by the membranous labyrinth. These fluids have a unique ionic composition suited to their role in regulating the electrochemical potential of the hair cells, which, as we will see later, are the transducers of the vestibular system. The electric potential of the endolymph is about 80 mV more positive than that of the perilymph.

Since our movements consist of a combination of linear translations and rotations, the vestibular system is composed of two main parts: The otolith organs, which sense linear accelerations and thereby also give us information about the head’s position relative to gravity, and the semicircular canals, which sense angular accelerations.

Human bony labyrinth (Computed tomography 3D) Internal structure of the human labyrinth


The otolith organs of both ears are located in two membranous sacs called the utricle and the saccule, which primarily sense horizontal and vertical accelerations, respectively. Each utricle has about 30'000 hair cells, and each saccule about 16'000. The otoliths are located at the central part of the labyrinth, also called the vestibule of the ear. Both the utricle and the saccule have a thickened portion of the membrane called the macula. A gelatinous layer called the otolithic membrane sits atop the macula, and microscopic stones of calcium carbonate crystal, the otoliths, are embedded on the surface of this membrane. On the opposite side, hair cells embedded in supporting cells project into this membrane.

The otoliths are the human sensory organs for linear acceleration. The utricle (left) is approximately horizontally oriented; the saccule (center) lies approximately vertical. The arrows indicate the local on-directions of the hair cells; and the thick black lines indicate the location of the striola. On the right you see a cross-section through the otolith membrane. The graphs have been generated by Rudi Jaeger, while we cooperated on investigations of the otolith dynamics.

Semicircular Canals

Cross-section through ampulla. Top: The cupula spans the lumen of the ampulla from the crista to the membranous labyrinth. Bottom: Since head acceleration exceeds endolymph acceleration, the relative flow of endolymph in the canal is opposite to the direction of head acceleration. This flow produces a pressure across the elastic cupula, which deflects in response.

Each ear has three semicircular canals. They are half circular, interconnected membranous tubes filled with endolymph and can sense angular accelerations in the three orthogonal planes. The radius of curvature of the human horizontal semicircular canal is 3.2 mm [3].

The canals on each side are approximately orthogonal to each other. The orientations of the on-directions of the canals on the right side are [4]:

Canal X Y Z
Horizontal 0.32269 -0.03837 -0.94573
Anterior 0.58930 0.78839 0.17655
Posterior 0.69432 -0.66693 0.27042

(The axes are oriented such that the positive x-, y-, and z-axes point forward, left, and up, respectively. The horizontal plane is defined by Reid's line, the line connecting the lower rim of the orbita and the center of the external auditory canal. The directions are such that a rotation about the given vector, according to the right-hand rule, excites the corresponding canal.) The anterior and posterior semicircular canals are approximately vertical, and the horizontal semicircular canals approximately horizontal.

Orientation of the semicircular canals in the vestibular system. "L / R" stand for "Left / Right", respectively, and "H / A / P" for "Horizontal / Anterior / Posterior". The arrows indicate the direction of head movement that stimulates the corresponding canal.

Each canal presents a dilatation at one end, called the ampulla. Each membranous ampulla contains a saddle-shaped ridge of tissue, the crista, which extends across it from side to side. It is covered by neuroepithelium, with hair cells and supporting cells. From this ridge rises a gelatinous structure, the cupula, which extends to the roof of the ampulla immediately above it, dividing the interior of the ampulla into two approximately equal parts.


The sensors within both the otolith organs and the semicircular canals are the hair cells. They are responsible for the transduction of a mechanical force into an electrical signal and thereby build the interface between the world of accelerations and the brain.

Transduction mechanism in auditory and vestibular hair cells. Tilting the hair cell towards the kinocilium opens potassium ion channels. This changes the receptor potential in the hair cell. The resulting emission of neurotransmitters can elicit an action potential (AP) in the post-synaptic cell.

Hair cells have a tuft of stereocilia that project from their apical surface. The thickest and longest stereocilium is the kinocilium. Stereocilia deflection is the mechanism by which all hair cells transduce mechanical forces. Stereocilia within a bundle are linked to one another by protein strands, called tip links, which span from the side of a taller stereocilium to the tip of its shorter neighbor in the array. Under deflection of the bundle, the tip links act as gating springs to open and close mechanically sensitive ion channels.

Afferent nerve excitation basically works as follows: when all cilia are deflected toward the kinocilium, the gates open and cations, including potassium ions from the potassium-rich endolymph, flow in, and the membrane potential of the hair cell becomes more positive (depolarization). The hair cell itself does not fire action potentials. Instead, the depolarization activates voltage-sensitive calcium channels at the basolateral aspect of the cell. Calcium ions then flow in and trigger the release of neurotransmitters, mainly glutamate, which diffuse across the narrow space between the hair cell and a nerve terminal, where they bind to receptors and thus trigger an increase in the firing rate of action potentials in the nerve. Conversely, afferent nerve inhibition is induced by bending of the stereocilia away from the kinocilium (hyperpolarization), which decreases the firing rate. Because the hair cells chronically leak calcium, the vestibular afferent nerve fires actively at rest, which allows both directions of movement to be sensed (as an increase or a decrease of the firing rate).

Hair cells are very sensitive and respond extremely quickly to stimuli. This quickness may in part be due to the fact that they must be able to release neurotransmitter reliably in response to a threshold receptor potential of only about 100 µV.

Auditory haircells are very similar to those of the vestibular system. Here an electron microscopy image of a frog's sacculus haircell.

Regular and Irregular Haircells

While afferent hair cells in the auditory system are fairly homogeneous, those in the vestibular system can be broadly separated into two groups: "regular units" and "irregular units". Regular hair cells have approximately constant interspike intervals and fire at a rate proportional to their displacement. In contrast, the inter-spike interval of irregular hair cells is much more variable, and their discharge rate increases with increasing frequency; they can thus act as event detectors at high frequencies. Regular and irregular hair cells also differ in their location, morphology, and innervation.

Signal Processing

Peripheral Signal Transduction

Transduction of Linear Acceleration

The hair cells of the otolith organs are responsible for the transduction of the mechanical force induced by linear acceleration into an electrical signal. Since this force combines gravity with the inertial force due to linear movements of the head,

 \vec F = \vec F_g + \vec F_{inertial} = m(\vec g-\frac{d^2\vec x}{dt^2})

it is sometimes referred to as the gravito-inertial force. The mechanism of transduction works roughly as follows: the otoconia, calcium carbonate crystals in the top layer of the otoconia membrane, have a higher specific density than the surrounding materials. A linear acceleration therefore leads to a displacement of the otoconia layer relative to the connective tissue. This displacement is sensed by the hair cells, and the bending of the hairs then polarizes the cells and induces afferent excitation or inhibition.

Excitation (red) and inhibition (blue) on utricle (left) and saccule (right), when the head is in a right-ear-down orientation. The displacement of the otoliths was calculated with the finite element technique, and the orientation of the haircells was taken from the literature.

While each of the three semicircular canals senses only one-dimensional component of rotational acceleration, linear acceleration may produce a complex pattern of inhibition and excitation across the maculae of both the utricle and saccule. The saccule is located on the medial wall of the vestibule of the labyrinth in the spherical recess and has its macula oriented vertically. The utricle is located above the saccule in the elliptical recess of the vestibule, and its macula is oriented roughly horizontally when the head is upright. Within each macula, the kinocilia of the hair cells are oriented in all possible directions.

Therefore, under linear acceleration with the head in the upright position, the saccular macula senses acceleration components in the vertical plane, while the utricular macula encodes acceleration in all directions in the horizontal plane. The otolithic membrane is soft enough that each hair cell is deflected proportionally to the local force direction. If  \vec n  denotes the direction of maximum sensitivity or on-direction of the hair cell, and  \vec F  the gravito-inertial force, the stimulation by static accelerations is given by

 stim_{otolith}= \vec F \cdot \vec n

The direction and magnitude of the total acceleration is then determined from the excitation pattern on the otolith maculae.
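The gravito-inertial force and the resulting hair-cell stimulation can be combined in a few lines. In the sketch below, the head orientation, the acceleration, and the hair-cell on-direction are all illustrative assumptions:

```python
import numpy as np

# Head coordinates: x forward, y left, z up; head upright.
g = np.array([0.0, 0.0, -9.81])      # gravity in m/s^2
a_head = np.array([2.0, 0.0, 0.0])   # forward linear acceleration (illustrative)

# Gravito-inertial force per unit mass: F/m = g - d^2x/dt^2
f_gi = g - a_head

# A utricular hair cell with a forward-pointing on-direction (assumed):
n = np.array([1.0, 0.0, 0.0])
print(np.dot(f_gi, n))   # -2.0: the acceleration projects negatively onto this on-direction
```

The scalar product implements the stimulation formula above; summing such projections over hair cells with all possible on-directions reproduces the excitation pattern across the macula.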

Transduction of Angular Acceleration

The three semicircular canals are responsible for the sensing of angular accelerations. When the head accelerates in the plane of a semicircular canal, inertia causes the endolymph in the canal to lag behind the motion of the membranous canal. Relative to the canal walls, the endolymph effectively moves in the opposite direction as the head, pushing and distorting the elastic cupula. Hair cells are arrayed beneath the cupula on the surface of the crista and have their stereocilia projecting into the cupula. They are therefore excited or inhibited depending on the direction of the acceleration.

The stimulation of a human semicircular canal is proportional to the scalar product between a vector n perpendicular to the plane of the canal and the vector omega indicating the angular velocity.

This facilitates the interpretation of canal signals: if the orientation of a semicircular canal is described by the unit vector  \vec n , the stimulation of the canal is proportional to the projection of the angular velocity  \vec \omega onto this canal

 stim_{canal}= \vec \omega \cdot \vec n
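This projection rule can be checked numerically with the right-side canal orientations tabulated above. The angular velocity below, a rightward head rotation about the vertical axis, is an illustrative assumption:

```python
import numpy as np

# On-directions of the right-side canals (unit vectors from the table above;
# axes: x forward, y left, z up):
canals = {
    "Horizontal": np.array([0.32269, -0.03837, -0.94573]),
    "Anterior":   np.array([0.58930,  0.78839,  0.17655]),
    "Posterior":  np.array([0.69432, -0.66693,  0.27042]),
}

# Rightward head rotation: negative angular velocity about the z-axis
# (right-hand rule), with an arbitrary magnitude of 100 deg/s.
omega = np.array([0.0, 0.0, -100.0])

for name, n in canals.items():
    print(f"{name}: {np.dot(omega, n):6.1f} deg/s")
```

As expected, the right horizontal canal receives by far the largest (positive, i.e. excitatory) stimulation for a rotation about the vertical axis, consistent with the push-pull behaviour described below.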

The horizontal semicircular canal is responsible for sensing angular accelerations about a vertical axis, i.e. the axis of the neck. The anterior and posterior semicircular canals detect rotations of the head in the sagittal plane, as when nodding, and in the frontal plane, as when cartwheeling.

In a given cupula, all the hair cells are oriented in the same direction. The semicircular canals of both sides also work as a push-pull system. Because the right and left horizontal canal cristae are “mirror opposites” of each other, they always have opposing (push-pull) responses to horizontal rotations of the head. A rapid rotation of the head toward the left causes depolarization of the hair cells in the left horizontal canal's ampulla and increased firing of action potentials in the neurons that innervate the left horizontal canal. The same leftward rotation simultaneously causes a hyperpolarization of the hair cells in the right horizontal canal's ampulla and decreases the firing rate of the neurons that innervate the horizontal canal of the right ear. Because of this mirror configuration, not only do the right and left horizontal canals form a push-pull pair, but so do the right anterior canal and the left posterior canal (RALP), and the left anterior and the right posterior canal (LARP).

Central Vestibular Pathways

The information resulting from the vestibular system is carried to the brain, together with the auditory information from the cochlea, by the vestibulocochlear nerve, the eighth of the twelve cranial nerves. The cell bodies of the bipolar afferent neurons that innervate the hair cells in the maculae and cristae of the vestibular labyrinth reside near the internal auditory meatus in the vestibular ganglion (also called Scarpa's ganglion, Figure 10.1). The centrally projecting axons from the vestibular ganglion come together with axons projecting from the auditory neurons to form the eighth nerve, which runs through the internal auditory meatus together with the facial nerve. The primary afferent vestibular neurons project to the four vestibular nuclei that constitute the vestibular nuclear complex in the brainstem.

Vestibulo-ocular reflex.

Vestibulo-Ocular Reflex (VOR)

An extensively studied example of function of the vestibular system is the vestibulo-ocular reflex (VOR). The function of the VOR is to stabilize the image during rotation of the head. This requires the maintenance of stable eye position during horizontal, vertical and torsional head rotations. When the head rotates with a certain speed and direction, the eyes rotate with the same speed but in the opposite direction. Since head movements are present all the time, the VOR is very important for stabilizing vision.

How does the VOR work? The vestibular system signals how fast the head is rotating, and the oculomotor system uses this information to stabilize the eyes in order to keep the visual image motionless on the retina. The vestibular nerves project from the vestibular ganglion to the vestibular nuclear complex, where the vestibular nuclei integrate signals from the vestibular organs with those from the spinal cord, cerebellum, and visual system. From these nuclei, fibers cross to the contralateral abducens nucleus, where they synapse with two additional pathways. One pathway projects directly to the lateral rectus muscle of the eye via the abducens nerve. The other projects from the abducens nucleus, via the abducens interneurons, to the oculomotor nuclei, which contain the motor neurons that drive eye muscle activity, specifically activating the medial rectus muscle of the eye through the oculomotor nerve. This short-latency connection is sometimes referred to as the three-neuron arc, and allows an eye movement to begin less than 10 ms after the onset of the head movement.

For example, when the head rotates rightward, the following occurs. The hair cells of the right horizontal canal depolarize, and those of the left hyperpolarize. The right vestibular afferent activity therefore increases while the left decreases. The vestibulocochlear nerve carries this information to the brainstem, where the activity of the right vestibular nuclei increases while that of the left decreases. In turn, neurons of the left abducens nucleus and the right oculomotor nucleus fire at a higher rate, while those in the left oculomotor nucleus and the right abducens nucleus fire at a lower rate. As a result, the left lateral rectus and the right medial rectus muscles contract, while the left medial rectus and the right lateral rectus relax. Thus, both eyes rotate leftward.

The gain of the VOR is defined as the change in the eye angle divided by the change in the head angle during the head turn

 gain = \frac{\Delta_{Eye}}{\Delta_{Head}}

If the gain of the VOR is wrong, that is, different from one, head movements result in image motion on the retina and thus in blurred vision. Under such conditions, motor learning adjusts the gain of the VOR to produce more accurate eye motion. The cerebellum plays an important role in this motor learning.

The Cerebellum and the Vestibular System

It is known that postural control can be adapted to suit specific behavior. Patient experiments suggest that the cerebellum plays a key role in this form of motor learning. In particular, the role of the cerebellum has been extensively studied in the adaptation of vestibulo-ocular control. It has been shown that the gain of the vestibulo-ocular reflex adapts toward a value of one, even if a part of the VOR pathway is damaged or if the gain is voluntarily modified through the use of magnifying lenses.

There are basically two different hypotheses about how the cerebellum plays a necessary role in this adaptation. The first, from Ito (Ito 1972; Ito 1982), claims that the cerebellum itself is the site of learning, while the second, from Miles and Lisberger (Miles and Lisberger 1981), claims that the vestibular nuclei are the site of adaptive learning while the cerebellum constructs the signal that drives this adaptation. Note that in addition to the direct excitatory input to the vestibular nuclei, the sensory neurons of the vestibular labyrinth also provide input to the Purkinje cells in the flocculo-nodular lobes of the cerebellum via a pathway of mossy and parallel fibers. In turn, the Purkinje cells project an inhibitory influence back onto the vestibular nuclei. Ito argued that the gain of the VOR can be adaptively modulated by altering the relative strength of the direct excitatory and indirect inhibitory pathways. He further argued that a retinal-image-slip signal, carried by climbing fibers through the inferior olivary nucleus, plays the role of an error signal and thereby modulates the Purkinje cells. Miles and Lisberger, on the other hand, argued that the brainstem neurons targeted by the Purkinje cells are the site of adaptive learning, and that the cerebellum constructs the error signal that drives this adaptation.

Vestibular Implants


People with damaged vestibular systems experience a combination of symptoms that may include hearing and vision disturbances, vertigo, dizziness, and spatial disorientation. Currently, there are no effective treatments for patients with weak or damaged vestibular systems. Over the past decade, scientists have developed an electrical stimulating device, similar to cochlear implants, that would restore semicircular canal function. Vestibular implants are intended to restore balance in patients with a damaged vestibular system. The figure below [5] shows a vestibular implant prototype, which is a modified cochlear implant designed by MED-EL (Innsbruck, Austria).

Vestibular implant designed by MED-EL (Innsbruck, Austria).

This vestibular neuroprosthesis prototype contains four major components: an electrical stimulator, three extracochlear electrodes that are placed in the ampullae of each semicircular canal, and an intracochlear array. When the vestibular implant is turned on, trains of electrical stimulation in the form of charge-balanced, biphasic pulses are delivered down each extracochlear electrode toward the respective vestibular nerve [5]. Ultimately, the electrical stimulation is intended to restore balance by stabilizing gaze via the vestibulo-ocular reflex (VOR). Progress toward an implantable prosthesis has shown promising results for effectively restoring normal vestibular sensory transduction of head rotations. However, achieving an accurate stimulation paradigm that chronically encodes three-dimensional head movements without causing undesired neuronal activity remains one of several key challenges.

Vestibular prosthesis evolution (1963-2014)

In 1963, Cohen and Suzuki [6] introduced the notion of a vestibular prosthesis by demonstrating that eye movements can be induced via electrical stimulation of the ampullary branch of a vestibular nerve. Studies that followed were driven to engineer a continuous and accurate stimulation model for rehabilitating patients with different types of vestibular disorders, such as bilateral loss of vestibular function (BVL) and Meniere's disease [5] [7]. Four decades after Cohen and Suzuki's pioneering work, Merfeld and colleagues developed the first vestibular device for generating smooth eye movements by electrically stimulating the vestibular nerve [8] [9]. The feasibility of neuro-electronic vestibular devices further inspired researchers to integrate a motion-detection system to measure head movements. Santina and colleagues [10] [11] [12] [13] used gyroscopic sensors to measure movements in three-dimensional space and encoded this information to generate signals that control the muscles of each eye via the vestibular nerve. As of late 2012, only two groups in the world had conducted vestibular implant studies on humans: a team led by Jay Rubinstein at the University of Washington, and a joint effort between a team led by Herman Kingma at the Maastricht University Medical Center in the Netherlands and a second group led by Jean-Philippe Guyot at the Hôpitaux Universitaires de Genève, Switzerland [5]. Jay Rubinstein led the first vestibular clinical study in 2010. Rubinstein and colleagues successfully installed a vestibular pacemaker to reduce or cease involuntary vertigo attacks in patients diagnosed with Meniere's disease [7]. This device was combined with a handheld controller to start and stop a range of electrical stimuli that could be directed to any or all electrodes, but it did not code for motion [7]. Unfortunately, in implanted patients the vestibular pacemaker resulted in considerable deterioration of both auditory and vestibular function [14] [7] [5].
This group has since taken a new direction, exploring a different electrical stimulation paradigm that incorporates information about motion [14]. The second human clinical study was carried out by Kingma, Guyot, and colleagues in 2012. The vestibular implants used in this study were prototyped by MED-EL. Perez-Fornos and colleagues [5] demonstrated that patients achieved a level of functional recovery that allows them to carry out everyday activities such as walking.

Current progress is being made through ongoing university-industry partnerships. There are four leading University and/or industry partnerships working toward a vestibular prosthesis for clinical applications. These teams include: Rubinstein at the University of Washington and Cochlear Ltd (Lane Cove, Australia), Della Santina's team at the Vestibular NeuroEngineering Laboratory [Johns Hopkins School of Medicine, Baltimore, MD, USA], Daniel Merfeld's team at the Jenks Vestibular Physiology Laboratory at Harvard [Massachusetts Eye and Ear Infirmary, Boston, MA, USA], and a joint-effort between Herman Kingma, Jean-Philippe Guyot, and MED-EL.

Future directions in research

The state-of-the-art vestibular implant technology is a two-step system that produces electrical stimulations to three ampullary nerves in response to rotations around a respective axis (anterior, posterior, or horizontal canals). However, the biophysics of prosthetic nerve stimulation remains a challenge to mimic normal sensory transduction. Even though much is already known about how vestibular nerve afferents encode head movements, it is not yet understood how to design a noninvasive stimulus encoding strategy for a multichannel prosthesis. Active research has continued to focus on overcoming design and signal transduction limitations.

Current neural prostheses are intended to excite the neural tissues in which they are implanted, but the effect of continuous excitatory stimulation can nevertheless cause neurological deficits [7]. Ultimately, a device that can signal head motion in one direction by excitation and in the opposite direction by inhibition is highly desirable. The latest prototype system developed by Santina and colleagues, SCSD1, has shown that direct current stimulation can evoke both excitatory and inhibitory VOR responses [15]. Their results demonstrate that introducing the vestibular system to an artificial baseline can alter the dynamic ranges of excitatory and inhibitory thresholds in unpredictable ways. On the other hand, clinical studies show that humans can adapt within a reasonably short time (a few minutes) to the absence and presence of artificial neural activity [16]. Once adaptation is reached, the amplitude and frequency modulations of the stimulation can be tuned to elicit smooth eye movements of different speeds and directions [16].

Another design limitation of electrical prostheses is the tendency of current to spread away from the targeted nerve tissue and stimulate the wrong canal [17] [18]. As a consequence, this current spread induces misalignment between the axes of eye and head rotation [19]. Mechanisms underlying directional neural plasticity may, however, eventually provide well-aligned responses in humans. Other studies suggest that infrared nerve stimulation is advantageous for targeting specific neurons and is less obtrusive to nearby populations of neurons [17] [19]. The use of optics would allow higher spatial selectivity and improved surgical access [17].

In addition, a fundamental challenge underlying the development of vestibular prosthesis is accounting for ways in which information from vestibular end organs can elicit particular movements. It has been shown that reflex and perceptual responses are dependent on which vestibular afferent inputs are stimulated [14]. Surgical practices are examined for accurate placements of the electrode with respect to the afferents, which in the end could greatly influence the ability to stimulate a desired response.

Because the auditory and vestibular areas of the inner ear are connected, the spread of current beyond the target ampullary nerves and/or risks of surgery could interfere with cochlear nerve activity. It is likely that humans with implants will experience a risk of hearing loss, as observed in rhesus monkeys [20]. Santina and colleagues [20] found that implantation of electrodes caused up to 14 dB of hearing loss and delivery of electrical stimulation further reduced hearing by 0.4-7.8 dB. This study suggests that current spread to cochlear hair cells may cause random activity in nearby cochlear regions.

Computer Simulation of the Vestibular System

Semicircular Canals

Model without Cupula

Simplified semicircular canal, without cupula.

Let us consider the mechanical description of the semi-circular canals (SCC). We will make very strong and reductive assumptions in the following description. The goal here is merely to understand the very basic mechanical principles underlying the semicircular canals.

The first strong simplification we make is that a semicircular canal can be modeled as a circular tube of “outer” radius R and “inner” radius r. (For proper hydromechanical derivations see Damiano and Rabbitt (1996) and Obrist (2005).) This tube is filled with endolymph.

The orientation of the semicircular canal can be described, in a given coordinate system, by a vector  \vec n that is perpendicular to the plane of the canal. We will also use the following notations:

 \theta Rotation angle of tube [rad]
 \dot{\theta} \equiv \frac{d \theta}{dt} Angular velocity of the tube [rad/s]
 \ddot{\theta} \equiv \frac{d^2 \theta}{dt^2} Angular acceleration of the tube [rad/s^2]
 \phi Rotation angle of the endolymph inside the tube [rad], and similar notation for the time derivatives
 \delta = \theta - \phi Relative rotation between the tube and the endolymph [rad].

Note that all these variables are scalar quantities. We use the fact that the angular velocity of the tube can be viewed as the projection of the actual angular velocity vector of the head  \vec \omega onto the plane of the semicircular canal described by  \vec n to go from the 3D environment of the head to our scalar description. That is,

 \dot{\theta} = \vec \omega \cdot \vec n

where the standard scalar product is meant with the dot.
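As a minimal numerical sketch of this projection, consider two hypothetical, axis-aligned canal normals (real canal planes are tilted in the head; these orientations are illustrative only):

```python
import numpy as np

# Hypothetical, axis-aligned canal normals (real canal planes are tilted);
# omega is an arbitrary head angular velocity in rad/s.
omega = np.array([0.0, 0.0, 1.5])           # pure yaw rotation
n_horizontal = np.array([0.0, 0.0, 1.0])    # unit normal of the "horizontal" canal
n_vertical = np.array([0.0, 1.0, 0.0])      # unit normal of a canal orthogonal to it

# theta_dot = omega . n : the scalar stimulation of each canal
theta_dot_h = float(np.dot(omega, n_horizontal))   # fully stimulated: 1.5 rad/s
theta_dot_v = float(np.dot(omega, n_vertical))     # not stimulated: 0.0 rad/s
```

A canal whose plane is perpendicular to the rotation axis receives the full angular velocity; a canal whose normal is orthogonal to the axis receives none.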

To characterize the endolymph movement, consider a free floating piston, with the same density as the endolymph. Two forces are acting on the system:

  1. The inertial moment  I \ddot{\phi} , where I characterizes the inertia of the endolymph.
  2. The viscous moment  B \dot{\delta} , caused by the friction of the endolymph on the walls of the tube.

This gives the equation of motion

 I \ddot{\phi} = B \dot{\delta}

Substituting  \phi = \theta - \delta and integrating gives

 \dot{\theta} = \dot{\delta} + \frac{B}{I} \delta .

Let us now consider the example of a velocity step  \dot{\theta}(t) of constant amplitude  \omega . In this case, we obtain a displacement

 \delta = \frac{I}{B} \omega \cdot (1-e^{-\frac{B}{I}t})

and for  t \gg \frac{I}{B} , we obtain the constant displacement

 \delta \approx \frac{I}{B} \omega .

Now, let us derive the time constant  T_1 \equiv \frac{I}{B} . For a thin tube,  r \ll R , the endolymph has mass  m = \rho \cdot 2 \pi R \cdot \pi r^2 , so its moment of inertia is approximately

 I = m R^2 \approx 2 \rho \pi^2 r^2 R^3 .

From the Poiseuille-Hagen Equation, the force F from a laminar flow with velocity v in a thin tube is

 F = \frac{8 \bar{V} \eta l}{r^2}

where  \bar{V} = r^2 \pi v is the volume flow per second,  \eta the viscosity and  l = 2 \pi R the length of the tube.

With the torque  M = F \cdot R and the relative angular velocity  \Omega = \frac{v}{R} , substitution provides

 B = \frac{M}{\Omega} = 16 \eta \pi ^2 R^3

Finally, this gives the time constant  T_1

 T_1 = \frac{I}{B} = \frac{\rho r^2}{8 \eta}

For the human balance system, replacing the variables with experimentally obtained parameters yields a time constant  T_1 of about 0.01 s. This is brief enough that the " \approx " in the expression for the constant displacement  \delta above can be replaced by " = ". This gives a system gain of

 G \equiv \frac{\delta}{\omega} = \frac{I}{B} = T_1
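A quick numerical check reproduces a time constant of this magnitude. The parameter values below are assumed, order-of-magnitude figures (endolymph roughly like water, a duct radius of a fraction of a millimetre), not measured data:

```python
# Assumed order-of-magnitude parameters (illustrative, not measured):
rho = 1.0e3     # endolymph density [kg/m^3], close to water
r   = 2.8e-4    # inner duct radius [m]
eta = 1.0e-3    # endolymph viscosity [Pa*s], close to water

# Time constant T1 = I/B = rho*r^2/(8*eta), and system gain G = T1
T1 = rho * r**2 / (8 * eta)
G = T1
print(f"T1 = {T1 * 1e3:.1f} ms")   # on the order of 10 ms
```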

Model with Cupula

Effect of the cupula.

Our discussion until this point has not included the role of the cupula in the SCC: The cupula acts as an elastic membrane that gets displaced by angular accelerations. Through its elasticity the cupula returns the system to its resting position. The elasticity of the cupula adds an additional elastic term to the equation of movement. If it is taken into account, this equation becomes

 \ddot{\theta} = \ddot{\delta} + \frac{B}{I} \dot{\delta} + \frac{K}{I} \delta

An elegant way to solve such differential equations is the Laplace-Transformation. The Laplace transform turns differential equations into algebraic equations: if the Laplace transform of a signal x(t) is denoted by X(s), the Laplace transform of the time derivative is

 \frac{dx(t)}{dt} \xrightarrow{Laplace Transform} s \cdot X(s) - x(0)

The term x(0) details the starting condition, and can often be set to zero by an appropriate choice of the reference position. Thus, the Laplace transform is

 s^2 \tilde{\theta} = s^2 \tilde{\delta} + \frac{B}{I} s \tilde{\delta} + \frac{K}{I} \tilde{\delta}

where "~" indicates the Laplace transformed variable. With  T_1 from above, and  T_2 defined by

 T_2 = \frac{B}{K}

we get the transfer function

 \frac{ \tilde{\delta} }{ \tilde{\theta} } = \frac{T_1 s^2}{T_1 s^2 + s + \frac{1}{T_2}}

For humans, typical values for  T_2 = B/K are about 5 sec.

To find the poles of this transfer function, we have to determine for which values of s the denominator equals 0:

 s_{1,2} = \frac{1}{2 T_1} \Big(-1 \pm \sqrt{1-4\frac{T_1}{T_2}} \Big)

Since  T_2 \gg T_1 , and since

 \sqrt{1-x} \approx 1 - \frac{x}{2} for x \ll 1

we obtain

 s_1 \approx - \frac{1}{T_1}, and s_2 \approx - \frac{1}{T_2}
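The quality of this approximation is easy to verify numerically, e.g. with the typical values T1 = 0.01 s and T2 = 5 s used in this chapter:

```python
import numpy as np

T1, T2 = 0.01, 5.0   # typical values [s]

# Exact poles: roots of the denominator T1*s^2 + s + 1/T2
poles = np.sort(np.roots([T1, 1.0, 1.0 / T2]))

# Approximations, valid because T2 >> T1
s1_approx, s2_approx = -1.0 / T1, -1.0 / T2
print(poles, (s1_approx, s2_approx))   # exact poles lie close to -100 and -0.2
```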

Typically we are interested in the cupula displacement  \delta as a function of head velocity  \dot{\theta} \equiv s \tilde{\theta} :

 \frac{\tilde{\delta}}{s \tilde{\theta}}(s) = \frac{T_1 T_2 s}{(T_1 s +1)(T_2 s + 1)}

For typical head movements (0.2 Hz < f < 20Hz), the system gain is approximately constant. In other words, for typical head movements the cupula displacement is proportional to the angular head velocity!
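This claim can be checked by evaluating the magnitude of the transfer function over the head-movement band (a short sketch with the typical values of T1 and T2 from above):

```python
import numpy as np

T1, T2 = 0.01, 5.0   # typical values [s]

def gain(f):
    """|delta/(s*theta)| at frequency f [Hz], for T1*T2*s / ((T1*s+1)*(T2*s+1))."""
    s = 2j * np.pi * f
    return abs(T1 * T2 * s / ((T1 * s + 1) * (T2 * s + 1)))

# Over the typical head-movement band the gain varies by less than a
# factor of 2, i.e. it is roughly flat on a Bode plot.
band = np.array([0.2, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0])
gains = gain(band)
print(gains.max() / gains.min())
```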

Bode plot of the cupula displacement as a function of head velocity, with T1 = 0.01 sec, T2 = 5 sec, and an amplification factor of (T1 + T2)/(T1 * T2) to obtain a gain of approximately 0 dB for the central frequencies.

Control Systems

For Linear, Time-Invariant systems (LTI systems), the input and output have a simple relationship in the frequency domain:

 Out(s) = G(s)*In(s)

where the transfer function G(s) can be expressed as the rational function

 G(s) = \frac{n_m s^m + \dots + n_1 s + n_0}{d_n s^n + \dots + d_1 s + d_0}

In other words, specifying the coefficients of the numerator (n) and denominator (d) uniquely characterizes the transfer function. This notation is used by some computational tools to simulate the response of such a system to a given input.

Different tools can be used to simulate such a system. For example, a low-pass filter with a time constant of 7 sec, responding to an input step at 1 sec, has the transfer function

 G(s) = \frac{1}{7 s + 1}

and can be simulated as follows:

With Simulink
Step-response simulation of a lowpass filter with Simulink.

If you work on the command line, you can use the Control System Toolbox of MATLAB or the module signal of the Python package SciPy:

MATLAB Control System Toolbox:

% Define the transfer function
num = [1];
tau = 7;
den = [tau, 1];
mySystem = tf(num,den)
% Generate an input step
t = 0:0.1:30;
inSignal = zeros(size(t));
inSignal(t>=1) = 1;
% Simulate and show the output
[outSignal, tSim] = lsim(mySystem, inSignal, t);
plot(t, inSignal, tSim, outSignal);

Python - SciPy:

# Import required packages
import numpy as np
import scipy.signal as ss
import matplotlib.pyplot as plt
# Define the transfer function G(s) = 1/(tau*s + 1)
num = [1]
tau = 7
den = [tau, 1]
mySystem = ss.lti(num, den)
# Generate the input step at t = 1 s
t = np.arange(0, 30, 0.1)
inSignal = np.zeros(t.size)
inSignal[t >= 1] = 1
# Simulate and plot the output
tout, outSignal, xout = ss.lsim(mySystem, inSignal, t)
plt.plot(t, inSignal, tout, outSignal)
plt.show()


Consider now the mechanics of the otolith organs. Since they are made up of complex, visco-elastic materials with a curved shape, their mechanics cannot be described with analytical tools. However, their movement can be simulated numerically with the finite element technique. Thereby, the volume under consideration is divided into many small volume elements, and for each element the physical equations are approximated by analytical functions.

FE-Simulations: Small, finite elements are used to construct a mechanical model; here for example the saccule.

Here we will only show the physical equations for the visco-elastic otolith materials. The movement of each elastic material has to obey Cauchy’s equations of motion:

 \rho \frac{\partial^2 u_i}{\partial t^2} = \rho B_i + \sum_{j} \frac{\partial T_{ij}}{\partial x_j}

where  \rho is the effective density of the material,  u_i the displacements along the i-axis,  B_i the i-component of the volume force, and  T_{ij} the components of the Cauchy stress tensor.  x_j are the coordinates.

For a linear elastic, isotropic material, the Cauchy stress tensor is given by

 T_{ij} = \lambda e \delta_{ij} + 2 \mu E_{ij}

where  \lambda and  \mu are the Lamé constants;  \mu is identical with the shear modulus.  e = div(\vec u) , and  E_{ij} is the strain tensor

 E_{ij} = \frac{1}{2} \Big( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \Big).

This leads to Navier’s Equations of motion

 \rho \frac{\partial ^2 u_i}{\partial t^2} = \rho B_i + (\lambda + \mu) \frac{\partial e}{\partial x_i} + \mu \sum_{j} \frac{\partial ^2 u_i}{\partial x_j^2}

This equation holds for purely elastic, isotropic materials, and can be solved with the finite element technique. A typical procedure to find the mechanical parameters that appear in this equation is the following: when a cylindrical sample of the material is put under strain, Young's modulus E characterizes the change in length, and Poisson's ratio  \nu the simultaneous decrease in diameter. The Lamé constants  \lambda and  \mu are related to E and  \nu by:

 E = \frac{\mu (3 \lambda + 2 \mu)}{\lambda + \mu}


 \nu = \frac{\lambda}{2(\lambda + \mu)}
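In practice one usually measures E and ν and needs λ and μ, so it is the inverse of these relations that gets used. The sketch below applies the standard linear-elasticity inversion and round-trips it against the formulas above; the material values are illustrative only:

```python
def lame_constants(E, nu):
    """Lamé constants from Young's modulus E and Poisson's ratio nu
    (standard linear-elasticity inversion of the relations in the text)."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return lam, mu

# Round-trip check with illustrative, soft-tissue-like values
E, nu = 1.0e4, 0.45          # [Pa], dimensionless
lam, mu = lame_constants(E, nu)
E_back = mu * (3 * lam + 2 * mu) / (lam + mu)   # formula from the text
nu_back = lam / (2.0 * (lam + mu))              # formula from the text
```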

Central Vestibular Processing

Central processing of vestibular information significantly affects the perceived orientation and movement in space. The corresponding information processing in the brainstem can often be modeled efficiently with control-system tools. As a specific example, we show how to model the effect of velocity storage.

Velocity Storage

The concept of velocity storage is based on the following experimental finding: when we abruptly stop from a sustained rotation about an earth-vertical axis, the cupula is deflected by the deceleration, but returns to its resting state with a time-constant of about 5 sec. However, the perceived rotation continues much longer, and decreases with a much longer time constant, typically somewhere between 15 and 20 sec.

Vestibular Modeling: The blue curve describes the deflection of the cupula as a response to a velocity step, modeled as a high-pass filter with a time-constant of 5 sec. The green curve represents the internal estimate of the angular velocity, obtained with an internal model of the cupula response in a negative feedback loop, and a feed-forward gain factor of 2.

In the attached figure, the response of the canals to an angular velocity stimulus ω is modeled by the transfer function C, here a simple high-pass filter with a time constant of 5 sec. (The canal response is determined by the deflection of the cupula, and is approximately proportional to the neural firing rate.) To model the increase in time constant, we assume that the central vestibular system has an internal model of the transfer function of the canals, \hat{C}. Based on this internal model, the expected firing rate of the internal estimate of the angular velocity, \hat{\omega}, is compared to the actual firing rate. With the gain factor k set to 2, the output of the model nicely reproduces the increase in the time constant. The corresponding Python code can be found at [21].

It is worth noting that this feedback loop can be justified physiologically: we know that there are strong connections between the left and right vestibular nuclei. If those connections are severed, the time constant of the perceived rotation decreases to the peripheral time-constant of the semicircular canals.

Central Vestibular Processing can often be described with control-system models. Here "omega" is the head velocity, "C" the transfer function of the semicircular canals, and "k" a simple gain factor. The "hat"-ed variables indicate internal estimates.

Mathematically, negative feedback with a high gain has the interesting property that it can practically invert the transfer function in the negative feedback loop: if k>>1, and if the internal model of the canal transfer function is similar to the actual transfer function, the estimated angular velocity corresponds to the actual angular velocity.

 \hat{\omega} = (\omega C - \hat{\omega} \hat{C}) \, k

 \hat{\omega} (1 + \hat{C} k) = \omega C k

 \frac{\hat{\omega}}{\omega} = \frac{C}{1/k + \hat{C}} \xrightarrow[C \approx \hat{C}]{k \gg 1} 1
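This high-gain-feedback argument can also be verified numerically. The sketch below uses the canal time constant (5 sec) and gain factor (k = 2) mentioned above, and checks that the closed loop ω̂/ω = Ck/(1 + Ĉk), with a perfect internal model Ĉ = C and C(s) = Tc·s/(Tc·s + 1), reduces to a high-pass filter whose time constant has grown from Tc = 5 s to (1 + k)·Tc = 15 s:

```python
import numpy as np

Tc, k = 5.0, 2.0   # canal time constant [s] and loop gain, as in the figure

def perceived(s):
    """Closed-loop estimate omega_hat/omega = C*k / (1 + C_hat*k), with C_hat = C."""
    C = Tc * s / (Tc * s + 1)
    return C * k / (1 + C * k)

# Algebraically the loop reduces to k*Tc*s / ((1+k)*Tc*s + 1): still a
# high-pass filter, but with time constant (1+k)*Tc = 15 s instead of 5 s.
for f in (0.01, 0.1, 1.0):
    s = 2j * np.pi * f
    simplified = k * Tc * s / ((1 + k) * Tc * s + 1)
    assert abs(perceived(s) - simplified) < 1e-9
```

At high frequencies the perceived gain approaches k/(1 + k) = 2/3, which is why the figure caption includes an additional feed-forward amplification.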

Alcohol and the Vestibular System

As you may or may not know from personal experience, consumption of alcohol can also induce a feeling of rotation. The explanation is quite straightforward, and basically relies on two factors: i) alcohol is lighter than the endolymph; and ii) once it is in the blood, alcohol diffuses relatively quickly into the cupula, which has a good blood supply, whereas it diffuses only slowly into the endolymph, over a period of a few hours. In combination, this makes the cupula buoyant soon after you have consumed (too much) alcohol. When you lie on your side, the deflections of the left and right horizontal cupulae add up, and induce a strong feeling of rotation. The proof: just roll onto the other side - and the perceived direction of rotation will flip around!

Due to the position of the cupulae, you will experience the strongest effect when you lie on your side. When you lie on your back, the deflections of the left and right cupulae cancel each other, and you don't feel any horizontal rotation. This explains why hanging one leg out of the bed slows down the perceived rotation.

The overall effect is minimized in the upright head position - so try to stay up(right) as long as possible during the party!

If you have drunk way too much, the endolymph will contain a significant amount of alcohol the next morning - more so than the cupula. This explains why, at that point, a small amount of alcohol (e.g. a small beer) balances the difference, and reduces the feeling of spinning.


  1. Kathleen Cullen and Soroush Sadeghi (2008). "Vestibular System". Scholarpedia 3(1):3013. 
  2. JM Goldberg, VJ Wilson, KE Cullen and DE Angelaki (2012). "The Vestibular System: A Sixth Sense". Oxford University Press, USA. 
  3. Curthoys IS and Oman CM (1987). "Dimensions of the horizontal semicircular duct, ampulla and utricle in the human.". Acta Otolaryngol 103: 254–261. 
  4. Della Santina CC, Potyagaylo V, Migliaccio A, Minor LB, Carey JB (2005). "Orientation of Human Semicircular Canals Measured by Three-Dimensional Multi-planar CT Reconstruction.". J Assoc Res Otolaryngol 6(3): 191-206. 
  5. a b c d e f Perez Fornos, A.; Guinand, N.; Van De Berg, R.; Stokroos, R.; Micera, S.; Kingma, H.; Pelizzone, M.; and Guyot, J. (2014). "Artificial balance: restoration of the vestibulo-ocular reflex in humans with a prototype vestibular neuroprosthesis.". Frontiers in Neurology 5. 
  6. Cohen, B. and Suzuki, J. (1963). "Eye movements induced by ampullary nerve stimulation.". The American journal of physiology 204: 347-351. 
  7. a b c d e Golub, J. S.; Ling, L.; Nie, K.; Nowack, A.; Shepherd, S. J.; Bierer, S. M.; Jameyson, E.; Kaneko, C. R.; Phillips, J. O.; and Rubinstein, J. T. (2014). "Prosthetic Implantation of the Human Vestibular System.". Otology & Neurotology 1: 136–147. 
  8. Gong, W. and Merfeld, D. M. (2000). "Prototype neural semicircular canal prosthesis using patterned electrical stimulation.". Annals of Biomedical Engineering 28: 572-581. 
  9. Lewis, R. F.; Haburcakova, C.; Gong, W.; Makary, C.; and Merfeld, D. M. (2010). "Vestibuloocular Reflex Adaptation Investigated With Chronic Motion-Modulated Electrical Stimulation of Semicircular Canal Afferents.". Journal of Neurophysiology 103: 1066-1079. 
  10. Dai, C.; Fridman, G. Y.; Chiang, B.; Davidovics, N.; Melvin, T.; Cullen, K. E. and Della Santina, Charles C. (2011). "Cross-axis adaptation improves 3D vestibulo-ocular reflex alignment during chronic stimulation via a head-mounted multichannel vestibular prosthesis.". Experimental Brain Research 210: 595-606. 
  11. Dai, C.; Fridman, G. Y.; Davidovics, N.; Chiang, B.; Ahn, J. and Della Santina, C. C. (2011). "Restoration of 3D Vestibular Sensation in Rhesus Monkeys Using a Multichannel Vestibular Prosthesis.". Hearing Research 281: 74-83. 
  12. Dai, Chenkai and Fridman, Gene Y. and Chiang, Bryce and Rahman, Mehdi A. and Ahn, Joong Ho and Davidovics, Natan S. and Della Santina, Charles C. (2013). "Directional Plasticity Rapidly Improves 3D Vestibulo-Ocular Reflex Alignment in Monkeys Using a Multichannel Vestibular Prosthesis.". Journal of the Association for Research in Otolaryngology 14: 863-877. 
  13. Davidovics, Natan S. and Rahman, Mehdi A. and Dai, Chenkai and Ahn, JoongHo and Fridman, Gene Y. and Della Santina, Charles C. (2013). "Multichannel Vestibular Prosthesis Employing Modulation of Pulse Rate and Current with Alignment Precompensation Elicits Improved VOR Performance in Monkeys.". Journal of the Association for Research in Otolaryngology 14: 233-248. 
  14. a b c Phillips, Christopher and DeFrancisci, Christina and Ling, Leo and Nie, Kaibao and Nowack, Amy and Phillips, James O. and Rubinstein, Jay T. (2013). "Postural responses to electrical stimulation of the vestibular end organs in human subjects.". Experimental Brain Research 229: 181-195. 
  15. Fridman, Gene Y. and Della Santina, Charles C. (2013). "Safe Direct Current Stimulation to Expand Capabilities of Neural Prostheses.". IEEE Trans Neural Syst Rehabil Eng. 21: 319-328. 
  16. a b Guyot, Jean-Philippe and Sigrist, Alain and Pelizzone, Marco and Kos, Maria I. (2011). "Adaptation to steady-state electrical stimulation of the vestibular system in humans.". Annals of Otology, Rhinology & Laryngology 120: 143-149. 
  17. a b c Harris, David M. and Bierer, Steven M. and Wells, Jonathon D. and Phillips, James O. (2009). "Optical nerve stimulation for a vestibular prosthesis.". Proceedings of SPIE 5. 
  18. Della Santina, Charles C. and Migliaccio, Americo A. and Patel, Amit H. (2007). "A multichannel semicircular canal neural prosthesis using electrical stimulation to restore 3-D vestibular sensation.". IEEE transactions on bio-medical engineering 54: 1016-1030. 
  19. a b Lumbreras, Vicente and Bas, Esperanza and Gupta, Chhavi and Rajguru, Suhrud M. (2014). "Pulsed Infrared Radiation Excites Cultured Neonatal Spiral and Vestibular Ganglion Neurons by Modulating Mitochondrial Calcium Cycling.". Journal of Neurophysiology. 
  20. a b Dai, Chenkai and Fridman, Gene Y. and Della Santina, Charles C. (2011). "Effects of vestibular prosthesis electrode implantation and stimulation on hearing in rhesus monkeys.". Hearing Research 277: 204-210. 
  21. Thomas Haslwanter (2013). "Vestibular Processing: Simulation of the Velocity Storage" [Python]. 

Somatosensory System


Anatomy of the Somatosensory System

Our somatosensory system consists of sensors in the skin and sensors in our muscles, tendons, and joints. The receptors in the skin, the so-called cutaneous receptors, tell us about temperature (thermoreceptors), pressure and surface texture (mechanoreceptors), and pain (nociceptors). The receptors in muscles and joints provide information about muscle length, muscle tension, and joint angles. (The following description is based on lecture notes from Laszlo Zaborszky, from Rutgers University.)

Cutaneous receptors


Receptors in the human skin: Mechanoreceptors can be free receptors or encapsulated. Examples of free receptors are the hair receptors at the roots of hairs. Encapsulated receptors are the Pacinian corpuscles and the receptors in the glabrous (hairless) skin: Meissner corpuscles, Ruffini corpuscles and Merkel's disks.

Sensory information from Meissner corpuscles and rapidly adapting afferents leads to adjustment of grip force when objects are lifted. These afferents respond with a brief burst of action potentials when objects move a small distance during the early stages of lifting. In response to rapidly adapting afferent activity, muscle force increases reflexively until the gripped object no longer moves. Such a rapid response to a tactile stimulus is a clear indication of the role played by somatosensory neurons in motor activity.

The slowly adapting Merkel's receptors are responsible for form and texture perception. As would be expected for receptors mediating form perception, Merkel's receptors are present at high density in the digits and around the mouth (50/mm² of skin surface), at lower density in other glabrous surfaces, and at very low density in hairy skin. This innervation density shrinks progressively with the passage of time, so that by the age of 50 the density in human digits is reduced to 10/mm². Unlike rapidly adapting axons, slowly adapting fibers respond not only to the initial indentation of skin, but also to sustained indentation up to several seconds in duration.

Activation of the rapidly adapting Pacinian corpuscles gives a feeling of vibration, while the slowly adapting Ruffini corpuscles respond to the lateral movement or stretching of skin.

Surface receptor / small receptive field:
  Rapidly adapting - Hair receptor, Meissner's corpuscle: detects an insect or a very fine vibration; used for recognizing texture.
  Slowly adapting - Merkel's receptor: used for spatial details, e.g. a round surface edge or "an X" in Braille.
Deep receptor / large receptive field:
  Rapidly adapting - Pacinian corpuscle: "a diffuse vibration", e.g. tapping with a pencil.
  Slowly adapting - Ruffini's corpuscle: "a skin stretch"; used for joint position in fingers.


Nociceptors have free nerve endings. Functionally, skin nociceptors are either high-threshold mechanoreceptors or polymodal receptors. Polymodal receptors respond not only to intense mechanical stimuli, but also to heat and to noxious chemicals. These receptors respond to minute punctures of the epithelium, with a response magnitude that depends on the degree of tissue deformation. They also respond to temperatures in the range of 40-60°C, and change their response rates as a linear function of warming (in contrast with the saturating responses displayed by non-noxious thermoreceptors at high temperatures).

Pain signals can be separated into individual components, corresponding to different types of nerve fibers used for transmitting these signals. The rapidly transmitted signal, which often has high spatial resolution, is called first pain or cutaneous pricking pain. It is well localized and easily tolerated. The much slower, highly affective component is called second pain or burning pain; it is poorly localized and poorly tolerated. The third or deep pain, arising from viscera, musculature and joints, is also poorly localized, can be chronic and is often associated with referred pain.


The thermoreceptors have free nerve endings. Interestingly, we have only two types of thermoreceptors in our skin, signaling innocuous warmth and cooling respectively. (Some nociceptors are also sensitive to temperature, but are capable of unambiguously signaling only noxious temperatures.) The warm receptors show a maximum sensitivity at ~45°C, signal temperatures between 30 and 45°C, cannot unambiguously signal temperatures higher than 45°C, and are unmyelinated. The cold receptors have their maximum sensitivity at ~27°C and signal temperatures above 17°C; some consist of lightly myelinated fibers, while others are unmyelinated. Our sense of temperature comes from the comparison of the signals from the warm and cold receptors. Thermoreceptors are poor indicators of absolute temperature but are very sensitive to changes in skin temperature.


The term proprioceptive or kinesthetic sense is used to refer to the perception of joint position, joint movements, and the direction and velocity of joint movement. There are numerous mechanoreceptors in the muscles, the muscle fascia, and in the dense connective tissue of joint capsules and ligaments. There are two specialized encapsulated, low-threshold mechanoreceptors: the muscle spindle and the Golgi tendon organ. Their adequate stimulus is stretching of the tissue in which they lie. Muscle spindles, joint and skin receptors all contribute to kinesthesia. Muscle spindles appear to provide their most important contribution to kinesthesia with regard to large joints, such as the hip and knee joints, whereas joint receptors and skin receptors may provide more significant contributions with regard to finger and toe joints.

Muscle Spindles

Mammalian muscle spindle showing typical position in a muscle (left), neuronal connections in spinal cord (middle) and expanded schematic (right). The spindle is a stretch receptor with its own motor supply consisting of several intrafusal muscle fibres. The sensory endings of a primary (group Ia) afferent and a secondary (group II) afferent coil around the non-contractile central portions of the intrafusal fibres. Gamma motoneurons activate the intrafusal muscle fibres, changing the resting firing rate and stretch-sensitivity of the afferents.

Scattered throughout virtually every striated muscle in the body are long, thin, stretch receptors called muscle spindles. They are quite simple in principle, consisting of a few small muscle fibers with a capsule surrounding the middle third of the fibers. These fibers are called intrafusal fibers, in contrast to the ordinary extrafusal fibers. The ends of the intrafusal fibers are attached to extrafusal fibers, so whenever the muscle is stretched, the intrafusal fibers are also stretched. The central region of each intrafusal fiber has few myofilaments and is non-contractile, but it does have one or more sensory endings applied to it. When the muscle is stretched, the central part of the intrafusal fiber is stretched and each sensory ending fires impulses.

Numerous specializations occur in this simple basic organization, so that in fact the muscle spindle is one of the most complex receptor organs in the body. Only three of these specializations are described here; their overall effect is to make the muscle spindle adjustable and give it a dual function, part of it being particularly sensitive to the length of the muscle in a static sense and part of it being particularly sensitive to the rate at which this length changes.

  1. Intrafusal muscle fibers are of two types. All are multinucleated, and the central, non-contractile region contains the nuclei. In one type of intrafusal fiber, the nuclei are lined up single file; these are called nuclear chain fibers. In the other type, the nuclear region is broader and the nuclei are arranged several abreast; these are called nuclear bag fibers. There are typically two or three nuclear bag fibers per spindle and about twice that many chain fibers.
  2. There are also two types of sensory endings in the muscle spindle. The first type, called the primary ending, is formed by a single Ia (A-alpha) fiber, supplying every intrafusal fiber in a given spindle. Each branch wraps around the central region of the intrafusal fiber, frequently in a spiral fashion, so these are sometimes called annulospiral endings. The second type of ending is formed by a few smaller nerve fibers (II or A-Beta) on both sides of the primary endings. These are the secondary endings, which are sometimes referred to as flower-spray endings because of their appearance. Primary endings are selectively sensitive to the onset of muscle stretch but discharge at a slower rate while the stretch is maintained. Secondary endings are less sensitive to the onset of stretch, but their discharge rate does not decline very much while the stretch is maintained. In other words, both primary and secondary endings signal the static length of the muscle (static sensitivity) whereas only the primary ending signals the length changes (movement) and their velocity (dynamic sensitivity). The change of firing frequency of group Ia and group II fibers can then be related to static muscle length (static phase) and to stretch and shortening of the muscle (dynamic phases).
  3. Muscle spindles also receive a motor innervation. The large motor neurons that supply extrafusal muscle fibers are called alpha motor neurons, while the smaller ones supplying the contractile portions of intrafusal fibers are called gamma motor neurons. Gamma motor neurons can regulate the sensitivity of the muscle spindle, so that this sensitivity can be maintained at any given muscle length.

Golgi tendon organ

Mammalian tendon organ showing typical position in a muscle (left), neuronal connections in spinal cord (middle) and expanded schematic (right). The tendon organ is a stretch receptor that signals the force developed by the muscle. The sensory endings of the Ib afferent are entwined amongst the musculotendinous strands of 10 to 20 motor units.

The Golgi tendon organ is located at the musculotendinous junction. There is no efferent innervation of the tendon organ, therefore its sensitivity cannot be controlled from the CNS. The tendon organ, in contrast to the muscle spindle, is coupled in series with the extrafusal muscle fibers. Both passive stretch and active contraction of the muscle increase the tension of the tendon and thus activate the tendon organ receptor, but active contraction produces the greatest increase. The tendon organ, consequently, can inform the CNS about the “muscle tension”. In contrast, the activity of the muscle spindle depends on the “muscle length” and not on the tension. The muscle fibers attached to one tendon organ appear to belong to several motor units. Thus the CNS is informed not only of the overall tension produced by the muscle but also of how the workload is distributed among the different motor units.

Joint receptors

The joint receptors are low-threshold mechanoreceptors and have been divided into four groups. They signal different characteristics of joint function (position, movements, direction and speed of movements). The free nerve endings, or type 4 joint receptors, are nociceptors.

Proprioceptive Signal Processing

Feedback loops for proprioceptive signals for the perception and control of limb movements. Arrows indicate excitatory connections; filled circles inhibitory connections.

Modelling muscle spindles and afferent response

The response of the muscle spindles in mammals to muscle stretch has been thoroughly studied, and various models have been proposed. However, due to the difficulty in obtaining accurate data of the afferent and fusimotor responses during muscular movement, these models have usually been quite limited. For example, several of the earliest models account only for the afferent response, ignoring the fusimotor activity.

Mileusnic et al. (2006) model

One recent model, developed by Mileusnic et al. (2006), portrays the muscle spindle as consisting of several (typically 4 to 11) nuclear chain fibres, and two different nuclear bag fibres, connected in parallel as shown here in the figure below. The muscle fibres respond to three inputs: fascicle length, dynamic fusimotor input and static fusimotor input. The bag_1 fibre is mainly responsible for detecting dynamic fusimotor input, while the bag_2 and chain fibres are mainly responsible for detecting static fusimotor input. All fibres respond to changes in the fascicle length, and are modelled in largely the same way but with different coefficients to account for their different physiological properties. The responses of the three types of fibres are summed to generate the primary and secondary afferent activities. The primary afferent activity is affected by the response of all three types of muscle fibres, while the secondary afferent activity only depends on the bag_2 and chain fibre responses.
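The wiring just described can be summarized in a short sketch. The linear fibre response functions below are placeholder assumptions (the published model uses detailed mechanical equations for each fibre); only the summation pattern, with all three fibre types driving the primary afferent and only the bag_2 and chain fibres driving the secondary afferent, follows the text.

```python
# Structural sketch of the Mileusnic et al. (2006) spindle model.
# The linear fibre responses are invented placeholders; the wiring of
# the primary and secondary afferents follows the description above.
def bag1(length, gamma_dynamic):
    """bag_1 fibre: mainly sensitive to dynamic fusimotor input."""
    return 0.5 * length + 2.0 * gamma_dynamic

def bag2(length, gamma_static):
    """bag_2 fibre: mainly sensitive to static fusimotor input."""
    return 0.8 * length + 1.5 * gamma_static

def chain(length, gamma_static):
    """Nuclear chain fibre: also driven by static fusimotor input."""
    return 1.0 * length + 1.0 * gamma_static

def afferents(length, gamma_dyn, gamma_stat, n_chain=5):
    """Sum the fibre responses into primary and secondary afferent activity."""
    chains = n_chain * chain(length, gamma_stat)
    primary = bag1(length, gamma_dyn) + bag2(length, gamma_stat) + chains
    secondary = bag2(length, gamma_stat) + chains   # no bag_1 contribution
    return primary, secondary
```

With this wiring, dynamic fusimotor input changes only the primary afferent, while fascicle length and static fusimotor input affect both, as the model prescribes.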


Hasan (1983) model

Another comprehensive model of muscle spindles was proposed by Hasan in 1983 [1]. This representation of muscle fibres and spindles is based closely on their physical properties. The muscle spindle is represented as two separate regions connected in series: sensory and non-sensory. The firing rate of the spindle afferent depends on the state of the two regions[1]. The lengths of the two regions can be labelled z(t) for the sensory and y(t) for the non-sensory region. The tension f(t) in the two regions is equal, since they are placed in series. The sensory zone can be assumed to act like a spring (equation (3)), while in the non-sensory region, tension is a non-linear function of y(t) (equation (2) derived by Hasan).

f(t) = k_1(y(t)-c)(1+[\frac{y'(t)}{a}]^{\frac{1}{3}}) \qquad \qquad \text{(2)}

f(t) = k_2z(t) \qquad \qquad \qquad \qquad \qquad \qquad \text{(3)}

The total length of the muscle spindle, x(t) is the sum of the length of the two regions (equation (4)).

x(t) = z(t) + y(t) \qquad \qquad \qquad \qquad \qquad \text{(4)}

Using this substitution and rearranging, we can derive the following expression for the length of the sensory zone (equation (5)):

z'(t) = x'(t) - a(\frac{bz(t) - x(t) + c}{x(t) - z(t) - c})^3 \qquad \text{(5)}

Here, parameter a represents the sensitivity of the tension to velocity in the non-sensory zone, parameter b = (k_1+k_2)/k_1, and parameter c determines the zero-length tension, which influences the background firing rate of the afferent. The length of the sensory zone depends not only on the current length and velocity of the spindle, but also on the history of length changes.

The firing rate, g(t) in Hasan's model depends on a combination of the sensory zone length and its first derivative (equation (6)), with an experimentally derived weighting.

g(t) = z(t) + 0.1z'(t) \qquad \qquad \qquad \qquad \text{(6)}

Model parameters

Approximate values for the model parameters a, b and c were suggested by Hasan (1983), and differ for voluntary and passive movements. A summary of these values is presented in the table below.

Type of ending | Condition       | a (mm/s) | b   | c (mm)
Primary        | Passive         | 0.3      | 250 | -15
Primary        | Gamma - dynamic | 0.1      | 125 | -15
Primary        | Gamma - static  | 100      | 100 | -25
Secondary      | Passive         | 50       | 50  | -20

In the model, these values are assumed to remain constant for the duration of a movement; however, this is not believed to be the case physiologically.
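Equations (5) and (6) can be integrated numerically to reproduce the characteristic dynamic response of a primary afferent to a ramp-and-hold stretch. The sketch below uses the passive primary-ending parameters from the table; the stretch profile, integration step, and units are illustrative assumptions.

```python
# Numerical sketch of Hasan's (1983) spindle model, equations (5)-(6),
# for a passive primary ending (a = 0.3 mm/s, b = 250, c = -15 mm, from
# the table). The ramp-and-hold stretch x(t) and the forward-Euler
# integration are illustrative assumptions.
a, b, c = 0.3, 250.0, -15.0
dt = 0.001   # integration step (s)

def x_of_t(t):
    """Assumed stretch profile: 5 mm ramp over 0.5 s, then hold."""
    return 5.0 * min(t, 0.5) / 0.5

# start at the static equilibrium of eq. (5): z' = 0, x' = 0  =>  z = (x - c)/b
z = (x_of_t(0.0) - c) / b
rates = []
t = 0.0
while t < 1.0:
    x = x_of_t(t)
    x_dot = (x_of_t(t + dt) - x) / dt                         # numerical x'(t)
    z_dot = x_dot - a * ((b * z - x + c) / (x - z - c)) ** 3  # eq. (5)
    rates.append(z + 0.1 * z_dot)                             # firing rate g(t), eq. (6)
    z += z_dot * dt
    t += dt

# the response peaks during the ramp and settles once the stretch is held
print(max(rates) > rates[-1])   # → True
```

The transient peak at ramp onset reflects the dynamic sensitivity of the primary ending, while the lower sustained rate during the hold reflects its static sensitivity, in line with the description of primary endings above.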

Internal models of limb dynamics

In addition to modelling the response of muscle spindle afferents to muscle stretch, several groups have worked on modelling the signals which are sent from the brain to the spindle efferents in order for muscles to complete specific movements. The complexity here lies in the fact that the brain must be able to adapt to unexpected changes in the dynamics of planned movements, using feedback from the spindle afferents.

Studies in this area suggest that humans achieve this using internal models, which are built through an “error-feedback-learning” process and transform planned muscle states into the motor commands required to achieve them. To generate the motor commands for a particular reaching movement, the brain performs calculations based on the expected dynamics of the planned movement. However, any unexpected changes in these dynamics while the movement is being executed (e.g. an external load placed on the muscle) will lead to errors in the expected muscle length (Gottlieb 1994, Shadmehr and Mussa-Ivaldi 1994). These errors are communicated to the brain through the muscle spindle afferents, which experience a different sensory state from the one expected. The brain then reacts to these error signals with short- and long-latency responses, which work to minimise the error but cannot eliminate it completely due to the delay in the system.

Studies suggest that the error can be eliminated in a subsequent attempt at the movement under the same dynamics, and this is where the “error-feedback-learning” idea comes from (Thoroughman and Shadmehr 1999). The corrections which are generated by the brain form an internal model, which maps a desired action (in kinematic coordinates) to the necessary motor commands (as torques). This internal model can be represented as a weighted combination of basis elements:

torque = \sum_i w_i \, g_i(\theta,\theta',\theta'',\ldots)

Here each basis g_i represents some characteristic of the muscle's sensory state, and the motor command is a “population code”. Population coding is a method of representing stimuli as the combined activity of many neurons (in contrast to rate coding). In order to use such a model, we need to know how the bases represent particular limb or muscle positions, and the neuronal firing rates associated with them. The bases can, in principle, represent every aspect of the state: position, velocity, acceleration and even higher derivatives. However, this high dimensionality makes it very difficult to derive relationships experimentally between each dimension of the bases and the firing rates.
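A minimal numerical sketch of this idea is given below, assuming Gaussian basis elements over a single state dimension (velocity) and a simple delta-rule weight update. The basis shapes, learning rate, and the stand-in "true" dynamics are all illustrative assumptions, not values from the studies cited above.

```python
import numpy as np

# Sketch of an internal model as a weighted combination of basis
# elements, trained by error-feedback learning. Gaussian bases over a
# single state variable (velocity), the learning rate, and the stand-in
# "true" dynamics are illustrative assumptions.
centres = np.linspace(-1.0, 1.0, 9)        # preferred velocities of the bases

def bases(velocity):
    """Gaussian tuning curves g_i evaluated at one velocity."""
    return np.exp(-(velocity - centres) ** 2 / 0.1)

rng = np.random.default_rng(0)
true_w = rng.normal(size=centres.size)     # stand-in for the real limb dynamics
w = np.zeros_like(centres)                 # internal-model weights, to be learned

eta = 0.1                                  # learning rate
for _ in range(2000):                      # repeated movement attempts
    v = rng.uniform(-1.0, 1.0)
    g = bases(v)
    torque = w @ g                         # population-coded motor command
    error = true_w @ g - torque            # mismatch sensed via spindle afferents
    w += eta * error * g                   # error-feedback weight update
```

Each movement attempt produces an error between the expected and sensed state, and the weight update reduces that error on subsequent attempts under the same dynamics, mirroring the error-feedback-learning account.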

Somatosensory Perception of Whiskers


Figure 1A. Overview of the whisker system in rats
Figure 1B. System level description of the ascending pathways from whiskers to barrel cortex.

The barrel cortex is a specialized region in somatosensory cortex responsible for processing tactile information from the whiskers. Like every other cortical region, the barrel cortex preserves the columnar organization which plays a crucial role in information processing: information from each whisker is represented in a separate, discrete column analogous to a “barrel”, hence the name barrel cortex. Rodents use their whiskers constantly to acquire sensory information from the environment. Since they are nocturnal, visual information is relatively poor and the tactile information carried by the whiskers forms the primary sensory signal for building a perceptual map of the environment. The whiskers on the snouts of mice and rats serve as arrays of highly sensitive detectors for acquiring tactile information, as shown in Figures 1A and 1B. By using their whiskers, rodents can build spatial representations of their environment, locate objects, and perform fine-grained texture discrimination. Somatosensory whisker-related processing is highly organized into stereotypical maps, which occupy a large portion of the rodent brain. During exploration and palpation of objects, the whiskers are under motor control, often executing rapid, large-amplitude rhythmic sweeping movements; this sensory system is therefore an attractive model for investigating active sensory processing and sensory-motor integration. Perhaps the most remarkable specialization of this system is the primary somatosensory ‘‘barrel’’ cortex, where each whisker is represented by a discrete and well-defined structure in layer 4.

These layer 4 barrels are somatotopically arranged in an almost identical fashion to the layout of the whiskers on the snout, i.e. neighbouring whiskers are represented in adjacent cortical areas [1]. Sensorimotor integration of whisker-related activity leads to pattern discrimination and enables rodents to build a reliable map of the environment. This is an interesting model to study because rodents use their whiskers to “see”, and this cross-modal sensory information processing could help us improve the lives of humans who are deprived of one sensory modality. Specifically, blind people can be trained to use somatosensory information to build a spatial map of the environment [2].

Pathways carrying whisker information to Barrel Cortex

Figure 2. Schematic demonstrating the ascending pathway of rodent whisker-related sensorimotor system.

The tactile information from the whiskers on the snout is carried through the trigeminal nerves, which terminate at the trigeminal nucleus, as shown in Figure 2. The ascending pathway starts with the primary afferents in the trigeminal ganglion (TG), which transduce whisker vibrations into neuronal signals and project to the trigeminal brainstem complex (TN). The TN consists of the principal nucleus (PrV) and the spinal sub-nuclei (interpolaris, SpVi; caudalis, SpVc; the detailed connectivity of the oralis sub-nucleus is unknown and is omitted in the figure). The SpVi falls into a caudal and a rostral part (SpVic and SpVir). The classical mono-whisker lemniscal pathway (lemniscal 1) originates in the PrV barrelettes and projects via the VPM barreloid cores to the primary somatosensory cortex (S1) barrel columns. A second lemniscal pathway originating from PrV has recently been discovered, which carries multi-whisker signals via the barreloid heads to the septa (and dysgranular zone) of S1. The extra-lemniscal pathway originates in SpVic and carries multi-whisker signals via the barreloid tails in VPM to the secondary somatosensory area (S2). Finally, the paralemniscal pathway originates in SpVir and carries multi-whisker signals via POm to S1, S2, and the primary motor area (M1). The different colours of the connections indicate three principal pathways through which associative coupling between the sensorimotor cortical areas may be realized: black indicates direct cortico-cortical connections, blue shows cortico-thalamic cascades, and brown represents cortico-subcortical loops. Projections of S1 and S2 may open or close the lemniscal gate (i.e. gate signal flow through PrV) by modulating intrinsic TN circuitry.

Figure 3.Processing of whisker-related sensory information in barrel cortex. System level description of the pathways involved in the propagation of information from whiskers to cortex & columnar organization of the barrel cortex which receives information from single whisker.

The sensory neurons make excitatory glutamatergic synapses in the trigeminal nuclei of the brain stem. Trigemino-thalamic neurons in the principal trigeminal nucleus are organized into somatotopically arranged ‘‘barrelettes’’, each receiving strong input from a single whisker (Figure 3). The principal trigeminal neurons project to the ventral posterior medial (VPM) nucleus of the thalamus, which is also somatotopically laid out into anatomical units termed ‘‘barreloids’’. VPM neurons respond rapidly and precisely to whisker deflection, with one ‘‘principal’’ whisker evoking stronger responses than all others. The axons of VPM neurons within individual barreloids project to the primary somatosensory neocortex, forming discrete clusters in layer 4 which constitute the ‘‘barrel’’ map shown in Figure 3.

Whisker information processing in Barrel Cortex with specialized local microcircuit

The deflection of a whisker is thought to open mechano-gated ion channels in the nerve endings of sensory neurons innervating the hair follicle (although the molecular signalling machinery remains to be identified). The resulting depolarization evokes action potential firing in the sensory neurons of the infraorbital branch of the trigeminal nerve. The transduction through mechanical deformation is similar to that of the hair cells in the inner ear; in this case the contact of the whiskers with objects causes the mechano-gated ion channels to open. Cation-permeable channels let positively charged ions into the cell, causing a depolarization that eventually leads to the generation of action potentials. A single sensory neuron fires action potentials only in response to deflection of one specific whisker. The innervation of the hair follicle shows a diversity of nerve endings, which may be specialized for detecting different types of sensory input [3].

The layer 4 barrel map is arranged almost identically to the layout of the whiskers on the snout of the rodent. There are several recurrent connections within layer 4, and it sends axons to layer 2/3 neurons, which integrate information from other cortical regions such as primary motor cortex. These intra-cortical and inter-cortical connections enable rodents to discriminate stimuli and to extract optimal information from the incoming tactile input. These projections also play a crucial role in integrating somatosensory information with motor output. Information from the whiskers is processed in the barrel cortex by specialized local microcircuits that extract optimal information about the environment. These cortical microcircuits are composed of excitatory and inhibitory neurons, as shown in Figure 4.

Figure 4. Local microcircuit in barrel cortex. Left: schematic representation of the cortical layers (barrels within L4 in cyan) with examples of typical dendritic morphologies of excitatory cortical neurons (in red, an L2 neuron; in violet, a spiny stellate L4 cell; in green, an L5B pyramidal neuron). Right: schematic representation of the main excitatory connections between cortical layers within a barrel column (black).

Learning whisker based object discrimination & texture differentiation

Rodents move their sensors to collect information, and these movements are guided by sensory input. When action sequences are required to achieve success in novel tasks, interactions between movement and sensation underlie motor control [4] and complex learned behaviours [5]. The motor cortex has important roles in learning motor skills [6-9], but its function in learning sensorimotor associations is unknown. The neural circuits underlying sensorimotor integration are beginning to be mapped. Different motor cortex layers harbour excitatory neurons with distinct inputs and projections [10-12]. Outputs to motor centres in the brain stem and spinal cord arise from pyramidal tract-type neurons in layer 5B (L5B). Within motor cortex, excitation descends from L2/3 to L5 [13, 14]. Input from somatosensory cortex impinges preferentially onto L2/3 neurons; L2/3 neurons [10] therefore directly link somatosensation and the control of movements. In a recent study [15], head-fixed mice were trained in a vibrissa-based object-detection task while populations of neurons were imaged [16]. Following a sound, a pole was moved either to one of several target positions within reach of the whiskers (the ‘go’ stimulus) or to an out-of-reach position (the ‘no-go’ stimulus). Target and out-of-reach locations were arranged along the anterior-posterior axis; the out-of-reach position was most anterior. Mice searched for the pole with one whisker row, the C row, and reported the pole as ‘present’ by licking or as ‘not present’ by withholding licking. Licking on go trials (hit) was rewarded with water, whereas licking on no-go trials (false alarm) was punished with a time-out during which the trial was stopped for 2 seconds. Trials without licking (correct rejections on no-go trials and misses on go trials) were neither rewarded nor punished. All mice showed learning within the first two or three sessions, and performance reached expert levels after three to six training sessions.
Learning the behavioural task was directly dependent on the motor-related behaviour. Naive mice whisked occasionally, in a manner unrelated to trial structure. Object detection thus relies on a sequence of actions linked by sensory cues: an auditory cue triggers whisking during the sampling period, and contact between whisker and object causes licking for a water reward during a response period. Silencing vM1 showed that this task requires the motor cortex: with vM1 silenced, task-dependent whisking persisted but was reduced in amplitude and repeatability, and task performance dropped.

Neural Correlates of Sensorimotor learning mechanism

Coding of touch in the motor cortex is consistent with direct input from vS1 to the imaged neurons. A model based on population coding of individual behavioural features also predicted motor behaviours. Accurate decoding of whisking amplitude, whisking set-point and lick rate suggests that vM1 controls these slowly varying motor parameters, as expected from previous motor cortex and neurophysiological experiments.


1 Feldmeyer D, Brecht M, Helmchen F, Petersen CCH, Poulet JFA, Staiger JF, Luhmann HJ, Schwarz C."Barrel cortex function" Progress in Neurobiology 2013, 103 : 3-27.

2 Lahav O, Mioduser D. "Multisensory virtual environment for supporting blind persons' acquisition of spatial cognitive mapping, orientation, and mobility skills." 2002.

3 Alloway KD. "Information processing streams in rodent barrel cortex: The differential functions of barrel and septal circuits." Cereb Cortex 2008, 18(5):979-989.

4 Scott SH. "Inconvenient truths about neural processing in primary motor cortex." The Journal of physiology 2008, 586(5):1217-1224.

5 Wolpert DM, Diedrichsen J, Flanagan JR. "Principles of sensorimotor learning." Nature reviews Neuroscience 2011, 12(12):739-751.

6 Wise SP, Moody SL, Blomstrom KJ, Mitz AR. "Changes in motor cortical activity during visuomotor adaptation." Experimental Brain Research 1998, 121(3):285-299.

7 Rokni U, Richardson AG, Bizzi E, Seung HS. "Motor learning with unstable neural representations." Neuron 2007, 54(4):653-666.

8 Komiyama T, Sato TR, O'Connor DH, Zhang YX, Huber D, Hooks BM, Gabitto M, Svoboda K. "Learning-related fine-scale specificity imaged in motor cortex circuits of behaving mice." Nature 2010, 464(7292):1182-1186.

9 Hosp JA, Pekanovic A, Rioult-Pedotti MS, Luft AR. "Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning." The Journal of Neuroscience 2011, 31(7):2481-2487.

10 Keller A. "Intrinsic synaptic organization of the motor cortex." Cereb Cortex 1993, 3(5):430-441.

11 Mao T, Kusefoglu D, Hooks BM, Huber D, Petreanu L, Svoboda K. "Long-range neuronal circuits underlying the interaction between sensory and motor cortex." Neuron 2011, 72(1):111-123.

12 Hooks BM, Hires SA, Zhang YX, Huber D, Petreanu L, Svoboda K, Shepherd GM. "Laminar analysis of excitatory local circuits in vibrissal motor and sensory cortical areas." PLoS biology 2011, 9(1):e1000572.

13 Anderson CT, Sheets PL, Kiritani T, Shepherd GM. "Sublayer-specific microcircuits of corticospinal and corticostriatal neurons in motor cortex." Nature neuroscience 2010, 13(6):739-744.

14 Kaneko T, Cho R, Li Y, Nomura S, Mizuno N. "Predominant information transfer from layer III pyramidal neurons to corticospinal neurons." The Journal of comparative neurology 2000, 423(1):52-65.

15 O'Connor DH, Clack NG, Huber D, Komiyama T, Myers EW, Svoboda K. "Vibrissa-based object localization in head-fixed mice." The Journal of Neuroscience 2010, 30(5):1947-1967.

16 O'Connor DH, Peron SP, Huber D, Svoboda K. "Neural activity in barrel cortex underlying vibrissa-based object localization in mice." Neuron 2010, 67(6):1048-1061.

17 Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY. "Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein." Nature biotechnology 2004, 22(12):1567-1572.

18 Tian L, Hires SA, Mao T, Huber D, Chiappe ME, Chalasani SH, Petreanu L, Akerboom J, McKinney SA, Schreiter ER. "Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators." Nature methods 2009, 6(12):875-881.

Olfactory System


Probably the oldest sensory system in nature, the olfactory system mediates the sense of smell. It is physiologically strongly related to the gustatory system, so the two are often examined together. Complex flavors require both taste and smell sensations to be recognized. Consequently, food may taste “different” if the sense of smell does not work properly (e.g. during a head cold).

Generally the two systems are classified as visceral senses because of their close association with gastrointestinal function. They are also of central importance for emotional and sexual functions.

Both taste and smell receptors are chemoreceptors that are stimulated by molecules dissolved in saliva or mucus, respectively. However, the two senses are anatomically quite different. Smell receptors are distance receptors whose pathway has no thalamic relay; taste pathways, in contrast, pass up the brainstem to the thalamus and project to the postcentral gyrus along with those for touch and pressure sensibility from the mouth.

In this article we will first focus on the organs composing the olfactory system, then characterize them in order to understand their functionality, and finally explain the transduction of the signal as well as commercial applications such as the eNose.

Sensory Organs

In vertebrates the main olfactory system detects odorants that are inhaled through the nose, where they come into contact with the olfactory epithelium, which contains the olfactory receptors.

Olfactory sensitivity is directly proportional to the area of the olfactory mucous membrane, the region in the nasal cavity near the septum where the olfactory receptor cells are located. The extent of this area is species-specific. In dogs, for example, the sense of smell is highly developed and the area covered by this membrane is about 75 – 150 cm2; such animals are called macrosmatic animals. In humans, by contrast, the olfactory mucous membrane covers only about 3 – 5 cm2, so humans are known as microsmatic animals.

In humans the olfactory mucous membrane contains about 10 million olfactory cells, each expressing one of about 350 different receptor types. Each receptor type is characteristic for only one odorant type. The binding of an odorant molecule starts a molecular chain reaction, which transforms the chemical stimulus into an electrical signal.

The electrical signal proceeds through the axons of the olfactory nerve to the olfactory bulbs. In this region there are between 1,000 and 2,000 glomeruli, which combine and interpret the potentials coming from different receptors. This way it is possible to unequivocally characterise, for example, the coffee aroma, which is composed of about 650 different odorants. Humans can distinguish between about 10,000 odors.
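The combinatorial power of this arrangement is easy to illustrate: even if each odorant merely activated a small subset of the ~350 receptor types in an all-or-none fashion, the number of distinguishable activation patterns would be enormous. The calculation below is a back-of-the-envelope illustration, not a biological model (real receptor responses are graded, and the figures in the text are approximate).

```python
from math import comb

# Number of distinct activation patterns if an odorant activated exactly
# k of the 350 receptor types (all-or-none responses: a deliberate
# simplification of the graded responses of real receptors).
n_receptors = 350
patterns = {k: comb(n_receptors, k) for k in (1, 2, 3)}

print(patterns[1])   # → 350
print(patterns[2])   # → 61075
# already with pairs of receptor types, the repertoire of patterns far
# exceeds the ~10,000 odors humans are said to distinguish
```

This combinatorial logic is one reason a few hundred receptor types suffice to characterize odors composed of hundreds of distinct odorant molecules.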

The signal then travels on to the olfactory cortex, where it is recognized and compared with known odorants (i.e. olfactory memory); this also involves an emotional response to the olfactory stimuli.

It is also interesting to note that the human genome contains about 600 – 700 genes (~2% of the complete genome) coding for olfactory receptors, but only about 350 of them are still functional in building the olfactory system. This reflects an evolutionary decline in the importance of olfaction for humans.

Sensory Organ Components

1: Olfactory bulb 2: Mitral cells 3: Bone 4: Nasal Epithelium 5: Glomerulus 6: Olfactory receptor cells

Similar to other sensory modalities, olfactory information must be transmitted from peripheral olfactory structures, such as the olfactory epithelium, to more central structures, namely the olfactory bulb and cortex. The specific stimuli have to be detected, integrated and transmitted to the brain in order to reach sensory consciousness. However, the olfactory system differs from other sensory systems in three fundamental ways [2]:

  1. Olfactory receptor neurons are continuously replaced by mitotic division of the basal cells of the olfactory epithelium. This is necessary due to the high vulnerability of the neurons, which are directly exposed to the environment.
  2. Due to phylogeny, olfactory sensory activity is transferred directly from the olfactory bulb to the olfactory cortex, without a thalamic relay.
  3. Neural integration and analysis of olfactory stimuli may not involve topographic organization beyond the olfactory bulb, meaning that no spatial or frequency axes are needed to project the signal.

Olfactory Mucous Membrane

The olfactory mucous membrane contains the olfactory receptor cells; in humans it covers an area of about 3 – 5 cm^2 in the roof of the nasal cavity near the septum. Because the receptors are continuously regenerated, it also contains both the supporting cells and the progenitor cells of the olfactory receptors. Interspersed between these cells are 10 – 20 million receptor cells.

Olfactory receptors are neurons with short, thick dendrites. Their expanded end is called an olfactory rod, from which cilia project to the surface of the mucus. These neurons have a length of 2 micrometers and carry between 10 and 20 cilia with a diameter of about 0.1 micrometers.

The axons of the olfactory receptor neurons pass through the cribriform plate of the ethmoid bone and enter the olfactory bulb. This passage is the most vulnerable point of the olfactory system: damage to the cribriform plate (e.g. from a broken nasal septum) can destroy the axons and compromise the sense of smell.

A further particularity of the mucous membrane is that it is completely renewed every few weeks.

Olfactory Bulbs

In humans the olfactory bulb is located anteriorly with respect to the cerebral hemisphere and remains connected to it only by a long olfactory stalk. Furthermore, in mammals it is separated into layers and consists of a concentric laminar structure with well-defined neuronal somata and synaptic neuropil.

After passing the cribriform plate, the olfactory nerve fibers ramify in the most superficial layer (the olfactory nerve layer). Where these axons reach the olfactory bulb the layer thickens, and they terminate on the primary dendrites of the mitral cells and tufted cells. Both cell types send axons to the olfactory cortex and appear to be functionally similar, although tufted cells are smaller and consequently have thinner axons.

The axons from several thousand receptor neurons converge on one or two glomeruli in a corresponding zone of the olfactory bulb, suggesting that glomeruli are the unit structures of olfactory discrimination.

In addition to mitral and tufted cells, the olfactory bulb also contains two types of cells with inhibitory properties: periglomerular cells and granule cells. The former connect two different glomeruli; the latter, without using any axons, build reciprocal synapses with the lateral dendrites of the mitral and tufted cells. By releasing GABA, the granule cells on one side of these synapses are able to inhibit the mitral (or tufted) cells, while on the other side the mitral (or tufted) cells are able to excite the granule cells by releasing glutamate. About 8,000 glomeruli and 40,000 mitral cells have been counted in young adults. Unfortunately, this number decreases progressively with age, compromising the structural integrity of the different layers.

Olfactory Cortex

The axons of the mitral and tufted cells pass through the granule layer, the intermediate olfactory stria and the lateral olfactory stria to the olfactory cortex. In humans this tract forms the bulk of the olfactory peduncle. The primary olfactory cortical areas have a simple structure composed of three layers: a broad plexiform layer (first layer); a compact layer of pyramidal cell somata (second layer); and a deeper layer composed of both pyramidal and nonpyramidal cells (third layer)[2]. Furthermore, in contrast to the olfactory bulb, only little spatial encoding can be observed; “that is, small areas of the olfactory bulb virtually project the entire olfactory cortex, and small areas of the cortex receive fibers from virtually the entire olfactory bulb” [2].

In general, the olfactory tract can be divided into five major regions of the cerebrum: the anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, the anterior cortical nucleus of the amygdala and the entorhinal cortex. Olfactory information is transmitted from the primary olfactory cortex to several other parts of the forebrain, including the orbital cortex, amygdala, hippocampus, central striatum, hypothalamus and mediodorsal thalamus.

It is also interesting to note that in humans the piriform cortex can be activated by sniffing alone, whereas activating the lateral and anterior orbitofrontal gyri of the frontal lobe requires smell. Orbitofrontal activation is generally greater on the right side than on the left, which implies an asymmetry in the cortical representation of olfaction.

Signal Processing

Examples of olfactory thresholds[3].

  Substance           Threshold (mg/L of air)
  Ethyl ether         5.83
  Chloroform          3.30
  Pyridine            0.03
  Oil of peppermint   0.02
  Iodoform            0.02
  Butyric acid        0.009
  Propyl mercaptan    0.006
  Artificial musk     0.00004
  Methyl mercaptan    0.0000004

Only substances that come in contact with the olfactory epithelium can excite the olfactory receptors. The table above shows thresholds for some representative substances; these values give an impression of the remarkable sensitivity of the olfactory receptors.

It is remarkable that humans can recognize more than 10,000 different odors. Many odorant molecules differ only slightly in their chemical structure (e.g. stereoisomers) but can nevertheless be distinguished.

Signal Transduction

An interesting feature of the olfactory system is that a simple sense organ, apparently lacking a high degree of complexity, can mediate discrimination of more than 10,000 different odors. On the one hand, this is made possible by the huge number of different odorant receptors; the olfactory receptor gene family is in fact the largest studied so far in mammals. On the other hand, the neural net of the olfactory system, with its 1800 glomeruli, provides a large two-dimensional map in the olfactory bulb that is unique to each odorant. In addition, the extracellular field potential in each glomerulus oscillates, and the granule cells appear to regulate the frequency of the oscillation. The exact function of the oscillation is unknown, but it probably also helps to focus the olfactory signal reaching the cortex [2].

Smell measurement

Olfaction consists of a set of transformations from the physical space of odorant molecules (olfactory physicochemical space), through a neural space of information processing (olfactory neural space), into a perceptual space of smell (olfactory perceptual space).[4] Determining the rules of these transformations depends on obtaining valid metrics for each of these spaces.

Olfactory perceptual space

As the perceptual space represents the “input” of smell measurement, its aim is to describe odors in the simplest possible way. Odors are ordered so that the distance between them in the space reflects their similarity: the closer two odors lie to each other, the more similar they are expected to be. This space is thus defined by so-called perceptual axes, characterized by some arbitrarily chosen “unit” odors.

Olfactory neural space

As suggested by its name, the neural space is generated from neural responses. This gives rise to an extensive database of odorant-induced activity, which can be used to formulate an olfactory space where the concept of similarity serves as a guiding principle. Following this procedure, different odorants are expected to be similar if they generate similar neuronal responses. This database can be navigated at the Glomerular Activity Response Archive [5].

Olfactory physicochemical space

The need to identify the molecular encoding of the biological interaction makes the physicochemical space the most complex of the olfactory spaces described so far. R. Haddad suggests that one possibility to span this space is to represent each odorant by a very large number of molecular descriptors, using either a variance metric or a distance metric.[4] In the first description, single odorants may have many physicochemical features, and one expects these features to present themselves with various probabilities within the world of molecules that have a smell. In such a metric, the orthogonal basis generated from the descriptors allows each odorant to be represented by a single value. In the second, the metric represents each odorant with a vector of 1664 values and compares odorants by their Euclidean distances in the resulting 1664-dimensional physicochemical space. Whereas the first metric enabled the prediction of perceptual attributes, the second enabled the prediction of odorant-induced neuronal response patterns.
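The distance-metric variant can be sketched in a few lines of code. The descriptor values below are invented for illustration; a real application would use the full set of 1664 physicochemical descriptors per odorant.

```python
import math

# Hypothetical descriptor vectors (real ones have 1664 physicochemical
# descriptors per odorant; these 4-dimensional values are invented).
odorants = {
    "octanol":  [1.2, 0.4, 3.1, 0.9],
    "hexanol":  [1.0, 0.5, 2.8, 1.1],
    "limonene": [3.5, 2.2, 0.4, 2.9],
}

def distance(a, b):
    """Euclidean distance between two descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Odorants close in this space are predicted to evoke similar
# neuronal response patterns.
d_similar = distance(odorants["octanol"], odorants["hexanol"])
d_different = distance(odorants["octanol"], odorants["limonene"])
print(d_similar < d_different)  # the structurally similar pair is closer
```

Odorants that lie close together in this space are thus predicted to evoke similar neuronal response patterns.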

Electronic measurement of odors

Nowadays odors can be measured electronically in many different ways; some examples are mass spectrometry, gas chromatography, Raman spectroscopy and, most recently, electronic noses. In general, these approaches assume that different olfactory receptors have different affinities to specific molecular physicochemical properties, and that the differential activation of these receptors gives rise to a spatio-temporal pattern of activity that reflects odors.

Electronic Nose

E-noses are artificial odor-sensing devices based on a chemosensor array and pattern recognition. They are used to identify and quantify substances dissolved in air (or other carrier substances). An e-nose consists of a sampling device (analogous to the nose), a sensor array (analogous to the olfactory receptor neurons) and a computing unit (analogous to the brain).

Sensor arrays

As in animal noses, unspecific sensors are used. This is not only because very specific sensors are hard to find, but also because one wants to cover a huge range of possible compounds without needing a dedicated sensor for each of them. Furthermore, processing based on information from more than one sensor is more robust, precise and efficient. Such sensors experience a change in their electrical properties (e.g. a higher resistance) when they come in contact with a compound. This alteration leads to a voltage change that is then digitized by an A/D converter.
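The measurement chain just described (resistance change, voltage change, digitization) can be sketched as follows. The circuit modeled here is a simple voltage divider, and all component values are illustrative assumptions, not taken from any particular e-nose:

```python
# Minimal sketch of an e-nose measurement chain: a sensor's resistance
# change is converted to a voltage with a voltage divider and then
# quantized by an A/D converter. All component values are invented.

VCC = 5.0          # supply voltage (V)
R_REF = 10_000.0   # fixed reference resistor (ohms)
ADC_BITS = 10      # resolution of the A/D converter

def sensor_voltage(r_sensor):
    """Voltage across the sensor in a divider with R_REF."""
    return VCC * r_sensor / (r_sensor + R_REF)

def digitize(voltage):
    """Quantize a voltage in [0, VCC] to an integer ADC code."""
    return round(voltage / VCC * (2 ** ADC_BITS - 1))

baseline = digitize(sensor_voltage(10_000.0))  # clean air
exposed = digitize(sensor_voltage(14_000.0))   # resistance rose on contact
print(baseline, exposed)  # the code difference encodes the odor response
```

The computing unit then works on such digital codes, one per sensor in the array.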

The most frequently used sensor types include metal oxide semiconductor (MOS), quartz crystal microbalance (QCM), conducting polymer (CP) and surface acoustic wave (SAW) sensors. Another promising technology is bioelectronic noses, which use proteins as sensors. It is also possible to combine different sensor types to get a more precise result and to unite their respective advantages, e.g. better temporal responsivity versus better sensitivity.

Example: working principle of a conducting polymer sensor

A conducting polymer sensor consists of an array of about 2-40 different conducting polymers (long chains of organic molecules). Some odor molecules permeate into the polymer film and cause the film to expand thereby increasing its resistance. This increase in resistance of many polymer types can be explained by percolation theory.[6] Due to the chemical properties of the materials, different polymers react differently to the same odor.


The sensor signal has to be matched to an odorant mixture with a pattern recognition algorithm. One possibility is to create a database of potential combinations and, when an odor is presented, find the best match with multivariate statistical methods; alternatively, a neural network can be trained to recognize the patterns. Principal component analysis is often used to reduce the dimensionality of the sensor data.
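A minimal sketch of such pattern matching, assuming a toy database of known response patterns (all values invented): the measured pattern is assigned to the library odor with the smallest Euclidean distance. In practice, dimensionality reduction such as principal component analysis would typically precede this step.

```python
import numpy as np

# Toy database of known sensor-array response patterns (values invented;
# a real e-nose would store one calibrated entry per target odor).
library = {
    "coffee":  np.array([0.9, 0.2, 0.7, 0.1]),
    "ethanol": np.array([0.1, 0.8, 0.3, 0.6]),
    "ammonia": np.array([0.4, 0.4, 0.9, 0.8]),
}

def best_match(pattern):
    """Return the library odor whose pattern is nearest (Euclidean)."""
    return min(library, key=lambda name: np.linalg.norm(library[name] - pattern))

# A noisy measurement of a coffee-like sample is still matched correctly:
measurement = np.array([0.85, 0.25, 0.65, 0.15])
print(best_match(measurement))  # -> coffee
```

A trained neural network would replace `best_match` with a learned classifier, but the input (the digitized sensor pattern) stays the same.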


There are many applications for e-noses. They are used in aerospace and other industries to detect and monitor hazardous or harmful substances, and for quality control. Possible applications in security are drug or explosive detection; e-noses may someday be able to replace police dogs. A very powerful application could be the diagnosis of diseases that alter the chemical composition of breath or the smell of excretions or blood, potentially substituting invasive diagnostic techniques. E-noses could also be employed to diagnose cancer, as certain cancer cells can be identified by their volatile organic compound profile. Cancer diagnosis by smell has already been shown to work with dogs and flies,[7] but practically suitable methods with high sensitivity and specificity are still under development. Another medical application is the treatment of anosmia (the inability to perceive odors) by an olfactory implant based on an e-nose; this too is still in development. In contrast, e-noses are already in use for environmental monitoring and protection. In robotics, e-noses could be used to follow airborne smells or smells on the ground. Especially for robotics, a better understanding of the insect olfactory system would be very valuable since, in order to use smell to navigate or to locate odor sources, the often neglected temporal stimulus information has to be used.

Insects can follow odors, reacting to changes within about 150 milliseconds, and some of their receptors are able to track fast odor concentration changes occurring at frequencies of at least 10 Hz. In contrast, conducting polymer as well as metal oxide e-noses have response times in the range of seconds to minutes [6], with only a few exceptions reported in the range of tens of milliseconds.


  1. Hasan 1983.
  2. Paxinos, G., & Mai, J. K. (2004). The Human Nervous System. Academic Press.
  3. Ganong, W. F., & Barrett, K. E. (2005). Review of Medical Physiology (22nd ed.). New York: McGraw-Hill Medical.
  4. Haddad, R., Lapid, H., Harel, D., & Sobel, N. (August 2008). "Measuring smells". Current Opinion in Neurobiology 18 (4): 438–444. doi:10.1016/j.conb.2008.09.007.
  5. Glomerular Activity Response Archive.
  6. Arshak, K., Moore, E., Lyons, G. M., Harris, J., & Clifford, S. (June 2004). "A review of gas sensors employed in electronic nose applications". Sensor Review 24 (2): 181–198. doi:10.1108/02602280410525977.
  7. Strauch, M., Lüdke, A., Münch, D., Laudes, T., Galizia, C. G., Martinelli, E., Lavra, L., Paolesse, R., et al. (6 January 2014). "More than apples and oranges - Detecting cancer with a fruit fly's antenna". Scientific Reports 4. doi:10.1038/srep03576.



The Gustatory System, or sense of taste, allows us to perceive different flavors from substances such as food, drinks and medicine. Molecules that we taste, or tastants, are sensed by cells in our mouth, which send information to the brain. These specialized cells are called taste cells and can sense five main tastes: bitter, salty, sweet, sour and umami (savory). All the flavors that we know are combinations of molecules which fall into these categories.

The degree to which a substance presents one of the basic tastes is measured subjectively, by comparing its taste to that of a reference substance. For bitterness, quinine (found in tonic water) is used as the reference. Saltiness is rated by comparison with a dilute salt solution, sourness is compared with dilute hydrochloric acid (H+Cl-), and sweetness is measured relative to sucrose. The index of each reference substance is defined as 1.
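One way such a relative index can be expressed is as a ratio of detection thresholds, reference over substance: a lower threshold means a more potent tastant and a higher index. The sketch below uses invented threshold values purely for illustration, not measured data:

```python
# Hedged sketch of a relative taste index: the reference substance
# (here sucrose, for sweetness) is assigned index 1, and other
# substances are rated by the ratio of detection thresholds
# (reference threshold / substance threshold). Values are invented.

SUCROSE_THRESHOLD = 10.0  # arbitrary concentration units

def sweetness_index(threshold):
    """Lower detection threshold -> more potent -> higher index."""
    return SUCROSE_THRESHOLD / threshold

print(sweetness_index(10.0))  # the reference itself: index 1.0
print(sweetness_index(5.0))   # twice as potent: index 2.0
```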


(Coffee, mate, beer, tonic water etc.)

Bitterness is considered by many to be unpleasant. It is very interesting because a large number of bitter compounds are known to be toxic, so the bitter taste is thought to provide an important protective function. Plant leaves often contain toxic compounds, and herbivores tend to prefer immature leaves, which have a higher protein content and lower poison levels than mature leaves. Even though the bitter taste is not very pleasant at first, there is a tendency to overcome this aversion: coffee and other caffeine-rich drinks are widely consumed. Sometimes bitter agents are deliberately added to substances to prevent accidental ingestion.


(Table salt)

The salty taste is primarily produced by the presence of cations such as Li+ (lithium), K+ (potassium) and, most commonly, Na+ (sodium). The saltiness of substances is compared to sodium chloride (Na+Cl-), which is typically used as table salt. Potassium chloride (K+Cl-), the principal ingredient of salt substitutes, has a saltiness index of 0.6, compared to 1 for Na+Cl-.


(Lemon, orange, wine, spoiled milk and candies containing citric acid)

Sour taste can be mildly pleasant and is linked to the salty flavor, but more pronounced. Typical sour foods include over-ripe fruit, spoiled milk, rotten meat and other spoiled foods, which can be dangerous. Sourness also signals acids (H+ ions), which taken in large quantities can cause irreversible tissue damage. Sourness is rated relative to hydrochloric acid (H+Cl-), which has a sourness index of 1.


(Sucrose (table sugar), cake, ice cream etc.)

Sweetness is regarded as a pleasant sensation and is produced mostly by the presence of sugars. Sweet substances are rated relative to sucrose, which has an index of 1. Nowadays there are many artificial sweeteners on the market, including saccharin, aspartame and sucralose, but it is still not clear how these substitutes activate the receptors.

Umami (savory or tasty)

(Cheese, soy sauce etc.)

Monosodium glutamate (umami) has recently been added as the fifth taste. This taste signals the presence of L-glutamate and is very important in Eastern cuisines.

Sensory Organs

Tongue and Taste Buds

Human tongue

Taste cells are epithelial and are clustered in taste buds located in the tongue, soft palate, epiglottis, pharynx and esophagus, with the tongue being the primary organ of the Gustatory System.

Schematic drawing of a taste bud

Taste buds are located in papillae along the surface of the tongue. There are three types of papillae in humans: fungiform papillae, located in the anterior part of the tongue and containing approximately five taste buds each; circumvallate papillae, which are bigger and more posterior; and foliate papillae, located at the posterior edge of the tongue. Circumvallate and foliate papillae contain hundreds of taste buds. In each taste bud there are different types of cells: basal, dark, intermediate and light cells. Basal cells are believed to be the stem cells that give rise to the other types. It is thought that the remaining cell types correspond to different stages of differentiation, with the light cells being the most mature; an alternative idea is that dark, intermediate and light cells correspond to different cellular lineages. Taste cells are short-lived and are continuously regenerated. Each taste bud has a taste pore at the surface of the epithelium, into which the taste cells extend microvilli, the site where sensory transduction takes place.

Taste cells are innervated by fibers of primary gustatory neurons, and these connections resemble chemical synapses. Taste cells are excitable, with voltage-gated K+, Na+ and Ca2+ channels capable of generating action potentials. Although the reaction to different tastants varies, in general tastants interact with receptors or ion channels in the membrane of a taste cell. These interactions depolarize the cell directly or via second messengers, and the resulting receptor potential generates action potentials within the taste cell. These lead to Ca2+ influx through voltage-gated Ca2+ channels, followed by the release of neurotransmitters at the synapses with the sensory fibers.

Tongue map

The idea that different regions of the tongue are most sensitive to particular tastes is a long-standing misconception, which has now been proved wrong: all taste sensations arise from all regions of the tongue.


An average person has about 5,000 taste buds. A "supertaster" is a person whose sense of taste is significantly more sensitive than average. The increased response is thought to arise because supertasters have more than 20,000 taste buds, or due to an increased number of fungiform papillae.

Transduction of Taste

As mentioned before, we distinguish five types of basic tastes: bitter, salty, sour, sweet and umami. There is one type of taste receptor for each known taste, and each type of taste stimulus is transduced by a different mechanism. In general, bitter, sweet and umami are detected by G-protein-coupled receptors, while salty and sour are detected via ion channels.


Bitter compounds act through G-protein-coupled receptors (GPCRs), also known as seven-transmembrane-domain receptors, which are located in the membranes of the taste cells. Taste receptors of type 2 (T2Rs), a group of GPCRs, are thought to respond to bitter stimuli. When a bitter-tasting ligand binds to the GPCR, it releases the G protein gustducin; its three subunits break apart and activate phosphodiesterase, which in turn converts a precursor within the cell into a second messenger, closing the K+ channels. This second messenger also stimulates the release of Ca2+, contributing to depolarization followed by neurotransmitter release. It is possible that bitter substances that are permeable to the membrane are sensed by mechanisms not involving G proteins.


The amiloride-sensitive epithelial sodium channel (ENaC), a type of ion channel in the taste cell membrane, allows Na+ ions to enter the cell down an electrochemical gradient, depolarizing the cell. This leads to an opening of voltage-gated Ca2+ channels, followed by neurotransmitter release.


The sour taste signals the presence of acidic compounds (H+ ions), and there are three receptor mechanisms: 1) ENaC, the same protein involved in salty taste. 2) H+-gated channels: one is a K+ channel, which allows K+ to flow out of the cell; H+ ions block it, so the K+ stays inside the cell. 3) A third channel undergoes a conformational change when an H+ ion attaches to it, opening the channel and allowing an influx of Na+ down the concentration gradient into the cell, which leads to the opening of voltage-gated Ca2+ channels. These three mechanisms work in parallel and lead to depolarization of the cell followed by neurotransmitter release.


Sweet transduction is mediated by the binding of a sweet tastant to GPCRs located in the apical membrane of the taste cell. The saccharide activates the GPCR, which releases gustducin; this in turn raises the level of cAMP (cyclic adenosine monophosphate). cAMP activates a cAMP-dependent kinase that phosphorylates the K+ channels and eventually inactivates them, leading to depolarization of the cell followed by neurotransmitter release.

Umami (Savory)

Umami receptors also involve GPCRs, in the same way as the bitter and sweet receptors. Glutamate binds a type of metabotropic glutamate receptor, mGluR4, causing a G-protein complex to activate a secondary receptor, which ultimately leads to neurotransmitter release. Exactly how the intermediate steps work is currently unknown.

Signal Processing

In humans, the sense of taste is transmitted to the brain via three cranial nerves. The facial nerve (VII) carries information from the anterior two-thirds of the tongue and the soft palate, the glossopharyngeal nerve (IX) carries taste sensations from the posterior third of the tongue, and the vagus nerve (X) carries information from the back of the oral cavity and the epiglottis.

The gustatory cortex is the brain structure responsible for the perception of taste. It consists of the anterior insula on the insular lobe and the frontal operculum on the inferior frontal gyrus of the frontal lobe. Neurons in the gustatory cortex respond to the five main tastes.

Taste cells synapse with primary sensory axons of the mentioned cranial nerves. The central axons of these neurons in the respective cranial nerve ganglia project to rostral and lateral regions of the nucleus of the solitary tract in the medulla. Axons from the rostral (gustatory) part of the solitary nucleus project to the ventral posterior complex of the thalamus, where they terminate in the medial half of the ventral posterior medial nucleus. This nucleus projects to several regions of the neocortex, which include the gustatory cortex.

Gustatory cortex neurons exhibit complex responses to changes in tastant concentration. The same neuron might increase its firing with the concentration of one tastant, while for another tastant it may respond only at an intermediate concentration.

Taste and Other Senses

In general, the Gustatory System does not work alone. While eating, consistency and texture are sensed by the mechanoreceptors of the somatosensory system. The sense of taste is also correlated with the olfactory system: without the sense of smell, it is difficult to distinguish flavors.

Spicy food

(black peppers, chili peppers, etc.)

Spiciness is not a basic taste because the sensation does not arise from taste buds. Capsaicin, the active ingredient in spicy food, causes “hotness” or “spiciness” when eaten. It stimulates temperature fibers and also nociceptors (pain fibers) in the tongue. In the nociceptors it stimulates the release of substance P, which causes vasodilatation and the release of histamine, causing hyperalgesia (increased sensitivity to pain).

In general, basic tastes can be appetitive or aversive depending on the effect the food has on us, but also essential to the taste experience are the presentation of the food, its color, texture and smell, previous experiences, expectations, temperature and satiety.

Taste disorders

Ageusia (complete loss of taste)

Ageusia is a partial or complete loss of the sense of taste, and it can sometimes be accompanied by a loss of smell.

Dysgeusia (abnormal taste)

Dysgeusia is an alteration in the perception associated with the sense of taste. The tastes of food and drinks vary radically, and sometimes taste is perceived as repulsive. The causes of dysgeusia can be associated with neurological disorders.

Sensory Systems in Non-Primates

Primates are animals belonging to the class of mammals. They include humans and the nonhuman primates: the apes, monkeys, lemurs, tree shrews, lorises, bush babies and tarsiers. They are characterized by a voluminous and complicated forebrain. Most have excellent sight and are highly adapted to an arboreal existence, including, in some species, the possession of a prehensile tail. Non-primates, on the other hand, often possess smaller brains. But as we learn more about the rest of the animal world, it is becoming clear that non-primates are pretty intelligent too. Some examples include pigs, octopuses and crows.[1]

In many branches of mythology, the crow plays a shrewd trickster, and in the real world, crows are proving to be quite a clever species. Crows have been found to engage in feats such as tool use, the ability to hide and store food from season to season, episodic-like memory, and the ability to use personal experience to predict future conditions.

As it turns out, being piggy is actually a pretty smart tactic. Pigs are probably the most intelligent domesticated animals on the planet. Although their raw intelligence is most likely commensurate with that of a dog or cat, their problem-solving abilities top those of their feline and canine pals.

If pigs are the most intelligent of the domesticated species, octopuses take the cake for invertebrates. Experiments in maze and problem-solving have shown that they have both short-term and long-term memory. Octopuses can open jars, squeeze through tiny openings, and hop from cage to cage for a snack. They can also be trained to distinguish between different shapes and patterns. In a kind of play-like activity (one of the hallmarks of higher intelligence species) octopuses have been observed repeatedly releasing bottles or toys into a circular current in their aquariums and then catching them.

Sensing of Infrared Radiation in Snakes


Location of pit organs. Above: python, below: crotalus.

When seeing or sometimes even thinking of snakes, many people feel uncomfortable or even scared. There is a reason that they are considered mythical. Snakes differ from other animals: they have no legs, they are long and move elegantly and silently, some are venomous, and they constantly use their forked tongues to smell. Some are fast and effective killers, even at night. Something that definitely makes them special is their “sixth sense”: the ability to detect infrared radiation. Like night-vision devices, snakes are capable of detecting heat changes in their surroundings and thus obtaining a detailed picture of them. At least two different groups of snakes have developed this ability separately: the pit vipers on the one hand, and the boas and pythons on the other (the latter two are often grouped together as “boids”). However, snakes are not the only species to have evolved this sense: vampire bats and some groups of insects have developed it as well. Even at night, pit vipers, boas and pythons can make out rodents by the heat they emit, detected by a sensory system that allows them to “see” electromagnetic radiation with long wavelengths ranging from 750 nm to 1 mm. The organs that make this possible are called “pit organs” and are located under the eyes, inside two hollows of the maxilla bone. They are immensely sensitive, detecting temperature changes as small as 0.003 K.

Anatomy of Sensing Organ

Structure of the pit organ. The width of the hollow's opening is half the size of the membrane that separates the two air-filled chambers. The dendrites of the trigeminal nerve lie at the membrane's back. The infrared beams are projected as in a pinhole camera.

The infrared-sensing organs of vipers and boids are similar in their physiological structure but differ in number, location and morphology. The anatomy is quite simple and is explained here using the example of the crotalus, a venomous pit viper found only in the Americas, from southern Canada to northern Argentina. The organ consists of a hollow space that is separated into two air-filled chambers by a thin membrane about 0.01 mm thick. The membrane is filled with sensory cells of the trigeminal nerve. Roughly 7,000 in number, they transduce heat through heat-sensitive ion channels, increasing their firing rate when the temperature rises and decreasing it when the temperature falls. They are very sensitive due to the spatial proximity of these thermoreceptors to the outside, and also because of the air-filled chamber that lies underneath. This chamber acts as an insulator, separating tissues that would otherwise quickly exchange heat energy; thus, the absorbed thermal energy is used exclusively by the sensory system and is not lost to lower-lying tissues. This simple but sophisticated anatomy is the reason for the unique sensitivity of the pit organs. The organ's structure even allows the direction of the radiation to be detected. The external opening is roughly half as large as the membrane, so the whole organ works according to the optics of a pinhole camera: the position of the irradiated spot provides information about the object's location. The heat itself is detected by the activation of heat-sensitive ion channels called TRPA1. These channels also exist in other animals, where they have other functions, such as detecting chemical irritants or cold. Pit vipers and boids seem to have evolved infrared sensing independently. Since the heat-sensitive ion channels have different thermal thresholds in different snakes, temperature sensitivity differs among them; crotalus have the most sensitive channels. Snakes that are not able to detect infrared radiation also possess these channels, but their thermal threshold is too high to detect infrared radiation.
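The pinhole-camera geometry described above can be sketched numerically: the displacement of the irradiated spot on the membrane, together with the depth of the pit, determines the direction of the heat source. The pit depth used here is an invented value for illustration; real dimensions vary between species.

```python
import math

# Pinhole-camera sketch of the pit organ. The aperture-to-membrane
# distance is an invented, illustrative value (in mm).
APERTURE_TO_MEMBRANE = 1.0

def source_direction(spot_offset_mm):
    """Angle of the heat source from the pit axis, in degrees.

    A pinhole inverts the image, so a spot displaced to one side of
    the axis means a source on the opposite side; hence the sign flip.
    """
    return -math.degrees(math.atan2(spot_offset_mm, APERTURE_TO_MEMBRANE))

print(source_direction(0.0))   # spot on the axis: source straight ahead
print(source_direction(-0.5))  # spot left of axis: source ~26.6 deg right
```

The same inversion appears in the brain's wiring: the front part of each pit membrane receives radiation from the back part of the visual field.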

Brain’s Anatomy

Every sensory organ has a dedicated brain region to process the collected information. Snakes evaluate infrared sensory input from the pit organs in the nucleus of the lateral descending trigeminal tract (LTTD), a unique region in their metencephalon which has not been found in other animals. The LTTD is linked to the tectum opticum via the nucleus reticularis caloris (RC), whose function is still unknown. In the tectum opticum, visual and infrared stimuli are combined to provide a detailed picture of the animal's surroundings.


Experiments have shown that the detection of heat targets must be quite accurate, as snakes hit thermal sources with low error even without the help of vision. Measurements have determined that the opening angle of an infrared beam falling onto the pit organ is 45 to 60 degrees. Depending on where the heat source is relative to the snake, the beam hits the pit's membrane at a different spot. The receptive field of the infrared sensing system is represented in the tectum opticum much like the visual receptive field: the front end of the tectum opticum receives its input from the back part of the pit membrane and the retina, and thus processes stimuli from the front part of the visual field. Similarly, the back and sides of the visual field are represented in the back part of the tectum opticum and the front parts of the pit membrane and the retina. The receptive fields of the visual and infrared sensory systems overlap almost perfectly within the tectum opticum, so that the neurons there receive and process sensory information from both senses coming from more or less the same direction.

While crotalus have only two pit organs, the anatomy of the temperature sensors is much more complicated in boas and pythons, which possess 13 pit organs on each side of the head. Each of these also works like a pinhole camera that inverts the image. The information from the front part of the visual field is again processed in the front part of the tectum opticum, but now the receptive field of every pit organ is projected onto a different part of it: the front pit organs are represented in the front part of the tectum opticum and the back ones in the back. In addition, the receptive fields of the different pit organs overlap, providing a more or less continuous projective field that matches the visual one.
Curiously, the front part of every pit organ is projected onto the back part of its receptive field in the tectum opticum, an organization that is quite complicated and unique. The tectum opticum contains six different kinds of neurons that fire in response to infrared and/or visual stimuli. Some cell types respond only if a visual and an infrared stimulus occur together, while others respond to either kind of stimulus. There are cells that respond to one sensory input alone but increase their firing rate for simultaneous input from both systems. A last group of cells works the other way around: some respond strongly to visual stimuli but stop firing when stimuli from the pit organs also arrive, or vice versa. What do snakes with pit organs need these different kinds of neurons for? The processing in their brain has to support several tasks: first, the snake must be able to detect and localize stimuli; second, the stimuli must be identified and reacted to appropriately. The cells that respond to visual and infrared stimuli independently of each other could be responsible for the first task. Cells that respond only when both stimuli arrive together could work as detectors for living, moving objects. Moreover, cells that stop firing as soon as a visual stimulus is complemented by an infrared signal could be especially important for detecting cool surroundings such as leaves or trees. The interaction between the different types of cells is important for correctly identifying stimuli, and is used not only for identifying warm-blooded prey but also for identifying predators and for the snake’s thermoregulation.

Common vampire bat. The infrared sensing organs are located in the nose.

Infrared Sensing in Vampire Bats

Vampire bats are the only mammals known to detect infrared radiation. To do so they have three pits in their nose that contain the sensing organs. Like snakes, they use ion channels to detect heat, but of a different type: in other mammals, and indeed everywhere in the bat’s own body except the nose, this molecule (the channel TRPV1) is responsible for sensing pain and burning heat. In the nose, however, the threshold is much lower: the channel already responds to temperatures of about 29 °C and above. This allows vampire bats to locate heat sources at a distance of up to 20 cm and helps them find blood-rich spots on their prey.


Newman, E. A. and P. H. Hartline (1982). "Infrared 'vision' in snakes." Scientific American 246(3): 116-127.

Gracheva et al. (2010). "Molecular basis of infrared detection by snakes." Nature 464(7291): 1006-1011.

Campbell et al. (2002). "Biological infrared imaging and sensing." Micron 33: 211-225.

Gracheva et al. (2011). "Ganglion-specific splicing of TRPV1 underlies infrared sensation in vampire bats." Nature 476: 88-91.

Neural Mechanism for Song Learning in Zebra Finches


Over the past four decades, songbirds have become a widely used model organism for neuroscientists studying complex sequential behaviours and sensory-guided motor learning. Like human babies, young songbirds learn many of the sounds they use for communication by imitating adults. One songbird in particular, the zebra finch (Taeniopygia guttata), has been the focus of much research because of its proclivity to sing and breed in captivity and its rapid maturation. The song of an adult male zebra finch is a stereotyped series of acoustic signals with structure and modulation over a wide range of time scales, from milliseconds to several seconds. The adult zebra finch song comprises a repeated sequence of sounds, called a motif, which lasts about a second. The motif is composed of shorter bursts of sound called syllables, which often contain sequences of simpler acoustic elements called notes, as shown in Fig. 1. The songbird's learning system is a very good model for studying sensory-motor integration, because the juvenile bird actively listens to the tutor and modulates its own song by correcting for errors in pitch and offset. The neural mechanisms and the architecture of the songbird brain regions that play a crucial role in learning are similar to those of the language-processing regions in the frontal cortex of humans. Detailed study of the hierarchical neural network involved in the learning process could therefore provide significant insights into the neural mechanisms of speech learning in humans.

Figure 1: Illustration of the typical song structure and learning phases in the songbird. Upper panel: phases involved in the song-learning process. Middle panel: structure of a crystallized song; a, b, c, d, e denote the various syllables in the song. Lower panel: evolution of the song dynamics during learning.


Song learning proceeds through a series of stages, beginning with a sensory phase in which the juvenile bird just listens to its tutor (usually its father) vocalizing, often without producing any song-like vocalization itself. The bird uses this phase to memorize the structure of the tutor song, forming a neural template of the song. It then enters the sensorimotor phase, in which it starts babbling the song and correcting its errors using auditory feedback. The earliest attempt to recreate the template of the tutor song is highly noisy, unstructured and variable, and is called subsong; an example is shown in the spectrogram in Fig. 1. Over the subsequent days the bird enters a "plastic phase", in which significant plasticity in the neural network responsible for generating highly structured syllables reduces the variability of the song. By the time the birds reach sexual maturity, the variability is substantially eliminated, a process called crystallization, and the young bird begins to produce a normal adult song, which can be a striking imitation of the tutor song (Fig. 1). Thus, the gradual reduction of song variability from early subsong to adult song, together with the gradual increase in imitation quality, is an integral aspect of vocal learning in the songbird. In the following sections we explore the parts of the avian brain and the underlying neural mechanisms that are responsible for this remarkable vocal imitation.

Hierarchical Neural Network involved in the generation of song sequences

It is important to understand the neuroanatomy of the songbird in detail, because it provides significant information about the learning mechanisms involved in the various motor and sensory integration pathways, and could ultimately shed light on language processing and vocal learning in humans. Exact neuroanatomical data about the human speech-processing system are still lacking, and songbird anatomy and physiology enable us to form plausible hypotheses. A comparison of the mammalian and the songbird (avian) brain is made in the final section of this chapter (Fig. 6). The pathways observed in the avian brain can be broadly divided into the motor control pathway and the anterior forebrain pathway, as shown in Fig. 2. The auditory pathway provides the error feedback signals that lead to potentiation or depression of the synaptic connections involved in the motor pathways, and thus plays a significant role in vocal learning. The motor control pathway includes the Hyperstriatum Ventrale, pars Caudalis (HVC), the Robust nucleus of the Arcopallium (RA), the tracheosyringeal subdivision of the hypoglossal nucleus (nXIIts), and the syrinx. This pathway generates the motor control signals that produce highly structured songs and coordinates breathing with singing. The anterior forebrain pathway includes the Lateral Magnocellular nucleus of the Anterior Nidopallium (LMAN), Area X, and the medial nucleus of the Dorsolateral Thalamus (DLM). This pathway plays a crucial role in song learning in juveniles, song variability in adults, and song representation. The auditory pathway includes the substantia nigra pars compacta (SNc) and the ventral tegmental area (VTA), which play a crucial role in processing auditory input and analyzing the feedback error. The muscles of the syrinx are innervated by a subset of motor neurons from nXIIts. A primary projection to the nXIIts descends from neurons in the forebrain nucleus RA.
Nucleus RA receives motor-related projections from another cortical analogue, nucleus HVC, which in turn receives direct input from several brain areas, including the thalamic nucleus Uvaeformis (Uva).

Figure 2. Architecture of the songbird brain and the various pathways carrying motor and auditory feedback signals.

Neural Mechanism for the generation of highly structured & temporally precise syllable pattern

Nuclei HVC and RA are involved in the motor control of song in a hierarchical manner (Yu and Margoliash 1996). Recordings in singing zebra finches have shown that HVC neurons that project to RA transmit an extremely sparse pattern of bursts: each RA-projecting HVC neuron generates a single highly stereotyped burst of approximately 6 ms duration at one specific time in the song (Hahnloser, Kozhevnikov et al. 2002). During singing, RA neurons generate a complex sequence of high-frequency bursts of spikes, the pattern of which is precisely reproduced each time the bird sings its song motif (Yu and Margoliash 1996). During a motif, each RA neuron produces a fairly unique pattern of roughly 12 bursts, each lasting ~10 ms (Leonardo and Fee 2005). Based on the observations that RA-projecting HVC neurons generate a single burst of spikes during the song motif and that different neurons appear to burst at many different times in the motif, it has been hypothesized that these neurons generate a continuous sequence of activity over time (Fee, Kozhevnikov et al. 2004, Kozhevnikov and Fee 2007). In other words, at each moment in the song, there is a small ensemble of HVC (RA) neurons active at that time and only at that time (Figure 3), and each ensemble transiently activates (for ~10 ms) a subset of RA neurons determined by the synaptic connections of HVC neurons in RA (Leonardo and Fee 2005). Further, in this model the vector of muscle activities, and thus the configuration of the vocal organ, is determined by the convergent input from RA neurons on a short time scale, of about 10 to 20 ms. The view that RA neurons may simply contribute transiently, with some effective weight, to the activity of vocal muscles is consistent with some models of cortical control of arm movement in primates (Todorov 2000). 
A number of studies suggest that the timing of the song is controlled on a millisecond-by-millisecond basis by a wave, or chain, of activity that propagates sparsely through HVC neurons. This hypothesis is supported by an analysis of timing variability during natural singing (Glaze and Troyer 2007) as well as experiments in which circuit dynamics in HVC were manipulated to observe the effect on song timing. Thus, in this model, song timing is controlled by propagation of activity through a chain in HVC; the generic sequential activation of this HVC chain is translated, by the HVC connections in RA, into a specific precise sequence of vocal configurations.
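The chain hypothesis above can be illustrated in a few lines of code: each HVC(RA) neuron bursts in exactly one time bin, and fixed HVC-to-RA weights translate the generic sequence into a specific RA burst pattern. The population sizes, time resolution, and connection probability are illustrative choices, not anatomical values:

```python
import numpy as np

rng = np.random.default_rng(0)

T = 100                  # time bins (~10 ms each) in one motif
n_hvc, n_ra = 300, 30    # illustrative population sizes

# Chain: each HVC(RA) neuron bursts in exactly one time bin, so at
# every moment a distinct small ensemble is active, and only then.
hvc = np.zeros((T, n_hvc))
for j in range(n_hvc):
    hvc[j % T, j] = 1.0

# Fixed, sparse HVC->RA weights select which RA neurons each
# ensemble transiently drives.
w = (rng.random((n_hvc, n_ra)) < 0.1).astype(float)

ra = hvc @ w             # RA burst pattern over the motif

# Every HVC(RA) neuron fires exactly once per motif.
assert (hvc.sum(axis=0) == 1).all()
```

The point of the sketch is that the timing lives entirely in the HVC chain; changing `w` would change *which* vocal configurations occur, but not *when*.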

Figure 3. Mechanisms of sequence generation in the adult song motor pathway. Illustration of the hypothesis that RA-projecting HVC (HVC(RA)) neurons burst and activate each other sequentially in groups of 100 to 200 coactive neurons. Each group of HVC neurons drives a distinct ensemble of RA neurons to burst. The neurons converge with some effective weight at the level of the motor neurons to activate syringeal muscles.

Synaptic Plasticity in Posterior Forebrain Pathway is a potential substrate for vocal learning

A number of song-related avian brain areas have been discovered (Fig. 4A). Song production areas include HVC (Hyperstriatum Ventrale, pars Caudalis) and RA (robust nucleus of the arcopallium), which generate sequences of neural activity patterns and, through motor neurons, control the muscles of the vocal apparatus during song (Yu and Margoliash 1996, Hahnloser, Kozhevnikov et al. 2002, Suthers and Margoliash 2002). Lesion of HVC or RA causes immediate loss of song (Vicario and Nottebohm 1988). Other areas, in the anterior forebrain pathway (AFP), appear to be important for song learning but not for production, at least in adults. The AFP is regarded as an avian homologue of the mammalian basal ganglia thalamocortical loop (Farries 2004). In particular, lesion of area LMAN (lateral magnocellular nucleus of the nidopallium) has little immediate effect on song production in adults, but arrests song learning in juveniles (Doupe 1993, Brainard and Doupe 2000). These facts suggest that LMAN drives song learning, while the locus of plasticity lies in brain areas related to song production, such as HVC and RA. Doya and Sejnowski (1998) proposed a tripartite schema in which learning is based on the interaction between an actor and a critic (Fig. 4B). The critic evaluates the performance of the actor at a desired task; the actor uses this evaluation to change in a way that improves its performance. To learn by trial and error, the actor performs the task differently each time: it generates both good and bad variations, and the critic's evaluation is used to reinforce the good ones. Ordinarily it is assumed that the actor generates variations by itself; in this schema, however, the source of variation is external to the actor, and we will call this source the experimenter. The actor is identified with HVC, RA, and the motor neurons that control vocalization, and it learns through plasticity at the synapses from HVC to RA (Fig. 4C).
Based on evidence of structural changes, such as axonal growth and retraction, that take place in the HVC-to-RA projection during song learning, this view is widely regarded as a plausible mechanism. For the experimenter and critic, Doya and Sejnowski turned to the anterior forebrain pathway, hypothesizing that the critic is Area X and the experimenter is LMAN.
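The tripartite schema can be sketched as a toy weight-perturbation learner in the spirit of Doya and Sejnowski's proposal. Here the "critic" is simply the distance to a random target vector standing in for the tutor template, and all dimensions and step sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

n_hvc, n_out = 20, 5
target = rng.random(n_out)            # stands in for the tutor template
w = np.zeros((n_hvc, n_out))          # plastic HVC->RA weights (actor)
x = rng.random(n_hvc) / n_hvc         # fixed HVC activity pattern

def critic(w):
    """Scalar evaluation: negative distance to the tutor template."""
    return -np.linalg.norm(x @ w - target)

best = critic(w)
for trial in range(2000):
    # Experimenter: a temporary, static perturbation of the weights,
    # held fixed for the duration of one "song".
    dw = 0.05 * rng.standard_normal(w.shape)
    r = critic(w + dw)
    if r > best:     # critic reinforces good variations...
        w = w + dw   # ...making the perturbation permanent
        best = r
```

Note how the three roles are separated: the actor (`w`, `x`) produces behaviour, the experimenter (`dw`) injects variability, and the critic returns only a scalar evaluation.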

Figure 4. Plasticity in specific pathways enabling learning. (A) Avian song pathways and the tripartite hypothesis: avian brain areas involved in song production and song learning. The premotor pathway (open) includes areas necessary for song production; the anterior forebrain pathway (filled) is required for song learning but not for song production. (B) Tripartite reinforcement learning schema: the actor produces behaviour; the experimenter sends fluctuating input to the actor, producing variability in behaviour that is used for trial-and-error learning; the critic evaluates the behaviour of the actor and sends a reinforcement signal to it. For birdsong, the actor includes the premotor song production areas HVC and RA. (C) Plastic and empiric synapses. RA receives synaptic input from both HVC and LMAN. The HVC synapses are called "plastic", in keeping with the hypothesis that these synapses are the locus of plasticity for song learning.

Biophysically realistic synaptic plasticity rules underlying song learning mechanism

Biophysically realistic model

In the model of Doya and Sejnowski, the role of LMAN input to RA is to produce a fluctuation, static over the duration of a song bout, directly in the synaptic strengths from the premotor nucleus HVC to RA. From a functional perspective this is akin to weight perturbation (Dembo and Kailath 1990, Seung 2003) and relatively easy to implement: a temporary but static HVC->RA weight change lasting the duration of one song causes some change in song performance; if performance is good, the critic sends a reinforcement signal that makes the temporary perturbation permanent. From a neurobiological perspective, however, this requires machinery whereby N-methyl-D-aspartate (NMDA) receptor-mediated synaptic transmission from LMAN to RA can drive synaptic weight changes that remain static over the 1 to 2 seconds of a song bout. In contrast, LMAN appears to drive fast, transient song fluctuations on a subsyllable level, effected by ordinary excitatory transmission that produces dynamic membrane conductance fluctuations in the postsynaptic RA neurons. The goal of the present model is to relate the high-level concept of reinforcement learning in the tripartite schema to a biologically realistic lower-level description in terms of microscopic events at synapses and neurons in the birdsong system. It should demonstrate song learning in a network of realistic spiking neurons, and examine the plausibility of reinforcement algorithms in explaining biological fine motor skill learning with respect to learning time in the birdsong network. The model is based on many of the same general assumptions made by Doya and Sejnowski: a tripartite actor-critic-experimenter schema; a weak critic, providing only a scalar evaluation signal; a fixed HVC sequence, so that only the map from HVC to the motor neurons is learned, through plasticity at the HVC->RA synapses; and LMAN perturbing the song through its inputs to the song premotor pathway.
However, the structure and dynamics of the LMAN inputs, and their influence on learning, are different, with distinct neurobiological implications. In keeping with the hypothesis that the function of the LMAN drive to RA is to perform experiments for trial-and-error learning, the connections from LMAN to RA are called empiric synapses (Fig. 4C). The conductance of the plastic synapse from neuron j in HVC to neuron i in RA is given by W_{ij} S_{ij}^{HVC}(t), where the synaptic activation S_{ij}^{HVC}(t) determines the time course of conductance changes and the plastic parameter W_{ij} determines their amplitude. Changes in W_{ij} are governed by the plasticity rule

\frac{\partial W_{ij}}{\partial t} = \eta \, R(t) \, e_{ij}(t)

The positive parameter \eta, called the learning rate, controls the overall amplitude of synaptic changes. The eligibility trace e_{ij}(t) is a hypothetical quantity present at every plastic synapse. It signifies whether the synapse is "eligible" for modification by reinforcement, and is based on the recent activation of the plastic synapse and of the empiric synapse onto the same RA neuron:

e_{ij}(t) = \int\limits_{0}^{t} dt' \, G(t - t') \left[ S_{i}^{LMAN}(t') - \left\langle S_{i}^{LMAN} \right\rangle \right] S_{ij}^{HVC}(t')

Here S_{i}^{LMAN}(t) is the conductance of the empiric (LMAN->RA) synapse onto the RA neuron. The temporal filter G(t) is assumed to be nonnegative, and its shape determines how far back in time the eligibility trace can "remember" the past. Note that what enters the trace is not the raw activation of the empiric synapse but its deviation from the average activity \left\langle S_{i}^{LMAN} \right\rangle. The learning principle follows two basic rules, shown in Fig. 5. First rule: if coincident activation of a plastic (HVC->RA) synapse and the empiric (LMAN->RA) synapse onto the same RA neuron is followed by positive reinforcement, the plastic synapse is strengthened. Second rule: if activation of a plastic synapse without activation of the empiric synapse onto the same RA neuron is followed by positive reinforcement, the plastic synapse is weakened. These rules, based on dynamic conductance perturbations of the actor neurons, perform stochastic gradient ascent on the expected value of the reinforcement signal, which means that song performance as evaluated by the critic is guaranteed to improve on average.
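The plasticity rule and its two behavioural consequences can be checked with a small numerical sketch. Time is discretized into a few steps, and the temporal filter G and the activation values are illustrative:

```python
import numpy as np

eta = 0.1
G = np.array([0.5, 0.3, 0.2])      # temporal filter, most recent step first

def eligibility(s_lman, s_lman_mean, s_hvc):
    """e_ij(t): filtered product of the empiric-synapse fluctuation
    (LMAN activation minus its average) and the plastic-synapse
    activation, over the last len(G) time steps (oldest first)."""
    fluct = np.asarray(s_lman, float) - s_lman_mean
    # Reverse so the most recent step lines up with G[0].
    return float(np.sum(G * fluct[::-1] * np.asarray(s_hvc, float)[::-1]))

def weight_update(R, e):
    """dW_ij/dt = eta * R(t) * e_ij(t)."""
    return eta * R * e

# Rule 1: HVC and LMAN active together, reward follows -> strengthen.
e1 = eligibility(s_lman=[1, 1, 1], s_lman_mean=0.5, s_hvc=[1, 1, 1])
assert weight_update(R=1.0, e=e1) > 0

# Rule 2: HVC active while LMAN is below average, reward follows -> weaken.
e2 = eligibility(s_lman=[0, 0, 0], s_lman_mean=0.5, s_hvc=[1, 1, 1])
assert weight_update(R=1.0, e=e2) < 0
```

Subtracting the mean LMAN activity is what gives the rule its sign structure: the same reward strengthens or weakens a synapse depending on whether the empiric input was above or below its average.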

Comparison between Mammalian & Songbird brain architecture

The avian Area X is homologous to the mammalian basal ganglia (BG) and includes striatal and pallidal cell types. The BG form part of a highly conserved anatomical loop through several stations: from cortex to the BG (striatum and pallidum), then to the thalamus, and back to cortex. Similar loops are seen in the songbird: the cortical analogue nucleus LMAN projects to Area X, the striatal components of which project to the thalamic nucleus DLM, which projects back to LMAN. The striatal components account for reward-based and reinforcement learning. The neuron types and their functions in Area X of birds are closely comparable to those of the basal ganglia in humans, as shown in Fig. 6. This close anatomical similarity motivates studying the songbird brain in more detail, because it may ultimately yield significant understanding of speech learning in humans and help treat speech-related disorders with higher precision.

Figure 6. Comparison of mammalian and avian basal ganglia–forebrain circuitry.


Brainard, M. S. and A. J. Doupe (2000). "Auditory feedback in learning and maintenance of vocal behaviour." Nat Rev Neurosci 1(1): 31-40.

Dembo, A. and T. Kailath (1990). "Model-free distributed learning." IEEE Trans Neural Netw 1(1): 58-70.

Doupe, A. J. (1993). "A neural circuit specialized for vocal learning." Curr Opin Neurobiol 3(1): 104-111.

Farries, M. A. (2004). "The avian song system in comparative perspective." Ann N Y Acad Sci 1016: 61-76.

Fee, M. S., A. A. Kozhevnikov and R. H. Hahnloser (2004). "Neural mechanisms of vocal sequence generation in the songbird." Ann N Y Acad Sci 1016: 153-170.

Glaze, C. M. and T. W. Troyer (2007). "Behavioral measurements of a temporally precise motor code for birdsong." J Neurosci 27(29): 7631-7639.

Hahnloser, R. H., A. A. Kozhevnikov and M. S. Fee (2002). "An ultra-sparse code underlies the generation of neural sequences in a songbird." Nature 419(6902): 65-70.

Kozhevnikov, A. A. and M. S. Fee (2007). "Singing-related activity of identified HVC neurons in the zebra finch." J Neurophysiol 97(6): 4271-4283.

Leonardo, A. and M. S. Fee (2005). "Ensemble coding of vocal control in birdsong." J Neurosci 25(3): 652-661.

Seung, H. S. (2003). "Learning in spiking neural networks by reinforcement of stochastic synaptic transmission." Neuron 40(6): 1063-1073.

Suthers, R. A. and D. Margoliash (2002). "Motor control of birdsong." Curr Opin Neurobiol 12(6): 684-690.

Todorov, E. (2000). "Direct cortical control of muscle activation in voluntary arm movements: a model." Nat Neurosci 3(4): 391-398.

Vicario, D. S. and F. Nottebohm (1988). "Organization of the zebra finch song control system: I. Representation of syringeal muscles in the hypoglossal nucleus." J Comp Neurol 271(3): 346-354.

Yu, A. C. and D. Margoliash (1996). "Temporal hierarchical control of singing in birds." Science 273(5283): 1871-1875.



One of the most interesting non-primates is the octopus, whose arm movements are particularly remarkable. In these invertebrates, the control of an arm is especially complex because the arm can be moved in any direction, with a virtually infinite number of degrees of freedom. The octopus brain only has to send a command to the arm to perform an action; the entire recipe of how to do it is embedded in the arm itself. Observations indicate that octopuses reduce the complexity of controlling their arms by keeping their arm movements to set, stereotypical patterns. To find out whether octopus arms have "minds of their own", researchers cut the nerves of an octopus arm off from the other nerves in its body, including the brain, and then tickled and stimulated the skin of the arm. The arm behaved just as it would in an intact octopus. The implication is that the brain only has to send a single move command to the arm, and the arm will do the rest.

In this chapter we discuss in detail the sensory system of an octopus and focus on the sensory motor system in this non-primate.

Octopus - The intelligent non-primate

The Common Octopus, Octopus vulgaris.

Octopuses have two eyes and four pairs of arms, and they are bilaterally symmetric. An octopus has a hard beak, with its mouth at the center point of the arms. Octopuses have no internal or external skeleton (although some species have a vestigial remnant of a shell inside their mantle), allowing them to squeeze through tight places. Octopuses are among the most intelligent and behaviorally flexible of all invertebrates.

The most interesting feature of octopuses is their arm movement. For goal-directed arm movements, the octopus nervous system generates a sequence of motor commands that brings the arm towards the target. Control of the arm is especially complex because the arm can be moved in any direction, with a virtually infinite number of degrees of freedom. The basic motor program for voluntary movement is embedded within the neural circuitry of the arm itself.[2]

Arm Movements in Octopus

In the hierarchical organization of the octopus, the brain only has to send a command to the arm to perform an action; the entire recipe of how to do it is embedded in the arm itself. Using its arms, the octopus walks, seizes its prey, and rejects unwanted objects, and it also obtains a wide range of mechanical and chemical information about its immediate environment.

Octopus arms, unlike human arms, are not limited in their range of motion by elbow, wrist, and shoulder joints. To accomplish goals such as reaching for a meal or swimming, however, an octopus must be able to control its eight appendages. The octopus arm can move in any direction using virtually infinite degrees of freedom. This ability results from the densely packed flexible muscle fibers along the arm of the octopus.

Observations indicate that octopuses reduce the complexity of controlling their arms by keeping their arm movements to set, stereotypical patterns.[3] For example, the reaching movement always consists of a bend that propagates along the arm toward the tip. Since octopuses always use the same kind of movement to extend their arms, the commands that generate the pattern are stored in the arm itself, not in the central brain. Such a mechanism further reduces the complexity of controlling a flexible arm. These flexible arms are controlled by an elaborate peripheral nervous system containing 5 × 10^{7} neurons distributed along each arm; 4 × 10^{5} of these are motor neurons, which innervate the intrinsic muscles of the arm and locally control muscle action.

Whenever it is required, the nervous system in octopus generates a sequence of motor commands which in turn produces forces and corresponding velocities making the limb reach the target. The movements are simplified by the use of optimal trajectories made through vectorial summation and superposition of basic movements. This requires that the muscles are quite flexible.

The Nervous System of the Arms

The eight arms of the octopus are elongated, tapering, muscular organs, projecting from the head and regularly arranged around the mouth. The inner surface of each arm bears a double row of suckers, each sucker alternating with that of the opposite row. There are about 300 suckers on each arm.[4]

The arms perform both motor and sensory functions. The nervous system in the arms of the octopus is represented by the nerve ganglia, subserving motor and inter-connecting functions. The peripheral nerve cells represent the sensory systems. There exists a close functional relationship between the nerve ganglia and the peripheral nerve cells.

General anatomy of the arm

The muscles of the arm can be divided into three separate groups, each having a certain degree of anatomical and functional independence:

  1. Intrinsic muscles of the arm,
  2. Intrinsic muscles of the suckers, and
  3. Acetabulo-brachial muscles (connects the suckers to the arm muscles).

Each of these three groups of muscles comprises three muscle bundles at right angles to one another. Each bundle is innervated separately from the surrounding units and shows a remarkable autonomy. In spite of the absence of a bony or cartilaginous skeleton, the octopus can produce arm movements through the contraction and relaxation of different muscles. Behaviorally, the longitudinal muscles shorten the arm and play a major role in seizing objects and carrying them to the mouth, while the oblique and transverse muscles lengthen the arm and are used for rejecting unwanted objects.

Cross section of an octopus arm: The lateral roots innervate the intrinsic muscles, the ventral roots the suckers.

Six main nerve centers lie in the arm and are responsible for the performance of these sets of muscles. The axial nerve cord is by far the most important motor and integrative center of the arm. The eight cords, one in each arm, contain altogether 3.5 × 10^{8} neurons. Each axial cord is linked by connective nerve bundles with five sets of more peripheral nerve centers: the four intramuscular nerve cords, lying among the intrinsic muscles of the arm, and the ganglia of the suckers, situated in the peduncle just beneath the acetabular cup of each sucker.

All these small peripheral nerves contain motor neurons and receive sensory fibers from deep muscle receptors which play the role of local reflex centers. The motor innervation of the muscles of the arm is thus provided not only by the motor neurons of the axial nerve cord, which receives pre-ganglionic fibers from the brain, but also by these more peripheral motor centers.

Sensory Nervous system

The arms contain a complex and extensive sensory system. Deep receptors in the three main muscle systems of the arms provide the animal with a widespread sensory apparatus for collecting information from the muscles. Many primary receptors lie in the epithelium covering the surface of the arm. The sucker, and particularly its rim, has the greatest number of these sensory cells, while the skin of the arm is rather less sensitive. Several tens of thousands of receptors lie in each sucker.

Three main morphological types of receptors are found in the arms of an octopus: round cells, irregular multipolar cells, and tapered ciliated cells. All these elements send their processes centripetally towards the ganglia. The functional significance of these three receptor types is still not well known and can only be conjectured. It has been suggested that the round and multipolar receptors may record mechanical stimuli, while the ciliated receptors are likely to be chemo-receptors.

The ciliated receptors do not send their axons directly to the ganglia; instead, the axons meet encapsulated neurons lying underneath the epithelium and make synaptic contacts with their dendritic processes. This arrangement reduces the amount of input relayed from the primary nerve cells. Round and multipolar receptors, on the other hand, send their axons directly to the ganglia where the motor neurons lie.

Functioning of peripheral nervous system in arm movements

Behavioral experiments suggest that information regarding the movement of the muscles does not reach the learning centers of the brain, and morphological observations show that the deep receptors send their axons to peripheral centers such as the ganglion of the sucker or the intramuscular nerve cords.[5] The information regarding the stretch or movement of the muscles is thus used in local reflexes only.

When the dorsal part of the axial nerve cord, which contains the axonal tracts from the brain, is stimulated electrically, movements of the entire arm are still observed. The movements are triggered by the stimulation but are not directly driven by commands from the brain: arm extensions are evoked by stimulation of the dorsal part of the axial nerve cord, whereas stimulation of the muscles within the same area, or of the ganglionic part of the cord, evokes only local muscular contractions. The implication is that the brain only has to send a single move command to the arm, and the arm will do the rest.

A dorsally oriented bend propagates along the arm, causing the suckers to point in the direction of the movement. As the bend propagates, the part of the arm proximal to the bend remains extended. As further confirmation that an octopus arm has a "mind of its own", the nerves of an octopus arm have been cut off from the other nerves in its body, including the brain. Movements resembling normal arm extensions could still be initiated in amputated arms by electrical stimulation of the nerve cord or by tactile stimulation of the skin or suckers.

It has been noted that bend propagations are more readily initiated when a bend is created manually before stimulation. If a fully relaxed arm is stimulated, the stimulus triggers an initial bend, which then propagates in the same manner. The nervous system of the arm thus not only drives local reflexes but also controls complex movements involving the entire arm.

These evoked movements are almost kinematically identical to the movements of freely behaving octopus. When stimulated, a severed arm shows an active propagation of the muscle activity as in natural arm extensions. Movements evoked from similar initial arm postures result in similar paths, while different starting postures result in different final paths.

As the extensions evoked in denervated octopus arms are qualitatively and kinematically similar to natural arm extensions, an underlying motor program seems to be controlling the movements which are embedded in the neuromuscular system of the arm, which does not require central control.


Fish are aquatic animals with great diversity. With over 32,000 species, fish form the largest group of vertebrates.

The lateral line sensory organ shown on a shark.

Most fish possess highly developed sense organs. The eyes of most daylight-dwelling fish are capable of color vision; some can even see ultraviolet light. Fish also have a very good sense of smell. Trout, for example, have special holes called "nares" in their head that they use to register tiny amounts of chemicals in the water. Migrating salmon coming from the ocean use this sense to find their way back to their home streams, because they remember what these smell like. Ground-dwelling fish in particular have a very strong tactile sense in their lips and barbels, where their taste buds are also located. They use these senses to search for food on the ground and in murky waters.

Fish also have a lateral line system, also known as the lateralis system. It is a system of tactile sense organs located in the head and along both sides of the body. It is used to detect movement and vibration in the surrounding water.


Fish use the lateral line sense organ to detect prey and predators, to sense changes in the current and orient to it, and to avoid collisions when schooling.

Coombs et al. have shown [1] that the lateral line sensory organ is necessary for fish to detect their prey and orient towards it. The fish detect and orient themselves towards movements created by prey or a vibrating metal sphere even when they are blinded. When signal transduction in the lateral lines is inhibited by cobalt chloride application, the ability to target the prey is greatly diminished.

The dependence of schooling fish on the lateral line organ to avoid collisions was demonstrated by Pitcher et al. in 1976, who showed that optically blinded fish can swim in a school, while fish with a disabled lateral line organ cannot [2].


The lateral lines are visible as two faint lines that run along either side of the fish's body, from head to tail. They are made up of a series of mechanoreceptor cells called neuromasts. These are either located on the surface of the skin or are, more frequently, embedded within the lateral line canal, a mucus-filled structure that lies just beneath the skin and transduces the external water displacement through openings to the neuromasts on the inside. The neuromasts themselves are made up of sensory cells with fine hair cells that are encapsulated by a cylindrical gelatinous cupula. These reach either directly into the open water (common in deep-sea fish) or into the lymph fluid of the lateral line canal.

Changing water pressures bend the cupula, and in turn the hair cells inside. As with the hair cells in all vertebrate ears, a deflection towards the shorter cilia leads to hyperpolarization (a decrease in firing rate) and a deflection in the opposite direction leads to depolarization (an increase in firing rate) of the sensory cells. The pressure information is thus transduced into digital information using rate coding and passed along the lateral line nerve to the brain.

By integrating many neuromasts through their afferent and efferent connections, complex circuits can be formed. These can respond to different stimulation frequencies and consequently code for different parameters, like acceleration or velocity [3].
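The rate-coding scheme just described can be illustrated with a toy model. The numbers below (baseline rate, gain) are illustrative assumptions, not measured values:

```python
# Toy model of neuromast rate coding (all parameter values are assumptions):
# cupula deflection modulates a hair cell's spontaneous firing rate.
# Deflection towards the tallest cilium depolarizes the cell (rate up);
# deflection towards the shorter cilia hyperpolarizes it (rate down).

BASELINE_HZ = 20.0    # assumed spontaneous firing rate
GAIN_HZ_PER_UM = 5.0  # assumed sensitivity to cupula deflection

def firing_rate(deflection_um):
    """Firing rate for a signed cupula deflection in micrometres
    (positive = towards the tallest cilium). Rates cannot be negative."""
    return max(0.0, BASELINE_HZ + GAIN_HZ_PER_UM * deflection_um)

print(firing_rate(0.0))   # no deflection: baseline rate
print(firing_rate(2.0))   # depolarization: rate increases
print(firing_rate(-2.0))  # hyperpolarization: rate decreases
```

The brain can then recover the direction and magnitude of a water displacement by comparing such rates across many neuromasts.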

Some scales of the lateral line (center) of a Rutilus rutilus

Sketch of the anatomy of the lateral line sensory system.

In sharks and rays, some neuromasts have undergone an interesting evolution. They have evolved into electroreceptors called ampullae of Lorenzini. They are mostly concentrated around the head of the fish and can detect changes of electrical stimuli as small as 0.01 microvolt [4]. With this sensitive instrument these fish are able to detect the tiny electrical potentials generated by muscle contractions and can thus find their prey over large distances, in murky waters, or even hidden under the sand. It has been suggested that sharks also use this sense for migration and orientation, since the ampullae of Lorenzini are sensitive enough to detect the electrical fields induced by movement through the Earth's magnetic field.

Convergent Evolution


Cephalopods such as squids, octopuses and cuttlefish have lines of ciliated epidermal cells on head and arms that resemble the lateral lines of fish. Electrophysiological recordings from these lines in the common cuttlefish (Sepia officinalis) and the brief squid (Lolliguncula brevis) have identified them as an invertebrate analogue to the mechanoreceptive lateral lines of fish and aquatic amphibians [5].


Another convergence with the fish lateral line is found in some crustaceans. Unlike fish, they do not carry the mechanosensory cells on their body, but have them spaced at regular intervals on long trailing antennae, which are held parallel to the body. This forms two ‘lateral lines’ parallel to the body that have similar properties to fish lateral lines and are mechanically independent of the body [6].


In aquatic manatees, the postcranial body bears tactile hairs resembling the mechanosensory hairs of naked mole rats. This arrangement of hair has been compared to the fish lateral line and complements the manatees' poor visual capacities. Similarly, the whiskers of harbor seals are known to detect minute water movements and serve as a hydrodynamic receptor system, though this system is far less sensitive than the fish equivalent. [7]



Halteres of the Crane fly

Halteres are sensory organs present in many flying insects. Widely thought to be an evolutionary modification of the rear pair of wings, halteres provide gyroscopic sensory data that is vitally important for flight. Although the fly has other systems to aid in flight, its visual system is too slow to allow rapid maneuvers. Moreover, such a sensory system is necessary for flying adeptly in low-light conditions, a requirement for avoiding predation. Indeed, without halteres, flies are incapable of sustained, controlled flight. Scientists have been aware of the role halteres play in flight since the 18th century, but only recently have the mechanisms by which they operate been explored in more detail. [6] [7]


The haltere evolved from the rearmost of two pairs of wings. While the front pair has maintained its use for flight, the posterior pair has lost its flight function and adopted a slightly different shape. The haltere comprises three visible structural components: a knob-shaped end, a thin shaft, and a slightly wider base. The knob contains approximately 13 innervated hairs, while the base contains two chordotonal organs, each innervated by about 20-30 nerves. Chordotonal organs are sense organs thought to be responsive solely to extension, though they remain relatively poorly understood. The base is also covered by around 340 campaniform sensilla, small fibers which respond preferentially to compression in the direction in which they are elongated. Each of these fibers is also innervated. Relative to the stalk of the haltere, both the chordotonal organs and the campaniform sensilla are oriented at approximately 45 degrees, which is optimal for measuring bending forces on the haltere. The halteres move contrary (anti-phase) to the wings during flight.

The sensory components can be categorized into three groups [8]: those sensitive to vertical oscillations of the haltere, including the dorsal and ventral scapal plates, the dorsal and ventral Hicks papillae (both the plates and papillae are subcategories of the aforementioned campaniform sensilla), and the small chordotonal organ; the basal plate (another manifestation of the sensilla) and the large chordotonal organ, which are sensitive to gyroscopic torque acting on the haltere; and a population of undifferentiated papillae which are responsive to all strains acting on the base of the haltere. This division provides an additional way for flies to distinguish the direction of the force applied to the haltere.


As Homeobox genes were being discovered and explored for the first time, it was found that the deletion or inactivation of the Hox gene Ultrabithorax (Ubx) causes the halteres to develop into a normal pair of wings. This was a very compelling early result as to the nature of Hox genes. Manipulations to the Antennapedia gene can similarly cause legs to become severely deformed, or can cause a set of legs to develop instead of antennae on the head.


The halteres function by detecting Coriolis forces, sensing the movement of air across the potentially rotating fly body. Studies have indicated that the angular velocity of the body is encoded by the Coriolis forces measured by the halteres [8]. Active halteres can recruit neighboring units, influencing nearby muscles and causing dramatic changes in the flight dynamics. Halteres have been shown to have extremely fast response times, allowing these flight changes to be performed much more quickly than if the fly were to rely on its visual system. In order to distinguish between different rotational components, such as pitch and roll, the fly must be able to combine signals from the two halteres, which must not be coincident (coincident signals would diminish the fly's ability to differentiate the rotational axes). The halteres contribute to image stabilization as well as in-flight attitude control, which was established by numerous authors noting reactions of the head and wings to inputs from the components of the rotation rate vector. Contributions from halteres to head and neck movements have also been noted, explaining their role in gaze stabilization. The fly therefore uses input from the halteres to establish where to fixate its gaze, an interesting integration of the two senses.


Recordings have indicated that halteres are capable of responding to stimuli at the same (double-wingbeat) frequency as Coriolis forces, a proof of concept that allows further mathematical analysis of how these measurements can occur. The vector cross-product of the halteres' angular velocity and the rotation of the body gives the Coriolis force vector experienced by the fly. This force is at the same frequency as the wingbeat in the pitch and roll planes, and at twice that frequency in the yaw plane. Halteres are capable of providing a rate-damping signal to counteract rotations, because the Coriolis force is proportional to the fly's own rotation rate. By measuring the Coriolis force, the halteres can send an appropriate signal to their affiliated muscles, allowing the fly to properly control its flight. The large amplitude of haltere motion allows for the calculation of the vertical and horizontal rates of rotation. Because of the large disparity in haltere movement between vertical and horizontal motion, Ω1, the vertical component of the rotation rate, generates a force at double the frequency of the horizontal component. It is widely thought that this twofold frequency difference is what allows the fly to distinguish between the vertical and horizontal components. If we assume that the haltere moves sinusoidally, a reasonably accurate approximation of its real-world behavior, the angular position γ can be modeled as:
 \gamma = \frac{\pi}{2}\sin(\omega t)
where ω is the haltere beat frequency and the amplitude is π/2, i.e. a total sweep of 180°, a close approximation to the real-life range of motion. The body rotational velocities can be computed, given the known rates from the two halteres' reference frames relative to the body of the fly (Ωb being the left and Ωc the right haltere; the roll, pitch, and yaw components are labeled below with 1, 2, and 3, respectively), with the following calculations [7]:

W_{1} = - \frac{\Omega_{b3} + \Omega_{c3} }{2\sin(\alpha)}

W_{2} = \frac{\Omega_{b3} - \Omega_{c3} }{2\cos(\alpha)}

W_{3} = - \frac{\Omega_{b1} + \Omega_{c1} }{2}

α represents the haltere angle of rotation from the body plane, and the Ω terms are, as mentioned, the angular velocity of the haltere with respect to the body. Knowing this, one could roughly simulate input to the halteres using the equation for forces on the end knob of a haltere:

F = mg - ma_{i} - ma_{F} - m\dot{\Omega_{i}}\times r_{i} -m\Omega_{i}\times (\Omega_{i}\times r_{i} ) - 2m\Omega_{i} \times v_{i}

Here m is the mass of the knob of the haltere, g is the acceleration due to gravity, ri, vi, and ai are the position, velocity, and acceleration of the knob relative to the body of the fly in direction i, aF is the fly's linear acceleration, and Ωi and Ω̇i are the angular velocity and angular acceleration components of the fly in space for direction i. The Coriolis force corresponds to the 2mΩi × vi term. Because the sensory signal generated is proportional to the forces exerted on the halteres, this allows the haltere signal to be simulated. When reconciling the force equation with the rotational component equations, remember that the force equation must be calculated separately for each haltere.
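The relations above can be sketched numerically. In the minimal Python sketch below, the beat frequency and the haltere angle α are illustrative assumptions (they are not values given in the text), and only the Coriolis term of the force equation is implemented:

```python
import math

# Numerical sketch of the haltere equations above.
# OMEGA and ALPHA are illustrative assumptions, not measured values.
OMEGA = 2 * math.pi * 150.0   # haltere beat frequency (rad/s), assumed ~150 Hz
ALPHA = math.radians(30.0)    # haltere angle from the body plane (assumed)

def haltere_angle(t):
    """Sinusoidal haltere position: amplitude pi/2, i.e. a 180-degree sweep."""
    return (math.pi / 2) * math.sin(OMEGA * t)

def body_rates(omega_b, omega_c):
    """Roll, pitch, yaw (W1, W2, W3) of the fly's body, from the angular
    velocities measured in the left (b) and right (c) haltere frames.
    Arguments are 3-tuples; component i in the text maps to index i-1."""
    w1 = -(omega_b[2] + omega_c[2]) / (2 * math.sin(ALPHA))
    w2 = (omega_b[2] - omega_c[2]) / (2 * math.cos(ALPHA))
    w3 = -(omega_b[0] + omega_c[0]) / 2
    return (w1, w2, w3)

def cross(a, b):
    """Vector cross product a x b for 3-tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def coriolis_force(m, omega, v):
    """The Coriolis term -2 m (Omega x v) of the force equation."""
    return tuple(-2 * m * c for c in cross(omega, v))
```

Note that `body_rates` has to be evaluated per haltere-pair sample, and the Coriolis term is the only force component that carries the rotation-rate information.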


The sense of balance of butterflies sits at the base of the antennae.

Butterflies and moths keep their balance with the Johnston's organ: an organ at the base of the antennae that is responsible for maintaining the animal's sense of balance and orientation, especially during flight.

Johnston's organ


The perception of sound is important for the mating behavior of some insects, e.g. Drosophila [9]. Hearing in Insecta and Crustacea is mediated by chordotonal organs: mechanoreceptors which respond to mechanical deformation [10]. These chordotonal organs are widely distributed throughout the insect's body and differ in their function: proprioceptors are sensitive to forces generated by the insect itself, exteroreceptors to external forces. These receptors allow detection of sound via the vibration of particles when sound is transmitted through a medium such as air or water. Far-field sound refers to the phenomenon in which air particles transmit the vibration as a pressure change over a long distance from the source. Near-field sound refers to sound close to the source, where the velocity of the particles can move lightweight structures. Some insects have visible hearing organs, such as the ears of noctuoid moths, whereas other insects lack a visible auditory organ but are still able to register sound. In these insects the Johnston's organ plays an important role in hearing.

Johnston's organ

The Johnston's Organ (JO) is a chordotonal organ present in most insects. Christopher Johnston was the first to describe this organ, in mosquitoes (Quarterly Journal of Microscopical Science, 1855), hence the name Johnston's organ [11]. The organ is located at the stem of the insect's antenna. It has developed the highest degree of complexity in the Diptera (two-winged flies), for which hearing is of particular importance [10]. The JO consists of organized basic sensory units called scolopidia (SP), whose number varies among animals. The JO has various mechanosensory functions, such as the detection of touch, gravity, wind and sound; in honeybees, for example, the JO (≈ 300 SPs) is responsible for detecting the sound coming from another “dancing” honeybee [12]. In male mosquitoes (≈ 7000 SPs) the JO is used to detect and locate female flight sounds for mating behavior [13]. The antennae of these insects are specialized to capture near-field sound, acting as physical mechanotransducers.

Anatomy of the Johnston’s Organ

A typical insect antenna has three basic segments: the scape (base), the pedicel (stem) and the flagellum [14]. Some insects have a bristle at the third segment called an arista. Figure 1 shows the Drosophila antenna. In Drosophila, antennal segment a3 fits loosely into a socket on segment a2 and can rotate when sound energy is absorbed [15]. This leads to stretching or compression of the JO neurons of the scolopidia. In Diptera the JO scolopidia are located in the second antennal segment a2, the pedicel (Yack, 2004). The JO is not only associated with sound perception (as an exteroreceptor); it can also function as a proprioceptor, giving information on the orientation and position of the flagellum relative to the pedicel [16].

Figure 1: Left: Frontal view of the Drosophila antenna. The scolopidia in the second segment (a2, pedicel) with their neurons are illustrated. Sound energy absorption leads to vibration of the arista and rotation of the third segment a3. The rotation leads to deformation of the scolopidia, leading to activation or deactivation. Right: The antenna located on the head of the Drosophila is shown. (adapted from [15]).

Structure of a Scolopidium

The scolopidium is the basic sensory unit of the JO. A scolopidium comprises four cell types [10]: (1) one or more bipolar sensory neurons, each with a distal dendrite; (2) a scolopale cell enveloping the dendrite; (3) one or more attachment cells associated with the distal region of the scolopale cell; (4) one or more glial cells surrounding the proximal region of the sensory neuron's cell body. The scolopale cell surrounds the sensory dendrite (cilium), and together they form the scolopale lumen, or receptor lymph cavity. The scolopale lumen is tightly sealed and filled with a lymph thought to have a high potassium and low sodium content, closely resembling the endolymph in the cochlea of mammals. The cap cell produces an extracellular cap, which envelopes the cilia tips and connects them to the third antennal segment a3 [17]. Scolopidia are classified according to different criteria.

Type 1 and Type 2 scolopidia differ in the type of ciliary segment in the sensory cell. In Type 1 the cilium is of uniform diameter, except for a distal dilation at around 2/3 of its length, and inserts into a cap rather than into a tube. In Type 2 the ciliary segment increases in diameter towards a distal dilation, which can be densely packed with microtubules; the distal part ends in a tube. Mononematic and amphinematic scolopidia differ in the extracellular structure associated with the scolopale cell and the dendritic cilium: in mononematic scolopidia the dendritic tip inserts into an electron-dense cap, whereas in amphinematic scolopidia the tip is enveloped by an electron-dense tube. Monodynal and heterodynal scolopidia are distinguished by their number of sensory neurons: monodynal scolopidia have a single sensory cell, heterodynal ones more than one.

JO studied in the fruit fly (Drosophila melanogaster)

The JO in Drosophila consists of an array of approximately 277 scolopidia located between the a2/a3 joint and the a2 cuticle (a type of outer tissue layer) [18]. The scolopidia in Drosophila are mononematic [15]. Most are heterodynal and contain two or three neurons; the JO thus comprises around 480 neurons and is the largest mechanosensory organ of the fruit fly [9]. Perception by the JO of male Drosophila courtship songs (produced by their wings) makes females reduce locomotion and males chase each other, forming courtship chains [19]. The JO is important not only for sound perception but also for gravity [20] and wind [21] sensing. Studies using GAL4 enhancer trap lines showed that JO neurons can be categorized anatomically into five subgroups, A-E [18]. Each has a different target area in the antennal mechanosensory and motor centre (AMMC) of the brain (see Figure 2). Kamikouchi et al. showed that the different subgroups are specialized for distinct types of antennal movement [9]: different groups are used for the sound and gravity responses.

Neural activities in the JO

JO neuron activity can be studied by observing intracellular calcium signals in the neurons caused by antenna movement [9]. For this, the flies have to be immobilized (e.g. by mounting them on a coverslip and fixing the second antennal segment to prevent muscle-caused movements). The antenna can then be actuated mechanically using an electrostatic force. The antennal receiver vibrates when sound energy is absorbed, and deflects backwards and forwards when the Drosophila walks. Deflecting and vibrating the antenna yields different activity patterns in the JO neurons: deflecting the receiver backwards with a constant force gives negative signals in the anterior region and positive ones in the posterior region of the JO; forward deflection produces the opposite behavior. Courtship songs (pulse songs with a dominant frequency of ≈ 200 Hz) evoke broadly distributed signals. The opposite patterns for forward and backward deflection reflect the opposing arrangement of the JO neurons: their dendrites connect to anatomically distinct sides of the pedicel, the anterior and posterior sides of the receiver. Deflecting the receiver forwards stretches the JO neurons in the anterior region and compresses those in the posterior one. From this it can be concluded that JO neurons are activated (i.e. depolarized) by stretch and deactivated (i.e. hyperpolarized) by compression.
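A minimal sketch of this opposing arrangement, with sign conventions assumed for illustration only:

```python
def jo_response(deflection):
    """Toy model of the opposing JO neuron arrangement (signs assumed):
    deflection > 0 means the receiver is deflected forwards, which stretches
    anterior neurons (activation, positive signal) and compresses posterior
    ones (deactivation, negative signal)."""
    anterior = deflection       # stretched   -> depolarized
    posterior = -deflection     # compressed  -> hyperpolarized
    return anterior, posterior

print(jo_response(1.0))   # forward deflection:  anterior +, posterior -
print(jo_response(-1.0))  # backward deflection: anterior -, posterior +
```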

Different JO neurons

A JO neuron usually targets only one zone of the AMMC, and neurons targeting the same zone are located in characteristic spatial regions within JO [18]. Similar projecting neurons are organized into concentric rings or paired clusters (see Figure 2A).

Vibration sensitive neurons for sound perception

Neurons of subgroups A and B (AB) were activated maximally by receiver vibrations between 19 Hz and 952 Hz. This response was frequency dependent: subgroup B showed a larger response to low-frequency vibrations, while subgroup A is responsible for the high-frequency responses.

Deflection sensitive neurons for gravity and wind perception

Neurons of subgroups C and E showed maximal activity for static receiver deflection; these neurons thus provide information about the direction of a force. They have a larger arista-displacement threshold than the AB neurons [21]. Nevertheless, CE neurons can respond to small displacements of the arista, e.g. by gravitational force, which displaces the arista tip by about 1 µm (see S1 of [9]). They also respond to the larger displacements caused by air flow (e.g. wind) [21]. Zone C and E neurons showed distinct sensitivities to the direction of air flow, which deflects the arista in different directions. Air flow applied to the front of the head resulted in strong activation in zone E and little activation in zone C; air flow applied from the rear showed the opposite result. Air flow applied to the side of the head yielded ipsilateral activation in zone C and contralateral activation in zone E. These different activation patterns allow the Drosophila to sense from which direction the wind comes. It is not known whether the same CE neurons mediate both wind and gravity detection, or whether there are more sensitive CE neurons for gravity detection and less sensitive ones for wind detection [9].

Evidence that wild-type Drosophila melanogaster can perceive gravity is that the flies tend to fly upwards, against the force vector of gravity (negative gravitaxis), after being shaken in a test tube. When the antennal aristae were ablated, this negative gravitaxis behavior vanished, but the phototaxis behavior (flying towards a light source) did not. When the second antennal segment, i.e. the location of the JO, was also removed, the negative gravitaxis behavior reappeared. This shows that when the JO is lost, Drosophila can still perceive gravitational force through other organs, for example mechanoreceptors on the neck or legs; such receptors have been shown to be responsible for gravity sensing in other insect species [22].
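The direction-dependent activation pattern of zones C and E can be written as a toy decision rule. The signal values and the symmetry threshold below are assumptions for illustration, not measured quantities:

```python
# Toy decision rule for the wind-direction pattern described above
# (signal conventions assumed): frontal air flow activates zone E strongly
# and zone C weakly, rear air flow does the opposite, and lateral air flow
# activates zone C ipsilaterally and zone E contralaterally.

def wind_direction(c_left, e_left, c_right, e_right):
    """Classify wind direction from zone C / zone E activation levels
    of the left and right antennae (arbitrary non-negative units)."""
    c_total = c_left + c_right
    e_total = e_left + e_right
    if e_total > c_total:
        return "front"
    # rear: zone C dominates and both sides respond roughly equally
    if c_total > e_total and abs(c_left - c_right) < 0.1 * c_total:
        return "rear"
    # lateral: zone C responds on the side the wind comes from (ipsilateral)
    return "left" if c_left > c_right else "right"

print(wind_direction(0.1, 1.0, 0.1, 1.0))  # frontal flow  -> "front"
print(wind_direction(1.0, 0.1, 1.0, 0.1))  # rear flow     -> "rear"
print(wind_direction(1.0, 0.2, 0.1, 0.8))  # lateral flow  -> "left"
```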

Silencing specific neurons

Subgroups of JO neurons can be selectively silenced using tetanus toxin combined with subgroup-specific GAL4 drivers and tubulin-GAL80, the latter being a temperature-sensitive GAL4 blocker. With this approach it could be confirmed that neurons of subgroups C and E are responsible for gravitaxis behavior; their elimination did not impair hearing [21]. Silencing subgroup B impaired the males' response to courtship songs, whereas silencing groups CE or ACE did not [9]. Since subgroup A was found to be involved in hearing (see above), this result was unexpected. From a different experiment, in which the sound-evoked compound action potentials (sums of action potentials) were investigated, the conclusion was drawn that subgroup A is required for detecting the nanometer-range receiver vibrations imposed by the faint songs of courting males.

Figure 2: A) left: Neurons of the different subgroups A-E are illustrated in the JO. right: The corresponding target zones of the subgroups are shown in the AMMC. B) Simplified circuitry of the auditory (zones A, B) and deflection-sensitive (zones C, E) systems. These two systems are separated, similar to the situation in vertebrates. Neurons of zone A have target zones in the AMMC, vlpr and SOG. Vlpr stands for ventrolateral protocerebrum, SOG for suboesophageal ganglion, A for anterior, D for dorsal, M for medial. (adapted from Figure 2.10 of [15]).

Origin of difference of the subgroups

As mentioned above, the anatomically different subgroups of JO neurons have different functions [9]. The neurons attach to the same antennal receiver, but at opposing connection sites; thus, e.g. for a forward deflection, some neurons get stretched whereas others get compressed, which yields different response characteristics (opposing calcium signals). The difference between vibration- and deflection-sensitive neurons may come from distinct molecular transduction machineries (i.e. adapting or non-adapting channels, NompC-dependent or not). Sound-sensitive neurons express the mechanotransducer channel NompC (no mechanoreceptor potential C, also known as TRPN1), whereas subgroups C and E are independent of NompC [9]. In addition, JO neurons of subgroups A and B transduce dynamic receiver vibrations but adapt quickly to static receiver deflection (i.e. they respond phasically) [23], while neurons of subgroups C and E show a sustained calcium signal during static deflection (i.e. they respond tonically). These two distinct behaviors show that there are transduction channels with distinct adaptation characteristics, as is also known for the mammalian cochlea and mammalian skin (i.e. tonically activated Merkel cells and rapidly adapting Meissner's corpuscles) [21].

Differences in gravitation and sound perception in the brain

Neurons of subgroups A and B target zones of the primary auditory centre in the AMMC on one side, and the inferior part of the ventrolateral protocerebrum (VLP) on the other (see Figure 2B). These zones show many commissural connections among themselves and with the VLP. For neurons of subgroups C and E, almost no commissural connections between the target zones were found, nor connections to the VLP; neurons associated with these zones descend to or ascend from the thoracic ganglia. This difference in the projections of the AB and CE neurons is strongly reminiscent of the separate projections of the auditory and vestibular pathways in mammals [15].

Johnston’s Organ in honeybees

Solitary bee (Anthidium florentinum): the Johnston's organs on the head are clearly visible.

The JO in bees is also located in the pedicel of the antenna and is used to detect near-field sounds [12]. In a hive, some bees perform a waggle dance, which is believed to inform conspecifics about the distance, direction and profitability of a food source. Followers have to decode the message of the dance in the darkness of the hive, i.e. without visual perception. Perception of sound is one possible way to obtain the information of the dance: the sound of a dancing bee has a carrier frequency of about 260 Hz and is produced by wing vibrations.

Bees have various mechanosensors, such as hairs on the cuticle or bristles on the eyes. Dreller et al. found that the mechanosensors in the JO are responsible for sound perception in bees [12]; nevertheless, hair sensors could still be involved in the detection of other sound sources whose amplitude is too low to vibrate the flagellum. Dreller et al. trained bees to associate sound signals with a sucrose reward. After training, some of the mechanosensors were ablated in different bees, and the bees' ability to associate the sound with the reward was tested again. Manipulating the JO abolished the learnt skill. Training succeeded with a frequency of 265 Hz, but also with 10 Hz, which shows that the JO is also involved in low-frequency hearing. Bees with only one antenna made more mistakes, but were still better than bees with both antennae ablated; the two JOs, one in each antenna, could help followers to determine the direction of the dancing bee. Hearing could also be used by bees in other contexts, e.g. to keep a swarming colony together.

The decoding of the waggle dance relies not only on auditory perception, but also, perhaps even more, on electric field perception: the JO in bees allows the detection of electric fields [24]. When body parts are moved, bees accumulate electric charge in their cuticle, and insects respond to electric fields, e.g. by modified locomotion (Jackson, 2011).
Surface charge is thought to play a role in pollination, because flowers are usually negatively charged and arriving insects carry a positive surface charge [24]; this could help bees to take up pollen. By training bees to static and modulated electric fields, Greggers et al. showed that bees can perceive electric fields [24]. Dancing bees produce electric fields which induce movements of the flagellum ten times stronger than the mechanical stimulus of wing vibrations alone. The vibrations of the flagellum are monitored by the JO, which responds to the displacement amplitudes induced by the oscillation of a charged wing. This was shown by recording compound action potential responses from JO axons during electric field stimulation. Electric field reception via the JO does not work without the antenna, though the involvement of other, non-antennal mechanoreceptors has not been excluded. The results of Greggers et al. suggest that electric fields (and with them the JO) are relevant for social communication in bees.

Importance of JO (and chordotonal organs in general) for research

Chordotonal organs, like the JO, are found only in Insecta and Crustacea [10]. Chordotonal neurons are ciliated cells [25]: genes that encode proteins needed for functional cilia are expressed in chordotonal neurons, and mutations in their human homologues result in genetic diseases. Knowledge of the mechanisms of ciliogenesis can therefore help to understand and treat human diseases caused by defects in the formation or function of human cilia. This is possible because neuronal specification in insects and in vertebrates is controlled by highly conserved transcription factors, as the following example shows: Atonal (Ato), a proneural transcription factor, specifies chordotonal organ formation, and its mouse orthologue Atoh1 is necessary for hair cell development in the cochlea. Deaf mice expressing a mutant Atoh1 can be rescued by the atonal gene of Drosophila. Studying chordotonal organs in insects can thus lead to further insights into mechanosensation and cilium construction. Drosophila is a versatile model for studying chordotonal organs [26]: the fruit fly is easy and inexpensive to culture, produces large numbers of embryos, can be genetically modified in numerous ways, and has a short life cycle, which allows several generations to be investigated within a relatively short time. In addition, most of the fundamental biological mechanisms and pathways that control development and survival are conserved between Drosophila and other species, such as humans.

Spider´s Visual System


While the highly developed visual systems of some spider species have been the subject of extensive studies for many decades, terms like animal intelligence or cognition were not usually used in the context of spider studies. Instead, spiders were traditionally portrayed as rather simple, instinct-driven animals (Bristowe 1958, Savory 1928), processing visual input in pre-programmed patterns rather than actively interpreting the information received from their visual apparatus to produce appropriate reactions. Although this still seems to be the case in the majority of spiders, which interact with the world primarily through tactile sensation rather than visual cues, some spider species have shown surprisingly intelligent use of their eyes. Considering its limited dimensions within the body, a spider´s optical apparatus and visual processing perform extremely well.[27] Recent research points towards a very sophisticated use of visual cues in a spider´s world, for instance the complex hunting schemes of the vision-guided jumping spiders (Salticidae), which take huge leaps of up to 30 times their own body length onto prey, or a wolf spider´s (Lycosidae) ability to visually recognize asymmetries in potential mates. Even in the case of the night-active Cupiennius salei (Ctenidae), which relies primarily on other sensory organs, or the ogre-faced Dinopis, which hunts at night by spinning small webs and throwing them at approaching prey, the visual system is still highly developed. Findings like these are not only fascinating but also inspire other scientific and engineering fields such as robotics and computer-guided image analysis.

General structure of a spider´s visual system

Spider internal anatomy - altered description.jpg

A spider´s anatomy consists primarily of two major body segments, the prosoma and the opisthosoma, also known as the cephalothorax and the abdomen, respectively. All extremities as well as the sensory organs, including the eyes, are located in the prosoma. Unlike the compound eyes found in most other arthropods, modern arachnid eyes are ocelli (simple eyes consisting of a lens covering a vitreous fluid-filled pit with a retina at the bottom), of which spiders have six or eight, characteristically arranged in three or four rows across the prosoma´s carapace. Overall, 99% of all spiders have eight eyes, and of the remaining 1% almost all have six. Spiders with only six eyes lack the “principal eyes”, which are described in detail below.

The pairs of eyes are called anterior median eyes (AME), anterior lateral eyes (ALE), posterior median eyes (PME) and posterior lateral eyes (PLE). The large, forward-facing principal eyes are the anterior median eyes, which provide the spider with its highest spatial resolution at the cost of a very narrow field of view. The smaller forward-facing eyes are the anterior lateral eyes, with a moderate field of view and medium spatial resolution. The two posterior eye pairs are peripheral, secondary eyes with a wide field of view; they are extremely light-sensitive and suited to low-light conditions. Spiders use their secondary eyes for sensing motion, while their principal eyes allow shape and object recognition. In contrast to an insect brain, the brain of a visually guided spider is almost completely devoted to vision: it receives only the optic nerves and consists of only the optic ganglia and some association centres. The brain is apparently able not only to recognize object motion but also to classify a counterpart as a potential mate, rival or prey by seeing legs (lines) at a particular angle to the body. Such a stimulus causes the spider to display either courtship or threatening signs, respectively.

A Spider´s eyes

Although spider eyes may be described as “camera eyes”, their details differ greatly from the “camera eyes” of mammals or other animals. To fit a high-resolution eye into such a small body, neither an insect´s compound eyes nor spherical eyes like our own would solve the problem. The ocelli found in spiders are the optically better solution, as their resolution is not limited by diffraction at small lenses, which would be the case with compound eyes. A compound eye of the same resolving power as a spider´s ocellus would simply not fit into the spider´s prosoma. Thanks to its ocelli, the spatial acuity of some spiders is more similar to that of a mammal than to that of an insect, despite a huge size difference and only a few thousand photocells in, for example, a jumping spider´s eye, compared to more than 150 million photocells in the human retina.

Principal eyes

Salticid internal eye structure.png

The anterior median eyes (AME), present in most spider species, are also called the principal eyes. The structure and components of the principal eye are illustrated in the figure and explained below using the AME of the jumping spider Portia (family Salticidae), which is famous for its high-spatial-acuity eyes and vision-guided behavior despite a very small body size of 4.5-9.5 mm.

When a light beam enters the principal eye, it first passes through a large corneal lens. This lens has a long focal length, enabling it to magnify even distant objects. The combined field of view of the two principal eyes´ corneal lenses would cover about 90° in front of the salticid spider; however, a retina with the desired acuity would be too large to fit inside a spider´s eye. The surprising solution is a small, elongated retina which lies at the end of a long, narrow tube behind a second lens (a concave pit). This combination of a corneal lens (with a long focal length) and a long eye tube (magnifying the image from the corneal lens) resembles a telephoto system, making the pair of principal eyes similar to a pair of binoculars.

The salticid spider captures light successively on four receptor layers of the retina, which lie one behind the other (in contrast, the human retina is arranged in a single plane). This structure not only allows a larger number of photoreceptors in a confined area but also enables color vision, as the light is split into different colours by the lens system (chromatic aberration). Different wavelengths of light thus come into focus at different distances, which correspond to the positions of the retina´s layers. While salticids discern green (layer 1 – ~580 nm, layer 2 – ~520-540 nm), blue (layer 3 – ~480-500 nm) and ultraviolet (layer 4 – ~360 nm) with their principal eyes, only the two rearmost layers (layers 1 and 2) allow shape and form detection, thanks to their close receptor spacing.
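
The wavelength-dependent focus described above can be sketched with a thin lens whose refractive index follows a simple Cauchy dispersion model. All numbers below (Cauchy coefficients, lens power) are illustrative assumptions, not measured values for any spider eye; the sketch only shows the qualitative effect that shorter wavelengths focus closer to the lens, matching the layer ordering in the text (UV nearest, green farthest).

```python
# Chromatic aberration sketch: thin lens with Cauchy dispersion
# n(lam) = A + B / lam^2 (lam in nm). Coefficients are hypothetical.

A, B = 1.50, 4.0e3          # hypothetical Cauchy coefficients
R = 0.1                      # hypothetical lens power term: 1/f = (n - 1) * R

def focal_length(lam_nm):
    """Focal length (arbitrary mm units) for wavelength lam_nm."""
    n = A + B / lam_nm**2    # refractive index falls with wavelength
    return 1.0 / ((n - 1.0) * R)

# Layer assignments from the text: UV (layer 4) ... green (layer 1)
for name, lam in [("UV ~360 nm", 360), ("blue ~490 nm", 490),
                  ("green ~530 nm", 530), ("green ~580 nm", 580)]:
    print(f"{name:>13}: f = {focal_length(lam):.2f}")
# Shorter wavelengths focus closer to the lens, so UV-sensitive layer 4
# sits nearest the lens and green-sensitive layer 1 sits farthest back.
```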

As in human eyes, there is a central region in layer 1 called the “fovea”, where the inter-receptor spacing has been measured at about 1 μm. This is optimal: the telephoto optical system provides images precise enough to be sampled at this resolution, but any closer spacing would reduce the retina´s sampling quality due to quantum-level interference between adjacent receptors. Equipped with such eyes, Portia far exceeds any insect in visual acuity: while the dragonfly Sympetrum striolatus has the highest acuity known for insects (0.4°), the acuity of Portia is ten times higher (0.04°) with much smaller eyes. The human eye, with 0.007° acuity, is only about five times better than Portia´s. With such visual precision, Portia should in principle be able to discriminate two objects 0.12 mm apart from a distance of 200 mm. The spatial acuity of other salticid eyes is usually not far behind that of Portia.[28][29][30]
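
The acuity figures above can be checked with the small-angle relation: the finest separable distance at range d for angular acuity θ is roughly d·tan(θ). For Portia this gives about 0.14 mm at 200 mm, the same order as the ~0.12 mm quoted in the text (the exact value depends on the criterion used for "discriminate").

```python
import math

# Small-angle check of the quoted acuity figures.
def separable_mm(acuity_deg, distance_mm):
    """Smallest separable object spacing at the given distance."""
    return distance_mm * math.tan(math.radians(acuity_deg))

print(separable_mm(0.04, 200))   # Portia at 200 mm -> ~0.14 mm
print(0.4 / 0.04)                # dragonfly vs Portia acuity -> 10x
print(0.04 / 0.007)              # Portia vs human acuity -> ~5.7x
```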

Principal eye retina movements

Such spectacular visual abilities come at a price in animals as small as jumping spiders: the retina in each of Portia´s principal eyes has only a 2-5° field of view, and its fovea captures only 0.6°. This results from the principal retina´s elongated, boomerang-like shape, which spans about 20° vertically but only 1° horizontally, corresponding to about six receptor rows. This severe limitation is compensated by sweeping the eye tube over the whole image of the scene using eye muscles, of which jumping spiders have six. These attach to the outside of the principal eye tube and allow the same three degrees of freedom – horizontal, vertical and rotational – as in human eyes. The principal retinae can move by as much as 50° horizontally and vertically, and rotate about the optical axis (torsion) by a similar amount.

Spiders making sophisticated use of visual cues move their principal eyes´ retinae either spontaneously, in “saccades” that fixate the fovea on a moving visual target (“tracking”), or by “scanning”, which presumably serves pattern recognition. Spiders appear to scan a scene sequentially by moving the eye tube in complex patterns, allowing them to process large amounts of visual information despite their very limited brain capacity.

The spontaneous retinal movements, so-called “microsaccades”, are thought to prevent the photoreceptor cells of the anterior median eyes from adapting to a motionless visual stimulus. Cupiennius spiders, which have four eye muscles – two dorsal and two ventral – continuously perform such microsaccades of 2° to 4° in the dorso-median direction, each lasting about 80 ms (when the animal is fixed to a holder). The 2-4° amplitude closely matches Cupiennius´ inter-receptor angle of about 3°, supporting the idea that the microsaccades prevent adaptation. In contrast, retinal movements elicited by mechanical stimulation (directing an air puff onto the tarsus of the second walking leg) can be considerably larger than the spontaneous ones, with deflections of up to 15°. Such a stimulus increases eye muscle activity from a spontaneous resting level of 12 ± 1 Hz to 80 Hz. The retinae of the two principal eyes, however, are never activated simultaneously in such experiments, nor is there any correlation between the directions in which the two eyes move. These two mechanisms – spontaneous microsaccades and active “peering” by stimulus-driven retinal movement – seemingly allow spiders to follow and analyze stationary visual targets efficiently using only their principal eyes, without reinforcing the saccadic movements by body movements.

There is, however, another factor influencing the visual capacities of a spider´s eye: the problem of keeping objects at different distances in focus. Human eyes solve this by accommodation, i.e. changing the shape of the lens, but salticids take a different approach: the receptors in layer 1 of their retina are arranged on a “staircase” at different distances from the lens. Thus the image of any object, whether a few centimeters or several meters in front of the eye, will be in focus on some part of the layer-1 staircase. Additionally, the salticid can swing the eye tubes from side to side without moving the corneal lenses, thereby sweeping the staircase of each retina across the image formed by the corneal lens and sequentially obtaining a sharp image of the object.
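
The staircase principle follows directly from the thin-lens equation: an object at distance o forms a sharp image at i = 1/(1/f − 1/o), so closer objects focus farther behind the lens, and a tiered retina can cover the whole spread of image planes. The focal length and object distances below are illustrative assumptions, not salticid measurements.

```python
# Thin-lens sketch of the layer-1 "staircase": each step of the retina
# sits at a different image distance and so samples a different object range.

f = 2.0  # hypothetical focal length, mm

def image_distance(o_mm):
    """Distance behind the lens at which an object at o_mm focuses sharply."""
    return 1.0 / (1.0 / f - 1.0 / o_mm)

for o in [20.0, 100.0, 1000.0]:           # 2 cm, 10 cm, 1 m
    print(f"object {o:7.1f} mm -> image {image_distance(o):.3f} mm")
# The image plane shifts from ~2.22 mm (near object) towards f = 2.0 mm
# (distant object); the staircase spans exactly this range.
```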

The resulting visual performance is impressive: jumping spiders such as Portia focus accurately on objects at distances from 2 centimeters to infinity, and can in practice see up to about 75 centimeters. The time needed to recognize objects is, however, relatively long (seemingly in the range of 10-20 s) because of the complex scanning process needed to capture high-quality images with such tiny eyes. Due to this limitation, spiders such as Portia find it very difficult to identify much larger predators quickly enough, making the small spider easy prey for birds, frogs and other predators.[31][32]

Blurry vision for distance estimation

Researchers were recently surprised by the finding that jumping spiders exploit image blur to estimate their distance to previously recognized prey before taking a jump. Where humans achieve depth perception using binocular vision, and other animals do so by moving their heads around or measuring ultrasound echoes, jumping spiders perform this task within their principal eyes. As in other jumping spider species, the principal eyes of Hasarius adansoni feature four retinal layers, the bottom two of which contain photocells responsive to green light. However, green light only ever focuses sharply on the bottom layer, layer 1, owing to its distance from the inner lens. Layer 2 would receive focused blue light, but its photoreceptor cells are not sensitive to blue and receive a fuzzy green image instead. Interestingly, the amount of blur depends on the distance of an object from the spider´s eye: the closer it is, the more out of focus it appears on the second retina layer. At the same time, layer 1 always receives a sharp image thanks to its staircase structure. Jumping spiders are thus able to estimate depth with a single, unmoving eye by comparing the images on the two bottom retina layers. This was confirmed by letting spiders jump at prey in an arena flooded with green light versus red light of equal brightness: without the ability to use the green-sensitive retina layers, the jumping spiders repeatedly misjudged the distance and missed their jumps.
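
This depth-from-defocus principle can be sketched with the same thin-lens geometry: with layer 2 at a fixed distance behind the lens, the diameter of the blur circle for green light grows as the object approaches, so the blur on layer 2 encodes distance while layer 1 supplies the sharp reference image. Focal length, aperture and layer position below are illustrative assumptions, not Hasarius measurements.

```python
# Depth-from-defocus sketch: blur-circle diameter on a fixed "layer 2"
# as a function of object distance, using the thin-lens equation.

f = 2.0      # hypothetical focal length, mm
A = 1.0      # hypothetical aperture diameter, mm
s = 2.02     # hypothetical layer-2 position behind the lens, mm

def blur_mm(o_mm):
    """Blur-circle diameter on layer 2 for a green object at o_mm."""
    i = 1.0 / (1.0 / f - 1.0 / o_mm)    # sharp-image distance for this object
    return A * abs(i - s) / i           # geometric defocus blur on layer 2

for o in [20.0, 50.0, 200.0]:
    print(f"object at {o:5.0f} mm -> blur {blur_mm(o):.4f} mm")
# Blur shrinks monotonically with distance over this range, so inverting
# the blur-vs-distance curve yields an estimate of the range to prey.
```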

Secondary eyes

Jumping spider vision David Hill.png

In contrast to the principal eyes, which are responsible for object analysis and discrimination, a spider´s secondary eyes act as motion detectors and therefore have no eye muscles for analyzing a scene more extensively. Depending on their arrangement on the spider´s carapace, the secondary eyes give the animal panoramic vision, detecting moving objects almost 360° around its body. The anterior and posterior lateral eyes (i.e. the secondary eyes) contain only a single type of visual cell, with a maximum spectral sensitivity for green light of ~535-540 nm wavelength. The number and arrangement of secondary eyes differ significantly between and even within spider families, as does their structure: large secondary eyes can contain several thousand rhabdomeres (the light-sensitive parts of the retina) and support hunting or nocturnal spiders with their high light sensitivity, while small secondary eyes contain at most a few hundred rhabdomeres and provide only basic movement detection. Unlike the principal eyes, which are everted (the rhabdomeres point towards the light), a spider´s secondary eyes are inverted, i.e. their rhabdomeres point away from the light, as in vertebrate eyes such as the human eye. The spatial resolution of the secondary eyes, e.g. in the extensively studied Cupiennius salei, is greatest in the horizontal direction, enabling the spider to analyse horizontal movements well even with the secondary eyes; vertical movement may be less important for an animal living in a “flat world”.

The reaction time of jumping spiders´ lateral eyes is comparatively slow at 80-120 ms, measured with a square stimulus of 3° (the inter-receptor angle) travelling past the animal´s eyes. The minimum distances the stimulus must travel before the spider reacts are 0.1° at a stimulus velocity of 1°/s, 1° at 9°/s and 2.5° at 27°/s. This means that at slow speeds a jumping spider´s visual system detects motion even if an object travels only a tenth of the secondary eyes´ inter-receptor angle. If the stimulus is made even smaller, to only 0.5°, responses occur only after long delays, indicating that such stimuli lie at the limit of the spiders´ motion perception.
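
The minimum travel distances quoted above are consistent with a roughly constant detection latency: dividing each distance by its stimulus velocity gives a value near 0.1 s at every speed, matching the 80-120 ms reaction time of the lateral eyes. A quick check of the arithmetic:

```python
# Minimum travel distance / stimulus speed, for the three values in the text.
for travel_deg, speed_deg_s in [(0.1, 1), (1, 9), (2.5, 27)]:
    latency_s = travel_deg / speed_deg_s
    print(f"{travel_deg:>4} deg at {speed_deg_s:>2} deg/s "
          f"-> {latency_s * 1000:.0f} ms")
# All three latencies fall in the 90-110 ms range, within the measured
# 80-120 ms reaction time of the lateral eyes.
```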

Secondary eyes of (night-active) spiders usually feature a tapetum behind the rhabdomeres, which is a layer of crystals reflecting light back to the receptors to increase visual sensitivity. This allows night-hunting spiders to have eyes with an aperture as large as f/0.58 enabling them to capture visual information even in ultra-low-light conditions. Secondary eyes containing a tapetum thus easily reveal a spider´s location at night when illuminated e.g. by a flashlight.[33][34]

Central nervous system and visual processing in the brain

As so often in neuroscience, we still know very little about a spider´s central nervous system (CNS), especially regarding its role in visually controlled behavior. Of all spiders, the CNS of Cupiennius has been studied most extensively, with the focus mainly on its structure. To date, little is known about the electrophysiological properties of central neurons in Cupiennius, and even less about those of other spiders.

The structure of a spider´s nervous system is closely related to the subdivisions of its body, but instead of being spread throughout the body, the nervous tissue is enormously concentrated and centralized. The CNS is made up of two paired, rather simple nerve cell clusters (ganglia), which are connected to the spider´s muscles and sensory systems by nerves. The brain is formed by the fusion of these ganglia in the head segments ahead of and behind the mouth, and largely fills the prosoma with nervous tissue, while no ganglia exist in the abdomen. Unlike the brains of insects and crustaceans, the spider´s brain receives direct input from only one sensory system: the eyes. The eight optic nerves enter the brain from the front, and their signals are processed in two optic lobes in the anterior region of the brain. When a spider´s behavior is especially dependent on vision, as in the jumping spiders, the optic ganglia contribute up to 31% of the brain´s volume, indicating a brain largely devoted to vision. This figure is still 20% for Cupiennius, whereas in other spiders such as Nephila and Ephebopus it is only 2%.

The distinction between principal and secondary eyes persists in the brain. Both types of eyes have their own visual pathway with two separate neuropil regions fulfilling distinct tasks. Thus spiders evidently process the visual information provided by their two eye types in parallel, with the secondary eyes being specialized for detecting horizontal movement of objects and the principal eyes being used for the detection of shape and texture.

Two visual systems in one brain

While principal and secondary eyesight seem to be distinct in the spider´s brain, surprising interrelations between the two visual systems are known as well. In visual experiments, the principal eye muscle activity of Cupiennius was measured while either its principal or its secondary eyes were covered. When the animals were stimulated in a white arena with short sequences of moving black bars, the principal eyes moved involuntarily whenever a secondary eye detected motion within its visual field. This increase in principal eye muscle activity, compared to no stimulation, did not change when the principal eyes were covered with black paint, but stopped when the secondary eyes were masked. It is thus clear that only the input received from the secondary eyes controls principal eye muscle activity, and that a spider´s principal eyes are not involved in motion detection, which is solely the secondary eyes´ responsibility.

Other experiments using dual-channel telemetric recording of the eye muscle activity of Cupiennius have shown that the spider actively peers into its walking direction: the ipsilateral retina of the principal eyes was observed to shift with respect to the walking direction before, during and after a turn, while the contralateral retina remained in its resting position. This happened independently of the actual light conditions, suggesting “voluntary” peering initiated by the spider´s brain.

Pattern recognition using principal eyes

5 Salticid eye movement.png

Recognition of shape and form by jumping spiders is believed to be accomplished through a scanning process of the visual field, which consists of a complex set of rotations (torsional movements) and translations of the anterior median eyes´ retinae. As described in the section “Principal eye retina movements”, a spider´s retinae are narrow and shaped like boomerangs, which can be matched to straight features by sweeping them over the visual scene. When investigating a novel target, the eyes scan it in a stereotyped way: by moving slowly from side to side at speeds of 3-10° per second and rotating through ±25°, the horizontal and torsional retina movements allow the detection of lines at different positions and orientations. This method can be understood as template matching, where the elongated template produces a strong neural response whenever the retina matches a straight feature in the scene. A straight line is thereby identified with little or no further processing.
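
The template-matching idea can be illustrated with a toy sketch: an elongated linear "retina" is swept over a binary image at several orientations, and the orientation whose template covers the most set pixels wins. This is only an illustration of the scanning principle, not a model of salticid neurons; the image, template length and angle grid are arbitrary choices.

```python
import math

# Toy 9x9 binary image containing a diagonal line (set of lit pixels).
img = {(i, i) for i in range(9)}

def score(angle_deg, cx=4.0, cy=4.0, half_len=4):
    """Number of lit pixels hit by a linear template through (cx, cy)."""
    a = math.radians(angle_deg)
    hits = 0
    for t in range(-half_len, half_len + 1):
        x = round(cx + t * math.cos(a))
        y = round(cy + t * math.sin(a))
        hits += (x, y) in img
    return hits

# Sweep the template orientation, as the retina sweeps over the scene.
best = max(range(0, 180, 15), key=score)
print(best, score(best))   # the 45-degree template matches the diagonal
```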

A computer vision algorithm that treats straight line detection as an optimization problem (da Costa, da F. Costa) was inspired by the jumping spider´s visual system and uses the same approach of scanning a scene sequentially using template matching. While the well-known Hough transform allows robust detection of straight visual features in an image, its efficiency is limited by the need to evaluate a large part, or even the whole, of the parameter space while searching for lines. In contrast, the alternative approach used in salticid visual systems suggests searching the visual space with a linear window, which allows adaptive search schemes during the line search without systematically evaluating the parameter space. Formulating straight line detection this way also turns it into an optimization problem, which computers can process efficiently. While the parameters controlling the annealing-based scanning must be found experimentally, the approach modeled on the jumping spider´s straight line detection proved very effective, especially with properly set parameters.[35]
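
The contrast with the Hough transform can be made concrete. The paper's annealing schedule depends on experimentally tuned parameters that are not given here, so this sketch substitutes a simpler deterministic coarse-to-fine refinement of the same adaptive-search idea: evaluate a coarse orientation grid, then refine only around the current best match, instead of filling the whole parameter space.

```python
import math

# Toy 15x15 binary image containing a diagonal line.
img = {(i, i) for i in range(15)}

def score(angle_deg):
    """Lit pixels covered by a linear window through the image centre."""
    a = math.radians(angle_deg)
    return sum((round(7 + t * math.cos(a)), round(7 + t * math.sin(a))) in img
               for t in range(-7, 8))

# Adaptive search: coarse scan (12 evaluations), then refine near the best
# orientation (15 more) -- far fewer than an exhaustive parameter sweep.
coarse = max(range(0, 180, 15), key=score)
fine = max(range(coarse - 7, coarse + 8), key=score)
print(fine, score(fine))   # finds the 45-degree line with ~27 evaluations
```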

Visually-guided behavior

Discernment of visual targets

6 Discernment of visual targets by Cupiennius salei.png

The ability to discern between slightly different visual targets has been shown for Cupiennius salei, although this species relies mainly on its mechanosensory systems during prey capture and mating. When two targets are presented to the spider at a distance of 2 m, its walking path depends on their visual appearance: given the choice between two identical targets, such as two vertical bars, Cupiennius shows no preference, but the animal strongly prefers a vertical bar to a sloping bar or a V-shaped target.

Discrimination between different targets has been shown to be possible only with the principal eyes uncovered, whereas the spider can detect the targets using any of its eyes. This suggests that many spiders´ anterior lateral (secondary) eyes are capable of much more than simple detection of object movement. With all eyes covered, the spider exhibits completely undirected walking paths.

Placing Cupiennius in total darkness, however, not only results in undirected walks but also elicits a change of gait: instead of walking on all eight legs, the spider walks on six and employs the first pair as antennae, comparable to a blind person´s cane. To feel its surroundings, the extended forelegs are moved up and down as well as sideways. This behavior is specific to the first leg pair and is triggered solely by the change in visual input when normal room light is switched to infrared light, which is invisible to the spider.

Vision-based decision making in jumping spiders

The behavior of jumping spiders after detecting movement with their eyes depends on three factors: the target´s size, speed and distance. If the object is more than twice the spider´s size, it is not approached, and the spider tries to escape if it comes towards her. If the target is of adequate size, its speed is visually analyzed using the secondary eyes. Fast-moving targets with a speed of more than 4°/s are chased by the jumping spider, guided by her anterior lateral eyes. Slower objects are carefully approached and analyzed with the anterior median (i.e. principal) eyes to determine whether they are prey or another spider of the same species. This is seemingly achieved by applying the straight line detection described above to find out whether a visual target features legs or not. While jumping spiders have been shown to approach potential prey of appropriate characteristics as long as it moves, males are pickier in deciding whether their current counterpart might be a potential mate.

Potential mate detection

Experiments have shown that drawings of a central dot with leg-like appendages on the sides elicit courtship displays, suggesting that jumping spiders use visual feature extraction to detect the presence and orientation of linear structures in the target. Additionally, a spider´s behavior towards a presumed conspecific depends on factors such as the sex and maturity of both spiders involved, and on whether it is mating time. Female wolf spiders, Schizocosa ocreata, even discern asymmetries in male secondary sexual characters when choosing a mate, possibly to avoid developmental instability in their offspring. Conspicuous tufts of bristles on a male´s forelegs, which are used for visual courtship signaling, appear to influence female mate choice, and asymmetry of these body parts resulting from leg loss and regeneration apparently reduces female receptivity to such males.[36]

Secondary eye-guided hunting

A jumping spider´s stalking behavior when hunting insect prey is comparable to a cat stalking birds. If something moves within the visual field of the secondary eyes, they initiate a turn that brings the larger, forward-facing pair of principal eyes into position to classify the object´s shape as mate, rival or prey. Even very small, low-contrast dot stimuli moving at slow or fast speeds elicit this orientation behavior. Like Cupiennius, jumping spiders can also use their secondary eyes for more sophisticated tasks than motion detection alone: when visual prey cues are presented to salticids with both principal eyes covered, so that only the secondary eyes provide visual information, the animals exhibit complete hunting sequences. This suggests that the anterior lateral eyes of jumping spiders may be the most versatile components of their visual system. Besides detecting motion, the secondary eyes evidently also possess a spatial acuity good enough to direct complete visually guided hunting sequences.

Prey “face recognition”

7 Principal eye characteristics influence stalking behavior in Portia fimbriata.jpg

Visual cues also play an important role for jumping spiders (salticids) when discriminating between salticid and non-salticid prey using principal eyesight. Here, a salticid prey´s large principal eyes provide critical cues, to which the jumping spider Portia fimbriata reacts with cryptic stalking tactics before attacking (walking very slowly with palps retracted and freezing when faced). This behavior is used only when the prey is identified as a salticid, and was exploited in experiments presenting computer-rendered, realistic three-dimensional lures with modified principal eyes to Portia fimbriata. While intact virtual lures resulted in cryptic stalking, lures with missing or smaller-than-usual principal eyes (as sketched in the figure on the right) elicited different behavior. Virtual salticid prey with only one anterior median eye, or a regular lure with two enlarged secondary eyes, still elicited cryptic stalking, suggesting successful recognition of a salticid, while P. fimbriata froze less often when faced with a Cyclops-like lure (a single principal eye centered between the two secondary eyes). Lures with square-edged principal eyes were usually not classified as salticids, indicating that the shape of the principal eyes´ edges is an important cue for identifying fellow salticids.[37]

Jumping decisions from visual features

8 Phidippus clarus female preying on fly.jpg

Spiders of the genus Phidippus have been tested in a study for their willingness to cross inhospitable open space, by placing visual targets on the far side of a gap. Whether or not the spider takes the risk of crossing open ground was found to depend mainly on factors such as the distance to the target, the target´s size relative to that distance, and the target´s color and shape. In independent test runs, the spiders moved to tall, distant targets as often as to short, close targets when both objects appeared equally large on the spider´s retina. When given the choice between white and green grass-like targets, the spiders consistently chose the green target irrespective of its contrast with the background, demonstrating their ability to use color discrimination in hunting situations.[38]
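
The "equally large on the retina" equivalence is simple trigonometry: a target of height h at distance d subtends an angle of 2·atan(h/2d), so a tall, distant target and a short, close one with the same h/d ratio look identical in size. The particular heights and distances below are illustrative, not the study's stimuli.

```python
import math

def angular_size_deg(h_cm, d_cm):
    """Angle subtended at the eye by a target of height h at distance d."""
    return math.degrees(2 * math.atan(h_cm / (2 * d_cm)))

tall_far = angular_size_deg(8.0, 80.0)    # 8 cm target at 80 cm
short_near = angular_size_deg(2.0, 20.0)  # 2 cm target at 20 cm
print(tall_far, short_near)               # same h/d ratio -> same angle
```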

Identifying microhabitat traits by visual cues

When presented with manipulated real plants and photographs of plants, Psecas chapoda (a bromeliad-dwelling salticid) is able to detect a favorable microhabitat by visually analyzing the architectural features of a host plant´s leaves and rosette. By using black-and-white photographs, a study could exclude any potential influence of other cues, such as color and smell, on host plant selection, leaving only shape and form as discriminating characteristics. Even when deciding solely from photographs, Psecas chapoda consistently preferred rosette-shaped plants (Agavaceae) with long, narrow leaves over plants of other shapes, demonstrating that some spider species can evaluate and distinguish the physical structure of microhabitats purely from visual cues of plant shape.[39]


  1. K. Gammon, Life’s Little Mysteries ( smartest-non-primates.html) . TechMediaNetwork.
  2. G. S. et al., Control of Octopus Arm Extension by a Peripheral Motor Program . Science 293, 1845, 2001.
  3. Y. Gutfreund, Organization of octopus arm movements: a model system for studying the control of flexible arms. Journal of Neuroscience 16, 7297, 1996.
  4. P. Graziadei, The anatomy of the nervous system of Octopus vulgaris, J. Z. Young. Clarendon, Oxford, 1971.
  5. M. J. Wells, The orientation of octopus. Ergeb. Biol. 26, 40-54, 1963.
  6. J. L. Fox and T. L. Daniel (2008), "A neural basis for gyroscopic force measurement in the halteres of Holorusia.", J Comp Physiol 194: 887-897 
  7. a b Rhoe A. Thompson (2009), "Haltere Mediated Flight Stabilization in Diptera: Rate Decoupling, Sensory Encoding, and Control Realization.", PhD thesis (University of Florida) 
  8. a b J. W. S. Pringle (1948), "The gyroscopic mechanism of the halteres of diptera.", Phil Trans R Soc Lond B 233 (602): 347-384 
  9. a b c d e f g h i Kamikouchi A, Inagaki HK, Effertz T, Hendrich O, Fiala A, Gopfert MC, Ito K (2009). "The neural basis of Drosophila gravity-sensing and hearing.". Nature 458 (7235): 165-171. 
  10. a b c d Yack JE (2004). "The structure and function of auditory chordotonal organs in insects.". Microscopy Research and Technique 63 (6): 315-337. 
  11. Johnston, Christopher. 1855. Original Communications: Auditory Apparatus of the Culex Mosquito
  12. a b c Dreller C and Kirchner WH (1993). "Hearing in honeybees: localization of the auditory sense organ.". Journal of Comparative Physiology A 173: 275-279. 
  13. McIver, S.B. 1989. Mechanoreception, In Comprehensive Insect Physiology, Biochemistry, and Pharmacology. Pergamon Press. 1989, Vol. 6, pp. 71-132.
  14. Keil, Thomas A. 1999. Chapter 1 - Morphology and Development of Peripheral Olfactory Organs. [book auth.] B.S. Hansson. Insect Olfaction. s.l. : Springer, 1999, pp. 5-48
  15. a b c d e Jarman, Andrew P. 2014. Chapter 2 - Development of the Auditory Organ (Johnston's Organ) in Drosophila. Development of Auditory and Vestibular Systems (Fourth Edition). San Diego : Academic Press, 2014, pp. 31-61
  16. Baker, Dean Adam and Beckingham, Kathleen Mary and Armstrong, James Douglas. 2007. Functional dissection of the neural substrates for gravitaxic maze behavior in Drosophila melanogaster. Journal of Comparative Neurology. 2007, Vol. 501, 5, pp. 756-764
  17. Nadrowski, Björn and Albert, Jörg T. and Göpfert, Martin C (2008). "Transducer-Based Force Generation Explains Active Process in Drosophila Hearing.". Current Biology 18 (18): 1365-1372. 
  18. Kamikouchi A, Shimada T and Ito K (2006). "Comprehensive classification of the auditory sensory projections in the brain of the fruit fly Drosophila melanogaster.". J. Comp. Neurol. 499 (3): 317-356. 
  19. Tauber, Eran and Eberl, Daniel F. 2003. Acoustic communication in Drosophila. Behavioural Processes. 2003, Vol. 64, 2, pp. 197-210
  20. Baker, Dean Adam and Beckingham, Kathleen Mary and Armstrong, James Douglas. 2007. Functional dissection of the neural substrates for gravitaxic maze behavior in Drosophila melanogaster. Journal of Comparative Neurology. 2007, Vol. 501, 5, pp. 756-764
  21. Yorozu S, Wong A, Fischer BJ, Dankert H, Kernan MJ, Kamikouchi A, Ito K, Anderson DJ (2007). "Distinct sensory representations of wind and near-field sound in the Drosophila brain.". Nature 458 (7235): 201-205. 
  22. Beckingham, Kathleen M. and Texada, Michael J. and Baker, Dean A. and Munjaal, Ravi and Armstrong, J. Douglas. 2005. Genetics of Graviperception in Animals. Academic Press. 2005, Vol. 55, pp.105-145
  23. Nadrowski, Björn and Albert, Jörg T. and Göpfert, Martin C. 2008. Transducer-Based Force Generation Explains Active Process in Drosophila Hearing. Current Biology. 2008, Vol. 18, 18, pp. 1365-1372
  24. Greggers U, Koch G, Schmidt V, Dürr A, Floriou-Servou A, Piepenbrock D, Göpfert MC, Menzel R (2013). "Reception and learning of electric fields in bees.". Proceedings of the Royal Society B: Biological Sciences 280: 1759. 
  25. Kavlie, Ryan G. and Albert, Jörg T. 2013. Chordotonal organs. Current Biology. 2013, Vol. 23, 9, pp. 334-335
  26. Jennings, Barbara H. 2011. Drosophila a versatile model in biology & medicine. Materials Today. 2011, Vol. 14, 5, pp. 190-195
  27. F. G. Barth: A Spider's World: Senses and Behavior. ISBN 978-3-642-07557-5, Springer-Verlag Berlin, Heidelberg. (2002)
  28. D. P. Harland, R. R. Jackson: 'Eight-legged cats' and how they see - a review of recent research on jumping spiders (Araneae: Salticidae). Department of Zoology, University of Canterbury (2000)
  29. A. Schmid: Different functions of different eye types in the spider Cupiennius salei. The Journal of Experimental Biology 201, 221–225 (1998)
  30. S. Yamashita, H. Tateda: Spectral Sensitivities of Jumping Spider Eyes. J. comp. Physiol. 105, 29-41 (1976)
  31. D. P. Harland, R. R. Jackson: Influence of cues from the anterior medial eyes of virtual prey on Portia fimbriata, an araneophagic jumping spider. The Journal of Experimental Biology 205, 1861–1868 (2002)
  32. A. Schmid, C. Trischler: Active sensing in a freely walking spider: Look where to go. Journal of Insect Physiology 57 p.494–500 (2011)
  33. D. B. Zurek, X. J. Nelson: Hyperacute motion detection by the lateral eyes of jumping spiders. Vision Research 66 p.26–30 (2012)
  34. D. B. Zurek, A. J. Taylor, C. S. Evans, X. J. Nelson: The role of the anterior lateral eyes in the vision-based behaviour of jumping spiders. The Journal of Experimental Biology 213, 2372-2378 (2010)
  35. F. M. G. da Costa, L. da F. Costa: Straight Line Detection as an Optimization Problem: An Approach Motivated by the Jumping Spider Visual System. In: Biologically Motivated Computer Vision, First IEEE International Workshop, BMVC 2000, Seoul, Korea (2000)
  36. G.W. Uetz, E. I. Smith: Asymmetry in a visual signaling character and sexual selection in a wolf spider. Behav Ecol Sociobiol (1999) 45: 87–93
  37. D. P. Harland, R. R. Jackson: Influence of cues from the anterior medial eyes of virtual prey on Portia fimbriata, an araneophagic jumping spider. The Journal of Experimental Biology 205, 1861–1868 (2002)
  38. R. R. Jackson, D. P. Harland: One small leap for the jumping spider but a giant step for vision science. The Journal of Experimental Biology, JEB Classics, p. 2129-2132
  39. P. M. de Omena, and G. Q. Romero: Using visual cues of microhabitat traits to find home: the case study of a bromeliad-living jumping spider (Salticidae). Behavioral Ecology 21:690–695 (2010)



If light passes through a prism, it is dispersed into a colour spectrum ranging from red to violet. The wavelength of red light ranges from about 650 nm to 700 nm, while violet light lies at around 400 nm to 420 nm. This is roughly the electromagnetic range detectable by the human eye.

Colour spectrum produced by a prism
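The wavelength bands above can be sketched as a small lookup. This is an illustrative sketch only: the function name is hypothetical, and only the red and violet boundaries quoted in the text are used, with everything in between left generic.

```python
# Illustrative sketch: map a wavelength (in nm) to the coarse colour bands
# mentioned above. The band boundaries are the approximate values from the
# text, not exact physical limits.
def spectral_band(wavelength_nm):
    """Return a coarse colour label for a visible wavelength, else None."""
    if 650 <= wavelength_nm <= 700:
        return "red"
    if 400 <= wavelength_nm <= 420:
        return "violet"
    if 420 < wavelength_nm < 650:
        return "intermediate (blue/green/yellow/orange)"
    return None  # outside the range detectable by the human eye
```

For example, `spectral_band(680)` returns `"red"`, while an infrared wavelength such as 800 nm falls outside the visible range and returns `None`.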

Colour Models

The colour triangle is often used to illustrate additive colour mixing. The triangle encloses the visible spectrum, with a white point located at its centre. Because red (700 nm), green (546 nm) and blue (435 nm) light mix additively, every colour inside the triangle can be produced by combining these three primaries.

The RGB color-triangle
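Additive mixing of the three primaries can be sketched in code. This is a minimal sketch under the assumption that colours are 8-bit (R, G, B) triples and mixing is modelled as a clipped channel-wise sum; the function name is hypothetical.

```python
# Minimal sketch of additive colour mixing in RGB space: channels add up,
# and each channel saturates (clips) at the 8-bit maximum of 255.
def mix_additive(*colours):
    """Additively mix 8-bit RGB triples, clipping each channel at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colours))

red = (255, 0, 0)
green = (0, 255, 0)
blue = (0, 0, 255)

print(mix_additive(red, green))        # red + green gives yellow
print(mix_additive(red, green, blue))  # all three primaries give white
```

Mixing red and green yields (255, 255, 0), i.e. yellow, and mixing all three primaries at full intensity yields (255, 255, 255), the white point at the centre of the triangle.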

History of Sensory Systems

This Wikibook was started by engineers studying at ETH Zurich as part of the course Computational Simulations of Sensory Systems. The course combines physiology, with an emphasis on the sensory systems, with programming and signal processing. There is a plethora of information on these topics on the internet and in the literature, but a distinct lack of concise texts and books on the fusion of the three. What is missing is a structured and thorough overview of biology and biological systems from an engineering point of view, and that is the gap this book tries to fill. We will start off with the Visual System, focusing on the biological and physiological aspects, mainly because this will be used in part to grade our performance in the course; the other part, the programming aspects, has already been evaluated and graded. It is the authors' wish that information on physiology/biology, signal processing AND programming will eventually be added to each of the sensory systems, and that more sections will be added to extend the book in ways not previously thought of.

The original title of the Wikibook, Biological Machines, stressed the technical aspects of sensory systems. As the wikibook evolved, however, it became a comprehensive overview of human sensory systems, with additional emphasis on their technical aspects. This focus is better represented by Sensory Systems, the wikibook's title since December 2011.


Visual System

Auditory System

  • Intraoperative Neurophysiological Monitoring, 2nd Edition, Aage R. Møller, Humana Press 2006, Totowa, New Jersey, pages 55-70
  • The Science and Applications of Acoustics, 2nd Edition, Daniel R. Raichel, Springer Science & Business Media 2006, New York, pages 213-220
  • Physiology of the Auditory System, P. J. Abbas, 1993, in: Cummings Otolaryngology: Head and Neck Surgery, 2nd edition, Mosby Year Book, St. Louis
  • Computer Simulations of Sensory Systems, Lecture Script Ver 1.3 March 2010, T. Haslwanter, Upper Austria University of Applied Sciences, Linz, Austria

Gustatory System

  • Carleton, Alan; Accolla, Riccardo; Simon, Sidney A. (July 2010). "Coding in the mammalian gustatory system". Trends in Neurosciences 33 (7): 326–334. doi:10.1016/j.tins.2010.04.002. 
  • Dalton, P.; Doolittle, N.; Nagata, H.; Breslin, P.A.S. (1 May 2000). Nature Neuroscience 3 (5): 431–432. doi:10.1038/74797. 
  • Gottfried, J (July 2003). "The Nose Smells What the Eye Sees: Crossmodal Visual Facilitation of Human Olfactory Perception". Neuron 39 (2): 375–386. doi:10.1016/S0896-6273(03)00392-1. 
  • Mueller, Ken L.; Hoon, Mark A.; Erlenbach, Isolde; Chandrashekar, Jayaram; Zuker, Charles S.; Ryba, Nicholas J. P. (10 March 2005). "The receptors and coding logic for bitter taste". Nature 434 (7030): 225–229. doi:10.1038/nature03352. 
  • Nitschke, Jack B; Dixon, Gregory E; Sarinopoulos, Issidoros; Short, Sarah J; Cohen, Jonathan D; Smith, Edward E; Kosslyn, Stephen M; Rose, Robert M et al. (5 February 2006). "Altering expectancy dampens neural response to aversive taste in primary taste cortex". Nature Neuroscience 9 (3): 435–442. doi:10.1038/nn1645. 
  • Okubo, Tadashi; Clark, Cheryl; Hogan, Brigid L.M. (February 2009). "Cell Lineage Mapping of Taste Bud Cells and Keratinocytes in the Mouse Tongue and Soft Palate". Stem Cells 27 (2): 442–450. doi:10.1634/stemcells.2008-0611. 
  • Smith, David V; St John, Steven J (August 1999). "Neural coding of gustatory information". Current Opinion in Neurobiology 9 (4): 427–435. doi:10.1016/S0959-4388(99)80064-6. 
  • Yarmolinsky, David A.; Zuker, Charles S.; Ryba, Nicholas J.P. (October 2009). "Common Sense about Taste: From Mammals to Insects". Cell 139 (2): 234–244. doi:10.1016/j.cell.2009.10.001. 
  • Zhao, Grace Q.; Zhang, Yifeng; Hoon, Mark A.; Chandrashekar, Jayaram; Erlenbach, Isolde; Ryba, Nicholas J.P.; Zuker, Charles S. (October 2003). "The Receptors for Mammalian Sweet and Umami Taste". Cell 115 (3): 255–266. doi:10.1016/S0092-8674(03)00844-4. 
  • Kandel, E., Schwartz, J., and Jessell, T. (2000) Principles of Neural Science. 4th edition. McGraw Hill, New York.


This list contains the names of all the authors who have contributed to this text. If you have added, modified or contributed in any way, please add your name to this list.

Name Institution
Thomas Haslwanter Upper Austria University of Applied Sciences / ETH Zurich
Aleksander George Slater Imperial College London / ETH Zurich
Piotr Jozef Sliwa Imperial College London / ETH Zurich
Qian Cheng ETH Zurich
Salomon Wettstein ETH Zurich
Philipp Simmler ETH Zurich
Renate Gander ETH Zurich
Gerick Lee University of Zurich & ETH Zurich
Gabriela Michel ETH Zurich
Peter O'Connor ETH Zurich
Nikhil Biyani ETH Zurich
Mathias Buerki ETH Zurich
Jianwen Sun ETH Zurich
Maurice Göldi University of Zurich
Sofia Jativa ETH Zurich
Salomon Diether ETH Zurich
Arturo Moncada-Torres ETH Zurich
Datta Singh Goolaub ETH Zurich
Stephanie Marquardt University of Zurich & ETH Zurich
Alpha Renner University of Zurich & ETH Zurich
Karlis Kanders University of Zurich & ETH Zurich