Sensory Systems/Print version
Table of contents
Sensory Systems in Humans
- Visual System
- Auditory System
- Vestibular System
- Somatosensory System
- Olfactory System
- Gustatory System
Sensory Systems in Non-Primates
The Wikibook of
Biological Organisms, an Engineer's Point of View.
From Wikibooks: The Free Library
In order to survive - at least on the species level - we continually need to make decisions:
- "Should I cross the road?"
- "Should I run away from the creature in front of me?"
- "Should I eat the thing in front of me?"
- "Or should I try to mate it?"
To help us to make the right decision, and make that decision quickly, we have developed an elaborate system: a sensory system to notice what's going on around us; and a nervous system to handle all that information. And this system is big. VERY big! Our nervous system contains about $10^{11}$ nerve cells (or neurons), and about 10-50 times as many supporting cells. These supporting cells, called glial cells, include oligodendrocytes, Schwann cells, and astrocytes. But do we really need all these cells?
Keep it simple: Unicellular Creatures
The answer is: "No!", we do not REALLY need that many cells in order to survive. Creatures consisting of a single cell can be large, can respond to multiple stimuli, and can also be remarkably smart!
We often think of cells as really small things. But Xenophyophores (see image) are unicellular organisms that are found throughout the world's oceans and can get as large as 20 centimetres in diameter.
And even with this single cell, those organisms can respond to a number of stimuli. For example look at a creature from the group Paramecium: the paramecium is a group of unicellular ciliate protozoa formerly known as slipper animalcules, from their slipper shape. (The corresponding word in German is Pantoffeltierchen.) Despite the fact that these creatures consist of only one cell, they are able to respond to different environmental stimuli, e.g. to light or to touch.
And such unicellular organisms can be amazingly smart: the plasmodium of the slime mould Physarum polycephalum is a large amoeba-like cell consisting of a dendritic network of tube-like structures. This single-cell creature manages to connect food sources by the shortest connections (Nakagaki et al. 2000), and can even build efficient, robust and optimized network structures that resemble the Tokyo underground system (Tero et al. 2010). In addition, it has somehow developed the ability to read its tracks and tell whether it has been in a place before: this way it can save energy and avoid foraging through locations where effort has already been spent (Reid et al. 2012).
On the one hand, the approach used by the paramecium cannot be too bad, as they have been around for a long time. On the other hand, a single cell mechanism cannot be as flexible and as accurate in its responses as a more refined version of creatures, which use a dedicated, specialized system just for the registration of the environment: a Sensory System.
Not so simple: Three-hundred-and-two Neurons
While humans have hundreds of millions of sensory nerve cells, and about $10^{11}$ nerve cells in total, other creatures get away with significantly fewer. A famous one is Caenorhabditis elegans, a nematode with a total of 302 neurons.
C. elegans is one of the simplest organisms with a nervous system, and it was the first multicellular organism to have its genome completely sequenced. (The sequence was published in 1998.) And not only do we know its complete genome, we also know the connectivity between all 302 of its neurons. In fact, the developmental fate of every single somatic cell (959 in the adult hermaphrodite; 1031 in the adult male) has been mapped out. We know, for example, that only 2 of the 302 neurons are responsible for chemotaxis (“movement guided by chemical cues”, i.e. essentially smelling). Nevertheless, there is still a lot of research conducted – also on its smelling - in order to understand how its nervous system works!
General principles of Sensory Systems
Based on the example of the visual system, the general principle underlying our neuro-sensory system can be described as below:
All sensory systems are based on
- a Signal, i.e. a physical stimulus that provides information about our surroundings.
- the Collection of this signal, e.g. by using an ear or the lens of an eye.
- the Transduction of this stimulus into a nerve signal.
- the Processing of this information by our nervous system.
- And the generation of a resulting Action.
While the underlying physiology restricts the maximum frequency of our nerve cells to about 1 kHz, more than one million times slower than modern computers, our nervous system still manages to perform stunningly difficult tasks with apparent ease. The trick: there are lots of nerve cells (about $10^{11}$), and they are massively connected (one nerve cell can have up to 150'000 connections with other nerve cells).
The role of our "senses" is to transduce relevant information from the world surrounding us into a type of signal that is understood by the next cells receiving that signal: the "Nervous System". (The sensory system is often regarded as part of the nervous system. Here I will try to keep these two apart, with the expression Sensory System referring to the stimulus transduction, and the Nervous System referring to the subsequent signal processing.) Note here that only relevant information is to be transduced by the sensory system! The task of our senses is NOT to show us everything that is happening around us. Instead, their task is to filter out the important bits of the signals around us: electromagnetic signals, chemical signals, and mechanical ones. Our Sensory Systems transduce those environmental variables that are (probably) important to us. And the Nervous System propagates them in such a way that the responses that we take help us to survive, and to pass on our genes.
Types of sensory transducers
- Mechanical receptors
- Balance system (vestibular system)
- Hearing (auditory system)
- Fast adaptation (Meissner’s corpuscle, Pacinian corpuscle) → movement
- Slow adaptation (Merkel disks, Ruffini endings) → shape
- Comment: these signals are transferred fast
- Muscle spindles
- Golgi organs: in the tendons
- Chemical receptors
- Smell (olfactory system)
- Light-receptors (visual system): here we have light-dark receptors (rods), and three different color receptors (cones)
- Heat-sensors (maximum sensitivity at ~ 45°C, signal temperatures < 50°C)
- Cold-sensors (maximum sensitivity at ~ 25°C, signal temperatures > 5°C)
- Comment: The information processing of these signals is similar to those of visual color signals, and is based on differential activity of the two sensors; these signals are slow
- Electro-receptors: for example in the bill of the platypus
- Pain receptors (nociceptors): pain receptors are also responsible for itching; these signals are passed on slowly.
Now what distinguishes neurons from other cells in the human body, like liver cells or fat cells? Neurons are unique, in that they:
- can switch quickly between two states (which can also be done by muscle cells);
- can propagate this change in a specified direction and over longer distances (which cannot be done by muscle cells);
- can signal this state-change effectively to other connected neurons.
While there are more than 50 distinctly different types of neurons, they all share the same structure:
- An input stage, often called dendrites, as the input-area often spreads out like the branches of a tree. Input can come from sensory cells or from other neurons; it can come from a single cell (e.g. a bipolar cell in the retina receives input from a single cone), or from up to 150’000 other neurons (e.g. Purkinje cells in the Cerebellum); and it can be positive (excitatory) or negative (inhibitory).
- An integrative stage: the cell body does the household chores (generating the energy, cleaning up, generating the required chemical substances, etc), combines the incoming signals, and determines when to pass a signal on down the line.
- A conductile stage, the axon: once the cell body has decided to send out a signal, an action potential propagates along the axon, away from the cell body. An action potential is a quick change in the state of a neuron, which lasts for about 1 msec. Note that this defines a clear direction in the signal propagation, from the cell body, to the:
- output Stage: The output is provided by synapses, i.e. the points where a neuron contacts the next neuron down the line, most often by the emission of neurotransmitters (i.e. chemicals that affect other neurons) which then provide an input to the next neuron.
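The four stages listed above can be caricatured as a toy threshold unit. This is only a minimal sketch and not a biophysical model: the function name `toy_neuron`, the weights, and the threshold are hypothetical values chosen for illustration.

```python
import numpy as np


def toy_neuron(inputs, weights, threshold=1.0):
    """Toy four-stage neuron (hypothetical, for illustration only).

    inputs  : activity of the presynaptic cells (input stage, "dendrites")
    weights : positive = excitatory synapse, negative = inhibitory synapse
    Returns 1 if the unit sends a signal down its "axon", otherwise 0.
    """
    inputs = np.asarray(inputs, dtype=float)
    weights = np.asarray(weights, dtype=float)
    drive = np.dot(weights, inputs)   # integrative stage: combine all inputs
    fires = drive >= threshold        # decide whether to pass a signal on
    return int(fires)                 # conductile/output stage: all-or-nothing


# two excitatory synapses and one inhibitory synapse (made-up weights)
w = [0.6, 0.6, -1.0]
print(toy_neuron([1, 1, 0], w))   # excitation only
print(toy_neuron([1, 1, 1], w))   # same excitation plus inhibition
```

With these made-up weights, the two excitatory inputs together exceed the threshold, while the additional inhibitory input keeps the unit silent.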
Principles of Information Processing in the Nervous System
An important principle in the processing of neural signals is parallelism. Signals from different locations have different meaning. This feature, sometimes also referred to as line labeling, is used by the
- Auditory system - to signal frequency
- Gustatory system - to signal sweet or sour
- Visual system - to signal the location of a visual signal
- Vestibular system - to signal different orientations and movements
Sensory information is rarely based on the signal of a single nerve cell. It is typically coded by different patterns of activity in a population of neurons. This principle can be found in all our sensory systems.
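One common way to illustrate such population coding is the "population vector": each neuron votes for its preferred stimulus direction, weighted by its firing rate. The sketch below is a hedged illustration only. The eight neurons, their preferred directions, and the cosine-shaped tuning curves are assumptions made for this example and are not taken from a specific sensory system.

```python
import numpy as np


def population_decode(rates, preferred_deg):
    """Estimate a stimulus direction (degrees) from a population of rates
    using a rate-weighted sum of unit vectors (population vector)."""
    pref = np.deg2rad(np.asarray(preferred_deg, dtype=float))
    rates = np.asarray(rates, dtype=float)
    x = np.sum(rates * np.cos(pref))
    y = np.sum(rates * np.sin(pref))
    return np.rad2deg(np.arctan2(y, x)) % 360


# eight hypothetical neurons with evenly spaced preferred directions
preferred = np.arange(0, 360, 45)
stimulus = 100.0   # no single neuron prefers exactly this direction

# assumed tuning curve: baseline plus cosine, so rates are never negative
rates = 1.0 + np.cos(np.deg2rad(stimulus - preferred))

estimate = population_decode(rates, preferred)
print("decoded direction: %.1f deg" % estimate)
```

No individual neuron signals the stimulus direction, yet the pattern of activity across the population recovers it.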
The structure of the connections between nerve cells is not static. Instead it can be modified, to incorporate experiences that we have made. Thereby nature walks a thin line:
- If we learn too slowly, we might not make it. One example is the "Eskimo curlew", an American bird which may be extinct by now. In the last century (and the one before), this bird was shot in large numbers. The mistake of the bird was: when some of them were shot, the others turned around, maybe to see what's up. So they were shot in turn - until the birds were essentially gone. The lesson: if you learn too slowly (i.e. to run away when all your mates are killed), your species might not make it.
- On the other hand, we must not learn too fast, either. For example, the monarch butterfly migrates. But it takes them so long to get from "start" to "finish" that the migration cannot be done by one butterfly alone. In other words, no single butterfly makes the whole journey. Nevertheless, the genetic disposition still tells the butterflies where to go, and when they are there. If they learned any faster, they could never store the necessary information in their genes. In contrast to other cells in the human body, nerve cells are not regenerated.
Simulation of Neural Systems
Simulating Action Potentials
The "action potential" is the stereotypical voltage change that is used to propagate signals in the nervous system.
With the mechanisms described below, an incoming stimulus (of any sort) can lead to a change in the voltage potential of a nerve cell. Up to a certain threshold, that's all there is to it ("Failed initiations" in Fig. 4). But when the threshold of the voltage-gated ion channels is reached, a feedback reaction sets in that almost immediately opens the Na+ ion channels completely ("Depolarization" below). This reaches a point where the permeability for Na+ (which in the resting state is about 1% of the permeability for K+) is 20 times larger than that for K+. As a result, the voltage rises from about -60 mV to about +50 mV. At that point internal reactions start to close (and block) the Na+ channels, and open the K+ channels to restore the equilibrium state. During this "Refractory period" of about 1 ms, no depolarization can elicit an action potential. Only when the resting state is reached can new action potentials be triggered.
To simulate an action potential, we first have to define the different elements of the cell membrane, and how to describe them analytically.
The cell membrane is made up of a water-repelling, almost impermeable double layer of lipids. The real power in processing signals does not come from this lipid bilayer, but from ion channels that are embedded in it. Ion channels are proteins which can selectively be opened for certain types of ions. (This selectivity is achieved by the geometrical arrangement of the amino acids which make up the ion channels.) In addition to the Na+ and K+ ions mentioned above, ions that are typically found in the nervous system are the cations Ca2+ and Mg2+, and the anion Cl-.
States of ion channels
Ion channels can take on one of three states:
- Open (For example, an open Na-channel lets Na+ ions pass, but blocks all other types of ions).
- Closed, with the option to open up.
- Closed, unconditionally.
The typical default situation – when nothing is happening – is characterized by K+ channels that are open, and the other channels closed. In that case two forces determine the cell voltage:
- The (chemical) concentration difference between the intra-cellular and extra-cellular concentration of K+, which is created by the continuous activity of the ion pumps described above.
- The (electrical) voltage difference between the inside and outside of the cell.
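The equilibrium is defined by the Nernst equation:
$$E_X = \frac{RT}{zF} \ln \frac{[X]_o}{[X]_i}$$
R ... gas constant, T ... temperature, z ... ion valence, F ... Faraday constant, [X]o/i ... ion concentration outside/inside. At 25 °C, RT/F is about 25 mV, which leads to a resting voltage of
$$E_X = \frac{25\,\mathrm{mV}}{z} \ln \frac{[X]_o}{[X]_i} \approx \frac{58\,\mathrm{mV}}{z} \log_{10} \frac{[X]_o}{[X]_i}$$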
With typical K+ concentrations inside and outside of neurons, this yields $E_{K^+} \approx -75\,\mathrm{mV}$. If the ion channels for K+, Na+ and Cl- are considered simultaneously, the equilibrium situation is characterized by the Goldman equation
$$V_m = \frac{RT}{F} \ln \frac{P_{K}[K^+]_o + P_{Na}[Na^+]_o + P_{Cl}[Cl^-]_i}{P_{K}[K^+]_i + P_{Na}[Na^+]_i + P_{Cl}[Cl^-]_o}$$
where $P_i$ denotes the permeability of ion "i", and $[i]$ its concentration. Using typical ion concentrations, the cell has in its resting state a negative polarity of about -60 mV.
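The Nernst and Goldman relations are easy to evaluate numerically. The following is a minimal sketch under stated assumptions: the concentrations (in mM) and the Cl- permeability are illustrative values picked for this example, not values given in this text, while the Na+ permeabilities (1% of K+ at rest, 20 times K+ at the peak of the action potential) follow the ratios quoted above. The function names are hypothetical.

```python
import math

R = 8.314       # gas constant, J/(mol*K)
F = 96485.0     # Faraday constant, C/mol


def thermal_voltage_mv(temp_c=25.0):
    """RT/F in millivolts at the given temperature in degrees Celsius."""
    return 1000.0 * R * (temp_c + 273.15) / F


def nernst_mv(c_out, c_in, z=1, temp_c=25.0):
    """Equilibrium potential (mV) of a single ion species with valence z."""
    return thermal_voltage_mv(temp_c) / z * math.log(c_out / c_in)


def goldman_mv(c_out, c_in, perm, temp_c=25.0):
    """Membrane potential (mV) for K+, Na+ and Cl- considered together.
    Cl- is an anion, so its inside and outside concentrations swap places."""
    num = perm["K"] * c_out["K"] + perm["Na"] * c_out["Na"] + perm["Cl"] * c_in["Cl"]
    den = perm["K"] * c_in["K"] + perm["Na"] * c_in["Na"] + perm["Cl"] * c_out["Cl"]
    return thermal_voltage_mv(temp_c) * math.log(num / den)


# ASSUMED illustrative concentrations in mM (not taken from the text)
outside = {"K": 5.0, "Na": 150.0, "Cl": 120.0}
inside = {"K": 100.0, "Na": 15.0, "Cl": 10.0}

# relative permeabilities; the Cl- value is an assumption
rest_perm = {"K": 1.0, "Na": 0.01, "Cl": 0.45}   # Na+ at 1% of K+
peak_perm = {"K": 1.0, "Na": 20.0, "Cl": 0.45}   # Na+ at 20 times K+

e_k = nernst_mv(outside["K"], inside["K"])
v_rest = goldman_mv(outside, inside, rest_perm)
v_peak = goldman_mv(outside, inside, peak_perm)

print("RT/F at 25 C : %.1f mV" % thermal_voltage_mv())
print("E_K          : %.0f mV" % e_k)
print("V at rest    : %.0f mV" % v_rest)
print("V at AP peak : %.0f mV" % v_peak)
```

With these assumed numbers the resting value comes out strongly negative and the peak value positive, in line with the swing from roughly -60 mV to roughly +50 mV described for the action potential. The exact figures depend on the concentrations chosen.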
Activation of Ion Channels
The nifty feature of the ion channels is the fact that their permeability can be changed by
- A mechanical stimulus (mechanically activated ion channels)
- A chemical stimulus (ligand activated ion channels)
- Or by an external voltage (voltage gated ion channels)
- Occasionally ion channels directly connect two cells, in which case they are called gap junction channels.
- Sensory systems are essentially based on ion channels, which are activated by a mechanical stimulus (pressure, sound, movement), a chemical stimulus (taste, smell), or an electromagnetic stimulus (light), and produce a "neural signal", i.e. a voltage change in a nerve cell.
- Action potentials use voltage gated ion channels, to change the "state" of the neuron quickly and reliably.
- The communication between nerve cells predominantly uses ion channels that are activated by neurotransmitters, i.e. chemicals emitted at a synapse by the preceding neuron. This provides the maximum flexibility in the processing of neural signals.
Modeling a voltage dependent ion channel
Ohm's law relates the resistance of a resistor, R, to the current it passes, I, and the voltage drop across the resistor, V:
$$V = I R \quad \text{or} \quad I = g V$$
where $g = 1/R$ is the conductance of the resistor. If you now suppose that the conductance is directly proportional to the probability that the channel is in the open conformation, then this equation becomes
$$I = g_{max} \, n \, V$$
where $g_{max}$ is the maximum conductance of the channel, and n is the probability that the channel is in the open conformation.
Example: the K-channel
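Voltage gated potassium channels (Kv) can be only open or closed. Let α be the rate at which the channel goes from closed to open, and β the rate at which the channel goes from open to closed.
Since n is the probability that the channel is open, the probability that the channel is closed has to be (1-n), since all channels are either open or closed. Changes in the conformation of the channel can therefore be described by the formula
$$\frac{dn}{dt} = \alpha (1-n) - \beta n$$
Note that α and β are voltage dependent! With a technique called "voltage-clamping", Hodgkin and Huxley determined these rates in 1952, and they came up with something like
$$\alpha(V) = \frac{0.01\,(V+10)}{\exp\left(\frac{V+10}{10}\right) - 1}, \qquad \beta(V) = 0.125\, \exp\left(\frac{V}{80}\right)$$
(with V in mV and measured relative to the resting potential, in their original sign convention, and the rates in 1/ms).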
If you only want to model a voltage-dependent potassium channel, these would be the equations to start from. (For voltage gated Na channels, the equations are a bit more difficult, since those channels have three possible conformations: open, closed, and inactive.)
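As a starting point, the two-state scheme can be integrated numerically for a voltage-clamp experiment, where the voltage is held fixed and only the open probability n changes. This is a minimal sketch under assumptions: it uses the Hodgkin-Huxley potassium rate functions in the now common sign convention, in which V is the depolarisation from rest in mV and positive values mean depolarisation, and a simple forward-Euler step. The function names are hypothetical.

```python
import numpy as np


def alpha_n(v):
    """Closed -> open rate (1/ms); v = depolarisation from rest in mV.
    Note: the expression is undefined exactly at v = 10 mV (0/0)."""
    return 0.01 * (10.0 - v) / (np.exp((10.0 - v) / 10.0) - 1.0)


def beta_n(v):
    """Open -> closed rate (1/ms)."""
    return 0.125 * np.exp(-v / 80.0)


def clamp_open_probability(v_clamp, n0, t_end=20.0, dt=0.01):
    """Euler integration of dn/dt = alpha*(1-n) - beta*n at a clamped voltage."""
    a, b = alpha_n(v_clamp), beta_n(v_clamp)
    steps = int(round(t_end / dt))
    n = np.empty(steps + 1)
    n[0] = n0
    for k in range(steps):
        n[k + 1] = n[k] + dt * (a * (1.0 - n[k]) - b * n[k])
    return n


# open probability at rest: steady state alpha / (alpha + beta) at v = 0
n_rest = alpha_n(0.0) / (alpha_n(0.0) + beta_n(0.0))

# step the clamp to 50 mV above rest and follow n(t) for 20 ms
n_trace = clamp_open_probability(v_clamp=50.0, n0=n_rest)
n_inf = alpha_n(50.0) / (alpha_n(50.0) + beta_n(50.0))

print("n at rest        : %.3f" % n_rest)
print("n after the step : %.3f (steady state %.3f)" % (n_trace[-1], n_inf))
```

The open probability relaxes exponentially from its resting value to the new steady state α/(α+β), with time constant 1/(α+β).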
Hodgkin Huxley equation
The feedback-loop of voltage-gated ion channels mentioned above made it difficult to determine their exact behaviour. In a first approximation, the shape of the action potential can be explained by analyzing the electrical circuit of a single axonal compartment of a neuron, consisting of the following components: 1) membrane capacitance, 2) Na channel, 3) K channel, 4) leakage current:
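The final equations in the original Hodgkin-Huxley model, where the currents of chloride ions and other leakage currents were combined, were as follows:
$$C_m \frac{dV}{dt} = -\bar{g}_{Na}\, m^3 h\, (V - E_{Na}) - \bar{g}_{K}\, n^4\, (V - E_{K}) - \bar{g}_{L}\, (V - E_{L}) + I_{ext}$$
$$\frac{dx}{dt} = \alpha_x(V)\,(1 - x) - \beta_x(V)\, x, \qquad x \in \{m, h, n\}$$
where $\bar{g}$ denotes the maximum conductances, E the equilibrium potentials, and $I_{ext}$ an external current, and where m, h, and n are time- and voltage-dependent functions which describe the membrane permeability. For example, for the K channels n obeys the equations described above, which were determined experimentally with voltage-clamping. These equations describe the shape and propagation of the action potential with high accuracy! The model can be solved easily with open source tools, e.g. the Python dynamical systems toolbox PyDSTool. A simple solution file is available (Haslwanter 2012, "Hodgkin-Huxley Simulations"), and the output is shown below.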
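For readers who want to experiment without an extra toolbox, the model can also be integrated with a few lines of NumPy. The following is a hedged sketch and not the solution file mentioned above. It assumes the widely used textbook parameter set in the modern sign convention (resting potential near -65 mV) and a plain forward-Euler step, and it simulates a single compartment, so it reproduces the shape of the action potential but not its propagation.

```python
import numpy as np

# ASSUMED textbook parameters (modern convention, rest near -65 mV)
C_M = 1.0                              # membrane capacitance, uF/cm^2
G_NA, G_K, G_L = 120.0, 36.0, 0.3      # maximum conductances, mS/cm^2
E_NA, E_K, E_L = 50.0, -77.0, -54.4    # equilibrium potentials, mV


def gating_rates(v):
    """(alpha, beta) pairs in 1/ms for the gates m, h and n at voltage v (mV)."""
    a_m = 0.1 * (v + 40.0) / (1.0 - np.exp(-(v + 40.0) / 10.0))
    b_m = 4.0 * np.exp(-(v + 65.0) / 18.0)
    a_h = 0.07 * np.exp(-(v + 65.0) / 20.0)
    b_h = 1.0 / (1.0 + np.exp(-(v + 35.0) / 10.0))
    a_n = 0.01 * (v + 55.0) / (1.0 - np.exp(-(v + 55.0) / 10.0))
    b_n = 0.125 * np.exp(-(v + 65.0) / 80.0)
    return (a_m, b_m), (a_h, b_h), (a_n, b_n)


def simulate_hh(i_ext, t_end=50.0, dt=0.01, v0=-65.0):
    """Membrane voltage (mV) for a constant injected current i_ext (uA/cm^2)."""
    steps = int(round(t_end / dt))
    v = np.empty(steps + 1)
    v[0] = v0
    # start every gate at its steady state alpha / (alpha + beta)
    m, h, n = [a / (a + b) for a, b in gating_rates(v0)]
    for k in range(steps):
        (a_m, b_m), (a_h, b_h), (a_n, b_n) = gating_rates(v[k])
        i_na = G_NA * m**3 * h * (v[k] - E_NA)   # sodium current
        i_k = G_K * n**4 * (v[k] - E_K)          # potassium current
        i_l = G_L * (v[k] - E_L)                 # leakage current
        v[k + 1] = v[k] + dt * (i_ext - i_na - i_k - i_l) / C_M
        m += dt * (a_m * (1.0 - m) - b_m * m)
        h += dt * (a_h * (1.0 - h) - b_h * h)
        n += dt * (a_n * (1.0 - n) - b_n * n)
    return v


v_rest = simulate_hh(i_ext=0.0)     # no stimulus: the cell stays at rest
v_spike = simulate_hh(i_ext=10.0)   # constant stimulus: action potentials

print("no input    : peak voltage %.1f mV" % v_rest.max())
print("10 uA/cm^2  : peak voltage %.1f mV" % v_spike.max())
```

Plotting `v_spike` against time shows the depolarization, repolarization, and undershoot phases described earlier.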
Modeling the Action Potential Generation: The Fitzhugh-Nagumo model
The Hodgkin-Huxley model has four dynamical variables: the voltage V, the probability that the K channel is open, n(V), the probability that the Na channel is open given that it was closed previously, m(V), and the probability that the Na channel is open given that it was inactive previously, h(V). A simplified model of action potential generation in neurons is the Fitzhugh-Nagumo (FN) model. Unlike the Hodgkin-Huxley model, the FN model has only two dynamic variables: the variables V and m are combined into a single variable v, and the variables n and h into a single variable r:
$$\frac{dv}{dt} = c \left( v - \frac{v^3}{3} + r + I \right)$$
$$\frac{dr}{dt} = -\frac{1}{c} \left( v - a + b\, r \right)$$
Here a, b and c are constant parameters, and I is an external current injected into the neuron. Since the FN model has only two dynamic variables, its full dynamics can be explored using phase plane methods (sample solution in Python: Haslwanter 2012, "Fitzhugh-Nagumo Model").
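A two-variable model of this kind is simple enough to integrate by hand. The sketch below is an illustration under assumptions, not the sample solution referred to above. It assumes the commonly used form dv/dt = c(v - v³/3 + r + I), dr/dt = -(v - a + b r)/c, FitzHugh's classic parameter values a = 0.7, b = 0.8, c = 3, and a forward-Euler step. The injected current of -0.8 is a value chosen here to lie well inside the oscillatory range (with these signs, a negative I is the excitatory direction).

```python
import numpy as np


def simulate_fn(i_ext, a=0.7, b=0.8, c=3.0, v0=0.0, r0=0.0,
                t_end=200.0, dt=0.01):
    """Forward-Euler integration of the two FN equations for constant i_ext."""
    steps = int(round(t_end / dt))
    v = np.empty(steps + 1)
    r = np.empty(steps + 1)
    v[0], r[0] = v0, r0
    for k in range(steps):
        dv = c * (v[k] - v[k]**3 / 3.0 + r[k] + i_ext)   # fast variable
        dr = -(v[k] - a + b * r[k]) / c                  # slow recovery variable
        v[k + 1] = v[k] + dt * dv
        r[k + 1] = r[k] + dt * dr
    return v, r


v_quiet, r_quiet = simulate_fn(i_ext=0.0)    # settles to a resting point
v_osc, r_osc = simulate_fn(i_ext=-0.8)       # sustained oscillation

half = len(v_osc) // 2                       # ignore the initial transient
print("I =  0.0: v swing %.3f" % np.ptp(v_quiet[half:]))
print("I = -0.8: v swing %.3f" % np.ptp(v_osc[half:]))
```

Plotting r against v for both runs gives the phase-plane picture: a trajectory spiralling into a fixed point without input, and a closed limit cycle with input.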
Simulating a Single Neuron with Positive Feedback
The following two examples are taken from Anastasio (2010). This book provides a fantastic introduction into modeling simple neural systems, and gives a good understanding of the underlying information processing.
Let us first look at the response of a single neuron, with an input x(t), and with feedback onto itself. The weight of the input is v, and the weight of the feedback w. The response y(t) of the neuron is given by
$$y(t) = w\, y(t-1) + v\, x(t-1)$$
This shows how already very simple simulations can capture signal processing properties of real neurons.
```python
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt


def oneUnitWithPosFB():
    '''Simulates a single model neuron with positive feedback'''

    # set input flag (1 for impulse, 2 for step)
    inFlag = 1

    cut = -np.inf       # set cut-off
    sat = np.inf        # set saturation

    tEnd = 100          # set last time step
    nTs = tEnd + 1      # find the number of time steps

    v = 1               # set the input weight
    w = 0.95            # set the feedback weight

    x = np.zeros(nTs)   # open (define) an input hold vector
    start = 11          # set a start time for the input
    if inFlag == 1:     # if the input should be a pulse
        x[start] = 1    # then set the input at only one time point
    elif inFlag == 2:   # if the input instead should be a step, then
        x[start:nTs] = np.ones(nTs - start)   # keep it up until the end

    y = np.zeros(nTs)   # open (define) an output hold vector
    for t in range(1, nTs):            # at every time step (skipping the first)
        y[t] = w*y[t-1] + v*x[t-1]     # compute the output
        y[t] = max(cut, y[t])          # impose the cut-off constraint
        y[t] = min(sat, y[t])          # impose the saturation constraint

    # plot results (no frills)
    tBase = np.arange(tEnd + 1)
    plt.subplot(211)
    plt.plot(tBase, x)
    plt.axis([0, tEnd, 0, 1.1])
    plt.xlabel('Time Step')
    plt.ylabel('Input')

    plt.subplot(212)
    plt.plot(tBase, y)
    plt.xlabel('Time Step')
    plt.ylabel('Output')
    plt.show()


if __name__ == '__main__':
    oneUnitWithPosFB()
```
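For the impulse input used in the listing above, the recursion can also be solved by hand, which gives a quick check on the simulated output. The following short derivation assumes the update rule $y(t) = w\,y(t-1) + v\,x(t-1)$ from the code, no active cut-off or saturation, and $y = 0$ before the impulse at time $t_0$ (the variable `start`):

```latex
y(t_0 + 1) = v, \qquad
y(t_0 + 1 + k) = v\, w^{k} \quad (k \ge 0), \qquad
k_{1/e} = -\frac{1}{\ln w} \approx 19.5 \ \text{steps for } w = 0.95
```

So a single impulse produces an exponentially decaying "memory" of the input. For a step input (`inFlag = 2`) the same recursion instead approaches the steady state $y = v/(1-w)$, which is 20 for the weights used here. For $w \geq 1$ the output would no longer decay or settle.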
Simulating a Simple Neural System
Even very simple neural systems can display a surprisingly versatile set of behaviors. An example is Wilson's model of the locust-flight central pattern generator. Here the system is described by
$$\vec{y}(t) = W\, \vec{y}(t-1) + \vec{v}\, x(t-1)$$
W is the connection matrix describing the recurrent connections of the neurons, $\vec{v}$ contains the input weights, and x describes the input to the system.
```python
import numpy as np
import matplotlib.pyplot as plt


def printInfo(text, value):
    print(text)
    print(np.round(value, 2))


def WilsonCPG():
    '''implements a linear version of Wilson's
    locust flight central pattern generator (CPG)'''

    # input weight matrix (vector): only unit two receives input
    V = np.array([0., 1., 0., 0.])

    # feedback weight matrix
    W = np.array([[0.9,   0.2,  0.,   0.],      # feedback weights to unit one
                  [-0.95, 0.4,  -0.5, 0.],      # ... to unit two
                  [0.,    -0.5, 0.4,  -0.95],   # ... to unit three
                  [0.,    0.,   0.2,  0.9]])    # ... to unit four

    tEnd = 100                # set end time
    tVec = np.arange(tEnd)    # set time vector
    nTs = tEnd                # find number of time steps
    x = np.zeros(nTs)         # zero input vector
    fly = 11                  # set time to start flying
    x[fly] = 1                # set input to one at fly time

    y = np.zeros((4, nTs))    # zero output vector
    for t in range(1, nTs):   # for each time step
        y[:, t] = W.dot(y[:, t-1]) + V*x[t-1]   # compute output

    # These calculations are interesting, but not absolutely necessary
    (eVal, eVec) = np.linalg.eig(W)         # find eigenvalues and eigenvectors
    magEVal = np.abs(eVal)                  # find magnitude of eigenvalues
    angEVal = np.angle(eVal)*(180/np.pi)    # find angles of eigenvalues

    printInfo('Eigenvectors: --------------', eVec)
    printInfo('Eigenvalues: ---------------', eVal)
    printInfo('Angle of Eigenvalues: ------', angEVal)

    # plot results (units y2 and y3 only)
    plt.figure()
    plt.rcParams['font.size'] = 14      # set the default fontsize
    plt.rcParams['lines.linewidth'] = 1
    plt.plot(tVec, x, 'k-.', tVec, y[1, :], 'k', tVec, y[2, :], 'k--',
             linewidth=2.5)
    plt.axis([0, tEnd, -0.6, 1.1])
    plt.xlabel('Time Step', fontsize=14)
    plt.ylabel('Input and Unit Responses', fontsize=14)
    plt.legend(('Input', 'Left Motoneuron', 'Right Motoneuron'))
    plt.show()


if __name__ == '__main__':
    plt.close('all')
    WilsonCPG()
```
- T. Haslwanter (2012). "Hodgkin-Huxley Simulations [Python]". Private communication. http://work.thaslwanter.at/CSS/Code/HH_model.py
- T. Haslwanter (2012). "Fitzhugh-Nagumo Model [Python]". Private communication. http://work.thaslwanter.at/CSS/Code/Fitzhugh_Nagumo.py
- T. Anastasio (2010). "Tutorial on Neural Systems Modeling". http://www.sinauer.com/detail.php?id=3396
Generally speaking, visual systems rely on electromagnetic (EM) waves to give an organism more information about its surroundings. This information could be regarding potential mates, dangers and sources of sustenance. Different organisms have different constituents that make up what is referred to as a visual system.
The complexity of eyes ranges from something as simple as an eye spot, which is nothing more than a collection of photosensitive cells, to a fully fledged camera eye. If an organism has different types of photosensitive cells, or cells sensitive to different wavelength ranges, the organism would theoretically be able to perceive colour or at the very least colour differences. Polarisation, another property of EM radiation, can be detected by some organisms, with insects and cephalopods having the highest accuracy.
Please note that in this text the focus is on using EM waves to see. Granted, some organisms have evolved alternative ways of obtaining sight, or at the very least of supplementing what they see with extra-sensory information, for example whales or bats, which use echolocation. This may be seeing in some sense of the word, but it is not entirely correct. Additionally, vision and visual are words most often associated with EM waves in the visual wavelength range, which is normally defined by the wavelength limits of human vision. Since some organisms detect EM waves with frequencies below and above those detected by humans, a better definition must be made. We therefore define the visual wavelength range as wavelengths of EM between 300nm and 800nm. This may seem arbitrary to some, but selecting the wrong limits would render parts of some birds' vision as non-vision. Also, with this range of wavelengths, we have defined the thermal vision of certain organisms, for example snakes, as non-vision. Therefore snakes using their pit organs, which are sensitive to EM between 5000nm and 30,000nm (IR), do not "see", but somehow "feel" from afar, even though blind specimens have been documented targeting and attacking particular body parts.
First, the different types of visual sensory organs are described briefly. This is followed by a thorough explanation of the components of human vision and of the signal processing of the visual pathway in humans, and finally by an example of the perceptual outcome of these stages.
Vision, or the ability to see, depends on visual system sensory organs, or eyes. There are many different constructions of eyes, ranging in complexity depending on the requirements of the organism. The different constructions have different capabilities, are sensitive to different wavelengths and have differing degrees of acuity. They also require different processing to make sense of the input, and different numbers of eyes to work optimally. The ability to detect and decipher EM has proved to be a valuable asset to most forms of life, leading to an increased chance of survival for organisms that utilise it. In environments with insufficient light, or none at all, vision gives lifeforms no added advantage, which ultimately has resulted in atrophy of visual sensory organs with subsequent increased reliance on other senses (e.g. some cave dwelling animals, bats etc.). Interestingly enough, it appears that visual sensory organs are tuned to the optical window, which is defined as the EM wavelengths (between 300nm and 1100nm) that pass through the atmosphere and reach the ground. This is shown in the figure below. You may notice that there exist other "windows": an IR window, which explains to some extent the thermal "vision" of snakes, and a radiofrequency (RF) window, which no known lifeforms are able to detect.
Through time evolution has yielded many eye constructions, and some of them have evolved multiple times, yielding similarities for organisms that have similar niches. There is one underlying aspect that is essentially identical, regardless of species, or complexity of sensory organ type, the universal usage of light-sensitive proteins called opsins. Without focusing too much on the molecular basis though, the various constructions can be categorised into distinct groups:
- Spot Eyes
- Pit Eyes
- Pinhole Eyes
- Lens Eyes
- Refractive Cornea Eyes
- Reflector Eyes
- Compound Eyes
The least complicated configuration of eyes enables organisms to simply sense the ambient light, so that the organism knows whether there is light or not. It is normally simply a collection of photosensitive cells in a cluster in the same spot, thus sometimes referred to as spot eyes, eye spot or stemma. By either adding more angular structures or recessing the spot eyes, an organism gains access to directional information as well, which is a vital requirement for image formation. These so called pit eyes are by far the most common types of visual sensory organs, and can be found in over 95% of all known species.
Taking this approach to the obvious extreme leads to the pit becoming a cavernous structure, which increases the sharpness of the image, alas at a loss in intensity. In other words, there is a trade-off between intensity or brightness and sharpness. An example of this can be found in the Nautilus, species belonging to the family Nautilidae, organisms considered to be living fossils. These are the only known species with this type of eye, referred to as the pinhole eye, and it is completely analogous to the pinhole camera or the camera obscura. In addition, like more advanced cameras, Nautili are able to adjust the size of the aperture, thereby increasing or decreasing the resolution of the eye at a respective decrease or increase in image brightness. As in the camera, the way to alleviate the intensity/resolution trade-off problem is to include a lens, a structure that focuses the light onto a central area, which most often has a higher density of photo-sensors. By adjusting the shape of the lens and moving it around, and controlling the size of the aperture or pupil, organisms can adapt to different conditions and focus on particular regions of interest in any visual scene. The last upgrade to the various eye constructions already mentioned is the inclusion of a refractive cornea. Eyes with this structure have delegated two thirds of the total optic power of the eye to the high refractive index liquid inside the cornea, enabling very high resolution vision. Most land animals, including humans, have eyes of this particular construct. Additionally, many variations of lens structure, lens number, photosensor density, fovea shape, fovea number, pupil shape etc. exist, always to increase the chances of survival for the organism in question. These variations lead to a varied outward appearance of eyes, even within a single eye construction category.
Demonstrating this point, a collection of photographs of animals with the same eye category (refractive cornea eyes) is shown below.
An alternative to the lens approach, called reflector eyes, can be found in for example mollusks. Instead of the conventional way of focusing light to a single point in the back of the eye using a lens or a system of lenses, these organisms have mirror-like structures inside the chamber of the eye that reflect the light into a central portion, much like a parabolic dish. Although there are no known examples of organisms with reflector eyes capable of image formation, at least one species of fish, the spookfish (Dolichopteryx longipes), uses them in combination with "normal" lensed eyes.
The last group of eyes, found in insects and crustaceans, is called compound eyes. These eyes consist of a number of functional sub-units called ommatidia, each consisting of a facet, or front surface, a transparent crystalline cone and photo-sensitive cells for detection. In addition, each of the ommatidia is separated from its neighbours by pigment cells, ensuring the incoming light is as parallel as possible. The combination of the outputs of each of these ommatidia forms a mosaic image, with a resolution proportional to the number of ommatidia units. For example, if humans had compound eyes, the eyes would have to cover our entire faces to retain the same resolution. As a note, there are many types of compound eyes, but delving too deep into this topic is beyond the scope of this text.
Not only the type of eyes varies, but also the number of eyes. As you are well aware, humans usually have two eyes. Spiders, on the other hand, have a varying number of eyes, with most species having 8. Normally the different pairs of eyes of a spider also differ in size, and the differing sizes have different functions. For example, in jumping spiders 2 larger front-facing eyes give the spider excellent visual acuity, which is used mainly to target prey. 6 smaller eyes have much poorer resolution, but help the spider to avoid potential dangers. Two photographs of the eyes of a jumping spider and the eyes of a wolf spider are shown to demonstrate the variability in the eye topologies of arachnids.
Anatomy of the Visual System
We humans are visual creatures, and our eyes are correspondingly complicated, with many components. In this chapter, an attempt is made to describe these components, thus giving some insight into the properties and functionality of human vision.
Getting inside of the eyeball - Pupil, iris and the lens
Light rays enter the eye structure through the black aperture, or pupil, in the front of the eye. The black appearance is due to the light being fully absorbed by the tissue inside the eye. Only through this pupil can light enter the eye, which means the amount of incoming light is effectively determined by the size of the pupil. A pigmented structure surrounding the pupil, the iris, functions as the eye's aperture stop. It is the amount of pigment in the iris that gives rise to the various eye colours found in humans.
In addition to this layer of pigment, the iris has 2 layers of muscle. One layer contains a circular muscle called the pupillary sphincter, which contracts to make the pupil smaller. The other layer has a smooth muscle called the pupillary dilator, which contracts to dilate the pupil. The combination of these muscles can thereby dilate or contract the pupil depending on the requirements or conditions of the person. The shape of the lens, in contrast, is controlled by the ciliary muscle, which acts on the lens through the ciliary zonules, fibres that also hold the lens in place.
The lens is situated immediately behind the pupil. Its shape and characteristics reveal a similar purpose to that of camera lenses, but they function in slightly different ways. The shape of the lens is adjusted by the pull of the ciliary zonules, which consequently changes the focal length. Together with the cornea, the lens can change the focus, which makes it a very important structure indeed; however, only one third of the total optical power of the eye is due to the lens itself. It is also the eye's main filter. Most of the lens material consists of lens fibres, long and thin cells that lack most of the cell machinery, which promotes transparency. Together with water-soluble proteins called crystallins, they increase the refractive index of the lens. The fibres also play a part in the structure and shape of the lens itself.
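As a rough numerical illustration of the one-third share mentioned above, the following sketch splits an assumed total optical power between lens and cornea. The total of 60 dioptres is a commonly quoted approximation for a relaxed eye and is not a value from this text; simply adding the two powers also ignores the distance between cornea and lens.

```python
# Assumed total optical power of the relaxed eye, in dioptres (1 D = 1/m).
# This number is an illustrative assumption, not a value from the text.
ASSUMED_TOTAL_POWER_D = 60.0

# Shares stated in the text: lens one third, cornea the remaining two thirds.
LENS_SHARE = 1.0 / 3.0
CORNEA_SHARE = 2.0 / 3.0


def split_power(total_d: float) -> dict:
    """Split a total optical power into lens and cornea contributions."""
    return {"lens": total_d * LENS_SHARE, "cornea": total_d * CORNEA_SHARE}


def focal_length_mm(power_d: float) -> float:
    """Focal length in millimetres of an ideal thin lens with the given power."""
    return 1000.0 / power_d


if __name__ == "__main__":
    print(split_power(ASSUMED_TOTAL_POWER_D))
    print(focal_length_mm(ASSUMED_TOTAL_POWER_D))
```

Under these assumptions the lens contributes about 20 D and the cornea about 40 D.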
Beamforming in the eye – Cornea and its protecting agent - Sclera
The cornea, responsible for the remaining 2/3 of the total optical power of the eye, covers the iris, pupil and lens. It focuses the incoming rays before they pass through the pupil and the lens. The cornea is only 0.5 mm thick and consists of 5 layers:
- Epithelium: A layer of epithelial tissue covering the surface of the cornea.
- Bowman's membrane: A thick protective layer composed of strong collagen fibres that maintain the overall shape of the cornea.
- Stroma: A layer composed of parallel collagen fibrils. This layer makes up 90% of the cornea's thickness.
- Descemet's membrane and Endothelium: Two layers adjacent to the anterior chamber of the eye, which is filled with aqueous humor produced by the ciliary body. This fluid moisturises the lens, cleans it and maintains the pressure in the eyeball. The chamber, positioned between the cornea and the iris, receives the fluid from the posterior chamber and contains a trabecular meshwork through which the fluid drains into Schlemm's canal.
The surface of the cornea lies under two protective membranes, called the sclera and Tenon’s capsule. Both of these protective layers completely envelop the eyeball. The sclera is built from collagen and elastic fibres, which protect the eye from external damage; this layer also gives rise to the white of the eye. It is pierced by nerves and vessels, with the largest hole reserved for the optic nerve. Moreover, it is covered by the conjunctiva, which is a clear mucous membrane on the surface of the eyeball. This membrane also lines the inside of the eyelid. It works as a lubricant and, together with the lacrimal gland, it produces tears that lubricate and protect the eye. The remaining protective layer, the eyelid, also functions to spread this lubricant around.
Moving the eyes – extra-ocular muscles
The eyeball is moved by a complicated structure of extra-ocular muscles consisting of four rectus muscles (inferior, medial, lateral and superior) and two oblique muscles (inferior and superior). The positioning of these muscles is presented below, along with their functions:
As you can see, the extra-ocular muscles (2,3,4,5,6,8) are attached to the sclera of the eyeball and originate in the annulus of Zinn, a fibrous tendon surrounding the optic nerve. A pulley system is created with the trochlea acting as a pulley and the superior oblique muscle as the rope; this is required to redirect the muscle force in the correct way. The remaining extra-ocular muscles have a direct path to the eye and therefore do not form these pulley systems. Using these extra-ocular muscles, the eye can rotate up, down, left and right, and other movements are possible as combinations of these.
Other movements are also very important for us to be able to see. Vergence movements enable the proper function of binocular vision. Unconscious fast movements called saccades are essential for people to keep an object in focus. The saccade is a sort of jittery movement performed when the eyes are scanning the visual field, in order to displace the point of fixation slightly. When you follow a moving object with your gaze, your eyes perform what is referred to as smooth pursuit. Additional involuntary movements called nystagmus are caused by signals from the vestibular system; together they make up the vestibulo-ocular reflexes.
The brain stem controls all of the movements of the eyes, with different areas responsible for different movements.
- Pons: Rapid horizontal movements, such as saccades or nystagmus
- Mesencephalon: Vertical and torsional movements
- Cerebellum: Fine tuning
- Edinger-Westphal nucleus: Vergence movements
Where the vision reception occurs – The retina
Before being transduced, incoming EM passes through the cornea, lens and the macula. These structures also act as filters to reduce unwanted EM, thereby protecting the eye from harmful radiation. The filtering response of each of these elements can be seen in the figure "Filtering of the light performed by cornea, lens and pigment epithelium". As one may observe, the cornea attenuates the lower wavelengths, leaving the higher wavelengths nearly untouched. The lens blocks around 25% of the EM below 400 nm and more than 50% below 430 nm. Finally, the pigment epithelium, the last stage of filtering before the photo-reception, affects around 30% of the EM between 430 nm and 500 nm.
The part of the eye that marks the transition from the non-photosensitive region to the photosensitive region is called the ora serrata. The photosensitive region is referred to as the retina, which is the sensory structure in the back of the eye. The retina consists of multiple layers, presented below, with millions of photoreceptors called rods and cones, which capture the light rays and convert them into electrical impulses. Transmission of these impulses is initiated by the ganglion cells and conducted through the optic nerve, the single route by which information leaves the eye.
A conceptual illustration of the structure of the retina is shown on the right. As we can see, there are five main cell types:
- photoreceptor cells
- horizontal cells
- bipolar cells
- amacrine cells
- ganglion cells
Photoreceptor cells can be further subdivided into two main types called rods and cones. Cones are much less numerous than rods in most parts of the retina, but there is an enormous aggregation of them in the macula, especially in its central part called the fovea. In this central region, each photo-sensitive cone is connected to one ganglion-cell. In addition, the cones in this region are slightly smaller than the average cone size, meaning you get more cones per area. Because of this ratio, and the high density of cones, this is where we have the highest visual acuity.
There are 3 types of human cones, each responding to a specific range of wavelengths because each contains one of three types of a pigment called photopsin. Each pigment is most sensitive to red, green or blue wavelengths of light, so we have blue, green and red cones, also called S-, M- and L-cones for their sensitivity to short-, medium- and long-wavelength light respectively. Each photopsin consists of a protein called opsin and a bound chromophore called retinal. The main building blocks of the cone cell are the synaptic terminal, the inner and outer segments, the interior nucleus and the mitochondria.
The spectral sensitivities of the 3 types of cones:
- 1. S-cones absorb short-wave light, i.e. blue-violet light. The maximum absorption wavelength for the S-cones is 420 nm
- 2. M-cones absorb blue-green to yellow light. In this case the maximum absorption wavelength is 535 nm
- 3. L-cones absorb yellow to red light. The maximum absorption wavelength is 565 nm
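To make the three absorption maxima more concrete, the following sketch approximates each cone's spectral sensitivity with a Gaussian centred on the peak wavelength listed above. The Gaussian shape and the 40 nm width are illustrative assumptions and not measured curves; only the peak wavelengths come from the text.

```python
import math

# Peak absorption wavelengths in nm, taken from the list above.
CONE_PEAKS_NM = {"S": 420.0, "M": 535.0, "L": 565.0}

# Assumed bandwidth (standard deviation in nm), chosen only for illustration.
ASSUMED_SIGMA_NM = 40.0


def cone_response(cone: str, wavelength_nm: float) -> float:
    """Relative response (0..1) of one cone type under the Gaussian assumption."""
    peak = CONE_PEAKS_NM[cone]
    return math.exp(-0.5 * ((wavelength_nm - peak) / ASSUMED_SIGMA_NM) ** 2)


def cone_triplet(wavelength_nm: float) -> dict:
    """Responses of all three cone types to monochromatic light."""
    return {cone: cone_response(cone, wavelength_nm) for cone in CONE_PEAKS_NM}


if __name__ == "__main__":
    for wl in (420, 500, 535, 565, 650):
        print(wl, {k: round(v, 3) for k, v in cone_triplet(wl).items()})
```

The point of the sketch is that a single wavelength excites all three cone types to different degrees, and it is the ratio of the three responses that carries the colour information.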
The inner segment contains the cell's nucleus and organelles. The pigment is located in the outer segment, attached to the membrane as trans-membrane proteins within the invaginations of the cell-membrane that form the membranous disks, which are clearly visible in the figure displaying the basic structure of rod and cone cells. The disks maximize the reception area of the cells. The cone photoreceptors of many vertebrates contain spherical organelles called oil droplets, which are thought to constitute intra-ocular filters that may serve to increase contrast, reduce glare and lessen chromatic aberrations caused by the mitochondrial size gradient from the periphery to the centres.
Rods have a structure similar to cones; however, they contain the pigment rhodopsin instead, which allows them to detect low-intensity light and makes them 100 times more sensitive than cones. Rhodopsin is the only pigment found in human rods, and it is located in the outer segment, next to the pigment epithelium, which, as in cones, maximizes the absorption area by employing a disk structure. As in cones, the synaptic terminal of the cell joins it with a bipolar cell, and the inner and outer segments are connected by a cilium.
The pigment rhodopsin absorbs the light between 400-600nm, with a maximum absorption at around 500nm. This wavelength corresponds to greenish-blue light which means blue colours appear more intense in relation to red colours at night.
EM waves with wavelengths outside the range of 400 to 700 nm are not detected by either rods or cones, which ultimately means they are not visible to human beings.
Horizontal cells occupy the inner nuclear layer of the retina. There are two types of horizontal cells, and both types hyper-polarise in response to light, i.e. they become more negative. Type A consists of a subtype called HII-H2, which interacts predominantly with S-cones. Type B cells have a subtype called HI-H1, which features a dendrite tree and an axon. The former contacts mostly M- and L-cone cells and the latter rod cells. Contacts with cones are made mainly by inhibitory synapses, while the cells themselves are joined into a network with gap junctions.
Bipolar cells spread single dendrites in the outer plexiform layer, and their cell bodies, the perikarya, are found in the inner nuclear layer. Dendrites interconnect exclusively with cones and rods, and we differentiate between one type of rod bipolar cell and nine or ten types of cone bipolar cell. These cells branch with amacrine or ganglion cells in the inner plexiform layer using an axon. Rod bipolar cells connect to triad synapses or 18-70 rod cells. Their axons spread around the inner plexiform layer synaptic terminals, which contain ribbon synapses and contact a pair of cell processes in dyad synapses. They are connected to ganglion cells with AII amacrine cell links.
Amacrine cells can be found in the inner nuclear layer and in the ganglion cell layer of the retina. Occasionally they are found in the inner plexiform layer, where they work as signal modulators. They have been classified as narrow-field, small-field, medium-field or wide-field depending on their size. However, many classifications exist, leading to over 40 different types of amacrine cells.
Ganglion cells are the final transmitters of the visual signal from the retina to the brain. The most common ganglion cells in the retina are the midget ganglion cell and the parasol ganglion cell. After having passed through all the retinal layers, the signal is passed on to these cells, which are the final stage of the retinal processing chain. All the information is collected here and forwarded to the retinal nerve fibres and optic nerves. The spot where the ganglion axons fuse to create the optic nerve is called the optic disc. This nerve is built mainly from the retinal ganglion axons and Portort cells. The majority of the axons transmit data to the lateral geniculate nucleus, which is a termination nexus for most parts of the nerve and which forwards the information to the visual cortex. Some ganglion cells also react to light, but because this response is slower than that of rods and cones, it is believed to be related to sensing ambient light levels and adjusting the biological clock.
As mentioned before the retina is the main component in the eye, because it contains all the light sensitive cells. Without it, the eye would be comparable to a digital camera without the CCD (Charge Coupled Device) sensor. This part elaborates on how the retina perceives the light, how the optical signal is transmitted to the brain and how the brain processes the signal to form enough information for decision making.
Creation of the initial signals - Photosensor Function
Vision invariably starts with light hitting the photo-sensitive cells found in the retina. Light-absorbing visual pigments, a variety of enzymes and transmitters in retinal rods and cones will initiate the conversion from visible EM stimuli into electrical impulses, in a process known as photoelectric transduction. Using rods as an example, the incoming visible EM hits rhodopsin molecules, transmembrane molecules found in the rods' outer disk structure. Each rhodopsin molecule consists of a cluster of helices called opsin that envelop and surround 11-cis retinal, which is the part of the molecule that will change due to the energy from the incoming photons. In biological molecules, moieties, or parts of molecules, that cause conformational changes due to this energy are sometimes referred to as chromophores. 11-cis retinal straightens in response to the incoming energy, turning into all-trans retinal, which forces the opsin helices further apart, causing particular reactive sites to be uncovered. This "activated" rhodopsin molecule is sometimes referred to as Metarhodopsin II. From this point on, even if the visible light stimulation stops, the reaction will continue. The Metarhodopsin II can then react with roughly 100 molecules of a G protein called transducin, which then dissociates into its α and βγ subunits after the bound GDP is exchanged for GTP. The activated α-GTP then binds to cGMP-phosphodiesterase (PDE), suppressing normal ion-exchange functions, which results in a low cytosol concentration of cations, and therefore a change in the polarisation of the cell.
The natural photoelectric transduction reaction has an amazing power of amplification. One single retinal rhodopsin molecule activated by a single quantum of light causes the hydrolysis of up to 10^6 cGMP molecules per second.
- A light photon interacts with the retinal in a photoreceptor. The retinal undergoes isomerisation, changing from the 11-cis to all-trans configuration.
- Retinal no longer fits into the opsin binding site.
- Opsin therefore undergoes a conformational change to metarhodopsin II.
- Metarhodopsin II is unstable and splits, yielding opsin and all-trans retinal.
- The opsin activates the regulatory protein transducin. This causes transducin to dissociate from its bound GDP, and bind GTP, then the alpha subunit of transducin dissociates from the beta and gamma subunits, with the GTP still bound to the alpha subunit.
- The alpha subunit-GTP complex activates phosphodiesterase.
- Phosphodiesterase breaks down cGMP to 5'-GMP. This lowers the concentration of cGMP and therefore the sodium channels close.
- Closure of the sodium channels causes hyperpolarization of the cell due to the ongoing potassium current.
- Hyperpolarization of the cell causes voltage-gated calcium channels to close.
- As the calcium level in the photoreceptor cell drops, the amount of the neurotransmitter glutamate that is released by the cell also drops. This is because calcium is required for the glutamate-containing vesicles to fuse with cell membrane and release their contents.
- A decrease in the amount of glutamate released by the photoreceptors causes depolarization of On center bipolar cells (rod and cone On bipolar cells) and hyperpolarization of cone Off bipolar cells.
Without visible EM stimulation, rod cells, which contain a cocktail of ions, proteins and other molecules, have a membrane potential of around -40 mV. Compared to other nerve cells, which typically rest at around -65 mV, this is quite high. In this state, the neurotransmitter glutamate is continuously released from the axon terminals and absorbed by the neighbouring bipolar cells. With incoming visible EM and the previously mentioned cascade reaction, the potential drops to -70 mV. This hyper-polarisation of the cell causes a reduction in the amount of released glutamate, thereby affecting the activity of the bipolar cells, and subsequently the following steps in the visual pathway.
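The chain from light to bipolar cell activity described above can be summarised in a small toy model. Only the two potentials (-40 mV in darkness, -70 mV under stimulation) and the direction of each effect come from the text; the linear interpolation between them, the normalised light level and all function names are assumptions made for illustration.

```python
DARK_POTENTIAL_MV = -40.0   # rod potential without stimulation (from the text)
LIGHT_POTENTIAL_MV = -70.0  # rod potential under stimulation (from the text)


def rod_potential_mv(light: float) -> float:
    """Membrane potential for a normalised light level in [0, 1].

    The linear interpolation is an assumption, not a physiological model.
    """
    light = min(max(light, 0.0), 1.0)
    return DARK_POTENTIAL_MV + light * (LIGHT_POTENTIAL_MV - DARK_POTENTIAL_MV)


def glutamate_release(light: float) -> float:
    """Relative glutamate release: 1.0 in darkness, 0.0 when fully hyperpolarised."""
    v = rod_potential_mv(light)
    return (v - LIGHT_POTENTIAL_MV) / (DARK_POTENTIAL_MV - LIGHT_POTENTIAL_MV)


def bipolar_drive(light: float) -> dict:
    """Sign-only model: ON cells depolarise when glutamate falls, OFF cells when it rises."""
    g = glutamate_release(light)
    return {"ON": 1.0 - g, "OFF": g}


if __name__ == "__main__":
    for level in (0.0, 0.5, 1.0):
        print(level, rod_potential_mv(level), bipolar_drive(level))
```

Note the inverted logic: the photoreceptor is most active (releases most glutamate) in the dark, and light is signalled by a decrease in transmitter release.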
Similar processes exist in the cone-cells and in photosensitive ganglion cells, but make use of different opsins. Photopsin I through III (yellowish-green, green and blue-violet respectively) are found in the three different cone cells and melanopsin (blue) can be found in the photosensitive ganglion cells.
Processing Signals in the Retina
Different bipolar cells react differently to the changes in the released glutamate. The so-called ON and OFF bipolar cells form the direct signal flow from cones to bipolar cells. The ON bipolar cells are depolarised by visible EM stimulation, and the corresponding ON ganglion cells are activated. The OFF bipolar cells, on the other hand, are hyperpolarised by the visible EM stimulation, and the OFF ganglion cells are inhibited. This is the basic pathway of the direct signal flow. The lateral signal flow starts from the rods and passes to the bipolar cells and then to the amacrine cells. The rod-amacrine cells inhibit the OFF bipolar cells and stimulate the ON bipolar cells via an electrical synapse. After these steps, the signal arrives at the ON or OFF ganglion cells, which completes the pathway of the lateral signal flow.
In ON ganglion cells, action potentials (APs) are triggered by the visible EM stimulus. The AP frequency increases when the sensor potential increases; in other words, the AP frequency depends on the amplitude of the sensor potential. The region of the retina in which stimulatory and inhibitory effects influence the AP frequency of a ganglion cell is called its receptive field (RF). The RF is usually composed of two regions: the central zone and the ring-like peripheral zone. They are distinguishable during visible EM adaptation. Visible EM stimulation of the central zone leads to an increase in AP frequency, and stimulation of the peripheral zone decreases the AP frequency. When the light source in the periphery is turned off, excitation occurs. This kind of region is therefore called an ON field (central field ON). The RF of the OFF ganglion cells acts the opposite way and is therefore called an "OFF field" (central field OFF). The RFs are organised by the horizontal cells. The impulse from the peripheral region is inverted and transmitted to the central region, and there the so-called stimulus contrast is formed. This function makes the dark seem darker and the light brighter. If the whole RF is exposed to light, the impulse of the central region predominates.
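A common way to sketch such a centre-surround receptive field is a difference of two Gaussians: a narrow excitatory centre minus a broad inhibitory surround. This technique is not named in the text and is used here only as an illustration; the one-dimensional layout, the widths and the surround gain of 0.8 are assumptions, chosen so that the centre predominates under full-field illumination.

```python
import math

# One-dimensional slice through the receptive field (arbitrary units).
STEP = 0.1
POSITIONS = [i * STEP for i in range(-150, 151)]


def gaussian(x: float, sigma: float) -> float:
    """Normalised Gaussian profile."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def on_center_weight(x: float, sigma_center: float = 1.0,
                     sigma_surround: float = 3.0, surround_gain: float = 0.8) -> float:
    """ON-field weight: excitatory centre minus weaker, broader inhibitory surround."""
    return gaussian(x, sigma_center) - surround_gain * gaussian(x, sigma_surround)


def response(stimulus: list) -> float:
    """Weighted sum of the light pattern; positive means a higher AP frequency."""
    return STEP * sum(on_center_weight(x) * s for x, s in zip(POSITIONS, stimulus))


spot = [1.0 if abs(x) <= 1.0 else 0.0 for x in POSITIONS]   # light on the centre only
full = [1.0 for _ in POSITIONS]                              # whole field illuminated
ring = [1.0 if abs(x) > 2.0 else 0.0 for x in POSITIONS]     # light on the periphery only

if __name__ == "__main__":
    print(response(spot), response(full), response(ring))
```

With these assumed parameters the central spot gives the strongest response, full-field light gives a weaker but still positive response, and light on the periphery alone gives a negative response, which mirrors the behaviour described for the ON field.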
Signal Transmission to the Cortex
As mentioned previously, axons of the ganglion cells converge at the optic disk of the retina, forming the optic nerve. These fibres are positioned inside the bundle in a specific order. Fibres from the macular zone of the retina are in the central portion, and those from the temporal half of the retina take up the peripheral part. A partial decussation or crossing occurs when these fibres are outside the eye cavity. The fibres from the nasal halves of each retina cross to the opposite halves and extend to the brain. Those from the temporal halves remain uncrossed. This partial crossover is called the optic chiasma, and the optic nerves past this point are called optic tracts, mainly to distinguish them from single-retinal nerves. The function of the partial crossover is to transmit the right-hand visual field produced by both eyes to the left-hand half of the brain only and vice versa. Therefore the information from the right visual field, like that from the right half of the body, is transmitted to the left-hand part of the brain when it reaches the posterior part of the fore-brain (diencephalon).
The information relay between the fibres of the optic tracts and the nerve cells occurs in the lateral geniculate bodies, the central part of the visual signal processing, located in the thalamus of the brain. From here the information is passed to the nerve cells in the occipital cortex of the corresponding side of the brain. Connections from the retina to the brain can be separated into a "parvocellular pathway" and a "magnocellular pathway". The parvocellular pathway signals color and fine detail, whereas the magnocellular pathway detects fast moving stimuli.
Signals from standard digital cameras correspond approximately to those of the parvocellular pathway. To simulate the responses of parvocellular pathways, researchers have been developing neuromorphic sensory systems, which try to mimic spike-based computation in neural systems. They use a scheme called "address-event representation" for the signal transmission in the neuromorphic electronic systems (Liu and Delbruck 2010).
Anatomically, the retinal Magno and Parvo ganglion cells respectively project to 2 ventral magnocellular layers and 4 dorsal parvocellular layers of the Lateral Geniculate Nucleus (LGN). Each of the six LGN layers receives inputs from either the ipsilateral or contralateral eye, i.e., the ganglion cells of the left eye cross over and project to layers 1, 4 and 6 of the right LGN, and the right eye ganglion cells project (uncrossed) to its layers 2, 3 and 5. From here the information from the right and left eye is separated.
Although human vision is combined from two halves of the retina and the signal is processed by the opposite cerebral hemispheres, the visual field is perceived as a smooth and complete unit. Hence the two visual cortical areas are thought of as being intimately connected. This connection, called the corpus callosum, is made of neurons, axons and dendrites. Because the dendrites make synaptic connections to the related points of the hemispheres, electric stimulation of a point on one hemisphere produces stimulation of the interconnected point on the other hemisphere. The only exception to this rule is the primary visual cortex.
The synapses are made by the optic tract in the respective layers of the lateral geniculate body. The axons of these third-order nerve cells then pass up to the calcarine fissure in each occipital lobe of the cerebral cortex. Because bands of white fibres and axons from the nerve cells in the retina pass through it, it is called the striate cortex, which incidentally is our primary visual cortex, sometimes known as V1. At this point, impulses from the separate eyes converge on common cortical neurons, which enables complete input from both eyes in one region to be used for perception and comprehension. Pattern recognition is a very important function of this particular part of the brain, with lesions causing problems with visual recognition or blindsight.
Because the optic tract fibres pass information to the lateral geniculate bodies, and from there to the striate area, in an ordered manner, stimulating a single point on the retina produces an electrical response in a small region of both the lateral geniculate body and the striate cortex that corresponds to that particular retinal spot. This is an obvious point-to-point way of signal processing. If the whole retina is stimulated, the responses occur across both lateral geniculate bodies and the gray matter of the striate cortex. It is therefore possible to map this brain region to the retinal fields, or more usually the visual fields.
Any further steps in this pathway are beyond the scope of this book. Rest assured that many further levels and centres exist, focusing on particular specific tasks, such as colour, orientations, spatial frequencies, emotions etc.
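The routing rule of the partial crossover can be written down as a small lookup. The sketch below is only an illustration with hypothetical function names. It uses the crossing rule from the text (nasal fibres cross, temporal fibres do not) plus one fact not spelled out above: the optics of the eye invert the image, so the right visual field falls on the left half of each retina.

```python
OPPOSITE = {"left": "right", "right": "left"}


def hemiretina_for_field(eye: str, visual_field: str) -> str:
    """Which half of the retina of `eye` receives light from `visual_field`.

    The image is inverted, so a visual field lands on the opposite half of the
    retina. For the eye on the same side as the field, that half is nasal;
    for the other eye it is temporal.
    """
    return "nasal" if eye == visual_field else "temporal"


def target_hemisphere(eye: str, hemiretina: str) -> str:
    """Brain hemisphere reached by fibres from one half of one retina."""
    if hemiretina == "nasal":
        return OPPOSITE[eye]   # nasal fibres cross at the chiasm
    if hemiretina == "temporal":
        return eye             # temporal fibres remain uncrossed
    raise ValueError("hemiretina must be 'nasal' or 'temporal'")


def hemisphere_for_field(eye: str, visual_field: str) -> str:
    """Hemisphere that receives `visual_field` as seen by `eye`."""
    return target_hemisphere(eye, hemiretina_for_field(eye, visual_field))


if __name__ == "__main__":
    for eye in ("left", "right"):
        for field in ("left", "right"):
            print(eye, field, "->", hemisphere_for_field(eye, field))
```

Whichever eye is chosen, the right visual field ends up in the left hemisphere and the left visual field in the right hemisphere.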
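The layer assignment above can be captured in a small table. The sketch below is illustrative only; it assumes the usual numbering in which layers 1 and 2 are the two ventral magnocellular layers and layers 3 to 6 the dorsal parvocellular layers, and the function name is hypothetical.

```python
# Layer number -> (cell type, which eye feeds it, relative to this LGN).
# Assumption: layers are numbered from ventral (1) to dorsal (6).
LGN_LAYERS = {
    1: ("magnocellular", "contralateral"),
    2: ("magnocellular", "ipsilateral"),
    3: ("parvocellular", "ipsilateral"),
    4: ("parvocellular", "contralateral"),
    5: ("parvocellular", "ipsilateral"),
    6: ("parvocellular", "contralateral"),
}


def layers_for_eye(eye: str, lgn_side: str) -> list:
    """Layers of the LGN on `lgn_side` ('left'/'right') that receive input from `eye`."""
    origin = "ipsilateral" if eye == lgn_side else "contralateral"
    return [layer for layer, (_, src) in sorted(LGN_LAYERS.items()) if src == origin]


if __name__ == "__main__":
    print(layers_for_eye("left", "right"))   # crossed input to the right LGN
    print(layers_for_eye("right", "right"))  # uncrossed input to the right LGN
```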
Cortical Processing - Visual Perception
Equipped with a firmer understanding of some of the more important concepts of the signal processing in the visual system, comprehension or perception of the processed sensory information is the last important piece in the puzzle. Visual perception is the process of translating information received by the eyes into an understanding of the external state of things. It makes us aware of the world around us and allows us to understand it better. Based on visual perception we learn patterns which we then apply later in life and we make decisions based on this and the obtained information. In other words, our survival depends on perception.
The field of Visual Perception has been divided into different subfields, because the processing is too complex and requires different specialized mechanisms to perceive what is seen. These subfields include Color Perception, Motion Perception, Depth Perception and Face Recognition, among others. In the following we will describe important aspects of Motion Perception in primates.
Motion Perception is the process of inferring speed and direction of moving objects. Area V5 in humans and area MT (Middle Temporal) in primates are responsible for cortical perception of Motion. Area V5 is part of the extrastriate cortex, which is the region in the occipital region of the brain next to the primary visual cortex. The function of Area V5 is to detect speed and direction of visual stimuli, and integrate local visual motion signals into global motion. Area V1 or Primary Visual cortex is located in the occipital lobe of the brain in both hemispheres. It processes the first stage of cortical processing of visual information. This area contains a complete map of the visual field covered by the eyes. The difference between area V5 and area V1 (Primary Visual Cortex) is that area V5 can integrate motion of local signals or individual parts of an object into a global motion of an entire object. Area V1, on the other hand, responds to local motion that occurs within the receptive field. The estimates from these many neurons are integrated in Area V5.
Movement is defined as changes in retinal illumination over space and time. Motion signals are classified into First order motions and Second order motions. These motion types are briefly described in the following paragraphs.
First-order motion perception refers to the motion perceived when two or more visual stimuli switch on and off over time and produce different motion perceptions. First-order motion is also termed "apparent motion", and it is used in television and film. An example of this is the "Beta movement", which is an illusion in which fixed images seem to move, even though they do not move in reality. These images give the appearance of motion, because they change and move faster than what the eye can detect. This optical illusion happens because the human optic nerve responds to changes of light at ten cycles per second, so any change faster than this rate will be registered as continuous motion, and not as separate images.
Second-order motion refers to the motion that occurs when a moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in luminance or motion energy of the image. Evidence suggests that early processing of first-order motion and second-order motion is carried out by separate pathways. Second-order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. Second-order motion produces a weaker motion aftereffect. First- and second-order signals are combined in area V5.
In this chapter, we will analyze the concepts of Motion Perception and Motion Analysis, and explain the reason why these terms should not be used interchangeably. We will analyze the mechanisms by which motion is perceived, such as Motion Sensors and Feature Tracking. There exist three main theoretical models that attempt to describe the function of neuronal sensors of motion. Experimental tests have been conducted to confirm whether these models are accurate. Unfortunately, the results of these tests are inconclusive, and it can be said that no single one of these models describes the functioning of Motion Sensors entirely. However, each of these models simulates certain features of Motion Sensors. Some properties of these sensors are described. Finally, this chapter shows some motion illusions, which demonstrate that our sense of motion can be misled by static external factors that stimulate motion sensors in the same way as motion.
Motion Analysis and Motion Perception
The concepts of Motion Analysis and Motion Perception are often confused as interchangeable. Motion Perception and Motion Analysis are important to each other, but they are not the same.
Motion Analysis refers to the mechanisms in which motion signals are processed. In a similar way in which Motion Perception does not necessarily depend on signals generated by motion of images in the retina, Motion Analysis may or may not lead to motion perception. An example of this phenomenon is Vection, which occurs when a person perceives that she is moving when she is stationary, but the object that she observes is moving. Vection shows that motion of an object can be analyzed, even though it is not perceived as motion coming from the object. This definition of Motion analysis suggests that motion is a fundamental image property. In the visual field, it is analyzed at every point. The results from this analysis are used to derive perceptual information.
Motion Perception refers to the process of acquiring perceptual knowledge about motion of objects and surfaces in an image. Motion is perceived either by delicate local sensors in the retina or by feature tracking. Local motion sensors are specialized neurons sensitive to motion, and analogous to specialized sensors for color. Feature tracking is an indirect way to perceive motion, and it consists of inferring motion from changes in retinal position of objects over time. It is also referred to as third order motion analysis. Feature tracking works by focusing attention to a particular object and observing how its position has changed over time.
Detection of motion is the first stage of visual processing, and it happens thanks to specialized neural processes, which respond to information regarding local changes of intensity of images over time. Motion is sensed independently of other image properties at all locations in the image. It has been proven that motion sensors exist, and they operate locally at all points in the image. Motion sensors are dedicated neuronal sensors located in the retina that are capable of detecting a motion produced by two brief and small light flashes that are so close together that they could not be detected by feature tracking. There exist three main models that attempt to describe the way that these specialized sensors work. These models are independent of one another, and they try to model specific characteristics of Motion Perception. Although there is not sufficient evidence to support that any of these models represent the way the visual system (motion sensors particularly) perceives motion, they still correctly model certain functions of these sensors.
The Reichardt Detector
The Reichardt Detector is used to model how motion sensors respond to first-order motion signals. When an object moves from point A in the visual field to point B, two signals are generated: one before the movement began and another one after the movement has completed. This model perceives this motion by detecting changes in luminance at one point on the retina and correlating them with a change in luminance at another point nearby after a short delay. The Reichardt Detector operates on the principle of correlation (a statistical relation that involves dependency). It interprets a motion signal by spatiotemporal correlation of luminance signals at neighboring points. It uses the fact that two receptive fields at different points on the trajectory of a moving object receive time-shifted versions of the same signal: a luminance pattern moves along an axis, and the signal at one point on the axis is a time-shifted version of the signal at an earlier point on the axis. The Reichardt Detector model has two spatially separate neighboring detectors. The output signals of the detectors are multiplied (correlated) in the following way: the signal from one detector is delayed and multiplied by the undelayed signal from the other detector. The same procedure is repeated for the reverse direction of motion (the signal that was time-shifted becomes the first signal and vice versa). Then, the difference between these two products is taken; its sign indicates the direction of motion, and its magnitude varies with speed. The response of the detector depends upon the stimulus' phase, contrast and speed. Many detectors tuned to different speeds are necessary to encode the true speed of the pattern. The most compelling experimental evidence for this kind of detector comes from studies of direction discrimination of barely visible targets.
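The delay-multiply-subtract scheme described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the two inputs are sampled luminance signals, the delay is a whole number of samples, and the function name `reichardt_output` and all stimulus values are hypothetical.

```python
import numpy as np

def reichardt_output(a, b, delay):
    """Opponent output of a two-input correlation detector.

    a, b  : luminance signals at two neighboring points (same length)
    delay : delay in samples (positive integer)
    A positive result means motion from a towards b, a negative one the reverse.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_delayed, b_delayed = a[:-delay], b[:-delay]
    a_now, b_now = a[delay:], b[delay:]
    # Correlate delayed a with b, delayed b with a, then take the difference.
    return float(np.mean(a_delayed * b_now - b_delayed * a_now))

# Illustrative stimulus: a 5 Hz luminance modulation sampled at 1 kHz.
t = np.arange(2000) * 0.001
omega = 2 * np.pi * 5.0
lag = 0.5                                   # phase lag between the two points (rad)
point_a = np.sin(omega * t)
point_b_rightward = np.sin(omega * t - lag)  # b sees the pattern later
point_b_leftward = np.sin(omega * t + lag)   # b sees the pattern earlier

rightward = reichardt_output(point_a, point_b_rightward, delay=20)
leftward = reichardt_output(point_a, point_b_leftward, delay=20)
```

With these inputs the two results have equal magnitude and opposite sign, which is the direction selectivity the model is meant to capture. The magnitude changes with the temporal frequency and contrast of the stimulus, so a single detector of this kind does not report speed unambiguously.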
The Motion Energy Filter is a model of motion sensors based on the principle of phase-invariant filters. This model builds spatio-temporal filters oriented in space-time to match the structure of moving patterns. It consists of separable filters, for which spatial profiles keep the same shape over time but are scaled by the value of the temporal filters. Motion Energy Filters match the structure of moving patterns by adding together separable filters. For each direction of motion, two space-time filters are generated: one that is symmetric (bar-like) and one that is asymmetric (edge-like). The sum of the squares of the outputs of these filters is called the motion energy. The difference in the signal for the two directions is called the opponent energy. This result is then divided by the squared output of another filter, which is tuned to static contrast. This division is performed to take into account the effect of contrast on the motion signal. Motion Energy Filters can model a number of motion phenomena, but they produce a phase-independent measurement, which increases with speed but does not give a reliable value of speed.
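A minimal sketch of the motion energy computation follows. It assumes Gabor-like filters (a Gaussian envelope times an oriented sinusoid), evaluates each filter at a single position, and leaves out the final division by static contrast. All function names and parameter values are hypothetical.

```python
import numpy as np

def drifting_grating(x, t, k, omega, phase=0.0):
    """Luminance pattern cos(k*x - omega*t + phase) on a space-time grid."""
    X, T = np.meshgrid(x, t, indexing="ij")
    return np.cos(k * X - omega * T + phase)

def motion_energy(stimulus, x, t, k, omega, sigma_x=0.5, sigma_t=0.5):
    """Energy of a quadrature pair tuned to the drift cos(k*x - omega*t)."""
    X, T = np.meshgrid(x, t, indexing="ij")
    envelope = np.exp(-X**2 / (2 * sigma_x**2) - T**2 / (2 * sigma_t**2))
    carrier = k * X - omega * T
    # Each oriented filter is a sum of two space-time separable filters,
    # e.g. cos(kx - wt) = cos(kx)cos(wt) + sin(kx)sin(wt).
    even = envelope * np.cos(carrier)   # symmetric (bar-like)
    odd = envelope * np.sin(carrier)    # asymmetric (edge-like)
    r_even = np.sum(stimulus * even)
    r_odd = np.sum(stimulus * odd)
    return float(r_even**2 + r_odd**2)

def opponent_energy(stimulus, x, t, k, omega):
    """Rightward energy minus leftward energy."""
    rightward = motion_energy(stimulus, x, t, k, omega)
    leftward = motion_energy(stimulus, x, t, k, -omega)
    return rightward - leftward

x = np.linspace(-2.0, 2.0, 81)
t = np.linspace(-2.0, 2.0, 81)
k, omega = 2 * np.pi, 2 * np.pi   # one cycle per unit length and per unit time

moving_right = drifting_grating(x, t, k, omega)
moving_left = drifting_grating(x, t, k, -omega)
opp_right = opponent_energy(moving_right, x, t, k, omega)
opp_left = opponent_energy(moving_left, x, t, k, omega)
```

In this sketch the opponent energy is positive for the rightward grating and negative for the leftward one, and shifting the phase of the grating leaves the energy essentially unchanged, which is the phase invariance mentioned above.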
The Spatiotemporal Gradients model of motion sensors was originally developed in the field of computer vision, and it is based on the principle that the ratio of the temporal derivative of image brightness to the spatial derivative of image brightness gives the speed of motion. It is important to note that at the peaks and troughs of the image, this model will not compute an adequate answer, because the derivative in the denominator would be zero. In order to solve this problem, higher-order derivatives with respect to space and time can be analyzed in addition to the first-order ones. Spatiotemporal Gradients is a good model for determining the speed of motion at all points in the image.
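The ratio of derivatives can be illustrated with two frames of a one-dimensional luminance profile. For a pattern I(x, t) = f(x - v t), the temporal derivative is -v times the spatial derivative, so the signed velocity is the negative of the ratio. The sketch below is an illustration only: the function name `gradient_speed` and the `min_slope` cut-off are assumptions, and points near peaks and troughs are simply discarded rather than handled with higher-order derivatives.

```python
import numpy as np

def gradient_speed(frame0, frame1, dx, dt, min_slope=0.1):
    """Estimate velocity from two frames as -(dI/dt) / (dI/dx)."""
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    i_t = (f1 - f0) / dt                    # temporal derivative
    i_x = np.gradient(0.5 * (f0 + f1), dx)  # spatial derivative of the mean frame
    # Skip peaks and troughs, where the spatial derivative is close to zero.
    valid = np.abs(i_x) > min_slope
    return float(np.median(-i_t[valid] / i_x[valid]))

x = np.linspace(0.0, 2 * np.pi, 400)
dx = x[1] - x[0]
dt = 0.01
true_speed = 1.5
frame_a = np.sin(x)
frame_b = np.sin(x - true_speed * dt)   # same pattern, shifted to the right

estimate = gradient_speed(frame_a, frame_b, dx, dt)
```

For this smooth sinusoidal pattern the estimate is close to the true speed of 1.5. Lowering `min_slope` towards zero lets the near-zero denominators back in, which is the failure at peaks and troughs described above.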
Motion Sensors are Orientation-Selective
One of the properties of Motion Sensors is orientation-selectivity, which constrains motion analysis to a single dimension. Motion sensors can only record motion in one dimension along an axis orthogonal to the sensor’s preferred orientation. A stimulus that contains features of a single orientation can only be seen to move in a direction orthogonal to the stimulus’ orientation. One-dimensional motion signals give ambiguous information about the motion of two-dimensional objects. A second stage of motion analysis is necessary in order to resolve the true direction of motion of a 2-D object or pattern. 1-D motion signals from sensors tuned to different orientations are combined to produce an unambiguous 2-D motion signal. Analysis of 2-D motion depends on signals from local broadly oriented sensors as well as on signals from narrowly oriented sensors.
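One way to picture the second stage is as solving a small system of equations: each orientation-tuned sensor reports only the velocity component orthogonal to its preferred orientation, and several such reports are combined into one 2-D velocity. The least-squares sketch below is an illustration of that idea, not a claim about how the visual system implements it; the function names are hypothetical.

```python
import numpy as np

def unit_normals(orientations_deg):
    """Unit vectors orthogonal to contours with the given orientations."""
    theta = np.deg2rad(np.asarray(orientations_deg, dtype=float))
    return np.column_stack([-np.sin(theta), np.cos(theta)])

def combine_1d_signals(orientations_deg, normal_speeds):
    """Least-squares 2-D velocity from 1-D (orthogonal) speed measurements."""
    normals = unit_normals(orientations_deg)
    speeds = np.asarray(normal_speeds, dtype=float)
    velocity, *_ = np.linalg.lstsq(normals, speeds, rcond=None)
    return velocity

true_velocity = np.array([2.0, 1.0])

# Three sensors with different preferred orientations see the same object.
orientations = [0.0, 45.0, 90.0]
measured = unit_normals(orientations) @ true_velocity
recovered = combine_1d_signals(orientations, measured)

# A single sensor (vertical contour) only constrains the horizontal component.
single = combine_1d_signals([90.0], unit_normals([90.0]) @ true_velocity)
```

With three orientations the true velocity is recovered. With a single orientation the system is underdetermined, and the minimum-norm solution returned by `lstsq` contains only the component orthogonal to the contour, which is the 1-D ambiguity described above.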
Another way in which we perceive motion is through Feature Tracking. Feature Tracking consists of analyzing whether or not the local features of an object have changed positions, and inferring movement from this change. In this section, some features about Feature trackers are mentioned.
Feature trackers fail when a moving stimulus occurs very rapidly. Feature trackers have the advantage over motion sensors that they can perceive movement of an object even if the movement is separated by intermittent blank intervals. They can also separate these two stages (movements and blank intervals). Motion sensors, on the other hand, would just integrate the blanks with the moving stimulus and see a continuous movement. Feature trackers operate on the locations of identified features. For that reason, they have a minimum distance threshold that matches the precision with which locations of features can be discriminated. Feature trackers do not show motion aftereffects, which are visual illusions caused by visual adaptation. Motion aftereffects occur when, after observing a moving stimulus, a stationary object appears to be moving in the opposite direction of the previously observed moving stimulus. Feature tracking cannot monitor multiple motions in different parts of the visual field at the same time. On the other hand, multiple motions are not a problem for motion sensors, because they operate in parallel across the entire visual field.
Experiments have been conducted using the information above to reach interesting conclusions about feature trackers. Experiments with brief stimuli have shown that color patterns and contrast patterns at high contrasts are not perceived by feature trackers but by motion sensors. Experiments with blank intervals have confirmed that feature tracking can occur with blank intervals in the display. It is only at high contrast that motion sensors perceive the motion of chromatic stimuli and contrast patterns. At low contrasts feature trackers analyze the motion of both chromatic patterns and contrast envelopes and at high contrasts motion sensors analyze contrast envelopes. Experiments in which subjects make multiple motion judgments suggest that feature tracking is a process that occurs under conscious control and that it is the only way we have to analyze the motion of contrast envelopes in low-contrast displays. These results are consistent with the view that the motion of contrast envelopes and color patterns depends on feature tracking except when colors are well above threshold or mean contrast is high. The main conclusion of these experiments is that it is probably feature tracking that allows perception of contrast envelopes and color patterns.
As a consequence of the way motion detection works, some static images may appear to us to be moving. These images give an insight into the assumptions that the visual system makes, and are called visual illusions.
A famous motion illusion related to first-order motion signals is the Phi phenomenon, which is an optical illusion that makes us perceive movement instead of a sequence of images. This motion illusion allows us to watch movies as a continuum and not as separate images. The Phi phenomenon allows a group of frozen images that are changed at a constant speed to be seen as a constant movement. The Phi phenomenon should not be confused with Beta Movement, because the former is an apparent movement caused by luminous impulses in a sequence, while the latter is an apparent movement caused by luminous stationary impulses.
Motion illusions happen when Motion Perception, Motion Analysis and the interpretation of these signals are misleading, and our visual system creates illusions about motion. These illusions can be classified according to which process allows them to happen: illusions related to motion sensing, to 2D integration, and to 3D interpretation.
The most popular illusions concerning motion sensing are four-stroke motion, RDKs and second order motion signals illusions. The most popular motion illusions concerning 2D integration are Motion Capture, Plaid Motion and Direct Repulsion. Similarly, the ones concerning 3D interpretation are Transformational Motion, Kinetic Depth, Shadow Motion, Biological Motion, Stereokinetic motion, Implicit Figure Motion and 2 Stroke Motion. There are far more Motion Illusions, and they all show something interesting regarding human Motion Detection, Perception and Analysis mechanisms. For more information, visit the following link: http://www.lifesci.sussex.ac.uk/home/George_Mather/Motion/
Although we still do not understand most of the specifics regarding Motion Perception, understanding the mechanisms by which motion is perceived as well as motion illusion can give the reader a good overview of the state of the art in the subject. Some of the open problems regarding Motion Perception are the mechanisms of formation of 3D images in global motion and the Aperture Problem.
Global motion signals from the retina are integrated to arrive at a 2-dimensional global motion signal; however, it is unclear how 3D global motion is formed. The Aperture Problem occurs because each receptive field in the visual system covers only a small piece of the visual world, which leads to ambiguities in perception. The aperture problem refers to the problem of a moving contour that, when observed locally, is consistent with different possibilities of motion. This ambiguity is geometric in origin: motion parallel to the contour cannot be detected, as changes to this component of the motion do not change the images observed through the aperture. The only component that can be measured is the velocity orthogonal to the contour orientation; for that reason, the velocity of the movement could be anything from the family of motions along a line in velocity space. This aperture problem is not only observed in straight contours, but also in smoothly curved ones, since they are approximately straight when observed locally. Although the mechanisms that solve the Aperture Problem are still unknown, there are some hypotheses on how it could be solved. For example, it could be possible to resolve this problem by combining information across space or from different contours of the same object.
In this chapter, we introduced Motion Perception and the mechanisms by which our visual system detects motion. Motion Illusions showed how Motion signals can be misleading, and consequently lead to incorrect conclusions about motion. It is important to remember that Motion Perception and Motion Analysis are not the same. Motion Sensors and Feature trackers complement each other to make the visual system perceive motion.
Motion Perception is complex, and it is still an open area of research. This chapter describes models about the way that Motion Sensors function, and hypotheses about Feature trackers characteristics; however, more experiments are necessary to learn about the characteristics of these mechanisms and be able to construct models that resemble the actual processes of the visual system more accurately.
The variety of mechanisms of motion analysis and motion perception described in this chapter, as well as the sophistication of the artificial models designed to describe them, demonstrate that there is much complexity in the way in which the cortex processes signals from the outside environment. Thousands of specialized neurons integrate and interpret pieces of local signals to form global images of moving objects in our brain. Understanding that so many actors and processes in our bodies must work in concert to perceive motion makes it all the more remarkable that we as humans are able to do it with such ease.
Humans (together with other primates such as monkeys and gorillas) have the best color perception among mammals. Hence, it is not a coincidence that color plays an important role in a wide variety of aspects. For example, color is useful for discriminating and differentiating objects, surfaces, natural scenery, and even faces. Color is also an important tool for nonverbal communication, including that of emotion.
For many decades, it has been a challenge to find the links between the physical properties of color and its perceptual qualities. Usually, these are studied under two different approaches: the behavioral response caused by color (also called psychophysics) and the actual physiological response caused by it.
Here we will only focus on the latter. The study of the physiological basis of color vision, about which practically nothing was known before the second half of the twentieth century, has advanced slowly and steadily since 1950. Important progress has been made in many areas, especially at the receptor level. Thanks to molecular biology methods, it has been possible to reveal previously unknown details concerning the genetic basis for the cone pigments. Furthermore, more and more cortical regions have been shown to be influenced by visual stimuli, although the correlation of color perception with wavelength-dependent physiological activity beyond the receptors is not so easy to discern.
In this chapter, we aim to explain the basics of the different processes of color perception along the visual path, from the retina in the eye to the visual cortex in the brain. For anatomical details, please refer to Sec. "Anatomy of the Visual System" of this Wikibook.
Color Perception at the Retina
All colors that can be discriminated by humans can be produced by the mixture of just three primary (basic) colors. Inspired by this idea of color mixing, it has been proposed that color is subserved by three classes of sensors, each having a maximal sensitivity to a different part of the visible spectrum. It was first explicitly proposed in 1853 that there are three degrees of freedom in normal color matching. This was later confirmed in 1886 (with results remarkably close to those of recent studies).
These proposed color sensors are the so-called cones. (Note: In this chapter, we will only deal with cones. Rods contribute to vision only at low light levels. Although they are known to have an effect on color perception, their influence is very small and can be ignored here.) Cones are one of the two types of photoreceptor cells found in the retina, with a significant concentration of them in the fovea. The Table below lists the three types of cone cells. These are distinguished by different types of photopigment (cone opsin). Their corresponding absorption curves are shown in the Figure below.
|Name||Higher sensitivity to color||Absorption curve peak [nm]|
|S, SWS, B||Blue||420|
|M, MWS, G||Green||530|
|L, LWS, R||Red||560|
Although no consensus has been reached for naming the different cone types, the most widely utilized designations refer either to their action spectra peak or to the color to which they are sensitive themselves (red, green, blue). In this text, we will use the S-M-L designation (for short, medium, and long wavelength), since these names are more appropriately descriptive. The blue-green-red nomenclature is somewhat misleading, since all types of cones are sensitive to a large range of wavelengths.
An important feature of the three cone types is their relative distribution in the retina. It turns out that the S-cones present a relatively low concentration through the retina, being completely absent in the most central area of the fovea. Actually, they are too widely spaced to play an important role in spatial vision, although they are capable of mediating weak border perception. The fovea is dominated by L- and M-cones. The proportion of the latter two is usually measured as a ratio. Different values have been reported for the L/M ratio, ranging from 0.67 up to 2, the latter being the most accepted. Why L-cones almost always outnumber the M-cones remains unclear. Surprisingly, the relative cone ratio has almost no significant impact on color vision. This clearly shows that the brain is plastic, capable of making sense out of whatever cone signals it receives.
It is also important to note the overlapping of the L- and M-cone absorption spectra. While the S-cone absorption spectrum is clearly separated, the L- and M-cone peaks are only about 30 nm apart, their spectral curves significantly overlapping as well. This results in a high correlation in the photon catches of these two cone classes. This is explained by the fact that in order to achieve the highest possible acuity at the center of the fovea, the visual system treats L- and M-cones equally, not taking into account their absorption spectra. Therefore, any kind of difference leads to a deterioration of the luminance signal. In other words, the small separation between L- and M-cone spectra might be interpreted as a compromise between the needs for high-contrast color vision and high-acuity luminance vision. This is congruent with the lack of S-cones in the central part of the fovea, where visual acuity is highest. Furthermore, the close spacing of L- and M-cone absorption spectra might also be explained by their genetic origin. Both cone types are assumed to have evolved "recently" (about 35 million years ago) from a common ancestor, while the S-cones presumably split off from the ancestral receptor much earlier.
The spectral absorption functions of the three different types of cone cells are the hallmark of human color vision. This theory solved a long-known problem: although we can see millions of different colors (humans can distinguish between 7 and 10 million different colors), our retinas simply do not have enough space to accommodate an individual detector for every color at every retinal location.
From the Retina to the Brain
The signals that are transmitted from the retina to higher levels are not simple point-wise representations of the receptor signals, but rather consist of sophisticated combinations of the receptor signals. The objective of this section is to provide a brief overview of the paths that some of this information takes.
Once the optical image on the retina is transduced into chemical and electrical signals in the photoreceptors, the amplitude-modulated signals are converted into frequency-modulated representations at the ganglion-cell and higher levels. In these neural cells, the magnitude of the signal is represented in terms of the number of spikes of voltage per second fired by the cell rather than by the voltage difference across the cell membrane. In order to explain and represent the physiological properties of these cells, we will find the concept of receptive fields very useful.
A receptive field is a graphical representation of the area in the visual field to which a given cell responds. Additionally, the nature of the response is typically indicated for various regions in the receptive field. For example, we can consider the receptive field of a photoreceptor as a small circular area representing the size and location of that particular receptor's sensitivity in the visual field. The Figure below shows exemplary receptive fields for ganglion cells, typically in a center-surround antagonism. The left receptive field in the figure illustrates a positive central response (known as on-center). This kind of response is usually generated by a positive input from a single cone surrounded by a negative response generated from several neighboring cones. Therefore, the response of this ganglion cell would be made up of inputs from various cones with both positive and negative signs. In this way, the cell not only responds to points of light, but serves as an edge (or more correctly, a spot) detector. In analogy to computer vision terminology, we can think of the ganglion cell responses as the output of a convolution with an edge-detector kernel. The right receptive field in the figure illustrates a negative central response (known as off-center), which is equally likely. Usually, on-center and off-center cells will occur at the same spatial location, fed by the same photoreceptors, resulting in an enhanced dynamic range.
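The computer-vision analogy can be made concrete with a difference-of-Gaussians kernel, a common stand-in for a center-surround receptive field. This choice of kernel, the function names, and all sizes below are assumptions made for illustration; the response is evaluated at a single location rather than by convolving a whole image.

```python
import numpy as np

def dog_kernel(size=21, sigma_center=1.0, sigma_surround=3.0):
    """On-center kernel: narrow positive center minus broad negative surround."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = x**2 + y**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    # Each part is normalized to sum to 1, so the kernel as a whole sums to 0.
    return center / center.sum() - surround / surround.sum()

def cell_response(kernel, patch):
    """Weighted sum of the light falling on the receptive field."""
    return float(np.sum(kernel * patch))

on_center = dog_kernel()
off_center = -on_center

uniform_light = np.ones((21, 21))
small_spot = np.zeros((21, 21))
small_spot[9:12, 9:12] = 1.0   # 3x3 spot of light on the field center

on_uniform = cell_response(on_center, uniform_light)
on_spot = cell_response(on_center, small_spot)
off_spot = cell_response(off_center, small_spot)
```

Because the positive and negative parts of this kernel are balanced, uniform light produces essentially no response, while a small central spot excites the on-center kernel and inhibits the off-center one.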
The lower Figure shows that in addition to spatial antagonism, ganglion cells can also have spectral opponency. For instance, the left part of the lower figure illustrates a red-green opponent response with the center fed by positive input from an L-cone and the surrounding fed by a negative input from M-cones. On the other hand, the right part of the lower figure illustrates the off-center version of this cell. Hence, before the visual information has even left the retina, processing has already occurred, with a profound effect on color appearance. There are other types and varieties of ganglion cell responses, but they all share these basic concepts.
On their way to the primary visual cortex, ganglion cell axons gather to form the optic nerve, which projects to the lateral geniculate nucleus (LGN) in the thalamus. Coding in the optic nerve is highly efficient, keeping the number of nerve fibers to a minimum (limited by the size of the optic nerve) and thereby also keeping the retinal blind spot as small as possible (approximately 5° wide by 7° high). Furthermore, the presented ganglion cells would have no response to uniform illumination, since the positive and negative areas are balanced. In other words, the transmitted signals are decorrelated. For example, information from neighboring parts of natural scenes is highly correlated spatially and therefore highly predictable. Lateral inhibition between neighboring retinal ganglion cells minimizes this spatial correlation, thereby improving efficiency. We can see this as a process of image compression carried out in the retina.
Given the overlapping of the L- and M-cone absorption spectra, their signals are also highly correlated. In this case, coding efficiency is improved by combining the cone signals in order to minimize said correlation. We can understand this more easily using Principal Component Analysis (PCA). PCA is a statistical method used to reduce the dimensionality of a given set of variables by transforming the original variables to a set of new variables, the principal components (PCs). The first PC accounts for a maximal amount of total variance in the original variables, the second PC accounts for a maximal amount of variance that was not accounted for by the first component, and so on. In addition, PCs are linearly independent and orthogonal to each other in the parameter space. PCA's main advantage is that only a few of the strongest PCs are enough to cover the vast majority of system variability. This scheme has been used with the cone absorption functions and even with naturally occurring spectra. The PCs that were found in the space of cone excitations produced by natural objects are 1) a luminance axis where the L- and M-cone signals are added (L+M), 2) the difference of the L- and M-cone signals (L-M), and 3) a color axis where the S-cone signal is differenced with the sum of the L- and M-cone signals (S-(L+M)). These channels, derived from a mathematical/computational approach, coincide with the three retino-geniculate channels discovered in electrophysiological experiments. Using these mechanisms, redundant visual information is eliminated in the retina.
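The PCA argument can be tried on synthetic data. The sketch below does not use measured cone excitations: it invents L, M and S signals in which L and M share most of their variance, which is the only property taken from the text. The function name `cone_pca` and every numeric value are assumptions.

```python
import numpy as np

def cone_pca(n_samples=5000, seed=0):
    """PCA of synthetic, strongly correlated L/M signals plus an S signal."""
    rng = np.random.default_rng(seed)
    shared = rng.normal(0.0, 1.0, n_samples)             # common drive
    l_cone = shared + rng.normal(0.0, 0.1, n_samples)
    m_cone = shared + rng.normal(0.0, 0.1, n_samples)
    s_cone = 0.3 * shared + rng.normal(0.0, 0.5, n_samples)
    signals = np.column_stack([l_cone, m_cone, s_cone])
    signals = signals - signals.mean(axis=0)
    covariance = np.cov(signals, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(covariance)
    order = np.argsort(eigenvalues)[::-1]                # strongest PC first
    return eigenvalues[order], eigenvectors[:, order]

variances, components = cone_pca()
# Rows of `components` are the L, M, S loadings; columns are the PCs.
```

In this toy example the strongest component weights L and M with the same sign (an L+M-like axis), one component weights them with opposite signs (L-M-like), and one opposes S to L and M (S-(L+M)-like). The ordering of the last two depends on the invented variances and need not match the ordering reported for natural spectra.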
There are three channels that actually communicate this information from the retina through the ganglion cells to the LGN. They differ not only in their chromatic properties, but also in their anatomical substrate. These channels pose important limitations for basic color tasks, such as detection and discrimination.
In the first channel, the output of L- and M-cones is transmitted synergistically to diffuse bipolar cells and then to cells in the magnocellular layers (M-) of the LGN (not to be confused with the M-cones of the retina). The receptive fields of the M-cells are composed of a center and a surround, which are spatially antagonistic. M-cells have high contrast sensitivity for luminance stimuli, but they show no response at some combination of L-M opponent inputs. However, because the null points of different M-cells vary slightly, the population response is never really zero. This property is actually passed on to cortical areas with predominant M-cell inputs.
The parvocellular pathway (P-) originates with the individual outputs from L- or M-cones to midget bipolar cells. These provide input to retinal P-cells. In the fovea, the receptive field centers of P-cells are formed by single L- or M-cones. The structure of the P-cell receptive field surround is still debated. However, the most accepted theory states that the surround consists of a specific cone type, resulting in a spatially opponent receptive field for luminance stimuli. Parvocellular layers contribute about 80% of the total projections from the retina to the LGN.
Finally, the recently discovered koniocellular pathway (K-) carries mostly signals from S-cones. Groups of this type of cones project to special bipolar cells, which in turn provide input to specific small ganglion cells. These are usually not spatially opponent. The axons of the small ganglion cells project to thin layers of the LGN (adjacent to parvocellular layers).
While the ganglion cells do terminate at the LGN (making synapses with LGN cells), there appears to be a one-to-one correspondence between ganglion cells and LGN cells. The LGN appears to act as a relay station for the signals. However, it probably serves some visual function, since there are neural projections from the cortex back to the LGN that could serve as some type of switching or adaptation feedback mechanism. The axons of LGN cells project to visual area one (V1) in the visual cortex in the occipital lobe.
Color Perception at the Brain
In the cortex, the projections from the magno-, parvo-, and koniocellular pathways end in different layers of the primary visual cortex. The magnocellular fibers innervate principally layer 4Cα and layer 6. Parvocellular neurons project mostly to 4Cβ, and layers 4A and 6. Koniocellular neurons terminate in the cytochrome oxidase (CO-) rich blobs in layers 1, 2, and 3.
Once in the visual cortex, the encoding of visual information becomes significantly more complex. In the same way the outputs of various photoreceptors are combined and compared to produce ganglion cell responses, the outputs of various LGN cells are compared and combined to produce cortical responses. As the signals advance further up in the cortical processing chain, this process repeats itself with a rapidly increasing level of complexity to the point that receptive fields begin to lose meaning. However, some functions and processes have been identified and studied in specific regions of the visual cortex.
In the V1 region (striate cortex), double opponent neurons - neurons that have their receptive fields both chromatically and spatially opposite with respect to the on/off regions of a single receptive field - compare color signals across the visual space. They constitute between 5 and 10% of the cells in V1. Their coarse size and small percentage match the poor spatial resolution of color vision. Furthermore, they are not sensitive to the direction of moving stimuli (unlike some other V1 neurons) and, hence, are unlikely to contribute to motion perception. However, given their specialized receptive field structure, this kind of cell is the neural basis for color contrast effects, as well as an efficient means to encode color itself. Other V1 cells respond to other types of stimuli, such as oriented edges, various spatial and temporal frequencies, particular spatial locations, and combinations of these features, among others. Additionally, we can find cells that linearly combine inputs from LGN cells as well as cells that perform nonlinear combination. These responses are needed to support advanced visual capabilities, such as color itself.
There is substantially less information on the chromatic properties of single neurons in V2 as compared to V1. At first glance, it seems that there are no major differences in color coding between V1 and V2. One exception to this is the emergence of a new class of color-complex cell. Therefore, it has been suggested that the V2 region is involved in the elaboration of hue. However, this is still very controversial and has not been confirmed.
Following the modular concept developed after the discovery of functional ocular dominance in V1, and considering the anatomical segregation between the P-, M-, and K-pathways (described in Sec. 3), it was suggested that a specialized system within the visual cortex devoted to the analysis of color information should exist. V4 is the region that has historically attracted the most attention as the possible "color area" of the brain. This is because of an influential study that claimed that V4 contained 100 % of hue-selective cells. However, this claim has been disputed by a number of subsequent studies, some even reporting that only 16 % of V4 neurons show hue tuning. Currently, the most accepted concept is that V4 contributes not only to color, but to shape perception, visual attention, and stereopsis as well. Furthermore, recent studies have focused on other brain regions trying to find the "color area" of the brain, such as TEO and PITd. The relationship of these regions to each other is still debated. To reconcile the discussion, some use the term posterior inferior temporal (PIT) cortex to denote the region that includes V4, TEO, and PITd.
If the cortical responses of V1, V2, and V4 cells are already very complicated, the complexity of visual responses in a network of approximately 30 visual areas is enormous. Figure 4 shows a small portion of the connectivity of the different cortical areas (not cells) that have been identified.
At this stage, it becomes exceedingly difficult to explain the function of single cortical cells in simple terms. As a matter of fact, the function of a single cell might not have meaning, since the representation of various perceptions must be distributed across collections of cells throughout the cortex.
Color Vision Adaptation Mechanisms
Although researchers have been trying to explain the processing of color signals in the human visual system, it is important to understand that color perception is not a fixed process. Actually, there are a variety of dynamic mechanisms that serve to optimize the visual response according to the viewing environment. Of particular relevance to color perception are the mechanisms of dark, light, and chromatic adaptation.
Dark adaptation refers to the change in visual sensitivity that occurs when the level of illumination is decreased. The visual system response to reduced illumination is to become more sensitive, increasing its capacity to produce a meaningful visual response even when the light conditions are suboptimal.
Figure 5 shows the recovery of visual sensitivity after transition from an extremely high illumination level to complete darkness. First, the cones become gradually more sensitive, until the curve levels off after a couple of minutes. Visual sensitivity then stays roughly constant until approximately 10 minutes have passed. At that point, the rod system, with a longer recovery time, has recovered enough sensitivity to outperform the cones and therefore takes over control of the overall sensitivity. Rod sensitivity gradually improves as well, until it becomes asymptotic after about 30 minutes. In other words, cones are responsible for the sensitivity recovery for the first 10 minutes. Afterwards, rods outperform the cones and gain full sensitivity after approximately 30 minutes.
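The two-branch shape of such a recovery curve can be reproduced with a toy model in which each receptor system recovers exponentially and the more sensitive system sets the overall threshold. The exponential form and every parameter value below are assumptions chosen only so that the hand-over falls near 10 minutes; they are not measurements.

```python
import numpy as np

def branch_threshold(t_min, start, floor, tau_min):
    """Log threshold of one receptor system recovering exponentially."""
    t_min = np.asarray(t_min, dtype=float)
    return floor + (start - floor) * np.exp(-t_min / tau_min)

def dark_adaptation(t_min):
    """Overall log threshold plus the separate cone and rod branches."""
    cones = branch_threshold(t_min, start=8.0, floor=5.0, tau_min=1.5)
    rods = branch_threshold(t_min, start=12.0, floor=2.0, tau_min=8.0)
    # The lower threshold (more sensitive system) determines what is seen.
    return np.minimum(cones, rods), cones, rods

def rod_cone_break(t_max=40.0, step=0.01):
    """First time (in minutes) at which the rods become more sensitive."""
    t = np.arange(0.0, t_max, step)
    _, cones, rods = dark_adaptation(t)
    return float(t[np.argmax(rods < cones)])

break_time = rod_cone_break()
threshold_at_5, _, _ = dark_adaptation(5.0)
threshold_at_30, _, _ = dark_adaptation(30.0)
```

With these invented parameters the cone branch flattens within a few minutes, the rods take over a little before the 10-minute mark, and the threshold keeps falling towards the rod floor by 30 minutes.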
This is only one of several neural mechanisms produced in order to adapt to the dark lightning conditions as good as possible. Some other neural mechanisms include the well-known pupil reflex, depletion and regeneration of photopigment, gain control in retinal cells and other higher-level mechanisms, and cognitive interpretation, among others.
Light adaptation is essentially the inverse process of dark adaptation. As a matter of fact, the underlying physiological mechanisms are the same for both processes. However, it is important to consider it separately since its visual properties differ.
Light adaptation occurs when the level of illumination is increased. Therefore, the visual system must become less sensitive in order to produce useful perceptions, given the fact that there is significantly more visible light available. The visual system has a limited output dynamic range available for the signals that produce our perceptions. However, illumination levels in the real world cover a far wider range, spanning at least 10 orders of magnitude. Fortunately, we rarely need to view the entire range of illumination levels at the same time.
At high light levels, adaptation is achieved by photopigment bleaching. This scales photon capture in the receptors and protects the cone response from saturating against bright backgrounds. The mechanisms of light adaptation occur primarily within the retina. As a matter of fact, gain changes are largely cone-specific and adaptation pools signals over areas no larger than the diameter of individual cones. This points to a localization of light adaptation that may be as early as the receptors. However, there appears to be more than one site of sensitivity scaling. Some of the gain changes are extremely rapid, while others take seconds or even minutes to stabilize. Usually, light adaptation takes around 5 minutes (six times faster than dark adaptation). This might point to the influence of post-receptive sites.
Figure 6 shows examples of light adaptation. If we used a single response function to map the large range of intensities into the visual system's output, then we would only have a very small range at our disposal for a given scene. It is clear that with such a response function, the perceived contrast of any given scene would be limited and visual sensitivity to changes would be severely degraded due to signal-to-noise issues. This case is shown by the dashed line. On the other hand, solid lines represent families of visual responses. These curves map the useful illumination range in any given scene into the full dynamic range of the visual output, thus resulting in the best possible visual perception for each situation. Light adaptation can be thought of as the process of sliding the visual response curve along the illumination level axis until the optimum level for the given viewing conditions is reached.
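The idea of sliding the response curve along the illumination axis can be sketched with a saturating response function whose half-saturation point follows the adapting level. The Naka-Rushton form used here is a common modelling choice and an assumption of this sketch, as are all parameter values and function names; the text above does not prescribe a particular function.

```python
import numpy as np

def response(intensity, half_saturation, exponent=1.0):
    """Saturating response in [0, 1); equals 0.5 at the half-saturation intensity."""
    i_n = intensity ** exponent
    return i_n / (i_n + half_saturation ** exponent)

def contrast_signal(background, half_saturation, increment=0.1):
    """Change in output caused by a 10 % increment on top of the background."""
    return (response(background * (1.0 + increment), half_saturation)
            - response(background, half_saturation))

# Nine decades of illumination in arbitrary units
backgrounds = 10.0 ** np.arange(-2, 7)

# One fixed curve for all scenes versus a curve that slides with the scene
fixed = contrast_signal(backgrounds, half_saturation=100.0)
adapted = contrast_signal(backgrounds, half_saturation=backgrounds)

for b, f, a in zip(backgrounds, fixed, adapted):
    print(f"background {b:9.2e}: fixed curve {f:.6f}, adapted curve {a:.6f}")
```

With the fixed curve, the same 10 % increment produces a vanishing output change at very low and very high backgrounds, whereas the sliding curve keeps the output change constant across the whole range.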
The general concept of chromatic adaptation consists in the variation of the height of the three cone spectral responsivity curves. This adjustment arises because light adaptation occurs independently within each class of cone. A specific formulation of this hypothesis is known as the von Kries adaptation. This hypothesis states that the adaptation response takes place in each of the three cone types separately and is equivalent to multiplying their fixed spectral sensitivities by a scaling constant. If the scaling weights (also known as von Kries coefficients) are inversely proportional to the absorption of light by each cone type (i.e. a lower absorption will require a larger coefficient), then von Kries scaling maintains a constant mean response within each cone class. This provides a simple yet powerful mechanism for maintaining the perceived color of objects despite changes in illumination. Under a number of different conditions, von Kries scaling provides a good account of the effects of light adaptation on color sensitivity and appearance.
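Von Kries scaling can be written as a per-cone gain applied to the cone signals. The sketch below uses made-up L, M and S cone responses to a white paper under two illuminants; these numbers and the function name `von_kries_adapt` are hypothetical and serve only to illustrate the scaling.

```python
import numpy as np

def von_kries_adapt(cone_signals, adapting_white):
    """Scale each cone class by the inverse of its response to the adapting white."""
    gains = 1.0 / np.asarray(adapting_white, dtype=float)   # von Kries coefficients
    return np.asarray(cone_signals, dtype=float) * gains

# Hypothetical (L, M, S) responses to the same white paper under two light sources
white_daylight = np.array([0.95, 1.00, 1.10])       # relatively more short-wavelength energy
white_incandescent = np.array([1.20, 1.00, 0.45])   # relatively more long-wavelength energy

# After adaptation the paper yields the same signal under both illuminants
paper_daylight = von_kries_adapt(white_daylight, white_daylight)
paper_incandescent = von_kries_adapt(white_incandescent, white_incandescent)
print(paper_daylight, paper_incandescent)

# A gray surface reflecting half as much light keeps its relative appearance too
gray_incandescent = von_kries_adapt(0.5 * white_incandescent, white_incandescent)
print(gray_incandescent)
```

The gain of the S-cone channel is smallest under the daylight example and the gain of the L-cone channel is smallest under the incandescent example, which mirrors the reduced sensitivity of the cone class that receives the most light.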
The easiest way to picture chromatic adaptation is by examining a white object under different types of illumination. For example, let's consider examining a piece of paper under daylight, fluorescent, and incandescent illumination. Daylight contains relatively far more short-wavelength energy than fluorescent light, and incandescent illumination contains relatively far more long-wavelength energy than fluorescent light. However, in spite of the different illumination conditions, the paper approximately retains its white appearance under all three light sources. This is because the S-cone system becomes relatively less sensitive under daylight (in order to compensate for the additional short-wavelength energy) and the L-cone system becomes relatively less sensitive under incandescent illumination (in order to compensate for the additional long-wavelength energy).
Since the late 20th century, restoring vision to blind people by means of artificial eye prostheses has been the goal of numerous research groups and some private companies around the world. Similar to cochlear implants, the key concept is to stimulate the visual nervous system with electric pulses, bypassing the damaged or degenerated photoreceptors on the human retina. In this chapter we will describe the basic functionality of a retinal implant, as well as the different approaches that are currently being investigated and developed. The two most common approaches to retinal implants are called “epiretinal” and “subretinal” implants, corresponding to eye prostheses located on top of or behind the retina, respectively. We will not cover approaches to restoring vision that do not involve the retina, such as the BrainPort Vision System, which aims at stimulating the tongue from visual input, cuff electrodes around the optic nerve, or stimulation implants in the primary visual cortex.
Retinal Structure and Functionality
Figure 1 depicts the schematic nervous structure of the human retina. We can differentiate between three layers of cells. The first, located furthest away from the eye lens, consists of the photoreceptors (rods and cones), whose purpose is to transduce incoming light into electrical signals that are then propagated to the intermediate layer, which is mainly composed of bipolar cells. These bipolar cells, which are connected to photoreceptors as well as to other cell types such as horizontal cells and amacrine cells, pass on the electrical signal to the retinal ganglion cells (RGCs). For a detailed description of the functionality of bipolar cells, specifically with respect to their subdivision into ON- and OFF-bipolar cells, refer to the chapter on the Visual System. The uppermost layer, consisting of RGCs, collects the electrical signals from the bipolar cells and passes them on to the thalamus via the optic nerve. From there, signals are propagated to the primary visual cortex. There are some key aspects worth mentioning about the signal processing within the human retina. First, while bipolar cells, as well as horizontal and amacrine cells, generate graded potentials, the RGCs generate action potentials instead. Further, the density of each cell type is not uniform across the retina. In the area of the fovea, there is an extremely high density of photoreceptors, and only very few photoreceptors are connected to each RGC via the intermediate layer. In the peripheral areas of the retina, the density of photoreceptors is far lower and many photoreceptors are connected to a single RGC. The latter also has direct implications for the receptive field of an RGC, which tends to increase rapidly towards the outer regions of the retina, simply because of the lower photoreceptor density and the increased number of photoreceptors connected to the same RGC.
Implant Use Case
Damage to the photoreceptor layer in the human retina can be caused by retinitis pigmentosa, age-related macular degeneration and other diseases, eventually causing the affected person to become blind. However, the rest of the visual nervous system, both inside the retina and along the visual pathway in the brain, remains intact for several years after the onset of blindness. This allows the remaining, still properly functioning retinal cells to be stimulated artificially through electrodes in order to restore visual information for the patient. A retinal prosthesis can be implanted behind the retina, in which case it is referred to as a subretinal implant. This brings the electrodes closest to the damaged photoreceptors and the still properly functioning bipolar cells, which are the real stimulation target here. (If the stimulation electrodes penetrate the choroid, which contains the blood supply of the retina, the implants are sometimes called "suprachoroidal" implants.) Alternatively, the implant may be placed on top of the retina, closest to the ganglion cell layer, aiming at stimulation of the RGCs instead. These implants are referred to as epiretinal implants. Both approaches are currently being investigated by several research groups. They both have significant advantages as well as drawbacks. Before we treat them separately in more detail, we describe some key challenges that need consideration in both cases.
A big challenge for retinal implants comes from the extremely high spatial density of nerve cells in the human retina. There are roughly 125 million photoreceptors (rods and cones) and 1.5 million ganglion cells in the human retina, as opposed to only approximately 15000 hair cells in the human cochlea. In the fovea, where the highest visual acuity is achieved, as many as 150000 cones are located within one square millimeter. While there are far fewer RGCs in total than photoreceptors, their density in the foveal area is close to the density of cones, which makes it tremendously challenging to address the nerve cells with sufficiently high spatial resolution using artificial electrodes. Virtually all current scientific experiments with retinal implants use micro-electrode arrays (MEAs) to stimulate the retinal cells. High-resolution MEAs achieve an inter-electrode spacing of roughly 50 micrometers, resulting in an electrode density of 400 electrodes per square millimeter. Therefore, a one-to-one association between electrodes and photoreceptors or RGCs is impossible in the foveal area with conventional electrode technology. However, the spatial density of both photoreceptors and RGCs decreases quickly towards the outer regions of the retina, making one-to-one stimulation between electrodes and peripheral nerve cells more feasible. Another challenge is operating the electrodes within safe limits. Imposing charge densities above 0.1 mC/cm² may damage the nervous tissue. Generally, the further a cell is away from the stimulating electrode, the larger the current amplitude required to stimulate the cell. Furthermore, the lower the stimulation threshold, the smaller the electrodes can be designed and the more densely they can be placed on the MEA, thereby enhancing the spatial stimulation resolution.
The stimulation threshold is defined as the minimal stimulation strength necessary to trigger a nervous response in at least 50% of the stimulation pulses. For these reasons, a primary goal in designing retinal implants is to use as low a stimulation current as possible while still guaranteeing a reliable stimulation (i.e. generation of an action potential in the case of RGCs) of the target cell. This can be achieved either by placing the electrode as close as possible to the area of the target cell that reacts most sensitively to an applied electric field pulse, or by making the cell projections, i.e. dendrites and/or axons, grow on top of the electrode, allowing a stimulation of the cell with very low currents even if the cell body is located far away. Further, an implant fixed to the retina automatically follows the movements of the eyeball. While this entails some significant benefits, it also means that any connection to the implant (for adjusting parameters, reading out data, or providing external power for the stimulation) requires a cable that moves with the implant. As we move our eyes approximately three times a second, this exposes the cable and the connections involved to severe mechanical stress. For a device that should remain functional for an entire lifetime without external intervention, this imposes a severe challenge on the materials and technologies involved.
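The density figures quoted in the discussion of spatial resolution above can be checked with a few lines of arithmetic. The sketch assumes that the electrodes sit on a regular square grid, which the text does not state explicitly; the function and variable names are chosen for illustration only.

```python
def electrodes_per_mm2(spacing_um):
    """Electrode density of a square grid with the given centre-to-centre spacing."""
    per_mm = 1000.0 / spacing_um          # electrodes along one millimeter
    return per_mm ** 2

mea_density = electrodes_per_mm2(50.0)    # high-resolution MEA from the text
foveal_cone_density = 150000.0            # cones per square millimeter in the fovea

cones_per_electrode = foveal_cone_density / mea_density
# Grid spacing that would be needed for one electrode per foveal cone
required_spacing_um = 1000.0 / foveal_cone_density ** 0.5

print(f"MEA density: {mea_density:.0f} electrodes per mm^2")
print(f"foveal cones per electrode: {cones_per_electrode:.0f}")
print(f"spacing needed for one-to-one stimulation: {required_spacing_um:.2f} micrometers")
```

Under this assumption each electrode of a 50 micrometer grid covers several hundred foveal cones, and a one-to-one mapping would need a spacing of only a few micrometers.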
As the name already suggests, subretinal implants are visual prostheses located behind the retina. Therefore, the implant is located closest to the damaged photoreceptors, aiming at bypassing the rods and cones and stimulating the bipolar cells in the next nervous layer in the retina. The main advantage of this approach is that relatively little visual signal processing takes place between the photoreceptors and the bipolar cells, so only little processing needs to be imitated by the implant. That is, raw visual information, for example captured by a video camera, may be forwarded directly, or with only relatively rudimentary signal processing, to the MEA stimulating the bipolar cells, rendering the procedure rather simple from a signal processing point of view. However, this approach has some severe disadvantages. The high spatial resolution of photoreceptors in the human retina imposes a big challenge in developing and designing an MEA with sufficiently high stimulation resolution and therefore low inter-electrode spacing. Furthermore, the stacking of the nervous layers in z-direction (with the x-y plane tangential to the retina curvature) adds another difficulty when it comes to placing the electrodes close to the bipolar cells. With the MEA located behind the retina, there is a significant spatial gap between the electrodes and the target cells that needs to be overcome. As mentioned above, an increased distance between electrode and target cell forces the MEA to operate with higher currents, enlarging the electrode size, the number of cells within the stimulation range of a single electrode and the spatial separation between adjacent electrodes. All of this results in a decreased stimulation resolution and exposes the retina to the risk of tissue damage caused by excessive charge densities.
As shown below, one way to overcome large distances between electrodes and the target cells is to make the cells grow their projections over longer distances directly on top of the electrode.
In late 2010, a German research group, in collaboration with the private German company “Retina Implant AG”, published results from studies involving tests with subretinal implants in human subjects. A three by three millimeter microphotodiode array (MPDA) containing 1500 pixels, with each pixel consisting of an individual light-sensing photodiode and an electrode, was implanted behind the retina of three patients suffering from blindness due to macular degeneration. The pixels were located approximately 70 micrometers apart from each other, yielding a spatial resolution of roughly 160 electrodes per square millimeter or, as indicated by the authors of the paper, a visual cone angle of 15 arcmin for each electrode. It should be noted that, in contrast to implants using external video cameras to generate visual input, each pixel of the MPDA itself contains a light-sensitive photodiode, autonomously generating the electric current from the light received through the eyeball for its own associated electrode. So each MPDA pixel corresponds in its full functionality to a photoreceptor cell. This has a major advantage: since the MPDA is fixed behind the human retina, it automatically moves along when the eyeball moves. And since the MPDA itself receives the visual input to generate the electric currents for the stimulation electrodes, movements of the head or the eyeball are handled naturally and need no artificial processing. In one of the patients, the MPDA was placed directly beneath the macula, leading to superior results in experimental tests compared to the other two patients, whose MPDA was implanted further away from the center of the retina. The results achieved by the patient with the implant behind the macula were quite extraordinary. He was able to recognize letters (5-8 cm large) and read words, as well as distinguish black-white patterns with different orientations.
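The relation between pixel spacing on the retina and visual angle can be checked numerically. The conversion factor of about 0.288 mm of retina per degree of visual angle is an assumption of this sketch (a commonly used approximation for the human eye that is not given in the text), and the function names are hypothetical.

```python
MM_PER_DEGREE = 0.288   # assumed retinal distance per degree of visual angle

def retinal_distance_to_arcmin(distance_um):
    """Convert a distance on the retina (micrometers) to visual angle (arcmin)."""
    return distance_um / 1000.0 / MM_PER_DEGREE * 60.0

chip_side_mm = 3.0      # MPDA edge length from the text
n_pixels = 1500         # number of pixels from the text
pixel_pitch_um = 70.0   # pixel spacing from the text

pixel_density = n_pixels / chip_side_mm ** 2           # pixels per square millimeter
pixel_angle = retinal_distance_to_arcmin(pixel_pitch_um)
chip_field_deg = chip_side_mm / MM_PER_DEGREE           # visual field covered by the chip

print(f"{pixel_density:.0f} pixels per mm^2")
print(f"{pixel_angle:.1f} arcmin per pixel")
print(f"{chip_field_deg:.1f} degrees covered by one chip edge")
```

Under the assumed conversion factor, the 70 micrometer spacing corresponds to a visual angle close to the 15 arcmin quoted above.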
The experimental results with the MPDA implants have also drawn attention to another visual phenomenon, revealing an additional advantage of the MPDA approach over implants using external imaging devices: repeated stimulation of retinal cells quickly leads to decreased responses, suggesting that retinal neurons become inhibited after being stimulated repeatedly within a short period of time. This means that a visual input projected onto an MEA fixed on or behind the retina will result in a sensed image that quickly fades away, even though the electric stimulation of the electrodes remains constant. This is because the fixed electrodes stimulate the same cells on the retina all the time, rendering the cells less and less sensitive to a constant stimulus over time. However, the process is reversible, and the cells regain their initial sensitivity once the stimulus is absent again. So, how does an intact visual system handle this effect? Why are healthy humans able to fixate an object over time without it fading out? The human eye continuously makes small, unnoticeable movements, so that the same visual stimulus is projected onto slightly different retinal spots over time, even as we focus and fixate on some target object. This successfully circumvents the fading cell response phenomenon. With the implant serving both as photoreceptor and electrode stimulator, as is the case with the MPDA, the natural small eye adjustments can readily be used to handle this effect in a straightforward way. Other implant approaches using external visual input (e.g. from video cameras) will suffer from their projected images fading away if stimulated continuously.
Fast, artificial jittering of the camera images may not solve the problem as this external movement may not be in accordance with the eye movement and therefore, the visual cortex may interpret this simply as a wiggly or blurry scene instead of the desired steady long term projection of the fixed image. A further advantage of subretinal implants is the precise correlation between stimulated areas on the retina and perceived location of the stimulus in the visual field of the human subject. In contrast to RGCs, whose location on the retina may not directly correspond to the location of their individual receptive fields, the stimulation of a bipolar cell is perceived exactly at that point in the visual field that corresponds to the geometric location on the retina where that bipolar cell resides. A clear disadvantage of subretinal implants is the invasive surgical procedure involved.
Epiretinal implants are located on top of the retina and therefore closest to the retinal ganglion cells (RGCs). For that reason, epiretinal implants aim at stimulating the RGCs directly, bypassing not only the damaged photoreceptors, but also any intermediate neural visual processing by the bipolar, horizontal and amacrine cells. This has some advantages: first of all, the surgical procedure for an epiretinal implant is far less critical than for a subretinal implant, since the prosthesis need not be implanted from behind the eye. Also, there are far fewer RGCs than photoreceptors or bipolar cells, allowing either a more coarse-grained stimulation with increased inter-electrode distance (at least in the peripheral regions of the retina), or an electrode density even higher than the actual RGC density, allowing for more flexibility and accuracy when stimulating the cells. A study on the epiretinal stimulation of peripheral parasol cells conducted on macaque retina provides quantitative details. Parasol cells are one type of RGC, forming the second most dense visual pathway in the retina. Their main purpose is to encode the movement of objects in the visual field, thus sensing motion. The experiments were performed in vitro by placing the macaque retina tissue on a 61 electrode MEA (60 micrometer inter-electrode spacing). 25 individual parasol cells were identified and stimulated electrically while properties such as stimulation threshold and best stimulation location were analyzed. The threshold current was defined as the lowest current that triggered a spike on the target cell in 50% of the stimulus pulses (pulse duration: 50 milliseconds) and was determined by incrementally increasing the stimulation strength until a sufficient spiking response was registered. Please note two aspects: first, parasol cells as RGCs exhibit action potential behavior, as opposed to bipolar cells, which work with graded potentials.
Second, the electrodes on the MEA were used both for the stimulation pulses and for recording the spiking response from the target cells. 25 parasol cells were located on the 61 electrode MEA with an electrode density significantly higher than the parasol cell density, effectively yielding multiple electrodes within the receptive field of a single parasol cell. In addition to measuring the stimulation thresholds necessary to trigger a reliable cell response, the location of best stimulation was also determined. The location of best stimulation refers to the location of the stimulating electrode with respect to the target cell where the lowest stimulation threshold was achieved. Surprisingly, this was found not to be on the cell soma, as one would expect, but roughly 13 micrometers further down the axon path. From there on, the experiments showed the expected quadratic increase in stimulation threshold currents with increasing electrode to soma distance. The study results also showed that all stimulation thresholds were well below the safety limits (around 0.05 mC/cm², as opposed to 0.1 mC/cm² being a (low) safety limit) and that the cell response to a stimulation pulse was fast (0.2 ms latency on average) and precise (small variance in latency). Further, the higher density of electrodes compared to parasol cells allowed a reliable addressing of individual cells by the stimulation of the appropriate electrode, while preventing neighboring cells from also firing a spike.
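The threshold procedure described above (step the current up until a spike is evoked in at least 50% of the pulses) can be sketched with a simulated cell. The sigmoidal cell model, its parameters, the current units and the function names are all hypothetical; only the 50% criterion and the incremental search come from the text.

```python
import numpy as np

def spike_probability(current, half_current=0.8, slope=0.05):
    """Hypothetical cell model: probability that a single pulse evokes a spike."""
    return 1.0 / (1.0 + np.exp(-(current - half_current) / slope))

def find_threshold(currents, n_pulses=200, criterion=0.5, seed=0):
    """Step the current up; return the first level whose spike rate meets the criterion."""
    rng = np.random.default_rng(seed)
    for current in currents:
        # Deliver n_pulses identical pulses and record which ones evoke a spike
        spikes = rng.random(n_pulses) < spike_probability(current)
        if spikes.mean() >= criterion:
            return float(current)
    return None   # no tested level reached the criterion

currents = np.arange(0.1, 2.0, 0.05)   # increasing stimulation strengths, arbitrary units
threshold = find_threshold(currents)
print(f"estimated threshold: {threshold:.2f} (arbitrary units)")
```

Because the response is probabilistic, the estimate depends on the step size and on the number of pulses per level; finer steps and more pulses bring it closer to the current at which the model's spike probability is exactly 0.5.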
Overview of Alternative Technical Approaches
In this section, we give a short overview of some alternative approaches and technologies that are currently under research.
Classic MEAs contain electrodes made out of titanium nitride or indium tin oxide, exposing the implant to severe issues with long-term biocompatibility. A promising alternative to metallic electrodes consists of carbon nanotubes (CNTs), which combine a number of very advantageous properties. First, they are fully biocompatible since they are made from pure carbon. Second, their robustness makes them suited for long-term implantation, a key property for visual prostheses. Further, their good electric conductivity allows them to operate as electrodes. And finally, their very porous nature leads to extremely large contact surfaces, encouraging the neurons to grow on top of the CNTs, thus improving the neuron to electrode contact and lowering the stimulation currents necessary to elicit a cell response. However, CNT electrodes have only emerged recently and at this point only few scientific results are available.
Wireless Implant Approaches
One of the main technical challenges with retinal implants relates to the cabling that connects the MEA with the external stimuli, the power supply and the control signals. The mechanical stress on the cabling affects its long-term stability and durability, imposing a big challenge on the materials used. Wireless technologies could be a way to avoid any cabling between the actual retinal implant and external devices. The energy of the incoming light through the eye is not sufficient to trigger neural responses. Therefore, to make a wireless implant work, extra power must be provided to the implant. An approach presented by the Stanford School of Medicine uses an infrared LCD display to project the scene captured by a video camera onto goggles, reflecting infrared pulses onto the chip located on the retina. The chip also uses a photovoltaic rechargeable battery to provide the power required to convert the IR light into sufficiently strong stimulation pulses. Similar to the subretinal approach, this also allows the eye to naturally fixate and focus on objects in the scene, as the eye is free to move, allowing different parts of the IR image on the goggles to be projected onto different areas of the chip located on the retina. Instead of infrared light, inductive coils can also be used to transmit electrical power and data signals from external devices to the implant on the retina. This technology has been successfully implemented and tested in the EPIRET3 retinal implant. However, those tests were more of a proof of concept, as only the patient’s ability to sense a visual signal upon applying a stimulus on the electrodes was tested.
Directed Neural Growth
One way to allow a very precise neural stimulation with extremely low currents, even over longer distances, is to make the neurons grow their projections onto the electrode. Neural growth can be encouraged by applying the right chemical solution, which can be achieved by applying a layer of laminin onto the MEA’s surface. In order to control the neural paths, the laminin is not applied uniformly across the MEA surface, but in narrow paths forming a pattern corresponding to the connections the neurons should form. This process of applying the laminin in a precise, patterned way is called “microcontact printing”. A picture of what these laminin paths look like is shown in Figure 5. The successful directed neural growth achieved with this method allowed significantly lower stimulation currents to be applied compared to classic electrode stimulation, while still reliably triggering a neural response. Furthermore, the stimulation threshold no longer follows the quadratic increase with respect to electrode-soma distance, but remains constant at the same low level even for longer distances (>200 micrometers).
Other Visual Implants
In addition to the stimulation of the retina, other elements of the visual system can also be stimulated.
Stimulation of the Optic Nerve
With cuff-electrodes, typically with only a few segments.
- Little trauma to the eye.
- Not very specific.
Dr. Mohamad Sawan, Professor and Researcher at the Polystim Neurotechnologies Laboratory at the Ecole Polytechnique de Montreal, has been working on a visual prosthesis to be implanted into the human cortex. The basic principle of Dr. Sawan’s technology consists in stimulating the visual cortex by implanting a silicon microchip on a network of electrodes made of biocompatible materials, in which each electrode injects a stimulating electrical current in order to make a series of luminous points (an array of pixels) appear in the field of vision of the sightless person. This system is composed of two distinct parts: the implant and an external controller. The implant lodged in the visual cortex wirelessly receives dedicated data and energy from the external controller. This implantable part contains all the circuits necessary to generate the electrical stimuli and to oversee the changing microelectrode/biological tissue interface. The battery-operated external controller, on the other hand, comprises a micro-camera which captures the image, as well as a processor and a command generator which process the imaging data to select and translate the captured images, to generate and manage the electrical stimulation process, and to oversee the implant. The external controller and the implant exchange data in both directions via a powerful transcutaneous radio frequency (RF) link. The implant is powered the same way. (Wikipedia)
- Much larger area for stimulation: a 2° radius of the central retinal visual field corresponds to 1 mm² on the retina, but to 2100 mm² in the visual cortex.
- Implantation is more invasive.
- Parts of the visual field lie in a sulcus and are very hard to reach.
- Stimulation can trigger seizures.
Computer Simulation of the Visual System
In this section, an overview of the simulation of the processing done by the early levels of the visual system will be given. The implementation that reproduces the action of the visual system will be done with MATLAB and its toolboxes. The processing done by the early visual system was discussed in the previous section and can be summarized, together with some of the functions performed, in the following schematic overview. A good description of the image processing can be found in (Cormack 2000).
As we can see in the above overview, different stages of the image processing have to be considered to simulate the response of the visual system to a stimulus. The next section will therefore give a brief discussion of Image Processing. But first of all we will be concerned with the Simulation of Sensory Organ Components.
Simulating Sensory Organ Components
Anatomical Parameters of the Eye
The average eye has an anterior corneal radius of curvature of 7.8 mm, and an aqueous refractive index of 1.336. The length of the eye is 24.2 mm. The iris is approximately flat, and the edge of the iris (also called limbus) has a radius of 5.86 mm.
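As a quick plausibility check of these parameters, the refractive power of a single spherical surface separating air from the aqueous can be computed as P = (n - 1) / r. Treating the cornea as one such surface is a simplifying assumption of this sketch, and the function and variable names are not taken from the text.

```python
def surface_power_diopters(radius_m, n_inside, n_outside=1.0):
    """Refractive power of a single spherical refracting surface, in diopters."""
    return (n_inside - n_outside) / radius_m

corneal_radius_m = 7.8e-3     # anterior corneal radius of curvature from the text
n_aqueous = 1.336             # aqueous refractive index from the text
eye_length_mm = 24.2          # length of the eye from the text

corneal_power = surface_power_diopters(corneal_radius_m, n_aqueous)
# Distance behind the surface at which it alone would focus parallel light
focal_length_mm = 1000.0 * n_aqueous / corneal_power

print(f"power of the corneal surface: {corneal_power:.1f} diopters")
print(f"image-side focal length: {focal_length_mm:.1f} mm (eye length {eye_length_mm} mm)")
```

In this simplified model the focal point of the corneal surface alone lies behind the retina, which is consistent with the eye needing additional refractive power from the lens.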
Optics of the Eyeball
The optics of the eyeball are characterized by its 2-D spatial impulse response function, the Point Spread Function (PSF), which is expressed as a function of the radial distance $r$, in minutes of arc, from the center of the image.
Obviously, the effect on a given digital image depends on the distance of that image from your eyes. As a simple place-holder, substitute this filter with a Gaussian filter with height 30, and with a standard deviation of 1.5.
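In one dimension, a Gaussian with height $a$ and standard deviation $\sigma$ is described by

$$g(x) = a \cdot \exp\left(-\frac{x^2}{2\sigma^2}\right)$$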
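The Gaussian place-holder filter can be sketched in Python as follows. Here the value 30 mentioned above is interpreted as the extent of the filter in samples and rounded to 31 so that the kernel has a centre sample; this interpretation, the normalisation of the kernel to unit sum, and the function names are assumptions of this sketch.

```python
import numpy as np

def gaussian_kernel_1d(size=31, sigma=1.5):
    """Sampled 1-D Gaussian, normalised so that its samples sum to 1."""
    x = np.arange(size) - (size - 1) / 2.0
    kernel = np.exp(-x**2 / (2.0 * sigma**2))
    return kernel / kernel.sum()

def blur(image, size=31, sigma=1.5):
    """Separable Gaussian blur: filter every row, then every column.

    Each side of the image must be at least `size` pixels long.
    """
    kernel = gaussian_kernel_1d(size, sigma)
    rows = np.apply_along_axis(np.convolve, 1, image, kernel, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="same")

# A single bright point: the blurred result is the filter's point spread function
image = np.zeros((64, 64))
image[32, 32] = 1.0
blurred = blur(image)
print(blurred.shape, blurred.max())
```

Because a 2-D Gaussian is separable, two passes with the 1-D kernel give the same result as one pass with the full 2-D kernel while requiring far fewer multiplications.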
Activity of Ganglion Cells
- temporal response
- effect of wavelength (especially for the cones)
- opening of the iris
- sampling and distribution of photo receptors
- bleaching of the photo-pigment
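Ignoring these effects, we can approximate the response of ganglion cells with a Difference of Gaussians (DOG, Wikipedia)

$$f(x; \sigma_1, \sigma_2) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) - \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_2^2}\right)$$

where $\sigma_1$ and $\sigma_2$ are the standard deviations of the two Gaussians.
The source code for a Python implementation is available under .
The values of $\sigma_1$ and $\sigma_2$ have a ratio of approximately 1:1.6, but vary as a function of eccentricity. For midget cells (or P-cells), the Receptive Field Size (RFS) is approximately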
where the RFS is given in arcmin, and the Eccentricity in mm distance from the center of the fovea (Cormack 2000).
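A Difference of Gaussians with the 1:1.6 ratio mentioned above can be sketched in one dimension as follows. The use of unit-area Gaussians, the choice of a centre width of 1, and the function names are assumptions made for this illustration.

```python
import numpy as np

def gaussian(x, sigma):
    """Gaussian with unit area and standard deviation sigma."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def difference_of_gaussians(x, sigma_center, ratio=1.6):
    """Narrow excitatory centre minus a wider inhibitory surround."""
    return gaussian(x, sigma_center) - gaussian(x, ratio * sigma_center)

x = np.linspace(-10.0, 10.0, 2001)      # position in units of the centre width
dog = difference_of_gaussians(x, sigma_center=1.0)

dx = x[1] - x[0]
print(f"response at the centre: {dog[1000]:.3f}")
print(f"most negative response: {dog.min():.3f}")
print(f"area under the curve:   {dog.sum() * dx:.6f}")
```

The profile is positive at the centre, negative in the surround, and has an area close to zero, so a filter of this shape responds to local contrast but hardly at all to uniform illumination.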
Activity of simple cells in the primary visual cortex (V1)
Again ignoring temporal properties, the activity of simple cells in the primary visual cortex (V1) can be modeled with the use of Gabor filters (Wikipedia). A Gabor filter is a linear filter whose impulse response is defined by a harmonic function (sinusoid) multiplied by a Gaussian function. The Gaussian function causes the amplitude of the harmonic function to diminish away from the origin, but near the origin, the properties of the harmonic function dominate

$$g(x, y; \lambda, \theta, \psi, \sigma, \gamma) = \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cos\left(2\pi \frac{x'}{\lambda} + \psi\right)$$

with $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$.
In this equation, $\lambda$ represents the wavelength of the cosine factor, $\theta$ represents the orientation of the normal to the parallel stripes of a Gabor function (Wikipedia), $\psi$ is the phase offset, $\sigma$ is the sigma of the Gaussian envelope, and $\gamma$ is the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.
Gabor-like functions arise naturally, simply from the statistics of everyday scenes . An example how even the statistics of a simple image can lead to the emergence of Gabor-like receptive fields, written in Python, is presented in ; and a (Python-)demonstration of the effects of filtering an image with Gabor-functions can be found at .
This is an example implementation in MATLAB:
function gb = gabor_fn(sigma,theta,lambda,psi,gamma) sigma_x = sigma; sigma_y = sigma/gamma; % Bounding box nstds = 3; xmax = max(abs(nstds*sigma_x*cos(theta)),abs(nstds*sigma_y*sin(theta))); xmax = ceil(max(1,xmax)); ymax = max(abs(nstds*sigma_x*sin(theta)),abs(nstds*sigma_y*cos(theta))); ymax = ceil(max(1,ymax)); xmin = -xmax; ymin = -ymax; [x,y] = meshgrid(xmin:0.05:xmax,ymin:0.05:ymax); % Rotation x_theta = x*cos(theta) + y*sin(theta); y_theta = -x*sin(theta) + y*cos(theta); gb = exp(-.5*(x_theta.^2/sigma_x^2+y_theta.^2/sigma_y^2)).* cos(2*pi/lambda*x_theta+psi); end
And an equivalent Python implementation would be:

    import numpy as np
    import matplotlib.pyplot as mp

    def gabor_fn(sigma=1, theta=1, g_lambda=4, psi=2, gamma=1):
        # Calculates the Gabor function with the given parameters
        sigma_x = sigma
        sigma_y = sigma/gamma

        # Bounding box:
        nstds = 3
        xmax = max(abs(nstds*sigma_x * np.cos(theta)), abs(nstds*sigma_y * np.sin(theta)))
        ymax = max(abs(nstds*sigma_x * np.sin(theta)), abs(nstds*sigma_y * np.cos(theta)))

        xmax = np.ceil(max(1, xmax))
        ymax = np.ceil(max(1, ymax))

        xmin = -xmax
        ymin = -ymax

        numPts = 201
        (x, y) = np.meshgrid(np.linspace(xmin, xmax, numPts), np.linspace(ymin, ymax, numPts))

        # Rotation
        x_theta = x * np.cos(theta) + y * np.sin(theta)
        y_theta = -x * np.sin(theta) + y * np.cos(theta)

        gb = np.exp(-0.5 * (x_theta**2/sigma_x**2 + y_theta**2/sigma_y**2)) * \
             np.cos(2*np.pi/g_lambda*x_theta + psi)
        return gb

    if __name__ == '__main__':
        # Main function: calculate Gabor function for default parameters and show it
        gaborValues = gabor_fn()
        mp.imshow(gaborValues)
        mp.colorbar()
        mp.show()
One major technical tool to understand is the way a computer handles images. We have to know how images can be edited and which techniques are available to rearrange them.
For a computer, an image is nothing more than a large number of little squares, called "pixels". In a grayscale image, each pixel carries a number n, often with $0 \le n \le 255$. This number n represents the exact gray level of that square in the image. This means that in a grayscale image we can use 256 different gray levels, where 255 means a white spot and 0 means the square is black. In this scheme, every pixel uses exactly 1 byte (or 8 bits) of memory (due to the binary system of a computer: $2^8 = 256$). We could use more than 256 levels of gray simply by spending more memory per pixel, but remember that this can become costly for huge images. Furthermore, quite often the display device (e.g. your monitor) cannot show more than these 256 gray levels.
Representing a colour image is only slightly more complicated than the grayscale picture. All you have to know is that the computer works with an additive colour mixture of the three primary colours Red, Green and Blue. These are the so-called RGB colours.
These images are also stored as pixels. But now every pixel has to hold 3 values between 0 and 255, one value for every colour. So now we have $256^3$ = 16,777,216 different colours which can be represented. As with grayscale images, no colour means black, and all colours at full intensity means white. That means the colour (0,0,0) is black, whereas (0,0,255) means blue and (255,255,255) is white.
WARNING - There are two common, but different ways to describe the location of a point in 2 dimensions:
1) The x/y notation, with x typically pointing to the right
2) The row/column notation
Carefully watch out which coordinates you are using to describe your data, as the two descriptions are not consistent!
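As a minimal sketch of the points above, the following Python snippet stores a tiny grayscale image and a tiny RGB image as NumPy arrays and shows how the x/y notation maps onto row/column indexing. The array contents and the helper `pixel_at_xy` are made up for illustration and are not part of any library.

```python
import numpy as np

# A tiny grayscale image: 2 rows, 3 columns, one byte (0..255) per pixel.
gray = np.array([[0, 128, 255],
                 [64, 192, 32]], dtype=np.uint8)

# An RGB image of the same size: three values (R, G, B) per pixel.
rgb = np.zeros((2, 3, 3), dtype=np.uint8)   # (0, 0, 0) everywhere -> black
rgb[0, 2] = (0, 0, 255)                     # row 0, column 2 -> blue
rgb[1, 0] = (255, 255, 255)                 # row 1, column 0 -> white


def pixel_at_xy(image, x, y):
    """Return the pixel at horizontal position x and vertical position y.

    Arrays are indexed [row, column], i.e. [y, x], so the order is swapped.
    """
    return image[y, x]


print(gray.shape)                    # (rows, columns)
print(gray.nbytes)                   # one byte per pixel
print(pixel_at_xy(gray, x=2, y=0))   # the same pixel as gray[0, 2]
print(2**8, 256**3)                  # number of gray levels and of RGB colours
```

Note that in this array layout the row index grows downwards, so the "y" axis of the x/y notation points down rather than up.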
In many technical applications, we find some primitive basis in which we can easily describe features. Filters are easiest to understand in the 1-dimensional case, and the same ideas can then be used to modify images. The so-called "Savitzky-Golay filter" smooths incoming signals. The filter was described in 1964 by Abraham Savitzky and Marcel J. E. Golay. It is a finite impulse response (FIR) filter.
For better understanding, let's look at an example. In 1D we usually deal with vectors. We call one such given vector x, with $x = (x_1, x_2, \ldots, x_n)$. Our purpose is to smooth that vector x. To do so, all we need is another vector $w = (w_1, w_2, \ldots, w_m)$ with $m < n$, which we call a weight vector.
With $y_k = \sum_{i=1}^{m} w_i \, x_{k+i-1}$ for $k = 1, \ldots, n-m+1$ we now have a smoothed vector y. This vector is smoother than the vector before, because each entry is only a weighted average over a few entries of the original vector. This means that each new entry depends on some entries to the left and right of the entry to be smoothed. One major drawback of this approach is that the new vector y has only n-m+1 entries instead of the n entries of the original vector x.
Plotting this new vector shows the same overall shape as before, just with smaller fluctuations. The underlying trend of the data is preserved, while the rapid variations are reduced.
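The smoothing step can be sketched in a few lines of Python. The example below is only an illustration: it uses equal weights (a plain moving average) as the weight vector, whereas a real Savitzky-Golay filter derives its weights from a local polynomial fit. The noisy test signal and the function name `smooth` are assumptions made for this sketch.

```python
import numpy as np


def smooth(x, w):
    """Weighted moving sum of x with weight vector w (no padding)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    # np.correlate slides w over x without flipping it; mode='valid'
    # keeps only the positions where w fully overlaps x.
    return np.correlate(x, w, mode='valid')


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 100)
x = np.sin(t) + 0.3 * rng.standard_normal(t.size)   # noisy signal, n = 100

w = np.ones(5) / 5          # m = 5 equal weights that sum to 1
y = smooth(x, w)

print(len(x), len(y))       # n entries in, n - m + 1 entries out
# Point-to-point fluctuation before and after smoothing
print(np.std(np.diff(x)), np.std(np.diff(y)))
```

Because only fully overlapping positions are kept, the output of this implementation is shorter than the input by m - 1 entries; padding the input at both ends is a common way to avoid this.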
Going from the 1D case to the 2D case is done by simply turning the vectors into matrices. As already mentioned, for a computer or for a software tool such as MATLAB, a gray-level image is nothing more than a huge matrix filled with natural numbers, often between 0 and 255.
The weight vector is now a weight matrix. But we still apply the filter by summing up element-wise products of the weight matrix with the corresponding part of the image.
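A minimal sketch of such a 2D filter in Python, written with explicit loops for clarity rather than speed. The function name `filter2d`, the test image and the 3x3 averaging weights are assumptions chosen for illustration.

```python
import numpy as np


def filter2d(image, weights):
    """Apply a weight matrix to a 2D image (no padding at the borders)."""
    image = np.asarray(image, dtype=float)
    weights = np.asarray(weights, dtype=float)
    rows, cols = image.shape
    k_rows, k_cols = weights.shape
    out = np.zeros((rows - k_rows + 1, cols - k_cols + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Element-wise product of the weights with the image patch,
            # summed up to give one output pixel.
            patch = image[r:r + k_rows, c:c + k_cols]
            out[r, c] = np.sum(patch * weights)
    return out


image = np.zeros((7, 7))
image[3, 3] = 255                   # a single white pixel on black
weights = np.ones((3, 3)) / 9       # 3x3 averaging (blurring) weights

blurred = filter2d(image, weights)
print(blurred.shape)                # smaller than the input, as in 1D
print(blurred.max())                # the white pixel is spread over 9 pixels
```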
Dilation and Erosion
Linear filters as seen before are commutative. Quoting Wikipedia: "One says that x commutes with y under ∗ if: x ∗ y = y ∗ x."
In other words, it does not matter in which sequence different linear filters are applied. E.g. if a Savitzky-Golay filter is applied to some data, and then a second Savitzky-Golay filter for calculating the first derivative, the result is the same if the sequence of filters is reversed. There even exists a single filter which does the same as the two filters applied in succession.
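This property can be checked numerically. The sketch below uses two simple stand-in filters, a moving average for smoothing and a central difference for the derivative, instead of actual Savitzky-Golay coefficients; the random test data are likewise just an assumption for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(50)

smooth_w = np.ones(5) / 5                    # smoothing weights
deriv_w = np.array([1.0, 0.0, -1.0]) / 2     # first-derivative weights

# Smooth first, then differentiate ...
a = np.convolve(np.convolve(x, smooth_w), deriv_w)
# ... or differentiate first, then smooth.
b = np.convolve(np.convolve(x, deriv_w), smooth_w)

# A single filter that does the work of both.
combined_w = np.convolve(smooth_w, deriv_w)
c = np.convolve(x, combined_w)

print(np.allclose(a, b), np.allclose(a, c))
```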
In contrast, morphological operations on an image are non-linear operations, and the final result depends on the sequence. If we think of any image, it is defined by pixels with values $x_{ij}$. Further, this image is assumed to be a black-and-white image, so we have $x_{ij} \in \{0, 1\}$ for all i, j.
To define a morphological operation we have to choose a structural element SE, for example a 3x3 matrix that covers a part M of the image.
The definition of erosion E says:
$$E(M) = \begin{cases} 0 & \text{if any pixel in } M \text{ has the value } 0 \\ 1 & \text{otherwise} \end{cases}$$
In words: if any of the pixels covered by the structural element has the value 0, the erosion sets the value assigned to M (i.e. to one specific pixel in M, typically the central one) to zero. Otherwise E(M)=1.
For the dilation D it holds that if any value in M is 1, the dilation of M, D(M), is set to 1. Otherwise D(M)=0.
Compositions of Dilation and Erosion: Opening and Closing of Images
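There are two compositions of dilation and erosion, one called opening and the other called closing. It holds:
opening = dilation ∘ erosion (an erosion followed by a dilation)
closing = erosion ∘ dilation (a dilation followed by an erosion)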
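The following Python sketch implements erosion, dilation, opening and closing for binary images with a 3x3 structural element, using the minimum and maximum over each pixel's neighbourhood. The border handling (padding with 1 for erosion and with 0 for dilation) and the test image are assumptions made for this example; image-processing libraries offer several alternatives.

```python
import numpy as np


def _neighbourhoods(image, size, pad_value):
    """Yield (row, col, patch) for every pixel of a padded binary image."""
    pad = size // 2
    padded = np.pad(image, pad, mode='constant', constant_values=pad_value)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            yield r, c, padded[r:r + size, c:c + size]


def erode(image, size=3):
    """A pixel stays 1 only if every pixel in its neighbourhood is 1."""
    out = np.zeros_like(image)
    for r, c, patch in _neighbourhoods(image, size, pad_value=1):
        out[r, c] = patch.min()
    return out


def dilate(image, size=3):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    out = np.zeros_like(image)
    for r, c, patch in _neighbourhoods(image, size, pad_value=0):
        out[r, c] = patch.max()
    return out


def opening(image, size=3):
    return dilate(erode(image, size), size)    # erosion, then dilation


def closing(image, size=3):
    return erode(dilate(image, size), size)    # dilation, then erosion


image = np.zeros((11, 11), dtype=np.uint8)
image[2:9, 2:9] = 1      # a 7 x 7 square ...
image[5, 5] = 0          # ... with a one-pixel hole
image[0, 10] = 1         # and an isolated speck in the corner

opened = opening(image)
closed = closing(image)
print(image.sum(), opened.sum(), closed.sum())
```

In this example the opening removes the isolated speck but keeps the hole, while the closing fills the hole but keeps the speck, which illustrates that the order of the two operations matters.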
- T. Haslwanter (2012). "Hodgkin-Huxley Simulations [Python]". private communications. http://work.thaslwanter.at/CSS/Code/HH_model.py.
- T. Haslwanter (2012). "Fitzhugh-Nagumo Model [Python]". private communications. http://work.thaslwanter.at/CSS/Code/Fitzhugh_Nagumo.py.
- T. Anastasio (2010). "Tutorial on Neural Systems Modeling". http://www.sinauer.com/detail.php?id=3396.
The sensory system for the sense of hearing is the auditory system. This wikibook covers the physiology of the auditory system, and its application to the most successful neurosensory prosthesis - cochlear implants. The physics and engineering of acoustics are covered in a separate wikibook, Acoustics. An excellent source of images and animations is "Journey into the world of hearing" .
The ability to hear is not found as widely in the animal kingdom as other senses like touch, taste and smell. It is restricted mainly to vertebrates and insects. Within these, mammals and birds have the most highly developed sense of hearing. The table below shows frequency ranges of humans and some selected animals:
The organ that detects sound is the ear. It acts as a receiver, collecting acoustic information and passing it through the nervous system to the brain. The ear contains structures not only for the sense of hearing, but also for the sense of balance and body position.
Humans have a pair of ears placed symmetrically on both sides of the head which makes it possible to localize sound sources. The brain extracts and processes different forms of data in order to localize sound, such as:
- the shape of the sound spectrum at the tympanic membrane (eardrum)
- the difference in sound intensity between the left and the right ear
- the difference in time-of-arrival between the left and the right ear
- the difference in time-of-arrival between reflections from the ear itself (in other words: the shape of the pinna, with its pattern of folds and ridges, captures sound waves in a way that helps to localize the sound source, especially on the vertical axis)
Healthy, young humans are able to hear sounds over a frequency range from 20 Hz to 20 kHz. We are most sensitive to frequencies between 2000 and 4000 Hz, which is the frequency range of spoken words. The frequency resolution is 0.2%, which means that one can distinguish between a tone of 1000 Hz and one of 1002 Hz. A sound at 1 kHz can be detected if it deflects the tympanic membrane (eardrum) by less than 1 Angstrom, which is less than the diameter of a hydrogen atom. This extreme sensitivity of the ear may explain why it contains the smallest bone that exists inside a human body: the stapes (stirrup). It is 0.25 to 0.33 cm long and weighs between 1.9 and 4.3 mg.
Anatomy of the Auditory System
The aim of this section is to explain the anatomy of the auditory system of humans. The chapter illustrates the composition of auditory organs in the sequence that acoustic information proceeds during sound perception.
Please note that the core information for “Sensory Organ Components” can also be found on the Wikipedia page “Auditory system”, excluding some changes like extensions and specifications made in this article. (see also: Wikipedia Auditory system)
The auditory system senses sound waves, that are changes in air pressure, and converts these changes into electrical signals. These signals can then be processed, analyzed and interpreted by the brain. For the moment, let's focus on the structure and components of the auditory system. The auditory system consists mainly of two parts:
- the ear and
- the auditory nervous system (central auditory system)
The ear is the organ where the first processing of sound occurs and where the sensory receptors are located. It consists of three parts:
- outer ear
- middle ear
- inner ear
Outer ear
Function: Gathering sound energy and amplification of sound pressure.
The folds of cartilage surrounding the ear canal (external auditory meatus, external acoustic meatus) are called the pinna. It is the visible part of the ear. Sound waves are reflected and attenuated when they hit the pinna, and these changes provide additional information that will help the brain determine the direction from which the sounds came. The sound waves enter the auditory canal, a deceptively simple tube. The ear canal amplifies sounds that are between 3 and 12 kHz. At the far end of the ear canal is the tympanic membrane (eardrum), which marks the beginning of the middle ear.
Middle ear
Function: Transmission of acoustic energy from air to the cochlea.
Sound waves traveling through the ear canal hit the tympanic membrane (tympanum, eardrum). This wave information travels across the air-filled tympanic cavity (middle ear cavity) via a series of bones: the malleus (hammer), incus (anvil) and stapes (stirrup). These ossicles act as a lever, converting the lower-pressure sound vibrations of the eardrum into higher-pressure sound vibrations at another, smaller membrane called the oval (or elliptical) window, which is one of two openings into the cochlea of the inner ear. The second opening is called the round window. It allows the fluid in the cochlea to move. The malleus articulates with the tympanic membrane via the manubrium, whereas the stapes articulates with the oval window via its footplate. Higher pressure is necessary because the inner ear beyond the oval window contains liquid rather than air. The sound is not amplified uniformly across the ossicular chain. The stapedius reflex of the middle ear muscles helps protect the inner ear from damage. The middle ear still contains the sound information in wave form; it is converted to nerve impulses in the cochlea.
Inner ear
Function: Transformation of mechanical waves (sound) into electric signals (neural signals).
Figures: Structural diagram of the cochlea; Cross section of the cochlea
The inner ear consists of the cochlea and several non-auditory structures. The cochlea is a snail-shaped part of the inner ear. It has three fluid-filled sections: scala tympani (lower gallery), scala media (middle gallery, cochlear duct) and scala vestibuli (upper gallery). The cochlea supports a fluid wave driven by pressure across the basilar membrane separating two of the sections (scala tympani and scala media). The basilar membrane is about 3 cm long and between 0.04 and 0.5 mm wide. Reissner’s membrane (vestibular membrane) separates scala media and scala vestibuli. Strikingly, one section, the scala media, contains endolymph, an extracellular fluid similar in composition to the fluid usually found inside of cells. The organ of Corti is located in this duct, and transforms mechanical waves to electric signals in neurons. The other two sections, scala tympani and scala vestibuli, are located within the bony labyrinth, which is filled with a fluid called perilymph. The chemical difference between the two fluids endolymph (in scala media) and perilymph (in scala tympani and scala vestibuli) is important for the function of the inner ear.
Organ of Corti
The organ of Corti forms a ribbon of sensory epithelium which runs lengthwise down the entire cochlea. The hair cells of the organ of Corti transform the fluid waves into nerve signals. The journey of a billion nerves begins with this first step; from here further processing leads to a series of auditory reactions and sensations.
Transition from ear to auditory nervous system
Hair cells are columnar cells, each with a bundle of 100-200 specialized cilia at the top, for which they are named. These cilia are the mechanosensors for hearing. The shorter ones are called stereocilia, and the longest one at the end of each hair cell bundle is called the kinocilium. The location of the kinocilium determines the on-direction, i.e. the direction of deflection inducing the maximum hair cell excitation. Lightly resting atop the longest cilia is the tectorial membrane, which moves back and forth with each cycle of sound, tilting the cilia and allowing electric current into the hair cell.
The function of hair cells is not yet fully established. The current knowledge of their function makes it possible to substitute for the cells with cochlear implants in cases of hearing loss. However, more research into the function of the hair cells may someday even make it possible to repair the cells. The current model is that cilia are attached to one another by “tip links”, structures which link the tip of one cilium to another. When they are stretched and compressed, the tip links open an ion channel and produce the receptor potential in the hair cell. Note that a deflection of 100 nanometers already elicits 90% of the full receptor potential.
The nervous system distinguishes between nerve fibres carrying information towards the central nervous system and nerve fibres carrying the information away from it:
- Afferent neurons (also sensory or receptor neurons) carry nerve impulses from receptors (sense organs) towards the central nervous system
- Efferent neurons (also motor or effector neurons) carry nerve impulses away from the central nervous system to effectors such as muscles or glands (and also the ciliated cells of the inner ear)
Afferent neurons innervate cochlear inner hair cells, at synapses where the neurotransmitter glutamate communicates signals from the hair cells to the dendrites of the primary auditory neurons. There are far fewer inner hair cells in the cochlea than afferent nerve fibers. The neural dendrites belong to neurons of the auditory nerve, which in turn joins the vestibular nerve to form the vestibulocochlear nerve, or cranial nerve number VIII.
Efferent projections from the brain to the cochlea also play a role in the perception of sound. Efferent synapses occur on outer hair cells and on afferent (towards the brain) dendrites under inner hair cells.
Auditory nervous system
The sound information, now re-encoded in form of electric signals, travels down the auditory nerve (acoustic nerve, vestibulocochlear nerve, VIIIth cranial nerve), through intermediate stations such as the cochlear nuclei and superior olivary complex of the brainstem and the inferior colliculus of the midbrain, being further processed at each waypoint. The information eventually reaches the thalamus, and from there it is relayed to the cortex. In the human brain, the primary auditory cortex is located in the temporal lobe.
Primary auditory cortex
The primary auditory cortex is the first region of cerebral cortex to receive auditory input. Perception of sound is associated with the right posterior superior temporal gyrus (STG). The superior temporal gyrus contains several important structures of the brain, including Brodmann areas 41 and 42, marking the location of the primary auditory cortex, the cortical region responsible for the sensation of basic characteristics of sound such as pitch and rhythm. The auditory association area is located within the temporal lobe of the brain, in an area called the Wernicke's area, or area 22. This area, near the lateral cerebral sulcus, is an important region for the processing of acoustic signals so that they can be distinguished as speech, music, or noise.
Auditory Signal Processing
Now that the anatomy of the auditory system has been sketched out, this topic goes deeper into the physiological processes which take place while perceiving acoustic information and converting this information into data that can be handled by the brain. Hearing starts with pressure waves hitting the auditory canal and is finally perceived by the brain. This section details the process transforming vibrations into perception.
Effect of the head
Sound waves with a wavelength shorter than the head produce a sound shadow on the ear further away from the sound source. When the wavelength is longer than the head, diffraction of the sound leads to approximately equal sound intensities on both ears.
Sound reception at the pinna
The pinna collects sound waves in air. With its corrugated shape, it affects sound coming from behind differently from sound coming from the front. The sound waves are reflected and attenuated or amplified. These changes will later help sound localization.
In the external auditory canal, sounds between 3 and 12 kHz - a range crucial for human communication - are amplified. The canal acts as a resonator, amplifying the incoming frequencies.
Sound conduction to the cochlea
Sound that entered the pinna in the form of waves travels along the auditory canal until it reaches the beginning of the middle ear, marked by the tympanic membrane (eardrum). Since the inner ear is filled with fluid, the middle ear acts as an impedance matching device that solves the problem of sound energy being reflected at the transition from air to fluid. As an example, at the transition from air to water 99.9% of the incoming sound energy is reflected. This can be calculated using:
$$\frac{I_r}{I_i} = \left(\frac{Z_2 - Z_1}{Z_2 + Z_1}\right)^2$$
with $I_r$ the intensity of the reflected sound, $I_i$ the intensity of the incoming sound and $Z_k$ the wave resistance of the two media ($Z_{air}$ = 414 kg m^-2 s^-1 and $Z_{water}$ = 1.48*10^6 kg m^-2 s^-1). Three factors that contribute to the impedance matching are:
- the relative size difference between tympanum and oval window
- the lever effect of the middle ear ossicles and
- the shape of the tympanum.
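The 99.9% figure quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes the standard reflection formula for a plane wave hitting the boundary at right angles, I_r/I_i = ((Z_2 - Z_1)/(Z_2 + Z_1))^2, and uses the two wave resistances given in the text; the function name is made up for this example.

```python
import math

Z_AIR = 414.0        # wave resistance of air, kg m^-2 s^-1
Z_WATER = 1.48e6     # wave resistance of water, kg m^-2 s^-1


def reflected_fraction(z1, z2):
    """Fraction of the incoming sound intensity reflected at a boundary."""
    return ((z2 - z1) / (z2 + z1)) ** 2


reflected = reflected_fraction(Z_AIR, Z_WATER)
transmitted = 1.0 - reflected
loss_db = 10 * math.log10(transmitted)   # transmission loss in dB

print(f"reflected:   {100 * reflected:.2f} %")
print(f"transmitted: {100 * transmitted:.2f} %")
print(f"loss:        {loss_db:.1f} dB")
```

Without any impedance matching, only about one thousandth of the sound energy would enter the fluid, a loss of roughly 30 dB.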
The longitudinal changes in air pressure of the sound wave cause the tympanic membrane to vibrate which, in turn, makes the three chained ossicles malleus, incus and stapes oscillate synchronously. These bones vibrate as a unit, transferring the energy from the tympanic membrane to the oval window. In addition, the sound pressure is further enhanced by the difference in area between the tympanic membrane and the stapes footplate. The middle ear acts as an impedance transformer by changing the sound energy collected by the tympanic membrane into greater force and less excursion. This mechanism facilitates the transmission of sound waves in air into vibrations of the fluid in the cochlea. The transformation results from the piston-like in- and out-motion of the footplate of the stapes, which is located in the oval window. This movement of the footplate sets the fluid in the cochlea into motion.
Through the stapedius muscle, the smallest muscle in the human body, the middle ear has a gating function: contracting this muscle changes the impedance of the middle ear, thus protecting the inner ear from damage through loud sounds.
Frequency analysis in the cochlea
The three fluid-filled compartments of the cochlea (scala vestibuli, scala media, scala tympani) are separated by the basilar membrane and Reissner’s membrane. The function of the cochlea is to separate sounds according to their spectrum and to transform them into a neural code. When the footplate of the stapes pushes into the perilymph of the scala vestibuli, Reissner’s membrane bends into the scala media as a consequence. This elongation of Reissner’s membrane causes the endolymph to move within the scala media and induces a displacement of the basilar membrane. The separation of the sound frequencies in the cochlea is due to the special properties of the basilar membrane. The fluid in the cochlea vibrates (due to the in- and out-motion of the stapes footplate), setting the membrane in motion like a traveling wave. The wave starts at the base and progresses towards the apex of the cochlea. The transversal waves in the basilar membrane propagate with the speed
$$c = \sqrt{\frac{\mu}{\rho}}$$
with μ the shear modulus and ρ the density of the material. Since width and tension of the basilar membrane change, the speed of the waves propagating along the membrane changes from about 100 m/s near the oval window to 10 m/s near the apex.
There is a point along the basilar membrane where the amplitude of the wave decreases abruptly. At this point, the sound wave in the cochlear fluid produces the maximal displacement (peak amplitude) of the basilar membrane. The distance the wave travels before getting to that characteristic point depends on the frequency of the incoming sound. Therefore each point of the basilar membrane corresponds to a specific value of the stimulating frequency. A low-frequency sound travels a longer distance than a high-frequency sound before it reaches its characteristic point. Frequencies are scaled along the basilar membrane with high frequencies at the base and low frequencies at the apex of the cochlea.
Sensory transduction in the cochlea
Most everyday sounds are composed of multiple frequencies. The brain processes the distinct frequencies, not the complete sounds. Due to its inhomogeneous properties, the basilar membrane is performing an approximation to a Fourier transform. The sound is thereby split into its different frequencies, and each hair cell on the membrane corresponds to a certain frequency. The loudness of the frequencies is encoded by the firing rate of the corresponding afferent fiber. This is due to the amplitude of the traveling wave on the basilar membrane, which depends on the loudness of the incoming sound.
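The idea of splitting a sound into its frequency components can be illustrated with a discrete Fourier transform. The sketch below is only an analogy to what the basilar membrane does mechanically, not a model of the cochlea; the three frequencies, their amplitudes and the sampling rate are arbitrary assumptions chosen for the demonstration.

```python
import numpy as np

fs = 8000                     # sampling rate in Hz
n = 4000                      # number of samples (0.5 s of sound)
t = np.arange(n) / fs

# A compound sound: three pure tones with different loudness.
components = {440.0: 1.0, 1000.0: 0.5, 2500.0: 0.25}   # frequency -> amplitude
sound = sum(a * np.sin(2 * np.pi * f * t) for f, a in components.items())

# Amplitude spectrum: one value per frequency "channel".
spectrum = np.abs(np.fft.rfft(sound)) * 2 / n
freqs = np.fft.rfftfreq(n, d=1 / fs)

# Channels that respond clearly mark the frequencies present in the sound.
active = spectrum > 0.1
for f, a in zip(freqs[active], spectrum[active]):
    print(f"{f:7.1f} Hz  amplitude {a:.2f}")
```

Each active channel plays the role of a place on the basilar membrane, and its amplitude corresponds to the loudness of that frequency component.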
The sensory cells of the auditory system, known as hair cells, are located along the basilar membrane within the organ of Corti. Each organ of Corti contains about 16,000 such cells, innervated by about 30,000 afferent nerve fibers. There are two anatomically and functionally distinct types of hair cells: the inner and the outer hair cells. Along the basilar membrane these two types are arranged in one row of inner cells and three to five rows of outer cells. Most of the afferent innervation comes from the inner hair cells, while most of the efferent innervation goes to the outer hair cells. The inner hair cells influence the discharge rate of the individual auditory nerve fibers that connect to these hair cells. Therefore inner hair cells transfer sound information to higher auditory nervous centers. The outer hair cells, in contrast, amplify the movement of the basilar membrane by injecting energy into the motion of the membrane and reducing frictional losses, but do not contribute to transmitting sound information. The motion of the basilar membrane deflects the stereocilia (hairs on the hair cells) and causes the intracellular potentials of the hair cells to decrease (depolarization) or increase (hyperpolarization), depending on the direction of the deflection. When the stereocilia are in a resting position, there is a steady-state current flowing through the channels of the cells. The movement of the stereocilia therefore modulates the current flow around that steady-state current.
Let's look at the modes of action of the two different hair cell types separately:
- Inner hair cells:
The deflection of the hair-cell stereocilia opens mechanically gated ion channels that allow small, positively charged potassium ions (K+) to enter the cell, causing it to depolarize. Unlike many other electrically active cells, the hair cell itself does not fire an action potential. Instead, the influx of positive ions from the endolymph in scala media depolarizes the cell, resulting in a receptor potential. This receptor potential opens voltage-gated calcium channels; calcium ions (Ca2+) then enter the cell and trigger the release of neurotransmitters at the basal end of the cell. The neurotransmitters diffuse across the narrow space between the hair cell and a nerve terminal, where they bind to receptors and thus trigger action potentials in the nerve. In this way, the neurotransmitter increases the firing rate in the VIIIth cranial nerve, and the mechanical sound signal is converted into an electrical nerve signal.
The repolarization in the hair cell is done in a special manner. The perilymph in Scala tympani has a very low concentration of positive ions. The electrochemical gradient makes the positive ions flow through channels to the perilymph. (see also: Wikipedia Hair cell)
- Outer hair cells:
In human outer hair cells, the receptor potential triggers active vibrations of the cell body. This mechanical response to electrical signals is termed somatic electromotility and drives oscillations in the cell’s length, which occur at the frequency of the incoming sound and provide mechanical feedback amplification. Outer hair cells have evolved only in mammals. Without functioning outer hair cells the sensitivity decreases by approximately 50 dB (due to greater frictional losses in the basilar membrane, which would damp the motion of the membrane). Outer hair cells also improve frequency selectivity (frequency discrimination), which is of particular benefit for humans, because it enables sophisticated speech and music. (see also: Wikipedia Hair cell)
With no external stimulation, auditory nerve fibres discharge action potentials in a random time sequence. This random time firing is called spontaneous activity. The spontaneous discharge rates of the fibers vary from very slow rates to rates of up to 100 per second. Fibers are placed into three groups depending on whether they fire spontaneously at high, medium or low rates. Fibers with high spontaneous rates (> 18 per second) tend to be more sensitive to sound stimulation than other fibers.
Auditory pathway of nerve impulses
So in the inner hair cells the mechanical sound signal is finally converted into electrical nerve signals. The inner hair cells are connected to auditory nerve fibres whose nuclei form the spiral ganglion. In the spiral ganglion the electrical signals (electrical spikes, action potentials) are generated and transmitted along the cochlear branch of the auditory nerve (VIIIth cranial nerve) to the cochlear nucleus in the brainstem.
From there, the auditory information is divided into at least two streams:
- Ventral Cochlear Nucleus:
One stream is the ventral cochlear nucleus which is split further into the posteroventral cochlear nucleus (PVCN) and the anteroventral cochlear nucleus (AVCN). The ventral cochlear nucleus cells project to a collection of nuclei called the superior olivary complex.
Superior olivary complex: Sound localization
The superior olivary complex - a small mass of gray substance - is believed to be involved in the localization of sounds in the azimuthal plane (i.e. their degree to the left or the right). There are two major cues to sound localization: Interaural level differences (ILD) and interaural time differences (ITD). The ILD measures differences in sound intensity between the ears. This works for high frequencies (over 1.6 kHz), where the wavelength is shorter than the distance between the ears, causing a head shadow - which means that high frequency sounds hit the averted ear with lower intensity. Lower frequency sounds don't cast a shadow, since they wrap around the head. However, due to the wavelength being larger than the distance between the ears, there is a phase difference between the sound waves entering the ears - the timing difference measured by the ITD. This works very precisely for frequencies below 800 Hz, where the ear distance is smaller than half of the wavelength. Sound localization in the median plane (front, above, back, below) is helped through the outer ear, which forms direction-selective filters.
In the superior olivary complex, the differences in time and loudness of the sound information in each ear are compared. Differences in sound intensity are processed in cells of the lateral superior olivary complex, and timing differences (runtime delays) in the medial superior olivary complex. Humans can detect timing differences between the left and right ear down to 10 μs, corresponding to a difference in sound location of about 1 deg. This comparison of sound information from both ears allows the determination of the direction from which the sound came. The superior olive is the first node where signals from both ears come together and can be compared. As a next step, the superior olivary complex sends information up to the inferior colliculus via a tract of axons called the lateral lemniscus. The function of the inferior colliculus is to integrate information before sending it to the thalamus and the auditory cortex. It is interesting to know that the nearby superior colliculus shows an interaction of auditory and visual stimuli.
- Dorsal Cochlear Nucleus:
The dorsal cochlear nucleus (DCN) analyzes the quality of sound and projects directly via the lateral lemniscus to the inferior colliculus.
From the inferior colliculus the auditory information from ventral as well as dorsal cochlear nucleus proceeds to the auditory nucleus of the thalamus which is the medial geniculate nucleus. The medial geniculate nucleus further transfers information to the primary auditory cortex, the region of the human brain that is responsible for processing of auditory information, located on the temporal lobe. The primary auditory cortex is the first relay involved in the conscious perception of sound.
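The relation between interaural time difference and sound direction mentioned in the discussion of the superior olivary complex can be made concrete with a small calculation. The sketch below assumes a very simple geometric model in which the extra path to the far ear is d·sin(azimuth); the ear distance of 0.20 m and the speed of sound of 343 m/s are assumed values, and diffraction around the head is ignored.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air (assumed, about 20 degrees C)
EAR_DISTANCE = 0.20      # m, assumed distance between the ears


def interaural_time_difference(azimuth_deg, d=EAR_DISTANCE, c=SPEED_OF_SOUND):
    """Arrival-time difference (in s) for a distant source at the given azimuth."""
    return d * math.sin(math.radians(azimuth_deg)) / c


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength (in m) of a sound of the given frequency."""
    return c / frequency_hz


itd_1deg = interaural_time_difference(1.0)
print(f"ITD for a source 1 deg off the midline: {itd_1deg * 1e6:.1f} microseconds")

for f in (800, 1600):
    print(f"{f} Hz: wavelength {wavelength(f):.3f} m, ear distance {EAR_DISTANCE} m")
```

Under these assumptions, a shift of 1 degree corresponds to a timing difference of roughly 10 μs, which agrees with the detection threshold quoted in the text.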
Primary auditory cortex and higher order auditory areas
Sound information reaches the primary auditory cortex (Brodmann areas 41 and 42), the first relay involved in the conscious perception of sound. It is known to be tonotopically organized and performs the basics of hearing: pitch and volume. Depending on the nature of the sound (speech, music, noise), the information is passed on to higher order auditory areas. Sounds that are words are processed by Wernicke’s area (Brodmann area 22). This area is involved in understanding written and spoken language (verbal understanding). The production of sound (verbal expression) is linked to Broca’s area (Brodmann areas 44 and 45). The muscles that produce the required sound when speaking are controlled by the facial area of the motor cortex, a region of the cerebral cortex that is involved in planning, controlling and executing voluntary motor functions.
The intensity of sound is typically expressed in decibel (dB), defined as
$$SPL = 20 \cdot \log_{10}\left(\frac{p}{p_0}\right)$$
where SPL = “sound pressure level” (in dB), p is the sound pressure, and the reference pressure is $p_0 = 2 \cdot 10^{-5}$ N/m^2. Note that this is much smaller than the air pressure (ca. 10^5 N/m^2)! Also watch out, because sound is often expressed relative to "Hearing Level" instead of SPL.
- 0 - 20 dB SPL ... hearing level (0 dB for sinusoidal tones, from 1 kHz – 4 kHz)
- 60 dB SPL ... medium loud tone, conversational speech
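The conversion between sound pressure and dB SPL can be sketched in a few lines. The example assumes the standard reference pressure of 20 μPa (2·10⁻⁵ N/m²); the function names are made up for illustration.

```python
import math

P_REF = 20e-6  # assumed standard reference pressure in N/m^2 (20 micropascal)


def spl_db(pressure):
    """Sound pressure level in dB SPL for a given (rms) sound pressure in N/m^2."""
    return 20.0 * math.log10(pressure / P_REF)


def pressure_from_spl(spl):
    """Sound pressure in N/m^2 corresponding to a level given in dB SPL."""
    return P_REF * 10.0 ** (spl / 20.0)


speech_pressure = pressure_from_spl(60.0)   # conversational speech, see list above
print("60 dB SPL corresponds to %.3f N/m^2" % speech_pressure)
print("Fraction of the air pressure (1e5 N/m^2): %.1e" % (speech_pressure / 1e5))
```

Even conversational speech at 60 dB SPL therefore corresponds to pressure fluctuations that are many orders of magnitude smaller than the static air pressure.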
Fundamental frequency, from the vibrations of the vocal cords in the larynx, is about 120 Hz for adult males, 250 Hz for adult females, and up to 400 Hz for children.
Formants are the dominant frequencies in human speech, and are caused by resonances of the signals from the vocal cords in our mouth etc. Formants show up as distinct peaks of energy in the sound's frequency spectrum. They are numbered in ascending order starting with the formant at the lowest frequency.
Speech is often considered to consist of a sequence of acoustic units called phones, which correspond to linguistic units called phonemes. Phonemes are the smallest units of sound that allow different words to be distinguished. The word "dog", for example, contains three phonemes. Changes to the first, second, and third phoneme respectively produce the words "log", "dig", and "dot". English is said to contain 40 different phonemes, specified as in /d/, /o/, /g/ for the word "dog".
The ability of humans to decode speech signals still easily exceeds that of any algorithm developed so far. While automatic speech recognition has become fairly successful in recognizing clearly spoken speech in environments with a high signal-to-noise ratio, once the conditions become a bit less than ideal, recognition algorithms tend to perform very poorly compared to humans. It seems from this that our computer speech recognition algorithms have not yet come close to capturing the underlying algorithm that humans use to recognize speech.
Evidence has shown that the perception of speech takes quite a different route in the brain than the perception of other sounds. While studies on non-speech sound responses have generally found the response to be graded with the stimulus, speech studies have repeatedly found a discretization of the response when a graded stimulus is presented. For instance, Lisker and Abramson played a pre-voiced 'b/p' sound. Whether the sound is interpreted as a /b/ or a /p/ depends on the voice onset time (VOT). They found that when smoothly varying the VOT, there was a sharp change (at ~20 ms after the consonant is played) where subjects switched their identification from /b/ to /p/. Furthermore, subjects had much more difficulty differentiating between two sounds in the same category (e.g. a pair of sounds with VOTs of -10 ms and 10 ms, which would both be identified as /b/) than between two sounds from different categories (e.g. a pair with VOTs of 10 ms and 30 ms, which would be identified as a /b/ and a /p/). This shows that some type of categorization scheme is going on.
One of the main problems encountered when trying to build a model of speech perception is the so-called 'Lack of Invariance', which could more straightforwardly just be stated as the 'variance'. This term refers to the fact that a single phoneme (e.g. /p/ as in sPeech or Piety) has a great variety of waveforms that map to it, and that the mapping between an acoustic waveform and a phoneme is far from obvious and heavily context-dependent, yet human listeners reliably give the correct result. Even when the context is similar, a waveform will show a great deal of variance due to factors such as the pace of speech, the identity of the speaker and the tone of voice. So while there is no agreed-upon model of speech perception, the existing models can be split into two classes: Passive Perception and Active Perception.
Passive Perception Models
Passive perception theories generally describe the problem of speech perception in the same way that most sensory signal-processing algorithms do: some raw input signal goes in, and is processed through a hierarchy where each subsequent step extracts some increasingly abstract signal from the input. One of the early examples of a passive model was distinctive feature theory. The idea is to identify the presence of sets of binary values for certain features, for example 'nasal/oral' or 'vocalic/non-vocalic'. The theory is that a phoneme is interpreted as a binary vector of the presence or absence of these features. These features can be extracted from the spectrogram data. Other passive models, such as those described by Selfridge and Uttley, involve a kind of template-matching, where a hierarchy of processing layers extracts features that are increasingly abstract and invariant to certain irrelevant features (such as the identity of the speaker when classifying phonemes).
Active Perception Models
An entirely different take on speech perception is offered by active-perception theories. These theories make the point that it would be redundant for the brain to have two parallel systems for speech perception and speech production, given that the ability to produce a sound is so closely tied to the ability to identify it. Proponents of these theories argue that it would be wasteful and complicated to maintain two separate databases: one containing the programs to identify phonemes, and another to produce them. They argue that speech perception is actually done by attempting to replicate the incoming signal, and thus using the same circuits for phoneme production as for identification. The Motor Theory of speech perception (Liberman et al., 1967) states that speech sounds are identified not by any sort of template matching, but by using the speech-generating mechanisms to try and regenerate a copy of the speech signal. It states that phonemes should not be seen as hidden signals within the speech, but as “cues” that the generating mechanism attempts to reproduce in a pre-speech signal. The theory states that speech-generating regions of the brain learn which speech-precursor signals will produce which sounds by the constant feedback loop of always hearing one's own speech. The babbling of babies, it is argued, is a way of learning how to generate these “cue” sounds from pre-motor signals.
A similar idea is proposed in the analysis-by-synthesis model, by Stevens and Halle. This describes a generative model which attempts to regenerate a similar signal to the incoming sound. It essentially takes advantage of the fact that speech-generating mechanisms are similar between people, and that the characteristic features that one hears in speech can be reproduced by the listener. As the listener hears the sound, the speech centers attempt to generate the signal that is coming in. Comparators give constant feedback on the quality of the regeneration. The 'units of perception' are therefore not so much abstractions of the incoming sound as pre-motor commands for generating the same speech.
Motor theories took a serious hit when a series of studies on what is now known as Broca's Aphasia were published. This condition impairs one's ability to produce speech sounds, without impairing the ability to comprehend them, whereas motor theory, taken in its original form, states that production and comprehension are done by the same circuits, so impaired speech production should imply impaired speech comprehension. The existence of Broca's aphasia appears to contradict this prediction.
One of the most influential computational models of speech perception is called TRACE. TRACE is a neural-network-like model, with three layers and a recurrent connection scheme. The first layer extracts features from an input spectrogram in temporal order, basically simulating the cochlea. The second layer extracts phonemes from the feature information, and the third layer extracts words from the phoneme information. The model contains feed-forward (bottom-up) excitatory connections, lateral inhibitory connections, and feedback (top-down) excitatory connections. In this model, each computational unit corresponds to some unit of perception (e.g. the phoneme /p/ or the word "preposterous"). The basic idea is that, based on their input, units within a layer will compete to have the strongest output. The lateral inhibitory connections result in a sort of winner-takes-all circuit, in which the unit with the strongest input will inhibit its neighbors and become the clear winner. The feedback connections allow us to explain the effect of context-dependent comprehension - for example, suppose the phoneme layer, based on its bottom-up inputs, could not decide whether it had heard a /g/ or a /k/, but that the phoneme was preceded by 'an', and followed by 'ry'. Both the /g/ and /k/ units would initially be equally activated, sending inputs up to the word level, which would already contain excited units corresponding to words such as 'anaconda', 'angry', and 'ankle', which had been activated by the preceding 'an'. The excitement of the /g/ or /k/
A cochlear implant (CI) is a surgically implanted electronic device that replaces the mechanical parts of the auditory system by directly stimulating the auditory nerve fibers through electrodes inside the cochlea. Candidates for cochlear implants are people with severe to profound sensorineural hearing loss in both ears and a functioning auditory nervous system. They are used by post-lingually deaf people to regain some comprehension of speech and other sounds as well as by pre-lingually deaf children to enable them to gain spoken language skills. (Diagnosis of hearing loss in newborns and infants is done using otoacoustic emissions, and/or the recording of auditory evoked potentials.) A quite recent evolution is the use of bilateral implants allowing recipients basic sound localization.
Parts of the cochlear implant
The implant is surgically placed under the skin behind the ear. The basic parts of the device include:
- a microphone which picks up sound from the environment
- a speech processor which selectively filters sound to prioritize audible speech and sends the electrical sound signals through a thin cable to the transmitter,
- a transmitter, which is a coil held in position by a magnet placed behind the external ear, and transmits the processed sound signals to the internal device by electromagnetic induction,
- a receiver and stimulator secured in bone beneath the skin, which converts the signals into electric impulses and sends them through an internal cable to electrodes,
- an array of up to 24 electrodes wound through the cochlea, which send the impulses to the nerves in the scala tympani and then directly to the brain through the auditory nerve system
Signal processing for cochlear implants
In normal hearing subjects, the primary information carrier for speech signals is the envelope, whereas for music, it is the fine structure. This is also relevant for tonal languages, like Mandarin, where the meaning of words depends on their intonation. It was also found that interaural time delays coded in the fine structure determine where a sound is heard from rather than interaural time delays coded in the envelope, although it is still the speech signal coded in the envelope that is perceived.
The speech processor in a cochlear implant transforms the microphone input signal into a parallel array of electrode signals destined for the cochlea. Algorithms for the optimal transfer function between these signals are still an active area of research. The first cochlear implants were single-channel devices. The raw sound was band-pass filtered to include only the frequency range of speech, then modulated onto a 16 kHz wave to allow the electrical signal to couple to the nerves. This approach was able to provide very basic hearing, but was extremely limited in that it was completely unable to take advantage of the frequency-location map of the cochlea.
The advent of multi-channel implants opened the door to try a number of different speech-processing strategies to facilitate hearing. These can be roughly divided into Waveform and Feature-Extraction strategies.
Waveform strategies generally involve applying a non-linear gain to the sound (as an input audio signal with a ~30 dB dynamic range must be compressed into an electrical signal with just a ~5 dB dynamic range), and passing it through parallel filter banks. The first waveform strategy to be tried was the Compressed Analog approach. In this system, the raw audio is initially filtered with a gain-controlled amplifier (the gain control reduces the dynamic range of the signal). The signal is then passed through parallel band-pass filters, and the output of these filters goes on to stimulate electrodes at their appropriate locations.
A problem with the Compressed Analog approach was that there was a strong interaction effect between adjacent electrodes. If electrodes driven by two filters happened to be stimulating at the same time, the superimposed stimulation could cause unwanted distortion in the signals coming from hair cells that were within range of both of these electrodes. The solution to this was the Continuous Interleaved Sampling approach, in which the electrodes driven by adjacent filters stimulate at slightly different times. This eliminates the interference effect between nearby electrodes, but introduces the problem that, due to the interleaving, temporal resolution suffers.
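The idea of interleaving can be illustrated with a small timing sketch. This is not the scheme of any particular device: the function name, the number of channels and the pulse rate are assumptions chosen for illustration. Each stimulation period is simply divided into equal time slots, one per channel, so that no two electrodes are ever pulsed at the same moment.

```python
def cis_pulse_times(n_channels, pulse_rate, n_pulses):
    """Pulse onset times (in s) for each channel of an interleaved scheme.

    Every channel fires once per period (1 / pulse_rate), but each channel
    is shifted by its own time slot, so pulses never coincide.
    """
    period = 1.0 / pulse_rate
    slot = period / n_channels
    return [[k * period + ch * slot for k in range(n_pulses)]
            for ch in range(n_channels)]


# Hypothetical example: 4 channels, 1000 pulses per second and channel
pulse_times = cis_pulse_times(n_channels=4, pulse_rate=1000.0, n_pulses=5)
for ch, times in enumerate(pulse_times):
    print("channel %d: %s" % (ch, ", ".join("%.2f ms" % (t * 1e3) for t in times)))
```

The sketch also shows the price paid for interleaving: with a fixed minimum slot length, adding channels lowers the rate at which each individual electrode can be updated.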
Feature-extraction strategies focus less on transmitting filtered versions of the audio signal and more on extracting more abstract features of the signal and transmitting them to the electrodes. The first feature-extraction strategies looked for the formants (frequencies with maximum energy) in speech. In order to do this, they would apply wide band filters (e.g. 270 Hz low-pass for F0 - the base formant, 300 Hz-1 kHz for F1, and 1 kHz-4 kHz for F2), then calculate the formant frequency using the zero-crossings of each of these filter outputs, and the formant amplitude by looking at the envelope of the signals from each filter. Only electrodes corresponding to these formant frequencies would be activated. The main limitation of this approach was that formants primarily identify vowels, and consonant information, which primarily resides in higher frequencies, was poorly transmitted. The MPEAK system later improved on this design by incorporating high-frequency filters which could better simulate unvoiced sounds (consonants) by stimulating high-frequency electrodes, and formant frequency electrodes at random intervals.
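Estimating a frequency from zero-crossings, as mentioned above, can be sketched as follows. The example is a simplification: instead of a band-filtered speech signal it uses a pure 440 Hz tone as a stand-in for the output of one filter, and the function name and sampling rate are assumptions for illustration.

```python
import numpy as np


def zero_crossing_frequency(signal, rate):
    """Estimate the dominant frequency (in Hz) from the number of sign changes.

    A sinusoid crosses zero twice per cycle, so the frequency is
    (number of crossings) / (2 * duration).
    """
    negative = np.signbit(signal)
    n_crossings = np.count_nonzero(negative[1:] != negative[:-1])
    duration = (len(signal) - 1) / float(rate)
    return n_crossings / (2.0 * duration)


rate = 16000                                   # assumed sampling rate in Hz
t = np.arange(rate) / float(rate)              # one second of data
band_output = np.sin(2 * np.pi * 440 * t + 0.3)

estimate = zero_crossing_frequency(band_output, rate)
print("Estimated frequency: %.1f Hz" % estimate)
```

The method only works if a single component dominates the signal, which is why the band-pass filtering step has to come first.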
Currently, the leading strategy is the SPEAK system, which combines characteristics of Waveform and Feature-Detection strategies. In this system, the signal passes through a parallel array of 20 band-pass filters. The envelope is extracted from each of these, several of the most powerful frequencies are selected (how many depends on the shape of the spectrum), and the rest are discarded. This is known as an 'n-of-m' strategy. The amplitudes of these are then logarithmically compressed to adapt the mechanical signal range of sound to the much narrower electrical signal range of hair cells.
On its newest implants, the company Cochlear uses 3 microphones instead of one. The additional information is used for beam-forming, i.e. extracting more information from sound coming from straight ahead. This can improve the signal-to-noise ratio when talking to other people by up to 15 dB, thereby significantly enhancing speech perception in noisy environments.
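The selection and compression steps of an 'n-of-m' strategy can be sketched as below. This is a simplified illustration and not the SPEAK algorithm itself: the number of selected bands is fixed here (whereas the text states that it depends on the shape of the spectrum), and the function names, the example envelopes and the input range used for the compression are all assumptions.

```python
import numpy as np


def n_of_m_select(envelopes, n):
    """Keep the n largest of the m band envelopes, set the others to zero (n >= 1)."""
    envelopes = np.asarray(envelopes, dtype=float)
    strongest = np.argsort(envelopes)[-n:]      # indices of the n largest values
    selected = np.zeros_like(envelopes)
    selected[strongest] = envelopes[strongest]
    return selected


def log_compress(amplitudes, in_min=1e-3, in_max=1.0):
    """Map amplitudes in [in_min, in_max] logarithmically onto [0, 1]."""
    clipped = np.clip(amplitudes, in_min, in_max)
    return np.log(clipped / in_min) / np.log(in_max / in_min)


# Hypothetical envelope values from m = 6 band-pass filters
band_envelopes = np.array([0.02, 0.50, 0.05, 0.90, 0.10, 0.30])

kept = n_of_m_select(band_envelopes, n=3)
stim = np.where(kept > 0, log_compress(kept), 0.0)
print("selected envelopes:    ", kept)
print("compressed stimulation:", np.round(stim, 2))
```

The logarithmic mapping compresses a thousand-fold range of input amplitudes into a normalized stimulation range, which mimics the reduction of the dynamic range described in the text.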
Integration CI – Hearing Aid
Preservation of low-frequency hearing after cochlear implantation is possible with careful surgical technique and with careful attention to electrode design. For patients with remaining low-frequency hearing, the company MedEl offers a combination of a cochlear implant for the higher frequencies, and a classical hearing aid for the lower frequencies. This system, called EAS for electric-acoustic stimulation, uses a lead of 18 mm, compared to 31.5 mm for the full CI. (The length of the cochlea is about 36 mm.) This results in a significant improvement of music perception, and improved speech recognition for tonal languages.
For high frequencies, the human auditory system uses only tonotopic coding for information. For low frequencies, however, temporal information is also used: the auditory nerve fires synchronously with the phase of the signal. In contrast, the original CIs only used the power spectrum of the incoming signal. In its new models, MedEl incorporates the timing information for low frequencies, which it calls fine structure, in determining the timing of the stimulation pulses. This improves music perception, and speech perception for tonal languages like Mandarin.
Mathematically, envelope and fine-structure of a signal can be elegantly obtained with the Hilbert Transform (see Figure). The corresponding Python code is available under.
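A minimal sketch of this decomposition, using only NumPy, is given below. The analytic signal is built by removing the negative frequencies in the Fourier domain, which is equivalent to adding the Hilbert transform as imaginary part. The test signal (a 500 Hz carrier with a slow 5 Hz amplitude modulation), the sampling rate and the function name are assumptions for illustration.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal of a real signal x, calculated via the FFT.

    Negative frequencies are set to zero and positive frequencies doubled;
    the imaginary part of the result is the Hilbert transform of x.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the offset
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist component
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)


rate = 8000                                # assumed sampling rate in Hz
t = np.arange(rate) / float(rate)          # one second of data
true_envelope = 1.0 + 0.5 * np.sin(2 * np.pi * 5 * t)
sound = true_envelope * np.cos(2 * np.pi * 500 * t)

z = analytic_signal(sound)
envelope = np.abs(z)                       # slowly varying amplitude
fine_structure = np.cos(np.angle(z))       # rapidly varying carrier

print("Largest envelope error: %.2e" % np.max(np.abs(envelope - true_envelope)))
```

The product of envelope and fine structure gives back the original signal, so the two parts can be processed separately and recombined.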
The number of electrodes available is limited by the size of the electrode (and the resulting charge and current densities), and by the current spread along the endolymph. To increase the frequency specificity, one can stimulate two adjacent electrodes. Subjects report perceiving this as a single tone at a frequency intermediate to the two electrodes.
Simulation of a cochlear implant
Sound processing in cochlear implants is still the subject of a lot of research, and is one of the major product differentiations between the manufacturers. However, the basic sound processing is rather simple and can be implemented to gain an impression of the quality of sound perceived by patients using a cochlear implant. The first step in the process is to sample some sound and analyze its frequency. Then a time-window is selected, during which we want to find the stimulation strengths of the CI electrodes. There are two ways to achieve that: i) through the use of linear filters (see Gammatone filters); or ii) through the calculation of the power spectrum (see Spectral Analysis).
Cochlear implants and Magnetic Resonance Imaging
With more than 150 000 implantations worldwide, cochlear implants (CIs) have now become a standard method for treating severe to profound hearing loss. As the benefits of CIs become more evident, payers become more willing to support them, and due to the screening programs for newborns in most industrialized nations, many patients receive CIs in infancy and will likely continue to have them throughout their lives. Some of them may require diagnostic imaging with magnetic resonance imaging (MRI) at some point in their lives. For large segments of the population, including patients suffering from stroke, back pain or headache, MRI has become a standard method for diagnosis. MRI uses pulses of magnetic fields to generate images. Most current MRI machines work with 1.5 Tesla magnetic fields, although 0.2 to 4.0 Tesla devices are common, and the radiofrequency power can peak as high as 6 kW in a 1.5 Tesla machine.
Cochlear implants have historically been thought to be incompatible with MRI at magnetic fields higher than 0.2 T. The external parts of the device always have to be removed. There are different regulations for the internal parts of the device. Current US Food and Drug Administration (FDA) guidelines allow limited use of MRI after CI implantation. The Pulsar and Sonata (MED-EL Corp, Innsbruck, Austria) devices are approved for 0.2 T MRI with the magnet in place. The Hi-res 90K (Advanced Bionics Corp, Sylmar, CA, USA) and the Nucleus Freedom (Cochlear Americas, Englewood, CO, USA) are approved for up to 1.5 T MRI after surgical removal of the internal magnet. Each removal and replacement of the magnet can be done using a small incision under local anesthesia, but the procedure is likely to weaken the pocket of the magnet and carries a risk of infection for the patient.
Cadaver studies have shown that there is a risk that the magnet may be displaced from the internal device in a 1.5 T MRI scanner. However, the risk could be eliminated when a compression dressing was applied. Nevertheless, the CI produces an artifact that could potentially reduce the diagnostic value of the scan. The smaller the patient’s head, the larger the artifact is in relation to it, which might be particularly challenging for MRI scans of children. A study by Crane et al. (2010) found that the artifact around the area of the CI had a mean anterior-posterior dimension of 6.6 +/- 1.5 cm (mean +/- standard deviation) and a left-right dimension averaging 4.8 +/- 1.0 cm (mean +/- standard deviation).
Computer Simulations of the Auditory System
Working with Sound
Audio signals can be stored in a variety of formats. They can be uncompressed or compressed, and the encoding can be open or proprietary. On Windows systems, the most common format is the WAV-format. It contains a header with information about the number of channels, sample rate, bits per sample etc. This header is followed by the data themselves. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.
Many programming languages provide commands for reading and writing WAV-files. When working with data in other formats, you have two options:
- You can either convert them into WAV-format, and go on from there. A very comprehensive free cross-platform solution to record, convert and stream audio and video is ffmpeg (http://www.ffmpeg.org/).
- Or you can obtain special program modules for reading/writing the desired format.
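As an illustration of reading and writing WAV-files, the following sketch uses only the Python standard library (the modules wave and struct). It writes a short tone as 16-bit mono LPCM data and reads it back. To keep the example self-contained, an in-memory buffer is used instead of a file on disk; the function names and the tone parameters are assumptions for illustration.

```python
import io
import math
import struct
import wave


def write_tone(target, freq=440.0, rate=8000, duration=0.1, amplitude=0.5):
    """Write a sine tone as 16-bit mono LPCM data in WAV format."""
    n = int(rate * duration)
    samples = [int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate))
               for i in range(n)]
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 2 bytes = 16 bits per sample
        wav.setframerate(rate)
        wav.writeframes(struct.pack("<%dh" % n, *samples))


def read_wav(source):
    """Read 16-bit WAV data; returns (sample rate, number of channels, samples)."""
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("This sketch only handles 16-bit data")
        rate = wav.getframerate()
        n_channels = wav.getnchannels()
        n_frames = wav.getnframes()
        raw = wav.readframes(n_frames)
    samples = struct.unpack("<%dh" % (n_frames * n_channels), raw)
    return rate, n_channels, samples


# "target" and "source" may also be file names, e.g. "tone.wav"
buffer = io.BytesIO()
write_tone(buffer)
buffer.seek(0)
rate, n_channels, samples = read_wav(buffer)
print("rate: %d Hz, channels: %d, samples: %d" % (rate, n_channels, len(samples)))
```

The values stored in the header (number of channels, sample rate, bits per sample) are exactly the ones needed to interpret the data that follow.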
Reminder of Fourier Transformations
To transform a continuous function, one uses the Fourier Integral:
$$F(k) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i k t}\, dt$$
where k represents frequency. Note that F(k) is a complex value: its absolute value gives us the amplitude of the function, and its phase defines the phase-shift between cosine and sine components.
The inverse transform is given by
$$f(t) = \int_{-\infty}^{\infty} F(k)\, e^{2\pi i k t}\, dk$$
If the data are sampled with a constant sampling frequency and there are N data points,
$$f_\tau = \frac{1}{N} \sum_{n=0}^{N-1} F_n\, e^{2\pi i n \tau / N}$$
The coefficients $F_n$ can be obtained by
$$F_n = \sum_{\tau=0}^{N-1} f_\tau\, e^{-2\pi i n \tau / N}$$
Since there is a discrete, limited number of data points and a discrete, limited number of waves, this transform is referred to as Discrete Fourier Transform (DFT). The Fast Fourier Transform (FFT) is an efficient algorithm for calculating the DFT; in its classic form it requires the number of points to be a power of 2: $N = 2^m$.
Note that each $F_n$ is a complex number: its magnitude defines the amplitude of the corresponding frequency component in the signal; and the phase of $F_n$ defines the corresponding phase (see illustration). If the signal in the time domain "f(t)" is real valued, as is the case with most measured data, this puts a constraint on the corresponding frequency components: in that case we have
$$F_n = F_{N-n}^*$$
A frequent source of confusion is the question: “Which frequency corresponds to $F_n$?” If there are N data points and the sampling period is $T_s$, the $n$-th frequency $\nu_n$ is given by
$$\nu_n = \frac{n}{N \cdot T_s}, \quad 1 \le n \le \frac{N}{2}$$
In other words, the lowest frequency is $\frac{1}{N \cdot T_s}$ [in Hz], while the highest independent frequency is $\frac{1}{2 T_s}$ due to the Nyquist-Shannon theorem. Note that in MATLAB, the first return value corresponds to the offset of the function, and the second value to n=1!
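The relation between the index of a Fourier coefficient and its frequency can be checked numerically. The sketch below uses NumPy's FFT, where (as in MATLAB) the first returned value is the offset; the test signal, the sampling rate and the number of points are assumptions for illustration.

```python
import numpy as np

rate = 1000.0                    # assumed sampling frequency in Hz
n_points = 500
dt = 1.0 / rate                  # sampling period
t = np.arange(n_points) * dt

# Test signal: offset of 1.5, plus a 50 Hz sine wave with amplitude 2
signal = 1.5 + 2.0 * np.sin(2 * np.pi * 50 * t)

coeffs = np.fft.fft(signal)
freqs = np.arange(n_points) / (n_points * dt)   # frequency of coefficient n, in Hz

# The first coefficient (n=0) contains the offset
offset = coeffs[0].real / n_points

# Strongest component among the independent frequencies (up to Nyquist)
peak = np.argmax(np.abs(coeffs[1:n_points // 2])) + 1
peak_freq = freqs[peak]
amplitude = 2 * np.abs(coeffs[peak]) / n_points

# For real-valued data, coefficient n is the complex conjugate of coefficient N-n
symmetric = np.allclose(coeffs[1:], np.conj(coeffs[1:][::-1]))

print("lowest frequency: %.1f Hz, Nyquist frequency: %.1f Hz" % (freqs[1], rate / 2))
print("offset: %.2f, peak at %.1f Hz with amplitude %.2f" % (offset, peak_freq, amplitude))
print("conjugate symmetry holds:", symmetric)
```

Because the coefficients above the Nyquist frequency are only mirror images of those below, the search for the peak is restricted to the first half of the spectrum.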
Spectral Analysis of Biological Signals
Power Spectrum of Stationary Signals
Most FFT functions and algorithms return the complex Fourier coefficients $F_n$. If we are only interested in the magnitude of the contribution at the corresponding frequency, we can obtain this information by
$$P_n = F_n \cdot F_n^* = |F_n|^2$$
This is the power spectrum of our signal, and tells us how big the contribution of the different frequencies is.
Power Spectrum of Non-stationary Signals
Often one has to deal with signals that are changing their characteristics over time. In that case, one wants to know how the power spectrum changes with time. The simplest way is to take only a short segment of data at a time, and calculate the corresponding power spectrum. This approach is called Short Time Fourier Transform (STFT). However in that case edge effects can significantly distort the signals, since we are assuming that our signal is periodic.
To eliminate edge artifacts, the signals can be filtered, or "windowed". An example of such a window is shown in the figure above. While some windows provide better frequency resolution (e.g. the rectangular window), others exhibit fewer artifacts such as spectral leakage (e.g. Hanning window). For a selected section of the signal, the data resulting from windowing are obtained by multiplying the signal with the window (left Figure):
$$x_{w}(t) = x(t) \cdot w(t)$$
An example of how cutting a signal, and applying a window to it, can affect the spectral power distribution is shown in the right figure above. (The corresponding Python code can be found at  ) Note that decreasing the width of the sample window increases the width of the corresponding power spectrum!
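The effect of windowing on spectral leakage can be reproduced with a few lines of NumPy. This is not the code referred to above, only a minimal sketch with assumed parameters: a 101.3 Hz tone is cut to 256 samples, so that the segment does not contain an integer number of cycles, and the fraction of power that ends up far away from the tone (above 300 Hz) is compared for a rectangular and a Hanning window.

```python
import numpy as np

rate = 1000                       # assumed sampling rate in Hz
n_points = 256
t = np.arange(n_points) / float(rate)
tone = np.sin(2 * np.pi * 101.3 * t)   # does not fit the segment periodically


def power_spectrum(x):
    """Power at the non-negative frequencies of a real-valued signal."""
    return np.abs(np.fft.rfft(x)) ** 2


freqs = np.fft.rfftfreq(n_points, d=1.0 / rate)
p_rect = power_spectrum(tone)                          # rectangular window
p_hann = power_spectrum(tone * np.hanning(n_points))   # Hanning window

# Fraction of the total power that leaks to frequencies far from the tone
far = freqs > 300
leak_rect = p_rect[far].sum() / p_rect.sum()
leak_hann = p_hann[far].sum() / p_hann.sum()

print("leakage above 300 Hz, rectangular window: %.1e" % leak_rect)
print("leakage above 300 Hz, Hanning window:     %.1e" % leak_hann)
```

The Hanning window suppresses the leakage into distant frequencies, at the cost of a somewhat broader peak around the tone itself.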
Stimulation strength for one time window
To obtain the power spectrum for one selected time window, the first step is to calculate the power spectrum through the Fast Fourier Transform (FFT) of the time signal. The result is the sound intensity in frequency domain, and the corresponding frequencies. The second step is to concentrate those intensities on a few distinct frequencies ("binning"). The result is a sound signal consisting of a few distinct frequencies - the location of the electrodes in the simulated cochlea. Back conversion into the time domain gives the simulated sound signal for that time window.
The following Python function does sound processing on a given signal.
```python
import numpy as np


def pSpect(data, rate):
    '''Calculation of power spectrum and corresponding frequencies, using a Hamming window'''
    nData = len(data)
    window = np.hamming(nData)
    fftData = np.fft.fft(data * window)
    PowerSpect = fftData * fftData.conj() / nData
    freq = np.arange(nData) * float(rate) / nData
    return (np.real(PowerSpect), freq)


def calc_stimstrength(sound, rate=1000, sample_freqs=(100, 200, 400)):
    '''Calculate the stimulation strength for a given sound'''

    # Calculate the power spectrum
    Pxx, freq = pSpect(sound, rate)

    # Generate matrix to sum over the requested bins
    num_electrodes = len(sample_freqs)
    sample_freqs = np.hstack((0, sample_freqs))
    average_freqs = np.zeros([len(freq), num_electrodes])
    for jj in range(num_electrodes):
        # Bin jj: frequencies above the lower edge, up to and including the upper edge
        in_bin = (freq > sample_freqs[jj]) & (freq <= sample_freqs[jj + 1])
        average_freqs[in_bin, jj] = 1

    # Calculate the stimulation strength (the square root has to be taken, to get the amplitude)
    StimStrength = np.sqrt(Pxx).dot(average_freqs)

    return StimStrength
```
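The last step described above, the back conversion into the time domain, can be sketched as a sum of sine waves, one per simulated electrode. This is only an illustration under stated assumptions: each electrode is represented by a single frequency (here the upper edge of its bin), and the function name and the example stimulation strengths are made up.

```python
import numpy as np


def synthesize_window(stim_strength, electrode_freqs, rate, n_samples):
    """Simulated sound for one time window: one sine wave per electrode,
    with an amplitude given by the stimulation strength of that electrode."""
    t = np.arange(n_samples) / float(rate)
    sound = np.zeros(n_samples)
    for strength, freq in zip(stim_strength, electrode_freqs):
        sound += strength * np.sin(2 * np.pi * freq * t)
    return sound


# Hypothetical stimulation strengths for three electrodes
rate = 1000
simulated = synthesize_window([0.2, 1.0, 0.1], [100, 200, 400],
                              rate=rate, n_samples=1000)

# Check: the strongest component should be at the frequency of the second electrode
spectrum = np.abs(np.fft.rfft(simulated))
dominant_freq = np.fft.rfftfreq(len(simulated), d=1.0 / rate)[np.argmax(spectrum)]
print("dominant frequency: %.0f Hz" % dominant_freq)
```

Repeating this for consecutive time windows and concatenating the results gives an impression of the sound conveyed by the simulated electrodes.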
Sound Transduction by Pinna and Outer Ear
The outer ear is divided into two parts: the visible part on the side of the head (the pinna), and the external auditory meatus (outer ear canal) leading to the eardrum, as shown in the figure below. With such a structure, the outer ear contributes the ‘spectral cues’ for sound localization, enabling people not only to detect and identify a sound, but also to localize its source.
The pinna’s cone shape enables it to gather sound waves and funnel them into the outer ear canal. On top of that, its various folds make the pinna a resonant cavity which amplifies certain frequencies. Furthermore, the interference effects resulting from the sound reflection caused by the pinna are directionally dependent and will attenuate other frequencies. Therefore, the pinna could be simulated as a filter function applied to the incoming sound, modulating its amplitude and phase spectra.
The resonance of the pinna cavity can be approximated well by 6 normal modes. Among these normal modes, the first mode, which mainly depends on the concha depth (i.e. the depth of the bowl-shaped part of the pinna nearest the ear canal), is the dominant one.
The cancellation of certain frequencies caused by the pinna reflection is called the “pinna notch”. As shown in the right figure, sound transmitted by the pinna travels along two paths, a direct path and a longer reflected path. The paths have different lengths, and thereby produce phase differences. When the frequency of the incoming sound is such that the path difference equals half of the sound wavelength, the sounds arriving via the direct and the reflected path interfere destructively. Depending on the pinna shape, the notch frequency typically lies in the range from 6 kHz to 16 kHz. The frequency response of the pinna is also directionally dependent. This makes the pinna contribute to the spatial cues for sound localization.
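The condition for the pinna notch (path difference equal to half a wavelength) directly gives the notch frequency. The following sketch assumes a speed of sound of 343 m/s; the function names are made up for illustration.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at about 20 degrees C


def notch_frequency(path_difference, c=SPEED_OF_SOUND):
    """Lowest frequency (Hz) cancelled when the reflected path is longer
    than the direct path by path_difference (m): difference = wavelength / 2."""
    return c / (2.0 * path_difference)


def path_difference_for_notch(frequency, c=SPEED_OF_SOUND):
    """Path difference (m) that produces a notch at the given frequency (Hz)."""
    return c / (2.0 * frequency)


for freq in (6000.0, 16000.0):
    delta = path_difference_for_notch(freq)
    print("notch at %5.0f Hz <-> path difference of %.1f mm" % (freq, delta * 1e3))
```

Under this assumption, the notch range of 6 kHz to 16 kHz corresponds to path differences between roughly one and three centimetres, which matches the dimensions of the pinna.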
Ear Canal Function
The outer ear canal is approximately 25 mm long and 8 mm in diameter, with a tortuous path from the entrance of the canal to the eardrum. The outer ear canal can be modeled as a cylinder closed at one end, which leads to a resonant frequency around 3 kHz. This way the outer ear canal amplifies sounds in a frequency range important for human speech.
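The resonant frequency of a cylinder closed at one end can be estimated with the quarter-wavelength formula, f = c / (4·L). The sketch below applies it to the canal length given above; the speed of sound (343 m/s) and the function name are assumptions, and the tortuous shape of the real canal is ignored.

```python
SPEED_OF_SOUND = 343.0   # m/s, assumed value for air at about 20 degrees C


def quarter_wave_resonance(length, c=SPEED_OF_SOUND):
    """First resonant frequency (Hz) of a tube of the given length (m)
    that is open at one end and closed at the other."""
    return c / (4.0 * length)


canal_length = 0.025     # m, length of the outer ear canal as given in the text
resonance = quarter_wave_resonance(canal_length)
print("Estimated ear canal resonance: %.0f Hz" % resonance)
```

The estimate is somewhat above 3 kHz, in reasonable agreement with the value quoted in the text for this idealized model.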
Simulation of Outer Ear
Based on the main functions of the outer ear, it is easy to simulate the sound transduction by the pinna and outer ear canal with a filter, or a filter bank, if we know the characteristics of the filter.
Many researchers are working on the simulation of the human auditory system, which includes the simulation of the outer ear. In the next chapter, a Pinna-Related Transfer Function model is first introduced, followed by two MATLAB toolboxes developed by Finnish and British research groups, respectively.
Model of Pinna-Related Transfer Function by Spagnol
This part is based entirely on the paper published by S. Spagnol, M. Geronazzo, and F. Avanzini. In order to model the functions of the pinna, Spagnol developed a reconstruction model of the Pinna-Related Transfer Function (PRTF), which is a frequency response characterizing how sound is transduced by the pinna. This model is composed of two distinct filter blocks, accounting for the resonance function and the reflection function of the pinna respectively, as shown in the figure below.
and is the sampling frequency, the central frequency, and the notch depth.
For the reflection part, three second-order notch filters of the form  are designed with the parameters including center frequency , notch depth , and bandwidth .
where is the same as previously defined for the resonance function, and
each accounting for a different spectral notch.
By cascading the three notch filters, placed in series, after the two parallel peak filters, an eighth-order filter is designed to model the PRTF.
By comparing the synthetic PRTF with the original one, as shown in the figures below, Spagnol concluded that the synthesis model for PRTF was overall effective. This model may have missing notches due to the limitation of cutoff frequency. Approximation errors may also be brought in due to the possible presence of non-modeled interfering resonances.
HUTear MATLAB Toolbox
HUTear is a MATLAB Toolbox for auditory modeling developed by the Lab of Acoustics and Audio Signal Processing at Helsinki University of Technology. This open source toolbox can be downloaded from here. The structure of the toolbox is shown in the right figure.
In this model, there is a block for “Outer and Middle Ear” (OME) simulation. This OME model is developed on the basis of Glasberg and Moore. The OME filter is usually a linear filter. The auditory filter is generated by taking the "Equal Loudness Curves at 60 dB" (ELC), "Minimum Audible Field" (MAF), or "Minimum Audible Pressure at ear canal" (MAP) correction into account. This model accounts for the outer ear simulation. By specifying different parameters with the "OEMtool", you may compare the MAP IIR approximation and MAP data, as shown in the figure below.
MATLAB Model of the Auditory Periphery (MAP)
MAP is developed by researchers in the Hearing Research Lab at University of Essex, England . Being a computer model of physiological basis of human hearing, MAP is an open-source code package for testing, developing the model, which could be downloaded from here. Its model structure is shown in the right figure.
Within the MAP model, there is the “Outer Middle Ear (OME)” sub-model, allowing the user to test and create an OME model. In this OME model, the function of the outer ear is modeled as a resonance function. The resonances are composed by two parallel bandpass filters, respectively, representing concha resonance and outer ear canal resonance. These two filters are specified by the pass frequency range, gain and order. By adding the output of resonance filters to the original sound pressure wave, the output of the outer ear model is obtained.
To test the OME model, run the function named “testOME.m”. A figure plotting the external ear resonances and the stapes peak displacement will be displayed, as shown in the figure below.
The outer ear, including pinna and outer ear canal, can be simulated as a linear filter, or a filter bank. This reflects its resonance and reflection effect to incoming sound. It is worth noting that since the pinna shape varies from person to person, the model parameters, like the resonant frequencies, depend on the subject.
One aspect not included in the models described above is the Head-Related Transfer Function (HRTF). The HRTF describes how an ear receives a sound from a point sound source in space. It is not introduced here because it goes beyond the effect of the outer ear (pinna and outer ear canal), as it is also influenced by the effects of head and torso. There is plenty of literature on the HRTF for the interested reader (wiki, tutorials 1 and 2, reading list for spatial audio research including HRTF).
Simulation of the Inner Ear
The shape and organisation of the basilar membrane means that different frequencies resonate particularly strongly at different points along the membrane. This leads to a tonotopic organisation of the sensitivity to frequency ranges along the membrane, which can be modeled as an array of overlapping band-pass filters known as "auditory filters". The auditory filters are associated with points along the basilar membrane and determine the frequency selectivity of the cochlea, and therefore the listener’s discrimination between different sounds. They are non-linear and level-dependent, and their bandwidth decreases from the base to the apex of the cochlea as the tuning on the basilar membrane changes from high to low frequency. The bandwidth of the auditory filter is called the critical bandwidth, as first suggested by Fletcher (1940). If a signal and a masker are presented simultaneously, then only the masker frequencies falling within the critical bandwidth contribute to masking of the signal. The larger the critical bandwidth, the lower the signal-to-noise ratio (SNR) and the more the signal is masked.
Another concept associated with the auditory filter is the "equivalent rectangular bandwidth" (ERB). The ERB shows the relationship between the auditory filter, frequency, and the critical bandwidth. An ERB passes the same amount of energy as the auditory filter it corresponds to and shows how it changes with input frequency. At low sound levels, the ERB is approximated by the following equation according to Glasberg and Moore:
$$\mathrm{ERB} = 24.7 \cdot (4.37\,F + 1)$$
where the ERB is in Hz and F is the centre frequency in kHz.
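As a small worked example, the commonly quoted Glasberg and Moore approximation, ERB = 24.7 (4.37 F + 1) with F in kHz, can be evaluated for a few centre frequencies. The function name `erb_hz` is just a label chosen for this sketch.

```python
def erb_hz(centre_freq_khz):
    """Equivalent rectangular bandwidth [Hz] for a centre frequency in kHz."""
    return 24.7 * (4.37 * centre_freq_khz + 1.0)


for f_khz in (0.25, 1.0, 4.0):
    print(f"F = {f_khz:5.2f} kHz  ->  ERB = {erb_hz(f_khz):7.1f} Hz")
```

The bandwidth grows with centre frequency, so auditory filters near the base of the cochlea (high frequencies) are wider than those near the apex.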
One filter type used to model the auditory filters is the "gammatone filter". It provides a simple linear filter for describing the movement of one location of the basilar membrane for a given sound input, and is therefore easy to implement. Linear filters are popular for modeling different aspects of the auditory system. In general, they are IIR-filters (infinite impulse response) incorporating feedforward and feedback, which are defined by
$$\sum_{i=0}^{n} a_{i+1}\, y(k-i) = \sum_{j=0}^{m} b_{j+1}\, x(k-j)$$
where x is the input signal, y the output signal, and a1=1. In other words, the coefficients ai and bj uniquely determine this type of filter. The feedback-character of these filters can be made more obvious by re-shuffling the equation
$$y(k) = b_1 x(k) + b_2 x(k-1) + \dots + b_{m+1} x(k-m) - \left( a_2 y(k-1) + \dots + a_{n+1} y(k-n) \right)$$
(In contrast, FIR-filters, or finite impulse response filters, only involve feedforward: for them $a_i = 0$ for i>1.)
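To make the feedforward and feedback parts concrete, the following sketch evaluates the difference equation of an IIR filter sample by sample (each output is a weighted sum of current and past inputs, minus a weighted sum of past outputs) and compares the result with `scipy.signal.lfilter`. Note that Python indexes from 0, so the text's a1 = 1 corresponds to `a[0] == 1` here. The example coefficients are arbitrary.

```python
import numpy as np
from scipy import signal


def iir_filter(b, a, x):
    """Evaluate the IIR difference equation directly (requires a[0] == 1)."""
    y = np.zeros(len(x))
    for k in range(len(x)):
        # Feedforward: weighted sum of the current and past inputs
        acc = sum(b[j] * x[k - j] for j in range(len(b)) if k - j >= 0)
        # Feedback: subtract the weighted past outputs
        acc -= sum(a[i] * y[k - i] for i in range(1, len(a)) if k - i >= 0)
        y[k] = acc
    return y


# Arbitrary example: a first-order smoothing filter
b = [0.2]
a = [1.0, -0.8]
x = np.ones(20)                      # unit step input

y_manual = iir_filter(b, a, x)
y_scipy = signal.lfilter(b, a, x)
print(np.allclose(y_manual, y_scipy))
```

Setting all feedback coefficients beyond `a[0]` to zero turns the same function into an FIR filter.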
Linear filters cannot account for nonlinear aspects of the auditory system. They are nevertheless used in a variety of models of the auditory system. The gammatone impulse response is given by
$$g(t) = a\, t^{n-1} e^{-2\pi b t} \cos(2\pi f t + \phi)$$
where $f$ is the frequency, $\phi$ is the phase of the carrier, $a$ is the amplitude, $n$ is the filter's order, $b$ is the filter's bandwidth, and $t$ is time.
This is a sinusoid with an amplitude envelope which is a scaled gamma distribution function.
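The gammatone impulse response, a cosine carrier multiplied by a gamma-shaped envelope t^(n-1) exp(-2 pi b t), is easy to generate numerically. In the sketch below the order n = 4 and the bandwidth b = 1.019 ERB are commonly used choices, but they are assumptions of this example and are not taken from the text above.

```python
import numpy as np


def gammatone_ir(t, f, b, n=4, a=1.0, phi=0.0):
    """Gammatone impulse response: gamma envelope times cosine carrier."""
    return a * t ** (n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f * t + phi)


fs = 16000                                   # sampling rate [Hz]
t = np.arange(800) / fs                      # 50 ms
f_c = 1000.0                                 # centre frequency [Hz]
erb = 24.7 * (4.37 * f_c / 1000.0 + 1.0)     # ERB at the centre frequency [Hz]
b = 1.019 * erb                              # assumed bandwidth parameter [Hz]

g = gammatone_ir(t, f_c, b)

# The envelope t^(n-1) * exp(-2*pi*b*t) peaks at t = (n-1) / (2*pi*b)
envelope_peak_time = (4 - 1) / (2 * np.pi * b)

# The magnitude spectrum should peak near the centre frequency
spectrum = np.abs(np.fft.rfft(g))
freqs = np.fft.rfftfreq(len(g), 1 / fs)
peak_freq = freqs[np.argmax(spectrum)]
print("envelope peak [ms]:", 1000 * envelope_peak_time)
print("spectral peak [Hz]:", peak_freq)
```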
Variations and improvements of the gammatone model of auditory filtering include the gammachirp filter, the all-pole and one-zero gammatone filters, the two-sided gammatone filter, and filter cascade models, and various level-dependent and dynamically nonlinear versions of these.
For computer simulations, efficient implementations of gammatone models are available for MATLAB and for Python.
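When working with gammatone filters, we can elegantly exploit Parseval's Theorem to determine the energy in a given frequency band:
$$\int_{-\infty}^{\infty} |x(t)|^2 \, dt = \int_{-\infty}^{\infty} |X(f)|^2 \, df$$
where $X(f)$ is the Fourier transform of the signal $x(t)$, with the frequency $f$ in Hz.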
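For sampled signals, Parseval's theorem takes a discrete form: the sum of the squared samples equals the sum of the squared FFT magnitudes divided by the number of samples. The sketch below checks this numerically and then uses it to read off the energy in one frequency band. The test signal and band limits are arbitrary choices for illustration.

```python
import numpy as np

fs = 8000                                   # sampling rate [Hz]
t = np.arange(fs) / fs                      # exactly 1 s of signal
x = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

X = np.fft.fft(x)
freqs = np.fft.fftfreq(len(x), 1 / fs)

# Total energy, computed in the time domain and in the frequency domain
energy_time = np.sum(x ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)

# Energy in the band 300-600 Hz (positive and negative frequencies)
in_band = (np.abs(freqs) >= 300) & (np.abs(freqs) <= 600)
band_energy = np.sum(np.abs(X[in_band]) ** 2) / len(x)

print(energy_time, energy_freq, band_energy)
```

Here the band contains only the 440 Hz component, so the band energy matches the energy of that sine wave alone.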
The main function of the balance system, or vestibular system, is to sense head movements, especially involuntary ones, and counter them with reflexive eye movements and postural adjustments that keep the visual world stable and keep us from falling. An excellent, more extensive article on the vestibular system is available on Scholarpedia. An extensive review of our current knowledge about the vestibular system can be found in "The Vestibular System: a Sixth Sense" by J Goldberg et al.
Anatomy of the Vestibular System
Together with the cochlea, the vestibular system is carried by a system of tubes called the membranous labyrinth. These tubes are lodged within the cavities of the bony labyrinth located in the inner ear. A fluid called perilymph fills the space between the bone and the membranous labyrinth, while another one called endolymph fills the inside of the tubes spanned by the membranous labyrinth. These fluids have a unique ionic composition suited to their function in regulating the electrochemical potential of hair cells, which are, as we will see later, the transducers of the vestibular system. The electric potential of endolymph is about 80 mV more positive than that of perilymph.
Since our movements consist of a combination of linear translations and rotations, the vestibular system is composed of two main parts: The otolith organs, which sense linear accelerations and thereby also give us information about the head’s position relative to gravity, and the semicircular canals, which sense angular accelerations.
|Human bony labyrinth (Computed tomography 3D)||Internal structure of the human labyrinth|
The otolith organs of both ears are located in two membranous sacs called the utricle and the saccule, which primarily sense horizontal and vertical accelerations, respectively. Each utricle has about 30'000 hair cells, and each saccule about 16'000. The otoliths are located at the central part of the labyrinth, also called the vestibule of the ear. Both utricle and saccule have a thickened portion of the membrane called the macula. A gelatinous membrane called the otolithic membrane sits atop the macula, and microscopic stones made of calcium carbonate crystal, the otoliths, are embedded on the surface of this membrane. On the opposite side, hair cells embedded in supporting cells project into this membrane.
Each ear has three semicircular canals. They are half circular, interconnected membranous tubes filled with endolymph and can sense angular accelerations in the three orthogonal planes. The radius of curvature of the human horizontal semicircular canal is 3.2 mm.
The canals on each side are approximately orthogonal to each other. The orientations of the on-directions of the canals on the right side are:
(The axes are oriented such that the positive x-,y-,and z-axis point forward, left, and up, respectively. The horizontal plane is defined by Reid's line, the line connecting the lower rim of the orbita and the center of the external auditory canal. And the directions are such that a rotation about that vector, according to the right-hand-rule, excites the corresponding canal.) The anterior and posterior semicircular canals are approximately vertical, and the horizontal semicircular canals approximately horizontal.
Each canal presents a dilatation at one end, called the ampulla. Each membranous ampulla contains a saddle-shaped ridge of tissue, the crista, which extends across it from side to side. It is covered by neuroepithelium, with hair cells and supporting cells. From this ridge rises a gelatinous structure, the cupula, which extends to the roof of the ampulla immediately above it, dividing the interior of the ampulla into two approximately equal parts.
The sensors within both the otolith organs and the semicircular canals are the hair cells. They are responsible for the transduction of a mechanical force into an electrical signal and thereby build the interface between the world of accelerations and the brain.
Hair cells have a tuft of stereocilia that project from their apical surface. The thickest and longest cilium in the bundle is the kinocilium. Stereocilia deflection is the mechanism by which all hair cells transduce mechanical forces. Stereocilia within a bundle are linked to one another by protein strands, called tip links, which span from the side of a taller stereocilium to the tip of its shorter neighbor in the array. Under deflection of the bundle, the tip links act as gating springs to open and close mechanically sensitive ion channels. Afferent nerve excitation works basically the following way: when all cilia are deflected toward the kinocilium, the gates open and cations, including potassium ions from the potassium-rich endolymph, flow in and the membrane potential of the hair cell becomes more positive (depolarization). The hair cell itself does not fire action potentials. The depolarization activates voltage-sensitive calcium channels at the basolateral aspect of the cell. Calcium ions then flow in and trigger the release of neurotransmitters, mainly glutamate, which in turn diffuse across the narrow space between the hair cell and a nerve terminal, where they bind to receptors and thus trigger an increase of the action potential firing rate in the nerve. On the other hand, afferent nerve inhibition is the process induced by the bending of the stereocilia away from the kinocilium (hyperpolarization), by which the firing rate is decreased. Because the hair cells are chronically leaking calcium, the vestibular afferent nerve fires actively at rest and thereby allows the sensing of both directions (increase and decrease of firing rate). Hair cells are very sensitive and respond extremely quickly to stimuli. The quickness of the hair cell response may in part be due to the fact that they must be able to release neurotransmitter reliably in response to a threshold receptor potential of only 100 µV or so.
Regular and Irregular Haircells
While afferent haircells in the auditory system are fairly homogeneous, those in the vestibular system can be broadly separated into two groups: "regular units" and "irregular units". Regular haircells have approximately constant interspike intervals, and fire at a rate proportional to their displacement. In contrast, the interspike interval of irregular haircells is much more variable, and their discharge rate increases with increasing frequency; they can thus act as event detectors at high frequencies. Regular and irregular haircells also differ in their location, morphology and innervation.
Peripheral Signal Transduction
Transduction of Linear Acceleration
The hair cells of the otolith organs are responsible for the transduction of a mechanical force induced by linear acceleration into an electrical signal. Since this force results from gravity plus linear movements of the head
$$\vec{F} = \vec{F}_g + \vec{F}_{inertial}$$
it is therefore sometimes referred to as gravito-inertial force. The mechanism of transduction works roughly as follows: The otoconia, calcium carbonate crystals in the top layer of the otolithic membrane, have a higher specific density than the surrounding materials. Thus a linear acceleration leads to a displacement of the otoconia layer relative to the connective tissue. The displacement is sensed by the hair cells. The bending of the hairs then depolarizes or hyperpolarizes the cell and induces afferent excitation or inhibition.
While each of the three semicircular canals senses only a one-dimensional component of rotational acceleration, linear acceleration may produce a complex pattern of inhibition and excitation across the maculae of both the utricle and saccule. The saccule is located on the medial wall of the vestibule of the labyrinth in the spherical recess and has its macula oriented vertically. The utricle is located above the saccule in the elliptical recess of the vestibule, and its macula is oriented roughly horizontally when the head is upright. Within each macula, the kinocilia of the hair cells are oriented in all possible directions.
Therefore, under linear acceleration with the head in the upright position, the saccular macula is sensing acceleration components in the vertical plane, while the utricular macula is encoding acceleration in all directions in the horizontal plane. The otolithic membrane is soft enough that each hair cell is deflected proportional to the local force direction. If $\vec{n}$ denotes the direction of maximum sensitivity or on-direction of the hair cell, and $\vec{F}$ the gravito-inertial force, the stimulation by static accelerations is given by
$$stim_{otolith} = \vec{F} \cdot \vec{n}$$
The direction and magnitude of the total acceleration is then determined from the excitation pattern on the otolith maculae.
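The stimulation of a single otolith hair cell, the projection of the gravito-inertial force onto the cell's on-direction, can be sketched with a dot product. The coordinate system follows the convention used in this chapter (x forward, y left, z up). The sign convention for the force vector and the example on-direction are assumptions made for this illustration, and the force is expressed per unit mass.

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]


def gravito_inertial_force(tilt_deg):
    """Force per unit mass for a static head tilt in the x-z plane (assumed sign convention)."""
    tilt = np.deg2rad(tilt_deg)
    return G * np.array([np.sin(tilt), 0.0, np.cos(tilt)])


def otolith_stimulation(force, on_direction):
    """Projection of the force onto the (normalized) on-direction of a hair cell."""
    n = np.asarray(on_direction, dtype=float)
    n = n / np.linalg.norm(n)
    return float(np.dot(force, n))


n_forward = np.array([1.0, 0.0, 0.0])   # hypothetical utricular cell, on-direction forward

stim_upright = otolith_stimulation(gravito_inertial_force(0), n_forward)
stim_tilted = otolith_stimulation(gravito_inertial_force(30), n_forward)
print(stim_upright, stim_tilted)
```

With the head upright the horizontally oriented cell is not stimulated, while a 30 degree tilt stimulates it with half of the gravitational acceleration. Evaluating the same function for many on-directions gives the excitation pattern across a macula.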
Transduction of Angular Acceleration
The three semicircular canals are responsible for the sensing of angular accelerations. When the head accelerates in the plane of a semicircular canal, inertia causes the endolymph in the canal to lag behind the motion of the membranous canal. Relative to the canal walls, the endolymph effectively moves in the direction opposite to the head, pushing and distorting the elastic cupula. Hair cells are arrayed beneath the cupula on the surface of the crista and have their stereocilia projecting into the cupula. They are therefore excited or inhibited depending on the direction of the acceleration.
This facilitates the interpretation of canal signals: if the orientation of a semicircular canal is described by the unit vector $\vec{n}$, the stimulation of the canal is proportional to the projection of the angular velocity $\vec{\omega}$ onto this canal
$$stim_{canal} = \vec{\omega} \cdot \vec{n}$$
The horizontal semicircular canal is responsible for sensing accelerations around a vertical axis, i.e. the neck. The anterior and posterior semicircular canals detect rotations of the head in the sagittal plane, as when nodding, and in the frontal plane, as when cartwheeling.
In a given cupula, all the hair cells are oriented in the same direction. The semicircular canals of both sides also work as a push-pull system. For example, because the right and the left horizontal canal cristae are “mirror opposites” of each other, they always have opposing (push-pull principle) responses to horizontal rotations of the head. Rapid rotation of the head toward the left causes depolarization of hair cells in the left horizontal canal's ampulla and increased firing of action potentials in the neurons that innervate the left horizontal canal. That same leftward rotation of the head simultaneously causes a hyperpolarization of the hair cells in the right horizontal canal's ampulla and decreases the rate of firing of action potentials in the neurons that innervate the horizontal canal of the right ear. Because of this mirror configuration, not only the right and left horizontal canals form a push-pull pair but also the right anterior canal with the left posterior canal (RALP), and the left anterior with the right posterior (LARP).
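The push-pull behaviour of the two horizontal canals can be illustrated by projecting the angular velocity of the head onto the on-direction of each canal. The on-directions below are idealized (exactly vertical and exactly mirror-symmetric), which is an assumption of this sketch. Real canals are tilted with respect to the head axes. With x forward, y left and z up, a positive rotation about z is a leftward head turn according to the right-hand rule.

```python
import numpy as np

# Idealized on-directions (assumption): the left horizontal canal is excited by
# leftward rotation (+z), the right horizontal canal by rightward rotation (-z).
CANAL_ON_DIRECTIONS = {
    "left horizontal": np.array([0.0, 0.0, 1.0]),
    "right horizontal": np.array([0.0, 0.0, -1.0]),
}


def canal_stimulation(omega, on_direction):
    """Projection of the angular velocity onto the canal's on-direction."""
    n = on_direction / np.linalg.norm(on_direction)
    return float(np.dot(omega, n))


omega_left_turn = np.array([0.0, 0.0, 1.0])   # leftward head rotation, 1 rad/s

stim = {name: canal_stimulation(omega_left_turn, n)
        for name, n in CANAL_ON_DIRECTIONS.items()}
print(stim)
```

The two canals respond with equal magnitude and opposite sign: the left canal is excited and the right canal is inhibited by the same leftward rotation.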
Central Vestibular Pathways
The information resulting from the vestibular system is carried to the brain, together with the auditory information from the cochlea, by the vestibulocochlear nerve, which is the eighth of twelve cranial nerves. The cell bodies of the bipolar afferent neurons that innervate the hair cells in the maculae and cristae in the vestibular labyrinth reside near the internal auditory meatus in the vestibular ganglion (also called Scarpa's ganglion, Figure 10.1). The centrally projecting axons from the vestibular ganglion come together with axons projecting from the auditory neurons to form the eighth nerve, which runs through the internal auditory meatus together with the facial nerve. The primary afferent vestibular neurons project to the four vestibular nuclei that constitute the vestibular nuclear complex in the brainstem.
Vestibulo-Ocular Reflex (VOR)
An extensively studied example of function of the vestibular system is the vestibulo-ocular reflex (VOR). The function of the VOR is to stabilize the image during rotation of the head. This requires the maintenance of stable eye position during horizontal, vertical and torsional head rotations. When the head rotates with a certain speed and direction, the eyes rotate with the same speed but in the opposite direction. Since head movements are present all the time, the VOR is very important for stabilizing vision.
How does the VOR work? The vestibular system signals how fast the head is rotating and the oculomotor system uses this information to stabilize the eyes in order to keep the visual image motionless on the retina. The vestibular nerves project from the vestibular ganglion to the vestibular nuclear complex, where the vestibular nuclei integrate signals from the vestibular organs with those from the spinal cord, cerebellum, and the visual system. From these nuclei, fibers cross to the contralateral abducens nucleus. There they synapse with two additional pathways. One pathway projects directly to the lateral rectus muscle of eye via the abducens nerve. Another nerve tract projects from the abducens nucleus by the abducens interneurons to the oculomotor nuclei, which contain motor neurons that drive eye muscle activity, specifically activating the medial rectus muscles of the eye through the oculomotor nerve. This short latency connection is sometimes referred to as three-neuron-arc, and allows an eye movement within less than 10 ms after the onset of the head movement.
For example, when the head rotates rightward, the following occurs. The right horizontal canal hair cells depolarize and the left hyperpolarize. The right vestibular afferent activity therefore increases while the left decreases. The vestibulocochlear nerve then carries this information to the brainstem, and the right vestibular nuclei activity increases while the left decreases. This in turn makes neurons of the left abducens nucleus and the right oculomotor nucleus fire at a higher rate. Those in the left oculomotor nucleus and the right abducens nucleus fire at a lower rate. As a result, the left lateral rectus extraocular muscle and the right medial rectus contract, while the left medial rectus and the right lateral rectus relax. Thus, both eyes rotate leftward.
The gain of the VOR is defined as the change in the eye angle divided by the change in the head angle during the head turn
$$gain = \frac{\Delta_{eye}}{\Delta_{head}}$$
If the gain of the VOR is wrong, that is, different from one, then head movements result in image motion on the retina, resulting in blurred vision. Under such conditions, motor learning adjusts the gain of the VOR to produce more accurate eye motion. The cerebellum plays an important role in this motor learning.
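A short numerical illustration of the VOR gain: the change in eye angle divided by the change in head angle, and the gaze shift that remains when the gain is not one. Because the eyes rotate opposite to the head, this sketch uses the magnitudes of the angle changes for the gain, which is an assumption about the sign convention. The numbers are made up for illustration.

```python
def vor_gain(delta_eye_deg, delta_head_deg):
    """Magnitude of the eye rotation divided by the magnitude of the head rotation."""
    return abs(delta_eye_deg) / abs(delta_head_deg)


def gaze_shift(delta_eye_deg, delta_head_deg):
    """Residual rotation of the line of sight in space (head plus eye-in-head)."""
    return delta_head_deg + delta_eye_deg


delta_head = 10.0    # head turns 10 deg to one side
delta_eye = -9.0     # eyes counter-rotate by only 9 deg

print("gain:", vor_gain(delta_eye, delta_head))
print("uncompensated gaze shift [deg]:", gaze_shift(delta_eye, delta_head))
```

A gain below one leaves a residual gaze shift, which corresponds to image motion on the retina.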
The Cerebellum and the Vestibular System
It is known that postural control can be adapted to suit specific behavior. Patient experiments suggest that the cerebellum plays a key role in this form of motor learning. In particular, the role of the cerebellum has been extensively studied in the case of adaptation of vestibulo-ocular control. Indeed, it has been shown that the gain of the vestibulo-ocular reflex adapts to reach the value of one even if damage occurs in a part of the VOR pathway or if it is voluntarily modified through the use of magnifying lenses. Basically, there are two different hypotheses about how the cerebellum plays a necessary role in this adaptation. The first, from Ito (Ito 1972; Ito 1982), claims that the cerebellum itself is the site of learning, while the second, from Miles and Lisberger (Miles and Lisberger 1981), claims that the vestibular nuclei are the site of adaptive learning while the cerebellum constructs the signal that drives this adaptation. Note that in addition to direct excitatory input to the vestibular nuclei, the sensory neurons of the vestibular labyrinth also provide input to the Purkinje cells in the flocculo-nodular lobes of the cerebellum via a pathway of mossy and parallel fibers. In turn, the Purkinje cells project an inhibitory influence back onto the vestibular nuclei. Ito argued that the gain of the VOR can be adaptively modulated by altering the relative strength of the direct excitatory and indirect inhibitory pathways. Ito also argued that a message of retinal image slip, going through the inferior olivary nucleus and carried by the climbing fiber, plays the role of an error signal and thereby is the modulating influence of the Purkinje cells. On the other hand, Miles and Lisberger argued that the brainstem neurons targeted by the Purkinje cells are the site of adaptive learning and that the cerebellum constructs the error signal that drives this adaptation.
Computer Simulation of the Vestibular System
Model without Cupula
Let us consider the mechanical description of the semi-circular canals (SCC). We will make very strong and reductive assumptions in the following description. The goal here is merely to understand the very basic mechanical principles underlying the semicircular canals.
The first strong simplification we make is that a semicircular canal can be modeled as a circular tube of “outer” radius R and “inner” radius r. (For proper hydro mechanical derivations see (Damiano and Rabbitt 1996) and Obrist (2005)). This tube is filled with endolymph.
The orientation of the semicircular canal can be described, in a given coordinate system, by a vector $\vec{n}$ that is perpendicular to the plane of the canal. We will also use the following notations:
- $\theta$: Rotation angle of the tube [rad]
- $\dot{\theta} \equiv \frac{d\theta}{dt}$: Angular velocity of the tube [rad/s]
- $\ddot{\theta} \equiv \frac{d^2\theta}{dt^2}$: Angular acceleration of the tube [rad/s^2]
- $\phi$: Rotation angle of the endolymph inside the tube [rad], with similar notation for the time derivatives
- $\delta = \theta - \phi$: Movement between the tube and the endolymph [rad].
Note that all these variables are scalar quantities. We use the fact that the angular velocity of the tube can be viewed as the projection of the actual angular velocity vector of the head $\vec{\omega}$ onto the orientation vector $\vec{n}$ of the semicircular canal to go from the 3D environment of the head to our scalar description. That is,
$$\dot{\theta} = \vec{\omega} \cdot \vec{n}$$
where the standard scalar product is meant with the dot.
To characterize the endolymph movement, consider a free floating piston, with the same density as the endolymph. Two forces are acting on the system:
- The inertial moment $I \ddot{\phi}$, where I characterizes the inertia of the endolymph.
- The viscous moment $B \dot{\delta}$, caused by the friction of the endolymph on the walls of the tube.
This gives the equation of motion
$$I \ddot{\phi} = B \dot{\delta}$$
Substituting $\phi = \theta - \delta$ and integrating gives
$$\dot{\theta} - \dot{\delta} = \frac{B}{I} \delta$$
Let us now consider the example of a velocity step $\dot{\theta}(t)$ of constant amplitude $\omega$. In this case, we obtain a displacement
$$\delta = \frac{I}{B}\, \omega \left(1 - e^{-\frac{B}{I} t}\right)$$
and for $t \gg \frac{I}{B}$, we obtain the constant displacement
$$\delta \approx \frac{I}{B}\, \omega$$
Now, let us derive the time constant $T_1 \equiv \frac{I}{B}$. For a thin tube, $r \ll R$, the inertia is approximately given by
$$I = m R^2 \approx 2 \rho \pi^2 r^2 R^3$$
where $\rho$ is the density of the endolymph. From the Poiseuille-Hagen Equation, the force F from a laminar flow with velocity v in a thin tube is
$$F = \frac{8 \bar{V} \eta l}{r^2}$$
where $\bar{V} = r^2 \pi v$ is the volume flow per second, $\eta$ the viscosity and $l = 2 \pi R$ the length of the tube.
With the torque $M = F \cdot R$ and the relative angular velocity $\dot{\delta} = \frac{v}{R}$, substitution provides
$$B = \frac{M}{\dot{\delta}} = 16 \pi^2 \eta R^3$$
Finally, this gives the time constant
$$T_1 = \frac{I}{B} = \frac{\rho r^2}{8 \eta}$$
For the human balance system, replacing the variables with experimentally obtained parameters yields a time constant $T_1$ of about 0.01 s. This is brief enough that in equation (10.5), the constant-displacement expression above, the "$\approx$" can be replaced by " = ". This gives a system gain of
$$G = \frac{\delta}{\omega} = T_1$$
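The thin-tube expressions for the canal model can be checked numerically: with inertia I = 2 rho pi^2 r^2 R^3 and viscous coefficient B = 16 pi^2 eta R^3 (which follow from the thin-tube approximation and the Poiseuille-Hagen law), the ratio I/B reduces to rho r^2 / (8 eta), independent of the canal radius R. In the sketch below, R = 3.2 mm is the radius of curvature quoted earlier in this chapter. The density, viscosity and inner radius are water-like placeholder values assumed only for illustration, not measured endolymph parameters.

```python
import math


def canal_inertia(rho, r, R):
    """Moment of inertia of the fluid ring for a thin tube (r << R)."""
    return 2 * rho * math.pi ** 2 * r ** 2 * R ** 3


def canal_viscous_coefficient(eta, R):
    """Viscous moment per unit relative angular velocity (Poiseuille flow)."""
    return 16 * math.pi ** 2 * eta * R ** 3


def time_constant(rho, r, eta):
    """T1 = I / B, which no longer depends on R."""
    return rho * r ** 2 / (8 * eta)


rho = 1000.0     # density [kg/m^3]    (assumed, water-like)
eta = 1.0e-3     # viscosity [Pa s]    (assumed, water-like)
r = 0.3e-3       # inner radius [m]    (hypothetical placeholder)
R = 3.2e-3       # radius of curvature [m]

I = canal_inertia(rho, r, R)
B = canal_viscous_coefficient(eta, R)
print("I / B           :", I / B)
print("rho r^2 / 8 eta :", time_constant(rho, r, eta))
```

Because the time constant scales with the square of the inner radius, the result is sensitive to the value chosen for r.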
Model with Cupula
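Our discussion until this point has not included the role of the cupula in the SCC: The cupula acts as an elastic membrane that gets displaced by angular accelerations. Through its elasticity the cupula returns the system to its resting position. The elasticity of the cupula adds an additional elastic term to the equation of movement. If it is taken into account, this equation becomes
$$\ddot{\theta} = \ddot{\delta} + \frac{B}{I} \dot{\delta} + \frac{K}{I} \delta$$
where K characterizes the elasticity of the cupula.
An elegant way to solve such differential equations is the Laplace-Transformation. The Laplace transform turns differential equations into algebraic equations: if the Laplace transform of a signal x(t) is denoted by X(s), the Laplace transform of the time derivative is
$$\frac{dx(t)}{dt} \;\xrightarrow{\text{Laplace}}\; s X(s) - x(0)$$
The term x(0) details the starting condition, and can often be set to zero by an appropriate choice of the reference position. Thus, the Laplace transform of the equation of motion is
$$s^2 \tilde{\theta} = s^2 \tilde{\delta} + \frac{B}{I} s \tilde{\delta} + \frac{K}{I} \tilde{\delta}$$
where "~" indicates the Laplace transformed variable. With $T_1 = \frac{I}{B}$ from above, and $T_2$ defined by
$$T_2 = \frac{B}{K}$$
we get the transfer function
$$\frac{\tilde{\delta}}{\tilde{\theta}} = \frac{T_1 s^2}{T_1 s^2 + s + \frac{1}{T_2}}$$
For humans, typical values for $T_2 = B/K$ are about 5 sec.
To find the poles of this transfer function, we have to determine for which values of s the denominator equals 0:
$$s_{1,2} = \frac{1}{2 T_1} \left( -1 \pm \sqrt{1 - 4 \frac{T_1}{T_2}} \right)$$
Since $T_2 \gg T_1$, and since $\sqrt{1 - x} \approx 1 - \frac{x}{2}$ for $x \ll 1$, we obtain
$$s_1 \approx -\frac{1}{T_1}, \quad s_2 \approx -\frac{1}{T_2}$$
Typically we are interested in the cupula displacement $\delta$ as a function of head velocity $\dot{\theta} \equiv s \tilde{\theta}$:
$$\frac{\tilde{\delta}}{s \tilde{\theta}}(s) = \frac{T_1 T_2 s}{(T_1 s + 1)(T_2 s + 1)}$$
For typical head movements (0.2 Hz < f < 20 Hz), the system gain is approximately constant. In other words, for typical head movements the cupula displacement is proportional to the angular head velocity!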
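The claim about the frequency range can be explored numerically. The sketch below evaluates the gain of the cupula transfer function from head velocity to cupula displacement, taken here as T1 T2 s / ((T1 s + 1)(T2 s + 1)), which is the form that follows from the equation of motion with cupula when T2 is much larger than T1. It uses the time constants quoted in the text (T1 of about 0.01 s, T2 of about 5 s) and `scipy.signal.freqs` for the frequency response.

```python
import numpy as np
from scipy import signal

T1 = 0.01    # short time constant [s]
T2 = 5.0     # long (cupula) time constant [s]

# Transfer function: cupula displacement / head velocity
num = [T1 * T2, 0.0]
den = np.polymul([T1, 1.0], [T2, 1.0])

freq_hz = np.array([0.02, 0.2, 1.0, 5.0, 20.0])
_, response = signal.freqs(num, den, worN=2 * np.pi * freq_hz)
gain = np.abs(response)

for f, g in zip(freq_hz, gain):
    print(f"f = {f:6.2f} Hz   gain / T1 = {g / T1:.3f}")
```

In the middle of the band the gain is very close to T1, so the cupula displacement follows the head velocity. Toward very low frequencies the gain drops because of the cupula elasticity, and toward the upper end of the band it starts to roll off because of T1.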
For Linear, Time-Invariant systems (LTI systems), the input and output have a simple relationship in the frequency domain:
$$Out(s) = G(s) \cdot In(s)$$
where the transfer function G(s) can be expressed by the algebraic function
$$G(s) = \frac{num(s)}{den(s)} = \frac{n_0 s^0 + n_1 s^1 + n_2 s^2 + \dots}{d_0 s^0 + d_1 s^1 + d_2 s^2 + \dots}$$
In other words, specifying the coefficients of the numerator (n) and denominator (d) uniquely characterizes the transfer function. This notation is used by some computational tools to simulate the response of such a system to a given input.
Different tools can be used to simulate such a system. For example, a low-pass filter with a time-constant of 7 sec, responding to an input step at 1 sec, has the following transfer function
$$G(s) = \frac{1}{7 s + 1}$$
and can be simulated as follows:
If you work on the command line, you can use the Control System Toolbox of MATLAB or the module signal of the Python package SciPy:
MATLAB Control System Toolbox:
    % Define the transfer function
    num = [1];
    tau = 7;
    den = [tau, 1];
    mySystem = tf(num,den)

    % Generate an input step
    t = 0:0.1:30;
    inSignal = zeros(size(t));
    inSignal(t>=1) = 1;

    % Simulate and show the output
    [outSignal, tSim] = lsim(mySystem, inSignal, t);
    plot(t, inSignal, tSim, outSignal);
Python - SciPy:
    # Import required packages
    import numpy as np
    import scipy.signal as ss
    import matplotlib.pyplot as plt

    # Define transfer function
    num = [1]
    tau = 7
    den = [tau, 1]
    mySystem = ss.lti(num, den)

    # Generate inSignal
    t = np.arange(0, 30, 0.1)
    inSignal = np.zeros(t.size)
    inSignal[t >= 1] = 1

    # Simulate and plot outSignal
    tout, outSignal, xout = ss.lsim(mySystem, inSignal, t)
    plt.plot(t, inSignal, tout, outSignal)
    plt.show()
Consider now the mechanics of the otolith organs. Since they are made up by complex, visco-elastic materials with a curved shape, their mechanics cannot be described with analytical tools. However, their movement can be simulated numerically with the finite element technique. Thereby the volume under consideration is divided into many small volume elements, and for each element the physical equations are approximated by analytical functions.
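Here we will only show the physical equations for the visco-elastic otolith materials. The movement of each elastic material has to obey Cauchy’s equations of motion:
$$\rho \frac{\partial^2 u_i}{\partial t^2} = \rho B_i + \sum_j \frac{\partial T_{ij}}{\partial x_j}$$
where $\rho$ is the effective density of the material, $u_i$ the displacements along the i-axis, $B_i$ the i-component of the volume force, and $T_{ij}$ the components of Cauchy’s stress tensor. $x_j$ are the coordinates.
For linear elastic, isotropic material, Cauchy’s stress tensor is given by
$$T_{ij} = \lambda e \delta_{ij} + 2 \mu E_{ij}$$
where $\lambda$ and $\mu$ are the Lamé constants; $\mu$ is identical with the shear modulus. $e = \mathrm{div}(\vec{u})$, and $E_{ij}$ is the strain tensor
$$E_{ij} = \frac{1}{2} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right)$$
This leads to Navier’s Equations of motion
$$\rho \frac{\partial^2 u_i}{\partial t^2} = \rho B_i + (\lambda + \mu) \frac{\partial e}{\partial x_i} + \mu \sum_j \frac{\partial^2 u_i}{\partial x_j^2}$$
This equation holds for purely elastic, isotropic materials, and can be solved with the finite element technique. A typical procedure to find the mechanical parameters that appear in this equation is the following: when a cylindrical sample of the material is put under strain, Young's modulus E characterizes the change in length, and Poisson’s ratio $\nu$ the simultaneous decrease in diameter. The Lamé constants $\lambda$ and $\mu$ are related to E and $\nu$ by:
$$E = \frac{\mu (3 \lambda + 2 \mu)}{\lambda + \mu}, \qquad \nu = \frac{\lambda}{2 (\lambda + \mu)}$$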
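In practice one usually measures Young's modulus E and Poisson's ratio nu and then needs the Lamé constants for the simulation. The sketch below uses the standard conversion in both directions, lambda = E nu / ((1 + nu)(1 - 2 nu)) and mu = E / (2 (1 + nu)), and verifies that the round trip recovers the original values. The material values are arbitrary placeholders, not measured otolith parameters.

```python
def lame_from_young_poisson(E, nu):
    """Lamé constants (lambda, mu) from Young's modulus E and Poisson's ratio nu."""
    lam = E * nu / ((1 + nu) * (1 - 2 * nu))
    mu = E / (2 * (1 + nu))
    return lam, mu


def young_poisson_from_lame(lam, mu):
    """Young's modulus E and Poisson's ratio nu from the Lamé constants."""
    E = mu * (3 * lam + 2 * mu) / (lam + mu)
    nu = lam / (2 * (lam + mu))
    return E, nu


E_in, nu_in = 2000.0, 0.45          # placeholder values [Pa], [-]
lam, mu = lame_from_young_poisson(E_in, nu_in)
E_out, nu_out = young_poisson_from_lame(lam, mu)
print(lam, mu, E_out, nu_out)
```

Note that lambda grows without bound as nu approaches 0.5, the limit of an incompressible material.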
Central Vestibular Processing
Central processing of vestibular information significantly affects the perceived orientation and movement in space. The corresponding information processing in the brainstem can often be modeled efficiently with control-system tools. As a specific example, we show how to model the effect of velocity storage.
The concept of velocity storage is based on the following experimental finding: when we abruptly stop from a sustained rotation about an earth-vertical axis, the cupula is deflected by the deceleration, but returns to its resting state with a time-constant of about 5 sec. However, the perceived rotation continues much longer, and decreases with a much longer time constant, typically somewhere between 15 and 20 sec.
In the attached figure, the response of the canals to an angular velocity stimulus ω is modeled by the transfer function C, here a simple high-pass filter with a time constant of 5 sec. (The canal response is determined by the deflection of the cupula, and is approximately proportional to the neural firing rate.) To model the increase in time constant, we assume that the central vestibular system has an internal model of the transfer function of the canals, $\hat{C}$. Based on this internal model, the expected firing rate of the internal estimate of the angular velocity, $\hat{\omega}$, is compared to the actual firing rate. With the gain-factor k set to 2, the output of the model nicely reproduces the increase in the time constant. The corresponding Python code can be found at .
It is worth noting that this feedback loop can be justified physiologically: we know that there are strong connections between the left and right vestibular nuclei. If those connections are severed, the time constant of the perceived rotation decreases to the peripheral time-constant of the semicircular canals.
Mathematically, negative feedback with a high gain has the interesting property that it can practically invert the transfer function in the negative feedback loop: if k>>1, and if the internal model of the canal transfer function is similar to the actual transfer function, the estimated angular velocity corresponds to the actual angular velocity.
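The effect of such a feedback loop on the time constant can be sketched with SciPy. The loop structure assumed here is that the velocity estimate equals k times the difference between the actual canal signal and the canal signal predicted by the internal model, and that the internal model is identical to the real canal transfer function T s / (T s + 1). Both are assumptions of this sketch, inferred from the description above rather than taken from the original code. Solving the loop algebraically then gives the closed-loop transfer function k T s / ((1 + k) T s + 1).

```python
import numpy as np
from scipy import signal

T_canal = 5.0    # peripheral (cupula) time constant [s]
k = 2.0          # feedback gain

# Canal: high-pass filter  C(s) = T s / (T s + 1)
canal = signal.lti([T_canal, 0.0], [T_canal, 1.0])

# Closed loop with internal model C_hat = C:
#   w_hat = k (C w - C_hat w_hat)  =>  w_hat / w = k T s / ((1 + k) T s + 1)
central = signal.lti([k * T_canal, 0.0], [(1 + k) * T_canal, 1.0])

# Response to a velocity step (e.g. the sudden stop after a sustained rotation)
t = np.linspace(0, 60, 601)
_, canal_step = signal.step(canal, T=t)
_, central_step = signal.step(central, T=t)

tau_canal = -1.0 / canal.poles[0].real
tau_central = -1.0 / central.poles[0].real
print("canal time constant   [s]:", tau_canal)
print("central time constant [s]:", tau_central)
```

With k = 2 the time constant is tripled, from 5 s to 15 s, which falls within the 15 to 20 s range described for the perceived rotation. The price is a reduced high-frequency gain of k / (1 + k), which approaches one for large k, in line with the remark about high-gain negative feedback.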
Alcohol and the Vestibular System
As you may or may not know from personal experience, consumption of alcohol can also induce a feeling of rotation. The explanation is quite straightforward, and basically relies on two factors: i) alcohol is lighter than the endolymph; and ii) once it is in the blood, alcohol gets relatively quickly into the cupula, as the cupula has a good blood supply. In contrast, it diffuses only slowly into the endolymph, over a period of a few hours. In combination, this leads to a buoyancy of the cupula soon after you have consumed (too much) alcohol. When you lie on your side, the deflections of the left and right horizontal cupulae add up, and induce a strong feeling of rotation. The proof: just roll on the other side - and the perceived direction of rotation will flip around!
Due to the position of the cupulae, you will experience the strongest effect when you lie on your side. When you lie on your back, the deflection of the left and right cupula compensate each other, and you don't feel any horizontal rotation. This explains why hanging one leg out of the bed slows down the perceived rotation.
The overall effect is minimized in the upright head position - so try to stay up(right) as long as possible during the party!
If you have drunk way too much, the endolymph will contain a significant amount of alcohol the next morning - more so than the cupula. This explains why at that point, a small amount of alcohol (e.g. a small beer) balances the difference, and reduces the feeling of spinning.
Anatomy of the Somatosensory System
Our somatosensory system consists of sensors in the skin and sensors in our muscles, tendons, and joints. The receptors in the skin, the so-called cutaneous receptors, tell us about temperature (thermoreceptors), pressure and surface texture (mechanoreceptors), and pain (nociceptors). The receptors in muscles and joints provide information about muscle length, muscle tension, and joint angles. (The following description is based on lecture notes from Laszlo Zaborszky, from Rutgers University.)
Sensory information from Meissner corpuscles and rapidly adapting afferents leads to adjustment of grip force when objects are lifted. These afferents respond with a brief burst of action potentials when objects move a small distance during the early stages of lifting. In response to rapidly adapting afferent activity, muscle force increases reflexively until the gripped object no longer moves. Such a rapid response to a tactile stimulus is a clear indication of the role played by somatosensory neurons in motor activity.
The slowly adapting Merkel's receptors are responsible for form and texture perception. As would be expected for receptors mediating form perception, Merkel's receptors are present at high density in the digits and around the mouth (50/mm² of skin surface), at lower density in other glabrous surfaces, and at very low density in hairy skin. This innervation density shrinks progressively with the passage of time so that by the age of 50, the density in human digits is reduced to 10/mm². Unlike rapidly adapting axons, slowly adapting fibers respond not only to the initial indentation of skin, but also to sustained indentation up to several seconds in duration.
Activation of the rapidly adapting Pacinian corpuscles gives a feeling of vibration, while the slowly adapting Ruffini corpuscles respond to the lateral movement or stretching of skin.
| ||Rapidly adapting||Slowly adapting|
|Surface receptor / small receptive field||Hair receptor, Meissner's corpuscle: Detect an insect or a very fine vibration. Used for recognizing texture.||Merkel's receptor: Used for spatial details, e.g. a round surface edge or "an X" in Braille.|
|Deep receptor / large receptive field||Pacinian corpuscle: "A diffuse vibration", e.g. tapping with a pencil.||Ruffini's corpuscle: "A skin stretch". Used for joint position in fingers.|
Nociceptors have free nerve endings. Functionally, skin nociceptors are either high-threshold mechanoreceptors or polymodal receptors. Polymodal receptors respond not only to intense mechanical stimuli, but also to heat and to noxious chemicals. These receptors respond to minute punctures of the epithelium, with a response magnitude that depends on the degree of tissue deformation. They also respond to temperatures in the range of 40-60°C, and change their response rates as a linear function of warming (in contrast with the saturating responses displayed by non-noxious thermoreceptors at high temperatures).
Pain signals can be separated into individual components, corresponding to different types of nerve fibers used for transmitting these signals. The rapidly transmitted signal, which often has high spatial resolution, is called first pain or cutaneous pricking pain. It is well localized and easily tolerated. The much slower, highly affective component is called second pain or burning pain; it is poorly localized and poorly tolerated. The third or deep pain, arising from viscera, musculature and joints, is also poorly localized, can be chronic and is often associated with referred pain.
The thermoreceptors have free nerve endings. Interestingly, we have only two types of thermoreceptors in our skin, which signal innocuous warmth and cooling respectively (some nociceptors are also sensitive to temperature, but are capable of unambiguously signaling only noxious temperatures). The warm receptors are unmyelinated, show a maximum sensitivity at ~45°C, signal temperatures between 30 and 45°C, and cannot unambiguously signal temperatures higher than 45°C. The cold receptors have their maximum sensitivity at ~27°C and signal temperatures above 17°C; some consist of lightly myelinated fibers, while others are unmyelinated. Our sense of temperature comes from the comparison of the signals from the warm and cold receptors. Thermoreceptors are poor indicators of absolute temperature but are very sensitive to changes in skin temperature.
The term proprioceptive or kinesthetic sense is used to refer to the perception of joint position, joint movements, and the direction and velocity of joint movement. There are numerous mechanoreceptors in the muscles, the muscle fascia, and in the dense connective tissue of joint capsules and ligaments. There are two specialized encapsulated, low-threshold mechanoreceptors: the muscle spindle and the Golgi tendon organ. Their adequate stimulus is stretching of the tissue in which they lie. Muscle spindles, joint and skin receptors all contribute to kinesthesia. Muscle spindles appear to provide their most important contribution to kinesthesia with regard to large joints, such as the hip and knee joints, whereas joint receptors and skin receptors may provide more significant contributions with regard to finger and toe joints.
Scattered throughout virtually every striated muscle in the body are long, thin, stretch receptors called muscle spindles. They are quite simple in principle, consisting of a few small muscle fibers with a capsule surrounding the middle third of the fibers. These fibers are called intrafusal fibers, in contrast to the ordinary extrafusal fibers. The ends of the intrafusal fibers are attached to extrafusal fibers, so whenever the muscle is stretched, the intrafusal fibers are also stretched. The central region of each intrafusal fiber has few myofilaments and is non-contractile, but it does have one or more sensory endings applied to it. When the muscle is stretched, the central part of the intrafusal fiber is stretched and each sensory ending fires impulses.
Numerous specializations occur in this simple basic organization, so that in fact the muscle spindle is one of the most complex receptor organs in the body. Only three of these specializations are described here; their overall effect is to make the muscle spindle adjustable and give it a dual function, part of it being particularly sensitive to the length of the muscle in a static sense and part of it being particularly sensitive to the rate at which this length changes.
- Intrafusal muscle fibers are of two types. All are multinucleated, and the central, non-contractile region contains the nuclei. In one type of intrafusal fiber, the nuclei are lined up single file; these are called nuclear chain fibers. In the other type, the nuclear region is broader, and the nuclei are arranged several abreast; these are called nuclear bag fibers. There are typically two or three nuclear bag fibers per spindle and about twice that many chain fibers.
- There are also two types of sensory endings in the muscle spindle. The first type, called the primary ending, is formed by a single Ia (A-alpha) fiber, supplying every intrafusal fiber in a given spindle. Each branch wraps around the central region of the intrafusal fiber, frequently in a spiral fashion, so these are sometimes called annulospiral endings. The second type of ending is formed by a few smaller nerve fibers (II or A-beta) on both sides of the primary endings. These are the secondary endings, which are sometimes referred to as flower-spray endings because of their appearance. Primary endings are selectively sensitive to the onset of muscle stretch but discharge at a slower rate while the stretch is maintained. Secondary endings are less sensitive to the onset of stretch, but their discharge rate does not decline very much while the stretch is maintained. In other words, both primary and secondary endings signal the static length of the muscle (static sensitivity), whereas only the primary ending signals the length changes (movement) and their velocity (dynamic sensitivity). The change of firing frequency of group Ia and group II fibers can then be related to static muscle length (static phase) and to stretch and shortening of the muscle (dynamic phases).
- Muscle spindles also receive a motor innervation. The large motor neurons that supply extrafusal muscle fibers are called alpha motor neurons, while the smaller ones supplying the contractile portions of intrafusal fibers are called gamma neurons. Gamma motor neurons can regulate the sensitivity of the muscle spindle so that this sensitivity can be maintained at any given muscle length.
Golgi tendon organ
The Golgi tendon organ is located at the musculotendinous junction. There is no efferent innervation of the tendon organ, therefore its sensitivity cannot be controlled from the CNS. The tendon organ, in contrast to the muscle spindle, is coupled in series with the extrafusal muscle fibers. Both passive stretch and active contraction of the muscle increase the tension of the tendon and thus activate the tendon organ receptor, but active contraction produces the greatest increase. The tendon organ, consequently, can inform the CNS about the “muscle tension”. In contrast, the activity of the muscle spindle depends on the “muscle length” and not on the tension. The muscle fibers attached to one tendon organ appear to belong to several motor units. Thus the CNS is informed not only of the overall tension produced by the muscle but also of how the workload is distributed among the different motor units.
The joint receptors are low-threshold mechanoreceptors and have been divided into four groups. They signal different characteristics of joint function (position, movements, direction and speed of movements). The free receptors or type 4 joint receptors are nociceptors.
Proprioceptive Signal Processing
Modelling muscle spindles and afferent response
The response of the muscle spindles in mammals to muscle stretch has been thoroughly studied, and various models have been proposed. However, due to the difficulty in obtaining accurate data of the afferent and fusimotor responses during muscular movement, these models have usually been quite limited. For example, several of the earliest models account only for the afferent response, ignoring the fusimotor activity.
Mileusnic et al. (2006) model
One recent model, developed by Mileusnic et al. (2006), portrays the muscle spindle as consisting of several (typically 4 to 11) nuclear chain fibres and two different nuclear bag fibres (bag1 and bag2), connected in parallel as shown in the figure below. The muscle fibres respond to three inputs: fascicle length, dynamic fusimotor input and static fusimotor input. The bag1 fibre is mainly responsible for detecting dynamic fusimotor input, while the bag2 and chain fibres are mainly responsible for detecting static fusimotor input. All fibres respond to changes in the fascicle length, and are modelled in largely the same way but with different coefficients to account for their different physiological properties. The responses of the three types of fibres are summed to generate the primary and secondary afferent activities. The primary afferent activity is affected by the response of all three types of muscle fibres, while the secondary afferent activity only depends on the bag2 and chain fibre responses.
Hasan (1983) model
Another comprehensive model of muscle spindles was proposed by Hasan in 1983. This representation of muscle fibres and spindles is based closely on their physical properties. The muscle spindle is represented as two separate regions connected in series: sensory and non-sensory. The firing rate of the spindle afferent depends on the state of the two regions. The lengths of the two regions can be labelled z(t) for the sensory and y(t) for the non-sensory region. The tension in the two regions is equal, since they are placed in series. The sensory zone can be assumed to act like a spring (equation (3)), while in the non-sensory region, tension is a non-linear function of y (equation (2), derived by Hasan).
The total length of the muscle spindle, x(t), is the sum of the lengths of the two regions, x(t) = y(t) + z(t) (equation (4)).
Using this substitution and rearranging, we can derive an expression for the length of the sensory zone (equation (5)).
Here, parameter a represents the sensitivity of the tension to velocity in the non-sensory zone, and parameter c determines the zero-length tension, which influences the background firing rate of the afferent; b is the third parameter of the model. The length of the sensory zone depends not only on the current length and velocity of the spindle, but also on the history of the length changes.
The firing rate in Hasan's model depends on a combination of the sensory zone length and its first derivative (equation (6)), with an experimentally derived weighting.
Approximate values for the model parameters a, b and c were suggested by Hasan (1983), and differ for voluntary and passive movements. A summary of these values is presented in the table below.
|Type of ending||Condition||a (mm/s)||b||c (mm)|
|Primary||Gamma - dynamic||0.1||125||-15|
|Primary||Gamma - static||100||100||-25|
In the model, these values are assumed to be static for the duration of a movement, although this is not believed to be the case in practice.
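The behaviour of such a model can be explored numerically. The sketch below assumes the form in which Hasan's model is commonly quoted: with x the total spindle length and z the length of the sensory zone, dz/dt = dx/dt - a*((b*z - x + c)/(x - z - c))^3, and a firing rate proportional to z + 0.1*dz/dt. These expressions, the 0.1 s weighting, the scale factor h (set to 1, so the output is in arbitrary units), the function names and the ramp-and-hold stretch are assumptions for illustration and should be checked against Hasan (1983). The parameter values are those of the gamma-dynamic row of the table above.

```python
def ramp_and_hold(amplitude_mm, velocity_mm_s, hold_s, dt):
    """Spindle length x(t): constant-velocity stretch followed by a hold."""
    n_ramp = int(round(amplitude_mm / velocity_mm_s / dt))
    n_hold = int(round(hold_s / dt))
    ramp = [amplitude_mm * i / n_ramp for i in range(n_ramp + 1)]
    return ramp + [amplitude_mm] * n_hold

def simulate_hasan(x, dt, a, b, c, h=1.0):
    """Forward-Euler integration of the assumed sensory-zone equation.

    Returns the firing rate h * (z + 0.1 * dz/dt) at every time step.
    """
    z = (x[0] - c) / b          # start at rest, where dz/dt = 0
    rates = []
    for i in range(len(x)):
        dx = (x[i] - x[i - 1]) / dt if i > 0 else 0.0
        u = (b * z - x[i] + c) / (x[i] - z - c)
        dz = dx - a * u ** 3
        rates.append(h * (z + 0.1 * dz))
        z += dz * dt
    return rates

if __name__ == "__main__":
    dt = 0.001                                   # s
    x = ramp_and_hold(5.0, 10.0, 2.0, dt)        # 5 mm stretch at 10 mm/s
    r = simulate_hasan(x, dt, a=0.1, b=125.0, c=-15.0)
    print("before stretch:", round(r[0], 3))
    print("peak during stretch:", round(max(r), 3))
    print("end of hold:", round(r[-1], 3))
```

Under these assumptions the output rises sharply at the onset of the stretch and then relaxes slowly towards a new, higher level during the hold, which is the qualitative behaviour expected of a primary ending.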
Internal models of limb dynamics
In addition to modelling the response of muscle spindle afferents to muscle stretch, several groups have worked on modelling the signals which are sent from the brain to the spindle efferents in order for muscles to complete specific movements. The complexity here lies in the fact that the brain must be able to adapt to unexpected changes in the dynamics of planned movements, using feedback from the spindle afferents.
Studies in this area suggest that humans achieve this using internal models, which are built through an “error-feedback-learning” process, and transform planned muscle states into the motor commands required to achieve them. To generate the motor commands for a particular reaching movement, the brain performs calculations based on the expected dynamics of the planned movement. However, any unexpected changes in these dynamics while the movement is being executed (e.g. external strain placed on the muscle) will lead to errors in expected muscle length (Gottlieb 1994, Shadmehr and Mussa-Ivaldi 1994). These errors are communicated to the brain through the muscle spindle afferents, which experience a different sensory state to what is expected. The brain then reacts to these error signals with short and long latency responses, which work to minimise the error, but cannot eliminate it completely due to the delay in the system.
Studies suggest that the error can be eliminated in a subsequent attempt at the movement under the same dynamics, and this is where the “error-feedback-learning” idea comes from (Thoroughman and Shadmehr 1999). The corrections which are generated by the brain form an internal model, which maps a desired action (in kinematic coordinates) to the necessary motor commands (as torques). This internal model can be represented as a weighted combination of basis elements.
Here each basis represents some characteristic of the muscle's sensory state, and the motor command is a “population code”. Population coding is a method of representing stimuli as the combined activity of many neurons (in contrast to rate coding). In order to use such a model, we need to know how the bases represent particular limb or muscle positions, and the neuronal firing rates associated with them. The bases can, in principle, represent every aspect of the state: position, velocity, acceleration and even higher derivatives. However, this high dimensionality makes it very difficult to derive relationships experimentally between each dimension of the bases and the firing rates.
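The idea of a weighted combination of bases that is corrected by error feedback can be illustrated with a small sketch. Everything specific in it is an assumption made for illustration: the state is reduced to a single velocity, the bases are Gaussian tuning curves, the unexpected dynamics are a viscous load proportional to velocity, and the weights are adjusted with a simple delta rule. It is not the model used in the cited studies.

```python
import math

CENTERS = [-1.0 + 0.25 * i for i in range(9)]   # preferred velocities (assumed)
WIDTH = 0.25                                    # tuning width (assumed)

def bases(v):
    """Activity of each basis element for velocity v."""
    return [math.exp(-((v - c) ** 2) / (2 * WIDTH ** 2)) for c in CENTERS]

def predict(weights, v):
    """Internal-model output: weighted combination of the basis activities."""
    return sum(w * g for w, g in zip(weights, bases(v)))

def train(n_trials=300, eta=0.2, viscosity=2.0):
    """Error-feedback learning of a hypothetical viscous load, torque = viscosity * v."""
    weights = [0.0] * len(CENTERS)
    probe_velocities = [-0.8, -0.4, 0.0, 0.4, 0.8]
    mean_errors = []
    for _ in range(n_trials):
        total = 0.0
        for v in probe_velocities:
            g = bases(v)
            error = viscosity * v - sum(w * gi for w, gi in zip(weights, g))
            # Delta rule: each weight changes in proportion to the error
            # and to how active its basis element was.
            weights = [w + eta * error * gi for w, gi in zip(weights, g)]
            total += abs(error)
        mean_errors.append(total / len(probe_velocities))
    return weights, mean_errors

if __name__ == "__main__":
    weights, errors = train()
    print("mean error, first trial:", round(errors[0], 3))
    print("mean error, last trial:", round(errors[-1], 6))
    print("predicted torque at v = 0.4:", round(predict(weights, 0.4), 3))
```

In this toy setting the error shrinks from trial to trial, mirroring the observation that the error can be eliminated in subsequent attempts under the same dynamics.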
Somatosensory Perception of Whiskers
The barrel cortex is a specialized region of the somatosensory cortex responsible for processing tactile information from the whiskers. Like every other cortical region, the barrel cortex preserves the columnar organization that plays a crucial role in information processing. Information from each whisker is represented in a separate, discrete column analogous to a “barrel”, hence the name barrel cortex. Rodents use their whiskers constantly to acquire sensory information from the environment. Since rodents are nocturnal, visual information is relatively poor, and the tactile information carried by the whiskers forms the primary sensory signal for building a perceptual map of the environment. The whiskers on the snouts of mice and rats serve as arrays of highly sensitive detectors for acquiring tactile information, as shown in Figure 1 A and B. By using their whiskers, rodents can build spatial representations of their environment, locate objects, and perform fine-grain texture discrimination.
Somatosensory whisker-related processing is highly organized into stereotypical maps, which occupy a large portion of the rodent brain: a large part of the neocortex is dedicated to processing information from the whiskers. During exploration and palpation of objects, the whiskers are under motor control, often executing rapid large-amplitude rhythmic sweeping movements, and this sensory system is therefore an attractive model for investigating active sensory processing and sensory-motor integration. Perhaps the most remarkable specialization of this sensory system is the primary somatosensory “barrel” cortex, where each whisker is represented by a discrete and well-defined structure in layer 4.
These layer 4 barrels are somatotopically arranged in an almost identical fashion to the layout of the whiskers on the snout, i.e. bordering whiskers are represented in adjacent cortical areas. Sensorimotor integration of whisker-related activity leads to pattern discrimination and enables rodents to have a reliable map of the environment. This is an interesting model to study because rodents use their whiskers to “see”, and this cross-modality sensory information processing could help us to improve the lives of humans who are deprived of one sensory modality. Specifically, blind people can be trained to use somatosensory information to build a spatial map of the environment.
Pathways carrying whisker information to Barrel Cortex
Whisker information processing in Barrel Cortex with specialized local microcircuit
The deflection of a whisker is thought to open mechano-gated ion channels in nerve endings of sensory neurons innervating the hair follicle (although the molecular signalling machinery remains to be identified). The resulting depolarization evokes action potential firing in the sensory neurons of the infraorbital branch of the trigeminal nerve. The transduction through mechanical deformation is similar to that in the hair cells of the inner ear; in this case the contact of whiskers with objects causes the mechano-gated ion channels to open. Cation-permeable ion channels let positively charged ions into the cells and cause depolarization, eventually leading to the generation of action potentials. A single sensory neuron only fires action potentials in response to deflection of one specific whisker. The innervation of the hair follicle shows a diversity of nerve endings, which may be specialized for detecting different types of sensory input.
The layer 4 barrel map is arranged almost identically to the layout of the whiskers on the snout of the rodent. There are several recurrent connections in layer 4, and it sends axons to layer 2/3 neurons, which integrate information from other cortical regions such as the primary motor cortex. These intra-cortical and inter-cortical connections enable rodents to achieve stimulus discrimination capabilities and to extract optimal information from the incoming tactile stimulus. These projections also play a crucial role in integrating somatosensory information with motor output. Information from whiskers is processed in the barrel cortex with specialized local microcircuits formed to extract optimal information about the environment. These cortical microcircuits are composed of excitatory and inhibitory neurons as shown in Figure 4.
Learning whisker based object discrimination & texture differentiation
Rodents move their sensors to collect information, and these movements are guided by sensory input. When action sequences are required to achieve success in novel tasks, interactions between movement and sensation underlie motor control and complex learned behaviours. The motor cortex has important roles in learning motor skills [6-9], but its function in learning sensorimotor associations is unknown. The neural circuits underlying sensorimotor integration are beginning to be mapped. Different motor cortex layers harbour excitatory neurons with distinct inputs and projections [10-12]. Outputs to motor centres in the brain stem and spinal cord arise from pyramidal tract-type neurons in layer 5B (L5B). Within motor cortex, excitation descends from L2/3 to L5 [13, 14]. Input from somatosensory cortex impinges preferentially onto L2/3 neurons. L2/3 neurons therefore directly link somatosensation and control of movements. In one recent study, head-fixed mice were trained in a vibrissa-based object-detection task while populations of neurons were imaged. Following a sound, a pole was moved to one of several target positions within reach of the whiskers (the ‘go’ stimulus) or to an out-of-reach position (the ‘no-go’ stimulus). Target and out-of-reach locations were arranged along the anterior–posterior axis; the out-of-reach position was most anterior. Mice searched for the pole with one whisker row, the C row, and reported the pole as ‘present’ by licking, or ‘not present’ by withholding licking. Licking on go trials (hit) was rewarded with water, whereas licking on no-go trials (false alarm) was punished with a time-out during which the trial was stopped for 2 seconds. Trials without licking (correct rejections on no-go trials and misses on go trials) were neither rewarded nor punished. All mice showed learning within the first two or three sessions. Performance reached expert levels after three to six training sessions.
Learning the behavioural task was directly dependent on the motor-related behaviour. Naive mice whisked occasionally in a manner unrelated to trial structure. Thus, object detection relies on a sequence of actions, linked by sensory cues. An auditory cue triggers whisking during the sampling period. Contact between whisker and object causes licking for a water reward during a response period. Silencing vM1 indicates that this task requires the motor cortex; with vM1 silenced, task-dependent whisking persisted, but was reduced in amplitude and repeatability, and task performance dropped.
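The trial logic of this go/no-go task can be written down compactly. The following sketch only restates the rules given above (hit, miss, false alarm, correct rejection, and their consequences); the function names and the example session are hypothetical and are not taken from the study.

```python
def classify_trial(pole_in_reach, licked):
    """Name the outcome of a single go/no-go trial."""
    if pole_in_reach:                       # 'go' stimulus
        return "hit" if licked else "miss"
    return "false alarm" if licked else "correct rejection"   # 'no-go' stimulus

def consequence(outcome):
    """Consequence delivered to the mouse for a given outcome."""
    if outcome == "hit":
        return "water reward"
    if outcome == "false alarm":
        return "2 s time-out"
    return "none"                           # misses and correct rejections

def fraction_correct(trials):
    """Fraction of trials that were hits or correct rejections.

    `trials` is a list of (pole_in_reach, licked) pairs.
    """
    correct = sum(
        classify_trial(pole, lick) in ("hit", "correct rejection")
        for pole, lick in trials
    )
    return correct / len(trials)

if __name__ == "__main__":
    # Hypothetical session of four trials, one of each type.
    session = [(True, True), (True, False), (False, True), (False, False)]
    for pole, lick in session:
        outcome = classify_trial(pole, lick)
        print(outcome, "->", consequence(outcome))
    print("fraction correct:", fraction_correct(session))
```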
Neural Correlates of Sensorimotor learning mechanism
Coding of touch in the motor cortex is consistent with direct input from vS1 to the imaged neurons. A model based on population coding of individual behavioural features also predicted motor behaviours. Accurate decoding of whisking amplitude, whisking set-point and lick rate suggests that vM1 controls these slowly varying motor parameters, as expected from previous motor cortex and neurophysiological experiments.
1 Feldmeyer D, Brecht M, Helmchen F, Petersen CCH, Poulet JFA, Staiger JF, Luhmann HJ, Schwarz C. "Barrel cortex function." Progress in Neurobiology 2013, 103:3-27.
2 Lahav O, Mioduser D. "Multisensory virtual environment for supporting blind persons' acquisition of spatial cognitive mapping, orientation, and mobility skills." Academia.edu 2002.
3 Alloway KD. "Information processing streams in rodent barrel cortex: The differential functions of barrel and septal circuits." Cereb Cortex 2008, 18(5):979-989.
4 Scott SH. "Inconvenient truths about neural processing in primary motor cortex." The Journal of physiology 2008, 586(5):1217-1224.
5 Wolpert DM, Diedrichsen J, Flanagan JR. "Principles of sensorimotor learning." Nature reviews Neuroscience 2011, 12(12):739-751.
6 Wise SP, Moody SL, Blomstrom KJ, Mitz AR. "Changes in motor cortical activity during visuomotor adaptation." Experimental brain research 1998, 121(3):285-299.
7 Rokni U, Richardson AG, Bizzi E, Seung HS. "Motor learning with unstable neural representations." Neuron 2007, 54(4):653-666.
8 Komiyama T, Sato TR, O'Connor DH, Zhang YX, Huber D, Hooks BM, Gabitto M, Svoboda K. "Learning-related fine-scale specificity imaged in motor cortex circuits of behaving mice." Nature 2010, 464(7292):1182-1186.
9 Hosp JA, Pekanovic A, Rioult-Pedotti MS, Luft AR. "Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning." The Journal of neuroscience : the official journal of the Society for Neuroscience 2011, 31(7):2481-2487.
10 Keller A. "Intrinsic synaptic organization of the motor cortex." Cereb Cortex 1993, 3(5):430-441.
11 Mao T, Kusefoglu D, Hooks BM, Huber D, Petreanu L, Svoboda K. "Long-range neuronal circuits underlying the interaction between sensory and motor cortex." Neuron 2011, 72(1):111-123.
12 Hooks BM, Hires SA, Zhang YX, Huber D, Petreanu L, Svoboda K, Shepherd GM. "Laminar analysis of excitatory local circuits in vibrissal motor and sensory cortical areas." PLoS biology 2011, 9(1):e1000572.
13 Anderson CT, Sheets PL, Kiritani T, Shepherd GM. "Sublayer-specific microcircuits of corticospinal and corticostriatal neurons in motor cortex." Nature neuroscience 2010, 13(6):739-744.
14 Kaneko T, Cho R, Li Y, Nomura S, Mizuno N. "Predominant information transfer from layer III pyramidal neurons to corticospinal neurons." The Journal of comparative neurology 2000, 423(1):52-65.
15 O'Connor DH, Clack NG, Huber D, Komiyama T, Myers EW, Svoboda K. "Vibrissa-based object localization in head-fixed mice." The Journal of neuroscience : the official journal of the Society for Neuroscience 2010, 30(5):1947-1967.
16 O'Connor DH, Peron SP, Huber D, Svoboda K. "Neural activity in barrel cortex underlying vibrissa-based object localization in mice." Neuron 2010, 67(6):1048-1061.
17 Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY. "Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein." Nature biotechnology 2004, 22(12):1567-1572.
18 Tian L, Hires SA, Mao T, Huber D, Chiappe ME, Chalasani SH, Petreanu L, Akerboom J, McKinney SA, Schreiter ER. "Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators." Nature methods 2009, 6(12):875-881.
Probably the oldest sensory system in nature, the olfactory system concerns the sense of smell. The olfactory system is physiologically strongly related to the gustatory system, so that the two are often examined together. Complex flavors require both taste and smell sensation to be recognized. Consequently, food may taste “different” if the sense of smell does not work properly (e.g. during a head cold).
Generally the two systems are classified as visceral senses because of their close association with gastrointestinal function. They are also of central importance for emotional and sexual functions.
Both taste and smell receptors are chemoreceptors, stimulated by molecules dissolved in saliva and in mucus, respectively. However, these two senses are anatomically quite different. Smell receptors are distance receptors whose pathways do not have any relay in the thalamus, whereas taste pathways pass up the brainstem to the thalamus and project to the postcentral gyrus along with those for touch and pressure sensibility from the mouth.
In this article we will first focus on the organs composing the olfactory system, then characterize them in order to understand their functionality, and end by explaining the transduction of the signal and commercial applications such as the eNose.
In vertebrates the main olfactory system detects odorants that are inhaled through the nose, where they come into contact with the olfactory epithelium, which contains the olfactory receptors.
Olfactory sensitivity is directly proportional to the area in the nasal cavity near the septum reserved for the olfactory mucous membrane, which is the region where the olfactory receptor cells are located. The extent of this area is species-specific. In dogs, for example, the sense of smell is highly developed and the area covered by this membrane is about 75 – 150 cm²; such animals are called macrosmatic animals. In humans, by contrast, the olfactory mucous membrane covers an area of only about 3 – 5 cm², so humans are known as microsmatic animals.
In humans the olfactory mucous membrane contains about 10 million olfactory cells, which together express about 350 different receptor types. Each of the 350 receptor types is characteristic for only one odorant type. The binding of an odorant molecule starts a molecular chain reaction, which transforms the chemical perception into an electrical signal.
The electrical signal proceeds through the axons of the olfactory nerve to the olfactory bulbs. In this region there are between 1000 and 2000 glomerular cells, which combine and interpret the potentials coming from different receptors. In this way it is possible to unequivocally characterise e.g. the aroma of coffee, which is composed of about 650 different odorants. Humans can distinguish between about 10,000 odors.
The signal then goes forth to the olfactory cortex where it will be recognized and compared with known odorants (i.e. olfactory memory) involving also an emotional response to the olfactory stimuli.
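The idea that an odor is characterised by the combined activity of many receptor types, and recognised by comparison with stored patterns, can be illustrated with a toy example. The sketch below is only an illustration under strong assumptions: it uses five hypothetical receptor types instead of about 350, invented activation values, and cosine similarity as the comparison rule, none of which are taken from the text or claimed to be the brain's actual mechanism.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two receptor activation patterns (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def recognise(pattern, memory):
    """Return the name of the stored odor whose pattern is most similar."""
    return max(memory, key=lambda name: cosine_similarity(pattern, memory[name]))

# Hypothetical "olfactory memory": activation of five receptor types per odor.
MEMORY = {
    "coffee": [0.9, 0.1, 0.7, 0.0, 0.4],
    "lemon":  [0.1, 0.8, 0.0, 0.6, 0.2],
    "smoke":  [0.5, 0.0, 0.2, 0.9, 0.7],
}

if __name__ == "__main__":
    sniff = [0.8, 0.2, 0.6, 0.1, 0.5]   # noisy version of the coffee pattern
    for name, stored in MEMORY.items():
        print(name, round(cosine_similarity(sniff, stored), 3))
    print("recognised as:", recognise(sniff, MEMORY))
```

No single receptor type identifies the odor in this example; it is the pattern across all of them that does.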
It is also interesting to note that the human genome has about 600 – 700 genes (~2% of the complete genome) coding for olfactory receptors, but only about 350 are still used to build the olfactory system. This is evidence of an evolutionary decline in humans' reliance on olfaction.
Sensory Organ Components
Similar to other sensory modalities, olfactory information must be transmitted from peripheral olfactory structures, like the olfactory epithelium, to more central structures, meaning the olfactory bulb and cortex. The specific stimuli have to be integrated, detected and transmitted to the brain in order to reach sensory consciousness. However, the olfactory system is different from other sensory systems in three fundamental ways, as described in the book by Paxinos G. and Mai J.K., "The Human Nervous System".
- Olfactory receptor neurons are continuously replaced by mitotic division of the basal cells of the olfactory epithelium. The reason for this is the high vulnerability of the neurons, which are directly exposed to the environment.
- Because of phylogenetic relationship, olfactory sensory activity is transferred directly from the olfactory bulb to the olfactory cortex, without a thalamic relay.
- Neural integration and analysis of olfactory stimuli may not involve topographic organization beyond the olfactory bulb, meaning that spatial or frequency axes are not needed to project the signal.
Olfactory Mucous Membrane
The olfactory mucous membrane contains the olfactory receptor cells, and in humans it covers an area of about 3 – 5 cm² in the roof of the nasal cavity near the septum. Because the receptors are continuously regenerated, it contains both the supporting cells and the progenitor cells of the olfactory receptors. Interspersed between these cells are 10 – 20 million receptor cells.
Olfactory receptors are in fact neurons with a short, thick dendrite. The expanded end of the dendrite is called an olfactory rod, from which cilia project to the surface of the mucus. Each neuron has between 10 and 20 cilia, which are about 2 micrometers long and about 0.1 micrometers in diameter.
The axons of the olfactory receptor neurons go through the cribriform plate of the ethmoid bone and enter the olfactory bulb. This passage is the most vulnerable part of the olfactory system; damage to the cribriform plate (e.g. from breaking the nasal septum) can destroy the axons and so compromise the sense of smell.
A further particularity of the mucous membrane is that it is completely renewed every few weeks.
In humans the olfactory bulb is located anteriorly with respect to the cerebral hemisphere and remains connected to it only by a long olfactory stalk. Furthermore, in mammals it is organized in layers, forming a concentric laminar structure with well-defined neuronal somata and synaptic neuropil.
After passing the cribriform plate, the olfactory nerve fibers ramify in the most superficial layer (the olfactory nerve layer). Where these axons reach the olfactory bulb the layer thickens, and they terminate on the primary dendrites of the mitral cells and tufted cells, forming the complex globular synapses called olfactory glomeruli. Both cell types send axons to the olfactory cortex and appear to have the same function, although tufted cells are smaller and consequently have thinner axons.
The axons from several thousand receptor neurons converge on one or two glomeruli in a corresponding zone of the olfactory bulb; this suggests that the glomeruli are the unit structures for olfactory discrimination.
In order to avoid threshold problems, the olfactory bulb contains, in addition to mitral and tufted cells, two types of cells with inhibitory properties: periglomerular cells and granule cells. The former connect two different glomeruli; the latter, which have no axons, form reciprocal synapses with the lateral dendrites of the mitral and tufted cells. At these synapses the granule cell inhibits the mitral (or tufted) cell by releasing GABA, while the mitral (or tufted) cell excites the granule cell by releasing glutamate. About 8,000 glomeruli and 40,000 mitral cells have been counted in young adults. Unfortunately, these numbers decrease progressively with age, compromising the structural integrity of the different layers.
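The reciprocal synapse between mitral and granule cells forms a negative feedback loop, which can be illustrated with a toy firing-rate model. This is only a sketch under stated assumptions: the function `simulate_reciprocal_synapse`, the weights `w_exc` and `w_inh`, and the discrete-time update rule are hypothetical and are not taken from the cited literature.

```python
def simulate_reciprocal_synapse(drive, w_exc=0.5, w_inh=1.0, steps=100):
    """Toy rate model of a mitral cell coupled to a granule cell.

    drive : constant excitatory input to the mitral cell (arbitrary units)
    w_exc : strength of mitral -> granule excitation (glutamate), assumed
    w_inh : strength of granule -> mitral inhibition (GABA), assumed
    Returns the (mitral, granule) activity after `steps` updates.
    """
    mitral, granule = 0.0, 0.0
    for _ in range(steps):
        # The mitral cell is driven by its input and inhibited by the granule cell.
        new_mitral = max(0.0, drive - w_inh * granule)
        # The granule cell is excited by the mitral cell.
        new_granule = w_exc * mitral
        mitral, granule = new_mitral, new_granule
    return mitral, granule


# With feedback the mitral activity settles below the raw drive;
# without inhibition (w_inh = 0) it simply equals the drive.
with_feedback, _ = simulate_reciprocal_synapse(10.0)
without_feedback, _ = simulate_reciprocal_synapse(10.0, w_inh=0.0)
print(with_feedback, without_feedback)
```

In this toy model the mitral activity settles at drive / (1 + w_exc * w_inh), so stronger coupling compresses the output range of the mitral cell.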
The axons of the mitral and tufted cells pass through the granule layer, the intermediate olfactory stria and the lateral olfactory stria to the olfactory cortex. In humans this tract forms the bulk of the olfactory peduncle. As described by Paxinos G. and Mai J.K. in "The Human Nervous System", the primary olfactory cortical areas have a simple structure composed of three layers: a broad plexiform layer (first layer), a compact layer of pyramidal cell somata (second layer) and a deeper layer composed of both pyramidal and nonpyramidal cells (third layer). Furthermore, in contrast to the olfactory bulb, only little spatial encoding can be observed; “that is, small areas of the olfactory bulb virtually project the entire olfactory cortex, and small areas of the cortex receive fibers from virtually the entire olfactory bulb”.
In general, the targets of the olfactory tract can be divided into five major regions of the cerebrum: the anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, the anterior cortical nucleus of the amygdala and the entorhinal cortex. Olfactory information is transmitted from the primary olfactory cortex to several other parts of the forebrain, including the orbital cortex, amygdala, hippocampus, central striatum, hypothalamus and mediodorsal thalamus.
It is also interesting to note that in humans the piriform cortex can be activated by sniffing alone, whereas the lateral and anterior orbitofrontal gyri of the frontal lobe are activated only by an actual smell. In general, orbitofrontal activation is greater on the right side than on the left, which implies an asymmetry in the cortical representation of olfaction. Other fibers project to the amygdala, which is probably involved in the emotional responses to olfactory stimuli, and to the entorhinal cortex, which is concerned with olfactory memories.
A good and complete description of the substructure of the olfactory cortex can be found in Paxinos G. and Mai J.K., "The Human Nervous System".
|Substance||mg/L of Air|
|Oil of peppermint||0.02|
Only substances that come into contact with the olfactory epithelium can excite the olfactory receptors. The table shows the thresholds for some representative substances. These values give an impression of the very high sensitivity of the olfactory receptors.
It is remarkable that humans can recognize more than 10,000 different odors, yet the concentration of an odorant must change by about 30% before a difference in intensity can be detected. By comparison, the visual system can detect a 1% change in light intensity. As in hearing, the direction from which a smell comes may be indicated by the slight difference in the time of arrival of odoriferous molecules in the two nostrils. Odorant molecules normally contain between 3 and 20 carbon atoms, and molecules with the same number of carbon atoms can have quite different odors because of slight changes in their structural configuration.
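To get a feeling for what a 30% versus a 1% discrimination threshold means, one can count how many distinguishable intensity steps fit into a given stimulus range. The following is a minimal sketch, assuming that the threshold is a constant fraction of the current intensity over the whole range (Weber's law, an assumption not stated in the text) and using a hypothetical 1000-fold intensity range.

```python
import math


def discriminable_steps(weber_fraction, intensity_ratio):
    """Count just-noticeable steps between the lowest and highest intensity.

    Each step multiplies the intensity by (1 + weber_fraction), so the
    number of whole steps n satisfies (1 + weber_fraction) ** n <= intensity_ratio.
    """
    return math.floor(math.log(intensity_ratio) / math.log(1.0 + weber_fraction))


# Hypothetical 1000-fold range of stimulus intensity (an assumption).
INTENSITY_RANGE = 1000.0

smell_steps = discriminable_steps(0.30, INTENSITY_RANGE)   # 30% change for odor intensity
vision_steps = discriminable_steps(0.01, INTENSITY_RANGE)  # 1% change for light intensity

print("odor intensity steps:", smell_steps)
print("light intensity steps:", vision_steps)
```

Under these assumptions, olfaction resolves only a few dozen intensity levels where vision resolves several hundred, even though the number of odor qualities that can be recognized is very large.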
An interesting feature of the olfactory system is how a simple sense organ that apparently lacks a high degree of complexity can mediate discrimination of more than 10,000 different odors. On the one hand, this is made possible by the huge number of different odorant receptors; the gene family for the olfactory receptors is in fact the largest family studied so far in mammals. On the other hand, the neural net of the olfactory system provides, with its 1800 glomeruli, a large two-dimensional map in the olfactory bulb that is unique to each odorant. In addition, the extracellular field potential in each glomerulus oscillates, and the granule cells appear to regulate the frequency of the oscillation. The exact function of the oscillation is unknown, but it probably also helps to focus the olfactory signal reaching the cortex.
Olfaction, as described in the research of R. Haddad et al., consists of a set of transforms from physical space of odorant molecules (olfactory physicochemical space), through a neural space of information processing (olfactory neural space), into a perceptual space of smell (olfactory perceptual space). The rules of these transforms depend on obtaining valid metrics for each of those spaces.
Olfactory perceptual space
As the perceptual space represents the “input” of the smell measurement, its aim is to describe odors in the simplest possible way. Odors are in fact ordered so that the distance between them in this space reflects their similarity: the closer two odors are to each other in this space, the more similar they are expected to be. This space is thus defined by so-called perceptual axes characterized by some arbitrarily chosen “unit” odors.
Olfactory neural space
As suggested by its name, the neural space is generated from neural responses. This gives rise to an extensive database of odorant-induced activity, which can be used to formulate an olfactory space where the concept of similarity serves as a guiding principle. Using this procedure, different odorants are then expected to be similar if they generate a similar neuronal response. This database can be navigated at the Glomerular Activity Response Archive.
Olfactory physicochemical space
The need to identify the molecular encoding of the biological interaction makes the physicochemical space the most complex of the olfactory spaces described so far. R. Haddad suggests that one possibility for spanning this space is to represent each odorant by a very large number of molecular descriptors, using either a variance metric or a distance metric. In the first description, single odorants may have many physicochemical features, and one expects these features to present themselves at various probabilities within the world of molecules that have a smell. In such a metric, the orthogonal basis generated from the description of the odorants leads to a representation of each odorant by a single value. In the second, the metric represents each odorant by a vector of 1664 values, and similarity is based on the Euclidean distances between odorants in the 1664-dimensional physicochemical space. Whereas the first metric enabled the prediction of perceptual attributes, the second enabled the prediction of odorant-induced neuronal response patterns.
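The distance metric described above can be sketched in a few lines. The example is only illustrative: the odorant names and descriptor values are hypothetical, and three descriptors stand in for the 1664 used in the cited work.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(name, table):
    """Return the odorant whose descriptor vector lies closest to `name`."""
    others = (other for other in table if other != name)
    return min(others, key=lambda other: euclidean_distance(table[name], table[other]))


# Hypothetical, already normalized descriptor vectors (3 values instead of 1664).
odorants = {
    "odorant_A": [0.10, 0.80, 0.30],
    "odorant_B": [0.12, 0.75, 0.35],
    "odorant_C": [0.90, 0.10, 0.60],
}

print(most_similar("odorant_A", odorants))
```

In this picture, two odorants are predicted to evoke similar neuronal response patterns when their descriptor vectors lie close together.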
Electronic measurement of odors
Nowadays odors can be measured electronically in many different ways, for example by mass spectrometry, gas chromatography, Raman spectroscopy and, most recently, electronic noses (eNoses). In general, these approaches assume that different olfactory receptors have different affinities to specific molecular physicochemical properties, and that the differential activation of these receptors gives rise to a spatio-temporal pattern of activity that reflects odors.
eNoses are analytic devices that mimic the principle of biological olfaction; their main component is an array of non-specific chemical sensors. Combining electronics, pattern recognition and modern technology, an eNose uses gas sensors to translate the chemical signal into an electrical signal when a volatile odorant from a sample reaches the gas sensor array. Pattern recognition is then usually used to perform either quantitative or qualitative identification. To reproduce the olfactory epithelium, a gas sensor array is sealed in a chamber of the eNose. The cross-sensitive chemical sensors then act as olfactory neurons, converting the odor information from a chemical into an electrical form, similar to the process in the olfactory bulb, where the signal is integrated and enhanced. The information is then processed by an artificial neural network, which provides coding, processing and storage.
The gas sensor array transforms odor information from the sample space into a measurement space. This is a key step in information processing within an eNose. Gas sensors with different transduction principles and different fabrication techniques provide various ways to obtain odor information. Many sensor types are commercially available; the most frequently used include metal oxide semiconductors (MOS), quartz crystal microbalances (QCM), conducting polymers (CP) and surface acoustic wave (SAW) sensors. The choice of sensor is strongly influenced by its response speed, reversibility, repeatability and sensitivity. When constructing the sensor array for an eNose, the sensors are selected to be cross-selective to different odors, so that their sensitivities overlap for the same odor; this makes the most of a limited number of sensor types for obtaining adequate odor information.
In general, the amount of raw data generated by the sensor array is huge, so the information has to be transferred from a high-dimensional space into a lower-dimensional one. Pattern recognition is then needed to map the signal into a so-called classification space. Both designing a powerful information processing algorithm and constructing an array of high-quality gas sensors are important and necessary. Many pattern recognition methods have been introduced for eNoses, including parametric and non-parametric multivariate statistical methods. Artificial neural networks have several significant advantages: (i) self-adaptation, (ii) error tolerance and generalization, and (iii) parallel processing and distributed storage.
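The step from measurement space to classification space can be sketched with a very simple pattern recognition method. The example below uses a nearest-centroid classifier as a stand-in for the artificial neural network mentioned above; the four-sensor array, the response values and the odor labels are all hypothetical, and normalizing each response by its sum is an assumed preprocessing step that removes the overall concentration.

```python
import math


def normalize(response):
    """Scale a sensor-array response so that its components sum to 1."""
    total = sum(response)
    return [r / total for r in response]


def centroid(vectors):
    """Component-wise mean of a list of equally long vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance between two response patterns."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled_responses):
    """Compute one normalized centroid pattern per odor label."""
    return {
        label: centroid([normalize(v) for v in responses])
        for label, responses in labelled_responses.items()
    }


def classify(response, centroids):
    """Assign a response to the odor with the closest centroid pattern."""
    pattern = normalize(response)
    return min(centroids, key=lambda label: distance(pattern, centroids[label]))


# Hypothetical responses of a 4-sensor array (arbitrary units).
training = {
    "coffee": [[8.0, 2.0, 1.0, 4.0], [7.5, 2.2, 1.1, 4.1]],
    "wine": [[2.0, 7.0, 5.0, 1.0], [2.1, 6.8, 5.3, 0.9]],
}
model = train(training)

# A stronger sample with the same relative pattern as coffee.
print(classify([16.0, 4.2, 2.1, 8.3], model))
```

Because the sensors are cross-selective, no single sensor identifies an odor; it is the relative pattern across the array that is classified.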
- Schmidt, Lang (2007). "Physiologie des Menschen", Springer, 30. Auflage.
- Faller A., Schünke M. (2008). "Der Körper des Menschen", Thieme, 15. Auflage.
- Paxinos G., Mai J.K. (2004). "The Human Nervous System", Elsevier Academic Press, 2nd Edition.
- Ganong W.F. "Review of Medical Physiology", Lange, 22nd Edition.
- Haddad R. et al. (2008). "Measuring smells", Elsevier Ltd, 18:438-444.
- Mamlouk A.M., Martinetz T. (2004). "On the dimensions of the olfactory perception space", Elsevier B.V.
- Guang L. et al. (2009). "Progress in bionic information processing techniques for an electronic nose based on olfactory models", Chinese Science Bulletin, 54(4):521-53Z
The Gustatory System or sense of taste allows us to perceive different flavors from substances like food, drinks, medicine etc. Molecules that we taste or tastants are sensed by cells in our mouth, which send information to the brain. These specialized cells are called taste cells and can sense 5 main tastes: bitter, salty, sweet, sour and umami (savory). All the variety of flavors that we know are combinations of molecules which fall into these categories.
Measuring the degree by which a substance presents one of the basic tastes is done subjectively by comparing its taste to a taste of a reference substance according to relative indexes of different substances. For the bitter taste quinine (found in tonic water) is used to rate how bitter a substance is. Saltiness can be rated by comparing to a dilute salt solution. The sourness is compared to diluted hydrochloric acid (H+Cl-). Sweetness is measured relative to sucrose. The values of these reference substances are defined as 1.
(Coffee, mate, beer, tonic water etc.)
Bitterness is considered by many to be unpleasant. It is nevertheless very interesting, because a large number of bitter compounds are known to be toxic, so the bitter taste is considered to provide an important protective function. Plant leaves often contain toxic compounds. Herbivores tend to prefer immature leaves, which have a higher protein content and lower poison levels than mature leaves. Even though the bitter taste is not very pleasant at first, people tend to overcome this aversion: coffee and other drinks rich in caffeine are widely consumed. Sometimes bitter agents are added to substances to prevent accidental ingestion.
The salty taste is primarily produced by the presence of cations such as Li+ (lithium ions), K+ (potassium ions) and more commonly Na+ (sodium ions). The saltiness of substances is compared to sodium chloride (Na+Cl-), which is typically used as table salt. Potassium chloride (K+Cl-) is the principal ingredient used in salt substitutes and has an index of 0.6 (see below, part 5) compared to 1 for Na+Cl-.
(Lemon, orange, wine, spoiled milk and candies containing citric acid)
Sour taste can be mildly pleasant and is related to the salty taste, but is more intense. Typically sour are over-ripe fruits, spoiled milk, rotten meat and other spoiled foods, which can be dangerous. Sourness also signals acids (H+ ions), which in large quantities can cause irreversible tissue damage. Sourness is rated relative to hydrochloric acid (H+Cl-), which has a sourness index of 1.
(Sucrose (table sugar), cake, ice cream etc.)
Sweetness is regarded as a pleasant sensation and is produced mostly by the presence of sugars. Sweet substances are rated relative to sucrose, which has an index of 1. Nowadays there are many artificial sweeteners on the market, including saccharin, aspartame and sucralose, but it is still not clear how these substitutes activate the receptors.
Umami (savory or tasty)
(Cheese, soy sauce etc.)
Recently, umami, the taste of monosodium glutamate, has been added as the fifth taste. It signals the presence of L-glutamate and is very important in Eastern cuisines.
Tongue and Taste Buds
Taste cells are epithelial cells clustered in taste buds located in the tongue, soft palate, epiglottis, pharynx and esophagus, the tongue being the primary organ of the gustatory system.
Taste buds are located in papillae along the surface of the tongue. There are three types of papillae in humans: fungiform papillae, located in the anterior part and containing approximately five taste buds each; circumvallate papillae, which are bigger and more posterior; and foliate papillae, which lie on the posterior edge of the tongue. Circumvallate and foliate papillae contain hundreds of taste buds.
In each taste bud there are different types of cells: basal, dark, intermediate and light cells. Basal cells are believed to be the stem cells that give rise to the other types. It is thought that the other cells correspond to different stages of differentiation, with the light cells being the most mature. An alternative idea is that dark, intermediate and light cells correspond to different cellular lineages. Taste cells are short-lived and are continuously regenerated. They extend microvilli into a taste pore at the surface of the epithelium, the site where sensory transduction takes place.
Taste cells are innervated by fibers of primary gustatory neurons. Their contacts with the sensory fibers resemble chemical synapses, and the cells are excitable: they have voltage-gated K+, Na+ and Ca2+ channels and are capable of generating action potentials. Although the reaction to different tastants varies, in general tastants interact with receptors or ion channels in the membrane of a taste cell. These interactions depolarize the cell directly or via second messengers. The resulting receptor potential generates action potentials within the taste cell, which lead to Ca2+ influx through voltage-gated Ca2+ channels, followed by the release of neurotransmitter at the synapses with the sensory fibers.
The idea that different regions of the tongue are most sensitive to particular tastes was a long-standing misconception that has now been proved wrong. All taste sensations arise from all regions of the tongue.
An average person has about 5'000 taste buds. A "supertaster" is a person whose sense of taste is significantly more sensitive than average. The increased response is thought to arise because supertasters have more than 20'000 taste buds, or because they have more fungiform papillae.
Transduction of Taste
As mentioned before, we distinguish five basic tastes: bitter, salty, sour, sweet and umami. There is one known type of taste receptor for each taste, and each type of taste stimulus is transduced by a different mechanism. In general, bitter, sweet and umami are detected by G protein-coupled receptors, while salty and sour are detected via ion channels.
Bitter compounds act through G protein-coupled receptors (GPCRs), also known as seven-transmembrane domain receptors, which are located in the membranes of the taste cells. Taste receptors of type 2 (T2Rs), a group of GPCRs, are thought to respond to bitter stimuli. When a bitter-tasting ligand binds to the GPCR, the G protein gustducin is released; its three subunits break apart and activate phosphodiesterase, which in turn converts a precursor within the cell into a second messenger, closing the K+ channels. This second messenger stimulates the release of Ca2+, contributing to depolarization followed by neurotransmitter release. It is possible that bitter substances that are permeable to the membrane are sensed by mechanisms not involving G proteins.
The amiloride-sensitive epithelial sodium channel (ENaC), a type of ion channel in the taste cell membrane, allows Na+ ions to enter the cell down an electrochemical gradient, altering the membrane potential of the taste cell by depolarizing it. This leads to an opening of voltage-gated Ca2+ channels, followed by neurotransmitter release.
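The phrase "down an electrochemical gradient" can be made concrete with the Nernst equation, which gives the membrane potential at which the net Na+ flux through an open channel would vanish. The following is a minimal sketch; the concentrations and the resting potential are generic textbook-style values chosen for illustration, not measurements from taste cells.

```python
import math

GAS_CONSTANT = 8.314        # J / (mol K)
FARADAY_CONSTANT = 96485.0  # C / mol


def nernst_potential_mV(conc_out, conc_in, valence=1, temp_kelvin=310.0):
    """Equilibrium potential in millivolts for an ion, from the Nernst equation."""
    volts = (GAS_CONSTANT * temp_kelvin) / (valence * FARADAY_CONSTANT) \
        * math.log(conc_out / conc_in)
    return 1000.0 * volts


# Assumed values for illustration only.
NA_OUTSIDE_MM = 145.0  # extracellular Na+ concentration, mM (assumption)
NA_INSIDE_MM = 12.0    # intracellular Na+ concentration, mM (assumption)
RESTING_MV = -70.0     # resting membrane potential, mV (assumption)

e_na = nernst_potential_mV(NA_OUTSIDE_MM, NA_INSIDE_MM)
driving_force = RESTING_MV - e_na  # negative value: Na+ flows into the cell

print(f"E_Na = {e_na:.1f} mV, driving force = {driving_force:.1f} mV")
```

As long as the membrane potential lies below the Na+ equilibrium potential, opening a sodium channel lets Na+ flow inward and depolarizes the cell; a higher Na+ concentration outside the cell raises the equilibrium potential further.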
The sour taste signals the presence of acidic compounds (H+ ions), and three receptors are involved: 1) the ENaC (the same protein involved in salty taste); 2) H+-gated channels, one of which is a K+ channel that normally allows K+ to flow out of the cell; H+ ions block it, so that K+ stays inside the cell; 3) a third channel that undergoes a conformational change when H+ binds to it, opening the channel and allowing an influx of Na+ down the concentration gradient into the cell, which leads to the opening of voltage-gated Ca2+ channels. These three receptors work in parallel and lead to depolarization of the cell followed by neurotransmitter release.
Sweet transduction is mediated by the binding of a sweet tastant to GPCRs located in the apical membrane of the taste cell. A saccharide activates the GPCR, which releases gustducin, and this in turn raises the level of cAMP (cyclic adenosine monophosphate). cAMP activates the cAMP-dependent kinase, which phosphorylates the K+ channels and eventually inactivates them, leading to depolarization of the cell followed by neurotransmitter release.
Umami receptors also involve GPCRs, in the same way as bitter and sweet receptors. Glutamate binds to a type of metabotropic glutamate receptor, mGluR4, causing a G-protein complex to activate a secondary receptor, which ultimately leads to neurotransmitter release. How the intermediate steps work is currently unknown.
In humans, the sense of taste is transmitted to the brain via three cranial nerves. The VII facial nerve carries information from the anterior 2/3 part of the tongue and soft palate. The IX nerve or glossopharyngeal nerve carries taste sensations from the posterior 1/3 part of the tongue and the X nerve or vagus nerve carries information from the back of the oral cavity and the epiglottis.
The gustatory cortex is the brain structure responsible for the perception of taste. It consists of the anterior insula on the insular lobe and the frontal operculum on the inferior frontal gyrus of the frontal lobe. Neurons in the gustatory cortex respond to the five main tastes.
Taste cells synapse with primary sensory axons of the mentioned cranial nerves. The central axons of these neurons in the respective cranial nerve ganglia project to rostral and lateral regions of the nucleus of the solitary tract in the medulla. Axons from the rostral (gustatory) part of the solitary nucleus project to the ventral posterior complex of the thalamus, where they terminate in the medial half of the ventral posterior medial nucleus. This nucleus projects to several regions of the neocortex, which include the gustatory cortex.
Gustatory cortex neurons exhibit complex responses to changes in tastant concentration. For one tastant a neuron might increase its firing with concentration, whereas for another tastant the same neuron may respond only to an intermediate concentration.
Taste and Other Senses
In general, the gustatory system does not work alone. While eating, consistency and texture are sensed by the mechanoreceptors of the somatosensory system. The sense of taste is also closely linked to the olfactory system: without the sense of smell it is difficult to distinguish flavors.
(black peppers, chili peppers, etc.)
Spiciness is not a basic taste, because this sensation does not arise from taste buds. Capsaicin is the active ingredient in spicy food and causes “hotness” or “spiciness” when eaten. It stimulates temperature fibers and also nociceptors (pain receptors) in the tongue. In the nociceptors it stimulates the release of substance P, which causes vasodilatation and the release of histamine, causing hyperalgesia (increased sensitivity to pain).
In general, basic tastes can be appetitive or aversive depending on the effect that the food has on us. Also essential to the taste experience are the presentation of the food, its color, texture, smell and temperature, as well as previous experiences, expectations and satiety.
Ageusia (complete loss of taste)
Ageusia is a partial or complete loss in the sense of taste and sometimes it can be accompanied by the loss of smell.
Dysgeusia (abnormal taste)
Dysgeusia is an alteration in the perception associated with the sense of taste. Tastes of food and drinks vary radically and sometimes the taste is perceived as repulsive. The causes of dysgeusia can be associated with neurologic disorders.
Sensory Systems in Non-Primates
Primates are animals belonging to the class of mammals. Primates include humans and the nonhuman primates, the apes, monkeys, lemurs, tree-shrews, lorises, bush babies and tarsiers. They are characterized by a voluminous and complicated forebrain. Most have excellent sight and are highly adapted to an arboreal existence, including in some species the possession of a prehensile tail. Non-primates, on the other hand, often possess smaller brains. But as we learn more about the rest of the animal world, it is becoming clear that non-primates are pretty intelligent too. Some examples include pigs, octopuses, and crows.
In many branches of mythology, the crow plays a shrewd trickster, and in the real world, crows are proving to be quite a clever species. Crows have been found to engage in feats such as tool use, the ability to hide and store food from season to season, episodic-like memory, and the ability to use personal experience to predict future conditions.
As it turns out, being piggy is actually a pretty smart tactic. Pigs are probably the most intelligent domesticated animal on the planet. Although their raw intelligence is most likely commensurate with a dog or cat, their problem-solving abilities top those of felines and canine pals.
If pigs are the most intelligent of the domesticated species, octopuses take the cake for invertebrates. Experiments in maze and problem-solving have shown that they have both short-term and long-term memory. Octopuses can open jars, squeeze through tiny openings, and hop from cage to cage for a snack. They can also be trained to distinguish between different shapes and patterns. In a kind of play-like activity (one of the hallmarks of higher intelligence species) octopuses have been observed repeatedly releasing bottles or toys into a circular current in their aquariums and then catching them.
Neural Mechanism for Song Learning in Zebra Finches
Over the past four decades songbirds have become a widely used model organism for neuroscientists studying complex sequential behaviours and sensory-guided motor learning. Like human babies, young songbirds learn many of the sounds they use for communication by imitating adults. One songbird in particular, the zebra finch (Taeniopygia guttata), has been the focus of much research because of its proclivity to sing and breed in captivity and its rapid maturation. The song of an adult male zebra finch is a stereotyped series of acoustic signals with structure and modulation over a wide range of time scales, from milliseconds to several seconds. The adult zebra finch song comprises a repeated sequence of sounds, called a motif, which lasts about a second. The motif is composed of shorter bursts of sound called syllables, which often contain sequences of simpler acoustic elements called notes, as shown in Fig.1. The songbird's learning system is a very good model for studying sensory-motor integration, because the juvenile bird actively listens to the tutor and modulates its own song by correcting for errors in pitch and offset. The neural mechanism and the architecture of the songbird brain, which play a crucial role in learning, are similar to the language processing regions in the frontal cortex of humans. Detailed study of the hierarchical neural network involved in the learning process could provide significant insights into the neural mechanism of speech learning in humans.
One of the most interesting non-primates is the octopus, and its most interesting feature is its arm movement. In these invertebrates, the control of the arm is especially complex because the arm can be moved in any direction, with a virtually infinite number of degrees of freedom. In the octopus, the brain only has to send a command to the arm to do the action; the entire recipe of how to do it is embedded in the arm itself. Observations indicate that octopuses reduce the complexity of controlling their arms by keeping their arm movements to set, stereotypical patterns. To find out if octopus arms have minds of their own, researchers cut off the nerves in an octopus arm from the other nerves in its body, including the brain. They then tickled and stimulated the skin on the arm. The arm behaved in an identical fashion to what it would in a healthy octopus. The implication is that the brain only has to send a single move command to the arm, and the arm will do the rest.
In this chapter we discuss in detail the sensory system of an octopus and focus on the sensory motor system in this non-primate.
Octopus - The intelligent non-primate
Octopuses have two eyes and four pairs of arms, and they are bilaterally symmetric. An octopus has a hard beak, with its mouth at the center point of the arms. Octopuses have no internal or external skeleton (although some species have a vestigial remnant of a shell inside their mantle), allowing them to squeeze through tight places. Octopuses are among the most intelligent and behaviorally flexible of all invertebrates.
The most interesting feature of the octopuses is their arm movements. For goal directed arm movements, the nervous system in octopus generates a sequence of motor commands that brings the arm towards the target. Control of the arm is especially complex because the arm can be moved in any direction, with a virtually infinite number of degrees of freedom. The basic motor program for voluntary movement is embedded within the neural circuitry of the arm itself.
Arm Movements in Octopus
In the hierarchical organization of the octopus nervous system, the brain only has to send a command to the arm to do the action. The entire recipe of how to do it is embedded in the arm itself. With its arms the octopus walks, seizes its prey, rejects unwanted objects and also obtains a wide range of mechanical and chemical information about its immediate environment.
Octopus arms, unlike human arms, are not limited in their range of motion by elbow, wrist, and shoulder joints. To accomplish goals such as reaching for a meal or swimming, however, an octopus must be able to control its eight appendages. The octopus arm can move in any direction using virtually infinite degrees of freedom. This ability results from the densely packed flexible muscle fibers along the arm of the octopus.
Observations indicate that octopuses reduce the complexity of controlling their arms by keeping their arm movements to set, stereotypical patterns. For example, the reaching movement always consists of a bend that propagates along the arm toward the tip. Since octopuses always use the same kind of movement to extend their arms, the commands that generate the pattern are stored in the arm itself, not in the central brain. Such a mechanism further reduces the complexity of controlling a flexible arm. These flexible arms are controlled by an elaborate peripheral nervous system containing 5 × 10^7 neurons distributed along each arm. 4 × 10^5 of these are motor neurons, which innervate the intrinsic muscles of the arm and locally control muscle action.
Whenever it is required, the nervous system in octopus generates a sequence of motor commands which in turn produces forces and corresponding velocities making the limb reach the target. The movements are simplified by the use of optimal trajectories made through vectorial summation and superposition of basic movements. This requires that the muscles are quite flexible.
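The idea of building a movement by vectorial summation and superposition of basic movements can be sketched as a weighted vector sum. The three basic movements and their weights below are hypothetical and serve only to illustrate the principle; they are not taken from measurements on octopus arms.

```python
def superpose(basic_movements, weights):
    """Weighted vector sum of basic movements (all of the same dimension)."""
    dimensions = len(basic_movements[0])
    return [
        sum(w * movement[i] for movement, w in zip(basic_movements, weights))
        for i in range(dimensions)
    ]


# Hypothetical basic movements, each a unit displacement of the arm tip (x, y, z).
REACH_FORWARD = [1.0, 0.0, 0.0]
SWEEP_SIDEWAYS = [0.0, 1.0, 0.0]
LIFT_UP = [0.0, 0.0, 1.0]

# A command then only needs to specify how strongly each basic movement is used.
tip_displacement = superpose(
    [REACH_FORWARD, SWEEP_SIDEWAYS, LIFT_UP],
    [2.0, 1.0, 0.5],
)
print(tip_displacement)  # [2.0, 1.0, 0.5]
```

In such a scheme the controller only has to choose a few weights instead of specifying the state of every muscle along the arm.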
The Nervous System of the Arms
The eight arms of the octopus are elongated, tapering, muscular organs, projecting from the head and regularly arranged around the mouth. The inner surface of each arm bears a double row of suckers, each sucker alternating with that of the opposite row. There are about 300 suckers on each arm.
The arms perform both motor and sensory functions. The nervous system in the arms of the octopus is represented by the nerve ganglia, subserving motor and inter-connecting functions. The peripheral nerve cells represent the sensory systems. There exists a close functional relationship between the nerve ganglia and the peripheral nerve cells.
General anatomy of the arm
The muscles of the arm can be divided into three separate groups, each having a certain degree of anatomical and functional independence:
- Intrinsic muscles of the arm,
- Intrinsic muscles of the suckers, and
- Acetabulo-brachial muscles (connects the suckers to the arm muscles).
Each of these three groups of muscles comprises three muscle bundles at right angles to one another. Each bundle is innervated separately from the surrounding units and shows a remarkable autonomy. In spite of the absence of a bony or cartilaginous skeleton, the octopus can produce arm movements through the contraction and relaxation of different muscles. Behaviorally, the longitudinal muscles shorten the arm and play a major role in seizing objects and carrying them to the mouth, while the oblique and transverse muscles lengthen the arm and are used by the octopus for rejecting unwanted objects.
Six main nerve centers lie in the arm and are responsible for the performance of these sets of muscles. The axial nerve cord is by far the most important motor and integrative center of the arm. The eight cords, one in each arm, contain altogether 3.5 × 10^8 neurons. Each axial cord is linked by means of connective nerve bundles with five sets of more peripheral nerve centers: the four intramuscular nerve cords, lying among the intrinsic muscles of the arm, and the ganglia of the suckers, situated in the peduncle just beneath the acetabular cup of each sucker.
All these small peripheral nerve centers contain motor neurons and receive sensory fibers from deep muscle receptors, and thus play the role of local reflex centers. The motor innervation of the muscles of the arm is thus provided not only by the motor neurons of the axial nerve cord, which receives pre-ganglionic fibers from the brain, but also by these more peripheral motor centers.
Sensory Nervous system
The arms contain a complex and extensive sensory system. Deep receptors in the three main muscle systems of the arms provide the animal with a widespread sensory apparatus for collecting information from muscles. Many primary receptors lie in the epithelium covering the surface of the arm. The sucker, and particularly its rim, has the greatest number of these sensory cells, while the skin of the arm is rather less sensitive. Several tens of thousands of receptors lie in each sucker.
Three main morphological types of receptors are found in arms of an octopus. These are round cells, irregular multipolar cells, and tapered ciliated cells. All these elements send their processes centripetally towards the ganglia. The functional significance of these three types of receptors is still not very well known and can only be conjectured. It has been suggested that the round and multipolar receptors may record mechanical stimuli, while ciliated receptors are likely to be chemo-receptors.
The ciliated receptors do not send their axons directly to the ganglia. Instead, the axons meet encapsulated neurons lying underneath the epithelium and make synaptic contacts with their dendritic processes. This linkage helps to reduce the input between primary nerve cells. Round and multipolar receptors, on the other hand, send their axons directly to the ganglia where the motor neurons lie.
Functioning of peripheral nervous system in arm movements
Behavioral experiments suggest that information regarding the movement of the muscles does not reach the learning centers of the brain, and morphological observations prove that the deep receptors send their axons to peripheral centers such as the ganglion of the sucker or the intramuscular nerve cords. The information regarding the stretch or movement of the muscles is used in local reflexes only.
When the dorsal part of the axial nerve cord, which contains the axonal tracts from the brain, is stimulated electrically, movements of the entire arm are still observed. These movements are triggered by the applied stimulation and are not directly driven by signals coming from the brain. Thus, arm extensions are evoked by stimulation of the dorsal part of the axial nerve cord. In contrast, stimulation of the muscles within the same area or of the ganglionic part of the cord evokes only local muscular contractions. The implication is that the brain only has to send a single move command to the arm, and the arm will do the rest.
A dorsally oriented bend propagates along the arm, causing the suckers to point in the direction of the movement. As the bend propagates, the part of the arm proximal to the bend remains extended. As further confirmation that an octopus arm has a mind of its own, the nerves in an octopus arm have been cut off from the other nerves in its body, including the brain. Movements resembling normal arm extensions were initiated in amputated arms by electrical stimulation of the nerve cord or by tactile stimulation of the skin or suckers.
It has been noted that bend propagation is more readily initiated when a bend is created manually before stimulation. If a fully relaxed arm is stimulated, the initial movement is triggered by the stimulus and is followed by the same bend propagation. The nervous system of the arm thus not only drives local reflexes but also controls complex movements involving the entire arm.
These evoked movements are kinematically almost identical to the movements of freely behaving octopuses. When stimulated, a severed arm shows an active propagation of the muscle activity, as in natural arm extensions. Movements evoked from similar initial arm postures result in similar paths, while different starting postures result in different final paths.
As the extensions evoked in denervated octopus arms are qualitatively and kinematically similar to natural arm extensions, the movements seem to be controlled by an underlying motor program that is embedded in the neuromuscular system of the arm and does not require central control.
Fish are aquatic animals with great diversity. There are over 32,000 species of fish, making them the largest group of vertebrates.
Most fish possess highly developed sense organs. The eyes of most daylight-dwelling fish are capable of color vision, and some can even see ultraviolet light. Fish also have a very good sense of smell. Trout, for example, have special holes called “nares” in their head that they use to register tiny amounts of chemicals in the water. Migrating salmon coming from the ocean use this sense to find their way back to their home streams, because they remember what these streams smell like. Ground-dwelling fish in particular have a very strong tactile sense in their lips and barbels, where their taste buds are also located. They use these senses to search for food on the ground and in murky waters.
Fish also have a lateral line system, also known as the lateralis system. It is a system of tactile sense organs located in the head and along both sides of the body. It is used to detect movement and vibration in the surrounding water.
Fish use the lateral line sense organ to sense prey and predators, to detect changes in the current and its orientation, and to avoid collisions when schooling.
Coombs et al. have shown that the lateral line sensory organ is necessary for fish to detect their prey and orient towards it. The fish detect and orient themselves towards movements created by prey or a vibrating metal sphere even when they are blinded. When signal transduction in the lateral lines is inhibited by cobalt chloride application, the ability to target the prey is greatly diminished.
The dependency of schooling fish on the lateral line organ to avoid collisions was demonstrated by Pitcher et al. in 1976, who showed that optically blinded fish can swim in a school of fish, while those with a disabled lateral line organ cannot.
The lateral lines are visible as two faint lines that run along either side of the fish body, from its head to its tail. They are made up of a series of mechanoreceptive organs called neuromasts. These are either located on the surface of the skin or are, more frequently, embedded within the lateral line canal. The lateral line canal is a mucus-filled structure that lies just beneath the skin and transmits the external water displacement through openings from the outside to the neuromasts on the inside. The neuromasts themselves are made up of sensory hair cells whose fine cilia are encapsulated by a cylindrical gelatinous cupula. These reach either directly into the open water (common in deep sea fish) or into the lymph fluid of the lateral line canal. The changing water pressures bend the cupula, and in turn the cilia inside. Similar to the hair cells in all vertebrate ears, a deflection towards the shorter cilia leads to hyperpolarization (decrease of firing rate) and a deflection in the opposite direction leads to depolarization (increase of firing rate) of the sensory cells. The pressure information is therefore converted into digital information using rate coding, which is then passed along the lateral line nerve to the brain. By integrating many neuromasts through their afferent and efferent connections, complex circuits can be formed. This can make them respond to different stimulation frequencies and consequently code for different parameters, such as acceleration or velocity.
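The rate coding described above can be illustrated with a minimal Python sketch. It assumes a sigmoidal relation between cupula deflection and afferent firing rate; the function name `afferent_firing_rate`, the resting rate, the maximum rate and the gain are all hypothetical values chosen for illustration, not measurements from fish.

```python
import math


def afferent_firing_rate(deflection, resting_rate=50.0, max_rate=200.0, gain=2.0):
    """Return a firing rate (spikes/s) for a signed cupula deflection.

    deflection > 0: towards the longest cilia (depolarization, higher rate)
    deflection < 0: towards the shorter cilia (hyperpolarization, lower rate)
    All numeric defaults are illustrative assumptions, not measured values.
    """
    # Offset chosen so that zero deflection yields exactly the resting rate.
    offset = math.log(max_rate / resting_rate - 1.0)
    return max_rate / (1.0 + math.exp(offset - gain * deflection))


if __name__ == "__main__":
    for deflection in (-1.0, -0.5, 0.0, 0.5, 1.0):
        rate = afferent_firing_rate(deflection)
        print(f"deflection {deflection:+.1f} -> {rate:6.1f} spikes/s")
```

The sigmoid is only one plausible choice: it keeps the rate between zero and a maximum while allowing both increases and decreases around a non-zero resting rate, which is the property the text relies on.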
In sharks and rays, some neuromasts have undergone an interesting evolution. They have evolved into electroreceptors called ampullae of Lorenzini. They are mostly concentrated around the head of the fish and can detect a change of electrical stimuli as small as 0.01 microvolt. With this sensitive instrument these fish are able to detect tiny electrical potentials generated by muscle contractions and can thus find their prey over large distances, in murky waters or even hidden under the sand. It has been suggested that sharks also use this sense for migration and orientation, since the ampullae of Lorenzini are sensitive enough to detect the Earth’s electromagnetic field.
Cephalopods such as squids, octopuses and cuttlefish have lines of ciliated epidermal cells on head and arms that resemble the lateral lines of fish. Electrophysiological recordings from these lines in the common cuttlefish (Sepia officinalis) and the brief squid (Lolliguncula brevis) have identified them as an invertebrate analogue to the mechanoreceptive lateral lines of fish and aquatic amphibians.
Another convergence to the fish lateral line is found in some crustaceans. Unlike fish, they don’t have the mechanosensory cells on their body, but have them spaced at regular intervals on long trailing antennae. These are held parallel to the body. This forms two ‘lateral lines’ parallel to the body that have similar properties to those of fish lateral lines and are mechanically independent of the body.
In aquatic manatees the postcranial body bears tactile hairs. They resemble the mechanosensory hairs of naked mole rats. This arrangement of hairs has been compared to the fish lateral line and complements the poor visual capacities of the manatees. Similarly, the whiskers of harbor seals are known to detect minute water movements and serve as a hydrodynamic receptor system. This system is far less sensitive than the fish equivalent.
Halteres are sensory organs present in many flying insects. Widely thought to be an evolutionary modification of the rear pair of wings on such insects, halteres provide gyroscopic sensory data, vitally important for flight. Although the fly has other relevant systems to aid in flight, the visual system of the fly is too slow to allow for rapid maneuvers. Additionally, to be able to fly adeptly in low light conditions, a requirement to avoid predation, such a sensory system is necessary. Indeed, without halteres, flies are incapable of sustained, controlled flight. Since the 18th century, scientists have been aware of the role halteres play in flight, but it was only recently that the mechanisms by which they operate have been better explored.
The haltere evolved from the rearmost of two pairs of wings. While the first pair has maintained its usage for flight, the posterior pair has lost its flight functions and has adopted a slightly different shape. The haltere visibly consists of three structural components: a knob-shaped end, a thin shaft, and a slightly wider base. The knob contains approximately 13 innervated hairs, while the base contains two chordotonal organs, each innervated by about 20-30 nerves. Chordotonal organs are sense organs thought to be solely responsive to extension, though they remain relatively unknown. The base is also covered by around 340 campaniform sensilla, which are small fibers which respond preferentially to compression in the direction in which they are elongated. Each of these fibers is also innervated. Relative to the stalk of the haltere, both the chordotonal organs and the campaniform sensilla have an orientation of approximately 45 degrees, which is optimal for measuring bending forces on the haltere. The halteres move contrary (anti-phase) to the wings during flight. The sensory components can be categorized into three groups. The first comprises those sensitive to vertical oscillations of the haltere, including the dorsal and ventral scapal plates, dorsal and ventral Hicks papillae (both the plates and papillae are subcategories of the aforementioned campaniform sensilla), and the small chordotonal organ. The second, consisting of the basal plate (another manifestation of the sensilla) and the large chordotonal organ, is sensitive to gyroscopic torque acting on the haltere. The third is a population of undifferentiated papillae which are responsive to all strains acting on the base of the haltere. This provides an additional method for flies to distinguish between the directions of force being applied to the haltere.
As Homeobox genes were being discovered and explored for the first time, it was found that the deletion or inactivation of the Hox gene Ultrabithorax (Ubx) causes the halteres to develop into a normal pair of wings. This was a very compelling early result as to the nature of Hox genes. Manipulations to the Antennapedia gene can similarly cause legs to become severely deformed, or can cause a set of legs to develop instead of antennae on the head.
The halteres function by detecting Coriolis forces, which act on the beating halteres whenever the fly body rotates. Studies have indicated that the angular velocity of the body is encoded by the Coriolis forces measured by the halteres. Active halteres can recruit any neighboring units, influencing nearby muscles and causing dramatic changes in the flight dynamics. Halteres have been shown to have extremely fast response times, allowing these flight changes to be performed much more quickly than if the fly were to rely on its visual system. In order to distinguish between different rotational components, such as pitch and roll, the fly must be able to combine signals from the two halteres, which must not be coincident (coincident signals would diminish the ability of the fly to differentiate the rotational axes). The halteres are capable of contributing to image stabilization, as well as in-flight attitude control, which was established by numerous authors noting a reaction from the head and wings to inputs from the components of the rotation rate vector. Contributions from halteres to head and neck movements have been noted, explaining their role in gaze stabilization. The fly therefore uses input from the halteres to establish where to fixate its gaze, an interesting integration of the two senses.
Recordings have indicated that halteres are capable of responding to stimuli at the same (double-wingbeat) frequency as Coriolis forces, the proof of concept that allows further mathematical analysis of how these measurements can occur. The vector cross-product of the halteres' angular velocity and the rotation of the body provides the Coriolis force vector to the fly. This force is at the same frequency as the wingbeat in both the pitch and roll planes, and is doubly fast in the yaw plane. Halteres are capable of providing a rate damping signal to affect rotations. This is because the Coriolis force is proportional to the fly's own rotation rate. By measuring the Coriolis force, the halteres can send an appropriate signal to their affiliated muscles, allowing the fly to properly control its flight. The large amplitude of haltere motion allows for the calculation of the vertical and horizontal rates of rotation. Because of the large disparity in haltere movement between vertical and horizontal movement, Ω1, the vertical component of the rotation rate, generates a force of double the frequency of the horizontal component. It is widely thought that this twofold frequency difference is what allows the fly to distinguish between the vertical and horizontal components. If we assume that the haltere moves sinusoidally, a reasonably accurate approximation of its real-world behavior, the angular position γ can be modeled as γ(t) = (π/2) sin(ωt), where ω is the haltere beat frequency and the total (peak-to-peak) amplitude is 180°, a close approximation to the real-life range of motion. The body rotational velocities can be computed, given the known rates (the roll, pitch, and yaw components are labeled below with 1, 2, and 3, respectively) from the two halteres' (Ωb being the left and Ωc being the right haltere) reference frames, relative to the body of the fly, with the following calculations:
α represents the haltere angle of rotation from the body plane, and the Ω terms are, as mentioned, the angular velocity of the haltere with respect to the body. Knowing this, one could roughly simulate input to the halteres using the equation for forces on the end knob of a haltere:
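A sketch of this force balance, assuming the standard expression for a point mass in an accelerating, rotating reference frame and using the symbols defined in the following paragraph (the vector notation and the sign convention are assumptions):

```latex
\mathbf{F} = m\mathbf{g} - m\mathbf{a}_F - m\mathbf{a}_i
  - m\,\dot{\boldsymbol{\Omega}}_i \times \mathbf{r}_i
  - m\,\boldsymbol{\Omega}_i \times \left(\boldsymbol{\Omega}_i \times \mathbf{r}_i\right)
  - 2m\,\boldsymbol{\Omega}_i \times \mathbf{v}_i
```

Read from left to right, the terms are gravity, the fly's linear acceleration, the primary inertial force of the beating knob, the angular-acceleration force, the centrifugal force, and the Coriolis force.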
m is the mass of the knob of the haltere, g is the acceleration due to gravity, ri, vi, and ai are the position, velocity, and acceleration of the knob relative to the body of the fly in the i direction, aF is the fly's linear acceleration, and Ωi and Ω̇i are the angular velocity and acceleration components for the direction i, respectively, of the fly in space. The Coriolis force is simulated by the 2mΩ × vi term. Because the sensory signal generated is proportional to the forces exerted on the halteres, this would allow the haltere signal to be simulated. If attempting to reconcile the force equation with the rotational component equations, it is worthwhile to remember that the force equation must be calculated separately for both halteres.
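The frequency separation discussed above can be checked numerically with a minimal Python sketch of the Coriolis term alone. The sketch rests on several simplifying assumptions that are not taken from the text: the haltere beats in a plane spanned by the lateral (y) and vertical (z) body axes, ignoring its backward tilt; the stroke is sinusoidal with a total range of 180°; knob mass and shaft length are set to 1; and the 150 Hz beat frequency is only a placeholder. The function names are hypothetical.

```python
import math


def coriolis_force(t, beat_omega, pitch_rate, yaw_rate, mass=1.0, length=1.0):
    """Out-of-plane (x) component of -2 m (Omega x v) on the knob at time t."""
    # Assumed sinusoidal stroke: gamma is the angle above the lateral axis.
    gamma = (math.pi / 2) * math.sin(beat_omega * t)
    gamma_dot = (math.pi / 2) * beat_omega * math.cos(beat_omega * t)
    # Knob velocity inside the beat plane (y = lateral, z = vertical).
    v_y = -length * gamma_dot * math.sin(gamma)
    v_z = length * gamma_dot * math.cos(gamma)
    # Body rotation Omega = (0, pitch_rate, yaw_rate); x-part of the cross product.
    return -2.0 * mass * (pitch_rate * v_z - yaw_rate * v_y)


def harmonic_amplitude(signal, beat_omega, harmonic, samples=2000):
    """Amplitude of `signal` at `harmonic` times the beat frequency (one period)."""
    period = 2.0 * math.pi / beat_omega
    cos_sum = sin_sum = 0.0
    for j in range(samples):
        t = period * j / samples
        value = signal(t)
        cos_sum += value * math.cos(harmonic * beat_omega * t)
        sin_sum += value * math.sin(harmonic * beat_omega * t)
    return 2.0 * math.hypot(cos_sum, sin_sum) / samples


if __name__ == "__main__":
    beat_omega = 2.0 * math.pi * 150.0  # placeholder beat frequency (150 Hz)
    for label, pitch, yaw in (("pitch", 1.0, 0.0), ("yaw", 0.0, 1.0)):
        force = lambda t, p=pitch, y=yaw: coriolis_force(t, beat_omega, p, y)
        h1 = harmonic_amplitude(force, beat_omega, 1)
        h2 = harmonic_amplitude(force, beat_omega, 2)
        print(f"{label}: amplitude at 1x beat = {h1:.1f}, at 2x beat = {h2:.1f}")
```

Under these assumptions a rotation about the pitch axis produces a force at the beat frequency, while a rotation about the vertical (yaw) axis produces a force at twice the beat frequency, and both scale linearly with the rotation rate.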
Butterflies and moths keep their balance with Johnston's organ: this is an organ at the base of a butterfly's antennae, and is responsible for maintaining the butterfly's sense of balance and orientation, especially during flight.
Spider's Visual System
While the highly developed visual systems of some spider species have been studied extensively for many decades, terms like animal intelligence or cognition were not usually used in the context of spider studies. Instead, spiders were traditionally portrayed as rather simple, instinct-driven animals (Bristowe 1958, Savory 1928), processing visual input in pre-programmed patterns rather than actively interpreting the information received from their visual apparatus to produce appropriate reactions. Although this still seems to be the case in a majority of spiders, which primarily interact with the world through tactile sensation rather than by visual cues, some spider species have shown surprisingly intelligent use of their eyes. Considering its limited dimensions within the body, a spider's optical apparatus and visual processing perform extremely well. Recent research points towards a very sophisticated use of visual cues in a spider's world when investigating topics such as the complex hunting schemes of the vision-guided jumping spiders (Salticidae), which take huge leaps of up to 30 times their own body length onto prey, or a wolf spider's (Lycosidae) ability to visually recognize asymmetries in potential mates. Even in the case of the night-active Cupiennius salei (Ctenidae), which relies primarily on other sensory organs, or the ogre-faced Dinopis, which hunts at night by spinning small webs and throwing them at approaching prey, the visual system is still highly developed. Findings like these are not only fascinating but are also inspiring other scientific and engineering fields such as robotics and computer-guided image analysis.
General structure of a spider's visual system
A spider's anatomy primarily consists of two major body segments, the prosoma and the opisthosoma, which are also known as the cephalothorax and abdomen, respectively. All extremities as well as the sensory organs, including the eyes, are located in the prosoma. Unlike the visual systems of arthropods with compound eyes, modern arachnid eyes are ocelli (simple eyes consisting of a lens covering a vitreous fluid-filled pit with a retina at the bottom), of which spiders have six or eight, characteristically arranged in three or four rows across the prosoma's carapace. Overall, 99% of all spiders have eight eyes, and of the remaining 1% almost all have six. Spiders with only six eyes lack the “principal eyes”, which are described in detail below.
The pairs of eyes are called anterior median eyes (AME), anterior lateral eyes (ALE), posterior median eyes (PME), and posterior lateral eyes (PLE). The large principal eyes facing forward are the anterior median eyes, which provide the highest spatial resolution to a spider, at the cost of a very narrow field of view. The smaller forward-facing eyes are the anterior lateral eyes, with a moderate field of view and medium spatial resolution. The two posterior eye pairs are rather peripheral, secondary eyes with a wide field of view. They are extremely sensitive and suitable for low-light conditions. Spiders use their secondary eyes for sensing motion, while their principal eyes allow shape and object recognition. In contrast to insects, the brain of a visually guided spider is almost completely devoted to vision, as it receives only the optic nerves and consists of only the optic ganglia and some association centers. The brain is apparently able not only to recognize object motion but also to classify the counterpart as a potential mate, rival or prey by seeing legs (lines) at a particular angle to the body. Such a stimulus will result in the spider displaying courtship or threat signals accordingly.
A Spider's eyes
Although spider eyes may be described as “camera eyes”, they are very different in their details from the “camera eyes” of mammals or any other animals. In order to fit a high-resolution eye into such a small body, neither an insect's compound eyes nor spherical eyes, such as those of humans, would solve the problem. The ocelli found in spiders are the optically better solution, as their resolution is not limited by diffraction effects at the lens, which would be the case with compound eyes. If a spider's eye were replaced by a compound eye of the same resolving power, it would simply not fit into the spider's prosoma. By using ocelli, the spatial acuity of some spiders is more similar to that of a mammal than to that of an insect, despite a huge size difference and only a few thousand photocells, e.g. in a jumping spider's eye, as compared to more than 150 million photocells in the human retina.
The anterior median eyes (AME), which are present in most spider species, are also called the principal eyes. Details about the principal eye's structure and its components are illustrated in the figure below and are explained in the following using the AME of the jumping spider Portia (family Salticidae), which is famous for its high-spatial-acuity eyes and vision-guided behavior despite its very small body size of 4.5-9.5 mm.
When a light beam enters the principal eye, it first passes through a large corneal lens. This lens features a long focal length, enabling it to magnify even distant objects. The combined field of view of the two principal eyes' corneal lenses would cover about 90° in front of the salticid spider; however, a retina with the desired acuity would be too large to fit inside a spider's eye. The surprising solution is a small, elongated retina, which lies behind a long, narrow tube and a second lens (a concave pit) at its end. Such a combination of a corneal lens (with a long focal length) and a long eye tube (magnifying the image from the corneal lens) resembles a telephoto system, making the pair of principal eyes similar to a pair of binoculars.
The salticid spider captures light beams successively on four retina layers of receptors, which lie behind each other (in contrast, the human retina is arranged in only one plane). This structure not only allows a larger number of photoreceptors in a confined area but also enables color vision, as the light is split into different colors (chromatic aberration) by the lens system. Different wavelengths of light thus come into focus at different distances, which correspond to the positions of the retina's layers. While salticids discern green (layer 1 – ~580 nm, layer 2 – ~520-540 nm), blue (layer 3 – ~480-500 nm) and ultraviolet (layer 4 – ~360 nm) using their principal eyes, it is only the two rearmost layers (layers 1 and 2) which allow shape and form detection due to their close receptor spacing.
As in human eyes, there is a central region in layer 1 called the “fovea”, where the inter-receptor spacing was measured to be about 1 μm. This was found to be optimal, as the telephoto optical system provides images precise enough to be sampled at this resolution, but any closer spacing would reduce the retina's sampling quality due to quantum-level interference between adjacent receptors. Equipped with such eyes, Portia exceeds any insect by far when it comes to visual acuity: while the dragonfly Sympetrum striolatum has the highest acuity known for insects (0.4°), the acuity of Portia is ten times higher (0.04°) with much smaller eyes. The human eye with 0.007° acuity is only about five times better than Portia's. With such visual precision, Portia would technically be able to discriminate two objects which are 0.12 mm apart from a distance of 200 mm. The spatial acuity of other salticid eyes is usually not far behind that of Portia.
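The relation between angular acuity and the smallest resolvable separation can be sketched in a few lines of Python. This is a simple geometric estimate (separation = distance × tan(angle)); the function name `min_separation_mm` is hypothetical, and the acuity values are the ones quoted above.

```python
import math


def min_separation_mm(distance_mm, acuity_deg):
    """Smallest separation resolvable at a given distance for a given angular acuity."""
    return distance_mm * math.tan(math.radians(acuity_deg))


if __name__ == "__main__":
    # Acuity values in degrees as quoted in the text.
    for name, acuity in (("dragonfly", 0.4), ("Portia", 0.04), ("human", 0.007)):
        sep = min_separation_mm(200.0, acuity)
        print(f"{name:9s}: {sep:.3f} mm at 200 mm")
```

For Portia this purely geometric estimate gives roughly 0.14 mm at 200 mm, the same order of magnitude as the 0.12 mm quoted above; the small difference presumably reflects rounding of the acuity value.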
Principal eye retina movements
Such spectacular visual abilities come at a price in small animals such as jumping spiders: the retina in each of Portia's principal eyes has a field of view of only 2-5°, while its fovea captures a field of view of only 0.6°. This results from the principal retinae having elongated, boomerang-like shapes which span about 20° vertically and only 1° horizontally, corresponding to about six receptor rows. This severe limitation is compensated by sweeping the eye tube over the whole image of the scene using eye muscles, of which jumping spiders have six. These are attached to the outside of the principal eye tube and allow the same three degrees of freedom (horizontal, vertical, rotation) as in human eyes. Principal retinae can move by as much as 50° horizontally and vertically and rotate about the optical axis (torsion) by a similar amount.
Spiders making sophisticated use of visual cues move their principal eyes' retinae either spontaneously, in “saccades” fixating the fovea on a moving visual target (“tracking”), or by “scanning”, which presumably serves pattern recognition. It currently seems that spiders scan a scene sequentially by moving the eye tube in complex patterns, allowing them to process large amounts of visual information despite their very limited brain capacity.
The spontaneous retinal movements, so-called “microsaccades”, are a mechanism thought to prevent the photoreceptor cells of the anterior median eyes from adapting to a motionless visual stimulus. Cupiennius spiders, which have four eye muscles (two dorsal and two ventral), continuously perform such microsaccades of 2° to 4° in the dorso-median direction, lasting about 80 ms (when fixed to a holder). The 2-4° of microsaccadic movement closely matches Cupiennius' angle of about 3° between the receptor cells, supporting the idea that its function is to prevent adaptation. In contrast, retinal movements elicited by mechanical stimulation (directing an air puff onto the tarsus of the second walking leg) can be considerably larger than the spontaneous retinal movements, with deflections up to 15°. Such a stimulus increases eye muscle activity from a spontaneous resting level of 12 ± 1 Hz to 80 Hz while the air puff is applied. However, active retinal movement of the two principal eyes is never elicited simultaneously during such experiments, and there is no correlation between the two eyes regarding direction either. These two mechanisms, spontaneous microsaccades and active “peering” by active retinal movement, seemingly allow spiders to follow and analyze stationary visual targets efficiently using only their principal eyes, without reinforcing the saccadic movements by body movements.
However, there is another factor influencing the visual capacities of a spider's eye: the problem of keeping objects at different distances in focus. In human eyes, this is solved by accommodation, i.e. changing the shape of the lens, but salticids take a different approach: the receptors in layer 1 of their retina are arranged on a “staircase” at different distances from the lens. Thus, the image of any object, whether a few centimeters or some meters in front of the eye, will be in focus on some part of the layer-1 staircase. Additionally, the salticid can swing the eye tubes side to side without moving the corneal lenses and will thus sweep the staircase of each retina across the image of the corneal lens, sequentially obtaining a sharp image of the object.
The resulting visual performance is impressive: jumping spiders such as Portia focus accurately on objects at distances from 2 centimeters to infinity, and are able to see up to about 75 centimeters in practice. The time needed to recognize objects is however relatively long (seemingly in the range of 10-20 s) because of the complex scanning process needed to capture high-quality images from such tiny eyes. Due to this limitation, it is very difficult for spiders such as Portia to identify much larger predators quickly enough, making the small spider easy prey for birds, frogs and other predators.
Blurry vision for distance estimation
An unexpected finding recently surprised researchers, when it was shown that jumping spiders use a technique called blurry vision to estimate their distance to previously recognized prey before taking a jump. Where humans achieve depth perception using binocular vision and other animals do so by moving their heads around or measuring ultrasound responses, jumping spiders perform this task within their principal eyes. As in other jumping spider species, the principal eyes of Hasarius adansoni feature four retinal layers, with the two bottom ones featuring photocells responding to green light. However, green light will only ever focus sharply on the bottom one, layer 1, due to its distance from the inner lens. Layer 2 would receive focused blue light, but its photoreceptor cells are not sensitive to blue and receive a fuzzy green image instead. Interestingly, the amount of blur depends on the distance of an object from the spider's eye: the closer it is, the more out of focus it will appear on the second retina layer. At the same time, retina layer 1 always receives a sharp image due to its staircase structure. Jumping spiders are thus able to estimate depth using a single unmoving eye by comparing the images of the two bottom retina layers. This was confirmed by letting spiders jump at prey in an arena flooded with green light versus red light of equal brightness. Without the ability to use the green retina layers, jumping spiders would repeatedly fail to judge distance accurately and miss their jump.
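The dependence of blur on object distance can be illustrated with a minimal Python sketch based on the thin-lens equation. This is a strong simplification of the spider's two-lens telephoto eye, and every number in it (focal length, aperture, position of the defocused layer) is a hypothetical value chosen only to show the trend; the function names are likewise assumptions.

```python
def image_distance(object_distance, focal_length):
    """Thin-lens image distance: 1/f = 1/d_object + 1/d_image."""
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


def blur_diameter(object_distance, focal_length, aperture, layer_distance):
    """Diameter of the blur circle on a receptor layer at `layer_distance`."""
    d_image = image_distance(object_distance, focal_length)
    return aperture * abs(d_image - layer_distance) / d_image


if __name__ == "__main__":
    # Hypothetical optics (all lengths in mm): the defocused layer sits
    # slightly in front of the focal plane, as layer 2 lies in front of layer 1.
    focal_length, aperture, layer_2 = 1.0, 0.4, 0.98
    for distance in (20.0, 50.0, 100.0, 750.0):
        blur = blur_diameter(distance, focal_length, aperture, layer_2)
        print(f"object at {distance:5.0f} mm -> blur {blur:.4f} mm")
```

With these assumed values the blur on the defocused layer grows monotonically as the object approaches, so the amount of blur could in principle be inverted to recover distance, which is the idea described above.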
In contrast to the principal eyes responsible for object analysis and discrimination, a spider's secondary eyes act as motion detectors and therefore do not feature eye muscles to analyze a scene more extensively. Depending on their arrangement on the spider's carapace, secondary eyes give the animal panoramic vision, detecting moving objects almost 360° around its body. The anterior and posterior lateral eyes (i.e. secondary eyes) feature only a single type of visual cell, with a maximum spectral sensitivity for green light of ~535-540 nm wavelength. The number and arrangement of secondary eyes differ significantly between or even within different spider families, as does their structure: large secondary eyes can contain several thousand rhabdomeres (the light-sensitive parts of the retina) and support hunters or nocturnal spiders with their high sensitivity to light, while small secondary eyes contain at most a few hundred rhabdomeres and provide only basic movement detection. Unlike the principal eyes, which are everted (the rhabdomeres point towards the light), the secondary eyes of a spider are inverted, i.e. their rhabdomeres point away from the light, as is the case in vertebrate eyes such as the human eye. The spatial resolution of the secondary eyes, e.g. in the extensively studied Cupiennius salei, is greatest in the horizontal direction, enabling the spider to analyse horizontal movements well even with the secondary eyes, while vertical movement may not be especially important when living in a “flat world”.
The reaction time of jumping spiders' lateral eyes is comparatively slow at 80-120 ms, measured with a 3°-sized (inter-receptor angle) square stimulus travelling past the animal's eyes. The minimum distances a stimulus must travel before the spider reacts are 0.1° at a stimulus velocity of 1°/s, 1° at 9°/s and 2.5° at 27°/s. This means that a jumping spider's visual system detects motion even if an object travels only a tenth of the secondary eyes' inter-receptor angle at slow speed. If the stimulus shrinks to a size of only 0.5°, responses occur only after long delays, indicating that such stimuli lie at the limit of the spiders' perceivable motion.
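The three threshold values can be checked against the reported reaction time with a short calculation. The Python sketch below only divides each minimum travel distance by the corresponding stimulus velocity, using the numbers given in the text. Reading the result as a roughly constant latency is an interpretation and not a statement from the original study.

```python
# (minimum travel distance in degrees, stimulus velocity in degrees per second)
thresholds = [(0.1, 1.0), (1.0, 9.0), (2.5, 27.0)]

# Time the stimulus has been moving when the spider reacts, in milliseconds.
travel_times_ms = [1000 * distance / velocity for distance, velocity in thresholds]

for (distance, velocity), t in zip(thresholds, travel_times_ms):
    print(f"{distance:>4}° at {velocity:>4}°/s -> {t:.0f} ms")
```

All three travel times fall within the reaction time of 80-120 ms, which suggests that the minimum travel distance grows with velocity simply because the stimulus covers more ground during a roughly fixed latency.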
Secondary eyes of night-active spiders usually feature a tapetum behind the rhabdomeres, a layer of crystals that reflects light back to the receptors to increase visual sensitivity. This allows night-hunting spiders to have eyes with an aperture as large as f/0.58, enabling them to capture visual information even in ultra-low-light conditions. Secondary eyes containing a tapetum thus easily reveal a spider's location at night when illuminated, e.g. by a flashlight.
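To put the aperture of f/0.58 into perspective, recall that the light reaching the image plane scales with the inverse square of the f-number. The Python sketch below compares the spider's eye with a camera lens at f/2.0. The comparison lens is a hypothetical choice made for illustration only, and the effect of the tapetum is not included.

```python
def relative_light(f_number, reference_f_number):
    """Image-plane irradiance relative to a reference lens (scales as 1 / N^2)."""
    return (reference_f_number / f_number) ** 2


SPIDER_F_NUMBER = 0.58  # value given in the text
CAMERA_F_NUMBER = 2.0   # hypothetical comparison lens

gain = relative_light(SPIDER_F_NUMBER, CAMERA_F_NUMBER)
print(f"f/{SPIDER_F_NUMBER} gathers about {gain:.1f} times the light of f/{CAMERA_F_NUMBER}")
```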
Central nervous system and visual processing in the brain
As elsewhere in neuroscience, we still know very little about a spider's central nervous system (CNS), especially regarding its role in visually controlled behavior. Of all spiders, the CNS of Cupiennius has been studied most extensively, with a focus mainly on its structure. To date, little is known about the electrophysiological properties of central neurons in Cupiennius, and even less about those of other spiders.
The structure of a spider's nervous system is closely related to its body's subdivisions, but instead of being spread all over the body, the nervous tissue is enormously concentrated and centralized. The CNS is made up of two paired, rather simple nerve cell clusters (ganglia), which are connected to the spider's muscles and sensory systems by nerves. The brain is formed by the fusion of these ganglia in the head segments ahead of and behind the mouth and largely fills the prosoma with nervous tissue, while no ganglia exist in the abdomen. The spider's brain receives direct input from only one sensory system, the eyes, unlike the brains of insects and crustaceans. The eight optic nerves enter the brain from the front, and their signals are processed in two optic lobes in the anterior region of the brain. When a spider's behavior is especially dependent on vision, as in the case of the jumping spider, the optic ganglia contribute up to 31% of the brain's volume, indicating that a large part of the brain is devoted to vision. This share still amounts to 20% for Cupiennius, whereas other spiders like Nephila and Ephebopus come in at only 2%.
The distinction between principal and secondary eyes persists in the brain. Both types of eyes have their own visual pathway with two separate neuropil regions fulfilling distinct tasks. Thus spiders evidently process the visual information provided by their two eye types in parallel, with the secondary eyes being specialized for detecting horizontal movement of objects and the principal eyes being used for the detection of shape and texture.
Two visual systems in one brain
While principal and secondary eyesight seem to be processed separately in spiders' brains, surprising interrelations between the two visual systems are known as well. In visual experiments, the principal eye muscle activity of Cupiennius was measured while either its principal or its secondary eyes were covered. When the animals were stimulated in a white arena with short sequences of moving black bars, the principal eyes moved involuntarily whenever a secondary eye detected motion within its visual field. This increase in principal eye muscle activity, compared to no stimulation, did not change when the principal eyes were covered with black paint, but stopped when the secondary eyes were masked. Thus it is now clear that only the input received from the secondary eyes controls principal eye muscle activity. Also, a spider's principal eyes do not seem to be involved in motion detection, which is solely the responsibility of the secondary eyes.
Other experiments using dual-channel telemetric registration of the eye muscle activities of Cupiennius have shown that the spider actively peers in the walking direction: the ipsilateral retina of the principal eyes was measured to shift with respect to the walking direction before, during and after a turn, while the contralateral retina remained in its resting position. This happened independently of the actual light conditions, suggesting a “voluntary” peering initiated by the spider's brain.
Pattern recognition using principal eyes
Recognition of shape and form by jumping spiders is believed to be accomplished through a scanning process of the visual field, which consists of a complex set of rotations (torsional movements) and translations of the anterior-median eyes' retinae. As described in the section “Principal eye retina movements”, a spider's retinae are narrow and shaped like boomerangs, which can be matched with straight features by sweeping over the visual scene. When investigating a novel target, the eyes scan it in a stereotyped way: by moving slowly from side to side at speeds of 3-10° per second and rotating through ±25°, horizontal and torsional retina movements allow the detection of differently positioned and rotated lines. This method can be understood as template matching, where the template has an elongated shape and produces a strong neural response whenever the retina matches a straight feature in the scene. A straight line is thereby identified with little or no further processing.
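The scanning process described above can be sketched as template matching with a narrow, elongated window. The following Python example is a minimal illustration and not a model of the spider's retina: the image size, the window length, the 5° rotation step and all function names are assumptions chosen for the example. Only the rotation range of ±25° is taken from the text.

```python
import math


def window_pixels(cx, cy, angle_deg, half_len):
    """Pixels covered by a narrow linear window centred at (cx, cy).

    The angle is measured from the vertical axis, mimicking a retina that is
    an upright strip which can be rotated (torsion) and shifted sideways.
    """
    a = math.radians(angle_deg)
    return [(round(cx + t * math.sin(a)), round(cy + t * math.cos(a)))
            for t in range(-half_len, half_len + 1)]


def response(image, cx, cy, angle_deg, half_len):
    """Template response: sum of image values under the window."""
    height, width = len(image), len(image[0])
    return sum(image[y][x]
               for x, y in window_pixels(cx, cy, angle_deg, half_len)
               if 0 <= x < width and 0 <= y < height)


def scan(image, cy, half_len, positions, angles):
    """Sweep the window sideways and rotate it; return the strongest match."""
    candidates = [(response(image, cx, cy, a, half_len), cx, a)
                  for cx in positions for a in angles]
    return max(candidates, key=lambda c: c[0])


# Test scene: a single straight line tilted by 15 degrees (hypothetical values).
SIZE, HALF_LEN = 41, 15
image = [[0] * SIZE for _ in range(SIZE)]
for x, y in window_pixels(20, 20, 15, HALF_LEN):
    image[y][x] = 1

best_response, best_x, best_angle = scan(
    image, cy=20, half_len=HALF_LEN,
    positions=range(5, 36), angles=range(-25, 26, 5))
print(f"strongest response {best_response} at x={best_x}, rotation {best_angle} degrees")
```

The response peaks when the window lies exactly on the line, so the position and rotation of the strongest response directly give the position and orientation of the straight feature.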
A computer vision algorithm that treats straight line detection as an optimization problem (da Costa, da F. Costa) was inspired by the jumping spider's visual system and uses the same approach of scanning a scene sequentially using template matching. While the well-known Hough Transform allows robust detection of straight visual features in an image, its efficiency is limited by the need to calculate a good part or even the whole of the parameter space while searching for lines. In contrast, the alternative approach used in salticid visual systems suggests searching the visual space with a linear window, which allows adaptive search schemes during straight line detection without the need to calculate the parameter space systematically. Solving straight line detection in this way also allows it to be understood as an optimization problem, which makes efficient processing by computers possible. While appropriate parameters for controlling the annealing-based scanning have to be found experimentally, the approach that follows the jumping spider's path to straight line detection was shown to be very effective, especially with properly set parameters.
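The idea of treating line detection as an optimization problem can be illustrated with a generic simulated annealing search over the position and orientation of a linear window. The Python sketch below is not the published algorithm: the scoring function, the temperature schedule, the step sizes and the random seed are all assumptions made for illustration. It evaluates only the window placements visited by the search instead of the whole parameter space.

```python
import math
import random


def window_pixels(cx, cy, angle_deg, half_len):
    """Pixels under a linear window; the angle is measured from the vertical."""
    a = math.radians(angle_deg)
    return [(round(cx + t * math.sin(a)), round(cy + t * math.cos(a)))
            for t in range(-half_len, half_len + 1)]


def line_score(image, cx, angle_deg, half_len):
    """Objective to maximize: sum of image values under the window."""
    height, width = len(image), len(image[0])
    return sum(image[y][x]
               for x, y in window_pixels(cx, height // 2, angle_deg, half_len)
               if 0 <= x < width and 0 <= y < height)


def anneal(image, half_len, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Simulated annealing over (horizontal position, rotation)."""
    rng = random.Random(seed)
    width = len(image[0])
    state = (rng.uniform(0, width - 1), rng.uniform(-25, 25))
    score = line_score(image, *state, half_len)
    start_score = score
    best_state, best_score = state, score
    for i in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (i / (steps - 1))
        cand = (min(max(state[0] + rng.gauss(0, 2.0), 0), width - 1),
                min(max(state[1] + rng.gauss(0, 5.0), -25), 25))
        cand_score = line_score(image, *cand, half_len)
        delta = cand_score - score
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            state, score = cand, cand_score
            if score > best_score:
                best_state, best_score = state, score
    return best_state, best_score, start_score


# Test scene: one straight line tilted by 15 degrees (hypothetical values).
SIZE, HALF_LEN = 41, 15
image = [[0] * SIZE for _ in range(SIZE)]
for x, y in window_pixels(20, SIZE // 2, 15, HALF_LEN):
    image[y][x] = 1

best_state, best_score, start_score = anneal(image, HALF_LEN)
print(f"start score {start_score}, best score {best_score} "
      f"at x={best_state[0]:.1f}, rotation {best_state[1]:.1f} degrees")
```

As noted in the text, the quality of the result depends on the annealing parameters. With a poor schedule or step size the search can settle on a placement that covers only part of the line.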
Discernment of visual targets
The ability to discern between slightly different visual targets has been shown for Cupiennius salei, although this species relies mainly on its mechanosensory systems during prey capture and mating behavior. When two targets are presented to the spider at a distance of 2 m, its walking path depends on their visual appearance: when it has to choose between two identical targets such as vertical bars, Cupiennius shows no preference. However, the animal strongly prefers a vertical bar to a sloping bar or a V-shaped target.
Discrimination between different targets has been shown to be possible only with the principal eyes uncovered, whereas the spider is able to detect the targets using any of its eyes. This suggests that many spiders' anterior-lateral (secondary) eyes are capable of much more than simple detection of object movement. With all eyes covered, the spider exhibits totally undirected walking paths.
Placing Cupiennius in total darkness, however, not only results in undirected walks but also elicits a change of gait: instead of using all eight legs, the spider walks with only six and employs the first pair of legs as antennae, comparable to a blind person's cane. To feel the surroundings, the extended forelegs are moved up and down as well as sideways. This change is specific to the first leg pair and is triggered solely by the change in visual input when normal room light is switched to invisible infrared light.
Vision-based decision making in jumping spiders
The behavior of jumping spiders after having detected movement with the eyes depends on three factors: the target's size, speed and distance. If the target is more than twice the spider's size, it is not approached, and the spider tries to escape if the target comes towards it. If the target has adequate size, its speed is analyzed visually using the secondary eyes. Fast-moving targets with a speed of more than 4°/s are chased, guided by the spider's anterior-lateral eyes. Slower objects are carefully approached and analyzed with the anterior-median (i.e. principal) eyes to determine whether they are prey or another spider of the same species. This is seemingly achieved by applying the straight line detection described above to find out whether a visual target has legs. While jumping spiders have been shown to approach potential prey of appropriate characteristics as long as it moves, males are pickier in deciding whether their current counterpart might be a potential mate.
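The decision rules described above can be summarized as a small decision function. The Python sketch below is a hypothetical simplification: the function name, its parameters and the action labels are invented for illustration, and the influence of target distance is left out because the text gives no rule for it. Only the two thresholds (twice the spider's size, 4°/s) come from the text.

```python
def choose_action(size_ratio, speed_deg_per_s, approaching=False):
    """Pick a behavior from target size (relative to the spider) and angular speed.

    size_ratio       target size divided by the spider's own size
    speed_deg_per_s  angular speed of the target in degrees per second
    approaching      whether the target is moving towards the spider
    """
    if size_ratio > 2.0:
        # Too large to be prey: never approached, fled from if it comes closer.
        return "escape" if approaching else "do not approach"
    if speed_deg_per_s > 4.0:
        # Fast target of adequate size: chase, guided by the anterior-lateral eyes.
        return "chase"
    # Slow target: approach carefully and inspect with the principal eyes.
    return "approach and inspect"


print(choose_action(size_ratio=3.0, speed_deg_per_s=1.0, approaching=True))
print(choose_action(size_ratio=0.5, speed_deg_per_s=10.0))
print(choose_action(size_ratio=0.5, speed_deg_per_s=2.0))
```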
Potential mate detection
Experiments have shown that drawings of a central dot with leg-like appendages on the sides elicit courtship displays, suggesting that jumping spiders use visual feature extraction to detect the presence and orientation of linear structures in the target. Additionally, a spider's behavior towards a presumed conspecific depends on factors such as the sex and maturity of both spiders involved and whether it is mating time. Female wolf spiders, Schizocosa ocreata, even discern asymmetries in male secondary sexual characters when choosing their mate, possibly to avoid developmental instability in their offspring. Conspicuous tufts of bristles on a male's forelegs, which are used for visual courtship signaling, appear to influence female mate choice. Asymmetry of these body parts, resulting from leg loss and regeneration, apparently reduces female receptivity to such males.
Secondary eye-guided hunting
A jumping spider's stalking behavior when hunting insect prey is comparable to that of a cat stalking birds. If something moves within the visual field of the secondary eyes, the spider turns to bring the larger, forward-facing pair of principal eyes into position for classifying the object by its shape as mate, rival or prey. Even very small, low-contrast dot stimuli moving at slow or fast speeds elicit such orientation behavior. Like Cupiennius, jumping spiders are also able to use their secondary eyes for more sophisticated tasks than just motion detection: when visual prey cues are presented to salticids with both principal eyes covered, so that only visual information from the secondary eyes is available, the animals exhibit complete hunting sequences. This suggests that the anterior-lateral eyes of jumping spiders may be the most versatile components of their visual system. Besides detecting motion, the secondary eyes evidently also have a spatial acuity good enough to direct complete visually guided hunting sequences.
Prey “face recognition”
Visual cues also play an important role for jumping spiders (salticids) when discriminating between salticid and non-salticid prey using principal eyesight. Here, a salticid prey's large principal eyes provide critical cues, to which the jumping spider Portia fimbriata reacts by exhibiting cryptic stalking tactics before attacking (walking very slowly with palps retracted and freezing when faced). This behavior is used only when a prey has been identified as a salticid. This was exploited in experiments presenting computer-rendered, realistic three-dimensional lures with modified principal eyes to Portia fimbriata. While intact virtual lures resulted in cryptic stalking, lures without principal eyes or with smaller principal eyes than usual (as sketched in the figure on the right) elicited different behavior. Presenting virtual salticid prey with only one anterior-median eye, or a regular lure with two enlarged secondary eyes, elicited cryptic stalking behavior, suggesting successful recognition of a salticid, while P. fimbriata froze less often when faced by a Cyclops-like lure (a single principal eye centered between the two secondary eyes). Lures with square-edged principal eyes were usually not classified as salticids, indicating that the shape of the principal eyes' edges is an important cue for identifying fellow salticids.
Jumping decisions from visual features
Spiders of the genus Phidippus have been tested for their willingness to cross inhospitable open space by placing visual targets on the other side of a gap. Whether the spider takes the risk of crossing open ground was found to depend mainly on the distance to the target, the target's size relative to that distance, and the target's color and shape. In independent test runs, the spiders moved to tall, distant targets as often as to short, close targets when both objects appeared equally large on the spider's retina. When given the choice of moving to either white or green grass-like targets, the spiders consistently chose the green target irrespective of its contrast with the background, indicating that they are able to discern color in hunting situations.
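Why a tall, distant target and a short, close target can look the same to the spider follows from the visual angle an object subtends. The Python sketch below computes this angle for two targets. The heights and distances are hypothetical values in arbitrary units and were not taken from the study.

```python
import math


def angular_size_deg(height, distance):
    """Visual angle (in degrees) subtended by an object of a given height."""
    return math.degrees(2 * math.atan(height / (2 * distance)))


# Hypothetical targets: twice the height at twice the distance.
tall_far = angular_size_deg(height=20.0, distance=40.0)
short_near = angular_size_deg(height=10.0, distance=20.0)

print(f"tall, distant target: {tall_far:.2f} degrees")
print(f"short, close target:  {short_near:.2f} degrees")
```

Because the visual angle depends only on the ratio of height to distance, both targets produce retinal images of the same size.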
Identifying microhabitat traits by visual cues
Presented with manipulated real plants and photos of plants, Psecas chapoda (a bromeliad-dwelling salticid spider) is able to detect a favorable microhabitat by visually analyzing architectural features of the host plant's leaves and rosette. By using black-and-white photos, any potential influence of other cues, such as color and smell, on the spider's host plant selection could be excluded in a study, leaving only shape and form as discerning characteristics. Even when having to decide solely from photographs, Psecas chapoda consistently preferred rosette-shaped plants (Agavaceae) with long, narrow leaves over plants of different appearance. This shows that some spider species are able to evaluate and distinguish the physical structure of microhabitats solely on the basis of shape, using visual cues of plant traits.
If light passes through a prism, a colour spectrum ranging from red to violet is formed on the far side of the prism. Red light has wavelengths of about 650 nm to 700 nm, and violet light of about 400 nm to 420 nm. This range, from about 400 nm to 700 nm, is the part of the electromagnetic (EM) spectrum detectable by the human eye.
The colour triangle is often used to illustrate the colour-mixing effect. The triangle spans the visible spectrum, and a white dot is located in the middle of the triangle. Because red (700 nm), green (546 nm) and blue (435 nm) light mix additively, every colour inside the triangle can be produced by mixing those three colours.
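Additive mixing inside the colour triangle can be expressed as a weighted average of the three corner positions. The Python sketch below places the primaries at the corners of an equilateral triangle. This layout and the coordinates are hypothetical and serve only to illustrate the geometry. They are not the coordinates of any standard colour space.

```python
import math

# Hypothetical layout: one primary at each corner of an equilateral triangle.
PRIMARIES = {
    "red": (1.0, 0.0),
    "green": (0.5, math.sqrt(3) / 2),
    "blue": (0.0, 0.0),
}


def mix(weights):
    """Position of an additive mixture: weighted average of the corners."""
    total = sum(weights.values())
    x = sum(w * PRIMARIES[name][0] for name, w in weights.items()) / total
    y = sum(w * PRIMARIES[name][1] for name, w in weights.items()) / total
    return x, y


white = mix({"red": 1, "green": 1, "blue": 1})   # equal parts of all three
yellow = mix({"red": 1, "green": 1, "blue": 0})  # red and green only

print(f"white  lies at ({white[0]:.3f}, {white[1]:.3f}), the centre of the triangle")
print(f"yellow lies at ({yellow[0]:.3f}, {yellow[1]:.3f}), halfway between red and green")
```

Since the weights cannot be negative, every mixture lands inside the triangle or on its edges, and equal weights land on the central white point.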
History of Sensory Systems
This Wikibook was started by engineers studying at ETH Zurich as part of the course Computational Simulations of Sensory Systems. The course combines physiology, with an emphasis on the sensory systems, with programming and signal processing. There is a plethora of information on these topics on the internet and in the literature, but there is a distinct lack of concise texts and books on the fusion of these three topics. The world needs a structured and thorough overview of biology and biological systems from an engineering point of view, which is what this book is trying to provide. We will start off with the Visual System, focusing on the biological and physiological aspects, mainly because this will be used in part to grade our performance in the course. The other part, the programming aspects, has already been evaluated and graded. It is the authors' wish that information on physiology/biology, signal processing AND programming will eventually be added to each of the sensory systems. We also hope that more sections will be added to extend the book in ways not previously thought of.
The original title of the Wikibook, Biological Machines, stressed the technical aspects of sensory systems. However, as the Wikibook evolved, it became a comprehensive overview of human sensory systems, with additional emphasis on the technical aspects of these systems. This focus is better represented by Sensory Systems, the Wikibook's title since December 2011.
- http://www.eyedesignbook.com/
- Biology of Spiders by Rainer F. Foelix - Vision page 82-93
- Photoreceptors and light signalling by Alfred Batschauer, Royal Society of Chemistry (Great Britain), Published by Royal Society of Chemistry, 2003, ISBN 085404311X, 9780854043118
- Structural differences of cone 'oil-droplets' in the light and dark adapted retina of Poecilia reticulata P., Yvette W. Kunz and Christina Wise
- Advances in organ biology, Volume 10, Pages 1-395 (2005)
- <http://www.search.eb.com/eb/art-53283>"optic chiasm: visual pathways." Online Art. Encyclopædia Britannica Online.
- Color atlas of physiology,Despopoulos, A. and Silbernagl, S.,2003, Thieme
- Neurotransmitter systems in the retina, Ehinger, B., Retina, 2-4, 305, 1982
- Intraoperative Neurophysiological Monitoring, 2nd Edition, Aage R. Møller, Humana Press 2006, Totowa, New Jersey, pages 55-70
- The Science and Applications of Acoustics, 2nd Edition, Daniel R. Raichel, Springer Science&Business Media 2006, New York, pages 213-220
- Physiology of the Auditory System, P. J. Abbas, 1993, in: Cummings Otolaryngology: Head and Neck Surgery, 2nd edition, Mosby Year Book, St. Louis
- Computer Simulations of Sensory Systems, Lecture Script Ver 1.3 March 2010, T. Haslwanter, Upper Austria University of Applied Sciences, Linz, Austria
- A. Carleton, R. Accolla, S. A. Simon, Trends Neurosci 33, 326 (Jul).
- P. Dalton, N. Doolittle, H. Nagata, P. A. Breslin, Nat Neurosci 3, 431 (May, 2000).
- J. A. Gottfried, R. J. Dolan, Neuron 39, 375 (Jul 17, 2003).
- K. L. Mueller et al., Nature 434, 225 (Mar 10, 2005).
- J. B. Nitschke et al., Nat Neurosci 9, 435 (Mar, 2006).
- T. Okubo, C. Clark, B. L. Hogan, Stem Cells 27, 442 (Feb, 2009).
- D. V. Smith, S. J. St John, Curr Opin Neurobiol 9, 427 (Aug, 1999).
- D. A. Yarmolinsky, C. S. Zuker, N. J. Ryba, Cell 139, 234 (Oct 16, 2009).
- G. Q. Zhao et al., Cell 115, 255 (Oct 31, 2003).
- Kandel, E., Schwartz, J., and Jessell, T. (2000) Principles of Neural Science. 4th edition. McGraw Hill, New York.
This list contains the names of all the authors that have contributed to this text. If you have added, modified or contributed in any way, please add your name to this list.
- Thomas Haslwanter, Upper Austria University of Applied Sciences / ETH Zurich
- Aleksander George Slater, Imperial College London / ETH Zurich
- Piotr Jozef Sliwa, Imperial College London / ETH Zurich
- Qian Cheng, ETH Zurich
- Salomon Wettstein, ETH Zurich
- Philipp Simmler, ETH Zurich
- Renate Gander, ETH Zurich
- Gerick Lee, University of Zurich & ETH Zurich
- Gabriela Michel, ETH Zurich
- Peter O'Connor, ETH Zurich
- Nikhil Biyani, ETH Zurich
- Mathias Buerki, ETH Zurich
- Jianwen Sun, ETH Zurich
- Maurice Göldi, University of Zurich
- Sofia Jativa, ETH Zurich
- Salomon Diether, ETH Zurich
- Arturo Moncada-Torres, ETH Zurich
- Datta Singh Goolaub, ETH Zurich