Biological Machines/Print version
The Wikibook of
Biological Organisms, an Engineer's Point of View.
From Wikibooks: The Free Library
This Wikibook was started by engineers studying at ETH Zurich as part of the course Computational Simulations of Sensory Systems. The course combines physiology, with an emphasis on the sensory systems, programming and signal processing. There is a plethora of information on these topics on the internet and in the literature, but there is a distinct lack of concise texts and books on the fusion of these 3 topics. The world needs a structured and thorough overview of biology and biological systems from an engineering point of view, and this book is an attempt to fill that gap. We will start off with the Visual System, focusing on the biological and physiological aspects, mainly because this will be used in part to grade our performance in the course. The other part, the programming aspects, has already been evaluated and graded. It is the authors' wish that information on physiology/biology, signal processing AND programming shall eventually be added to each of the sensory systems. We also hope that more sections will be added to extend the book in ways not yet thought of.
Table of Contents
- Simulating Neural Systems
- Visual System
- Auditory System
- Vestibular System
- Somatosensory System
- Olfactory System
- Gustatory System
- Sensory Systems in Non-primates
While the human brain may make us what we are, our sensory systems are our windows and doors to the world. In fact they are our ONLY windows and doors to the world. So when one of these systems fails, the corresponding part of our world is no longer accessible to us. Recent advances in engineering have made it possible to replace sensory systems by mechanical and electrical sensors, and to couple those sensors electronically to our nervous system. While to many this may sound futuristic and maybe even a bit scary, it can work magically. For the auditory system, so-called “cochlear implants” have given thousands of patients who were completely deaf their hearing back, so that they can interact and communicate freely again with their family and friends. Many research groups are also exploring different approaches to retinal implants, in order to restore vision to the blind. And in 2010 the first patient was implanted with a “vestibular implant”, to alleviate defects in his balance system.
The wikibook “Sensory Systems” presents our sensory systems from an engineering and information processing point of view. On the one hand, this provides some insight into the sometimes spectacular ingenuity and performance of our senses. On the other hand, it provides some understanding of how our senses transduce external information into signals that our central nervous system can work with, and of how, and how well, this process can be replaced by technical components.
Generally speaking, visual systems rely on electromagnetic (EM) waves to give an organism more information about its surroundings. This information could be regarding potential mates, dangers and sources of sustenance. Different organisms have different constituents that make up what is referred to as a visual system.
The complexity of eyes ranges from something as simple as an eye spot, which is nothing more than a collection of photosensitive cells, to a fully fledged camera eye. If an organism has different types of photosensitive cells, or cells sensitive to different wavelength ranges, the organism would theoretically be able to perceive colour or at the very least colour differences. Polarisation, another property of EM radiation, can be detected by some organisms, with insects and cephalopods having the highest accuracy.
Please note that in this text the focus is on using EM waves to see. Granted, some organisms have evolved alternative ways of obtaining sight, or at the very least of supplementing what they see with extra-sensory information, for example whales and bats, which use echolocation. This may be seeing in some sense of the word, but it is not entirely correct to call it that. Additionally, vision and visual are words most often associated with EM waves in the visual wavelength range, which is normally defined by the wavelength limits of human vision. Since some organisms detect EM waves with frequencies below and above those detected by humans, a better definition is needed. We therefore define the visual wavelength range as EM wavelengths between 300nm and 800nm. This may seem arbitrary to some, but selecting the wrong limits would render parts of some birds' vision as non-vision. With this range of wavelengths we have also defined the thermal-vision of certain organisms, for example snakes, as non-vision. Snakes using their pit organs, which are sensitive to EM between 5000nm and 30,000nm (IR), therefore do not "see", but somehow "feel" from afar, even though blind specimens have been documented targeting and attacking particular body parts.
First, the different types of visual system sensory organs are briefly described. This is followed by a thorough explanation of the components of human vision and of the signal processing in the human visual pathway, and the chapter finishes with an example of the perceptual outcome of these stages.
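The working definition above can be written down as a small lookup. This is only a sketch of the convention adopted in this text: the two wavelength ranges come from the paragraph above, while the function name and the returned labels are made up for illustration.

```python
# Wavelength limits in nanometres, as defined in the text above.
VISUAL_RANGE_NM = (300.0, 800.0)
PIT_ORGAN_RANGE_NM = (5000.0, 30000.0)


def classify_wavelength(wavelength_nm):
    """Return how this text labels EM radiation of the given wavelength.

    The labels are illustrative names, not established terminology.
    """
    if wavelength_nm <= 0:
        raise ValueError("wavelength must be positive")
    low, high = VISUAL_RANGE_NM
    if low <= wavelength_nm <= high:
        return "vision"
    low, high = PIT_ORGAN_RANGE_NM
    if low <= wavelength_nm <= high:
        return "thermal sensing (non-vision)"
    return "outside both ranges"
```

For example, `classify_wavelength(350)` falls inside the visual range even though it lies below the limits of human vision, which is the reason the lower limit was set at 300nm.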
Vision, or the ability to see, depends on visual system sensory organs, or eyes. There are many different constructions of eyes, ranging in complexity depending on the requirements of the organism. The different constructions have different capabilities, are sensitive to different wavelengths and have differing degrees of acuity. They also require different processing to make sense of the input, and different numbers of eyes to work optimally. The ability to detect and decipher EM has proved to be a valuable asset to most forms of life, leading to an increased chance of survival for organisms that utilise it. In environments with insufficient light, or a complete lack of it, lifeforms gain no advantage from vision, which has ultimately resulted in atrophy of visual sensory organs and a subsequent increased reliance on other senses (e.g. some cave dwelling animals, bats etc.). Interestingly enough, it appears that visual sensory organs are tuned to the optical window, which is defined as the EM wavelengths (between 300nm and 1100nm) that pass through the atmosphere and reach the ground. This is shown in the figure below. You may notice that there exist other "windows": an IR window, which explains to some extent the thermal-"vision" of snakes, and a radiofrequency (RF) window, which no known lifeforms are able to detect.
Over time, evolution has yielded many eye constructions, and some of them have evolved multiple times, yielding similarities between organisms that occupy similar niches. One underlying aspect is essentially identical regardless of species or complexity of sensory organ type: the universal usage of light-sensitive proteins called opsins. Without focusing too much on the molecular basis, though, the various constructions can be categorised into distinct groups:
- Spot Eyes
- Pit Eyes
- Pinhole Eyes
- Lens Eyes
- Refractive Cornea Eyes
- Reflector Eyes
- Compound Eyes
The least complicated configuration of eyes enables an organism simply to sense the ambient light, that is, to know whether there is light or not. It is normally just a collection of photosensitive cells clustered in the same spot, and is thus sometimes referred to as a spot eye, eye spot or stemma. By either adding more angular structures or recessing the spot eyes, an organism gains access to directional information as well, which is a vital requirement for image formation. These so-called pit eyes are by far the most common type of visual sensory organ, and can be found in over 95% of all known species.
Taking this approach to the obvious extreme leads to the pit becoming a cavernous structure, which increases the sharpness of the image, alas at a loss in intensity. In other words, there is a trade-off between intensity or brightness and sharpness. An example of this can be found in the Nautilus, species belonging to the family Nautilidae, organisms considered to be living fossils. They are the only known species that have this type of eye, referred to as the pinhole eye, and it is completely analogous to the pinhole camera or the camera obscura. In addition, like more advanced cameras, Nautili are able to adjust the size of the aperture, thereby increasing or decreasing the resolution of the eye at a respective decrease or increase in image brightness. As in the camera, the way to alleviate the intensity/resolution trade-off problem is to include a lens, a structure that focuses the light onto a central area, which most often has a higher density of photo-sensors. By adjusting the shape of the lens and moving it around, and by controlling the size of the aperture or pupil, organisms can adapt to different conditions and focus on particular regions of interest in any visual scene. The last upgrade to the various eye constructions already mentioned is the inclusion of a refractive cornea. Eyes with this structure have delegated two thirds of the total optical power of the eye to the high refractive index liquid inside the cornea, enabling very high resolution vision. Most land animals, including humans, have eyes of this particular construct. Additionally, many variations of lens structure, lens number, photosensor density, fovea shape, fovea number, pupil shape etc. exist, always to increase the chances of survival for the organism in question. These variations lead to a varied outward appearance of eyes, even within a single eye construction category.
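The brightness/sharpness trade-off of the pinhole eye can be illustrated with a few lines of arithmetic. The sketch below is a simplification under stated assumptions, not a model of the Nautilus eye: it uses purely geometric optics (diffraction is ignored), assumes a distant point source so that the blur spot is about as wide as the aperture, and takes the collected light to be proportional to the aperture area. The function name and units are chosen for illustration only.

```python
import math


def pinhole_tradeoff(aperture_mm):
    """Return (blur_diameter_mm, relative_brightness) for a lensless pinhole eye.

    Assumptions: geometric optics only, distant point source,
    brightness proportional to aperture area (arbitrary units).
    """
    if aperture_mm <= 0:
        raise ValueError("aperture must be positive")
    blur_diameter_mm = aperture_mm  # a wider hole smears each point more
    relative_brightness = math.pi * (aperture_mm / 2.0) ** 2
    return blur_diameter_mm, relative_brightness


if __name__ == "__main__":
    for aperture in (0.5, 1.0, 2.0):
        blur, light = pinhole_tradeoff(aperture)
        print(f"aperture {aperture} mm: blur {blur} mm, brightness {light:.2f}")
```

Under these assumptions, doubling the aperture quadruples the collected light but also doubles the blur, which is the trade-off that a lens removes.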
Demonstrating this point, a collection of photographs of animals with the same eye category (refractive cornea eyes) is shown below.
An alternative to the lens approach, called reflector eyes, can be found in for example mollusks. Instead of the conventional way of focusing light to a single point in the back of the eye using a lens or a system of lenses, these organisms have mirror-like structures inside the chamber of the eye that reflect the light into a central portion, much like a parabolic dish. Although there are no known examples of organisms with reflector eyes capable of image formation, at least one species of fish, the spookfish (Dolichopteryx longipes), uses them in combination with "normal" lensed eyes.
The last group of eyes, found in insects and crustaceans, is called compound eyes. These eyes consist of a number of functional sub-units called ommatidia, each consisting of a facet, or front surface, a transparent crystalline cone and photo-sensitive cells for detection. In addition, each of the ommatidia is separated from its neighbours by pigment cells, ensuring the incoming light is as parallel as possible. The combination of the outputs of these ommatidia forms a mosaic image, with a resolution proportional to the number of ommatidia. For example, if humans had compound eyes, the eyes would have to cover our entire faces to retain the same resolution. As a note, there are many types of compound eyes, but delving too deep into this topic is beyond the scope of this text.
Not only the type of eye varies, but also the number of eyes. As you are well aware, humans usually have two eyes; spiders, on the other hand, have a varying number of eyes, with most species having 8. Spiders normally also have pairs of eyes of varying sizes, and the differing sizes have different functions. For example, in jumping spiders 2 larger front-facing eyes give the spider excellent visual acuity, which is used mainly to target prey. The 6 smaller eyes have much poorer resolution, but help the spider to avoid potential dangers. Two photographs of the eyes of a jumping spider and the eyes of a wolf spider are shown to demonstrate the variability in the eye topologies of arachnids.
Anatomy of the Visual System
We humans are visual creatures, therefore our eyes are complicated with many components. In this chapter, an attempt is made to describe these components, thus giving some insight into the properties and functionality of human vision.
Getting inside of the eyeball - Pupil, iris and the lens
Light rays enter the eye structure through the black aperture or pupil in the front of the eye. The black appearance is due to the light being fully absorbed by the tissue inside the eye. Only through this pupil can light enter the eye, which means the amount of incoming light is effectively determined by the size of the pupil. A pigmented sphincter surrounding the pupil, the iris, functions as the eye's aperture stop. It is the amount of pigment in this iris that gives rise to the various eye colours found in humans.
In addition to this layer of pigment, the iris has 2 layers of muscle. One layer contains a circular muscle called the pupillary sphincter, which contracts to make the pupil smaller. The other layer has a smooth muscle called the pupillary dilator, which contracts to dilate the pupil. Together, these muscles dilate or constrict the pupil depending on the requirements or conditions of the person. The ciliary muscles, in turn, act on the lens through the ciliary zonules, fibres that change the shape of the lens and hold it in place.
The lens is situated immediately behind the pupil. Its shape and characteristics reveal a similar purpose to that of camera lenses, but they function in slightly different ways. The shape of the lens is adjusted by the pull of the ciliary zonules, which consequently changes the focal length. Together with the cornea, the lens can change the focus, which makes it a very important structure indeed; however, only one third of the total optical power of the eye is due to the lens itself. It is also the eye's main filter. Lens fibres, long and thin cells devoid of most of the cell machinery to promote transparency, make up most of the material of the lens. Together with water soluble proteins called crystallins, they increase the refractive index of the lens. The fibres also play a part in the structure and shape of the lens itself.
Beamforming in the eye – Cornea and its protecting agent - Sclera
The cornea, responsible for the remaining 2/3 of the total optical power of the eye, covers the iris, pupil and lens. It focuses the rays that pass through the iris before they pass through the lens. The cornea is only 0.5mm thick and consists of 5 layers:
- Epithelium: A layer of epithelial tissue covering the surface of the cornea.
- Bowman's membrane: A thick protective layer composed of strong collagen fibres, that maintain the overall shape of the cornea.
- Stroma: A layer composed of parallel collagen fibrils. This layer makes up 90% of the cornea's thickness.
- Descemet's membrane and Endothelium: Two layers adjacent to the anterior chamber of the eye, which is filled with aqueous humor produced by the ciliary body. This fluid moisturises the lens, cleans it and maintains the pressure in the eyeball. The fluid enters from the posterior chamber, and the anterior chamber, positioned between cornea and iris, contains a trabecular meshwork through which it is drained into Schlemm's canal.
The surface of the cornea lies under two protective membranes, called the sclera and Tenon’s capsule. Both of these protective layers completely envelop the eyeball. The sclera is built from collagen and elastic fibres, which protect the eye from external damage; this layer also gives rise to the white of the eye. It is pierced by nerves and vessels, with the largest hole reserved for the optic nerve. Moreover, it is covered by the conjunctiva, which is a clear mucous membrane on the surface of the eyeball. This membrane also lines the inside of the eyelid. It works as a lubricant and, together with the lacrimal gland, it produces tears that lubricate and protect the eye. The remaining protective layer, the eyelid, also functions to spread this lubricant around.
Moving the eyes – extra-ocular muscles
The eyeball is moved by a complicated muscle structure of extra-ocular muscles consisting of four rectus muscles – inferior, medial, lateral and superior and two oblique – inferior and superior. Positioning of these muscles is presented below, along with functions:
As you can see, the extra-ocular muscles (2,3,4,5,6,8) are attached to the sclera of the eyeball and originate in the annulus of Zinn, a fibrous tendon surrounding the optic nerve. A pulley system is created with the trochlea acting as a pulley and the superior oblique muscle as the rope; this is required to redirect the muscle force in the correct way. The remaining extra-ocular muscles have a direct path to the eye and therefore do not form these pulley systems. Using these extra-ocular muscles, the eye can rotate up, down, left and right, and other movements are possible as combinations of these.
Other movements are also very important for us to be able to see. Vergence movements enable the proper function of binocular vision. Unconscious fast movements called saccades are essential for people to keep an object in focus. The saccade is a sort of jittery movement performed when the eyes are scanning the visual field, in order to displace the point of fixation slightly. When you follow a moving object with your gaze, your eyes perform what is referred to as smooth pursuit. Additional involuntary movements called nystagmus are caused by signals from the vestibular system; together they make up the vestibulo-ocular reflexes.
The brain stem controls all of the movements of the eyes, with different areas responsible for different movements.
- Pons: Rapid horizontal movements, such as saccades or nystagmus
- Mesencephalon: Vertical and torsional movements
- Cerebellum: Fine tuning
- Edinger-Westphal nucleus: Vergence movements
Where the vision reception occurs – The retina
Before being transduced, incoming EM passes through the cornea, lens and the macula. These structures also act as filters to reduce unwanted EM, thereby protecting the eye from harmful radiation. The filtering response of each of these elements can be seen in the figure "Filtering of the light performed by cornea, lens and pigment epithelium". As one may observe, the cornea attenuates the lower wavelengths, leaving the higher wavelengths nearly untouched. The lens blocks around 25% of the EM below 400nm and more than 50% below 430nm. Finally, the pigment epithelium, the last stage of filtering before the photo-reception, affects around 30% of the EM between 430nm and 500nm.
A part of the eye, which marks the transition from the non-photosensitive region to the photosensitive region, is called the ora serrata. The photosensitive region is referred to as the retina, which is the sensory structure in the back of the eye. The retina consists of multiple layers, presented below, with millions of photoreceptors called rods and cones, which capture the light rays and convert them into electrical impulses. Transmission of these impulses is initiated by the ganglion cells and conducted through the optic nerve, the single route by which information leaves the eye.
A conceptual illustration of the structure of the retina is shown on the right. As we can see, there are five main cell types:
- photoreceptor cells
- horizontal cells
- bipolar cells
- amacrine cells
- ganglion cells
Photoreceptor cells can be further subdivided into two main types called rods and cones. Cones are much less numerous than rods in most parts of the retina, but there is an enormous aggregation of them in the macula, especially in its central part called the fovea. In this central region, each photo-sensitive cone is connected to one ganglion-cell. In addition, the cones in this region are slightly smaller than the average cone size, meaning you get more cones per area. Because of this ratio, and the high density of cones, this is where we have the highest visual acuity.
There are 3 types of human cones, each responding to a specific range of wavelengths because it contains one of three types of a pigment called photopsin. Each pigment is most sensitive to red, green or blue light, so we have blue, green and red cones, also called S-, M- and L-cones for their sensitivity to short, medium and long wavelengths respectively. Each photopsin consists of a protein called opsin and a bound chromophore called retinal. The main building blocks of the cone cell are the synaptic terminal, the inner and outer segments, the interior nucleus and the mitochondria.
The spectral sensitivities of the 3 types of cones:
- S-cones absorb short-wave light, i.e. blue-violet light. The maximum absorption wavelength for the S-cones is 420nm.
- M-cones absorb blue-green to yellow light. The maximum absorption wavelength is 535nm.
- L-cones absorb yellow to red light. The maximum absorption wavelength is 565nm.
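To get a feeling for how three overlapping cone types encode wavelength, the peak wavelengths listed above can be plugged into a toy sensitivity curve. The sketch below assumes a Gaussian curve with a 40nm standard deviation for every cone type. Both the Gaussian shape and the width are assumptions made for illustration (real cone spectra are asymmetric); only the three peak wavelengths come from the text.

```python
import math

# Peak absorption wavelengths in nm, from the list above.
PEAK_NM = {"S": 420.0, "M": 535.0, "L": 565.0}

# Assumed width of each curve (standard deviation in nm); illustrative only.
ASSUMED_WIDTH_NM = 40.0


def cone_responses(wavelength_nm, width_nm=ASSUMED_WIDTH_NM):
    """Relative response (0..1) of each cone type to monochromatic light."""
    return {
        name: math.exp(-0.5 * ((wavelength_nm - peak) / width_nm) ** 2)
        for name, peak in PEAK_NM.items()
    }


def most_responsive_cone(wavelength_nm):
    """Name of the cone type that responds most strongly at this wavelength."""
    responses = cone_responses(wavelength_nm)
    return max(responses, key=responses.get)
```

Calling `cone_responses(500)` shows that a single wavelength generally excites more than one cone type; in this toy model it is the ratio between the three responses, not any single value, that identifies the wavelength.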
The inner segment contains the cell's nucleus and organelles. The pigment is located in the outer segment, attached to the membrane as trans-membrane proteins within the invaginations of the cell-membrane that form the membranous disks, which are clearly visible in the figure displaying the basic structure of rod and cone cells. The disks maximize the reception area of the cells. The cone photoreceptors of many vertebrates contain spherical organelles called oil droplets, which are thought to act as intra-ocular filters that may serve to increase contrast, reduce glare and lessen chromatic aberrations caused by the mitochondrial size gradient from the periphery to the centres.
Rods have a structure similar to cones; however, they contain the pigment rhodopsin instead, which allows them to detect low-intensity light and makes them 100 times more sensitive than cones. Rhodopsin is the only pigment found in human rods, and it is found on the outer side of the pigment epithelium, which similarly to cones maximizes absorption area by employing a disk structure. Similarly to cones, the synaptic terminal of the cell joins it with a bipolar cell and the inner and outer segments are connected by a cilium.
The pigment rhodopsin absorbs light between 400 and 600nm, with a maximum absorption at around 500nm. This wavelength corresponds to greenish-blue light, which means blue colours appear more intense in relation to red colours at night.
EM waves with wavelengths outside the range of 400 to 700nm are not detected by either rods or cones, which ultimately means they are not visible to human beings.
Horizontal cells occupy the inner nuclear layer of the retina. There are two types of horizontal cells and both types hyper-polarise in response to light, i.e. they become more negative. Type A consists of a subtype called HII-H2, which interacts predominantly with S-cones. Type B cells have a subtype called HI-H1, which features a dendrite tree and an axon. The former contacts mostly M- and L-cone cells and the latter rod cells. Contacts with cones are made mainly by inhibitory synapses, while the cells themselves are joined into a network with gap junctions.
Bipolar cells spread single dendrites in the outer plexiform layer, and their cell bodies, the perikarya, are found in the inner nuclear layer. Dendrites interconnect exclusively with cones and rods and we differentiate between one rod bipolar cell and nine or ten cone bipolar cells. These cells branch with amacrine or ganglion cells in the inner plexiform layer using an axon. Rod bipolar cells connect to triad synapses or 18-70 rod cells. Their axons spread around the inner plexiform layer synaptic terminals, which contain ribbon synapses and contact a pair of cell processes in dyad synapses. They are connected to ganglion cells with AII amacrine cell links.
Amacrine cells can be found in the inner nuclear layer and in the ganglion cell layer of the retina. Occasionally they are found in the inner plexiform layer, where they work as signal modulators. They have been classified as narrow-field, small-field, medium-field or wide-field depending on their size. However, many classifications exist, leading to over 40 different types of amacrine cells.
Ganglion cells are the final transmitters of the visual signal from the retina to the brain. The most common ganglion cells in the retina are the midget ganglion cell and the parasol ganglion cell. After having passed through all the retinal layers, the signal is passed on to these cells, which are the final stage of the retinal processing chain. All the information is collected here and forwarded to the retinal nerve fibres and optic nerves. The spot where the ganglion axons fuse to create an optic nerve is called the optic disc. This nerve is built mainly from the retinal ganglion axons and Portort cells. The majority of the axons transmit data to the lateral geniculate nucleus, which is a termination nexus for most parts of the nerve and which forwards the information to the visual cortex. Some ganglion cells also react to light, but because this response is slower than that of rods and cones, it is believed to be related to sensing ambient light levels and adjusting the biological clock.
As mentioned before the retina is the main component in the eye, because it contains all the light sensitive cells. Without it, the eye would be comparable to a digital camera without the CCD (Charge Coupled Device) sensor. This part elaborates on how the retina perceives the light, how the optical signal is transmitted to the brain and how the brain processes the signal to form enough information for decision making.
Creation of the initial signals - Photosensor Function
Vision invariably starts with light hitting the photo-sensitive cells found in the retina. Light-absorbing visual pigments, a variety of enzymes and transmitters in retinal rods and cones initiate the conversion of visible EM stimuli into electrical impulses, in a process known as photoelectric transduction. Using rods as an example, the incoming visible EM hits rhodopsin molecules, transmembrane molecules found in the rods' outer disk structure. Each rhodopsin molecule consists of a cluster of helices called opsin that envelop and surround 11-cis retinal, which is the part of the molecule that will change due to the energy from the incoming photons. In biological molecules, moieties, or parts of molecules, that cause conformational changes due to this energy are sometimes referred to as chromophores. 11-cis retinal straightens in response to the incoming energy, turning into retinal (all-trans retinal), which forces the opsin helices further apart, causing particular reactive sites to be uncovered. This "activated" rhodopsin molecule is sometimes referred to as Metarhodopsin II. From this point on, even if the visible light stimulation stops, the reaction will continue. The Metarhodopsin II can then react with roughly 100 molecules of a G protein called transducin, which splits into its α and βγ subunits after the GDP is exchanged for GTP. The activated α-GTP then binds to cGMP-phosphodiesterase (PDE), suppressing normal ion-exchange functions, which results in a low cytosol concentration of cations, and therefore a change in the polarisation of the cell.
The natural photoelectric transduction reaction has an amazing power of amplification. One single retinal rhodopsin molecule activated by a single quantum of light causes the hydrolysis of up to 10^6 cGMP molecules per second.
- A light photon interacts with the retinal in a photoreceptor. The retinal undergoes isomerisation, changing from the 11-cis to all-trans configuration.
- Retinal no longer fits into the opsin binding site.
- Opsin therefore undergoes a conformational change to metarhodopsin II.
- Metarhodopsin II is unstable and splits, yielding opsin and all-trans retinal.
- The opsin activates the regulatory protein transducin. This causes transducin to dissociate from its bound GDP, and bind GTP, then the alpha subunit of transducin dissociates from the beta and gamma subunits, with the GTP still bound to the alpha subunit.
- The alpha subunit-GTP complex activates phosphodiesterase.
- Phosphodiesterase breaks down cGMP to 5'-GMP. This lowers the concentration of cGMP and therefore the sodium channels close.
- Closure of the sodium channels causes hyperpolarization of the cell due to the ongoing potassium current.
- Hyperpolarization of the cell causes voltage-gated calcium channels to close.
- As the calcium level in the photoreceptor cell drops, the amount of the neurotransmitter glutamate that is released by the cell also drops. This is because calcium is required for the glutamate-containing vesicles to fuse with cell membrane and release their contents.
- A decrease in the amount of glutamate released by the photoreceptors causes depolarization of On center bipolar cells (rod and cone On bipolar cells) and hyperpolarization of cone Off bipolar cells.
Without visible EM stimulation, rod cells, which contain a cocktail of ions, proteins and other molecules, have membrane potential differences of around -40mV. Compared to other nerve cells (-65mV), this is quite high. In this state, the neurotransmitter glutamate is continuously released from the axon terminals and absorbed by the neighbouring bipolar cells. With incoming visible EM and the previously mentioned cascade reaction, the potential difference drops to -70mV. This hyper-polarisation of the cell causes a reduction in the amount of released glutamate, thereby affecting the activity of the bipolar cells, and subsequently the following steps in the visual pathway.
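The relation between light, membrane potential and glutamate release described above can be sketched as a toy model. Only the two potentials (-40mV in the dark, -70mV under strong light) are taken from the text. The saturating intensity curve, the `half_saturation` parameter and the linear link between potential and glutamate release are assumptions made purely for illustration.

```python
DARK_MV = -40.0   # resting potential without light, from the text
LIGHT_MV = -70.0  # hyperpolarised potential under strong light, from the text


def rod_potential_mv(intensity, half_saturation=1.0):
    """Membrane potential for a light intensity in arbitrary units.

    Assumed saturating curve: intensity / (intensity + half_saturation).
    """
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    saturation = intensity / (intensity + half_saturation)
    return DARK_MV + (LIGHT_MV - DARK_MV) * saturation


def relative_glutamate_release(potential_mv):
    """Assumed linear link: 1.0 in the dark, 0.0 at full hyperpolarisation."""
    return (potential_mv - LIGHT_MV) / (DARK_MV - LIGHT_MV)
```

In this sketch, more light means a more negative potential and less glutamate, so the photoreceptor signals light by releasing less transmitter, which is the inverted logic that the ON and OFF bipolar cells then decode.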
Similar processes exist in the cone-cells and in photosensitive ganglion cells, but make use of different opsins. Photopsin I through III (yellowish-green, green and blue-violet respectively) are found in the three different cone cells and melanopsin (blue) can be found in the photosensitive ganglion cells.
Processing Signals in the Retina
Different bipolar cells react differently to the changes in the released glutamate. The so-called ON and OFF bipolar cells form the direct signal flow from cones to bipolar cells. The ON bipolar cells are depolarised by visible EM stimulation and the corresponding ON ganglion cells are activated. The OFF bipolar cells, on the other hand, are hyperpolarised by the visible EM stimulation, and the OFF ganglion cells are inhibited. This is the basic pathway of the direct signal flow. The lateral signal flow starts from the rods, then goes to the bipolar cells and the amacrine cells; the OFF bipolar cells are inhibited by the rod-amacrine cells and the ON bipolar cells are stimulated via an electrical synapse. After all of these steps, the signal arrives at the ON or OFF ganglion cells, and the whole pathway of the lateral signal flow is established.
The action potential (AP) in ON ganglion cells is triggered by the visible EM stimulus. The AP frequency increases when the sensor potential increases; in other words, the AP frequency depends on the amplitude of the sensor potential. The region of ganglion cells where the stimulatory and inhibitory effects influence the AP frequency is called the receptive field (RF). Around the ganglion cells, the RF is usually composed of two regions: the central zone and the ring-like peripheral zone. They are distinguishable during visible EM adaptation. Visible EM stimulation of the central zone leads to an increase in AP frequency, and stimulation of the peripheral zone decreases the AP frequency. When the light source is turned off, excitation occurs. This kind of region is therefore called an "ON field" (central field ON). The RF of the OFF ganglion cells acts in the opposite way and is therefore called an "OFF field" (central field OFF). The RFs are organised by the horizontal cells. The impulse from the peripheral region is transmitted to the central region, and there the so-called stimulus contrast is formed. This function makes the dark seem darker and the light brighter. If the whole RF is exposed to light, the impulse of the central region will predominate.
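The centre/surround organisation of the receptive field is often approximated by a difference of two Gaussians: a narrow excitatory centre minus a broad, weaker inhibitory surround. The sketch below uses that approximation in one dimension. The difference-of-Gaussians form, the widths and the surround gain are assumptions chosen so that the centre predominates under full-field light, as described above; they are not measured values.

```python
import math

POSITIONS = range(-6, 7)  # sample points across the receptive field (arbitrary units)


def dog_weight(x, sigma_center=1.0, sigma_surround=3.0, surround_gain=0.3):
    """Difference-of-Gaussians weight: excitatory centre minus inhibitory surround."""
    center = math.exp(-x * x / (2.0 * sigma_center ** 2))
    surround = surround_gain * math.exp(-x * x / (2.0 * sigma_surround ** 2))
    return center - surround


def on_center_drive(stimulus):
    """Net drive to an ON-centre cell; stimulus maps position -> light intensity."""
    return sum(dog_weight(x) * stimulus.get(x, 0.0) for x in POSITIONS)


def off_center_drive(stimulus):
    """An OFF-centre cell is modelled here as the sign-inverted ON-centre cell."""
    return -on_center_drive(stimulus)


# Three test stimuli: light on the centre only, on the periphery only, everywhere.
spot = {x: 1.0 for x in POSITIONS if abs(x) <= 1}
ring = {x: 1.0 for x in POSITIONS if abs(x) >= 3}
full = {x: 1.0 for x in POSITIONS}
```

With these assumed parameters, `on_center_drive(spot)` is positive, `on_center_drive(ring)` is negative, and `on_center_drive(full)` is positive but smaller than for the spot alone, which is the contrast-enhancing behaviour the paragraph describes.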
Signal Transmission to the Cortex
As mentioned previously, axons of the ganglion cells converge at the optic disk of the retina, forming the optic nerve. These fibres are positioned inside the bundle in a specific order. Fibres from the macular zone of the retina are in the central portion, and those from the temporal half of the retina take up the peripheral part. A partial decussation, or crossing, occurs once these fibres are outside the eye cavity. The fibres from the nasal half of each retina cross to the opposite side and extend to the brain. Those from the temporal halves remain uncrossed. This partial crossover is called the optic chiasma, and the nerve bundles past this point are called optic tracts, mainly to distinguish them from the single-retinal optic nerves. The function of the partial crossover is to transmit the right-hand visual field seen by both eyes to the left-hand half of the brain only, and vice versa. Therefore the information from the right half of the body, and from the right visual field, is all transmitted to the left-hand part of the brain when it reaches the posterior part of the fore-brain (diencephalon).
The information relay between the fibres of the optic tracts and the nerve cells occurs in the lateral geniculate bodies, the central part of the visual signal processing, located in the thalamus of the brain. From here the information is passed to the nerve cells in the occipital cortex of the corresponding side of the brain. Connections from the retina to the brain can be separated into a "parvocellular pathway" and a "magnocellular pathway". The parvocellular pathway signals color and fine detail, whereas the magnocellular pathway detects fast-moving stimuli.
Signals from standard digital cameras correspond approximately to those of the parvocellular pathway. To simulate the responses of parvocellular pathways, researchers have been developing neuromorphic sensory systems, which try to mimic spike-based computation in neural systems. These systems use a scheme called "address-event representation" for signal transmission in the neuromorphic electronics (Liu and Delbruck 2010).
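To give a feeling for what an address-event stream looks like, the following minimal Python sketch turns a stack of frames into a list of events, each carrying a time stamp, the pixel address and a polarity. The change-threshold rule, the function name `frames_to_events` and the `threshold` parameter are illustrative assumptions and are not taken from the cited work; real neuromorphic sensors generate events asynchronously in hardware.

```python
import numpy as np

def frames_to_events(frames, threshold=0.2):
    """Convert frames of shape (T, H, W) into address-events (t, y, x, polarity).

    Assumed rule (illustrative only): a pixel emits an event when its brightness
    has changed by at least `threshold` since that pixel's previous event.
    """
    frames = np.asarray(frames, dtype=float)
    reference = frames[0].copy()              # brightness at each pixel's last event
    events = []
    for t in range(1, frames.shape[0]):
        diff = frames[t] - reference
        ys, xs = np.nonzero(np.abs(diff) >= threshold)
        for y, x in zip(ys, xs):
            polarity = 1 if diff[y, x] > 0 else -1
            events.append((t, int(y), int(x), polarity))
            reference[y, x] = frames[t, y, x]  # reset the reference for this pixel
    return events

# A bright dot moving one pixel to the right per frame on a dark background.
frames = np.zeros((3, 1, 4))
for t in range(3):
    frames[t, 0, t] = 1.0
events = frames_to_events(frames)
```

Only pixels whose brightness changes produce output, so a static scene generates no events at all, in contrast to a frame-based camera that transmits every pixel in every frame.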
Anatomically, the retinal Magno and Parvo ganglion cells project respectively to the 2 ventral magnocellular layers and the 4 dorsal parvocellular layers of the Lateral Geniculate Nucleus (LGN). Each of the six LGN layers receives inputs from either the ipsilateral or the contralateral eye, i.e., the ganglion cells of the left eye cross over and project to layers 1, 4 and 6 of the right LGN, and the right eye ganglion cells project (uncrossed) to its layers 2, 3 and 5. From here the information from the right and left eyes is separated.
Although human vision combines the two halves of each retina and the signals are processed by opposite cerebral hemispheres, the visual field is perceived as a smooth and complete unit. Hence the two visual cortical areas are thought of as being intimately connected. This connection, called the corpus callosum, is made of neurons, axons and dendrites. Because the dendrites make synaptic connections to the related points of the hemispheres, electrical stimulation of any point on one hemisphere leads to stimulation of the interconnected point on the other hemisphere. The only exception to this rule is the primary visual cortex.
The optic tract fibres make synapses in the respective layers of the lateral geniculate body. The axons of these third-order nerve cells then pass up to the calcarine fissure in each occipital lobe of the cerebral cortex. Because bands of white fibres and axons from the nerve cells in the retina pass through it, this region is called the striate cortex, which is our primary visual cortex, sometimes known as V1. At this point, impulses from the separate eyes converge onto common cortical neurons, which enables the complete input from both eyes in one region to be used for perception and comprehension. Pattern recognition is a very important function of this part of the brain, with lesions causing problems with visual recognition or blindsight.
Because the optic tract fibres pass information to the lateral geniculate bodies, and from there to the striate area, in an ordered manner, stimulation of a single point on the retina produces an electrical response in a small region of both the lateral geniculate body and the striate cortex that corresponds to that retinal spot. This is an obvious point-to-point way of signal processing. If the whole retina is stimulated, the responses occur across both lateral geniculate bodies and the gray matter of the striate cortex. It is therefore possible to map this brain region to the retinal fields, or more usually to the visual fields.
Any further steps in this pathway are beyond the scope of this book. Rest assured that many further levels and centres exist, each focusing on specific tasks such as colour, orientation, spatial frequency or emotion.
Cortical Processing - Visual Perception
With a firmer understanding of some of the more important concepts of signal processing in the visual system in place, comprehension or perception of the processed sensory information is the last important piece in the puzzle. Visual perception is the process of translating information received by the eyes into an understanding of the external state of things. It makes us aware of the world around us and allows us to understand it better. Based on visual perception we learn patterns that we apply later in life, and we make decisions based on these patterns and on the information obtained. In other words, our survival depends on perception.
The field of Visual Perception has been divided into different subfields, because the processing is too complex and requires different specialized mechanisms to perceive what is seen. These subfields include Color Perception, Motion Perception, Depth Perception and Face Recognition, among others. In the following we will describe important aspects of Motion Perception in primates.
Motion Perception is the process of inferring the speed and direction of moving objects. Area V5 in humans and area MT (Middle Temporal) in other primates are responsible for the cortical perception of motion. Area V5 is part of the extrastriate cortex, which is the region in the occipital part of the brain next to the primary visual cortex. The function of Area V5 is to detect the speed and direction of visual stimuli, and to integrate local visual motion signals into global motion. Area V1, or Primary Visual Cortex, is located in the occipital lobe of the brain in both hemispheres. It performs the first stage of cortical processing of visual information. This area contains a complete map of the visual field covered by the eyes. The difference between area V5 and area V1 is that area V5 can integrate the motion of local signals, or of individual parts of an object, into the global motion of an entire object. Area V1, on the other hand, responds to local motion that occurs within the receptive field. The estimates from these many neurons are integrated in Area V5.
Movement is defined as changes in retinal illumination over space and time. Motion signals are classified into First order motions and Second order motions. These motion types are briefly described in the following paragraphs.
First-order motion perception refers to the motion perceived when two or more visual stimuli switch on and off over time and produce different motion perceptions. First-order motion is also termed "apparent motion", and it is used in television and film. An example of this is the "Beta movement", which is an illusion in which fixed images seem to move, even though they do not move in reality. These images give the appearance of motion because they change faster than the eye can detect. This optical illusion happens because the human optic nerve responds to changes of light at ten cycles per second, so any change faster than this rate is registered as continuous motion and not as separate images.
Second-order motion refers to the motion that occurs when a moving contour is defined by contrast, texture, flicker or some other quality that does not result in an increase in luminance or motion energy of the image. Evidence suggests that early processing of first-order motion and second-order motion is carried out by separate pathways. Second-order mechanisms have poorer temporal resolution and are low-pass in terms of the range of spatial frequencies to which they respond. Second-order motion produces a weaker motion aftereffect. First- and second-order signals are combined in area V5.
In this chapter, we will analyze the concepts of Motion Perception and Motion Analysis, and explain the reason why these terms should not be used interchangeably. We will analyze the mechanisms by which motion is perceived, such as Motion Sensors and Feature Tracking. There exist three main theoretical models that attempt to describe the function of neuronal sensors of motion. Experimental tests have been conducted to confirm whether these models are accurate. Unfortunately, the results of these tests are inconclusive, and it can be said that no single one of these models describes the functioning of Motion Sensors entirely. However, each of these models simulates certain features of Motion Sensors. Some properties of these sensors are described. Finally, this chapter shows some motion illusions, which demonstrate that our sense of motion can be misled by static external factors that stimulate motion sensors in the same way as motion.
Motion Analysis and Motion Perception
The concepts of Motion Analysis and Motion Perception are often treated as interchangeable. Motion Perception and Motion Analysis are important to each other, but they are not the same.
Motion Analysis refers to the mechanisms by which motion signals are processed. Just as Motion Perception does not necessarily depend on signals generated by the motion of images on the retina, Motion Analysis may or may not lead to motion perception. An example of this phenomenon is Vection, which occurs when a person perceives that she is moving when she is stationary, but the object that she observes is moving. Vection shows that the motion of an object can be analyzed even though it is not perceived as motion coming from the object. This definition of Motion Analysis suggests that motion is a fundamental image property. In the visual field, it is analyzed at every point. The results from this analysis are used to derive perceptual information.
Motion Perception refers to the process of acquiring perceptual knowledge about the motion of objects and surfaces in an image. Motion is perceived either by delicate local sensors in the retina or by feature tracking. Local motion sensors are specialized neurons sensitive to motion, analogous to specialized sensors for color. Feature tracking is an indirect way to perceive motion, and it consists of inferring motion from changes in the retinal position of objects over time. It is also referred to as third-order motion analysis. Feature tracking works by focusing attention on a particular object and observing how its position changes over time.
Detection of motion is the first stage of visual processing, and it happens thanks to specialized neural processes, which respond to information regarding local changes of intensity of images over time. Motion is sensed independently of other image properties at all locations in the image. It has been proven that motion sensors exist, and they operate locally at all points in the image. Motion sensors are dedicated neuronal sensors located in the retina that are capable of detecting a motion produced by two brief and small light flashes that are so close together that they could not be detected by feature tracking. There exist three main models that attempt to describe the way that these specialized sensors work. These models are independent of one another, and they try to model specific characteristics of Motion Perception. Although there is not sufficient evidence to support that any of these models represent the way the visual system (motion sensors particularly) perceives motion, they still correctly model certain functions of these sensors.
The Reichardt Detector
The Reichardt Detector is used to model how motion sensors respond to first-order motion signals. When an object moves from point A in the visual field to point B, two signals are generated: one before the movement began and another after the movement has completed. The model detects this motion by registering a change in luminance at one point on the retina and correlating it with a change in luminance at a nearby point after a short delay. The Reichardt Detector operates on the principle of correlation (a statistical relation that involves dependency): it interprets a motion signal by spatiotemporal correlation of luminance signals at neighboring points. It uses the fact that two receptive fields at different points on the trajectory of a moving object receive time-shifted versions of the same signal: a luminance pattern moves along an axis, and the signal at one point on the axis is a time-shifted version of the signal at a previous point. The Reichardt Detector model has two spatially separate neighboring detectors. Their output signals are multiplied (correlated) in the following way: the signal of one detector is multiplied by the time-shifted signal of the other. The same procedure is repeated for the reverse direction of motion (the signal that was time-shifted becomes the undelayed signal and vice versa). Then the difference between these two products is taken; its sign indicates the direction of motion, and its magnitude varies with speed. The response of the detector depends on the stimulus’ phase, contrast and speed, so many detectors tuned to different speeds are necessary to encode the true speed of the pattern. The most compelling experimental evidence for this kind of detector comes from studies of direction discrimination of barely visible targets.
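The delay-and-multiply scheme described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a physiological model: the delay is a pure shift by a whole number of samples, and the function name `reichardt_response` and its parameters are chosen here for illustration.

```python
import numpy as np

def reichardt_response(left, right, delay):
    """Opponent correlation of two neighbouring luminance signals.

    `delay` is a whole number of samples (at least 1). A positive result
    indicates motion from the `left` detector towards the `right` one,
    a negative result indicates motion in the reverse direction.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Delay each input by `delay` samples, padding the start with zeros.
    left_delayed = np.concatenate([np.zeros(delay), left[:-delay]])
    right_delayed = np.concatenate([np.zeros(delay), right[:-delay]])
    # Multiply the delayed signal of one detector with the undelayed other.
    rightward = left_delayed * right
    leftward = right_delayed * left
    # The difference of the two products is the detector output.
    return float(np.sum(rightward - leftward))

# A brief flash reaches the left detector at sample 2 and the right one at sample 5.
left = np.zeros(10)
right = np.zeros(10)
left[2] = 1.0
right[5] = 1.0
forward = reichardt_response(left, right, delay=3)   # flash travels left to right
backward = reichardt_response(right, left, delay=3)  # same flashes, detectors swapped
```

In this example the travel time of the flash matches the delay exactly, so `forward` is positive and `backward` has the same magnitude with the opposite sign. With a mismatched delay the products vanish, which illustrates why detectors tuned to different speeds are needed.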
The Motion Energy Filter is a model of motion sensors based on the principle of phase-invariant filters. This model builds spatio-temporal filters oriented in space-time to match the structure of moving patterns. It starts from separable filters, whose spatial profiles keep the same shape over time but are scaled by the value of the temporal filters. Motion Energy Filters match the structure of moving patterns by adding separable filters together. For each direction of motion, two space-time filters are generated: one that is symmetric (bar-like) and one that is asymmetric (edge-like). The sum of the squared outputs of these filters is called the motion energy. The difference between the motion energies for the two directions is called the opponent energy. This result is then divided by the squared output of another filter, which is tuned to static contrast, to take into account the effect of contrast on the motion signal. Motion Energy Filters can model a number of motion phenomena, but they produce a phase-independent measurement that increases with speed without giving a reliable value of speed.
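The following Python sketch illustrates the opponent-energy computation for one spatial dimension plus time. It is a simplified illustration under stated assumptions: the filters are Gabor functions with a Gaussian envelope (a choice made here, not prescribed by the text), the response is evaluated at a single location, and the final division by a static-contrast filter is omitted. The function name `opponent_energy` and all parameter values are hypothetical.

```python
import numpy as np

def opponent_energy(stimulus, fx, ft, sigma_x=4.0, sigma_t=4.0):
    """Opponent motion energy of a space-time stimulus of shape (T, X).

    fx, ft: spatial and temporal frequency of the filters (cycles per sample).
    Positive result: rightward motion (towards larger x); negative: leftward.
    """
    stimulus = np.asarray(stimulus, dtype=float)
    T, X = stimulus.shape
    t = np.arange(T)[:, None] - T // 2
    x = np.arange(X)[None, :] - X // 2
    envelope = np.exp(-x**2 / (2 * sigma_x**2) - t**2 / (2 * sigma_t**2))
    energy = {}
    for direction, sign in (("right", -1.0), ("left", 1.0)):
        # cos(a + b) = cos(a)cos(b) - sin(a)sin(b): each oriented filter is
        # a sum of two space-time separable filters.
        phase = 2 * np.pi * (fx * x + sign * ft * t)
        even = envelope * np.cos(phase)   # symmetric (bar-like) filter
        odd = envelope * np.sin(phase)    # asymmetric (edge-like) filter
        # Sum of squared filter outputs: motion energy for this direction.
        energy[direction] = np.sum(even * stimulus)**2 + np.sum(odd * stimulus)**2
    return energy["right"] - energy["left"]

# Drifting gratings: x - t drifts to the right, x + t drifts to the left.
t = np.arange(32)[:, None]
x = np.arange(32)[None, :]
rightward = np.cos(2 * np.pi * (0.1 * x - 0.1 * t))
leftward = np.cos(2 * np.pi * (0.1 * x + 0.1 * t))
e_right = opponent_energy(rightward, fx=0.1, ft=0.1)
e_left = opponent_energy(leftward, fx=0.1, ft=0.1)
```

Squaring and summing the outputs of the even and odd filters is what makes the measurement largely independent of the phase of the grating, so shifting the grating sideways should leave the sign of the result unchanged.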
The Spatiotemporal Gradients model of motion sensors was originally developed in the field of computer vision. It is based on the principle that the ratio of the temporal derivative of image brightness to the spatial derivative of image brightness gives the speed of motion. It is important to note that at the peaks and troughs of the image this model will not compute an adequate answer, because the derivative in the denominator is zero. To solve this problem, higher-order derivatives with respect to space and time can be analyzed in addition to the first-order ones. Spatiotemporal Gradients is a good model for determining the speed of motion at all points in the image.
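As a rough numerical illustration of the gradient principle, the Python sketch below estimates speed from finite-difference derivatives of a one-dimensional pattern moving at constant speed. For a pattern I(x, t) = f(x - vt), the temporal derivative equals -v times the spatial derivative, so the sketch uses the negative ratio as the speed estimate. The function name `gradient_speed`, the threshold `eps` and the test pattern are assumptions made for this example; points where the spatial derivative is close to zero are simply marked as undefined rather than handled with higher-order derivatives.

```python
import numpy as np

def gradient_speed(stimulus, dx=1.0, dt=1.0, eps=1e-3):
    """Estimate the speed at every point of a space-time stimulus of shape (T, X)."""
    stimulus = np.asarray(stimulus, dtype=float)
    d_dt = np.gradient(stimulus, dt, axis=0)   # temporal derivative
    d_dx = np.gradient(stimulus, dx, axis=1)   # spatial derivative
    valid = np.abs(d_dx) > eps                 # peaks and troughs are excluded
    speed = np.full(stimulus.shape, np.nan)
    np.divide(-d_dt, d_dx, out=speed, where=valid)
    return speed

# A slowly varying sinusoidal pattern drifting at 2 samples per time step.
t = np.arange(50)[:, None]
x = np.arange(200)[None, :]
pattern = np.sin(2 * np.pi * 0.01 * (x - 2.0 * t))
estimate = gradient_speed(pattern)
typical = np.nanmedian(estimate[1:-1, 1:-1])   # ignore one-sided border differences
```

For this smooth pattern the estimate is close to the true speed of 2 wherever it is defined, while the entries at the peaks and troughs are left as NaN, which is the failure case mentioned in the text.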
Motion Sensors are Orientation-Selective
One of the properties of Motion Sensors is orientation-selectivity, which constrains motion analysis to a single dimension. Motion sensors can only record motion in one dimension along an axis orthogonal to the sensor’s preferred orientation. A stimulus that contains features of a single orientation can only be seen to move in a direction orthogonal to the stimulus’ orientation. One-dimensional motion signals give ambiguous information about the motion of two-dimensional objects. A second stage of motion analysis is necessary in order to resolve the true direction of motion of a 2-D object or pattern. 1-D motion signals from sensors tuned to different orientations are combined to produce an unambiguous 2-D motion signal. Analysis of 2-D motion depends on signals from local broadly oriented sensors as well as on signals from narrowly oriented sensors.
Another way in which we perceive motion is through Feature Tracking. Feature Tracking consists of analyzing whether or not the local features of an object have changed positions, and inferring movement from this change. In this section, some features about Feature trackers are mentioned.
Feature trackers fail when a moving stimulus is presented very briefly. Feature trackers have the advantage over motion sensors that they can perceive the movement of an object even if the movement is separated by intermittent blank intervals. They can also separate these two stages (movements and blank intervals). Motion sensors, on the other hand, would just integrate the blanks with the moving stimulus and see a continuous movement. Feature trackers operate on the locations of identified features. For that reason, they have a minimum distance threshold that matches the precision with which the locations of features can be discriminated. Feature trackers do not show motion aftereffects, which are visual illusions caused by visual adaptation. Motion aftereffects occur when, after observing a moving stimulus, a stationary object appears to be moving in the direction opposite to that of the previously observed stimulus. It is impossible for feature trackers to monitor multiple motions in different parts of the visual field at the same time. Multiple motions are not a problem for motion sensors, however, because they operate in parallel across the entire visual field.
Experiments have been conducted using the information above to reach interesting conclusions about feature trackers. Experiments with brief stimuli have shown that color patterns and contrast patterns at high contrasts are not perceived by feature trackers but by motion sensors. Experiments with blank intervals have confirmed that feature tracking can occur with blank intervals in the display. It is only at high contrast that motion sensors perceive the motion of chromatic stimuli and contrast patterns. At low contrasts feature trackers analyze the motion of both chromatic patterns and contrast envelopes and at high contrasts motion sensors analyze contrast envelopes. Experiments in which subjects make multiple motion judgments suggest that feature tracking is a process that occurs under conscious control and that it is the only way we have to analyze the motion of contrast envelopes in low-contrast displays. These results are consistent with the view that the motion of contrast envelopes and color patterns depends on feature tracking except when colors are well above threshold or mean contrast is high. The main conclusion of these experiments is that it is probably feature tracking that allows perception of contrast envelopes and color patterns.
As a consequence of the process in which Motion detection works, some static images might seem to us like they are moving. These images give an insight into the assumptions that the visual system makes, and are called visual illusions.
A famous Motion Illusion related to first-order motion signals is the Phi phenomenon, which is an optical illusion that makes us perceive movement instead of a sequence of images. This motion illusion allows us to watch movies as a continuum and not as separate images. The Phi phenomenon allows a group of frozen images that are changed at a constant speed to be seen as a constant movement. The Phi phenomenon should not be confused with the Beta Movement, because the former is an apparent movement caused by luminous impulses in a sequence, while the latter is an apparent movement caused by luminous stationary impulses.
Motion Illusions happen when Motion Perception, Motion Analysis and the interpretation of these signals are misleading, and our visual system creates illusions about motion. These illusions can be classified according to the process that allows them to happen: motion sensing, 2D integration, or 3D interpretation.
The most popular illusions concerning motion sensing are four-stroke motion, RDKs and second order motion signals illusions. The most popular motion illusions concerning 2D integration are Motion Capture, Plaid Motion and Direct Repulsion. Similarly, the ones concerning 3D interpretation are Transformational Motion, Kinetic Depth, Shadow Motion, Biological Motion, Stereokinetic motion, Implicit Figure Motion and 2 Stroke Motion. There are far more Motion Illusions, and they all show something interesting regarding human Motion Detection, Perception and Analysis mechanisms. For more information, visit the following link: http://www.lifesci.sussex.ac.uk/home/George_Mather/Motion/
Although we still do not understand most of the specifics regarding Motion Perception, understanding the mechanisms by which motion is perceived as well as motion illusion can give the reader a good overview of the state of the art in the subject. Some of the open problems regarding Motion Perception are the mechanisms of formation of 3D images in global motion and the Aperture Problem.
Global motion signals from the retina are integrated to arrive at a 2-dimensional global motion signal; however, it is unclear how 3D global motion is formed. The Aperture Problem occurs because each receptive field in the visual system covers only a small piece of the visual world, which leads to ambiguities in perception. The aperture problem refers to the problem of a moving contour that, when observed locally, is consistent with different possibilities of motion. This ambiguity is geometric in origin: motion parallel to the contour cannot be detected, as changes to this component of the motion do not change the images observed through the aperture. The only component that can be measured is the velocity orthogonal to the contour orientation; for that reason, the velocity of the movement could be anything from the family of motions along a line in velocity space. This aperture problem is observed not only in straight contours, but also in smoothly curved ones, since they are approximately straight when observed locally. Although the mechanisms that solve the Aperture Problem are still unknown, there exist some hypotheses on how it could be solved. For example, it could be possible to resolve this problem by combining information across space or from different contours of the same object.
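The geometry of the aperture problem, and one hypothetical way of resolving it by combining contours, can be sketched in Python as follows. Each contour only reports the component of the velocity along its normal, which defines a line in velocity space; measurements from two or more contours of different orientation can then be combined by least squares. This combination rule is shown purely as an illustration of the idea of combining information from different contours and is not claimed to be the mechanism used by the visual system. All function names are chosen for this example.

```python
import numpy as np

def contour_normal(angle):
    """Unit vector orthogonal to a straight contour oriented at `angle` (radians)."""
    return np.array([-np.sin(angle), np.cos(angle)])

def measured_speed(velocity, angle):
    """The only quantity visible through the aperture: the normal velocity component."""
    return float(contour_normal(angle) @ np.asarray(velocity, dtype=float))

def combine_contours(angles, speeds):
    """Least-squares velocity consistent with the normal speeds of several contours."""
    normals = np.array([contour_normal(a) for a in angles])
    velocity, *_ = np.linalg.lstsq(normals, np.asarray(speeds, dtype=float), rcond=None)
    return velocity

true_velocity = np.array([3.0, 1.0])

# Ambiguity: adding any motion parallel to a horizontal contour changes nothing.
ambiguous = measured_speed(true_velocity, 0.0) == measured_speed(true_velocity + [5.0, 0.0], 0.0)

# Two differently oriented contours of the same object pin down the velocity.
angles = [0.0, np.pi / 3]
speeds = [measured_speed(true_velocity, a) for a in angles]
recovered = combine_contours(angles, speeds)
```

With a single contour the system of equations is underdetermined, which is the aperture problem itself; with two non-parallel contours the two constraint lines intersect in a single point of velocity space.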
In this chapter, we introduced Motion Perception and the mechanisms by which our visual system detects motion. Motion Illusions showed how Motion signals can be misleading, and consequently lead to incorrect conclusions about motion. It is important to remember that Motion Perception and Motion Analysis are not the same. Motion Sensors and Feature trackers complement each other to make the visual system perceive motion.
Motion Perception is complex, and it is still an open area of research. This chapter describes models of the way that Motion Sensors function, and hypotheses about the characteristics of Feature trackers; however, more experiments are necessary to learn about the characteristics of these mechanisms and to construct models that resemble the actual processes of the visual system more accurately.
The variety of mechanisms of motion analysis and motion perception described in this chapter, as well as the sophistication of the artificial models designed to describe them, demonstrate that there is much complexity in the way in which the cortex processes signals from the outside environment. Thousands of specialized neurons integrate and interpret pieces of local signals to form global images of moving objects in our brain. Understanding that so many actors and processes in our bodies must work in concert to perceive motion makes it all the more remarkable that we as humans are able to do it with such ease.
Humans (together with other primates such as monkeys and gorillas) have the best color perception among mammals. Hence, it is not a coincidence that color plays an important role in a wide variety of aspects. For example, color is useful for discriminating and differentiating objects, surfaces, natural scenery, and even faces. Color is also an important tool for nonverbal communication, including that of emotion.
For many decades, it has been a challenge to find the links between the physical properties of color and its perceptual qualities. Usually, these are studied under two different approaches: the behavioral response caused by color (also called psychophysics) and the actual physiological response caused by it.
Here we will only focus on the latter. The study of the physiological basis of color vision, about which practically nothing was known before the second half of the twentieth century, has advanced slowly and steadily since 1950. Important progress has been made in many areas, especially at the receptor level. Thanks to molecular biology methods, it has been possible to reveal previously unknown details concerning the genetic basis for the cone pigments. Furthermore, more and more cortical regions have been shown to be influenced by visual stimuli, although the correlation of color perception with wavelength-dependent physiological activity beyond the receptors is not so easy to discern.
In this chapter, we aim to explain the basics of the different processes of color perception along the visual path, from the retina in the eye to the visual cortex in the brain. For anatomical details, please refer to Sec. "Anatomy of the Visual System" of this Wikibook.
Color Perception at the Retina
All colors that can be discriminated by humans can be produced by the mixture of just three primary (basic) colors. Inspired by this idea of color mixing, it has been proposed that color is subserved by three classes of sensors, each having a maximal sensitivity to a different part of the visible spectrum. It was first explicitly proposed in 1853 that there are three degrees of freedom in normal color matching. This was later confirmed in 1886 (with results remarkably close to those of recent studies).
These proposed color sensors are actually the so-called cones. (Note: In this chapter, we will only deal with cones. Rods contribute to vision only at low light levels. Although they are known to have an effect on color perception, their influence is very small and can be ignored here.) Cones are one of the two types of photoreceptor cells found in the retina, with a significant concentration of them in the fovea. The Table below lists the three types of cone cells. These are distinguished by different types of photopigment (opsin). Their corresponding absorption curves are shown in the Figure below.
|Name||Higher sensitivity to color||Absorption curve peak [nm]|
|S, SWS, B||Blue||420|
|M, MWS, G||Green||530|
|L, LWS, R||Red||560|
Although no consensus has been reached for naming the different cone types, the most widely utilized designations refer either to their action spectra peak or to the color to which they are sensitive themselves (red, green, blue). In this text, we will use the S-M-L designation (for short, medium, and long wavelength), since these names are more appropriately descriptive. The blue-green-red nomenclature is somewhat misleading, since all types of cones are sensitive to a large range of wavelengths.
An important feature of the three cone types is their relative distribution in the retina. It turns out that the S-cones are present at a relatively low concentration throughout the retina and are completely absent from the most central area of the fovea. In fact, they are too widely spaced to play an important role in spatial vision, although they are capable of mediating weak border perception. The fovea is dominated by L- and M-cones. The proportion of these two types is usually measured as a ratio. Different values have been reported for the L/M ratio, ranging from 0.67 up to 2, the latter being the most accepted. Why L-cones almost always outnumber the M-cones remains unclear. Surprisingly, the relative cone ratio has almost no significant impact on color vision. This clearly shows that the brain is plastic, capable of making sense out of whatever cone signals it receives.
It is also important to note the overlap of the L- and M-cone absorption spectra. While the S-cone absorption spectrum is clearly separated, the L- and M-cone peaks are only about 30 nm apart, and their spectral curves overlap significantly. This results in a high correlation in the photon catches of these two cone classes. This is explained by the fact that, in order to achieve the highest possible acuity at the center of the fovea, the visual system treats L- and M-cones equally, not taking into account their absorption spectra. Therefore, any kind of difference leads to a deterioration of the luminance signal. In other words, the small separation between L- and M-cone spectra might be interpreted as a compromise between the needs for high-contrast color vision and high-acuity luminance vision. This is congruent with the lack of S-cones in the central part of the fovea, where visual acuity is highest. Furthermore, the close spacing of the L- and M-cone absorption spectra might also be explained by their genetic origin. Both cone types are assumed to have evolved "recently" (about 35 million years ago) from a common ancestor, while the S-cones presumably split off from the ancestral receptor much earlier.
The spectral absorption functions of the three different types of cone cells are the hallmark of human color vision. This theory solved a long-known problem: although we can see millions of different colors (humans can distinguish between 7 and 10 million different colors), our retinas simply do not have enough space to accommodate an individual detector for every color at every retinal location.
From the Retina to the Brain
The signals that are transmitted from the retina to higher levels are not simple point-wise representations of the receptor signals, but rather consist of sophisticated combinations of the receptor signals. The objective of this section is to provide a brief overview of the paths that some of this information takes.
Once the optical image on the retina is transduced into chemical and electrical signals in the photoreceptors, the amplitude-modulated signals are converted into frequency-modulated representations at the ganglion-cell and higher levels. In these neural cells, the magnitude of the signal is represented in terms of the number of spikes of voltage per second fired by the cell rather than by the voltage difference across the cell membrane. In order to explain and represent the physiological properties of these cells, we will find the concept of receptive fields very useful.
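The conversion from an amplitude code to a frequency code can be illustrated with a very simple Python sketch. It assumes a perfect integrate-and-fire unit, which is a textbook abstraction chosen here for illustration and not a model of any particular ganglion cell; the function name `spike_count` and the `gain` parameter are hypothetical.

```python
def spike_count(amplitude, duration=1.0, dt=1e-3, gain=100.0):
    """Count the spikes of a perfect integrate-and-fire unit.

    `amplitude` stands for the graded input signal; the unit accumulates
    charge in proportion to it and fires whenever the charge reaches 1.
    """
    n_steps = int(round(duration / dt))
    charge = 0.0
    spikes = 0
    for _ in range(n_steps):
        charge += gain * amplitude * dt   # integrate the graded input
        if charge >= 1.0:                 # threshold reached: emit a spike
            spikes += 1
            charge -= 1.0                 # reset, keeping the overshoot
    return spikes

# Larger input amplitudes are represented by more spikes per second.
counts = [spike_count(a) for a in (0.1, 0.5, 1.0)]
```

In this sketch the spike count per second grows roughly in proportion to the input amplitude, so the magnitude of the signal is carried by the firing rate rather than by a voltage level.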
A receptive field is a graphical representation of the area in the visual field to which a given cell responds. Additionally, the nature of the response is typically indicated for various regions in the receptive field. For example, we can consider the receptive field of a photoreceptor as a small circular area representing the size and location of that particular receptor's sensitivity in the visual field. The Figure below shows exemplary receptive fields for ganglion cells, typically in a center-surround antagonism. The left receptive field in the figure illustrates a positive central response (known as on-center). This kind of response is usually generated by a positive input from a single cone surrounded by a negative response generated from several neighboring cones. Therefore, the response of this ganglion cell would be made up of inputs from various cones with both positive and negative signs. In this way, the cell not only responds to points of light, but serves as an edge (or more correctly, a spot) detector. In analogy to computer vision terminology, we can think of the ganglion cell responses as the output of a convolution with an edge-detector kernel. The right receptive field in the figure illustrates a negative central response (known as off-center), which is equally likely. Usually, on-center and off-center cells will occur at the same spatial location, fed by the same photoreceptors, resulting in an enhanced dynamic range.
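The convolution analogy can be made concrete with a small Python sketch. It models the center-surround receptive field as a difference of two Gaussians, which is a common modelling choice assumed here for illustration; the kernel size, the two widths and the function names are not taken from the text.

```python
import numpy as np

def on_center_kernel(size=9, sigma_center=1.0, sigma_surround=2.0):
    """Difference-of-Gaussians kernel: positive center, negative surround."""
    ax = np.arange(size) - size // 2
    r2 = ax[:, None]**2 + ax[None, :]**2
    center = np.exp(-r2 / (2 * sigma_center**2))
    surround = np.exp(-r2 / (2 * sigma_surround**2))
    # Normalise both lobes so that excitation and inhibition are balanced.
    return center / center.sum() - surround / surround.sum()

def cell_response(patch, kernel):
    """Response of one cell: weighted sum of the inputs in its receptive field."""
    return float(np.sum(np.asarray(patch, dtype=float) * kernel))

kernel = on_center_kernel()
uniform = np.ones((9, 9))          # uniform illumination of the whole field
spot = np.zeros((9, 9))
spot[4, 4] = 1.0                   # small spot of light on the center

r_uniform = cell_response(uniform, kernel)
r_spot_on = cell_response(spot, kernel)
r_spot_off = cell_response(spot, -kernel)   # off-center cell: inverted kernel
```

Because the positive and negative parts of the kernel cancel, uniform illumination produces essentially no response, whereas a spot on the center excites the on-center cell and inhibits the off-center one.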
The lower Figure shows that in addition to spatial antagonism, ganglion cells can also have spectral opponency. For instance, the left part of the lower figure illustrates a red-green opponent response, with the center fed by positive input from an L-cone and the surround fed by negative input from M-cones. The right part of the lower figure illustrates the off-center version of this cell. Hence, before the visual information has even left the retina, processing has already occurred, with a profound effect on color appearance. There are other types and varieties of ganglion cell responses, but they all share these basic concepts.
On their way to the primary visual cortex, ganglion cell axons gather to form the optic nerve, which projects to the lateral geniculate nucleus (LGN) in the thalamus. Coding in the optic nerve is highly efficient, keeping the number of nerve fibers to a minimum (limited by the size of the optic nerve) and thereby keeping the retinal blind spot as small as possible (approximately 5° wide by 7° high). Furthermore, the ganglion cells presented above have no response to uniform illumination, since the positive and negative areas are balanced. In other words, the transmitted signals are uncorrelated. For example, information from neighboring parts of natural scenes is highly correlated spatially and therefore highly predictable. Lateral inhibition between neighboring retinal ganglion cells minimizes this spatial correlation, thereby improving efficiency. We can see this as a process of image compression carried out in the retina.
Given the overlap of the L- and M-cone absorption spectra, their signals are also highly correlated. In this case, coding efficiency is improved by combining the cone signals in order to minimize this correlation. We can understand this more easily using Principal Component Analysis (PCA). PCA is a statistical method used to reduce the dimensionality of a given set of variables by transforming the original variables into a set of new variables, the principal components (PCs). The first PC accounts for a maximal amount of the total variance in the original variables, the second PC accounts for a maximal amount of the variance that was not accounted for by the first component, and so on. In addition, PCs are linearly independent and orthogonal to each other in the parameter space. PCA's main advantage is that only a few of the strongest PCs are enough to cover the vast majority of the system variability. This scheme has been used with the cone absorption functions and even with naturally occurring spectra. The PCs that were found in the space of cone excitations produced by natural objects are 1) a luminance axis where the L- and M-cone signals are added (L+M), 2) the difference of the L- and M-cone signals (L-M), and 3) a color axis where the S-cone signal is differenced with the sum of the L- and M-cone signals (S-(L+M)). These channels, derived from a mathematical/computational approach, coincide with the three retino-geniculate channels discovered in electrophysiological experiments. Using these mechanisms, redundant visual information is eliminated in the retina.
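The way PCA turns two correlated cone signals into an additive (L+M) and a differencing (L-M) axis can be illustrated with a minimal sketch. The data are synthetic and purely hypothetical (a strong shared term plus a weak opposing term); they are not measured cone excitations. The 2 x 2 eigen-decomposition is written out by hand so that only the standard library is needed.

```python
import math
import random

random.seed(0)

# Synthetic cone excitations (hypothetical data): a shared "luminance" term
# plus a much smaller term that pushes L and M in opposite directions.
samples = []
for _ in range(5000):
    luminance = random.gauss(0.0, 1.0)
    chromatic = random.gauss(0.0, 1.0)
    samples.append((luminance + 0.1 * chromatic, luminance - 0.1 * chromatic))

def covariance_2x2(points):
    """Sample variances and covariance of the (L, M) pairs."""
    n = len(points)
    mean_l = sum(p[0] for p in points) / n
    mean_m = sum(p[1] for p in points) / n
    var_l = sum((p[0] - mean_l) ** 2 for p in points) / (n - 1)
    var_m = sum((p[1] - mean_m) ** 2 for p in points) / (n - 1)
    cov_lm = sum((p[0] - mean_l) * (p[1] - mean_m) for p in points) / (n - 1)
    return var_l, cov_lm, var_m

def principal_components(var_l, cov_lm, var_m):
    """Eigen-decomposition of [[var_l, cov_lm], [cov_lm, var_m]].
    Returns [(variance, unit_vector), ...], largest variance first.
    Assumes cov_lm != 0."""
    mid = (var_l + var_m) / 2.0
    spread = math.sqrt(((var_l - var_m) / 2.0) ** 2 + cov_lm ** 2)
    components = []
    for eigenvalue in (mid + spread, mid - spread):
        vx, vy = cov_lm, eigenvalue - var_l   # solves (A - eigenvalue * I) v = 0
        norm = math.hypot(vx, vy)
        components.append((eigenvalue, (vx / norm, vy / norm)))
    return components

(pc1_var, pc1), (pc2_var, pc2) = principal_components(*covariance_2x2(samples))
explained = pc1_var / (pc1_var + pc2_var)

print("PC1 weights (L, M):", tuple(round(w, 2) for w in pc1))
print("PC2 weights (L, M):", tuple(round(w, 2) for w in pc2))
print("share of variance on PC1:", round(explained, 3))
```

In this toy data set the first component weights L and M with the same sign (an L+M-like axis) and carries almost all of the variance, while the second weights them with opposite signs (an L-M-like axis).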
Three channels actually carry this information from the retina through the ganglion cells to the LGN. They differ not only in their chromatic properties, but also in their anatomical substrate. These channels pose important limitations for basic color tasks, such as detection and discrimination.
In the first channel, the output of L- and M-cones is transmitted synergistically to diffuse bipolar cells and then to cells in the magnocellular layers (M-) of the LGN (not to be confused with the M-cones of the retina). The receptive fields of the M-cells are composed of a center and a surround, which are spatially antagonistic. M-cells have high contrast sensitivity for luminance stimuli, but they show no response at some combination of L-M opponent inputs. However, because the null points of different M-cells vary slightly, the population response is never really zero. This property is actually passed on to cortical areas with predominant M-cell inputs.
The parvocellular pathway (P-) originates with the individual outputs from L- or M-cones to midget bipolar cells. These provide input to retinal P-cells. In the fovea, the receptive field centers of P-cells are formed by single L- or M-cones. The structure of the P-cell receptive field surround is still debated. However, the most accepted theory states that the surround consists of a specific cone type, resulting in a spatially opponent receptive field for luminance stimuli. Parvocellular layers contribute about 80 % of the total projections from the retina to the LGN.
Finally, the recently discovered koniocellular pathway (K-) carries mostly signals from S-cones. Groups of this type of cones project to special bipolar cells, which in turn provide input to specific small ganglion cells. These are usually not spatially opponent. The axons of the small ganglion cells project to thin layers of the LGN (adjacent to parvocellular layers).
The ganglion cells terminate at the LGN (making synapses with LGN cells), and there appears to be a one-to-one correspondence between ganglion cells and LGN cells. The LGN therefore appears to act as a relay station for the signals. However, it probably serves some visual function, since there are neural projections from the cortex back to the LGN that could serve as some type of switching or adaptation feedback mechanism. The axons of LGN cells project to visual area one (V1) in the visual cortex in the occipital lobe.
Color Perception at the Brain
In the cortex, the projections from the magno-, parvo-, and koniocellular pathways end in different layers of the primary visual cortex. The magnocellular fibers innervate principally layer 4Cα and layer 6. Parvocellular neurons project mostly to 4Cβ, and layers 4A and 6. Koniocellular neurons terminate in the cytochrome oxidase (CO-) rich blobs in layers 1, 2, and 3.
Once in the visual cortex, the encoding of visual information becomes significantly more complex. In the same way the outputs of various photoreceptors are combined and compared to produce ganglion cell responses, the outputs of various LGN cells are compared and combined to produce cortical responses. As the signals advance further up in the cortical processing chain, this process repeats itself with a rapidly increasing level of complexity to the point that receptive fields begin to lose meaning. However, some functions and processes have been identified and studied in specific regions of the visual cortex.
In the V1 region (striate cortex), double opponent neurons (neurons whose receptive fields are both chromatically and spatially opponent with respect to the on/off regions of a single receptive field) compare color signals across the visual space. They constitute between 5 and 10 % of the cells in V1. Their coarse size and small percentage match the poor spatial resolution of color vision. Furthermore, they are not sensitive to the direction of moving stimuli (unlike some other V1 neurons) and are, hence, unlikely to contribute to motion perception. However, given their specialized receptive field structure, these cells are the neural basis for color contrast effects, as well as an efficient means to encode color itself. Other V1 cells respond to other types of stimuli, such as oriented edges, various spatial and temporal frequencies, particular spatial locations, and combinations of these features, among others. Additionally, we can find cells that linearly combine inputs from LGN cells as well as cells that perform nonlinear combinations. These responses are needed to support advanced visual capabilities, such as color itself.
There is substantially less information on the chromatic properties of single neurons in V2 as compared to V1. At first glance, it seems that there are no major differences in color coding between V1 and V2. One exception to this is the emergence of a new class of color-complex cell. Therefore, it has been suggested that the V2 region is involved in the elaboration of hue. However, this is still very controversial and has not been confirmed.
Following the modular concept developed after the discovery of functional ocular dominance in V1, and considering the anatomical segregation between the P-, M-, and K-pathways (described in Sec. 3), it was suggested that a specialized system within the visual cortex devoted to the analysis of color information should exist. V4 is the region that has historically attracted the most attention as the possible "color area" of the brain. This is because of an influential study that claimed that V4 contained 100 % of hue-selective cells. However, this claim has been disputed by a number of subsequent studies, some even reporting that only 16 % of V4 neurons show hue tuning. Currently, the most accepted concept is that V4 contributes not only to color, but to shape perception, visual attention, and stereopsis as well. Furthermore, recent studies have focused on other brain regions trying to find the "color area" of the brain, such as TEO and PITd. The relationship of these regions to each other is still debated. To reconcile the discussion, some use the term posterior inferior temporal (PIT) cortex to denote the region that includes V4, TEO, and PITd.
If the cortical response of V1, V2, and V4 cells is already very complicated, the complexity of visual responses in a network of approximately 30 visual areas is enormous. Figure 4 shows a small portion of the connectivity of the different cortical areas (not cells) that have been identified.
At this stage, it becomes exceedingly difficult to explain the function of single cortical cells in simple terms. As a matter of fact, the function of a single cell might not have meaning, since the representation of various perceptions must be distributed across collections of cells throughout the cortex.
Color Vision Adaptation Mechanisms
Although researchers have been trying to explain the processing of color signals in the human visual system, it is important to understand that color perception is not a fixed process. Actually, there are a variety of dynamic mechanisms that serve to optimize the visual response according to the viewing environment. Of particular relevance to color perception are the mechanisms of dark, light, and chromatic adaptation.
Dark adaptation refers to the change in visual sensitivity that occurs when the level of illumination is decreased. The visual system response to reduced illumination is to become more sensitive, increasing its capacity to produce a meaningful visual response even when the light conditions are suboptimal.
Figure 5 shows the recovery of visual sensitivity after transition from an extremely high illumination level to complete darkness. First, the cones become gradually more sensitive, until the curve levels off after a couple of minutes. Then, until approximately 10 minutes have passed, visual sensitivity is roughly constant. At that point, the rod system, which has a longer recovery time, has recovered enough sensitivity to outperform the cones and therefore takes over control of the overall sensitivity. Rod sensitivity gradually improves as well, until it becomes asymptotic after about 30 minutes. In other words, cones are responsible for the sensitivity recovery during the first 10 minutes. Afterwards, rods outperform the cones and reach full sensitivity after approximately 30 minutes.
This is only one of several mechanisms that help the visual system adapt to dark lighting conditions as well as possible. Other mechanisms include the well-known pupil reflex, depletion and regeneration of photopigment, gain control in retinal cells and other higher-level mechanisms, and cognitive interpretation, among others.
Light adaptation is essentially the inverse process of dark adaptation. As a matter of fact, the underlying physiological mechanisms are the same for both processes. However, it is important to consider it separately since its visual properties differ.
Light adaptation occurs when the level of illumination is increased. Therefore, the visual system must become less sensitive in order to produce useful perceptions, given the fact that there is significantly more visible light available. The visual system has a limited output dynamic range available for the signals that produce our perceptions. However, the real world has illumination levels covering at least 10 orders of magnitude. Fortunately, we rarely need to view the entire range of illumination levels at the same time.
At high light levels, adaptation is achieved by photopigment bleaching. This scales photon capture in the receptors and protects the cone response from saturating at bright backgrounds. The mechanisms of light adaptation occur primarily within the retina. As a matter of fact, gain changes are largely cone-specific, and adaptation pools signals over areas no larger than the diameter of individual cones. This points to a localization of light adaptation that may be as early as the receptors. However, there appears to be more than one site of sensitivity scaling. Some of the gain changes are extremely rapid, while others take seconds or even minutes to stabilize. Usually, light adaptation takes around 5 minutes (six times faster than dark adaptation). This might point to the influence of post-receptive sites.
Figure 6 shows examples of light adaptation. If we used a single response function to map the large range of intensities into the visual system's output, then we would only have a very small range at our disposal for a given scene. It is clear that with such a response function, the perceived contrast of any given scene would be limited and visual sensitivity to changes would be severely degraded due to signal-to-noise issues. This case is shown by the dashed line. On the other hand, the solid lines represent families of visual responses. These curves map the useful illumination range in any given scene into the full dynamic range of the visual output, thus resulting in the best possible visual perception for each situation. Light adaptation can be thought of as the process of sliding the visual response curve along the illumination level axis until the optimum level for the given viewing conditions is reached.
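The idea of sliding a response curve along the illumination axis can be sketched numerically. The sketch below uses a Naka-Rushton-style saturating function, which is a modelling choice assumed here and not the function behind Figure 6. The illumination levels, the contrast factor, and the function names are illustrative.

```python
def visual_response(intensity, semi_saturation, exponent=1.0):
    """Naka-Rushton-style saturating response in the range 0..1.
    The output is 0.5 when intensity equals semi_saturation."""
    i_n = intensity ** exponent
    return i_n / (i_n + semi_saturation ** exponent)

def response_span(scene_mean, semi_saturation, contrast=3.0):
    """Share of the output range used by a scene whose intensities span
    scene_mean / contrast .. scene_mean * contrast."""
    high = visual_response(scene_mean * contrast, semi_saturation)
    low = visual_response(scene_mean / contrast, semi_saturation)
    return high - low

scene_levels = [1e-2, 1e2, 1e5]  # hypothetical mean illumination levels, arbitrary units

for level in scene_levels:
    fixed = response_span(level, semi_saturation=1.0)      # one curve for all scenes
    adapted = response_span(level, semi_saturation=level)  # curve slid to the scene mean
    print(f"mean {level:g}: fixed curve uses {fixed:.4f}, adapted curve uses {adapted:.4f}")
```

With the curve centered on the scene mean, the same scene contrast always occupies the same share of the output range. With a single fixed curve, scenes far from the curve's center are squeezed into a sliver of that range.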
The general concept of chromatic adaptation consists of varying the height of the three cone spectral responsivity curves. This adjustment arises because light adaptation occurs independently within each class of cone. A specific formulation of this hypothesis is known as von Kries adaptation. This hypothesis states that the adaptation response takes place in each of the three cone types separately and is equivalent to multiplying their fixed spectral sensitivities by a scaling constant. If the scaling weights (also known as von Kries coefficients) are inversely proportional to the absorption of light by each cone type (i.e. a lower absorption will require a larger coefficient), then von Kries scaling maintains a constant mean response within each cone class. This provides a simple yet powerful mechanism for maintaining the perceived color of objects despite changes in illumination. Under a number of different conditions, von Kries scaling provides a good account of the effects of light adaptation on color sensitivity and appearance.
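Von Kries scaling amounts to one multiplication per cone class, as the following minimal sketch shows. The cone responses to white under the two illuminants are made-up numbers chosen only to have the right tendency (more S-cone signal under daylight, more L-cone signal under incandescent light). The function names are likewise illustrative.

```python
def von_kries_coefficients(white_response):
    """One gain per cone class, inversely proportional to that class's
    response to the illuminant's white."""
    return tuple(1.0 / w for w in white_response)

def adapt(cone_response, coefficients):
    """Scale each cone signal by its own coefficient."""
    return tuple(k * r for k, r in zip(coefficients, cone_response))

# Hypothetical (L, M, S) cone responses to a perfect white under two illuminants.
white_daylight = (0.95, 1.00, 1.20)      # relatively more short-wavelength energy
white_incandescent = (1.25, 1.00, 0.45)  # relatively more long-wavelength energy

def paper_appearance(white_response, reflectance=0.8):
    """Adapted cone signals for a spectrally flat paper that reflects the
    given fraction of the incoming light."""
    raw = tuple(reflectance * w for w in white_response)
    return adapt(raw, von_kries_coefficients(white_response))

k_daylight = von_kries_coefficients(white_daylight)
k_incandescent = von_kries_coefficients(white_incandescent)

print("paper under daylight:    ", paper_appearance(white_daylight))
print("paper under incandescent:", paper_appearance(white_incandescent))
```

Although the raw cone signals differ between the two illuminants, the adapted signals are the same, which is the sense in which the scaling keeps the paper's appearance stable.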
The easiest way to picture chromatic adaptation is by examining a white object under different types of illumination. For example, let's consider examining a piece of paper under daylight, fluorescent, and incandescent illumination. Daylight contains relatively far more short-wavelength energy than fluorescent light, and incandescent illumination contains relatively far more long-wavelength energy than fluorescent light. However, in spite of the different illumination conditions, the paper approximately retains its white appearance under all three light sources. This is because the S-cone system becomes relatively less sensitive under daylight (in order to compensate for the additional short-wavelength energy) and the L-cone system becomes relatively less sensitive under incandescent illumination (in order to compensate for the additional long-wavelength energy).
Since the late 20th century, restoring vision to blind people by means of artificial eye prostheses has been the goal of numerous research groups and some private companies around the world. Similar to cochlear implants, the key concept is to stimulate the visual nervous system with electric pulses, bypassing the damaged or degenerated photoreceptors on the human retina. In this chapter we will describe the basic functionality of a retinal implant, as well as the different approaches that are currently being investigated and developed. The two most common approaches to retinal implants are called “epiretinal” and “subretinal” implants, corresponding to eye prostheses located on top of or behind the retina, respectively. We will not cover any non-retina related approaches to restoring vision, such as the BrainPort Vision System that aims at stimulating the tongue from visual input, cuff electrodes around the optic nerve, or stimulation implants in the primary visual cortex.
Retinal Structure and Functionality
Figure 1 depicts the schematic nervous structure of the human retina. We can differentiate between three layers of cells. The first, located furthest away from the eye lens, consists of the photoreceptors (rods and cones), whose purpose is to transduce incoming light into electrical signals that are then propagated to the intermediate layer, which is mainly composed of bipolar cells. These bipolar cells, which are connected to photoreceptors as well as to other cell types such as horizontal cells and amacrine cells, pass on the electrical signal to the retinal ganglion cells (RGCs). For a detailed description of the functionality of bipolar cells, specifically with respect to their subdivision into ON- and OFF-bipolar cells, refer to the chapter on the Visual System. The uppermost layer, consisting of RGCs, collects the electrical signals from the intermediate layer and passes them on to the thalamus via the optic nerve. From there, signals are propagated to the primary visual cortex. There are some key aspects worth mentioning about the signal processing within the human retina. First, while bipolar cells, as well as horizontal and amacrine cells, generate graded potentials, the RGCs generate action potentials instead. Further, the density of each cell type is not uniform across the retina. There is an extremely high density of photoreceptors in the area of the fovea, where in addition only very few photoreceptors are connected to each RGC via the intermediate layer. In the peripheral areas of the retina, a far lower density of photoreceptors is found, with many photoreceptors connected to a single RGC. The latter also has direct implications for the receptive field of an RGC, as it tends to increase rapidly towards the outer regions of the retina, simply because of the lower photoreceptor density and the increased number of photoreceptors connected to the same RGC.
Implant Use Case
Damage to the photoreceptor layer in the human retina can be caused by retinitis pigmentosa, age-related macular degeneration and other diseases, eventually causing the affected person to become blind. However, the rest of the visual nervous system, both inside the retina and along the visual pathway in the brain, remains intact for several years after the onset of blindness. This allows artificial stimulation of the remaining, still properly functioning retinal cells through electrodes, in order to restore visual information for the patient. A retinal prosthesis can be implanted behind the retina, in which case it is referred to as a subretinal implant. This brings the electrodes closest to the damaged photoreceptors and the still properly functioning bipolar cells, which are the real stimulation target here. (If the stimulation electrodes penetrate the choroid, which contains the blood supply of the retina, the implants are sometimes called "suprachoroidal" implants.) Alternatively, the implant may be placed on top of the retina, closest to the ganglion cell layer, aiming at stimulation of the RGCs instead. These implants are referred to as epiretinal implants. Both approaches are currently being investigated by several research groups. They both have significant advantages as well as drawbacks. Before we treat them separately in more detail, we describe some key challenges that need consideration in both cases.
A big challenge for retinal implants comes from the extremely high spatial density of nerve cells in the human retina. There are roughly 125 million photoreceptors (rods and cones) and 1.5 million ganglion cells in the human retina, as opposed to only approximately 15000 hair cells in the human cochlea. In the fovea, where the highest visual acuity is achieved, as many as 150000 cones are located within one square millimeter. While there are far fewer RGCs in total than photoreceptors, their density in the foveal area is close to the density of cones, imposing a tremendous challenge in addressing the nerve cells with high enough spatial resolution using artificial electrodes. Virtually all current scientific experiments with retinal implants use micro-electrode arrays (MEAs) to stimulate the retinal cells. High resolution MEAs achieve an inter-electrode spacing of roughly 50 micrometers, resulting in an electrode density of 400 electrodes per square millimeter. Therefore, a one-to-one association between electrodes and photoreceptors or RGCs, respectively, is impossible in the foveal area with conventional electrode technology. However, the spatial density of both photoreceptors and RGCs decreases quickly towards the outer regions of the retina, making one-to-one stimulation between electrodes and peripheral nerve cells more feasible. Another challenge is operating the electrodes within safe limits. Imposing charge densities above 0.1 mC/cm² may damage the nervous tissue. Generally, the further a cell is away from the stimulating electrode, the larger the current amplitude required to stimulate the cell. Furthermore, the lower the stimulation threshold, the smaller the electrodes can be designed and the more densely they can be placed on the MEA, thereby enhancing the spatial stimulation resolution.
Stimulation threshold is defined as the minimal stimulation strength necessary to trigger a nervous response in at least 50% of the stimulation pulses. For these reasons, a primary goal in designing retinal implants is to use as low a stimulation current as possible while still guaranteeing a reliable stimulation (i.e. generation of an action potential in the case of RGCs) of the target cell. This can be achieved either by placing the electrode as close as possible to the area of the target cell that reacts most sensitively to an applied electric field pulse, or by making the cell projections, i.e. dendrites and/or axons, grow on top of the electrode, allowing stimulation of the cell with very low currents even if the cell body is located far away. Further, an implant fixed to the retina automatically follows the movements of the eyeball. While this entails some significant benefits, it also means that any connection to the implant (for adjusting parameters, reading out data, or providing external power for the stimulation) requires a cable that moves with the implant. As we move our eyes approximately three times a second, this exposes the cable and the connections involved to severe mechanical stress. For a device that should remain functional for an entire lifetime without external intervention, this imposes a severe challenge on the materials and technologies involved.
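The two quantitative constraints mentioned above, electrode density and charge density, follow from simple arithmetic that can be sketched in a few lines. The electrode spacing, the foveal cone density, and the 0.1 mC/cm² limit are taken from the text. The pulse amplitude, the pulse duration, the electrode diameter, and the assumption of a flat circular electrode on a square grid are hypothetical and chosen only for illustration.

```python
import math

def electrode_density_per_mm2(spacing_um):
    """Electrodes per square millimeter for a square grid with the given pitch."""
    per_mm = 1000.0 / spacing_um
    return per_mm ** 2

def charge_density_mc_per_cm2(current_ua, pulse_us, electrode_diameter_um):
    """Charge delivered per pulse divided by the area of a flat, circular electrode."""
    charge_mc = (current_ua * 1e-6) * (pulse_us * 1e-6) * 1e3  # A * s = C, then C -> mC
    radius_cm = (electrode_diameter_um / 2.0) * 1e-4           # micrometers -> cm
    return charge_mc / (math.pi * radius_cm ** 2)

SAFETY_LIMIT = 0.1             # mC/cm^2, value quoted in the text
FOVEAL_CONES_PER_MM2 = 150000  # value quoted in the text

density = electrode_density_per_mm2(50.0)
cones_per_electrode = FOVEAL_CONES_PER_MM2 / density

# Hypothetical pulse: 1 microampere for 100 microseconds on a 10 micrometer electrode.
example = charge_density_mc_per_cm2(1.0, 100.0, 10.0)

print("electrodes per mm^2:", density)
print("foveal cones per electrode:", cones_per_electrode)
print("charge density (mC/cm^2):", round(example, 3), "exceeds limit:", example > SAFETY_LIMIT)
```

Under these assumptions, each electrode of a 50 micrometer grid covers several hundred foveal cones. Even a small current can exceed the charge-density limit once the electrode is made small, which is why low stimulation thresholds matter for dense arrays.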
As the name already suggests, subretinal implants are visual prostheses located behind the retina. The implant is therefore located closest to the damaged photoreceptors, aiming at bypassing the rods and cones and stimulating the bipolar cells in the next nervous layer of the retina. The main advantage of this approach is that relatively little visual signal processing takes place between the photoreceptors and the bipolar cells, so there is little that needs to be imitated by the implant. That is, raw visual information, for example captured by a video camera, may be forwarded directly, or with only relatively rudimentary signal processing, to the MEA stimulating the bipolar cells, rendering the procedure rather simple from a signal processing point of view. However, this approach has some severe disadvantages. The high spatial density of photoreceptors in the human retina imposes a big challenge in developing and designing an MEA with sufficiently high stimulation resolution and therefore low inter-electrode spacing. Furthermore, the stacking of the nervous layers in z-direction (with the x-y plane tangential to the retina curvature) adds another difficulty when it comes to placing the electrodes close to the bipolar cells. With the MEA located behind the retina, there is a significant spatial gap between the electrodes and the target cells that needs to be overcome. As mentioned above, an increased electrode to target cell distance forces the MEA to operate with higher currents, enlarging the electrode size, the number of cells within the stimulation range of a single electrode and the spatial separation between adjacent electrodes. All of this results in a decreased stimulation resolution and exposes the retina to the risk of tissue damage caused by excessive charge densities.
As shown below, one way to overcome large distances between electrodes and the target cells is to make the cells grow their projections over longer distances directly on top of the electrode.
In late 2010, a German research group, in collaboration with the private German company “Retina Implant AG”, published results from studies involving tests with subretinal implants in human subjects. A three by three millimeter microphotodiode array (MPDA) containing 1500 pixels, with each pixel consisting of an individual light-sensing photodiode and an electrode, was implanted behind the retina of three patients suffering from blindness due to macular degeneration. The pixels were located approximately 70 micrometers apart from each other, yielding a spatial resolution of roughly 160 electrodes per square millimeter or, as indicated by the authors of the paper, a visual cone angle of 15 arcmin for each electrode. It should be noted that, in contrast to implants using external video cameras to generate visual input, each pixel of the MPDA itself contains a light-sensitive photodiode, autonomously generating the electric current from the light received through the eyeball for its own associated electrode. So each MPDA pixel corresponds in its full functionality to a photoreceptor cell. This has a major advantage: since the MPDA is fixed behind the human retina, it automatically moves along with the eyeball. And since the MPDA itself receives the visual input to generate the electric currents for the stimulation electrodes, movements of the head or the eyeball are handled naturally and need no artificial processing. In one of the patients, the MPDA was placed directly beneath the macula, leading to superior results in experimental tests compared to the other two patients, whose MPDAs were implanted further away from the center of the retina. The results achieved by the patient with the implant behind the macula were quite extraordinary. He was able to recognize letters (5-8 cm large) and read words, as well as distinguish black-white patterns with different orientations.
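The resolution figures quoted for the MPDA can be checked with a short calculation. The chip size, pixel count, and pixel pitch are taken from the text. The conversion between distance on the retina and visual angle (about 288 micrometers per degree) is an approximate value for the human eye that is assumed here and does not come from the text.

```python
CHIP_SIDE_MM = 3.0            # from the text
PIXELS = 1500                 # from the text
PIXEL_PITCH_UM = 70.0         # from the text
RETINA_UM_PER_DEGREE = 288.0  # assumption: approximate retinal distance per degree of visual angle

# Average pixel density over the whole chip area.
pixels_per_mm2 = PIXELS / (CHIP_SIDE_MM ** 2)

# Visual angle covered by one pixel pitch, in minutes of arc (60 arcmin per degree).
pitch_arcmin = PIXEL_PITCH_UM / RETINA_UM_PER_DEGREE * 60.0

print("pixels per mm^2:", round(pixels_per_mm2))
print("visual angle per pixel (arcmin):", round(pitch_arcmin, 1))
```

With these inputs the density comes out close to the roughly 160 electrodes per square millimeter stated above, and the angle per pixel is close to the quoted 15 arcmin.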
The experimental results with the MPDA implants have also drawn attention to another visual phenomenon, revealing an additional advantage of the MPDA approach over implants using external imaging devices: repeated stimulation of retinal cells quickly leads to decreased responses, suggesting that retinal neurons become inhibited after being stimulated repeatedly within a short period of time. This entails that a visual input projected onto an MEA fixed on or behind the retina will result in a sensed image that quickly fades away, even though the electric stimulation of the electrodes remains constant. This is due to the fixed electrodes stimulating the same cells on the retina all the time, rendering the cells less and less sensitive to a constant stimulus over time. However, the process is reversible, and the cells regain their initial sensitivity once the stimulus is absent again. So, how does an intact visual system handle this effect? Why are healthy humans able to fixate an object over time without it fading out? The human eye actually makes small, unnoticeable movements continuously, so that the same visual stimulus is projected onto slightly different retinal spots over time, even as we focus and fixate the eye on some target object. This successfully circumvents the fading cell response phenomenon. With the implant serving both as photoreceptor and electrode stimulator, as is the case with the MPDA, the natural small eye adjustments can be readily used to handle this effect in a straightforward way. Other implant approaches using external visual input (e.g. from video cameras) will suffer from their projected images fading away if stimulated continuously. 
Fast, artificial jittering of the camera images may not solve the problem as this external movement may not be in accordance with the eye movement and therefore, the visual cortex may interpret this simply as a wiggly or blurry scene instead of the desired steady long term projection of the fixed image. A further advantage of subretinal implants is the precise correlation between stimulated areas on the retina and perceived location of the stimulus in the visual field of the human subject. In contrast to RGCs, whose location on the retina may not directly correspond to the location of their individual receptive fields, the stimulation of a bipolar cell is perceived exactly at that point in the visual field that corresponds to the geometric location on the retina where that bipolar cell resides. A clear disadvantage of subretinal implants is the invasive surgical procedure involved.
Epiretinal implants are located on top of the retina and are therefore closest to the retinal ganglion cells (RGCs). For that reason, epiretinal implants aim at stimulating the RGCs directly, bypassing not only the damaged photoreceptors, but also any intermediate neural visual processing by the bipolar, horizontal and amacrine cells. This has some advantages: First of all, the surgical procedure for an epiretinal implant is far less critical than for a subretinal implant, since the prosthesis need not be implanted from behind the eye. Also, there are far fewer RGCs than photoreceptors or bipolar cells, allowing either a more coarse-grained stimulation with increased inter-electrode distance (at least in the peripheral regions of the retina), or an electrode density even higher than the actual RGC density, allowing for more flexibility and accuracy when stimulating the cells. A study on the epiretinal stimulation of peripheral parasol cells conducted on macaque retina provides quantitative details (Sekirnjak et al. 2008). Parasol cells are one type of RGC, forming the second most dense visual pathway in the retina. Their main purpose is to encode the movement of objects in the visual field, thus sensing motion. The experiments were performed in vitro by placing the macaque retina tissue on a 61-electrode MEA (60 micrometer inter-electrode spacing). 25 individual parasol cells were identified and stimulated electrically while properties such as stimulation threshold and best stimulation location were analyzed. The threshold current was defined as the lowest current that triggered a spike on the target cell in 50% of the stimulus pulses (pulse duration: 50 milliseconds) and was determined by incrementally increasing the stimulation strength until a sufficient spiking response was registered. Please note two aspects: First, parasol cells as RGCs exhibit action potential behavior, as opposed to bipolar cells which work with graded potentials.
Second, the electrodes on the MEA were used both for the stimulation pulses and for recording the spiking response from the target cells. The 25 parasol cells were located on the 61-electrode MEA with an electrode density significantly higher than the parasol cell density, effectively yielding multiple electrodes within the receptive field of a single parasol cell. In addition to measuring the stimulation thresholds necessary to trigger a reliable cell response, the location of best stimulation was also determined. The location of best stimulation refers to the location of the stimulating electrode with respect to the target cell where the lowest stimulation threshold was achieved. Surprisingly, this was found not to be on the cell soma, as one would expect, but roughly 13 micrometers further down the axon path. From there on, the experiments showed the expected quadratic increase in stimulation threshold currents with increasing electrode-to-soma distance. The study results also showed that all stimulation thresholds were well below the safety limits (around 0.05 mC/cm², compared with a conservative safety limit of 0.1 mC/cm²) and that the cell response to a stimulation pulse was fast (0.2 ms latency on average) and precise (small variance in latency). Further, the higher electrode density compared with the parasol cell density allowed individual cells to be addressed reliably by stimulating the appropriate electrode, without neighboring cells also firing a spike.
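The threshold procedure described above (increase the current step by step until the cell responds to at least 50% of the pulses) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoidal `spike_probability` model, its parameters, the step size and the arbitrary current units are hypothetical stand-ins and are not taken from the study.

```python
import math

def spike_probability(current, half_point=1.02, slope=8.0):
    """Hypothetical model: fraction of pulses that evoke a spike.

    The sigmoid shape and both parameters are assumptions made for
    illustration; currents are in arbitrary units.
    """
    return 1.0 / (1.0 + math.exp(-slope * (current - half_point)))

def find_threshold(prob_fn, start=0.1, step=0.05, maximum=5.0, criterion=0.5):
    """Increase the current stepwise until the response rate reaches the criterion."""
    n_steps = int(round((maximum - start) / step)) + 1
    for k in range(n_steps):
        current = start + k * step
        if prob_fn(current) >= criterion:
            return current
    return None  # criterion never reached within the tested range

threshold = find_threshold(spike_probability)
print(f"estimated threshold: {threshold:.2f} (arbitrary units)")
```

In a real experiment the response rate at each current level would be estimated from repeated pulses, so the step size limits how precisely the threshold can be located.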
Overview of Alternative Technical Approaches
In this section, we give a short overview of some alternative approaches and technologies currently under research.
Classic MEAs contain electrodes made of titanium nitride or indium tin oxide, exposing the implant to severe issues with long-term biocompatibility. A promising alternative to metallic electrodes consists of carbon nanotubes (CNTs), which combine a number of very advantageous properties. First, they are fully biocompatible since they are made from pure carbon. Second, their robustness makes them suited for long-term implantation, a key property for visual prostheses. Further, their good electric conductivity allows them to operate as electrodes. And finally, their very porous nature leads to extremely large contact surfaces, encouraging the neurons to grow on top of the CNTs, thus improving the neuron-to-electrode contact and lowering the stimulation currents necessary to elicit a cell response. However, CNT electrodes have only emerged recently, and at this point only few scientific results are available.
Wireless Implant Approaches
One of the main technical challenges with retinal implants relates to the cabling that connects the MEA with the external stimuli, the power supply and the control signals. The mechanical stress on the cabling affects its long-term stability and durability, imposing a big challenge on the materials used. Wireless technologies could be a way to circumvent any cabling between the actual retinal implant and external devices. The energy of the incoming light through the eye is not sufficient to trigger neural responses. Therefore, to make a wireless implant work, extra power must be provided to the implant. An approach presented by the Stanford School of Medicine uses an infrared LCD display to project the scene captured by a video camera onto goggles, reflecting infrared pulses onto the chip located on the retina. The chip also uses a photovoltaic rechargeable battery to provide the power required to convert the IR light into sufficiently strong stimulation pulses. Similar to the subretinal approach, this also allows the eye to naturally fixate and focus on objects in the scene, as the eye is free to move, allowing different parts of the IR image on the goggles to be projected onto different areas of the chip located on the retina. Instead of using infrared light, inductive coils can also be used to transmit electrical power and data signals from external devices to the implant on the retina. This technology has been successfully implemented and tested in the EPIRET3 retinal implant (Klauke et al. 2011). However, those tests were more a proof of concept, as only the patient's ability to sense a visual signal upon applying a stimulus on the electrodes was tested.
Directed Neural Growth
One way to allow a very precise neural stimulation with extremely low currents, even over longer distances, is to make the neurons grow their projections onto the electrode. By applying the right chemical solution onto the retinal tissue, neural growth can be encouraged. This can be achieved by applying a layer of Laminin onto the MEA's surface. In order to control the neural paths, the Laminin is not applied uniformly across the MEA surface, but in narrow paths forming a pattern corresponding to the connections the neurons should form. This process of applying the Laminin in a precise, patterned way is called "microcontact printing". A picture of what these Laminin paths look like is shown in Figure 5. The successful directed neural growth achieved with this method allowed significantly lower stimulation currents to be applied compared to classic electrode stimulation, while still reliably triggering a neural response (Mehenti et al. 2006). Furthermore, the stimulation threshold no longer follows the quadratic increase with respect to electrode-soma distance, but remains constant at the same low level even for longer distances (>200 micrometers).
Other Visual Implants
In addition to the stimulation of the retina, other elements of the visual system can also be stimulated.
Stimulation of the Optic Nerve
With cuff-electrodes, typically with only a few segments.
- Little trauma to the eye.
- Not very specific.
Dr. Mohamad Sawan, Professor and Researcher at the Polystim Neurotechnologies Laboratory at the Ecole Polytechnique de Montreal, has been working on a visual prosthesis to be implanted into the human cortex. The basic principle of Dr. Sawan's technology consists in stimulating the visual cortex by implanting a silicon microchip on a network of electrodes made of biocompatible materials, in which each electrode injects a stimulating electrical current in order to provoke a series of luminous points (an array of pixels) to appear in the field of vision of the sightless person. This system is composed of two distinct parts: the implant and an external controller. The implant lodged in the visual cortex wirelessly receives dedicated data and energy from the external controller. This implantable part contains all the circuits necessary to generate the electrical stimuli and to oversee the changing microelectrode/biological tissue interface. On the other hand, the battery-operated external controller comprises a micro-camera which captures the image, as well as a processor and a command generator which process the imaging data to select and translate the captured images and to generate and manage the electrical stimulation process and oversee the implant. The external controller and the implant exchange data in both directions by a powerful transcutaneous radio frequency (RF) link. The implant is powered the same way. (Wikipedia)
- Much larger area for stimulation: the central 2° radius of the retinal visual field corresponds to 1 mm² on the retina, but to 2100 mm² in the visual cortex.
- Implantation is more invasive.
- Parts of the visual field lie in a sulcus and are very hard to reach.
- Stimulation can trigger seizures.
Computer Simulation of the Visual System
In this section an overview of the simulation of the processing done by the early levels of the visual system is given. The implementation that reproduces the action of the visual system is done with MATLAB and its toolboxes. The processing done by the early visual system was discussed in the previous section, and can be summarized, together with some of the functions performed, in the following schematic overview. A good description of the image processing can be found in (Cormack 2000).
As we can see in the above overview, different stages of the image processing have to be considered to simulate the response of the visual system to a stimulus. The next section will therefore give a brief discussion of Image Processing. But first of all we will be concerned with the Simulation of Sensory Organ Components.
Simulating Sensory Organ Components
Anatomical Parameters of the Eye
The average eye has an anterior corneal radius of curvature of 7.8 mm and an aqueous refractive index of 1.336. The length of the eye is 24.2 mm. The iris is approximately flat, and the edge of the iris (also called the limbus) has a radius of 5.86 mm.
Optics of the Eyeball
The optics of the eyeball are characterized by its 2-D spatial impulse response function, the Point Spread Function (PSF)
in which is the radial distance in minutes of arc from the center of the image.
Obviously, the effect on a given digital image depends on the distance of that image from your eyes. As a simple place-holder, substitute this filter with a Gaussian filter with height 30, and with a standard deviation of 1.5.
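In one dimension, a Gaussian with height $a$ and standard deviation $\sigma$ is described by
$$g(x) = a \cdot \exp\left(-\frac{x^2}{2\sigma^2}\right)$$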
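As a rough sketch of the Gaussian place-holder filter mentioned above, the following Python snippet evaluates a one-dimensional Gaussian with height 30 and standard deviation 1.5. The sampling grid (`x` from -6 to 6) and the normalization step are assumptions added for illustration; the text does not specify the units in which the standard deviation is measured.

```python
import numpy as np

def gaussian(x, height=30.0, sigma=1.5):
    """One-dimensional Gaussian with the given peak height and standard deviation."""
    return height * np.exp(-x**2 / (2.0 * sigma**2))

# Sampling grid: an assumption, chosen to cover +/- 4 standard deviations
x = np.linspace(-6.0, 6.0, 121)
g = gaussian(x)

# For use as a smoothing filter the weights are usually normalized to sum to 1,
# so that filtering does not change the mean brightness of the image
kernel = g / g.sum()
```

Note that after normalization the height no longer matters; only the standard deviation determines the shape of the filter.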
Activity of Ganglion Cells
Ignoring the
- temporal response
- effect of wavelength (especially for the cones)
- opening of the iris
- sampling and distribution of photo receptors
- bleaching of the photo-pigment
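we can approximate the response of ganglion cells with a Difference of Gaussians (DOG, Wikipedia)
$$f(x;\sigma_1,\sigma_2) = \frac{1}{\sigma_1\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_1^2}\right) - \frac{1}{\sigma_2\sqrt{2\pi}} \exp\left(-\frac{x^2}{2\sigma_2^2}\right)$$
The source code for a Python implementation is available under http://work.thaslwanter.at/CSS/Code/mexican_hat.py.
The values of $\sigma_1$ and $\sigma_2$ have a ratio of approximately 1:1.6, but vary as a function of eccentricity. For midget cells (or P-cells), the Receptive Field Size (RFS) is approximately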
where the RFS is given in arcmin, and the Eccentricity in mm distance from the center of the fovea (Cormack 2000).
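The Difference-of-Gaussians model of a ganglion cell's receptive field can be sketched as follows. This is a minimal one-dimensional illustration, not the implementation referenced above: the function names, the center width `sigma_center = 1.0` and the sampling grid are assumptions; only the ratio of 1:1.6 between the two standard deviations is taken from the text.

```python
import numpy as np

def gauss_1d(x, sigma):
    """Normalized one-dimensional Gaussian (unit area)."""
    return np.exp(-x**2 / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))

def difference_of_gaussians(x, sigma_center=1.0, ratio=1.6):
    """Narrow excitatory center minus broader inhibitory surround."""
    return gauss_1d(x, sigma_center) - gauss_1d(x, ratio * sigma_center)

# Sampling grid in arbitrary units (an assumption for illustration)
x = np.linspace(-8.0, 8.0, 321)
dx = x[1] - x[0]
dog = difference_of_gaussians(x)

# Center response is positive, the surround is negative,
# and the total area is close to zero
print(dog[len(x) // 2] > 0, dog.min() < 0, abs(dog.sum() * dx) < 1e-4)
```

Because both Gaussians have unit area, a uniformly illuminated receptive field produces almost no net response, while a spot of light confined to the center produces a strong one.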
Activity of simple cells in the primary visual cortex (V1)
Again ignoring temporal properties, the activity of simple cells in the primary visual cortex (V1) can be modeled with the use of Gabor filters (Wikipedia). A Gabor filter is a linear filter whose impulse response is defined by a harmonic function (sinusoid) multiplied by a Gaussian function. The Gaussian function causes the amplitude of the harmonic function to diminish away from the origin, but near the origin, the properties of the harmonic function dominate:
$$g(x,y;\lambda,\theta,\psi,\sigma,\gamma) = \exp\left(-\frac{x'^2+\gamma^2 y'^2}{2\sigma^2}\right)\cos\left(2\pi\frac{x'}{\lambda}+\psi\right)$$
with $x' = x\cos\theta + y\sin\theta$ and $y' = -x\sin\theta + y\cos\theta$.
In this equation, $\lambda$ represents the wavelength of the cosine factor, $\theta$ represents the orientation of the normal to the parallel stripes of a Gabor function (Wikipedia), $\psi$ is the phase offset, $\sigma$ is the sigma of the Gaussian envelope, and $\gamma$ is the spatial aspect ratio, which specifies the ellipticity of the support of the Gabor function.
The size of simple-cell receptive fields depends on their position relative to the fovea, but less strictly so than for retinal ganglion cells. The smallest fields, in and near the fovea, are about one-quarter degree by one-quarter degree, with the center region as small as a few minutes of arc (the same as the diameter of the smallest receptive-field centers in retinal ganglion cells). In the retinal periphery, simple-cell receptive fields can be about 1 degree by 1 degree.
Gabor-like functions arise naturally, simply from the statistics of everyday scenes (Olshausen and Field 1996). A Python example of how even the statistics of a simple image can lead to the emergence of Gabor-like receptive fields is available at http://work.thaslwanter.at/CSS/Code/lena2gabor.py, and a Python demonstration of the effects of filtering an image with Gabor functions can be found at http://work.thaslwanter.at/CSS/Code/gabor_demo.py.
This is an example implementation in MATLAB:
function gb = gabor_fn(sigma,theta,lambda,psi,gamma)
% Calculates the Gabor function with the given parameters
sigma_x = sigma;
sigma_y = sigma/gamma;

% Bounding box
nstds = 3;
xmax = max(abs(nstds*sigma_x*cos(theta)),abs(nstds*sigma_y*sin(theta)));
xmax = ceil(max(1,xmax));
ymax = max(abs(nstds*sigma_x*sin(theta)),abs(nstds*sigma_y*cos(theta)));
ymax = ceil(max(1,ymax));
xmin = -xmax;
ymin = -ymax;
[x,y] = meshgrid(xmin:0.05:xmax,ymin:0.05:ymax);

% Rotation
x_theta = x*cos(theta) + y*sin(theta);
y_theta = -x*sin(theta) + y*cos(theta);

% Gaussian envelope multiplied by the cosine carrier
gb = exp(-.5*(x_theta.^2/sigma_x^2+y_theta.^2/sigma_y^2)).* cos(2*pi/lambda*x_theta+psi);
end
And an equivalent Python implementation would be:
import numpy as np
import matplotlib.pyplot as mp

def gabor_fn(sigma = 1, theta = 1, g_lambda = 4, psi = 2, gamma = 1):
    # Calculates the Gabor function with the given parameters
    sigma_x = sigma
    sigma_y = sigma/gamma

    # Boundingbox:
    nstds = 3
    xmax = max( abs(nstds*sigma_x * np.cos(theta)), abs(nstds*sigma_y * np.sin(theta)) )
    ymax = max( abs(nstds*sigma_x * np.sin(theta)), abs(nstds*sigma_y * np.cos(theta)) )
    xmax = np.ceil(max(1,xmax))
    ymax = np.ceil(max(1,ymax))
    xmin = -xmax
    ymin = -ymax
    numPts = 201
    (x,y) = np.meshgrid(np.linspace(xmin, xmax, numPts), np.linspace(ymin, ymax, numPts) )

    # Rotation
    x_theta = x * np.cos(theta) + y * np.sin(theta)
    y_theta = -x * np.sin(theta) + y * np.cos(theta)

    # Gaussian envelope multiplied by the cosine carrier
    gb = np.exp( -0.5* (x_theta**2/sigma_x**2 + y_theta**2/sigma_y**2) ) * \
         np.cos( 2*np.pi/g_lambda*x_theta + psi )
    return gb

if __name__ == '__main__':
    # Main function: calculate Gabor function for default parameters and show it
    gaborValues = gabor_fn()
    mp.imshow(gaborValues)
    mp.colorbar()
    mp.show()
One major technical tool to understand is the way a computer handles images. We have to know how we can edit images and what techniques we have to rearrange them.
For a computer, an image is nothing more than a huge number of little squares. These squares are called "pixels". In a grayscale image, each of these pixels carries a number n, for which often $0 \le n \le 255$ holds. This number n represents the exact shade of gray of this square in the image. This means that in a grayscale image we can use 256 different gray levels, where 255 means a white spot and 0 means the square is black. In principle, we could use even more than 256 different levels of gray. In the way described, every pixel uses exactly 1 byte (or 8 bits) of memory to be saved. (Due to the binary system of a computer it holds: $2^8 = 256$.) If you think it is necessary to have more gray levels in your image, this is not a problem: you can simply use more memory to save the picture. But remember that this can become demanding for huge images. Furthermore, quite often your display device (e.g. your monitor) cannot show more than these 256 different gray levels.
Representing a colourful image is only slightly more complicated than the grayscale picture. All you have to know is that the computer works with an additive colour mixture of the three main colours Red, Green and Blue. These are the so-called RGB colours.
These images are also saved as pixels. But now every pixel has to store 3 values between 0 and 255, one value for every colour. So now we have $256^3$ = 16,777,216 different colours which can be represented. As with grayscale images, no colour means black, and having all colours means white. That means the colour (0,0,0) is black, whereas (0,0,255) means blue and (255,255,255) is white.
WARNING - There are two common, but different ways to describe the location of a point in 2 dimensions:
1) The x/y notation, with x typically pointing to the right
2) The row/column orientation
Carefully watch out which coordinates you are using to describe your data, as the two descriptions are not consistent!
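The pixel representation and the coordinate warning above can be illustrated with NumPy arrays. This is a small sketch with made-up image sizes; it assumes the common array convention in which the first index is the row (counted downward) and the second index is the column.

```python
import numpy as np

# Grayscale image: one 8-bit value per pixel (0 = black, 255 = white)
gray = np.zeros((4, 6), dtype=np.uint8)   # 4 rows, 6 columns (arbitrary size)
gray[1, 4] = 255                          # row 1, column 4 becomes white

# The same pixel in x/y notation: x is the column index, y is the row index,
# so the order of the two indices is swapped
x, y = 4, 1
assert gray[y, x] == 255

# Colour image: three 8-bit values (red, green, blue) per pixel
rgb = np.zeros((4, 6, 3), dtype=np.uint8)
rgb[0, 0] = (0, 0, 255)        # blue
rgb[0, 1] = (255, 255, 255)    # white

n_gray_levels = 2**8           # 256 gray levels fit into one byte
n_colours = 256**3             # 16,777,216 representable colours
print(gray.nbytes, rgb.nbytes, n_gray_levels, n_colours)
```

Mixing up the two conventions typically shows up as an image that appears transposed, or as an index error when the image is not square.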
In many technical applications, we find some primitive basis in which features can be described easily. Filters are straightforward in the one-dimensional case, and the same filters can be carried over to images. The so-called "Savitzky-Golay filter" allows us to smooth incoming signals. The filter was described in 1964 by Abraham Savitzky and Marcel J. E. Golay. It is a finite impulse response (FIR) filter.
For better understanding, let's look at an example. In 1-D we usually deal with vectors. We call one such vector x, and it holds: $x = (x_1, x_2, \dots, x_n)$. Our purpose is to smooth that vector x. To do so, all we need is another vector $w = (w_1, w_2, \dots, w_m)$ with $m < n$, which we call the weight vector.
With $y_k = \sum_{i=1}^{m} w_i \, x_{k+i-1}$ we now have a smoothed vector y. This vector is smoother than the vector before, because each entry is a weighted average over a few entries of x. This means that each new entry depends on some entries to the left and right of the entry being smoothed. One major drawback of this approach is that the new vector y has only n-m+1 entries instead of the n entries of the original vector x.
Drawing this new vector leads to roughly the same curve as before, but with smaller fluctuations: the overall course of the data is preserved, while the rapid variations are reduced.
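The smoothing with a weight vector can be tried out with NumPy's `np.convolve`. The data values below are made up for illustration. The first weight vector is a plain 3-point average; the second is the well-known 5-point Savitzky-Golay smoothing vector for a quadratic fit. With `mode="valid"` only the positions where the weight vector fully overlaps the data are returned, which gives n - m + 1 entries.

```python
import numpy as np

# Example data (arbitrary values), n = 8
x = np.array([2.0, 4.0, 3.0, 7.0, 6.0, 8.0, 5.0, 9.0])

# Simple moving average with m = 3 equal weights
w_mean = np.ones(3) / 3.0
y_mean = np.convolve(x, w_mean, mode="valid")   # 8 - 3 + 1 = 6 entries

# 5-point Savitzky-Golay smoothing weights (quadratic fit)
w_sg = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0

# A Savitzky-Golay filter leaves a parabola unchanged, which a plain
# moving average does not
t = np.arange(-4.0, 5.0)
parabola = t**2
y_sg = np.convolve(parabola, w_sg, mode="valid")

print(len(x), len(y_mean))
print(np.allclose(y_sg, parabola[2:-2]))
```

`np.convolve` reverses the weight vector before sliding it over the data; for the symmetric weights used here this makes no difference.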
Going from the 1-D case to the 2-D case is done by simply turning the vectors into matrices. As already mentioned, a gray-level image is, for a computer or for a software tool such as MATLAB, nothing more than a huge matrix filled with natural numbers, often between 0 and 255.
The weight vector is now a weight matrix. But we still apply the filter by adding up the element-wise products of matrix elements.
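A minimal sketch of such a two-dimensional filter, written with explicit loops so that the sum of element-wise products is visible. The function name `filter2d_valid`, the test image and the 3x3 averaging matrix are assumptions chosen for illustration; as in the one-dimensional case, only positions where the weight matrix fully overlaps the image are computed, so the output is smaller than the input.

```python
import numpy as np

def filter2d_valid(image, weights):
    """Slide the weight matrix over the image and sum the element-wise products."""
    rows, cols = image.shape
    m_rows, m_cols = weights.shape
    out = np.zeros((rows - m_rows + 1, cols - m_cols + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            patch = image[r:r + m_rows, c:c + m_cols]
            out[r, c] = np.sum(patch * weights)
    return out

# Test image: a single bright pixel on a dark background
image = np.zeros((5, 5))
image[2, 2] = 9.0

# 3x3 averaging matrix: every neighbor gets the same weight
weights = np.ones((3, 3)) / 9.0

smoothed = filter2d_valid(image, weights)
print(smoothed)
```

The bright pixel is spread evenly over its 3x3 neighborhood. For larger images, library routines are much faster than these Python loops.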
Dilation and Erosion
For linear filters as seen before, it holds that they are commutative. To quote Wikipedia: "One says that x commutes with y under ∗ if: $x * y = y * x$"
In other words, it does not matter how many different linear filters you use, or in which sequence. E.g. if a Savitzky-Golay filter is applied to some data, followed by a second Savitzky-Golay filter for calculating the first derivative, the result is the same if the sequence of filters is reversed. There even exists a single filter that does the same as the two applied in sequence.
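The commutativity of linear filters can be checked numerically. In the following sketch the data values are arbitrary, the smoothing filter is the 5-point Savitzky-Golay smoothing vector, and a simple central-difference kernel stands in for the derivative filter (an assumption made to keep the example short).

```python
import numpy as np

# Arbitrary example data
x = np.array([1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0, 9.0, 2.0])

# Two linear filters
smooth = np.array([-3.0, 12.0, 17.0, 12.0, -3.0]) / 35.0   # smoothing
deriv = np.array([1.0, 0.0, -1.0]) / 2.0                    # central difference

# Smoothing first, then derivative
a = np.convolve(np.convolve(x, smooth), deriv)

# Derivative first, then smoothing
b = np.convolve(np.convolve(x, deriv), smooth)

# One combined filter that does both at once
combined = np.convolve(smooth, deriv)
c = np.convolve(x, combined)

print(np.allclose(a, b), np.allclose(a, c))
```

All three results agree up to rounding errors, which is why a chain of linear filters can always be replaced by one pre-computed filter.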
In contrast, morphological operations on an image are non-linear operations, and the final result depends on the sequence. If we think of any image, it is defined by pixels with values $x_{ij}$. Further, this image is assumed to be a black-and-white image, so we have $x_{ij} \in \{0, 1\}$.
To define a morphological operation we have to set a structural element SE, for example a 3x3 matrix, which selects a part M of the image.
The definition of erosion E says:
$$E(M) = \begin{cases} 0 & \text{if any } M_{ij} = 0 \\ 1 & \text{otherwise} \end{cases}$$
So in words: if any of the pixels of M covered by the structural element has the value 0, the erosion sets the value of a specific pixel in M (typically its center) to zero. Otherwise E(M)=1.
And for the dilation D it holds: if any value in M is 1, the dilation of M, D(M), is set to 1; otherwise D(M)=0.
Compositions of Dilation and Erosion: Opening and Closing of Images
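There are two compositions of dilation and erosion, one called opening and the other called closing. It holds:
- opening = dilation ∘ erosion (first erode, then dilate)
- closing = erosion ∘ dilation (first dilate, then erode)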
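Erosion, dilation, opening and closing with a 3x3 structural element can be sketched in plain NumPy as follows. The function names, the test images and the choice to treat pixels outside the image as 0 are assumptions made for this illustration.

```python
import numpy as np

def erode(img):
    """Set a pixel to 1 only if all pixels in its 3x3 neighborhood are 1."""
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    out = np.zeros_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = int(padded[r:r + 3, c:c + 3].all())
    return out

def dilate(img):
    """Set a pixel to 1 if any pixel in its 3x3 neighborhood is 1."""
    padded = np.pad(img, 1, mode="constant", constant_values=0)
    out = np.zeros_like(img)
    for r in range(img.shape[0]):
        for c in range(img.shape[1]):
            out[r, c] = int(padded[r:r + 3, c:c + 3].any())
    return out

def opening(img):
    return dilate(erode(img))    # erosion first, then dilation

def closing(img):
    return erode(dilate(img))    # dilation first, then erosion

# Test images: a 3x3 block, the block plus an isolated pixel,
# and the block with a hole in its center
block = np.zeros((7, 7), dtype=int)
block[2:5, 2:5] = 1
speck = block.copy()
speck[0, 6] = 1
ring = block.copy()
ring[3, 3] = 0

print(np.array_equal(opening(speck), block))   # isolated pixel removed
print(np.array_equal(closing(ring), block))    # hole filled
print(np.array_equal(opening(ring), closing(ring)))
```

Opening removes small isolated structures while keeping larger ones, and closing fills small holes. Applied to the same image, the two compositions generally give different results, which illustrates that the sequence of morphological operations matters.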
- Conway, Bevil R (2009). "Color vision, cones, and color-coding in the cortex". The neuroscientist 15: 274-290.
- Russell, Richard and Sinha, Pawan (2007). "Real-world face recognition: The importance of surface reflectance properties". Perception 36 (9).
- Gegenfurtner, Karl R and Rieger, Jochem (2000). "Sensory and cognitive contributions of color to the recognition of natural scenes". Current Biology 10 (13): 805-808.
- Changizi, Mark A and Zhang, Qiong and Shimojo, Shinsuke (2006). "Bare skin, blood and the evolution of primate colour vision". Biology letters 2 (2): 217-221.
- Beretta, Giordano (2000). Understanding Color. Hewlett-Packard.
- Boynton, Robert M (1988). "Color vision". Annual review of psychology 39 (1): 69-100.
- Grassmann, Hermann (1853). "Zur theorie der farbenmischung". Annalen der Physik 165 (5): 69-84.
- Konig, Arthur and Dieterici, Conrad (1886). "Die Grundempfindungen und ihre intensitats-Vertheilung im Spectrum". Koniglich Preussischen Akademie der Wissenschaften.
- Smith, Vivianne C and Pokorny, Joel (1975). "Spectral sensitivity of the foveal cone photopigments between 400 and 500 nm". Vision research 15 (2): 161-171.
- Vos, JJ and Walraven, PL (1971). "On the derivation of the foveal receptor primaries". Vision Research 11 (8): 799-818.
- Gegenfurtner, Karl R and Kiper, Daniel C (2003). "Color vision". Neuroscience 26 (1): 181.
- Kaiser, Peter K and Boynton, Robert M (1985). "Role of the blue mechanism in wavelength discrimination". Vision research 25 (4): 523-529.
- Paulus, Walter and Kroger-Paulus, Angelika (1983). "A new concept of retinal colour coding". Vision research 23 (5): 529-540.
- Nerger, Janice L and Cicerone, Carol M (1992). "The ratio of L cones to M cones in the human parafoveal retina". Vision research 32 (5): 879-888.
- Neitz, Jay and Carroll, Joseph and Yamauchi, Yasuki and Neitz, Maureen and Williams, David R (2002). "Color perception is mediated by a plastic neural mechanism that is adjustable in adults". Neuron 35 (4): 783-792.
- Jacobs, Gerald H and Williams, Gary A and Cahill, Hugh and Nathans, Jeremy (2007). "Emergence of novel color vision in mice engineered to express a human cone photopigment". Science 315 (5819): 1723-1725.
- Osorio, D and Ruderman, DL and Cronin, TW (1998). "Estimation of errors in luminance signals encoded by primate retina resulting from sampling of natural images with red and green cones". JOSA A 15 (1): 16-22.
- Kersten, Daniel (1987). "Predictability and redundancy of natural images". JOSA A 4 (12): 2395-2400.
- Jolliffe, I. T. (2002). Principal Component Analysis. Springer.
- Buchsbaum, Gershon and Gottschalk, A (1983). "Trichromacy, opponent colours coding and optimum colour information transmission in the retina". Proceedings of the Royal society of London. Series B. Biological sciences 220 (1218): 89-113.
- Zaidi, Qasim (1997). "Decorrelation of L-and M-cone signals". JOSA A 14 (12): 3430-3431.
- Ruderman, Daniel L and Cronin, Thomas W and Chiao, Chuan-Chin (1998). "Statistics of cone responses to natural images: Implications for visual coding". JOSA A 15 (8): 2036-2045.
- Lee, BB and Martin, PR and Valberg, A (1998). "The physiological basis of heterochromatic flicker photometry demonstrated in the ganglion cells of the macaque retina". The Journal of Physiology 404 (1): 323-347.
- Derrington, Andrew M and Krauskopf, John and Lennie, Peter (1984). "Chromatic mechanisms in lateral geniculate nucleus of macaque". The Journal of Physiology 357 (1): 241-265.
- Shapley, Robert (1990). "Visual sensitivity and parallel retinocortical channels". Annual review of psychology 41 (1): 635-658.
- Dobkins, Karen R and Thiele, Alex and Albright, Thomas D (2000). "Comparison of red-green equiluminance points in humans and macaques: evidence for different L:M cone ratios between species". JOSA A 17 (3): 545-556.
- Martin, Paul R and Lee, Barry B and White, Andrew JR and Solomon, Samuel G and Ruttiger, Lukas (2001). "Chromatic sensitivity of ganglion cells in the peripheral primate retina". Nature 410 (6831): 933-936.
- Perry, VH and Oehler, R and Cowey, A (1984). "Retinal ganglion cells that project to the dorsal lateral geniculate nucleus in the macaque monkey". Neuroscience 12 (4): 1101-1123.
- Casagrande, VA (1994). "A third parallel visual pathway to primate area V1". Trends in neurosciences 17 (7): 305-310.
- Hendry, Stewart HC and Reid, R Clay (2000). "The koniocellular pathway in primate vision". Annual review of neuroscience 23 (1): 127-153.
- Callaway, Edward M (1998). "Local circuits in primary visual cortex of the macaque monkey". Annual review of neuroscience 21 (1): 47-74.
- Conway, Bevil R (2001). "Spatial structure of cone inputs to color cells in alert macaque primary visual cortex (V-1)". The Journal of Neuroscience 21 (8): 2768-2783.
- Horwitz, Gregory D and Albright, Thomas D (2005). "Paucity of chromatic linear motion detectors in macaque V1". Journal of Vision 5 (6).
- Danilova, Marina V and Mollon, JD (2006). "The comparison of spatially separated colours". Vision research 46 (6): 823-836.
- Wachtler, Thomas and Sejnowski, Terrence J and Albright, Thomas D (2003). "Representation of color stimuli in awake macaque primary visual cortex". Neuron 37 (4): 681-691.
- Solomon, Samuel G and Lennie, Peter (2005). "Chromatic gain controls in visual cortical neurons". The Journal of neuroscience 25 (19): 4779-4792.
- Hubel, David H (1995). Eye, brain, and vision. Scientific American Library/Scientific American Books.
- Livingstone, Margaret S and Hubel, David H (1987). "Psychophysical evidence for separate channels for the perception of form, color, movement, and depth". The Journal of Neuroscience 7 (11): 3416-3468.
- Zeki, Semir M (1973). "Colour coding in rhesus monkey prestriate cortex". Brain research 53 (2): 422-427.
- Conway, Bevil R and Tsao, Doris Y (2006). "Color architecture in alert macaque cortex revealed by fMRI". Cerebral Cortex 16 (11): 1604-1613.
- Tootell, Roger BH and Nelissen, Koen and Vanduffel, Wim and Orban, Guy A (2004). "Search for color 'center(s)'in macaque visual cortex". Cerebral Cortex 14 (4): 353-363.
- Conway, Bevil R and Moeller, Sebastian and Tsao, Doris Y (2007). "Specialized color modules in macaque extrastriate cortex". Neuron 56 (3): 560-573.
- Fairchild, Mark D (2013). Color appearance models. John Wiley & Sons.
- Webster, Michael A (1996). "Human colour perception and its adaptation". Network: Computation in Neural Systems 7 (4): 587-634.
- Shapley, Robert and Enroth-Cugell, Christina (1984). "Visual adaptation and retinal gain controls". Progress in retinal research 3: 263-346.
- Chaparro, A and Stromeyer III, CF and Chen, G and Kronauer, RE (1995). "Human cones appear to adapt at low light levels: Measurements on the red-green detection mechanism". Vision Research 35 (22): 3103-3118.
- Macleod, Donald IA and Williams, David R and Makous, Walter (1992). "A visual nonlinearity fed by single cones". Vision research 32 (2): 347-363.
- Hayhoe, Mary (1991). Adaptation mechanisms in color and brightness. Springer.
- MacAdam, David L (1970). Sources of Color Science. MIT Press.
- Webster, Michael A and Mollon, JD (1995). "Colour constancy influenced by contrast adaptation". Nature 373 (6516): 694-698.
- Brainard, David H and Wandell, Brian A (1992). "Asymmetric color matching: how color appearance depends on the illuminant". JOSA A 9 (9): 1443-1448.
- Eberhart Zrenner, Karl Ulrich Bartz-Schmidt, Heval Benav, Dorothea Besch, Anna Bruckmann, Veit-Peter Gabel, Florian Gekeler, Udo Greppmaier, Alex Harscher, Steffen Kibbel, Johannes Koch, Akos Kusnyerik, Tobias Peters, Katarina Stingl, Helmut Sachs et al. (2010). Subretinal electronic chips allow blind patients to read letters and combine them to words.
- Asaf Shoval, Christopher Adams, Moshe David-Pur, Mark Shein, Yael Hanein, Evelyne Sernagor (2009). Carbon nanotube electrodes for effective interfacing with retinal tissue.
- Jost B. Jonas, Ulrike Schneider, Gottfried O.H. Naumann (1992). Count and density of human retinal photoreceptors. Springer.
- Ashmore Jonathan (2008). Cochlear Outer Hair Cell Motility. American Physiological Society.
- Chris Sekirnjak, Pawel Hottowy, Alexander Sher, Wladyslaw Dabrowski, Alan M. Litke, E.J. Chichilnisky (2008). High-Resolution Electrical Stimulation of Primate Retina for Epiretinal Implant Design. Society of Neuroscience.
- Pritchard Roy. Stabilized Images on the Retina.
- Susanne Klauke, Michael Goertz, Stefan Rein, Dirk Hoehl, Uwe Thomas, Reinhard Eckhorn, Frank Bremmer, Thomas Wachtler (2011). Stimulation with a Wireless Intraocular Epiretinal Implant Elicits Visual Percepts in Blind Humans. The Association for Research in Vision and Ophthalmology.
- Neville Z. Mehenti, Greg S. Tsien, Theodore Leng, Harvey A. Fishman, Stacey F. Bent (2006). A model retinal interface based on directed neuronal growth for single cell stimulation. Springer.
- T. Haslwanter (2012). "Mexican Hat Function [Python]". private communications. http://work.thaslwanter.at/CSS/Code/mexican_hat.py.
- David, Hubel (1988). Eye, Brain, and Vision. Henry Holt and Company. http://hubel.med.harvard.edu/book/b17.htm. Retrieved 2014-08-08.
- Olshausen,B.A. and Field,D.J. (1996). "Emergence of simple-cell receptive field properties by learning a sparse code for natural images". Nature 381 (June 13): 607-609.
- scikits-image development team (2012). "Emergence of Gabor-like functions from a Simple Image [Python]". http://work.thaslwanter.at/CSS/Code/lena2gabor.py.
- Thomas Haslwanter (2012). "Demo-application of Gabor filters to an image [Python]". http://work.thaslwanter.at/CSS/Code/gabor_demo.py.
The sensory system for the sense of hearing is the auditory system. This wikibook covers the physiology of the auditory system, and its application to the most successful neurosensory prosthesis - cochlear implants. The physics and engineering of acoustics are covered in a separate wikibook, Acoustics. An excellent source of images and animations is "Journey into the world of hearing".
The ability to hear is not found as widely in the animal kingdom as other senses like touch, taste and smell. It is restricted mainly to vertebrates and insects. Within these, mammals and birds have the most highly developed sense of hearing. The table below shows frequency ranges of humans and some selected animals:
The organ that detects sound is the ear. It acts as a receiver, collecting acoustic information and passing it through the nervous system into the brain. The ear includes structures for both the sense of hearing and the sense of balance: it not only receives sound as part of the auditory system, but also plays an important role in the sense of balance and body position.
Humans have a pair of ears placed symmetrically on both sides of the head which makes it possible to localize sound sources. The brain extracts and processes different forms of data in order to localize sound, such as:
- the shape of the sound spectrum at the tympanic membrane (eardrum)
- the difference in sound intensity between the left and the right ear
- the difference in time-of-arrival between the left and the right ear
- the difference in time-of-arrival between reflections of the ear itself (in other words: the shape of the pinna, with its pattern of folds and ridges, captures sound waves in a way that helps localize the sound source, especially on the vertical axis)
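To get a feeling for the size of the time-of-arrival difference between the two ears, the following sketch uses a strongly simplified model: a distant sound source, a straight path between the ears and no diffraction around the head. The ear distance of 0.18 m and the speed of sound of 343 m/s are assumed values for illustration and are not taken from the text.

```python
import math

def interaural_time_difference(azimuth_deg, ear_distance_m=0.18,
                               speed_of_sound_m_s=343.0):
    """Simplified far-field estimate of the arrival-time difference in seconds.

    azimuth_deg is the direction of the source: 0 means straight ahead,
    90 means directly to one side. Both default parameters are assumptions.
    """
    path_difference = ear_distance_m * math.sin(math.radians(azimuth_deg))
    return path_difference / speed_of_sound_m_s

itd_front = interaural_time_difference(0.0)
itd_side = interaural_time_difference(90.0)
print(f"front: {itd_front * 1e6:.0f} microseconds, "
      f"side: {itd_side * 1e6:.0f} microseconds")
```

Even for a source directly to one side, the difference under these assumptions is only about half a millisecond, which shows how finely the auditory system has to resolve timing in order to use this cue.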
Healthy, young humans are able to hear sounds over a frequency range from 20 Hz to 20 kHz. We are most sensitive to frequencies between 2000 and 4000 Hz, which is the frequency range of spoken words. The frequency resolution is 0.2%, which means that one can distinguish between a tone of 1000 Hz and one of 1002 Hz. A sound at 1 kHz can be detected if it deflects the tympanic membrane (eardrum) by less than 1 Angstrom, which is less than the diameter of a hydrogen atom. This extreme sensitivity of the ear may explain why it contains the smallest bone that exists inside a human body: the stapes (stirrup). It is 0.25 to 0.33 cm long and weighs between 1.9 and 4.3 mg.
Anatomy of the Auditory System
The aim of this section is to explain the anatomy of the auditory system of humans. The chapter illustrates the composition of auditory organs in the sequence that acoustic information proceeds during sound perception.
Please note that the core information for “Sensory Organ Components” can also be found on the Wikipedia page “Auditory system”, excluding some changes like extensions and specifications made in this article. (see also: Wikipedia Auditory system)
The auditory system senses sound waves, which are changes in air pressure, and converts these changes into electrical signals. These signals can then be processed, analyzed and interpreted by the brain. For the moment, let's focus on the structure and components of the auditory system. The auditory system consists mainly of two parts:
- the ear and
- the auditory nervous system (central auditory system)
The ear is the organ where the first processing of sound occurs and where the sensory receptors are located. It consists of three parts:
- outer ear
- middle ear
- inner ear
Function: Gathering sound energy and amplification of sound pressure.
The folds of cartilage surrounding the ear canal (external auditory meatus, external acoustic meatus) are called the pinna. It is the visible part of the ear. Sound waves are reflected and attenuated when they hit the pinna, and these changes provide additional information that will help the brain determine the direction from which the sounds came. The sound waves enter the auditory canal, a deceptively simple tube. The ear canal amplifies sounds that are between 3 and 12 kHz. At the far end of the ear canal is the tympanic membrane (eardrum), which marks the beginning of the middle ear.
Function: Transmission of acoustic energy from air to the cochlea.
Sound waves traveling through the ear canal will hit the tympanic membrane (tympanum, eardrum). This wave information travels across the air-filled tympanic cavity (middle ear cavity) via a series of bones: the malleus (hammer), incus (anvil) and stapes (stirrup). These ossicles act as a lever, converting the lower-pressure eardrum sound vibrations into higher-pressure sound vibrations at another, smaller membrane called the oval (or elliptical) window, which is one of two openings into the cochlea of the inner ear. The second opening is called the round window. It allows the fluid in the cochlea to move. The malleus articulates with the tympanic membrane via the manubrium, whereas the stapes articulates with the oval window via its footplate. Higher pressure is necessary because the inner ear beyond the oval window contains liquid rather than air. The sound is not amplified uniformly across the ossicular chain. The stapedius reflex of the middle ear muscles helps protect the inner ear from damage. The middle ear still contains the sound information in wave form; it is converted to nerve impulses in the cochlea.
|Structural diagram of the cochlea||Cross section of the cochlea|
Function: Transformation of mechanical waves (sound) into electric signals (neural signals).
The inner ear consists of the cochlea and several non-auditory structures. The cochlea is a snail-shaped part of the inner ear. It has three fluid-filled sections: scala tympani (lower gallery), scala media (middle gallery, cochlear duct) and scala vestibuli (upper gallery). The cochlea supports a fluid wave driven by pressure across the basilar membrane separating two of the sections (scala tympani and scala media). The basilar membrane is about 3 cm long and between 0.04 and 0.5 mm wide. Reissner’s membrane (vestibular membrane) separates scala media and scala vestibuli. Strikingly, one section, the scala media, contains endolymph, an extracellular fluid similar in composition to the fluid usually found inside of cells. The organ of Corti is located in this duct, and transforms mechanical waves to electric signals in neurons. The other two sections, scala tympani and scala vestibuli, are located within the bony labyrinth, which is filled with a fluid called perilymph. The chemical difference between the two fluids endolymph (in scala media) and perilymph (in scala tympani and scala vestibuli) is important for the function of the inner ear.
Organ of Corti
The organ of Corti forms a ribbon of sensory epithelium which runs lengthwise down the entire cochlea. The hair cells of the organ of Corti transform the fluid waves into nerve signals. The journey of a billion nerves begins with this first step; from here further processing leads to a series of auditory reactions and sensations.
Transition from ear to auditory nervous system
Hair cells are columnar cells, each with a bundle of 100-200 specialized cilia at the top, for which they are named. These cilia are the mechanosensors for hearing. The shorter ones are called stereocilia, and the longest one at the end of each hair cell bundle is called the kinocilium. The location of the kinocilium determines the on-direction, i.e. the direction of deflection inducing the maximum hair cell excitation. Lightly resting atop the longest cilia is the tectorial membrane, which moves back and forth with each cycle of sound, tilting the cilia and allowing electric current into the hair cell.
The function of hair cells is not yet fully understood. Current knowledge of their function already makes it possible to replace the cells with cochlear implants in cases of hearing loss. However, more research into the function of the hair cells may someday even make it possible for the cells to be repaired. The current model is that cilia are attached to one another by “tip links”, structures which link the tip of one cilium to another. By stretching and compressing, the tip links open an ion channel and produce the receptor potential in the hair cell. Note that a deflection of 100 nanometers already elicits 90% of the full receptor potential.
The nervous system distinguishes between nerve fibres carrying information towards the central nervous system and nerve fibres carrying the information away from it:
- Afferent neurons (also sensory or receptor neurons) carry nerve impulses from receptors (sense organs) towards the central nervous system
- Efferent neurons (also motor or effector neurons) carry nerve impulses away from the central nervous system to effectors such as muscles or glands (and also the ciliated cells of the inner ear)
Afferent neurons innervate cochlear inner hair cells, at synapses where the neurotransmitter glutamate communicates signals from the hair cells to the dendrites of the primary auditory neurons. There are far fewer inner hair cells in the cochlea than afferent nerve fibers. The neural dendrites belong to neurons of the auditory nerve, which in turn joins the vestibular nerve to form the vestibulocochlear nerve, or cranial nerve number VIII.
Efferent projections from the brain to the cochlea also play a role in the perception of sound. Efferent synapses occur on outer hair cells and on afferent (towards the brain) dendrites under inner hair cells.
Auditory nervous system
The sound information, now re-encoded in form of electric signals, travels down the auditory nerve (acoustic nerve, vestibulocochlear nerve, VIIIth cranial nerve), through intermediate stations such as the cochlear nuclei and superior olivary complex of the brainstem and the inferior colliculus of the midbrain, being further processed at each waypoint. The information eventually reaches the thalamus, and from there it is relayed to the cortex. In the human brain, the primary auditory cortex is located in the temporal lobe.
Primary auditory cortex
The primary auditory cortex is the first region of cerebral cortex to receive auditory input. Perception of sound is associated with the right posterior superior temporal gyrus (STG). The superior temporal gyrus contains several important structures of the brain, including Brodmann areas 41 and 42, marking the location of the primary auditory cortex, the cortical region responsible for the sensation of basic characteristics of sound such as pitch and rhythm. The auditory association area is located within the temporal lobe of the brain, in an area called Wernicke's area, or area 22. This area, near the lateral cerebral sulcus, is an important region for the processing of acoustic signals so that they can be distinguished as speech, music, or noise.
Auditory Signal Processing
Now that the anatomy of the auditory system has been sketched out, this topic goes deeper into the physiological processes which take place while perceiving acoustic information and converting this information into data that can be handled by the brain. Hearing starts with pressure waves entering the auditory canal and ends with perception in the brain. This section details the process transforming vibrations into perception.
Effect of the head
Sound waves with a wavelength shorter than the head produce a sound shadow on the ear further away from the sound source. When the wavelength is longer than the head, diffraction of the sound leads to approximately equal sound intensities at both ears.
Sound reception at the pinna
With its corrugated shape, the pinna collects sound waves in air and affects sound coming from behind differently from sound coming from the front. The sound waves are reflected and attenuated or amplified. These changes will later help sound localization.
In the external auditory canal, sounds between 3 and 12 kHz - a range crucial for human communication - are amplified. The canal acts as a resonator, amplifying the incoming frequencies.
Sound conduction to the cochlea
Sound that entered the pinna in the form of waves travels along the auditory canal until it reaches the beginning of the middle ear, marked by the tympanic membrane (eardrum). Since the inner ear is filled with fluid, the middle ear acts as an impedance-matching device that reduces the reflection of sound energy at the transition from air to fluid. As an example, at the transition from air to water 99.9% of the incoming sound energy is reflected. This can be calculated using:
$$\frac{I_r}{I_i} = \left( \frac{Z_2 - Z_1}{Z_2 + Z_1} \right)^2$$
with $I_r$ the intensity of the reflected sound, $I_i$ the intensity of the incoming sound and $Z_k$ the wave resistance of the two media ($Z_{air} = 414$ kg m$^{-2}$ s$^{-1}$ and $Z_{water} = 1.48 \times 10^6$ kg m$^{-2}$ s$^{-1}$). Three factors that contribute to the impedance matching are:
- the relative size difference between tympanum and oval window
- the lever effect of the middle ear ossicles and
- the shape of the tympanum.
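The 99.9% figure for the air-to-water transition can be checked numerically. The following minimal sketch assumes the standard normal-incidence relation, reflected fraction = ((Z2 - Z1) / (Z2 + Z1))^2, and uses the two wave resistances quoted in the text; the function name is chosen here for illustration only.

```python
def reflected_fraction(z1: float, z2: float) -> float:
    """Fraction of the incoming sound intensity reflected at the boundary
    between two media with wave resistances z1 and z2 (normal incidence)."""
    return ((z2 - z1) / (z2 + z1)) ** 2


Z_AIR = 414.0      # kg m^-2 s^-1, value quoted in the text
Z_WATER = 1.48e6   # kg m^-2 s^-1, value quoted in the text

r = reflected_fraction(Z_AIR, Z_WATER)
print(f"reflected:   {100 * r:.2f} %")
print(f"transmitted: {100 * (1 - r):.2f} %")
```

Only about one part in a thousand of the incoming energy crosses the boundary, which is why some form of impedance matching is needed.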
The longitudinal changes in air pressure of the sound wave cause the tympanic membrane to vibrate which, in turn, makes the three chained ossicles malleus, incus and stapes oscillate synchronously. These bones vibrate as a unit, transmitting the energy from the tympanic membrane to the oval window. In addition, the sound pressure is further increased by the difference in area between the tympanic membrane and the stapes footplate. The middle ear acts as an impedance transformer by changing the sound energy collected by the tympanic membrane into greater force and less excursion. This mechanism facilitates the transmission of sound waves in air into vibrations of the fluid in the cochlea. The transformation results from the piston-like in- and out-motion of the footplate of the stapes, which is located in the oval window. This movement of the footplate sets the fluid in the cochlea into motion.
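To get a feeling for the size of this effect, the pressure gain from the area difference and the lever action can be estimated as follows. This is only a rough sketch: the two areas and the lever ratio are typical textbook values assumed here, not numbers given in this chapter, and the contribution of the eardrum's shape is ignored.

```python
import math

# Assumed, typical textbook values (not taken from this chapter):
AREA_EARDRUM_MM2 = 55.0    # effective vibrating area of the tympanic membrane
AREA_FOOTPLATE_MM2 = 3.2   # area of the stapes footplate
LEVER_RATIO = 1.3          # force gain of the ossicular lever


def pressure_gain(area_in_mm2: float, area_out_mm2: float, lever: float) -> float:
    """Pressure at the footplate divided by pressure at the eardrum.

    The same force concentrated on a smaller area gives a higher pressure,
    and the lever multiplies the force once more."""
    return (area_in_mm2 / area_out_mm2) * lever


gain = pressure_gain(AREA_EARDRUM_MM2, AREA_FOOTPLATE_MM2, LEVER_RATIO)
gain_db = 20 * math.log10(gain)
print(f"pressure gain: {gain:.1f}x, i.e. about {gain_db:.0f} dB")
```

With these assumed values the gain is roughly a factor of 22, which recovers a large part of the loss that would occur at a bare air-to-fluid boundary.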
Through the stapedius muscle, the smallest muscle in the human body, the middle ear has a gating function: contracting this muscle changes the impedance of the middle ear, thus protecting the inner ear from damage through loud sounds.
Frequency analysis in the cochlea
The three fluid-filled compartments of the cochlea (scala vestibuli, scala media, scala tympani) are separated by the basilar membrane and Reissner’s membrane. The function of the cochlea is to separate sounds according to their spectrum and transform them into a neural code. When the footplate of the stapes pushes into the perilymph of the scala vestibuli, Reissner’s membrane bends into the scala media. This elongation of Reissner’s membrane causes the endolymph to move within the scala media and induces a displacement of the basilar membrane. The separation of the sound frequencies in the cochlea is due to the special properties of the basilar membrane. The fluid in the cochlea vibrates (due to the in- and out-motion of the stapes footplate), setting the membrane in motion like a traveling wave. The wave starts at the base and progresses towards the apex of the cochlea. The transversal waves in the basilar membrane propagate with a speed of
$$c = \sqrt{\frac{\mu}{\rho}}$$
with $\mu$ the shear modulus and $\rho$ the density of the material. Since width and tension of the basilar membrane change, the speed of the waves propagating along the membrane changes from about 100 m/s near the oval window to 10 m/s near the apex.
There is a point along the basilar membrane where the amplitude of the wave decreases abruptly. At this point, the sound wave in the cochlear fluid produces the maximal displacement (peak amplitude) of the basilar membrane. The distance the wave travels before getting to that characteristic point depends on the frequency of the incoming sound. Therefore each point of the basilar membrane corresponds to a specific value of the stimulating frequency. A low-frequency sound travels a longer distance than a high-frequency sound before it reaches its characteristic point. Frequencies are scaled along the basilar membrane with high frequencies at the base and low frequencies at the apex of the cochlea.
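The mapping between frequency and place can be made concrete with an empirical formula. The sketch below uses Greenwood's frequency-position function with the constants commonly quoted for the human cochlea; this function is not part of the text above and is brought in here only as an illustration. The membrane length of 30 mm is the "about 3 cm" mentioned earlier.

```python
import math

# Greenwood's constants as commonly quoted for the human cochlea (assumption:
# taken from the literature, not from this chapter).
A, ALPHA, K = 165.4, 2.1, 0.88
MEMBRANE_LENGTH_MM = 30.0   # "about 3 cm"


def greenwood_frequency(x_from_apex: float) -> float:
    """Characteristic frequency in Hz at relative position x (0 = apex, 1 = base)."""
    return A * (10 ** (ALPHA * x_from_apex) - K)


def distance_from_base_mm(frequency_hz: float) -> float:
    """Distance of the characteristic point from the base, in mm."""
    x_from_apex = math.log10(frequency_hz / A + K) / ALPHA
    return (1.0 - x_from_apex) * MEMBRANE_LENGTH_MM


for f in (100, 1000, 10000):
    print(f"{f:>6} Hz peaks about {distance_from_base_mm(f):.1f} mm from the base")
```

The lower the frequency, the further the characteristic point lies from the base, in line with the description above. The two ends of the map come out close to 20 Hz and 20 kHz.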
Sensory transduction in the cochlea
Most everyday sounds are composed of multiple frequencies. The brain processes the distinct frequencies, not the complete sounds. Due to its inhomogeneous properties, the basilar membrane is performing an approximation to a Fourier transform. The sound is thereby split into its different frequencies, and each hair cell on the membrane corresponds to a certain frequency. The loudness of the frequencies is encoded by the firing rate of the corresponding afferent fiber. This is due to the amplitude of the traveling wave on the basilar membrane, which depends on the loudness of the incoming sound.
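The idea of splitting a sound into its frequencies, with the loudness of each component kept separately, can be illustrated with a discrete Fourier transform. This is a minimal sketch using NumPy; the sampling rate and the two-tone test signal are made up for the example, and the cochlea of course only approximates such a transform.

```python
import numpy as np

fs = 16000                    # sampling rate in Hz (assumed for this example)
n = 8000                      # number of samples, i.e. 0.5 s of signal
t = np.arange(n) / fs

# Hypothetical test sound: a loud 440 Hz tone plus a softer 2000 Hz tone
sound = 1.0 * np.sin(2 * np.pi * 440 * t) + 0.3 * np.sin(2 * np.pi * 2000 * t)

# Amplitude spectrum: one value per frequency bin (bin spacing fs / n = 2 Hz)
spectrum = np.abs(np.fft.rfft(sound)) / (n / 2)
freqs = np.fft.rfftfreq(n, d=1 / fs)

# The two strongest components, loudest first
strongest = np.argsort(spectrum)[-2:][::-1]
peaks = [(float(freqs[i]), float(spectrum[i])) for i in strongest]
for frequency, amplitude in peaks:
    print(f"{frequency:7.1f} Hz  amplitude {amplitude:.2f}")
```

Each frequency bin plays the role of one place on the basilar membrane: its position tells which frequency is present, and its amplitude tells how loud that component is.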
The sensory cells of the auditory system, known as hair cells, are located along the basilar membrane within the organ of Corti. Each organ of Corti contains about 16’000 such cells, innervated by about 30'000 afferent nerve fibers. There are two anatomically and functionally distinct types of hair cells: the inner and the outer hair cells. Along the basilar membrane these two types are arranged in one row of inner cells and three to five rows of outer cells. Most of the afferent innervation comes from the inner hair cells while most of the efferent innervation goes to the outer hair cells. The inner hair cells influence the discharge rate of the individual auditory nerve fibers that connect to these hair cells. Therefore inner hair cells transfer sound information to higher auditory nervous centers. The outer hair cells, in contrast, amplify the movement of the basilar membrane by injecting energy into the motion of the membrane and reducing frictional losses, but do not contribute to transmitting sound information. The motion of the basilar membrane deflects the stereocilia (hairs on the hair cells) and causes the intracellular potentials of the hair cells to become less negative (depolarization) or more negative (hyperpolarization), depending on the direction of the deflection. When the stereocilia are in a resting position, there is a steady state current flowing through the channels of the cells. The movement of the stereocilia therefore modulates the current flow around that steady state current.
Let's look at the modes of action of the two different hair cell types separately:
- Inner hair cells:
The deflection of the hair-cell stereocilia opens mechanically gated ion channels that allow small, positively charged potassium ions (K+) to enter the cell, causing it to depolarize. Unlike many other electrically active cells, the hair cell itself does not fire an action potential. Instead, the influx of positive ions from the endolymph in scala media depolarizes the cell, resulting in a receptor potential. This receptor potential opens voltage gated calcium channels; calcium ions (Ca2+) then enter the cell and trigger the release of neurotransmitters at the basal end of the cell. The neurotransmitters diffuse across the narrow space between the hair cell and a nerve terminal, where they then bind to receptors and thus trigger action potentials in the nerve. In this way, the neurotransmitter increases the firing rate in the VIIIth cranial nerve and the mechanical sound signal is converted into an electrical nerve signal.
The repolarization of the hair cell occurs in a special manner. The perilymph in the scala tympani has a very low concentration of potassium ions. The electrochemical gradient makes the potassium ions flow through channels into the perilymph. (see also: Wikipedia Hair cell)
- Outer hair cells:
In human outer hair cells, the receptor potential triggers active vibrations of the cell body. This mechanical response to electrical signals is termed somatic electromotility and drives oscillations in the cell’s length, which occur at the frequency of the incoming sound and provide mechanical feedback amplification. Outer hair cells have evolved only in mammals. Without functioning outer hair cells the sensitivity decreases by approximately 50 dB (due to greater frictional losses in the basilar membrane which would damp the motion of the membrane). Outer hair cells also improve frequency selectivity (frequency discrimination), which is of particular benefit for humans, because it enables sophisticated speech and music. (see also: Wikipedia Hair cell)
With no external stimulation, auditory nerve fibres discharge action potentials in a random time sequence. This random time firing is called spontaneous activity. The spontaneous discharge rates of the fibers vary from very slow rates to rates of up to 100 per second. Fibers are placed into three groups depending on whether they fire spontaneously at high, medium or low rates. Fibers with high spontaneous rates (> 18 per second) tend to be more sensitive to sound stimulation than other fibers.
Auditory pathway of nerve impulses
So in the inner hair cells the mechanical sound signal is finally converted into electrical nerve signals. The inner hair cells are connected to auditory nerve fibres whose cell bodies form the spiral ganglion. In the spiral ganglion the electrical signals (electrical spikes, action potentials) are generated and transmitted along the cochlear branch of the auditory nerve (VIIIth cranial nerve) to the cochlear nucleus in the brainstem.
From there, the auditory information is divided into at least two streams:
- Ventral Cochlear Nucleus:
One stream is the ventral cochlear nucleus which is split further into the posteroventral cochlear nucleus (PVCN) and the anteroventral cochlear nucleus (AVCN). The ventral cochlear nucleus cells project to a collection of nuclei called the superior olivary complex.
Superior olivary complex: Sound localization
The superior olivary complex - a small mass of gray substance - is believed to be involved in the localization of sounds in the azimuthal plane (i.e. their degree to the left or the right). There are two major cues to sound localization: Interaural level differences (ILD) and interaural time differences (ITD). The ILD measures differences in sound intensity between the ears. This works for high frequencies (over 1.6 kHz), where the wavelength is shorter than the distance between the ears, causing a head shadow - which means that high frequency sounds hit the averted ear with lower intensity. Lower frequency sounds don't cast a shadow, since they wrap around the head. However, due to the wavelength being larger than the distance between the ears, there is a phase difference between the sound waves entering the ears - the timing difference measured by the ITD. This works very precisely for frequencies below 800 Hz, where the ear distance is smaller than half of the wavelength. Sound localization in the median plane (front, above, back, below) is helped through the outer ear, which forms direction-selective filters.
There, the differences in time and loudness of the sound information in each ear are compared. Differences in sound intensity are processed in cells of the lateral superior olivary complex and timing differences (runtime delays) in the medial superior olivary complex. Humans can detect timing differences between the left and right ear down to 10 μs, corresponding to a difference in sound location of about 1 deg. This comparison of sound information from both ears allows the determination of the direction where the sound came from. The superior olive is the first node where signals from both ears come together and can be compared. As a next step, the superior olivary complex sends information up to the inferior colliculus via a tract of axons called the lateral lemniscus. The function of the inferior colliculus is to integrate information before sending it to the thalamus and the auditory cortex. It is interesting to know that the superior colliculus close by shows an interaction of auditory and visual stimuli.
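The numbers used for sound localization can be checked with a simple geometric model. The sketch below assumes that the extra path to the far ear is d·sin(θ) for a source at azimuth θ, with an ear distance d of 0.2 m and a speed of sound of 343 m/s; both values and the straight-path model are assumptions made for this illustration, not taken from the text.

```python
import math

SPEED_OF_SOUND = 343.0   # m/s in air at room temperature (assumed)
EAR_DISTANCE = 0.2       # m, assumed distance between the ears


def itd_seconds(azimuth_deg: float) -> float:
    """Interaural time difference for a distant source at the given azimuth
    (0 deg = straight ahead), using the simple path difference d * sin(angle)."""
    return EAR_DISTANCE * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def wavelength_m(frequency_hz: float) -> float:
    """Wavelength in air of a tone with the given frequency."""
    return SPEED_OF_SOUND / frequency_hz


print(f"ITD at 1 deg azimuth:      {itd_seconds(1.0) * 1e6:.1f} microseconds")
print(f"wavelength at 1600 Hz:     {wavelength_m(1600):.3f} m")
print(f"half wavelength at 800 Hz: {wavelength_m(800) / 2:.3f} m")
```

Under these assumptions a shift of 1 degree changes the ITD by about 10 μs, and both the wavelength at 1.6 kHz and half the wavelength at 800 Hz come out close to the assumed ear distance, which matches the frequency limits quoted for ILD and ITD.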
- Dorsal Cochlear Nucleus:
The dorsal cochlear nucleus (DCN) analyzes the quality of sound and projects directly via the lateral lemniscus to the inferior colliculus.
From the inferior colliculus the auditory information from ventral as well as dorsal cochlear nucleus proceeds to the auditory nucleus of the thalamus which is the medial geniculate nucleus. The medial geniculate nucleus further transfers information to the primary auditory cortex, the region of the human brain that is responsible for processing of auditory information, located on the temporal lobe. The primary auditory cortex is the first relay involved in the conscious perception of sound.
Primary auditory cortex and higher order auditory areas
Sound information finally reaches the primary auditory cortex (Brodmann areas 41 and 42), the first relay involved in the conscious perception of sound. It is known to be tonotopically organized and performs the basics of hearing: pitch and volume. Depending on the nature of the sound (speech, music, noise), the information is further passed to higher order auditory areas. Sounds that are words are processed by Wernicke’s area (Brodmann area 22). This area is involved in understanding written and spoken language (verbal understanding). The production of sound (verbal expression) is linked to Broca’s area (Brodmann areas 44 and 45). The muscles that produce the required sound when speaking are controlled by the facial area of the motor cortex, a region of the cerebral cortex involved in planning, controlling and executing voluntary motor functions.
The intensity of sound is typically expressed in decibels (dB), defined as
$$SPL = 20 \cdot \log_{10}\left(\frac{p}{p_0}\right)$$
where SPL = “sound pressure level” (in dB), and the reference pressure is $p_0 = 2 \times 10^{-5}$ N/m$^2$. Note that this is much smaller than the air pressure (ca. $10^5$ N/m$^2$)! Also watch out, because sound is often expressed relative to "Hearing Level" instead of SPL.
- 0 - 20 dB SPL ... hearing level (0 dB for sinusoidal tones, from 1 kHz – 4 kHz)
- 60 dB SPL ... medium loud tone, conversational speech
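The conversion between sound pressure and dB SPL is easy to try out. The sketch below assumes the standard definition SPL = 20·log10(p / p0) with the conventional reference pressure p0 = 2·10^-5 N/m²; the function names are chosen here for illustration.

```python
import math

P_REF = 2e-5   # N/m^2, conventional reference pressure for dB SPL


def spl_db(pressure_pa: float) -> float:
    """Sound pressure level in dB SPL for an RMS sound pressure in N/m^2."""
    return 20 * math.log10(pressure_pa / P_REF)


def pressure_pa(spl: float) -> float:
    """RMS sound pressure in N/m^2 for a level given in dB SPL."""
    return P_REF * 10 ** (spl / 20)


print(f"60 dB SPL corresponds to {pressure_pa(60):.3f} N/m^2")
print(f"a pressure of 1e5 N/m^2 would be {spl_db(1e5):.0f} dB SPL")
```

Conversational speech at 60 dB SPL thus corresponds to pressure fluctuations of only about 0.02 N/m², several million times smaller than the static air pressure.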
Fundamental frequency, from the vibrations of the vocal cords in the larynx, is about 120 Hz for adult male, 250 Hz for adult female, and up to 400 Hz for children.
Formants are the dominant frequencies in human speech, and are caused by resonances of the signal from the vocal cords in our mouth and the rest of the vocal tract. Formants show up as distinct peaks of energy in the sound's frequency spectrum. They are numbered in ascending order starting with the formant at the lowest frequency.
Speech is often considered to consist of a sequence of acoustic units called phones, which correspond to linguistic units called phonemes. Phonemes are the smallest units of sound that allow different words to be distinguished. The word "dog", for example, contains three phonemes. Changes to the first, second, and third phoneme respectively produce the words "log", "dig", and "dot". English is said to contain 40 different phonemes, specified as in /d/, /o/, /g/ for the word "dog".
The ability of humans to decode speech signals still easily exceeds that of any algorithm developed so far. While automatic speech recognition has become fairly successful in recognizing clearly spoken speech in environments with a high signal-to-noise ratio, once the conditions become a bit less than ideal, recognition algorithms tend to perform very poorly compared to humans. It seems from this that our computer speech recognition algorithms have not yet come close to capturing the underlying algorithm that humans use to recognize speech.
Evidence has shown that the perception of speech takes quite a different route than the perception of other sounds in the brain. While studies on non-speech sound responses have generally found the response to be graded with the stimulus, speech studies have repeatedly found a discretization of response when a graded stimulus is presented. For instance, Lisker and Abramson played a pre-voiced 'b/p' sound. Whether the sound is interpreted as a /b/ or a /p/ depends on the voice onset time (VOT). They found that when smoothly varying the VOT, there was a sharp change (at ~20 ms after the consonant is played) where subjects switched their identification from /b/ to /p/. Furthermore, subjects had a great deal more difficulty differentiating between two sounds in the same category (e.g. a pair of sounds with VOTs of -10 ms and 10 ms, which would both be identified as /b/) than between two sounds in different categories (e.g. a pair with VOTs of 10 ms and 30 ms, which would be identified as a /b/ and a /p/). This shows that some type of categorization scheme is going on. One of the main problems encountered when trying to build a model of speech perception is the so-called 'Lack of Invariance', which could more straightforwardly just be stated as the 'variance'. This term refers to the fact that a single phoneme (e.g. /p/ as in sPeech or Piety) has a great variety of waveforms that map to it, and that the mapping between an acoustic waveform and a phoneme is far from obvious and heavily context-dependent, yet human listeners reliably give the correct result. Even when the context is similar, a waveform will show a great deal of variance due to factors such as the pace of speech, the identity of the speaker and the tone in which he is speaking. So while there is no agreed-upon model of speech perception, the existing models can be split into two classes: Passive Perception and Active Perception.
Passive Perception Models
Passive perception theories generally describe the problem of speech perception in the same way that most sensory signal-processing algorithms do: Some raw input signal goes in, and is processed through a hierarchy where each subsequent step extracts some increasingly abstract signal from the input. One of the early examples of a passive model was distinctive feature theory. The idea is to identify the presence of sets of binary values for certain features, for example 'nasal/oral' or 'vocalic/non-vocalic'. The theory is that a phoneme is interpreted as a binary vector of the presence or absence of these features. These features can be extracted from the spectrogram data. Other passive models, such as those described by Selfridge and Uttley, involve a kind of template-matching, where a hierarchy of processing layers extracts features that are increasingly abstract and invariant to certain irrelevant features (such as the identity of the speaker when classifying phonemes).
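The notion of a phoneme as a binary feature vector can be sketched in a few lines. The feature set and the phoneme table below are strongly simplified and chosen only for illustration; real distinctive-feature systems use many more features, and the step of extracting the features from a spectrogram is not shown.

```python
# Hypothetical, strongly simplified feature set (1 = present, 0 = absent).
FEATURES = ("nasal", "vocalic", "voiced")

# Illustrative assignments for four phonemes, assumed for this sketch.
PHONEMES = {
    "/m/": (1, 0, 1),
    "/b/": (0, 0, 1),
    "/p/": (0, 0, 0),
    "/a/": (0, 1, 1),
}


def hamming(u, v):
    """Number of features in which two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


def classify(observed):
    """Return the phoneme whose feature vector is closest to the observed one."""
    return min(PHONEMES, key=lambda ph: hamming(PHONEMES[ph], observed))


print(classify((1, 0, 1)))   # nasal, non-vocalic, voiced
print(classify((0, 0, 0)))   # oral, non-vocalic, voiceless
```

In this toy table /b/ and /p/ differ in a single feature (voicing), which is the kind of minimal contrast the theory builds on.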
Active Perception Models
Active-perception theories offer an entirely different take on speech perception. These theories make the point that it would be redundant for the brain to have two parallel systems for speech perception and speech production, given that the ability to produce a sound is so closely tied to the ability to identify it. Proponents of these theories argue that it would be wasteful and complicated to maintain two separate databases, one containing the programs to identify phonemes and another to produce them. They argue that speech perception is actually done by attempting to replicate the incoming signal, thus using the same circuits for phoneme production as for identification. The Motor Theory of speech perception (Liberman et al., 1967) states that speech sounds are identified not by any sort of template matching, but by using the speech-generating mechanisms to try to regenerate a copy of the speech signal. It states that phonemes should not be seen as hidden signals within the speech, but as “cues” that the generating mechanism attempts to reproduce in a pre-speech signal. The theory states that speech-generating regions of the brain learn which speech-precursor signals will produce which sounds through the constant feedback loop of always hearing one's own speech. The babbling of babies, it is argued, is a way of learning how to generate these “cue” sounds from pre-motor signals.
A similar idea is proposed in the analysis-by-synthesis model by Stevens and Halle. This describes a generative model which attempts to regenerate a signal similar to the incoming sound. It essentially takes advantage of the fact that speech-generating mechanisms are similar between people, and that the characteristic features that one hears in speech can be reproduced by the listener. As the listener hears the sound, the speech centers attempt to generate the signal that is coming in. Comparators give constant feedback on the quality of the regeneration. The 'units of perception' are therefore not so much abstractions of the incoming sound as pre-motor commands for generating the same speech.
Motor theories took a serious hit when a series of studies on what is now known as Broca's aphasia were published. This condition impairs one's ability to produce speech sounds without impairing the ability to comprehend them, whereas motor theory, taken in its original form, states that production and comprehension are done by the same circuits, so impaired speech production should imply impaired speech comprehension. The existence of Broca's aphasia appears to contradict this prediction.
One of the most influential computational models of speech perception is called TRACE. TRACE is a neural-network-like model, with three layers and a recurrent connection scheme. The first layer extracts features from an input spectrogram in temporal order, basically simulating the cochlea. The second layer extracts phonemes from the feature information, and the third layer extracts words from the phoneme information. The model contains feed-forward (bottom-up) excitatory connections, lateral inhibitory connections, and feedback (top-down) excitatory connections. In this model, each computational unit corresponds to some unit of perception (e.g. the phoneme /p/ or the word "preposterous"). The basic idea is that, based on their input, units within a layer will compete to have the strongest output. The lateral inhibitory connections result in a sort of winner-takes-all circuit, in which the unit with the strongest input will inhibit its neighbors and become the clear winner. The feedback connections allow us to explain the effect of context-dependent comprehension - for example, suppose the phoneme layer, based on its bottom-up inputs, could not decide whether it had heard a /g/ or a /k/, but that the phoneme was preceded by 'an', and followed by 'ry'. Both the /g/ and /k/ units would initially be equally activated, sending inputs up to the word level, which would already contain excited units corresponding to words such as 'anaconda', 'angry', and 'ankle', which had been activated by the preceding 'an'. The excitement of the /g/ or /k/
A cochlear implant (CI) is a surgically implanted electronic device that replaces the mechanical parts of the auditory system by directly stimulating the auditory nerve fibers through electrodes inside the cochlea. Candidates for cochlear implants are people with severe to profound sensorineural hearing loss in both ears and a functioning auditory nervous system. They are used by post-lingually deaf people to regain some comprehension of speech and other sounds as well as by pre-lingually deaf children to enable them to gain spoken language skills. (Diagnosis of hearing loss in newborns and infants is done using otoacoustic emissions, and/or the recording of auditory evoked potentials.) A quite recent evolution is the use of bilateral implants allowing recipients basic sound localization.
Parts of the cochlear implant
The implant is surgically placed under the skin behind the ear. The basic parts of the device include:
- a microphone which picks up sound from the environment
- a speech processor which selectively filters sound to prioritize audible speech and sends the electrical sound signals through a thin cable to the transmitter,
- a transmitter, which is a coil held in position by a magnet placed behind the external ear, and transmits the processed sound signals to the internal device by electromagnetic induction,
- a receiver and stimulator secured in bone beneath the skin, which converts the signals into electric impulses and sends them through an internal cable to electrodes,
- an array of up to 24 electrodes wound through the cochlea, which send the impulses to the nerves in the scala tympani and then directly to the brain through the auditory nerve system
Signal processing for cochlear implants
In normal hearing subjects, the primary information carrier for speech signals is the envelope, whereas for music, it is the fine structure. This is also relevant for tonal languages, like Mandarin, where the meaning of words depends on their intonation. It was also found that interaural time delays coded in the fine structure determine where a sound is heard from rather than interaural time delays coded in the envelope, although it is still the speech signal coded in the envelope that is perceived.
The speech processor in a cochlear implant transforms the microphone input signal into a parallel array of electrode signals destined for the cochlea. Algorithms for the optimal transfer function between these signals are still an active area of research. The first cochlear implants were single-channel devices. The raw sound was band-pass filtered to include only the frequency range of speech, then modulated onto a 16 kHz wave to allow the electrical signal to couple to the nerves. This approach was able to provide very basic hearing, but was extremely limited in that it could not take advantage of the frequency-location map of the cochlea.
The advent of multi-channel implants opened the door to try a number of different speech-processing strategies to facilitate hearing. These can be roughly divided into Waveform and Feature-Extraction strategies.
Waveform strategies generally involve applying a non-linear gain to the sound (an input audio signal with a ~30 dB dynamic range must be compressed into an electrical signal with just a ~5 dB dynamic range), and passing it through parallel filter banks. The first waveform strategy to be tried was the Compressed Analog approach. In this system, the raw audio is initially filtered with a gain-controlled amplifier (the gain control reduces the dynamic range of the signal). The signal is then passed through parallel band-pass filters, and the output of these filters goes on to stimulate electrodes at their appropriate locations.
A problem with the Compressed Analog approach was that there was a strong interaction effect between adjacent electrodes. If electrodes driven by two filters happened to be stimulating at the same time, the superimposed stimulation could cause unwanted distortion in the signals coming from hair cells that were within range of both of these electrodes. The solution to this was the Continuous Interleaved Sampling approach, in which the electrodes driven by adjacent filters stimulate at slightly different times. This eliminates the interference effect between nearby electrodes, but introduces the problem that, due to the interleaving, temporal resolution suffers.
Feature-extraction strategies focus less on transmitting filtered versions of the audio signal and more on extracting more abstract features of the signal and transmitting them to the electrodes. The first feature-extraction strategies looked for the formants (frequencies with maximum energy) in speech. In order to do this, they would apply wide band filters (e.g. 270 Hz low-pass for F0 - the base formant, 300 Hz-1 kHz for F1, and 1 kHz-4 kHz for F2), then calculate the formant frequency from the zero-crossings of each of these filter outputs, and the formant amplitude from the envelope of the signal from each filter. Only electrodes corresponding to these formant frequencies would be activated. The main limitation of this approach was that formants primarily identify vowels, and consonant information, which primarily resides in higher frequencies, was poorly transmitted. The MPEAK system later improved on this design by incorporating high-frequency filters which could better simulate unvoiced sounds (consonants) by stimulating high-frequency electrodes, and formant frequency electrodes at random intervals.
Currently, the leading strategy is the SPEAK system, which combines characteristics of Waveform and Feature-Detection strategies. In this system, the signal passes through a parallel array of 20 band-pass filters. The envelope is extracted from each of these, several of the most powerful frequencies are selected (how many depends on the shape of the spectrum), and the rest are discarded. This is known as an "n-of-m" strategy. The amplitudes of the selected channels are then logarithmically compressed to adapt the mechanical signal range of sound to the much narrower electrical signal range of hair cells.
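The selection and compression steps of an "n-of-m" strategy can be illustrated with a minimal Python sketch. It is not the SPEAK algorithm itself: the function name `n_of_m`, the fixed number of selected channels, the normalized output range 0..1 and the `floor` parameter are all assumptions made for illustration (in SPEAK the number of selected channels depends on the shape of the spectrum).

```python
import numpy as np

def n_of_m(envelopes, n_select=6, floor=1e-6):
    """Keep the n largest of m channel envelopes and log-compress them.

    envelopes : non-negative envelope amplitudes, one per band-pass channel.
    n_select  : number of channels to keep (assumption: fixed n).
    floor     : smallest amplitude ratio that still maps above zero (assumption).
    Returns an array with one value per channel in the range 0..1;
    channels that were not selected are set to 0.
    """
    envelopes = np.asarray(envelopes, dtype=float)
    out = np.zeros_like(envelopes)
    peak = envelopes.max()
    if peak <= 0:
        return out  # silence: no channel is stimulated

    # Indices of the n_select largest envelopes
    selected = np.argsort(envelopes)[-n_select:]

    # Logarithmic compression relative to the largest envelope:
    # ratio 1 maps to 1, ratio 'floor' (or smaller) maps to 0
    ratio = np.clip(envelopes / peak, floor, 1.0)
    compressed = 1.0 - np.log10(ratio) / np.log10(floor)

    out[selected] = compressed[selected]
    return out

# Example: 5 channels, keep the 2 strongest
example_env = np.array([0.1, 1.0, 0.5, 0.01, 0.2])
example_out = n_of_m(example_env, n_select=2)
```

In the example, only the channels with envelopes 1.0 and 0.5 receive a non-zero stimulation value; because of the logarithmic mapping, halving the amplitude reduces the output only slightly.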
On its newest implants, the company Cochlear uses 3 microphones instead of one. The additional information is used for beam-forming, i.e. extracting more information from sound coming from straight ahead. This can improve the signal-to-noise ratio when talking to other people by up to 15dB, thereby significantly enhancing speech perception in noisy environments.
Integration CI – Hearing Aid
Preservation of low-frequency hearing after cochlear implantation is possible with careful surgical technique and with careful attention to electrode design. For patients with remaining low-frequency hearing, the company MedEl offers a combination of a cochlear implant for the higher frequencies, and a classical hearing aid for the lower frequencies. This system, called EAS for electric-acoustic stimulation, uses a lead of 18 mm, compared to 31.5 mm for the full CI. (The length of the cochlea is about 36 mm.) This results in a significant improvement of music perception, and improved speech recognition for tonal languages.
For high frequencies, the human auditory system uses only tonotopic coding for information. For low frequencies, however, temporal information is also used: the auditory nerve fires synchronously with the phase of the signal. In contrast, the original CIs only used the power spectrum of the incoming signal. In its new models, MedEl incorporates the timing information for low frequencies, which it calls fine structure, in determining the timing of the stimulation pulses. This improves music perception, and speech perception for tonal languages like Mandarin.
Mathematically, the envelope and the fine structure of a signal can be elegantly obtained with the Hilbert Transform (see Figure). The corresponding Python code is available online (T. Haslwanter, 2012, "Hilbert Transformation", listed in the references).
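As a rough illustration of this decomposition, the following sketch computes the analytic signal with NumPy's FFT and splits it into envelope and fine structure. It is a minimal sketch and not the code referenced above; the function names and the test signal (a 1 kHz carrier with a 5 Hz amplitude modulation) are assumptions chosen for the example.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real signal, computed via the FFT.

    Negative frequencies are set to zero and positive frequencies are
    doubled, so that the real part of the result equals x and the
    imaginary part is the Hilbert transform of x.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # DC component is kept once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # Nyquist component is kept once
        weights[1:n // 2] = 2.0           # positive frequencies are doubled
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * weights)

def envelope_and_fine_structure(x):
    """Split a signal into envelope (magnitude) and fine structure (cosine of phase)."""
    z = analytic_signal(x)
    return np.abs(z), np.cos(np.angle(z))

# Example signal (assumed parameters): amplitude-modulated tone, 1 s at 8 kHz
rate = 8000
t = np.arange(rate) / rate
modulation = 1 + 0.5 * np.sin(2 * np.pi * 5 * t)
x = modulation * np.cos(2 * np.pi * 1000 * t)

envelope, fine_structure = envelope_and_fine_structure(x)
```

For this example the recovered envelope follows the slow 5 Hz modulation, while the fine structure is the 1 kHz carrier with unit amplitude; their product reproduces the original signal.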
The number of electrodes available is limited by the size of the electrode (and the resulting charge and current densities), and by the current spread along the endolymph. To increase the frequency specificity, one can stimulate two adjacent electrodes. Subjects report perceiving this as a single tone at a frequency intermediate to the two electrodes.
Simulation of a cochlear implant
Sound processing in cochlear implants is still the subject of a lot of research, and one of the major product differentiations between the manufacturers. However, the basic sound processing is rather simple and can be implemented to gain an impression of the quality of sound perceived by patients using a cochlear implant. The first step in the process is to sample some sound and analyze its frequency content. Then a time window is selected, during which we want to find the stimulation strengths of the CI electrodes. There are two ways to achieve that: i) through the use of linear filters (see Gammatone filters); or ii) through the calculation of the power spectrum (see Spectral Analysis).
Cochlear implants and Magnetic Resonance Imaging
With more than 150 000 implantations worldwide, Cochlear Implants (CIs) have now become a standard method for treating severe to profound hearing loss. As the benefits of CIs become more evident, payers are becoming more willing to support them, and due to the screening programs of newborns in most industrialized nations, many patients get CIs in infancy and will likely continue to have them throughout their lives. Some of them may require diagnostic imaging with magnetic resonance imaging (MRI) at some point in their lives. For large segments of the population, including patients suffering from stroke, back pain or headache, MRI has become a standard method for diagnosis. MRI uses pulses of magnetic fields to generate images. Most current MRI machines work with magnetic fields of 1.5 Tesla, although devices from 0.2 to 4.0 Tesla are common, and the radiofrequency power can peak as high as 6 kW in a 1.5 Tesla machine.
Cochlear implants have historically been thought to be incompatible with MRI at magnetic fields higher than 0.2 T. The external parts of the device always have to be removed. There are different regulations for the internal parts of the device. Current US Food and Drug Administration (FDA) guidelines allow limited use of MRI after CI implantation. The Pulsar and Sonata (MED-EL Corp, Innsbruck, Austria) devices are approved for 0.2 T MRI with the magnet in place. The Hi-res 90K (Advanced Bionics Corp, Sylmar, CA, USA) and the Nucleus Freedom (Cochlear Americas, Englewood, CO, USA) are approved for up to 1.5 T MRI after surgical removal of the internal magnet. Each removal and replacement of the magnet can be done using a small incision under local anesthesia, but the procedure is likely to weaken the pocket of the magnet and to risk infection of the patient.
Cadaver studies have shown that there is a risk that the magnet may be displaced from the internal device in a 1.5 T MRI scanner. However, the risk could be eliminated when a compression dressing was applied. Nevertheless, the CI produces an artifact that could potentially reduce the diagnostic value of the scan. The smaller the patient's head, the larger the artifact is relative to it, which might be particularly challenging for MRI scans of children. A recent study found that the artifact around the area of the CI had a mean anterior-posterior dimension of 6.6 +/- 1.5 cm (mean +/- standard deviation) and a left-right dimension averaging 4.8 +/- 1.0 cm (Crane et al., 2010).
Computer Simulations of the Auditory System
Working with Sound
Audio signals can be stored in a variety of formats. They can be uncompressed or compressed, and the encoding can be open or proprietary. On Windows systems, the most common format is the WAV-format. It contains a header with information about the number of channels, sample rate, bits per sample etc. This header is followed by the data themselves. The usual bitstream encoding is the linear pulse-code modulation (LPCM) format.
Many programming languages provide commands for reading and writing WAV-files. When working with data in other formats, you have two options:
- You can either convert them into WAV-format, and go on from there. A very comprehensive free cross-platform solution to record, convert and stream audio and video is ffmpeg (http://www.ffmpeg.org/).
- Or you can obtain special program modules for reading/writing the desired format.
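As an illustration of reading and writing WAV-files, the following minimal sketch uses only Python's standard-library `wave` module together with NumPy. It assumes 16-bit LPCM data; the helper names `write_wav` and `read_wav` and the file name `tone.wav` are hypothetical and chosen only for this example.

```python
import os
import tempfile
import wave
import numpy as np

def write_wav(filename, data, rate):
    """Write a mono signal with values in the range -1..1 as 16-bit LPCM."""
    samples = np.clip(np.asarray(data, dtype=float), -1.0, 1.0)
    pcm = (samples * 32767).astype('<i2')      # little-endian 16-bit integers
    with wave.open(filename, 'wb') as wav_file:
        wav_file.setnchannels(1)               # mono
        wav_file.setsampwidth(2)               # 2 bytes = 16 bits per sample
        wav_file.setframerate(rate)
        wav_file.writeframes(pcm.tobytes())

def read_wav(filename):
    """Read a 16-bit LPCM WAV-file; returns (rate, data) with one column per channel."""
    with wave.open(filename, 'rb') as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError('this sketch only handles 16-bit files')
        rate = wav_file.getframerate()
        n_channels = wav_file.getnchannels()
        raw = wav_file.readframes(wav_file.getnframes())
    data = np.frombuffer(raw, dtype='<i2').reshape(-1, n_channels)
    return rate, data

# Example: write one second of a 440 Hz tone and read it back
rate_out = 8000
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(rate_out) / rate_out)
wav_path = os.path.join(tempfile.gettempdir(), 'tone.wav')
write_wav(wav_path, tone, rate_out)
rate_in, data_in = read_wav(wav_path)
```

The header information (number of channels, sample rate, bits per sample) is handled by the `wave` module, so only the raw LPCM samples have to be converted to and from a NumPy array.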
Reminder of Fourier Transformations
To transform a continuous function, one uses the Fourier Integral:
$$F(k) = \int_{-\infty}^{\infty} f(t)\, e^{-2\pi i k t}\, dt$$
where $k$ represents frequency. Note that $F(k)$ is a complex value: its absolute value gives us the amplitude of the function, and its phase defines the phase-shift between cosine and sine components.
The inverse transform is given by
$$f(t) = \int_{-\infty}^{\infty} F(k)\, e^{2\pi i k t}\, dk$$
If the data are sampled with a constant sampling frequency and there are $N$ data points,
$$f_\tau = \sum_{n=0}^{N-1} F_n\, e^{2\pi i n \tau / N}$$
The coefficients $F_n$ can be obtained by
$$F_n = \frac{1}{N} \sum_{\tau=0}^{N-1} f_\tau\, e^{-2\pi i n \tau / N}$$
Since there is a discrete, limited number of data points and a discrete, limited number of waves, this transform is referred to as the Discrete Fourier Transform (DFT). The Fast Fourier Transform (FFT) is an efficient algorithm for calculating the DFT; it is fastest when the number of points is a power of 2: $N = 2^m$.
Note that each $F_n$ is a complex number: its magnitude defines the amplitude of the corresponding frequency component in the signal, and the phase of $F_n$ defines the corresponding phase (see illustration). If the signal in the time domain $f(t)$ is real valued, as is the case with most measured data, this puts a constraint on the corresponding frequency components: in that case we have
$$F_n = F_{N-n}^{*}$$
A frequent source of confusion is the question: “Which frequency corresponds to $F_n$?” If there are $N$ data points and the sampling period is $\Delta t$, the frequency is given by
$$\nu_n = \frac{n}{N \cdot \Delta t}$$
In other words, the lowest frequency is $\frac{1}{N \cdot \Delta t}$ [in Hz], while the highest independent frequency is $\frac{1}{2 \Delta t}$ due to the Nyquist-Shannon theorem. Note that in MATLAB, the first return value corresponds to the offset of the function, and the second value to n=1!
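The mapping between the index of a Fourier coefficient and its frequency can be checked numerically. The following sketch uses NumPy; the sampling rate, the signal (a 50 Hz sine with an offset of 0.5) and all variable names are assumptions chosen for the example. Note that `np.fft.fft` returns unnormalized sums, so the result is divided by the number of points here to make the offset appear directly in the first coefficient.

```python
import numpy as np

rate = 1000.0                        # sampling frequency in Hz (assumed)
n_points = 1000                      # number of data points
t = np.arange(n_points) / rate
signal = 0.5 + np.sin(2 * np.pi * 50 * t)

# Fourier coefficients, normalized by the number of points
coeffs = np.fft.fft(signal) / n_points

# Frequency belonging to coefficient n: n / (N * sampling period)
freqs = np.arange(n_points) * rate / n_points

lowest_freq = freqs[1]               # 1 / (N * sampling period)
nyquist_freq = rate / 2              # highest independent frequency

# In Python (0-based indexing) the first coefficient is the offset
offset = coeffs[0].real

# Largest component among the positive frequencies below Nyquist
peak_index = np.argmax(np.abs(coeffs[1:n_points // 2])) + 1
peak_freq = freqs[peak_index]
```

Because the signal is real valued, the coefficients above the Nyquist frequency are the complex conjugates of those below it, which is why only the first half of the spectrum is searched for the peak.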
Spectral Analysis of Biological Signals
Power Spectrum of Stationary Signals
Most FFT functions and algorithms return the complex Fourier coefficients $F_n$. If we are only interested in the magnitude of the contribution at the corresponding frequency, we can obtain this information by
$$P_n = F_n \cdot F_n^{*} = |F_n|^2$$
This is the power spectrum of our signal, and tells us how big the contribution of the different frequencies is.
Power Spectrum of Non-stationary Signals
Often one has to deal with signals that are changing their characteristics over time. In that case, one wants to know how the power spectrum changes with time. The simplest way is to take only a short segment of data at a time, and calculate the corresponding power spectrum. This approach is called Short Time Fourier Transform (STFT). However in that case edge effects can significantly distort the signals, since we are assuming that our signal is periodic.
To reduce edge artifacts, the signals can be filtered, or "windowed". An example of such a window is shown in the figure above. While some windows provide better frequency resolution (e.g. the rectangular window), others exhibit fewer artifacts such as spectral leakage (e.g. the Hanning window). For a selected section of the signal, the data resulting from windowing are obtained by multiplying the signal $x$ with the window $w$, sample by sample (left Figure):
$$x_w(t_i) = x(t_i) \cdot w(t_i)$$
The right figure above shows an example of how cutting a signal, and applying a window to it, can affect the spectral power distribution. (The corresponding Python code is listed in the references: T. Haslwanter, 2012, "Short Time Fourier Transform".) Note that decreasing the width of the sample window increases the width of the corresponding power spectrum!
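The effect of windowing on spectral leakage can be reproduced with a few lines of NumPy. This is a minimal sketch and not the code referenced above: the test signal (a 10.5 Hz sine that does not fit an integer number of cycles into the segment), the 50 Hz limit used to define "far away" frequencies, and all names are assumptions made for illustration.

```python
import numpy as np

def power_spectrum(segment, window):
    """Power spectrum of one windowed segment (positive frequencies only)."""
    spectrum = np.fft.rfft(segment * window)
    return np.abs(spectrum) ** 2

rate = 1000                               # sampling frequency in Hz (assumed)
n = 1000                                  # segment length in samples
t = np.arange(n) / rate
x = np.sin(2 * np.pi * 10.5 * t)          # 10.5 cycles: the segment ends mid-cycle

freqs = np.fft.rfftfreq(n, d=1.0 / rate)
p_rect = power_spectrum(x, np.ones(n))    # rectangular window = plain cutting
p_hann = power_spectrum(x, np.hanning(n)) # Hanning window

# Fraction of the total power that leaks to frequencies far from 10.5 Hz
far = freqs > 50
leak_rect = p_rect[far].sum() / p_rect.sum()
leak_hann = p_hann[far].sum() / p_hann.sum()
```

With the rectangular window, a noticeable part of the power appears at frequencies far away from the sine frequency, whereas the Hanning window suppresses this leakage at the price of a somewhat broader main peak.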
Stimulation strength for one time window
To obtain the power spectrum for one selected time window, the first step is to calculate the power spectrum through the Fast Fourier Transform (FFT) of the time signal. The result is the sound intensity in frequency domain, and the corresponding frequencies. The second step is to concentrate those intensities on a few distinct frequencies ("binning"). The result is a sound signal consisting of a few distinct frequencies - the location of the electrodes in the simulated cochlea. Back conversion into the time domain gives the simulated sound signal for that time window.
The following Python function does sound processing on a given signal.
```python
import numpy as np

def pSpect(data, rate):
    '''Calculation of power spectrum and corresponding frequencies, using a Hamming window'''
    nData = len(data)
    window = np.hamming(nData)
    fftData = np.fft.fft(data*window)
    PowerSpect = fftData * fftData.conj() / nData
    freq = np.arange(nData) * float(rate) / nData
    return (np.real(PowerSpect), freq)

def calc_stimstrength(sound, rate=1000, sample_freqs=[100, 200, 400]):
    '''Calculate the stimulation strength for a given sound'''

    # Calculate the powerspectrum
    Pxx, freq = pSpect(sound, rate)

    # Generate matrix to sum over the requested bins
    num_electrodes = len(sample_freqs)
    sample_freqs = np.hstack((0, sample_freqs))
    average_freqs = np.zeros([len(freq), num_electrodes])
    for jj in range(num_electrodes):
        average_freqs[((freq>sample_freqs[jj]) * (freq<sample_freqs[jj+1])),jj] = 1

    # Calculate the stimulation strength (the square root has to be taken, to get the amplitude)
    StimStrength = np.sqrt(Pxx).dot(average_freqs)

    return StimStrength
```
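The last step described above, the back conversion into the time domain, could be sketched as follows. The function `synthesize_window` is hypothetical and not part of the code above; it assumes that each electrode is represented by a single sine tone at a fixed "electrode frequency", with the stimulation strength as its amplitude.

```python
import numpy as np

def synthesize_window(stim_strength, electrode_freqs, rate, n_samples):
    """Simulated sound for one time window.

    stim_strength   : amplitude for each electrode
    electrode_freqs : frequency [Hz] assigned to each electrode (assumption)
    rate            : sampling rate [Hz]
    n_samples       : length of the time window in samples
    """
    t = np.arange(n_samples) / float(rate)
    # One sine tone per electrode: array of shape (n_electrodes, n_samples)
    tones = np.sin(2 * np.pi * np.outer(electrode_freqs, t))
    # Weighted sum of the tones
    return np.asarray(stim_strength, dtype=float).dot(tones)

# Example: three electrodes, the middle one is not stimulated
simulated = synthesize_window([1.0, 0.0, 0.5], [100, 200, 400],
                              rate=8000, n_samples=8000)
```

Concatenating the simulated segments of consecutive time windows then gives an impression of the sound transmitted by the electrodes, although phase jumps between windows are not handled in this sketch.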
Sound Transduction by Pinna and Outer Ear
The outer ear is divided into two parts: the visible part on the side of the head (the pinna), and the external auditory meatus (outer ear canal) leading to the eardrum, as shown in the figure below. With this structure, the outer ear contributes the "spectral cues" for sound localization, so that people can not only detect and identify a sound, but also localize its source.
The pinna's cone shape enables it to gather sound waves and funnel them into the outer ear canal. On top of that, its various folds make the pinna a resonant cavity which amplifies certain frequencies. Furthermore, the interference effects resulting from the sound reflection caused by the pinna are directionally dependent and will attenuate other frequencies. Therefore, the pinna can be simulated as a filter function applied to the incoming sound, modulating its amplitude and phase spectra.
The resonance of the pinna cavity can be approximated well by 6 normal modes. Among these normal modes, the first mode, which mainly depends on the concha depth (i.e. the depth of the bowl-shaped part of the pinna nearest the ear canal), is the dominant one.
The cancellation of certain frequencies caused by the pinna reflection is called the "pinna notch". As shown in the right figure, sound transmitted by the pinna travels along two paths, a direct path and a longer reflected path. The paths have different lengths and thereby produce phase differences. When the path difference equals half of the wavelength of the incoming sound, the sounds arriving via the direct and the reflected path interfere destructively. Depending on the pinna shape, the notch frequency typically lies in the range from 6 kHz to 16 kHz. The frequency response of the pinna is also directionally dependent, which makes the pinna contribute to the spatial cues for sound localization.
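The condition for destructive interference (path difference equal to half a wavelength) directly gives the notch frequency as the speed of sound divided by twice the path difference. A minimal sketch, assuming a speed of sound of 343 m/s and using hypothetical function names:

```python
SPEED_OF_SOUND = 343.0   # m/s, assumption: air at about 20 degrees Celsius

def first_notch_frequency(path_difference_m, c=SPEED_OF_SOUND):
    """Frequency [Hz] at which the path difference equals half a wavelength."""
    return c / (2.0 * path_difference_m)

def path_difference_for_notch(frequency_hz, c=SPEED_OF_SOUND):
    """Path difference [m] that produces the first notch at a given frequency."""
    return c / (2.0 * frequency_hz)

# Path differences corresponding to the notch range quoted in the text
d_low = path_difference_for_notch(6000.0)     # about 0.029 m
d_high = path_difference_for_notch(16000.0)   # about 0.011 m
```

Under these assumptions, notches between 6 kHz and 16 kHz correspond to path differences of roughly 1 to 3 cm, which is consistent with the dimensions of the pinna.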
Ear Canal Function
The outer ear canal is approximately 25 mm long and 8 mm in diameter, with a tortuous path from the entrance of the canal to the eardrum. The outer ear canal can be modeled as a cylinder closed at one end, which leads to a resonant frequency of around 3 kHz. This way the outer ear canal amplifies sounds in a frequency range important for human speech.
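The resonant frequency of around 3 kHz follows from the quarter-wavelength resonance of a tube that is closed at one end. A minimal sketch, assuming a speed of sound of 343 m/s and ignoring end corrections and the curved shape of the canal (the function name is hypothetical):

```python
SPEED_OF_SOUND = 343.0   # m/s, assumption: air at about 20 degrees Celsius

def closed_tube_resonance(length_m, c=SPEED_OF_SOUND):
    """Lowest resonance [Hz] of a tube closed at one end (quarter-wavelength)."""
    return c / (4.0 * length_m)

canal_resonance = closed_tube_resonance(0.025)   # 25 mm long ear canal
```

For a canal length of 25 mm this gives about 3.4 kHz, in line with the value quoted above.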
Simulation of Outer Ear
Based on the main functions of the outer ear, it is easy to simulate the sound transduction by the pinna and outer ear canal with a filter, or a filter bank, if we know the characteristics of the filter.
Many researchers are working on the simulation of the human auditory system, which includes the simulation of the outer ear. In the following sections, a Pinna-Related Transfer Function model is introduced first, followed by two MATLAB toolboxes developed by Finnish and British research groups, respectively.
Model of Pinna-Related Transfer Function by Spagnol
This part is based entirely on the paper published by S. Spagnol, M. Geronazzo, and F. Avanzini (2010). In order to model the functions of the pinna, Spagnol et al. developed a reconstruction model of the Pinna-Related Transfer Function (PRTF), which is a frequency response characterizing how sound is transduced by the pinna. This model is composed of two distinct filter blocks, accounting for the resonance function and the reflection function of the pinna, respectively, as shown in the figure below.
For the resonance part, two second-order peak filters are placed in parallel. Their parameters include the sampling frequency, the central frequency, and the notch depth. (The explicit transfer functions are given in the cited paper.)
For the reflection part, three second-order notch filters are designed, with parameters including the center frequency, the notch depth, and the bandwidth, each filter accounting for a different spectral notch.
By cascading the three notch filters, placed in series, after the two parallel peak filters, an eighth-order filter is designed to model the PRTF.
By comparing the synthetic PRTF with the original one, as shown in the figures below, Spagnol concluded that the synthesis model for PRTF was overall effective. This model may have missing notches due to the limitation of cutoff frequency. Approximation errors may also be brought in due to the possible presence of non-modeled interfering resonances.
HUTear MATLAB Toolbox
HUTear is a MATLAB Toolbox for auditory modeling developed by the Lab of Acoustics and Audio Signal Processing at Helsinki University of Technology. This open source toolbox is available for download. The structure of the toolbox is shown in the right figure.
In this model, there is a block for “Outer and Middle Ear” (OME) simulation. This OME model is based on the work of Glasberg and Moore. The OME filter is usually a linear filter. The auditory filter is generated taking the "Equal Loudness Curves at 60 dB" (ELC), "Minimum Audible Field" (MAF), or "Minimum Audible Pressure at ear canal" (MAP) correction into account. This model accounts for the outer ear simulation. By specifying different parameters with the "OEMtool", you may compare the MAP IIR approximation and MAP data, as shown in the figure below.
MATLAB Model of the Auditory Periphery (MAP)
MAP is developed by researchers in the Hearing Research Lab at the University of Essex, England. As a computer model of the physiological basis of human hearing, MAP is an open-source code package for testing and developing the model, and is available for download. Its model structure is shown in the right figure.
Within the MAP model, there is the “Outer Middle Ear (OME)” sub-model, allowing the user to test and create an OME model. In this OME model, the function of the outer ear is modeled as a resonance function. The resonances are modeled by two parallel bandpass filters, representing the concha resonance and the outer ear canal resonance, respectively. These two filters are specified by the pass frequency range, gain and order. By adding the output of the resonance filters to the original sound pressure wave, the output of the outer ear model is obtained.
To test the OME model, run the function named “testOME.m”. A figure plotting the external ear resonances and stapes peak displacement will be displayed (as shown in the figure below).
The outer ear, including pinna and outer ear canal, can be simulated as a linear filter, or a filter bank. This reflects its resonance and reflection effect to incoming sound. It is worth noting that since the pinna shape varies from person to person, the model parameters, like the resonant frequencies, depend on the subject.
One aspect not included in the models described above is the Head-Related Transfer Function (HRTF). The HRTF describes how an ear receives a sound from a point sound source in space. It is not introduced here because it goes beyond the effect of the outer ear (pinna and outer ear canal), as it is also influenced by the effects of head and torso. There is plenty of literature on the HRTF for the interested reader (wiki, tutorial 1, 2, reading list for spatial audio research including HRTF).
Simulation of the Inner Ear
The shape and organisation of the basilar membrane means that different frequencies resonate particularly strongly at different points along the membrane. This leads to a tonotopic organisation of the sensitivity to frequency ranges along the membrane, which can be modeled as an array of overlapping band-pass filters known as "auditory filters". The auditory filters are associated with points along the basilar membrane and determine the frequency selectivity of the cochlea, and therefore the listener’s discrimination between different sounds. They are non-linear and level-dependent, and the bandwidth decreases from the base to the apex of the cochlea as the tuning on the basilar membrane changes from high to low frequency. The bandwidth of the auditory filter is called the critical bandwidth, as first suggested by Fletcher (1940). If a signal and masker are presented simultaneously, then only the masker frequencies falling within the critical bandwidth contribute to masking of the signal. The larger the critical bandwidth, the lower the signal-to-noise ratio (SNR) and the more the signal is masked.
Another concept associated with the auditory filter is the "equivalent rectangular bandwidth" (ERB). The ERB shows the relationship between the auditory filter, frequency, and the critical bandwidth. An ERB passes the same amount of energy as the auditory filter it corresponds to and shows how it changes with input frequency. At low sound levels, the ERB is approximated by the following equation according to Glasberg and Moore:
$$\mathrm{ERB} = 24.7 \cdot (4.37\, F + 1)$$
where the ERB is in Hz and F is the centre frequency in kHz.
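The Glasberg and Moore approximation, ERB = 24.7 * (4.37 * F + 1) with F in kHz, is easy to evaluate numerically. A minimal sketch (the function name and the chosen centre frequencies are assumptions for illustration):

```python
import numpy as np

def erb_glasberg_moore(centre_freq_khz):
    """Equivalent rectangular bandwidth [Hz] for a centre frequency given in kHz."""
    return 24.7 * (4.37 * np.asarray(centre_freq_khz, dtype=float) + 1.0)

# ERB at a few centre frequencies between 100 Hz and 8 kHz
centre_freqs_khz = np.array([0.1, 0.5, 1.0, 4.0, 8.0])
erbs_hz = erb_glasberg_moore(centre_freqs_khz)
```

At 1 kHz the formula gives an ERB of about 133 Hz, and the bandwidth grows roughly in proportion to the centre frequency at higher frequencies.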
One filter type used to model the auditory filters is the "gammatone filter". It provides a simple linear filter, which is therefore easy to implement, for describing the movement of one location of the basilar membrane for a given sound input. Linear filters are popular for modeling different aspects of the auditory system. In general, they are IIR-filters (infinite impulse response) incorporating feedforward and feedback, which are defined by
$$\sum_{j=0}^{m} a_{j+1}\, y(k-j) = \sum_{i=0}^{n} b_{i+1}\, x(k-i)$$
where $x$ is the input, $y$ the output, and a1=1. In other words, the coefficients ai and bj uniquely determine this type of filter. The feedback-character of these filters can be made more obvious by re-shuffling the equation
$$y(k) = b_1 x(k) + b_2 x(k-1) + \dots + b_{n+1} x(k-n) - \left( a_2 y(k-1) + \dots + a_{m+1} y(k-m) \right)$$
(In contrast, FIR-filters, or finite impulse response filters, only involve feedforward: for them $a_i = 0$ for i>1.)
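To make the feedforward and feedback structure concrete, the following sketch implements such a linear filter directly in Python: each output sample is the weighted sum of the current and past inputs (coefficients b) minus the weighted sum of the past outputs (coefficients a). The function name is an assumption; note that Python indexing starts at 0, so `a[0]` and `b[0]` correspond to a1 and b1 in the text. For real applications an optimized library routine would normally be used instead of this explicit loop.

```python
import numpy as np

def linear_filter(b, a, x):
    """Apply an IIR/FIR filter by evaluating its difference equation.

    b : feedforward coefficients (b[0] corresponds to b1 in the text)
    a : feedback coefficients    (a[0] corresponds to a1 in the text)
    x : input signal
    """
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    # Normalize so that the first feedback coefficient equals 1
    b = b / a[0]
    a = a / a[0]

    x = np.asarray(x, dtype=float)
    y = np.zeros(len(x))
    for k in range(len(x)):
        acc = 0.0
        # Feedforward part: current and past input samples
        for i in range(len(b)):
            if k - i >= 0:
                acc += b[i] * x[k - i]
        # Feedback part: past output samples
        for j in range(1, len(a)):
            if k - j >= 0:
                acc -= a[j] * y[k - j]
        y[k] = acc
    return y

impulse = np.array([1.0, 0.0, 0.0, 0.0])

# FIR example: two-point moving average (no feedback)
fir_response = linear_filter([0.5, 0.5], [1.0], impulse)

# IIR example: first-order feedback, the response never reaches exactly zero
iir_response = linear_filter([1.0], [1.0, -0.5], impulse)
```

The FIR response ends after two samples, whereas the IIR response decays as 1, 0.5, 0.25, ... and in principle continues forever, which is the origin of the names "finite" and "infinite" impulse response.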
Linear filters cannot account for nonlinear aspects of the auditory system. They are nevertheless used in a variety of models of the auditory system. The gammatone impulse response is given by
$$g(t) = a\, t^{n-1}\, e^{-2\pi b t} \cos(2\pi f t + \phi)$$
where $f$ is the frequency, $\phi$ is the phase of the carrier, $a$ is the amplitude, $n$ is the filter's order, $b$ is the filter's bandwidth, and $t$ is time.
This is a sinusoid with an amplitude envelope which is a scaled gamma distribution function.
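The gammatone impulse response, a cosine carrier multiplied by an envelope of the form t^(n-1) * exp(-2*pi*b*t), can be generated with a few lines of NumPy. This is a minimal sketch, not one of the toolboxes mentioned below. The default order of 4 and the default bandwidth of 1.019 times the ERB of the centre frequency are common choices in the literature, but are assumptions here, as are the function name and the chosen duration.

```python
import numpy as np

def gammatone_ir(freq, rate, duration=0.05, order=4, bandwidth=None,
                 amplitude=1.0, phase=0.0):
    """Gammatone impulse response.

    freq      : centre frequency of the carrier [Hz]
    rate      : sampling rate [Hz]
    duration  : length of the impulse response [s]
    order     : filter order n
    bandwidth : bandwidth parameter b [Hz]; if None, 1.019 * ERB(freq) is used
                (assumption: a common choice, not prescribed by the text)
    """
    if bandwidth is None:
        erb = 24.7 * (4.37 * freq / 1000.0 + 1.0)   # Glasberg and Moore, freq in Hz
        bandwidth = 1.019 * erb
    t = np.arange(int(round(duration * rate))) / float(rate)
    envelope = amplitude * t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t)
    return t, envelope * np.cos(2 * np.pi * freq * t + phase)

# Example: impulse response of a 1 kHz channel, sampled at 16 kHz
t_ir, g_ir = gammatone_ir(1000.0, 16000)
ir_spectrum = np.abs(np.fft.rfft(g_ir))
ir_freqs = np.fft.rfftfreq(len(g_ir), d=1.0 / 16000)
ir_peak_freq = ir_freqs[np.argmax(ir_spectrum)]
```

Convolving a sound with this impulse response (for example with `np.convolve`) gives the simulated output of the corresponding location on the basilar membrane; the magnitude spectrum of the impulse response peaks close to the chosen centre frequency.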
Variations and improvements of the gammatone model of auditory filtering include the gammachirp filter, the all-pole and one-zero gammatone filters, the two-sided gammatone filter, and filter cascade models, and various level-dependent and dynamically nonlinear versions of these.
For computer simulations, efficient implementations of gammatone models are available for Matlab and for Python.
When working with gammatone filters, we can elegantly exploit Parseval's Theorem to determine the energy in a given frequency band:
$$\int_{-\infty}^{\infty} |f(t)|^2\, dt = \int_{-\infty}^{\infty} |F(k)|^2\, dk$$
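Parseval's Theorem states that the energy of a signal is the same whether it is calculated in the time domain or in the frequency domain. For sampled data this can be checked numerically, and the energy in a frequency band can then be obtained by summing only over the corresponding Fourier coefficients. The following sketch uses NumPy; the random test signal, the sampling rate and the band limits are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 8000                               # sampling rate in Hz (assumed)
x = rng.standard_normal(1024)             # arbitrary test signal

X = np.fft.fft(x)

# Energy in the time domain and in the frequency domain
# (np.fft.fft is unnormalized, hence the division by the number of points)
energy_time = np.sum(x ** 2)
energy_freq = np.sum(np.abs(X) ** 2) / len(x)

# Energy in the band from 500 Hz to 1000 Hz (positive and negative frequencies)
freqs = np.fft.fftfreq(len(x), d=1.0 / rate)
in_band = (np.abs(freqs) >= 500) & (np.abs(freqs) < 1000)
band_energy = np.sum(np.abs(X[in_band]) ** 2) / len(x)
```

Here `energy_time` and `energy_freq` agree up to rounding errors, and `band_energy` is the part of the total energy that falls into the selected band.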
- NeurOreille and authors (2010). "Journey into the world of hearing". http://www.cochlea.org/en/spe.
- Lisker L., Abramson. "The voicing dimension: Some experiments in comparative phonetics". (1970)
- Selfridge, O.G. "Pandemonium: a paradigm for learning". (1959)
- Uttley, AM. "The transmission of information and effects of local feedback in theoretical and neural networks." (1966)
- Liberman, A.M., Mattingly, IG, Turvey, MT. "Language codes and memory codes" (1967)
- Stevens, KN, Halle, M. "Remarks on analysis by synthesis and distinctive features"
- Hickok, G. The role of mirror neurons in speech and language processing. (2010)
- McClelland, JL. The TRACE Model of Speech Perception (1986)
- T. Haslwanter (2012). "Hilbert Transformation [Python]". private communications. http://work.thaslwanter.at/CSS/Code/CI_hilbert.py.
- Crane BT, Gottschalk B, Kraut M, Aygun N, Niparko JK (2010) Magnetic resonance imaging at 1.5 T after cochlear implantation. Otol Neurotol 31:1215-1220
- T. Haslwanter (2012). "Short Time Fourier Transform [Python]". private communications. http://work.thaslwanter.at/CSS/Code/stft.py.
- Semple, M.N. (1998), "Auditory perception: Sounds in a virtual world", Nature (Nature Publishing Group) 396 (6713): 721-724, doi:10.1038/25447
- Shaw, E.A.G. (1997), "Acoustical features of the human ear", Binaural and spatial hearing in real and virtual environments (Mahwah, NJ: Lawrence Erlbaum) 25: 47
- Federico Avanzini (2007-2008), Algorithms for sound and music computing, Course Material of Informatica Musicale (http://www.dei.unipd.it/~musica/IM06/Dispense06/4_soundinspace.pdf), pp. 432
- Spagnol, S. and Geronazzo, M. and Avanzini, F. (2010), "Structural modeling of pinna-related transfer functions", In Proc. Int. Conf. on Sound and Music Computing (SMC 2010) (Barcelona): 422-428
- S. J. Orfanidis, ed., Introduction To Signal Processing. Prentice Hall, 1996.
- U. Zölzer, ed., Digital Audio Effects. New York, NY, USA: J.Wiley & Sons, 2002.
- Glasberg, B.R. and Moore, B.C.J. (1990), "Derivation of auditory filter shapes from notched-noise data", Hearing research (Elsevier) 47 (1-2): 103-138
- Munkong, R. (2008), IEEE Signal Processing Magazine 25 (3): 98--117, doi:10.1109/MSP.2008.918418, Bibcode: 2008ISPM...25...98M
- Moore, B. C. J. (1998). Cochlear hearing loss. London: Whurr Publishers Ltd.. ISBN 0585122563.
- Moore, B. C. J. (1986), "Parallels between frequency selectivity measured psychophysically and in cochlear mechanics", Scand. Audio Suppl. (25): 129–52
- R. F. Lyon, A. G. Katsiamis, E. M. Drakakis (2010). "History and Future of Auditory Filter Models". Proc. ISCAS. IEEE. http://research.google.com/pubs/archive/36895.pdf.
- T. Haslwanter (2011). "Gammatone Toolbox [Python]". private communications. http://work.thaslwanter.at/CSS/Code/GammaTones.py.
The main function of the balance system, or vestibular system, is to sense head movements, especially involuntary ones, and counter them with reflexive eye movements and postural adjustments that keep the visual world stable and keep us from falling. An excellent, more extensive article on the vestibular system is available on Scholarpedia. An extensive review of our current knowledge about the vestibular system can be found in "The Vestibular System: a Sixth Sense" by J Goldberg et al.
Anatomy of the Vestibular System
Together with the cochlea, the vestibular system is carried by a system of tubes called the membranous labyrinth. These tubes are lodged within the cavities of the bony labyrinth located in the inner ear. A fluid called perilymph fills the space between the bone and the membranous labyrinth, while another one called endolymph fills the inside of the tubes spanned by the membranous labyrinth. These fluids have a unique ionic composition suited to their function in regulating the electrochemical potential of hair cells, which are, as we will see later, the transducers of the vestibular system. The electric potential of endolymph is about 80 mV more positive than that of perilymph.
Since our movements consist of a combination of linear translations and rotations, the vestibular system is composed of two main parts: The otolith organs, which sense linear accelerations and thereby also give us information about the head’s position relative to gravity, and the semicircular canals, which sense angular accelerations.
|Human bony labyrinth (Computed tomography 3D)||Internal structure of the human labyrinth|
The otolith organs of both ears are located in two membranous sacs called the utricle and the saccule, which primarily sense horizontal and vertical accelerations, respectively. Each utricle has about 30'000 hair cells, and each saccule about 16'000. The otoliths are located at the central part of the labyrinth, also called the vestibule of the ear. Both utricle and saccule have a thickened portion of the membrane called the macula. A gelatinous membrane called the otolithic membrane sits atop the macula, and microscopic stones made of calcium carbonate crystal, the otoliths, are embedded on the surface of this membrane. On the opposite side, hair cells embedded in supporting cells project into this membrane.
Each ear has three semicircular canals. They are half circular, interconnected membranous tubes filled with endolymph and can sense angular accelerations in the three orthogonal planes. The radius of curvature of the human horizontal semicircular canal is 3.2 mm.
The canals on each side are approximately orthogonal to each other. The orientations of the on-directions of the canals on the right side are:
(The axes are oriented such that the positive x-,y-,and z-axis point forward, left, and up, respectively. The horizontal plane is defined by Reid's line, the line connecting the lower rim of the orbita and the center of the external auditory canal. And the directions are such that a rotation about that vector, according to the right-hand-rule, excites the corresponding canal.) The anterior and posterior semicircular canals are approximately vertical, and the horizontal semicircular canals approximately horizontal.
Each canal presents a dilatation at one end, called the ampulla. Each membranous ampulla contains a saddle-shaped ridge of tissue, the crista, which extends across it from side to side. It is covered by neuroepithelium, with hair cells and supporting cells. From this ridge rises a gelatinous structure, the cupula, which extends to the roof of the ampulla immediately above it, dividing the interior of the ampulla into two approximately equal parts.
The sensors within both the otolith organs and the semicircular canals are the hair cells. They are responsible for the transduction of a mechanical force into an electrical signal and thereby build the interface between the world of accelerations and the brain.
Hair cells have a tuft of stereocilia that project from their apical surface. The thickest and longest cilium of the bundle is the kinocilium. Stereocilia deflection is the mechanism by which all hair cells transduce mechanical forces. Stereocilia within a bundle are linked to one another by protein strands, called tip links, which span from the side of a taller stereocilium to the tip of its shorter neighbor in the array. Under deflection of the bundle, the tip links act as gating springs that open and close mechanically sensitive ion channels. Afferent nerve excitation works as follows: when the cilia are deflected toward the kinocilium, the gates open and cations, including potassium ions from the potassium-rich endolymph, flow in, and the membrane potential of the hair cell becomes more positive (depolarization). The hair cell itself does not fire action potentials. The depolarization activates voltage-sensitive calcium channels at the basolateral aspect of the cell. Calcium ions then flow in and trigger the release of neurotransmitters, mainly glutamate, which diffuse across the narrow space between the hair cell and a nerve terminal, where they bind to receptors and thus increase the firing rate of action potentials in the nerve. Conversely, bending of the stereocilia away from the kinocilium hyperpolarizes the hair cell and decreases the firing rate (afferent nerve inhibition). Because the hair cells are chronically leaking calcium, the vestibular afferent nerve fires actively at rest, which allows movements in both directions to be sensed (as an increase or a decrease of the firing rate). Hair cells are very sensitive and respond extremely quickly to stimuli. The quickness of the hair cell response may in part be due to the fact that they must be able to release neurotransmitter reliably in response to a threshold receptor potential of only 100 µV or so.
Regular and Irregular Afferents
While the afferents of the auditory system are fairly homogeneous, those of the vestibular system can be broadly separated into two groups: "regular units" and "irregular units". Regular units have approximately constant inter-spike intervals, and their firing rate is approximately proportional to the sensed displacement. In contrast, the inter-spike intervals of irregular units are much more variable, and their discharge rate increases with increasing stimulus frequency; they can thus act as event detectors at high frequencies. Regular and irregular units also differ in their location, morphology and innervation.
Peripheral Signal Transduction
Transduction of Linear Acceleration
The hair cells of the otolith organs are responsible for the transduction of a mechanical force induced by linear acceleration into an electrical signal. Since this force results from the combination of gravity and linear accelerations of the head,

$$\vec{F} = \vec{F}_g + \vec{F}_{inertial} = m \left( \vec{g} - \frac{d^2 \vec{x}}{dt^2} \right)$$

it is sometimes referred to as the gravito-inertial force. Here $m$ is the accelerated mass, $\vec{g}$ the gravitational acceleration, and $\vec{x}$ the position of the head. The mechanism of transduction works roughly as follows: the otoconia, calcium carbonate crystals in the top layer of the otolithic membrane, have a higher specific density than the surrounding materials. Thus a linear acceleration leads to a displacement of the otoconia layer relative to the connective tissue. The displacement is sensed by the hair cells. The bending of the hairs then depolarizes or hyperpolarizes the cell and induces afferent excitation or inhibition.
While each of the three semicircular canals senses only a one-dimensional component of rotational acceleration, linear acceleration may produce a complex pattern of inhibition and excitation across the maculae of both the utricle and saccule. The saccule is located on the medial wall of the vestibule of the labyrinth in the spherical recess and has its macula oriented vertically. The utricle is located above the saccule in the elliptical recess of the vestibule, and its macula is oriented roughly horizontally when the head is upright. Within each macula, the kinocilia of the hair cells are oriented in all possible directions.
Therefore, under linear acceleration with the head in the upright position, the saccular macula senses acceleration components in the vertical plane, while the utricular macula encodes acceleration in all directions in the horizontal plane. The otolithic membrane is soft enough that each hair cell is deflected in proportion to the local force direction. If $\vec{n}$ denotes the direction of maximum sensitivity or on-direction of the hair cell, and $\vec{F}$ the gravito-inertial force, the stimulation by static accelerations is given by

$$stim_{otolith} = \vec{F} \cdot \vec{n}$$
The direction and magnitude of the total acceleration is then determined from the excitation pattern on the otolith maculae.
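The projection of the gravito-inertial force onto the on-direction of a hair cell can be tried out numerically. The following is only a minimal sketch: the unit mass, the three on-directions, and the helper names `gravity_in_head_frame` and `otolith_stimulation` are illustrative assumptions, not measured values or an established API. The head is static, so the force is due to gravity alone, expressed in head coordinates (x forward, y left, z up).

```python
import numpy as np

G = 9.81  # gravitational acceleration [m/s^2]

def gravity_in_head_frame(pitch_deg):
    """Gravity vector in head coordinates (x forward, y left, z up)
    for a head pitched nose-down by pitch_deg degrees."""
    alpha = np.deg2rad(pitch_deg)
    return np.array([G * np.sin(alpha), 0.0, -G * np.cos(alpha)])

def otolith_stimulation(force, on_direction):
    """Projection of the gravito-inertial force onto a hair cell's on-direction."""
    n = np.asarray(on_direction, dtype=float)
    n = n / np.linalg.norm(n)  # normalize, so that only the direction matters
    return float(np.dot(force, n))

mass = 1.0  # hypothetical unit mass [kg]; only relative values matter here

# Three hypothetical utricular hair cells with horizontal on-directions
on_directions = {"forward": [1, 0, 0], "backward": [-1, 0, 0], "left": [0, 1, 0]}

for pitch in (0.0, 30.0):
    F = mass * gravity_in_head_frame(pitch)  # static case: no head acceleration
    stim = {name: otolith_stimulation(F, n) for name, n in on_directions.items()}
    print(f"pitch = {pitch:4.1f} deg:", stim)
```

With the head upright, none of the horizontally oriented cells is stimulated; when the head pitches forward, cells with opposite on-directions are stimulated with opposite signs, which is the kind of excitation pattern from which the direction of the total acceleration can be read out.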
Transduction of Angular Acceleration
The three semicircular canals are responsible for the sensing of angular accelerations. When the head accelerates in the plane of a semicircular canal, inertia causes the endolymph in the canal to lag behind the motion of the membranous canal. Relative to the canal walls, the endolymph effectively moves in the direction opposite to the head movement, pushing and distorting the elastic cupula. Hair cells are arrayed beneath the cupula on the surface of the crista and have their stereocilia projecting into the cupula. They are therefore excited or inhibited depending on the direction of the acceleration.
This facilitates the interpretation of canal signals: if the orientation of a semicircular canal is described by the unit vector $\vec{n}$, the stimulation of the canal is proportional to the projection of the angular velocity $\vec{\omega}$ onto this canal:

$$stim_{canal} = \vec{\omega} \cdot \vec{n}$$
The horizontal semicircular canal is responsible for sensing accelerations around a vertical axis, i.e. the neck. The anterior and posterior semicircular canals detect rotations of the head in the sagittal plane, as when nodding, and in the frontal plane, as when cartwheeling.
Within a given crista, all the hair cells are oriented in the same direction. The semicircular canals of both sides also work as a push-pull system. For example, because the right and the left horizontal canal cristae are “mirror opposites” of each other, they always have opposing responses to horizontal rotations of the head. Rapid rotation of the head toward the left causes depolarization of the hair cells in the left horizontal canal's ampulla and increased firing of action potentials in the neurons that innervate the left horizontal canal. That same leftward rotation of the head simultaneously causes a hyperpolarization of the hair cells in the right horizontal canal's ampulla and decreases the rate of firing of action potentials in the neurons that innervate the horizontal canal of the right ear. Because of this mirror configuration, not only do the right and left horizontal canals form a push-pull pair, but so do the right anterior canal with the left posterior canal (RALP), and the left anterior with the right posterior (LARP).
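The push-pull principle can be illustrated by taking the stimulation of a canal as the projection of the head angular velocity onto the on-direction of that canal. The sketch below uses idealized, hypothetical on-directions that point exactly along the vertical axis; real canals are only approximately horizontal, so these vectors are assumptions chosen for clarity.

```python
import numpy as np

# Idealized, hypothetical on-directions (x forward, y left, z up).
# By the right-hand rule, a rotation about +z turns the head to the left.
n_left_horizontal = np.array([0.0, 0.0, 1.0])    # excited by leftward rotation
n_right_horizontal = np.array([0.0, 0.0, -1.0])  # excited by rightward rotation

def canal_stimulation(omega, n):
    """Projection of the head angular velocity onto the canal's on-direction."""
    return float(np.dot(omega, n))

# Head turn to the left with 100 deg/s, expressed in rad/s
omega_left_turn = np.array([0.0, 0.0, np.deg2rad(100.0)])

stim_left = canal_stimulation(omega_left_turn, n_left_horizontal)
stim_right = canal_stimulation(omega_left_turn, n_right_horizontal)
print("left horizontal canal: ", stim_left)
print("right horizontal canal:", stim_right)
```

The two values have equal magnitude and opposite sign: the left canal is excited while the right canal is inhibited by the same head movement.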
Central Vestibular Pathways
The information resulting from the vestibular system is carried to the brain, together with the auditory information from the cochlea, by the vestibulocochlear nerve, which is the eighth of twelve cranial nerves. The cell bodies of the bipolar afferent neurons that innervate the hair cells in the maculae and cristae in the vestibular labyrinth reside near the internal auditory meatus in the vestibular ganglion (also called Scarpa's ganglion, Figure 10.1). The centrally projecting axons from the vestibular ganglion come together with axons projecting from the auditory neurons to form the eighth nerve, which runs through the internal auditory meatus together with the facial nerve. The primary afferent vestibular neurons project to the four vestibular nuclei that constitute the vestibular nuclear complex in the brainstem.
Vestibulo-Ocular Reflex (VOR)
An extensively studied example of the function of the vestibular system is the vestibulo-ocular reflex (VOR). The function of the VOR is to stabilize the image on the retina during rotation of the head. This requires the maintenance of a stable eye position during horizontal, vertical and torsional head rotations. When the head rotates with a certain speed and direction, the eyes rotate with the same speed but in the opposite direction. Since head movements are present all the time, the VOR is very important for stabilizing vision.
How does the VOR work? The vestibular system signals how fast the head is rotating and the oculomotor system uses this information to stabilize the eyes in order to keep the visual image motionless on the retina. The vestibular nerves project from the vestibular ganglion to the vestibular nuclear complex, where the vestibular nuclei integrate signals from the vestibular organs with those from the spinal cord, cerebellum, and the visual system. From these nuclei, fibers cross to the contralateral abducens nucleus. There they synapse with two additional pathways. One pathway projects directly to the lateral rectus muscle of eye via the abducens nerve. Another nerve tract projects from the abducens nucleus by the abducens interneurons to the oculomotor nuclei, which contain motor neurons that drive eye muscle activity, specifically activating the medial rectus muscles of the eye through the oculomotor nerve. This short latency connection is sometimes referred to as three-neuron-arc, and allows an eye movement within less than 10 ms after the onset of the head movement.
For example, when the head rotates rightward, the following occurs. The right horizontal canal hair cells depolarize and the left ones hyperpolarize. The right vestibular afferent activity therefore increases while the left decreases. The vestibulocochlear nerve carries this information to the brainstem, where the activity of the right vestibular nuclei increases while that of the left decreases. In turn, neurons of the left abducens nucleus and the right oculomotor nucleus fire at a higher rate, while those in the left oculomotor nucleus and the right abducens nucleus fire at a lower rate. As a result, the left lateral rectus and the right medial rectus extraocular muscles contract, while the left medial rectus and the right lateral rectus relax. Thus, both eyes rotate leftward.
The gain of the VOR is defined as the change in the eye angle divided by the change in the head angle during the head turn:

$$gain = \frac{\Delta \, \text{eye angle}}{\Delta \, \text{head angle}}$$
If the gain of the VOR is wrong, that is, different from one, then head movements result in image motion on the retina, and thus in blurred vision. Under such conditions, motor learning adjusts the gain of the VOR to produce more accurate eye movements. The cerebellum plays an important role in this motor learning.
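As a small worked example of why a gain different from one blurs vision, the sketch below computes the VOR gain from a pair of angle changes, and the residual image motion for a given head velocity. It assumes that the eye counter-rotates exactly opposite to the head and that the target is far away; the function names and the numbers are illustrative only.

```python
def vor_gain(delta_eye_deg, delta_head_deg):
    """Ratio of the magnitudes of the eye and head angle changes."""
    return abs(delta_eye_deg) / abs(delta_head_deg)

def retinal_slip(head_velocity, gain):
    """Residual gaze velocity if the eye counter-rotates at gain * head velocity."""
    eye_velocity = -gain * head_velocity  # eye moves opposite to the head
    return head_velocity + eye_velocity

# Hypothetical measurement: head turns 10 deg to the right, eye 9 deg to the left
gain = vor_gain(delta_eye_deg=-9.0, delta_head_deg=10.0)
slip = retinal_slip(head_velocity=100.0, gain=gain)  # head velocity in deg/s
print("VOR gain:", gain)
print("retinal slip [deg/s]:", slip)
```

With a gain of 0.9, a head turn of 100 deg/s leaves about 10 deg/s of image motion on the retina; a gain of exactly one removes it.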
The Cerebellum and the Vestibular System
It is known that postural control can be adapted to suit specific behavior. Patient experiments suggest that the cerebellum plays a key role in this form of motor learning. In particular, the role of the cerebellum has been extensively studied in the case of adaptation of vestibulo-ocular control. Indeed, it has been shown that the gain of the vestibulo-ocular reflex adapts to reach the value of one even if damage occurs in a part of the VOR pathway or if it is voluntarily modified through the use of magnifying lenses. Basically, there are two different hypotheses about how the cerebellum plays a necessary role in this adaptation. The first, from Ito (Ito 1972; Ito 1982), claims that the cerebellum itself is the site of learning, while the second, from Miles and Lisberger (Miles and Lisberger 1981), claims that the vestibular nuclei are the site of adaptive learning while the cerebellum constructs the signal that drives this adaptation. Note that in addition to direct excitatory input to the vestibular nuclei, the sensory neurons of the vestibular labyrinth also provide input to the Purkinje cells in the flocculo-nodular lobes of the cerebellum via a pathway of mossy and parallel fibers. In turn, the Purkinje cells project an inhibitory influence back onto the vestibular nuclei. Ito argued that the gain of the VOR can be adaptively modulated by altering the relative strength of the direct excitatory and indirect inhibitory pathways. Ito also argued that a message of retinal image slip, going through the inferior olivary nucleus and carried by the climbing fibers, plays the role of an error signal and thereby is the modulating influence on the Purkinje cells. On the other hand, Miles and Lisberger argued that the brainstem neurons targeted by the Purkinje cells are the site of adaptive learning and that the cerebellum constructs the error signal that drives this adaptation.
Computer Simulation of the Vestibular System
Model without Cupula
Let us consider the mechanical description of the semi-circular canals (SCC). We will make very strong and reductive assumptions in the following description. The goal here is merely to understand the very basic mechanical principles underlying the semicircular canals.
The first strong simplification we make is that a semicircular canal can be modeled as a circular tube of “outer” radius R and “inner” radius r. (For proper hydromechanical derivations see (Damiano and Rabbitt 1996) and (Obrist 2005).) This tube is filled with endolymph.
The orientation of the semicircular canal can be described, in a given coordinate system, by a vector $\vec{n}$ that is perpendicular to the plane of the canal. We will also use the following notations:
- $\theta$: Rotation angle of the tube [rad]
- $\dot{\theta} \equiv \omega$: Angular velocity of the tube [rad/s]
- $\ddot{\theta} \equiv \dot{\omega}$: Angular acceleration of the tube [rad/s^2]
- $\phi$: Rotation angle of the endolymph inside the tube [rad], with similar notation for the time derivatives
- $\delta = \theta - \phi$: Movement between the tube and the endolymph [rad]
Note that all these variables are scalar quantities. We use the fact that the angular velocity of the tube can be viewed as the projection of the actual angular velocity vector of the head, $\vec{\omega}_{head}$, onto the plane of the semicircular canal described by $\vec{n}$ to go from the 3D environment of the head to our scalar description. That is,

$$\omega = \vec{\omega}_{head} \cdot \vec{n}$$

where the standard scalar product is meant with the dot.
To characterize the endolymph movement, consider a free floating piston, with the same density as the endolymph. Two forces are acting on the system:
- The inertial moment $I \ddot{\phi}$, where $I$ characterizes the inertia of the endolymph.
- The viscous moment $B \dot{\delta}$, caused by the friction of the endolymph on the walls of the tube.
This gives the equation of motion

$$I \ddot{\phi} = B \dot{\delta}$$

Substituting $\phi = \theta - \delta$ and integrating gives

$$\dot{\theta} = \dot{\delta} + \frac{B}{I} \delta$$

Let us now consider the example of a velocity step $\dot{\theta}(t)$ of constant amplitude $\omega$. In this case, we obtain a displacement

$$\delta = \frac{I}{B} \omega \left( 1 - e^{-\frac{B}{I} t} \right)$$

and for $t \gg \frac{I}{B}$, we obtain the constant displacement

$$\delta \approx \frac{I}{B} \omega \qquad (10.5)$$

Now, let us derive the time constant $T_1 \equiv \frac{I}{B}$. For a thin tube, $r \ll R$, the inertia is approximately given by

$$I = m R^2 \approx 2 \rho \pi^2 r^2 R^3$$

where $\rho$ is the density of the endolymph.
From the Poiseuille-Hagen equation, the force $F$ from a laminar flow with velocity $v$ in a thin tube is

$$F = \frac{8 \bar{V} \eta l}{r^2}$$

where $\bar{V} = r^2 \pi v$ is the volume flow per second, $\eta$ the viscosity and $l = 2 \pi R$ the length of the tube.
With the torque $M = F \cdot R$ and the relative angular velocity $\Omega = \frac{v}{R}$, substitution provides

$$B = \frac{M}{\Omega} = 16 \pi^2 \eta R^3$$

Finally, this gives the time constant

$$T_1 = \frac{I}{B} = \frac{\rho r^2}{8 \eta}$$

For the human balance system, replacing the variables with experimentally obtained parameters yields a time constant of about 0.01 s. This is brief enough that in equation (10.5) the "$\approx$" can be replaced by "=". This gives a system gain of

$$G = \frac{\delta}{\omega} = \frac{I}{B} = T_1$$
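The response of this cupula-free model to a velocity step can be checked numerically. The sketch below integrates the first-order relation between tube velocity and endolymph displacement, d(delta)/dt + delta/T1 = omega with T1 = I/B, using a simple forward-Euler scheme, and compares the result with the closed-form exponential. The time constant of 0.01 s is the value quoted above; the step amplitude and the integration step are illustrative assumptions.

```python
import numpy as np

T1 = 0.01     # time constant I/B [s], value quoted in the text
omega = 1.0   # amplitude of the velocity step [rad/s] (illustrative)

dt = 1e-5     # integration step [s], much smaller than T1
t = np.arange(0.0, 0.1, dt)

# Forward-Euler integration of  d(delta)/dt = omega - delta / T1
delta = np.zeros_like(t)
for i in range(1, t.size):
    delta[i] = delta[i - 1] + dt * (omega - delta[i - 1] / T1)

# Closed-form solution for comparison
delta_exact = T1 * omega * (1.0 - np.exp(-t / T1))

print("displacement after 10 time constants [rad]:", delta[-1])
print("expected steady state T1 * omega [rad]:    ", T1 * omega)
print("max. deviation from closed form [rad]:     ",
      np.max(np.abs(delta - delta_exact)))
```

After a few hundredths of a second the displacement has settled at T1 times the head velocity, which is why the model behaves like a pure gain for all but the fastest head movements.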
Model with Cupula
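Our discussion until this point has not included the role of the cupula in the SCC: the cupula acts as an elastic membrane that gets displaced by angular accelerations. Through its elasticity the cupula returns the system to its resting position. The elasticity of the cupula adds an elastic term, with stiffness $K$, to the equation of movement. If it is taken into account, this equation becomes

$$\ddot{\theta} = \ddot{\delta} + \frac{B}{I} \dot{\delta} + \frac{K}{I} \delta$$

An elegant way to solve such differential equations is the Laplace transform. The Laplace transform turns differential equations into algebraic equations: if the Laplace transform of a signal x(t) is denoted by X(s), the Laplace transform of the time derivative is

$$\frac{dx(t)}{dt} \rightarrow s X(s) - x(0)$$

The term x(0) details the starting condition, and can often be set to zero by an appropriate choice of the reference position. Thus, the Laplace transform of the equation of movement is

$$s^2 \tilde{\theta} = s^2 \tilde{\delta} + \frac{B}{I} s \tilde{\delta} + \frac{K}{I} \tilde{\delta}$$

where "~" indicates the Laplace transformed variable. With the time constant $T_1 = \frac{I}{B}$ from above, and $T_2$ defined by

$$T_2 = \frac{B}{K}$$

we get the transfer function

$$\frac{\tilde{\delta}}{\tilde{\theta}} = \frac{T_1 s^2}{T_1 s^2 + s + \frac{1}{T_2}}$$

For humans, typical values for $T_2 = \frac{B}{K}$ are about 5 sec.
To find the poles of this transfer function, we have to determine for which values of s the denominator equals 0:

$$s_{1,2} = \frac{1}{T_1} \cdot \frac{-1 \pm \sqrt{1 - 4 \frac{T_1}{T_2}}}{2}$$

Since $T_2 \gg T_1$, and since

$$\sqrt{1 - x} \approx 1 - \frac{x}{2} \quad \text{for} \quad x \ll 1$$

we obtain

$$s_1 \approx -\frac{1}{T_1} \quad \text{and} \quad s_2 \approx -\frac{1}{T_2}$$

Typically we are interested in the cupula displacement $\delta$ as a function of head velocity $\omega \equiv s \tilde{\theta}$:

$$\frac{\tilde{\delta}}{s \tilde{\theta}}(s) = \frac{T_1 T_2 s}{(T_1 s + 1)(T_2 s + 1)}$$

For typical head movements (0.2 Hz < f < 20 Hz), the system gain is approximately constant. In other words, for typical head movements the cupula displacement is proportional to the angular head velocity!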
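Both the pole approximation and the flat gain can be checked numerically. The sketch below assumes the second-order model with the two time constants quoted in the text (T1 of about 0.01 s and T2 of about 5 s): it computes the roots of the denominator T1 s^2 + s + 1/T2, and evaluates the magnitude of T1 T2 s / ((T1 s + 1)(T2 s + 1)) at a few frequencies. The chosen frequencies are illustrative.

```python
import numpy as np

T1 = 0.01   # short time constant [s] (value quoted in the text)
T2 = 5.0    # long time constant [s] (value quoted in the text)

# Poles: roots of the denominator  T1*s^2 + s + 1/T2
poles = np.sort(np.roots([T1, 1.0, 1.0 / T2]))
print("exact poles:      ", poles)
print("approximate poles:", np.array([-1.0 / T1, -1.0 / T2]))

def gain(f_hz):
    """Magnitude of T1*T2*s / ((T1*s + 1) * (T2*s + 1)) at s = j*2*pi*f."""
    s = 1j * 2.0 * np.pi * np.asarray(f_hz, dtype=float)
    return np.abs(T1 * T2 * s / ((T1 * s + 1.0) * (T2 * s + 1.0)))

# Gain relative to T1, the plateau value expected in the middle of the band
for f in [0.01, 0.2, 1.0, 5.0, 20.0]:
    print(f"{f:6.2f} Hz: gain / T1 = {float(gain(f)) / T1:.3f}")
```

With these values the corner frequencies are 1/(2 pi T2), about 0.03 Hz, and 1/(2 pi T1), about 16 Hz, so the plateau is flattest in the middle of the quoted frequency band and starts to roll off toward its upper end.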
For linear, time-invariant systems (LTI systems), the input and output have a simple relationship in the frequency domain:

$$Out(s) = G(s) \cdot In(s)$$

where the transfer function G(s) can be expressed by the algebraic function

$$G(s) = \frac{num(s)}{den(s)} = \frac{n_0 + n_1 s + n_2 s^2 + \dots}{d_0 + d_1 s + d_2 s^2 + \dots}$$

In other words, specifying the coefficients of the numerator (n) and denominator (d) uniquely characterizes the transfer function. This notation is used by some computational tools to simulate the response of such a system to a given input. Note that MATLAB and SciPy expect these coefficients in order of descending powers of s.
Different tools can be used to simulate such a system. For example, a low-pass filter with a time constant of 7 sec has the transfer function

$$G(s) = \frac{1}{7 s + 1}$$

and its response to an input step at 1 sec can be simulated as follows:
If you work on the command line, you can use the Control System Toolbox of MATLAB or the module signal of the Python package SciPy:
MATLAB Control System Toolbox:
% Define the transfer function
num = [1];
tau = 7;
den = [tau, 1];
mySystem = tf(num, den);
% Generate an input step
t = 0:0.1:30;
inSignal = zeros(size(t));
inSignal(t>=1) = 1;
% Simulate and show the output
[outSignal, tSim] = lsim(mySystem, inSignal, t);
plot(t, inSignal, tSim, outSignal);
Python - SciPy:
# Import required packages
import numpy as np
import scipy.signal as ss
import matplotlib.pyplot as plt
# Define transfer function
num = [1]
tau = 7
den = [tau, 1]
mySystem = ss.lti(num, den)
# Generate inSignal
t = np.arange(0, 30, 0.1)
inSignal = np.zeros(t.size)
inSignal[t >= 1] = 1
# Simulate and plot outSignal
tout, outSignal, xout = ss.lsim(mySystem, inSignal, t)
plt.plot(t, inSignal, tout, outSignal)
plt.show()
Consider now the mechanics of the otolith organs. Since they are made up of complex, visco-elastic materials with a curved shape, their mechanics cannot be described with analytical tools. However, their movement can be simulated numerically with the finite element technique. In this technique, the volume under consideration is divided into many small volume elements, and for each element the physical equations are approximated by analytical functions.
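Here we will only show the physical equations for the visco-elastic otolith materials. The movement of each elastic material has to obey Cauchy's equations of motion:

$$\rho \frac{\partial^2 u_i}{\partial t^2} = \rho B_i + \sum_j \frac{\partial T_{ij}}{\partial x_j}$$

where $\rho$ is the effective density of the material, $u_i$ the displacements along the i-axis, $B_i$ the i-component of the volume force, and $T_{ij}$ the components of Cauchy's stress tensor. $x_j$ are the coordinates.
For linear elastic, isotropic material, Cauchy's stress tensor is given by

$$T_{ij} = \lambda e \delta_{ij} + 2 \mu E_{ij}$$

where $\lambda$ and $\mu$ are the Lamé constants; $\mu$ is identical with the shear modulus. $e = div(\vec{u})$ is the volume dilatation, $\delta_{ij}$ the Kronecker delta, and $E_{ij}$ is the strain tensor

$$E_{ij} = \frac{1}{2} \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right)$$

This leads to Navier's equations of motion

$$\rho \frac{\partial^2 u_i}{\partial t^2} = \rho B_i + (\lambda + \mu) \frac{\partial e}{\partial x_i} + \mu \sum_j \frac{\partial^2 u_i}{\partial x_j^2}$$

This equation holds for purely elastic, isotropic materials, and can be solved with the finite element technique. A typical procedure to find the mechanical parameters that appear in this equation is the following: when a cylindrical sample of the material is put under strain, Young's modulus E characterizes the change in length, and Poisson's ratio $\nu$ the simultaneous decrease in diameter. The Lamé constants $\lambda$ and $\mu$ are related to E and $\nu$ by:

$$E = \frac{\mu (3 \lambda + 2 \mu)}{\lambda + \mu} \quad \text{and} \quad \nu = \frac{\lambda}{2 (\lambda + \mu)}$$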
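In practice one usually measures Young's modulus E and Poisson's ratio, and needs the Lamé constants for the simulation. The sketch below converts in both directions, assuming the standard relations of linear isotropic elasticity; the function names and the material values are hypothetical and only serve as a round-trip check.

```python
def lame_from_engineering(E, nu):
    """Lame constants (lambda, mu) from Young's modulus E and Poisson's ratio nu."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))  # mu is the shear modulus
    return lam, mu

def engineering_from_lame(lam, mu):
    """Young's modulus E and Poisson's ratio nu from the Lame constants."""
    E = mu * (3.0 * lam + 2.0 * mu) / (lam + mu)
    nu = lam / (2.0 * (lam + mu))
    return E, nu

# Hypothetical values for a soft, nearly incompressible material
E_sample, nu_sample = 10.0e3, 0.45   # [Pa], [-]

lam, mu = lame_from_engineering(E_sample, nu_sample)
E_back, nu_back = engineering_from_lame(lam, mu)

print("lambda [Pa]:", lam, " mu [Pa]:", mu)
print("round trip: E =", E_back, " nu =", nu_back)
```

Note that lambda grows without bound as Poisson's ratio approaches 0.5, which is one reason why nearly incompressible soft tissues need some care in finite element simulations.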
Central Vestibular Processing
Central processing of vestibular information significantly affects the perceived orientation and movement in space. The corresponding information processing in the brainstem can often be modeled efficiently with control-system tools. As a specific example, we show how to model the effect of velocity storage.
The concept of velocity storage is based on the following experimental finding: when we abruptly stop from a sustained rotation about an earth-vertical axis, the cupula is deflected by the deceleration, but returns to its resting state with a time-constant of about 5 sec. However, the perceived rotation continues much longer, and decreases with a much longer time constant, typically somewhere between 15 and 20 sec.
In the attached figure, the response of the canals to an angular velocity stimulus ω is modeled by the transfer function C, here a simple high-pass filter with a time constant of 5 sec. (The canal response is determined by the deflection of the cupula, and is approximately proportional to the neural firing rate.) To model the increase in time constant, we assume that the central vestibular system has an internal model of the transfer function of the canals, $\hat{C}$. Based on this internal model, the expected firing rate for the internal estimate of the angular velocity, $\hat{\omega}$, is compared to the actual firing rate. With the gain factor k set to 2, the output of the model nicely reproduces the increase in the time constant. The corresponding Python code can be found in (Haslwanter 2013).
It is worth noting that this feedback loop can be justified physiologically: we know that there are strong connections between the left and right vestibular nuclei. If those connections are severed, the time constant of the perceived rotation decreases to the peripheral time-constant of the semicircular canals.
Mathematically, negative feedback with a high gain has the interesting property that it can practically invert the transfer function in the negative feedback loop: if k>>1, and if the internal model of the canal transfer function is similar to the actual transfer function, the estimated angular velocity corresponds to the actual angular velocity.
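The effect of this feedback loop can be reproduced with a few lines of code. The following is only a minimal sketch, not the script referenced above: it assumes that the canals act as a first-order high-pass filter with a time constant of 5 sec, that the internal model is identical to the canal transfer function, and that the velocity estimate is k times the difference between the actual canal signal and the signal predicted by the internal model. The forward-Euler integration and all variable names are illustrative choices.

```python
import numpy as np

T = 5.0      # canal time constant [s] (value quoted in the text)
k = 2.0      # feedback gain (value quoted in the text)
dt = 0.001   # integration step [s]
t = np.arange(0.0, 60.0, dt)

omega = np.ones_like(t)   # unit velocity step at t = 0

x = 0.0       # low-pass state of the canal; high-pass output = input - x
x_hat = 0.0   # low-pass state of the internal model (assumed identical)
canal = np.zeros_like(t)
omega_hat = np.zeros_like(t)

for i in range(t.size):
    canal[i] = omega[i] - x   # high-pass filtered canal signal
    # omega_hat = k * (canal - (omega_hat - x_hat)), solved for omega_hat
    omega_hat[i] = k * (canal[i] + x_hat) / (1.0 + k)
    # forward-Euler update of both low-pass states
    x += dt * (omega[i] - x) / T
    x_hat += dt * (omega_hat[i] - x_hat) / T

def time_constant(signal):
    """Time at which the signal has decayed to 1/e of its initial value."""
    return t[np.argmax(signal < signal[0] / np.e)]

print("canal time constant [s]:    ", time_constant(canal))
print("perceived time constant [s]:", time_constant(omega_hat))
```

Under these assumptions the closed loop has the time constant (1 + k) T, that is 15 sec for k = 2, and an initial gain of k / (1 + k). Increasing k moves this gain toward one, which corresponds to the inversion property of high-gain negative feedback.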
Alcohol and the Vestibular System
As you may or may not know from personal experience, consumption of alcohol can also induce a feeling of rotation. The explanation is quite straightforward, and basically relies on two factors: i) alcohol is lighter than the endolymph; and ii) once it is in the blood, alcohol gets relatively quickly into the cupula, as the cupula has a good blood supply. In contrast, it diffuses only slowly into the endolymph, over a period of a few hours. In combination, this leads to a buoyancy of the cupula soon after you have consumed (too much) alcohol. When you lie on your side, the deflections of the left and right horizontal cupulae add up, and induce a strong feeling of rotation. The proof: just roll onto the other side - and the perceived direction of rotation will flip around!
Due to the position of the cupulae, you will experience the strongest effect when you lie on your side. When you lie on your back, the deflection of the left and right cupula compensate each other, and you don't feel any horizontal rotation. This explains why hanging one leg out of the bed slows down the perceived rotation.
The overall effect is minimized in the upright head position - so try to stay up(right) as long as possible during the party!
If you have drunk way too much, the endolymph will contain a significant amount of alcohol the next morning - more so than the cupula. This explains why, at that point, a small amount of alcohol (e.g. a small beer) balances the difference, and reduces the feeling of spinning.
- Kathleen Cullen and Soroush Sadeghi (2008). "Vestibular System". Scholarpedia 3(1):3013. http://www.scholarpedia.org/article/Vestibular_system.
- JM Goldberg, VJ Wilson, KE Cullen and DE Angelaki (2012). "The Vestibular System: a Sixth Sense". Oxford University Press, USA. http://www.scholarpedia.org/article/Vestibular_system.
- Curthoys IS and Oman CM (1987). "Dimensions of the horizontal semicircular duct, ampulla and utricle in the human.". Acta Otolaryngol 103: 254–261.
- Della Santina CC, Potyagaylo V, Migliaccio A, Minor LB, Carey JB (2005). "Orientation of Human Semicircular Canals Measured by Three-Dimensional Multi-planar CT Reconstruction.". J Assoc Res Otolaryngol 6(3): 191-206.
- Thomas Haslwanter (2013). "Vestibular Processing: Simulation of the Velocity Storage" [Python]. http://work.thaslwanter.at/CSS/Code/vestibular_feedback.py.
Anatomy of the Somatosensory System
Our somatosensory system consists of sensors in the skin and sensors in our muscles, tendons, and joints. The receptors in the skin, the so-called cutaneous receptors, tell us about temperature (thermoreceptors), pressure and surface texture (mechanoreceptors), and pain (nociceptors). The receptors in muscles and joints provide information about muscle length, muscle tension, and joint angles. (The following description is based on lecture notes from Laszlo Zaborszky, from Rutgers University.)
Sensory information from Meissner corpuscles and rapidly adapting afferents leads to adjustment of grip force when objects are lifted. These afferents respond with a brief burst of action potentials when objects move a small distance during the early stages of lifting. In response to rapidly adapting afferent activity, muscle force increases reflexively until the gripped object no longer moves. Such a rapid response to a tactile stimulus is a clear indication of the role played by somatosensory neurons in motor activity.
The slowly adapting Merkel's receptors are responsible for form and texture perception. As would be expected for receptors mediating form perception, Merkel's receptors are present at high density in the digits and around the mouth (50/mm² of skin surface), at lower density in other glabrous surfaces, and at very low density in hairy skin. This innervation density shrinks progressively with age, so that by the age of 50, the density in human digits is reduced to 10/mm². Unlike rapidly adapting axons, slowly adapting fibers respond not only to the initial indentation of skin, but also to sustained indentation up to several seconds in duration.
Activation of the rapidly adapting Pacinian corpuscles gives a feeling of vibration, while the slowly adapting Ruffini corpuscles respond to the lateral movement or stretching of skin.
| ||Rapidly adapting||Slowly adapting|
|Surface receptor / small receptive field||Hair receptor, Meissner's corpuscle: Detect an insect or a very fine vibration. Used for recognizing texture.||Merkel's receptor: Used for spatial details, e.g. a round surface edge or "an X" in Braille.|
|Deep receptor / large receptive field||Pacinian corpuscle: "A diffuse vibration" e.g. tapping with a pencil.||Ruffini's corpuscle: "A skin stretch". Used for joint position in fingers.|
Nociceptors have free nerve endings. Functionally, skin nociceptors are either high-threshold mechanoreceptors or polymodal receptors. Polymodal receptors respond not only to intense mechanical stimuli, but also to heat and to noxious chemicals. These receptors respond to minute punctures of the epithelium, with a response magnitude that depends on the degree of tissue deformation. They also respond to temperatures in the range of 40-60°C, and change their response rates as a linear function of warming (in contrast with the saturating responses displayed by non-noxious thermoreceptors at high temperatures).
Pain signals can be separated into individual components, corresponding to different types of nerve fibers used for transmitting these signals. The rapidly transmitted signal, which often has high spatial resolution, is called first pain or cutaneous pricking pain. It is well localized and easily tolerated. The much slower, highly affective component is called second pain or burning pain; it is poorly localized and poorly tolerated. The third or deep pain, arising from viscera, musculature and joints, is also poorly localized, can be chronic and is often associated with referred pain.
The thermoreceptors have free nerve endings. Interestingly, we have only two types of thermoreceptors in our skin, which signal innocuous warmth and cooling, respectively (some nociceptors are also sensitive to temperature, but they are capable of unambiguously signaling only noxious temperatures). The warm receptors are unmyelinated, show a maximum sensitivity at ~45°C, signal temperatures between 30 and 45°C, and cannot unambiguously signal temperatures higher than 45°C. The cold receptors have their maximum sensitivity at ~27°C and signal temperatures above 17°C; some consist of lightly myelinated fibers, while others are unmyelinated. Our sense of temperature comes from the comparison of the signals from the warm and cold receptors. Thermoreceptors are poor indicators of absolute temperature but are very sensitive to changes in skin temperature.
The term proprioceptive or kinesthetic sense is used to refer to the perception of joint position, joint movements, and the direction and velocity of joint movement. There are numerous mechanoreceptors in the muscles, the muscle fascia, and in the dense connective tissue of joint capsules and ligaments. There are two specialized encapsulated, low-threshold mechanoreceptors: the muscle spindle and the Golgi tendon organ. Their adequate stimulus is stretching of the tissue in which they lie. Muscle spindles, joint and skin receptors all contribute to kinesthesia. Muscle spindles appear to provide their most important contribution to kinesthesia with regard to large joints, such as the hip and knee joints, whereas joint receptors and skin receptors may provide more significant contributions with regard to finger and toe joints.
Scattered throughout virtually every striated muscle in the body are long, thin, stretch receptors called muscle spindles. They are quite simple in principle, consisting of a few small muscle fibers with a capsule surrounding the middle third of the fibers. These fibers are called intrafusal fibers, in contrast to the ordinary extrafusal fibers. The ends of the intrafusal fibers are attached to extrafusal fibers, so whenever the muscle is stretched, the intrafusal fibers are also stretched. The central region of each intrafusal fiber has few myofilaments and is non-contractile, but it does have one or more sensory endings applied to it. When the muscle is stretched, the central part of the intrafusal fiber is stretched and each sensory ending fires impulses.
Numerous specializations occur in this simple basic organization, so that in fact the muscle spindle is one of the most complex receptor organs in the body. Only three of these specializations are described here; their overall effect is to make the muscle spindle adjustable and give it a dual function, part of it being particularly sensitive to the length of the muscle in a static sense and part of it being particularly sensitive to the rate at which this length changes.
- Intrafusal muscle fibers are of two types. All are multinucleated, and the central, non-contractile region contains the nuclei. In one type of intrafusal fiber, the nuclei are lined up single file; these are called nuclear chain fibers. In the other type, the nuclear region is broader, and the nuclei are arranged several abreast; these are called nuclear bag fibers. There are typically two or three nuclear bag fibers per spindle and about twice that many chain fibers.
- There are also two types of sensory endings in the muscle spindle. The first type, called the primary ending, is formed by a single Ia (A-alpha) fiber, supplying every intrafusal fiber in a given spindle. Each branch wraps around the central region of the intrafusal fiber, frequently in a spiral fashion, so these are sometimes called annulospiral endings. The second type of ending is formed by a few smaller nerve fibers (II or A-beta) on both sides of the primary endings. These are the secondary endings, which are sometimes referred to as flower-spray endings because of their appearance. Primary endings are selectively sensitive to the onset of muscle stretch but discharge at a slower rate while the stretch is maintained. Secondary endings are less sensitive to the onset of stretch, but their discharge rate does not decline very much while the stretch is maintained. In other words, both primary and secondary endings signal the static length of the muscle (static sensitivity), whereas only the primary endings signal the length changes (movement) and their velocity (dynamic sensitivity). The change of firing frequency of group Ia and group II fibers can then be related to static muscle length (static phase) and to stretch and shortening of the muscle (dynamic phases).
- Muscle spindles also receive a motor innervation. The large motor neurons that supply extrafusal muscle fibers are called alpha motor neurons, while the smaller ones supplying the contractile portions of intrafusal fibers are called gamma neurons. Gamma motor neurons can regulate the sensitivity of the muscle spindle so that this sensitivity can be maintained at any given muscle length.
Golgi tendon organ
The Golgi tendon organ is located at the musculotendinous junction. There is no efferent innervation of the tendon organ, therefore its sensitivity cannot be controlled from the CNS. The tendon organ, in contrast to the muscle spindle, is coupled in series with the extrafusal muscle fibers. Both passive stretch and active contraction of the muscle increase the tension of the tendon and thus activate the tendon organ receptor, but active contraction produces the greatest increase. The tendon organ, consequently, can inform the CNS about the “muscle tension”. In contrast, the activity of the muscle spindle depends on the “muscle length” and not on the tension. The muscle fibers attached to one tendon organ appear to belong to several motor units. Thus the CNS is informed not only of the overall tension produced by the muscle but also of how the workload is distributed among the different motor units.
The joint receptors are low-threshold mechanoreceptors and have been divided into four groups. They signal different characteristics of joint function (position, movements, direction and speed of movements). The free receptors or type 4 joint receptors are nociceptors.
Proprioceptive Signal Processing
Modelling muscle spindles and afferent response
The response of the muscle spindles in mammals to muscle stretch has been thoroughly studied, and various models have been proposed. However, due to the difficulty in obtaining accurate data of the afferent and fusimotor responses during muscular movement, these models have usually been quite limited. For example, several of the earliest models account only for the afferent response, ignoring the fusimotor activity.
Mileusnic et al. (2006) model
One recent model, developed by Mileusnic et al. (2006), portrays the muscle spindle as consisting of several (typically 4 to 11) nuclear chain fibres and two different nuclear bag fibres (bag1 and bag2), connected in parallel as shown in the figure below. The muscle fibres respond to three inputs: fascicle length, dynamic fusimotor input and static fusimotor input. The bag1 fibre is mainly responsible for detecting dynamic fusimotor input, while the bag2 and chain fibres are mainly responsible for detecting static fusimotor input. All fibres respond to changes in the fascicle length, and are modelled in largely the same way but with different coefficients to account for their different physiological properties. The responses of the three types of fibres are summed to generate the primary and secondary afferent activities. The primary afferent activity is affected by the response of all three types of muscle fibres, while the secondary afferent activity depends only on the bag2 and chain fibre responses.
Hasan (1983) model
Another comprehensive model of muscle spindles was proposed by Hasan in 1983. This representation of muscle fibres and spindles is based closely on their physical properties. The muscle spindle is represented as two separate regions connected in series: sensory and non-sensory. The firing rate of the spindle afferent depends on the state of the two regions, each of which has its own length. The tension in the two regions is equal, since they are placed in series. The sensory zone can be assumed to act like a spring (equation (3)), while in the non-sensory region the tension follows a non-linear relation derived by Hasan (equation (2)).
The total length of the muscle spindle, x(t), is the sum of the lengths of the two regions (equation (4)).
Using this substitution and rearranging, we can derive the following expression for the length of the sensory zone (equation (5)):
Here, the model parameters set the sensitivity of the tension to velocity in the non-sensory zone and the zero-length tension, which influences the background firing rate of the afferent. The length of the sensory zone depends not only on the current length and velocity of the spindle, but also on the history of the length changes.
The firing rate in Hasan's model depends on a combination of the sensory zone length and its first derivative (equation (6)), with an experimentally derived weighting.
Approximate values for the model parameters a, b and c were suggested by Hasan (1983), and differ for voluntary and passive movements. A summary of these values is presented in the table below.
|Type of ending||Condition||a (mm/s)||b||c (mm)|
|Primary||Gamma - dynamic||0.1||125||-15|
|Primary||Gamma - static||100||100||-25|
In the model, these values are assumed to stay constant for the duration of a movement; however, this is not believed to be the case.
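As a rough illustration of how such a two-zone model behaves, the following Python sketch integrates one commonly quoted form of Hasan's differential equation with a simple forward-Euler scheme. The exact form of the equation, the symbol z for the sensory zone length, the 0.1 s velocity weighting, the unit gain and the ramp-and-hold stimulus are all assumptions made for this sketch. Only the values of a, b and c are taken from the table above (primary ending, gamma-dynamic).

```python
# Sketch of a Hasan-style two-zone spindle model. The equation form is an
# assumption of this sketch, not a verified reproduction of the 1983 paper:
#     dz/dt = dx/dt - a * ((b*z - x + c) / (x - z - c))**3
#     rate  = gain * (z + weight * dz/dt), clipped at zero
# x: total spindle length (mm), z: sensory zone length (mm).

def ramp_and_hold(t, start=0.5, speed=10.0, amplitude=5.0):
    """Hypothetical stimulus: rest, stretch at `speed` mm/s, then hold."""
    return min(max(t - start, 0.0) * speed, amplitude)

def static_z(x, b, c):
    """Sensory zone length at rest, where b*z - x + c = 0."""
    return (x - c) / b

def simulate(stimulus, a, b, c, weight=0.1, gain=1.0, dt=1e-3, t_end=3.0):
    """Forward-Euler integration; returns lists of time, z and firing rate."""
    x_prev = stimulus(0.0)
    z = static_z(x_prev, b, c)              # start from equilibrium
    ts, zs, rates = [0.0], [z], [gain * z]
    for i in range(1, int(round(t_end / dt)) + 1):
        t = i * dt
        x = stimulus(t)
        dx = (x - x_prev) / dt              # stretch velocity (mm/s)
        u = (b * z - x + c) / (x - z - c)   # normalised tension mismatch
        dz = dx - a * u ** 3
        z += dz * dt
        ts.append(t)
        zs.append(z)
        rates.append(max(0.0, gain * (z + weight * dz)))
        x_prev = x
    return ts, zs, rates

# Primary ending under gamma-dynamic drive (values from the table).
A, B, C = 0.1, 125.0, -15.0
ts, zs, rates = simulate(ramp_and_hold, A, B, C)
z_peak = max(zs)
```

In this sketch the sensory zone is stretched well beyond its static length while the ramp lasts and then relaxes slowly during the hold, which is the qualitative dynamic sensitivity described for primary endings. Forward Euler is used only for readability; other parameter ranges may call for a smaller step or a stiff solver.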
Internal models of limb dynamics
In addition to modelling the response of muscle spindle afferents to muscle stretch, several groups have worked on modelling the signals which are sent from the brain to the spindle efferents in order for muscles to complete specific movements. The complexity here lies in the fact that the brain must be able to adapt to unexpected changes in the dynamics of planned movements, using feedback from the spindle afferents.
Studies in this area suggest that humans achieve this using internal models, which are built through an “error-feedback-learning” process, and transform planned muscle states into the motor commands required to achieve them. To generate the motor commands for a particular reaching movement, the brain performs calculations based on the expected dynamics of the planned movement. However, any unexpected changes in these dynamics while the movement is being executed (e.g. external strain placed on the muscle) will lead to errors in expected muscle length (Gottlieb 1994, Shadmehr and Mussa-Ivaldi 1994). These errors are communicated to the brain through the muscle spindle afferents, which experience a sensory state different from the expected one. The brain then reacts to these error signals with short and long latency responses, which work to minimise the error, but cannot eliminate it completely due to the delay in the system.
Studies suggest that the error can be eliminated in a subsequent attempt at the movement under the same dynamics, and this is where the “error-feedback-learning” idea comes from (Thoroughman and Shadmehr 1999). The corrections which are generated by the brain form an internal model, which maps a desired action (in kinematic coordinates) to the necessary motor commands (as torques). This internal model can be represented as a weighted combination of basis elements:
Here each basis represents some characteristic of the muscle's sensory state, and the motor command is a “population code”. Population coding is a method of representing stimuli as the combined activity of many neurons (in contrast to rate coding). In order to use such a model, we need to know how the bases represent particular limb or muscle positions, and the neuronal firing rates associated with them. The bases can, in principle, represent every aspect of the state: position, velocity, acceleration and even higher derivatives. However, this high dimensionality makes it very difficult to derive relationships experimentally between each dimension of the bases and the firing rates.
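One way to picture a weighted combination of basis elements together with the “error-feedback-learning” idea is the sketch below. The Gaussian bases tuned to velocity, the viscous force field standing in for the unexpected dynamics, the learning rate and all names are hypothetical choices for illustration; they are not the model used in the cited studies.

```python
import math

# Hypothetical basis set: Gaussian tuning curves over movement velocity.
CENTERS = [-0.5 + 0.1 * i for i in range(11)]   # preferred velocities (m/s)
SIGMA = 0.15                                     # tuning width (m/s)

def bases(v):
    """Activity of every basis element for velocity v."""
    return [math.exp(-(v - c) ** 2 / (2 * SIGMA ** 2)) for c in CENTERS]

def motor_command(weights, v):
    """Internal-model output: weighted sum of the basis activities."""
    return sum(w * g for w, g in zip(weights, bases(v)))

def force_field(v):
    """Assumed external load the internal model has to learn (N)."""
    return 13.0 * v

def learn(target, velocities, n_trials=100, rate=0.3):
    """After each movement, shift the weights along the active bases in
    proportion to the error (a simple delta rule)."""
    weights = [0.0] * len(CENTERS)
    errors = []
    for _ in range(n_trials):
        squared = 0.0
        for v in velocities:
            g = bases(v)
            err = target(v) - sum(w * gi for w, gi in zip(weights, g))
            weights = [w + rate * err * gi for w, gi in zip(weights, g)]
            squared += err ** 2
        errors.append(math.sqrt(squared / len(velocities)))
    return weights, errors

VELOCITIES = [-0.4 + 0.1 * i for i in range(9)]
weights, errors = learn(force_field, VELOCITIES)
```

The list `errors` holds the root-mean-square error of each pass through the movements, so comparing its first and last entries shows how much of the error is removed by repeated attempts under the same dynamics. Because the bases overlap, a correction learned at one velocity also generalises to neighbouring velocities.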
Somatosensory Perception of Whiskers
The barrel cortex is a specialized region of the somatosensory cortex responsible for processing tactile information from the whiskers. Like every other cortical region, the barrel cortex preserves the columnar organization that plays a crucial role in information processing. Information from each whisker is represented in a separate, discrete column analogous to a “barrel”, hence the name barrel cortex. Rodents use their whiskers constantly to acquire sensory information from the environment. Because rodents are nocturnal, visual information is relatively poor, and the tactile information carried by the whiskers forms the primary sensory signal for building a perceptual map of the environment. The whiskers on the snouts of mice and rats serve as arrays of highly sensitive detectors for acquiring tactile information, as shown in Figure 1 A and B. By using their whiskers, rodents can build spatial representations of their environment, locate objects, and perform fine-grained texture discrimination. Somatosensory whisker-related processing is highly organized into stereotypical maps, which occupy a large portion of the rodent brain; in particular, a large part of the neocortex is dedicated to processing information from the whiskers. During exploration and palpation of objects, the whiskers are under motor control, often executing rapid large-amplitude rhythmic sweeping movements, and this sensory system is therefore an attractive model for investigating active sensory processing and sensory-motor integration. Perhaps the most remarkable specialization of this sensory system is the primary somatosensory “barrel” cortex, where each whisker is represented by a discrete and well-defined structure in layer 4.
These layer 4 barrels are somatotopically arranged in an almost identical fashion to the layout of the whiskers on the snout, i.e. neighbouring whiskers are represented in adjacent cortical areas. Sensorimotor integration of whisker-related activity leads to pattern discrimination and enables rodents to have a reliable map of the environment. This is an interesting model to study because rodents use their whiskers to “see”, and this cross-modal sensory processing could help us improve the lives of people who are deprived of one sensory modality. Specifically, blind people can be trained to use somatosensory information to build a spatial map of the environment.
Pathways carrying whisker information to Barrel Cortex
Whisker information processing in Barrel Cortex with specialized local microcircuit
The deflection of a whisker is thought to open mechano-gated ion channels in nerve endings of sensory neurons innervating the hair follicle (although the molecular signalling machinery remains to be identified). The resulting depolarization evokes action potential firing in the sensory neurons of the infraorbital branch of the trigeminal nerve. The transduction through mechanical deformation is similar to that of the hair cells in the inner ear; in this case the contact of the whiskers with objects causes the mechano-gated ion channels to open. Cation-permeable ion channels let positively charged ions into the cells and cause depolarization, eventually leading to the generation of action potentials. A single sensory neuron fires action potentials only in response to deflection of one specific whisker. The innervation of the hair follicle shows a diversity of nerve endings, which may be specialized for detecting different types of sensory input.
The layer 4 barrel map is arranged almost identically to the layout of the whiskers on the snout of the rodent. There are several recurrent connections in layer 4, and it sends axons to layer 2/3 neurons, which integrate information from other cortical regions such as the primary motor cortex. These intra-cortical and inter-cortical connections enable rodents to discriminate stimuli and to extract optimal information from the incoming tactile stimulus. These projections also play a crucial role in integrating somatosensory information with motor output. Information from the whiskers is processed in the barrel cortex by specialized local microcircuits formed to extract optimal information about the environment. These cortical microcircuits are composed of excitatory and inhibitory neurons, as shown in Figure 4.
Learning whisker based object discrimination & texture differentiation
Rodents move their sensors to collect information, and these movements are guided by sensory input. When action sequences are required to achieve success in novel tasks, interactions between movement and sensation underlie motor control and complex learned behaviours. The motor cortex has important roles in learning motor skills [6-9], but its function in learning sensorimotor associations is unknown. The neural circuits underlying sensorimotor integration are beginning to be mapped. Different motor cortex layers harbour excitatory neurons with distinct inputs and projections [10-12]. Outputs to motor centres in the brain stem and spinal cord arise from pyramidal tract-type neurons in layer 5B (L5B). Within motor cortex, excitation descends from L2/3 to L5 [13, 14]. Input from somatosensory cortex impinges preferentially onto L2/3 neurons. L2/3 neurons therefore directly link somatosensation and control of movements. In one recent study, head-fixed mice were trained in a vibrissa-based object-detection task while populations of neurons were imaged. Following a sound, a pole was moved to one of several target positions within reach of the whiskers (the ‘go’ stimulus) or to an out-of-reach position (the ‘no-go’ stimulus). Target and out-of-reach locations were arranged along the anterior–posterior axis; the out-of-reach position was most anterior. Mice searched for the pole with one whisker row, the C row, and reported the pole as ‘present’ by licking, or ‘not present’ by withholding licking. Licking on go trials (hit) was rewarded with water, whereas licking on no-go trials (false alarm) was punished with a time-out during which the trial was stopped for 2 seconds. Trials without licking (correct rejections on no-go trials and misses on go trials) were neither rewarded nor punished. All mice showed learning within the first two or three sessions. Performance reached expert levels after three to six training sessions.
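The four trial outcomes of such a go/no-go task can be written down compactly. The short Python sketch below is only a bookkeeping illustration; the function names and the example session are hypothetical and do not come from the cited study.

```python
def classify_trial(stimulus, licked):
    """Map one trial to its outcome. stimulus is 'go' or 'no-go'."""
    if stimulus == "go":
        return "hit" if licked else "miss"
    return "false alarm" if licked else "correct rejection"

def fraction_correct(trials):
    """Share of trials that are hits or correct rejections."""
    outcomes = [classify_trial(stimulus, licked) for stimulus, licked in trials]
    correct = sum(o in ("hit", "correct rejection") for o in outcomes)
    return correct / len(outcomes)

# Hypothetical session: (stimulus, licked) pairs.
session = [("go", True), ("go", False), ("no-go", False), ("no-go", True)]
performance = fraction_correct(session)
```

In this example session one go trial and one no-go trial are answered correctly, so `performance` is 0.5, which corresponds to chance level.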
Learning the behavioural task was directly dependent on motor-related behaviour. Naive mice whisked occasionally in a manner unrelated to trial structure. Thus, object detection relies on a sequence of actions, linked by sensory cues. An auditory cue triggers whisking during the sampling period. Contact between whisker and object causes licking for a water reward during a response period. Silencing vM1 indicates that this task requires the motor cortex; with vM1 silenced, task-dependent whisking persisted, but was reduced in amplitude and repeatability, and task performance dropped.
Neural Correlates of Sensorimotor learning mechanism
Coding of touch in the motor cortex is consistent with direct input from vS1 to the imaged neurons. A model based on population coding of individual behavioural features also predicted motor behaviours. Accurate decoding of whisking amplitude, whisking set-point and lick rate suggests that vM1 controls these slowly varying motor parameters, as expected from previous motor cortex and neurophysiological experiments.
1 Feldmeyer D, Brecht M, Helmchen F, Petersen CCH, Poulet JFA, Staiger JF, Luhmann HJ, Schwarz C. "Barrel cortex function." Progress in Neurobiology 2013, 103:3-27.
2 Lahav O, Mioduser D. "Multisensory virtual environment for supporting blind persons' acquisition of spatial cognitive mapping, orientation, and mobility skills." Academia.edu 2002.
3 Alloway KD. "Information processing streams in rodent barrel cortex: The differential functions of barrel and septal circuits." Cereb Cortex 2008, 18(5):979-989.
4 Scott SH. "Inconvenient truths about neural processing in primary motor cortex." The Journal of physiology 2008, 586(5):1217-1224.
5 Wolpert DM, Diedrichsen J, Flanagan JR. "Principles of sensorimotor learning." Nature reviews Neuroscience 2011, 12(12):739-751.
6 Wise SP, Moody SL, Blomstrom KJ, Mitz AR. "Changes in motor cortical activity during visuomotor adaptation." Experimental Brain Research 1998, 121(3):285-299.
7 Rokni U, Richardson AG, Bizzi E, Seung HS. "Motor learning with unstable neural representations." Neuron 2007, 54(4):653-666.
8 Komiyama T, Sato TR, O'Connor DH, Zhang YX, Huber D, Hooks BM, Gabitto M, Svoboda K. "Learning-related fine-scale specificity imaged in motor cortex circuits of behaving mice." Nature 2010, 464(7292):1182-1186.
9 Hosp JA, Pekanovic A, Rioult-Pedotti MS, Luft AR. "Dopaminergic projections from midbrain to primary motor cortex mediate motor skill learning." The Journal of Neuroscience 2011, 31(7):2481-2487.
10 Keller A. "Intrinsic synaptic organization of the motor cortex." Cereb Cortex 1993, 3(5):430-441.
11 Mao T, Kusefoglu D, Hooks BM, Huber D, Petreanu L, Svoboda K. "Long-range neuronal circuits underlying the interaction between sensory and motor cortex." Neuron 2011, 72(1):111-123.
12 Hooks BM, Hires SA, Zhang YX, Huber D, Petreanu L, Svoboda K, Shepherd GM. "Laminar analysis of excitatory local circuits in vibrissal motor and sensory cortical areas." PLoS biology 2011, 9(1):e1000572.
13 Anderson CT, Sheets PL, Kiritani T, Shepherd GM. "Sublayer-specific microcircuits of corticospinal and corticostriatal neurons in motor cortex." Nature neuroscience 2010, 13(6):739-744.
14 Kaneko T, Cho R, Li Y, Nomura S, Mizuno N. "Predominant information transfer from layer III pyramidal neurons to corticospinal neurons." The Journal of comparative neurology 2000, 423(1):52-65.
15 O'Connor DH, Clack NG, Huber D, Komiyama T, Myers EW, Svoboda K. "Vibrissa-based object localization in head-fixed mice." The Journal of Neuroscience 2010, 30(5):1947-1967.
16 O'Connor DH, Peron SP, Huber D, Svoboda K. "Neural activity in barrel cortex underlying vibrissa-based object localization in mice." Neuron 2010, 67(6):1048-1061.
17 Shaner NC, Campbell RE, Steinbach PA, Giepmans BN, Palmer AE, Tsien RY. "Improved monomeric red, orange and yellow fluorescent proteins derived from Discosoma sp. red fluorescent protein." Nature biotechnology 2004, 22(12):1567-1572.
18 Tian L, Hires SA, Mao T, Huber D, Chiappe ME, Chalasani SH, Petreanu L, Akerboom J, McKinney SA, Schreiter ER. "Imaging neural activity in worms, flies and mice with improved GCaMP calcium indicators." Nature methods 2009, 6(12):875-881.
The Gustatory System or sense of taste allows us to perceive different flavors from substances like food, drinks, medicine etc. Molecules that we taste or tastants are sensed by cells in our mouth, which send information to the brain. These specialized cells are called taste cells and can sense 5 main tastes: bitter, salty, sweet, sour and umami (savory). All the variety of flavors that we know are combinations of molecules which fall into these categories.
Measuring the degree to which a substance presents one of the basic tastes is done subjectively, by comparing its taste to that of a reference substance and expressing the result as a relative index. For the bitter taste, quinine (found in tonic water) is used to rate how bitter a substance is. Saltiness can be rated by comparing to a dilute salt solution. Sourness is compared to dilute hydrochloric acid (H+Cl-). Sweetness is measured relative to sucrose. The values of these reference substances are defined as 1.
(Coffee, mate, beer, tonic water etc.)
It is considered by many as unpleasant. In general, bitterness is very interesting because a large number of bitter compounds are known to be toxic, so the bitter taste is considered to provide an important protective function. Plant leaves often contain toxic compounds. Herbivores have a tendency to prefer immature leaves, which have higher protein content and lower poison levels than mature leaves. Even if the bitter taste is not very pleasant at first, there seems to be a tendency to overcome this aversion, because coffee and other drinks rich in caffeine are widely consumed. Sometimes bitter agents are added to substances to prevent accidental ingestion.
The salty taste is primarily produced by the presence of cations such as Li+ (lithium ions), K+ (potassium ions) and more commonly Na+ (sodium ions). The saltiness of substances is compared to sodium chloride (Na+Cl-), which is typically used as table salt. Potassium chloride (K+Cl-) is the principal ingredient used in salt substitutes and has an index of 0.6 (see below, part 5) compared to 1 for Na+Cl-.
(Lemon, orange, wine, spoiled milk and candies containing citric acid)
Sour taste can be mildly pleasant and is related to the salty flavor, but more intense. Typically sour are overripe fruits, spoiled milk, rotten meat and other spoiled foods, which can be dangerous. Sourness also signals acids (H+ ions), which taken in large quantities can cause irreversible tissue damage. Sourness is rated relative to hydrochloric acid (H+Cl-), which has a sourness index of 1.
(Sucrose (table sugar), cake, ice cream etc.)
Sweetness is regarded as a pleasant sensation and is produced mostly by the presence of sugars. Sweet substances are rated relative to sucrose, which has an index of 1. Nowadays there are many artificial sweeteners on the market, including saccharin, aspartame and sucralose, but it is still not clear how these substitutes activate the receptors.
Umami (savory or tasty)
(Cheese, soy sauce etc.)
Recently, umami, the taste of monosodium glutamate, has been added as the fifth taste. This taste signals the presence of L-glutamate and is very important in Eastern cuisines.
Tongue and Taste Buds
Taste cells are epithelial cells clustered in taste buds located in the tongue, soft palate, epiglottis, pharynx and esophagus, the tongue being the primary organ of the Gustatory System.
Taste buds are located in papillae along the surface of the tongue. There are three types of papillae in humans: fungiform papillae, located in the anterior part and containing approximately five taste buds; circumvallate papillae, which are bigger and more posterior; and foliate papillae, which are on the posterior edge of the tongue. Circumvallate and foliate papillae contain hundreds of taste buds. In each taste bud there are different types of cells: basal, dark, intermediate and light cells. Basal cells are believed to be the stem cells that give rise to the other types. It is thought that the rest of the cells correspond to different stages of differentiation, with the light cells being the most mature type. An alternative idea is that dark, intermediate and light cells correspond to different cellular lineages.
Taste cells are short-lived and are continuously regenerated. Each taste bud has a taste pore at the surface of the epithelium, into which the taste cells extend microvilli, the site where sensory transduction takes place. Taste cells are innervated by fibers of primary gustatory neurons; the contacts with these sensory fibers resemble chemical synapses. Taste cells are excitable, with voltage-gated K+, Na+ and Ca2+ channels capable of generating action potentials. Although the response to different tastants varies, in general tastants interact with receptors or ion channels in the membrane of a taste cell. These interactions depolarize the cell directly or via second messengers. In this way the receptor potential generates action potentials within the taste cell, which lead to Ca2+ influx through voltage-gated Ca2+ channels, followed by the release of neurotransmitter at the synapses with the sensory fibers.
The idea that different regions of the tongue are most sensitive to particular tastes was a long-standing misconception, which has now been shown to be wrong. All taste sensations come from all regions of the tongue.
An average person has about 5,000 taste buds. A "supertaster" is a person whose sense of taste is significantly more sensitive than average. The increased response is thought to arise because supertasters have more than 20,000 taste buds, or because of an increased number of fungiform papillae.
Transduction of Taste
As mentioned before, we distinguish between 5 types of basic tastes: bitter, salty, sour, sweet and umami. There is one type of taste receptor for each known flavor, and each type of taste stimulus is transduced by a different mechanism. In general, bitter, sweet and umami are detected by G protein-coupled receptors, and salty and sour are detected via ion channels.
Bitter compounds act through G protein-coupled receptors (GPCRs), also known as seven-transmembrane domain receptors, which are located in the membranes of the taste cells. Taste receptors of type 2 (T2Rs), a group of GPCRs, are thought to respond to bitter stimuli. When a bitter-tasting ligand binds to the GPCR, it releases the G protein gustducin, whose three subunits break apart and activate phosphodiesterase, which in turn converts a precursor within the cell into a second messenger, closing the K+ channels. This second messenger stimulates the release of Ca2+, contributing to depolarization followed by neurotransmitter release. It is possible that bitter substances that can permeate the membrane are sensed by mechanisms not involving G proteins.
The amiloride-sensitive epithelial sodium channel (ENaC), a type of ion channel in the taste cell membrane, allows Na+ ions to enter the cell down an electrochemical gradient, depolarizing the taste cell. This leads to an opening of voltage-gated Ca2+ channels, followed by neurotransmitter release.
The sour taste signals the presence of acidic compounds (H+ ions), and three receptors are involved: 1) The ENaC (the same protein involved in salty taste). 2) H+-gated channels; one is a K+ channel, which normally allows K+ to flow out of the cell. H+ ions block these channels, so the K+ stays inside the cell. 3) A third channel undergoes a conformational change when an H+ ion attaches to it, which opens the channel and allows an influx of Na+ down the concentration gradient into the cell, leading to the opening of voltage-gated Ca2+ channels. These three receptors work in parallel and lead to depolarization of the cell followed by neurotransmitter release.
Sweet transduction is mediated by the binding of a sweet tastant to GPCRs located in the apical membrane of the taste cell. A saccharide activates the GPCR, which releases gustducin, and this in turn increases the level of cAMP (cyclic adenosine monophosphate). cAMP activates the cAMP-dependent kinase, which phosphorylates the K+ channels and eventually inactivates them, leading to depolarization of the cell followed by neurotransmitter release.
Umami receptors also involve GPCRs, in the same way as bitter and sweet receptors. Glutamate binds to a type of metabotropic glutamate receptor, mGluR4, causing a G-protein complex to activate a secondary receptor, which ultimately leads to neurotransmitter release. How the intermediate steps work is currently unknown.
In humans, the sense of taste is transmitted to the brain via three cranial nerves. The VII facial nerve carries information from the anterior 2/3 part of the tongue and soft palate. The IX nerve or glossopharyngeal nerve carries taste sensations from the posterior 1/3 part of the tongue and the X nerve or vagus nerve carries information from the back of the oral cavity and the epiglottis.
The gustatory cortex is the brain structure responsible for the perception of taste. It consists of the anterior insula on the insular lobe and the frontal operculum on the inferior frontal gyrus of the frontal lobe. Neurons in the gustatory cortex respond to the five main tastes.
Taste cells synapse with primary sensory axons of the mentioned cranial nerves. The central axons of these neurons in the respective cranial nerve ganglia project to rostral and lateral regions of the nucleus of the solitary tract in the medulla. Axons from the rostral (gustatory) part of the solitary nucleus project to the ventral posterior complex of the thalamus, where they terminate in the medial half of the ventral posterior medial nucleus. This nucleus projects to several regions of the neocortex, which include the gustatory cortex.
Gustatory cortex neurons exhibit complex responses to changes in the concentration of a tastant. For one tastant, a given neuron might increase its firing with concentration, while for another tastant the same neuron may respond only to an intermediate concentration.
Taste and Other Senses
In general the Gustatory System does not work alone. While eating, consistency and texture are sensed by the mechanoreceptors of the somatosensory system. The sense of taste is also closely tied to the olfactory system: without the sense of smell it is difficult to distinguish flavors.
(black peppers, chili peppers, etc.)
It is not a basic taste because this sensation does not arise from taste buds. Capsaicin is the active ingredient in spicy food and causes “hotness” or “spiciness” when eaten. It stimulates temperature fibers and also nociceptors (pain) in the tongue. In the nociceptors it stimulates the release of substance P, which causes vasodilatation and release of histamine, causing hyperalgesia (increased sensitivity to pain).
In general basic tastes can be appetitive or aversive depending on the effect that the food has on us but also essential to the taste experience are the presentation of food, color, texture, smell, previous experiences, expectations, temperature and satiety.
Ageusia (complete loss of taste)
Ageusia is a loss of the sense of taste, which can sometimes be accompanied by a loss of smell.
Dysgeusia (abnormal taste)
Dysgeusia is an alteration in the perception associated with the sense of taste. Tastes of food and drinks vary radically and sometimes the taste is perceived as repulsive. The causes of dysgeusia can be associated with neurologic disorders.
Probably the oldest sensory system in nature, the olfactory system concerns the sense of smell. The olfactory system is physiologically strongly related to the gustatory system, so the two are often examined together. Complex flavors require both taste and smell sensation to be recognized. Consequently, food may taste “different” if the sense of smell does not work properly (e.g. during a head cold).
Generally the two systems are classified as visceral senses because of their close association with gastrointestinal function. They are also of central importance for emotional and sexual functions.
Both taste and smell receptors are chemoreceptors that are stimulated by molecules dissolved in saliva and mucus, respectively. However, these two senses are anatomically quite different. While smell receptors are distance receptors that do not have any connection to the thalamus, taste pathways pass up the brainstem to the thalamus and project to the postcentral gyrus along with those for touch and pressure sensibility for the mouth.
In this article we will first focus on the organs composing the olfactory system, then characterize them in order to understand their functionality, and end by explaining the transduction of the signal and commercial applications such as the eNose.
In vertebrates the main olfactory system detects odorants that are inhaled through the nose where they come to contact with the olfactory epithelium, which contains the olfactory receptors.
Olfactory sensitivity is directly proportional to the area in the nasal cavity near the septum reserved to the olfactory mucous membrane, which is the region where the olfactory receptor cells are located. The extent of this area is a specific between animals species. In dogs, for example, the sense of smell is highly developed and the area covered by this membrane is about 75 – 150 cm2; these animals are called macrosmatic animals. Differently in humans the olfactory mucous membrane cover an area about 3 – 5 cm2, thus they are known as microsmatic animals.
In humans there are about 10 million olfactory cells, each of which have 350 different receptor types composing the olfactory mucous membrane. The 350 different receptors are characteristic for only one odorant type. The bond with one odorant molecule starts a molecular chain reaction, which transforms the chemical perception into an electrical signal.
The electrical signal travels along the axons of the olfactory nerve to the olfactory bulbs. In this region there are between 1000 and 2000 glomeruli, which combine and interpret the potentials coming from different receptors. In this way it is possible to characterise unequivocally e.g. the aroma of coffee, which is composed of about 650 different odorants. Humans can distinguish between about 10,000 odors.
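The idea that a limited set of receptor types can label a much larger set of odors by combination can be made concrete with a little counting. The sketch below is only an illustration: it assumes that each receptor type is either "on" or "off" (a strong simplification that is not claimed in the text) and counts how many distinct activation patterns are possible when a fixed number of the 350 receptor types respond. The function name is hypothetical.

```python
import math

def count_patterns(n_receptor_types, n_active):
    """Count the distinct sets of exactly n_active responding receptor types."""
    return math.comb(n_receptor_types, n_active)

# 350 receptor types is the figure from the text;
# binary on/off responses are an assumption made for this sketch.
pairs = count_patterns(350, 2)
triples = count_patterns(350, 3)
print(pairs, triples)
```

Under this assumption, patterns with only two active receptor types (61,075 of them) already outnumber the roughly 10,000 odors that humans can distinguish.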
The signal then proceeds to the olfactory cortex, where it is recognized and compared with known odorants (i.e. olfactory memory), which also involves an emotional response to the olfactory stimuli.
It is also interesting to note that the human genome has about 600 – 700 genes (~2% of the complete genome) that code for olfactory receptors, but only about 350 are still used to build the olfactory system. This is evidence of an evolutionary change in how much humans need to rely on olfaction.
Sensory Organ Components
Similar to other sensory modalities, olfactory information must be transmitted from peripheral olfactory structures, like the olfactory epithelium, to more central structures, namely the olfactory bulb and cortex. The specific stimuli have to be integrated, detected and transmitted to the brain in order to reach sensory consciousness. However, the olfactory system differs from other sensory systems in three fundamental ways, as described in the book by Paxinos G. and Mai J.K., "The Human Nervous System".
- Olfactory receptor neurons are continuously replaced by mitotic division of the basal cells of the olfactory epithelium. The reason for this is the high vulnerability of the neurons, which are directly exposed to the environment.
- For phylogenetic reasons, olfactory sensory activity is transferred directly from the olfactory bulb to the olfactory cortex, without a thalamic relay.
- Neural integration and analysis of olfactory stimuli may not involve topographic organization beyond the olfactory bulb, meaning that no spatial or frequency axis is needed to project the signal.
Olfactory Mucous Membrane
The olfactory mucous membrane contains the olfactory receptor cells and in humans it covers an area of about 3 – 5 cm² in the roof of the nasal cavity near the septum. Because the receptors are continuously regenerated, it contains both supporting cells and progenitor cells of the olfactory receptors. Interspersed between these cells are 10 – 20 million receptor cells.
Olfactory receptors are in fact neurons with short, thick dendrites. Their expanded end is called an olfactory rod, from which cilia project to the surface of the mucus. Each neuron has between 10 and 20 cilia, which are about 2 micrometers long and about 0.1 micrometers in diameter.
The axons of the olfactory receptor neurons pass through the cribriform plate of the ethmoid bone and enter the olfactory bulb. This passage is the most vulnerable part of the olfactory system; damage to the cribriform plate (e.g. when the nasal septum is broken) can destroy the axons and compromise the sense of smell.
A further peculiarity of the mucous membrane is that it is completely renewed every few weeks.
In humans the olfactory bulb is located anteriorly with respect to the cerebral hemisphere and remains connected to it only by a long olfactory stalk. Furthermore, in mammals it is organized in layers and consists of a concentric laminar structure with well-defined neuronal somata and synaptic neuropil.
After passing the cribriform plate the olfactory nerve fibers ramify in the most superficial layer (the olfactory nerve layer). Where these axons reach the olfactory bulb the layer gets thicker, and they terminate on the primary dendrites of the mitral cells and tufted cells, forming the complex globular synapses called olfactory glomeruli. Both cell types send axons to the olfactory cortex and appear to have the same function, but tufted cells are smaller and consequently also have smaller axons.
The axons from several thousand receptor neurons converge on one or two glomeruli in a corresponding zone of the olfactory bulb; this suggests that the glomeruli are the unit structures for olfactory discrimination.
In order to avoid threshold problems, the olfactory bulb contains, in addition to mitral and tufted cells, two types of cells with inhibitory properties: periglomerular cells and granule cells. The former connect two different glomeruli; the latter, which have no axons, form reciprocal synapses with the lateral dendrites of the mitral and tufted cells. At these synapses the granule cells inhibit the mitral (or tufted) cells by releasing GABA, while the mitral (or tufted) cells excite the granule cells by releasing glutamate. About 8,000 glomeruli and 40,000 mitral cells have been counted in young adults. Unfortunately these numbers decrease progressively with age, compromising the structural integrity of the different layers.
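The effect of such inhibitory interneurons can be illustrated with a very crude rate-based caricature of lateral inhibition. This is a sketch under stated assumptions, not a biophysical model of the reciprocal synapse: each output unit stands for a mitral cell, the inhibition it receives is taken to be proportional to the mean input of all other units, and the function name, the weight of 0.5 and the input values are all hypothetical.

```python
def lateral_inhibition(inputs, weight=0.5):
    """Subtract from each unit a fraction of the mean input of the other units.

    inputs: excitatory drive per glomerulus (arbitrary units, hypothetical).
    weight: strength of the inhibition (hypothetical parameter).
    Outputs are clipped at zero because firing rates cannot be negative.
    """
    outputs = []
    for i, drive in enumerate(inputs):
        others = [x for j, x in enumerate(inputs) if j != i]
        inhibition = weight * sum(others) / len(others)
        outputs.append(max(0.0, drive - inhibition))
    return outputs

glomerular_input = [1.0, 1.2, 3.0, 1.1]   # made-up values
mitral_output = lateral_inhibition(glomerular_input)
print(mitral_output)
```

In this toy example the most strongly driven unit remains the strongest, while its lead over the weakly driven units grows, which is one way in which inhibition could sharpen the pattern passed on to the cortex.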
The axons of the mitral and tufted cells pass through the granule layer, the intermediate olfactory stria and the lateral olfactory stria to the olfactory cortex. In humans this tract forms the bulk of the olfactory peduncle. As described in the book by Paxinos G. and Mai J.K., "The Human Nervous System", the primary olfactory cortical areas can be described by a simple structure composed of three layers: a broad plexiform layer (first layer), a compact layer of pyramidal cell somata (second layer) and a deeper layer composed of both pyramidal and nonpyramidal cells (third layer). Furthermore, in contrast to the olfactory bulb, only little spatial encoding can be observed; “that is, small areas of the olfactory bulb virtually project the entire olfactory cortex, and small areas of the cortex receive fibers from virtually the entire olfactory bulb”.
In general the targets of the olfactory tract can be divided into five major regions of the cerebrum: the anterior olfactory nucleus, the olfactory tubercle, the piriform cortex, the anterior cortical nucleus of the amygdala and the entorhinal cortex. Olfactory information is transmitted from the primary olfactory cortex to several other parts of the forebrain, including the orbital cortex, amygdala, hippocampus, central striatum, hypothalamus and mediodorsal thalamus.
It is also interesting to note that in humans the piriform cortex can be activated by sniffing, whereas only the smell itself is required to activate the lateral and anterior orbitofrontal gyri of the frontal lobe. In general the orbitofrontal activation is greater on the right side than on the left, which directly implies an asymmetry in the cortical representation of olfaction. The emotional response to olfactory stimuli, such as olfactory memories, can furthermore be attributed to the fibers projecting to the amygdala and the entorhinal cortex.
A good and complete description of the substructure of the olfactory cortex can be found in the book by Paxinos G. and Mai J.K., "The Human Nervous System".
|Substance||mg/L of Air|
|Oil of peppermint||0.02|
Only substances that come into contact with the olfactory epithelium can excite the olfactory receptors. The table above shows olfactory thresholds for some representative substances. These values give an impression of the high sensitivity of the olfactory receptors.
It is remarkable that humans can recognize more than 10,000 different odors, yet the concentration of an odorant must change by about 30% before a difference can be detected. In the visual system the comparable discrimination threshold is a 1% change in light intensity. As in hearing, the direction from which a smell comes may be indicated by the slight difference in the time of arrival of odoriferous molecules in the two nostrils. It is amazing how molecules with the same number of carbon atoms (normally between 3 and 20) can lead to different odors just through a slight change in their structural configuration.
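To get a feeling for what a 30% threshold means compared with a 1% threshold, one can count how many distinguishable steps fit into a given stimulus range. The sketch below reads both figures as constant relative thresholds (Weber fractions), which is an assumption of this example, and uses a hypothetical 100-fold stimulus range; the function name is also hypothetical.

```python
import math

def distinguishable_steps(s_min, s_max, relative_threshold):
    """Number of just-noticeable steps between s_min and s_max.

    Each step multiplies the stimulus by (1 + relative_threshold),
    so n steps fit when (1 + relative_threshold)**n <= s_max / s_min.
    """
    return math.floor(math.log(s_max / s_min) / math.log(1.0 + relative_threshold))

# Hypothetical 100-fold range of stimulus strength.
print(distinguishable_steps(1.0, 100.0, 0.30))  # 30% threshold quoted for smell
print(distinguishable_steps(1.0, 100.0, 0.01))  # 1% threshold quoted for vision
```

Over the same range, a 30% threshold gives 17 distinguishable steps, whereas a 1% threshold gives 462. Under these assumptions, olfaction is good at telling odors apart by quality but coarse at grading their intensity.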
An interesting feature of the olfactory system is how a simple sense organ that apparently lacks a high degree of complexity can mediate the discrimination of more than 10,000 different odors. On the one hand this is made possible by the huge number of different odorant receptors. The gene family for the olfactory receptors is in fact the largest family studied so far in mammals. On the other hand the neural net of the olfactory system provides, with its 1800 glomeruli, a large two-dimensional map in the olfactory bulb that is unique to each odorant. In addition, the extracellular field potential in each glomerulus oscillates, and the granule cells appear to regulate the frequency of the oscillation. The exact function of the oscillation is unknown, but it probably also helps to focus the olfactory signal reaching the cortex.
Olfaction, as described in the research of R. Haddad et al., consists of a set of transforms from physical space of odorant molecules (olfactory physicochemical space), through a neural space of information processing (olfactory neural space), into a perceptual space of smell (olfactory perceptual space). The rules of these transforms depend on obtaining valid metrics for each of those spaces.
Olfactory perceptual space
As the perceptual space represents the “input” of the smell measurement, its aim is to describe odors in the simplest possible way. Odors are in fact ordered so that their mutual distance in the space reflects their similarity. This means that the nearer two odors are to each other in this space, the more similar they are expected to be. The space is thus defined by so-called perceptual axes, characterized by some arbitrarily chosen “unit” odors.
Olfactory neural space
As suggested by its name, the neural space is generated from neural responses. This gives rise to an extensive database of odorant-induced activity, which can be used to formulate an olfactory space in which the concept of similarity serves as a guiding principle. Using this procedure, different odorants are expected to be similar if they generate a similar neuronal response. This database can be navigated at the Glomerular Activity Response Archive.
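One simple way to quantify "similar neuronal response" is to correlate the activity patterns that two odorants evoke across the same set of glomeruli. The sketch below uses the Pearson correlation coefficient for this purpose. This choice of measure, the odorant labels and all activity values are assumptions made for illustration, and are not taken from the archive mentioned above.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equally long activity patterns."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)

# Hypothetical activity per glomerulus (made-up numbers, five glomeruli).
responses = {
    "odorant_A": [0.9, 0.1, 0.8, 0.0, 0.2],
    "odorant_B": [0.8, 0.2, 0.7, 0.1, 0.1],
    "odorant_C": [0.0, 0.9, 0.1, 0.8, 0.7],
}

sim_ab = pearson(responses["odorant_A"], responses["odorant_B"])
sim_ac = pearson(responses["odorant_A"], responses["odorant_C"])
print(sim_ab, sim_ac)
```

With these made-up patterns, odorants A and B are strongly correlated and would lie close together in such a neural space, whereas A and C are anticorrelated and would lie far apart.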
Olfactory physicochemical space
The need to identify the molecular encoding of the biological interaction makes the physicochemical space the most complex of the olfactory spaces described so far. R. Haddad suggests that one possibility for spanning this space is to represent each odorant by a very large number of molecular descriptors, using either a variance metric or a distance metric. In the first description, single odorants may have many physicochemical features, and one expects these features to present themselves at various probabilities within the world of molecules that have a smell. In such a metric the orthogonal basis generated from the description of the odorants leads to each odorant being represented by a single value. In the second, the metric represents each odorant by a vector of 1664 values, on the basis of Euclidean distances between odorants in the 1664-dimensional physicochemical space. Whereas the first metric enabled the prediction of perceptual attributes, the second enabled the prediction of odorant-induced neuronal response patterns.
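The distance metric can be sketched in a few lines. The example below uses three made-up descriptors per odorant instead of 1664, and it standardizes each descriptor (z-score) before computing Euclidean distances so that descriptors with large numerical ranges do not dominate. Both the descriptor values and the standardization step are assumptions of this sketch and are not claimed to reproduce the procedure of Haddad et al.

```python
import math

def zscore_columns(rows):
    """Standardize each descriptor (column) to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # A constant descriptor has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical odorants with three made-up descriptors each
# (e.g. molecular weight, number of carbon atoms, a polarity-like score).
names = ["X", "Y", "Z"]
descriptors = [
    [100.0, 6.0, 0.2],
    [102.0, 6.0, 0.3],
    [180.0, 12.0, 0.9],
]

scaled = dict(zip(names, zscore_columns(descriptors)))
d_xy = math.dist(scaled["X"], scaled["Y"])
d_xz = math.dist(scaled["X"], scaled["Z"])
print(d_xy, d_xz)
```

Odorants X and Y, whose descriptors are nearly identical, end up much closer to each other than X and Z. The same computation carries over unchanged to vectors of 1664 descriptors.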
Electronic measurement of odors
Nowadays odors can be measured electronically in many different ways; some examples are mass spectrometry, gas chromatography, Raman spectra and, most recently, the electronic nose. In general these approaches assume that different olfactory receptors have different affinities to specific molecular physicochemical properties, and that the different activation of these receptors gives rise to a spatio-temporal pattern of activity that reflects odors.
eNoses are analytic devices that mimic the principle of biological olfaction and have as their main component an array of non-specific chemical sensors. Combining electronics, pattern recognition and modern technology, an eNose uses gas sensors to translate the chemical signal into an electrical signal when a volatile odorant from a sample reaches the gas sensor array. Pattern recognition is then usually used to perform either quantitative or qualitative identification. In order to reproduce the olfactory epithelium, a gas sensor array is sealed in a chamber of the eNose. The cross-sensitive chemical sensors then act as olfactory neurons, converting the odor information from a chemical into an electrical form, similar to the process that occurs in the olfactory bulb, where the signal is integrated and enhanced. The information is then processed by an artificial neural network, which provides coding, processing and storage. The gas sensor array transforms odor information from the sample space into a measurement space. This is a key procedure for information processing within an eNose. Gas sensors with different transduction principles and different fabrication techniques provide various ways to obtain odor information. Many different sensor types are commercially available; the most frequently used include metal oxide semiconductors (MOS), quartz crystal microbalances (QCM), conducting polymers (CP) and surface acoustic wave (SAW) sensors. The choice of sensor is strongly influenced by its response speed, reversibility, repeatability and sensitivity. When constructing the sensor array for an eNose, the sensors are selected to be cross-selective to different odors, such that their sensitivities overlap for the same odor, to make the most of the limited number of sensor types for obtaining adequate odor information.
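The mapping from sample space to measurement space by a cross-selective array can be sketched with a simple linear model: every sensor responds to every odorant component, but with a different sensitivity. Real gas sensors are generally not linear, so this is only an illustration; the sensitivity values, the three odorant components and the function names are all hypothetical.

```python
# Rows: four sensors; columns: sensitivity to three hypothetical odorant components.
SENSITIVITY = [
    [0.9, 0.3, 0.1],
    [0.4, 0.8, 0.2],
    [0.1, 0.5, 0.7],
    [0.3, 0.3, 0.3],
]

def array_response(concentrations):
    """Map a sample (component concentrations) to one signal per sensor."""
    return [sum(s * c for s, c in zip(row, concentrations)) for row in SENSITIVITY]

def normalize(pattern):
    """Scale a response pattern so that its entries sum to one."""
    total = sum(pattern)
    return [p / total for p in pattern]

weak_a = normalize(array_response([1.0, 0.0, 0.0]))
strong_a = normalize(array_response([2.0, 0.0, 0.0]))
other_b = normalize(array_response([0.0, 1.0, 0.0]))
print(weak_a, strong_a, other_b)
```

In this linear sketch no single sensor identifies an odorant, but the normalized pattern across the array does: it stays the same when the concentration of component A doubles and changes when the sample contains component B instead.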
In general the amount of raw data generated by the sensor array is huge, so the information has to be transferred from a high-dimensional space into a lower-dimensional one. Pattern recognition is then needed to encode the signal into a so-called classification space. Both designing a powerful information processing algorithm and constructing an array of high-quality gas sensors are important and necessary. Many pattern recognition methods have been introduced into eNoses, including parametric and non-parametric multivariate statistical methods. Artificial neural networks have several significant advantages: (i) they are self-adaptive, (ii) they are error tolerant and able to generalize, and (iii) they offer parallel processing and distributed storage.
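The two processing steps, reducing the raw data and then classifying it, can be sketched as follows. Each sensor's time series is reduced to a single feature (its relative peak change from baseline), and an unknown sample is assigned to the class with the nearest mean feature vector. The nearest-centroid rule is used here only as a simple stand-in for the statistical and neural-network methods mentioned above; the sensor traces, class labels and function names are hypothetical.

```python
import math

def extract_features(traces):
    """Reduce each sensor's time series to its relative peak change from baseline."""
    return [(max(trace) - trace[0]) / trace[0] for trace in traces]

def centroid(vectors):
    """Mean feature vector of a list of feature vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def classify(features, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))

# Made-up training measurements: two sensors, three time points each.
training = {
    "coffee": [[[1.0, 1.5, 2.0], [1.0, 1.1, 1.2]],
               [[1.0, 1.6, 2.2], [1.0, 1.1, 1.1]]],
    "mint":   [[[1.0, 1.1, 1.2], [1.0, 1.7, 2.1]],
               [[1.0, 1.0, 1.1], [1.0, 1.5, 1.9]]],
}
centroids = {label: centroid([extract_features(m) for m in measurements])
             for label, measurements in training.items()}

unknown = [[1.0, 1.4, 1.9], [1.0, 1.0, 1.2]]
print(classify(extract_features(unknown), centroids))
```

Here six raw values per measurement are reduced to two features before classification; a real eNose would apply the same idea to far more sensors and time points.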
- Schmidt, Lang (2007). "Physiologie des Menschen", Springer, 30. Auflage.
- Faller A., Schünke M. (2008). "Der Körper des Menschen", Thieme, 15. Auflage.
- Paxinos G., Mai J.K. (2004). "The Human Nervous System", Elsevier Academic Press, 2nd Edition.
- Ganong W.F. "Review of Medical Physiology", Lange, 22nd Edition.
- Haddad R. et al. (2008). "Measuring smells", Elsevier Ltd, 18:438-444.
- Mamlouk A.M., Martinez T. (2004). "On the dimensions of the olfactory perception space", Elsevier B.V.
- Guang L. et al. (2009). "Progress in bionic information processing techniques for an electronic nose based on olfactory models", Chinese Science Bulletin, 54(4):521-53Z
This list contains the names of all the authors that have contributed to this text. If you have added, modified or contributed in any way, please add your name to this list.
|Thomas Haslwanter||Upper Austria University of Applied Sciences / ETH Zurich|
|Aleksander George Slater||Imperial College London / ETH Zurich|
|Piotr Jozef Sliwa||Imperial College London / ETH Zurich|
|Qian Cheng||ETH Zurich|
|Salomon Wettstein||ETH Zurich|
|Philipp Simmler||ETH Zurich|
|Renate Gander||ETH Zurich|
|Gerick Lee||University of Zurich & ETH Zurich|
|Gabriela Michel||ETH Zurich|
|Peter O'Connor||ETH Zurich|
|Nikhil Biyani||ETH Zurich|
|Mathias Buerki||ETH Zurich|
|Jianwen Sun||ETH Zurich|
|Maurice Göldi||University of Zurich|
|Sofia Jativa||ETH Zurich|
|Salomon Diether||ETH Zurich|
|Arturo Moncada-Torres||ETH Zurich|
|Datta Singh Goolaub||ETH Zurich|
|Stephanie Marquardt||University of Zurich & ETH Zurich|
- http://www.eyedesignbook.com/
- Biology of Spiders by Rainer F. Foelix - Vision page 82-93
- Photoreceptors and light signalling by Alfred Batschauer, Royal Society of Chemistry (Great Britain), Published by Royal Society of Chemistry, 2003, ISBN 085404311X, 9780854043118
- Structural differences of cone 'oil-droplets' in the light and dark adapted retina of Poecilia reticulata P., Yvette W. Kunz and Christina Wise
- Advances in organ biology, Volume 10, Pages 1-395 (2005)
- <http://www.search.eb.com/eb/art-53283>"optic chiasm: visual pathways." Online Art. Encyclopædia Britannica Online.
- Color atlas of physiology,Despopoulos, A. and Silbernagl, S.,2003, Thieme
- Neurotransmitter systems in the retina, Ehinger, B., Retina, 2-4, 305, 1982
- Intraoperative Neurophysiological Monitoring, 2nd Edition, Aage R. Møller, Humana Press 2006, Totowa, New Jersey, pages 55-70
- The Science and Applications of Acoustics, 2nd Edition, Daniel R. Raichel, Springer Science&Business Media 2006, New York, pages 213-220
- Physiology of the Auditory System, P. J. Abbas, 1993, in: Cummings Otolaryngology: Head and Neck Surgery, 2nd edition, Mosby Year Book, St. Louis
- Computer Simulations of Sensory Systems, Lecture Script Ver 1.3 March 2010, T. Haslwanter, Upper Austria University of Applied Sciences, Linz, Austria,
- A. Carleton, R. Accolla, S. A. Simon, Trends Neurosci 33, 326 (Jul).
- P. Dalton, N. Doolittle, H. Nagata, P. A. Breslin, Nat Neurosci 3, 431 (May, 2000).
- J. A. Gottfried, R. J. Dolan, Neuron 39, 375 (Jul 17, 2003).
- K. L. Mueller et al., Nature 434, 225 (Mar 10, 2005).
- J. B. Nitschke et al., Nat Neurosci 9, 435 (Mar, 2006).
- T. Okubo, C. Clark, B. L. Hogan, Stem Cells 27, 442 (Feb, 2009).
- D. V. Smith, S. J. St John, Curr Opin Neurobiol 9, 427 (Aug, 1999).
- D. A. Yarmolinsky, C. S. Zuker, N. J. Ryba, Cell 139, 234 (Oct 16, 2009).
- G. Q. Zhao et al., Cell 115, 255 (Oct 31, 2003).
- Kandel, E., Schwartz, J., and Jessell, T. (2000) Principles of Neural Science. 4th edition. McGraw Hill, New York.
If light passes through a prism, a colour spectrum is formed on the other side of the prism, ranging from red to violet. The wavelength of red light is from 650 nm to 700 nm, and that of violet light is around 400 nm to 420 nm. This is the range of electromagnetic radiation detectable by the human eye.
The colour triangle is often used to illustrate the colour-mixing effect. The triangle encloses the visible spectrum, and a white dot is located in the middle of the triangle. Because of the additive colour-mixing property of red (700 nm), green (546 nm) and blue (435 nm), every colour can be produced by mixing those three colours.
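Additive mixing can be sketched by representing each primary as a triple of red, green and blue intensities and adding the triples channel by channel. The representation as values between 0 and 1, the clipping at full intensity and the function name are assumptions of this example.

```python
RED = (1.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)

def mix(*colours):
    """Add colours channel by channel, clipping each channel at full intensity."""
    return tuple(min(1.0, sum(channel)) for channel in zip(*colours))

yellow = mix(RED, GREEN)
white = mix(RED, GREEN, BLUE)
print(yellow, white)
```

Mixing red and green gives yellow, and mixing all three primaries at full intensity gives white, which corresponds to the white dot in the middle of the colour triangle.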
History of Sensory Systems
This Wikibook was started by engineers studying at ETH Zurich as part of the course Computational Simulations of Sensory Systems. The course combines physiology with an emphasis on the sensory systems, programming and signal processing. There is a plethora of information regarding these topics on the internet and in the literature, but there's a distinct lack of concise texts and books on the fusion of these 3 topics. The world needs a structured and thorough overview of biology and biological systems from an engineering point of view, which is what this book is trying to correct. We will start off with the Visual System, focusing on the biological and physiological aspects, mainly because this will be used in part to grade our performance in the course. The other part being the programming aspects have already been evaluated and graded. It is the authors' wishes that eventually information on physiology/biology, signal processing AND programming shall be added to each of the sensory systems. Also we hope that more sections will be added to extend the book in ways previously not thought of.
The original title of the Wikibook, Biological Machines, stressed the technical aspects of sensory systems. However, as the Wikibook evolved it became a comprehensive overview of human sensory systems, with additional emphasis on the technical aspects of these systems. This focus is better represented by Sensory Systems, the Wikibook's new title since December 2011.