
Efficient coding

Introduction

Between the late 1990s and the beginning of the 21st century, Bruno Olshausen and Michael Lewicki respectively studied how natural images[1] and natural sounds[2] are encoded by the brain, and tried to create models that would replicate these processes as accurately as possible. It was found that the processing of both kinds of input signal could be modeled with very similar methods. The goal of efficient coding theory is to convey a maximal amount of information about a stimulus using a set of statistically independent features[3]. Efficient coding of natural images gives rise to a population of localized, oriented, Gabor-wavelet-like filters[1],[4]; gammatone filters are their equivalent for the auditory system. To distinguish shapes in an image, the most important operation is edge detection, which is achieved with Gabor filters. In sound processing, sound onsets or 'acoustic edges' can be encoded by a pool of filters similar to a gammatone filterbank[2].
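The gammatone filter has a standard parametric form, $g(t) = a\,t^{\,n-1} e^{-2\pi b t} \cos(2\pi f_c t)$. A minimal Python sketch of its impulse response (the sampling rate, order, and bandwidth values here are illustrative, not taken from the cited papers):

```python
import numpy as np

def gammatone(t, fc, n=4, b=125.0, a=1.0):
    """Gammatone impulse response: a * t**(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t).

    fc : center frequency in Hz
    n  : filter order (4 is a common choice in auditory modeling)
    b  : bandwidth parameter in Hz (illustrative value)
    """
    return a * t**(n - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

fs = 16000                          # assumed sampling rate in Hz
t = np.arange(0, 0.05, 1 / fs)      # 50 ms time axis
ir = gammatone(t, fc=1000.0)        # impulse response of a 1 kHz channel
```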

Vision

In 1996, Bruno Olshausen and his team were the first to create a learning algorithm that finds sparse linear codes for natural images; maximizing sparseness causes a set of localized, oriented, bandpass receptive fields to emerge, analogous to those found in the primary visual cortex[1].

Assume that an image $I(x, y)$ can be depicted as a linear superposition of basis functions $\phi_i(x, y)$ weighted by coefficients $a_i$:

$$I(x, y) = \sum_i a_i \, \phi_i(x, y)$$

The image code depends on which basis functions are chosen, and the coefficients $a_i$ differ from image to image. The objective of efficient coding is to find a family of $\phi_i$ that spans the image space and yields coefficients that are as statistically independent as possible.
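A minimal numerical sketch of this generative model (patch size, basis count, and the random data are placeholders for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
patch_dim, n_basis = 64, 64                       # e.g. 8x8 patches, 64 basis functions

phi = rng.standard_normal((patch_dim, n_basis))   # basis functions phi_i as columns
a = rng.standard_normal(n_basis)                  # coefficients a_i for one image

# The image patch is modeled as a linear superposition of the basis functions:
# I(x, y) = sum_i a_i * phi_i(x, y), with the patch flattened into a vector.
I = phi @ a
```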

Natural scenes contain many higher-order forms of statistical structure which are non-Gaussian[5]. Principal component analysis, which captures only second-order (pairwise) correlations, is therefore unsuitable for attaining these two objectives. Statistical dependences among a pool of coefficients exist whenever the joint entropy is less than the sum of the individual entropies:

$$H(a_1, a_2, \ldots, a_n) < \sum_i H(a_i)$$
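To see this criterion numerically, a histogram-based sketch (the correlated synthetic data and bin count are assumptions for illustration; a real analysis would use image coefficients):

```python
import numpy as np

def marginal_entropy(x, bins=32):
    """Histogram estimate of the entropy H(x) of a 1-D sample, in bits."""
    counts, _ = np.histogram(x, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def joint_entropy(x, y, bins=32):
    """Histogram estimate of the joint entropy H(x, y), in bits."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)
y = 0.8 * x + 0.2 * rng.standard_normal(100_000)   # y depends on x

# Dependence shows up as H(x, y) < H(x) + H(y).
print(joint_entropy(x, y), marginal_entropy(x) + marginal_entropy(y))
```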

It is assumed that natural images have a 'sparse structure', meaning that any image can be expressed in terms of a small number of features out of a larger set[6],[5]. The objective is to look for a code that lowers the entropy, where the probability distribution of each coefficient is unimodal and peaked at zero. This can be articulated as an optimization problem[3]:

$$E = \sum_{x,y} \left[ I(x, y) - \sum_i a_i \, \phi_i(x, y) \right]^2 + \lambda \sum_i S\!\left( \frac{a_i}{\sigma} \right)$$

where $\lambda$ is a positive weight coefficient. The first term evaluates the mean square error between the natural image and its reconstruction.

The second term assigns a higher cost to representations in which, for a given image, the activity is spread across many coefficients, and a lower cost to sparse ones. It is computed by summing each coefficient's activity passed through a nonlinear function $S$:

$$\lambda \sum_i S\!\left( \frac{a_i}{\sigma} \right)$$

where $\sigma$ is a scaling constant. Choices of $S(x)$ that favor, among activity states with equal variance, those with the fewest non-zero coefficients include $-e^{-x^2}$, $\log(1 + x^2)$, and $|x|$.
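A direct transcription of this cost into Python, choosing $S(x) = \log(1 + x^2)$ from the options just listed ($\lambda$, $\sigma$, and the array shapes are illustrative):

```python
import numpy as np

def sparse_coding_cost(I, phi, a, lam=0.1, sigma=1.0):
    """E = sum((I - phi @ a)**2) + lam * sum(S(a / sigma)), with S(x) = log(1 + x**2)."""
    reconstruction_error = np.sum((I - phi @ a) ** 2)      # first term: reconstruction error
    sparseness_cost = np.sum(np.log1p((a / sigma) ** 2))   # second term: sparseness penalty
    return reconstruction_error + lam * sparseness_cost

rng = np.random.default_rng(0)
I = rng.standard_normal(64)             # stand-in for a flattened image patch
phi = rng.standard_normal((64, 64))     # basis functions as columns
a = rng.standard_normal(64)             # coefficients
print(sparse_coding_cost(I, phi, a))
```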

Learning is achieved by minimizing the total cost $E$ over the $\phi_i$. The $\phi_i$ converge by gradient descent on $E$, averaged over many image presentations. The algorithm allows the set of basis functions to be overcomplete (more basis functions than input dimensions) and non-orthogonal[7], without sacrificing sparseness.
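A schematic of this learning loop under the cost above (a toy sketch, not the authors' exact procedure: the patches are random placeholders, the step sizes are arbitrary, and the basis is renormalized after each update to keep it bounded):

```python
import numpy as np

rng = np.random.default_rng(0)
patch_dim, n_basis = 64, 64
phi = rng.standard_normal((patch_dim, n_basis)) * 0.1   # initial basis functions
lam, sigma, eta_a, eta_phi = 0.1, 1.0, 0.01, 0.005      # illustrative hyperparameters

def infer_coefficients(I, phi, steps=50):
    """Gradient descent on E with respect to the coefficients a, holding phi fixed."""
    a = np.zeros(n_basis)
    for _ in range(steps):
        residual = I - phi @ a
        # derivative of S(a/sigma) = log(1 + (a/sigma)^2) with respect to a:
        s_prime = (2 * a / sigma**2) / (1 + (a / sigma) ** 2)
        a -= eta_a * (-2 * phi.T @ residual + lam * s_prime)  # step down the gradient of E
    return a

for _ in range(500):                         # loop over (here: synthetic) image patches
    I = rng.standard_normal(patch_dim)       # stand-in for a whitened natural image patch
    a = infer_coefficients(I, phi)
    phi += eta_phi * np.outer(I - phi @ a, a)            # gradient step on the residual
    phi /= np.linalg.norm(phi, axis=0, keepdims=True)    # keep basis vectors normalized
```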

After the learning process, the algorithm was tested on artificial datasets, confirming that it is suited to detecting sparse structure in data. The learned basis functions are well localized, oriented, and selective to diverse spatial scales. Probing the response of each coefficient to spots at every position established the similarity between the receptive fields and the basis functions. Together, the basis functions form a complete image code that spans the joint space of spatial position, orientation and scale in a manner similar to wavelet codes.

To conclude, the results of Olshausen's team show that two objectives are sufficient for the emergence of localized, oriented, bandpass receptive fields: that information be preserved and that the representation be sparse.

Audition

Fig. 1: Time–frequency analysis. (a) The filters in a Fourier transform are localized in frequency but not in time. (b) Wavelet filters are localized in both time and frequency. (c–e) The statistical structure of the signals determines how the filter shapes derived from efficient coding of the different data ensembles are distributed in time–frequency space. Each ellipse is a schematic of the extent of a single filter in time–frequency space. (c) Environmental sounds. (d) Animal vocalizations. (e) Speech.

Lewicki published his findings after Olshausen, in 2002. He applied the efficient coding theory inspired by the earlier paper to derive efficient codes for three classes of natural sounds: animal vocalizations, environmental sounds, and human speech.

The method used is independent component analysis (ICA), which extracts a linear decomposition of signals that minimizes both correlations and higher-order statistical dependencies[8]. For each data set, the learning algorithm yields a set of filters, each of which can be interpreted as a time-frequency window. The filter shapes are determined by the statistical structure of the sound ensemble[2].
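A minimal sketch of this kind of decomposition using scikit-learn's FastICA (the random stand-in signal, window length, and component count are assumptions for illustration; a real analysis would use recorded natural sounds):

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
sound = rng.standard_normal(200_000)        # stand-in for a natural sound recording

# Cut the signal into short windows; each window is one observation vector for ICA.
win = 128
n_windows = len(sound) // win
X = sound[: n_windows * win].reshape(n_windows, win)

# ICA extracts a linear decomposition whose components are as statistically
# independent as possible; each row of the unmixing matrix acts as a filter.
ica = FastICA(n_components=64, random_state=0)
ica.fit(X)
filters = ica.components_                   # shape (64, 128): one filter per row
```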

When applied to the different sound ensembles, the method obtained, for environmental sounds, filters with wavelet-like time-frequency windows, localized in both time and frequency (Fig. 1c), whereas for animal vocalizations it obtained a Fourier-like tiling, localized in frequency but not in time (Fig. 1d). Speech yields a mixture of both, matching a 2:1 weighting of environmental to animal sounds (Fig. 1e). This reflects the fact that speech is composed of harmonic vowels and non-harmonic consonants. These patterns had previously been observed experimentally in animals and humans[9].

To break down the core differences between these three types of sounds, Lewicki's team analyzed bandwidth, filter sharpness, and the temporal envelope of the filters. Bandwidth increases as a function of center frequency for environmental sounds, whereas it stays constant for animal vocalizations; for speech it also increases, but less steeply than for environmental sounds. Owing to the time-frequency trade-off, the temporal envelope curves behave correspondingly. Comparing the sharpness as a function of center frequency from physiological measurements[10],[11] with the sharpness of the filters derived from speech and from the combined sound ensemble confirmed that the two are consistent.
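As a rough sketch of how such filter properties can be quantified, one can take the spectral centroid of a filter's power spectrum as its center frequency and the spectral spread as a bandwidth estimate (these are illustrative definitions, not necessarily the ones used in the paper):

```python
import numpy as np

def center_frequency_and_bandwidth(filt, fs):
    """Spectral centroid (center frequency) and spread (bandwidth) of a filter, in Hz."""
    power = np.abs(np.fft.rfft(filt)) ** 2
    freqs = np.fft.rfftfreq(len(filt), d=1 / fs)
    p = power / power.sum()                          # normalize to a distribution
    fc = np.sum(freqs * p)                           # centroid = center frequency
    bw = np.sqrt(np.sum((freqs - fc) ** 2 * p))      # spread = bandwidth estimate
    return fc, bw

# Example: a windowed 1 kHz tone sampled at 16 kHz.
fs = 16000
t = np.arange(128) / fs
filt = np.hanning(128) * np.cos(2 * np.pi * 1000 * t)
print(center_frequency_and_bandwidth(filt, fs))
```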

It must be noted that several approximations were necessary to conduct this analysis. The analysis did not account for variations in sound intensity, even though the auditory system's frequency selectivity depends on intensity thresholds[12]. Moreover, the physiological measurements used for comparison were made with isolated pure tones, which in turn limits the scope of the model's applicability but does not discredit it. Finally, the filters' symmetry in time does not match the physiologically characterized gammatone filters. Modifying the algorithm to be causal is possible, and the filters' temporal envelopes would then become asymmetric, like those of gammatone filters.

Conclusion

An analogy surfaces between the two systems. Neurons in the visual cortex encode both the location and the spatial frequency of visual stimuli; the trade-off between these two variables is similar to that between timing and frequency in auditory coding.

Another interesting aspect of this parallel is why ICA explains the neural response properties at the earliest stages of analysis in the auditory system, while in the visual system it explains the response properties of cortical neurons. The neuronal anatomy of the two systems differs. In the visual system, a bottleneck occurs where information from 100 million photoreceptors is condensed into 1 million optic nerve fibers; the information is then expanded by a factor of 50 in the cortex. In the auditory system no such bottleneck occurs: information from 3,000 cochlear inner hair cells diverges directly onto 30,000 auditory nerve fibers. ICA thus appears to apply at the point of expansion in the representation[13].

References

  1. Olshausen, B. A. & Field, D. J. Emergence of simple-cell receptive-field properties by learning a sparse code for natural images. Nature 381, 607–609 (1996).
  2. Lewicki, M. S. Efficient coding of natural sounds. Nature Neurosci. 5, 356–363 (2002).
  3. Barlow, H. B. Possible principles underlying the transformation of sensory messages. in Sensory Communication (ed. Rosenbluth, W. A.) 217–234 (MIT Press, Cambridge, 1961).
  4. Bell, A. J. & Sejnowski, T. J. The 'independent components' of natural scenes are edge filters. Vision Res. 37, 3327–3338 (1997).
  5. Field, D. J. What is the goal of sensory coding? Neural Comp. 6, 559–601 (1994).
  6. Field, D. J. Relations between the statistics of natural images and the response properties of cortical cells. J. Opt. Soc. Am. A 4, 2379–2394 (1987).
  7. Daugman, J. G. Computational Neuroscience (ed. Schwartz, E.) 403–423 (MIT Press, Cambridge, MA, 1990).
  8. Hyvärinen, A., Karhunen, J. & Oja, E. Independent Component Analysis (Wiley, New York, 2001).
  9. Ehret, G. in Advances in Hearing Research. Proceedings of the 10th International Symposium on Hearing (eds. Manley, G. A., Klump, G. M., Köppl, C., Fastl, H. & Oekinghaus, H.) 387–400 (World Scientific, London, 1995).
  10. Evans, E. F. Cochlear nerve and cochlear nucleus. in Handbook of Sensory Physiology Vol. 5/2 (eds. Keidel, W. D. & Neff, W. D.) 1–108 (Springer, Berlin, 1975).
  11. Rhode, W. S. & Smith, P. H. Characteristics of tone-pip response patterns in relationship to spontaneous rate in cat auditory nerve fibers. Hearing Res. 18, 159–168 (1985).
  12. Evans, E. F. & Palmer, A. R. Exp. Brain Res. 40, 115–118 (1980).
  13. Olshausen, B. A. & O'Connor, K. N. A new window on sound. Nature Neurosci. 5, 292–295 (2002).