Artificial Neural Networks/Neural Network Basics


Artificial Neural Networks

Artificial neural networks, also known as “artificial neural nets”, “neural nets”, or ANN for short, are a computational tool modeled on the interconnection of neurons in the nervous systems of humans and other organisms. Biological neural nets (BNN) are the naturally occurring equivalent of the ANN. Both BNN and ANN are network systems constructed from atomic components known as “neurons”. Artificial neural networks are very different from biological networks, although many of the concepts and characteristics of biological systems are reproduced in simplified form in the artificial systems. Artificial neural nets are a type of non-linear processing system that is well suited to a wide range of tasks, especially tasks for which no explicit algorithm is known. An ANN can be trained to solve certain problems using a teaching method and sample data. In this way, identically constructed ANN can be used to perform different tasks depending on the training received. With proper training, ANN are capable of generalization: the ability to recognize similarities among different input patterns, especially patterns that have been corrupted by noise.

What Are Neural Nets?

The term “neural net” refers to both the biological and artificial variants, although typically the term is used to refer to artificial systems only. Mathematically, neural nets are nonlinear: each layer computes non-linear functions of weighted combinations of the previous layer's outputs. Each neuron receives signals from multiple inputs, produces a single resultant signal, and transmits that signal to all of its outputs. Practically, neurons in an ANN are arranged into layers. The first layer, which interacts with the environment to receive input, is known as the input layer. The final layer, which presents the processed data to the environment, is known as the output layer. Layers between the input and output layers, which have no direct interaction with the environment, are known as hidden layers. The complexity of an ANN, and thus its computational capacity, is increased by adding more hidden layers or more neurons per layer.
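
As a rough illustration of how a signal moves from the input layer through the hidden layers to the output layer, the following Python sketch feeds an input vector through a small fully connected network. The logistic activation, the weight values, and the function names (layer_forward, network_forward) are assumptions chosen for illustration only; nothing above prescribes them.

```python
import math

def logistic(z):
    """Smooth non-linear function mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(weight_rows, inputs, activation=logistic):
    """Compute one layer's outputs.

    Each row of weight_rows holds the incoming weights of one neuron,
    so the layer has len(weight_rows) neurons.
    """
    return [activation(sum(w * x for w, x in zip(row, inputs)))
            for row in weight_rows]

def network_forward(layers, inputs):
    """Pass the input through each layer in order.

    The result of the last layer is the output of the network.
    """
    signal = list(inputs)
    for weight_rows in layers:
        signal = layer_forward(weight_rows, signal)
    return signal

# Hypothetical network: 2 inputs, one hidden layer of 3 neurons, 1 output neuron.
hidden = [[0.5, -0.5], [1.0, 1.0], [-1.0, 0.25]]
output = [[1.0, -1.0, 0.5]]
y = network_forward([hidden, output], [1.0, 0.0])
```

Adding a hidden layer corresponds to adding another weight matrix to the list, and adding neurons to a layer corresponds to adding rows to that layer's matrix.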

Biological neurons are connected in very complicated networks. Some regions of the human brain, such as the cerebellum, are composed of very regular patterns of neurons. Other regions of the brain, such as the cerebrum, have less regular arrangements. A typical biological neural system has millions or billions of cells, each with thousands of interconnections with other neurons. Current artificial systems cannot achieve this level of complexity, and so cannot be used to reproduce the behavior of biological systems exactly.

Processing Elements

In an artificial neural network, neurons can take many forms and are typically referred to as Processing Elements (PE) to differentiate them from their biological equivalents. The PE are connected into a particular network pattern, with different patterns serving different functional purposes. Unlike biological neurons, which have chemical interconnections, the PE in artificial systems are electrical only, and may be analog, digital, or a hybrid of the two. To reproduce the effect of the synapse, the connections between PE are assigned multiplicative weights, which can be calibrated or “trained” to produce the proper system output.

McCulloch-Pitts Model

Processing Elements are typically defined in terms of two equations that represent the McCulloch-Pitts model of a neuron:


[McCulloch-Pitts Model]

\zeta = \sum_i w_ix_i
y = \sigma(\zeta)

Here ζ is the weighted sum of the inputs (the inner product of the input vector and the tap-weight vector), and σ(ζ) is a function of the weighted sum. If we recognize that the weight and input elements form vectors w and x, the weighted sum ζ becomes a simple dot product:

\zeta = \mathbf{w} \cdot \mathbf{x}
[Figure: McCulloch-Pitts neuron model]

The function σ may be called either the activation function (in the case of a threshold comparison) or the transfer function. The accompanying figure shows this relationship diagrammatically. The dotted line in the center of the neuron represents the division between the calculation of the input sum using the weight vector and the calculation of the output value using the activation function. In an actual artificial neuron, this division may not be made explicitly.
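
The two equations of the model can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names, the choice of a step threshold and a logistic function for σ, and the hand-picked AND-gate weights are all assumptions made for the example.

```python
import math

def weighted_sum(weights, inputs):
    """zeta = sum_i w_i * x_i, the dot product of w and x."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    return sum(w * x for w, x in zip(weights, inputs))

def step(zeta, threshold=0.0):
    """Threshold activation: 1 if zeta reaches the threshold, otherwise 0."""
    return 1 if zeta >= threshold else 0

def logistic(zeta):
    """Smooth transfer function mapping zeta into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-zeta))

def neuron_output(weights, inputs, sigma=step):
    """y = sigma(zeta) for a single processing element."""
    return sigma(weighted_sum(weights, inputs))

# Hand-chosen (not trained) weights: with a threshold of 1.5 the neuron
# fires only when both binary inputs are 1, i.e. it behaves like logical AND.
AND_WEIGHTS = [1.0, 1.0]

def and_gate(inputs):
    return neuron_output(AND_WEIGHTS, inputs, sigma=lambda z: step(z, 1.5))
```

Swapping step for logistic changes only the second equation; the weighted sum is computed in the same way for both.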

The inputs to the network, x, come from an input space and the system outputs are part of the output space. For some networks, the output space Y may be as simple as {0, 1}, or it may be a complex multi-dimensional space. Neural networks tend to have one input per degree of freedom in the input space, and one output per degree of freedom in the output space.

The tap-weight vector is updated during training by various algorithms. One of the more popular is the backpropagation algorithm, which we will discuss in more detail later.

Why Use Neural Nets?

Artificial neural nets have a number of properties that make them an attractive alternative to traditional problem-solving techniques. The two main alternatives to using neural nets are to develop an algorithmic solution, and to use an expert system.

Algorithmic methods arise when there is sufficient information about the data and the underlying theory. By understanding the data and the theoretical relationship between the data, we can directly calculate unknown solutions from the problem space. Ordinary von Neumann computers can be used to calculate these relationships quickly and efficiently from a numerical algorithm.

Expert systems, by contrast, are used in situations where there is insufficient data and theoretical background to create any kind of reliable problem model. In these cases, the knowledge and rationale of human experts are codified into an expert system. Expert systems emulate the deduction processes of a human expert by collecting information and traversing the solution space in a directed manner. Expert systems are typically able to perform very well in the absence of an accurate problem model and complete data. However, where sufficient data or an algorithmic solution is available, expert systems are a less than ideal choice.

Artificial neural nets are useful for situations where there is an abundance of data, but little underlying theory. The data, which typically arises through extensive experimentation, may be non-linear, non-stationary, or chaotic, and so may not be easily modeled. Input-output spaces may be so complex that a reasonable traversal with an expert system is not a satisfactory option. Importantly, neural nets do not require any a priori assumptions about the problem space, not even information about its statistical distribution. Many mathematical problem models, by contrast, assume that the data follows a standard distribution, such as a Gaussian or Maxwell-Boltzmann distribution. Though such assumptions are not required, supplying a priori information such as the statistical distribution of the input space can help to speed training. During training, the neural network performs the necessary analytical work, which would require non-trivial effort on the part of the analyst if other methods were used.

Learning

Learning is a fundamental component of an intelligent system, although a precise definition of learning is hard to produce. In terms of an artificial neural network, learning typically happens during a specific training phase. Once the network has been trained, it enters a production phase where it produces results independently. Training can take on many different forms, using a combination of learning paradigms, learning rules, and learning algorithms. A system which has distinct learning and production phases is known as a static network. Networks which are able to continue learning during production use are known as dynamical systems.

A learning paradigm is supervised, unsupervised, or a hybrid of the two, and reflects the method in which training data is presented to the neural network. A learning rule is a model for the types of methods to be used to train the system, and also a goal for what types of results are to be produced. The learning algorithm is the specific mathematical method that is used to update the inter-neuronal synaptic weights during each training iteration. Under each learning rule, there are a variety of possible learning algorithms, and most algorithms can only be used with a single learning rule. Learning rules and learning algorithms can, however, typically be used with either supervised or unsupervised learning paradigms, and each combination will produce a different effect.

Overtraining is a problem that arises when the network is trained so heavily on a fixed set of examples that it fits the details and noise of those examples, and the system becomes incapable of useful generalization. This can also occur when there are too many neurons in the network, so that its capacity exceeds what the training data can constrain. During training, care must be taken over how much training is performed, since different numbers of training examples and training iterations can produce very different results in the quality and robustness of the network.

Network Parameters

There are a number of different parameters that must be decided upon when designing a neural network. Among these parameters are the number of layers, the number of neurons per layer, the number of training iterations, et cetera. Some of the more important parameters in terms of training and network capacity are the number of hidden neurons, the learning rate and the momentum parameter.


Number of neurons in the hidden layer

Hidden neurons are the neurons that are neither in the input layer nor the output layer. These neurons are essentially hidden from view, and their number and organization can typically be treated as a black box by people who are interfacing with the system. Using additional hidden neurons, or additional layers of them, enables greater processing power and system flexibility. This additional flexibility comes at the cost of additional complexity in the training algorithm. Having too many hidden neurons is analogous to a system of equations with more free variables than equations: many different sets of weights fit the training data equally well, and the network tends to memorize the training examples rather than generalize. Having too few hidden neurons, conversely, can prevent the system from properly fitting the input data, and reduces the robustness of the system.

Data type: Integer; Domain: [1, ∞); Typical value: 8

Meaning: Number of neurons in the hidden layer (additional layer to the input and output layers, not connected externally).
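
To give a feel for how this parameter affects the size of the training problem, the following Python sketch counts the trainable weights in a fully connected network with a single hidden layer. The function name count_weights and the assumption of one bias weight per neuron are illustrative choices, not something fixed by the description above.

```python
def count_weights(n_inputs, n_hidden, n_outputs, use_bias=True):
    """Number of trainable weights in a fully connected network
    with one hidden layer.

    Assumes every hidden neuron sees every input and every output
    neuron sees every hidden neuron, plus one optional bias each.
    """
    bias = 1 if use_bias else 0
    hidden_weights = (n_inputs + bias) * n_hidden
    output_weights = (n_hidden + bias) * n_outputs
    return hidden_weights + output_weights

# With 4 inputs and 2 outputs, doubling the hidden layer from 8 to 16
# neurons roughly doubles the number of weights that must be trained.
small = count_weights(4, 8, 2)
large = count_weights(4, 16, 2)
```

Under these assumptions the weight count grows linearly with the number of hidden neurons, which is one way to see why a larger hidden layer needs more training data to constrain it.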

Learning Rate

Data type: Real; Domain: [0, 1]; Typical value: 0.3

Meaning: Training parameter that controls the size of weight and bias changes during learning.
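
The effect of the learning rate can be seen in a plain gradient-descent update, sketched below in Python. The update form (each weight moved against its error gradient, scaled by the learning rate) is a common choice assumed here for illustration, and the toy error function E(w) = (w - 3)^2 and the function names are hypothetical.

```python
def gradient_step(weights, gradients, learning_rate=0.3):
    """Move each weight against its gradient, scaled by the learning rate."""
    return [w - learning_rate * g for w, g in zip(weights, gradients)]

def minimize_quadratic(learning_rate, steps=50, start=0.0, target=3.0):
    """Minimize the toy error E(w) = (w - target)^2 by repeated steps."""
    w = [start]
    for _ in range(steps):
        grad = [2.0 * (w[0] - target)]  # dE/dw for the toy error
        w = gradient_step(w, grad, learning_rate)
    return w[0]

# A moderate rate settles close to the minimum at w = 3.
settled = minimize_quadratic(0.3)

# For this particular toy error, a rate of 1.0 makes every step overshoot
# by exactly the current distance, so w bounces between 0 and 6.
bouncing = minimize_quadratic(1.0)
```

The numbers are specific to this toy problem, but the pattern is general: a small rate gives small, stable steps, while a large rate takes large steps that can overshoot the minimum.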

Momentum

Data type: Real; Domain: [0, 1]; Typical value: 0.9

Meaning: Momentum adds a fraction m of the previous weight update to the current one. The momentum parameter is used to help keep the system from settling in a local minimum or at a saddle point. A high momentum parameter can also help to increase the speed of convergence of the system. However, setting the momentum parameter too high creates a risk of overshooting the minimum, which can cause the system to become unstable. A momentum coefficient that is too low cannot reliably avoid local minima, and can also slow down the training of the system.
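
A minimal Python sketch of an update with momentum follows. It assumes the plain gradient-descent form for the current update and adds the fraction m of the previous update as described; the function name momentum_step and its parameter names are hypothetical.

```python
def momentum_step(weights, gradients, previous_updates,
                  learning_rate=0.3, momentum=0.9):
    """One weight update with momentum.

    update_i = -learning_rate * gradient_i + momentum * previous_update_i
    Returns the new weights and the updates, which the caller passes
    back in as previous_updates on the next iteration.
    """
    updates = [-learning_rate * g + momentum * prev
               for g, prev in zip(gradients, previous_updates)]
    new_weights = [w + u for w, u in zip(weights, updates)]
    return new_weights, updates

# First iteration: there is no previous update, so start from zeros.
weights = [1.0, -2.0]
updates = [0.0, 0.0]
weights, updates = momentum_step(weights, [0.5, -0.5], updates)

# Second iteration: the gradient is zero (as at a saddle point),
# yet the weights still move because of the carried-over update.
weights_after, updates_after = momentum_step(weights, [0.0, 0.0], updates)
```

The second call shows the behavior the parameter is meant to provide: even where the gradient vanishes, the previous update keeps the weights moving.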

Training type

Data type: Integer; Domain: {0, 1}; Typical value: 1

Meaning: 0 = train by epoch, 1 = train by minimum error

Epoch

Data type: Integer; Domain: [1, ∞); Typical value: 5000000

Meaning: Maximum number of training iterations. When training by epoch, training stops once the number of iterations exceeds this value. When training by minimum error, this represents the maximum number of iterations allowed if the error target is not reached first.
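
The interaction between the training type and the epoch limit can be sketched as a simple loop in Python. Everything named here is hypothetical: run_iteration stands in for one pass of whatever learning algorithm is in use and is assumed to return the current error, and minimum_error is the error target used when training by minimum error.

```python
def train(run_iteration, training_type=1, epochs=5000000, minimum_error=0.01):
    """Run training iterations until a stopping condition is met.

    training_type 0: stop only when the epoch limit is reached.
    training_type 1: also stop as soon as the error reaches minimum_error.
    Returns (iterations performed, last error).
    """
    iterations = 0
    error = float("inf")
    while iterations < epochs:
        error = run_iteration()
        iterations += 1
        if training_type == 1 and error <= minimum_error:
            break
    return iterations, error

def make_halving_iteration(start=1.0):
    """Stand-in for a real training pass: the error halves on every call."""
    state = {"error": start}
    def run_iteration():
        state["error"] /= 2.0
        return state["error"]
    return run_iteration

# Train by minimum error: stops after 7 iterations (error 0.0078125).
by_error = train(make_halving_iteration(), training_type=1,
                 epochs=100, minimum_error=0.01)

# Train by epoch: always runs the full 5 iterations (error 0.03125).
by_epoch = train(make_halving_iteration(), training_type=0, epochs=5)
```

In both modes the epoch value bounds the loop, which is why it acts as a safety limit when the error target is never reached.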

Minimum Error

Data type: Real; Domain: [0, 0.5]; Typical value: 0.01

Meaning: Minimum mean square error of the epoch, used only when training by minimum error: training stops once the error of an epoch falls to this value. The error is computed as the square root of the sum of squared differences between the network targets and actual outputs, divided by the number of patterns.
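
The description above mentions both a mean square error and a square root, so the Python sketch below shows two possible readings side by side rather than asserting which one a given tool uses. The function names are hypothetical, and each pattern is assumed to have a single scalar target and output.

```python
import math

def sum_squared_differences(targets, outputs):
    """Sum over all patterns of (target - output)^2."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))

def error_as_described(targets, outputs):
    """Literal reading of the description:
    sqrt(sum of squared differences) / number of patterns."""
    return math.sqrt(sum_squared_differences(targets, outputs)) / len(targets)

def mean_squared_error(targets, outputs):
    """Conventional mean square error:
    (sum of squared differences) / number of patterns."""
    return sum_squared_differences(targets, outputs) / len(targets)

# Four patterns with one target and one network output each.
targets = [1.0, 0.0, 1.0, 0.0]
outputs = [0.9, 0.2, 0.8, 0.1]

literal = error_as_described(targets, outputs)  # about 0.079
mse = mean_squared_error(targets, outputs)      # about 0.025
```

The two readings give different numbers for the same outputs, so a threshold such as 0.01 is only meaningful once it is clear which formula is being applied.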