Coding and Information Theory

Is Coding Theory Boring?

That depends on how you look at it. Everything in the universe is an encoding of something else. You can think of color, light, sound, music, teddy bears, your friends and many other things as encodings of energy.

What is coding theory?

The key idea is “most efficient”; this comes up all the time in life. For instance, "What is the best way to get to work?" or "What is the cheapest lunch?" These are really asking what is the most efficient path from one point to another, in terms of some way of measuring things.

In this article, it should be remembered the term Information is used in an abstract way. It can mean information in an ordinary sense; but it can also mean patterns, energy, sound, or a lot of other things.

So coding theory is the study of how to encode information (or behaviour or thought, etc.) in the most efficient way. It also has to do with methods of deleting noise in the environment, so that the original message can be received clearly. But this also should be done in the most efficient way.

The flow of information in the universe

Energy is constantly being transformed from one form to another. Heat to light, light to heat, potential to kinetic and so on. One can think of the entire universe as a computer and the matter as the data. So then one has a continuous data transformer. If this seems too restrictive and breeds questions about free will and so on, then one can simply look at the inanimate universe in this way.

Coding theory in Music, Astronomy, Politics

Music

Heinrich Schenker is one of the most famous music theorists. He isn't too well known to the general public because you almost have to be a graduate student in music to understand what he is saying. It is hoped that others who have studied the theory wont frown wrathfully (like Beethoven) at the oversimplifications in this section. But hopefully more people will understand it from this point of view.

His main ideas gradually took their present form over the course of his life. His ideas were still under construction when he passed away, and it is somewhat debatable where they would have led him if he had lived to be very very old. A reasonable guess is that he would have concluded that all music exists before it is even written. As waves radiate outward when a stone is dropped into water, so music has a natural immutable form that cannot be changed; it arises from the initial disturbance of sound, just as the waves in the pond do. So the music is simply playing itself out recursively, and the job of the composer is to find the "sweep of improvisation" that plays out the initial disturbance.

His central concept is the "chord of nature"; he taught that all music arises from and falls back into this chord, just as a feedback version of fourier series and interference patterns arises from and then recedes back into a Chinese gong when it is struck. The chord itself is based on physics and is well known to scientists. But here we see an example of coding theory. That is, if the music is already there, and is simply moving back to equilibrium, then it is really looking for the most efficient path to that chord. It should not waste any energy, so to speak, and this is what gives it its beauty and natural artistry, just as rain or snow has a natural artistry. So the most profound music is an encoding of sound information that matches the Shannon entropy. (There is probability inherent in this in the sense that sound can be thought of as brownian motion, and the waves as global effects of the brownian motion).

Note that there is no error correcting in music though. If you miss it, you miss it. It would be interesting to see what Schenker would have said about communication in the presence of lots of other radios playing, horns honking, and noise in general. Perhaps he would have come up with a theory of Music in the Presence of Noise, just as Shannon did for other types of information.

Astronomy

One author asked the question "why does Copernicus seem more right than the theorists that went before him?" Both the epicycle model and the Copernican model of the solar system can be used to predict the location of the planets. He concluded that the human race feels that Copernicus is more right because his ideas are simpler - the key is simplicity. So one could say that he found a more efficient encoding of the information of the solar system, and that the human race is passionate about coding theory without even realizing it!

Politics

What makes the following political statements powerful (or at least famous)?

      Ich...bin...ein...Berliner! (cheers)
      Ask not what your country can do for you, but what you can do for your country

One can argue that what gives these statements their force is efficiency of expression; they are an interpolation of many feelings, attitudes and perceptions; they are an efficient encoding of emotional and mental information.

A modern example: cell phone technology

At the time of this writing cell phones are important. But cellphone text messaging costs money. So how do you minimize costs? In some third world nations, people are actually developing elaborate languages for data transmission that help them save money. For instance, one may encode the message "talk to you at 7 o'clock" as "tk 2 u @ 7" and so on. This is effective in the situation that Claude E. Shannon described as Communication Without Noise. That is, as long as the message is properly received, no harm is done. But Shannon also stated that because of the presence of noise, some redundancy may be important. He noted that the English language seems to be about 50% meaningless syntax, letters, phrasing and so on. He felt that this occurred by the natural function of the mind, the purposeful addition of excessive information that allows for error correction - by the listener or reader.

Animals also encode information efficiently in various ways. Chirps, squawks, growls, tweets, meows, woofs and even elephants' pounding on the ground in order to communicate with other elephants far away; these are all examples of encodings of information that attempt to be as efficient as possible.

At this point an interesting question about cellphone technology becomes apparent: Is there one most optimal encoding method for cellphone communication? In order to answer that effectively, one would have to average all messages sent (in a region for example), then translate the result into binary, then construct a stochastic process (essentially a computer that sends an average message in terms of probabilities of various patterns occurring) that is essentially equivalent to those messages. At that point the entropy could be calculated and a matching code developed. But the lingering problem is this: given a communication pattern (a stochastic process), how can you generate an efficient encoding scheme? In summary, one is looking for a black box that receives a stochastic process and produces a code.

                         average message  --->  black box --->  best encoding scheme

A Powerful Solution: Neural Networks

One can design a neural network that can essentially writes programs by itself. It is possible to train a network on a large number of randomly chosen stochastic processes, while providing an increased reward for producing a more efficient code, and a decreased reward for producing a less efficient code. In many situations, the entropy can almost certainly be approximated.

Neural networks are interpolations of data. As they learn the pattern they can be more finely tuned and so on. However this is not the fun kind of interpolation that may be taught in high school as Lagrange interpolation; although an interesting game, this method involves usually a single variable and a few data points. On the other hand, neural networks can be quite intense, in the sense that the number of variables can be very large, and the number of data points can run into the millions. Therefore, there is no way of using ordinary methods to interpolate that kind of data.

Proponents of differential geometry have proposed that neural networks should be studied carefully in order to produce a solid mathematical theory that can be used to predict their behaviour exactly in every situation. But the task is very complex, and almost no hope of achieving the goal exists at the present time. So the general study remains as more of an experimental science. In practice however, networks have been used very effectively in many situations and are very intriguing.

A River of Sound

When a composer writes a song, he or she may be given the melody first, and then may want to put "chords" to it. That is, he or she is harmonizing the melody. Sometimes the person will put a nice chord to the melody at each point, but the chords will not match in the long run. Locally, the music sounds nice, but globally it sounds rather ghastly. Yet when they alter one of the chords, to make it sound nice globally, it takes away from the quality at an unexpected chord somewhere else. So you end up with a knot that is almost impossible to untangle.

The technical term is The Harmonization of Chorales. This is a standard part of musical training in universities. One can see how it may take a lot of practice and training to become good at it. (In the previous paragraph, the concept of several melodies is oversimplified to the concept "chord" in order to make it easier to read). Can computers be used to solve problems like this?

Yes, but not ordinary computers. Neural networks have been used a lot during the past decade to harmonize chorales. They have been very successful and people are putting a lot of time and energy into the study.

So you can see that global harmony is in interpolation of the sound information, in such a way as to maximize the total beauty of the music. Its a matter of efficiency. Gradually moving from one point to another with each step instantaneously optimized. This is exactly how a river flows, maximizing efficiency at each step instantaneously. (The reader may recall the use of the word Instantaneous from Calculus).

Computer Programs as Encodings

If you were going to tell a taxi driver where to go, would you take ten minutes to do so? Why not just give him the address and go? It you are going to tell a mechanic what is wrong with your car, why spend days explaining it? Why not explain it in a single sentence?

In the same way, a program that tells a computer what to do can be very wordy or very precise. This may make the difference between solving a large-scale programming project and not solving it, and it can make or break a multimillion dollar contract renewal. The process is essentially summed up as numerical analysis.

The task is to find the most efficient program (in terms of time and/or storage); a study to which people have dedicated their entire careers. There are many professional journals that deal with nothing else. But it suggests that there is pre-existing best program in nature, apart from the programmers efforts. That is, there is an entropy associated with programs, just as there is a predictable pattern of the behaviour of comets or of other entities that occur in nature.

Therefore we are having difficulty in separating the concept of 'problem to be solved' from 'solution of problem'. Again, they seem to occur as dual objects in nature, just as the behaviours of Pluto and Charon are fully dependent on each other. So a different concept of complexity was developed, and this is known as Kolmogorov's complexity. It says that the information (or light or sound) is the same as the most efficient possible program (or sentence) that generates it. But we have to find the most efficient program; so again neural networks can be used as a means of trying to solve the problem. It is interesting that a technology modeled on the human brain (a computer that exists in nature) should be used to find a solution to problems that occur in nature.

An interesting generalization of the Kolmogorov complexity is the Kolmogorov complexity for an infinite (or perhaps finite) number of parallel processors. That is, what is the most efficient sentence that generates a pattern with several processors all working at the same time? One can see how the answer will depend on the total number of processors allowed.

Geodesics and Galaxies

What is a geodesic? It's just the shortest distance from point A to point B. But isn't that just a straight line? If you are working on a flat plane, then it is. But a flat plane is just the limit of an infinite sequence of surfaces. So then what is the shortest distance on those surfaces? That's exactly the problem, and it can be a very interesting one.

Now of course one can see how this has to do with life. In every day situations, with many variables and fluctuations of data going on all the time, the most efficient path from one situation to another situation can be thought of as a geodesic. And the set of all possible situations can be thought of as a surface. Therefore the best plan of action would be the most efficient encoding of information. The poorer the encoding, the less effective the action. The better the encoding, the more effective the action. Remember that the more realistic the variables are, the more valuable the encoding will be.

For instance, if a young man wants to meet a certain woman, he doesn't want to spend 30 lifetimes accomplishing it. Therefore his plan of action can be thought of as a geodesic on the surface of all life situations, and the better he encodes that information, the more valuable his actions will be.

Another example would be a mouse learning to solve a maze. He has to repeatedly re-encode the information in his mind so that he can improve his actions. Therefore he is computing a geodesic of the information manifold of the maze.

Everything in the inanimate universe follows geodesics. This is how the galaxies rotate and what gives them their power and beauty.