# An Intuitive Guide to the Concept of Entropy Arising in Various Sectors of Science

*14 October 2019*

The concept of entropy has traditionally been introduced either, in information theory, as the only function satisfying certain criteria for a consistent measure of the "amount of uncertainty", or, in statistical mechanics, by classical analogy. It can in fact be given a unified and intuitive interpretation in two equivalent ways: as the logarithm of the "effective number of states" of a system, or as the logarithm of the "effective number of possible values" of a random variable. In the first case, a reference system with that number of equally probable states can be shown to behave identically to the system under consideration when a certain aspect of its behavior is focused upon. This number of states of an equally probable reference system can therefore be used to characterize the system under consideration, because some aspects of its behavior are encapsulated in the number. The entropy function applies in an analogous fashion to the second case, where the reference is an equally probable random variable with a given number of possible values.

The concepts of entropy in three particular sectors of science can be shown to be mathematically equivalent and to share the same fundamental interpretation, although they are applied in different ways. These are the "statistical mechanical entropy" used to characterize the disorder of a particular system, the "information entropy" used to measure the amount of information that a particular message conveys, and the "information entropy" used in the maximum entropy principle to give the most unbiased statistical inference. Within the framework of this interpretation, the possible non-uniqueness of the definition of entropy arises in a natural way. This guide aims to give newcomers to these fields an intuitive picture of entropy that can be applied broadly.

# Introduction

Since its inception in thermodynamics in the early 1850s by Rudolf Clausius,^{[1]} the concept of entropy has been introduced into a range of sectors of science and has proved to be a quantity of great importance. Notable among these are the statistical mechanical entropy proposed by Boltzmann, Planck^{[2]} and Gibbs,^{[3]} which gives the phenomenological thermodynamic entropy an insightful microscopic interpretation; the information entropy devised by Shannon^{[4]} to measure the information content of a particular message; and its later development by Jaynes,^{[5]} who showed that the Shannon entropy of the probability distribution of a random variable can be used to quantify our uncertainty about it, and that the distribution which maximizes the entropy subject to a set of predefined constraints gives the most unbiased estimate of the probability distribution of the random variable. The proposed concepts of entropy are often categorized as either statistical mechanical or information-theoretic,^{[6]} with their interconnection being somewhat obscure and still under debate.

In this guide, the aforementioned three concepts of entropy, which cover most of the definitions of entropy ever proposed, are all shown to be understandable as the logarithm of the "effective number of states" of the system under consideration, or of the "effective number of possible values" of the random variable. This is the number of possible states or values of a system or random variable whose probability distribution is uniform and whose behavior is asymptotically equivalent, in some respects, to that of the system or random variable under investigation. An intuitive and unified interpretation of the concept of entropy can then be obtained. In contrast with a previous attempt toward a unified concept of entropy made by Jaynes,^{[7]} in which the objective statistical mechanical entropy is given a subjective meaning, the present guide tries to give the traditionally subjective information-theoretic entropy and the maximum-entropy principle an objective interpretation and justification, so that their connection with statistical mechanical entropy becomes more apparent and their nature can be seen more clearly. Because of the mathematical equivalence between the concept of a state of a system and that of a possible value of a random variable, the two terms are used interchangeably in what follows to avoid cumbersome expressions.

# Statistical mechanical entropy

The famous formula carved on Boltzmann's gravestone (written there as $S = k \log W$),

$$S = k_B \ln \Omega, \tag{1}$$

where $\Omega$ is the number of microstates corresponding to a macrostate, gives a tremendously insightful understanding of the thermodynamic entropy and its relationship with the direction of spontaneous change: all the microstates accessible to an isolated system are equally probable (the equal *a priori* probability postulate), and the more microstates a particular macrostate corresponds to, the more likely that macrostate is, so the system spontaneously evolves from macrostates corresponding to fewer microstates toward macrostates corresponding to more. In cases more general than the fundamental micro-canonical ensemble, however, the probabilities of occurrence of the microstates are not necessarily equal, so the entropy of a system must be computed by the formula proposed by Gibbs,^{[8]}

$$S = -k_B \sum_i p_i \ln p_i, \tag{2}$$

where the summation runs over all possible microstates, indexed by $i$, and $p_i$ is the probability of the $i$th microstate. This formula is readily seen to reduce to equation (1) for a uniform probability distribution, since $p_i = 1/\Omega$ gives $S = k_B \ln \Omega$. When it comes to justifying the formula for non-uniform probability distributions, some authors,^{[9]} including Gibbs himself, derived it by classical analogy, while others^{[10]} take equation (2) as the defining starting point, so that equation (1) follows merely as a special case.

Note, however, that the other ensembles, in which the probability distribution may be non-uniform, are all derived by embedding the system under consideration in a large reservoir so that the micro-canonical equal *a priori* probability postulate can be invoked. The different probabilities of different states stem from the different number of times each state recurs in the large micro-canonical ensemble, and this number is given by the number of reservoir states compatible with it. If the number of recurrences of the $i$th state of the system, i.e. its statistical weight, is denoted by $w_i$, then its probability must be $p_i = w_i / Z$, where $Z = \sum_i w_i$ is the partition function. Substituting this expression for the probability of a particular state into equation (2) gives, upon rearrangement,

$$S = k_B \ln \frac{Z}{\prod_i w_i^{w_i/Z}}.$$

This brings a new intuitive interpretation of the statistical mechanical entropy for ensembles whose probability distribution is non-uniform, if the denominator is recognized as a kind of self-weighted geometrical mean value of the statistical weights of the states. Hence the quotient of the partition function with it can be interpreted as the number of states the system can be in if the partition function is made up of equal statistical weights.

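In detail, Gibbs's definition of entropy can be written in terms of the weights $w_i$ and the partition function $Z = \sum_i w_i$ as

$$S = -k_B \sum_i \frac{w_i}{Z} \ln \frac{w_i}{Z},$$

which can be decomposed into two sums, $-k_B \sum_i \frac{w_i}{Z} \ln w_i$ and $k_B \sum_i \frac{w_i}{Z} \ln Z$. The latter sums to $k_B \ln Z$ because $\sum_i w_i = Z$, so the entropy can be written as

$$S = k_B \ln Z - k_B \sum_i \frac{w_i}{Z} \ln w_i.$$

Moving the coefficient of the logarithm into the exponent, the former sum becomes $-k_B \sum_i \ln w_i^{w_i/Z}$, which equals $-k_B \ln \prod_i w_i^{w_i/Z}$. Combining the two terms gives the final formula

$$S = k_B \ln \frac{Z}{\prod_i w_i^{w_i/Z}}.$$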

It is also evident that multiplying all the weights by a common factor leaves the entropy unchanged, since the partition function and the self-weighted geometric mean are both multiplied by that same factor. Such a multiplication can therefore be regarded as a change of unit for counting microstates, which does not change the physics. Alternatively, it can be regarded as replacing the reservoir with another one of identical relevant behavior (temperature in the case of the canonical ensemble, temperature and chemical potential in the case of the grand canonical ensemble) but with a different total number of states before its coupling to the system, so that properties of the system such as the entropy are not altered.
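The rearrangement and the scale invariance can be checked numerically. The following is a minimal sketch, not part of the original derivation: the weights are arbitrary illustrative numbers, the function names are hypothetical, and the entropy is expressed in units of $k_B$.

```python
import math

def gibbs_entropy(weights):
    """Entropy -sum(p_i ln p_i) with p_i = w_i / Z, in units of k_B."""
    z = sum(weights)
    return -sum((w / z) * math.log(w / z) for w in weights if w > 0)

def effective_states_entropy(weights):
    """Entropy as ln(Z / G), where G = prod(w_i ** (w_i / Z)) is the
    self-weighted geometric mean of the weights."""
    z = sum(weights)
    log_g = sum((w / z) * math.log(w) for w in weights if w > 0)
    return math.log(z) - log_g

weights = [5.0, 3.0, 1.5, 0.5]           # illustrative statistical weights
scaled = [7.0 * w for w in weights]      # same weights in a different "unit"

s_gibbs = gibbs_entropy(weights)
s_eff = effective_states_entropy(weights)
s_scaled = effective_states_entropy(scaled)
n_eff = math.exp(s_eff)                  # effective number of states
s_uniform = gibbs_entropy([1.0] * 4)     # uniform case reduces to ln(4)

print(s_gibbs, s_eff, s_scaled, n_eff, s_uniform)
```

The two entropy functions agree, scaling the weights changes nothing, and the effective number of states of the non-uniform example lies between 1 and the actual number of states, 4.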

When $w_i$ approaches zero, $w_i^{w_i/Z}$ approaches unity, so as the statistical weight of a particular state approaches zero, its contribution to the self-weighted geometric mean vanishes. In the thermodynamic limit, where a particular set of states carries almost all of the total probability, the entropy calculated by equation (2) therefore gives the logarithm of the number of states in this overwhelmingly likely set. In this respect the interpretation is quite similar to the "phase extension" interpretation used by authors such as Landau,^{[11]} in which the maximum value of the statistical weight plays the role of the self-weighted geometric mean.

Specifically, in the canonical ensemble the weights can be taken as $w_i = e^{-\beta E_i}$, where $E_i$ is the energy of the $i$th state and $\beta = 1/(k_B T)$ (a common factor being immaterial). Taking the logarithm of the self-weighted geometric mean of the statistical weights gives $\sum_i p_i \ln e^{-\beta E_i}$, which is easily rearranged to $-\beta \sum_i p_i E_i$, which equals $-\beta \langle E \rangle$. Hence the self-weighted geometric mean of the statistical weights equals $e^{-\beta \langle E \rangle}$, which is the statistical weight of a state with the average energy and, since the energy distribution is sharply peaked at that energy, the value that plays the role of the maximum statistical weight. This equivalence between the maximum value of the statistical weight and its self-weighted geometric mean shows the suitability of the self-weighted geometric mean, and of the entropy formula, for the particular form of the weights in the canonical ensemble.
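The canonical-ensemble statement can be illustrated numerically. This is a minimal sketch under stated assumptions: the weights are taken to be Boltzmann factors $e^{-\beta E_i}$, the energy levels and the value of $\beta$ are arbitrary illustrative numbers, and the entropy is in units of $k_B$.

```python
import math

beta = 0.7                                # assumed inverse temperature
energies = [0.0, 0.4, 1.1, 2.5, 3.0]      # assumed energy levels

weights = [math.exp(-beta * e) for e in energies]   # Boltzmann factors
z = sum(weights)                                    # partition function
probs = [w / z for w in weights]

mean_energy = sum(p * e for p, e in zip(probs, energies))

# Self-weighted geometric mean: prod(w_i ** p_i), computed via logarithms.
geo_mean = math.exp(sum(p * math.log(w) for p, w in zip(probs, weights)))

entropy = -sum(p * math.log(p) for p in probs)

# A vanishing weight contributes a factor close to 1 to the geometric mean.
tiny = 1e-9
tiny_factor = tiny ** tiny

print(geo_mean, math.exp(-beta * mean_energy), entropy, tiny_factor)
```

The geometric mean coincides with the Boltzmann factor evaluated at the average energy, and the entropy equals $\ln Z + \beta \langle E \rangle$, which is $\ln Z$ minus the logarithm of that geometric mean.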

If two systems $A$ and $B$ are completely uncorrelated, the entropy of the composite system decouples straightforwardly:

$$S_{AB} = k_B \ln (\Omega_A \Omega_B) = k_B \ln \Omega_A + k_B \ln \Omega_B = S_A + S_B.$$

For correlated subsystems, such as the system and reservoir in the canonical and grand-canonical ensembles, the number of reservoir states to which a particular system state can be coupled is not uniform, which rules out a simple multiplication. The decoupling can still be done, however, by averaging the number of reservoir states to which the system states can be coupled. The entropy computation described above can thus be regarded as factoring out the contribution of the system from the total entropy of the system plus the large reservoir, with the logarithm of the self-weighted geometric mean of the weights being the remaining part.

The entropy calculated by the Gibbs formula, equation (2), can therefore justifiably be interpreted as the logarithm of the "effective number of states" of the system in the ensemble: a system with that number of states and a uniform probability distribution gives the same total number of states when coupled, without correlation, to a reservoir whose number of states equals the self-weighted geometric mean defined above. That mean can be shown to be the statistical weight of states with the average energy in the original canonical ensemble, i.e. the value obtained from a micro-canonical ensemble as the thermodynamic limit is approached. So equation (2) does the same thing as equation (1), namely counting the number of states that a system can be in and taking the logarithm to measure its disorder, and the difference between the two vanishes as the thermodynamic limit is approached.
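Both halves of this argument can be sketched numerically. In the sketch below the probability distributions and reservoir counts are arbitrary illustrative values and the helper name is hypothetical. The first part checks additivity for uncorrelated systems; the second checks that the effective number of system states times the self-weighted geometric mean of the reservoir counts recovers the total number of states.

```python
import math
from itertools import product

def entropy(probs):
    """Gibbs/Shannon entropy in natural units."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Uncorrelated systems: the joint distribution is a product distribution.
p_a = [0.5, 0.3, 0.2]
p_b = [0.6, 0.4]
joint = [pa * pb for pa, pb in product(p_a, p_b)]
s_joint = entropy(joint)
s_sum = entropy(p_a) + entropy(p_b)

# Correlated case: each system state is compatible with a different
# number of reservoir states (illustrative counts).
weights = [40.0, 25.0, 10.0]
total = sum(weights)                      # total states of system + reservoir
probs = [w / total for w in weights]
geo_mean = math.exp(sum(p * math.log(w) for p, w in zip(probs, weights)))
n_eff = math.exp(entropy(probs))          # effective number of system states

print(s_joint, s_sum, n_eff * geo_mean, total)
```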

# Information entropy of Shannon

In his seminal 1948 paper, Shannon proposed a formula that is formally identical to the Gibbs statistical mechanical entropy formula in equation (2) to measure the information content of a particular message chosen from a set of all possible messages.^{[12]} This choice of measure was justified by the fact that it is the only function that fulfills a set of criteria necessary for a consistent measure. Introducing the concept in this way has the serious drawback that the derivation is not intuitively clear, so some authors, such as Kardar, have offered more intuitive derivations.^{[13]} What follows is a further clarification and detailed analysis of Kardar's derivation, intended to make clear the interpretation of entropy as the logarithm of an "effective number of states" or "effective number of possible values".

If a message is chosen from a set of $M$ equally probable possible messages, a series of $N$ such messages has $M^N$ possibilities. If one of these possibilities is to be transmitted by a device that operates by consecutively transmitting one of $b$ possible values, the number of such transmission units required is at least $\log_b M^N$, which equals $N \log_b M$. The number of transmission units required per message is therefore its information entropy $H = \log_b M$, which is formally identical to the Boltzmann entropy formula, equation (1).

When the possible messages are no longer equally probable, the total number of possibilities when $N$ such messages are conveyed becomes

$$\frac{N!}{\prod_i (N p_i)!}$$

as $N$ approaches infinity, where $p_i$ denotes the probability of the $i$th possible message. Transmitting those messages requires $\log_b \frac{N!}{\prod_i (N p_i)!}$ transmission units, which Stirling's approximation readily shows to be equal to $-N \sum_i p_i \log_b p_i$. So as the total number of messages to be transmitted approaches infinity, the number of transmission units required per message is $H = -\sum_i p_i \log_b p_i$, i.e. the entropy, which is defined to be a measure of the information content of a particular message. That is to say, each message in the series contains the amount of information that has to be transmitted by that number of transmission units.
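The convergence of the counting argument to the entropy formula can be observed directly. The following is a minimal sketch assuming a binary transmission unit (two possible values) and an illustrative four-message distribution; the function names are hypothetical. The number of possible sequences is evaluated through `math.lgamma` so that very large factorials never have to be formed explicitly.

```python
import math

def log2_sequence_count(counts):
    """log2 of N! / prod(n_i!), the number of sequences in which the
    i-th message occurs n_i times."""
    n = sum(counts)
    log_count = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_count / math.log(2)

def shannon_entropy_bits(probs):
    """Shannon entropy -sum(p_i log2 p_i) in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.125, 0.125]        # illustrative message probabilities
h = shannon_entropy_bits(probs)          # 1.75 bits

# Bits needed per message when N messages are sent, for growing N.
# Each N is a multiple of 8 so that every N * p_i is an integer.
per_message = {}
for n in (8, 80, 800, 8000, 80000):
    counts = [round(p * n) for p in probs]
    per_message[n] = log2_sequence_count(counts) / n

for n, bits in per_message.items():
    print(n, bits, h)
```

The per-message cost stays below the entropy for every finite $N$ and approaches it as $N$ grows, in line with the Stirling estimate.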

From the above analysis it can be clearly seen that entropy in the sense of Shannon can be viewed as the logarithm of the "effective number of possible values" of the random variable representing the messages to be transmitted. A set of $b^H$ equally probable possible messages requires the same number of transmission units as the original message set when the number of messages to be transmitted approaches infinity. In other words, its asymptotic behavior, in the sense of the number of transmission units required to transmit a large number of messages, is identical to that of the message set under consideration, and it can therefore be used to characterize that message set.
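As a brief illustration of this reading, the sketch below computes the effective number of possible values as the exponential of the entropy for three illustrative distributions over four messages. The function name is hypothetical, and natural logarithms are used, which does not affect the resulting count.

```python
import math

def effective_number(probs):
    """exp(H): the size of the equally probable set with the same entropy."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return math.exp(h)

uniform = effective_number([0.25, 0.25, 0.25, 0.25])
skewed = effective_number([0.5, 0.25, 0.125, 0.125])
certain = effective_number([1.0, 0.0, 0.0, 0.0])

print(uniform, skewed, certain)
```

A uniform distribution over four messages has an effective number of 4, a fully determined message has an effective number of 1, and the skewed distribution falls in between at $2^{1.75}$.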

# Information entropy in the maximum entropy principle of Jaynes

The maximum entropy principle was devised by Jaynes in 1957 to allow statistical inference based on merely partial knowledge. Jaynes justified its utility by a set of consistency requirements.^{[14]} Because of its intimate connection with Shannon's theory, it too suffers from a lack of intuitive clarity, and the concept of probability involved had to be taken in the subjective sense. Attempts have therefore been made to give the maximum entropy principle a concrete picture, and the Wallis derivation,^{[15]} which is completely combinatorial, is a pre-eminent one. In the present guide a variation of the Wallis derivation is presented in an objective manner, so that the maximum entropy principle is rendered as a kind of objective statistical inference based on some quite rational and intuitive rules. Consequently the concept of entropy can be interpreted as the logarithm of an "effective number of possible values".

If all sides of a fair die are marked with different labels, the probability of occurrence of each label can be considered equal because of the symmetry of the die. But when two of the sides are marked with the same label, the probability of that label is twice that of each other label. In the same manner, every case in which a random variable has a non-uniform probability distribution can be regarded as a consequence of different values being yielded by different numbers of outcomes of another, *true random variable* whose probability distribution is uniform. That is to say, any random experiment can be viewed as an experiment yielding one of a set of possible results with equal probability, with different sets of the true random results giving rise to different values of the observable random variable. Values of the observable random variable that correspond to a greater number of true random results occur with greater probability in the random experiment. This is a purely ontological interpretation of the origin of non-uniform probabilities in random experiments and is not falsifiable by any experimental fact, so it can be used in any pertinent circumstance to make clear what we are doing.
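The die example can be written out as a small sketch. The labels below are illustrative; the six faces play the role of the equally probable true random results, and the labels are the observable values.

```python
from collections import Counter
from fractions import Fraction

# Six equally likely faces; two of them carry the same label "A".
faces = ["A", "A", "B", "C", "D", "E"]

counts = Counter(faces)
probs = {label: Fraction(n, len(faces)) for label, n in counts.items()}

for label, p in sorted(probs.items()):
    print(label, p)
```

The non-uniform distribution over labels arises purely from how many equally probable faces map to each label.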

If the total number of possible true random results is assumed to be $N$, then each of the results must correspond to one observable value. If each true random result is considered equally likely to yield each of the possible observable results, on account of our ignorance of the internal details of their correspondence, then all elements of the space of correspondence maps from the true random results to the observable results are equally important. For a probability distribution $\{p_i\}$ of the observable random variable, the number of correspondence maps that it covers is $\frac{N!}{\prod_i (N p_i)!}$, whose logarithm can be written as $-N \sum_i p_i \ln p_i$ if Stirling's approximation is used, and which is easily recognized as $N H$, where $H$ denotes the information entropy of the distribution. So the entropy measures the extent that a particular probability distribution covers in the space of correspondence maps from the true random results to the observable results, and maximizing the entropy amounts to maximizing the extent covered.

In this way, the "quanta of probability" of the original Wallis derivation are made concrete, and the maximum entropy principle can be stated as finding the probability distribution that covers the greatest part of the space of correspondence maps from true random results to observable results. The principle thus reduces to assuming that all possible ways of mapping true random results to observable results are equally important, and that the probability distribution compatible with the greatest number of such maps, subject to certain constraints, is the least biased one. Accordingly it can be regarded as a direct extension of the principle of insufficient reason. Here, however, we are not bluntly assigning equal probability to every observable result, but rather assigning equal importance to all possible correspondence maps from true random results to observable results. What partial knowledge does is set constraints on the space of possible correspondence maps through the information it provides.

As for the entropy itself, in this context it can be interpreted as the logarithm of an "effective number of possible values" of the random variable, because the extent of the space of correspondence maps covered by this random variable equals that covered by an equally probable random variable with that number of values when a common number of possible true random results is assumed. The greater the "effective number of possible values", the greater the extent covered and the less biased the probability distribution.

The above reasoning can be given the following formal form. Suppose a set of observable events occurs with probabilities $\{p_i\}$ and is considered to be the result of $N$ distinct true random events occurring with equal probabilities. Each particular map from the true random events to the observable events corresponds to a probability distribution, and a given probability distribution is realized by a set of such maps. For a particular probability distribution $\{p_i\}$, the maps that it covers are those which have $N p_i$ true random events corresponding to the $i$th observable event. This set of maps has cardinality $\frac{N!}{\prod_i (N p_i)!}$. If each map is assigned equal weight because of our ignorance of the internal causal relationship between the true random result and the observable result, then the probability distribution that maximizes this cardinality is the least biased one. Because maximizing the cardinality is equivalent to maximizing its logarithm, and the logarithm is easier to work with, the logarithm is maximized in seeking the most unbiased probability assignment. Denoting it by $F$, we have

$$F = \ln N! - \sum_i \ln (N p_i)!,$$

which reads as follows when Stirling's approximation is used:

$$F \approx N \ln N - N - \sum_i \left[ N p_i \ln (N p_i) - N p_i \right] = -N \sum_i p_i \ln p_i.$$

In this way, the function $F$ is shown to be equal to $N H$, where $H = -\sum_i p_i \ln p_i$ denotes the Shannon information entropy. Thus maximizing $F$ to obtain the most unbiased probability assignment is equivalent to maximizing the Shannon entropy. Hence the maximum entropy principle is derived from more concrete and intuitive premises.
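The counting argument can be tried by brute force for a small case. The following is a minimal sketch under stated assumptions: there are 30 true random results, three observable values taken to be 1, 2 and 3, and an illustrative constraint that the mean observable value is 2.5. None of these numbers come from the original text, and the function name is hypothetical.

```python
import math

def map_count(occupation):
    """Number of maps that send n_i true results to the i-th observable
    value: N! / prod(n_i!), computed exactly with integers."""
    count = math.factorial(sum(occupation))
    for n_i in occupation:
        count //= math.factorial(n_i)
    return count

N = 30                      # assumed number of true random results
values = (1, 2, 3)          # assumed observable values

# All ways of splitting N true results among the three observable values.
occupations = [(n1, n2, N - n1 - n2)
               for n1 in range(N + 1)
               for n2 in range(N + 1 - n1)]

# Without constraints, the uniform distribution covers the most maps.
best_free = max(occupations, key=map_count)

# With the mean fixed at 2.5 (integer form: 2 * sum(n_i * x_i) == 5 * N).
constrained = [occ for occ in occupations
               if 2 * sum(n * x for n, x in zip(occ, values)) == 5 * N]
best_constrained = max(constrained, key=map_count)

print(best_free, best_constrained)
```

With no constraint the winner is the uniform split (10, 10, 10). With the mean constraint it is (3, 9, 18), the admissible split closest to the exponential-family distribution that maximizes the Shannon entropy under the same constraint.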

# Summary

In many sectors of science, the total number of possible states of a system, or of possible values of a random variable, is a quantity of fundamental importance: many aspects of the behavior of the system or random variable are determined by it, and it is a direct measure of its internal disorder or of our ignorance about it. For example, when the values of a random variable are equally likely, the cardinality of the set of values that give rise to some event directly indicates the probability of that event. When the set of possible values loses its symmetry and the probability distribution becomes non-uniform, the random variable in many respects no longer behaves like an equally probable random variable with that number of values, but rather behaves identically to some other equally probable variable with fewer possible values. It can thus be considered less disordered, or we can be said to have more information about it. That is to say, its number of possible values effectively decreases when some aspect of its behavior is focused upon. Finding the number of possible values of an equally probable random variable with identical behavior, and using it as the "effective number of possible values" to characterize the random variable, is then of great use in studying its behavior, and this is considered in the present guide to be exactly what the concept of entropy does. Moreover, through this concept the disorder of the variable, or our ignorance about it, is given a quantitative and intuitive measure, namely the logarithm of its "effective number of states" or "effective number of possible values".

The preceding three sections have shown that, in the three main fields of application of the concept of entropy, the traditional definitions can always be interpreted in an intuitive way as the logarithm of the "effective number of states", and the defining formula always arises naturally when certain aspects of the behavior are considered. It must be noted, however, that the process of assigning an "effective number of states" unavoidably contains some arbitrariness and depends strongly on which kind of behavior of the random variable is focused on in finding its equivalent equally probable variable. Consequently, definitions of entropy other than the traditional $-\sum_i p_i \ln p_i$ can be expected to be possible when the focus is on aspects of the behavior of the random variable that the traditional formula, successful as it is in the three sectors above, cannot capture. Recent debates over the possible non-uniqueness of the definition of entropy, and proposals of other forms of entropy, can therefore be understood quite readily within the framework of the intuitive interpretation given in this guide.

The traditional information-theoretic derivation of entropy, while of great importance and rigor, tends to lack an intuitively concrete picture. This guide hopes to offer a new perspective on the concept of entropy, in which the different definitions of entropy are regarded as assigning the "effective number of states" by focusing on different aspects of the behavior of the random variable. In this way, the respective merits and uses of the different kinds of entropy arising in various sectors of science become apparent.

# References

- [1] Clausius, Rudolf (1867). *The mathematical theory of heat: with its applications to the steam-engine and to the physical properties of bodies*. London: John van Voorst.
- [2] Boltzmann, Ludwig (1995). *Lectures on Gas Theory*. New York: Dover.
- [3] Gibbs, Josiah Willard (1902). *Elementary principles in statistical mechanics*. New York: Scribner's Sons.
- [4] Shannon, C. E. (1948). "A mathematical theory of communication". *Bell System Technical Journal* **27**: 379–423, 623–656.
- [5] Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". *Physical Review* **106** (4): 620–630. doi:10.1103/PhysRev.106.620.
- [6] Lin, Shu-Kun (1999). "Diversity and Entropy". *Entropy* **1** (1): 1–3. doi:10.3390/e1010001. http://www.mdpi.com/1099-4300/1/1/1/.
- [7] Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". *Physical Review* **106** (4): 620–630. doi:10.1103/PhysRev.106.620.
- [8] Gibbs, Josiah Willard (1902). *Elementary principles in statistical mechanics*. New York: Scribner's Sons.
- [9] Gibbs, Josiah Willard (1902). *Elementary principles in statistical mechanics*. New York: Scribner's Sons. Pathria, R. K. (1996). *Statistical Mechanics* (2nd ed.). Oxford.
- [10] Schwabl, Franz (2006). *Statistical Mechanics* (3rd ed.). Berlin: Springer.
- [11] Landau, L. D.; Lifshitz, E. M. (1986). *Statistical Physics, Part 1* (3rd ed.). Oxford: Butterworth-Heinemann.
- [12] Shannon, C. E. (1948). "A mathematical theory of communication". *Bell System Technical Journal* **27**: 379–423, 623–656.
- [13] Kardar, Mehran (2007). *Statistical Physics of Particles*. Cambridge University Press.
- [14] Jaynes, E. T. (1957). "Information Theory and Statistical Mechanics". *Physical Review* **106** (4): 620–630. doi:10.1103/PhysRev.106.620.
- [15] Jaynes, E. T. (2003). *Probability Theory: The Logic of Science*. Cambridge University Press.