I hear the word 'entropy' everywhere I go, and I'm curious about it. Is information entropy related to thermodynamic entropy at all?
And how exactly is it an "arrow of time"?
The second law of thermodynamics (thermodynamics being the science of the relations between heat and other forms of energy) is a statement about entropy: the quantity called 'entropy' can never decrease, it can only increase, so the universe is constantly moving towards a state of greater randomness. Spilled milk remains spilled -- you cannot somehow reverse the effect and watch the milk molecules fly back into the cup, exactly the way they were before.
That is how it can be considered an "arrow of time". The universe is constantly moving towards a state of more 'disorder'. In the context of entropy, 'disorder' is basically the state any irreversible process tends to reach. Water flows from top to bottom in a waterfall, not the other way around. Cream added to coffee spreads and spontaneously mixes with the drink; it doesn't clump back together to reach the state it was in when originally added to the cup. The kinetic energy of an intact car disperses into sound and fast-moving broken pieces if the car slams into a brick wall.
Entropy can therefore be said to point in the direction of energy dispersal, or chaos, which in turn gives time a direction. Any spontaneous process in the material world can be attributed to entropy, as an instance of the second law of thermodynamics.
In information theory, entropy measures the uncertainty of a random variable. It's a way to quantify the expected value of information contained in a message. It's also known as Shannon entropy.
Just to elucidate a bit more on entropy in information theory (Radha has the thermo side pretty well covered): it is a measure of the randomness in a variable. Georgia Tech has a nice overview of information theory here (caution: PDF file). It also contains a great example of entropy, which I will delve into below. Wikipedia also provides good intuition for the equation, saying, "Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content" [emph. mine].
The equation for entropy is $$\mathbf{H(X)} = -\mathbf{E}[\log\mathbf{P(X)}] = -\sum_{x \in \Omega_x}\mathbf{P(X=x)}\log\left(\mathbf{P(X=x)}\right)$$ where $\log$ is the base 2 logarithm. Let's decode this expression a little bit. The $\mathbf{E}[\hspace{2pt}]$ structure is the expected value of whatever is inside. $\mathbf{P(X)}$ is the probability distribution for the variable $\mathbf{X}$. This $\mathbf{X}$ can be any value in its sample space $\Omega_x$.
So the equation on the right means we are adding up the result of the expression $\mathbf{P(X=x)}\log\left(\mathbf{P(X=x)}\right)$ for every $x$ in the sample space $\Omega_x$. By doing this, we can determine the average unpredictability of $\mathbf{X}$, which, as noted above, is its information content. Note that from now on, I am going to denote $\mathbf{P(X=x)}$ as $\mathbf{P(x)}$ because it makes the math cleaner.
Now that the equation is less of a surprise, onto the example! Consider a fair coin, as in it can be heads with probability 0.5 and tails with probability 0.5. If we flip the coin once, what is its entropy? Well, what events are in our sample space $\Omega_x$? Just two: heads ($h$) and tails ($t$).
So, we have $$-\sum_{x\in\Omega_x}\mathbf{P(x)}\log_2\left(\mathbf{P(x)}\right) = -\mathbf{P(h)}\log_2\left(\mathbf{P(h)}\right) -\mathbf{P(t)}\log_2\left(\mathbf{P(t)}\right)$$ Filling in the numbers, this yields $$-0.5\log_2 0.5 - 0.5\log_2 0.5 = -0.5(-1) - 0.5(-1) = 0.5 + 0.5 = 1$$ So a coin has 1 bit of information content (notice that a bit has two options as well, 1 or 0). Entropy confirms what we already know.
Going a little faster, what if we have a fake coin that is heads on both sides (i.e. the probability of heads is 1, the probability of tails is 0)? Well, $$-1\log_2 1 - 0\log_2 0 = 0$$ A coin that is always heads has no randomness (and by extension no information content). The Georgia Tech source goes on to another example that explains how entropy determines the number of bits needed to describe a (typically larger) number of samples.
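The two coin calculations above can be reproduced with a short Python sketch (the function name `shannon_entropy` is my own, not from any of the linked sources):

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits: H(X) = -sum over x of P(x) * log2(P(x)).
    Zero-probability terms contribute nothing, since p*log2(p) -> 0 as p -> 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(shannon_entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(shannon_entropy([1.0, 0.0]))  # double-headed coin -> 0.0 bits
```

Note that the `if p > 0` guard implements the usual convention of treating the $0\log_2 0$ term as zero.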
Very aptly presented! Entropy captures the amount of randomness in a variable. However, I would like to add a minor point (for those who aren't aware): the base of the log is 2 because the entropy is being measured in bits.
Very neat answer, @u826717 (Adam)! This cleared up a lot of things for me. Thanks a lot!
Since we're touching information theory, I want to deviate slightly and recommend an excellent book by Vlatko Vedral: Decoding Reality: The Universe as Quantum Information.
It talks about how information is the most fundamental building block of reality.
Got here from Reddit. I think I will give it a try here.
Broadly speaking, 'entropy' in information theory, thermodynamics, and statistics answers the same question: "how difficult is it to describe this particular thing?" The key thing to remember is that the amount of information it takes to describe something is proportional to its entropy.
In terms of statistical mechanics, entropy is basically a measure of the number of ways to arrange a given system. The concept is quite abstract and it's certainly not just the degree of randomness. Btw, this site seems to have a lot of interesting content. Will try to visit more often.
All the opinions above describe a lot about entropy. However, regarding your doubt about the arrow of time: the direction of the flow of time can be determined by the increase in entropy! Entropy is not symmetric with respect to time; it increases with time, as predicted by the second law of thermodynamics.
Thus entropy distinguishes past from future and tells us the direction of the flow of time.
Wikipedia defines entropy as follows: "In thermodynamics, entropy (usual symbol S) is a measure of the number of specific ways in which a thermodynamic system may be arranged, commonly understood as a measure of disorder. According to the second law of thermodynamics the entropy of an isolated system never decreases; such systems spontaneously evolve towards thermodynamic equilibrium, the configuration with maximum entropy. Systems which are not isolated may decrease in entropy. Since entropy is a state function, the change in the entropy of a system is the same for any process going from a given initial state to a given final state, whether the process is reversible or irreversible."
But this video by MIT helps understand the concept better:
The word entropy is sometimes confused with energy. Although they are related quantities, they are distinct.
As described in previous sections, energy measures the capability of an object or system to do work.
Entropy, on the other hand, is a measure of the "disorder" of a system. What "disorder" refers to is really the number of different microscopic states a system can be in, given that the system has a particular fixed composition, volume, energy, pressure, and temperature. By "microscopic states", we mean the exact states of all the molecules making up the system.
The idea here is that just knowing the composition, volume, energy, pressure, and temperature doesn't tell you very much about the exact state of each molecule making up the system. For even a very small piece of matter, there can be trillions of different microscopic states, all of which correspond to the sample having the same composition, volume, energy, pressure, and temperature. But you're ignorant of exactly which one the system is in at any given moment - and that turns out to be important.
Why should this be important if you know the bulk properties? Isn't that all one usually needs? It turns out that no: if you want to, say, extract energy from steam and convert it to useful work, those microscopic details are crucial! (More on this below.)
For those that are technically inclined, the exact definition is
$$S = k \log N$$ where $k$ is Boltzmann's constant and $N$ is the number of possible microscopic states.
Source: nmsea
The notion of entropy quantifies the average information content of each character in a message that draws the letter ${i}$ from an alphabet of size ${q}$ with probability ${p_i}$. To make sense of this, first consider a message of length ${N}$, where ${N}$ is a large number. Typically, there are ${p_iN}$ instances of the letter ${i}$, so there are in all ${n=N!/\prod_{i}(p_iN)!}$ distinct messages that can be formed out of a typical multiset of ${N}$ characters.

Now, suppose I wanted to compress all this into a message of length ${l < N}$ using the same alphabet of ${q}$ letters. The total number of messages of length ${l}$ I could send would be ${q^l}$. So, if I had to send a message specifying exactly which of the ${n}$ typical messages I started off with, the minimum length of my message would have to be ${\log_q n}$, which is proportional to ${\ln n}$.

At this point, we invoke Stirling's approximation for the logarithms of the factorials, and call the resulting quantity, to which the minimum length of the compressed message is proportional, the total 'entropy' ${NS}$: the amount of information contained in the ${N}$ characters. Increasing the length of the original message increases the number of typical messages, and hence the information content, in a manner that turns out to be asymptotically linear in ${N}$. It thus makes sense to talk of the average information content, or entropy, of a single character of the message, denoted ${S}$, obtained by dividing ${NS}$ by ${N}$. The result is the natural-logarithm version of Shannon's formula
$${S = -\sum_i p_i\ln p_i.}$$
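This counting argument can be checked numerically: the per-character logarithm of the multinomial count ${\ln(n)/N}$ should approach ${S}$ as ${N}$ grows. A small Python sketch (function names are my own; `math.lgamma(x + 1)` gives ${\ln(x!)}$ without overflow):

```python
import math

def log_num_typical_messages(probs, N):
    """ln( N! / prod_i (p_i * N)! ), computed via lgamma.
    Assumes each p_i * N rounds to an integer count."""
    counts = [round(p * N) for p in probs]
    return math.lgamma(N + 1) - sum(math.lgamma(c + 1) for c in counts)

def shannon_nats(probs):
    """Shannon entropy in nats: S = -sum_i p_i * ln(p_i)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

probs = [0.5, 0.25, 0.25]
for N in (100, 10_000, 1_000_000):
    # ln(n)/N converges to S as N grows (Stirling's approximation)
    print(N, log_num_typical_messages(probs, N) / N)
print("S =", shannon_nats(probs))
```

The per-character value visibly converges to ${S}$, because the correction terms in Stirling's approximation grow only logarithmically in ${N}$ and so vanish after dividing by ${N}$.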
Since in information theory one usually works with bits, drawn from an alphabet of size ${2}$, information theorists use base-${2}$ logarithms rather than natural logarithms. But apart from this, there is no essential difference between thermodynamic entropy and information entropy, which, as I've elaborated above, measures the lower limit on how much you can compress a typical message.
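To make the "no essential difference" point concrete: entropies in bits and in nats differ only by the constant factor ${\ln 2}$, since ${\log_2 x = \ln x / \ln 2}$. A minimal check (the distribution is just an illustrative choice):

```python
import math

p = [0.5, 0.25, 0.25]
H_bits = -sum(x * math.log2(x) for x in p)  # Shannon entropy in bits
H_nats = -sum(x * math.log(x) for x in p)   # same entropy in nats

print(H_bits)                 # 1.5 bits
print(H_nats / math.log(2))   # identical value, converted from nats
```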
What, then, does this have to do with thermodynamic entropy? To see the connection, think of the original message as an ensemble and the characters as identical copies of a system. The alphabet is a set of states of definite energies (the letters) that the system can be in, and the probabilities constitute a thermodynamic distribution over these states. Since the size of the alphabet varies from system to system, it makes sense to just fix the base to be ${e}$.
To demonstrate equivalence with the entropy in the first law of thermodynamics, we need to work explicitly with the Boltzmann distribution. I've detailed this calculation in another opinion of mine.