I hear the word 'entropy' everywhere I go, and I'm curious about it. Is information entropy related to thermodynamic entropy at all?
And how exactly is it an "arrow of time"?
The second law of thermodynamics (the science of the relations between heat and other forms of energy) tells us about entropy: the universe is constantly moving towards a state of greater randomness. That is, the total entropy of an isolated system never decreases; in any real, irreversible process it increases. Spilled milk remains spilled -- you cannot reverse the effect somehow, and watch the milk molecules fly back into the cup, exactly the way they were before.
That is how it can be considered an "arrow of time". The universe is constantly moving towards a state of more 'disorder'. In the context of entropy, 'disorder' is basically the state any irreversible process tends to reach. Water flows from top to bottom in a waterfall, not the other way around. Cream added to coffee spreads and spontaneously mixes with the drink; it doesn't clump back together to reach the state it was originally in when added to the cup. The kinetic energy (energy of motion) of an intact car disperses into sound and fast-moving broken pieces if the car slams into a brick wall.
Entropy can therefore be said to point in the direction of energy dispersal, or chaos, which in turn gives time its direction. Any spontaneous process that happens in the material world can be attributed to entropy, as an example of the second law of thermodynamics at work.
In information theory, entropy measures the uncertainty of a random variable. It's a way to quantify the expected value of information contained in a message. It's also known as Shannon entropy.
Just to elaborate a bit more on entropy in information theory (Radha has the thermo side pretty well covered), it is a measure of the randomness in a variable. Georgia Tech has a nice overview of information theory here (caution, PDF file). It also contains a great example of entropy, which I will delve into below. Also, Wikipedia provides good intuition for the equation, saying, "Shannon entropy is the average unpredictability in a random variable, which is equivalent to its information content" [emph. mine].
The equation for entropy is $$\mathbf{H(X)} = -\mathbf{E}[\log\mathbf{P(X)}] = -\sum_{x \in \Omega_x}\mathbf{P(X=x)}\log\left(\mathbf{P(X=x)}\right)$$ where $\log$ is the base 2 logarithm. Let's decode this expression a little bit. The $\mathbf{E}[\hspace{2pt}]$ structure is the expected value of whatever is inside. $\mathbf{P(X)}$ is the probability distribution for the variable $\mathbf{X}$. This $\mathbf{X}$ can be any value in its sample space $\Omega_x$.
So the equation on the right means we are adding up the result of the expression $\mathbf{P(X=x)}\log\left(\mathbf{P(X=x)}\right)$ for every $x$ in the sample space $\Omega_x$, and then negating the sum. By doing this, we can determine the randomness of $\mathbf{X}$, which we know is its information content. Note that from now on, I am going to denote $\mathbf{P(X=x)}$ as $\mathbf{P(x)}$ because it makes the math cleaner.
Now that the equation is less of a surprise, onto the example! Consider a fair coin, that is, one that comes up heads with probability 0.5 and tails with probability 0.5. If we flip the coin once, what is its entropy? Well, what events are in our sample space $\Omega_x$? There are just two: heads ($h$) and tails ($t$).
So, we have $$-\sum_{x\in\Omega_x}\mathbf{P(x)}\log_2\left(\mathbf{P(x)}\right) = -\mathbf{P(h)}\log_2\left(\mathbf{P(h)}\right) -\mathbf{P(t)}\log_2\left(\mathbf{P(t)}\right)$$ Filling in the numbers, this yields $$-0.5\log_2 0.5 - 0.5\log_2 0.5 = -0.5(-1) - 0.5(-1) = 0.5 + 0.5 = 1$$ So a coin has 1 bit of information content (notice that a bit has two options as well, 1 or 0). Entropy confirms what we already know.
Going a little faster, what if we have a fake coin that is heads on both sides (i.e. probability of heads is 1, probability of tails is 0)? Well, $$-1\log_2 1 - 0\log_2 0 = 0$$ (the $0\log_2 0$ term is taken to be 0 by convention, since $p\log_2 p \to 0$ as $p \to 0$). A coin that is always heads has no randomness (and by extension no information content). The Georgia Tech source goes on into another example that explains how entropy determines the number of bits needed to describe a (typically larger) number of samples.
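For anyone who wants to check these numbers, here is a minimal Python sketch of the entropy formula. The function name `shannon_entropy` and the extra 90/10 biased coin are my own illustrations, not something taken from the Georgia Tech notes, and the code assumes the distribution is given as a plain list of probabilities.

```python
import math

def shannon_entropy(probs):
    """Entropy, in bits, of a discrete distribution given as a list of probabilities."""
    if not math.isclose(sum(probs), 1.0):
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 are skipped: by convention 0 * log2(0) counts as 0.
    return sum(-p * math.log2(p) for p in probs if p > 0)

fair_coin = shannon_entropy([0.5, 0.5])        # the fair coin from the example
two_headed_coin = shannon_entropy([1.0, 0.0])  # the fake coin, always heads
biased_coin = shannon_entropy([0.9, 0.1])      # hypothetical coin, heads 90% of the time

print(fair_coin, two_headed_coin, biased_coin)
```

The fair coin comes out at 1 bit and the two-headed coin at 0 bits, matching the hand calculation. The biased coin lands in between, at a little under half a bit: the more predictable the coin, the less information each flip carries.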
Very aptly presented! Entropy captures the amount of randomness in a variable. However, I would like to add a minor point (for those who aren't aware): the base of the log is 2 because the entropy is being measured in bits.
Since we're touching information theory, I want to deviate slightly and recommend an excellent book by Vlatko Vedral: Decoding Reality: The Universe as Quantum Information.
It talks about how information is the most fundamental building block of reality.
Got here from Reddit. I think I will give it a try here.
Broadly speaking, 'entropy' in information theory, thermodynamics and statistics answers the same question: "how difficult is it to describe a particular thing?" The key thing to remember is that the amount of information it takes to describe something is proportional to its entropy.
In terms of statistical mechanics, entropy is basically a measure of the number of ways to arrange a given system. The concept is quite abstract and it's certainly not just the degree of randomness. Btw, this site seems to have a lot of interesting content. Will try to visit more often.
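To make the "number of ways to arrange a system" idea concrete, here is a small Python sketch using a toy model of my own choosing: 100 coins standing in for two-state particles. The "macrostate" is just how many coins show heads, and the function names are hypothetical. Boltzmann's formula $S = k_B \ln W$ takes the logarithm of the arrangement count $W$; the sketch uses a base-2 log instead so the answer comes out in bits.

```python
import math

def microstate_count(n_coins, n_heads):
    """Number of distinct arrangements of n_coins with exactly n_heads heads."""
    return math.comb(n_coins, n_heads)

def log2_multiplicity(n_coins, n_heads):
    """Base-2 log of the arrangement count, i.e. the bits needed to pick out one arrangement."""
    return math.log2(microstate_count(n_coins, n_heads))

all_heads = microstate_count(100, 100)   # exactly one way to have every coin heads
half_heads = microstate_count(100, 50)   # a huge number of ways to have 50 heads

print(all_heads, log2_multiplicity(100, 100))
print(half_heads, log2_multiplicity(100, 50))
```

There is only one all-heads arrangement, so it takes 0 bits to say which one you have. There are so many half-heads arrangements that naming one takes roughly 96 bits. That is the sense in which entropy counts arrangements and also measures how hard something is to describe.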