Browsing while I ought to be working (as one does), I came across a blog post with some thoughts on information theory that I at first took to be wrong. (And naturally I set out to correct this Wrongness on the Internet.) On reflection, I find that Aza Raskin is in fact correct, but that there are subtleties.
He asks which of these sentences contains more information:
1. Cogito ergo sum.
2. Shoes smell bad.
and answers that they contain the same amount, on the grounds that they have the same number of letters, modulo the fact that rare letters carry more information than common ones. He then gives the usual Shannon definition, which measures information as a number of bits.
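To make the letter-counting argument concrete, here is a minimal sketch in Python. It assumes the standard self-information formula, in which a symbol of probability p carries -log2(p) bits, and it assumes a deliberately crude model in which all 26 letters are equally likely. The function names and the uniform model are my own illustration, not anything taken from the linked post.

```python
import math

def letters(sentence):
    """Keep only the alphabetic characters, lowercased."""
    return [ch for ch in sentence.lower() if ch.isalpha()]

def information_bits(sentence, prob):
    """Sum the self-information, -log2(p), of every letter in the sentence."""
    return sum(-math.log2(prob[ch]) for ch in letters(sentence))

# Assumption: every letter is equally likely. Real English is not like this.
UNIFORM = {ch: 1 / 26 for ch in "abcdefghijklmnopqrstuvwxyz"}

cogito_bits = information_bits("Cogito ergo sum.", UNIFORM)
shoes_bits = information_bits("Shoes smell bad.", UNIFORM)

print(len(letters("Cogito ergo sum.")), len(letters("Shoes smell bad.")))
print(round(cogito_bits, 1), round(shoes_bits, 1))
```

Both sentences have 13 letters, so under the uniform assumption they come out identical, at about 61.1 bits each. Swapping in a table of actual letter frequencies for `UNIFORM` would make the rare letters count for more than the common ones, which is the "modulo" in the argument above.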
Now, my initial thought was that this is wrong, because the sentences convey no information unless the receiver speaks English (or Latin, in the first case; but the phrase is sufficiently common that we can consider it a loan-word). To an alien, a phrase this short – expressed in Morse, perhaps – is indistinguishable from random noise. However, upon writing the previous sentence, I realised that this is precisely why they contain the same amount of Shannon information: Random noise is maximally information-rich.
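The claim that noise is maximally information-rich can be checked with a small sketch, assuming the usual Shannon entropy formula (the average of -log2(p) over a source's symbols). The two four-symbol distributions below are invented for illustration only.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits per symbol; zero-probability symbols contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical four-symbol sources, chosen only to show the contrast.
noise = [0.25, 0.25, 0.25, 0.25]   # every symbol equally likely
skewed = [0.7, 0.1, 0.1, 0.1]      # one symbol dominates

noise_entropy = entropy_bits(noise)
skewed_entropy = entropy_bits(skewed)

print(noise_entropy, skewed_entropy)
```

The uniform source gives 2 bits per symbol, the most a four-symbol alphabet can carry, while the skewed source gives less. Any predictability in the stream lowers the count, which is why a patternless stream scores highest.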
Nonetheless, although this is the sense of ‘information’ used by communications engineers and interface designers (and since the post I linked to is on interface design, it’s a perfectly reasonable usage for that context, and not wrong at all), it does not correspond at all to the usual informal sense, the way in which non-engineers use the word. One could say that this is too bad for the informal sense; but in fact it is possible to rescue it by looking not at the words but at the concepts. If we consider the sentences as actual messages intended to cause chemical changes in a receiving brain, rather than as strings of bits to be transmitted over a wire, then the information they encode looks quite different; they are in fact shorthands for enormously complex concepts, with the vast majority of the information encoded in the receiving brain.
Now, which of the two sets of concepts contains the more information? If you started with a blank-slate baby, how long would it take you to teach it the meaning of the two sentences? Intuitively the bit about shoes looks simpler, because you can just let the baby sniff a shoe. But this is cheating; it is exploiting the fact that a human child is not a blank slate at all. It has a huge amount of information built into its brain, including the processing power that we register as smell, and what it means for a smell to be bad. For the test to be honest, we cannot exploit this pre-existing information – mammal brains are even more complicated than the English language! For purposes of the test, then, we must consider a child with no sense of smell, or an alien who never evolved that form of sensory input.
A similar objection applies to ‘shoe’; you can’t just point to one and say “This is a shoe”, because again you are exploiting the pre-existing framework for processing visual information. But now I’m painting myself into a corner. If I can’t use any sensory input at all, just what am I explaining things to? One cannot impose philosophy on a rock; in the actual Universe of existing things, there are no truly blank slates, at least not ones to which things can genuinely be explained. We must allow, then, for some pre-existing information.
I find, as I write, that I’m not actually certain which concept contains more information. I thought at first that ‘bad’ would require defining the entire concept of subjective experience, and therefore would contain all of ‘cogito ergo sum’ within it; but on reflection, a dog knows very well what it means to smell something bad, but has very little self-awareness. On the other hand it’s also reasonable to say that a dog doesn’t actually understand the concept of ‘bad’; it just recoils from certain things through plain chemistry, like an organic computer. And on the gripping hand, the quale that humans experience as ‘knowledge’ is also plain chemistry, just more complex, in the sense of having more Shannon information! (You would need to specify the position and state of more molecules.)
I come to the conclusion that the amount of information depends very strongly on the recipient: A self-aware alien with no sense of smell and no feet would require a lot of definition before it could be brought up to speed on the shoes, but would with high probability have developed ‘cogito ergo sum’ on its own; conversely a human child probably understands shoes and smells well before it has true self-awareness.
This sort of subtlety is, perhaps, precisely the reason Shannon information is so useful: It allows you to define an amount of information with no reference to the receiving party. Its strength is also its weakness, though, because generally what we care about is not the difficulty of transmitting the message but its content. Incidentally, this is exactly why IDiots will never give a definition of ‘information’ other than circular stuff like “The amount of meaning contained in the message”. (I have actually seen this seriously offered as an operational definition!) The Shannon definition doesn’t support their case, and there is no other quantifiable meaning.