Friday, 17 November 2006

[ tctp # 0002 ] REDUNDANCY AND ENTROPY

[Jhon Michael Underwood, "Cultsock"]


Important note: This section is based on my attempt to understand Shannon's information theory. I am almost completely innumerate, so am far from sure that I have grasped his understanding of the terms. In so far as I think I have understood them, I am not persuaded that they are particularly useful to an understanding of everyday communication between human beings. They are, however, terms which you will come across in your study of communication, so I suppose you should have some idea of what they mean. I have searched the Web for articles on the subject. Those I have discovered seem to fall into three categories:


highly technical articles which I do not understand
articles by authors working in the 'humanities' side of communication which plainly misrepresent Shannon's ideas
articles by authors working in the 'humanities' side of communication which appear to explain the concepts accurately, but fail to make a link between them and human-to-human communication
Should you know of any articles on the Web which do not suffer from these problems, please e-mail me and I'll post a link to them. Alternatively, should you wish to post an article of your own, please send it to me and I'll post it on this site.


REDUNDANCY


If you're familiar with 'redundancy', you may wish to go straight to entropy.
You have probably had the experience of being criticized for redundancy by your English teacher. By 'redundancy' she means unnecessary verbiage, or perhaps even tautologies, such as 'reversing back into the driveway'. In the criticism of English style, redundancy is considered to be a bad thing, but in natural communication systems a certain amount of redundancy is always built in. You'll be familiar with this if you've ever compressed a number of word-processed files to prepare them for sending via e-mail. They can generally be reduced to around half their size. To do that, the compression algorithm exploits the redundancy in the English spelling system.

A smpl xmpl s ths sntnc whch y cn prbbly ndrstnd wtht th vwls. Orthissentencewhichyoucanreadevenwithoutthespaces.

The first of those examples exploits the 'redundancy' of letters for vowel sounds, the second the 'redundancy' of spaces. You might also have little difficulty in filling in the missing word in this incomplete .......... You probably guessed that the word missing from the end of that sentence was 'sentence'; you might perhaps have expected 'sequence of words'; a bit oddly, you might have guessed 'test'; strangely, you might have thought 'crossword'; and, if you thought 'doughnut' or 'duck-billed platypus', you should seek help and/or study something else (dadaism and surrealism would be good starting points). However, even allowing for the oddest readers, it's a fair bet that you didn't guess 'quickly', 'turns', 'to speak', 'bright' or any such adverb, verb, infinitive or adjective simply because the rules of English don't allow those at that point. If I'm speaking quietly, or the environment is noisy, or I have an unfamiliar regional accent, I suppose you might think I asked 'Would you like some of the scheese?' But what you know about permitted consonant clusters in English leads you to conclude that I must have referred to 'this cheese' simply because we don't begin words with the sounds 's-ch'. As you can see, we're talking here about probabilities. If you're interested in seeing how Shannon develops this, it's in the next few paragraphs. If not, just skip it (it's actually quite entertaining).


Clearly in any language some characters have a greater probability of occurring than others. Thus, in English, E occurs with a frequency of .12 and W with a frequency of .02 (I know because Shannon says so). However, as we have just seen with 'scheese', the probability of a character occurring is also affected by the preceding character. So, in his exploration of redundancy in the English spelling system, Shannon is first forced to consider digrams: after the first letter is chosen the next one is chosen in accordance with the frequencies with which the various letters follow the first one. After that, he gets into a trigram structure ....
So, first of all, for the 26 characters of English plus a space, using a zero-order approximation (i.e. where all characters are equally probable), this is what Shannon gets:


XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD


The first-order approximation, where the characters have the frequency of English text, but are not influenced by what precedes them, gives us:


OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL


You might think that still looks like Klingon, but compare it with what was produced first and note the absence of Q's, X's and Z's.
When we apply a digram structure, this is what comes up:


ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE


Is this starting to look more familiar? I think I used to work with the guy who wrote that.
The trigram structure gives us:


IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE


Blimey! This is starting to sound like cultural studies.
Shannon, rather than continuing with all the possible n-grams, then jumps to word units. In the following, the words are chosen independently of one another, but using their approximate frequencies in normal English:


REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE


In the following the word-transition probabilities are correct:


THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED


Suddenly, this reads like a translation from French philosophy.
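If you fancy seeing roughly how such gibberish can be cooked up, here's a minimal sketch in Python. It is not Shannon's own procedure, and the letter frequencies in it are rough illustrative guesses rather than his actual table; it simply draws characters at random, first with all symbols equally likely, then weighted by English-like frequencies. The digram and trigram versions would condition each draw on the preceding character(s), which needs a table of digram frequencies, but the principle is the same.

```python
# A rough sketch of zero-order and first-order approximations to English.
# The frequencies below are illustrative guesses, not Shannon's table.
import random

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "   # 26 letters plus the space

# A few rough letter frequencies; the rest share what's left over equally.
ROUGH_FREQ = {" ": 0.18, "E": 0.10, "T": 0.07, "A": 0.06, "O": 0.06}
leftover = (1.0 - sum(ROUGH_FREQ.values())) / (len(ALPHABET) - len(ROUGH_FREQ))
WEIGHTS = [ROUGH_FREQ.get(c, leftover) for c in ALPHABET]

def zero_order(n):
    """All 27 symbols equally probable."""
    return "".join(random.choice(ALPHABET) for _ in range(n))

def first_order(n):
    """Symbols drawn independently, weighted by English-like frequencies."""
    return "".join(random.choices(ALPHABET, weights=WEIGHTS, k=n))

print(zero_order(60))    # gibberish along the lines of XFOML RXKHRJ...
print(first_order(60))   # still gibberish, but fewer Q's, X's and Z's
```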
The redundancy of English is around 50%. As Shannon explains it, that means that 'when we write English around half of what we write is determined by the structure of the language and half is chosen freely' and '[redundancy] is the fraction of the structure of the message which is determined not by the choice of the sender, but rather by the accepted statistical rules governing the choice of the symbols in question'. This redundancy becomes evident to us if we send someone a telegram, leave a short note, or quickly fire off an e-mail, in each of which we cut out many 'redundant' words or even letters.
With artificial communication systems we can deliberately add in redundancy as a means of error checking. For example, when your library user number is entered into your college's computer, the last digit of the number may well be a so-called 'check digit', which results from a calculation performed automatically on the preceding digits when the library computer first issued you with a number. It is 'redundant' in that the preceding sequence of digits is sufficient to distinguish you from other library users. Its purpose is to check that the librarian has entered the preceding digits correctly. Thus, the redundancy is used to overcome possible noise (for comments on 'noise', see the section on the Shannon-Weaver model).
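If you're curious what such a calculation might look like, here's a minimal sketch of one possible scheme (a weighted sum modulo 10). I have no idea which scheme your library actually uses, and the user number in the example is made up.

```python
# A minimal sketch of a check digit, in the spirit of the library-card
# example above. This particular scheme (a weighted sum modulo 10) is my
# own illustration; real systems such as ISBNs or library barcodes each
# use their own formula.
def check_digit(digits):
    """Compute a check digit for a string of decimal digits."""
    total = sum((i + 1) * int(d) for i, d in enumerate(digits))
    return str(total % 10)

def is_valid(number):
    """True if the final digit matches the check digit of the rest."""
    return number[-1] == check_digit(number[:-1])

user_number = "4082731"
issued = user_number + check_digit(user_number)
print(issued)                # 40827316, the number the library hands you
print(is_valid(issued))      # True
print(is_valid("40827916"))  # a mistyped digit is (usually) caught: False
```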
Similarly, since around 50% of English is redundant, it should be possible to save around 50% of the costs involved in the electronic transmission of messages in English. Something like that happens when a file is compressed. But, if there was some noise somewhere in the transmission process, obliterating some of the characters retained in the compressed message, then it would be impossible to reconstruct the original. Here again, it would be wise to build redundancy into the system.
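You can watch this happen for yourself with a sketch like the one below. The file name 'sample.txt' is just a placeholder for any longish passage of English prose; the random bytes are there for comparison, since they contain no redundancy for the compressor to exploit.

```python
# A sketch of the compression point: English prose carries enough
# redundancy that a general-purpose compressor can usually squeeze it to
# around half its size or less, whereas random bytes, which contain no
# redundancy at all, don't compress.
import os
import zlib

with open("sample.txt", "rb") as f:   # placeholder: any longish English text
    english = f.read()

random_bytes = os.urandom(len(english))

for label, data in [("English prose", english), ("random bytes", random_bytes)]:
    packed = zlib.compress(data, 9)
    print(f"{label}: {len(data)} bytes -> {len(packed)} bytes")
```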
In the digital encoding of messages, the redundancy may take the form of a 'parity bit'. The letter A is represented in binary as 01000001. To transmit a letter A digitally, we just need to transmit those 8 digits. But if the line's noisy and a digit is dropped, so that we receive, say, 010000?1, we can't tell which letter was transmitted. It might have been an A if the missing digit was 0. But if the missing digit was 1, then the letter would have been C. OK, in normal English, we can figure it out from the context, but what if we've already reduced the redundancy in the English? In that case, adding a parity bit, an extra redundant digit, will overcome the problem. If the sequence of 8 bits contained an even number of 1's, the parity bit will be 0; if it contained an odd number, it will be 1. So, if we receive 010000?10, the extra 0 on the end tells us we should have received an A (01000001); if we receive 010000?11, the extra 1 on the end tells us we should have received a C (01000011). Of course, if the line is so noisy that we lose two digits, tough. Mind you, if we expect a lot of noise, we could always agree to send a parity digit after every four bits instead of every eight. It might seem daft to send extra bits that aren't really needed. But, assume that we've reduced a text of 10,000 characters to, say, 8,000 characters by eliminating some of the redundancy. For every character we transmit, we also send a parity bit, i.e. 8,000 parity bits. 8,000 bits is the equivalent of 1,000 characters, so we're still ahead.
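Here's a minimal sketch of that parity scheme, using the same example of an 'A' with one bit lost in transit. The '?' standing in for the lost bit is my own shorthand; real receivers flag erasures in other ways.

```python
# A minimal sketch of the parity-bit idea described above: append one extra
# bit recording whether the data bits contain an odd number of 1's. If a
# single known bit is then lost in transmission, the receiver can fill it
# back in.
def add_parity(bits):
    """Append a parity bit to a string of '0'/'1' characters."""
    return bits + ("1" if bits.count("1") % 2 else "0")

def fill_missing(received):
    """Recover a single '?' in the data bits, using the trailing parity bit."""
    data, parity = received[:-1], received[-1]
    ones = data.count("1")
    # The missing bit must make the count of 1's match what the parity bit says.
    missing = "1" if (ones % 2) != int(parity) else "0"
    return data.replace("?", missing)

A = "01000001"                 # binary for the letter A
sent = add_parity(A)           # '010000010'
garbled = "010000?10"          # one data bit lost in transmission
print(fill_missing(garbled))   # '01000001', so it was an A after all
```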
Deliberately engineered redundancy such as that is not uncommon in natural communication either. For example, teachers have long adhered to the prescription to 'tell them what you're going to tell them; tell them; then tell them what you've told them'. Again, though, it may be fine in principle, but doesn't necessarily work in practice as all of us who have slept through almost an entire lecture can testify.


Channel redundancy
As mentioned above, one of the purposes of redundancy in natural communication systems would appear to be to overcome any possible noise, as well as to increase reliability. When I say 'the two men are standing in the room', why do I say the plural three times and the present tense twice (in other words why can't I say 'the two man standing....')? And why can't I just jumble up the words in any old order? Presumably because the redundancy helps me to overcome noise in the source if I have a speech defect, noise in the channel if there's a lorry just going past, or noise in the receiver if she's not paying full attention. So, if it's particularly important for me to emphasize that there are two men in the room, I can hold up two fingers at the same time as I say the word 'two'. When I do that I am using the visual channel. I am exploiting what is known as channel redundancy. Of course, it's not quite as simple as that because non-verbal communication could also be used to disavow what I am saying in the verbal channel, but you get the general idea. So to make sure someone gets the message, you send them a letter, a telegram and an e-mail. Again, you get the idea, but again you might appreciate that it's not as simple as that. After all, the receiver of your message could be pretty annoyed and not reply ever again as a result. Just think of the insurance company which sends you junk mail, rings you up for a survey and sends a salesman around. The message definitely reaches the receiver, but the insurance man had better have some life insurance of his own if he doesn't sod off soon. The point I'm making here is that it's probably not wise to take aspects of information theory and attempt to apply them directly to human-to-human communication.


***


ENTROPY


Entropy in thermodynamics
Ah, well... should I confess that I don't really understand what is meant by this? After the first couple of pages, Shannon's paper fairly rapidly fills up with mysterious squiggles which I believe are known as equations. There's also a lot of arcane talk about logarithms, whatever they may be. He also often uses phrases that mathematicians love such as 'it follows that...' and 'thus it can easily be seen that...' Well, by Shannon maybe, but not by me. So you should take what follows with an entire salt mine.
If you already know about thermodynamic entropy, you can skip the next bit and jump to the part about cybernetic entropy.
Those of you with more alert minds than I, or closer to your schooldays than I, may recall the term entropy from what you learned about the second law of thermodynamics. If you were to ask me what the law states, I would look around at the piles of crap on my desk, scratch my balding head, recall my irritation at the sticking SHIFT key on this keyboard, and tell you it states that everything falls apart and time only moves in the direction of things falling apart. And, just before my fiftieth birthday, I can tell you my experience does not falsify this law. Anyway, hang on a minute while I go and look up a definition ..... now, where did I put that encyclopaedia?

Entropy always increases in any closed system not in equilibrium, and remains constant for a system which is in equilibrium

That's what it says here anyway. So what is entropy?
Entropy is interpreted as a measure of the disorder among the atoms making up the system, since an initially ordered state is virtually certain to randomize as time proceeds.
I knew it had something to do with the state of my desk. There's a delicious smell of fresh coffee emanating from the kitchen as I write this. The smell is due to smelly coffee molecules escaping from the coffee pot and scurrying up my nose. They are leaving the initially ordered state of the coffee pot and moving into an increasingly 'disordered state', i.e. spreading around the house. There's nothing in Newton's laws to say that they shouldn't all spontaneously end up in a more ordered state, for example by all going back into the coffee pot. In fact, if we could have a film of the whole process and were to run it backwards we would find those molecules doing nothing which conflicts with Newton's laws of motion. Somehow, though, the piles of crap on my desk suggest that that's not going to happen if we leave the molecules to their own devices. As I know from my desk, increased order only comes about as a result of the input of work. The second law of thermodynamics says that things are going to get more random, more shuffled. Eventually, I shall die, be eaten up by bacteria and all my molecules will spread out just like the coffee molecules and - who knows? - maybe you'll get to breathe in one of my 10^25 nitrogen atoms. I shall then have reached a state of maximum entropy (in the not too distant future if I keep smoking). The universe itself will eventually come to a halt, according to the second law of thermodynamics (not that that will bother me much).

Entropy in information theory
But what has thermodynamic entropy to do with information theory and with human-to-human communication? Hmmm.... I'm getting well out of my depth here, but, if I've understood correctly, thermodynamic entropy is a measure of disorder whilst cybernetic or statistical entropy is a measure of uncertainty. As far as I can see, the two are really only formally related in the sense that both are expressed as the logarithm of a probability. Entropy (I think) can be understood as the opposite of redundancy. Here are some examples:


1. If we have a computer whose job is to transmit the letter A all the time, there is 100% redundancy in its message. We receive


AAAAAA?AA?AAAAAAAA?A


No problem, we know that the missing letters are A's.


2. If the computer's job is to send out all the letters from A to Z in alphabetical sequence over and over again, then redundancy would also be at a maximum, entropy zero. We receive


ABCD?FG


and we know with absolute certainty what the missing letter is.


3. If, however, the computer's task is to transmit the letters A to Z in a random sequence, sending each one only once until it's gone through all 26 letters, redundancy decreases and entropy rises. Say we receive this sequence during one transmission of the 26 letters:


PQGHWMNBCVZSTERUYAX


We know that the next letter must be one of these: DFIJKLO. How do we know? Because none of those has been sent yet. But we don't know which one. Remember that Shannon defined redundancy as 'the fraction of the structure of the message which is determined not by the choice of the sender, but rather by the accepted statistical rules governing the choice of the symbols in question'. The accepted statistical rules in our example are that the computer should send each character of the alphabet once only, but in any sequence. So the redundancy in the system is what tells us that DFIJKLO are the characters which remain as the 'choice of the sender'. In this instance, then, redundancy is reduced over the previous examples and entropy has increased.


4. Now suppose that the computer's task is to choose a character at random from the 26 in the alphabet and send it.
There are no further constraints in this example, no further limits on the computer's 'free choice'. We receive


XFOMLRXKHRJFFJUJZLPWCFWKCYJFFJ


How do we know what the next letter is going to be? We don't. Choosing at random, the computer could send us any one of the 26 letters. There is some redundancy in that we know the next character will not be anything outside our set of 26 letters. We know with certainty that the next letter will be one of our 26, but we don't know which one. So, in this case, redundancy is very low, entropy very high.
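For anyone who wants the numbers behind those four examples, Shannon measures the entropy of a source in bits using the formula below. With n equally likely choices it boils down to log2 of n (and I'm assuming, in example 3, that the seven remaining letters are equally likely):

```latex
% Shannon's entropy of a source whose symbols occur with probabilities p_i:
H \;=\; -\sum_i p_i \log_2 p_i \qquad \text{(bits per symbol)}

% With n equally likely choices this reduces to \log_2 n, so:
\begin{align*}
\text{Examples 1 and 2 (fully predictable):} \quad H &= \log_2 1 = 0 \text{ bits}\\
\text{Example 3 (7 letters still to come):}  \quad H &= \log_2 7 \approx 2.81 \text{ bits}\\
\text{Example 4 (any of the 26 letters):}    \quad H &= \log_2 26 \approx 4.70 \text{ bits}
\end{align*}
```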


As you can see, there is a parallel between the concepts of thermodynamic and cybernetic entropy. In our last example, there is a higher degree of disorder (as in thermodynamic entropy) and a higher degree of uncertainty (as in cybernetic entropy). Weaver, in his introduction to Shannon's paper, makes the parallel explicit:

Thus for a communication source one can say, just as he would also say it of a thermodynamic ensemble: This situation is highly organized, it is not characterized by a large degree of randomness or of choice -- that is to say, the information (or the entropy) is low.

All well and good, but where does that get us? The point is that an increase in entropy is equivalent to an increase in information. In examples 1 and 2 above, information is low. In example 4 it is very high. You may object that none of those examples contain any information. They don't come anywhere near telling you the time, who won the World Cup or whether the pubs are open yet (I wish!). Quite so, but information and meaning are not the same thing, as far as Shannon is concerned. Weaver pointed this out in his introduction to Shannon's paper:

The word information, in this theory, is used in a special sense that must not be confused with its ordinary usage. In particular information must not be confused with meaning. In fact, two messages, one of which is heavily loaded with meaning and the other of which is pure nonsense, can be exactly equivalent, from the present viewpoint, as regards information.
You can see how this applies to the English language. Before I start typing my English sentence, the possibilities are virtually infinite. As soon as I have typed the first letter, "I", the possible choice of the next letter is constrained. At least, according to the Oxford English Dictionary it is, as no word begins with two i's. Once I've typed "In", the possibilities are even more limited. Once I've typed "Inf", redundancy is starting to increase. Once I've typed "Informati", redundancy is very high, because the only choices open to me are o and v. Here entropy is very low. According to Shannon, the information is high after my first "I" because entropy is high. Therefore, the totally random sequence of characters in example 4 above has a high information content because entropy is high. The next character in the sequence cannot be predicted at all. We would have to send the whole message as it stands; there is no way we could compress it as we can the English language. There is no redundancy, so there is nothing we can cut out. This is quite well summed up by Weaver: 'information is a measure of one's freedom of choice when one selects a message'.
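To put rough numbers on that, using Shannon's measure from above and assuming, for simplicity, that the choices open to me at each point are equally likely:

```latex
\begin{align*}
\text{Before the first letter (any of 26):}      \quad & \log_2 26 \approx 4.7 \text{ bits of choice}\\
\text{After typing ``Informati'' (only o or v):} \quad & \log_2 2 = 1 \text{ bit of choice}
\end{align*}
```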
I hope that makes it clear that the higher the entropy, the greater the information. We could also express that as: the higher the uncertainty, the greater the information; or: the higher the unpredictability, the greater the information.
'Well, all right,' you might object, 'but XFOMLRXKHRJFFJUJZLPWCFWKCYJFFJ conveys no information at all.' Smugly, I reply, 'You're confusing information with meaning!' 'Fair enough,' you say irritably, 'let me rephrase what I said: XFOMLRXKHRJFFJUJZLPWCFWKCYJFFJ conveys no meaning. So, frankly, I don't see what all this talk of entropy has to do with human communication, which, when all's said and done, is all about trying to convey meanings, not random strings of letters.' At this point in the conversation, I might try to change the subject.
Weaver ends his introduction to Shannon's paper with:


The concept of information developed in this theory at first seems disappointing and bizarre--disappointing because it has nothing to do with meaning, and bizarre because it deals not with a single message but rather with the statistical character of a whole ensemble of messages, bizarre also because in these statistical terms the two words information and uncertainty find themselves to be partners


So you and I may find consolation in the fact that we're not the only ones to find this all a bit odd.


'Entropy' in communication studies
Can we generalize from Shannon's theory, which places the emphasis on the information content of messages in his specialized use of the term, to the study of the semantic content (i.e. the meaning) of the messages in human-to-human communication? You'll come across communication theorists using the terms 'entropy' and 'entropic'. By 'communication theorists' here I am referring to those working within the humanities who study communication; I do not mean communication theorists in cybernetics. If I have understood Shannon's use of the term correctly (no money-back guarantee, I'm afraid), then many communication theorists (as I am using the term here) seem to misunderstand it and yet still claim to find it a useful concept. For example, referring to Shannon, Kellner and Best state:



For such information systems theorists, increase of entropy represents a progressive loss of information


(1997 : 209)


Whereas Weaver, in his introduction to Shannon's paper, says:


Thus for a communication source one can say, just as he would also say it of a thermodynamic ensemble: This situation is highly organized, it is not characterized by a large degree of randomness or of choice -- that is to say, the information (or the entropy) is low.

In case anyone missed that, let me just summarize:



information theorist (Weaver): low entropy = low information


cultural theorists (Kellner and Best): high entropy = low information


It really doesn't take much thought for any lay person who's used a computer a bit to figure this out. To save storage space, or to reduce the size of a file we want to e-mail to someone, we compress the file. File compression works by squishing out redundant information. If the blue sky is represented by five lines of 100 pixels of the same shade of blue, we don't store 500 pixels, we store one, together with a couple of bytes that mean '100 wide' and 'five lines deep'. Although, in lay person's terms, we can't say that the compressed file contains 'more information' than the original uncompressed file, we can squash four images of the same size into the space occupied by just one of them. In that case we would all be prepared to say that the compressed file of four images contains 'more information' than just one image. Now, given that the compression is achieved by reducing redundancy, and given also that entropy is the opposite of redundancy, it clearly follows that the higher the entropy, the higher the information. Well, it seems clear to me at any rate. I'm a bit hesitant about saying so because it means I have to disagree with big-name theorists, but that's just tough.
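For what it's worth, here's a minimal sketch in Python of that run-length idea. Real image formats are far more elaborate than this, but the principle, squeezing out the redundancy, is the same.

```python
# A minimal sketch of the run-length trick described above: instead of
# storing 500 identical blue pixels one by one, store the value once
# together with a count.
def run_length_encode(pixels):
    """Collapse runs of identical values into [value, count] pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1
        else:
            runs.append([p, 1])
    return runs

sky = ["blue"] * 500 + ["white"] * 20 + ["blue"] * 480   # a strip of sky with a cloud
encoded = run_length_encode(sky)
print(encoded)                # [['blue', 500], ['white', 20], ['blue', 480]]
print(len(sky), "pixels stored as", len(encoded), "runs")
```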
So what on earth is the point of a term that hardly anybody seems to be able to agree on?
What do our 'humanities' communication theorists mean when they use the term 'entropy'? Here's a definition provided by John Fiske:


Redundancy is generally a force for the status quo and against change. Entropy is less comfortable, more stimulating, more shocking perhaps, but harder to communicate effectively.
(1982 : 17)

I think Shannon might have been surprised to learn that. What is Fiske getting at? He is drawing a parallel between redundancy and conventionality. From what we've seen of Shannon's theories, we might therefore expect him to say that the less conventional a message is, the more unpredictable it is, therefore the greater the information content. In fact, he doesn't reach that conclusion because he draws a distinction between entropy in form and entropy in content. So, for Fiske, there can be a high degree of redundancy in form together with a high degree of entropy in content. Examples he gives are Beethoven and Jane Austen.
Using the term in this sense, it is quite possible for entropy in form to have been high once and low now, as innovations are imitated, enter the mainstream and become conventions. It is surprising for us now that the composer Weber, after hearing Beethoven's 7th for the first time, commented that Beethoven was clearly 'ripe for the madhouse'. For Weber, the form of the 7th was highly entropic in this Fiskean sense; for us, when we listen to it today, there is a higher degree of redundancy than there was for Weber. This is the fate of almost any creative artist, I guess, whether it's Beethoven, John Coltrane or Miles Davis - remember the old lady who commented, on leaving a performance of Hamlet, that she wasn't greatly impressed by it because it was full of quotations, or John Cage's surprise, on leaving a Webern concert, to discover that it had sounded tuneful.
Consider the case of minimalist music for a moment. Today, minimalism has entered the mainstream, but back in the 60s it was avant-garde. At that time Terry Riley, Steve Reich, Tim Souster et al were breaking new ground. Steve Reich's 'It's Gonna Rain' consists of two tape recorders, each playing a loop of a preacher shouting 'it's gonna rain'. In Shannon's sense, surely, there was almost no entropy in the work. There was some entropy due to the fact that the two tape players played at slightly different speeds, with a slight unpredictability in the speed variation. Therefore, in Shannon's sense, there was virtually no information in the work. In Fiske's sense, though, where entropy means cutting down the 'redundancy' of conventional forms, entropy was high.
I don't know how consistently or usefully this term can be applied in this sense. I don't see how the use of the term 'entropic' is an advantage over 'unpredictable', 'surprising', 'novel' or 'shocking'. There used to be an advertising line for shampoo which ran 'Wash the city right out of your hair'. The use of the word 'city' is surprising and therefore catches our attention. The artist Robert Rauschenberg begged the then considerably more famous artist de Kooning to give him a drawing. When he finally got it, he rubbed it out and exhibited it as 'Erased de Kooning by Robert Rauschenberg'. Rachel Whiteread filled an entire house with concrete so that when the house was demolished we were left with an inside-out house. Damien Hirst had a cow sawn into two halves and exhibited them in tanks of formaldehyde. All of these actions are surprising, shocking, novel, avant-garde, challenging. Frankly, I don't see what we gain by introducing the term 'entropic' to describe them. It seems to be used in a very loose sense which bears little relation to Shannon's original concept.
Still, this finally gets us to what you need to know. In the humanities, to describe a message as highly entropic is to say that it is unexpected, shocking, novel or surprising. So, if you would like people to know that you are a student of communication and every bit as capable of being as incomprehensible as a teacher of communication, don't say of a message 'this is very novel', say 'this is highly entropic'. Quite a fashion statement, huh? Or maybe it's only a fashion statement until 'entropic' enters mainstream usage, whereupon it becomes 'redundant'. Eh?


For further information, you may wish to consult the following websites:
Lucent Technologies' excellent website with very clear explanations
Bell Laboratories, where you can download a copy of Shannon's paper
Information Theory Society Web Page
Entropy in Information and Coding Theory
Entropy and its Role in Claude Shannon’s Theory of Communication