Information Theory – Digital Communication
INFORMATION THEORY
What is Information Theory?
Information theory is a highly theoretical study of the efficient use of
bandwidth to propagate information through electronic communication
system.
Information theory is a mathematical approach to the study of coding of
information along with the quantification, storage, and communication of
information.
APPLICATION OF INFORMATION
THEORY
Data compression
Shannon’s concept of entropy (a measure of the maximum possible efficiency of
any encoding scheme) can be used to determine the maximum theoretical compression
for a given message alphabet. In particular, if the entropy is less than the average
length of an encoding, compression is possible. The table Relative frequencies of
characters in English text shows the relative frequencies of letters in representative
English text. The table assumes that all letters have been capitalized and ignores all
other characters except for spaces. Note that letter frequencies depend upon the
particular text sample. An essay about zebras in the zoo, for instance, is likely to have a
much greater frequency of z’s than the table would suggest. Nevertheless, the
frequency distribution for any very large sample of English text would appear quite
similar to this table. Calculating the entropy for this distribution gives 4.08 bits per
character. (Recall Shannon’s formula for entropy.) Because normally 8 bits per
character are used in the most common coding standard, Shannon’s theory shows that
there exists an encoding that is roughly twice as efficient as the normal one for this
simplified message alphabet. These results, however, apply only to large samples and
assume that the source of the character stream transmits characters in a random
fashion based on the probabilities in the table. Real text does not perfectly t this model;
parts of it tend to be highly nonrandom and repetitive. Thus, the theoretical results do
not immediately translate into practice.
1|P ag e
Information Theory – Digital Communication
Error-correcting and error-detecting codes
Shannon’s work in the area of discrete, noisy communication pointed out the
possibility of constructing error-correcting codes. Error-correcting codes add extra bits
to help correct errors and thus operate in the opposite direction from compression.
Error-detecting codes, on the other hand, indicate that an error has occurred but do not
automatically correct the error. Frequently the error is corrected by an automatic request
to retransmit the message. Because error-correcting codes typically demand more extra
bits than error-detecting codes, in some cases it is more efficient to use an error-
detecting code simply to indicate what has to be retransmitted.
Deciding between error-correcting and error-detecting codes requires a good
understanding of the nature of the errors that are likely to occur under the
circumstances in which the message is being sent. Transmissions to and from space
vehicles generally use error-correcting codes because of the difficulties in getting
retransmission. Because of the long distances and low power available in transmitting
from space vehicles, it is easy to see that the utmost skill and art must be employed to
build communication systems that operate at the limits imposed by Shannon’s results.
A common type of error-detecting code is the parity code, which adds one bit to a
block of bits so that the ones in the block always add up to either an odd or even
number. For example, an odd parity code might replace the two-bit code words 00, 01,
10, and 11 with the three-bit words 001, 010, 100, and 111. Any single transformation of
a 0 to a 1 or a 1 to a 0 would change the parity of the block and make the error
detectable. In practice, adding a parity bit to a two-bit code is not very efficient, but for
longer codes adding a parity bit is reasonable. For instance, computer and fax modems
often communicate by sending eight-bit blocks, with one of the bits reserved as a parity
bit. Because parity codes are simple to implement, they are also often used to check
the integrity of computer equipment.
As noted earlier, designing practical error-correcting codes is not easy, and Shannon’s
work does not provide direct guidance in this area. Nevertheless, knowing the physical
characteristics of the channel, such as bandwidth and signal-to-noise ratio, gives
valuable knowledge about maximum data transmission capabilities.
Cryptology
Cryptology is the science of secure communication. It concerns both
cryptanalysis, the study of how encrypted information is revealed (or decrypted) when
the secret “key” is unknown, and cryptography, the study of how information is
concealed and encrypted in the first place.
Shannon’s analysis of communication codes led him to apply the mathematical tools of
information theory to cryptography in “Communication Theory of Secrecy Systems”
(1949). In particular, he began his analysis by noting that simple transposition ciphers—
such as those obtained by permuting the letters in the alphabet—do not affect the
entropy because they merely relabel the characters in his formula without changing their
associated probabilities.
2|P ag e
Information Theory – Digital Communication
Cryptographic systems employ special information called a key to help encrypt
and decrypt messages. Sometimes different keys are used for the encoding and
decoding, while in other instances the same key is used for both processes. Shannon
made the following general observation: “the amount of uncertainty we can introduce
into the solution cannot be greater than the key uncertainty.” This means, among other
things, that random keys should be selected to make the encryption more secure. While
Shannon’s work did not lead to new practical encryption schemes, he did supply a
framework for understanding the essential features of any such system.
Linguistics
While information theory has been most helpful in the design of more
efficient telecommunication systems, it has also motivated linguistic studies of the
relative frequencies of words, the length of words, and the speed of reading.
The best-known formula for studying relative word frequencies was proposed by the
American linguist George Zipf in Selected Studies of the Principle of Relative Frequency
in Language (1932). Zipf’s Law states that the relative frequency of a word is inversely
proportional to its rank. That is, the second most frequent word is used only half as
often as the most frequent word, and the 100th most frequent word is used only one
hundredth as often as the most frequent word.
Consistent with the encoding ideas discussed earlier, the most frequently used
words tend to be the shortest. It is uncertain how much of this phenomenon is due to a
“principle of least effort,” but using the shortest sequences for the most common words
certainly promotes greater communication efficiency.
Information theory provides a means for measuring redundancy or efficiency of
symbolic representation within a given language. For example, if English letters
occurred with equal regularity (ignoring the distinction between uppercase and
lowercase letters), the expected entropy of an average sample of English text would be
log2(26), which is approximately 4.7.. Scientists have studied sequences of eight
characters in English and come up with a figure of about 2.35 for the average entropy of
English. Because this is only half the 4.7 value, it is said that English has a relative
entropy of 50 percent and a redundancy of 50 percent.
A redundancy of 50 percent means that roughly half the letters in a sentence
could be omitted and the message still be reconstructable. The question of redundancy
is of great interest to crossword puzzle creators. For example, if redundancy was 0
percent, so that every sequence of characters was a word, then there would be no
difficulty in constructing a crossword puzzle because any character sequence the
designer wanted to use would be acceptable. As redundancy increases, the difficulty of
creating a crossword puzzle also increases. Shannon showed that a redundancy of 50
percent is the upper limit for constructing two-dimensional crossword puzzles and that
33 percent is the upper limit for constructing three-dimensional crossword puzzles.
3|P ag e
Information Theory – Digital Communication
Shannon also observed that when longer sequences, such as paragraphs, chapters,
and whole books, are considered, the entropy decreases and English becomes even
more predictable. He considered longer sequences and concluded that the entropy of
English is approximately one bit per character. This indicates that in longer text nearly
all of the message can be guessed from just a 20 to 25 percent random sample.
4|P ag e