0% found this document useful (0 votes)
5 views27 pages

Mit Lec 12 Transcript

The lecture discusses Reed-Solomon codes, their optimal attributes, and their applications in error correction, particularly in packet transmission over the internet. The professor emphasizes the importance of understanding the limitations of these codes when applied to binary channels and introduces the concept of concatenated codes as a method to achieve high-performance coding. Additionally, the midterm logistics and review session details are provided, focusing on chapters four through eight of the course material.

Uploaded by

archigoyal16
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views27 pages

Mit Lec 12 Transcript

The lecture discusses Reed-Solomon codes, their optimal attributes, and their applications in error correction, particularly in packet transmission over the internet. The professor emphasizes the importance of understanding the limitations of these codes when applied to binary channels and introduces the concept of concatenated codes as a method to achieve high-performance coding. Additionally, the midterm logistics and review session details are provided, focusing on chapters four through eight of the course material.

Uploaded by

archigoyal16
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

MITOCW | 12ocw-6-451_4-261-14mar2005-220k_512kb-mp4

PROFESSOR: All right, I thought he would be. It's always a hazard putting up somebody who you

know is going to lecture better than you do. But I hope that was a good change of
pace. And, of course, he knows more about the subject of Reed-Solomon decoding

than I or more than practically anyone else does, so -- did he have much time to talk

about decoding or was the most -- yeah. Talked about the decoding algorithms,
Sudan-type decoding algorithms. Yeah? OK. Good. Well, in fact, he covered so

much that I don't have much left to do today.

I remind you, Ashish put out an email. There's a review session tomorrow for the
midterm from 5 to 7 in a different room, 36-112. The midterm is Wednesday in this
room. It starts at 9:05 to give you a little bit more time to work on it. It's closed book

except that you're allowed to make up three normal size sheets of notes. You can
write as small as you like. This experience has shown this is the very best way of

getting you to consolidate what you've learned up to this point, is writing your own
three-page precis of the course to date. So, yes?

AUDIENCE: [INAUDIBLE]?

PROFESSOR: What is the rule, both -- what?

AUDIENCE: Both sides.

PROFESSOR: Both sides. All right. Six sheets. If you can't write down the whole course in six
sheets, then, well, you haven't been paying attention. OK. And you can bring

calculators, but we don't expect they'll be useful. Please erase any erasable will
memories if you do bring calculators. These are things we tell undergraduates. I
guess they hardly seem necessary in a graduate course. Any other questions about
the logistics of the midterm?

AUDIENCE: What is the material going to be for the midterm?

PROFESSOR: Everything that's been covered up to today. I won't hold you responsible for
anything I've talked about today. So it's basically chapters one through eight. I really

1
will only hold you responsible for what we covered in class, which chapters one, two,
and three were so brief, you can't say we really covered them. So really, chapters

four through eight. Any other questions? All right. You'll have more chance as we go
along.

So as I say, Ralf really covered the main things about Reed-Solomon codes, which
are the crowning achievement of algebraic coding theory. They have a lot of
favorable attributes. They are optimum, from an n, k, d point of view, in the sense

that they're maximum distance separable. The parameters are as good as they
possibly could be by this very basic Singleton bound. They have efficient algebraic
decoding algorithms. And traditionally, this has meant hard decision bounded
distance decoding algorithms like the Welch-Berlekamp algorithm or the Berlekamp-
Massey match algorithm, which take hard decisions in, put hard decisions on

symbols out.

But as Ralf suggested to you, now there are algorithms that can use soft decisions
in, produce a list, at least, coming out, some indication of the reliability of each
element of the list. So they've been softened up. Nonetheless, what are the

negatives is, basically, that they work on bytes, or symbols, not bits. So if we have a
basically binary channel, as we very frequently do, then they need something else

to make them adapted to the binary channel, or even on the additive white
Gaussian noise channel.

I don't think we've gone through the calculation. You could think of sending q or e

symbols over the additive white Gaussian noise channel, but what would you need?
You would need a q or e signal set. Now, if you pick that to be a q simplex signals

set or a orthogonal or biorthogonal signal set, then that might work because the

distance structure of such a signal set is like the Hamming distance structure in GF
of q. In other words, it's approximately equally likely to make an error to any other

symbol. The symbols are more or less equally spread apart in Euclidean space.
Just as when we count distance in Hamming space, we give the same weight to

each possible kind of error or each possible kind of symbol difference.

2
So if you did that, then Reed-Solomon codes might work well. That's one form of a
concatenated scheme, which I will talk about in just a second. But if you tried to

apply Reed-Solomon codes to a binary input additive white Gaussian noise channel,
just translate the, say, 8-bit bytes into bits and send them one bit at a time, then the

distance structures don't correspond at all well. It's much more likely to make a

single bit error than an error that involves all eight bits or something. And for that
fundamental reason, you'll find that performance is not so great.

And certainly, this is not the way to get to capacity all by itself unless it's married
with something else. So that's something we always have to deal with in another

applications, as I will talk about in a minute. We naturally do want to correct bytes,

or symbols, or blocks. There are many bits. And then Reed-Solomon codes are just
exactly what we want. They're optimal in any such situation.

And so what did I just say? They're not the way to get to capacity, although they are
sometimes part of schemes for getting to capacity. And since getting to capacity in

the additive white Gaussian noise channel is the theme of this course, then they

play only a peripheral role in that theme. Nonetheless, they're great codes. They're
in wide use. They're part of every semiconductor manufacturer's core library. You

can buy (255,223) distance 33, 16 error correcting Reed-Solomon decoder in part
of any chip package and it'll go at gigabits nowadays. So it's kind of the all-purpose

error correcting code, with the caveat that you need to somehow get what you

correct to be bytes rather than bits, OK?

So today I thought I'd cover the last two topics in chapter eight and maybe, to the

extent that time permits, go back to chapter seven. But we'll see how we feel about
that. Applications of Reed-Solomon codes. OK, basically, you can apply them

wherever you want to correct blocks, rather than bits or packets or n-tuples or what

have you. So one obvious application is, when you send packets through the
internet, if you use Reed-Solomon codes across the packet, then you can correct

errors that occur in individual packets.

This is not the way the internet usually works. The internet usually works by having

3
an error detection code within each packet, a cyclic redundancy check (CRC). You
hash bits, whatever. You provide some redundant bits within each packet if -- when

you receive the packet, you check to see whether the parity check checks. If there
are parity check bits, then a very good rule of thumb is that the probability of not
detecting an error for almost any type of error pattern is going to be of the order of

2 to the minus r. That's exactly what it would be if you had a completely random
stream of bits. Random stream of bits with r redundant bits in a block, the probability
that all the check bits are random, the probability that they're all equal to 0, is 2 to
the minus r. All right?

But it turns out that's very good approximation for just about any error mechanism.
So you pick r large enough. It's often been picked to be something like r equals 32.
What kind of failure to detect does that give? 2 to the minus 32, which is what, about

2 the minus 9th, 2 to the minus 10th -- 10 to the minus 9th, 10 to the minus 10th. Is
that low enough? It was certainly -- when I started in this business, that was
considered just incredibly low, but now that we're sending mega packets per

second, it doesn't seem quite so low anymore. And so 32 is a little bit on the sloppy
side. There are going to be undetected errors that get through if you're spending 10
to the 9th, 10 to the 10th packets. But 32 or, nowadays, 64 would be a better
number. Just like cryptography, the number of bits has to go up as technology goes

up. But 32 is often used.

OK, but here's an alternative scheme. Suppose you coded across packets. So here
is a, say, a 1,000-bit packet. And here's packet one, here's packet two, here's

packet three, and so forth up to packet k, where we're going to use n, k, d equals n
minus k plus 1, RS code over, it doesn't matter, GF of let's say 256, just for definite
[INAUDIBLE]. Here I'm using older notation. This is the same as F-256. Or we

sometimes write that as F 2 to the 8th.

OK, how are we going to do packet error correction? We're going to code crosswise
in columns, all right? So these would be k. Let's make this eight-wide. Just pick off a
bite at a time here. So each of these can be considered to be a symbol in GF of

256. And then we'll code n minus k parity check packets down here. So these are

4
redundant check packets. Encoded bytes at this point. They're not packets yet. But
if we do that column-wise across here, we can fill in n minus k check packets and

you can see the block length is quite huge here. So you might have a delay or
latency problem in executing this scheme. Basically, you would have to send this
whole thing before you could do error correction and then error correct the whole

thing.

But how does error correction work? You send this. Perhaps some of the packets
are erased or even received incorrectly. You would then use some error correction
or erasure and error correction scheme, which Ralf talked about, for correcting the

Reed-Solomon code. And you could correct up to d over 2 errors, or up to d minus


1 erasures, OK?

So this would be a very powerful packet error correction scheme. What are its

costs? It's costs are quite a bit of redundancy. It doesn't fit the internet architecture
very well. This is basically a point-to-point scheme. So when you're sending from a
to b, you'd have to send b an additional n minus k streams. And if you're sending

the same webpage, as you often do, to multiple people, you'd have to send this to
everybody. And it also doesn't have any feedback in it.

The great thing about ARQ is that you only have to send the information that's
necessary. You only have to restore the packets that are actually lost. You send

them again. Whereas here, you're just kind of, a priori, deciding that you're going to
send a great deal of redundant information down here to cover some kind of worst
case, whatever you think the worst-case number of packets would be. If there are

more that worst-case number or packets, then you could retransmit again, but
you're retransmitting this huge block, all right?

So for that reason, this is not actually used, as far as I'm aware, in the internet.

ARQ, wherever you have a feedback path, you should use it. That's a general
moral. So ARQ is a wonderful scheme that is, on an erasure channel, that's a
capacity-approaching scheme. Just on a bitwise basis, if you have a binary erasure
channel with erasure probability p, and then the capacity of the channel is 1 minus

5
p, and you can reach that capacity if you simply retransmit each bit that was erased.
As it's erased -- it's erased, you send it again.

And a quick calculation shows you that the average number of retransmissions is p.

And so the average reduction from 1 bit per second is -- 1 bit per transmission is p
bits per transmission, you can achieve 1 minus p with no coding at all, just by
retransmission. So it's perfectly matched to the erasure channel. And the internet,

especially with parity checks, CRC, within blocks, long enough CRC is basically a
packet erasure channel. You really, in practice, hardly ever make a undetected -- let
a packet with errors come through. OK? Any questions about this?

AUDIENCE: [INAUDIBLE].

PROFESSOR: The TCP/IP protocol just has a 32-bit CRC, I think, and if it fails to check, you ask for
that packet again by packet number. OK? So Reed-Solomon codes, of course, are
optimal if that's what you want to do. You couldn't possibly have a better code for

coding across packets. But it's cumbersome in the packet environment. OK.

Let's talk about concatenated codes. I guess I'll do it at the same place. This is
something -- I did in my thesis and last week I was at a tribute to Andy Viterbi, who

invented the Viterbi algorithm, among other things. Of course, everybody knows that
the Viterbi algorithm, when Andy invented it, he had no idea it was actually going to
be a practical algorithm. It's become one of the super practical algorithms. He was

just trying to prove a certain result, exponential error bounds for convolutional
codes. And he thought the proof was easier to understand if you introduce this
algorithm, which he describes perfectly. It is the Viterbi algorithm. But really, just a
proof technique. Didn't even realize it was an optimum decoder.

And then somebody, not him, not me, but in his company, fortunately, realized that
gee, this could be very practical. And that became a workhorse decoding algorithm.
Very similarly, concatenated codes are something that I did in my thesis in 1965.

And I was looking, basically, for a theoretical result. How could you approach
capacity with an exponentially decreasing error probability near capacity but with
only polynomial increasing complexity? And I looked into this very simple idea of
6
basically taking two codes and using one to correct the errors of the other. And I
found that you could do that.

And I'd also never realized that this might actually be practical. I remember going
around giving talks when I was interviewing afterwards that I was looking for an

industrial position. I went down to places like Bell Labs, IBM labs, and did seminars

talking about codes that were thousands of symbols long and everybody rolled their
eyes and said, boy, this guy is not very practical. And to my surprise and everyone

else's surprise, this became and still is a very practical way of getting high-
performance codes. This, in conjunction -- well, I'll get into it, but this was, again, a

workhorse coding technique and still is in wide use. In fact, as you will see, this

contains some of the ideas that go into modern capacity-approaching codes.

But the idea is very simple. You can do it two ways, or probably many more than

two. Suppose we have a basically binary channel. I'll just say a core physical
channel there. The channel is what you can't affect as an engineer. So let's say we

have a binary symmetric channel or a binary input additive white Gaussian noise

channel or whatever you like. And then we could put on it an encoder and decoder
for an n, k, d binary linear block code. OK?

Now, from the input and the output, what is this channel going to look like? At the

input, we're putting -- basically, per block, we put k bits in. They're encoded into n

bits. They're sent over the channel, received some n-tuple of noisy received
symbols. The decoder tries to find the best code word. The code word is identified

by k bits. Most of the time there'll be no error and these -- let's call this u in and u

hat out. u hat is going to equal u.

But, of course, let's imagine this is a relatively short code. Let's imagine this is a
maximum likelihood decoder or minimum distance decoder in the appropriate

space. Then the complexity is going to be of the order of 2 to the k. And so you

can't let k get very big. We'll find somewhat better ways to do maximum likelihood
decoding. But if you're really trying to do optimum decoding, the complexity is going

to go up exponentially at this stage with the number of information bits k. So this is

7
not the scheme that we're looking for that has only polynomial complexity. We know
that we can choose codes like this that with optimum decoding get to capacity.

That's Shannon's theorem. But we don't know how to decode them with reasonable

complexity.

But, all right, here's a start. We could use a relatively modest length code that we
can feasibly decode. And this is what we get. OK. So we have blocks going in. Block

from an alphabet of size 2 to the k. Blocks coming out, alphabet size 2 to the k.

Occasionally, we make errors. What's a good idea to then do? This is not very hard.
This is why you should have been in this field in the '60s rather than the year 2005.

There were lots of good ideas just laying around waiting to be picked up. Class?

Anyone have a suggestion for how to make further improvement in this system?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Add another code. What should it be? Let's put an encoder here for what?

AUDIENCE: [INAUDIBLE].

PROFESSOR: Everyone is extremely wide awake. An n, k, d Reed-Solomon code over what field?

In principle, it should be over F2 to the k. Now, of course, if k is 64 or something

here, you might not even go all the way to field of 2 to the 64 elements. You might
divide it up again as we did in the packet error correction. But basically, the way this

goes is you try to keep this as small as possible so that this can still work.

There's a tradeoff between the complexity of the two codes and therefore, the rates.

The decoder -- here we have an algebraic Reed-Solomon decoder, which, in the


simplest case, could just do error correction. Just take these k bits as a hard

decision on u hat and try to correct any errors that occurred in this inner loop here,

which we might think of this as being a 2 to the kre channel. So this whole thing
together is a 2 to the kre super channel. Sorry if you can't read that, but you heard

me say it.

So here's a channel that takes in 2 to the kre symbols, puts out 2 to the kre symbols

and sometimes makes errors according to whatever the error probability you can
8
achieve here is. Or some of the discussion back when we were talking about hard

decision with binary suggested that it would be better to pass along more reliability

information here along with just the hard decision. And the very simplest kind of
reliability information is if this guy really isn't sure, if either the noise is too large so

he can't make any decision or if the noise leaves him exactly halfway between two

code words so it's a completely ambiguous decision, or whatever criteria you have,
this might make erasures.

And I'm sure Ralf told you how to do error and erasure correction with Reed-

Solomon codes. Or this could be more likely that information just -- how likely was

the code word you actually saw? And there are various ways of devising an
approximate likelihood metric. You just do what makes sense that indicates that

some words, you're absolutely sure of in this decoding. And other words, you don't

know -- if you have one or more good candidates or you have no good candidates.
So based on this reliability information, you can actually incorporate that. Did Ralf

talk about reliability-based decoding? Soft decoding algorithms for Reed-Solomon

codes? No? OK, well, that's a shame because he and Alex Vardy are the inventors
of a very good class that's recently been improved.

So one of the knocks on Reed-Solomon codes is that, at one point, you might have

said a minus can only work with hard decisions. And then, fairly early, it was figured

out how to correct to erasures. But just in the last 10 years, we now have -- this is
no longer true. We can have soft, meaning reliability-labeled inputs. And this is a

plus. That means we can do soft -- and we have corresponding algorithms that can

do soft Reed-Solomon decoding. They are, of course, more complicated than just
straightforward error correction with hard decisions. But there's still polynomial

complexity. They're maybe an order of slightly higher degree of n, but still

polynomial in the block length or in k or d or whatever you're taking. And so we now


have some such algorithms that do make substantial gains.

And, of course, if he didn't tell you about them, I'm not going to tell you about them.

It's a rather specialized subject. But it's something that's really happened the last 10

years. And they've constantly been improved. The first ones, like one I came up with
9
called generalized minimum distance decoding, made a modest improvement. In dB

terms, 1 dB, where there's 3 dB to be got. Now they're up to 2, 2 1/2 out of the 3 dB

with these new soft Reed-Solomon decoding algorithms.

So anyway, if you pass some reliability information here, then you can improve the

performance of this decoder. This decoder knows which symbols are reliable and it
won't change in the decoding in which are unreliable or completely erased. Where

there's no information about them, that actually helps you in the decoding. If you
knew that you had s erasures rather than s errors, this helps you quite a lot. We can

decode twice as many erasures as we can decode errors. And similarly, soft

information in between can improve the capabilities of your decoder.

All right. So that's the set up. You can see why it makes a certain amount of sense.

From a complexity point of view, what it does -- here, we're going to do maximum
likelihood decoding, complexity order of 2 to the k. But we're not going to let k get

very large, so this is feasible. Then we're going to let the code get very long, as we

know we have to do from Shannon's Theorem, using this outer code. But now we
know we have polynomial complexity Reed-Solomon decoding algorithms, even for

the soft decision case.

So if we balance the complexity here, basically let this length be exponential and

this length, as it is -- it's of the order of 2 to the k itself -- then we can get polynomial
complexity overall, all right? The overall complexity is going to be of the order of a

small power of 2 to the k. The overall block length is going to be of the order -- is

going to be linear in 2 to the k. And therefore, you're going to get polynomial


decoding complexity.

The other question is performance. And you can show that if you -- well, even if you

don't use reliability information, you can get all the way to the capacity of this

underlying channel here with exponentially-decreasing error rate for any rate below
that capacity. Not as good an exponent as if you just used random codes per

Shannon and coded for the channel, but nonetheless, you basically get what you

want. So using two smaller, simpler codes to create one long code turned out to be

10
a good idea.

First of all, for proving this theoretical result, you could get exponentially-decreasing

error rate for polynomial increasing decoding complexity. And within about 10 years,

this is what people actually did. And, in fact, what people actually did -- I've got a
block code here, but what people actually did was they used a short constraint

length convolutional code here, which we'll be talking about after the break.

Convolutional codes tend to have a better performance complexity tradeoff,

particularly in soft decision situations than block codes do. So in a practical system,

it's almost always better to use a convolutional code than a block code. We'll
discuss the reasons why.

So this convolutional code here, there is a maximum likelihood sequence detector

for the convolutional code down here, which is the Viterbi algorithm, which you've

probably all heard about. And this is very practical as long as the constraint length,
k, now becomes the constraint length, nu, of the code doesn't get too large. So for

about 20 years, the standard for space communication was this was a (255, 223,

33) Reed-Solomon code over F-256. This is a constraint length 6, rate 1/2
convolutional code. This is a 64-state Viterbi algorithm.

So what's the -- again, if we have a look at the error channel, here is a stream of

bits coming along. After encoding and decoding, what we typically see is bursts of

errors, all right? So you get a burst that starts -- in convolutional codes, it can start
at a random time, end at a random time. You might get another discrete error event

over here. In other words, most of the time, the decoder is decoding properly. What

you're receiving is what you transmitted in a streamwise fashion. The bits just are
transmitted forever. But every so often, the decoder gets off on the wrong track,

puts out errors for a while, then gets resynchronized, and decodes correctly again.

Is my picture clear? I'm using words, mainly, to tell you what it represents. OK.

So now, basically, what you want to do is, just as we did in the packet
communication case, you want to send multiple parallel streams, up to 256 of them,

such that in this stream, you're going to get independent error blocks. And this one,
11
you're going to get errors occurring somewhere else. They're going to have a
typical length. You can very well estimate what the typical length of error events is,

but they occur randomly. And now you use the Reed-Solomon code, coding across

this, and that will basically give you this picture over here.

How would I get this? One way is by using a block interleaver, which is described in
the notes. A block interleaver, you basically take a long block from the convolutional

code and you just encode and transmit along here and ultimately decode. And after

a very long time, you come back and start transmitting the second row just as part
of the same stream. And then the third row and so forth. And if the block length n,

here, is long enough, the errors in different rows will be effectively independent, all

right?

So again, you wind up with very long blocks in this scheme, but analytically, it works.

And in practice, this is roughly what's done, except there is such a thing as a
convolutional interleaver, which again, operates streamwise and actually performs

better in practice. But I won't take the time to go into this because we're not going to
really be getting into things like interleavers. Do you see how this works? OK. So it

works well theoretically. It gives us the theoretical result we want. Works well in

practice.

As I say, this was the standard for space communications during the '70s and the

'80s. Around the end of the '80s, they upgraded this. First of all, they replaced this
by a nu equals 14, rate 1/6, lower rate convolutional code, so this now becomes a 2

to the 1fourth state Viterbi decoder. They built this huge rack at JPL that had 8,192

parallel decoding units, add-compare-select units in the Viterbi algorithm, to decode


this thing. But, of course, they only need to do that in one place, in the space

program. Everything comes back to JPL. So they built this BVD, Big Viterbi decoder.
And first, they used it with this code and then they souped up the Reed-Solomon

code. And by about 1990, 1991, they got this thing operating up within about 2 dB of

the Shannon limit, OK? Which was considered a heroic feat at the time, or even
now.

12
So that kind of tells you, in the terms we've been using in this course, just what you
can get out of this kind of approach. At that point, that was considered coding is

dead. There's really nothing more to do. We're within about 2 dB of the Shannon

limit and it's inconceivable that we could actually get any closer than that. Second
half of the course will, of course, tell us how we can get within 0.0045 dB of the
Shannon limit. But that was the state of the art as of about 1990, 15 years ago.

Questions? Comments? OK.

All right. I'll just mention a third application is burst error correction, which really,
we've already been talking about. Suppose -- let's let this be a naked channel now.
Suppose you're transmitting with an interleave scheme like this. And now, in the

same way -- it's a radio channel or something and every so often, you have an
outage just due to fading, or a thunderstorm, or somebody turns on his vacuum
cleaner, or whatever. Every so often, as you transmit, you're going to get bursts.

You don't have independent, identically-distributed errors. The character of the


channel is such that you see bursts of errors, which is precisely what you do in this
super channel here. Every so often, you see a bunch of errors if you have the

original transmitted scheme to compare it to.

All right. In exactly the same way, you can code -- with something like this block
interleaver, you can code across channels and correct the bursts with Reed-
Solomon codes. And here, it's quite easy to show that, because Reed-Solomon

codes have optimal parameters n, k, d, that Reed-Solomon codes do the best


possible job of correcting burst errors that you possibly could do, given some
specification for how long the bursts could possibly be and the minimum guard

space between bursts. This is classical stuff. Reed-Solomon codes are optimum
burst correctors if you describe the burst channel by max burst length in a minimum
guard space. Is that too brief? Probably is. Give you some impression of what we're

talking about.

So Reed-Solomon codes are optimum for burst error correction, which, most real
channels are bursty. The only non-bursty channel that we really have is the additive
white Gaussian noise channel, ultimately. That is the channel in space

13
communications, which is why you can apply a nice, textbook-type code like this to
the space channel. For any other channel, typically, you have to use some kind of
interleaving to break up the bursts so that you can -- this simply is no information

over a long time. If you tried to encode crossways on this, you'd have to have a
code that's long enough to see the average burst's behavior. We'd make the burst
have an ergodic model. The way to do that in practice is to code across it so you get

independent snippets of widely separated bursts which can be assumed to be


independent. Yeah.

AUDIENCE: [INAUDIBLE]?

PROFESSOR: No different, really.

AUDIENCE: [INAUDIBLE].

PROFESSOR: Yeah, in space channel -- all right. Here, I'm talking about power limit. That's a good

question. But if you think about the band-limited regime, let's suppose -- so this
would be some [? MRE ?] channel. You would use some multilevel signaling
scheme like QAM with a large signal set or QPSK or something like that to get a

basic transmission with a discrete signal set that doesn't lose too much over
sending Gaussian over the channel. Again, this will be later in the course. We'll talk
about this in detail.

Other than that -- well, all right, so now this is an MRE channel. If M is large enough,

then you could apply Reed-Solomon codes right here. But typically, it's not. M is like
16 or 64. And so Reed-Solomon codes would still be too short. You would not be
able to get long enough codes over such a channel. So a similar kind of thing can

be done. And the history of bandwidth-limited coding pretty much parallels that of
power-limited coding.

What corresponds to block codes here is lattice codes for this channel. What
corresponds to convolutional codes is trellis-coded modulation. And again, once you

handle the basic physical channel with these codes that are well matched to that
channel but whose complexity increases too rapidly if you make them long because

14
of exponential decoding, then you can always clean up whatever errors you make.

You can think of this as a cleanup function out here. Typically, the rates out here

are very high. You don't need much redundancy. It doesn't cost too much in overall
rate. But even with very high rates, like 223 over 255, you can do a rather large
amount of error correction. You can correct up to 16 errors or 32 erasures with us

channel. So this is a cleanup function.

You basically work very hard in here, get the best error probability you can, which --
it doesn't have to be too low. 10 to the minus 2, 10 to the minus 3. And then you use

this to clean it up and that drives the error probability down to 10 to the minus 12 or
10 to the minus 15 or whatever you really want. But the principles are the same in
power limit and band limit. But nice question. Any other questions? OK.

So as a result, Reed-Solomon codes are the general purpose code that has

emerged from algebraic coding theory. A good example of burst error correction is,
you know on your CDs, there is an error correction scheme incorporated such that if
you scratch the CD, it still works. How does that work? It could only be because

there's some error correction in there.

What, actually, is the error correction scheme? I'm not sure I know what the latest
and greatest is, but generally, the error correction schemes for magnetic memory
recording like that have used Reed-Solomon codes -- have used two Reed-

Solomon codes in interleave fashion like this. So the across code is Reed-Solomon.
The down code is Reed-Solomon.

As a result, if you make a scratch through the disc, you will cut all of these codes,
but you will cut each one in a small enough place so that you can correct it. So you

can think of this as a actually a physical laying down of Reed-Solomon codes on the
two-dimensional surface of the disc in such a way that -- well, even if your scratch
came right across one whole channel here, then, of course, this decoder would be

dead. But you still have somebody decoding in this direction to pick up the errors.
And so that's why you can completely abuse these discs and they still work.

15
And when you think about what's actually going on, think of the transfer rates from a
CD to your computer. It's up in the hundreds of millions of bits per second. Maybe

its gigabits by now. Perhaps it is. And you've got this Reed-Solomon -- these are
non-trivial Reed-Solomon decoders. These are like this 16 error correcting decoder
out here. And so these are working away at that data rate, capable of doing

decoding. And they're cheap enough so that they're just buried in part of one of the
integrated circuits in your disc player circuitry.

So that'd give you a physical sense for just how much error correction power and
speed you can get out of this class of code decoder, OK? So wherever you want an

extremely powerful code and you're a little bit less concerned with getting to
capacity, Reed-Solomon code is the way to go. It's absolutely part of your toolbox,
OK?

The last topic in chapter eight is BCH codes, which are binary codes. So they're

binary linear block codes of the class we've been discussing, comparable with
Reed-Muller codes. Reed-Muller codes were discovered in 1954 in two separate
papers by Reed and by Muller in two separate points of view. And there are a

million points of view you can have for Reed-Muller codes. We've used yet a third
one.

BCH stands for Bose and Chaudhury and Hocquenghem. Bose and Chaudhury
were both professors in the US. They wrote a paper about 1960 introducing Reed-

Solomon codes, which, by the way, and introducing BC codes -- Bose-Chaudhury


codes -- Reed-Solomon's paper was in 1960 also. These were completely
independent of each other. And then this fellow named Hocquenghem in France,

whose name I can't and few people can pronounce, turned out he had actually
invented these codes in 1959 in a paper that unfortunately was in French, so few
people read it.

But anyway, in honor of all three of these people, these are called BCH codes.
These fit right into the main theme of algebraic coding theory at that time. Maybe
stimulated it because these turned out to be cyclic codes, or at least as they were

16
originally introduced. And so when I was in school, which was in the early '60s,

these were considered the greatest thing going. People were interested in binary
codes, algebraic codes because they are slightly better than Reed-Muller codes.

Now, I would say, there's sort of been a swing back. Shu Lin gave a paper about 10

years ago that said Reed-Muller codes are not so bad. In fact, in terms of
performance versus complexity for certain kinds of decoding schemes, trellis-based
decoding, Reed-Muller codes are better than BCH codes. So from a more modern

prospective, we're more interested in Reed-Muller than we are in BCH.


Nonetheless, you'll hear about these a lot if you read the literature and so you ought
to know what they are.

OK. Conceptually, what are they? Since we already know about Reed-Solomon

codes, the way that I'm going to characterize BCH codes is as subfield subcodes of
the Reed-Solomon codes. Now, if we have a Reed-Solomon code over F2 to the 8
or something, the field with 256 elements, or any power of 2, contains a subfield

which is just 0 and 1, all right? All fields contains 0 and 1. If the field has
characteristic 2, then 0 and 1 all by themselves are the prime subfield. So it's
possible that there are certain words in the Reed-Muller code -- sorry, in the Reed-

Solomon code that are completely made up of 0's and 1's.

For instance, the all-0 word is certainly in the Reed-Solomon code, so we have, at
least, that one that's made up of all zeros. Almost always, we find that the all-1 word
is a code word. So we have a code with all 0's and all 1's. And there might be more

of them. And that's actually very straightforward, to find out how many completely
binary words there are in a Reed-Solomon code.

So it's the binary subfield. We have F2 is a subfield of F2 to the M for all M. OK, so
a subfield subcode, if we have an n, k, d equals n minus k plus 1 Reed-Solomon

code, then the set of all BCH code is the set of all binary code words, code words
that just use the 0 and 1 of the code alphabet. Call this c in C. So we call this C
prime.

OK, so what are its parameters going to be? Its length is still going to be n, right? All
17
the code words here are n-tuples, so these have got to be n-tuples. We don't know
its dimension. Let's just call it k prime. That's how many such code words are there?
We're going to have to do some combinatorics to find that out. And what's the

minimum distance going to be? Anybody?

AUDIENCE: [INAUDIBLE].

PROFESSOR: In general, well, it's certainly not going to be less than because, by definition of the
minimum distance, the minimum Hamming distance between any two code words in
this code is d, so we're just looking at a subset of the code words in this code. And

the minimum squared distance has to be at least d. There's an example given in the
notes. If we let c be the 4, 3 -- I think it's the 4, 2, 3, over F4, then C prime turns out
to be the 4, 1, 4 code over F2.

In other words, the only two code words in this code that are all binary are the all-0
and all-1 sequences. All right? So we actually get a minimum distance of 4. It has to
be at least 3 in this particular case, so here's an example where it could be greater
than 3. Or if there are cases where the subcode just consists of the all-0 word, then

that, by convention, has an infinite minimum distance. So distance could be greater,


but it never can be less than what you started with.

On the other hand, the dimension is clearly going to go down. If it were still equal to

what it was, then we'd have a binary MDS code and we know we don't get binary
MDS codes. So in general, k prime decreases quite a bit. In this case, it just
decreases by 1. And we have to find out how it decreases.

OK. I'm a little uncertain as to how much of the derivation -- so we need to find --

AUDIENCE: Is that [INAUDIBLE] at all points of the field, is it?

PROFESSOR: No. One of the characterizations you have of Reed-Solomon codes, I assume, was

that they're the set of all polynomials of length n that evaluate to 0 at some
consecutive set of n minus k powers of alpha. Did you see anything like that?

AUDIENCE: Yes.

18
PROFESSOR: All right. In other words, they're -- let me write that down because RS code is the set

of all multiples f of x, g of x, of degree less than n where of g of x equals the product


of x minus alpha to the i for i equals 0 to n minus k minus one or something like that.
This is called a generator polynomial and -- so what does this mean? In other

language, all polynomials of degree less than n that have roots at 1 alpha, alpha
squared, or so forth. It's better to make this i equals 1 to n minus k. You can also
make it minus i. There are lots of details in here. But I'll just ask you broadly to recall

that characterization. This is a --

AUDIENCE: I have another question. [INAUDIBLE] are not linear?

PROFESSOR: OK, let's ask that. So we've got a set of binary n-tuples, OK, defined as the subset.
So what do we need to do to see whether it's linear or not?

AUDIENCE: [INAUDIBLE].

PROFESSOR: We just check the group property. Is it closed under addition? Suppose we take two
of these code words we add them together, component-wise, vector form. Seems to

me we get A, that gives us another code word in the Reed-Solomon code because
it's closed under addition, and B, by the addition rules, it's going to be another
binary code word, all right? So it's another binary code word in the RS code,

therefore, it's in C prime. So that's proof that, in fact, this code is linear.

AUDIENCE: What about [INAUDIBLE]?

PROFESSOR: That's trivial. For a binary code, we only have to consider two scalars, 0 and 1.

AUDIENCE: [INAUDIBLE]?

PROFESSOR: But now I'm asking whether this binary code is linear. There's the code over F2.
Yes? You had a --

AUDIENCE: Same question. The base field is F2 to the x, so if you add 1 and 1 there, it wouldn't
be just 0. But I guess if you restrict the field to F2, you have to check everything
again, don't you?

19
PROFESSOR: No, because we said F2 is the subfield of F2 to the M. And what we actually showed

was that if you just stay in the prime field, the integers of the field, then you get
exactly the same addition and multiplication rules as you do in the prime field. So in
fact, in a field of characteristic 2, we always have 0 plus 1 is 1, 0 plus 1 plus 0 is 1, 1

plus 1 is 0. That last one is the only one you really have to check. So that accounts
for the definition of characteristic, and that's the reason we can always use just plus
signs, because subtraction is the same as addition.

AUDIENCE: [INAUDIBLE]?

PROFESSOR: Yeah. We showed -- well, very simple proof. The order of a subgroup has to define

the order of the group it's in. The additive group of the prime field has order 2 to the
M, so it has to be some divisor of 2 to the M, which is a prime. I think we proved that
another way, but that's a very simple proof. OK, great questions, because it shows

you're thinking about how this relates to what you learned back in chapter seven.
But at this point you've kind of got a little vague about chapter seven, what's really in
there. By Wednesday, you're really going to know chapter seven well. Let me tell

you. OK. Good. Thank you for the help. Because these are all things that I should
have just said up here, but they come out much better, actually, if they come out in
response to questions. OK.

So here's one characterization of a Reed-Solomon code which apparently you did

see. It's the set of all multiples of some generator polynomial. The generator
polynomial is described as this polynomial of degree n minus k whose roots are
precisely alpha up through alpha to the n minus k. You did say that? OK. So

similarly, the BCH code -- I'm just going to assert this just to show you how it goes.
It's the set of all multiples of g prime of x, which is the minimal degree binary
polynomial with the same roots. Now, how can a binary polynomial -- alpha to the n

minus k.

How can a binary polynomial have non-binary roots? Well, it clearly can. Let's
consider g of prime of x. It's an element of F2 to the x, the set of all polynomials
over F2, which is a subset of the set of all polynomials over 2 to the M for the same

20
reason. F2 is the subfield of F2 to the M, so some of the polynomials over F2 to the
M have purely binary coefficients, all right? So that's the sense -- that's how we can
get between these two things.

So some binary polynomials have roots in GF of 2 the M let me give you an

example. Let's consider x squared plus x plus 1 as a polynomial in F4 of x, which is


certainly is. It's a polynomial of degree 2 and F4 of x. What are its roots in F4 of x?
Does it have any roots in F4 of x? Does it have any roots in F2 -- sorry, I'm asking

whether it has any roots in F4. Now I step back and ask if it has any roots in F2. So
how do we do that? We just ask if F of x is 0 for x equals 0. We get 1. So it's not a
root of 0. Substitute 1, we get 1 plus 1 plus 1. 1 is not a root F of x.

All right. Well, we have two more candidates. How about alpha? For alpha, we get F

of alpha equals alpha squared plus alpha plus 1. And if you remember the addition
table of F4, however it's constructed, this equals 0, OK? 0, 1, alpha, alpha, squared.
0, 1 alpha, alpha squared. 0, 1, alpha, alpha, squared. This is plus. 1, alpha, alpha,

squared. 0, 1 plus alpha is alpha squared. 1 plus alpha squared is alpha. Any of
these is telling you the same thing. 0, 1, 0, 1 alpha, squared alpha.

So we conclude from that that 1 plus 1 plus alpha, which is alpha squared, plus
alpha squared is 0. So alpha is, in fact, a root of this. And what about F of alpha

squared? Alpha squared squared is alpha fourth, but we can reduce that mod 3 --
the exponent mod 3. So this is alpha plus alpha squared plus 1 and that is equal to
0. So alpha and alpha squared are roots of this polynomial.

There's a lovely fact proved in the notes, which is that every element of -- facts.
Every element of F2 to the M is a root of x 2 to the M plus x.

OK, did you see something like this last time? I see some shaking of heads. OK.
And how many roots could there possibly be of x 2 to the M plus x by the

fundamental theorem of algebra?

AUDIENCE: 2 to the M.

PROFESSOR: 2 to the M. So that means that x 2 to the M plus x -- and for every root, it's got to
21
have a factor of the form x minus, or in this case, plus beta. So this must have a
complete factorization of the following form. x 2 to the M plus x is simply the product

of x minus beta. All right. In fact, you can see that 0 a root, obviously. 1 is a root,
obviously. And we've checked up here. Let's see, what are we interested in? Here's
another example. x to the fourth plus 1, f of -- if we have g of x equals x to the four -

- sorry -- plus x, then g of alpha is alpha to the fourth plus alpha. But alpha to the
fourth is equal to alpha, so that's 0. g to the alpha squared equals alpha to the 8th
plus alpha squared. Alpha to the 8th is alpha to the fifth is alpha squared, so that

equals 0. So, in fact, that's true for that case.

Every irreducible polynomial g of x and F2 of x whose degree g of x divides 2 to the


M -- is that right? This may be 2 to the M minus -- I think it's whose degree divides
M -- is a factor of x to the 2M plus x in F2 to the x. OK. So let's give you an example

again. Example M equals 2. This means, what are the irreducible polynomials
whose degrees divide 2? The prime polynomials of degree 1 certainly divides to our
x and x plus 1. Do they divide x to the fourth plus 1? Yeah, they do.

How many prime polynomials are there of degree 2? There's only one, x squared

plus x plus 1. And if you divide it out into x to the fourth plus 1, it goes evenly. It
turns out that what this implies is that this is actually an if or only if. A polynomial g of
x divides x to 2 to the M plus x if and only if g of x has a degree which divides M and

g of x is irreducible. So this implies that x to the M plus x factors in F2 of x into the


product of all irreducible polynomials whose degrees divide M. And I think you did a
homework problem that used this, even though it was never proved in class. So you

at least read about it.

Again, for this example, that should mean that x to the fourth plus 1 equals x times x
plus 1 times x squared plus x plus 1. Is that true? Pretty easy to see that it is. x plus
1 times x squared plus x plus 1 is just x cubed plus 1. Multiply by x -- I'm sorry, this x

fourth plus x. How many times have I done this? Should be x to the fourth plus x.
Yes. OK, it's kind of miracle when you first see it. This works over any field for any
PNM again, lest you get this kind of factorization.

22
OK, but now, comparing this to this, what does that mean? That means this is one
factorization. This is in -- we can do a complete factorization in F2 to the M of x. We
can only do complete in the sense that we reduce it to a product of degree 1
polynomials. In F2 of x, we can't do a complete factorization. This is similar to, over

the complex field you can always completely factor a polynomial into one-
dimensional polynomials of this form, where beta is the set of roots of the
polynomial. Whereas over the real field, you might have to have some degree 2

polynomials in there.

This is exactly analogous. You can't get complete factorization over this subfield.
But what does it mean? Here we have a polynomial of degree 4, has 4 roots. We

factor like this. So each of these factors must contain a number of roots equal to its
degree. So what are the roots? This is the polynomial that has root 0. This is the
polynomial that has root 1. As we just showed, this is the polynomial that has the
roots alpha and alpha squared, OK? So it's always going to turn out that way.

So you can group the field elements into subsets where each subset is the set of
roots of some irreducible binary polynomial of some degree that divides M, OK?
These are called cyclotomic subsets. Whence the name of Elwyn Berlekamp's

company, Cyclotomics. All right. So this is where, I think, the theory gets just
absolutely lovely and unexpected, at least when you start in. And people who have
a taste for these things, I encourage you to read a real book on the subject.

Berlekamp's is one of the most elegant. But there are lots -- because it's an elegant
subject, there are lots of elegant treatments of it. All right.

So remember what we were trying to do in the first place. We were trying to find the
dimension of BCH codes. So where are we now? We've defined a BCH code as the

set of all binary polynomials with roots alpha up to alpha to the n minus k where n
and k are the parameters of the parent Reed-Solomon code. And alpha is in the
parent field. So alpha in F2 to the M.

OK. One final fact, I guess, we need to know here. All right, if alpha is a root -- now

I'm talking about any field -- then alpha squared is a root of a binary polynomial.

23
And again, this is due to a very lovely fact, which is that squaring is linear in fields of
characteristic 2. Why is squaring linear. A little side calculation. a plus b squared

equals a squared plus 2ab plus b squared but characteristic 2. We can wipe that
out. So in general, f of x quantity squared is f of x squared. And that's why, if f of x
evaluates to 0, then f of x squared must evaluate to 0 since 0 squared is always

equal to 0.

All right. So alpha is a root if and only if alpha squared is a root. Is that true here?
Yes. Alpha alpha squared. Well, is it also true if we squared alpha squared? What is
the square of alpha squared? It's back to alpha, all right? So these sets now turn

out to be cyclotomic cosets, they're called, which is the set of all elements and their
squares. We get alpha, alpha squared, alpha fourth, alpha eighth. And then, at
some point, it cycles. Why? Because what does alpha to the 2M equal?

AUDIENCE: [INAUDIBLE].

PROFESSOR: No. Alpha. Right. Alpha to the 2M minus 1 is 1, so alpha to the 2M is alpha. So it

comes back to alpha again because of that. So we get a finite number of roots
which, of course, are equal to the degree here. In some cases, it's fewer. It's
whatever the degree is. It's d, which divides M.

So from this, we simply start -- suppose we want alpha to be a root. Let's start with

high rate codes. I think here in the example, in the text, I used F16. F16 is going to
have -- the elements of F16 are going to be the roots of all the degree 1, 2, and 4
polynomials because M is 4 and these are the degrees that divide 4. So again, we

get x and x plus 1, which have roots 0 and 1. We get x squared plus x plus 1.

What are the roots of x squared plus x plus 1 and F16? There are only going to be
two of them and they've got to have the property that if we start with the first one,

let's call it beta and beta squared. Beta fourth has got to equal beta because we
only have two of them. That more or less forces beta to be alpha to the fifth or alpha
to the tenth. Let's see if that works. We get alpha to the fifth, alpha to the tenth.
Square that, we get alpha to the 20th. But we now take the exponents mod 15 and

we get 5 again, all right? So those are the two roots of x squared plus x plus 1 in
24
F16. Obviously, I'm not -- I guess that was a legitimate proof, but it was certainly
quick.

OK, there are three polynomials of degree 4. By the way, I think you can imagine
how, given this result, we can now enumerate the polynomials of each degree. We
now have an expression. There are two irreducible polynomials of degree 1. So how

many more of degree 2 is it going to take to make the overall degree 4? One of
them. So we're looking for a single -- there must be a single irreducible polynomial
degree 4. Similarly over here, all right, we've accounted for four of the elements of
F16. The only other possible degree we could have is 4. So how many irreducible
polynomials are there of degree 4 over in F 2 of x? Must be three of them. And it

always works out.

So, from this you can get a closed-form combinatorial formula again in terms of the
Mobius inversion formula. By doing a sieve or by some other method, you can find
out that the three irreducible polynomials of degree 4 are those three and the

corresponding roots for this one -- if this is the irreducible polynomial you use to
define the field, then alpha is a root of it and you're going to get alpha, alpha
squared, alpha fourth, alpha to the eightth. Alpha to the eightth squared is alpha to
the 16th, but that equals alpha, OK?

So these are the roots of that equation. Here you get -- sorry, x fourth plus x third
plus 1, since that's basically the reverse of this, turns out you get alpha to the minus
1, which is alpha to the 1fourth. Square that and you get alpha to the minus 2, which
is alpha to the 13th, or equivalently, alpha to the 28 -- reduces to thirteen. This
becomes alpha the 11th, alpha to the seventh. and then alpha to the seventh gives
you alpha to the 1fourth again.

AUDIENCE: [INAUDIBLE] same thing about taking the square [INAUDIBLE]?

PROFESSOR: Correct.

AUDIENCE: --from the [INAUDIBLE]?

PROFESSOR: Because you go around in a circle. So it's a finite set. So you can obviously divide by
25
2. So there's one cyclotomic coset. Here's the second one. And then the third one
must be whatever's left over. What's left over? We don't have alpha to the third yet.
Alpha to the third, alpha to the sixth, alpha to the 12th, and then alpha to the
2fourth, which is alpha to the ninth. And alpha to the 18th is equal alpha to the third
again.

OK, what's the moral of this? We want to start off with -- let's just take a look at the
Reed-Solomon, the very first one. We just want n minus k to be 1 and we just want
to have root alpha in here. Well, if we get root alpha, we're going to get root alpha
squared, alpha fourth and alpha eightth along with it. If we want a binary polynomial

with root alpha, it has to this one. And it's going to have four roots. All right.

So for root alpha, we're going to get x fourth plus x plus 1 as our generator
polynomial. And it's actually going to have roots alpha and alpha squared. So it's
going to be the same one as you would get if you wanted alpha and alpha squared.

That means n minus k is already up to 2, which means your distance is already up


to 3. Distance is n minus k plus 1. So this leads to a code with n equals 16 -- is that
right? Yes. BCH codes are always defined to be one shorter. We're going to have
four redundant bits, so k equals 12 and distance equals 3.

I think that's not quite right. When we're doing BCH codes, they're always defined
just on the non-zero elements of the field. So we start with the shorter Read-
Solomon code of length q minus 1. 15 in this case rather than 16. And 0 never
appears here as one of the possible roots. So let me just do that. I apologize I
haven't carried through all the possible steps. So we're getting all of the polynomial
multiples of x fourth plus x plus 1 of degree less or equal to 14 rather than 15. So

it's n equals 15. n minus k is going to be 4. So this is 11. And the distance is going
to be greater than equal to 3. In fact, it is 3. So it's a 15, 11, 3 code, which is, in fact,
the Hamming code. Yes?

AUDIENCE: [INAUDIBLE]

PROFESSOR: Oh, I'm sorry. Wasn't this fun? You're supposed to do that, Ashish.

26
AUDIENCE: [INAUDIBLE].

PROFESSOR: OK, well, so you can see how it goes. So this gets us both roots alpha and alpha
squared. If we want alpha third and alpha fourth, we already have alpha fourth from
this polynomial. So all we need is this one, which gets us another. And that gives us

an n equals 15, k equals 7, d greater than or equal to 5, which, in fact, is a 15, 7, 5


code. So we get additional codes. There's a table in there that shows you that most
the time for short block lengths for distances which are equal to a prime power, we
get the same parameters as Reed-Muller codes. For lengths 64 and higher,
sometimes we do a little bit better.

In addition, we get a lot more codes for every, basically, odd distance when you
take the length odd or even distance if you take the length even. And that's
definitely an asset, that you have a larger number of codes to choose from. So
there's no doubt that in terms n, k, d, BCH is the larger and, for longer block length,
slightly better class than Reed-Muller codes. But as I said before, in terms of

performance versus complexity, Reed-Muller might still be the ones you want to
use. OK. Good luck in the review and good luck in the exam. I will see you
Wednesday.

27

You might also like