DISTRIBUTED VIDEO CODING WITH ZERO MOTION SKIP AND
EFFICIENT DCT COEFFICIENT ENCODING
Guogang Hua (1) and Chang Wen Chen (2)
(1) Dept. of Electrical & Computer Engineering, Florida Institute of Technology, Melbourne, FL 32901 USA
(2) Dept. of Computer Science & Engineering, University at Buffalo, State University of New York, Buffalo, NY 14260 USA
ghua@[Link], chencw@[Link]
978-1-4244-2571-6/08/$25.00 ©2008 IEEE, ICME 2008

ABSTRACT

In this paper, we propose a suite of efficient schemes for representing the spatial and temporal correlations of video signals in distributed video coding (DVC). These schemes consist of zero motion skipping, Gray codes, and sign bit coding of DCT coefficients, developed to improve the overall rate-distortion (R-D) performance of distributed video coding. Existing schemes have focused on exploiting the spatial and temporal correlation without sufficiently investigating the efficient representation of that correlation. We believe this is one of the reasons that a substantial gap still exists between current DVC and traditional hybrid video coding. In traditional hybrid coding, a suite of efficient representation schemes such as zig-zag scan, run length code, and skipped macroblock has contributed significantly to excellent R-D performance. The efficient representation schemes we developed for DVC in this research likewise lead to improved rate-distortion performance compared with existing DVC schemes. We present the overall framework for the proposed zero motion skip and the analysis of efficient representations of both DCT coefficients and their signs. The experimental results show that distributed video coding based on these efficient representations achieves considerably improved rate-distortion performance over existing schemes.

Index Terms: Distributed video coding, zero motion, DCT coefficients

1. INTRODUCTION

In traditional video compression, as standardized by MPEG and ITU-T H.264, the encoder is much more complicated than the decoder. This class of codec architecture has been driven predominantly by the broadcasting or "downlink" nature of traditional video applications. With the advances in contemporary technologies, emerging applications demand a low complexity encoder, in particular for mobile reception devices. With more and more mobile handsets supporting multimedia capture, playback and communication, the new media-rich "uplink" wireless video transmission applications need a total redesign of these traditional downlink-friendly video architectures. The new architecture calls for low-power and low complexity video encoding at the mobile sensor unit. Motivated by these emerging applications, DVC schemes [1][2][3] were recently developed based on distributed source coding theory.

The traditional hybrid video encoder exploits the spatial and temporal redundancy existing in video sequences. It is the search for such correlation that requires a huge amount of computation. In the new video coding standard, H.264/MPEG-4 AVC, in particular, numerous new features (such as rate-distortion optimization, intra-prediction, and multiple reference frames) are adopted, which dramatically increase the encoding complexity. However, correlation exploitation alone cannot achieve state-of-the-art compression efficiency in a hybrid video codec. Various strategies have been designed to represent the de-correlated video efficiently, including the well-known zig-zag scan, run length code, entropy code, and skipped macroblock. In particular, when the motion vector is zero and no quantized prediction error is present for the current macroblock, the macroblock can be skipped. Skipped macroblocks are the most desirable feature since they consume very few bits in the coded bitstream.

1.1. Existing distributed video coding schemes

Many distributed video coding schemes share a general architecture. Typically, the encoder applies error control codes to each frame and generates syndrome bits. To achieve compression efficiency, usually only part of the syndrome bits (punctured from the original syndrome bits) is sent. The decoder uses the correlation between video frames to construct an estimate of the current frame, and this estimate can be viewed as a noisy version of the original frame. The error control decoder then combines the received syndrome bits and the estimate of the current frame to decode the current frame.

From this process we can see clearly that the performance of distributed video coding depends on two factors: the first is the accuracy of the estimate of the current frame, and the second is the decoding bit error rate performance of the error control codes. Most current research on distributed video coding has tried to address these two aspects in order to achieve good video coding performance. In [4], the authors proposed a DCT domain and hash code based distributed video coding method. Block-wise DCT is applied to each block in the Wyner-Ziv frame to exploit the spatial correlation. The transform coefficients are grouped together to form different coefficient bands, and each coefficient band is then encoded independently. A hash code is generated and sent to the decoder to help the motion search find the best matched block in the reference frame. In [5], the authors used a highly compressed version of each frame as a reference to perform motion estimation at the decoder in order to build a more accurate estimate. Although there is some cost for compression and transmission of those low quality frames, the overall bit rate can be reduced because more syndrome bits are saved by the accurate estimate. In [8], the authors proposed rate-adaptive LDPC Accumulate (LDPCA) codes and Sum LDPC Accumulate (SLDPCA) codes for distributed source coding. The authors claimed that those codes, with a code length of 6336 bits, can perform within 10% of the Slepian-Wolf theoretical bound.

Although these schemes have tried to address the two key factors in DVC, there is still a substantial gap between the compression efficiency of distributed video coding and that of traditional hybrid video coding. We believe the reason for this gap lies in the mismatch between maximizing the exploitation of the spatio-temporal correlation of video signals and representing that correlation efficiently. Without an efficient representation of the correlation and residual signals, the rate-distortion performance will not be maximized. Although there is a fundamental difference between DVC and traditional hybrid video coding, the principles of efficient representation can still be applied to DVC. It is this consideration that motivates us to develop a suite of efficient representations for distributed video coding.

1.2. Overview of proposed DVC scheme

The proposed DVC scheme seeks to represent the spatial and temporal correlation of the video signal more efficiently for distributed video coding. In particular, we seek to minimize the bits used to represent temporal correlation by skipping the coding of zero motion blocks. We also developed two schemes to efficiently represent the DCT coefficients used to exploit the spatial correlation of video frames.

First, we look into the cases where some blocks in a video frame are virtually the same as the co-located blocks in the reference frame. We call these blocks zero motion blocks. In a hybrid video codec, such as MPEG-2, these blocks are skipped without spending any bits to represent them. The encoder is usually able to identify these zero motion blocks without a complicated motion search. Traditional hybrid video codecs use very few bits to represent these zero motion blocks and thereby achieve very high coding efficiency. In distributed video coding, we can take advantage of the virtually complete correlation of these blocks to increase coding efficiency.

Once zero motion blocks are identified, the encoder can choose to either send or skip these blocks. If these blocks are also fed into the distributed video encoder, for example either a Turbo or an LDPC encoder, then the zero motion blocks can be used as a constraint on the error control decoding at the receiving end to enhance the performance of the error control code [6]. A more efficient way to take full advantage of the zero motion blocks is to skip them without feeding them to the distributed video encoder. This way, fewer bits are generated and the rate of the encoded bitstream is lower [7]. Since a good percentage of blocks can be considered zero motion blocks, this lower rate contributes significantly to the overall rate-distortion performance of DVC.

Second, we seek to represent the quantized DCT coefficients more efficiently to exploit the spatial correlation within each block. In distributed video coding, the Turbo or LDPC encoder usually codes the bit planes of the DCT coefficients. The number of bits needed to represent a bit plane is proportional to the number of bits that differ between the current frame and the reference frame in that bit plane. After quantization, the co-located DCT coefficients in the current frame and the reference frame are usually very similar. Gray code is more suitable for representing similar coefficients for channel coding, since Gray code guarantees that two successive values differ in only one bit. Such a representation reduces the number of differing bits between the current frame and the reference frame, which in turn reduces the overall bit rate.

Third, we developed a special scheme to code the sign bits of DCT coefficients. The coding of sign bits has been known as a difficult problem in bit plane coding due to its significant rate cost. The difficulty in coding the sign bits lies in the zeros among the quantized DCT coefficients. In this research, we skip the zero DCT coefficients and code only the signs of the non-zero DCT coefficients. Since there are many zeros after quantization, the proposed scheme can greatly reduce the number of sign bits to code.

The rest of this paper is organized as follows. Section 2 describes the proposed distributed video coding with zero motion skip and the efficient representation of DCT coefficients. Section 3 presents the experimental results, which show that substantial performance improvement can be achieved with the proposed scheme. Section 4 concludes this paper with some discussion.

2. DVC WITH ZERO MOTION SKIP

2.1. Architecture of zero motion skip DVC

As indicated earlier, the performance of distributed video coding depends on two factors: the prediction of the current frame and the decoding bit error rate performance of the selected error control codes. As shown in [9], simple motion prediction at the decoder based only on the previously decoded picture cannot meet the expectations of practical applications. In this system, we adopt the scheme in [5], i.e., a low quality version of the current frame is also sent to the decoder to help the motion search. The low quality version is a down-sampled picture of the current frame, encoded using the conventional DPCM method. The decoder decodes the down-sampled picture and then up-samples it. Using the up-sampled low quality picture, the decoder can perform motion search with the same scheme as in conventional video coding. The motion search can be integer pixel or half pixel, single or multi reference.

Fig. 1 depicts the architecture of the proposed distributed video coding scheme. At the encoder, for each frame of the input video, zero motion vector blocks are identified first. This can be done by calculating the sum of absolute differences (SAD) between the current block and the co-located block in the reference frame. If SAD(0) is smaller than a preset threshold, we assume the difference is negligible and simply use the block in the reference frame in place of the current block. These blocks are skipped, and the rate for channel codes is reduced. Since no motion search is involved, the increase in encoder complexity for zero motion detection is tolerable. We will show later that a significant portion of the blocks in video sequences can be skipped with zero motion vectors.

Block-based DCT is then applied to the non-skipped blocks. After quantization and bit plane extraction, the bit stream is fed into the LDPC encoder. Part of the syndrome bits are sent to the decoder. At the decoder, motion search is applied only to the non-skipped blocks. After motion search, DCT and quantization are applied to the corresponding block in the reference frame. The resulting DCT coefficients and the received syndrome bits are fed together to the LDPC decoder to recover the DCT coefficients of the non-skipped blocks in the current frame. For the skipped blocks, the pixel values of the co-located block in the reference frame are simply copied as the pixel values of the current frame.
[Figure: block diagram with the blocks: video sequence; zero motion block? (No branch); DCT and Q; bit plane; LDPC encoder; LDPC decoder; bit plane; Q^-1 and IDCT; video output; rate control; motion estimation; down-sampling; DPCM-frame encoder; DPCM-frame decoder; up-sampling]
Fig. 1 Distributed video coding with zero motion skip
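The zero motion test described in Section 2.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the block size, the threshold value, the frame layout (lists of rows of luma samples), and the function names `block_sad` and `find_zero_motion_blocks` are all hypothetical choices made for the example.

```python
def block_sad(cur, ref, top, left, size):
    """Sum of absolute differences between co-located blocks (zero motion vector)."""
    return sum(
        abs(cur[y][x] - ref[y][x])
        for y in range(top, top + size)
        for x in range(left, left + size)
    )


def find_zero_motion_blocks(cur, ref, size=8, threshold=64):
    """Return a skip map: True where SAD(0) is below the preset threshold.

    `size` and `threshold` are illustrative values, not taken from the paper.
    """
    rows, cols = len(cur) // size, len(cur[0]) // size
    return [
        [block_sad(cur, ref, by * size, bx * size, size) < threshold
         for bx in range(cols)]
        for by in range(rows)
    ]


# Toy 8x16 frames: the left block is unchanged, the right block differs by 10 per pixel.
ref = [[100] * 16 for _ in range(8)]
cur = [row[:8] + [v + 10 for v in row[8:]] for row in ref]
skip_map = find_zero_motion_blocks(cur, ref)
print(skip_map)  # [[True, False]]
```

In this sketch, only blocks marked False would go on to DCT, quantization and LDPC encoding. For blocks marked True the decoder copies the co-located reference block, so only the positions of the skipped blocks need to be signalled.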
The benefit of zero motion skip is threefold. First, it significantly reduces the bit rate to be sent to the receiving end, since a large percentage of the video blocks can be considered zero motion blocks. This reduced bit rate contributes substantially to the rate-distortion performance of distributed video coding. Second, zero motion skip leads to reduced data for LDPC encoding at the encoder. Reduced encoder complexity is highly desirable for DVC. Third, zero motion skip also leads to reduced data for LDPC decoding. Reduced decoder complexity helps speed up decoding for real time DVC applications.

2.2. Gray code of DCT coefficients

It is true that the performance of distributed video coding depends on the accuracy of the prediction of the current frame. However, the notion of prediction accuracy in DVC is slightly different from that in a traditional hybrid video codec because of the bit-plane based encoding strategy. From the error control code point of view, the amount of syndrome bits needed to decode the current frame corresponds to the number of bit errors and their distribution. In distributed video coding, this means it depends on the number of bits in the current frame that differ from the corresponding bits in the prediction frame.

Therefore, it is inadequate to just carry out a good motion search at the decoder. In some cases the motion search is very good, but the number of differing bits is still large. For example, suppose one DCT coefficient in the current block is 8, and the corresponding DCT coefficient in the prediction block is 7. In this case, the DCT coefficients in the current block and the prediction block are very close, and the prediction is quite accurate. Suppose we use 4-bit quantization, so the binary representation of that DCT coefficient is 1000 in the current block and 0111 in the prediction block. All 4 bits differ, which is the maximum possible, and a good amount of syndrome bits is needed to correct a four-bit difference.

We believe Gray code can solve this problem. Gray code is a binary numeral system in which two successive values differ in only one digit. Applied to the above example, the Gray code of the DCT coefficient in the current block is 1100 and the Gray code of the corresponding DCT coefficient in the prediction block is 0100. In this case there is only a one-bit difference, and therefore fewer syndrome bits are needed to correct it. We will show in our experiments that, by simply using the Gray code, the rate-distortion performance of distributed video coding can be improved by 0.5~1 dB when all other conditions remain the same.

2.3. Coding of DCT coefficient signs

The coding of sign bits has been known as a difficult problem in bit plane coding due to its significant rate cost. In [9], the authors chose not to code the sign bit plane and instead set the sign bit at the decoder to the sign bit of the side information. They claimed that this achieved a better rate-distortion tradeoff than the case where the sign bits are encoded and transmitted.

In fact, the sign is more important than any other bit of a DCT coefficient. In this research, we take advantage of the numerous zeros among the DCT coefficients, since zero coefficients do not need a sign bit. We first encode and decode only the magnitudes of the DCT coefficients. After decoding, the decoder knows which coefficients are zero and which are non-zero. At the encoder, only the signs of the non-zero coefficients are extracted and encoded by LDPC, and the syndrome bits are sent to the decoder. The decoder first decodes the magnitudes of the DCT coefficients of the current frame. After decoding, the decoder extracts the signs of the initial estimate at the same frequency positions as the non-zero coefficients of the current frame. With the received syndrome bits, the decoder can then decode the signs of the non-zero coefficients.

The benefit of this scheme is that we only need to encode and decode the signs of the non-zero coefficients. The total number of bits is reduced greatly since there are many zero coefficients, especially with larger QP.

3. EXPERIMENTAL RESULTS

In this section, we present experimental results to confirm the effectiveness of the proposed method presented in Section 2. We use the Carphone and Foreman sequences in our experiments. The frame rate is 30 fps.

First, we present the compression efficiency obtained by applying Gray code to the DCT coefficients. Fig. 2 gives the PSNR performance with and without Gray code. From the figure we can see that, for both sequences, coding with Gray code outperforms coding without Gray code by 0.5~1 dB when all other conditions remain the same.

Second, we present the effectiveness of the sign coding proposed in Section 2.3. Two encoding/decoding methods are compared. In the first, we code the sign bits by the proposed method. In the second, we do not code the sign bits, but use the sign in the prediction block as the sign in the current block. Fig. 3 shows the PSNR performance, which indicates that the proposed sign coding for DCT coefficients produces better performance for both video sequences.
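As a quick check of the two coefficient representations discussed in Sections 2.2 and 2.3, the sketch below reproduces the 8 versus 7 Gray code example and extracts sign bits only for non-zero coefficients. It is an illustration under stated assumptions: the helper names, the sign convention (1 for negative), and the sample coefficient block are hypothetical, and the LDPC coding itself is not modeled.

```python
def to_gray(value):
    """Binary-reflected Gray code of a non-negative integer."""
    return value ^ (value >> 1)


def differing_bits(a, b):
    """Number of bit positions in which a and b differ (Hamming distance)."""
    return bin(a ^ b).count("1")


def split_magnitudes_and_signs(coeffs):
    """Separate quantized coefficients into magnitudes and the signs of the
    non-zero ones only (1 = negative, 0 = positive); zeros carry no sign."""
    magnitudes = [abs(c) for c in coeffs]
    signs = [1 if c < 0 else 0 for c in coeffs if c != 0]
    return magnitudes, signs


# Section 2.2 example: current coefficient 8, predicted coefficient 7, 4 bits.
print(format(8, "04b"), format(7, "04b"), differing_bits(8, 7))
# 1000 0111 4
print(format(to_gray(8), "04b"), format(to_gray(7), "04b"),
      differing_bits(to_gray(8), to_gray(7)))
# 1100 0100 1

# Hypothetical quantized block: only 3 of 8 coefficients need a sign bit.
mags, signs = split_magnitudes_and_signs([5, -3, 0, 0, 1, 0, 0, 0])
print(mags, signs)  # [5, 3, 0, 0, 1, 0, 0, 0] [0, 1, 0]
```

Because the decoder recovers the magnitudes first, it can tell which positions are non-zero and therefore which entries of the shorter sign list belong to which coefficient, without any extra signalling.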
Third, we compare the performance with and without zero motion skip. In both cases, we use Gray code and the proposed sign coding method. The only difference is that in the zero motion skip method, zero motion blocks are identified but not coded. Only the positions of the zero motion blocks are sent to the decoder. The PSNR performance, compared with H.263 Intra and Inter (IPPPP), is presented in Fig. 4 and Fig. 5. Clearly, DVC with zero motion skip substantially outperforms the scheme without zero motion skip.

Fig. 2 PSNR performances with/without Gray code

Fig. 3 PSNR performances with/without coded signs

Fig. 4 PSNR performances with/without zero motion skip, Foreman

Fig. 5 PSNR performances with/without zero motion skip, Carphone

4. CONCLUSIONS

In this paper, we considered efficient ways to represent the correlation and residuals in distributed video coding. Most of the literature on distributed video coding has considered exploiting the correlations at the decoder but has ignored their efficient representation. Without an efficient way to represent the correlation and residuals, high compression performance cannot be achieved. We proposed three novel ways of efficient representation to address this problem: zero motion skip, Gray code of DCT coefficients, and sign bit coding of DCT coefficients. The experimental results show that our proposed methods can greatly improve the rate-distortion performance.

As for future work, much remains to be done in distributed video coding to achieve better rate-distortion performance. We believe there are two directions to work on: maximizing the correlation between video signals at the decoder side, and finding more efficient ways to represent these correlations and residuals. Although we proposed some novel ways to represent them in this paper, we believe there are still many other ways to improve the efficiency of the representation of the correlations and residuals in distributed video coding.

5. REFERENCES

[1] R. Puri and K. Ramchandran, "PRISM: A 'reversed' multimedia coding paradigm," in Proc. IEEE ICIP, Barcelona, Spain, Sept. 2003.
[2] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proceedings of the IEEE, Special Issue on Video Coding and Delivery, vol. 93, no. 1, pp. 71-83, Jan. 2005.
[3] M. Wu, G. Hua, and C.W. Chen, "Syndrome-Based Light-Weight Video Coding for Mobile Wireless Application," IEEE International Conference on Multimedia & Expo (ICME), 2006.
[4] A. Aaron, S. Rane, and B. Girod, "Wyner-Ziv video coding with hash-based motion compensation at the receiver," in Proc. IEEE ICIP, Singapore, Oct. 2004.
[5] M. Wu, A. Vetro, J.S. Yedidia, H. Sun, and C.W. Chen, "A study of encoding and decoding techniques for syndrome based video coding," in IEEE ISCAS, Kobe, Japan, May 2005, vol. 4, pp. 3527-3530.
[6] G. Hua and C.W. Chen, "Distributed Source Coding in Wireless Sensor Networks," The Second International Conference on Quality of Service in Heterogeneous Wired/Wireless Networks (QShine), Orlando, USA, Aug. 2005.
[7] G. Hua and C.W. Chen, "Low Punctured Turbo Codes and Zero Motion Skip Encoding Strategy for Distributed Video Coding," in Proceedings of GlobeCom 2006, San Francisco, USA, Nov. 2006.
[8] D. Varodayan, A. Aaron, and B. Girod, "Rate-adaptive codes for distributed source coding," EURASIP Signal Processing Journal, Special Section on Distributed Source Coding, vol. 86, no. 11, pp. 3123-3130, Nov. 2006.
[9] Z. Li, L. Liu, and E. J. Delp, "Rate Distortion Analysis of Motion Side Estimation in Wyner-Ziv Video Coding," IEEE Transactions on Image Processing, vol. 16, no. 1, Jan. 2007.