Multimedia Communications Across Networks
Module 5
Packet Audio/Video in Network Environment
Introduction
• Packet-switched networks were invented for carrying
computer data.
• The bursty nature of such data makes it uneconomical
to use continuously connected circuits.
• Audio and video signals, in contrast, have for many years
been carried across fixed-bit-rate circuit-switched
connections.
• Specialists considered potential advantages of using
variable bitrate transmission across ATM networks.
• With the digitization of audio and video, packet-based
systems became worth considering for these signals too.
• Advances in VLSI, optical fiber communication (OFC), etc.
have resulted in many new multimedia processing and
communication systems.
• By basing such a network on packet switching, the
services (video, voice and data) can be dealt with in a
common format.
• Packet switching is more flexible than circuit switching
in that it can emulate the latter while vastly different bit
rates can be multiplexed together.
• In addition, the network's statistical multiplexing of
variable-rate sources may yield higher utilization of
channel capacity than fixed-capacity allocation.
Packet Voice
• Packet switching offers several potential advantages
in terms of performance.
• Advantage #1 : Efficient use of channel capacity,
particularly for bursty traffic.
• Although not as bursty as interactive data, speech exhibits
some burstiness in the form of talkspurts.
• Average talkspurt duration depends on the sensitivity of the
speech detector.
• Individual speakers are active only about 35 to 45% of the
time in typical telephone conversations.
• By sending voice packets only during talkspurts, packet
switching offers a natural way to multiplex voice calls as well
as voice with data.
• Advantage #2 : call blocking can be a function of the
required average bandwidth rather than the required peak
bandwidth.
• Advantage #3 : Packet switching is also flexible.
• Advantage #4 : Since packets are processed in the network,
network capabilities in traffic control, accounting and
security are enhanced.
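The capacity argument behind Advantages #1 and #2 can be illustrated with a small calculation. The sketch below is not from the module: it assumes independent speakers, a 64 kbit/s voice coder, a 1.536 Mbit/s link payload and 40% talkspurt activity, all chosen only for illustration.

```python
from math import comb

def calls_supported(link_bps, peak_bps, activity_percent):
    """Return (calls under peak-rate allocation, calls under average-rate allocation)."""
    by_peak = link_bps // peak_bps
    # Average rate per call = peak rate * activity; integer math avoids rounding issues.
    by_average = (link_bps * 100) // (peak_bps * activity_percent)
    return by_peak, by_average

def overload_probability(n_calls, activity, channels):
    """P(more than `channels` of n_calls independent speakers are in talkspurt at once)."""
    return sum(
        comb(n_calls, k) * activity**k * (1 - activity) ** (n_calls - k)
        for k in range(channels + 1, n_calls + 1)
    )

by_peak, by_average = calls_supported(1_536_000, 64_000, 40)
print("calls by peak rate:", by_peak, "calls by average rate:", by_average)
print("overload probability with 40 calls:", overload_probability(40, 0.4, by_peak))
```

Admitting calls by average bandwidth lets more calls in, at the price of a nonzero probability that more speakers are active at once than the link can carry.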
• Disadvantage : Continuous speech of acceptable quality must
be reconstructed from voice packets that experience variable
delays through the network.
• The reconstruction process involves compensating for the
variable delay component by imposing an additional delay.
• Hence, packets should be delivered with low average
delay and low delay variability.
• Speech can tolerate a certain amount of distortion (for
example, compression and clipping) but is sensitive to end-
to-end delay.
• Maximum tolerable delay generally accepted to be in the
range of 100-600 ms.
• To minimize packetization and storage delays, it has been
proposed that voice packets should be relatively short, on
the order of 200-700 bits, and generally should contain less
than 10-50 ms of speech.
• Network protocols should be simplified to shorten voice
packet headers (to the order of 4-8 bytes), although time
stamps and sequence numbers are likely needed.
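The packet-size figures above trade packetization delay against header overhead. A minimal sketch of that arithmetic follows; the coder rates and the function names are assumptions made for illustration, not values given in the module.

```python
def packetization_delay_ms(payload_bits, coder_bps):
    """Time needed to collect one packet's worth of coded speech, in milliseconds."""
    return payload_bits * 1000 / coder_bps

def header_overhead(payload_bits, header_bytes):
    """Fraction of each transmitted packet that is header rather than speech."""
    header_bits = 8 * header_bytes
    return header_bits / (header_bits + payload_bits)

# 512-bit payload from an assumed 64 kbit/s coder, with an 8-byte header.
print(packetization_delay_ms(512, 64_000), "ms of speech per packet")
print(round(100 * header_overhead(512, 8), 1), "% header overhead")
```

Shorter packets reduce the packetization delay but raise the share of capacity spent on headers, which is why short headers matter for voice.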
• Since some distortion is tolerable, error detection,
acknowledgements and retransmissions are unnecessary
in networks with low error rates.
• Flow control can be exercised end-to-end by blocking
calls.
• Network switches can also discard packets under heavy
traffic conditions
• Embedded coding proposed - speech quality degrades
gracefully with the loss of information.
• Packets are generated at regular intervals during talkspurts
at the Packet Voice Transmitter (PVT).
• The reconstruction process at the Packet Voice Receiver
(PVR) must compensate for the variable delay component
by adding a controlled delay before playing out each
packet.
• This is constrained by a maximum delay, Dmax, and by a
specified maximum percentage of packets that can be lost
or miss playout.
• In addition to buffering voice packets, it might be
desirable for the PVR to attempt to detect lost packets
and to recover their information.
• Two basic approaches to reconstruction process: NTI &
CTI
Null Timing Information (NTI) scheme :
• Reconstruction does not use timing information
(timestamps) to determine packet delays through
network.
• PVR adds a fixed delay D to the first packet of each
talkspurt.
• Let $D_v$ be the transit delay of a first packet through the
network and $D_g$ the packet-generation time (assumed to be
constant).
• Total delay of the first packet from entry into the network
to playout is: $D_v + D$
• Subsequent packets in the talkspurt are played out at
intervals of $D_g$ after the first packet.
• Sequence numbers are required to indicate the relative
positions of packets in the talkspurt.
• If a packet is not present at the PVR at its playout time, it
is considered lost.
• Choice of D involves a trade-off.
• Increasing D reduces the percentage of lost packets, but
increases total end-to-end delays and the size of the
queue at the PVR.
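• D cannot be too large because of the maximum-delay
constraint, or too small because of the limit on the
percentage of lost packets.
• Since the transit delay of each first packet is random, the
silence intervals between talkspurts are not reconstructed
accurately.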
• Reconstruction of silences in an NTI scheme:
• Let $D_1$ and $D_2$ denote the first-packet transit delays for
the talkspurts preceding and following a silence interval.
• Suppose that $D_1$ and $D_2$ are identically distributed with
variance $\sigma^2$ and have some positive correlation $r$.
• Error in the length of the reconstructed silence is:
$\epsilon = D_2 - D_1$
• This has the variance: $\mathrm{Var}(\epsilon) = 2\sigma^2(1 - r)$
• which is directly proportional to the variance of packet
delays.
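The variance of the silence-length error follows from the standard rule for the variance of a difference. In the sketch below, $D_1$ and $D_2$ are the first-packet transit delays before and after the silence, $\sigma^2$ is their common variance and $r$ their correlation coefficient; the symbol names are chosen here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(D_2 - D_1)
  &= \operatorname{Var}(D_2) + \operatorname{Var}(D_1) - 2\operatorname{Cov}(D_1, D_2) \\
  &= \sigma^2 + \sigma^2 - 2r\sigma^2 \\
  &= 2\sigma^2 (1 - r)
\end{aligned}
```

For example, with a delay standard deviation of 20 ms ($\sigma^2 = 400\ \text{ms}^2$) and $r = 0.5$, the error variance is $2 \cdot 400 \cdot 0.5 = 400\ \text{ms}^2$, so silences are reconstructed with a standard deviation of 20 ms.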
• NTI scheme would be adequate only if a small delay
variance could be guaranteed.
• Since the scheme depends on the first packet of each
talkspurt, the loss of a first packet might cause confusion
at the PVR.
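The NTI playout rule can be sketched in a few lines. This is an illustrative model only, not the module's implementation: it assumes times in integer milliseconds, sequence number 0 for the first packet of the talkspurt, and that this first packet is received. The function and parameter names are hypothetical.

```python
def nti_playout(arrival_ms, fixed_delay_ms, gen_interval_ms):
    """Schedule one talkspurt under NTI.

    arrival_ms maps sequence number -> arrival time at the PVR.
    Returns (played, lost): played maps seq -> playout time, lost lists
    the sequence numbers that were missing or arrived after their playout time.
    """
    # The first packet is held for the fixed delay D; no time stamps are used.
    first_playout = arrival_ms[0] + fixed_delay_ms
    played, lost = {}, []
    for seq in range(max(arrival_ms) + 1):
        playout = first_playout + seq * gen_interval_ms
        arrival = arrival_ms.get(seq)
        if arrival is not None and arrival <= playout:
            played[seq] = playout
        else:
            lost.append(seq)  # absent at its playout time, so considered lost
    return played, lost

played, lost = nti_playout({0: 50, 1: 75, 2: 130, 4: 135}, 30, 20)
print(played, lost)
```

Raising `fixed_delay_ms` in this example would rescue the late packet 2, at the cost of a longer end-to-end delay and a larger receiver queue.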
Complete Timing Information (CTI) scheme:
• Reconstruction process uses full timing information
from time stamps to determine each packet's delay
through the network accurately.
• PVR adds a controlled delay so that the total entry-to-playout
delay is as uniform as possible for all packets.
• In addition to time stamps, sequence numbers are also
desirable for detecting lost packets.
• There are various choices for the format of the time-stamp fields:
Global time stamp: requires precise synchronization of both PVT and
PVR to a global clock.
Relative time stamp: encodes the time between consecutive packets.
This leaves an unknown constant in the end-to-end delay, and a large
time-stamp field is required because the time between packets could
be long.
Delay stamp: indicates the delay a packet has accumulated in transit
so far.
A packet is generated with its delay stamp initialized to zero.
Each node increments the delay stamp by the amount of time the packet
has spent in that node.
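The delay-stamp variant can be sketched as follows. This is a minimal illustration under stated assumptions: delays are integer milliseconds, propagation delay is ignored, and the class, function and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class VoicePacket:
    seq: int
    delay_stamp_ms: int = 0  # initialized to zero at the PVT

def forward_through_node(packet, time_in_node_ms):
    """Each node adds the time the packet spent in it to the delay stamp."""
    packet.delay_stamp_ms += time_in_node_ms
    return packet

def receiver_hold_time_ms(packet, target_total_ms):
    """Extra delay the PVR adds so entry-to-playout delay equals the target.

    Returns None when the packet has already exceeded the target and
    therefore misses its playout time.
    """
    slack = target_total_ms - packet.delay_stamp_ms
    return slack if slack >= 0 else None

pkt = VoicePacket(seq=7)
for node_delay in (12, 30, 8):
    forward_through_node(pkt, node_delay)
print(pkt.delay_stamp_ms, receiver_hold_time_ms(pkt, target_total_ms=80))
```

Because every packet is held until the same target delay, the entry-to-playout delay is uniform without any clock shared between PVT and PVR.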
Integrated Packet Networks
• Integrated Networks are economical and flexible.
• Effective integration of speech and other signals, such as
graphics, image and video into an Integrated Packet
Network (IPN) can rearrange network design priorities.
• Although processing speeds will continue to increase, it
will be necessary to minimize the nodal per-packet
processing requirements imposed by network design.
• Data signals must generally be received error free in order
to be useful.
• Speech and image signals allow for some loss of
information without significant quality degradation.
• So, a limited amount of information can be discarded to
achieve a goal such as relieving temporary congestion.
• One of the goals in IPNs - construct a model that considers
entire IPN (transmitters, packet multiplexers and receivers)
as a system to be optimized for higher speeds and
capabilities.
• To simplify the processing at network nodes, more complex
processing at network edges can be allowed.
• Transmitter forms speech packets that vary in their importance
to the reconstruction of high-quality speech at the receiver.
• Packet multiplexers discard speech packets according to
this delivery priority in order to control overload.
• Receiver then attempts to regenerate the information
contained in any discarded packets.
• Although the model is concerned specifically with speech, the
approach can be extended to other structured signals such as
graphics, image and video signals.
• A transmitter subsystem is shown below:
• Transmitter first classifies speech segments according to
models of the speech production process (voiced sounds,
fricatives and plosives).
• This model-based classification is used to remove
redundancy during coding, to assign delivery priorities
and to regenerate discarded speech packets.
• After classification, transmitter removes redundancy
from the speech using a coding algorithm based on the
determined model.
E.g., voiced sounds (vowels) could be coded with a
block-oriented pitch prediction coder.
• After coding, transmitter assigns delivery priority to
each packet based on the quality of regeneration possible
at the receiver.
• This delivery priority is included in the network portion
of the packet header.
• Classification and any coding parameters would be
included in the end-to-end portion of the header.
• Packet multiplexers exist at each outgoing link of each
network node as well as at each multiplexed network
access point.
• A packet multiplexer subsystem with the arriving packet
discarded is shown below.
• $\lambda$ is the effective arrival rate, and $\mu$ represents the
effective service rate.
• Each packet multiplexer monitors local overload and
discards packets according to packet delivery priority (read
from the network portion of the packet header) and a
locally determined measure of the overload level.
• It is assumed that arriving packets are discarded.
• It is also possible to discard already-queued packets.
• In addition, if error checking is performed by the nodes, any
packet (data or speech) found to have an error is discarded.
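A packet multiplexer of this kind can be sketched with a queue and a table of discard thresholds. The sketch is illustrative only: it uses queue length as the local overload measure, treats a larger number as a higher delivery priority, and the threshold values and function names are hypothetical.

```python
from collections import deque

# Hypothetical thresholds: (queue occupancy in packets, lowest priority still admitted).
DISCARD_THRESHOLDS = [(30, 3), (20, 2), (10, 1)]

def min_priority_admitted(queue_len):
    """Map the local overload measure (queue length) to a required priority."""
    for occupancy, min_priority in DISCARD_THRESHOLDS:
        if queue_len >= occupancy:
            return min_priority
    return 0

def enqueue(queue, priority, payload):
    """Admit or discard an arriving packet; returns True if it was queued."""
    if priority < min_priority_admitted(len(queue)):
        return False  # arriving packet discarded to control overload
    queue.append((priority, payload))
    return True

q = deque((3, b"") for _ in range(10))
print(enqueue(q, 0, b"background noise"), enqueue(q, 1, b"voiced speech"))
```

The per-packet work is a single comparison against a locally held threshold, which is what keeps nodal processing simple.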
• The receiver decodes the samples in speech packets
delivered to it based on the classification and coding
parameters contained in the end-to-end header.
• It also determines the appropriate time to play them out.
• Receiver synchronization requires only packet sequence
numbers.
• Time stamps are avoided: global synchronization is
administratively difficult, and delay stamps must be
modified at each packet multiplexer, requiring additional
per-packet processing.
• Potential speech detector impairments, such as clipping,
are eliminated whenever the network is not overloaded.
• Even during periods of considerable overload, received
quality may be better if at least a few background noise
packets are delivered and then used to regenerate noise
that is similar in character to the actual noise.
• If a packet is lost for any reason, the receiver must first
detect the loss by inspecting sequence numbers of those
packets that have been received.
• It must further make a determination of the class of each
lost packet so that the appropriate regeneration model
can be applied using previous header and sample history.
• Correct class determination will be critical to
regenerating the lost information accurately.
• How?
In a string of packets with the same class, we can virtually
ensure that the first packet will be received by assigning it a
high delivery priority.
Assuming perfect delivery of these first packets, the class of
any lost packet will match the class of the last received packet.
Thus, the receiver's class decision can be virtually error free.
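The loss-detection and class-decision rule can be sketched as below. It assumes, as the text does, that the first packet of every run of same-class packets is delivered; the function name and the class labels are hypothetical.

```python
def infer_lost_classes(received):
    """Infer the class of each lost packet from sequence-number gaps.

    received is a list of (seq, speech_class) pairs sorted by seq for the
    packets that arrived. Each missing sequence number is assigned the class
    of the last packet received before the gap.
    """
    inferred = {}
    for (prev_seq, prev_class), (next_seq, _) in zip(received, received[1:]):
        for lost_seq in range(prev_seq + 1, next_seq):
            inferred[lost_seq] = prev_class
    return inferred

received = [(0, "voiced"), (1, "voiced"), (4, "fricative"),
            (5, "fricative"), (7, "silence")]
print(infer_lost_classes(received))
```

The receiver can then apply the regeneration model for the inferred class to each lost packet.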
• Advantages gained by taking a total system approach to
an integrated packet network are:
A powerful overload control mechanism is provided.
The structure of speech is effectively exploited.
Extremely simple per-packet processing for overload control
is allowed.
Only one packet per speech segment is required.
Receiver speech synchronization is simplified.
Reduced per-packet error processing at packet multiplexers is
possible.
Packet Video
• Asynchronous transfer of video (packet video) can be defined as
the transfer of video signals across Asynchronous Time Division
Multiplex (ATDM) networks, such as IP and ATM networks.
• Video may be transferred for instantaneous viewing or for
subsequent storage for replay at a later time.
• Requirements and Limits – Pacing, Max Transfer Delay, etc
• Limits are set by human perception and determine when delay
starts to disturb the information exchange.
• Parts of the signal may be lost or corrupted by errors during the
transfer
• This will reduce the quality of the reconstructed video, and, if the
degradation is serious it may cause the viewer to reject the service.
• Thus, general topics of packet video are to code and to transfer
video signals asynchronously under quality constraints.
• Asynchronous transfer mode (ATM) combines the circuit-switched
routing of telephony networks with the asynchronous
multiplexing of packet switching.
• Done by establishing a connection (fixed route) through the
network before accepting any traffic.
• Information is then sent in 53-octet long cells.
• Switches route cells according to address information contained
in each cell's five-octet header.
• Traffic on a particular link consists of randomly interleaved
cells belonging to different calls.
• The network guarantees that all cells of a call follow the same
route and, hence, get delivered in the same order as they were
sent.
• Intention - ATM networks should be able to guarantee QoS in
terms of cell loss and maximum delay, as well as maximum
delay variations.
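The fixed cell format makes the cost of carrying a block of video data across ATM easy to estimate. The sketch below assumes that the whole 48-octet payload of each cell carries user data, i.e. it ignores any adaptation-layer overhead; the function names are hypothetical.

```python
CELL_OCTETS = 53
HEADER_OCTETS = 5
PAYLOAD_OCTETS = CELL_OCTETS - HEADER_OCTETS  # 48 octets of data per cell

def cells_needed(data_octets):
    """Number of cells required to carry data_octets (last cell is padded)."""
    return -(-data_octets // PAYLOAD_OCTETS)  # ceiling division

def octets_on_wire(data_octets):
    """Total octets transmitted, headers and padding included."""
    return cells_needed(data_octets) * CELL_OCTETS

print(cells_needed(1000), octets_on_wire(1000))
```

At best 48 of every 53 octets carry data, so roughly 9.4% of link capacity goes to cell headers.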
• IP differs from ATM in two major respects:
No pre-established route
Packets are of variable length (up to 65,535 octets).
• IP does not guarantee packet delivery, and packets may
even arrive out of order if the routing decision is
changed during the session.
• These issues are addressed by the introduction of IPng
(IPv6) in conjunction with RSVP.
• IPv6 packets contain a flow label (24 bits in the original
RFC 1883 header, 20 bits since RFC 2460) in addition to
source and destination addresses; routers can use it for
operations like scheduling and buffer management to
provide service guarantees.
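In the current IPv6 header layout (RFC 8200), the flow label occupies the low 20 bits of the first 32-bit word, after a 4-bit version and an 8-bit traffic class. The sketch below shows how a router-side classifier might extract these fields; the function name is hypothetical.

```python
import struct

def parse_ipv6_first_word(header):
    """Decode version, traffic class and flow label from an IPv6 header."""
    (word,) = struct.unpack("!I", header[:4])  # first 32 bits, network byte order
    return {
        "version": word >> 28,
        "traffic_class": (word >> 20) & 0xFF,
        "flow_label": word & 0xFFFFF,
    }

# Build a 40-byte header whose first word carries version 6,
# traffic class 0x2E and flow label 0x12345; the rest is zero filler.
first_word = (6 << 28) | (0x2E << 20) | 0x12345
header = struct.pack("!I", first_word) + bytes(36)
print(parse_ipv6_first_word(header))
```

A router could use the flow label together with the source address as a key for per-flow scheduling or buffer management.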
• Delay and some loss are inevitable during transfers across
both ATM and IP networks.
• Delay is chiefly caused by propagation and queuing.
• Queuing delay depends on the dynamic load variations
on the links and must be equalized before video can be
reconstructed.
• Bit errors can occur in the optics and electronics of the
physical layer through thermal and impulsive noise.
• Loss of information is mainly caused by multiplexing
overload of such magnitude and duration that buffers in
the nodes overflow.
• Video in digital form is a 3D signal.
• It is a time sequence of equidistantly spaced 2D pictures
or frames.
• Frames can be samples of real scene captured by camera
or a sensor.
• They may also be generated by computer graphics
• The digitized frames of a video sequence can either be
scanned sequentially row by row or be interlaced.
• If source produces a signal with RGB components, it is
transformed into a YIQ format with one luminance
component (Y) and two chrominance components (I and
Q).
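The RGB-to-YIQ transformation can be tried with Python's standard `colorsys` module. This is only a sketch: `colorsys` uses rounded coefficients (for example Y = 0.30 R + 0.59 G + 0.11 B), which may differ slightly from the matrix used in a particular video standard, and the wrapper name `rgb8_to_yiq` is hypothetical.

```python
import colorsys

def rgb8_to_yiq(r, g, b):
    """Convert one 8-bit RGB pixel to (Y, I, Q) floats, with Y in [0, 1]."""
    return colorsys.rgb_to_yiq(r / 255, g / 255, b / 255)

for name, pixel in [("white", (255, 255, 255)), ("red", (255, 0, 0))]:
    y, i, q = rgb8_to_yiq(*pixel)
    print(name, round(y, 3), round(i, 3), round(q, 3))
```

Grey-level pixels map to zero chrominance, so all of their information sits in the luminance component Y.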
• The stream consists of frames that may be composed of fields if
interlaced scanning is used.
• The fields are composed of lines of pixels where each pixel
consists of color components (each has a fixed number of bits).
• Video communication system:
• Camera continuously captures a scene.
• Digitized video is passed to an encoder.
• A function that is often part of the encoder is the bit-rate
control, which is used to regulate compression to adapt
the bit rate to the channel in the network.
• Typically, a common constraint is that of the access
capacity to the network.
• Constraint can also be affected by flow-control messages
from the network as well as from the receiver.
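One common way to realize bit-rate control is feedback from the encoder's output buffer, which drains at the channel rate: when the buffer fills, the encoder quantizes more coarsely. The module does not prescribe this method; the sketch below is an assumption for illustration, and the thresholds, step sizes and quantizer range are hypothetical.

```python
def next_quantizer(q, buffer_bits, buffer_size_bits, q_min=1, q_max=31):
    """Pick the quantizer step for the next frame from buffer occupancy.

    A larger q means coarser quantization and therefore a lower bit rate.
    Integer comparisons are used so the thresholds are exact.
    """
    if buffer_bits * 10 >= buffer_size_bits * 8:    # at least 80% full
        q += 2
    elif buffer_bits * 10 >= buffer_size_bits * 6:  # at least 60% full
        q += 1
    elif buffer_bits * 10 <= buffer_size_bits * 2:  # at most 20% full
        q -= 1
    return max(q_min, min(q_max, q))

q = 10
for occupancy in (90_000, 70_000, 50_000, 10_000):
    q = next_quantizer(q, occupancy, 100_000)
    print(occupancy, q)
```

Flow-control messages from the network or receiver could be folded in by lowering the rate at which the buffer is assumed to drain.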
• Following bit-rate control are information segmentation
and framing.
• A frame is a segment of data with added control
information.
• Segments that are formed at the application level
typically constitute the loss unit.
• Errors and loss in the network lead to the loss of one or
more application segments.
• Further segmentation occurs at the network level, where
the data is segmented into multiplexing units (IP packets
or ATM cells), which are the loss unit for the network.
• Application layer segmentation and framing should
simplify the handling of information loss that may occur
during the transfer.
• Network framing is needed to detect and possibly
correct bit and burst errors as well as packet or cell
losses.
• The framing thus contains control information.
• The receiver side performs functions that are reciprocal
to the sending functions and may compensate for errors
during the transfer.
• These functions include decoding, error handling, delay
equalization, clock synchronization and digital-to-analog
conversion.