Unit 2
Sound/Audio System
Er. Uttam Karki(PhD Scholar)
Assist. Professor
What is Sound
• Sound is a physical phenomenon caused by the vibration of a material, such
as a violin string or a wooden log.
• This type of vibration triggers pressure wave fluctuations in the air
around the material.
Terms used in sound/audio system
Frequency:
• A sound's frequency is the reciprocal of its period.
• Equivalently, the frequency is the number of periods per second
and is measured in hertz (Hz).
• Sound processes that occur in liquids, gases, and solids are classified
by frequency range:
• Infrasonic: 0 to 20 Hz
• Audible (audio-sonic): 20 Hz to 20 kHz
• Ultrasonic: 20 kHz to 1 GHz
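The reciprocal relationship and the frequency ranges above can be sketched in a few lines of Python. The function names are illustrative, and the handling of the exact boundary values (20 Hz, 20 kHz) is an assumption, since the ranges listed above share their endpoints.

```python
def frequency_hz(period_s: float) -> float:
    """Frequency is the reciprocal of the period (period in seconds)."""
    return 1.0 / period_s


def classify_frequency(freq_hz: float) -> str:
    """Map a frequency to one of the ranges listed above.

    Assumption: a boundary value belongs to the higher range.
    """
    if freq_hz < 20:
        return "infrasonic"
    if freq_hz < 20_000:
        return "audible"
    if freq_hz <= 1e9:
        return "ultrasonic"
    return "above the ranges listed"


# A wave with a period of 2 ms has a frequency of about 500 Hz.
print(classify_frequency(frequency_hz(0.002)))
```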
Amplitude:
• A sound has a property called amplitude, which humans perceive
subjectively as loudness or volume.
• A simple example is the volume of a radio or television. Pressing the
volume button changes the amplitude of the sound signal, so the sound
becomes louder or quieter.
Computer representation of sound
• The smooth, continuous curve of a waveform is not directly
represented in a computer.
• A computer measures the amplitude of the waveform at regular time
intervals to produce a series of numbers called samples.
• Audio signals are converted into digital samples by an
Analog-to-Digital Converter (ADC).
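As a rough illustration of what "measuring the amplitude at regular time intervals" means, the sketch below samples an ideal sine wave in software. It stands in for the ADC only conceptually; the function name and parameters are assumptions made for this example.

```python
import math


def sample_sine(freq_hz: float, sampling_rate_hz: float, n_samples: int,
                amplitude: float = 1.0) -> list[float]:
    """Return n_samples amplitude measurements taken every 1/sampling_rate_hz seconds."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sampling_rate_hz)
        for n in range(n_samples)
    ]


# A 1 kHz tone sampled 8000 times per second: 8 samples cover one full period.
samples = sample_sine(1000, 8000, 8)
print([round(s, 3) for s in samples])
```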
Sampling Rate
• The rate at which a continuous waveform is sampled is called the sampling rate. It
is measured in Hz.
• Nyquist Sampling Theorem:
• For lossless digitization, the sampling rate should be at least twice the maximum frequency
present in the signal.
• E.g. the CD standard sampling rate of 44100 Hz means that the waveform is sampled
44100 times per second. A sampling rate of 44100 Hz can only represent
frequencies up to 22050 Hz.
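The sketch below works through the Nyquist relationship numerically. The helper names are illustrative. The aliasing formula is the standard folding rule for an ideal sampler and is included only to show what happens to a tone above the limit.

```python
def nyquist_limit(sampling_rate_hz: float) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sampling_rate_hz / 2


def min_sampling_rate(max_freq_hz: float) -> float:
    """Lowest sampling rate that satisfies the Nyquist condition."""
    return 2 * max_freq_hz


def alias_frequency(freq_hz: float, sampling_rate_hz: float) -> float:
    """Apparent frequency of a tone after ideal sampling (folding around multiples of the rate)."""
    return abs(freq_hz - sampling_rate_hz * round(freq_hz / sampling_rate_hz))


print(nyquist_limit(44100))           # limit for CD audio
print(min_sampling_rate(20_000))      # rate needed for the full audible range
print(alias_frequency(30_000, 44100)) # a 30 kHz tone shows up at a lower frequency
```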
Quantization:
• Each sample value is discrete. The resolution (quantization) of a sample value depends
on the number of bits used to measure the height of the waveform.
• E.g. 8-bit quantization yields 256 values; 16-bit CD-quality quantization yields
65,536 values.
Fig: 3-bits Quantization
• A 3-bit quantization process can produce eight different values: 0.75,
0.5, 0.25, 0, -0.25, -0.5, -0.75, and -1, so that we obtain an “angular-shaped” wave.
• This means that the lower the quantization resolution (in bits), the more the resulting sound quality deteriorates.
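A minimal sketch of uniform quantization, assuming input samples in the range -1 to 1 and rounding to the nearest level. With 3 bits it produces exactly the eight values listed above. The function name and the rounding rule are assumptions for illustration.

```python
import math


def quantize(sample: float, bits: int = 3) -> float:
    """Round a sample in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits                 # 8 levels for 3 bits
    step = 2.0 / levels                # 0.25 for 3 bits
    index = math.floor(sample / step + 0.5)
    # Clamp to the representable range: -1.0 up to 1.0 - step.
    index = max(-levels // 2, min(levels // 2 - 1, index))
    return index * step


print([quantize(x) for x in (0.3, 0.6, 1.0, -1.0)])
print(2 ** 8, 2 ** 16)  # number of levels for 8-bit and 16-bit quantization
```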
Sound Hardware
• Sound-related hardware includes microphone jacks,
built-in speakers, headsets, etc.
Music
• The relationship between music and computers has become increasingly
important, especially with the development of MIDI
(Musical Instrument Digital Interface).
• The MIDI interface between electronic musical instruments and
computers is a small piece of equipment that plugs directly into the
computer's serial port and allows the transmission of musical signals.
• The MIDI standard was developed in the early 1980s.
Introduction to MIDI
• Musical Instrument Digital Interface (MIDI) is a connectivity
standard that musicians use to connect musical
instruments (such as keyboards and synthesizers) and
computer equipment.
• It is a set of specifications for building instruments
so that instruments from different manufacturers can
easily exchange musical information.
• Using MIDI, a musician can easily create and edit digital
music tracks.
• A MIDI interface has two different components:
• Hardware that connects the equipment. It specifies the
physical connection between musical instruments, stipulates
that a MIDI port is built into an instrument, specifies a MIDI
cable, and deals with the electronic signals sent over the
cable.
• A data format that encodes the information travelling through the
hardware. The MIDI data format is digital, i.e. data are grouped
into MIDI messages.
MIDI
Synthesizers: A synthesizer converts MIDI note messages into an audio signal. It is
a device or software that synthesizes sounds in response to
incoming MIDI data. Its components include a sound generator, microprocessor,
keyboard, control panel, and memory.
Sequencers: A sequencer is an electronic device incorporating both hardware
and software, used as a storage server for generated MIDI data. Ex:
Launchpad
Controllers: MIDI controllers are devices for manipulating the generated
MIDI messages.
Network: A MIDI network is the combination of hardware and software that
interconnects a group of MIDI devices such as synthesizers, controllers, and
sequencers.
MIDI Devices
• If a musical instrument satisfies both components of the MIDI
standard, the instrument is a MIDI device (e.g. a synthesizer). A MIDI
device is capable of communicating with other MIDI devices through
channels.
• The MIDI standard specifies 16 channels and identifies 128 instruments.
E.g.
0 - Acoustic grand piano
12 - Marimba
40 - Violin
73 - Flute
• The MIDI synthesizer is the heart of a MIDI system. Most
synthesizers have the following components:
Sound generators:
• It synthesizes the sound, producing an audio signal that becomes
sound when fed into a loudspeaker.
• It can change the quality of the sound by varying the voltage oscillation of
the audio signal.
Microprocessor:
• The microprocessor communicates with the keyboard to know which
notes the musician is playing.
• The microprocessor communicates with the control panel to know what
commands the musician wants to send to it.
• The microprocessor then specifies note and sound commands to the
sound generators (i.e. the microprocessor sends and receives MIDI
messages).
Keyboard:
• A MIDI keyboard is a device used to send MIDI data to a computer or
other hardware.
• Pressing keys signals to the microprocessor which notes to play and
how long to play them.
• The keyboard should have at least 5 octaves (an octave is an interval whose higher
note has a sound-wave frequency twice that of its lower note) and 61 keys.
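The octave definition and the key count can be checked with a small calculation. The sketch assumes the usual Western keyboard layout of 12 keys per octave plus one closing key (a fact not stated in the text above); the function names are illustrative.

```python
def octave_up(freq_hz: float, octaves: int = 1) -> float:
    """Each octave doubles the frequency."""
    return freq_hz * 2 ** octaves


def keys_for_octaves(octaves: int) -> int:
    """Assumption: 12 keys per octave, plus the final key that closes the top octave."""
    return 12 * octaves + 1


print(octave_up(440))        # one octave above 440 Hz
print(keys_for_octaves(5))   # key count of a 5-octave keyboard
```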
Control panel:
• Controls those functions that are not directly concerned with notes
and duration.
• The control panel includes sliders, buttons and menus.
Auxiliary controllers:
• Give more control over the notes played on the keyboard.
Memory:
• Stores patches for the sound generators and the settings of the control
panel.
MIDI Devices
✓Drum machine
✓Master keyboard:
• Increases the quality of the synthesizer keyboard.
✓Guitar synthesizers, drum pad controllers, guitar controllers and many more
✓Sequencer:
• The sequencer is an important MIDI device. It is used as a storage server
for generated MIDI data.
• It is also used as a music editor, in which musical data are represented as
musical notes.
• The sequencer transforms the notes into MIDI messages.
MIDI messages
• MIDI messages transmit information between MIDI devices and
determine what kinds of musical events can be passed from device to
device.
• Format of MIDI messages:
• Status byte: the first byte of any MIDI message. It describes the kind of
message.
• Data bytes: the bytes that follow.
There are two types of MIDI messages:
• Channel messages
• System messages
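In the MIDI byte stream, a status byte has its most significant bit set (values 0x80 to 0xFF), while data bytes stay in the range 0x00 to 0x7F. Status values from 0xF0 upward are system messages; the rest are channel messages. The sketch below shows that distinction; the function names are illustrative.

```python
def is_status_byte(byte: int) -> bool:
    """Status bytes have the top bit set; data bytes do not."""
    return 0x80 <= byte <= 0xFF


def message_family(status: int) -> str:
    """Classify a status byte as belonging to a channel or a system message."""
    if not is_status_byte(status):
        raise ValueError("not a status byte")
    return "system" if status >= 0xF0 else "channel"


print(message_family(0x90))  # a Note On status byte
print(message_family(0xF8))  # the Timing Clock status byte
```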
Channel Message
• Because channel messages carry a channel number, they go only
to the devices listening on that channel. There are 2 types of channel messages:
• Channel voice messages: Send actual performance data between
MIDI devices, describing keyboard action, controller action and
control panel changes. E.g. Note On, Note Off, Channel Pressure,
Control Change etc.
• Channel mode messages: Determine the way that a receiving MIDI
device responds to channel voice messages. E.g. Local Control, All Notes
Off, Omni Mode Off etc.
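A minimal sketch of how channel voice messages are laid out as raw bytes: the upper four bits of the status byte select the message kind and the lower four bits carry the channel (0 to 15 on the wire). The helper names are hypothetical; the status values 0x80, 0x90 and 0xC0 are those of the MIDI 1.0 standard.

```python
NOTE_OFF = 0x80
NOTE_ON = 0x90
PROGRAM_CHANGE = 0xC0


def _check(value: int, upper: int, name: str) -> None:
    if not 0 <= value <= upper:
        raise ValueError(f"{name} must be in 0..{upper}")


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Status byte (kind + channel) followed by two data bytes."""
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    _check(velocity, 127, "velocity")
    return bytes([NOTE_ON | channel, note, velocity])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    _check(channel, 15, "channel")
    _check(note, 127, "note")
    _check(velocity, 127, "velocity")
    return bytes([NOTE_OFF | channel, note, velocity])


def program_change(channel: int, program: int) -> bytes:
    """Select one of the 128 instruments; this message has a single data byte."""
    _check(channel, 15, "channel")
    _check(program, 127, "program")
    return bytes([PROGRAM_CHANGE | channel, program])


# Select the violin (program 40) on channel 0, then play and release note 60.
for msg in (program_change(0, 40), note_on(0, 60, 100), note_off(0, 60)):
    print(msg.hex(" "))
```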
System Message:
• System messages go to all devices in a MIDI system because no channel
numbers are specified.
• There are three types of system messages:
• System real-time messages: These messages are short and simple (one byte). They
synchronize the timing of MIDI devices in performance. To avoid delay, they may be sent
in the middle of other messages. E.g. System Reset, Timing Clock (i.e. MIDI clock) etc.
• System common messages: Commands that prepare sequencers and synthesizers to
play a song. E.g. Song Select, Tune Request etc.
• System exclusive messages: A MIDI message type designed to transmit information
about specific functions inside a piece of MIDI hardware.
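The three kinds of system message can be told apart from the status byte alone. The sketch below uses the status values of the MIDI 1.0 standard (0xF0 starts a system exclusive message, 0xF1 to 0xF7 are system common, 0xF8 to 0xFF are system real-time); the function name and the small lookup table are illustrative.

```python
# A few well-known system status bytes, for illustration.
KNOWN_SYSTEM_STATUS = {
    0xF3: "Song Select",
    0xF6: "Tune Request",
    0xF8: "Timing Clock",
    0xFF: "System Reset",
}


def system_message_type(status: int) -> str:
    """Classify a system status byte (0xF0..0xFF) into one of the three kinds."""
    if not 0xF0 <= status <= 0xFF:
        raise ValueError("not a system status byte")
    if status == 0xF0:
        return "system exclusive"
    if status <= 0xF7:
        return "system common"
    return "system real-time"


for status, name in KNOWN_SYSTEM_STATUS.items():
    print(f"{name}: {system_message_type(status)}")
```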
MIDI and SMPTE timing standards
• The MIDI clock is used by a receiver to synchronize itself to the
sender’s clock.
• Alternatively, the SMPTE (Society of Motion Picture and Television
Engineers) timing code can be sent to allow receiver-sender
synchronization.
• SMPTE defines a frame format of hours:minutes:seconds:frames with, for
example, 30 frames/s.
• This information is transmitted at a rate that would exceed the
bandwidth of existing MIDI connections.
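As an illustration of the SMPTE time format, the sketch below converts a running frame count into a timecode of the form hours:minutes:seconds:frames. It assumes a constant 30 frames/s with no drop-frame correction, and the function name is hypothetical.

```python
def frames_to_timecode(total_frames: int, fps: int = 30) -> str:
    """Format a frame count as HH:MM:SS:FF (assumes non-drop-frame timecode)."""
    seconds, frames = divmod(total_frames, fps)
    minutes, seconds = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}:{frames:02d}"


print(frames_to_timecode(108_015))  # one hour plus 15 frames
```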
MIDI Software
The software applications generally fall into 4 major categories:
1. Music recording and performance applications: Provide functions such as recording
MIDI messages, and editing and playing the messages in performance.
2. Musical notation and printing applications: Allow writing music using
traditional musical notation. The user can play the music and print it on paper for live
performance or publication.
3. Synthesizer patch editors and librarians: Allow storage of different
synthesizer patches in the computer’s memory and disk drives, and editing of patches on
the computer.
4. Music education applications: Teach different aspects of music using the
computer monitor, keyboard and other controllers of attached MIDI instruments.
Speech
• Speech can be ‘perceived’, ‘understood’, and ‘generated’ by humans and by
machines too.
• The human speech signal comprises a subjective lowest spectral component
known as the pitch.
Properties of Speech Signals:
• Voiced speech signals show periodic behavior during certain time intervals.
• The spectrum of audio signals shows characteristic maxima, mostly in 3-5
frequency bands.
Speech Generation Basic Notions/ Speech
Terminology
• The lowest periodic spectral component of the speech signal is called the
fundamental frequency. It is present in voiced sounds.
• A phoneme is the smallest speech unit, such as the m of mat and the b of bat
in English, that distinguishes one utterance or word from another in a given
language.
• Allophones are the variants of a phoneme. E.g. the aspirated p of pit and
the unaspirated p of spit are allophones of the English phoneme p.
• A morph is the smallest speech unit that carries a meaning by itself.
Therefore, “consider” is a morph, but “reconsideration” is not.
• A voiced sound is generated through the vocal cords; m, v and l are
examples of voiced sounds.
• During the generation of an unvoiced sound the vocal cords are open; f
and s are unvoiced sounds.
Basic Notions/ Speech Terminology
Vowels
• A speech sound created by the relatively free passage of breath
through the larynx and oral cavity, usually forming the most
prominent and central sound of a syllable. E.g. a, e, i, o, u
Consonants
• A speech sound produced by a partial or complete obstruction of the
air stream by any of the various constrictions of the speech organs.
E.g. b, c, d, f, g, h, j etc.
Reproduced Speech Output
• The easiest method of speech output is to use
prerecorded speech and play it back in a timely fashion.
• Speech can be stored as PCM samples.
• Data compression methods that do not use language-typical
properties can be applied to recorded speech.
• Speech generation/output can be performed by sound concatenation
in a timely fashion.
Sound Concatenation
• Speech generation/output can also be
based on a frequency dependent sound
concatenation.
Step 1:
• Performs transcription
• Text is translated into sound script
• This process is done using letter-to-
phone rules and dictionary of
exceptions
• The user can recognize deficiencies in the
transcription formula and improve the
pronunciation manually
Step 2:
• Sound script is translated into a speech signal.
• Time or frequency dependent concatenation can follow.
Fig: Components of a speech synthesis system with time-dependent sound concatenation
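The transcription step (letter-to-phone rules plus a dictionary of exceptions) can be sketched as a lookup with a fallback. The rule table, the phone symbols and the exception entries below are toy values invented for this example; a real system would use a far larger rule set.

```python
# Hypothetical one-letter rules; real letter-to-phone rules are context dependent.
LETTER_TO_PHONE = {
    "a": "AE", "b": "B", "d": "D", "e": "EH", "m": "M",
    "n": "N", "o": "AO", "p": "P", "s": "S", "t": "T",
}

# Hypothetical dictionary of exceptions for words the rules get wrong.
EXCEPTIONS = {
    "one": ["W", "AH", "N"],
}


def transcribe(word: str) -> list[str]:
    """Translate a word into a sound script: exceptions first, then the rules."""
    word = word.lower()
    if word in EXCEPTIONS:
        return list(EXCEPTIONS[word])
    return [LETTER_TO_PHONE[ch] for ch in word if ch in LETTER_TO_PHONE]


print(transcribe("bat"))
print(transcribe("one"))
```

Improving the pronunciation manually then corresponds to adding an entry to the exceptions dictionary.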
Speech Analysis
• Speech analysis can serve to analyze who is speaking, i.e. to recognize a
speaker for identification and verification. The computer identifies and
verifies the speaker using an acoustic fingerprint.
• An acoustic fingerprint is a digitally stored speech sample of a person.
• Another main task of speech analysis is to analyze what has been said, i.e.
to recognize and understand the speech signal itself.
• Based on the speech sequence, the corresponding text is generated (e.g.
speech-controlled typewriter).
• Another area of speech analysis researches speech patterns with
respect to how a certain statement was said. E.g. a spoken sentence
sounds different if a person is angry or calm. An application of this
research could be a lie detector.
Speech Recognition
• The primary goal of speech recognition is to correctly determine
individual words; in practice each word is recognized correctly only with
some probability less than 1.
• Here, environmental noise, room acoustics and a speaker’s physical and
psychological conditions play an important role.
• For example, let’s assume a poor individual word recognition
probability of 0.95, i.e. 5% of words are recognized incorrectly.
• If we have a sentence with three words, the probability of recognizing the
whole sentence correctly is 0.95 × 0.95 × 0.95 ≈ 0.857 (assuming the words are
recognized independently).
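The sentence-level calculation generalizes to any sentence length. The sketch below assumes that words are recognized independently with the same probability, which is the assumption implicit in the three-word example; the function name is illustrative.

```python
def sentence_recognition_probability(word_probability: float, n_words: int) -> float:
    """Probability that every word in the sentence is recognized correctly."""
    return word_probability ** n_words


for n in (1, 3, 10, 20):
    p = sentence_recognition_probability(0.95, n)
    print(f"{n:2d} words: {p:.3f}")
```

The probability drops quickly with sentence length, which is why per-word accuracy has to be very high for practical use.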
Fig: Speech recognition system: task division into system components, using the
basic principle “extract characteristics to reduce data”.
Fig: Speech recognition components
Speech Transmission
• The area of speech transmission deals with efficient coding of the
speech signal to allow speech/ sound transmission at low
transmission rates over networks.
• The goal is to provide the receiver with the same speech/sound
quality as was generated at the sender side.
Signal form Coding
• This kind of coding considers no speech-specific properties or
parameters.
• Here, the goal is to achieve the most efficient coding of the audio
signal. The data rate of a PCM-coded stereo audio signal with CD-quality
requirements (44,100 samples/s, 16 bits per sample, 2 channels) is:
2 × 44,100/s × 16 bit = 1,411,200 bit/s
• Telephone quality, in comparison to CD quality, needs only 64 kbit/s.
Using Differential Pulse Code Modulation (DPCM), the data rate can be
lowered to 56 kbit/s without loss of quality. Adaptive Differential Pulse Code
Modulation (ADPCM) allows a further rate reduction to 32 kbit/s.
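The PCM data rates can be reproduced with a one-line formula: channels × sampling rate × bits per sample. The CD parameters come from the sampling and quantization figures given earlier in this unit. The telephone parameters (8,000 samples/s, 8 bits, mono) are an assumption chosen because they are the conventional values that give 64 kbit/s.

```python
def pcm_data_rate(sampling_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Uncompressed PCM data rate in bit/s."""
    return channels * sampling_rate_hz * bits_per_sample


cd_rate = pcm_data_rate(44_100, 16, channels=2)
phone_rate = pcm_data_rate(8_000, 8)  # assumed telephone parameters

print(f"CD quality:        {cd_rate:,} bit/s")
print(f"Telephone quality: {phone_rate:,} bit/s")
```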
Source Encoding
• Some transformations depend on the original signal type.
• For example, an audio signal has certain characteristics that can be exploited in
compression.
• The suppression of silence in speech sequences is a typical example of a transformation
that depends entirely on the signal’s semantics.
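Silence suppression can be sketched as a run-length step over the samples: long runs of near-zero amplitude are replaced by a single marker carrying the run length. The threshold, the minimum run length and the ("silence", n) marker format are all assumptions made for this example.

```python
def suppress_silence(samples, threshold=0.02, min_run=3):
    """Replace runs of at least min_run quiet samples with a ("silence", length) marker."""
    out = []
    i = 0
    n = len(samples)
    while i < n:
        # Measure the run of quiet samples starting at position i.
        j = i
        while j < n and abs(samples[j]) < threshold:
            j += 1
        run = j - i
        if run >= min_run:
            out.append(("silence", run))
            i = j
        else:
            # Loud sample, or a quiet run too short to be worth encoding.
            out.append(samples[i])
            i += 1
    return out


signal = [0.5, 0.0, 0.0, 0.0, 0.0, -0.4, 0.01, 0.3]
print(suppress_silence(signal))
```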
Fig: Components of speech transmission using source encoding
Recognition Synthesis Methods
• This method conducts a speech analysis and a speech synthesis during
reconstruction, offering a reduction to approximately 50 bit/s.
• Only the speech element characteristics are transmitted, for example formants
containing data about the center frequencies and bandwidths for use by digital
filters.
Fig: Components of a recognition-synthesis system for speech
transmission