Data Representation: ASCII and Unicode
1. What is Data Representation?
Computers work only with binary data (0s and 1s).
To represent text (letters, digits, punctuation, emojis), we need a standard encoding
system that maps characters → binary codes.
Two of the most important encoding systems are:
1. ASCII (American Standard Code for Information Interchange)
2. Unicode
2. ASCII (American Standard Code for Information Interchange)
Definition:
ASCII is a character encoding standard developed in the 1960s to represent English
characters using 7 bits. Later 8-bit extensions were defined by other standards and
are not part of ASCII itself.
How it Works:
Each character (letter, digit, symbol, control command) is assigned a unique numeric
code.
For example:
o 'A' = 65 → 1000001 (binary)
o 'a' = 97 → 1100001 (binary)
o '0' = 48 → 0110000 (binary)
o Space = 32 → 0100000 (binary)
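The mapping above can be checked with a short Python sketch. It uses the built-in `ord()` to get a character's numeric code and a format specifier to print it as 7 binary digits. The helper name `ascii_table_row` is made up for this illustration.

```python
def ascii_table_row(ch: str) -> str:
    """Return 'char = decimal code -> 7-bit binary' for one ASCII character."""
    code = ord(ch)  # numeric code of the character
    if code > 127:
        raise ValueError(f"{ch!r} is outside the 7-bit ASCII range")
    return f"{ch!r} = {code} -> {code:07b}"  # 07b: binary, zero-padded to 7 digits


for ch in ["A", "a", "0", " "]:
    print(ascii_table_row(ch))
```

The inverse direction is `chr(65)`, which gives back `'A'`.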
Types of ASCII:
1. 7-bit ASCII (Standard ASCII):
o Can represent 128 characters (0–127).
o Includes: English letters (uppercase & lowercase), digits, punctuation,
and control codes.
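2. 8-bit ASCII (Extended ASCII):
o Can represent 256 characters (0–255).
o Not a single standard: it is a family of 8-bit code pages (e.g., ISO 8859-1,
Windows-1252) that keep codes 0–127 identical to ASCII.
o Adds extra symbols like graphical characters, accented letters (ç, ñ, é).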
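Because "Extended ASCII" exists in several variants (code pages), the same byte above 127 can stand for different characters. The sketch below decodes the single byte 0xE9 with a few codecs that ship with Python. The helper `decode_with` is a hypothetical name used only here.

```python
# One byte above 127: its meaning depends on the code page chosen.
raw = bytes([0xE9])


def decode_with(data: bytes, codec: str) -> str:
    """Decode bytes with the named codec, or report that decoding failed."""
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        return "<not decodable>"


# 7-bit ASCII rejects the byte; the 8-bit code pages disagree on its meaning.
for codec in ("ascii", "latin-1", "cp1252", "cp437"):
    print(f"{codec:8} -> {decode_with(raw, codec)}")
```

In ISO 8859-1 (`latin-1`) and Windows-1252 the byte is `é`, while in the older IBM code page 437 it is `Θ`. This ambiguity is one reason a single universal standard was needed.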
Advantages of ASCII:
Simple and widely used.
Efficient for English-based systems.
Limitations of ASCII:
Can only represent English and a few special characters.
Not suitable for global languages (e.g., Hindi, Chinese, Arabic).
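A minimal sketch of this limitation: trying to encode non-English text as ASCII fails in Python. The function name `fits_in_ascii` and the sample strings are chosen for illustration only.

```python
def fits_in_ascii(text: str) -> bool:
    """Return True if every character of text has a 7-bit ASCII code."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False


# English, Hindi, Chinese, Arabic
for sample in ["Hello", "नमस्ते", "中文", "مرحبا"]:
    print(f"{sample!r}: {fits_in_ascii(sample)}")
```

Only the English sample passes. Python 3.7 and later also offer `str.isascii()` for the same check.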
3. Unicode
Definition:
Unicode is a universal character encoding standard designed to represent text from all
writing systems in the world (languages, symbols, emojis).
How it Works:
Provides a unique numeric code (called a code point) for every character.
Stores code points as bytes using one of several encoding forms: UTF-8, UTF-16, UTF-32.
Example:
o 'A' = U+0041
o 'क' (Devanagari letter, used in Hindi) = U+0915
o '中' (Chinese character) = U+4E2D
o '😀' (emoji) = U+1F600
Encoding Forms:
1. UTF-8:
o Variable length: uses 1 to 4 bytes per character.
o ASCII characters keep their single-byte ASCII codes.
o Most widely used on the web.
2. UTF-16:
o Uses 2 or 4 bytes per character.
o Used in Windows and Java.
3. UTF-32:
o Uses a fixed 4 bytes for each character.
o Simple, but takes more storage.
Advantages of Unicode:
Supports languages, symbols, and emojis from all writing systems.
Backward compatible with ASCII (the first 128 code points are the same).
Example:
o "HELLO" in ASCII codes: H (72), E (69), L (76), L (76), O (79)
o "नमस्ते" in Unicode code points: U+0928 U+092E U+0938 U+094D U+0924 U+0947
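To see how code points relate to the UTF-8, UTF-16, and UTF-32 encoding forms, the following sketch prints each character's code point and the number of bytes it needs in each form. The helper `describe` is a hypothetical name. The `-le` codec variants are used so that Python does not add a byte-order mark to the count.

```python
def describe(ch: str) -> dict:
    """Return the code point of ch and its size in bytes per encoding form."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-le")),  # little-endian, no BOM
        "utf32_bytes": len(ch.encode("utf-32-le")),  # little-endian, no BOM
    }


for ch in ["A", "क", "中", "😀"]:
    info = describe(ch)
    print(ch, info["code_point"],
          info["utf8_bytes"], info["utf16_bytes"], info["utf32_bytes"])
```

'A' needs 1 byte in UTF-8, 'क' and '中' need 3, and '😀' needs 4. UTF-16 uses 2 bytes for the first three and 4 for the emoji. UTF-32 always uses 4.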