Data Structures - Introduction
Dr. Shaukat Ali
Department of Computer Science
University of Peshawar
about me
Dr. Shaukat Ali
MSc, University of Peshawar
MS, University of Peshawar
PhD. University of Peshawar
Post-Doctoral (Lancaster University, UK)
Past Teaching at:
Ministry of Defence
University of Peshawar
<number>
Syllabus – tentative topics
Includes the following topics and more..
Data Structure Types
Array
Linked List
Stack
Queues
Analysis of Algorithms
Sorting Algorithms and its Analysis
selection sort,
bubble sort,
merge sorting,
Quick sort
Radix sort
<number>
Bucket Sort (address-calculation sort)
Counting Sort
Tree
Traversal, Insertion, Deletion Algorithms
Graphs
Introduction, Depth First Search, Breadth First
Search, Minimum Spanning Trees, etc.
etc.
<number>
Objectives of the course
Understanding different options available for storing different
types of data
List/Array, Linked Lists, Trees, Queue, Graph, Hash Table etc.
Being able to implement different operations on different data
structures (Algorithms)
Understanding when to use a particular data structure based on the
requirement of a problem at hand
One data structure is not always better.
Understanding Analysis of Algorithm and being able to analyse
algorithms
Understanding factors that influence the selection of data
structures in some application
Memory efficiency
Time efficiency
<number>
Data Structure
A Data Structure is a specific format to
organize data in computer memory so that
it can be used to efficiently store and
process data by computer programs.
Data Structure is organization of
information, usually in memory, for better
algorithm efficiency.
A data structure organizes different data items
based on its relationship to each other.
<number>
Efficient retrieval and access of data
Complex applications requires processing of
data, which include accessing data and writing
new data.
This data processing need to be efficient in
terms of memory space, time or both.
The choice of how the data is organised can
make big difference in efficiency of the complex
programs.
<number>
Data Structure vs File Structure
Data Structure is often referred to data
storage in main memory (RAM)
Data storage representation in secondary
storage is referred to as File Structure.
or Databases
<number>
Data may be organized in many ways
e.g. Arrays, Linked Lists, Trees, etc.
The choice of a particular data structure depends on
two considerations:
It must be able to represent the actual relationships
of data in the real world
The structure should efficiently process the data
when necessary
<number>
Data structure for storing records of data,
for example, students names list.
Arrays or Linked Lists
Issues
Depends on operations on data.
Operations efficiency (Time required to
complete operations)
Are there going to be only Retrieval or Insertion
operation on Data?
Are there going to be Deletions?
<number>
Data Structure Applications
Databases: Data structures are used to organize and
store data in a database, allowing for efficient retrieval
and manipulation.
Operating systems: Data structures are used in the
design and implementation of operating systems to
manage system resources, such as memory and files.
i.e. Queue and Stack are used for process scheduling and function call management.
Computer graphics: Data structures are used to
represent geometric shapes and other graphical
elements in computer graphics applications.
Artificial intelligence: Data structures are used to
represent knowledge and information in artificial
intelligence systems. i.e. graphs, tensor etc.
<number>
Classification of Data Structures
<number>
Primitive data structures
These are the basic data structures and are
provided by programming languages to
represent single values.
They are integers, floating point numbers,
characters, string constants, pointers etc.
Non-primitive/Abstract data structures
They are more sophisticated data structures that
are formed by combining multiple basic
primitive data structures
Store multiple values and provide more complex
and specialized operations. i-e. searching
sorting etc.
Examples are Array, list, files, linked list, trees
and graphs
<number>
Linear and Non-linear Data Structures
Linear Data Structures
Linear data structures organize their data
elements in a linear fashion, where data
elements are attached one after another.
All linear data structures look like a list.
In Linear data structures data elements are
traversed one after the other sequentially.
Some commonly used linear data structures
are arrays, linked lists, stacks and queues.
<number>
Linked List Data Structure
<number>
Non-Linear Data Structures
In nonlinear data structures, data elements
are not organized in a sequential fashion.
A data item in a nonlinear data structure
could be attached to several other data
elements based on their relationship among.
Relationship are hierarchical or network
based
All data items cannot be traversed
sequentially, we need to backtrack.
Data structures like Trees and Graphs are
examples nonlinear data structures.
<number>
A tree is a data structure that is made up of a
set of linked nodes, which can be used to
represent a hierarchical relationship among
data elements.
A graph is a data structure that is made up of
a finite set of edges and vertices. Edges
represent connections or relationships
among vertices that stores data elements.
<number>
Tree Data Structure
Graph Data Structure <number>
Static and Dynamic Data Structures
Static Data Structures
In static data structures the size of the data
structure is known in advance and is fixed.
Fixed number of data items can be stored in
static data structures
Advantage:
It has no memory overflow problems
Disadvantage:
Storing less elements than the maximum
number will results in memory wastage.
For example, int, float, array etc.
<number>
Dynamic Data Structures
Dynamic Data Structures have no fixed size
and any number of data items can be stored
The memory used can grow and shrink based
on the number of elements stored.
Advantage:
Store any number of data elements
No wastage of memory
Disadvantage:
Memory overflow can happen.
Linked list, Tree, Graph, Hash Table
<number>
Common Operations on Data Structures
Insertion
The operation that add a new data element item
in a data structure
Deletion
Deletion operation removes a data elements
from a data structure
Traversing
Accessing each data element in a data structure
in order to perform some data processing.
<number>
Searching
Finding data elements in a structure that
contains some value or meets some search
criteria or condition.
Sorting
Arranging data elements in some order. For
example, ascending or descending order.
Sorting data has many useful applications.
Merging
Combining records in two different data
structures in to one data structure.
<number>
Algorithm
An algorithm is a set of
step-by-step
instructions to solve a
given problem or
achieve a specific goal.
A cooking recipe
written on a piece of
paper is an example of
an algorithm, where
the goal is to make a
certain dinner.
<number>
Algorithm in Computer Science
Algorithms in Computer Science, the step-
by-step instructions are written in a
programming language, and instead of food
ingredients, an algorithm uses data
structures.
An efficient algorithm can help us to find the
solution we are looking for, and to transform
a slow program into a faster one.
<number>
Example
Algorithm: Sum_Two_Numbers
Input:
- Number A
- Number B
Steps:
1. Start.
2. Read the value of Number A.
3. Read the value of Number B.
4. Calculate the sum: Sum = Number A + Number B.
5. Display the value of Sum.
6. End.
Output:
<number>
Characteristics of an Algorithm
Unambiguous − Algorithm should be clear and
unambiguous. Each of its steps (or phases), and their
inputs/outputs should be clear and must lead to only
one meaning.
Input − An algorithm should have 0 or more well-
defined inputs.
Output − An algorithm should have 1 or more well-
defined outputs, and should match the desired output.
Finiteness − Algorithms must terminate after a finite
number of steps.
Feasibility − Should be feasible with the available
resources.
Independent − An algorithm should have step-by-step
directions, which should be independent of any
programming code. <number>
Algorithm Complexity
Suppose X is an algorithm and n is the size of
input data, the time and space used by the
algorithm X are the two main factors, which
decide the efficiency of X.
Time Factor − Time is measured by counting the
number of key operations such as comparisons in the
sorting algorithm.
Space Factor − Space is measured by counting the
maximum memory space required by the algorithm.
The complexity of an algorithm f(n) gives the
running time and/or the storage space required
by the algorithm in terms of n as the size of
input data.
<number>
Data Structures – Abstract Data Type
Qazi Muhammad Anas
Abstract Data Type
An Abstract Data Type is a conceptual model of a data
type that defines the data it contains and the operations
that can be performed on it, but does not specify the
implementation at all.
Hidden Implementation details include
Memory layout (whether it’s in a contiguous block or scattered around
memory)
Data Representation (whether a list is implemented as an array or a linked
list)
Storage Strategy (how memory is allocated, whether it is dynamically or
statically allocated, or the specific strategy for resizing as the underlying
structure)
<number>
Common Abstract Data Types
Commonly used Abstract Data Types include:
Stack
Queue
List
Set
Map (Dictionary)
Tree
Graph
<number>
ADTs VS Data Structures
Abstract data Type
An ADT focus on what operations are possible and
their behavior
An ADT can have multiple implementations using
different data structures
An ADT is implementation independent
Data Structure
how data is stored and organized in memory
Data structure is implementation dependent
<number>
Need of Abstract Data type
Abstraction:
The main role of an ADT is to separate the interface(what
the data type does(behaviour)) from the
implementation(how it does it(logic/details)).
Benefit for the End User, when you use Stack ADT, you
only need to know about the push() and pop() operation
and the LIFO(Last in First Out) rule.
No need to worry about it implementation logic
ADT is like a Black Box you provide input and expect
defined output
<number>
Need of Abstract Data type (cont.)
Flexibility and Maintainability:
ADT allow the underlying implementation to change
without affecting the application code that uses it.
For Example
Imagine a List ADT was initially implemented using Linked List,
later profiling show the application does far more indexed lookups
than insertion, making the Linked List access time a bottleneck.
The development team can switch the list ADT internal
implementation from Linked List to a Dynamic Arrays.
Since the ADT public operations (insert, get, remove) did not
change, none of the application code that uses the List need to be
rewritten.
This freedom to swap out the underlying data structure
based on performance needs, without breaking the rest of
the program.
<number>
List of Abstract Data Types
Stack
The Stack is one of the most fundamental ADT, defined by
its strict access rule: Last-in, First-out (LIFO).
Core Stack Operations includes push(), pop(), top(),
isEmpty().
Queue
The queue ADT is defined by the First-in, First-out (FIFO)
principle.
This is the behaviour we see in any line of waiting entities
i.e. people, task, cars etc.
<number>
A queue requires two separate access points
one for insertion and one for removal.
Common operations includes enqueue(),
dequeue(), peek() and isEmpty().
List
List ADT is a collection of elements arranged
in a linear order; often called a sequence.
It defines an ordered collection of elements
where access is generally done via index
number.
<number>
The key operations includes:
get(i)
insert(i, element)
remove(i)
size().
<number>
Data Structures - Arrays
Qazi Muhammad Anas
Arrays
An array is a linear data structure used to store a
fixed size of sequential collection of elements
of the same type
but it is often more useful to think of an array as
a collection of variables of the same type
All arrays consist of contiguous memory
locations
The lowest address corresponds to the first
element and the highest address to the last
element
<number>
Why arrays?
Instead of declaring individual variables, such
as number0, number1, ..., and number99
you declare one array variable such as numbers
and use numbers[0], numbers[1], and ...,
numbers[99] to represent individual variables
An array is elementary data structure that
exists as a built-in data type in most
programming languages
It serve as the building block for more complex
data structures like stacks, queues, heap and
hash tables etc.
<number>
Declaring Array
To declare an array in C++, the programmer
specifies the type of the elements and the
number of elements required by an array as
follows −
datatype arrayName [ arraySize ];
This is called a single-dimension array
The arraySize must be an integer constant
greater than zero and type can be any valid C++
data type
For Example: double balance [ 10 ];
<number>
Initializing Arrays
Assigning values to the array at the time of its
declaration is called initialization
You can initialize array elements either one by
one like −
balance [ 0 ] = 1000.0;
balance [ 1 ] = 2.0;
or using a single statement as follows −
double balance [ 5 ] = { 1000.0 , 2.0 , 3.4 , 15.4 , 20.0 };
The number of values between braces { } can not be larger
than the number of elements that we declare for the array
between square brackets [ ]
<number>
Initializing Arrays (cont.)
If we omit the size of the array, an array just big
enough to hold the initialization is created
double balance [ ] = { 1000.0 , 2.0 , 3.4 , 15.4 , 20.0 };
<number>
Accessing Array elements
An element is accessed by placing the index of
the element within square brackets after the
name of the array. For example −
cout << balance[ 2 ]; //return 3.4
<number>
Two-dimensional Arrays
The simplest form of the multidimensional
array is the two-dimensional array
A two dimensional array is a list of one-
dimensional arrays or array of arrays
A 2D array consists of rows and columns,
allowing you to store data in a grid-like
structure, also known as matrix
Elements of 2D arrays are stored in row-
major order in memory(contiguous blocks)
<number>
Declaring 2D Array
To declare a two dimensional integer array of size x,y
you would write something as follows −
type arrayName [ x ][ y ];
Where type can be any valid C++ data type and
arrayName will be a valid C++ identifier.
A two-dimensional array can be think as a table, which
will have x number of rows and y number of columns
Two-dimensional Array of 3 rows & 3 columns
<number>
Declaring 2D Array (cont.)
Thus, every element in array a is identified by
an element name of the form a[ i ][ j ]
where a is the name of the array, and i and j are
the subscripts or indices that uniquely identify
each element(cell) in a.
<number>
Initializing Two-dimensional Array
Two-dimensional arrays can be initialized by using
nested braces. Following is an array with 3 rows and each
row have 4 columns
int arr[3][4] = {
{0, 1, 2, 3} , /* initializers for row indexed by 0 */
{4, 5, 6, 7} , /* initializers for row indexed by 1 */
{8, 9, 10, 11} /* initializers for row indexed by 2 */
};
The nested braces, which indicate the intended row, are
optional. The following initialization is equivalent to
previous example −
int arr[3][4] = {0,1,2,3,4,5,6,7,8,9,10,11};
<number>
Initializing Two-dimensional Array
In partial initialization the remaining
elements are zero-initialized for example
int arr[3][4] = {
{0, 1, 2}
};
// results
{ {0, 1, 2, 0} ,
{0, 0, 0, 0} ,
{0, 0, 0, 0}
};
<number>
Accessing 2D Array Element
An element in 2-dimensional array is accessed by using the
subscripts, i.e., row index and column index of the array. For
example −
int val = arr[2][3];
The above statement will take 4th element from the 3rd row of the
array
Commonly nested loops are used to access elements in 2D arrays
(e.g. to print all the elements of 2D array)
Upper loop for changing index value of row
While inner loop for changing index value of column
<number>
Accessing 2D Array Example
Accessing Two-dimensional array elements
<number>
Advantages and Disadvantages of Array
Advantages of an Array:
Simple data structure
Efficient memory usage when fully utilized
Allows fast random access using index
Disadvantages of an Array:
Memory may be wasted if array is not fully used
Requires consecutive memory locations
Homogeneous data (same data type)
<number>
Memory Representation
How Arrays are stored in Memory?
Arrays are stored in memory in contiguous
locations For Example −
int data[5];
double money[6];
char letters[10];
<number>
Memory Representation (cont.)
The process to determine the address in a
memory
First address Base address
Relative address to base address through
index function.
General Index Function:
Address[index] = Base address + index * sizeOfElement
Lets apply this formula and see if we do get
what we claimed we should
Base + index * sizeOfElement Address
data[3] 200 + 3 * 4 =212
money[4] 250 + 4 * 8 =282 <number>
Array operations
Traversing Algorithm
Traversing operation means visit every
element of array once
Linear operation – start at the first index
(usually 0 )and move sequentially to the last
index (n-1)
1. Start
2. Initialize a counter i to 0
3. Repeat while i < Array length (n-1):
Access element at Array[i]
Increment i by 1
4. End while
5. Stop
<number>
Array operations (cont.)
<number>
Array operations (cont.)
Insertion Operation
Insertion is a process of adding a new
element into the array
Insert element at the end is easy (no shifting
is required) if there is a space
Insert element at the start or middle requires
the shifting of all the existing elements to
the right
Once the space is clear, the new element is
placed at the target index
<number>
Array operations (cont.)
Inserting the value 6 into an array at position
2 (index 2)
<number>
Array operations (cont.)
Insertion operation
<number>
Array operations (cont.)
Insertion operation
<number>
Array operations (cont.)
Insertion operation
<number>
Array operations (cont.)
Insertion operation:
Insert new element 6 at index 2
<number>
Array operations (cont.)
Insertion Algorithm
1. Let ARRAY be a linear array with capacity MAX
Let n be the current number of elements
Let K be the index where the new element is to be inserted
Let VALUE be the element to insert
2. If n = MAX:
Print “Array is full”
return
3. Set i = n - 1 (start from the last element)
4. Repeat while i >= K:
1. ARRAY[i +1] = ARRAY[i ] (Shift element to the right)
2. i-- (decrement i by 1)
5. Set ARRAY[K] = VALUE (Insert the new value)
6. Update n = n + 1 (Increase the element count)
<number>
Array operations (cont.)
Deletion operation
Deletion is an operation that is used to
remove an element from the array
The deletion of an element at the end of the
array can easily be done
Whereas the deletion of the element in the
middle of the array or at specified location
makes the data elements to move one
location upwards to fill the array
<number>
Array operations (cont.)
Delete an element from index 2 which is 7 in
the below array
<number>
Array operations (cont.)
Deletion operation
<number>
Array operations (cont.)
Deletion Algorithm
[Link] ARRAY be a linear array
Let n be the current number of elements
Let K be the index of element to delete
[Link] K < 0 or K >= n
Print “invalid index”
return
[Link] i = K
[Link] while i < n-1:
ARRAY[i] = ARRAY[i + 1];
i++ (Increment i by 1)
5. Update n = n-1 (reduce number of element by one)
<number>
Stack Data structure
Qazi Muhammad Anas
Stack
A stack is an ordered group of homogeneous
items or elements
The removal of existing items and the addition
of new items can take place only at the top of
the stack
<number>
Stack Data Structure
Like a real-life stack, the stack data structure is
a list of elements in which elements can be
inserted or deleted only at one end
This end is referred to as the top of stack
Elements are added to and removed from the
top of the stack, the last element added is the
first one to be removed
Therefore, a stack follows the LIFO (last-in
first-out) principle
<number>
Stack Data Structure
A set of operations that allows the user to
access and manipulate the elements stored
in stack data structure are:
The operation that add an element to the top
of a stack is usually called push
The operation that remove the top element
from the stack is referred to as pop
<number>
Stack implementation
A stack can be implemented using either an
array or a linked list
In this discussion, we focus on the array
representation of the stack since all its
elements are of the same data type, and an
array provides a suitable structure to hold
them
We can put elements into sequential slots in
the array
<number>
Placing the first element pushed in to the
first array position, the second element
pushed in to the second array position, and
so on
<number>
Stack Push Algorithm
Push mean add an element to the top of a stack
1. Check whether a stack is full or not
If top == stack_size – 1 return “Stack is full” exit
2. If stack is not full, increment the top pointer by one
top = top + 1
3. Insert the element where the top pointer is pointing
Stack[ top ] = new_element
<number>
Stack Pop Algorithm
Pop means remove the top element from the stack
1. Check if the stack is empty
If top == – 1 return “stack is empty” exit
2. Get top element
item = stack[top]
top = top -1
return item
<number>
Applications of Stack
Stack are widely used in computer science
Their specific applications are:
Management of function calls
Evaluation of expression
String reverse
Parenthesis Balanced
<number>
Management of function calls
Programming language systems typically use a
stack to keep track of function calls
For example - the main function calls function
A, which in turn calls function B, and then
function C
When function C finishes execution, control
returns to B; when B finishes, control returns to
A, and so on
The call-and-return sequence follows the LIFO
principle, so a stack is the perfect data structure
for managing the function calls
<number>
Management of function calls (cont.)
<number>
Evaluation of Expression
<number>
Evaluation of Expression (cont.)
Parentheses alter the precedence of
operator For Example
(A+B)*C ≠ A+(B*C)
(2+3)*7 = 35 while 2+(3*7) = 23
Now let’s see how computer evaluates the
arithmetic expression?
Computers use expression notations like
infix, postfix, and prefix to represent and
evaluate arithmetic expressions
<number>
Notation for Expression
Infix Notation:
Expressions in which operator lies between
the operands are referred to as infix notation
A+B, A-C, E*F etc. all are infix notation
A+(B*C) and (A+B)*C are distinguished by
parentheses or by applying operators
precedence discussed above
<number>
Prefix or Polish Notation
Named in honor of Polish mathematician,
Jan Lukasiewiez, refer to the expressions in
which the operator symbol is placed before
its two operands
+AB, -AC, *EF etc. all are the examples of
prefix or polish notation
Simple infix expressions can be converted to
polish expression as follow:
(A+B)*C => +AB*C => *+ABC
A+(B*C) => A+*BC => +A*BC
<number>
(A+B) / (C-D) => +AB / -CD => /+AB-CD
An important property of these notations is
that they are parentheses free
<number>
Postfix or Reverse Polish Notation
Refer to the expressions in which operator is
placed after its two operands
AB+, CD-, EF* etc. all are the examples of
postfix or reverse polish notation
Like prefix notation, they are also
parentheses free
<number>
How Computer Evaluate Expressions?
Expressions are represented in infix
notations and use of parentheses is very
common
Computer may apply the operators
precedence and parentheses rules to
evaluate the expression
But, this process is not feasible in terms of
computer timing (time complexity) as
computer takes a lot of time to resolve
parentheses
<number>
So, the computer first convert an infix
expression into an equivalent postfix
expression and then evaluates it
Following figure depicts the process:
<number>
Clearly following two procedures
(algorithms) are required:
Algorithm 1: converting an infix expression
to an equivalent postfix expression
Algorithm 2: Evaluating the postfix
expression
For each algorithm, stack is the main tool to
be utilized
<number>
Converting Infix expression to Postfix
<number>
Infix to Postfix
[Link] the expression Q from left to right
[Link] an operand is found, add it to P
[Link] “(“ is found, Push it onto the STACK
[Link] an operator is encountered then:
1. Pop operators from the STACK to P while they
have higher or equal precedence
2. Push the operator onto the STACK
[Link] a right parenthesis “)” is encountered
then:
1. Pop operators from the STACK to P until left
parenthesis “(“ is found
2. Remove the left parenthesis, Don’t add it to P
<number>
Infix to Postfix
<number>
Infix to Postfix
<number>
Evaluate Postfix Expression
Following algorithm finds the value of an
expression P, written in Postfix notation
1. Scan P from left to right ( ) and repeat step2
and 3 for each element of P
2. If an operand is encountered, put it on STACK
3. If an operator is encountered then:
Remove the two top elements of STACK, let A is
the top element and B is next-to-top element
⊗
Evaluate B A
4. Set VALUE equal to the top element on STACK
<number>
Evaluate Postfix Expression
Let P: 5,3,4,*,16,2,2,↑,/,1,*,-,5,*,+
<number>
Evaluate Postfix Expression (cont.)
Symbol Encountered Stack
* 5,12,4 4*1 = 4
- 5,8 12-4 = 8
5 5,8,5
* 5,40 8*5 = 40
+ 45 5+40 = 45
) Value = 45
<number>
Infix to Prefix
Convert the following infix to prefix notation
Add left And Right parentheses to infix and assign
precedence to operators
<number>
Symbol Encounter Stack P
( (
5 ( 5
* (,* 5
( (, *, ( 5
1 (, *, ( 51
* (, *, (, * 51
( (, *, (, *, ( 51
2 (, *, (, *, ( 512
↑ (, *, (, *, (, ↑ 512
2 (, *, (, *, (, ↑ 5122
/ (, *, (, *, (, / 5122↑
<number>
Symbol Encountered Stack P
16 (, *, (, *, (, / 5122↑16
) (, *, (, * 5122↑16/
- (, *, (, - 5122↑16/*
4 (, *, (, - 5122↑16/*4
* (, *, (, -, * 5122↑16/*4
3 (, *, (, -, * 5122↑16/*43
) (, * 5122↑16/*43*-
+ (, + 5122↑16/*43*-*
5 (, + 5122↑16/*43*-*5
) Empty 5122↑16/*43*-*5+
<number>
Evaluate Prefix expression
[Link] the prefix expression from right to left
[Link] operand → push onto stack
[Link] operator →
pop operand1
pop operand2
apply operator i-e (top
Operand1 ⊗ Operand2 = value
⊗ next-top)
push result back onto stack
<number>
Let P : + 5 * - * 3 4 * / 16 ↑ 2 2 1 5
Scan from right to left, and Apply operator on operands from
right to left
⊗
Order( top next-top )
Symbol Encountered Stack
⊗
A B = value
5 5
1 51
2 512
2 5122
↑ 514 2↑2=4
16 51416
/ 514 16 / 4 = 4
* 54 4*1=4
4 544
<number>
Symbol ⊗
Order(top next-top)
Encountered
Stack
⊗
A B = value
3 5443
* 5412 3*4=12
- 58 12-4=8
* 40 8*5=40
5 405
+ Value=45 5+40=45
<number>