Overview of Data Structures and Types
Overview of Data Structures and Types
A data structure is basically a group of data elements that are put together under one name,
and which defines a particular way of storing and organizing data in a computer so that it can be
used efficiently.
The term data means a value or set of values. It specifies either the value of a variable or a
constant (e.g., marks of students, name of an employee, address of a customer, value of pi, etc.).
Data structures are used in almost every program or software system. Some common examples
of data structures are arrays, linked lists, queues, stacks, binary trees, and hash tables. Data
structures are widely applied in the following areas:
Compiler design
Operating system
Statistical analysis package
DBMS
Numerical analysis
Simulation
Artificial intelligence
Graphics
Data structures are generally categorized into two classes: primitive and non-primitive data
structures.
The data items are arranged in The data items are arranged in non-
sequential order, one after the other. sequential order (hierarchical manner).
All the items are present on the single The data items are present at different
layer. layers.
This section discusses the different operations that can be performed on the various data
structures previously mentioned.
Traversing: It means to access each data item exactly once so that it can be processed. For
example, to print the names of all the students in a class.
Searching: It is used to find the location of one or more data items that satisfy the given
constraint. Such a data item may or may not be present in the given collection of data items.
Inserting: It is used to add new data items to the given list of data items. For example, to add
the details of a new student who has recently joined the course.
Deleting: It means to remove (delete) a particular data item from the given collection of data
items. For example, to delete the name of a student who has left the course.
Sorting: Data items can be arranged in some order like ascending order or descending order
depending on the type of application. For example, arranging the names of students in a class in
an alphabetical order, or calculating the top three winners by arranging the participants‟ scores in
descending order and then extracting the top three.
Merging: Lists of two sorted data items can be combined to form a single list of sorted data
items.
Efficient Insertion and Deletion: Linear data structures such as arrays and linked lists
provide efficient insertion and deletion operations. Arrays offer constant time complexity
for accessing elements by index, while linked lists allow for quick insertion and deletion
at the beginning or end of the list.
Easy Implementation: Linear data structures are relatively simple to implement and
understand. Arrays, for instance, provide a straightforward way of storing elements
sequentially, while linked lists use pointers to connect nodes.
Flexibility: Linear data structures allow dynamic resizing, enabling them to
accommodate varying amounts of data. Linked lists, in particular, can easily grow or
shrink by adding or removing nodes as needed.
Sequential Access: Linear data structures facilitate sequential access to elements. Arrays,
for instance, allow for efficient traversal using loops, making them suitable for tasks that
involve processing data in a linear manner.
Disadvantages:
Fixed Size (Arrays): Arrays have a fixed size determined during initialization, which
limits their flexibility. If the size is insufficient to store additional elements, the entire
array needs to be resized, leading to potential performance overhead.
Inefficient Search: Linear data structures can have inefficient search operations. Arrays
and linked lists require traversing elements one by one until the desired item is found.
This linear search has a time complexity of O(n), which can be inefficient for large
datasets.
Memory Overhead (Linked Lists): Linked lists require additional memory to store
pointers, increasing the overall memory usage compared to arrays. This overhead can
impact the efficiency and space requirements of the data structure.
Abstract data types are the entities that are definitions of data and operations but do not have
implementation details. In this case, we know the data that we are storing and the operations that
can be performed on the data, but we don't know about the implementation details. The reason
for not having implementation details is that every programming language has a different
implementation strategy for example; a C data structure is implemented using structures while a
C++ data structure is implemented using objects and classes.
Abstraction: It is a technique of hiding the internal details from the user and only showing the
necessary details to the user.
Encapsulation: It is a technique of combining the data and the member function in a single unit
is known as encapsulation.
In this model, first encapsulation is performed, i.e., all the data is wrapped in a single unit, i.e.,
ADT. Then, the abstraction is performed means showing the operations that can be performed on
the data structure and what are the data structures that we are using in a program.
Advantages:
Encapsulation: ADTs provide a way to encapsulate data and operations into a single unit,
making it easier to manage and modify the data structure.
Abstraction: ADTs allow users to work with data structures without having to know the
implementation details, which can simplify programming and reduce errors.
Data Structure Independence: ADTs can be implemented using different data structures,
which can make it easier to adapt to changing needs and requirements.
Information Hiding: ADTs can protect the integrity of data by controlling access and
preventing unauthorized modifications.
Modularity: ADTs can be combined with other ADTs to form more complex data
structures, which can increase flexibility and modularity in programming.
Stack ADT
In Stack ADT Implementation instead of data being stored in each node, the pointer to data
is stored.
The program allocates memory for the data and address is passed to the stack ADT.
The head node and the data nodes are encapsulated in the ADT. The calling function can
only see the pointer to the stack.
The stack head structure also contains a pointer to top and count of number of entries
currently in stack.
push() – Insert an element at one end of the stack called top.
pop() – Remove and return the element at the top of the stack, if it is not empty.
peek() – Return the element at the top of the stack without removing it, if the stack is not
empty.
size() – Return the number of elements in the stack.
isEmpty() – Return true if the stack is empty, otherwise return false.
isFull() – Return true if the stack is full, otherwise return false.
Queue ADT
The queue abstract data type (ADT) follows the basic design of the stack abstract data type.
Each node contains a void pointer to the data and the link pointer to the next element in the
queue. The program‟s responsibility is to allocate memory for storing the data.
enqueue() – Insert an element at the end of the queue.
dequeue() – Remove and return the first element of the queue, if the queue is not empty.
peek() – Return the element of the queue without removing it, if the queue is not empty.
size() – Return the number of elements in the queue.
isEmpty() – Return true if the queue is empty, otherwise return false.
isFull() – Return true if the queue is full, otherwise return false.
ALGORITHM
An algorithm is a set of instructions designed to perform a specific task. It takes a set of input(s)
and produces the desired output. When designing algorithms, it is important to consider not only
the correctness of the algorithm, but also its efficiency. Two key measures of efficiency are time
complexity and space complexity.
Space Complexity
Space complexity of an algorithm represents the amount of memory space needed the
algorithm in its life cycle. Space needed by an algorithm is equal to the sum of the following two
components
A fixed part that is a space required to store certain data and variables (i.e. simple variables and
constants, program size etc.), that are not dependent of the size of the problem.
A variable part is a space required by variables, whose size is totally dependent on the size of the
problem. For example, recursion stack space, dynamic memory allocation etc.
Space complexity S(p) of any algorithm p is S(p) = c + Sp(I) Where c is treated as the fixed part
and S(I) is treated as the variable part of the algorithm which depends on instance characteristic
I.
Example:-
1. Algorithm
abc(x,y,z)
{
return x*y*z+(x-y)
}
Here we have three variables x, y and z. x,y and z each variable requires one unit of memory.
S(p) = c + sp
S(p) = 3 + 0
S(p)=3
2. Algoritham sum(x,n)
{
Total:=0
For i←1 to n do
Total:=total + x[i]
}
S(p)=c+sp
Here c = x,n,total (each one requires one unit of memory)=3
Sp=x[i] and its depends on n
S(p)=3+n
1. We introduce a variable, count into the program statement to increment count with initial
value [Link] to increment count by the appropriate amount are introduced into the program.
This is done so that each time a statement in the original program is executes count is
incremented by the step count of that statement.
Example:-
Algorithm sum(a,n)
{
S: = 0.0
For i: = 1 to n do
{
S: = S + a[i]
}
return S
}
}
Count : = count+1; // for last exexution
return S;
Count : = count+1;
}
If the count is zero to start with, then it will be 2n+3 on termination. So each invocation of sum
executes a total of 2n+3 steps.
1. Algorithm Sum(a,n) 0 - 0
2.{ 0 - 0
3. s:=0.0; 1 1 1
4. for i:=1 to n do 1 n+1 n+1
5. s:=s+a[i]; 1 n n
6. return s; 1 1 1
7.} 0 - 0
Total 2n+3
Asymptotic notations are used to represent the complexities of algorithms for asymptotic
analysis. These notations are mathematical tools to represent the complexities. There are three
notations that are commonly used.
Big O Notation (O): It represents the upper bound of the runtime of an algorithm. Big O
Notation's role is to calculate the longest time an algorithm can take for its execution, i.e., it is
used for calculating the worst-case time complexity of an algorithm.
Let g and f be functions from the set of natural numbers to itself. The function f is said to be
O(g) (read big-oh of g), if there is a constant c > 0 and a natural number n0 such that f(n) ≤
cg(n) for all n ≥ n0 .
Omega Notation (Ω(n)): It represents the lower bound of the runtime of an algorithm. It is used
for calculating the best time an algorithm can take to complete its execution, i.e., it is used for
measuring the best case time complexity of an algorithm.
We write f(n) = Ω(g(n)), If there are positive constants n0 and c such that, to the right of n 0 the
f(n) always lies on or above c*g(n).
Ω(g(n)) = { f(n) : There exist positive constant c and n0 such that 0 ≤ c g(n) ≤ f(n), for all n ≤ n0}
We write f(n) = Θ(g(n)), If there are positive constants n0 and c1 and c2 such that, to the right of
n0 the f(n) always lies between c1*g(n) and c2*g(n) inclusive.
Θ(g(n)) = {f(n) : There exist positive constant c1, c2 and n0 such that 0 ≤ c1 g(n) ≤ f(n) ≤ c2 g(n),
for all n ≥ n0}
SEARCHING TECHNIQUES
Searching means to find whether a particular value is present in an array or not. If the value is
present in the array, then searching is said to be successful and the searching process gives the
location of that value in the array. However, if the value is not present in the array, the searching
process displays an appropriate message and in this case searching is said to be unsuccessful.
Linear search, also called as sequential search, is a very simple method used for searching an
array for a particular value. It works by comparing the value to be searched with every element
of the array one by one in a sequence until a match is found. Linear search is mostly used to
search an unordered list of elements (array in which data elements are not sorted).
Example:-
The following steps are followed to search for an element k = 1 in the list below.
LINEAR_SEARCH(A, N, VAL)
Step 1: [INITIALIZE] SET POS = -1
Step 2: [INITIALIZE] SET I = 1
Step 3: Repeat Step 4 while I<=N
Step 4: IF A[I] = VAL
SET POS = I
PRINT POS
Go to Step 6
[END OF IF]
SET I = I + 1
[END OF LOOP]
Step 5: IF POS = –1
PRINT VALUE IS NOT PRESENT IN THE ARRAY
[END OF IF]
Step 6: EXIT
Binary Search
Binary search is a searching algorithm that works efficiently with a sorted list. This search
algorithm works on the principle of divide and conquers. Binary search looks for a particular
item by comparing the middle most item of the collection. If a match occurs, then the index of
item is returned. If the middle item is greater than the item, then the item is searched in the sub-
array to the left of the middle item. Otherwise, the item is searched for in the sub-array to the
For a binary search to work, it is mandatory for the target array to be sorted. We shall learn the
process of binary search with a pictorial example. The following is our sorted array and let us
assume that we need to search the location of value 31 using binary search.
Now we compare the value stored at location 4, with the value being searched, i.e. 31. We find
that the value at location 4 is 27, which is not a match. As the value is greater than 27 and we
have a sorted array, so we also know that the target value must be in the upper portion of the
array.
We change our low to mid + 1 and find the new mid value again.
low = mid + 1
mid = low + (high - low) / 2
Our new mid is 7 now. We compare the value stored at location 7 with our target value 31.
We compare the value stored at location 5 with our target value. We find that it is a match.
SORTING
Sorting means arranging the elements of an array so that they are placed in some relevant order
which may be either ascending or descending. That is, if A is an array, then the elements of A are
arranged in a sorted order (ascending order) in such a way that A[0] < A[1] < A[2] < ..... < A[N].
For example, if we have an array that is declared and initialized as
int A[] = {21, 34, 11, 9, 1, 0, 22};
Then the sorted array (ascending order) can be given as:
A[] = {0, 1, 9, 11, 21, 22, 34};
BUBBLE SORT
Bubble sort is a very simple method that sorts the array elements by repeatedly moving the
largest element to the highest index position of the array segment (in case of arranging elements
in ascending order). In bubble sorting, consecutive adjacent pairs of elements in the array are
compared with each other. If the element at the lower index is greater than the element at the
higher index, the two elements are interchanged so that the element is placed before the bigger
one. This process will continue till the list of unsorted elements exhausts.
This procedure of sorting is called bubble sorting because elements „bubble‟ to the top of the list.
Note that at the end of the first pass, the largest element in the list will be placed at its proper
position (i.e., at the end of the list).
Technique
Example
To discuss bubble sort in detail, let us consider an array A[] that has the following elements:
A[] = {30, 52, 29, 87, 63, 27, 19, 54}
Pass 1:
(a) Compare 30 and 52. Since 30 < 52, no swapping is done.
(b) Compare 52 and 29. Since 52 > 29, swapping is done.
30, 29, 52, 87, 63, 27, 19, 54
(c) Compare 52 and 87. Since 52 < 87, no swapping is done.
(d) Compare 87 and 63. Since 87 > 63, swapping is done.
30, 29, 52, 63, 87, 27, 19, 54
(e) Compare 87 and 27. Since 87 > 27, swapping is done.
30, 29, 52, 63, 27, 87, 19, 54
(f) Compare 87 and 19. Since 87 > 19, swapping is done.
30, 29, 52, 63, 27, 19, 87, 54
(g) Compare 87 and 54. Since 87 > 54, swapping is done.
30, 29, 52, 63, 27, 19, 54, 87
Observe that after the end of the first pass, the largest element is placed at the highest index of
the array. All the other elements are still unsorted.
Pass 2:
(a) Compare 30 and 29. Since 30 > 29, swapping is done.
29, 30, 52, 63, 27, 19, 54, 87
(b) Compare 30 and 52. Since 30 < 52, no swapping is done.
(c) Compare 52 and 63. Since 52 < 63, no swapping is done.
(d) Compare 63 and 27. Since 63 > 27, swapping is done.
29, 30, 52, 27, 63, 19, 54, 87
(e) Compare 63 and 19. Since 63 > 19, swapping is done.
29, 30, 52, 27, 19, 63, 54, 87
(f) Compare 63 and 54. Since 63 > 54, swapping is done.
29, 30, 52, 27, 19, 54, 63, 87
Observe that after the end of the second pass, the second largest element is placed at the second
highest index of the array. All the other elements are still unsorted.
Pass 4:
(a) Compare 29 and 30. Since 29 < 30, no swapping is done.
(b) Compare 30 and 27. Since 30 > 27, swapping is done.
29, 27, 30, 19, 52, 54, 63, 87
(c) Compare 30 and 19. Since 30 > 19, swapping is done.
29, 27, 19, 30, 52, 54, 63, 87
(d) Compare 30 and 52. Since 30 < 52, no swapping is done.
Observe that after the end of the fourth pass, the fourth largest element is placed at the fourth
highest index of the array. All the other elements are still unsorted.
Pass 5:
(a) Compare 29 and 27. Since 29 > 27, swapping is done.
27, 29, 19, 30, 52, 54, 63, 87
(b) Compare 29 and 19. Since 29 > 19, swapping is done.
27, 19, 29, 30, 52, 54, 63, 87
(c) Compare 29 and 30. Since 29 < 30, no swapping is done.
Observe that after the end of the fifth pass, the fifth largest element is placed at the fifth highest
index of the array. All the other elements are still unsorted.
Pass 6:
(a) Compare 27 and 19. Since 27 > 19, swapping is done.
19, 27, 29, 30, 52, 54, 63, 87
(b) Compare 27 and 29. Since 27 < 29, no swapping is done.
Observe that after the end of the sixth pass, the sixth largest element is placed at the sixth largest
index of the array. All the other elements are still unsorted.
Observe that the entire list is sorted now. The final list is : 19, 27, 29, 30, 52, 54, 63, 87
BUBBLE_SORT (A, N)
Step 1: Repeat Step 2 For I =0 to N - 1
Step 2: Repeat For J =0 to N – I - 1
Step 3: IF A [J] > A [J + 1]
SWAP A [J] and A [J+1]
[END OF INNER LOOP]
[END OF OUTER LOOP]
Step 4: EXIT
SELECTION SORT
Selection sort is basically a selection of an element position from the start with the other rest of
the elements. Elements are compared and exchanged depending on the condition and then
selection position is shifted to the next position till it reaches to the end. This sorting algorithm is
an in-place comparison-based algorithm.
Example :-
Insertion sort is a very simple sorting algorithm in which the sorted array (or list) is built one
element at a time. We all are familiar with this technique of sorting, as we usually use it for
ordering a deck of cards while playing bridge.
The main idea behind insertion sort is that it inserts each item into its proper place in the final
list. To save memory, most implementations of the insertion sort algorithm work by moving the
current data element past the already sorted values and repeatedly interchanging it with the
preceding value until it is in its correct place.
Technique
Insertion sort works as follows:
The array of values to be sorted is divided into two sets. One that stores sorted values
and another that contains unsorted values.
The sorting algorithm will proceed until there are elements in the unsorted set.
Suppose there are n elements in the array. Initially, the element with index 0 (assuming
LB = is in the sorted set. Rest of the elements is in the unsorted set. The first element of
the unsorted partition has array index 1 (if LB = 0).
During each iteration of the algorithm, the first element in the unsorted set is picked up
and inserted into the correct position in the sorted set.
Example
Consider an array of integers given below. We will sort the values in the array using insertion
sort.
15 20 10 30 50 18 5 45
Let us consider the array of values to be sorted is divided into two sets. One that stores sorted
values and another that contains unsorted values. Assume that initially sorted portion of the list is
empty.
Move the first element of unsorted list 15 from unsorted list to sorted list, then
To move 10 from unsorted list to sorted list first compare 10 with 20 and it is smaller so swap
these two elements. Again compare 10 with 15 and it is smaller so swap these two elements.
To move element 30 from unsorted to sorted portion, compare 30 with 20, 15, 10 and it is larger
than all. So 30 are directly inserted into sorted list.
To move element 50 from unsorted to sorted portion, compare 50 with 30,20,15,10 and it is
larger than all. So 50 are directly inserted into sorted list.
To move element 5 from unsorted to sorted portion, compare 5 with 50, 30, 20, 18, 15, and 10.
Since 5 is smaller than 50, 30, 20, 18, 15 10. So 10, 15, 18, 20, 30, 50 move one position right
and 5 is inserted before 10.
To move element 45 from unsorted to sorted portion, compare 45 with 50, 30, 20, 15, 10, and 5.
Since 45 is smaller than 50 and greater than 5, 10, 15, 18, 20, 30. So 50 move one position right
and 45 is inserted after 30.