0% found this document useful (0 votes)
27 views21 pages

Unit 3 Join

The document discusses AVL trees, which are a type of balanced binary search tree where the height difference between left and right subtrees is at most one, and outlines the methods for single and double rotations to maintain balance during insertions. It also covers binary heaps, their properties, and basic operations like insertion and deletion, as well as B-trees and their advantages for reducing disk access. Lastly, it introduces hashing and hash tables, emphasizing the importance of hash functions for efficient data retrieval.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
27 views21 pages

Unit 3 Join

The document discusses AVL trees, which are a type of balanced binary search tree where the height difference between left and right subtrees is at most one, and outlines the methods for single and double rotations to maintain balance during insertions. It also covers binary heaps, their properties, and basic operations like insertion and deletion, as well as B-trees and their advantages for reducing disk access. Lastly, it introduces hashing and hash tables, emphasizing the importance of hash functions for efficient data retrieval.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd

HOLY CROSS ENGINEERING COLLEGE

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

ACADEMIC YEAR 2012-2013

EE 2204
DATA STRUCTURES AND ALGORITHMS

Prepared by

R . ROSY ANGEL

AP/CSE

Holy Cross [Link]


UNIT 3-BALANCED SEARCH TREES AND INDEXING

AVL TREES

AVL Tree->Adelson – Velskii and Landis


 An AVL tree is a binary search tree with a balance condition, except that for every node
in the tree, the height of the left and right subtrees can differ by atmost 1.
 The height of the empty tree is defined to be -1.
 A Balance Factor is the height of the left subtree minus height of the right subtree.
BF = Height of left subtree – Height of right subtree

For an AVL tree all balance factor should be +1, 0 or -1.

 If the balance factor of any node in an AVL tree becomes less than -1 or greater than 1,
the tree has to be balanced by making a simple modifications in the tree called rotation.

 An AVL tree causes imbalance, when any one of the following conditions occur, α – is
the node must be rebalanced.
1. An insertion into the left subtree of the left child of node α.
2. An insertion into the right subtree of the left child of node α.
3. An insertion into the left subtree of the right child of node α.
4. An insertion into the right subtree of the right child of node α.
 Cases 1 and 4 are mirro images symmetrics with respect to α, as are cases 2 and 3.
 The first case, in which the insertion occurs on the “outside” ([Link]-left or right-right) is
fixed by a single rotation of the tree.
 The second case, in which the insertion occurs on the “inside”([Link]-right or right-left)
is handled by the slightly more complex double rotation.
SINGLE ROTATION

Single rotation is performed to fix case 1 and case 4(left-left (or)right-right).

Case 1: An insertion into the left subtree of the left child of k2.
General Representation:

EXAMPLE:

Routine toperform single rotation with left


Staticposition SingleRotateWithLeft(Position K2)
{
Position K1;
K1=K2->Left;
K2->Left=K1->Right;
K1->Right=K2;
K2->Height=Max(Height(K2->Left),Height(K2->Right))+1;
K1->Height=Max(Height(K1->Left),K2->Height)+1;
return K1;
}

Case 4: An insertion into the right subtree of the right child of K1.
General Representation:
Routine to perform single rotation with right

staticPosition SingleRotateWithRight(Position K1)


{
Positiion K2;
K2=K1->Right;
K1->Right=K2->Left;
K2->Left=K1;
K2->Height=Max(Height(K2->Left),Height(K2->Right))+1;
K1->Height=Max(Height(K1->Left),Height(K1->Right))+1;
return K2;
}

EXAMPLE:

 Sequentially insert 3, 2, 1, 4, 5, 6 to an AVL Tree


DOUBLE ROTATION
Double Rotation is performed to fix case 2 and case 3.

Case 2: An insertion into the right subtree of the left child.

General Representation:

Place K2 as the new root and make K1 to be K2’s left child and K3 to its right child.

Routine to perform double rotation with left:

Static Position DoubleRotateWithLeft(Position K3)


{
K3->Left=SingleRotateWithRight(K3->Left);
Return SingleRotateWithLeft(K3);
}
The above double rotation can be done by using the following two single rotations.
i. Single rotation with right.
ii. Single rotation with left.

Case 3:An insertion into the left subtree of the right child of K1

General Representation:
Routine to perform double rotation with right:

Static Position DoubleRotateWithRight(Position K1)


{
K1->Right=SingleRotateWithLeft(K1->Right);
Return SingleRotateWithRight(K1);
}

EXAMPLE:
In the above example, We have inserted 3, 2, 1, 4, 5, 6, 7, 16.

We will insert 15, 14, 13, 12, 11, 10, 8, 9


Routine to declare the node for AVL tree:
Struct AvlNode;
typedef struct AvlNode *Position;
typedef struct AvlNode *AvlTree;

AvlTree MakeEmpty(AvlTree T);


Position Find(ElementType X,AvlTree T);
Position FindMin(AvlTree T);
Position FindMax(AvlTree T);
AvlTree Insert(ElementType X,AvlTree T);

struct AvlNode
{
ElementType Element;
AvlTree Left;
AvlTree Right;
int Height;
};
Function to compute height of an AVL Node:
static int Height(Position P)
{
if(P==NULL)
return -1;
else
return P->Height;
}
Routine Insertion into an AVL tree:
AvlTree Insert(ElementType X,AvlTree T)
{
if(T==NULL)
{
T=malloc(sizeof(struct AvlNode));
if(T==NULL)
FatalError(“Out of space!!!”);
else
{
T->Element=x;
T->Height=0;
T->Left=T->Right=NULL;
}
}
else if(X<T->Element)
{
T->Left=Insert(X, T->Left);
if(Height(T->Left)-Height(T->Right)==2)
if(X<T->Left->Element)
T=SingleRotateWithLeft(T);
else
T=DoubleRotateWithLeft(T);
}
else if(X>T->Element)
{
T->Right=Insert(X,T->Right);
if(Height(T->Right)-Height(T->Left)==2)
if(X<T->Right->Element)
T=SingleRotateWithRight(T);
else
T=DoubleRotateWithRight(T);
}
T->Height=Max(Height(T->Left),Height(T->right))+1;
return T;
}

BINARY HEAPS
Priority queue: It is an collection of ordered elements that provides fast access to the minimum
(or maximum) element.
The efficient way of implementing priority queue is binary heap. Binary heap is merely
referred as heaps, Heaps have two properties namely.
 Structure property.
 Heap order property.
 Structure Property:
A heap is a binary tree is completely filled, with the possible exception of the bottom
level, which is filled from left to right. Such a tree is known as complete binary tree.
A complete binary tree of height h has between 2 h and 2h+1-1 nodes. This implies that
the height of a complete binary tree is [log N], which is clearly O(log N).
For any element in array position i, the left child is in position 2i, the right child is in
the cell after the left child (2i+1), and the parent is in position [i/2].

C
B

D E
F G
FIG: A complete binary tree.
H
A B CI D J E F G H I J

0 1 2 3 4 5 6 7 8 9 10 11 12 13
FIG : Array Implementation of complete binary tree

 Heap Order Property:



The property that allows operations to be performed quickly is the heap order
property. In a heap, for every node X, the key in the parent of X is smaller than (or equal
to) the key in X, with the exception of the root (which has no parent).
FIG : Two complete trees (only the left tree is a heap).

 BASIC HEAP OPERATIONS


It is easy (both conceptually and practically) to perform the two required
operations. All the work involves ensuring that the heap order property is maintained.
 Insert
 DeleteMin

 Insert:
 To insert an element X into the heap, create a hole in the next available location,
since otherwise the tree will not be complete.
 If X can be placed in the hole without violating heap order, then we do so and are
done.
 Otherwise we slide the element that is in the hole’s parent node into the hole, thus
bubbling the hole up toward the root. We continue this process until X can be
placed in the hole.

To insert 14,

Before Insert Creating the hole and bubbling th hole up

 Now, Inserting 14 in the hole violating the heap order property, so 13 is slide
down into the hole.
 Inserting 14 into the new hole will violate the heap order property. So, 21 is slide
down into the hole.

Procedure to insert into a binary key:


void Insert(ElementType X, PriorityQueue H)
{
int i;
if(IsFull(H))
{
Error(“Priority queue is full”);
return;
}
for(i=++H->size;H->Element[i/2]>X;i/=2)
H->Element[i]=H->Elements[i/2];
H->Elements[i]=X;
}
This general strategy is known as a percolate up; the new element is percolated up the
heap until the correct location is found.
Percolation in the insert routine by performing repeated swaps until the correct order was
established.
The time to do the insertion could be O(log N).

 DeleteMin:
 Finding the minimum is easy, it is hard to move. When the minimum is removed,
a hole is created at the root.
 Since the heap now becomes one smaller, it follows that the last element X is the
heap must move somewhere in the heap.
 If X can be placed in the hole, then we are done.
 We slide the smaller of the hole’s children into the hole, thus pushing the hole
down one level.

 We repeat this step until X can be placed in the hole.


FIG : Creation of the hole at the root.
 The last value 31 cannot be placed in the hole, because this would violate heap
order.
 Therefore, we place the smaller child ‘14’ in the hole, sliding the hole down one
level.

FIG : Next two steps in DeleteMin

Function to perform DeleteMin in a binary heap:

ElementType DeleteMin(PriorityQueue H)
{
int i,Child;
ElementType MinElement, LasrElement;
if(IsEmpty(H))
{
Error(“Prioriy queue is empty”);
return H->Elements[0];
}
MinElement=H->Elements[1];
LastElement=H->Elements[H->Size—1];
for(i=1;i*2<=H->Size;i=Child)
{
Child=i*2;
if(Child!=H->Size&&H->Elements[Child+1]<H->Elements[Child])
Child++;
if(LastElement>H->Elements[Child])
H->Elements[i]=H->Elements[Child];
else
break;
}
H->Elements[i]=LastElement;
return MinElement;
}
 The worst-case running time for this operation is O(log N).
 On average, the element that is placed at the root is percolated almost to the
bottom of the heap, so the average running time is O(log N).

B – TREE
 M-ARY Search Tree-B Tree
An M-ary search tree allows M- way branching. As branching increases, depth
decreases, Whereas a complete binary tree has height that is roughly log 2N, a complete M-
ary tree has height that is roughly logmN.
EXAMPLE:
A perfect binary tree of 31 nodes has five levels.

In 5-ary tree of 31 nodes has only three levels.

The structure of M-ary search tree is:

ADVANTAGE:
 There are limitations on size of the collection that can be maintained in main memory.
 When data is maintained in the disk, it is important to reduce the number of disk access.
 Many search tree performs this task, by reducing the number of levels.
 B-Tree:
It is a balanced M-ary search tree. A B-Tree of order M is a tree with following
properties:
 The root is either a leaf or has between 2 and M children.
 All nonleaf nodes(except the root) have between [M/2] and M children.
 All leaves are at the same depth.
 All data are stored at the leaves.
Contained in each interior node are pointers P1, P2, …., Pm to the children and values k1,
k2,…. , km-1, representing the smallest key found in the subtrees P2, P3,… , Pm.
For every node, all the keys in subtree P 1 are smaller than the keys in subtree P 2, and so
on.

FIG : B-tree of order 4


A B-tree of order 4 is more popularly known as a 2-3-4 tree, and a B-tree of order 3 is
known as a 2-3 tree.
Representation of 2-3 tree:
 Interior nodes (non leaves) are represented as ellipses, with 2 pieces of data.
Dashed lines ‘----‘ indicates a node with only 2 children.
 Leaves are drawn in boxes, which contains the data in ordered way.

Insertion at 2-3 tree:


To insert the element find the position of insertion in the leaf level.

The key ‘18’ is added to leaf level without causing any violations.
Insert ‘1’:
 When 1 is inserted, the leaf node has totally 4 keys. That is 1 , 8 , 11 , 12
 This node violates the condition, since any leaf node can hold only two or three keys.
Therefore divide the node into two nodes, with 2 keys in each node.

This tree has an internal node P with four children, but only three children per node is
allowed. Therefore split the node P into two nodes, with two children.

Insert ‘28’:
Te leaf node 22 , 23 , 28 , 31 has four keys, therefore split into two node and
introduce a new internal node.

Here, the internal node 41:58 has four children, which is splitted into two children.
Here, the root has 4 internal nodes in the next level. Therefore, divide the root into two nodes,
increases the height by 1.

 We can perform deletion by finding the key to be deleted and removing it. If this key was
one of only two keys in a node, then it removal leaves only one key.
 If the sibling has three keys, we can steal one and have both nodes with two keys.
 If the sibling has only two leys, we combine the two nodes into a single node with three
keys. The parent of this node now loses a child.
 If the root loses its second child, then the root is also deleted.

HASHING
The implementation of hash tables is frequently called hashing. Hashing is a
technique used for performing insertions, deletions and finds in constant average time.
Hash Table:
 The ideal hash table data structure is merely an array of some fixed size, containing the
keys. A key is a string with an associated value.
 Each key is mapped into some number in the range 0 to TableSize -1 and placed in the
appropriate cell.
 The mapping is called a hash function, which shoul be simple to compute and should
ensure that any two distinct keys get different cells.

In this example, john hashes to 3, phil


hashes to 4, dave hashes to 6, and mary
hashes to 7.

FIG : An ideal hash table


Hash Function:
A hash function is a key to address transformation which acts upon a given key to
compute the relative position of the key in an array. The choice of hash function should be
 Simple
 Must distributes the data evenly.
A Simple hash function is:
hash key =key mod Tablesize
hash (75) = 75 mod 10 = 5.
The key value ‘75’ is placed in the relative location 5.
Routine for simple has function:
typedef unsigned int Index;
Index Hash(const char *key, int TableSize)
{
unsigned int HashVal = 0;
while(*key!=’\0’)
HashVal +=*key++;
return HashVal % TableSize;
}
The above routine is simple to implement and computes an answer quickly. If the table
size is large, the function does not distribute the keys well.
For instance, suppose that TableSize = 10,007(it is a prime number).
Suppose all the keys are eight or fewer characters long. Since a char has an integer value
that is always at most 127, the hash function can only assume values between 0 and 1,016, which
is 127 * 8. This is not equitable distribution.
Another hash function-not too good:
Index Hash(const char *Key, int TableSize)
{
return (Key[0] + 27 * Key[1] + 729 * Key[2]) % TableSize;
}

This hash function assumes that the Key has at least two characters plus the NULL
terminator. The value 27 represents the number of letters in the English alphabet, plus the blank,
and 729 is 272. This function only examines the first three characters, but if these are random and
the table size is 10,007.
English is not random.263 = 17,576 possible combination of three characters (ignoring
blanks),a check of a reasonably large on-line dictionary reveals that the number of different
combinations is actually only 2,851.
Even if none of these combination collide, only 28 percent of the table can actually be
hashed to.
A good hash function:
Index Hash(const char *Key, int TableSize)
{
unsigned int HashVal=0;
while(*key!=’\0’)
HashVal=(HashVal<<5)+ *Key++;
return HashVal % TableSize;
}

This is hash function involves all characters in the key and can generally be expected to
distribute well.
SEPARATE CHAINING/ OPEN ADDRESSING /LINEAR PROBING

collision:
When an element is inserted, it hashes to the same value as an already inserted element,
then it produces collision and need to resolve it.
collision resolution methods:
i. Separate Chaining
ii. Open Addressing.
 Separate Chaining(or) External Hashing
It is a collision resolution technique, in which to keep a list of all elements that
hash to same value. This is called as separate chaining because each hash table element is
a separate chain(Linked List).
Each linked list contains all the elements whose keys hash to the same index.

FIG : A Separate Chaining Hash Table


Type declaration for separate chaining hash table:
struct ListNode;
typedef struct Listnode *position;
struct HashTbl;
typedef struct HashTbl *HashTable;

HashTable InitializeTable(int TableSize);


Position Find(ElementType Key,HashTable H);
void Insert(ElementType key,HashTable H);

struct ListNode
{
ElementType Element;
Position Next;
};
struct HashTbl
{
int TableSize;
List *TheLists;
};
ListNode – Structure is the same as the linked list declarations.
Hashtable – Structure contains an array of linked lists,which are dynamically allocated when the
table is initialized.
The list – Pointer to a pointer to a List node structure.
Initialization routine for separate chaining hash table:
HashTable InitializeTable(int TableSize)
{
HashTable H;
int i;
if(TableSize<MinTableSize)
{
Error(“Table size too small”);
return NULL;
}
H=malloc(sizeof(struct HashTbl));
if(H==NULL)
FatalError(“Out of space!!!”);
H->TableSize= NextPrime(TableSize);
H->TheLists = malloc(sizeof(List)*H->TableSize);
if(H->heLists== NULL)
FatalError(“Out of space!!!”);
for(i=0;i<H->TableSize;i++)
{
H->TheLists[i] = malloc(sizeof(struct ListNode));
if(H->TheLists[i]==NULL)
FatalError(“Out of space!!!”);
else
H->TheLists[i]->Next=NULL;
}
return H;
}

Find:
 use the hash function to determine which list to traverse.
 Traverse the list in the normal manner.
 Return the position where the item is found.
Find Routine for separate chaining hash table:
Position Find(ElementType key, HashTable H)
{
Position P;
List L;
L=H->TheLists[Hash(key, H->TableSize)];
P=L->Next;
while(P!=NULL&&P->Element!=key)
P=P->Next;
return P;
}

Insert:
 Go to the position by the hash function for the itemx.
 Traverse the list to see if X exists already.
 If not, insert a new node at the rear of the list.
 If it is a duplicate element,an extra field is kept and placed.
Insert Rouine for separate chaininh hash table:
void Insert(ElementType key, HashTable H)
{
Position pos, NewCell;
List L;
pos=Find(key,H)
if(pos==NULL)
{
NewCell=malloc(sizeof(struct ListNode));
if(NewCell==NULL)
FatalError(“Out of space!!!’);
else
{
L=H->TheLists[Hash(key, H->TableSize)];
NewCell->Next=L->Next;
NewCell->Element=key;
L->Next=NewCell;
}
}
}
ADVANTAGE:
 More number of elements can be inserted as it uses away of linked lists.
DISADVANTAGE:
 The elements are not evenly distributed. Some elements may have more elements and
some may not have anything.
 It requires pointers. This leads to slow the algorithm down a bit because of the time
required to allocate new cells.

 Open Addressing (or)Closed Hashing:


 Open Addressing hashing is an alternative resolving collisions with linked list.
 In this system, if a collision occurs, alternative cells are tried until an empty cell is
found.
 The cells h0(x), h1(x), h2(x)….are tried in succession, where
hi(x)=(Hash(x)+F(i)) mod TableSize with F(0)=0.
 The function F is the collision resolution strategy.
There are three common collision resolution strategies. They are:
i. Linear probing.
ii. Quadratic probing.
iii. Double probing.

LINEAR PROBING

Probing -> process of getting next available hash table array cell.
In linear probing, F is a linear function of i, F(i)=i. This amounts to trying cells
sequentially in search of empty cells.

Example:
Insert the keys {89, 18, 49, 58, 69} into a hash table using the same hash function as
before and the collision resolution strategy, F(i) = i.
FIG : Open addressing hash table with linear probing, after each insertion

Initially, 89 is inserted at slot 9, 18 is inserted at slot 8. The first collision occurs when 49 is
inserted, it is put in the next available slot namely slot 0, which is open.
The key 58 collides with 18, 89 and then 49 before an empty cell is found three away.
The collision 69 is handled in a similar manner.
If the table is big enough, a free cell can always be found, but the time to do so can get
quite large.
Even if the table is relatively empty, blocks of occupied cells start forming. This effect is
known as primary clustering means that any key hashes into the cluster will require several
attempts to resolve the collision and then it will add to the cluster.

You might also like