Calicut University Data Structures Syllabus
Calicut University Data Structures Syllabus
Hashing techniques play a vital role in data search by providing efficient access to data through the use of a hash function that converts keys into hash codes, which direct the placement of data in a hash table. Open hashing, also known as separate chaining, handles collisions by maintaining a list of all elements that hash to the same location. In contrast, closed hashing, or open addressing, resolves collisions by placing new elements in the next available spot within the hash table. Thus, open hashing makes use of additional data structures (like linked lists), while closed hashing manages collisions within the table itself, utilizing techniques like linear probing or double hashing to find available positions .
External sorting is crucial for sorting large datasets that exceed main memory size. Two-way merge sort divides the dataset into smaller segments that fit into memory, sorts each, and then sequentially merges them back together using two input and one output buffer, efficiently reducing disk I/O operations. Polyphase merge sort optimizes this process by utilizing fixed-length runs for distribution, ensuring that the output buffer does not become idle, resulting in efficient merging cycles. Both effectively manage data transfer between slower disk storage and faster main memory, allowing for speed despite large data size limitations .
AVL trees and B-trees are both self-balancing binary search trees but differ in usage and structure. AVL trees maintain height balance by performing rotations to ensure the height difference between left and right subtrees at any node does not exceed one, supporting efficient worst-case time complexity for search operations, suitable for in-memory data structures in DBMS. B-trees, however, extend more than two children per node (multi-way) and are designed for efficient disk reads and writes by minimizing node access and maximizing data in each node, making them ideal for databases needing fast access to large blocks of data on secondary storage .
Binary tree traversals include pre-order, in-order, and post-order traversals, which differ in the order nodes are processed. Pre-order traversal processes the root first, then recursively traverses the left and right subtrees, making it suitable for copying or prefix notation. In-order traversal visits the left subtree first, processes the root, and then traverses the right, often used for retrieving sorted data from a BST. Post-order traversal processes both subtrees before the root, which is helpful in deleting a tree or postfix notation operation .
Depth-first search (DFS) and breadth-first search (BFS) process graphs differently by the order in which nodes are explored. DFS explores as far down a branch as possible before backtracking, using a stack data structure, which makes it suitable for pathfinding and situations where we need to visit all nodes in a "deep" manner (e.g., solving mazes). BFS, on the other hand, explores nodes level by level, using a queue data structure, making it ideal for finding the shortest path in an unweighted graph, such as the shortest distance from one point to another in a social network .
The quick sort algorithm differs from merge sort primarily in method and efficiency of sorting. Quick sort is a divide-and-conquer algorithm that selects a 'pivot' and partitions the array into sub-arrays, which are then sorted independently, making it efficient for in-place sorting with an average-case time complexity of O(n log n). It's preferred when space is a concern and average performance is adequate. Merge sort, on the other hand, divides the array into halves, recursively sorts them, and merges the results, offering stable sort with a consistent time complexity of O(n log n). It is favored in contexts where stability is required or when dealing with linked lists .
Dijkstra's algorithm finds the shortest path from a single source node to all other nodes in a graph by iteratively selecting the node with the smallest tentative distance and updating its neighbors, ideal for weighted graphs where negative weights do not exist. Kruskal's algorithm finds a minimum spanning tree by sorting all edges and choosing the smallest edge that does not form a cycle, which is especially suited to sparse graphs or when the network is established from independent edges .
Linked lists differ from arrays primarily in terms of memory allocation and structure. Arrays require a contiguous block of memory and are static in size, meaning the size must be determined at the time of declaration and cannot be changed during runtime. This can lead to inefficient use of memory if the array is under-utilized or memory overflow issues if it’s over-utilized. Linked lists, on the other hand, consist of nodes where each node contains data and a reference to the next node, allowing for dynamic memory allocation. This means the list can grow and shrink in size as needed, utilizing only the necessary amount of memory, and can easily accommodate the insertion and deletion of elements without reorganizing the entire structure .
The primary advantage of using a Binary Search Tree (BST) over a linked list for searching data is the average time complexity. In a linked list, searching for an element requires traversing through all elements, resulting in O(n) time complexity. In a well-balanced BST, the average time complexity for search operations is O(log n), making it significantly faster for large datasets. The hierarchical structure of BSTs allows them to efficiently narrow down the potential location of a search target by eliminating half of the nodes from consideration at each step .
Asymptotic notation is crucial for analyzing algorithm efficiency by providing a mathematical way to describe the maximum, minimum, or average resource usage (time and space) as a function of input size. It abstracts actual execution details while offering insight into scalability and performance. Different types of asymptotic notations include Big O (O), which characterizes the upper bound of an algorithm's runtime, providing a worst-case scenario analysis; Big Omega (Ω), offering an asymptotic lower bound indicating best-case scenario analysis; and Big Theta (Θ), providing a tight bound that indicates both the upper and lower bounds of an algorithm .