0% found this document useful (0 votes)
8 views21 pages

Data Structures and Algorithms Overview

The document provides an overview of various data structures, including arrays, lists, linked lists, stacks, queues, trees, heaps, and hash tables, along with their definitions and characteristics. It also covers algorithms, their properties, and how they can be expressed, as well as specific types of trees like binary trees and AVL trees. Additionally, it discusses priority queues, hash maps, and graph data structures, emphasizing their applications and operational details.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
8 views21 pages

Data Structures and Algorithms Overview

The document provides an overview of various data structures, including arrays, lists, linked lists, stacks, queues, trees, heaps, and hash tables, along with their definitions and characteristics. It also covers algorithms, their properties, and how they can be expressed, as well as specific types of trees like binary trees and AVL trees. Additionally, it discusses priority queues, hash maps, and graph data structures, emphasizing their applications and operational details.
Copyright
© All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

DATA STRUCTURE REVIEWER

CPE 2101

PRELIMS:
●​ DATA STRUCTURES
- A way of organizing data in a computer. To arrange data in memory location to
represent values of carrier set of an abstract data type
- Ability of computer to fetch and store data at any place which specified by a
POINTER - bit string memory address that can be stored and manipulated.

EXAMPLE OF DATA STRUCTURES


●​ ARRAY
- Fixed- length, ordered collection of values of the same type
- Consists a collection of elements, each identified by at least array index or key
- can be considered as the simplest type of data structure

●​ LIST
- Sequence of values, where the same value may occur more than once
- TWO TYPES OF LIST​
INSTANCE - mathematical concept of a finite sequence
STREAM - potentially infinite analog
- Basic example of containers (contain other values)
-

●​ LINKED LIST
- Chain of nodes where each contains DATA and POINTER to the next node
- each node is composed of data and reference (a link) to the next node
- list elements can be inserted or removed without reallocation of reorganization of entire
structure
-
●​ STACK
- Principal operations are the addition of an entity (Push) and removal of an entity (POP)
- is a LAST-IN-FIRST-OUT (LIFO):Last element added to the structure must be the first
one removed
- PUSH/POP only occur at the end of the structure(Referred as TOP OF THE STACK)
- PEEK: Returning value of top element without removing it
-

●​ QUEUE
- Entities in the collection are kept in order
- Principle operations are the addition of entities to the rear terminal position and removal
of entities from front terminal position
- FIRST-IN-FIRST-OUT where the first element added will be the first one to be removed
-

●​ HASHING
- Method for storing and retrieving records from database
- Allows one to insert, delete, and search for records based on a search key value
- HASH TABLE: performing a computation on a search key K to identify the position that
contains the record with the key K
- HASH FUNCTION: In which the calculations are done to satisfy the need of the
address calculations
- SLOT: Positions in the hash table that is denoted by M
-
●​ TREES
- Data structure made up of NODES or vertices and edges without having any cycles.
- Tree without nodes is called NULL or empty tree
- Tree consists of ROOT NODE and many levels of node in a form of a HIEARCHY
-

ABSTRACT DATA TYPES


Mathematical model for certain classes of data structure. Defined only by operations
performed on it and by mathematical pre-conditions and constraints on the effects
ALGORITHMS
-​ Finite sequence of steps for accomplishing a computational task.
-​ Steps must be simple and definite enough to be done by a computer, and must
terminate after a finite number of steps.
-​ Considered as computational procedure that consist of a set of instructions in which
INPUT takes some or set of values and OUTPUT which produces

CHARACTERISTICS OF AN ALGORITHM

• Each step of an algorithm must be exact. An algorithm must be precise and unambiguously
described. This eliminates any uncertainty.

• Algorithms must terminate. Since the ultimate aim of an algorithm is to solve a problem, it
must terminate; otherwise, there will not be a solution to the problem. This leads to the fact that
an algorithm must have a finite number of steps in its execution. The presence of endless loops
must be avoided.

• It must be effective. An algorithm must provide the correct answers at all times.

• It must be general. An algorithm must solve every instance of a problem.

• It must be unique. Results of each step are uniquely defined and only depend on the input
and the result of the preceding steps.

• Finiteness. The algorithm stops after a finite number of instructions are executed.

• Output. The algorithm always produces output.

Algorithms can be expressed in the following ways:


●​ Human Language - plain, natural language
●​ Pseudocode - Informal high-level description of the operating principle of a computer
program. Intended for human reading
●​ Flowchart - type of diagram that represents and algorithm, workflow, or process
SYMBOLS IN FLOWCHART

LINEAR DATA STRUCTUIRE


- Elements are organized sequentially, where each element is positioned in a linear order

STATIC ARRAY
-​ Data structure with a fixed size. Once created, it cannot be changed or altered.

Key features:
o Fixed Size: The size of the array is set during the declaration and cannot be modified.
o Memory Allocation: The memory is located on the stack, which is faster but has a limited
size.
o Access Time: The elements can be accessed in constant time since the address of the
elements can be calculated using the base address and the index.
o Usage: It is ideal for scenarios where the number of elements is known beforehand, such as
storing fixed-sized data like days of the week and months of the year.
DYNAMIC ARRAY,
-​ or resizable arrays, can change size during runtime.

key features:
o Resizable: The array can be resized, allowing it to grow or shrink based on the requirements
o Memory Allocation: The memory is allocated on the heap, which is more flexible but can be
slower due to the overhead of dynamic memory management.
o Access Time: Similar to static arrays, elements can be accessed in constant time after they
are created.
o Usage: It is useful when the number of elements is not known in advance or when the array
needs to expand during program execution.

LINKED LIST
Syntax to define a linked list
-​ LinkedList linkedListName = new LinkedList();

SINGLY LINKED LIST


-​ Consists of nodes where each contains a DATA FIELD and a REFERENCE to the next
node in the linked list
-​ Next of the last node is NULL, indicating the end of the list.

DOUBLY LINKED LIST


-​ More complex. Allows for efficient traversal of list in BOTH DIRECTIONS
-​ Each node contains a pointer to the previous and the next node

LINKED LIST OPERATIONS:


●​ Traversal - accesses each element of the linked list.
-​ Visiting each node one by one and done using WHILE LOOP
●​ Insertion - adds a new element to the linked list
●​ Deletion - removes the existing elements.
●​ Search - finds a node in the linked list.
●​ Sort - sorts the nodes of the linked list.
-​ Arranging values in ascending/descending order
MIDTERMS:
BINARY TREE
-​ Allows data to be organized hiearchically
-​ The topmost node is the ROOT NODE. and END/LEAF NODES have no children
-​ Can be used in databases, such as organizing files into folders

TREE TRAVERSAL
-​ Process of visiting every node exactly once in particular order
-​ Allows searching, printing, and manipulating tree data

●​ Breadth-First Traversal (BFT)


-​ Also known as LEVEL ORDER TRAVERSAL
-​ Visiting nodes from the root downwards in horizontal before descending
●​ Depth-First Traversal (DFT)
-​ Starting from the bottom, exploring deeply before backtracking
●​ Inorder Traversal: Left subtree → Root → Right subtree
●​ Preorder Traversal: Root → Right subtree → Left subtree
●​ Postorder Traversal: Left subtree → Right subtree → Root
​ ​ ​ ​

BINARY SEARCH TREE (BST)


-​ Used for retrieving, sorting, and searching data. It is arranged in a specific order
-​ Also called ORDERED BINARY TREE
-​ Both right and left subtree must also be BST.
-​ RIght subtree must be larger than parent and left subtree must be the lowest
-​ No Duplication

AVL TREES
-​ ADELSON-VELSKII TREE is a self-balancing BST in which each node maintains extra
information called a balance factor (-1,0,+1)
ROTATIONS
-​ Applied to restore balance when factors go beyond -1 or +1
APPLICATION OF TREES (kayo na mag review dito putangina)
●​ HIERARCHICAL DATA STRUCTURE
-​ FIle system, Document object model (HTML/XML)
●​ EFFICIENT SEARCHING AND SORTING
-​ BST, AVL, Red-Black Trees
●​ DATABASE AND FILE STORAGE SYSTEMS
-​ B- Trees/B+ trees, MySQL, Oracle
●​ AI AND GAME DEVELOPMENT
-​ NPC, Machine Learning, Chess
●​ COMPILER DESIGN
-​ Abstract Syntax Tree, Parse Tree
●​ CRYPTOGRAPHY AND SECURITY
-​ Merkle Trees, Bitcoin/Ethereum
●​ NETWORKING
-​ Fast IP Lookups, Kruskal, Prim’s algorithm

HEAP DATA STRUCTURE


-​ Binary tree which every node is larger than either child
-​ Means that largers or smallest value is always at the top
-​ Widely used in computing for efficient management of priority-based operations, optimal
selections, and real-time processing

HEAP ORDER PROPERTY


●​ MAX-HEAP
-​ Parent node is always greater or Equal to its children
Used in the following:
-​ HEAPSORT: Comparison-based sorting algorithm to extract the
maximum/minimum element and place it at the end of array
-​ PRIORITY QUEUE: Element with the highest priority is accessed first
-​ TOP-K Tracking: Maintaining K Largest/Lowest from continues stream
without sorting the entire input
●​ MIN-HEAP
-​ Parent node is always less than or equal to its children
Used in the following:
-​ Dijikstra’s Algorithm: use min-heap to extract the next closest vertex.
Use in GPS routing, networking, and AI Pathfinding.
-​ Priority QUEUE: Min-Heap are Backbone of Priority queue. Task with
lowest cost/priority are processed first
-​ Event-Driven Simulation: Earlier event is always at the top

HEAP OPERATIONS
o Insert (Push) – adds a new element to the heap by adding it at the bottom.
o Extract Root (Pop) – removes and returns the root element with the last element.
o Peek (Get Root) – returns the root without removing it.
o Replace – removes the root and inserts a new element in a single operation.

HEAPIFY - process of converting arbitrary array into a valid heap

●​ Up-Heapify
-​ Used after inserting a new element to restore the heap property
Steps:
1. Insert the new element at the end/bottom of the heap.
2. Compare the inserted element with its parent.
3. If the heap property is violated (the new element is greater than the parent in a
max-heap or smaller than the parent in a min-heap), swap the new element with
its parent.
4. Repeat this process until the heap property is restored or the element
becomes the root.

●​ Down-Heapify
-​ Used after deleting the root element to restore heapify property
Steps:
1. Replace the root element with the last element in the heap.
2. Remove the last element from the heap.
3. Compare the new root with its children.
4. If the heap property is violated (the new root is smaller than a child in a
max-heap or larger than a child in a min-heap), swap the new root with the larger
(in max-heap) or smaller (in min-heap) child. 5. Repeat this process until the
heap property is restored or the element reaches a leaf node.
PRIORITY QUEUE
-​ Operates like a regular queue, except each element has a priority level, and are
dequeued based on their priority
-​ Ensures that elements with higher priority are processed first

IMPLEMENTING INSERT
-​ Add new value at end of array. if the parent is smaller, swap the value until it is in order

IMPLEMENTING REMOVEMAX
-​ Replace value of the root with the value at the end of the array(rightmost leaf). From
down the tree, swap the value to restore order property. Swap current node to a larger
child

PREFINALS
HASH TABLE/HASH MAP
-​ Maps KEYS to VALUES using HASH FUNCTION
1.​ Key-Value Pair:
Key: "name", Value: "Kimmy"
Key: "age", Value: 32
Key: "city", Value: "Tacloban"
2.​ Hash Function
-​ Generates integers from the key. Mathematical function that takes input and produces
numerical output called hash, hash code, hash value.
3.​ Storage in buckets
-​ Determines the bucket where the key-value pair will be stored (Array slot)
4.​ Retrieval
-​ Used to locate the bucket using key and retrieved the value
Operations used:
• Insertion: put(key, value) – adds a new key-value pair to the hash table.
• Retrieval: get(key) – returns the value associated with the key.
• Checking: containsKey – checks if the given key exists in the hash table.
• Deletion: remove(key) – removes the key-value pair associated with the specified key.
TABLE DESIGN
-​ How internal table is structured ,sized, and organized
COMPONENTS:
●​ Initial size
-​ If M is too small = collisions
-​ If M is too large = memory waste
●​ Prime Numbers
-​ Prime numbers reduced clustering
●​ Resizing
-​ Dynamically resize when load factor exceeds a threshold

BUCKET STRUCTURE
-​ Each position in table is called Bucket
1.​ Chaining
-​ Each bucket contains a linked list of key-value pairs
-​ Multiple keys with same index are stored in list
-​ Lookup time increases with length
2.​ Open Addressing
-​ Each bucket holds 1 entry
-​ If collision occurs, probing is used to find an empty bucket
▪ Linear Probing: Check the next slot sequentially.
▪ Quadratic Probing: Check slots using quadratic steps.
▪ Double Hashing: Use a second hash function to compute probe steps.
3.​ Rehashing
-​ Creates new table
-​ Recompute hash for each key and inserts all key into the new table

LOAD FACTOR MANAGEMENT


-​ Measures how “full” a hash table is

COLLISIONS - when 2 different keys produce the same index in the table
SET DATA STRUCTURE
-​ Stores a collection of unique elements. Does not allow duplication

TYPES:
●​ Unordered Set
-​ Insertion is always randomized. Takes constant time which can go up to liner time
in worst case
●​ Ordere Set
-​ Most common set. Implemented using balanced BST’s
Can be implemented in various ways:
1.​ Hash-based set:
-​ Represented as a HASH TABLE where each element is stored in
bucket
2.​ Tree-based set
-​ Represented as a Binary search tree where each node in tree
represents an element
-​ Maintains element in ascending order and guarantees a
predictable sorted order


MAP DATA STRUCTURES
-​ Stores information as key-value pairs, where each key is unique and maps to exactly
one value
KEY FEATURES:
• Key-Value Pairing: Each element in the map consists of a unique key and a corresponding
value.
• Fast Access: Maps provide efficient lookup and retrieval operations based on the key. They
can achieve constant or logarithmic time complexity for common operations.
• Dynamic Size: Maps can dynamically grow and shrink in size as elements are added or
removed. This flexibility makes them suitable for handling varying amounts of data.
• Uniqueness of Keys: Each key within a map must be unique to ensure that every key-value
pair represents a distinct entry.

TYPES:
●​ HASHMAP
-​ Maps keys to arrays indices using hash function to convert each key into a
numeric index
●​ TREE MAP
-​ Uses binary search tree as implementation
-​ Keys are stored in sorted order that allows for efficient searching and deletion
●​ LINKED HASH MAP
-​ Linked hash map combined with hashing to preserve insertion order
-​ Enables rapid iteration over the maps element
GRAPH DATA STRUCTURE
-​ Collection of interconnected nodes that have data in which represented by relationships
and connection between objects.

TERMINOLOGIES
• Adjacency: A vertex is adjacent to another vertex if there is an edge connecting them.

• Path: A sequence of edges that goes from vertex point A to vertex point B is called a path.

• Directed Graph: A graph in which an edge (u, v) does not necessarily mean that there is an
edge (v, u) as well. Arrows represent the edges in such a graph to show the direction of the
edge.

TYPES: ( alam niyo na to eh )


1. Directed Graph: Edges have a direction, meaning they go from one node to another in a
specific way.
2. Undirected Graph: Edges do not have a direction. They connect two nodes without any
particular order.
3. Weighted Graph: Edges have weights or costs associated with them. These weights can
represent distances, costs, or any other metric.
4. Unweighted Graph: All edges have the same weight, typically considered as 1.
5. Cyclic Graph: Contains at least one cycle, meaning it is possible to start at a node and follow
a path that leads back to the same node.
6. Acyclic Graph: An acyclic graph does not contain any cycles.
7. Connected Graph: Has a path between every pair of nodes.
8. Disconnected Graph: Has at least one (1) pair of nodes with no path between them.
9. Bipartite Graph: Has its nodes divided into two (2) sets such that no two (2) nodes within the
same set are adjacent.
GRAPH REPRESENTATION
GRAPH TRAVERSAL TECHNIQUES
-​ Used to visit all vertices and edges in a graph

DEPTH-FIRST SEARCH
-​ Mark each vertex while avoiding cycles (VISITED,STACK)
-​ The DFS algorithm works as follows:
1. Start by putting any one of the graph's vertices on top of a stack.
2. Take the top item of the stack and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones that are not in the visited
list to the top of the stack.
4. Keep repeating steps 2 and 3 until the stack is empty

BREADTH-FIRST SEARCH
-​ Explores all nodes at the present depth prior to moving on to nodes at the next depth
level
-​ Has two categories: VISITED and NOT VISITED
-​ VISITED,QUEUE
-​ The BFS algorithm works as follows:
1. Start by putting any one of the graph's vertices at the back of a queue.
2. Take the front item of the queue and add it to the visited list.
3. Create a list of that vertex's adjacent nodes. Add the ones that are not in the visited
list to the back of the queue.
4. Keep repeating steps 2 and 3 until the queue is empty

APPLICATION OF GRAPHS
– bahala na kayo dito
FINALS
PARALLEL ALGORITHM MODEL
-​ How a computation is divided into task, assigned to processor, executed and
coordinated to solve a problem
-​ Exploits availability of multiple processors or cores to execute
-​ To reduce computation time and handle larger problems

TERMINOLOGIES
• Task – a unit of computation executed independently or with limited dependencies.
• Processor – a logical or physical computational unit executing tasks.
• Communication – exchange of data among processors.
• Synchronization – coordination of execution order to respect dependencies.
• Granularity – the amount of computation per communication; coarse-grained means large
independent chunks, fine-grained means frequent interaction.

TYPES:
●​ DATA-PARALLEL MODEL
-​ Divides large data into small chunks and distributes them across multiple
processors
-​ Foundation of SIMD architecture and GPU computing
-​ Mechanics
1. Data Partitioning: The dataset is split evenly among processors.
2. Uniform Operation: Each processor performs the same function on its subset.
3. Synchronization: Once each processor finishes its part, the results are
combined.

●​ TASK GRAPH MODEL


-​ DIvides computation into a set of distinct tasks that runs concurrently
-​ Mechanics
1. Task Identification: Break the problem into smaller subtasks with well-defined
input/output.
2. Dependency Analysis: Build a DAG showing which tasks depend on which.
3. Scheduling: Assign independent tasks to different processors.
4. Execution and Synchronization: Processors run independent tasks
simultaneously; dependent tasks wait until prerequisites are complete

●​ WORK POOL MODEL


-​ All tasks are stored in a shared pool or queue, balancing workload
-​ Each processor repeatedly:
1. Fetches a task from the pool.
2. Executes it.
3. Returns results.
4. Takes another task until none remain.
-​ ​ - Mechanics
-​ 1. A centralized or distributed queue contains pending tasks.
-​ 2. Processors “pull” work as they become idle.
-​ 3. The pool shrinks as tasks complete, reducing the total workload
dynamically.
●​ MASTER-WORKER MODEL
-​ Uses central controller (MASTER) that distributes task to workers and return
results to the master
-​ Mechanics
1. Task Distribution: The Master assigns different parts of the problem to
workers.
2. Concurrent Execution: Workers process their tasks in parallel.
3. Result Aggregation: Master collects outputs and assembles final results
●​ PIPELINE MODEL
-​ Divided computation into ordered stages taht is similar to assembly line
-​ Each stage performs a computation and passes results to next stage
-​ Mechanics
-​ 1. Decomposition: Split the problem into sequential stages.
-​ 2. Stage Assignment: Assign each stage to a different processor.
-​ 3. Data Flow: Each processor processes its stage and passes output to the next.
●​ HYBRID MODEL
-​ Combines multiple models to exploit the best of each

MULTITHREADING (OS and Emerging topics)


-​ Single process runs into multiple threads that share the same address space ( parang
OS lang)

TYPES:
●​ CONCURRENT QUEUE
-​ Allows multiple threads to safely enqueue and dequeue items without explicit
synchronization
-​ Use in task scheduling , message passing, buffering data
●​ THREAD-SAFE COLLECTIONS
-​ Handles concurrent access INTERNALLY, ensuring multiple thread can modify
them
-​ Use in shared caches and lookup tables
●​ BLOCKING QUEUEs
-​ Blocks threads attempting to enqueue or dequeue when the queue is full or
empty
-​ Use in producer-consumer patterns, task distributions and managing resources in
limited capacity

NOTE: (Parang Emerging about blockchain ang topic)


ALGORITHMIC COMPLEXITY
-​ Evaluates the order of the count of operation performed by a given algorithm as fucntion
-​ Commonly presented with O(f) Notation, also known as Asymptotic notation or Big O
notations
-​ F is the size of the input data
COMMON TIME COMPLEXITIES
SCHEDULING ALGORITHMS ( OS topic ulit putangina )
-used by OS to decide which process or thread to run on the CPU

Terminologies
• Arrival Time: The time at which the process arrives in the ready queue.
• Completion Time: The time at which the process completes its execution.
• Burst Time: Time required by a process for CPU execution.
• Turn Around Time: Time Difference between completion time and arrival time.

Turn Around Time = Completion Time – Arrival Time

• Waiting Time: Time Difference between turn around time and burst time.

Waiting Time = Turn Around Time – Burst Time

TYPES:
●​ Non-Preemptive
-​ Once process starts, it runs to completion or until it switches to a waiting state
●​ Preemptive
-​ Os can interrupt and pause a running process

COMMON SCHEDULING ALGORITHMS (reviewhin niyo nalang OS tangina)


(nasa name na yung definition)
●​ First-come-first-served - in order they arrived in ready queue
●​ Shortest job first - Shortest CPU burst time
●​ Shortest remaining time - least remaining execution time
●​ Priority scheduling - highest priority queue
●​ Round robin - Bilog
●​ Multilevel queue - divides process into multiple queues
●​ Multilevel feedback queue - advance scheduling method that allows to move between
queues based on behavior and cpu usage

APPLICATION SCENARIO
-​ Lalallalallalallalallalalalallalala
COMPUTABILITY
-​ Determines whether a problem can be solved using an algorithm, finite sequence or
defined steps
-​ Basta nag hahanap ng method para sa sagot
Key Properties of Computability
• Definiteness: Every step in the algorithm must be precisely defined.
• Finiteness: The algorithm must terminate after a finite number of steps.
• Input and Output: Every algorithm takes input data and produces output results.
• Effectiveness: Each step must be basic enough to be carried out by a mechanical process.

COMPUTABLE
-​ If there exist an algorithm that can process any valid input and produce a correct output
in a finite time
NON - COMPUTABLE PROBLEM
-​ Predicting whether any arbitrary program will run forever or eventually stops (HALTING
PROBLEM)
-​ Cannot be solved algorithmically

THE TURING MACHINE MODEL


-​ Capable of simulating any computer algorithm
-​ If solvable, the problem is turing-computable. If not then non-computable

DECIDABLE PROBLEMS
-​ Yes or No Answers
UNDECIDABLE PROBLEMS
-​ No yes or no. algorithm would run forever or fail to give results

REDUCTIONS
-​ Prove that problem is as hard or harden than another problem

TYPES
●​ MANY-ONE REDUCTION(MAPPING REDUCTION)
-​ Most direct type of reduction
-​ Transforms instance of one problem into another problem (answer to both
remains the same)
-​ To prove a is no harder than b and if b is undecidable then also is a
●​ TURING REDUCTION
-​ Solving one problem to another problem as SUBROUTINE or ORACLE
-​ Doesnt require entire transformation in one step
-​ More flexing and used to define turing degree
●​ POLYNOMIAL-TIME REDUCTION (P-TIME REDUCTION) (KEY TERM: TIME)
-​ One problem can be transformed into another in polynomial time
-​ Used to classify problems as P,NP, NP HARD, or NP-complete
-​ Used for complexity analysis and ensures computationally efficient

You might also like