0% found this document useful (0 votes)
79 views13 pages

Advanced Database Management Systems Overview

The document provides an overview of advanced database management systems, focusing on Object-Relational (ORDBMS) and Object-Oriented (OODBMS) data models, their features, advantages, and disadvantages. It also discusses file organization concepts, transaction processing, concurrency control, recovery techniques, query processing, database programming, integrity, security, database administration, physical design, distributed databases, emerging research trends, and NoSQL databases like MongoDB. The content highlights the evolution of database systems to handle complex data and improve performance and scalability.

Uploaded by

Raja Raja
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views13 pages

Advanced Database Management Systems Overview

The document provides an overview of advanced database management systems, focusing on Object-Relational (ORDBMS) and Object-Oriented (OODBMS) data models, their features, advantages, and disadvantages. It also discusses file organization concepts, transaction processing, concurrency control, recovery techniques, query processing, database programming, integrity, security, database administration, physical design, distributed databases, emerging research trends, and NoSQL databases like MongoDB. The content highlights the evolution of database systems to handle complex data and improve performance and scalability.

Uploaded by

Raja Raja
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

ADVANCED DATABASE

MANAGEMENT SYSTEMS
Course Code DC-220
Introduction to Advanced Data Models

Traditional Relational Data Models (RDBMS) are powerful but limited for handling
complex, real-world data like images, multimedia, or user-defined data types.

Advanced Data Models were developed to overcome these limitations, providing:

• Better handling of complex data


• Support for user-defined types
• Encapsulation of data and behavior

The two most common advanced models are:

1. Object-Relational Data Model (ORDBMS)


2. Object-Oriented Data Model (OODBMS)

Object-Relational Data Model (ORDBMS)

The Object-Relational Data Model combines the relational model with some features of
object-oriented programming.

Key Features

Feature Description
Extended Data
Supports user-defined data types, collections, and multimedia.
Types
Inheritance Enables table hierarchies.
Encapsulation Allows methods/functions tied to user-defined types.
Retains SQL as query language with extensions (SQL:1999,
Compatibility
SQL:2003).

Example

-- Define a custom type


CREATE TYPE Address AS OBJECT (
street VARCHAR(50),
city VARCHAR(30),
zip VARCHAR(10)
);

-- Use the type in a table


CREATE TABLE Customers (
id INT PRIMARY KEY,
name VARCHAR(50),
addr Address
);

-- Access object attributes


SELECT [Link]
FROM Customers c
WHERE [Link] = '54000';

Popular ORDBMS Systems

• PostgreSQL
• Oracle 12c+
• IBM DB2

Advantages

• Supports complex data types.


• Maintains backward compatibility with relational systems.
• Better suited for multimedia and GIS data.

Disadvantages

• More complex to design and manage.


• Performance overhead due to object features.

Object-Oriented Data Model (OODBMS)

The Object-Oriented Data Model integrates object-oriented programming concepts


directly into the database.

It treats everything as an object, just like in OOP languages such as Java or C++.

Key Features

Feature Description
Objects Data and methods are encapsulated together.
Classes Objects are grouped into classes with attributes and methods.
Inheritance Classes can inherit attributes and methods from parent classes.
Persistence Objects can be stored and retrieved from the database.
Complex Objects Supports storage of images, videos, arrays, and user-defined types.

Example (Pseudo Code)

// Define a class
class Customer {
int id;
String name;
Address addr;

void display() {
[Link](name + " - " + [Link]);
}
}

// Store object in OODBMS


db. Store(new Customer(101, "Ali", new Address("Street 10", "Lahore", "54000")));

Popular OODBMS Systems

• db4o (Database for Objects)


• Object DB
• Versant ODBMS
• GemStone/S

Advantages

• Seamless integration with OOP languages.


• No mismatch between programming objects and database schema.
• Ideal for complex, hierarchical data.

Disadvantages

• Lack of standard query language like SQL.


• Limited adoption compared to relational systems.
• Migration and interoperability can be challenging.

4. Comparison of ORDBMS vs. OODBMS

Feature ORDBMS OODBMS

Data Structure Tables + Objects Pure Objects

Query
Extended SQL OQL or APIs
Language

Compatibility Compatible with relational databases Requires specialized systems

Complex relational data with some object Full object-oriented


Best Use
features applications

Use Cases
Model Common Applications
Multimedia databases, GIS, data warehousing,
ORDBMS
scientific databases
CAD/CAM, real-time systems, simulations,
OODBMS
complex engineering projects

File Organization Concepts


File organization refers to how data is stored in files on secondary storage (like hard drives).
The organization affects data access speed, efficiency, and ease of maintenance.
Types of File Organizations
1. Heap (Unordered) File Organization
o Records are stored randomly as they arrive.
o Use case: Temporary data storage.
o Pros: Simple, fast insertion.
o Cons: Slow search and updates.
2. Sequential (Ordered) File Organization
o Records are stored in sorted order (usually by key).
o Use case: Batch processing.
o Pros: Efficient for range queries.
o Cons: Insertion and deletion are slow.
3. Hashed File Organization
o Uses a hash function to calculate the location of a record.
o Use case: Fast access for exact match queries.
o Pros: Very fast lookup.
o Cons: Not good for range queries.
4. Clustered File Organization
o Related records from different tables are stored together.
o Use case: Queries that join tables frequently.

2. Transaction Processing
A transaction is a logical unit of work that must be either fully completed or fully rolled
back.
ACID Properties
• A – Atomicity: Entire transaction is done or not done.
• C – Consistency: Database moves from one consistent state to another.
• I – Isolation: Transactions do not interfere with each other.
• D – Durability: Once committed, changes persist even after system failure.
States of a Transaction
• Active → Partially Committed → Committed → Terminated
• If an error occurs: Active → Failed → Aborted → Rolled Back

3. Concurrency Control Techniques


Concurrency control ensures correctness when multiple transactions execute
simultaneously.
Techniques
1. Lock-Based Protocols
o Shared (S) Lock: For read.
o Exclusive (X) Lock: For write.
o Variants:
▪ Two-Phase Locking (2PL): Ensures serializability.
▪ Strict 2PL: Prevents cascading aborts.
2. Timestamp Ordering
o Each transaction gets a timestamp; operations are executed in timestamp order.
3. Optimistic Concurrency Control
o Transactions execute without locks and validate before commit.
4. Multiversion Concurrency Control (MVCC)
o Creates versions of data for consistent reads without blocking writers.

4. Recovery Techniques
Recovery ensures the database can be restored to a consistent state after a failure.
Types of Failures
• Transaction Failure: Logical errors, system crashes.
• System Failure: Power outage, OS crash.
• Media Failure: Disk crash or corruption.
Techniques
1. Log-Based Recovery
o Write-Ahead Logging (WAL): All actions recorded in a log before execution.
o Undo/Redo operations during recovery.
2. Checkpointing
o Saves a snapshot of the database state periodically to speed up recovery.
3. Shadow Paging
o Maintains two versions of the database pages — current and shadow — to avoid
inconsistencies.
4. Backup & Restore
o Periodic backups are used to restore the database if a crash occurs.

Query Processing and Optimization


Query processing is the sequence of steps a DBMS follows to execute an SQL query
efficiently.
The goal is to minimize response time and resource usage.
Stages of Query Processing
1. Parsing and Translation
o SQL query is checked for syntax and semantics.
o Converted into an internal query tree or relational algebra expression.
2. Optimization
o The system selects the best execution plan based on cost estimation.
o Uses statistics such as table size, indexes, and selectivity.
3. Evaluation
o The query execution plan is executed to retrieve the results.
Types of Query Optimization
1. Heuristic Optimization (Rule-Based)
o Uses rules to simplify and rearrange the query.
o Example: Pushing selection closer to the base table.
2. Cost-Based Optimization (CBO)
o Estimates cost of multiple execution plans.
o Chooses the plan with the lowest cost based on factors like:
▪ Disk I/O
▪ CPU usage
▪ Memory usage
Optimization Techniques
• Use of indexes for faster lookups.
• Query rewriting to simplify expressions.
• Join ordering and methods (nested loop, hash join, merge join).
• Materialized views for frequently used queries.

Database Programming
Database programming uses procedural extensions of SQL to build more powerful
applications.
A. PL/SQL (Procedural Language/SQL)
• Platform: Oracle Databases
• Features:
o Variables, loops, and conditional statements.
o Functions, procedures, and triggers.
o Exception handling for robust programming.
Example:
DECLARE
v_salary NUMBER;
BEGIN
SELECT salary INTO v_salary FROM employees WHERE employee_id = 101;
DBMS_OUTPUT.PUT_LINE('Salary is: ' || v_salary);
END;
B. T-SQL (Transact-SQL)
• Platform: Microsoft SQL Server
• Features:
o Supports procedural logic.
o Transaction management.
o Error handling with TRY...CATCH blocks.
Example:
DECLARE @Salary DECIMAL(10,2);
SELECT @Salary = salary FROM Employees WHERE employee_id = 101;
PRINT 'Salary is: ' + CAST(@Salary AS VARCHAR);
Common Uses of Database Programming
• Stored Procedures: Precompiled SQL blocks for performance and security.
• Triggers: Automatic execution of code on events like INSERT or UPDATE.
• Functions: Reusable code that returns a value.
• Cursors: Row-by-row processing of query results.
• Error and Transaction Handling: Ensuring consistency with COMMIT and
ROLLBACK.
Integrity in Databases
Integrity ensures accuracy, consistency, and validity of data in the database.
Types of Integrity
1. Entity Integrity
o Each table must have a primary key.
o No primary key column can have NULL values.
2. Referential Integrity
o Foreign key values must match primary key values in related tables or be
NULL.
o Prevents orphaned records.
3. Domain Integrity
o Ensures values in a column are within a defined range, format, or type.
o Example: age column must be between 0 and 120.
4. User-Defined Integrity
o Business-specific rules.
o Example: Salary of an employee cannot exceed the manager’s salary.
Security in Databases
Database security protects data from unauthorized access, misuse, or corruption.
Security Mechanisms
• Authentication
o Verifies user identity (username, password, biometrics, or tokens).
• Authorization and Access Control
o Grants specific privileges using roles and permissions.
o Example: GRANT SELECT ON Employees TO analyst;
• Encryption
o Secures data at rest (disk) and in transit (network).
• Auditing and Monitoring
o Logs actions for detecting suspicious activity.
• Backup and Recovery Security
o Ensures secure storage of backups to avoid data leaks.

3. Database Administration (DBA)


A Database Administrator (DBA) manages and maintains the database for performance,
security, and availability.
Key Responsibilities
1. Role Management
o Creating roles for users and assigning privileges.
o Example (Oracle):
o CREATE ROLE data_analyst;
o GRANT SELECT, INSERT ON sales TO data_analyst;
2. Managing Database Access
o Controlling who can access the database and what they can do.
o Use GRANT and REVOKE commands in SQL.
o Example (SQL Server):
o CREATE LOGIN user1 WITH PASSWORD = 'pass123';
o GRANT SELECT, UPDATE ON Employees TO user1;
3. Views for Security
o Views provide limited, filtered access to data.
o Example:
o CREATE VIEW EmployeePublic AS
o SELECT emp_id, emp_name, department
o FROM Employees;
o Users can query this view without accessing sensitive columns like salary or
SSN.
4. Performance Monitoring
o Optimizing queries and indexing for faster response times.
5. Backup and Recovery
o Scheduling regular backups and creating disaster recovery plans.

Physical Database Design and Tuning

Physical design focuses on how data is stored, accessed, and optimized in the DBMS to
improve performance.

Key Steps

• Choosing Storage Structures


o Tablespaces, data files, and partitions for better organization.
• Indexing
o Using B-tree indexes, hash indexes, or composite indexes to speed up queries.
• Denormalization
o Storing redundant data to reduce complex joins and improve performance.
• Partitioning
o Splitting large tables for faster access.
• Materialized Views
o Pre-computed query results to optimize reporting and analytics.

Database Tuning

• Query Tuning: Rewrite slow SQL queries.


• Index Tuning: Add/remove indexes based on usage.
• Memory and I/O Tuning: Adjust buffer sizes, cache, and parallelism.
• Monitoring Tools: Oracle AWR, SQL Profiler, PostgreSQL EXPLAIN.

2. Distributed Database Systems

A Distributed Database (DDB) is a collection of databases spread across multiple sites,


but appearing as a single logical database.

Features
• Transparency: Users don’t need to know the data location.
• Replication: Copying data to multiple sites for fault tolerance.
• Fragmentation: Dividing data into logical pieces (horizontal/vertical).

Advantages

• Improved availability and reliability.


• Faster local access.
• Load balancing across multiple servers.

Challenges

• Maintaining consistency.
• Higher complexity in concurrency control and recovery.
• Network latency and synchronization issues.

3. Emerging Research Trends in Databases

Some modern research areas include:

• Cloud Databases: Scalable and elastic storage (e.g., Amazon RDS, Azure SQL).
• Big Data Integration: Combining traditional DBMS with platforms like Hadoop and
Spark.
• AI-Driven Query Optimization: Using machine learning for cost estimation and
tuning.
• Graph Databases: Neo4j, Amazon Neptune for social networks or recommendation
systems.
• Blockchain Databases: Immutable and secure storage for sensitive transactions.
• Time-Series Databases: Optimized for IoT and event-based data (e.g., InfluxDB).

4. MongoDB (NoSQL Database)

Key Features

• Document-Oriented: Stores data as JSON-like documents.


• Schema-less: No fixed table structure.
• Horizontal Scalability: Sharding allows scaling across multiple servers.
• High Availability: Built-in replication (Replica Sets).

Example

// Insert document
[Link]({ name: "Ali", semester: 4, gpa: 3.5 });
// Query document
[Link]({ semester: 4 });
NoSQL Databases

NoSQL databases are designed for unstructured or semi-structured data and provide high
scalability and performance.

Types of NoSQL
Type Examples Use Case
Document MongoDB, CouchDB Flexible data storage
Key-Value Redis, DynamoDB Caching, real-time apps
Column-Family Cassandra, HBase Big data analytics
Graph Neo4j, ArangoDB Social networks, fraud detection
Advantages
• Handles large-scale, complex, or rapidly changing data.
• Easier horizontal scaling.
• Better for real-time web or IoT applications.

Common questions

Powered by AI

Security mechanisms such as authentication, authorization, and encryption collaboratively protect data in database systems. Authentication verifies user identities using methods like usernames, passwords, biometrics, or tokens to ensure only legitimate users gain access . Authorization and access control grant specific privileges to users, delineating what they can access or modify by using roles and permissions, such as the SQL command GRANT to control data access . Encryption secures data by encrypting it at rest and in transit, making it unreadable to unauthorized users and safeguarding against data breaches . Together, these mechanisms ensure that only validated users can access data, control the level of access granted, and protect data integrity and confidentiality during storage and transmission, forming a multilayered security approach essential for database protection.

The key differences between ORDBMS and OODBMS are primarily in their data structures and language compatibility. ORDBMS combines tables with object-oriented features, allowing for extended data types, inheritance, and encapsulation while maintaining compatibility with SQL via extensions like SQL:1999 and SQL:2003 . OODBMS, on the other hand, fully integrates object-oriented programming principles, treating data as objects with encapsulated methods, grouped into classes and allowing inheritance of attributes and methods. It lacks a standard query language like SQL and is more suited to full object-oriented applications, leading to difficulty in migration and interoperability .

Distributed database systems offer several benefits, including improved availability and reliability due to the replication of data across multiple sites. This replication allows data to remain accessible even if one site fails, while also facilitating faster local data access and balancing the load across different servers . However, distributed systems face challenges such as maintaining data consistency, which becomes complex due to the need to synchronize data across distributed locations. Moreover, ensuring data transparency so that users do not need to know the physical location of data adds to the complexity. Additionally, increased complexity in concurrency control and recovery is a significant challenge, as is the potential for network latency and synchronization issues that can arise due to variance in transaction processing times across sites . Balancing these benefits and challenges is crucial for effective distributed database management.

Emerging trends such as AI-driven query optimization are transforming traditional database management systems by introducing machine learning techniques to improve cost estimation and tuning processes. This approach can lead to more efficient query execution plans by analyzing large amounts of query execution data, identifying patterns, and predicting optimal strategies, thus reducing the reliance on manually crafted optimization rules . AI-driven optimization can enhance systems' ability to adapt to changing workloads and data distributions, offering the potential for continuous and dynamic improvements. These capabilities enable databases to handle larger and more complex datasets more efficiently, aligning with modern demands for big data integration and real-time processing . However, integrating AI within traditional DBMS also presents challenges, including the need for significant computational resources and changes to existing database infrastructure.

NoSQL databases are particularly advantageous for modern applications due to their ability to handle large-scale, complex, and rapidly changing data. They offer flexible schema designs, which suit unstructured or semi-structured data formats common in real-time applications . Features like horizontal scalability and high availability through sharding and built-in replication align with the demands of web and IoT applications, where data volumes can grow rapidly and require timely, reliable access . Also, their ability to handle heterogeneous data makes them ideal for IoT applications, which often involve diverse data types and sources .

Concurrency control techniques are essential for managing simultaneous transactions to ensure the correctness of outcomes. Lock-based protocols use shared (S) and exclusive (X) locks to manage read and write operations, thereby ensuring serializability by allowing only a consistent set of transactions to proceed at any given time, often implemented through Two-Phase Locking (2PL). Other methods include timestamp ordering, which executes operations based on transaction timestamps to avoid conflicts, and optimistic concurrency control, which executes transactions without locks, only validating them at the commit time to check for inconsistencies. Multiversion Concurrency Control (MVCC) creates data versions to allow consistent reads without blocking writers, reducing the contention between read and write operations . These mechanisms collectively manage the access to shared data, maintaining data integrity and providing robustness against concurrent data manipulation.

Database programming extensions like PL/SQL and T-SQL enhance database management systems by introducing procedural logic to the SQL language, allowing for more powerful and flexible applications. PL/SQL, used in Oracle databases, provides features such as variables, loops, conditional statements, functions, procedures, triggers, and robust exception handling, which facilitate complex transaction and business logic execution within the database . Similarly, T-SQL, used in Microsoft SQL Server, supports procedural logic, transaction management, and error handling through constructs like TRY...CATCH. These extensions enable developers to create stored procedures, functions, and triggers that precompile SQL blocks for better performance, encapsulate business rules for security, automate tasks, and ensure data integrity through controlled transaction handling . By allowing database operations to include programmatic control flow, these extensions significantly increase the versatility and efficiency of database applications.

File organization directly affects how data is stored and accessed, impacting speed, efficiency, and maintenance ease in database systems. Heap (unordered) organization enables quick insertion but is inefficient for searches and updates due to the lack of order . Sequential (ordered) organization provides efficient access for range queries as data is sorted by key, but insertion and deletion operations are slow due to potential need for reshuffling . Hashed file organization offers fast lookups for exact match queries by calculating record locations using a hash function, though it is unsuitable for range queries due to the lack of logical ordering . Clustered organization, storing related records from different tables together, improves performance for join queries but can be complex to manage due to the requirement of keeping records physically together on disk . Each type offers unique trade-offs between speed and flexibility based on the access patterns and transactional needs of the system.

ACID properties stand for Atomicity, Consistency, Isolation, and Durability. These properties are critical in ensuring that transactions are reliably processed in a database system. Atomicity ensures that a transaction is fully completed, or none of its operations are performed in case of failure. Consistency refers to maintaining data integrity before and after the transaction by moving the database from one valid state to another. Isolation ensures that concurrent transactions do not interfere with each other, preserving transaction serializability. Durability guarantees that once a transaction is committed, its changes are permanent, even after a system failure . These properties collectively ensure data integrity, reliability, and robustness of database operations .

Query processing and optimization enhance database system performance by reducing resource usage and response time. Query processing involves parsing, optimization, and evaluation. Parsing checks syntax and converts the SQL query to a relational algebra expression. Optimization selects the most efficient execution plan often through cost-based evaluation methods . Techniques include heuristic optimization, such as pushing selections closer to base tables, which reduces the dataset early in the process. Cost-based optimization estimates the cost of different execution plans based on factors like disk I/O and CPU usage. Key optimization strategies include using indexes for faster data retrieval, rewriting queries to simplify expressions, and employing efficient join methods, such as nested loops, hash joins, and merge joins . Materialized views can also improve performance by precomputing and storing complex query results for frequently run queries, optimizing reporting and analytics tasks .

You might also like