ADS Syllabus

The Advanced Database System course covers advanced database concepts including object and object-relational databases, query processing and optimization, distributed databases, NOSQL databases, and big data technologies. The course aims to equip students with the knowledge to solve database-related problems through theoretical and practical lab work. It consists of seven units, each focusing on specific topics, and includes references for further reading.

Uploaded by

Bikash Yadav

Advanced Database System

Course Title: Advanced Database System      Full Marks: 45+30
Course No: MIT502                           Pass Marks: 22.5+15
Nature of the Course: Theory + Lab          Credit Hrs: 3
Semester: I

Course Description:
This course introduces advanced database concepts. The topics covered include object and
object-relational databases; query processing and query optimization; distributed databases; NOSQL
databases; big data storage and big data technologies; active, temporal, spatial, multimedia, and
deductive databases; and information retrieval and web search.

Course Objectives:
The main objective of this course is to familiarize students with the advanced concepts of
database systems so that, upon completing the course, they will be able to understand and apply
these concepts to solve database-related problems.

Course Contents:

Unit 1: Object and Object-Relational Databases (5 Hrs.)


Overview of Object-Oriented Concepts; Object Database Extensions to SQL; The ODMG Object
Model and the Object Definition Language (ODL); Object Database Conceptual Design; The
Object Query Language (OQL)

Unit 2: Query Processing and Optimization (11 Hrs.)


Translating SQL Queries into Relational Algebra and Other Operators; Algorithms for External
Sorting; Algorithms for SELECT Operation; Implementing the JOIN Operation; Algorithms for
PROJECT and Set Operations; Implementing Aggregate Operations and Different types of JOINs;
Combining Operations Using Pipelining; Parallel Algorithms for Query Processing: Operator
Level, Intraquery, Interquery;
Query Trees and Heuristics for Query Optimization; Choice of Query Execution Plans; Use of
Selectivities in Cost-Based Optimization: Cost Components for Query Execution, Catalog
Information Used in Cost Functions, Histograms; Cost Functions for SELECT Operation; Cost
Functions for the JOIN Operation; Additional Issues Related to Query Optimization; Query
Optimization in Data Warehouses

Unit 3: Distributed Database Concepts (7 Hrs.)


Distributed Database Concepts; Data Fragmentation, Replication, and Allocation Techniques for
Distributed Database Design; Overview of Concurrency Control and Recovery in Distributed
Databases; Overview of Transaction Management in Distributed Databases; Query Processing and
Optimization in Distributed Databases; Types of Distributed Database Systems; Distributed
Database Architectures; Distributed Catalog Management

Unit 4: NOSQL Databases and Big Data Storage Systems (6 Hrs.)

Introduction to NOSQL Systems; Characteristics of NOSQL System, Categories of NOSQL
Systems, The CAP Theorem; Document-Based NOSQL Systems and MongoDB; NOSQL Key-
Value Stores; Column-Based or Wide Column NOSQL Systems; NOSQL Graph Databases and
Neo4j

Unit 5: Big Data Technologies Based on MapReduce and Hadoop (5 Hrs.)


Introduction to Big Data; Introduction to MapReduce and Hadoop; Hadoop Distributed File
System (HDFS); MapReduce Runtime; Joins in MapReduce, Apache Hive, YARN

Unit 6: Enhanced Data Models: Introduction to Active, Temporal, Spatial, Multimedia, and Deductive Databases (5 Hrs.)

Active Database Concepts and Triggers; Temporal Database Concepts; Spatial Database Concepts;
Multimedia Database Concepts; Introduction to Deductive Databases

Unit 7: Introduction to Information Retrieval and Web Search (6 Hrs.)


Information Retrieval Concepts; Retrieval Models, Types of Queries in Information Retrieval
Systems; Text Preprocessing; Inverted Indexing; Evaluation Measures of Search Relevance; Web
Search and Analysis; Trends in Information Retrieval

Laboratory Works
Laboratory work includes implementing the concepts covered in the units above on
appropriate platforms.

References:
1. Elmasri and Navathe, Fundamentals of Database Systems, Pearson Education, 7th Edition
2. Korth, Silberschatz, Sudarshan, Database System Concepts, McGraw-Hill, 7th Edition
3. Raghu Ramakrishnan, Johannes Gehrke, Database Management Systems, McGraw-Hill
4. Peter Rob and Carlos Coronel, Database Systems: Design, Implementation, and
Management, Thomson Learning
5. C.J. Date & Longman, Introduction to Database Systems, Pearson Education

Common questions

Inverted indexing improves search performance in information retrieval systems by mapping each term to the list of documents that contain it, so that documents matching a query term can be looked up directly instead of scanning the full text. Text preprocessing, including tokenization, stemming, and removal of stopwords, normalizes the text before indexing. These steps reduce the size of the index and improve the relevance of search results by focusing on meaningful terms.
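The idea can be illustrated with a minimal sketch. The stopword list, the sample documents, and the function names (`preprocess`, `build_inverted_index`, `search`) are assumptions made for this example; stemming is omitted to keep it short, and queries are treated as a simple AND of their terms.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real systems use larger lists).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to"}

def preprocess(text):
    """Lowercase, tokenize on alphanumeric runs, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def build_inverted_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in preprocess(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every (non-stopword) query term."""
    terms = preprocess(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

# Hypothetical sample collection.
docs = {
    1: "The query optimizer chooses an execution plan",
    2: "Distributed query processing in the database",
    3: "Inverted indexing of the document collection",
}
index = build_inverted_index(docs)
print(sorted(search(index, "the query")))  # [1, 2]
```

Because "the" is a stopword, the query above reduces to the single term "query", and the lookup touches only that term's posting set.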

Evaluation measures used to determine the relevance of search results in information retrieval systems include precision, recall, and the F-measure. Precision is the proportion of retrieved documents that are relevant, while recall is the proportion of all relevant documents that were retrieved. The F-measure combines precision and recall into a single metric by taking their harmonic mean, which gives a balanced evaluation.
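As a small worked example, the three measures can be computed from a set of retrieved document ids and a set of relevant ones. The function name and the sample ids are hypothetical, and returning 0.0 for empty sets is a convention chosen for this sketch.

```python
def precision_recall_f1(retrieved, relevant):
    """Compute precision, recall, and their harmonic mean (F1)."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 4 documents retrieved, 3 relevant overall, 2 in common.
p, r, f = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(round(p, 3), round(r, 3), round(f, 3))  # 0.5 0.667 0.571
```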

Object databases incorporate object-oriented concepts into database management systems, enabling the storage and manipulation of objects in a manner similar to object-oriented programming languages. The ODMG object model allows for complex data types and objects with methods, in contrast to traditional relational databases, which organize data in tables built from simple structured types. The Object Definition Language (ODL) and Object Query Language (OQL) are components of the ODMG standard that support object-oriented data modeling and querying, whereas SQL is primarily set-based and table-focused.

Temporal databases handle data that changes over time by attaching time information to records at a chosen granularity, for example time-stamped records that track historical changes and future events. They are particularly useful in applications that require audit logs, historical tracking, and time-sensitive information, such as finance, legal record-keeping, and health records, where maintaining a reliable history of changes is crucial.
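One common way to approximate this in an ordinary relational system is a valid-time table, sketched below with Python's built-in `sqlite3` module. The table name, columns, and sample rows are assumptions for illustration; each row is valid for the half-open interval from `valid_from` up to but not including `valid_to`, and a far-future date stands in for "still current".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salary_history (
    emp_id     INTEGER,
    salary     REAL,
    valid_from TEXT,   -- ISO dates compare correctly as strings
    valid_to   TEXT
);
INSERT INTO salary_history VALUES
    (1, 40000, '2022-01-01', '2023-01-01'),
    (1, 45000, '2023-01-01', '9999-12-31');
""")

def salary_as_of(emp_id, date):
    """Return the salary that was valid on the given date, or None."""
    row = conn.execute(
        "SELECT salary FROM salary_history "
        "WHERE emp_id = ? AND valid_from <= ? AND ? < valid_to",
        (emp_id, date, date),
    ).fetchone()
    return row[0] if row else None

print(salary_as_of(1, "2022-06-15"))  # 40000.0
print(salary_as_of(1, "2023-01-01"))  # 45000.0
```

Old rows are never overwritten, so the history stays queryable for any past date.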

Spatial databases extend traditional database models with spatial data types and spatial indexing, which allow them to handle queries over geometric and geographical data efficiently. Such queries include spatial joins, nearest neighbor searches, and range queries, which are important for geographic information systems (GIS), disaster management, and other domains where location-based analysis is critical. Spatial databases manage relationships and properties inherent in spatial data that traditional databases are not equipped to handle efficiently.

Pipelining improves the efficiency of query processing by passing the output of one operation directly to the next operation as input, without writing intermediate results to storage. This contrasts with materialized execution, in which each intermediate result is stored temporarily, leading to higher input/output overhead and slower processing. With pipelining, the system overlaps multiple query operations, which reduces both resource use and latency.
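Python generators give a rough feel for pipelined execution: each operator pulls one row at a time from the operator below it, so no intermediate result is ever stored in full. The operator names and the sample `employees` data are assumptions for this sketch and do not correspond to any particular DBMS.

```python
def scan(rows):
    """Leaf operator: emit rows one at a time."""
    for row in rows:
        yield row

def select(rows, predicate):
    """SELECT operator: pass through only rows satisfying the predicate."""
    for row in rows:
        if predicate(row):
            yield row

def project(rows, columns):
    """PROJECT operator: keep only the requested columns."""
    for row in rows:
        yield {c: row[c] for c in columns}

# Hypothetical sample relation.
employees = [
    {"name": "Asha", "dept": "IT", "salary": 50000},
    {"name": "Bimal", "dept": "HR", "salary": 42000},
    {"name": "Chandra", "dept": "IT", "salary": 61000},
]

# Rows flow through scan -> select -> project without intermediate lists.
pipeline = project(
    select(scan(employees), lambda r: r["dept"] == "IT"),
    ["name", "salary"],
)
result = list(pipeline)
print(result)
```

Nothing is computed until `list(pipeline)` starts pulling rows, which mirrors the demand-driven style of iterator-based query execution.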

The CAP theorem states that a distributed database system cannot guarantee all three of consistency, availability, and partition tolerance at the same time; in practice, when a network partition occurs, the system must give up either consistency or availability. In the design of NOSQL systems, this choice significantly influences the architecture. Some NOSQL systems, such as Cassandra, favor availability and partition tolerance and settle for eventual consistency. Others, such as HBase and MongoDB in its default single-primary configuration, favor consistency and partition tolerance at the cost of availability during certain failures. The right trade-off depends on the specific use case and performance requirements.

MapReduce differs from traditional database management systems by using a distributed computing model that splits a data processing task across multiple nodes and handles large data volumes in parallel. Traditional databases often rely on centralized or single-node processing, which limits scalability on massive datasets. Hadoop pairs MapReduce with the Hadoop Distributed File System (HDFS) to store and process data across a cluster, which makes it well suited to big data applications.
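The programming model itself can be sketched in a single process. The example below is only an illustration of the map, shuffle, and reduce phases for a word count; it does not use Hadoop, and all function names and input strings are assumptions. In a real cluster, the map and reduce calls would run in parallel on different nodes.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data big storage", "data storage systems"])
print(counts)
```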

Active databases incorporate features that enable automated responses to certain conditions or events, making them suitable for real-time processing needs. Triggers provide the mechanism: they execute predefined code in response to specified events on a table or view, such as insertions, updates, or deletions. This allows tasks like auditing changes, enforcing constraints, and implementing complex business rules to be automated without manual intervention.
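A minimal sketch of an auditing trigger, using SQLite through Python's built-in `sqlite3` module, is shown below. The table and trigger names are hypothetical. The trigger follows the event-condition-action pattern: the event is an update of `balance`, the condition is the `WHEN` clause, and the action is the insert into the audit table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL);
CREATE TABLE audit_log (account_id INTEGER, old_balance REAL, new_balance REAL);

-- Event: UPDATE of balance; Condition: value changed; Action: log it.
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON account
FOR EACH ROW
WHEN OLD.balance <> NEW.balance
BEGIN
    INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
END;
""")

conn.execute("INSERT INTO account VALUES (1, 100.0)")
conn.execute("UPDATE account SET balance = 75.0 WHERE id = 1")

# The application never wrote to audit_log; the trigger did.
audit_rows = conn.execute("SELECT * FROM audit_log").fetchall()
print(audit_rows)  # [(1, 100.0, 75.0)]
```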

In distributed databases, query optimization must address challenges such as data fragmentation, replication, and network latency, which do not arise in single-server environments. Strategies include placing data fragments so as to minimize cross-site communication, using distributed transaction management to ensure consistency, and employing query processing algorithms designed to run across multiple nodes. In addition, cost-based optimization must account for network cost and data distribution when choosing an execution plan, whereas in single-server scenarios the focus is primarily on local resource use.