Big Data Analysis Course Syllabus

Uploaded by

harsh958098

BIG DATA ANALYSIS

Syllabus and Course Outline

Credits: 03
Programs:
[Link]. (CSE) 7th Semester: Paper – 1 (BCS701)
[Link]. (CS) 3rd Semester: Paper – 5: Elective I (MCS651)

Prerequisites:
Students should know one object-oriented programming language (Java, Python, or Scala),
databases, and SQL, and should have basic hands-on experience with the Linux operating system.
A prior course in data warehousing and mining, machine learning, or parallel and distributed
computing will help students learn more quickly.

Course Objectives:
1. To introduce the fundamental concepts in big data analytics
2. To learn data analysis with R
3. To understand the design principles of Hadoop
4. To learn programming with the MapReduce concepts
5. To learn how to design scalable and distributed applications with Hadoop
6. To introduce the tools of the Hadoop ecosystem
7. To understand document-oriented database systems
8. To study the security and privacy issues of big data

Learning Outcomes:
On completion of this course, students will be able to:

1. Define data science, big data, and associated terminologies
2. Understand the fundamentals of big data analytics techniques and their applications in
various sectors
3. Apply the R tool for data analysis
4. Analyze the Hadoop system and the various components of its ecosystem
5. Use HDFS and apply MapReduce to develop big data applications
6. Understand NoSQL databases
7. Understand the security issues of big data and how security is implemented with Hadoop

Course Contents:
Unit 1: Introduction: 08 Lectures

Data Science, Big Data and its importance, Prediction vs. Inference, Statistical learning,
Unsupervised and Supervised learning, Drivers for Big Data, Big Data analytics, Big Data
applications, Basic R concepts, Data transformation and data visualization in R.

Unit 2: Hadoop: 08 Lectures

Introduction to Hadoop and Hadoop Architecture, Apache Hadoop and the Hadoop Ecosystem,
Moving Data in and out of Hadoop, Understanding inputs and outputs of MapReduce.

Unit 3: Querying in Big Data: 08 Lectures

HDFS Overview, Hive Architecture, Comparison with Traditional Databases, HiveQL, Querying
Data, Sorting and Aggregating, MapReduce Scripts, Joins and Subqueries, HBase concepts,
Advanced Usage, Schema Design, Advanced Indexing, Pig, ZooKeeper, How HBase uses ZooKeeper.

Unit 4: Database for the Modern Web: 08 Lectures

Introduction to MongoDB and its key features, Core Server tools, MongoDB through the JavaScript
Shell, Creating and Querying through Indexes, Document-oriented data and principles of schema
design, Constructing queries on Databases, Collections and Documents, MongoDB Query Language.

Unit 5: Big Data Security: 08 Lectures

Big Data Privacy, Ethics and Security, Steps to secure big data, Cloud security, Hadoop Security
Design, Hadoop Kerberos Security Implementation and Configuration, Audit logging in a Hadoop
cluster, Data security and event logging.

Recommended Readings:
Text and Reference Books:

1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, “Professional Hadoop Solutions”,
John Wiley & Sons, Inc.
2. Chris Eaton, Dirk deRoos, et al., “Understanding Big Data”, McGraw Hill.
3. Kyle Banker, Peter Bakkum, Shaun Verch, Douglas Garrett, Tim Hawkins, “MongoDB in
Action”, Manning Publications Co.
4. Tom White, “Hadoop: The Definitive Guide”, O'Reilly Media, Inc.
5. Vignesh Prajapati, “Big Data Analytics with R and Hadoop”, Packt Publishing.
6. Bill Chambers and Matei Zaharia, “Spark: The Definitive Guide”, O'Reilly Media, Inc.
7. Luis Torgo, “Data Mining with R: Learning with Case Studies”, Second Edition, Chapman
& Hall/CRC Data Mining and Knowledge Discovery Series.
8. Mark Kerzner and Sujee Maniyam, “Hadoop Illuminated”,
[Link]
9. Jimmy Lin and Chris Dyer, “Data-Intensive Text Processing with MapReduce”,
[Link]
10. Anand Rajaraman and Jeffrey D. Ullman, “Mining of Massive Datasets”,
[Link]

Lecture Notes/Slides

Will be provided during lectures.

Tutorials

Will be provided during lectures.

Research Articles

1. Undefined By Data: A Survey of Big Data Definitions, [Link]
2. Big Data Analytics: A Survey, [Link]
3. A Survey on Platforms for Big Data Analytics, [Link]
6
4. The Google File System,
[Link]
[Link]
5. MapReduce: Simplified Data Processing on Large Clusters,
[Link]
[Link]
6. Spark: Cluster Computing with Working Sets,
[Link]
7. Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster
Computing, [Link]

Web Resources:

1. The R Project for Statistical Computing, [Link]

Assessment Scheme:
Quizzes/Test: 10%
Assignments: 10%
Mid Term Examination: 20%
End Term Examination: 60%

Assignments:
Lab Assignments
Presentation Assignments

Common questions

Discuss the role and challenges of applying MapReduce in developing big data applications as outlined in the course.
MapReduce enables vast data sets to be processed in parallel across a distributed computing environment. The main challenges are the complexity of writing efficient MapReduce programs and the overhead of moving large volumes of data between nodes, which can hurt performance and often requires optimization.
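
The map, shuffle, and reduce steps described above can be sketched in plain Python. This is a minimal single-process illustration, not Hadoop code: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are assumptions made for the example, and a real job would run these phases on many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle step: group all emitted values by key.

    In a cluster this is where data crosses the network, which is the
    transfer overhead mentioned in the text.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce sequentially over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, chosen only for illustration.
sample_lines = ["big data needs big clusters", "data moves between nodes"]
counts = word_count(sample_lines)
print(counts["big"], counts["data"])
```

Each call to `map_phase` and `reduce_phase` depends only on its own input, which is what allows a framework to run many of them in parallel.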

Compare and contrast HiveQL and traditional SQL databases in terms of functionality and application in big data analytics.
HiveQL is designed for querying and managing large datasets held in distributed storage on Apache Hadoop, while traditional SQL databases are typically used for structured, transactional data management. Hive scales to large data sets and is oriented toward batch processing, which makes it better suited to big data analytics than a traditional SQL database that is limited by the capacity of a single node.

How does the course address the security concerns associated with handling big data, and what are some methods discussed to mitigate these issues?
The course addresses big data security by covering the privacy, ethics, and security issues specific to big data environments. Topics include steps to secure big data, cloud security, Hadoop security design, Kerberos security implementation and configuration, and audit logging in Hadoop clusters. These measures help protect data storage and processing operations against unauthorized access and breaches.

What are the design principles of Hadoop as discussed in the course, and how do they contribute to scalable and distributed applications?
Hadoop's design principles focus on processing large data sets across distributed clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. This architecture lets it handle large-scale data processing while keeping distributed applications reliable and scalable.

Evaluate the implications of using cloud security solutions as discussed in the course, and how do they integrate with big data technologies like Hadoop?
Cloud security solutions for big data use cloud-based services to provide data protection, compliance, and secure access controls. The course explains integration with Hadoop, using cloud infrastructure for scalable storage and processing while applying security measures such as encrypted data transmission and authenticated access to address vulnerabilities in big data environments.

How does the course integrate the R tool for data analysis in the context of big data analytics?
The course first teaches the fundamentals of data science and big data, then covers basic R concepts, data transformation, and data visualization in R. Students apply R to data analysis, which gives them practical skills for handling data sets.

What pedagogical strategies does the course employ to promote active learning and understanding of big data analysis concepts?
The course adopts pedagogical strategies such as participative learning, discussions, the use of flowcharts, algorithm creation, program writing, and assignments. These methods encourage active engagement, foster understanding through practical application, and develop the problem-solving skills needed to master big data analysis concepts.

Identify key features of MongoDB that are presented in the course and explain their relevance in modern web applications.
Key features of MongoDB include its document-oriented data model, dynamic schemas, and scalability across distributed systems. These features give the flexibility and efficient data management that modern web applications need when they handle large volumes of data with varying structures. Its querying capabilities, available through the JavaScript shell, also make it attractive to developers.
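
The idea of a document-oriented model with dynamic schemas can be illustrated without a database at all. The sketch below uses plain Python dictionaries as stand-ins for documents; the `find` helper and the `products` data are hypothetical, written only for this example, and are not part of MongoDB or any of its drivers. It shows only simple equality matching.

```python
def find(collection, query):
    """Return documents whose fields equal every key/value pair in query.

    Hypothetical helper: it mimics simple equality matching only and is
    not a MongoDB API.
    """
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in query.items())
    ]


# Documents in one collection need not share the same set of fields.
products = [
    {"_id": 1, "name": "laptop", "category": "electronics", "ram_gb": 16},
    {"_id": 2, "name": "novel", "category": "books", "author": "A. Writer"},
    {"_id": 3, "name": "phone", "category": "electronics"},
]

electronics = find(products, {"category": "electronics"})
print([doc["name"] for doc in electronics])
```

Because each document carries its own field names, a new attribute such as `ram_gb` can appear on some documents without changing the others, which is the flexibility the answer above refers to.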

How does the course structure facilitate a comprehensive understanding of NoSQL databases, particularly in the context of big data?
The course builds an understanding of NoSQL databases by discussing their applications in big data environments, focusing on flexibility, schema design, and the handling of diverse data types. It emphasizes hands-on work with tools such as MongoDB and the Hadoop ecosystem, so that students can manage and analyze large, unstructured datasets effectively.

Explain the importance of understanding both unsupervised and supervised learning techniques in the context of big data analytics.
Supervised and unsupervised learning are both important in big data analytics because together they cover data exploration and decision-making. Supervised learning is used for predictive modeling and trains algorithms on labeled datasets, whereas unsupervised learning identifies patterns and structure in data without labels, which gives deeper insight into complex datasets.
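
The contrast between the two approaches can be made concrete with a small pure-Python sketch. The techniques chosen here (a nearest-centroid classifier for the supervised case and a two-cluster k-means on one-dimensional data for the unsupervised case) are assumptions picked for brevity, as are all function names and the sample data; the course itself works in R.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_nearest_centroid(points, labels):
    """Supervised: use the given labels to compute one centroid per class."""
    by_label = {}
    for x, label in zip(points, labels):
        by_label.setdefault(label, []).append(x)
    return {label: mean(xs) for label, xs in by_label.items()}


def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(points, iterations=10):
    """Unsupervised: split 1-D points into two clusters without any labels."""
    low, high = min(points), max(points)  # initial centroids
    for _ in range(iterations):
        left = [x for x in points if abs(x - low) <= abs(x - high)]
        right = [x for x in points if abs(x - low) > abs(x - high)]
        if not right:  # all points are identical, so nothing to split
            break
        low, high = mean(left), mean(right)
    return sorted(left), sorted(right)


# Hypothetical sample data, chosen only for illustration.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = fit_nearest_centroid(points, labels)  # needs labels
clusters = two_means(points)                      # needs only the points
print(predict(centroids, 1.5), predict(centroids, 4.0))
print(clusters)
```

The supervised function cannot run without `labels`, while `two_means` recovers the same two groups from the points alone but cannot name them.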
