Big Data Course Syllabus Overview

Uploaded by Dr. Neetu Sharma

BCA-IOP BADA

Name of The Course: Introduction to Big Data Science

Course Code: BCABA1101

L T P C: 3 0 2 4

Theory: IA 20, MTE 15, ETE 30

Practical: PR 15, ETE 20

Prerequisite

Co requisite

Anti-requisite

Course Objectives:

The student should be made to:

Course Outcomes

CO1 Describe what Data Science is and the skill sets needed to be a data scientist.

CO2 Explain in basic terms what Statistical Inference means. Identify probability distributions commonly used as foundations for statistical modeling. Fit a model to data.

CO3 Explain the significance of exploratory data analysis (EDA) in data science. Apply basic tools (plots, graphs, summary statistics) to carry out EDA.

CO4 Describe the Data Science Process and how its components interact. Use APIs and other tools to scrape the Web and collect data.

CO5 Identify and explain fundamental mathematical and algorithmic ingredients that constitute a Recommendation Engine (dimensionality reduction, singular value decomposition, principal component analysis). Build their own recommendation system using existing components.

CO6 Describe advances and the latest trends in data science.

Text Book (s)

1. Cathy O'Neil and Rachel Schutt. Doing Data Science: Straight Talk from the Frontline. O'Reilly. 2014.

Reference Book (s)

1. Jure Leskovec, Anand Rajaraman and Jeffrey Ullman. Mining of Massive Datasets. v2.1, Cambridge University Press. 2014. (free online)

2. Kevin P. Murphy. Machine Learning: A Probabilistic Perspective. ISBN 0262018020. 2013.

3. Foster Provost and Tom Fawcett. Data Science for Business: What You Need to Know about Data Mining and Data-analytic Thinking. ISBN 1449361323. 2013.

4. Trevor Hastie, Robert Tibshirani and Jerome Friedman. Elements of Statistical Learning, Second Edition. ISBN 0387952845. 2009. (free online)

5. Avrim Blum, John Hopcroft and Ravindran Kannan. Foundations of Data Science. (Note: this is a book currently being written by the three authors. The authors have made the first draft of their notes for the book available online. The material is intended for a modern theoretical course in computer science.)

6. Mohammed J. Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. Cambridge University Press. 2014.

7. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, Third Edition. ISBN 0123814790. 2011.

Unit-1 Introduction to BI 8 hours

What is Data Science? - Big Data and Data Science hype, and getting past the hype - Why now? - Datafication - Current landscape of perspectives - Skill sets needed.

Statistical Inference - Populations and samples - Statistical modelling, probability distributions, fitting a model - Intro to

Unit-2 Exploratory Data Analysis and the Data Science Process 8 hours

Exploratory Data Analysis and the Data Science Process - Basic tools (plots, graphs and summary statistics) of EDA - Philosophy of EDA - The Data Science Process - Case Study: RealDirect (online real estate firm).

Three Basic Machine Learning Algorithms - Linear Regression - k-Nearest Neighbors (k-NN) - k-means.

Unit-3 Machine Learning Algorithm and Usage in Applications 8 hours

Motivating application: Filtering Spam - Why Linear Regression and k-NN are poor choices for Filtering Spam - Naive Bayes and why it works for Filtering Spam - Data Wrangling: APIs and other tools for scraping the Web.

Feature Generation and Feature Selection (Extracting Meaning From Data) - Motivating application: user (customer) retention - Feature Generation (brainstorming, role of domain expertise, and place for imagination) - Feature Selection algorithms: Filters; Wrappers; Decision Trees; Random Forests.

Unit-4 Building a User-Facing Data Product 8 hours

Algorithmic ingredients of a Recommendation Engine - Dimensionality Reduction - Singular Value Decomposition - Principal Component Analysis - Exercise: build your own recommendation system.

Mining Social-Network Graphs - Social networks as graphs - Clustering of graphs - Direct discovery of communities in graphs - Partitioning of graphs - Neighborhood properties in graphs.

Unit-5 Data Visualization and Ethical Issues 8 hours

Basic principles, ideas and tools for data visualization - Examples of inspiring (industry) projects - Exercise: create your own visualization of a complex dataset.

Discussions on privacy, security, ethics - A look back at Data Science - Next-generation data scientists.

Unit-6 Research 8 hours

The advances and the latest trends in the course as well as the latest applications of the areas covered in the course.

The latest research conducted in the areas covered in the course.

Discussion of some of the latest papers published in IEEE and ACM transactions, Web of Science and SCOPUS indexed journals, and high-impact-factor conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.

BCA-IOP Big Data

Name of The Course: Foundation of Big Data System

Course Code: BCABI1101

L T P C: 3 0 2 4

Theory: IA 20, MTE 15, ETE 30

Practical: PR 15, ETE 20

Prerequisite

Co requisite

Anti-requisite

COURSE OBJECTIVES:

Understand the Data Science process and learn the techniques, tools, statistical methodologies and machine learning algorithms used in the process.

COURSE OUTCOMES:

CO1 Students should know about the design issues of the Hadoop architecture.

CO2 Students should learn various techniques for big data analytics.

CO3 Students are able to identify real-time problems and design solutions using various big data analytics techniques.

CO4 Students can use supervised and unsupervised learning for prediction.

CO5 Students can use classification and clustering algorithms.

CO6 Students can understand current research trends in big data.

COURSE CONTENT: Hours

UNIT I INTRODUCTION TO BIG DATA: 9

Introduction - distributed file system - Big Data and its importance, Four V's in big data, Drivers for Big Data, Big data analytics, Big data applications. Algorithms using MapReduce, Matrix-Vector Multiplication by MapReduce.

UNIT II INTRODUCTION TO HADOOP: 9

Big Data - Apache Hadoop & Hadoop Ecosystem - Moving Data in and out of Hadoop - Understanding inputs and outputs of MapReduce - Data Serialization.

UNIT III HADOOP ARCHITECTURE: 9

Hadoop Architecture, Hadoop Storage: HDFS, Common Hadoop Shell commands, Anatomy of File Write and Read, NameNode, Secondary NameNode, and DataNode, Hadoop MapReduce paradigm, Map and Reduce tasks, Job, Tasktrackers - Cluster Setup - SSH & Hadoop Configuration - HDFS Administration - Monitoring & Maintenance.

UNIT IV HADOOP ECOSYSTEM AND YARN: 9

Hadoop ecosystem components - Schedulers - Fair and Capacity, Hadoop 2.0 New Features - NameNode High Availability, HDFS Federation, MRv2, YARN, Running MRv1 in YARN.

UNIT V HIVE AND HIVEQL, HBASE: 9

Hive Architecture and Installation, Comparison with Traditional Database, HiveQL - Querying Data - Sorting and Aggregating, MapReduce Scripts, Joins & Subqueries, HBase concepts - Advanced Usage, Schema Design, Advanced Indexing - Pig, ZooKeeper - how it helps in monitoring a cluster, how HBase uses ZooKeeper and how to build applications with ZooKeeper.

Unit VI 5 hours

The advances and the latest trends in the course as well as the latest applications of the areas covered in the course.

The latest research conducted in the areas covered in the course.

Discussion of some of the latest papers published in IEEE and ACM transactions, Web of Science and SCOPUS indexed journals, and high-impact-factor conferences and symposiums.

Discussion of some of the latest products available in the market based on the areas covered in the course, and of patents filed in those areas.

Reference Books

1. Boris Lublinsky, Kevin T. Smith, Alexey Yakubovich, "Professional Hadoop Solutions", Wiley, ISBN: 9788126551071, 2015.

2. Chris Eaton, Dirk deRoos et al., "Understanding Big Data", McGraw Hill, 2012.

3. Tom White, "Hadoop: The Definitive Guide", O'Reilly, 2012.

4. Vignesh Prajapati, "Big Data Analytics with R and Hadoop", Packt Publishing, 2013.

5. Tom Plunkett, Brian Macdonald et al., "Oracle Big Data Handbook", Oracle Press, 2014.

6. Jay Liebowitz, "Big Data and Business Analytics", CRC Press, 2013.

Common questions

The architectural components of Hadoop include HDFS for data storage, MapReduce for processing, YARN for resource management, and a variety of ecosystem tools like Hive, Pig, and HBase for data querying and management. These components contribute to Hadoop's functionality by enabling efficient storage and processing of large datasets, facilitating parallel processing, and supporting diverse data formats and analytical needs, making it a foundation for big data analytics.
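To make the MapReduce processing model concrete, here is a minimal sketch of the matrix-vector multiplication listed in Unit I, simulated in plain Python. It does not use any Hadoop API: the function names `map_phase` and `reduce_phase`, the `(row, column, value)` entry format, and the sample data are all assumptions made for illustration, and the whole vector is assumed to fit in memory on every mapper.

```python
from collections import defaultdict


def map_phase(matrix_entries, vector):
    """Map step: each matrix entry (i, j, m_ij) emits the pair (i, m_ij * v_j)."""
    for i, j, value in matrix_entries:
        yield i, value * vector[j]


def reduce_phase(pairs):
    """Reduce step: sum all values that share the same row key."""
    sums = defaultdict(float)
    for key, value in pairs:
        sums[key] += value
    return dict(sums)


# Hypothetical 2x2 matrix stored as (row, column, value) triples.
entries = [(0, 0, 1.0), (0, 1, 2.0), (1, 0, 3.0), (1, 1, 4.0)]
vector = [10.0, 20.0]

result = reduce_phase(map_phase(entries, vector))
print(result)  # {0: 50.0, 1: 110.0}
```

In a real cluster the map calls would run in parallel over blocks of the matrix stored in HDFS, and the framework would group the emitted pairs by key before the reduce step; the sketch collapses that shuffle into a single dictionary.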

The skill sets necessary for a proficient data scientist include a strong understanding of statistical inference and modeling, proficiency in data exploration and visualization tools, an ability to use APIs and web scraping tools for data collection, and knowledge of recommendation system algorithms such as dimensionality reduction and singular value decomposition. These skills are essential to navigate the current landscape of data science and its applications.

Hadoop's distributed file system (HDFS) offers significant advantages in managing big data operations due to its ability to store massive datasets across multiple nodes, ensuring data redundancy and fault tolerance. Its scalability allows for easy expansion by adding more nodes, while the file system's architecture supports the parallel processing of data, significantly improving data retrieval and processing speeds.

Current research trends in big data and data science, such as advancements in machine learning algorithms, real-time data processing, and privacy-preserving analytics, are significantly influencing industry practices by promoting more efficient, accurate, and ethical data handling. These trends enable businesses to derive deeper insights, optimize operational processes, and provide better customer experiences, highlighting the intersection of academic research and applied industry solutions.

Feature generation and selection significantly impact the effectiveness of data analytics applications by enhancing model accuracy and interpretability. Effective feature generation involves incorporating domain expertise and imaginative strategies to create relevant features, while feature selection algorithms like filters, wrappers, and decision trees reduce dimensionality, concentrating on the most informative features. This improves computational efficiency, reduces overfitting, and enhances model performance.
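As an illustration of the filter approach, the sketch below ranks features by the absolute Pearson correlation between each feature and the target and keeps the top few. This is only one possible filter criterion, chosen here for simplicity; the feature names, the retention labels, and the helper names `pearson` and `filter_select` are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def filter_select(features, target, top_k):
    """Filter method: score each feature independently, keep the top_k."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical user-retention data (1 = user retained, 0 = user churned).
features = {
    "logins_per_week": [1, 2, 3, 4, 5, 6],
    "account_age_days": [30, 10, 50, 20, 60, 40],
    "constant_flag": [1, 1, 1, 1, 1, 1],
}
retained = [0, 0, 0, 1, 1, 1]

print(filter_select(features, retained, top_k=1))  # ['logins_per_week']
```

A filter scores each feature on its own, so it is cheap but blind to interactions between features; wrappers and tree-based methods trade more computation for the ability to evaluate features in combination.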

The Four V's of big data (Volume, Velocity, Variety, and Veracity) highlight its significance by illustrating the vast quantities of data generated rapidly from diverse sources, and the need for accuracy and trustworthiness in the data processed. These characteristics present challenges in terms of storage, handling, and analysis, necessitating efficient processing tools and sophisticated analytical capabilities such as those provided by Hadoop and other big data technologies.

Naive Bayes models are considered effective for spam filtering because they assume the words in a message are conditionally independent given its class, which simplifies computation and lets the probability that a message is spam be estimated from individual word frequencies. In contrast, linear regression predicts a continuous value rather than a class label, and k-nearest neighbors relies on distance measures that become less informative in the high-dimensional, sparse word data typical of spam filtering, making both less suitable.
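The following is a minimal sketch of a multinomial Naive Bayes spam scorer with Laplace (add-one) smoothing, written with the standard library only. The four training messages, the whitespace tokenizer, and the function names `train_naive_bayes` and `spam_log_odds` are assumptions for illustration; a practical filter would need far more data and better tokenization.

```python
import math
from collections import Counter


def train_naive_bayes(messages):
    """messages: list of (text, label) pairs with label 'spam' or 'ham'."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in messages:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocabulary


def spam_log_odds(text, model):
    """Return log P(spam | text) - log P(ham | text); positive means spam."""
    word_counts, class_counts, vocabulary = model
    total = sum(class_counts.values())
    scores = {}
    for label in ("spam", "ham"):
        score = math.log(class_counts[label] / total)  # log prior
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace smoothing avoids zero probabilities for unseen pairs.
            p = (word_counts[label][word] + 1) / (n_words + len(vocabulary))
            score += math.log(p)  # independence: log-probabilities add up
        scores[label] = score
    return scores["spam"] - scores["ham"]


# Hypothetical toy training set.
TOY_MESSAGES = [
    ("win cash prize now", "spam"),
    ("cheap prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
model = train_naive_bayes(TOY_MESSAGES)

print(spam_log_odds("claim your cash prize", model) > 0)   # True
print(spam_log_odds("monday meeting agenda", model) > 0)   # False
```

Because of the independence assumption, training reduces to counting words per class, which is why the method scales to vocabularies with many thousands of words.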

Primary ethical considerations in data science regarding data visualization and privacy include ensuring data integrity and accuracy, avoiding misleading representations, and safeguarding user privacy by preventing unauthorized access to personal data. It is also essential to be transparent about the methodology and maintain objectivity to avoid biases that could lead to incorrect conclusions or harm to individuals or groups.

Dimensionality reduction techniques like Singular Value Decomposition (SVD) are crucial in building recommendation engines as they compress the feature space of data, enhance computational efficiency, and help in dealing with the sparsity of user-item interactions. SVD identifies the underlying structure in the data, thereby helping to recommend items by capturing latent factors that represent user preferences and item characteristics.
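A small sketch of the idea, assuming NumPy is available: a truncated SVD keeps only the `k` largest singular values of a user-item rating matrix, giving a low-rank approximation built from `k` latent factors. The rating matrix and the helper name `low_rank_approximation` are hypothetical, and the example uses a fully observed matrix; handling missing ratings, which real recommendation data always has, is outside this sketch.

```python
import numpy as np

# Hypothetical ratings: rows are users, columns are items.
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])


def low_rank_approximation(matrix, k):
    """Rebuild the matrix from its k largest singular values and vectors."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] @ np.diag(s[:k]) @ vt[:k, :]


# Two latent factors are enough to capture the two taste groups above.
approx = low_rank_approximation(ratings, k=2)
print(np.round(approx, 2))
print("reconstruction error:", np.linalg.norm(ratings - approx))
```

The rows of `u[:, :k]` can be read as user coordinates in the latent-factor space and the columns of `vt[:k, :]` as item coordinates, so a predicted rating is a weighted dot product of the two.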

The RealDirect case study illustrates the principles of Exploratory Data Analysis (EDA) by demonstrating the application of basic tools such as plots, graphs, and summary statistics to understand data patterns and inform decision-making. It exemplifies the data science process by showing how data can be systematically analyzed to derive actionable insights, critical for problem-solving and strategy formulation in a competitive online real estate market.
