R Programming

The document provides an overview of Big Data, including its evolution, modern technologies, and future trends such as AI and cloud computing. It outlines best practices for Big Data analytics, emphasizing the importance of clear goals, scalable technologies, data quality, appropriate storage, security, automation, and effective visualization. Additionally, it describes the characteristics of Big Data defined by the 5Vs: Volume, Velocity, Variety, Veracity, and Value, highlighting their significance in data processing and analysis.

Uploaded by

gpragav52

UNIT I: INTRODUCTION

Evolution of Big Data, Best Practices for Big Data Analytics,
Big Data Characteristics, Validating, The Promotion of the
Value of Big Data, Big Data Use Cases, Characteristics of Big
Data Applications, Perception and Quantification of Value,
Understanding Big Data Storage, A General Overview of
High-Performance Architecture, HDFS, MapReduce and YARN,
MapReduce Programming Model

UNIT II: LISTS

Creating Lists, General List Operations, List Indexing, Adding and
Deleting List Elements, Getting the Size of a List, Extended
Example: Text Concordance, Accessing List Components and
Values, Applying Functions to Lists, Data Frames, Creating Data
Frames, Accessing Data Frames, Other Matrix-Like Operations

UNIT III: FACTORS AND TABLES

Factors and Levels, Common Functions Used with Factors,
Working with Tables, Matrix/Array-Like Operations on Tables,
Extracting a Subtable, Finding the Largest Cells in a Table, Math
Functions, Calculating a Probability, Cumulative Sums and
Products, Minima and Maxima, Calculus, Functions for Statistical
Distributions

UNIT IV: OBJECT-ORIENTED PROGRAMMING

S3 Classes, S3 Generic Functions, Writing S3 Classes, Using
Inheritance, S4 Classes, Writing S4 Classes, Implementing a
Generic Function on an S4 Class, Visualization, Simulation, Code
Profiling, Statistical Analysis with R, Data Manipulation

UNIT-1
1. Evolution of Big Data Technologies

The Past: Beginning of Big Data

Traditional Databases
In the early days, data was stored in relational databases.
• Worked well with structured data
• Limited in handling very large volumes of data
Data Growth
With the rise of the internet in the 1990s, data volumes grew
rapidly.
• Data came from websites and, later, social media
• Mostly unstructured and complex
Emergence of Big Data
Around 2005, the term “Big Data” became popular.
• Traditional systems could not manage data at this scale
• New technologies such as Apache Hadoop were introduced

The Present: Modern Big Data Technologies

Today, Big Data technologies are used across many
industries. Organizations combine several tools to store and
analyze large amounts of data efficiently.
Apache Hadoop
Apache Hadoop remains a key Big Data technology. It works by
distributing data and computation across many machines.
• HDFS – stores large datasets across multiple machines
• MapReduce – processes data in parallel
• YARN – manages cluster resources
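The MapReduce idea can be sketched in a few lines of plain Python. This is only an illustration of the map, shuffle, and reduce steps running in a single process; it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    # In Hadoop the map calls run in parallel on different machines;
    # here they simply run one after another (assumption for the sketch).
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each line is mapped independently and each key is reduced independently, both steps can be spread over many machines, which is what makes the model suitable for large datasets.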
NoSQL Databases
Traditional relational databases are not suitable for all types of data.
NoSQL databases such as MongoDB, Cassandra, and Redis
are more flexible.
• No fixed structure (flexible or schema-less data models)
• Easy to scale horizontally
• Designed for high availability and reliability
Data Lakes
Data lakes can store all types of data: structured, semi-structured,
and unstructured. They are more flexible than
traditional data warehouses.
• Examples: Amazon S3, Azure Data Lake Storage
• They allow companies to store raw data for future analysis.

The Future: Trends in Big Data

Big Data technologies will continue to grow and improve.
Several important trends are likely to shape the future.
Artificial Intelligence (AI)
AI and machine learning help analyze large datasets
automatically, finding patterns and surfacing insights quickly.
Frameworks like TensorFlow and PyTorch are widely used to build
such models.
Cloud Computing
Cloud platforms make Big Data more accessible. Companies
can store and process data without owning expensive
hardware.
• Popular platforms: Amazon Web Services (AWS), Google
Cloud Platform (GCP), Microsoft Azure
Edge Computing
With the rise of IoT devices, large amounts of data are
generated continuously. Edge computing processes data closer
to where it is created.
• Benefits: Faster processing, reduced latency, lower network
usage.
Explainable AI (XAI)
Explainable AI helps people understand how AI systems make
decisions. This improves trust and transparency.

2. Best Practices for Big Data Analytics

1. Define Clear Goals
• Identify Objectives: Decide on the main goal before
starting (e.g., increasing sales or reducing costs).
• Stay Focused: Clear goals help in selecting the right data
and tools.
• Filter Data: Avoid collecting unnecessary data that does
not serve your goal.
• Benefit: Saves significant time, effort, and resources.
2. Use Scalable Technologies
• Volume Handling: Big data requires tools that can
manage massive amounts of information.
• Key Tools:
o Apache Hadoop: Used for storing and
processing large datasets.
o Apache Spark: Provides in-memory processing that is often
much faster than disk-based MapReduce.
• Benefit: These tools allow systems to scale out smoothly as
data grows.
3. Ensure Data Quality
• Data Cleaning: Remove errors and duplicate entries.
• Complete Records: Handle missing or incomplete
information properly.
• Consistency: Maintain a uniform format and structure
across all records.
• Benefit: High-quality data leads to accurate results and
better business decisions.
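The three cleaning steps above can be sketched in plain Python. This is a minimal illustration, not a recommended pipeline: the field names (`name`, `email`), the sample records, and the choice to drop incomplete records rather than fill them in are all assumptions made for the example.

```python
def clean_records(records):
    """Return cleaned copies of the records.

    Steps: normalize the format (consistency), skip records with
    missing fields (complete records), and skip repeats (duplicates).
    """
    cleaned = []
    seen_emails = set()
    for record in records:
        # Consistency: one uniform format for every record.
        name = (record.get("name") or "").strip().title()
        email = (record.get("email") or "").strip().lower()
        # Complete records: here incomplete rows are dropped (one possible policy).
        if not name or not email:
            continue
        # Duplicates: the normalized email is used as the identity key.
        if email in seen_emails:
            continue
        seen_emails.add(email)
        cleaned.append({"name": name, "email": email})
    return cleaned


# Made-up sample data for illustration only.
raw = [
    {"name": " asha rao ", "email": "Asha@Example.com"},
    {"name": "Asha Rao", "email": "asha@example.com"},  # duplicate once normalized
    {"name": "Ravi", "email": None},                     # missing email
]
result = clean_records(raw)
print(result)  # [{'name': 'Asha Rao', 'email': 'asha@example.com'}]
```

Normalizing before checking for duplicates matters: the first two records only look different because of spacing and letter case.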
4. Choose the Right Storage
• Data Lakes: Store raw and unstructured data in systems
like Amazon S3.
• Data Warehouses: Use a warehouse such as Google BigQuery
for structured data that is ready for analysis.
• Selection: Choose storage based on the specific data
type and how it will be used.
• Benefit: Proper storage improves both speed and
operational efficiency.
5. Implement Security and Governance
• Access Control: Allow data access only to authorized
users.
• Encryption: Protect sensitive data with encryption, both
in storage and in transit.
• Compliance: Follow legal rules and regulations for data
usage.
• Benefit: Reduces the risk of data loss, misuse, and security
breaches.
6. Automate Workflows
• Automation Tools: Use platforms like Apache Airflow to
manage tasks.
• Scheduling: Set data processing jobs to run
automatically at specific times.
• Consistency: Reduce manual work and the risk of human
error.
• Benefit: Ensures smooth, consistent, and reliable
operations.
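Workflow tools typically describe a pipeline as tasks plus the dependencies between them, and then run each task only after the tasks it depends on. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9 or later). It is not Apache Airflow's API, it does not schedule anything at specific times, and the task names and the `run_pipeline` helper are invented for this example.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run every task after the tasks it depends on; return the order used.

    tasks:        maps a task name to a function taking no arguments
    dependencies: maps a task name to the set of tasks it must wait for
    """
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name]()
    return order


log = []
tasks = {
    "extract": lambda: log.append("extracted"),
    "clean": lambda: log.append("cleaned"),
    "report": lambda: log.append("reported"),
}
# "clean" waits for "extract"; "report" waits for "clean".
dependencies = {"extract": set(), "clean": {"extract"}, "report": {"clean"}}

order = run_pipeline(tasks, dependencies)
print(order)  # ['extract', 'clean', 'report']
```

Declaring the dependencies once and letting the tool work out the order is what removes the manual step, and with it the chance of running jobs in the wrong sequence.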
7. Focus on Insights and Visualization
• Visual Aids: Present results using clear charts, graphs,
and dashboards.
• Simplicity: Make complex data easy for all users to
understand.
• Actionable Data: Help decision-makers take quick,
informed actions.
• Benefit: Clear visualization improves how insights are
communicated and used.

3. Big Data Characteristics

Big Data refers to massive, complex datasets that exceed
the processing capabilities of traditional database
systems.
It is defined by the 5Vs (Volume, Velocity, Variety,
Veracity, and Value), which describe its scale, speed,
diversity, reliability, and usefulness.
1. Volume (Amount of Data)
• Volume means the sheer size of the data.
• Companies like Netflix and YouTube generate huge amounts
of data every day.
• This data can reach petabytes (1 petabyte is about 1,000 terabytes).
• Tools like Apache Hadoop and Apache Spark are used to
handle this data.

2. Veracity (Data Quality)

• Veracity means how correct and reliable the data is.
• Big data may have errors, missing values, or wrong
information.
• For example, medical data must be very accurate. Data is
cleaned and checked to improve quality.

3. Velocity (Speed of Data)

• Velocity means the speed at which data is created and
processed.
• Data from social media, sensors, and banking systems arrives
very quickly.
• Tools like Apache Kafka help move and process data in real time.
This is useful for quick actions like fraud detection.
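To make the fraud-detection example concrete, the sketch below processes events one at a time and flags an account that makes too many transactions within a short time window. It is a toy, single-process illustration and does not use Apache Kafka; the class name `RateMonitor`, the thresholds, and the sample events are all assumptions made for the example.

```python
from collections import defaultdict, deque


class RateMonitor:
    """Flag an account that makes too many transactions within a time window."""

    def __init__(self, max_events=3, window_seconds=60):
        self.max_events = max_events
        self.window_seconds = window_seconds
        # account -> timestamps (in seconds) still inside the window
        self.recent = defaultdict(deque)

    def observe(self, account, timestamp):
        """Record one event; return True if the account is over the limit."""
        times = self.recent[account]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while times and timestamp - times[0] >= self.window_seconds:
            times.popleft()
        return len(times) > self.max_events


monitor = RateMonitor(max_events=3, window_seconds=60)
# Made-up (account, seconds) events, already in time order.
events = [
    ("acct-1", 0), ("acct-1", 10), ("acct-2", 15),
    ("acct-1", 20), ("acct-1", 30), ("acct-1", 100),
]
flags = [monitor.observe(account, t) for account, t in events]
print(flags)  # [False, False, False, False, True, False]
```

Each event is checked as it arrives, using only a small amount of recent state per account, so the decision does not have to wait for a batch job over all stored data.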

4. Variety (Different Types of Data)

Variety means different types of data. Data can be:
• Structured (tables, databases)
• Semi-structured (JSON, XML)
• Unstructured (text, images, videos)
Example: Healthcare data includes reports, images, and notes.
Systems must handle all these formats together.
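The difference between the three kinds of data shows up in how each one is read. The sketch below uses only Python's standard library; the sample values, field names, and the three `parse_*` helper names are invented for illustration.

```python
import csv
import io
import json


def parse_structured(csv_text):
    """Structured: fixed columns, every row has the same fields."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def parse_semi_structured(json_text):
    """Semi-structured: self-describing, fields can nest or vary per record."""
    return json.loads(json_text)


def parse_unstructured(note):
    """Unstructured: free text, so structure has to be derived from it."""
    return {"word_count": len(note.split())}


# Made-up healthcare-style samples, one per data type.
table = parse_structured("patient_id,age\n101,54\n102,37\n")
report = parse_semi_structured('{"patient_id": 101, "tests": {"hb": 13.5}}')
summary = parse_unstructured("Patient reports mild headache since Monday")

print(table[0]["age"])         # 54 (read as text from the CSV)
print(report["tests"]["hb"])   # 13.5
print(summary["word_count"])   # 6
```

Structured data can be queried by column straight away, semi-structured data has to be navigated by key, and unstructured data needs some processing step before it yields any fields at all.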

5. Value (Usefulness of Data)

• Value means how useful the data is.
• Data is only helpful if it yields meaningful insights.
• Companies use data to understand customers and
improve services. Without value, data is just stored
information that serves no purpose.
Benefits
• Improved Decision Making
Helps organizations make better decisions using accurate
insights.
• Better Data Quality (Veracity)
Ensures data is reliable and reduces errors.
• Real-Time Processing (Velocity)
Enables fast analysis and quick actions.
• Handling Different Data Types (Variety)
Supports structured, semi-structured, and unstructured data.
• Business Value (Value)
Helps companies improve performance and increase profit.

Unit-2
