Essential R Programming For Data Analytics

The document provides an overview of R, a free programming language for statistical computing and data analysis, highlighting its benefits, applications, and installation steps. It covers essential data structures in R, such as vectors, lists, matrices, arrays, and data frames, as well as the basics of statistics and data analytics processes. Additionally, it emphasizes the importance of data visualization and outlines various types of analytics, including descriptive, diagnostic, predictive, and prescriptive analytics.

Essential R

FOR DATA ANALYTICS

Compiled by

Isi .A. Edeoghon (PhD)

About R
R is a free, open-source programming language and software environment designed
specifically for statistical computing, data analysis, and graphical presentation.

It is widely used by data scientists, statisticians, and researchers in academia and
industry.
About R
i. Extensive Package Ecosystem: The Comprehensive R Archive Network (CRAN) hosts thousands
of user-contributed packages that extend R's functionality. Popular examples are ggplot2
for making graphs and the tidyverse collection for cleaning and reshaping data.

ii. Advanced Data Visualization: R is great at making high-quality, customizable charts,
graphs, and interactive dashboards, which is very important for sharing data insights.
Benefits of R
i. Cross-Platform Compatibility: It is compatible with the main operating systems,
such as Linux, macOS, and Windows.

ii. Active Community: Through forums and other projects, a sizable and vibrant
community provides support, tutorials, and documentation.
Applications of R
i. Statistical Modeling: Epidemiologists use statistical modeling from R to forecast trends and model
diseases.

ii. Finance and Economics: Used for stock trend forecasting, risk analysis, and portfolio management.

iii. Bioinformatics and Healthcare: Used in clinical trials and pharmaceutical research to analyze
genomic data.

iv. Data Visualization: R is used by data scientists to visualize information contained in data.

v. Machine Learning: Used to create predictive models with packages such as randomForest and caret.
How to install and use R (Step 1)
i. Go to the CRAN Website: [Link]

ii. Choose your OS: Select the link for Linux, macOS, or Windows.

iii. Windows users: Click "install R for the first time" and then "Download R for
Windows."
How to install and use R (Step 2)
Install RStudio (Highly Recommended):

While R comes with a basic interface, most users work in RStudio, an IDE developed by Posit
(the company formerly named RStudio).

It makes writing code, viewing graphs, and managing files much easier.

Visit Posit: [Link]

Click the button under "2: Install RStudio Desktop".

Install: Run the setup file and follow the instructions.

Starting with R
R works much like other programming languages, with a few syntax differences of its own.
We'll run through some basics quickly before we get to Statistics and Data Analytics.
Creating a variable: Choose a name for the variable and use the R assignment
operator: <-
Text is distinguished from numbers with quotation marks: " " or ' '
Example:
> name <- "Simon"
> age <- 12
> name   # typing name at the R command prompt
[1] "Simon"
> age
[1] 12
Numbers
R can be used as a simple calculator. Multiplication is evaluated before addition:
> 11 + 3 * 5
[1] 26
We can also print to the screen using the print() function:
> print(name)
[1] "Simon"
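The basics above can be tried together in one short script. This is a minimal sketch; the values and the variable names other than name and age are made up for illustration.

```r
# Hypothetical values, chosen only for illustration
name <- "Simon"
age <- 12

# class() reports the type R has assigned to each value
name_type <- class(name)   # "character"
age_type <- class(age)     # "numeric"

# Parentheses change the order of evaluation
without_parens <- 11 + 3 * 5     # multiplication first: 26
with_parens <- (11 + 3) * 5      # addition first: 70

# paste() joins text and numbers into a single string
message_text <- paste(name, "is", age, "years old")
print(message_text)
```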
Data Structures
R has the following built-in data structures:
1. Vectors
2. Lists
3. Matrices
4. Arrays
5. Data Frames
We'll go through each of them in turn.
Vectors
A vector is a one-dimensional sequence of values that all share the same data type.
We create a vector with the c() function:
> courses <- c("CPE333", "CPE305", "GET301")
> courses
[1] "CPE333" "CPE305" "GET301"
> ages <- c(21, 20, 19)
> ages
[1] 21 20 19
Vectors are the most common data structure in R.
If strings and numbers are mixed in a vector, R converts every element to a string:
> student <- c("amaka", 12, "ENG2109999")
> student
[1] "amaka"      "12"         "ENG2109999"
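The following sketch shows how a vector's type can be checked and how its elements can be selected. The variable names ending in _type, first_age, last_two and ages_next_year are illustrative only.

```r
# Hypothetical vectors for illustration
ages <- c(21, 20, 19)
student <- c("amaka", 12, "ENG2109999")

# A numeric vector stays numeric; a mixed one is coerced to character
ages_type <- class(ages)         # "numeric"
student_type <- class(student)   # "character"

# Square brackets select elements by position (R counts from 1)
first_age <- ages[1]             # 21
last_two <- ages[2:3]            # 20 19

# Arithmetic is applied to every element at once
ages_next_year <- ages + 1       # 22 21 20
print(ages_next_year)
```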
Lists
A list is similar to a vector, but each element can have a different data type.
We create a list with the list() function:
> courses <- list("CPE333", "CPE305", "GET301")
> courses
[[1]]
[1] "CPE333"

[[2]]
[1] "CPE305"

[[3]]
[1] "GET301"

> ages <- list(21, 20, 19)
Unlike vectors, lists keep strings and numbers as they are:
> student <- list("amaka", 12, "ENG2109999")
> str(student)
List of 3
 $ : chr "amaka"
 $ : num 12
 $ : chr "ENG2109999"
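List elements can also be given names, which makes them easier to retrieve. A minimal sketch, assuming the element names name, age and matric (they are not part of the slides):

```r
# Hypothetical student record; the element names are assumptions for this sketch
student <- list(name = "amaka", age = 12, matric = "ENG2109999")

# $ and [[ ]] both return the element itself, with its original type
student_name <- student$name        # "amaka"
student_age <- student[["age"]]     # 12, still numeric

# Single brackets return a smaller list rather than the element
age_as_list <- student["age"]

print(class(student_age))    # "numeric"
print(class(age_as_list))    # "list"
```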
Matrices
A matrix is a two-dimensional data structure in which every element has the same data
type.
We create a matrix with the matrix() function:
> Mymatrix <- matrix(c(10, 20, 30, 40), nrow = 2, ncol = 2)
> Mymatrix
     [,1] [,2]
[1,]   10   30
[2,]   20   40
Here nrow is the number of rows and ncol is the number of columns. By default R fills the
matrix column by column.
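The fill order can be changed with the byrow argument of matrix(). The sketch below compares both orders and reads single cells; the variable names are illustrative.

```r
values <- c(10, 20, 30, 40)

# Default: values fill the matrix column by column
by_column <- matrix(values, nrow = 2, ncol = 2)

# byrow = TRUE fills the matrix row by row instead
by_row <- matrix(values, nrow = 2, ncol = 2, byrow = TRUE)

# Elements are selected with [row, column]
top_right_by_column <- by_column[1, 2]   # 30
top_right_by_row <- by_row[1, 2]         # 20

# dim() returns the number of rows and columns
print(dim(by_column))
print(by_row)
```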
Arrays
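An array is like a matrix with more than two dimensions.
We create it with the array() function:
> myarray <- array(c(1:10), dim = c(3, 2, 2))
> myarray
, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8    1
[3,]    9    2

The array has 3 x 2 x 2 = 12 cells but 1:10 supplies only 10 values, so R reuses the values
1 and 2 to fill the last two cells.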
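Array elements are read with one index per dimension. A small sketch, using 1:12 here (an assumption, so that the values fill the array exactly):

```r
# 12 values fill a 3 x 2 x 2 array exactly, so nothing is recycled
myarray <- array(1:12, dim = c(3, 2, 2))

# Index with [row, column, slice]
single_value <- myarray[2, 1, 2]    # row 2, column 1 of the second slice: 8

# Leaving an index blank keeps that whole dimension
second_slice <- myarray[, , 2]      # a 3 x 2 matrix

print(dim(myarray))
print(second_slice)
```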
Data Frames
A Data Frame stores information in rows and columns, where rows represent
observations and columns represent variables.
We create a data frame with the data.frame() function.
> dataframe <- data.frame(name = c('Simon', 'Praise', 'Ufuoma'),
                          course = c('GET305', 'GET305', 'GET305'),
                          score = c(71, 73, 72))
> dataframe
    name course score
1  Simon GET305    71
2 Praise GET305    73
3 Ufuoma GET305    72
Data Frames
Accessing Data: To access a specific column we use the $ sign. To access rows or
individual values we use [ ] with positional selectors in the form
[row, column].
> dataframe$name
[1] "Simon" "Praise" "Ufuoma"
> dataframe[3,2]
[1] "GET305"
> dataframe[3,3]
[1] 72
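Beyond single cells, whole rows and filtered subsets can be selected in the same way. A minimal sketch on the same student data; the result names second_row, names_and_scores and above_71 are illustrative.

```r
dataframe <- data.frame(name = c("Simon", "Praise", "Ufuoma"),
                        course = c("GET305", "GET305", "GET305"),
                        score = c(71, 73, 72))

# A blank column position returns the whole row
second_row <- dataframe[2, ]

# Columns can be picked by name as well as by position
names_and_scores <- dataframe[, c("name", "score")]

# A logical condition in the row position keeps only matching rows
above_71 <- dataframe[dataframe$score > 71, ]
print(above_71)
```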
Data Frames
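We can add new rows to a data frame with the rbind() function. Supplying the new row as a
one-row data frame keeps score numeric; a plain c("Brad", "GET305", 53) would turn the
whole score column into text.

> dataframe <- rbind(dataframe, data.frame(name = "Brad", course = "GET305", score = 53))
> dataframe
    name course score
1  Simon GET305    71
2 Praise GET305    73
3 Ufuoma GET305    72
4   Brad GET305    53

Here we added a new student called "Brad" to our data frame.

We can also add new columns with the cbind() function.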
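A short sketch of adding columns follows. The level column, its values and the 70-mark pass mark are assumptions made up for this example.

```r
dataframe <- data.frame(name = c("Simon", "Praise", "Ufuoma"),
                        course = c("GET305", "GET305", "GET305"),
                        score = c(71, 73, 72))

# cbind() attaches a new column; "level" and its values are hypothetical
with_level <- cbind(dataframe, level = c(300, 300, 300))

# Assigning with $ is a common alternative; the 70-mark pass mark is an assumption
with_level$passed <- with_level$score >= 70

print(with_level)
```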
Data Frames
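We can remove rows and columns by putting negative numbers inside the index selector [ ].
The c() function lets us list several positions at once, e.g. dataframe[-c(1, 2), ].

> dataframe <- dataframe[-4, ]
> dataframe
    name course score
1  Simon GET305    71
2 Praise GET305    73
3 Ufuoma GET305    72

Here we removed row 4, the row containing the new student called "Brad" that we added last
to our data frame. A negative number after the comma, e.g. dataframe[, -2], removes a
column instead.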
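The sketch below tries the different kinds of removal on a separate copy of the data, here called roster (a hypothetical name), so that nothing is overwritten.

```r
roster <- data.frame(name = c("Simon", "Praise", "Ufuoma", "Brad"),
                     course = rep("GET305", 4),
                     score = c(71, 73, 72, 53))

# Negative row index: drop the fourth row
without_brad <- roster[-4, ]

# c() inside the selector drops several rows at once
last_two_only <- roster[-c(1, 2), ]

# Negative column index: drop the second column (course)
without_course <- roster[, -2]

# The original data frame is unchanged unless the result is assigned back
print(nrow(roster))
print(names(without_course))
```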
Data Frames
We can find out more about the data in a dataset by using the summary() function

> summary(dataframe)
     name              course              score
 Length:3           Length:3           Min.   :71.0
 Class :character   Class :character   1st Qu.:71.5
 Mode  :character   Mode  :character   Median :72.0
                                       Mean   :72.0
                                       3rd Qu.:72.5
                                       Max.   :73.0
Data Frames
R has a number of built-in data frames such as mtcars, iris
and airquality.
e.g.
> iris
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
……………………………………………
> mtcars
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
……………………………………………
Statistics
This is the science of learning from data. It involves the collection, organization, analysis, interpretation, and
presentation of information to uncover patterns, make predictions, and handle uncertainty.

Statistics has two main branches:

Descriptive statistics summarize the features of a dataset without extending conclusions beyond the data.

Key measures include central tendency (mean, median, mode) and dispersion (range, variance, standard
deviation), often visualized through charts like histograms and pie charts.

Inferential statistics generalize from samples to populations. They involve hypothesis testing to assess statistical
significance, p-values to measure the strength of evidence, and regression analysis to explore relationships between variables.
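Both branches can be illustrated in R with the built-in mtcars dataset. This is only a sketch: the choice of the mpg column and of a t-test between transmission types is an assumption made for illustration, not part of the slides.

```r
# Descriptive statistics: summarize the mpg column of the built-in mtcars dataset
mpg_mean <- mean(mtcars$mpg)
mpg_median <- median(mtcars$mpg)
mpg_sd <- sd(mtcars$mpg)
mpg_range <- range(mtcars$mpg)    # smallest and largest value

# Inferential statistics: Welch two-sample t-test of mpg between the two
# transmission groups (am = 0 automatic, am = 1 manual)
test_result <- t.test(mpg ~ am, data = mtcars)

print(c(mean = mpg_mean, median = mpg_median, sd = mpg_sd))
print(test_result$p.value)
```

A small p-value here would suggest that the difference in average mpg between the two groups is unlikely to be due to chance alone.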
Data Analytics
Data Analytics is the practical application of statistical principles to discover actionable insights or influence
decision making.
There are four types of Data Analytics:
Descriptive Analytics involves analyzing historical data to identify trends, such as a retail store reviewing last
month's sales.
Diagnostic Analytics explores the reasons behind trends, like understanding a sales drop in a region. It uses
tools like correlation, regression or comparison.
Predictive Analytics forecasts future outcomes by utilizing past data, often through Machine Learning
techniques.
Prescriptive Analytics recommends actions to achieve goals, e.g. an AI system that automatically adjusts
train ticket prices based on real-time demand and weather patterns.
Data Analytics Process (Life Cycle)
Data Analytics follows five main steps:

Definition: Defining the research question

Data Collection: Collecting raw information

Data Preprocessing: Cleaning the data to eliminate inaccuracies and duplicates

Data Analysis: Applying analytical models

Data Visualization: Visualizing the results using tools like ggplot2 for R, Power BI, Tableau, or
Matplotlib.
Data Analytics with R
Let's explore the mtcars dataset using the functions we have learnt so far in R:

> ?mtcars

> summary(mtcars)

> mtcars$column_name   # replace column_name with a real column, e.g. mtcars$hp; try as many columns as you want

NOTE:

You can save a column as a variable, e.g. newvar <- mtcars$hp

Write down your observations for each step

Can you now tell more about this dataset?

What is a "dataset"?
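A few more base R functions help with a first look at a dataset. The sketch below is one possible starting point; the variable names dimensions, column_names and horsepower are illustrative.

```r
# mtcars ships with R, so no import step is needed
dimensions <- dim(mtcars)          # rows, then columns
column_names <- names(mtcars)

# head() shows the first rows; str() shows each column's type
print(head(mtcars, 3))
str(mtcars)

# Save one column as a variable, then summarize it
horsepower <- mtcars$hp
print(summary(horsepower))
```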
Data visualization
The process of displaying data using visual components like graphs, charts, and maps is known as data
visualization.

It makes large datasets easier to comprehend and helps reveal patterns and trends that aid
decision-making.

Types of Data visualization in R

Barplot

> barplot(mtcars$hp, main = 'Vehicle Horsepower', xlab = 'horsepower', horiz = TRUE)

This plots a horizontal bar graph of vehicle horsepower. For a vertical graph use: horiz = FALSE

Histogram

> hist(mtcars$gear, main = 'Vehicle Gears', xlab = 'gears', ylab = 'frequency', col = 'blue')

Data visualization
Boxplot
A boxplot shows the minimum and maximum data points, the median, the first and third
quartiles, and the interquartile range.

> boxplot(mtcars$gear, main = 'Vehicle Gears', xlab = 'gears', col = 'blue',
          horizontal = FALSE, notch = FALSE)

This plots a vertical boxplot of vehicle gears. For a horizontal boxplot use: horizontal = TRUE
(boxplot() names this argument horizontal, not horiz as in barplot()).
Scatter Plot

Scatter plots are used to show whether two variables (bivariate data) are correlated, and to
gauge the direction and strength of that relationship.

> plot(mtcars$gear, mtcars$am, main = 'Vehicle Gears', xlab = 'gears',
       ylab = 'Automatic or Manual', col = 'blue')
Data visualization
NOTE:
Scatter plots can draw the dots joined by a line: add type = "b" to the arguments of the
plot() function (type = "l" draws the line without the dots).
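The effect of the type argument is easiest to see side by side. In this sketch the wt and mpg columns are used (an assumption for illustration), and the rows are sorted first so that the connecting line runs from left to right.

```r
# Sort the rows by weight so that the connecting line runs left to right
cars_by_weight <- mtcars[order(mtcars$wt), ]

# type = "p" (the default) draws points only
plot(cars_by_weight$wt, cars_by_weight$mpg, type = "p",
     main = "Points only", xlab = "weight", ylab = "mpg", col = "blue")

# type = "b" draws both points and connecting lines
plot(cars_by_weight$wt, cars_by_weight$mpg, type = "b",
     main = "Points and lines", xlab = "weight", ylab = "mpg", col = "blue")

# type = "l" draws the connecting lines only
plot(cars_by_weight$wt, cars_by_weight$mpg, type = "l",
     main = "Lines only", xlab = "weight", ylab = "mpg", col = "blue")
```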
Exercise
Write a little on these other types of visualization in R:
i. Heat Maps
ii. 3D graphs
Apply them to the mtcars dataset
Write down your observations
Can you now tell more about this dataset?
Other Statistical Metrics in R
Mean, minimum and maximum

> mean(mtcars$gear)

[1] 3.6875

> min(mtcars$gear)

[1] 3

> max(mtcars$gear)

[1] 5
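Other common measures follow the same pattern. A minimal sketch on the same gear column; note that base R has no built-in function for the statistical mode, so table() is used here as one possible approach.

```r
gears <- mtcars$gear

gear_median <- median(gears)
gear_sd <- sd(gears)
gear_quartiles <- quantile(gears)    # 0%, 25%, 50%, 75%, 100%

# table() counts how often each value occurs; the most frequent one is the mode
gear_counts <- table(gears)
gear_mode <- names(gear_counts)[which.max(gear_counts)]

print(gear_counts)
print(gear_mode)
```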
Visualization with ggplot2
ggplot2 is a system for declaratively creating graphics, based on The Grammar of Graphics.
You provide the data and tell ggplot2 how to map variables to aesthetics and which graphical primitives to use, and it
takes care of the details.
Installation
# The easiest way to get ggplot2 is to install the whole tidyverse:
> install.packages("tidyverse")
# Alternatively, install just ggplot2:
> install.packages("ggplot2")
After installation you should see something like this:
The downloaded binary packages are in
C:\Users\isied\AppData\Local\Temp\RtmpEh8qkN\downloaded_packages
Visualization with ggplot2
How to use ggplot2:
After installation, load the package:
> library(ggplot2)
Let's apply it to our mtcars dataset and observe what happens:

> ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

This draws a scatter plot of mpg against wt.
Visualization with ggplot2
We can take it further by enhancing the information presented on
the graph. Use this:
> ggplot(mtcars, aes(x = wt, y = mpg))+ geom_point()+
labs(title="MPG vs Weight", x = "Weight (1000 lbs)",
y = "Miles per Gallon")

We can see the plot on the right

Visualization with ggplot2
We can take it further still by adding a trend line to the graph. Use this:
> ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point() +
    labs(title = "MPG vs Weight", x = "Weight (1000 lbs)",
         y = "Miles per Gallon") +
    geom_smooth(method = "lm", col = "blue")
Each + must come at the end of a line; a line that starts with + causes an error.
Setting method = "lm" in the geom_smooth() function fits and plots a linear trend line.

The plot shows a clear negative correlation between the weight of the vehicles and the
miles per gallon (mpg): heavier vehicles cover fewer miles per gallon.
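The strength of this relationship can also be expressed as a number. A minimal sketch using base R's cor() and cor.test(); the variable names are illustrative.

```r
# Pearson correlation between weight and fuel economy in mtcars
weight_mpg_cor <- cor(mtcars$wt, mtcars$mpg)

# cor.test() additionally reports a p-value and a confidence interval
weight_mpg_test <- cor.test(mtcars$wt, mtcars$mpg)

print(round(weight_mpg_cor, 3))     # about -0.868
print(weight_mpg_test$p.value)
```

A value close to -1 indicates a strong negative linear relationship.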
Multiple Linear Regression
Multiple linear regression models the linear relationship between a continuous dependent
variable and two or more independent variables. In R we fit it with the lm() (linear
model) function.

Let us apply Multiple Linear Regression to our mtcars dataset:

# Load the built-in mtcars dataset

> data(mtcars)
Multiple Linear Regression
# Model 'mpg' (miles per gallon) as a function of 'hp' (horsepower) and 'wt' (weight)

>model_regression <- lm(mpg ~ hp + wt, data = mtcars)

# View the summary of the model

>summary(model_regression)

The next page shows the output of this summary

Multiple Linear Regression
Call:
lm(formula = mpg ~ hp + wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max
-3.941 -1.600 -0.182  1.050  5.854

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
hp          -0.03177    0.00903  -3.519  0.00145 **
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

Multiple Linear Regression
Exercise:
Explain the following from the preceding page:
i. Estimate and Std. Error
ii. Multiple R-squared
iii. Adjusted R-squared
iv. F-statistic
v. p-value
What do they tell us about our model on the mtcars dataset?
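As a starting point for this exercise, the individual quantities can be pulled out of the summary object instead of being read off the printed output. A sketch, assuming the same model as above; the variable names other than model_regression are illustrative.

```r
model_regression <- lm(mpg ~ hp + wt, data = mtcars)
model_summary <- summary(model_regression)

# Coefficient table: Estimate, Std. Error, t value and Pr(>|t|) for each term
coefficient_table <- coef(model_summary)

r_squared <- model_summary$r.squared
adjusted_r_squared <- model_summary$adj.r.squared

# fstatistic holds the F value with its numerator and denominator degrees of freedom
f_statistic <- model_summary$fstatistic

print(round(coefficient_table, 5))
print(round(c(r_squared, adjusted_r_squared), 4))
print(f_statistic)
```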
Multiple Linear Regression
# Create new data to predict on
> new_data <- data.frame(hp = c(100, 150), wt = c(2.5, 3.1))
# Predict mpg for new data
> predictions_regression <- predict(model_regression, new_data)
> print(predictions_regression)
OUTPUT
       1        2
24.35540 20.44005
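To see what predict() does for a linear model, the same numbers can be worked out by hand from the fitted coefficients. A minimal sketch; the names b, manual_predictions and model_predictions are illustrative.

```r
model_regression <- lm(mpg ~ hp + wt, data = mtcars)
new_data <- data.frame(hp = c(100, 150), wt = c(2.5, 3.1))

# Fitted coefficients: intercept, hp slope and wt slope
b <- coef(model_regression)

# Prediction by hand: intercept + slope_hp * hp + slope_wt * wt
manual_predictions <- b[["(Intercept)"]] + b[["hp"]] * new_data$hp + b[["wt"]] * new_data$wt

# predict() should give the same numbers
model_predictions <- predict(model_regression, new_data)

print(manual_predictions)
print(unname(model_predictions))
```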
Decision Trees
A decision tree is a supervised learning method that works like a flowchart.

It asks a series of binary (yes/no) questions to split your data into increasingly smaller, more
"pure" groups.

It is mostly used for classification tasks, although it can also be used for regression.

The rpart and rpart.plot packages are commonly used.

Install rpart and rpart.plot:

> install.packages("rpart")

> install.packages("rpart.plot")
Decision Trees
Let's make use of these libraries:

> library(rpart)

> library(rpart.plot)

Let's see if we can predict the number of gears (gear) from whether a car is automatic or manual (am).
In the formula, the variable to the left of ~ is the one being predicted.

> model_tree <- rpart(gear ~ am, data = mtcars, method = "class")

# Plot the decision tree

> rpart.plot(model_tree, box.palette = "auto", nn = TRUE)

Find the output on the next page.

Decision Trees
We can also make predictions on the data:
# Make predictions on the original data (or a separate test set)

> predictions_tree <- predict(model_tree, mtcars, type = "class")

# View predictions
> print(predictions_tree)

OUTPUT
          Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive
                  4                   4                   4                   3
  Hornet Sportabout             Valiant          Duster 360           Merc 240D
                  3                   3                   3                   3
           Merc 230            Merc 280           Merc 280C          Merc 450SE
                  3                   3                   3                   3
         Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental
                  3                   3                   3                   3
……………………………..
Conclusion
In conclusion, we have explored the basics of R as it applies to Data Analytics.
Exercise:
Generate a confusion matrix of the predicted vs. actual values from our decision tree model.
There are many other machine learning algorithms, such as:
Random Forest
Support Vector Machine (SVM)
Logistic Regression
i. Apply any or all of them to our mtcars dataset
ii. Write down your observations
iii. What are your insights from the dataset?
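As a possible starting point for the confusion matrix exercise, base R's table() can cross-tabulate predicted against actual classes. This is only a sketch, assuming the rpart package is available and reusing the decision tree model from earlier; the names confusion and accuracy are illustrative.

```r
library(rpart)   # rpart is normally installed together with R

# Same model as in the Decision Trees section
model_tree <- rpart(gear ~ am, data = mtcars, method = "class")
predictions_tree <- predict(model_tree, mtcars, type = "class")

# Rows are predicted classes, columns are actual classes
confusion <- table(Predicted = predictions_tree, Actual = mtcars$gear)

# Accuracy is the share of cars on the diagonal of the table
accuracy <- sum(diag(confusion)) / sum(confusion)

print(confusion)
print(accuracy)
```

Because the model is evaluated on the same data it was trained on, the accuracy is likely to be optimistic; a separate test set gives a fairer estimate.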
