Mod4 New Rashid

Module 4 covers R programming basics for data analytics, focusing on data visualization with ggplot2, data transformation with dplyr, and exploratory data analysis. Key topics include creating graphs, using aesthetic mappings, and employing various dplyr functions such as filter(), arrange(), and summarize() for data manipulation. The module emphasizes the importance of understanding data patterns through visualization and the handling of outliers in datasets.

Uploaded by

alansaniljacob95

MODULE 4

R PROGRAMMING BASICS FOR DATA ANALYTICS

SYLLABUS:
• R programming: basics
• Data visualization with ggplot2
• Data transformation with dplyr
• Exploratory data analysis in R
• Tidy data with tidyr
• Modelling

IMPORTANT QUESTIONS:

• Define ggplot2. What are the features provided by ggplot2? What are the problems faced while
using ggplot2 and how can we overcome them?
• Write the R code to import a .csv file, examine its contents and generate its descriptive statistics.
• With examples, illustrate how these R functions help in data analysis.
◦ filter()
◦ arrange()
◦ summarize()
◦ mutate()
◦ select()

SHADHA K|DEPT OF IT|MESCE 1

DATA VISUALIZATION WITH GGPLOT2

A simple graph can bring more information to the data analyst’s mind than any other device.
ggplot2 is an R package that implements the grammar of graphics, a system for describing and building graphs.
A data frame is a rectangular collection of variables (in the columns) and observations (in the rows).

Installing and loading a Package

To install a package, use the command install.packages(); to load it, use library().
install.packages("ggplot2")
library(ggplot2)

The mpg Data Frame

The mpg data frame ships with ggplot2. ‘displ’ describes a car’s engine size, in liters. ‘hwy’ is a car’s
fuel efficiency on the highway, in miles per gallon (mpg). To plot mpg with displ on the x-axis and hwy on
the y-axis, we combine several ggplot2 functions.


• ggplot():

➢ ggplot(data = mpg) creates an empty graph.

➢ geom_point() adds a layer of points to your plot, which creates a scatterplot. ggplot2 comes with
many geom functions that each add a different type of layer to a plot.
➢ Each geom function in ggplot2 takes a mapping argument. This defines how variables in your
dataset are mapped to visual properties.
➢ The mapping argument is always paired with aes(), and the x and y arguments of aes() specify
which variables to map to the x- and y-axes.
➢ ggplot2 looks for the mapped variables in the data argument, in this case, mpg.
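
The points above can be put together as a minimal sketch, assuming ggplot2 is installed. The object name p is only illustrative.

```r
library(ggplot2)

# Start from the mpg data frame and add one layer of points,
# mapping engine size (displ) to x and highway mileage (hwy) to y.
p <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

p  # printing the object draws the scatterplot
```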

A Graphing Template

A reusable template for making graphs with ggplot2 has three bracketed slots: a dataset, a geom function,
and a collection of mappings. To make a graph, replace each bracketed section with the corresponding piece.
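
The template referred to here is commonly written as shown in the comment below. The filled-in version is just one possible instance, using the mpg columns cty and hwy as an assumed example; the name p_template is illustrative.

```r
library(ggplot2)

# Template (the <...> parts are placeholders, so this is not runnable as is):
#   ggplot(data = <DATA>) +
#     <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))

# One way of filling it in:
#   <DATA>          -> mpg
#   <GEOM_FUNCTION> -> geom_point
#   <MAPPINGS>      -> x = cty, y = hwy
p_template <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = cty, y = hwy))

p_template
```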


Aesthetic Mappings

An aesthetic is a visual property of the objects in your plot. Aesthetic mappings describe how variables in
the data are mapped to visual properties (aesthetics) of geoms. Aesthetic mappings can be set in ggplot()
and in individual layers. In the scatterplot of hwy against displ, one group of points seems to fall outside
of the linear trend. These cars have a higher mileage than you might expect.

Let’s hypothesize that the cars are hybrids. One way to test this hypothesis is to look at the class value
for each car. The class variable of the mpg dataset classifies cars into groups such as compact, midsize,
and SUV. If the outlying points are hybrids, they should be classified as compact cars or, perhaps,
subcompact cars. You can add a third variable, like class, to a two-dimensional scatterplot by mapping
it to an aesthetic.
Aesthetics include things like the size, the shape, or the color of your points. You can display a point in
different ways by changing the values of its aesthetic properties. For example, you can map the colors of
the points to the class variable to reveal the class of each car:
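
A short sketch of this mapping, assuming ggplot2 is loaded; p_class is an illustrative name.

```r
library(ggplot2)

# Map a third variable (class) to the color aesthetic inside aes().
# ggplot2 picks one color per class value and adds a legend.
p_class <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = class))

p_class
```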


To map an aesthetic to a variable, associate the name of the aesthetic to the name of the variable inside
aes(). ggplot2 will automatically assign a unique level of the aesthetic to each unique value of the
variable, a process known as scaling. ggplot2 will also add a legend that explains which levels
correspond to which values.

Common Problems
• R is extremely picky, and a misplaced character can make all the difference. Make sure that every
( is matched with a ) and every " is paired with another ". Sometimes you’ll run the code and
nothing happens. Check the left-hand side of your console: if it’s a +, it means that R doesn’t
think you’ve typed a complete expression and it’s waiting for you to finish it. In this case, it’s
usually easiest to start from scratch by pressing Esc to abort processing the current command.
• One common problem when creating ggplot2 graphics is to put the + in the wrong place: it has
to come at the end of the line, not the start.
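
The placement rule for + can be illustrated as follows. The failing form is left as a comment so that the block still runs; p_ok is an illustrative name.

```r
library(ggplot2)

# Correct: the line that is continued ends with +.
p_ok <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Incorrect (kept as a comment because it raises an error): the + starts
# the second line, so R treats the first line as a complete expression
# and then cannot evaluate the second line on its own.
#   ggplot(data = mpg)
#   + geom_point(mapping = aes(x = displ, y = hwy))

p_ok
```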
Facets
The facet approach partitions a plot into a matrix of panels. Each panel shows a different subset of the
data.
There are two main functions for faceting:
◦ facet_grid()
◦ facet_wrap()
facet_wrap()
The first argument of facet_wrap() should be a formula, which you create with ~ followed by a variable
name. The variable that you pass to facet_wrap() should be discrete.
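
As a sketch, faceting the mpg scatterplot by the discrete variable class could look like this. The choice of class and of nrow = 2 is an assumption made for illustration.

```r
library(ggplot2)

# One panel per value of class, laid out in two rows.
p_wrap <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(~ class, nrow = 2)

p_wrap
```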


facet_grid()
To facet the plot as the combination of two variables we use facet_grid(). The first argument of
facet_grid() is also a formula. This time the formula should contain two variable names separated by a ~.
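
A possible example with two variables, assuming drv (drive train) for the rows and cyl (number of cylinders) for the columns:

```r
library(ggplot2)

# Rows are split by drv and columns by cyl; combinations with no cars
# simply produce an empty panel.
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

p_grid
```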

Geometric Objects
A geom is the geometrical object that a plot uses to represent data. People often describe plots by the
type of geom that the plot uses. For example, bar charts use bar geoms, line charts use line geoms,
boxplots use boxplot geoms, and so on. Scatterplots use the point geom. We can use different geoms to
plot the same data.
• geom_smooth() draws a smooth line fitted to the data. It will draw a different line, with a different
linetype, for each unique value of the variable that you map to linetype.
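
For instance, mapping linetype to drv (an assumed choice of variable) gives one smooth line per drive train:

```r
library(ggplot2)

# geom_smooth() fits a separate smoother for each value of drv
# and draws each one with its own linetype.
p_smooth <- ggplot(data = mpg) +
  geom_smooth(mapping = aes(x = displ, y = hwy, linetype = drv))

p_smooth
```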


• To display multiple geoms in the same plot, add multiple geom functions to ggplot().

• If you place mappings in a geom function, ggplot2 will treat them as local mappings for the layer. It
will use these mappings to extend or overwrite the global mappings for that layer only. This makes it
possible to display different aesthetics in different layers.
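
Both ideas can be sketched in one plot. Here x and y are set globally in ggplot(), while color is mapped locally in the point layer only; the variable choices are illustrative.

```r
library(ggplot2)

p_layers <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +  # local mapping: color applies to points only
  geom_smooth()                               # inherits only the global x and y

p_layers
```

Because color is not a global mapping, the smoother is drawn as a single line for all cars.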


Statistical Transformations

• geom_bar(): used to draw bar charts.

The diamonds dataset comes with ggplot2 and contains information about ~54,000 diamonds, including the
price, carat, color, clarity, and cut of each diamond. A basic bar chart shows how many diamonds fall in each cut.
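
A minimal bar chart of the diamonds data, assuming cut is the variable of interest:

```r
library(ggplot2)

# geom_bar() counts the rows in each cut and draws one bar per cut.
p_bar <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))

p_bar
```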

Scatterplots plot the raw values of the dataset. Other graphs, like bar charts, calculate new values to plot.
Bar charts, histograms, and frequency polygons bin your data and then plot bin counts, the number of points
that fall in each bin. Smoothers fit a model to your data and then plot predictions from the model. Boxplots
compute a robust summary of the distribution and display a specially formatted box. The algorithm used
to calculate new values for a graph is called a stat, short for statistical transformation.

The documentation for geom_bar() shows that the default value for stat is "count", which means that
geom_bar() uses stat_count().
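
Since geom_bar() relies on stat_count() by default, the two calls below should compute the same bar heights. This is a sketch to make the geom/stat pairing concrete; the object names are illustrative.

```r
library(ggplot2)

# Same plot written from the geom side and from the stat side.
p_geom <- ggplot(data = diamonds) + geom_bar(mapping = aes(x = cut))
p_stat <- ggplot(data = diamonds) + stat_count(mapping = aes(x = cut))

# layer_data() returns the values computed by the stat for a layer.
layer_data(p_geom)$count
layer_data(p_stat)$count
```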


Position Adjustments
You can color a bar chart using either the color aesthetic or, more usefully, the fill aesthetic.

If you map the fill aesthetic to another variable, like clarity, the bars are automatically stacked. Each
colored rectangle represents a combination of cut and clarity. The stacking is performed automatically by
the position adjustment specified by the position argument.
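
The two uses of fill described above might look like this (object names are illustrative):

```r
library(ggplot2)

# fill mapped to the same variable as x: each bar gets its own color.
p_fill_cut <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut))

# fill mapped to a second variable: bars are stacked by clarity.
p_stacked <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = clarity))

p_stacked
```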

If you do not want a stacked bar chart, the other options for position are:
◦ identity
◦ dodge
◦ fill


identity
position = "identity" will place each object exactly where it falls in the context of the graph. This is not
very useful for bars, because it overlaps them. To see that overlapping we either need to make the bars
slightly transparent by setting alpha to a small value, or completely transparent by setting fill = NA. The
identity position adjustment is more useful for 2D geoms, like points, where it is the default.

fill
position = "fill" works like stacking, but makes each set of stacked bars the same height. This makes it
easier to compare proportions across groups.

dodge
position = "dodge" places overlapping objects directly beside one another. This makes it easier to compare
individual values.
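
The three position adjustments can be compared side by side with a sketch like the following. The alpha value of 1/5 is an arbitrary choice for illustration.

```r
library(ggplot2)

base <- ggplot(data = diamonds, mapping = aes(x = cut, fill = clarity))

# identity: bars overlap, so make them semi-transparent to see them.
p_identity <- base + geom_bar(alpha = 1/5, position = "identity")

# fill: every stack is scaled to the same height (proportions).
p_fillpos <- base + geom_bar(position = "fill")

# dodge: bars for each clarity are placed beside one another.
p_dodge <- base + geom_bar(position = "dodge")

p_dodge
```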


• geom_jitter()
The values of hwy and displ are rounded, so the points appear on a grid and many points overlap each
other. This problem is known as overplotting. This arrangement makes it hard to see where the mass of
the data is. We can avoid this gridding by setting the position adjustment to "jitter". position = "jitter" adds
a small amount of random noise to each point. This spreads the points out because no two points are likely
to receive the same amount of random noise. Although this makes your graph less accurate at small scales,
it makes it more revealing at large scales. geom_jitter() is shorthand for geom_point(position = "jitter").
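
A brief sketch of both spellings, assuming the mpg scatterplot used earlier:

```r
library(ggplot2)

# Explicit position adjustment on geom_point().
p_jitter <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), position = "jitter")

# Equivalent shorthand.
p_jitter2 <- ggplot(data = mpg) +
  geom_jitter(mapping = aes(x = displ, y = hwy))

p_jitter
```

The noise is random, so the exact point positions change every time the plot is drawn.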


Coordinate Systems
Coordinate systems are probably the most complicated part of ggplot2. The default coordinate system is
the Cartesian coordinate system where the x and y position act independently to find the location of each
point.

• coord_flip()
This function switches the x- and y-axes. This is useful if you want horizontal boxplots. It’s also useful
for long labels, which are hard to fit on the x-axis without overlapping.

• coord_quickmap()
The function sets the aspect ratio correctly for maps. This is very important if you’re plotting spatial data
with ggplot2.

• coord_polar()
The function uses polar coordinates. Polar coordinates reveal an interesting connection between a bar
chart and a Coxcomb chart.
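
Two of these coordinate systems can be tried without extra packages. The sketch below assumes boxplots of hwy by class and a bar chart of cut; coord_quickmap() is left out because it needs map data.

```r
library(ggplot2)

# Horizontal boxplots: build the usual plot, then flip the axes.
p_box <- ggplot(data = mpg, mapping = aes(x = class, y = hwy)) +
  geom_boxplot()
p_flipped <- p_box + coord_flip()

# A bar chart redrawn in polar coordinates becomes a Coxcomb chart.
p_bars <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut, fill = cut), show.legend = FALSE, width = 1)
p_coxcomb <- p_bars + coord_polar()

p_flipped
p_coxcomb
```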

DATA TRANSFORMATION WITH DPLYR

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most
common data manipulation challenges:

• mutate() adds new variables that are functions of existing variables.
• select() picks variables based on their names.
• filter() picks cases based on their values.
• summarise() reduces multiple values down to a single summary.
• arrange() changes the ordering of the rows.

To explore the basic data manipulation verbs of dplyr, we’ll use nycflights13::flights. This data frame
contains all 336,776 flights that departed from New York City in 2013. The data comes from the US
Bureau of Transportation Statistics, and is documented in ?flights.
View(flights) shows the whole dataset.


• filter()

filter() is used to filter rows in the dataset: it lets you subset observations based on their values. dplyr
executes the filtering operation and returns a new data frame. dplyr functions never modify their inputs,
so if you want to save the result, you’ll need to use the assignment operator, <-.

To use filtering effectively, select the observations using the comparison operators. R provides the standard
suite: >, >=, <, <=, != (not equal), and == (equal). When testing for equality, use == rather than =. Conditions
can be combined with the logical operators & (and), | (or), and ! (not). A useful shorthand is x %in% y, which
selects every row where x is one of the values in y. filter() only includes rows where the condition is TRUE;
it excludes both FALSE and NA values.
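
A few filter() calls as a sketch, assuming the dplyr and nycflights13 packages are installed. The small data frame df is made up to show how NA is handled.

```r
library(dplyr)
library(nycflights13)

# Flights on January 1st; the result must be assigned to be kept.
jan1 <- filter(flights, month == 1, day == 1)

# Flights in November or December, using %in% as a shorthand for |.
nov_dec <- filter(flights, month %in% c(11, 12))

# NA handling on a tiny made-up data frame.
df <- tibble(x = c(1, NA, 3))
only_true <- filter(df, x > 1)              # the NA row is dropped
with_na   <- filter(df, is.na(x) | x > 1)   # ask for NA rows explicitly
```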

• arrange()

arrange() changes the order of rows. It takes a data frame and a set of column names to order by. If
you provide more than one column name, each additional column will be used to break ties in the values
of preceding columns.


Use desc() to reorder by a column in descending order.

Missing values are always sorted at the end:
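
These ordering rules can be sketched as follows, assuming dplyr and nycflights13 are installed; df is a made-up data frame with one missing value.

```r
library(dplyr)
library(nycflights13)

# Order by year, then month, then day (later columns break ties).
by_date <- arrange(flights, year, month, day)

# Most delayed departures first.
most_delayed <- arrange(flights, desc(dep_delay))

# NA goes last in both directions.
df <- tibble(x = c(5, 2, NA))
asc_x  <- arrange(df, x)        # 2, 5, NA
desc_x <- arrange(df, desc(x))  # 5, 2, NA
```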

• select()

select() picks columns, which is useful for narrowing a wide dataset. It allows you to rapidly zoom in on
a useful subset using operations based on the names of the variables.


There are a number of helper functions you can use within select():
• starts_with("abc") matches names that begin with “abc”.
• ends_with("xyz") matches names that end with “xyz”.
• contains("ijk") matches names that contain “ijk”.
• matches("(.)\\1") selects variables that match a regular expression.
• num_range("x", 1:3) matches x1, x2, and x3.

select() can be used to rename variables, but it’s rarely useful because it drops all of the variables not
explicitly mentioned. Instead, use rename(), which is a variant of select() that keeps all the variables that
aren’t explicitly mentioned.
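
A short sketch of select(), one helper, and rename() on the flights data. The new column name tail_num is only an illustration.

```r
library(dplyr)
library(nycflights13)

by_name  <- select(flights, year, month, day)     # pick columns by name
by_range <- select(flights, year:day)             # all columns from year to day
dropped  <- select(flights, -(year:day))          # everything except year to day
delays   <- select(flights, ends_with("delay"))   # helper function

# rename() keeps every column and only changes the given name.
renamed <- rename(flights, tail_num = tailnum)
```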

• mutate()

mutate() adds new variables to a data frame: new columns that are functions of existing columns.
The newly created columns are added at the end of your dataset.
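
As a sketch, two derived columns can be added to a narrowed copy of flights. The names gain and speed, and the narrowing step itself, are choices made for illustration.

```r
library(dplyr)
library(nycflights13)

# Narrow the data first so the new columns are easy to see.
flights_sml <- select(flights, year:day, ends_with("delay"), distance, air_time)

with_speed <- mutate(flights_sml,
  gain  = dep_delay - arr_delay,   # minutes made up in the air
  speed = distance / air_time * 60 # miles per hour
)
```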


Various functions can be used with mutate():
1. Arithmetic operators: +, -, *, /, ^
2. Modular arithmetic: %/% (integer division) and %% (remainder)
3. Logs: log(), log2(), log10()
4. Offsets
◦ lead()
◦ lag()
5. Cumulative and rolling aggregates
◦ cumsum()
◦ cumprod()
◦ cummin()
◦ cummax()
◦ cummean()
6. Logical comparisons: <, <=, >, >=, !=
7. Ranking
◦ min_rank()
◦ desc(x), used inside a ranking function to rank in descending order
◦ row_number()
◦ dense_rank()
◦ percent_rank()

transmute() keeps only the new variables in the data set.
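
Some of these helpers are easiest to understand on a small vector. The vector x below is made up, and the column names in the transmute() call are illustrative.

```r
library(dplyr)
library(nycflights13)

x <- c(10, 20, 20, 40)

lag(x)              # NA 10 20 20  (previous value)
lead(x)             # 20 20 40 NA  (next value)
cumsum(x)           # 10 30 50 90
min_rank(x)         # 1 2 2 4      (ties share the smallest rank)
dense_rank(x)       # 1 2 2 3      (no gaps after ties)
min_rank(desc(x))   # 4 2 2 1      (largest value gets rank 1)

# transmute() keeps only the columns it is given:
# split dep_time (e.g. 517) into hour (5) and minute (17).
times <- transmute(flights,
  dep_time,
  hour   = dep_time %/% 100,
  minute = dep_time %% 100
)
```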

• summarize()
The function collapses a data frame to a single row. It is most useful when paired with group_by(), in
which case it returns one row per group.

Useful summary functions are:

• Means, counts, and sums: mean(x), n(), sum(x)
• Measures of location: median(x)
• Measures of spread: sd(x), IQR(x), mad(x)
• Measures of rank: min(x), quantile(x, 0.25), max(x)
• Measures of position: first(x), nth(x, 2), last(x)
• Counts
◦ sum(!is.na(x)) - counts the number of non-missing values
◦ n_distinct(x) - counts the number of distinct (unique) values
◦ n() - returns the size of the current group
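
A sketch of summarize() with and without grouping, assuming dplyr 1.0 or later (for the .groups argument) and nycflights13. The summary column names are illustrative.

```r
library(dplyr)
library(nycflights13)

# Without grouping: the whole data frame collapses to one row.
overall <- summarize(flights, delay = mean(dep_delay, na.rm = TRUE))

# With grouping: one row per day.
by_day <- group_by(flights, year, month, day)
daily <- summarize(by_day,
  n_flights  = n(),                            # size of each group
  n_departed = sum(!is.na(dep_delay)),         # non-missing departures
  mean_delay = mean(dep_delay, na.rm = TRUE),
  n_carriers = n_distinct(carrier),
  .groups = "drop"                             # return an ungrouped result
)
```

Without na.rm = TRUE, any group containing a missing delay would get NA as its mean.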


EXPLORATORY DATA ANALYSIS

The goal of EDA is to develop an understanding of your data. EDA is fundamentally a creative process and,
like most creative processes, the key to asking quality questions is to generate a large quantity of questions.

Variation
Variation is the tendency of the values of a variable to change from measurement to measurement. Every
variable has its own pattern of variation, which can reveal interesting information. The best way to
understand that pattern is to visualize the distribution of the variable’s values. If you measure any continuous
variable twice, you will get two different results. This is true even if you measure quantities that are
constant, like the speed of light, because each measurement includes a small amount of error.

Visualizing Distributions

A variable is categorical if it can only take one of a small set of values. In R, categorical variables are
usually saved as factors or character vectors. We use bar plots to visualize categorical values.


A variable is continuous if it can take any of an infinite set of ordered values. Numbers and date-times
are two examples of continuous variables. We use a histogram to plot continuous values.
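
Both kinds of distribution can be sketched with the diamonds data. The variables cut and carat and the binwidth of 0.5 are assumptions chosen for illustration.

```r
library(ggplot2)
library(dplyr)

# Categorical variable: bar chart, with the same counts computed by hand.
p_cat <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
counts <- count(diamonds, cut)

# Continuous variable: histogram with an explicit bin width.
p_cont <- ggplot(data = diamonds) +
  geom_histogram(mapping = aes(x = carat), binwidth = 0.5)

p_cat
p_cont
```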

Typical Values
In both bar charts and histograms, tall bars show the common values of a variable, and shorter bars show
less-common values. Places that do not have bars reveal values that were not seen in your data.

Unusual Values
Outliers are observations that are unusual: data points that don’t seem to fit the pattern.
Sometimes outliers are data entry errors. Other times outliers suggest important new science. When you
have a lot of data, outliers are sometimes difficult to see in a histogram.
If there are unusual values in your dataset, you have two options:
• Drop the entire row with the strange values.
• Replace the unusual values with missing values.

The easiest way to replace unusual values is to use mutate() to replace the variable with a modified copy.
Use the ifelse() function to replace unusual values with NA.
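
Both options can be sketched on the y column (a diamond dimension in mm) of the diamonds data. The cut-offs 3 and 20 are assumed thresholds for what counts as unusual, not values given in these notes.

```r
library(ggplot2)  # provides the diamonds data
library(dplyr)

# Option 1: drop rows whose y value lies outside the plausible range.
kept <- filter(diamonds, between(y, 3, 20))

# Option 2: keep every row, but turn the unusual y values into NA.
diamonds2 <- mutate(diamonds, y = ifelse(y < 3 | y > 20, NA, y))
```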


TIDY DATA WITH tidyr

Tidy data is a consistent way to organize your data in R. We can represent the same underlying data in
multiple ways, but dplyr, ggplot2, and all the other packages in the tidyverse are designed to work with
tidy data, so the tidy representation will be much easier to work with inside the tidyverse.

There are three interrelated rules which make a dataset tidy:

1. Each variable must have its own column.
2. Each observation must have its own row.
3. Each value must have its own cell.

These three rules are interrelated because it’s impossible to satisfy only two of the three. That
interrelationship leads to an even simpler set of practical instructions:
1. Put each dataset in a tibble.
2. Put each variable in a column.

Advantages of tidy data

• If you have a consistent data structure, it’s easier to learn the tools that work with it because they
have an underlying uniformity.
• There’s a specific advantage to placing variables in columns because it allows R’s vectorized
nature to shine. Most built-in R functions work with vectors of values. That makes transforming tidy data
feel particularly natural.

Spreading and Gathering

Most data that you will encounter will be untidy. There are two main reasons:
• Most people aren’t familiar with the principles of tidy data, and it’s hard to derive them yourself
unless you spend a lot of time working with data.
• Data is often organized to facilitate some use other than analysis. For example, data is often
organized to make entry as easy as possible.

This means for most real analyses, you’ll need to do some tidying. The first step is always to figure out
what the variables and observations are. Sometimes this is easy; other times you’ll need to consult with
the people who originally generated the data. The second step is to resolve one of two common problems:
• One variable might be spread across multiple columns.
• One observation might be scattered across multiple rows.
Two tidyr functions, gather() and spread(), can be used to solve these problems. gather() makes wide
tables narrower and longer; spread() makes long tables shorter and wider.

Gathering – gather()
A common problem is a dataset where some of the column names are not names of variables, but values
of a variable. To tidy a dataset like this, we need to gather those columns into a new pair of variables.
Consider a table in which the column names 1999 and 2000 are values of the variable year.

To describe that operation we need three parameters:

• The set of columns that represent values, not variables. In this example, those are the columns
1999 and 2000.
• The name of the variable whose values form the column names. That is the key, and here it is
year.
• The name of the variable whose values are spread over the cells. That is the value, and here it’s the
number of cases.
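
The three parameters map directly onto a gather() call. The sketch below assumes tidyr's built-in table4a, which has the columns country, 1999 and 2000; in current tidyr, gather() still works but pivot_longer() is the recommended replacement.

```r
library(tidyr)

# Columns to gather: `1999` and `2000` (backticks because the names are numbers).
# key   = name of the new column holding the old column names
# value = name of the new column holding the cell values
tidy4a <- gather(table4a, `1999`, `2000`, key = "year", value = "cases")

tidy4a
```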


Spreading – spread()

Spreading is the opposite of gathering. You use it when an observation is scattered across multiple rows.

To tidy this up, we first analyze the representation. This time we need two parameters:

• The column that contains variable names, the key column. Here, it’s type.
• The column that contains values from multiple variables, the value column. Here, it’s count.
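
A sketch of the call, assuming tidyr's built-in table2, whose type column holds the names "cases" and "population" and whose count column holds their values. In current tidyr, pivot_wider() is the recommended replacement for spread().

```r
library(tidyr)

# Each (country, year) pair occupies two rows in table2;
# spreading type/count gives one row per pair.
wide2 <- spread(table2, key = type, value = count)

wide2
```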


separate()

separate() pulls apart one column into multiple columns by splitting wherever a separator character
appears. In the example, the rate column contains both the cases and population variables, and we need to
split it into two variables. separate() takes the name of the column to separate and the names of the
columns to separate into.
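
As a sketch, assuming tidyr's built-in table3, whose rate column stores values such as "745/19987071":

```r
library(tidyr)

# Split rate at "/" into two new columns; convert = TRUE turns the
# resulting character columns into numbers where possible.
sep3 <- separate(table3, rate, into = c("cases", "population"),
                 sep = "/", convert = TRUE)

sep3
```

If sep is omitted, separate() splits at any non-alphanumeric character, which would also work here.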


unite()
unite() is the inverse of separate(): it combines multiple columns into a single column. We can use unite()
to rejoin the century and year columns in the example. unite() takes a data frame, the name of the new
variable to create, and a set of columns to combine.
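
A sketch assuming tidyr's built-in table5, where century ("19", "20") and year ("99", "00") are stored separately. The new column name full_year is illustrative.

```r
library(tidyr)

# sep = "" joins the pieces without the default underscore.
united <- unite(table5, full_year, century, year, sep = "")

united
```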

Missing Values
Changing the representation of a dataset brings up an important subtlety of missing values. A value can
be missing in one of two possible ways:
• Explicitly, i.e., flagged with NA.
• Implicitly, i.e., simply not present in the data.


An explicit missing value is the presence of an absence; an implicit missing value is the absence of a
presence. The way a dataset is represented can make implicit values explicit. For example, we can make an
implicit missing value explicit by putting years in the columns.

Because these explicit missing values may not be important in other representations of the data, you can
set na.rm = TRUE in gather() to turn explicit missing values implicit.
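
Both directions can be sketched on a small made-up data frame, here called stocks, that has one explicit NA (2015, quarter 4) and one implicit gap (2016, quarter 1 has no row).

```r
library(tidyr)
library(tibble)

stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(   1,    2,    3,    4,    2,    3,    4),
  return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# Years in the columns: the missing 2016 Q1 now appears as an explicit NA.
wide <- spread(stocks, year, return)

# Gathering back with na.rm = TRUE drops both NAs, making them implicit.
long <- gather(wide, year, return, `2015`:`2016`, na.rm = TRUE)
```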

complete()
complete() is used for making missing values explicit in tidy data. complete() takes a set of columns
and finds all unique combinations. It then ensures the original dataset contains all those values, filling in
explicit NAs where necessary.
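
A sketch on a made-up data frame (stocks) in which the row for 2016, quarter 1 is absent:

```r
library(tidyr)
library(tibble)

stocks <- tibble(
  year   = c(2015, 2015, 2015, 2015, 2016, 2016, 2016),
  qtr    = c(   1,    2,    3,    4,    2,    3,    4),
  return = c(1.88, 0.59, 0.35,   NA, 0.92, 0.17, 2.66)
)

# All 2 x 4 combinations of year and qtr; the missing one gets return = NA.
completed <- complete(stocks, year, qtr)

completed
```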


fill()
fill() can be used to fill in missing values. It takes a set of columns where you want missing values
to be replaced by the most recent non-missing value (sometimes called last observation carried forward).
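
A sketch of last observation carried forward on a made-up table, where a name is entered only on the first row of each person's block:

```r
library(tidyr)
library(tibble)

# Hypothetical data: person is left blank (NA) when it repeats.
treatment <- tibble(
  person    = c("Person A", NA, NA, "Person B"),
  treatment = c(1, 2, 3, 1),
  response  = c(7, 10, 9, 4)
)

# Replace each NA in person with the closest non-missing value above it.
filled <- fill(treatment, person)

filled
```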
