R Data Structures
R has several built-in data structures. These include:
vector
matrix
list
data frame
factor
Vector
A vector is the basic data object in R. It contains elements of the same type. The data types can be
logical, integer, double, character, complex or raw. A vector’s type can be checked with the
typeof() function.
R provides many functions to examine features of vectors and other objects, for example:
class() - what kind of object is it (high-level)?
typeof() - what is the object’s data type (low-level)?
length() - how long is it? What about two-dimensional objects?
attributes() - does it have any metadata?
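The inspection functions listed above can be tried on a small example. This is a minimal sketch; the names v and m are hypothetical and chosen only for illustration.

```r
# A small numeric vector (numbers typed without an L suffix are doubles)
v <- c(1.5, 2, 3)

class(v)        # high-level kind of object: "numeric"
typeof(v)       # low-level storage type: "double"
length(v)       # number of elements: 3
attributes(v)   # NULL, because a plain vector carries no metadata

# For a two-dimensional object, length() counts all elements
m <- matrix(1:6, nrow = 2)
length(m)       # 6, i.e. 2 rows x 3 columns
dim(m)          # 2 3
attributes(m)   # the dimensions are stored as the "dim" attribute
```

So for matrices, dim() is usually more informative than length().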
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data. In
other words, a vector is a simple tool to store data.
numeric_vector <- c(1, 10, 49)
character_vector <- c("a", "b", "c")
boolean_vector <- c(TRUE, FALSE, TRUE)
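Because a vector holds elements of a single type, combining values of different types with c() converts them to a common type. The sketch below illustrates this; the variable names mixed and flags_and_numbers are hypothetical.

```r
# Mixing a number, a string and a logical: everything becomes character
mixed <- c(1, "a", TRUE)
mixed           # "1" "a" "TRUE"
typeof(mixed)   # "character"

# Mixing a logical with an integer: the logical becomes an integer
flags_and_numbers <- c(TRUE, 2L)
flags_and_numbers           # 1 2
typeof(flags_and_numbers)   # "integer"
```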
Matrix
The following is a simple way to create a new matrix with numbers you input:
matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3)
You can also take an existing numeric vector and turn it into a matrix:
vector1 <- 1:9
matrix(vector1, nrow = 3)
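Note that matrix() fills the matrix column by column by default. The following sketch contrasts this with the byrow = TRUE argument; the names values, by_col and by_row are hypothetical.

```r
values <- 1:6

# Default: values are placed down the columns
by_col <- matrix(values, nrow = 2)
by_col[1, ]   # 1 3 5

# byrow = TRUE: values are placed along the rows
by_row <- matrix(values, nrow = 2, byrow = TRUE)
by_row[1, ]   # 1 2 3
```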
Let us find out how we can create a matrix of zeros in R:
zero_matrix <- matrix(rep(0, 9),
                      ncol = 3,
                      nrow = 3)
zero_matrix
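As a shorter alternative, matrix() recycles a single value to fill the requested dimensions, so rep() is not strictly needed. A small sketch, where zero_short is a hypothetical name:

```r
# The single 0 is recycled to fill all 9 cells
zero_short <- matrix(0, nrow = 3, ncol = 3)
zero_short

# Same result as the rep() version
identical(zero_short, matrix(rep(0, 9), ncol = 3, nrow = 3))   # TRUE
```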
Adding, subtracting, multiplying, or dividing a matrix by a single number works simply by
performing the action on each element of the matrix:
matrix1 <- matrix(vector1, nrow = 3)
matrix2 <- matrix1 + 2
matrix2
Adding and subtracting two matrices works only when the matrices have the same dimensions
(same number of rows and columns); the matrices are then combined element by element.
Let us first check the dimensions of matrix2 using dim():
dim(matrix2)
matrix2 - matrix1 # Subtraction of two matrices
Element-by-element multiplication is performed with the * operator, which multiplies the
corresponding elements of the two matrices:
matrix1 * matrix2
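Element-by-element multiplication with * should not be confused with the matrix product from linear algebra, which R writes as %*%. The sketch below shows the difference on two small matrices; the names a, b, elementwise and product are hypothetical.

```r
a <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
b <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

# Element by element: each cell of a times the matching cell of b
elementwise <- a * b
elementwise[1, 2]   # 3 * 7 = 21

# Matrix product: rows of a combined with columns of b
product <- a %*% b
product[1, 1]       # 1 * 5 + 3 * 6 = 23
```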
It is also very easy to take the transpose of a matrix.
t(matrix1)
Functions rbind() and cbind() allow you to join matrices of appropriate dimensions (equal number
of columns or rows, respectively) by rows or columns:
rbind(matrix1, matrix2)
cbind(matrix1, matrix2)
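Checking the dimensions of the results makes the difference between rbind() and cbind() clear. A minimal sketch with two hypothetical 2 x 3 matrices, m1 and m2:

```r
m1 <- matrix(1:6, nrow = 2)    # 2 rows, 3 columns
m2 <- matrix(7:12, nrow = 2)   # 2 rows, 3 columns

stacked <- rbind(m1, m2)       # m2 is placed below m1
dim(stacked)                   # 4 3

side_by_side <- cbind(m1, m2)  # m2 is placed to the right of m1
dim(side_by_side)              # 2 6
```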
To access the elements of a matrix by index, type the matrix name followed by the row and
column numbers in square brackets. For example, the element in row 2, column 3 is:
matrix1[2, 3]
To drop the third column of matrix1, use a negative index. This returns a new matrix and
leaves matrix1 itself unchanged:
matrix1[, -3]
If you want to change the actual values of the data, access the part of the matrix you want to
change and assign a new value to it as follows:
matrix1[2, 2] <- 10
matrix1
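The same bracket notation also selects whole rows, whole columns, or elements that satisfy a condition. The following is a small sketch on a hypothetical matrix m:

```r
m <- matrix(1:9, nrow = 3)

m[2, ]                 # second row, returned as a plain vector: 2 5 8
m[, c(1, 3)]           # first and third columns, as a 3 x 2 matrix
m[2, , drop = FALSE]   # second row kept as a 1 x 3 matrix
m[m > 5]               # all elements greater than 5: 6 7 8 9
```

Leaving a position empty selects everything along that dimension.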
Lists
While vectors and matrices/arrays can hold only data of a single type, lists and data frames can
combine data of different types.
x <- list(1, 2, 5.6, 8, 4)
x
y <- list(3:10)
y
z <- list("Apple", "Orange", "Grapes", "Jackfruit")
z
The list is the most flexible data structure: its components can be vectors and arrays holding any
type of data, and even other lists:
x <- list(1:5, TRUE, "Fruits", list(1:3, 5))
x
Here x has 4 elements: an integer vector, a logical value, a string and another list. We can
select an entry of x with double square brackets:
x[[2]]
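Double and single square brackets behave differently on a list: [[ ]] extracts the component itself, while [ ] returns a shorter list. A brief sketch (the names first_component and first_sublist are hypothetical):

```r
x <- list(1:5, TRUE, "Fruits", list(1:3, 5))

first_component <- x[[1]]   # the integer vector 1:5 itself
first_sublist <- x[1]       # a list of length 1 that contains 1:5

is.list(first_component)    # FALSE
is.list(first_sublist)      # TRUE

# Double brackets can be chained to reach into a nested list
x[[4]][[1]]                 # 1 2 3
```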
The components of a list can also be given names when the list is created:
course <- list(name = "Introduction to AI and Data Science",
               level = c("masters", "PhD"),
               date = "August 1 to December 31",
               maxnumber = 25)
course
To see which components a list contains, we can use the names() function:
names(course)
Components of a list can be selected both by their names (using the $ sign) and by the number
of their position. New components can be added to a list at any time:
course$No_of_Hours <- 45
course
In order to delete an element from a list we set it to NULL.
course$level <- NULL
course
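Selection by name and by position, as described above, can be sketched as follows. The list short_course is a hypothetical, shortened version of the course list, defined here so that the example runs on its own.

```r
short_course <- list(name = "Introduction to AI and Data Science",
                     level = c("masters", "PhD"),
                     maxnumber = 25)

short_course$name          # select by name with $
short_course[["name"]]     # select by name with double brackets
short_course[[1]]          # select by position, same component
short_course$level[2]      # second element of the level vector: "PhD"

# Removing a component and checking what is left
short_course$maxnumber <- NULL
names(short_course)        # "name" "level"
```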
The unlist() function converts a list to a vector in R. It takes the list as an argument and
returns a vector containing all of its elements.
x <- list(1, 2, 3, 4, 5)
x1 <- unlist(x)
x1
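Since the result of unlist() is a vector, its elements must share one type, and nested lists are flattened. The sketch below illustrates both points; the lists mixed and nested are hypothetical.

```r
# Mixed types are converted to a common type (here: character)
mixed <- list(1, "two", TRUE)
flat_mixed <- unlist(mixed)
flat_mixed            # "1" "two" "TRUE"

# Nested lists are flattened; names are joined with a dot
nested <- list(a = 1, b = list(c = 2, d = 3))
flat_nested <- unlist(nested)
names(flat_nested)    # "a" "b.c" "b.d"
```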
Data Frame
A data frame is a list containing multiple named vectors of the same length, similar to a
spreadsheet. It is a two-dimensional table of data, like a matrix, but each of its columns can hold
a different type of data. A typical use of a data frame is to collect data from real experiments,
with each row holding the data from one experiment and each column holding one type of data
recorded for every experiment.
First, let us find out how to create a data frame.
first_name <- c("John", "Ravi", "Athira", "Mathew", "Suhana")
last_name <- c("Paul", "Krishnan", "V", "Joseph", "Ibrahim")
gender <- c("m", "m", "f", "m", "f")
age <- c(25, 47, 11, 76, 34)
df <- data.frame(Firstname = first_name, Surname = last_name,
                 Age = age, Sex = gender)
df
df$Firstname
You can change the column names by using the function names(). For example:
names(df)[2] <- "Family_name"
df
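Rows can be filtered with a logical condition on the columns, for example all males older than 20:
df[df$Sex == "m" & df$Age > 20, ]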
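Row filtering can be combined with column selection, and the base R function subset() offers a more compact way to write the same kind of condition. A minimal sketch on a hypothetical data frame named people:

```r
people <- data.frame(Firstname = c("John", "Ravi", "Athira"),
                     Age = c(25, 47, 11),
                     Sex = c("m", "m", "f"))

# Keep only rows where Age is at least 18
adults <- people[people$Age >= 18, ]
nrow(adults)    # 2

# Filter rows and keep only two columns
males <- people[people$Sex == "m", c("Firstname", "Age")]

# subset() evaluates the condition inside the data frame
older_males <- subset(people, Sex == "m" & Age > 30)
older_males$Firstname   # "Ravi"
```

Referring to the column as people$Age rather than to a separate age vector keeps the filter correct even if the original vectors are later changed or removed.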
An alternative way to create a data frame is to define the columns directly inside the call:
patient_info <- data.frame(
  name = c("Jones", "Smith", "Reeba"),
  gender = c("M", "M", "F"),
  age = c(47, 38, 52),
  height = c(5.9, 5.7, 5.4)
)
patient_info
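Once a data frame exists, a few base R functions give a quick overview of its structure. The sketch below repeats the patient_info definition so that it runs on its own; the name mean_age is hypothetical.

```r
patient_info <- data.frame(
  name = c("Jones", "Smith", "Reeba"),
  gender = c("M", "M", "F"),
  age = c(47, 38, 52),
  height = c(5.9, 5.7, 5.4)
)

str(patient_info)        # compact overview of columns and their types
dim(patient_info)        # 3 rows, 4 columns
is.list(patient_info)    # TRUE: a data frame is a list of columns
length(patient_info)     # 4, the number of columns

# Columns behave like ordinary vectors
mean_age <- mean(patient_info$age)
mean_age
```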
Factors
In R programming, factors are used to represent categorical data. Categorical data consists of
distinct categories or levels, such as colors, gender, or education levels. Internally, a factor
stores each value as an integer code that refers to one of its levels, which makes categorical
data compact to store and straightforward to count and summarise. Let’s go through an example
of how to create and work with factors in R:
We create a vector gender_vector containing categorical data representing gender.
gender_vector <- c("Male", "Female", "Male", "Female", "Male", "Female")
The factor() function converts the vector into a factor.
gender_factor <- factor(gender_vector)
gender_factor
Printing gender_factor shows the value of each element of the original vector, followed by the
levels of the factor. R automatically identifies the unique values in the vector and assigns
them as the levels of the factor.
The levels() function is used to retrieve the levels of the factor.
The table() function is used to calculate the frequency of each level (category) in the factor.
# Check the levels (categories) of the factor
factor_levels <- levels(gender_factor)
factor_levels
# Get the frequency of each level
level_frequency <- table(gender_factor)
level_frequency
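The integer codes behind a factor, and the effect of setting the levels explicitly, can be seen in the following sketch. The factor size_factor and its values are hypothetical and serve only as an illustration.

```r
gender_factor <- factor(c("Male", "Female", "Male"))
levels(gender_factor)       # "Female" "Male" (sorted alphabetically)
as.integer(gender_factor)   # 2 1 2, the position of each value in the levels

# The levels argument fixes the order of the categories
size_factor <- factor(c("low", "high", "medium", "low"),
                      levels = c("low", "medium", "high"))
levels(size_factor)         # "low" "medium" "high"
size_counts <- table(size_factor)
size_counts                 # counts are reported in level order
```

Setting the levels by hand is useful when the alphabetical order is not the natural order of the categories.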