Data Structures in R

The document provides an overview of data structures in R, focusing on arrays, matrices, data frames, and factors. It includes syntax for creating and manipulating these structures, as well as handling missing values and performing data imputation. Additionally, it outlines exercises for practical application of the concepts discussed.

Uploaded by

ruqayyah1530

Data structures in R

Arrays

• Multi-dimensional data structures (unlike vectors, which are 1-D)
• Store elements of the same data type
• Built on top of vectors, with an added dimension (dim) attribute
• Can be 1-D, 2-D (a matrix), 3-D, or higher-dimensional
• Useful for organizing complex datasets
Single-dimension arrays

a = c(10, 20, 30, 40, 50)
arr = array(a)
print(class(arr))   # [1] "array"
print(arr)          # [1] 10 20 30 40 50
Multi-dimensional arrays

Syntax:
array(data, dim = c(nrow, ncol, narray))   # narray = number of 2-D layers
a = c(10, 20, 30, 40, 50, 60)
n = array(a, dim = c(2, 2, 3))
print(n)
Accessing array elements

n = array(a, dim = c(2, 2, 3))
print(n[2, 1, 1])        # single element: row 2, column 1, layer 1 -> [1] 20
print(n[2, , 1])         # entire 2nd row of layer 1 -> [1] 20 40
print(n[, c(1, 2), 3])   # all rows, columns 1-2 of layer 3
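
The indexing results above can be checked with a short sketch. It assumes the same vector a and dimensions as the slides. Note that a holds 6 values while dim = c(2, 2, 3) needs 12 cells, so R recycles the values to fill the array.

```r
# Sketch: the same 2 x 2 x 3 array as in the slides
a <- c(10, 20, 30, 40, 50, 60)
n <- array(a, dim = c(2, 2, 3))   # 12 cells, so the 6 values are recycled

print(dim(n))        # 2 2 3
print(n[2, 1, 1])    # 20: row 2, column 1, layer 1
print(n[2, , 1])     # 20 40: all of row 2 in layer 1
print(n[, , 3])      # layer 3 as a 2 x 2 matrix (30 40 in column 1, 50 60 in column 2)
print(n[1, 2, 2])    # 10: recycling restarts at the 7th cell
```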
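apply() on arrays

Elements are indexed as a[row, column, layer].
a = array(c(10, 20, 30, 40, 50, 60), dim = c(2, 2, 1))   # only the first 4 values fit
b = array(c(10, 20, 30, 40, 50, 60), dim = c(2, 2, 1))

Syntax:
apply(array_name, MARGIN, FUN)   # MARGIN: 1 for rows, 2 for columns

res = apply(a, 2, sum)
print(res)   # [1] 30 70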
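
As a sketch of how the margin argument changes the result, the following applies sum() over the rows and columns of a 2 x 2 x 1 array like the one above, and then over the layers of a small 3-D array. The names row_sums, col_sums, layers, and layer_sums are chosen here for illustration.

```r
# 2 x 2 x 1 array holding 10, 20, 30, 40 (filled column by column)
a <- array(c(10, 20, 30, 40), dim = c(2, 2, 1))
row_sums <- apply(a, 1, sum)   # margin 1: one sum per row    -> 40 60
col_sums <- apply(a, 2, sum)   # margin 2: one sum per column -> 30 70

# Hypothetical 3-D array used to show margin 3 (one sum per layer)
layers <- array(1:12, dim = c(2, 2, 3))
layer_sums <- apply(layers, 3, sum)   # 10 26 42

print(row_sums)
print(col_sums)
print(layer_sums)
```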
Looping and searching in arrays

• Using a for loop to iterate through array elements:
for (i in a) { print(i) }

• Searching for a value with the %in% operator:
> 20 %in% a
[1] TRUE
Matrices
• Matrices in R are created using the matrix() function with three key parameters: data (a vector), nrow (number of rows), and ncol (number of columns).
• For example, m <- matrix(c(10,20,30,40,50,60), nrow=3, ncol=2, byrow=FALSE) fills the matrix column-wise (byrow=FALSE is the default), resulting in:
     [,1] [,2]
[1,]   10   40
[2,]   20   50
[3,]   30   60
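
To make the effect of byrow concrete, here is a minimal sketch that builds the same data both ways. The names v, m_col, and m_row are illustrative only.

```r
v <- c(10, 20, 30, 40, 50, 60)

m_col <- matrix(v, nrow = 3, ncol = 2)                 # column-wise fill (default)
m_row <- matrix(v, nrow = 3, ncol = 2, byrow = TRUE)   # row-wise fill

print(m_col[1, ])   # 10 40
print(m_row[1, ])   # 10 20
print(m_row)        # rows: 10 20 / 30 40 / 50 60
```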
Accessing elements in a matrix

• Single element with m[row, col]: m[2,1] gives [1] 20
• Whole row with m[row, ]: print(m[1,]) gives [1] 10 40
• Whole column with m[, col]: print(m[,1]) gives [1] 10 20 30
• Multiple rows via m[c(1,2), ]:
print(m[c(1,2),])
     [,1] [,2]
[1,]   10   40
[2,]   20   50
• Multiple columns via m[, c(1,2)]
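
One detail worth checking when indexing: selecting a single row or column returns a plain vector, not a matrix. The sketch below shows this, along with the drop = FALSE argument of the [ operator, which keeps the matrix shape. drop = FALSE is not covered in the slides and is included only as an aside.

```r
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3, ncol = 2)

row1     <- m[1, ]                 # numeric vector: 10 40
row1_mat <- m[1, , drop = FALSE]   # still a 1 x 2 matrix
two_rows <- m[c(1, 2), ]           # 2 x 2 matrix

print(is.matrix(row1))   # FALSE
print(dim(row1_mat))     # 1 2
print(dim(two_rows))     # 2 2
```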
Operations on matrices: adding rows

To add a row to a matrix, use
rbind(matrix_name, data)
n = rbind(m, c(70, 80))
print(n)   # m with a 4th row: 70 80
Operations on matrices: adding columns

To add a column to a matrix, use
cbind(matrix_name, data)
• Note: the arguments to rbind() should have the same number of columns, and the arguments to cbind() should have the same number of rows.
• Since cbind() and rbind() also work on vectors, you can use them to construct matrices from vectors on a row-by-row or column-by-column basis.
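
A minimal sketch of both uses: building a matrix from vectors, and extending the matrix m from the earlier slides. The vectors r1 and r2 and the added values are made up for illustration.

```r
r1 <- c(1, 2, 3)
r2 <- c(4, 5, 6)
by_rows <- rbind(r1, r2)   # each vector becomes a row    -> 2 x 3
by_cols <- cbind(r1, r2)   # each vector becomes a column -> 3 x 2

m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3, ncol = 2)
n <- rbind(m, c(70, 80))       # new row needs 2 values (one per column) -> 4 x 2
p <- cbind(m, c(70, 80, 90))   # new column needs 3 values (one per row) -> 3 x 3

print(dim(by_rows))
print(dim(by_cols))
print(n)
print(p)
```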
Deleting rows and columns in a matrix

• Negative indices return a copy without the given positions; assign the result to keep it.
• Single element: matrix_name[-c(index)] (the result is flattened to a vector)
• Single row: matrix_name[-row_index, ]
• Single column: matrix_name[, -column_index]
• Row and column together: matrix_name[-c(row_index), -c(column_index)]
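
The deletion patterns above can be sketched on the 3 x 2 matrix m from the earlier slides. The result names are illustrative, and m itself is left unchanged because each expression returns a new object.

```r
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 3, ncol = 2)

no_row2    <- m[-2, ]     # 2 x 2 matrix: rows 10 40 and 30 60
no_col1    <- m[, -1]     # only one column left, so a vector: 40 50 60
no_r1_c2   <- m[-1, -2]   # drops row 1 and column 2: 20 30
no_element <- m[-1]       # drops the first element and flattens: 20 30 40 50 60

print(no_row2)
print(no_col1)
print(no_r1_c2)
print(no_element)
print(dim(m))             # still 3 2
```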
Searching and dimensions of a matrix

• Searching for an element with the membership operator: element %in% matrix_name
• 10 %in% m gives [1] TRUE
• Dimensions of a matrix: dim(matrix_name) returns the number of rows and columns
• Length of a matrix: print(length(matrix_name)) returns the total number of elements
• nrow(matrix_name) returns the number of rows
• ncol(matrix_name) returns the number of columns

• Looping over matrix elements:
for (row in 1:nrow(m)) {
  for (col in 1:ncol(m)) {
    print(m[row, col])   # visit each element, row by row
  }
}
Data frame

• A data frame is a 2-D data structure with rows (records) and columns (variables).
• Each column represents one feature; each row represents one observation.
• Unlike a matrix, columns can have different data types (numeric, character, factor, etc.).
• Internally, a data frame is a list of equal-length vectors, where each vector is a column.
Creating a data frame

• data.frame(col_name1, col_name2, col_name3)
a <- c(1,2,3,4,5)
b <- c("R", "Is", "Fun!","Let's","Learn")
c <- c(TRUE,FALSE,TRUE,TRUE,FALSE)
my_frame <- data.frame(a,b,c)
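
A short sketch for inspecting the resulting data frame. It names the columns directly in the call and passes stringsAsFactors = FALSE explicitly, which is an addition here: since R 4.0 character columns stay character by default, while older versions converted them to factors.

```r
my_frame <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c("R", "Is", "Fun!", "Let's", "Learn"),
  c = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep b as character on any R version
)

str(my_frame)                        # 5 obs. of 3 variables, with each column's type
col_types <- sapply(my_frame, class) # one class per column
print(col_types)                     # a: numeric, b: character, c: logical
```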
Accessing a data frame

dataframe[index] – returns the column at the given index (as a data frame)
dataframe[c(col1, col2, …)] – returns the columns at the given indices
dataframe[["col_name"]] – returns the column with the given name (as a vector)
or
dataframe$col_name
• dataframe$col_name[index] – returns the element at the given index in the specified column
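
The access forms can be compared side by side in a small sketch. It rebuilds a data frame like my_frame from the creation slide; the result names are illustrative.

```r
my_frame <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c("R", "Is", "Fun!", "Let's", "Learn"),
  c = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

first_col <- my_frame[1]         # single brackets: a data frame with column a
two_cols  <- my_frame[c(1, 3)]   # a data frame with columns a and c
b_vec     <- my_frame[["b"]]     # double brackets: the column as a vector
b_vec2    <- my_frame$b          # same result with $
third     <- my_frame$b[3]       # one element of column b

print(class(first_col))   # "data.frame"
print(names(two_cols))    # "a" "c"
print(b_vec)
print(third)              # "Fun!"
```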
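Inspecting a data frame

• summary(data_frame) gives a statistical overview of each column (min, max, mean for numeric; counts for factors).
• length(data_frame) returns the number of columns
• dim(data_frame) returns the number of rows and columns
• nrow(data_frame) returns the number of rows
• ncol(data_frame) returns the number of columns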
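
A quick sketch of these functions on a small data frame like the one built earlier. Note that length() counts columns, not rows, because a data frame is a list of columns.

```r
my_frame <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c("R", "Is", "Fun!", "Let's", "Learn"),
  c = c(TRUE, FALSE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

print(length(my_frame))   # 3 (number of columns)
print(dim(my_frame))      # 5 3 (rows, columns)
print(nrow(my_frame))     # 5
print(ncol(my_frame))     # 3

a_summary <- summary(my_frame$a)   # min, quartiles, median, mean, max of column a
print(a_summary)
```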
Factor

A factor is a way of categorizing or labeling data that falls into different groups or categories. Think of it like labels or tags for data points.
gender_vector <- c(rep("male", 10), rep("female", 15))   # create a character variable
gender_factor <- factor(gender_vector)
print(gender_factor)   # prints the 25 values, then "Levels: female male"
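
To see what the factor stores, a minimal sketch using the same gender_vector can list the levels, count each category, and show the underlying integer codes. By default the levels are sorted alphabetically, so "female" comes first.

```r
gender_vector <- c(rep("male", 10), rep("female", 15))
gender_factor <- factor(gender_vector)

lv     <- levels(gender_factor)       # "female" "male"
counts <- table(gender_factor)        # female: 15, male: 10
codes  <- as.integer(gender_factor)   # each value stored as the index of its level

print(lv)
print(counts)
print(codes[1])   # 2, because the first value "male" is the 2nd level
```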
Ordered factors

dat <- rep(c("very low", "low", "medium", "high", "very high"), 5)
dat_factor <- factor(dat,
                     levels = c("very low", "low", "medium", "high", "very high"),
                     ordered = TRUE)
print(dat_factor)   # Levels: very low < low < medium < high < very high
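
What ordered = TRUE adds is the ability to compare values by level order. The sketch below assumes the same five levels as above; the comparison and max() calls are additions for illustration.

```r
lvls <- c("very low", "low", "medium", "high", "very high")
dat <- rep(lvls, 5)
dat_factor <- factor(dat, levels = lvls, ordered = TRUE)

is_lower <- dat_factor[1] < dat_factor[4]   # "very low" < "high" -> TRUE
top      <- max(dat_factor)                 # highest level present
counts   <- table(dat_factor)               # 5 of each, listed in level order

print(is_lower)
print(as.character(top))   # "very high"
print(counts)
```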
Handling missing values

• data <- read.csv("C:/Users/madhu/Downloads/titanic/[Link]")
  reads a CSV (comma-separated values) file from disk and loads it into R as a data frame.
• Dataset dimensions and structure:
• dim(data) returns the dimensions of the dataset: the number of rows and the number of columns.
• str(data) displays the internal structure of the dataset, including variable types, dimensions, and sample values.
Detecting missing values
# Check if any missing values exist
anyNA(data)
# Identify the exact row and column positions of all missing values (NA) in the dataset
idx <- which(is.na(data), arr.ind = TRUE)
is.na()
• Definition
• Checks whether values are missing (NA).
• What it returns
• TRUE → value is NA
• FALSE → value is not NA
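
These checks can be tried without the Titanic file. The sketch below uses a small made-up data frame (df, with hypothetical Age and Embarked values) to show what is.na() and which(..., arr.ind = TRUE) return.

```r
# Toy data standing in for the real dataset (values are invented)
df <- data.frame(
  Age      = c(22, NA, 35),
  Embarked = c("S", "C", NA),
  stringsAsFactors = FALSE
)

print(anyNA(df))           # TRUE: at least one NA somewhere
na_matrix <- is.na(df)     # logical matrix, same shape as df
print(na_matrix)

idx <- which(na_matrix, arr.ind = TRUE)   # one row per NA: its row and column index
print(idx)                                # (row 2, col 1) and (row 3, col 2)

na_per_col <- colSums(na_matrix)          # number of NAs in each column
print(na_per_col)
```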
is.null(), na.omit(), and complete.cases()
is.null() - is.null(data$Age)
• Definition
• Checks whether an object itself is NULL (for example, data$Age is NULL if the data frame has no Age column).
na.omit() - new_data <- na.omit(data)
• Definition
• Removes all rows that contain at least one missing value (NA).
complete.cases() - complete.cases(data)
• Definition
• Returns a logical vector that is TRUE for rows with no missing values across all columns.
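
A minimal sketch contrasting the three functions on a made-up data frame. The column Fare is deliberately absent, to show the difference between a NULL object and an NA value.

```r
# Toy data (invented values); there is no Fare column
df <- data.frame(
  Age      = c(22, NA, 35),
  Embarked = c("S", "C", NA),
  stringsAsFactors = FALSE
)

print(is.null(df$Age))    # FALSE: the column exists, even though it contains an NA
print(is.null(df$Fare))   # TRUE: there is no such column

complete <- complete.cases(df)   # TRUE FALSE FALSE
new_df   <- na.omit(df)          # keeps only the complete rows

print(complete)
print(new_df)
print(nrow(new_df))              # 1
```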
Frequency counts

• table(data$Embarked)
• Counts the frequency of each unique value.
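Numeric imputation and categorical imputation
• Definition
• Numeric imputation: replacing missing numeric values with the median of the variable.
• Categorical imputation: replacing missing categorical values with a fixed category, typically the most frequent one (the mode); the example uses "S".
• Example
• data$Age[is.na(data$Age)] <- median(data$Age, na.rm = TRUE)
• data$Embarked[is.na(data$Embarked)] <- "S"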
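
Rather than hard-coding the category, the mode can be computed from the data. The following is a sketch on invented values, not the Titanic data; mode_value is a helper name introduced here, and using table() with which.max() is one possible way to find the mode, since base R has no built-in function for the statistical mode.

```r
# Toy data (invented values)
df <- data.frame(
  Age      = c(22, NA, 35, 40),
  Embarked = c("S", "C", NA, "S"),
  stringsAsFactors = FALSE
)

# Numeric imputation: median of the observed ages 22, 35, 40 is 35
df$Age[is.na(df$Age)] <- median(df$Age, na.rm = TRUE)

# Categorical imputation: most frequent observed category
counts     <- table(df$Embarked)              # table() ignores NA by default
mode_value <- names(counts)[which.max(counts)]
df$Embarked[is.na(df$Embarked)] <- mode_value

print(mode_value)   # "S"
print(df)
print(anyNA(df))    # FALSE
```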
Exercise

Q1. Data Loading & Inspection

• Write R code to:
• Load the Titanic dataset from a CSV file
• Display the number of rows and columns
• Display the internal structure of the dataset

Q2. Missing Value Detection & Summary

• Write R code to:
• Check whether missing values exist in the dataset
• Count missing values for each column
• Compute the percentage of missing values
• Create a summary data frame with variable name, missing count, and missing percentage

Q3. Locating Missing Values & Data Structures

• Write R code to:
• Identify the row and column indices of all missing values
• Display the first six such indices
• Convert the index object into a data frame
• Add a column with corresponding variable names

Q4. Logical Matrix, Matrix Indexing & Array

• Write R code to:
• Create a logical matrix indicating missing values
• Extract a 5×5 subset from this matrix
• Convert the logical matrix into an array

Q5. Handling Missing Values

• Write R code to:
• Replace missing values in a numeric column using median imputation
• Randomly introduce missing values into a categorical column
• Replace missing categorical values using mode imputation

Q6. Final Cleaning & Validation

• Write R code to:
• Remove rows containing any remaining missing values
• Verify that all rows in the final dataset are complete
