1. What is the primary purpose of the dplyr package in R?
A Data visualization B Data manipulation C Statistical modeling
D File input/output Answer: B
2. Which operator is used in dplyr to chain multiple functions together?
A -> B => C %>% D :: Answer: C
3. What does the filter() function do in dplyr?
A Reorders rows B Selects specific columns
C Filters rows based on condition D Combines two data frames Answer: C
4. What is the output of select(data, Name, Salary)?
A A list of all columns B Only the rows with Name and Salary greater than average
C Only the Name and Salary columns D Sorted rows by Salary Answer: C
5. Which function is used to add a new column to a data frame?
A select() B filter() C mutate() D arrange() Answer: C
6. What does summarise() (or summarize()) do in dplyr?
A Combines data sets B Sorts rows C Calculates summary statistics
D Adds new columns Answer: C
7. What is the purpose of group_by() in dplyr?
A To merge multiple data frames B To convert columns to rows
C To group data before applying summary functions D To visualize grouped data
Answer: C
8. What is the idiomatic dplyr way to arrange rows in descending order of Salary?
A arrange(df, Salary) B arrange(df, -Salary)
C arrange(df, desc(Salary)) D filter(df, Salary) Answer: C
9. What type of join keeps only matching rows from both data frames?
A left_join() B full_join() C inner_join()
D right_join() Answer: C
10. Which dplyr function is best used for selecting specific columns from a data frame?
A mutate() B summarise() C select() D filter() Answer: C
11. What is the result of using %>% with multiple functions?
A Creates a list B Chains commands in sequence
C Filters only numeric columns D Returns NULL Answer: B
12. In dplyr, which function would you use to combine two data frames by a common column?
A bind() B merge() C inner_join() D group_by() Answer: C
13. Which function is typically used immediately after group_by()?
A select() B summarise() C mutate() D arrange() Answer: B
14. Which of the following is NOT a join function in dplyr?
A inner_join() B left_join() C outer_join() D full_join()
Answer: C
15. Which collection of packages is dplyr a part of?
A ggplot2 B tidyverse C shiny D Rcpp Answer: B
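As a quick check on the answers above, here is a minimal sketch of the main verbs chained with `%>%`. The data frame `employees` and its `Dept` column are made up for illustration and are not part of the quiz.

```r
library(dplyr)

# Hypothetical data, invented for this illustration
employees <- data.frame(
  Name   = c("Asha", "Ben", "Chen", "Dina"),
  Dept   = c("HR", "IT", "IT", "HR"),
  Salary = c(3000, 4500, 5200, 3800)
)

# Chain verbs in sequence: keep rows, then group, summarise, and sort
dept_summary <- employees %>%
  filter(Salary > 3000) %>%                 # rows that meet a condition
  group_by(Dept) %>%                        # group before summarising
  summarise(Avg_Salary = mean(Salary)) %>%  # one row per group
  arrange(desc(Avg_Salary))                 # highest average first

print(dept_summary)
```

Each step receives the result of the previous one, which is why the pipe reads top to bottom instead of inside out.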
1. Which function in dplyr is used to choose specific columns?
A filter() B mutate() C select() D arrange() Answer: C
2. What does the mutate() function do in R?
A Selects rows B Adds or modifies columns C Sorts data
D Removes duplicates Answer: B
3. What is the purpose of filter() in dplyr?
A Add columns B Reorder columns C Subset rows based on condition
D Combine datasets Answer: C
4. Which function is used to reorder rows based on column values?
A filter() B summarise() C select() D arrange() Answer: D
5. How do you calculate summary statistics in dplyr?
A group_by() B summarise() C select() D mutate() Answer: B
6. What does group_by() do?
A Sorts values B Converts data types
C Groups data for summary operations D Deletes duplicate rows Answer: C
7. Which function would you use to calculate the average salary?
A mutate(Salary) B arrange(Salary) C summarise(mean(Salary))
D select(Salary) Answer: C
8. What does the `%>%` operator do in R?
A Negation B Exponentiation C Chains functions together
D Defines new variables Answer: C
9. What is a benefit of using the pipe operator `%>%`?
A Makes nested functions less readable B Makes code longer
C Improves code readability D Used only in base R Answer: C
10. What is the output of `select(data, Name, Salary)`?
A Filters by name and salary B Adds new columns
C Only columns Name and Salary D Sorts by Salary Answer: C
11. What function is typically used after group_by()?
A arrange() B select() C mutate() D summarise() Answer: D
12. What function is used to add a new column called "Annual_Salary" as `Salary * 12`?
A summarise() B select() C filter() D mutate() Answer: D
13. What is the purpose of arrange(desc(Salary))?
A Filters rows by Salary B Sorts Salary in ascending order
C Sorts Salary in descending order D Removes missing values Answer: C
14. Which function filters rows where Age > 30?
A arrange(data, Age > 30) B mutate(data, Age > 30)
C select(data, Age > 30) D filter(data, Age > 30) Answer: D
15. What function is used to select only Name and Salary from a data frame?
A select(data, Name, Salary) B mutate(Name, Salary)
C filter(Name, Salary) D arrange(Name, Salary) Answer: A
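The calls named in questions 10 to 15 can be tried directly. This is a minimal sketch, assuming a small made-up data frame `data` with `Name`, `Age`, and `Salary` columns (the values are invented).

```r
library(dplyr)

# Hypothetical data, invented for this illustration
data <- data.frame(
  Name   = c("Asha", "Ben", "Chen"),
  Age    = c(28, 35, 41),
  Salary = c(3000, 4500, 5200)
)

name_salary <- select(data, Name, Salary)                 # only the two named columns
over_30     <- filter(data, Age > 30)                     # rows where Age > 30
with_annual <- mutate(data, Annual_Salary = Salary * 12)  # adds a new column
by_salary   <- arrange(data, desc(Salary))                # descending order of Salary
avg_salary  <- summarise(data, Avg = mean(Salary))        # a one-row summary
```

None of these calls modifies `data` itself; each returns a new data frame.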
1. What is the purpose of data blending and joining in R?
A To visualize data B To clean outliers
C To combine datasets for analysis D To export data Answer: C
2. Which R package provides join functions like inner_join and left_join?
A ggplot2 B dplyr C tidytext D tibble Answer: B
3. What does an inner join return?
A Only unmatched rows B All rows from both tables
C Only matching rows from both tables D Rows with NULL values only
Answer: C
4. Which join includes all rows from the left table and matching rows from the right?
A inner_join() B left_join() C right_join() D full_join()
Answer: B
5. What does right_join() return?
A Only right table rows B Matching rows from left table only
C All rows from right table and matched rows from left
D Combined sorted table Answer: C
6. What is the result of full_join() in dplyr?
A Intersection of both tables B Only left table rows
C All rows from both tables D Only right table rows Answer: C
7. Which join returns rows from the left table that match rows in the right table, but excludes
right table columns?
A anti_join() B semi_join() C full_join() D inner_join()
Answer: B
8. What does anti_join() return?
A Matching rows from both tables B All right table rows
C Unmatched rows from the left table D Full combination of rows Answer: C
9. What statistical method can be used to detect outliers?
A Z-scores and IQR B Mean and mode C Median filter
D Chi-square test Answer: A
10. What graphical method is commonly used to detect outliers?
A Pie chart B Histogram C Boxplot
D Bar graph Answer: C
11. How can you treat outliers using transformation?
A Drop them B Replace with NA C Apply log or sqrt
D Sort them Answer: C
12. What function is used to check for missing values in R?
A [Link]() B is.na() C [Link]() D [Link]() Answer: B
13. Which function removes rows with missing values?
A [Link]() B [Link]() C na.omit() D [Link]() Answer: C
14. What is mean imputation for missing data?
A Remove all NAs B Replace with 0
C Replace with mean of column D Use a random number Answer: C
15. Which R package can be used for multiple imputation of missing data?
A ggplot2 B mice C tidyverse D stringr Answer: B
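The join types and missing-value steps covered in this set can be compared side by side. This is a minimal sketch, assuming two made-up tables `staff` and `pay` that share an `ID` column. `is.na()` and `na.omit()` are base R functions, and the mean-imputation line is one simple way to do it, not the only one.

```r
library(dplyr)

# Hypothetical tables, invented for this illustration
staff <- data.frame(ID = c(1, 2, 3), Name = c("Asha", "Ben", "Chen"))
pay   <- data.frame(ID = c(2, 3, 4), Salary = c(4500, NA, 3900))

inner <- inner_join(staff, pay, by = "ID")  # IDs 2 and 3: matches in both tables
left  <- left_join(staff, pay, by = "ID")   # all of staff; ID 1 gets NA for Salary
full  <- full_join(staff, pay, by = "ID")   # IDs 1 to 4: all rows from both tables
semi  <- semi_join(staff, pay, by = "ID")   # matching staff rows, staff columns only
anti  <- anti_join(staff, pay, by = "ID")   # staff rows with no match (ID 1)

# Missing values: detect, drop, or impute with the column mean
missing  <- is.na(left$Salary)              # TRUE where Salary is missing
complete <- na.omit(left)                   # drops rows that contain an NA
imputed  <- left %>%
  mutate(Salary = ifelse(is.na(Salary), mean(Salary, na.rm = TRUE), Salary))
```

Note that `mean()` needs `na.rm = TRUE` here; without it the mean of a column containing `NA` is itself `NA`.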