EDA MCQs for BCA Semester 3

The document outlines a course on Exploratory Data Analysis (EDA) for BCA Semester 3, featuring multiple-choice questions (MCQs) and short and long answer questions on R programming and data visualization techniques. Key topics include the advantages and disadvantages of R, data visualization principles, ggplot2 usage, and EDA techniques such as univariate and bivariate analysis. It emphasizes data manipulation, visualization, and the use of specific R packages such as dplyr and RColorBrewer.

Uploaded by

tanishq Kushwaha
Copyright
© All Rights Reserved

Program: BCA

Semester: 3

Course Name: Exploratory Data Analysis

MCQs
1. What is one of the advantages of R for data analysis?
a. Limited package support
b. Inability to import data
c. Active and supportive community
d. Expensive licensing fee
Answer - c

2. Which of the following is a disadvantage of R compared to other languages and platforms?


a. Limited data visualization capabilities
b. Slower computation speed
c. Difficulty in writing custom functions
d. Lack of data import options
Answer - b

3. When installing R on Windows, what is the recommended integrated development environment (IDE)
for beginners?
a. RStudio
b. Jupyter Notebook
c. Visual Studio Code
d. Spyder
Answer - a

4. How can you set the working directory in R?


a. Using the `setwd()` function
b. Through the command line
c. By using the `import` keyword
d. Automatically, it's set to the R installation folder
Answer - a
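
As a minimal sketch of the answer above, the snippet below changes the working directory with `setwd()` and then restores it. Using `tempdir()` as a stand-in project folder and the file name `example.csv` are assumptions made only for illustration.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# tempdir() stands in for a real project folder (an assumption for this sketch).
setwd(tempdir())

# Relative paths are now resolved against the new working directory.
writeLines(c("a,b", "1,2"), "example.csv")
file_found <- file.exists("example.csv")

# Restore the original working directory.
setwd(old_wd)
```

Relative file names passed to functions that read or write files are interpreted relative to whatever `getwd()` reports at that moment.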

5. In R, which function is used to import external libraries/packages?


a. `load_library()`
b. `import_library()`
c. `install.packages()`
d. `include()`
Answer - c

6. What are the principles of analytic graphics in EDA?


a. Complex and confusing visualizations
b. Simplicity and clarity
c. Monochromatic color schemes
d. Highly interactive plots
Answer - b

7. In R, what is the primary plotting system for creating simple data visualizations?
a. ggplot2
b. Lattice
c. Base plotting
d. Plotly
Answer - c

8. How is a ggplot2 object created in R?


a. Using the plot() function
b. By specifying points and lines manually
c. By adding layers to a basic ggplot object
d. Through the create_ggplot() function
Answer - c
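
The layering idea can be sketched as follows, assuming the ggplot2 package is installed (the guard skips the example otherwise). The built-in `mtcars` dataset and the chosen columns are illustrative assumptions, not part of the question.

```r
# Assumes ggplot2 is installed; nothing runs if it is missing.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Base object: the data plus the aesthetic mapping, with no layers yet.
  p <- ggplot(mtcars, aes(x = wt, y = mpg))

  # Each `+` adds a layer or a component on top of the base object.
  p <- p + geom_point()                                            # points layer
  p <- p + geom_smooth(method = "lm", formula = y ~ x, se = FALSE) # fitted line layer
  p <- p + labs(title = "Fuel economy vs weight",
                x = "Weight (1000 lbs)", y = "Miles per gallon")
}
```

Printing `p` (or calling `print(p)`) renders the plot on the active graphics device.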

9. Which package is primarily used for advanced data visualization in R?


a. base
b. lattice
c. ggplot2
d. plotly
Answer - c

10. What does "gg" in ggplot2 stand for?


a. Good Graphics
b. Graph Generator
c. Grammar of Graphics
d. Great Gradients
Answer - c

11. What is the primary goal of dimension reduction techniques?


a. To increase data complexity
b. To reduce the number of variables while preserving information
c. To create additional variables
d. To transform data into higher dimensions
Answer - b
12. What is Principal Component Analysis (PCA) used for?
a. Outlier detection
b. Hierarchical clustering
c. Dimension reduction and data compression
d. Data visualization
Answer - c

13. In the context of dimension reduction, what does "SVD" stand for?
a. Statistical Variable Decomposition
b. Sequential Variable Determination
c. Singular Value Decomposition
d. Supervised Variable Development
Answer - c
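
The link between PCA and SVD in questions 12 and 13 can be sketched in base R. This is an illustrative example on the built-in `iris` data (an assumption, not part of the question bank): the standard deviations reported by `prcomp()` should match the singular values of the centered data matrix divided by sqrt(n - 1).

```r
# Numeric columns of the built-in iris data (illustrative choice).
x <- as.matrix(iris[, 1:4])

# PCA on centered, unscaled data.
pca <- prcomp(x, center = TRUE, scale. = FALSE)

# SVD of the same centered matrix.
x_centered <- scale(x, center = TRUE, scale = FALSE)
s <- svd(x_centered)

# Singular values rescaled to standard deviations of the components.
sdev_from_svd <- s$d / sqrt(nrow(x) - 1)

# Share of total variance carried by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Dimension reduction: keep only the first two component scores.
scores_2d <- pca$x[, 1:2]
```

Keeping `scores_2d` instead of the four original columns is the "reduce the number of variables while preserving information" idea from question 11.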

14. How can you effectively use color in data visualization?


a. By using monochromatic color schemes
b. By avoiding color altogether
c. By using color to convey meaning and contrast
d. By only using colors that match your personal preferences
Answer - c

15. What is the purpose of creating color palettes with RColorBrewer in data visualization?
a. To generate random color combinations for aesthetics
b. To create animations for interactive plots
c. To provide a set of visually appealing and distinct colors for different data categories
d. To remove colors from the visualization for simplicity
Answer - c

16. Which control structure in R is used to iterate over a sequence of values?


a. For loop
b. While loop
c. If statement
d. Repeat loop
Answer - a
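
A short sketch of a `for` loop iterating over a sequence of values; the `scores` vector is made-up illustrative data.

```r
# Hypothetical marks used only for illustration.
scores <- c(72, 85, 90, 64)

# Accumulate a running total by visiting each value in turn.
total <- 0
for (s in scores) {
  total <- total + s
}

mean_score <- total / length(scores)
```

In practice `sum(scores)` and `mean(scores)` give the same results without an explicit loop; the loop form is shown only to illustrate the control structure.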

17. In data analysis, what is the primary purpose of using the `dplyr` package in R?
a. Data manipulation and transformation
b. Model training and prediction
c. Data visualization
d. Web scraping
Answer - a
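
A minimal sketch of data manipulation with dplyr (the package name is lowercase), assuming the package is installed; the guard skips the example otherwise. The `mtcars` dataset and the derived column `wt_kg` are illustrative assumptions.

```r
# Assumes dplyr is installed; nothing runs if it is missing.
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  result <- mtcars %>%
    filter(mpg > 20) %>%              # keep rows meeting a condition
    select(mpg, cyl, wt) %>%          # keep only the named columns
    mutate(wt_kg = wt * 453.592) %>%  # add a derived column (wt is in 1000 lbs)
    arrange(desc(mpg))                # sort by mpg, highest first
}
```

Each verb takes a data frame and returns a data frame, which is what lets the steps be chained with the pipe.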

18. What is the key goal of exploratory data analysis (EDA)?


a. Building predictive models
b. Data cleaning and preparation
c. Hypothesis testing
d. Summarizing and visualizing data to gain insights
Answer - d

19. Which of the following is an essential step in EDA?


a. Designing machine learning algorithms
b. Conducting hypothesis testing
c. Data cleaning and preparation
d. Missing value treatment
Answer - d

20. What is the purpose of "Univariate Analysis" in EDA?


a. To analyze the relationships between two variables
b. To analyze relationships between three variables
c. To analyze a single variable
d. To generate random data points
Answer - c

21. What are the key principles of analytic graphics in EDA?


a. Complexity and ambiguity
b. Simplicity and clarity
c. Colorful and intricate design
d. Dynamic and interactive plots
Answer - b

22. In R, what is the base plotting system primarily used for?


a. Advanced 3D visualizations
b. Creating basic data visualizations
c. Machine learning model training
d. Handling big data
Answer - b

23. What is the primary advantage of using the "base plotting" system in R for basic data visualization?
a. Complex and customizable graphics
b. High interactivity
c. Simplicity and control over plot elements
d. Compatibility with ggplot2
Answer - c

24. In R, how is a `ggplot2` object typically created?


a. Through complex code that involves multiple packages
b. By manually specifying points and lines
c. By adding layers to a basic ggplot object
d. By using the "create_ggplot()" function
Answer - c

25. What is the primary purpose of the `ggplot2` package in R?


a. Data cleaning and preparation
b. Generating machine learning models
c. Basic data visualization
d. Advanced data visualization and customization
Answer - d

26. Which control structure in R allows you to execute different code blocks based on specified
conditions?
a. For loop
b. While loop
c. If statement
d. Switch case
Answer - c
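
A brief sketch of conditional execution with `if` / `else if` / `else`. The function name `grade_for` and the cut-offs 75 and 40 are hypothetical, chosen only to illustrate the structure.

```r
# Hypothetical grading rule used only for illustration.
grade_for <- function(marks) {
  if (marks >= 75) {
    "distinction"
  } else if (marks >= 40) {
    "pass"
  } else {
    "fail"
  }
}

# Apply the rule to several values.
grades <- sapply(c(82, 55, 31), grade_for)
```

Note that `if` tests a single condition at a time; for whole vectors, `ifelse()` is the vectorised counterpart.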

27. Which package is commonly used for data manipulation and data cleaning in R?
a. `base`
b. `plyr`
c. `dplyr`
d. `[Link]`
Answer - c

28. What is the main goal of Exploratory Data Analysis (EDA)?


a. Building predictive models
b. Cleaning and preparing data
c. Summarizing and visualizing data
d. Conducting hypothesis testing
Answer - c

29. Which of the following is NOT an EDA technique?


a. Univariate Analysis
b. Principal Component Analysis
c. Bivariate Analysis
d. Missing Value Treatment
Answer - b

30. What is one of the key insights that EDA can provide?
a. Predictive accuracy of the model
b. Patterns and trends in the data
c. Data collection techniques
d. Optimal clustering algorithms
Answer - b

31. What is a primary advantage of using ggplot2 for data visualization?


a. Simplicity and minimal customization options
b. Limited support for custom data visualization
c. Highly customizable and follows the Grammar of Graphics
d. No need to install additional packages
Answer - c

32. How can you customize the appearance of a plot using ggplot2?
a. By modifying the raw data
b. By adding layers and changing aesthetics
c. By using a different plotting system
d. By changing the working directory
Answer - b

33. What does a "facet" mean in the context of ggplot2?


a. A data point on a plot
b. A type of control structure in R
c. A way to create multiple plots based on a variable
d. A color palette
Answer - c

34. What are dendrograms commonly used for in hierarchical clustering?


a. Creating scatterplots
b. Principal Component Analysis
c. Visualization and interpretation of clustering results
d. Outlier detection
Answer - c

35. In hierarchical clustering, what is the meaning of "hierarchical"?


a. It refers to the complex, intertwined relationships in data.
b. It implies a strict hierarchy of variables.
c. It indicates the formation of a tree-like structure in clustering.
d. It signifies the use of a hierarchical statistical model.
Answer - c
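
The tree-like structure in questions 34 and 35 can be sketched with base R's `dist()`, `hclust()`, and `cutree()`. The choice of `mtcars` columns, complete linkage, and three clusters are assumptions made for illustration.

```r
# Standardise a few numeric columns so no variable dominates the distance.
x <- scale(mtcars[, c("mpg", "hp", "wt")])

# Pairwise Euclidean distances between observations.
d <- dist(x, method = "euclidean")

# Agglomerative clustering: repeatedly merge the two closest clusters.
hc <- hclust(d, method = "complete")

# Cut the tree into three groups.
groups <- cutree(hc, k = 3)

# plot(hc) would draw the dendrogram on the active graphics device.
```

Each row of `hc$merge` records one merge step, so n observations produce n - 1 merges, which is the hierarchy the dendrogram displays.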

36. What is the primary purpose of setting the working directory in R?


a. To define variable data types
b. To generate random numbers
c. To specify the location for file operations
d. To organize data for analysis
Answer - c

37. When installing R on Mac, which package manager is commonly used to simplify package
management?
a. Anaconda
b. Homebrew
c. MacPorts
d. Chocolatey
Answer - b

38. What is the main advantage of R over other data analysis languages?
a. Rich and active package ecosystem
b. Simplicity of syntax
c. Limited data visualization capabilities
d. Built-in machine learning algorithms
Answer - a

39. In R, what does the function `library()` do?


a. Create visualizations
b. Load external datasets
c. Import external libraries/packages
d. Load R packages for use in the current session
Answer - d
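
A small sketch of what `library()` does in a session. The `tools` package is used here only because it ships with R but is not attached by default; that choice is an assumption for illustration.

```r
# Before attaching, the package is normally absent from the search path.
attached_before <- "package:tools" %in% search()

# library() attaches an installed package for the current session.
library(tools)
attached_after <- "package:tools" %in% search()

# Functions from the package can now be called without a prefix.
ext <- file_ext("data.csv")
```

A package that is not yet installed must first be installed (for example with `install.packages()`) before `library()` can load it.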

40. What is the primary role of the `read.csv()` function in R?


a. To read image files
b. To perform complex statistical calculations
c. To import datasets from the web
d. To read and import data from a CSV file
Answer - d
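
Reading a CSV file into a data frame, as described in the answer above, can be sketched with base R's `read.csv()`. The file contents and column names are made up for illustration, and a temporary file is used so the example is self-contained.

```r
# Write a small illustrative CSV to a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,marks", "Asha,82", "Ravi,75"), csv_path)

# read.csv() returns a data frame; the first line is used as the header.
marks <- read.csv(csv_path, stringsAsFactors = FALSE)

# Columns are typed automatically: `name` is character, `marks` is numeric.
avg_marks <- mean(marks$marks)
```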

41. In EDA, what is the primary objective of "Outlier Treatment"?


a. To create outliers for testing purposes
b. To identify and handle extreme or erroneous data points
c. To identify and handle missing values
d. To remove all data points
Answer - b
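
One common way to identify extreme points is the 1.5 x IQR rule, the same rule a box plot uses for its whiskers. The sketch below applies it to made-up data; it is one possible approach, not the only form of outlier treatment.

```r
# Illustrative data with one clearly extreme value.
x <- c(12, 14, 15, 15, 16, 17, 18, 95)

# Quartiles and interquartile range.
q <- unname(quantile(x, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences: points beyond 1.5 * IQR from the quartiles are flagged.
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- x[x < lower | x > upper]
x_clean  <- x[x >= lower & x <= upper]
```

Whether a flagged point is removed, capped, or kept depends on whether it is an error or a genuine extreme value.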

42. What is "Bivariate Analysis" in EDA?


a. Analyzing relationships between two variables
b. Analyzing relationships between three variables
c. Analyzing a single variable
d. Generating random data
Answer - a
43. In EDA, what is the primary goal of "Bivariate Analysis"?
a. To analyze relationships between three or more variables
b. To examine the distribution of a single variable
c. To create data visualizations
d. To explore relationships and patterns between two variables
Answer - d

44. What does "EDA" stand for in the context of data analysis?
a. Extended Data Assessment
b. Enhanced Data Analytics
c. Exploratory Data Analysis
d. Effective Data Aggregation
Answer - c

45. What is a common outcome of EDA?


a. A fully developed machine learning model
b. Insights and a deeper understanding of the data
c. Data normalization
d. Data cleansing
Answer - b

46. What is the main advantage of using "ggplot2" for data visualization?
a. Limited customization options
b. High interactivity
c. Simplicity and minimal customization options
d. Highly customizable and follows the Grammar of Graphics
Answer - d

47. What is the primary purpose of a "box plot" in data visualization during EDA?
a. To display the distribution of categorical variables
b. To show the relationship between two variables
c. To identify and handle missing values
d. To visualize the distribution and spread of a numeric variable
Answer - d

48. When conducting "Missing Value Treatment" in EDA, what is a common method for imputing
missing values in numeric data?
a. Replacing with the mean value
b. Using the mode of the variable
c. Interpolating missing values based on adjacent data points
d. Removing the missing values
Answer - c
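
Two of the imputation options listed above, mean replacement and interpolation from adjacent points, can be compared in a short base R sketch. The data vector is made up, and `approx()` is used here as one way to interpolate linearly.

```r
# Illustrative numeric series with two missing values.
x <- c(10, NA, 14, 16, NA, 20)

# Option (a): replace each NA with the mean of the observed values.
mean_filled <- ifelse(is.na(x), mean(x, na.rm = TRUE), x)

# Option (c): linear interpolation between the neighbouring observed points.
idx <- seq_along(x)
observed <- !is.na(x)
interp_filled <- approx(idx[observed], x[observed], xout = idx)$y
```

Mean replacement ignores ordering, while interpolation assumes the values are ordered (for example, a time series), so the appropriate choice depends on the data.
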
49. In EDA, what is the primary purpose of "Histograms" and "Density Plots"?
a. To detect and handle outliers
b. To visualize relationships between variables
c. To create scatterplots
d. To explore the distribution and frequency of numeric data
Answer - d

50. What is the main difference between "bar plots" and "histograms" in data visualization?
a. Bar plots are used for univariate analysis, while histograms are for bivariate analysis.
b. Bar plots represent categorical data, while histograms represent numeric data.
c. Bar plots are used for advanced data visualization, while histograms are for basic data
visualization.
d. Bar plots show discrete categories, while histograms display the distribution of continuous
numeric data.
Answer - b
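
The distinction can be sketched in base R: a bar plot is built from counts of discrete categories, while a histogram bins a continuous variable into intervals. The data and the bin edges below are illustrative assumptions; `plot = FALSE` is used so only the underlying counts are computed.

```r
# Categorical data: one count per category (this is what barplot() draws).
subjects <- c("Maths", "Physics", "Maths", "English", "Physics", "Maths")
category_counts <- table(subjects)

# Continuous data: values are grouped into intervals (this is what hist() draws).
heights <- c(150, 152, 158, 161, 165, 167, 171, 178)
h <- hist(heights, breaks = seq(150, 180, by = 10), plot = FALSE)

bin_counts <- h$counts   # observations per interval
bin_edges  <- h$breaks   # interval boundaries
```

Calling `barplot(category_counts)` and `hist(heights)` would draw the two charts; bars in a bar plot can be reordered freely, while histogram bins have a fixed numeric order.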

Short Answers
Unit 1:

1. Write R code to install the 'dplyr' package and load it into your R session.
2. Explain the different types of objects in R (vector, matrix, data frame, and list). How are they
created, and in what situations would each be used? Provide example R commands.
3. What are the different data types in R (numeric, character, logical, factor)? Explain each with an
example.
4. Compare R with other programming languages and platforms often used in data analysis.
5. Compare different control structures in R (if, for, while, repeat). Write an example where each
structure would be useful in data analysis.

Unit 2:

6. Explain the principles of analytic graphics. Why are these principles important in exploratory data
analysis?
7. What is a histogram? Write R code to create a histogram for a numeric variable.
8. Explain the purpose of a boxplot in data analysis. How is it created in R?
9. Describe the difference between bar plots and histograms. Provide examples of when each should
be used.
10. Write an R program to create a simple bar plot of five subjects' marks.

Unit 3:

11. What is the purpose of the plot() function in R? Give an example of its basic use.
12. Explain the difference between the base, lattice, and ggplot2 plotting systems in R.
13. What is the function of the colorRamp() and colorRampPalette() utilities in R?
14. What is the use of the RColorBrewer package? Give an example of how it enhances data
visualization.
15. Explain how graphics devices work in R. What are examples of file devices for plots?

Unit 4:

16. What is K-means clustering? Explain its basic idea in simple terms.
17. What is Singular Value Decomposition (SVD)? Name one application of it in data analysis.
18. Explain the difference between Euclidean distance and Manhattan distance in clustering.
19. What is a dendrogram, and how is it useful in hierarchical clustering?
20. What is the purpose of the heatmap() function in R? Provide a situation where it is useful.

Unit 5:

21. Explain the qplot() function in ggplot2. Provide an example to create a scatterplot.
22. How can you add multiple layers (e.g., points, smooth lines) in a ggplot2 plot?
23. What is ggplot2 in R, and why is it widely used for data visualization?
24. Explain how facets work in ggplot2. Why are they useful?
25. What are “geoms” in ggplot2? Give examples of commonly used geoms.

Long Answers
Unit 1:

1. Explain the significance of data frames in R programming. Discuss their structure, characteristics,
and advantages for data analysis. Describe how data frames in R are different from other data
structures like matrices and lists. Explain how data frames are used in data analysis workflows
and provide examples of common operations like filtering, sorting, and summarizing data.
2. Explain the use of the dplyr package in R. Explain its main functions (select(), filter(), mutate(),
arrange(), and summarize()) with suitable examples demonstrating their role in data manipulation.

Unit 2:

3. Outline the key principles that guide the design of analytical graphics in R. Include a discussion
on the importance of choosing appropriate chart types for different data scenarios, the use of color
and aesthetics to highlight key information, and the significance of clarity and simplicity in plot
design. Explain how these principles can be applied to ensure visualizations are both informative
and accessible.
4. Describe the concept of exploratory graphs in R. Discuss how histograms, boxplots, and
scatterplots can be used to summarize data distributions and relationships, using appropriate R
examples.

Unit 3:
5. Compare and contrast the Base Plotting System, Lattice System, and ggplot2 System in R.
Highlight their main features, advantages, and use cases in exploratory data visualization.
6. Explain the process of creating and customizing plots using the Base Plotting System in R.
Discuss how to add titles, labels, colors, and regression lines with suitable code examples.

Unit 4:

7. Discuss the steps involved in performing Hierarchical Clustering in R. Explain how distances are
computed, how clusters are formed, and how dendrograms help visualize relationships among
data points.
8. Explain the concept of Principal Component Analysis (PCA) and Singular Value Decomposition
(SVD) in R. Discuss their mathematical basis, role in dimension reduction, and interpretation of
results in data analysis.

Unit 5:

9. Discuss the core principles of the Grammar of Graphics. In your answer, explain the concept of
mapping data to visual properties (aesthetics), the role of geometric objects (geoms), and the use
of layers to construct complex visualizations. Provide a detailed discussion of how the Grammar
of Graphics facilitates flexibility and consistency in creating plots.
10. Explain the components of a ggplot2 plot — including data, aesthetics, geoms, facets, and
themes. Discuss how these components interact to build layered and informative visualizations in
R.
