Box Plot
A box plot is a graphical method for displaying the distribution of a dataset and identifying outliers. It
shows the median, the lower and upper quartiles, and the smallest and largest values that lie within
1.5 × IQR of the quartiles. The interquartile range (IQR) is the distance between the lower and upper
quartiles and spans the middle 50% of the data. The box indicates spread, and the position of the median
inside it hints at skewness. Whiskers show variability outside the quartiles. Outliers, if present, appear as
individual points beyond the whiskers. Box plots are widely used in exploratory data analysis and are
especially useful for comparing multiple groups at a glance.
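The 1.5 IQR rule can be checked by hand. The following is a minimal sketch on made-up numbers, not part of the original notes. It uses fivenum(), whose hinges are the box edges that boxplot() draws (they can differ slightly from the quartiles returned by quantile()). The variable names are illustrative.

```r
# Made-up sample; 30 is deliberately far from the rest
x <- c(2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 30)

five <- fivenum(x)         # minimum, lower hinge, median, upper hinge, maximum
iqr  <- five[4] - five[2]  # width of the box

# Fences: anything beyond them is drawn as an outlier
lower_fence <- five[2] - 1.5 * iqr
upper_fence <- five[4] + 1.5 * iqr

inside   <- x[x >= lower_fence & x <= upper_fence]
whiskers <- range(inside)  # whisker ends: most extreme values inside the fences
outliers <- x[x < lower_fence | x > upper_fence]

whiskers
outliers
```

Here the whiskers stop at the most extreme values that are still inside the fences, and the single far value is reported as an outlier.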
Step-by-Step Process:
1. Collect numeric data and store it in a vector.
2. Create the vector in R using c().
3. Use the boxplot() function to create the basic box plot.
4. Add a main title using the main argument, e.g. main="Title".
5. Add x and y labels using xlab and ylab for clarity.
6. Customize colors using col and border.
7. Interpret quartiles and median.
8. Identify outliers if any.
9. Adjust plotting parameters if needed.
10. Compare multiple groups in one plot using boxplot(x1, x2), passing each group as a separate numeric vector.
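The steps above can be sketched as follows. The scores and group names are hypothetical and chosen only so that one value shows up as an outlier.

```r
# Hypothetical exam scores; the value 98 is deliberately extreme
scores <- c(52, 55, 58, 60, 61, 63, 65, 66, 68, 70, 98)

# Basic box plot with title, axis label, and colors
bp <- boxplot(scores,
              main = "Exam Scores",
              ylab = "Score",
              col = "lightblue",
              border = "darkblue")

# boxplot() invisibly returns the statistics it drew
bp$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out    # values plotted as outlier points

# Compare two groups by passing each as a separate numeric vector
group_a <- c(52, 55, 58, 60, 61, 63)
group_b <- c(60, 64, 66, 70, 73, 75)
boxplot(group_a, group_b,
        names = c("Group A", "Group B"),
        main = "Scores by Group",
        xlab = "Group", ylab = "Score",
        col = c("lightblue", "lightgreen"))
```

Keeping the return value in bp makes steps 7 and 8 (reading the quartiles and spotting outliers) possible without reading them off the picture.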
Scatter Plot
A scatter plot visually represents the relationship between two numerical variables. Each point
corresponds to one observation in the dataset. It is used to identify correlation patterns, clusters, and
outliers. A positive trend means that as one variable increases, the other also increases. A negative trend
means that as one variable increases, the other decreases. Scatter plots help assess linearity, the strength
of a relationship, and deviations from it.
Regression lines can be added for deeper analysis. It is a core tool for statistical visualization and
machine learning data exploration.
Step-by-Step Process:
1. Prepare two numeric vectors x and y.
2. Confirm equal lengths for x and y.
3. Use plot(x, y) to draw the scatter plot.
4. Add main title and axis labels.
5. Set point style using pch parameter.
6. Customize color for visibility.
7. Observe the trend or pattern.
8. Add a regression line using abline(lm(y ~ x)).
9. Adjust plot limits using xlim and ylim.
10. Save or export the plot if needed.
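A minimal sketch of these steps, assuming made-up data on study hours and exam scores. The helper draw_scatter() and the output file name are hypothetical, and pdf() is only one of several export devices.

```r
# Hypothetical data: hours studied vs. exam score
hours <- c(1, 2, 3, 4, 5, 6, 7, 8)
score <- c(50, 54, 61, 63, 70, 74, 79, 85)

# Both vectors must have the same length
stopifnot(length(hours) == length(score))

# Least-squares fit used for the regression line
fit <- lm(score ~ hours)

# Hypothetical helper so the same plot can go to the screen and to a file
draw_scatter <- function() {
  plot(hours, score,
       main = "Study Time vs. Exam Score",
       xlab = "Hours studied", ylab = "Score",
       pch = 19, col = "steelblue",
       xlim = c(0, 9), ylim = c(40, 90))
  abline(fit, col = "red", lwd = 2)
}

draw_scatter()  # draw on the current device

# Export: open a file device, redraw, then close it
out_file <- file.path(tempdir(), "scatter.pdf")
pdf(out_file, width = 7, height = 5)
draw_scatter()
dev.off()

cor(hours, score)  # strength of the linear relationship
```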
Line Chart
A line chart represents continuous data over time or an ordered sequence. It connects data points with
straight lines to show trends and patterns. Line charts are ideal for time-series data such as
temperature, stock prices, or sales. They highlight increases, decreases, and fluctuations. Several lines
drawn on one chart can compare multiple trends simultaneously. Line thickness, markers, and colors enhance clarity.
Line charts help interpret long-term patterns and short-term variations.
Step-by-Step Process:
1. Prepare x and y vectors representing ordered data.
2. Call plot() with type="l" for line chart.
3. Add titles and axis names.
4. Adjust line width with lwd.
5. Change color to highlight trend.
6. Add points using points() if required.
7. Customize axes using axis() function.
8. Compare two series by adding the second one with lines().
9. Observe trend and fluctuations.
10. Export the chart for reporting.
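The steps above might look like this in practice. The monthly sales figures and product names are invented for illustration.

```r
# Hypothetical monthly sales for two products
months  <- 1:6
sales_a <- c(120, 135, 150, 145, 160, 175)
sales_b <- c(100, 110, 125, 140, 138, 150)

# First series as a line; suppress the default x axis so it can be customized
plot(months, sales_a, type = "l",
     main = "Monthly Sales", xlab = "Month", ylab = "Units sold",
     col = "blue", lwd = 2,
     ylim = range(c(sales_a, sales_b)),  # leave room for both series
     xaxt = "n")

# Custom x axis with month abbreviations (month.abb is built into R)
axis(1, at = months, labels = month.abb[months])

# Mark the observed points on the first series
points(months, sales_a, pch = 16, col = "blue")

# Second series on the same chart
lines(months, sales_b, col = "darkgreen", lwd = 2, lty = 2)

legend("topleft", legend = c("Product A", "Product B"),
       col = c("blue", "darkgreen"), lwd = 2, lty = c(1, 2))
```

Setting ylim from both series before the first plot() call matters, because lines() does not rescale the axes.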
Bar Chart
A bar chart displays categorical data with rectangular bars. The height or length of each bar is
proportional to the value it represents. Bar charts allow easy comparison between categories such as
products, departments, or regions. Vertical bar charts show values clearly, while horizontal bars help
with long category names. Grouped and stacked bar charts support multivariate comparison. Bar charts
are simple, effective, and widely used in business analytics and reporting.
Step-by-Step Process:
1. Create vector of categories and values.
2. Use barplot() function for drawing bars.
3. Add names.arg to label bars.
4. Add chart title and axis labels.
5. Apply colors using col.
6. Use horiz=TRUE for horizontal bars.
7. Adjust bar width using width parameter.
8. Add legends if comparing datasets.
9. Observe category-wise comparison.
10. Export the image if required.
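A minimal sketch of the steps above, using invented regional sales figures. The region names and the two-year comparison are assumptions made for the example.

```r
# Hypothetical sales by region
regions <- c("North", "South", "East", "West")
sales   <- c(240, 180, 310, 205)

# Vertical bars; names.arg supplies the label under each bar
mids <- barplot(sales, names.arg = regions,
                main = "Sales by Region",
                xlab = "Region", ylab = "Sales",
                col = "skyblue")

# Horizontal bars; las = 1 keeps the category labels upright
barplot(sales, names.arg = regions, horiz = TRUE, las = 1,
        main = "Sales by Region", xlab = "Sales",
        col = "skyblue")

# Grouped bars for comparing two datasets, with a legend from the row names
sales_by_year <- rbind("Last year" = c(200, 170, 280, 190),
                       "This year" = sales)
colnames(sales_by_year) <- regions
barplot(sales_by_year, beside = TRUE,
        col = c("grey70", "skyblue"),
        legend.text = TRUE,
        args.legend = list(x = "topleft"),
        main = "Sales by Region and Year",
        xlab = "Region", ylab = "Sales")
```

barplot() returns the bar midpoints (kept in mids here), which is handy if text or points need to be placed above specific bars later.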
Pie Chart
A pie chart visualizes proportions of categories as slices of a circle. The size of each slice is
proportional to the percentage it represents. Pie charts are useful when showing how a whole divides
into parts. Labels and colors help identify each segment. While visually attractive, pie charts are less
precise for comparison than bar charts. They should be used only when categories are few and
differences are prominent. Donut charts are a variation with a hole in the center, which some readers find easier to read.
Step-by-Step Process:
1. Create vector of numeric values.
2. Create label vector for categories.
3. Use pie() function for drawing chart.
4. Apply rainbow() or custom colors.
5. Add title using main parameter.
6. Ensure the values are parts of a single whole; pie() scales them to proportions of their sum.
7. Avoid a pie chart if there are more than 6 categories.
8. Customize labels for clarity.
9. Compare proportions visually.
10. Export final chart.
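The steps above can be sketched as follows, assuming a made-up market-share breakdown. The category names are hypothetical.

```r
# Hypothetical market share values and their categories
share      <- c(40, 25, 20, 15)
categories <- c("Product A", "Product B", "Product C", "Product D")

# Percentage labels make the proportions easier to compare
pct          <- round(100 * share / sum(share))
slice_labels <- paste0(categories, " (", pct, "%)")

pie(share,
    labels = slice_labels,
    col = rainbow(length(share)),
    main = "Market Share")
```

Printing the percentage in each label offsets some of the imprecision of judging slice sizes by eye.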
Pairs Plot
A pairs plot (scatterplot matrix) displays relationships among multiple variables simultaneously. It
creates a scatter plot for each pair of variables, and the diagonal panels often show a density curve or
histogram for each variable (base R's pairs() prints the variable names there by default).
This visualization helps detect correlations, clusters, and unusual patterns across dimensions. It is
especially useful in multivariate datasets like iris. Pairs plots assist in feature selection and
preprocessing for machine learning. They reveal hidden interactions not visible in single plots.
Step-by-Step Process:
1. Load a dataset (like iris).
2. Select numeric columns for plotting.
3. Use pairs() function to generate plot matrix.
4. Add title for clarity.
5. Customize point colors using col.
6. Observe relationships between variable pairs.
7. Identify correlations or clusters.
8. Look for outliers or unusual patterns.
9. Use ggpairs() from GGally for advanced plots.
10. Save or export plot as needed.
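A minimal sketch of these steps on the built-in iris dataset. Coloring by species and the specific colors are choices made for this example. The ggpairs() call runs only if the GGally package happens to be installed.

```r
# Built-in dataset: four numeric measurements plus Species
data(iris)
num_cols <- iris[, 1:4]

# One color per species so clusters are visible
species_colors <- c("red", "darkgreen", "blue")[as.integer(iris$Species)]

pairs(num_cols,
      main = "Iris Scatterplot Matrix",
      col = species_colors, pch = 19)

# Numeric check of the correlations seen in the panels
round(cor(num_cols), 2)

# Optional: richer matrix with ggpairs(), only if GGally is available
if (requireNamespace("GGally", quietly = TRUE)) {
  print(GGally::ggpairs(iris, columns = 1:4))
}
```

The correlation matrix is a useful companion to the plot: a pair that looks strongly related in its panel should show a correlation close to 1 or -1.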