
Understanding Linear Regression Basics

The document explains linear regression, a statistical method for analyzing relationships between independent and dependent variables. It covers types of regression models, the purpose of regression, and the process of regression analysis, including simple linear regression. Additionally, it provides examples, applications, and an implementation guide for simple linear regression using R.

Uploaded by rahulprakash6480
Copyright
© All Rights Reserved

Module 5

Linear Regression and Plotting


Linear Regression
Linear regression is defined as the method used to analyze the relationship between an
independent variable and a dependent variable.
In statistics, regression refers to a statistical method used to examine the relationship between
one or more independent variables and a dependent variable.
It aims to model and understand how changes in the independent variables are associated
with changes in the dependent variable.

Purpose of regression
• Predicting or estimating the value of the dependent variable based on the values of
one or more independent variables.
• Understanding the strength and nature of the relationship between variables.
• Making inferences about the population based on sample data.

Types of regression models:


1. Simple Linear Regression: Examining the relationship between two variables,
typically one independent variable and one dependent variable.
2. Multiple Linear Regression: Exploring the relationship between a dependent
variable and multiple independent variables.
3. Logistic Regression: Used when the dependent variable is categorical, predicting the
probability of an event occurring.

Regression Analysis
Regression analysis means creating a model that shows how one thing (the dependent
variable) changes when other things (independent variables) change.
It helps us check how strong the relationship is, see which factors are important, and
make predictions from data.
It is used in many fields (like business, economics, health, and science) to:
• Understand relationships between variables.
• Predict future outcomes.
• Help in making better decisions based on data.

Dependent variable (Y): This is the main thing you want to predict or study. It is also
called the outcome or response variable.
Independent Variable (X): The independent variable is the predictor variable, the one used to explain or predict changes in another variable (the dependent variable).
In other words, you control or observe this variable to see how the outcome changes with it.
An Example of a Linear Relationship:
Let's consider an example illustrating a linear relationship between two variables, such as the
relationship between a person's age and their reaction time.
Variables:
Dependent Variable (Y): Reaction time (in milliseconds), the response variable; this is what we want to predict or understand.
Independent Variable (X): Age of the person (in years); this is the predictor variable.
Assumption: We assume that younger individuals tend to have faster reaction times, and as
people age, their reaction times might increase.
Data Collection: Gather data on reaction times and ages from individuals across different
age groups.
Analysis: After collecting the data, plot reaction time against age on a scatter plot.
If a linear relationship exists, the points might tend to form a pattern where, on average, as
age increases, reaction time also tends to increase or decrease in a linear manner.
Linear Relationship:
A positive linear relationship would imply that as age increases, reaction time increases.
A negative linear relationship would suggest that as age increases, reaction time decreases.
Modelling:

• Linear regression is used to fit a straight line to data and measure the relationship
between two variables — in this example, age and reaction time.
• By studying this relationship, we can check whether there is a correlation between
age and reaction time, and whether that relationship is linear (increases or decreases
steadily).
• This helps us understand how one variable affects the other — for instance, how
reaction time changes as age increases. This example shows how linear relationships
between variables can be explored, visualized, and modelled using statistics.

Simple Linear Regression


Definition:
Simple Linear Regression is a statistical method used to show or predict the relationship
between two continuous variables — one dependent variable (Y) and one independent
variable (X).
In simple terms:
It predicts the value of Y (output) based on the value of X (input) using a straight-line
relationship.
Example:
If we study how a person’s age (X) affects their reaction time (Y), then we can use simple
linear regression to find a line that best fits the data and helps predict reaction time for a
given age.
The simple linear regression model can be represented as:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
Meaning of the Terms:

Symbol  Term                  Explanation
Y       Dependent Variable    The variable we want to predict or explain (e.g., Sales, Marks, Weight).
X       Independent Variable  The variable used to make the prediction (e.g., Advertising, Hours studied, Height).
β₀      Intercept             The value of Y when X = 0. It shows where the line crosses the Y-axis.
β₁      Slope                 Shows how much Y changes when X increases by 1 unit.
ε       Error term            The difference between the actual value and the predicted value. It represents randomness or factors not included in the model.

Goal of Simple Linear Regression

• The goal is to find the best-fitting straight line through the data points.
• This line shows how the dependent variable (Y) changes as the independent variable
(X) changes.
• The line is chosen so that the sum of squared errors (differences between actual and
predicted Y values) is as small as possible.
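To make "as small as possible" concrete, here is a minimal sketch in R using the X and Y values from the worked problem later in this module. It defines an illustrative helper `sse()` and lets base R's `optim()` search for the intercept and slope numerically. This is not how `lm()` computes its estimates (it solves the least-squares problem directly), but both should arrive at essentially the same line.

```r
# Data from the worked problem in this module
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)

# Illustrative helper: sum of squared errors for a candidate line y = b0 + b1 * x
sse <- function(b) sum((y - (b[1] + b[2] * x))^2)

# Numerically search for the (b0, b1) pair with the smallest SSE
opt <- optim(par = c(0, 1), fn = sse)

# Closed-form least-squares coefficients from lm()
ls_coef <- coef(lm(y ~ x))

print(round(opt$par, 3))   # numerical minimum found by optim()
print(round(ls_coef, 3))   # lm() estimates, essentially the same

# Any other line has a larger SSE, for example y = 0 + 1.6x
cat("SSE of least-squares line:", sse(ls_coef), "\n")
cat("SSE of y = 0 + 1.6x:", sse(c(0, 1.6)), "\n")
```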
Example:
Suppose you want to study how hours studied (X) affect exam marks (Y).

Hours Studied (X)   Exam Marks (Y)
2                   50
4                   60
6                   70
8                   80

Here:
• X = Hours studied → Independent variable
• Y = Exam marks → Dependent variable
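After applying regression to these four points, you get:
$$\hat{Y} = 40 + 5X$$
This means:
• Intercept (β₀) = 40 → if a student doesn’t study (X = 0), the model predicts 40 marks (an extrapolation, since X = 0 lies outside the observed data).
• Slope (β₁) = 5 → for every extra hour of study, predicted marks increase by 5 points.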
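A quick check in R, assuming the four (hours, marks) pairs from the table above; the names `hours`, `marks` and `fit` are illustrative. Because the points lie exactly on a straight line, `lm()` should return an intercept of 40 and a slope of 5 (up to floating-point rounding).

```r
# Hours studied (X) and exam marks (Y) from the table above
hours <- c(2, 4, 6, 8)
marks <- c(50, 60, 70, 80)

# Fit the simple linear regression marks = b0 + b1 * hours
fit <- lm(marks ~ hours)

# The points lie exactly on a line, so the fit is exact:
# intercept 40 and slope 5, up to floating-point rounding
print(coef(fit))
```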

Applications of Simple Linear Regression

Field                How It’s Used
Economics & Finance  To study how interest rates affect stock prices, or how GDP affects employment.
Market Research      To analyze how advertising spending affects sales revenue.
Education            To study the relationship between study hours and exam scores or predict student performance based on various factors.
Sports Analytics     To analyze how the number of practice hours affects athletic performance or predict team performance based on player statistics.
Health               To explore how exercise time affects weight loss or blood pressure levels.
Features of Simple Linear Regression

1. One Dependent Variable


• There is only one dependent variable (Y) that we are trying to predict or explain.
• Its value depends on changes in one independent variable (X).
Example: Predicting exam marks (Y) based on hours studied (X).

2. Straight-Line Relationship
• The relationship between the independent and dependent variables is assumed to be
linear.
• The formula is:
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
where Y is the dependent variable, X is the independent variable, β₀ is the intercept, β₁ is the slope, and ε is the error term.
3. Minimization of Residuals:

• The residual is the difference between the actual value and the predicted value.
• The regression line is chosen in such a way that the sum of squared residuals is as
small as possible.
Think of it as: Finding the line that fits the data points “best.”
4. Statistical Inference:
• Regression allows us to test statistically whether there is a real relationship between X
and Y.
• We test if the slope (β₁) is significantly different from zero.
• This helps check if changes in X actually affect Y or if it’s just random.
Example: Testing if hours studied really impact marks, or if it’s just coincidence.
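A sketch of this test in R, using the X and Y values from the worked problem later in this module (variable names are illustrative). The t statistic for the slope is the estimate divided by its standard error, and the two-sided p-value comes from a t distribution with n − 2 degrees of freedom. The code recomputes both by hand to show where the numbers reported by `summary()` come from.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
fit <- lm(y ~ x)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- summary(fit)$coefficients
print(coef_table)

# t statistic for H0: beta1 = 0 is estimate / standard error
slope_est <- coef_table["x", "Estimate"]
slope_se  <- coef_table["x", "Std. Error"]
t_manual  <- slope_est / slope_se

# Two-sided p-value from a t distribution with n - 2 degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = length(x) - 2, lower.tail = FALSE)
cat("t =", t_manual, " p =", p_manual, "\n")
```

A small p-value (commonly below 0.05) is taken as evidence that the slope differs from zero.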
5. Assumptions
For simple linear regression to work correctly, we assume:
1. The relationship between X and Y is linear.
2. The residuals (errors) are normally distributed.
3. Residuals have constant variance (called homoscedasticity).
4. Observations are independent of each other.
Example: In a dataset of students’ scores, one student’s marks should not depend on
another’s.
6. Interpretation of Coefficients:
• The slope coefficient (β₁) represents the change in the dependent variable for a one-
unit change in the independent variable.
• The intercept (β₀) represents the value of the dependent variable when the
independent variable is zero (if such an interpretation is meaningful in the context).
Example:
If the equation is Marks = 40 + 5 × Study Hours:
β₀ = 40: Expected marks when study hours = 0.
β₁ = 5: Marks increase by 5 for each extra hour of study.
7. Prediction
• Once the model is built, you can predict Y for any given X (within your data range).
Example:
If a student studies 6 hours, predicted marks = 40+5×6=70.
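The same prediction with `predict()`, assuming the hours-studied data from the earlier example (names are illustrative). Predicting at 5 hours shows an interpolated value between observed points; both predictions stay within the observed range of 2 to 8 hours.

```r
hours <- c(2, 4, 6, 8)
marks <- c(50, 60, 70, 80)
fit <- lm(marks ~ hours)

# predict() needs a data frame whose column name matches the predictor ("hours")
new_students <- data.frame(hours = c(5, 6))
pred <- predict(fit, newdata = new_students)
print(pred)   # about 65 and 70

# The same values by hand from Marks = 40 + 5 x Study Hours
manual <- 40 + 5 * new_students$hours
```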

The goal of simple linear regression is to estimate the values of β₀ and β₁ that minimize
the sum of squared differences between the observed Y values and the values predicted
by the model for given X values.

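The formulas to estimate the slope (β₁) and intercept (β₀) are:
$$\beta_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad \beta_0 = \bar{Y} - \beta_1 \bar{X}$$
Where: X̄ and Ȳ are the means of X and Y, and n is the number of observations.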

Problems:
X = {2, 3, 5, 7, 9}
Y = {4, 5, 7, 10, 15}
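Step 1: Calculate the Mean
$$\bar{X} = \frac{2 + 3 + 5 + 7 + 9}{5} = 5.2, \qquad \bar{Y} = \frac{4 + 5 + 7 + 10 + 15}{5} = 8.2$$
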
Step 2: Calculate Slope (β₁)


X Y (X−5.2) (Y−8.2) (X−5.2)(Y−8.2) (X−5.2)²

2 4 -3.2 -4.2 13.44 10.24

3 5 -2.2 -3.2 7.04 4.84

5 7 -0.2 -1.2 0.24 0.04

7 10 1.8 1.8 3.24 3.24

9 15 3.8 6.8 25.84 14.44

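Σ 49.8 32.8

$$\beta_1 = \frac{49.8}{32.8} \approx 1.518$$

Step 3: Calculate Intercept (β₀)
$$\beta_0 = \bar{Y} - \beta_1 \bar{X} = 8.2 - \frac{49.8}{32.8} \times 5.2 \approx 0.305$$

Step 4: Write the Regression Equation
$$\hat{Y} = 0.305 + 1.518X \approx 0.3 + 1.52X$$

Interpretation:
For every increase of 1 unit in X, the predicted Y increases by about 1.52 units.
When X = 0, the predicted Y value is about 0.3.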
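The four steps above can be reproduced in R as a check. This is a minimal sketch (names such as `sxy` and `sxx` are illustrative) that computes the means, the two column sums from the table, and the coefficients, then compares them with `lm()`.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)

# Step 1: means
x_bar <- mean(x)   # 5.2
y_bar <- mean(y)   # 8.2

# Step 2: slope = sum of cross-products / sum of squared deviations of X
sxy <- sum((x - x_bar) * (y - y_bar))   # 49.8
sxx <- sum((x - x_bar)^2)               # 32.8
b1 <- sxy / sxx

# Step 3: intercept
b0 <- y_bar - b1 * x_bar

cat("b0 =", round(b0, 4), " b1 =", round(b1, 4), "\n")

# Cross-check against lm()
print(coef(lm(y ~ x)))
```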
Implementation of Simple Linear Regression in R
Example Data:
Let’s use the same dataset:

X (Independent Variable)   Y (Dependent Variable)
2                          4
3                          5
5                          7
7                          10
9                          15

Step 1: Create the Data

# Create data vectors
x <- c(2, 3, 5, 7, 9) # Independent variable
y <- c(4, 5, 7, 10, 15) # Dependent variable
# Combine into a data frame
data <- data.frame(x, y)
# Display the dataset
print("Data:")
print(data)
Output
[1] "Data:"
x y
1 2 4
2 3 5
3 5 7
4 7 10
5 9 15
Step 2: Build the Linear Regression Model
# Build the model
model <- lm(y ~ x, data = data)
print(model)
# Display the model summary
print(summary(model))

Part          Meaning
lm()          The function that performs linear regression.
y ~ x         Formula that defines the relationship between variables. Read y ~ x as “Y depends on X”. Here, y = dependent variable, x = independent variable.
data = data   Tells R to look for the variables x and y inside the data frame named data.
model <-      The <- operator stores the model’s output in an object named model. You can later print or summarize it.

Output
Call:
lm(formula = y ~ x, data = data)

Coefficients:
(Intercept) x
0.3049 1.5183
#summary output
Call:
lm(formula = y ~ x, data = data)

Residuals:
1 2 3 4 5
0.6585 0.1402 -0.8963 -0.9329 1.0305

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3049 1.0435 0.292 0.7892
x 1.5183 0.1800 8.434 0.0035 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.031 on 3 degrees of freedom
Multiple R-squared: 0.9595, Adjusted R-squared: 0.946
F-statistic: 71.13 on 1 and 3 DF, p-value: 0.003498

Step 3: View Model Coefficients


# Extract coefficients
cat("Intercept (β₀):", coef(model)[1], "\n")
cat("Slope (β₁):", coef(model)[2], "\n")

Explanation:
• coef(model)[1] gives β₀ (intercept)
• coef(model)[2] gives β₁ (slope)

Step 4: Make Predictions


# Predict Y for new X values
new_x <- data.frame(x = c(4, 6, 8))
predictions <- predict(model, newdata = new_x)

# Display predictions
cat("Predicted Y values:\n")
print(predictions)

Explanation:
We use the predict() function to find new Y values based on the regression line.

Step 5: Visualize the Regression Line

# Plot data points


plot(x, y, main = "Simple Linear Regression", xlab = "X (Independent Variable)",
     ylab = "Y (Dependent Variable)", pch = 19, col = "blue")

# Add the regression line


abline(model, col = "red", lwd = 2)

Explanation:
• plot() draws the scatter plot of actual data.
• abline(model) automatically adds the best-fit regression line.

Step 6: Evaluate Model Fit


# R-squared value
r2 <- summary(model)$r.squared
cat("R-squared:", r2, "\n")

Interpretation:
• R² (R-squared) measures how well the regression line fits the data: it is the proportion of the variation in Y that is explained by X.
• A value close to 1 indicates a very good fit.
• Here R-squared ≈ 0.96, which means about 96% of the variation in Y is explained by X. That is a very good fit.
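R² can also be computed directly as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares of Y around its mean. A minimal sketch for the same data, cross-checked against `summary()`:

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
model <- lm(y ~ x)

sse <- sum(residuals(model)^2)   # unexplained (residual) variation
sst <- sum((y - mean(y))^2)      # total variation in y around its mean
r2_manual <- 1 - sse / sst

cat("Manual R-squared:", r2_manual, "\n")
cat("summary() R-squared:", summary(model)$r.squared, "\n")
```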

Final Output Summary

Component            Result
Intercept (β₀)       0.3
Slope (β₁)           1.52
Regression Equation  Ŷ = 0.3 + 1.52X
R² Value             ≈ 0.96
Example Prediction   If X = 6 → Ŷ ≈ 9.41

Interpretation
This means that for every 1-unit increase in X, the predicted Y increases by approximately 1.52 units.
When X = 0, the predicted value of Y is around 0.3.
The model fits the data very well (R² ≈ 0.96).
Complete R program (Simple Linear Regression)
# Create data vectors
x <- c(2, 3, 5, 7, 9) # Independent variable
y <- c(4, 5, 7, 10, 15) # Dependent variable

# Combine into a data frame
data <- data.frame(x, y)

# Display the dataset
print("Data:")
print(data)

# Build the model
model <- lm(y ~ x, data = data)
print(model)

# Display the model summary
print(summary(model))
cat("Intercept (β₀):", coef(model)[1], "\n")
cat("Slope (β₁):", coef(model)[2], "\n\n")

# Predict the Y value for new x values
new_x <- data.frame(x = c(4, 6, 8))
predictions <- predict(model, newdata = new_x)
cat("Predicted Y values:\n")
print(predictions)

# Plot data points
plot(x, y, main = "Simple Linear Regression", xlab = "X (Independent Variable)",
     ylab = "Y (Dependent Variable)", pch = 19, col = "blue")

# Add the regression line
abline(model, col = "red", lwd = 2)

# R-squared value
r2 <- summary(model)$r.squared
cat("R-squared:", r2, "\n")

Output
[1] "Data:"
x y
1 2 4
2 3 5
3 5 7
4 7 10
5 9 15

Call:
lm(formula = y ~ x, data = data)

Coefficients:
(Intercept) x
0.3049 1.5183

Call:
lm(formula = y ~ x, data = data)

Residuals:
1 2 3 4 5
0.6585 0.1402 -0.8963 -0.9329 1.0305

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3049 1.0435 0.292 0.7892
x 1.5183 0.1800 8.434 0.0035 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.031 on 3 degrees of freedom
Multiple R-squared: 0.9595, Adjusted R-squared: 0.946
F-statistic: 71.13 on 1 and 3 DF, p-value: 0.003498

Intercept (β₀): 0.304878
Slope (β₁): 1.518293

Predicted Y values:
1 2 3
6.378049 9.414634 12.451220

R-squared: 0.9595301
Multiple Linear Regression

Multiple Linear Regression (MLR) is an extension of Simple Linear Regression: instead of using one independent variable (X) to predict the dependent variable (Y), it uses two or more independent variables. In other words, MLR is a statistical technique that studies the relationship between one dependent variable and two or more independent variables.

Key Points:
• It is an extension of Simple Linear Regression.
• It helps to determine the mathematical relationship among multiple predictor
(independent) variables and one outcome (dependent) variable.
• Also known as Multilinear Regression, it is used to analyze how several factors
together affect one response variable.
• The goal of multiple linear regression is to create a model that best fits the
relationship between the dependent and independent variables. It helps in predicting
or understanding how changes in the independent variables affect the dependent
variable.

Example:
Let’s consider a real-life example of multiple linear regression: Imagine you’re trying to
predict house prices based on various factors such as square footage, number of bedrooms,
and distance from the city centre.

Solution: In the case of multiple linear regression for predicting house prices,
Dependent Variable: House price.
Independent Variables (Predictors):
1. Square Footage: The size of the house in square feet.
2. Number of Bedrooms: The count of bedrooms in the house.
3. Distance from City Centre: How far the house is from the city centre.

Multiple Linear Regression can be used to predict house prices using various predictors like
square footage, number of bedrooms, and distance from the city centre.
The model analyzes how each of these independent variables influences the dependent
variable (house price). It helps in understanding how changes in property features affect the
price and can be used for future predictions based on new data.

Applications of Multiple Linear Regression:

Multiple Linear Regression is widely used in various fields for prediction and analysis:

Field                  Application
Economics              Forecasting GDP, inflation, or unemployment using economic indicators.
Marketing              Measuring the impact of pricing, advertising, and demographics on sales.
Healthcare             Studying how age, habits, and health factors affect disease outcomes.
Finance                Predicting stock prices, asset values, and company performance.
Environmental Science  Estimating pollution levels or air/water quality using weather and geographical data.

In Simple Linear Regression, we study the relationship between one independent variable
(X) and one dependent variable (Y).
However, in real life, the outcome (Y) often depends on many factors, not just one.
The equation for a multiple linear regression model is:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \cdots + \beta_p X_p + \varepsilon$$
Where

Symbol              Meaning
Y                   Dependent variable (the outcome we want to predict)
X₁, X₂, X₃, …, Xₚ   Independent variables (the factors affecting Y)
β₀                  Intercept: the value of Y when all X’s are 0
β₁, β₂, β₃, …, βₚ   Coefficients showing how much Y changes when one X changes (while the others are held constant)
ε                   Error term: accounts for random variation or unexplained factors

Example:
Let’s say we want to predict house prices based on multiple factors:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \varepsilon$$
Here:
• Y: House price (dependent variable)
• X₁: Square footage
• X₂: Number of bedrooms
• X₃: Distance from city center

We want to predict the price of a house (Y) based on:


1. Square footage (X₁) – size of the house
2. Number of bedrooms (X₂) – rooms available
3. Distance from city center (X₃) – in kilometers

Given Sample Data:

House  Square Footage (X₁)  Bedrooms (X₂)  Distance from City (X₃) (km)  Price (Y) (₹ in lakhs)
1      1000                 2              15                            50
2      1500                 3              10                            65
3      2000                 3              5                             80
4      2500                 4              2                             95
5      1800                 3              8                             75

Multiple Linear Regression Implementation in R:

# 1. Create dataset
house_data <- data.frame(
  Square_Feet = c(1000, 1500, 2000, 2500, 1800),
  Bedrooms = c(2, 3, 3, 4, 3),
  Distance = c(15, 10, 5, 2, 8),
  Price = c(50, 65, 80, 95, 75)
)
# View dataset
print(house_data)

# 2. Build the multiple linear regression model
model <- lm(Price ~ Square_Feet + Bedrooms + Distance, data = house_data)

# Display the summary of the model
print(summary(model))

# 3. Predict new house prices
# New house data
new_house <- data.frame(
  Square_Feet = c(2200, 1600, 2800),
  Bedrooms = c(3, 2, 4),
  Distance = c(6, 12, 3)
)
# Predict prices using the model
predicted_prices <- predict(model, newdata = new_house)

# Display predictions
cat("Predicted Prices (in lakhs):\n")
print(predicted_prices)

# 4. Plot the observed data
# Visualize how one variable (Square_Feet) relates to Price.
# This is a simple two-variable plot; it does not hold the other predictors fixed.
plot(house_data$Square_Feet, house_data$Price,
     main = "House Price vs Square Footage",
     xlab = "Square Feet", ylab = "Price (in lakhs)",
     pch = 19, col = "blue")
# 5. Add a simple regression line of Price on Square_Feet only
# (this line ignores Bedrooms and Distance; it is not the multiple-regression fit)
abline(lm(Price ~ Square_Feet, data = house_data), col = "red", lwd = 2)

# 3D visualization (optional, for multiple predictors)
# Install the package once if needed:
# install.packages("scatterplot3d")
library(scatterplot3d)
s3d <- scatterplot3d(
  house_data$Square_Feet, house_data$Bedrooms, house_data$Price,
  main = "3D Plot: Price vs Square Feet & Bedrooms",
  xlab = "Square Feet", ylab = "Bedrooms", zlab = "Price (in lakhs)",
  color = "darkblue", pch = 19
)

Output
Square_Feet Bedrooms Distance Price
1 1000 2 15 50
2 1500 3 10 65
3 2000 3 5 80
4 2500 4 2 95
5 1800 3 8 75

Call:
lm(formula = Price ~ Square_Feet + Bedrooms + Distance, data = house_data)

Residuals:
1 2 3 4 5
-0.32895 0.06579 -0.06579 -0.32895 0.65789

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.894737 16.941524 0.761 0.586
Square_Feet 0.033553 0.007644 4.389 0.143
Bedrooms -0.526316 1.915818 -0.275 0.829
Distance 0.328947 0.741410 0.444 0.734

Residual standard error: 0.8111 on 1 degrees of freedom
Multiple R-squared: 0.9994, Adjusted R-squared: 0.9977
F-statistic: 572.2 on 3 and 1 DF, p-value: 0.03072

Predicted Prices (in lakhs):
1 2 3
87.10526 69.47368 105.72368
Step-by-Step Process of Performing Multiple Linear Regression Using
Statistical Methods
Step 1: Formulate the Hypothesis
Define the research question and form a hypothesis about the relationship between the
dependent variable (Y) and multiple independent variables (X₁, X₂, X₃, …).
Example: “House price (Y) depends on square footage (X₁), number of bedrooms (X₂), and
distance from the city center (X₃).”
Step 2: Data Collection and Cleaning
• Collect relevant data for both dependent and independent variables.
• Clean the data by handling missing values, removing outliers, and ensuring that the
data types are appropriate.
Example: Remove houses with missing bedroom information or incorrect price
entries.

Step 3: Explore the Data (EDA – Exploratory Data Analysis)


• Perform exploratory data analysis to understand data distribution and relationships
between variables.
• Use scatter plots, correlation matrices, and summary statistics to visualize and
detect patterns or trends.
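A minimal EDA sketch for the house-price example, assuming the five houses from the sample table above are stored in a data frame named `house_data` (as in the implementation above). It prints summary statistics, a correlation matrix, and a scatter-plot matrix. In this small sample, Square_Feet and Distance are themselves strongly correlated, which is the kind of redundancy that makes individual coefficients hard to interpret.

```r
house_data <- data.frame(
  Square_Feet = c(1000, 1500, 2000, 2500, 1800),
  Bedrooms    = c(2, 3, 3, 4, 3),
  Distance    = c(15, 10, 5, 2, 8),
  Price       = c(50, 65, 80, 95, 75)
)

# Summary statistics (min, quartiles, mean, max) for every column
print(summary(house_data))

# Pairwise correlations; values near +1 or -1 indicate strong linear relationships
corr <- cor(house_data)
print(round(corr, 2))

# Scatter-plot matrix of every pair of variables
pairs(house_data, pch = 19, col = "blue")
```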

Step 4: Model Specification


Formulate the multiple linear regression equation as:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \varepsilon$$

Step 5: Estimate Coefficients


Use statistical techniques, such as the Least Squares Method, to estimate coefficients
(β values).
The goal is to minimize the sum of squared differences between the predicted and actual
values.
Step 6: Assess Model Fit
Evaluate how well the model fits the data by checking:
• R-squared (R²): Explains how much of the variation in Y is explained by Xs.
• Adjusted R-squared: Adjusts for the number of predictors.
• p-values: Check the significance of each variable.
• Residual plots: Ensure no major patterns are left unexplained.
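A sketch of these checks for the house-price model, assuming the same `house_data` values. Besides printing the values from `summary()`, it recomputes adjusted R² with the standard formula 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. With only five houses and four estimated coefficients, the model has a single residual degree of freedom, so these numbers are illustrative rather than trustworthy.

```r
house_data <- data.frame(
  Square_Feet = c(1000, 1500, 2000, 2500, 1800),
  Bedrooms    = c(2, 3, 3, 4, 3),
  Distance    = c(15, 10, 5, 2, 8),
  Price       = c(50, 65, 80, 95, 75)
)
model <- lm(Price ~ Square_Feet + Bedrooms + Distance, data = house_data)
s <- summary(model)

cat("R-squared:", s$r.squared, "\n")
cat("Adjusted R-squared:", s$adj.r.squared, "\n")

# Adjusted R-squared by hand: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
n <- nrow(house_data)
p <- 3   # number of predictors
adj_manual <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

# p-value of each coefficient
print(s$coefficients[, "Pr(>|t|)"])

# Residuals vs fitted values: look for any pattern left unexplained
plot(fitted(model), residuals(model), xlab = "Fitted values", ylab = "Residuals", pch = 19)
abline(h = 0, lty = 2)
```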

Step 7: Interpret Results


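Interpret each coefficient (β), holding the other predictors constant:
• A positive β means an increase in that X is associated with an increase in Y.
• A negative β means an increase in that X is associated with a decrease in Y.
Example: If β₁ = 0.05, then for every extra square foot, the predicted house price increases by 0.05 units, assuming the number of bedrooms and the distance from the city center stay the same.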
Step 8: Validate the Model
• Test the model on new or unseen data to ensure reliability.
• Use techniques like cross-validation or train-test splitting to check how well the
model generalizes.
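A minimal sketch of leave-one-out cross-validation (LOOCV) in base R. It uses the small X/Y dataset from the simple regression example, because holding out one of the five houses would leave four rows for four coefficients and no residual degrees of freedom; the loop itself works the same way for a multiple regression formula. Names such as `cv_errors` and `held_out` are illustrative.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
dat <- data.frame(x, y)

n <- nrow(dat)
cv_errors <- numeric(n)

# Leave-one-out cross-validation: fit on n - 1 rows, predict the held-out row
for (i in seq_len(n)) {
  train    <- dat[-i, ]
  held_out <- dat[i, , drop = FALSE]
  fit_i    <- lm(y ~ x, data = train)
  cv_errors[i] <- held_out$y - predict(fit_i, newdata = held_out)
}

# Root mean squared prediction error on points the model did not see
loocv_rmse <- sqrt(mean(cv_errors^2))
cat("LOOCV RMSE:", loocv_rmse, "\n")

# Check: for least squares, e_i / (1 - h_i) gives the same held-out errors
full_fit <- lm(y ~ x, data = dat)
shortcut <- residuals(full_fit) / (1 - hatvalues(full_fit))
```

For ordinary least squares, the held-out error for row i equals the ordinary residual divided by 1 − hᵢ, where hᵢ is the leverage returned by `hatvalues()`; the last two lines compute this shortcut so it can be compared with the loop.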

Step 9: Make Predictions

Once validated, use the model to make predictions or forecasts for new data.
Example: Predict the price of a house using its square footage, number of bedrooms, and
distance from the city center.

Linear Model Selection and Diagnostics

Linear Model Selection and Diagnostics is the process of:


• Choosing the best linear regression model for a dataset, and
• Evaluating its performance to ensure that it accurately represents the relationship
between variables.
When we have several independent variables (predictors), not all of them may be useful.
Some may be irrelevant, redundant, or highly correlated with others — making the model
unnecessarily complex. Hence, model selection helps to simplify the model without losing
predictive accuracy.

Steps in Linear Model Selection and Diagnostics


Step 1: Model Specification
• Decide which independent variables (predictors) to include.
• Choose the right form of the model (e.g., linear, quadratic).
Example:
To predict a student’s exam score, you may consider:
• Study hours
• Attendance
• Sleep hours
But not irrelevant factors like “shoe size.”

Step 2: Model Estimation


• Use a method like Least Squares Regression to estimate coefficients.
• This method finds the line of best fit that minimizes the error (difference between
actual and predicted values).
Example:
R finds coefficients for:
$$\text{Exam Score} = \beta_0 + \beta_1(\text{Study Hours}) + \beta_2(\text{Attendance}) + \varepsilon$$

Step 3: Model Evaluation


• Measure how well the model fits the data using metrics like:
o R² (R-squared): How much variance in the dependent variable is explained.
o Adjusted R²: Corrects R² for the number of predictors (helps avoid
overfitting).
o Residual Plots: Check for randomness in residuals (errors).

Step 4: Model Selection


• Compare different models to choose the best one using:
o AIC (Akaike Information Criterion)
o BIC (Bayesian Information Criterion)
o Cross-Validation
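Example:
If you have two candidate models, the one with the lower AIC or BIC is preferred. Both criteria reward goodness of fit but penalize extra predictors, so a lower value indicates a better balance between fit and complexity.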
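A sketch of an AIC/BIC comparison in R, assuming the `house_data` values from the multiple regression example. `AIC()` and `BIC()` are base R functions that work on `lm` objects; the two candidate models here are illustrative choices.

```r
house_data <- data.frame(
  Square_Feet = c(1000, 1500, 2000, 2500, 1800),
  Bedrooms    = c(2, 3, 3, 4, 3),
  Distance    = c(15, 10, 5, 2, 8),
  Price       = c(50, 65, 80, 95, 75)
)

# Two candidate models for the same response
m_small <- lm(Price ~ Square_Feet, data = house_data)
m_full  <- lm(Price ~ Square_Feet + Bedrooms + Distance, data = house_data)

# Lower AIC / BIC is preferred.
# AIC = -2 * log-likelihood + 2 * k; BIC uses log(n) * k instead of 2 * k,
# where k counts the coefficients plus the residual variance.
comparison <- data.frame(
  model = c("Square_Feet only", "All three predictors"),
  AIC   = c(AIC(m_small), AIC(m_full)),
  BIC   = c(BIC(m_small), BIC(m_full))
)
print(comparison)
```

On these five houses the Square_Feet-only model scores lower on both criteria, which is consistent with the large p-values for Bedrooms and Distance in the earlier summary. With so few observations, treat this only as an illustration of the procedure.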

Step 5: Model Diagnostics


• Check assumptions of linear regression:
1. Linearity: Relationship between predictors and dependent variable is linear.
2. Normality: Residuals are normally distributed.
3. Homoscedasticity: Equal variance of residuals.
4. Independence: Errors are independent of each other.
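A minimal sketch of these checks in base R for the simple regression example (X/Y data from earlier). `plot()` on an `lm` object draws four standard diagnostic plots, and `shapiro.test()` gives a formal normality test of the residuals. Independence is usually judged from how the data were collected, or by plotting residuals in observation order as the last lines do. With only five observations, treat any of these results as rough.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
model <- lm(y ~ x)

# Four standard diagnostic plots in a 2 x 2 grid:
# Residuals vs Fitted (linearity), Normal Q-Q (normality),
# Scale-Location (constant variance), Residuals vs Leverage (influential points)
op <- par(mfrow = c(2, 2))
plot(model)
par(op)

# Formal normality test of the residuals (Shapiro-Wilk)
sw <- shapiro.test(residuals(model))
print(sw)

# Residuals in observation order: a trend or cycle would suggest dependence
plot(residuals(model), type = "b", xlab = "Observation order", ylab = "Residual")
abline(h = 0, lty = 2)
```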
Linear model selection and diagnostics are important steps in statistical modelling because
they help ensure that the model is appropriate for the data and that the results are reliable.
There are many tools available for linear model selection and diagnostics, including software
packages such as R, SAS, and Stata.

Advanced Graphics
• Advanced plot customization in statistics involves fine-tuning visual elements of
graphs and charts to convey data effectively.
• It includes adjusting colours, adding annotations, changing axis scales, incorporating
multiple data series, and utilizing various plot types to enhance the representation of
statistical information.
• The advanced graph in statistics is a visual representation of the data.
• Advanced graph customization allows you to tailor every aspect of your
visualizations.
• This can involve adjusting axis limits, changing tick marks, customizing grid lines,
modifying line styles and marker shapes, incorporating annotations, adjusting legends,
applying colour palettes, and even creating interactive or 3D plots.
Characteristics of Advanced Graphs
1. Color and Style
Colour Palettes: Use meaningful colour combinations to make graphs clear and
visually appealing.
Line Styles and Markers: Change line types (solid, dashed, dotted) and markers
(circle, square, triangle) to differentiate data points easily.
2. Annotations (Adding Extra Info to Graphs)
Text Annotations: Add titles, labels, or notes to explain key points in your graph.
Example: Label the highest or lowest value on the chart.
Arrows and Lines: Use arrows or lines to draw attention to specific trends or
important data points.
3. Axis Customization
Tick Marks and Labels: Change how numbers or text appear on the x and y axes for
clarity. Example: Show ticks every 10 units instead of every 1 unit.
Logarithmic Scales: Use when data values cover a wide range (like 1, 10, 100, 1000)
to make graphs easier to read.
4. Legend (Explains the Graph)
Positioning: Move the legend (top, bottom, left, or right) to where it looks neat and
doesn’t cover the data.
Example: Place it at the top for bar charts or at the side for line graphs.
Custom Labels: Provide clear and concise labels for each series in the legend.
5. Faceting and Multiple Plots:
Facet Grids: Use facets to create multiple plots based on different categories, making it easier to compare subsets of the data.
Arrangement: Adjust the arrangement of subplots to create a coherent and
informative layout.
6. Statistical Summaries:
Error Bars: Include error bars to show variability or uncertainty in data.
Regression Lines: Add regression lines or other statistical summaries to illustrate
trends.
7. Background and Grid Lines:
Background Colour: Customize the background colour to improve contrast and
aesthetics.
Grid Lines: Adjust the appearance of grid lines to guide the viewer’s eye.
8. Export and Save Options:
High-Resolution Output: Ensure that exported graphs are of high resolution for
publications or presentations.
Save in Multiple Formats: Save plots in different formats (PDF, PNG, SVG) for
versatility.
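A sketch that combines several of these ideas (custom axis labels, error bars, a grid, a legend, and export) in base R graphics. It uses R's built-in `mtcars` dataset rather than data from this module, and the file names are illustrative. Error bars are drawn with `arrows()` using flat heads (`angle = 90`, `code = 3`); the PNG export is wrapped in a `capabilities()` check because not every R build supports PNG output.

```r
# Mean mpg and its standard error for each cylinder group (built-in mtcars data)
cyl_groups <- sort(unique(mtcars$cyl))   # 4, 6, 8
means <- as.numeric(tapply(mtcars$mpg, mtcars$cyl, mean))
ses   <- as.numeric(tapply(mtcars$mpg, mtcars$cyl, function(v) sd(v) / sqrt(length(v))))
xpos  <- seq_along(means)

draw_plot <- function() {
  plot(xpos, means, pch = 19, col = "darkblue", ylim = c(0, 30), xlim = c(0.5, 3.5),
       xaxt = "n", xlab = "Number of cylinders", ylab = "Mean miles per gallon",
       main = "Mean MPG with Error Bars")
  axis(1, at = xpos, labels = cyl_groups)            # custom tick labels
  arrows(xpos, means - ses, xpos, means + ses,
         angle = 90, code = 3, length = 0.05)         # error bars with flat caps
  grid(col = "gray85", lty = "dotted")                # light background grid
  legend("topright", legend = "Mean +/- 1 SE", pch = 19, col = "darkblue", bty = "n")
}

# Save as PDF (vector format, scales without losing quality)
pdf_file <- file.path(tempdir(), "mpg_error_bars.pdf")
pdf(pdf_file, width = 6, height = 4)
draw_plot()
invisible(dev.off())

# Save as a high-resolution PNG (300 dpi), if this R build supports PNG output
png_file <- file.path(tempdir(), "mpg_error_bars.png")
if (capabilities("png")) {
  png(png_file, width = 6, height = 4, units = "in", res = 300)
  draw_plot()
  invisible(dev.off())
}
```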

Plot Customization:
Advanced plot customization in advanced graphs involves fine-tuning visual elements to
convey complex data effectively.
Customizing plots in R can be done using various functions and parameters to modify the
appearance of different plot elements. Here’s a basic overview of how you can customize
plots in R:
1. Titles and Labels
2. Colors and Symbols
3. Adding a Legend
4. Axis Customization
5. Lines
6. Annotations
7. Plot Range
8. Defining a Particular Layout

1. Titles and Labels: You can add titles to your plot and label the axes to describe your
data.
• main: Main title of the plot.
• xlab and ylab: Labels for the x and y-axes.

x <- 1:10
y <- sin(x)
plot(x, y, main="Sine Wave", xlab="X-axis", ylab="Y-axis")

2. Colors and Symbols

You can change the colour of points or lines and choose different symbols for
data points.
• col → Changes the color.
• pch → Changes the symbol type (1 = circle, 2 = triangle, etc.).

plot(x, y, col="blue", pch=2, main="Colored Points")



3. Adding a Legend
A legend helps identify what each colour or symbol in the plot represents.
• legend() → Adds a legend box.
Example
plot(x, y, col="red", pch=3)
legend("topright", legend="Data Points", col="red", pch=3)

4. Axis Customization
You can modify the appearance of the X and Y axes — such as tick marks and limits.
• xlim, ylim → Set axis limits.
• xaxt, yaxt → Hide or show axes.
• axis() → Manually customize axis ticks.
Example
plot(x, y, xlim=c(0, 15), ylim=c(-1.5, 1.5), main="Custom Axes")


xaxt and yaxt


These are graphical parameters that control whether the X-axis or Y-axis is drawn
automatically.
• xaxt="n" → Hides the X-axis
• yaxt="n" → Hides the Y-axis
If you hide an axis, you can later redraw it manually using the axis() function.
Example
x <- 1:10
y <- x^2
# Create a plot without axes
plot(x, y, xaxt="n", yaxt="n", main="Custom Axes")

axis() Function
The axis() function is used to manually add or customize the axes after the plot is created.
Syntax
axis(side, at, labels, col, lty, ...)

Argument  Meaning
side      Which axis to draw (1 = bottom, 2 = left, 3 = top, 4 = right)
at        Where to place tick marks (a numeric vector)
labels    What labels to show at those tick marks
col       Color of the axis line and tick marks (the label color is set with col.axis)
lty       Line type (solid, dashed, etc.)

x <- 1:10
y <- x^2
# Create a plot without axes
plot(x, y, xaxt="n", yaxt="n", main="Custom Axes")

# Add customized axes
axis(1, at=seq(0, 10, by=2), col="blue") # Bottom X-axis with ticks every 2 units
axis(2, at=seq(0, 100, by=20), col="red") # Left Y-axis with ticks every 20 units

5. Lines: Add lines to highlight trends or reference points.


Functions:
➢ lines() → The lines() function adds connected line segments to an existing plot.
It’s typically used to draw a trend line or connect data points after plotting them.
➢ abline() → The abline() function adds a straight line to the plot — it can be:
• A horizontal line
• A vertical line
• Or a regression line (slope + intercept)

Syntax of lines():
lines(x, y, col, lty, lwd)

Parameter   Description
x, y        The points to connect with a line
col         Line color
lty         Line type (1 = solid, 2 = dashed, etc.)
lwd         Line width

Example
# Create data
x <- 1:10
y1 <- x^2
plot(x, y1, main="Using lines()")   # lines() adds to an existing plot, so draw one first
lines(x, y1, type="o", col="blue", lty=2, lwd=2)

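Syntax of abline():
abline(a, b, h, v, col, lty, lwd)

Parameter   Description
a           Intercept of the line (where it crosses the Y-axis)
b           Slope of the line
h           Y-value(s) at which to draw horizontal line(s)
v           X-value(s) at which to draw vertical line(s)
col         Line color
lty, lwd    Line style and width

Instead of a and b, a fitted lm() model can be passed as the first argument to draw its regression line.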

x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a horizontal line at y=50
abline(h=50, col="red", lty=2)

x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a vertical line at x=5
abline(v=5, col="red", lty=2)

x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a regression line
model<-lm(y~x)
abline(model, col="red", lty=2)
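abline(model) and abline(a = ..., b = ...) draw the same line when a and b are the intercept and slope estimated by lm(). A small sketch to check this, using made-up data that lie exactly on y = 2 + 3x, so the fitted coefficients should come out as 2 and 3 (up to floating-point rounding):

```r
x <- 1:10
y <- 2 + 3 * x              # made-up data lying exactly on a straight line

model <- lm(y ~ x)
b0 <- coef(model)[1]        # estimated intercept
b1 <- coef(model)[2]        # estimated slope

plot(x, y, main = "abline(model) vs abline(a, b)")
abline(model, col = "red", lwd = 3)             # line taken from the fitted model
abline(a = b0, b = b1, col = "blue", lty = 2)   # same line from explicit intercept and slope
print(c(intercept = unname(b0), slope = unname(b1)))
```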
6. Annotations: Annotations are extra pieces of information added directly onto a plot —
usually as text, arrows, or labels — to explain or highlight something important in the data.
text(): The text() function adds text annotations to a plot.
Example:
x <- 1:10
y <- x^2
plot(x, y, main="Using text()", xlab="X", ylab="Y")
text(x, y, labels = y, pos = 1, col = "purple")
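text() can be paired with arrows(), another base R graphics function, to point at a particular observation. A sketch, assuming we want to flag the largest y value; the arrow length and offsets are arbitrary choices:

```r
x <- 1:10
y <- x^2
plot(x, y, main = "Using text() and arrows()", xlab = "X", ylab = "Y")

i_max <- which.max(y)   # index of the point to highlight

# Arrow from (x0, y0) to just left of the highlighted point
arrows(x0 = x[i_max] - 3,   y0 = y[i_max],
       x1 = x[i_max] - 0.2, y1 = y[i_max],
       length = 0.1, col = "darkred")

# Label at the tail of the arrow; pos = 2 places the text to the left
text(x[i_max] - 3, y[i_max], labels = "Maximum", pos = 2, col = "darkred")
```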

7. Plot Range: It controls the minimum and maximum values shown along each axis.
xlim and ylim: Set the range of the x and y axes.
x <- 1:10
y <- x^2
plot(x, y, xlim = c(0, 15), ylim = c(0, 150), main = "Custom Plot Range")
8. Defining a Particular Layout
The arrangement of plots in a single window can be controlled using the layout() function in
R. This function allows you to divide the plotting area into multiple regions (panels),
so you can display multiple plots in one figure.
#1. Creates a matrix from the vector
lay_Mat <- matrix(c(1, 3, 2, 3), 2, 2)
print(lay_Mat)

#2. Divide the graphics device into the panels


# tells R to divide the graphics device into the panels described by lay_Mat.
layout(mat = lay_Mat)
# draws an empty diagram showing the panel arrangement (useful for checking).
layout.show(n = 3)
Example
# Example layout: top-left = panel 1, top-right = panel 2, bottom row = panel 3 (spans both columns)
lay_Mat <- matrix(c(1, 3, 2, 3), nrow = 2, ncol = 2)
print(lay_Mat)
# Create layout
layout(mat = lay_Mat)
# Optional: show empty layout (visual guide)
layout.show(n = max(lay_Mat))
# Panel 1 (top-left)
x1 <- 1:10
y1 <- sin(x1)
plot(x1, y1, type = "o", col = "blue", main = "Panel 1: sin(x)",
xlab = "x", ylab = "sin(x)")
# Panel 2 (top-right)
x2 <- 1:10
y2 <- cos(x2)
plot(x2, y2, type = "o", col = "darkgreen", main = "Panel 2: cos(x)",
xlab = "x", ylab = "cos(x)")
# Panel 3 (bottom, spanning both columns)
data3 <- rnorm(200, mean = 0, sd = 1)
hist(data3, main = "Panel 3: Histogram (spans bottom)", xlab = "value", col = "lightpink")
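layout() also accepts widths and heights arguments that give the relative size of each column and row, and it returns the number of panels it created. A sketch assuming a two-panel layout in which the top panel should be twice as tall as the bottom one:

```r
lay_Mat <- matrix(c(1, 2), nrow = 2, ncol = 1)   # panel 1 on top, panel 2 below
n_panels <- layout(lay_Mat, heights = c(2, 1))    # top row twice as tall as bottom row

plot(1:10, sin(1:10), type = "l", main = "Tall panel")
barplot(c(a = 3, b = 5, c = 2), main = "Short panel")

layout(matrix(1))   # reset to a single panel for later plots
```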

Plotting Regions and Margins
Definition:
Regions are areas of a graph that can be resized, relocated, or formatted with colors, borders,
or other visual properties.
• The figure region is the space containing the axes, labels, and title.
• The outer region (outer margins) is the space around the figure region, useful for
additional text or decorations.
• The margins are the space between the plot region and the border of the figure. They
can be adjusted with the mar parameter.
Three Main Plot Regions in R Graphics
• Plot Region:
This is the main area where your actual data (points, lines, bars, etc.) is drawn. It uses
the coordinate system defined by the X and Y axes.
• Figure Region:
The plot region together with the surrounding space for axis labels and titles. The
space between the plot region and the edge of the figure region is called the figure margins.
• Outer Region:
The area outside the figure region, used for additional annotations, global titles, or
legends that apply to multiple plots.

1. Plotting Regions
Use rect() to draw rectangles or regions on a plot.
Example:
# Draw a rectangle from (1, 2) to (4, 6)
plot(1:10, type = "n") # Create an empty plot
rect(1, 2, 4, 6, col = "lightblue", border = "blue")

The type argument of plot() (type = "n" above) controls what is drawn:
Plot Type      Meaning
"p" (Points)   Draws only points (default).
"l" (Lines)    Connects data points with straight lines.
"b" (Both)     Draws both points and lines, but with points slightly separated from lines.
"s" (Steps)    Draws stair-step lines (horizontal, then vertical).
"n" (None)     Draws no points or lines; creates an empty plot (useful for adding elements manually).
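To compare the types side by side, the sketch below draws the same made-up data once per type, using par(mfrow = c(2, 3)) to split the device into a 2 × 3 grid. It also includes type = "o" (points and lines overplotted), which is a valid plot() type not listed in the table:

```r
x <- 1:6
y <- c(2, 4, 3, 6, 5, 7)                 # made-up data
types <- c("p", "l", "b", "o", "s", "n")

op <- par(mfrow = c(2, 3))               # 2 x 3 grid of panels; keep old settings
for (tp in types) {
  plot(x, y, type = tp, main = paste0('type = "', tp, '"'))
}
par(op)                                  # restore the previous layout
```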

2. Plotting Margins
Margins are the spaces between the chart border and the plot area (canvas). You can adjust
margins on any of the four sides: bottom, left, top, and right. Use the par() function with the
mar parameter to change the margin sizes.
The values represent the number of text lines used as margin space.
par(mar = c(bottom, left, top, right))

Example
par(mar = c(6, 2, 4, 5)) # Set margins (bottom, left, top, right)
plot(1:10)
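par() returns the previous values of any parameters it changes, so a common idiom is to save that result and pass it back to par() when the plot is finished. This keeps custom margins from leaking into later plots. A minimal sketch:

```r
old_par <- par(mar = c(6, 2, 4, 5))   # set new margins and remember the old ones
plot(1:10, main = "Custom margins")
par(old_par)                          # restore the previous margins
print(par("mar"))
```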
Outer margin (oma)
• The outer margins are especially useful for adding text to a combination of plots.
• You can query the current outer margins with par("oma").
In this example we set the outer margins as follows:
par(oma = c(6, 1, 2, 7) )
plot(1:20,col="green")

This means the outer margin is:
• Six lines wide below the figure,
• One line on the left,
• Two lines on the top, and
• Seven lines on the right.

3. Default Spacing: Default Figure Margin Settings in R


You can check the default figure margin settings using the par() function:
1. par()$oma # Gives the default outer margin
[1] 0 0 0 0
Here,
oma = c(0, 0, 0, 0) → There is no outer margin set by default.

2. par()$mar #Gives the default margin space


[1] 5.1 4.1 4.1 2.1
The default figure margin space is
mar = c(5.1, 4.1, 4.1, 2.1)
• 5.1 lines of text on the bottom,
• 4.1 on the left and top, and
• 2.1 on the right.
4. Custom Spacing
We can customize both the outer margin and the figure margin. Text can be added to a margin using
the mtext() function, which places text in the margins of a plot rather than inside
the plotting area.

Syntax of mtext()
mtext(text, side, line = 0, outer = FALSE, at = NA, adj=NA, col = NA, font = NA, ...)

Argument   Description
text       The text string you want to add (e.g., "My Label").
side       The side of the plot where text should appear: 1 = bottom, 2 = left, 3 = top, 4 = right (default side is 3).
line       The line (distance) from the plot margin where the text is placed. Increasing this moves the text farther away from the plot.
outer      Logical (TRUE or FALSE). If TRUE, text is placed in the outer margin (outside the figure region).
at         Position along the axis (e.g., where along x or y to place the text).
adj        Horizontal alignment of text (0 = left, 0.5 = center (default), 1 = right).
col        Text color.
font       Font style (1 = plain, 2 = bold, 3 = italic, 4 = bold italic).
...        Other graphical parameters like family, las, etc.

Example
par(oma=c(1,4,3,2), mar=c(4:7))
plot(1:10)

box("figure", lty=2,col="orange")
box("outer", lty=3,col="darkblue")

mtext("Figure region margins (mar[])", line=2, col="red")


mtext("Outer region margins (oma[])", line=0.5, outer=TRUE,col="green")
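A typical use of the outer margin is one title over several panels. The sketch below assumes two side-by-side panels (par(mfrow = c(1, 2))) and a 3-line top outer margin; mtext(..., outer = TRUE) then writes across both panels:

```r
op <- par(mfrow = c(1, 2),        # two panels side by side
          oma = c(0, 0, 3, 0))    # 3 lines of outer margin at the top
plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")

# outer = TRUE places the text in the outer margin, above both panels
mtext("One title for both panels", side = 3, line = 1, outer = TRUE,
      cex = 1.3, font = 2)
par(op)                           # restore mfrow and oma
```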
5. Clipping

Clipping means restricting (or “cutting off”) anything that is drawn outside a specific area
of the plot. In other words, Clipping controls where R is allowed to draw things (like points,
lines, or legends). You can control clipping behaviour using the parameter xpd inside par()
function.
Setting                 Meaning
xpd = FALSE (Default)   Clipping to the plot region: drawings are restricted to the plot region.
xpd = TRUE              Clipping to the figure region: drawings may extend into the figure margins.
xpd = NA                Clipping to the device region: drawings may extend into the outer margins as well.

Example1
par(xpd = TRUE) # Allow drawing outside plot region
plot(1:10, 1:10, main = "xpd = TRUE Example")
text(10, 11, "Outside!", col = "red", cex = 1.2)
Example2

# Sample dataset
sample_data <- data.frame(
x = 1:10,
y = c(3, 5, 7, 6, 9, 8, 10, 12, 11, 13),
group = rep(1:2, each = 5) # Two groups: 1 and 2
)
print(sample_data)
# Set margin and allow drawing outside the plot
par(mar = c(4, 4, 3, 4), xpd = TRUE)

# Draw scatter plot


plot(sample_data$x, sample_data$y,
col = sample_data$group, # Different colors for each group
pch = 16, # Solid circle points
main = "Scatter Plot with Legend Outside",
xlab = "X values",
ylab = "Y values")

# Add legend outside the plot


legend("topright",
inset = c(-0.18, 0), # Pushes legend outside (right side)
legend = c("Group 1", "Group 2"),
pch = 16, # Same point style as the plot
col = 1:2, # Same colors
title = "Groups")
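With xpd = TRUE drawing is still clipped at the figure region, so text cannot reach the outer margin; xpd = NA is needed for that. A sketch assuming a 2-line bottom outer margin. grconvertY() converts a position given in normalized device coordinates (0 = bottom of the device) into user coordinates for text(); the 0.02 position is an arbitrary choice:

```r
op <- par(oma = c(2, 0, 0, 0), xpd = NA)   # bottom outer margin; clip only at the device edge
plot(1:10, main = "xpd = NA example")

# Position near the bottom of the device, expressed in user coordinates
y_dev <- grconvertY(0.02, from = "ndc", to = "user")
text(5.5, y_dev, "Drawn in the outer margin", col = "blue")

par(op)                                   # restore oma and xpd
```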
Point-and-Click Coordinate Interaction:
This feature lets you interact with a graph using your mouse, for example by clicking on
points in the plot to get their exact coordinates, or to add notes, labels, or other actions.
• The locator() function is used for point and click coordinate interaction.
It allows you to click anywhere on the plot, and R will return the (x, y) coordinates
of where you clicked.
• This is very helpful when you want to know the exact position of points on a graph
or when you want to add annotations interactively.
Note: R can detect and read your mouse clicks inside the plotting window. This only works on interactive screen devices, not on file devices such as pdf() or png().

Features of Point-and-Click Coordinate Interaction:


1. Retrieving Coordinates Silently
2. Visualizing Selected Coordinates
3. Ad Hoc Annotation

1. Retrieving Coordinates Silently


• The locator() function lets you find and return coordinates of points you click on a
plot.
• When you run locator(n), R waits for you to click n times on the plot and then gives
the (x, y) coordinates of the points clicked.
• It is called silently because R doesn’t draw anything unless you tell it to — it just
records the coordinates.
plot(10,20)
point_clicked<-locator(5) # Click 5 times on the plot
print(point_clicked)
Output
$x
[1] 8.394841 11.908021 11.272014 7.486259 10.060573
$y
[1] 20.84307 20.47967 15.90094 24.25895 26.14859

2. Visualizing Selected Coordinates


• You can display the points or lines at the coordinates you clicked using locator()
function.
• This helps to visually see where the clicks occurred.
• You can specify the type of marker or line, its color, and style.
Syntax:
locator(n = 512, type = "n", ...)
Parameter   Description
n           Number of points you want to click on the plot. Default = 512 (you can stop early by pressing Esc).
type        Specifies what should be drawn when you click:
            "n" → nothing (silent)
            "p" → points
            "l" → lines
            "o" → both points and lines
...         Additional graphical arguments (like col, pch, lty, lwd, etc.). These affect how the points or lines look when drawn.

Example
plot(1, 1)
lst <- locator(type = "o", pch = 4, lty = 2, lwd = 3, col = "red")
print(lst)

Here:
• type = "o" → draws both points and lines
• pch = 4 → symbol shape
• lty = 2 → dashed line
• lwd = 3 → line width (thicker line)
• col = "red" → color of the line/points
Output

$x
[1] 0.6305104 0.6956254 0.7637690 0.6305104
$y
[1] 0.9767428 1.2892597 1.0094480 0.9767428

3. Ad Hoc Annotation
• The locator() function can also be used for interactive annotations — like adding
legends or labels at a specific spot.
• Since it returns valid (x, y) coordinates, you can use those coordinates to position text
or legends dynamically.
Example 1
# Step 1: Create a simple scatter plot
x <- 1:10
y <- c(2, 3, 5, 4, 7, 8, 6, 9, 10, 11)
plot(x, y, pch = 19, col = "blue",
main = "Click on a point to add a label",
xlab = "X-axis", ylab = "Y-axis")

# Step 2: Wait for user to click once


pt <- locator(1) # Click once on the plot

# Step 3: Add text where user clicked


text(pt$x, pt$y, labels = "You clicked here!", col = "red", pos = 1)

How This Works
1. The plot appears with blue dots.
2. R waits for you to click once (because of locator(1)).
3. Wherever you click, R saves the x and y coordinates in pt.
4. The text() function then writes “You clicked here!” just below that position (pos = 1 places the text below the point).
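Because locator() only works on an interactive screen device, and returns NULL if you stop before clicking, scripts that might also run non-interactively usually guard it. A defensive sketch; add_label_at_click() is a hypothetical helper, not a built-in R function:

```r
# Hypothetical helper: label the spot the user clicks, if clicking is possible
add_label_at_click <- function(label = "You clicked here!") {
  if (!interactive()) {
    message("locator() needs an interactive graphics device; skipping.")
    return(invisible(NULL))
  }
  pt <- locator(1)
  if (is.null(pt)) {               # user stopped without clicking
    return(invisible(NULL))
  }
  text(pt$x, pt$y, labels = label, col = "red", pos = 1)
  invisible(pt)
}

plot(1:10, pch = 19, col = "blue", main = "Click to label")
res <- add_label_at_click()
```

In an Rscript run (non-interactive) the helper simply returns NULL instead of waiting for a click that can never happen.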

Example 2: Add Text Dynamically Using Mouse Clicks and a Loop


# Step 1: Create a simple scatter plot
x <- 1:10
y <- c(2, 3, 5, 4, 7, 8, 6, 9, 10, 11)
plot(x, y, pch = 19, col = "blue",
main = "Click to Add Text Labels Dynamically",
xlab = "X-axis", ylab = "Y-axis")

# Step 2: Ask user how many points they want to label


n <- as.integer(readline(prompt = "How many points do you want to label? "))

# Step 3: Loop to allow n clicks and label each point

i <- 1
while (i <= n) {
cat("Click", i, "on the plot...\n")
pt <- locator(1)
label <- paste("Point", i)
text(pt$x, pt$y, labels = label, col = "red", pos = 4)
cat("You clicked at: X =", pt$x, ", Y =", pt$y, "\n")
i <- i + 1
}
cat(" Done! All points labeled.\n")

Output
How many points do you want to label? 6
Click 1 on the plot...
You clicked at: X = 1.01956 , Y = 2.068436
Click 2 on the plot...
You clicked at: X = 2.024678 , Y = 2.96783
Click 3 on the plot...
You clicked at: X = 3.046832 , Y = 4.971027
Click 4 on the plot...
You clicked at: X = 3.96677 , Y = 4.030751
Click 5 on the plot...
You clicked at: X = 4.954852 , Y = 7.015106
Click 6 on the plot...
You clicked at: X = 6.045149 , Y = 7.914501
Done! All points labeled.

Customizing Traditional R Plots

R's traditional (base) plots can be customized extensively to make them more colourful, neat,
and informative. You can change colours, point shapes, labels, axes, and boxes to make your
graphs look better and easier to read.

Types of Customizing Traditional R Plots


1. Graphical Parameters for Style and Layout
2. Customizing Boxes
3. Customizing Axes

1. Graphical Parameters for Style and Layout


The par() function sets the graphical parameters that control how your plot looks, both on
the screen and when printed.
par() Parameters:
Parameter Purpose
mar Set plot margins (bottom, left, top, right)
cex Controls size of text and symbols
col Sets color for points, lines, text
pch Sets symbol style for points (e.g., circle, triangle)
mfrow or mfcol Creates multiple plot layout in one screen
las Controls direction of axis labels

Example
print(head(mtcars, 10))   # first 10 rows of the built-in mtcars data
hp <- mtcars$hp
mpg <- mtcars$mpg
wtex <- mtcars$wt / mean(mtcars$wt)
# Adjust margins using par()
par(mar = c(5, 4, 4, 2)) # (bottom, left, top, right)

plot(hp, mpg, cex = wtex, col = "blue",
     main = "Horsepower vs MPG",
     xlab = "Horsepower", ylab = "Miles per Gallon")

Note: Each point’s size (cex = wtex) is proportional to car weight.


This means heavier cars show as bigger dots!

Output
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
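Two more parameters from the par() table, mfrow and las, combined in one sketch using the built-in mtcars data. las = 1 makes all axis labels horizontal, and the old settings are restored afterwards:

```r
op <- par(mfrow = c(1, 2),   # 1 row, 2 columns of plots
          las = 1)           # axis labels always horizontal
plot(mtcars$hp, mtcars$mpg, pch = 19, col = "blue",
     xlab = "Horsepower", ylab = "MPG", main = "hp vs mpg")
plot(mtcars$wt, mtcars$mpg, pch = 17, col = "darkgreen",
     xlab = "Weight (1000 lbs)", ylab = "MPG", main = "wt vs mpg")
par(op)                      # back to a single panel and default label direction
```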

2. Customizing Boxes
You can add or change the box around your plot using the box() function. The box() function
in R is used to draw a box around the current plot region. It helps highlight the plotting area
or customize its boundary style.
Syntax:
box(which = "plot", lty = "solid", lwd = 1, col = "black", bty = "o")

Argument   Description
which      Specifies which box to draw: "plot", "figure", "inner", or "outer" (default = "plot")
lty        Line type (1 = solid, 2 = dashed, etc.)
lwd        Line width
col        Box color
bty        Box type (same as in par()):
           • "o" = full box (default)
           • "l" = L-shaped box
           • "7", "c", "u", "]" = different open-sided styles
           • "n" = no box
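A short sketch of box() using the arguments in the table: the plot is first drawn without a box (bty = "n"), then an L-shaped box is added around the plot region and a dashed box around the figure region. The particular colours and styles are arbitrary:

```r
plot(1:10, bty = "n", main = "box() styles")           # no box to start with

# L-shaped box around the plot region
box(which = "plot", bty = "l", lty = 1, lwd = 2, col = "darkblue")

# Dashed box around the whole figure region
box(which = "figure", lty = 2, col = "grey50")
```

Because bty is passed directly to plot() and box() here, the global par("bty") setting is left unchanged.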

3. Customizing Axes
The axis() function lets you control how and where axes appear on the plot.
You can specify which side of the plot the axis should appear on and customize tick marks
and labels.
Syntax:
axis(side, at = NULL, labels = TRUE)

Parameters:
• side: Defines which side to draw the axis on.
o 1 = bottom
o 2 = left
o 3 = top
o 4 = right
• at: Points at which to draw tick marks.
• labels: Labels to display at tick marks (TRUE by default).

Example
x <- 1:5
y <- x * x

# Create plot without axes


plot(x, y, axes = FALSE)

# Add custom axes


axis(side = 1, at = 1:5, labels = LETTERS[1:5],col="red") # bottom axis
axis(side = 2,col="blue") # left axis

Specialized Text and Label Notation

R provides tools to control fonts, add mathematical symbols, and use colors in plots to make
them more visually appealing and informative.

1. Specialized Text and Label Notation


These functions allow you to control fonts and display special characters, such as Greek
symbols or mathematical formulas.

a. Font Control

You can change the font appearance in R using:


• family → selects font type (e.g., sans, serif, mono)
• font → sets style (normal, bold, italic, etc.)
Font Family   Description
"sans"        Default font
"serif"       Traditional font with strokes
"mono"        Monospaced font (like typewriter)

Font Value    Style
1             Normal text
2             Bold
3             Italic
4             Bold + Italic

Example
par(mar = c(3, 3, 3, 3))
plot(1, 1, type = "n", xlim = c(-1, 1), ylim = c(0, 7), xaxt = "n", yaxt = "n", ann = FALSE)
text(0.6, 6, "Sans text (default)", family = "sans", font = 1)
text(0.5, 5, "Serif text Bold", family = "serif", font = 2)
text(0.4, 4, "Mono text Italic", family = "mono", font = 3)
text(0.3, 3, "Mono bold italic", family = "mono", font = 4)

b. Greek Symbols

To display Greek letters or symbols, use the expression() function.


par(mar = c(3, 3, 3, 3))
plot(1, 1, type = "n", xlim = c(-1, 1), ylim = c(0.5, 4.5), xaxt = "n", yaxt = "n", ann =
FALSE)
text(0.4, 4, expression(alpha), cex = 1.5)
text(0.3, 3, expression(paste("sigma: ", sigma, " Sigma: ", Sigma)), cex = 1.5)
text(0.2, 2, expression(paste(beta, ", ", gamma, ", ", phi)), cex = 1.5)
text(0.1, 1, expression(paste(Gamma, "(", tau, ") = 24 when ", tau, " = 5")), cex = 1.5)
title(main = expression(paste("Greek Symbols: ", epsilon, ", ", kappa)), cex.main = 2)

Note:
• expression() helps display symbols or equations.
• paste() combines text and symbols together.

c. Mathematical Expressions
You can show full mathematical formulas using expression() and paste().

Example

expr1 <- expression(c^2 == a[1]^2 + b[1]^2)
expr2 <- expression(paste(pi^x, (1 - pi)^y))
# The summation limits (i = 1 to n) must be inside sum()
expr3 <- expression(paste("Sample mean = ", italic(n)^(-1),
                          sum(italic(x)[italic(i)], italic(i) == 1, italic(n))))
# Beta density; frac() needs two arguments: numerator and denominator
expr4 <- expression(paste("f(x|", alpha, ",", beta, ") = ",
                          frac(x^{alpha - 1} * (1 - x)^{beta - 1}, B(alpha, beta))))

plot(1:10, 1:10, type = "n", xlab = "", ylab = "",
     xaxt = "n", yaxt = "n",
     main = "Mathematical Expressions")
text(6, 8, expr1, cex = 1.3, col = "blue")
text(6, 6, expr2, cex = 1.3, col = "darkgreen")
text(6, 4, expr3, cex = 1.3, col = "purple")
text(6, 2, expr4, cex = 1.3, col = "red")
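When a label must mix plotmath with a number computed at run time, bquote() can help: anything wrapped in .( ) is evaluated and substituted before the expression is drawn. A sketch assuming we want the fitted slope in the title; the noise values added to the line are made up:

```r
x <- 1:10
noise <- c(0.5, -0.3, 0.2, -0.1, 0.4, -0.6, 0.1, 0.3, -0.2, 0.0)  # made-up noise
y <- 2 + 3 * x + noise

fit <- lm(y ~ x)
slope <- unname(coef(fit)[2])

# .(round(slope, 2)) is replaced by the numeric value; hat(beta)[1] is plotmath
plot(x, y, main = bquote(hat(beta)[1] == .(round(slope, 2))))
abline(fit, col = "red")
```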


Defining Colours and Plotting in Higher Dimensions

Defining Colors in R

R supports many ways to represent and apply colors in plots.


1. Named Colors
R includes many built-in color names like:
"red", "blue", "green", "yellow", "black", "white"
Example
plot(1:10, col = "red" , pch = 16, main = "Named Color")
2. Hexadecimal Color Codes
Hexadecimal color codes combine red, green, and blue (RGB) values.
They start with # followed by six hexadecimal digits (two each for red, green, and blue).

Color Hex Code


Red "#FF0000"
Green "#00FF00"
Blue "#0000FF"
Orange "#FFA500"
Purple "#800080"

Example
plot(1:10, col = "#FFA500", pch = 16, main = "Hexadecimal Color")
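Hex codes may also carry a fourth pair, #RRGGBBAA, where AA sets the transparency (00 = fully transparent, FF = opaque), which helps when many points overlap. A sketch; col2rgb(..., alpha = TRUE) reads the channels back. Note that not every graphics device supports transparency:

```r
semi_red <- "#FF000080"     # red with alpha 0x80 = 128, i.e. about 50% opaque

set.seed(1)
plot(rnorm(500), rnorm(500), col = semi_red, pch = 16,
     main = "Semi-transparent hex colour")   # overlapping points look darker

rgba <- col2rgb(semi_red, alpha = TRUE)      # rows: red, green, blue, alpha
print(rgba)
```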

3. RGB Values
Use the rgb() function to mix red, green, and blue values (0–1 range).

Color Code

Red rgb(1, 0, 0)

Green rgb(0, 1, 0)

Blue rgb(0, 0, 1)

Yellow rgb(1, 1, 0)

Purple rgb(0.5, 0, 0.5)


Example
plot(1:10, col = rgb(0.2, 0.5, 0.8), pch = 16, main = "RGB Color")
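rgb() expects values between 0 and 1 by default. Setting maxColorValue = 255 lets you pass 0–255 intensities instead; either way the result is a hex string. A few conversions that match the colour tables in this section:

```r
red_hex    <- rgb(1, 0, 0)                              # 0-1 scale
orange_hex <- rgb(255, 165, 0, maxColorValue = 255)     # 0-255 scale
purple_hex <- rgb(0.5, 0, 0.5)                          # 0.5 * 255 rounds to 128 = 0x80

print(c(red_hex, orange_hex, purple_hex))   # "#FF0000" "#FFA500" "#800080"
```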

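4. HCL Values
hcl() uses Hue, Chroma, and Luminance (HCL), not HSL. Hue is an angle from 0 to 360°,
luminance (lightness) runs from 0 to 100, and chroma (colourfulness) is typically between
0 and about 100. Because HCL is a perceptual colour space, its hues only roughly match the
familiar colour names.

Approximate colour   Code
Reddish              hcl(0, 100, 50)
Greenish             hcl(120, 100, 50)
Bluish               hcl(240, 100, 50)
Orange/brown         hcl(60, 100, 50)
Purplish             hcl(300, 100, 50)

Example:
plot(1:10, col = hcl(240, 100, 50), pch = 16, main = "HCL Color")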
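For an HSV-style specification, base R provides hsv(), where hue, saturation and value all run from 0 to 1 (hue as a fraction of the full colour wheel). A sketch comparing three hues produced by hsv() and by hcl(); the hues chosen are arbitrary:

```r
hues_deg <- c(0, 120, 240)                        # positions on the colour wheel

hsv_cols <- hsv(h = hues_deg / 360, s = 1, v = 1) # hue rescaled to 0-1
hcl_cols <- hcl(h = hues_deg, c = 100, l = 50)    # out-of-gamut values are fixed up

# Top row: hsv(); bottom row: hcl()
plot(1:3, rep(2, 3), col = hsv_cols, pch = 16, cex = 4,
     xlim = c(0.5, 3.5), ylim = c(0.5, 2.5),
     xaxt = "n", yaxt = "n", ann = FALSE)
points(1:3, rep(1, 3), col = hcl_cols, pch = 16, cex = 4)
text(2, 2.4, "hsv()")
text(2, 1.4, "hcl()")
print(hsv_cols)
```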
Plotting in Higher Dimensions
Plotting in higher dimensions often involves techniques like 3D plots, color mapping, facet
grids, or even interactive visualizations to represent more than three dimensions effectively.
Representing and Using Colour: Colour plays an important role in data visualization.
It can enhance aesthetics or aid interpretation by distinguishing values and variables.
Understanding how R handles colour helps in effective visualization.
1. RGB Color Representation
➢ Colors are defined by combinations of three primary colors: Red, Green, Blue
(RGB).
➢ Each component has an intensity from 0 to 255.
➢ RGB combinations yield 16,777,216 (256³) possible colors.
Expressed as (R, G, B) triplets:
(0, 0, 0) → black
(255, 255, 255) → white
(0, 255, 0) → green

Using Colors in R
✓ R includes 650+ named colors accessible via colors() function.
colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
[7] "antiquewhite4" "aquamarine" "aquamarine1"
[10] "aquamarine2" "aquamarine3" "aquamarine4"
[13] "azure" "azure1" "azure2"
✓ Default R palette can be viewed or set using palette() function.
To view the current default palette call function palette().
>palette()
[1] "black" "#DF536B" "#61D04F" "#2297E6" "#28E2E5"
"#CD0BBC" "#F5C710" "gray62"
To set a custom palette pass a vector of color names or hexadecimal
codes to palette() function.
>palette(c("red","green","blue","yellow"))

✓ Use col2rgb() to get RGB values for color names.


col2rgb(c("black", "green3", "pink"))
[,1] [,2] [,3]
red 0 0 255
green 0 205 192
blue 0 0 203

Hexadecimal (Hex) Color Codes

• Hexadecimal colour codes are numeric representations of RGB values.


• Format of Hexadecimal colour codes: #RRGGBB
o Each pair (RR, GG, BB) represents red, green, and blue
components.
o (RR, GG, BB) Values use digits 0–9 and letters A–F.
• To convert RGB triplets to hex in R:

> rgb(t(col2rgb(c("black", "green3", "pink"))), maxColorValue = 255)


[1] "#000000" "#00CD00" "#FFC0CB"

This shows that the hex codes for "black", "green3", and "pink" are "#000000",
"#00CD00", and "#FFC0CB" respectively.
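A hex code is simply each 0–255 channel written as two hexadecimal digits. The sketch below does the conversion by hand with sprintf(); rgb_to_hex() is a hypothetical helper name, and its output should match the rgb() result above:

```r
# Hypothetical helper: convert 0-255 channel values to "#RRGGBB" strings
rgb_to_hex <- function(r, g, b) {
  sprintf("#%02X%02X%02X", r, g, b)   # %02X = two-digit upper-case hexadecimal
}

m <- col2rgb(c("black", "green3", "pink"))   # 3 x 3 matrix: rows red, green, blue
hex_manual <- rgb_to_hex(m["red", ], m["green", ], m["blue", ])
print(hex_manual)   # "#000000" "#00CD00" "#FFC0CB"
```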

Built-in Palettes:
When you make plots in R, you often want to use different colours to make
them look clear and attractive. A colour palette is a group of colours that R can
generate for you automatically.
Why Use Palettes?
• To create smooth colour transitions (e.g., heat maps, elevation maps).
• To represent values that change gradually.
• To choose many colours easily without selecting them manually.

Function            Use                                    Example
rainbow(n)          Bright rainbow-like colors             rainbow(10)
heat.colors(n)      Colors from red → yellow (heat-like)   heat.colors(10)
terrain.colors(n)   Earth/terrain tones (greens, browns)   terrain.colors(10)
topo.colors(n)      Colors for topographic maps            topo.colors(10)
cm.colors(n)        Cyan-magenta colors                    cm.colors(10)
gray.colors(n)      Shades of gray                         gray.colors(10)
gray(level)         Single grayscale color                 gray(0.5)

Example
plot(1:10, col = rainbow(10), pch = 19)
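Palettes are most useful when the colour encodes a value. A sketch assuming we colour each point by its y value: cut() splits range(y) into 10 equal-width bins, and each bin number picks one of the 10 colours from heat.colors(10). The data are made up:

```r
x <- 1:50
y <- sin(x / 5) * x                 # made-up values to colour by

pal  <- heat.colors(10)             # 10 colours, red -> yellow
bins <- cut(y, breaks = 10)         # 10 equal-width intervals covering range(y)
cols <- pal[as.integer(bins)]       # bin number -> colour

plot(x, y, col = cols, pch = 19, main = "Colour mapped to value")
```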

Custom Palettes in R
Sometimes the built-in color palettes (like rainbow(), [Link](),
[Link](), etc.) are not enough. In such cases, you can create your own color
palette using the colorRampPalette() function.

What is colorRampPalette()?
• It is a function that creates color gradients between two or more colors.
• You provide a vector of color names, and it returns a new function.
• This new function can generate any number of colors smoothly
transitioning between the given ones.

Syntax
palette_function <- colorRampPalette(colors = c("color1", "color2", ...))

Then, to generate colors:

palette_function(N) # N = number of colors you want

Example 1
To generate Purple to Yellow Gradient
purple_yellow <- colorRampPalette(colors = c("purple", "yellow"))   # returns a function
plot(1:10, col = purple_yellow(10), pch = 19)

Example 2
To generate a red, orange, yellow, green gradient
red_o <- colorRampPalette(colors = c("red", "orange","yellow","green"))
plot(1:30,col=red_o(30),pch=19,cex=2)
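Because colorRampPalette() returns a function, the same palette can produce any number of colours, and the first and last colours are exactly the ones supplied. A quick check (in R, "purple" is #A020F0 and "yellow" is #FFFF00):

```r
purple_to_yellow <- colorRampPalette(c("purple", "yellow"))
cols5 <- purple_to_yellow(5)      # five evenly spaced colours from purple to yellow
print(cols5)

barplot(rep(1, 5), col = cols5, border = NA, axes = FALSE,
        main = "Five colours from purple to yellow")
```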

3D SCATTER PLOTS:

A 3D scatterplot is a type of graph that allows you to plot three continuous
variables at once, one for each axis (x, y, and z). It helps you visualize how three
variables relate to each other.
Why Use 3D Scatterplots?
• To represent relationships among three numeric variables
• To identify clusters or patterns that might not appear in 2D plots
• To visualize complex data in an intuitive way

In R, the scatterplot3d package is used to create 3D scatterplots.


The syntax is very similar to the normal plot() function — you just add a z-axis for
the third variable.
Syntax
scatterplot3d(x, y, z)
where
x → values for x-axis (left to right)
y → values for y-axis (foreground to background)
z → values for z-axis (bottom to top)

Example: 3D Scatterplot with Sample Data

Step 1: Install the scatterplot3d package


install.packages("scatterplot3d")

Step2: Load the scatterplot3d package


library(scatterplot3d)

Step 3: Create Sample Data


set.seed(123)
height <- round(rnorm(20, mean = 165, sd = 10), 1)
weight <- round(rnorm(20, mean = 60, sd = 8), 1)
age <- round(runif(20, min = 18, max = 25), 0)

# Combine into a data frame


students <- data.frame(height, weight, age)
print(students)
Step 4: Create a 3D Scatterplot
scatterplot3d(
x = students$height,
y = students$weight,
z = students$age,
color = "blue",
pch = 19, # solid circles
main = "3D Scatterplot of Students' Data",
xlab = "Height (cm)",
ylab = "Weight (kg)",
zlab = "Age (years)"
)
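Since this module is about regression, one related feature is worth knowing: scatterplot3d() invisibly returns a list of helper functions, among them plane3d(), which can draw the plane fitted by lm() with two predictors. A sketch reusing the simulated student data from Step 3; the model age ~ height + weight is purely illustrative, since the simulated variables are independent. The requireNamespace() guard skips the plot if the package is not installed:

```r
set.seed(123)
height <- round(rnorm(20, mean = 165, sd = 10), 1)
weight <- round(rnorm(20, mean = 60, sd = 8), 1)
age    <- round(runif(20, min = 18, max = 25), 0)

# Fit age as a linear function of height and weight (illustration only)
fit <- lm(age ~ height + weight)
print(coef(fit))

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- scatterplot3d::scatterplot3d(height, weight, age,
                                      pch = 19, color = "blue",
                                      xlab = "Height (cm)", ylab = "Weight (kg)",
                                      zlab = "Age (years)",
                                      main = "Fitted regression plane")
  s3d$plane3d(fit, lty = "dashed", col = "red")   # draw the fitted plane
}
```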
Visual Enhancements
When creating 3D scatterplots, it can sometimes be hard to see the depth of points —
especially when they overlap or look similar.
To make the plot clearer and easier to read, we can add visual enhancements such as:
• Colouring points to show depth
• Drawing lines perpendicular to the x–y plane
• Adjusting line styles for better readability

1. Change the shape and colour of points :


The argument pch and color can be used to change the shape and colour of the
point.
Example 1
library(scatterplot3d)
x <- 1:10
y <- x
z <- x^2
scatterplot3d(
x, y, z,
color=rainbow(10), pch=c(1:10), cex.symbols = 1.5
)
Example 2
library(scatterplot3d)

# Create a simple dataset


set.seed(123) # for reproducibility
height <- round(rnorm(20, mean = 165, sd = 10), 1) # in cm
weight <- round(rnorm(20, mean = 60, sd = 8), 1) # in kg
age <- round(runif(20, min = 18, max = 25), 0) # in years

# Combine into a data frame


students <- data.frame(height, weight, age)
print(students)
shapes<-c(1:20)
color=[Link](20)

scatterplot3d(
x = height,
y =weight,
z = age,
color=color,
pch=shapes,
xlab = "Height",
ylab = "Weight",
zlab = "Age",
main = "Height,Weight & Age"
)

2. Change the point shapes and colour by group:


library(scatterplot3d)

# Create a simple dataset


set.seed(123) # for reproducibility
height <- round(rnorm(20, mean = 165, sd = 10), 1) # in cm
weight <- round(rnorm(20, mean = 60, sd = 8), 1) # in kg
age <- round(runif(20, min = 18, max = 25), 0) # in years

# Combine into a data frame


students <- data.frame(height, weight, age)
print(students)

# Assign Shapes Based on Age Groups


shapes <- ifelse(age <= 20, 16, # Circle
ifelse(age <= 23, 17, # Triangle
15)) # Square

# Assign colors Based on age


color <- ifelse(age <= 20, "green",   # ages 18-20
         ifelse(age <= 23, "red",     # ages 21-23
                "blue"))              # ages 24-25

# 3D Scatter Plot
scatterplot3d(
x = height,
y = weight,
z = age,
color = color,
pch = shapes,
xlab = "Height",
ylab = "Weight",
zlab = "Age",
main = "Height, Weight & Age (Shape by Age Group)"
)

# Add Legend
legend("topright",
legend = c("18-20 (Circle)", "21-23 (Triangle)", "24-25 (Square)"),
pch = c(16, 17, 15),
col = c("green","red","blue"))
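To label individual points in a 3D scatterplot, the xyz.convert() function returned by scatterplot3d() projects (x, y, z) values onto the 2D coordinates that text() works with. A sketch with small made-up data and hypothetical labels; the guard skips plotting if the package is missing:

```r
x <- c(1, 2, 3, 4)
y <- c(2, 1, 4, 3)
z <- c(3, 5, 2, 6)
labs <- c("A", "B", "C", "D")     # hypothetical point labels

if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  s3d <- scatterplot3d::scatterplot3d(x, y, z, pch = 16, color = "blue",
                                      main = "Labelled 3D points")
  xy <- s3d$xyz.convert(x, y, z)  # list with projected 2D x and y
  text(xy$x, xy$y, labels = labs, pos = 4, col = "red")
}
```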
3. Remove the box around the plot
We can remove the box around the plot by setting the box parameter of scatterplot3d()
to FALSE.

library(scatterplot3d)
x <- 1:10
y <- x
z <- x^2
scatterplot3d(x, y, z,color=rainbow(10),pch=c(1:10),box=FALSE)

4. Adding Vertical lines to the plot:


When you use the argument type = "h" inside scatterplot3d(), the function draws vertical
lines from each point (x, y, z) down to the base plane (the x–y plane at the bottom of the plot box).

Example 1
library(scatterplot3d)

x <- 1:10
y <- x
z <- x^2
scatterplot3d(
x, y, z,
type = "h", # adds vertical lines from points to base
lty.hplot = 2, # dashed lines for verticals
highlight.3d = TRUE
)

Example2
library(scatterplot3d)

# Create a simple dataset


set.seed(123) # for reproducibility
height <- round(rnorm(20, mean = 165, sd = 10), 1) # in cm
weight <- round(rnorm(20, mean = 60, sd = 8), 1) # in kg
age <- round(runif(20, min = 18, max = 25), 0) # in years

# Combine into a data frame


students <- data.frame(height, weight, age)
print(students)
scatterplot3d(
x = height,
y =weight,
z = age,
highlight.3d = TRUE,
type = "h",
lty.hplot = 2,
lty.hide = 3,
xlab = "Height",
ylab = "Weight",
zlab = "Age",
main = "Height,Weight & Age"
)
Argument              Purpose
highlight.3d = TRUE   Adds color shading from red to black to emphasize 3D depth (based on y-axis position).
type = "h"            Draws vertical lines from each point down to the x–y plane; helps show height and position clearly.
lty.hplot = 2         Makes the "h" lines dashed instead of solid (line type 2).
lty.hide = 3          Makes the non-visible box sides dotted (line type 3).
xlab, ylab, zlab      Label the three axes.
main                  Adds a title to the plot.
