Understanding Linear Regression Basics
Purpose of regression
• Predicting or estimating the value of the dependent variable based on the values of
one or more independent variables.
• Understanding the strength and nature of the relationship between variables.
• Making inferences about the population based on sample data.
Regression Analysis
Regression analysis means creating a model that shows how one thing (the dependent
variable) changes when other things (independent variables) change.
It helps us check how strong the relationship is, see which factors are important, and
make predictions from data.
It is used in many fields (like business, economics, health, and science) to:
• Understand relationships between variables.
• Predict future outcomes.
• Help in making better decisions based on data.
Dependent variable (Y): This is the main thing you want to predict or study. It is also
called the outcome or response variable.
Independent Variable (X): The independent variable is the predictor variable, the one
used to explain or predict changes in another variable (the dependent variable).
In other words, you control or observe this variable to see how the outcome changes with it.
Regression on its own shows association; it does not prove that X causes Y.
An Example of a Linear Relationship:
Let's consider an example illustrating a linear relationship between two variables, such as the
relationship between a person's age and their reaction time.
Variables:
Dependent Variable (Y): Reaction time (in milliseconds) - the response variable, which is
what we want to predict or understand.
Independent Variable (X): Age of the person (in years) - this is the predictor variable.
Assumption: We assume that younger individuals tend to have faster reaction times, and as
people age, their reaction times might increase.
Data Collection: Gather data on reaction times and ages from individuals across different
age groups.
Analysis: After collecting the data, plot the reaction time against ages on a scatter plot.
If a linear relationship exists, the points might tend to form a pattern where, on average, as
age increases, reaction time also tends to increase or decrease in a linear manner.
Linear Relationship:
A positive linear relationship would imply that as age increases, reaction time increases.
A negative linear relationship would suggest that as age increases, reaction time decreases.
Modelling:
• Linear regression is used to fit a straight line to data and measure the relationship
between two variables — in this example, age and reaction time.
• By studying this relationship, we can check whether there is a correlation between
age and reaction time, and whether that relationship is linear (increases or decreases
steadily).
• This helps us understand how one variable affects the other — for instance, how
reaction time changes as age increases. This example shows how linear relationships
between variables can be explored, visualized, and modelled using statistics.
• The goal is to find the best-fitting straight line through the data points.
• This line shows how the dependent variable (Y) changes as the independent variable
(X) changes.
• The line is chosen so that the sum of squared errors (differences between actual and
predicted Y values) is as small as possible.
Example:
Suppose you want to study how hours studied (X) affect exam marks (Y).
Hours Studied (X)    Exam Marks (Y)
2                    50
4                    60
6                    70
8                    80
Here:
• X = Hours studied → Independent variable
• Y = Exam marks → Dependent variable
After applying regression, you might get:
Marks = 40 + 5 × Hours Studied
This means:
• Intercept (β₀) = 40 → the model predicts 40 marks for a student who doesn’t study (X=0).
• Slope (β₁) = 5 → for every extra hour of study, predicted marks increase by 5 points.
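As a quick check of these numbers, the four observations in the table can be fitted in R. This is only a minimal sketch; the names hours, marks, and marks_model are chosen here for illustration and do not come from the notes.

```r
# Data from the hours-studied example
hours <- c(2, 4, 6, 8)      # independent variable (X)
marks <- c(50, 60, 70, 80)  # dependent variable (Y)

# Fit a simple linear regression of marks on hours
marks_model <- lm(marks ~ hours)

# Extract the intercept (beta_0) and the slope (beta_1)
b0 <- unname(coef(marks_model)[1])
b1 <- unname(coef(marks_model)[2])
cat("Intercept:", b0, " Slope:", b1, "\n")
```

Because the four points lie exactly on a straight line, the fitted intercept and slope should agree with 40 and 5 up to floating-point rounding.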
2. Straight-Line Relationship
• The relationship between the independent and dependent variables is assumed to be
linear.
• The formula is: Y = β₀ + β₁X + ε, where β₀ is the intercept, β₁ is the slope, and ε is
the random error term.
• The residual is the difference between the actual value and the predicted value.
• The regression line is chosen in such a way that the sum of squared residuals is as
small as possible.
Think of it as: Finding the line that fits the data points “best.”
4. Statistical Inference:
• Regression allows us to test statistically whether there is a real relationship between X
and Y.
• We test if the slope (β₁) is significantly different from zero.
• This helps check if changes in X actually affect Y or if it’s just random.
Example: Testing if hours studied really impact marks, or if it’s just coincidence.
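The slope test can be read directly from the coefficient table that summary() returns. The sketch below assumes the small data set used in the worked problem later in these notes (X = 2, 3, 5, 7, 9 and Y = 4, 5, 7, 10, 15); the names coef_table, slope_t, and slope_p are illustrative.

```r
# Data assumed from the worked problem in these notes
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
model <- lm(y ~ x)

# Coefficient table: estimate, standard error, t value, p-value
coef_table <- summary(model)$coefficients
slope_t <- coef_table["x", "t value"]
slope_p <- coef_table["x", "Pr(>|t|)"]
cat("t =", round(slope_t, 3), " p =", round(slope_p, 4), "\n")

# H0: beta_1 = 0 is rejected at the 5% level when p < 0.05
slope_is_significant <- slope_p < 0.05
```

A small p-value suggests the slope differs from zero, although with only five observations such a conclusion should be treated cautiously.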
5. Assumptions
For simple linear regression to work correctly, we assume:
1. The relationship between X and Y is linear.
2. The residuals (errors) are normally distributed.
3. Residuals have constant variance (called homoscedasticity).
4. Observations are independent of each other.
Example: In a dataset of students’ scores, one student’s marks should not depend on
another’s.
6. Interpretation of Coefficients:
• The slope coefficient (β₁) represents the change in the dependent variable for a one-
unit change in the independent variable.
• The intercept (β₀) represents the value of the dependent variable when the
independent variable is zero (if such an interpretation is meaningful in the context).
Example:
If the equation is Marks=40+5×Study Hours:
β₀ = 40: Expected marks when study hours = 0.
β₁ = 5: Marks increase by 5 for each extra hour of study.
7. Prediction
• Once the model is built, you can predict Y for any given X (within your data range).
Example:
If a student studies 6 hours, predicted marks = 40+5×6=70.
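The same prediction can be obtained with predict(). This is a minimal sketch using the hours-studied data; the object names (marks_model, new_student) are assumptions made for the example.

```r
# Hours-studied example data
hours <- c(2, 4, 6, 8)
marks <- c(50, 60, 70, 80)
marks_model <- lm(marks ~ hours)

# New observation: the column name must match the predictor name in the model
new_student <- data.frame(hours = 6)

# Keep predictions inside the observed range of X to avoid extrapolation
stopifnot(new_student$hours >= min(hours), new_student$hours <= max(hours))

predicted_marks <- unname(predict(marks_model, newdata = new_student))
cat("Predicted marks for 6 hours:", predicted_marks, "\n")
```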
The goal of simple linear regression is to estimate the values of β₀ and β₁ that minimize
the sum of squared differences between the observed Y values and the values predicted
by the model for given X values.
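The least-squares estimates are:
β₁ = Σ(X − X̄)(Y − Ȳ) / Σ(X − X̄)²
β₀ = Ȳ − β₁X̄
Where:
X̄ and Ȳ are the sample means of X and Y.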
Problems:
X = {2, 3, 5, 7, 9}
Y = {4, 5, 7, 10, 15}
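Step 1: Calculate the means
X̄ = (2 + 3 + 5 + 7 + 9) / 5 = 5.2 and Ȳ = (4 + 5 + 7 + 10 + 15) / 5 = 8.2
Step 2: Sum the deviation products and the squared deviations
Σ(X − X̄)(Y − Ȳ) = 49.8 and Σ(X − X̄)² = 32.8
Step 3: Compute the coefficients
β₁ = 49.8 / 32.8 ≈ 1.52 and β₀ = 8.2 − 1.52 × 5.2 ≈ 0.3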
Interpretation:
For every increase of 1 unit in X, Y increases by 1.52 units.
When X = 0, the predicted Y value is 0.3.
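The hand calculation can be reproduced step by step in R. This is a sketch of the least-squares arithmetic only; the names sxy and sxx for the two sums are chosen here and are not part of the notes.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)

# Step 1: sample means
x_bar <- mean(x)
y_bar <- mean(y)

# Step 2: sum of deviation products and sum of squared deviations
sxy <- sum((x - x_bar) * (y - y_bar))
sxx <- sum((x - x_bar)^2)

# Step 3: slope and intercept from the least-squares formulas
b1 <- sxy / sxx
b0 <- y_bar - b1 * x_bar
cat("Slope:", round(b1, 4), " Intercept:", round(b0, 4), "\n")
```

The two sums correspond to the totals 49.8 and 32.8 in the calculation, and the resulting coefficients should match what lm() reports for the same data.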
Implementation of Simple Linear Regression in R
Example Data:
Let’s use the same dataset:
X    Y
2    4
3    5
5    7
7    10
9    15
Part           Meaning
lm()           The function that performs linear regression.
y ~ x          Formula that defines the relationship between variables. Read y ~ x as “Y depends on X”. Here, y = dependent variable, x = independent variable.
data = data    Tells R to look for the variables x and y inside the data frame named data.
model <-       The <- operator stores the model’s output in an object named model. You can later print or summarize it.
Output
Call:
lm(formula = y ~ x, data = data)
Coefficients:
(Intercept) x
0.3049 1.5183
#summary output
Call:
lm(formula = y ~ x, data = data)
Residuals:
1 2 3 4 5
0.6585 0.1402 -0.8963 -0.9329 1.0305
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3049 1.0435 0.292 0.7892
x 1.5183 0.1800 8.434 0.0035 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Explanation:
• coef(model)[1] gives β₀ (intercept)
• coef(model)[2] gives β₁ (slope)
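# Predict Y for new X values (4, 6 and 8 reproduce the output shown later in these notes)
new_data <- data.frame(x = c(4, 6, 8))
predictions <- predict(model, newdata = new_data)
# Display predictions
cat("Predicted Y values:\n")
print(predictions)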
Explanation:
We use the predict() function to find new Y values based on the regression line.
Explanation:
• plot() draws the scatter plot of actual data.
• abline(model) automatically adds the best-fit regression line.
Interpretation:
• R² (R-squared) tells how well the regression line fits the data.
• Value close to 1 → excellent fit.
• Example: R-squared = 0.98 means 98% of variation in Y is explained by
X. That’s an excellent fit!
Component Result
Intercept (β₀) 0.3
Slope (β₁) 1.52
Regression Equation Ŷ = 0.3 + 1.52X
R² Value ≈ 0.96
Example Prediction If X = 6 → Ŷ = 9.42
Interpretation
This means that for every 1-unit increase in X, Y increases by approximately
1.52 units.
When X = 0, the predicted value of Y is around 0.3.
The model fits the data very well (since R² ≈ 0.96).
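R² can also be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below does this for the example data and compares it with the value stored in summary(); the names sse, sst, r2_manual, and r2_lm are illustrative.

```r
x <- c(2, 3, 5, 7, 9)
y <- c(4, 5, 7, 10, 15)
model <- lm(y ~ x)

# Residual (unexplained) and total variation in Y
sse <- sum(residuals(model)^2)
sst <- sum((y - mean(y))^2)

# R-squared = 1 - SSE / SST
r2_manual <- 1 - sse / sst

# Value computed by R for comparison
r2_lm <- summary(model)$r.squared
cat("Manual R-squared:", r2_manual, " lm R-squared:", r2_lm, "\n")
```

For this data set both values should agree with the R-squared of about 0.9595 printed in the program output.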
Complete R program (Simple Linear Regression)
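# Create data vectors
x <- c(2, 3, 5, 7, 9) # Independent variable
y <- c(4, 5, 7, 10, 15) # Dependent variable
# The steps between the data and R-squared are reconstructed from the output below
data <- data.frame(x, y); print("Data:"); print(data)
model <- lm(y ~ x, data = data)
print(model); print(summary(model))
predictions <- predict(model, newdata = data.frame(x = c(4, 6, 8)))
cat("Predicted Y values:\n"); print(predictions)
# R-squared value
r2 <- summary(model)$r.squared
cat("R-squared:", r2, "\n")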
Output
[1] "Data:"
x y
1 2 4
2 3 5
3 5 7
4 7 10
5 9 15
Call:
lm(formula = y ~ x, data = data)
Coefficients:
(Intercept) x
0.3049 1.5183
Call:
lm(formula = y ~ x, data = data)
Residuals:
1 2 3 4 5
0.6585 0.1402 -0.8963 -0.9329 1.0305
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.3049 1.0435 0.292 0.7892
x 1.5183 0.1800 8.434 0.0035 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Predicted Y values:
1 2 3
6.378049 9.414634 12.451220
R-squared: 0.9595301
Multiple Linear Regression
Key Points:
• It is an extension of Simple Linear Regression.
• It helps to determine the mathematical relationship among multiple predictor
(independent) variables and one outcome (dependent) variable.
• Also known as Multilinear Regression, it is used to analyze how several factors
together affect one response variable.
• The goal of multiple linear regression is to create a model that best fits the
relationship between the dependent and independent variables. It helps in predicting
or understanding how changes in the independent variables affect the dependent
variable.
Example:
Let’s consider a real-life example of multiple linear regression: Imagine you’re trying to
predict house prices based on various factors such as square footage, number of bedrooms,
and distance from the city centre.
Solution: In the case of multiple linear regression for predicting house prices,
Dependent Variable: House price.
Independent Variables (Predictors):
1. Square Footage: The size of the house in square feet.
2. Number of Bedrooms: The count of bedrooms in the house.
3. Distance from City Centre: How far the house is from the city centre.
Multiple Linear Regression can be used to predict house prices using various predictors like
square footage, number of bedrooms, and distance from the city centre.
The model analyzes how each of these independent variables influences the dependent
variable (house price). It helps in understanding how changes in property features affect the
price and can be used for future predictions based on new data.
Multiple Linear Regression is widely used in various fields for prediction and analysis:
Field                    Application
Economics                Forecasting GDP, inflation, or unemployment using economic indicators.
Marketing                Measuring the impact of pricing, advertising, and demographics on sales.
Healthcare               Studying how age, habits, and health factors affect disease outcomes.
Finance                  Predicting stock prices, asset values, and company performance.
Environmental Science    Estimating pollution levels or air/water quality using weather and geographical data.
In Simple Linear Regression, we study the relationship between one independent variable
(X) and one dependent variable (Y).
However, in real life, the outcome (Y) often depends on many factors, not just one.
The equation for a multiple linear regression model is:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + … + βₚXₚ + ε
Where
Symbol               Meaning
Y                    Dependent variable (the outcome we want to predict)
X₁, X₂, X₃, …, Xₚ    Independent variables (the factors affecting Y)
β₀                   Intercept: the value of Y when all X’s are 0
β₁, β₂, β₃, …, βₚ    Coefficients showing how much Y changes when one X changes (while others are constant)
ε                    Error term: accounts for random variation or unexplained factors
Example:
Let’s say we want to predict house prices based on multiple factors:
Y = β₀ + β₁X₁ + β₂X₂ + β₃X₃ + ε
Here:
• Y: House price (dependent variable)
• X₁: Square footage
• X₂: Number of bedrooms
• X₃: Distance from city center
# Display predictions
cat("Predicted Prices (in lakhs):\n")
print(predicted_prices)
Output
Square_Feet Bedrooms Distance Price
1 1000 2 15 50
2 1500 3 10 65
3 2000 3 5 80
4 2500 4 2 95
5 1800 3 8 75
Call:
lm(formula = Price ~ Square_Feet + Bedrooms + Distance, data = house_data)
Residuals:
1 2 3 4 5
-0.32895 0.06579 -0.06579 -0.32895 0.65789
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.894737 16.941524 0.761 0.586
Square_Feet 0.033553 0.007644 4.389 0.143
Bedrooms -0.526316 1.915818 -0.275 0.829
Distance 0.328947 0.741410 0.444 0.734
Once validated, use the model to make predictions or forecasts for new data.
Example: Predict the price of a house using its square footage, number of bedrooms, and
distance from the city center.
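The fit behind the output above can be sketched as follows. The data frame is copied from the printed house_data table and the formula from the printed Call; the new house (1600 sq ft, 3 bedrooms, 9 units from the centre) and the names house_model and new_house are hypothetical values made up for illustration.

```r
# House data as printed in the output
house_data <- data.frame(
  Square_Feet = c(1000, 1500, 2000, 2500, 1800),
  Bedrooms    = c(2, 3, 3, 4, 3),
  Distance    = c(15, 10, 5, 2, 8),
  Price       = c(50, 65, 80, 95, 75)
)

# Multiple linear regression with three predictors
house_model <- lm(Price ~ Square_Feet + Bedrooms + Distance, data = house_data)
print(coef(house_model))

# Hypothetical new house; column names must match the predictors
new_house <- data.frame(Square_Feet = 1600, Bedrooms = 3, Distance = 9)
predicted_price <- unname(predict(house_model, newdata = new_house))
cat("Predicted price:", round(predicted_price, 2), "\n")
```

With five observations and four estimated coefficients only one residual degree of freedom remains, which is why the p-values in the output are large; a realistic analysis would need considerably more data.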
Advanced Graphics
• Advanced plot customization in statistics involves fine-tuning visual elements of
graphs and charts to convey data effectively.
• It includes adjusting colours, adding annotations, changing axis scales, incorporating
multiple data series, and utilizing various plot types to enhance the representation of
statistical information.
• The advanced graph in statistics is a visual representation of the data.
• Advanced graph customization allows you to tailor every aspect of your
visualizations.
• This can involve adjusting axis limits, changing tick marks, customizing grid lines,
modifying line styles and marker shapes, incorporating annotations, adjusting legends,
applying colour palettes, and even creating interactive or 3D plots.
Characteristics of Advanced Graphs
1. Color and Style
Colour Palettes: Use meaningful colour combinations to make graphs clear and
visually appealing.
Line Styles and Markers: Change line types (solid, dashed, dotted) and markers
(circle, square, triangle) to differentiate data points easily.
2. Annotations (Adding Extra Info to Graphs)
Text Annotations: Add titles, labels, or notes to explain key points in your graph.
Example: Label the highest or lowest value on the chart.
Arrows and Lines: Use arrows or lines to draw attention to specific trends or
important data points.
3. Axis Customization
Tick Marks and Labels: Change how numbers or text appear on the x and y axes for
clarity. Example: Show ticks every 10 units instead of every 1 unit.
Logarithmic Scales: Use when data values cover a wide range (like 1, 10, 100, 1000)
to make graphs easier to read.
4. Legend (Explains the Graph)
Positioning: Move the legend (top, bottom, left, or right) to where it looks neat and
doesn’t cover the data.
Example: Place it at the top for bar charts or at the side for line graphs.
Custom Labels: Provide clear and concise labels for each series in the legend.
5. Faceting and Multiple Plots:
Facet Grids: Use facets to create multiple plots based on different categories, making
it easier to compare subsets of the data.
Arrangement: Adjust the arrangement of subplots to create a coherent and
informative layout.
6. Statistical Summaries:
Error Bars: Include error bars to show variability or uncertainty in data.
Regression Lines: Add regression lines or other statistical summaries to illustrate
trends.
7. Background and Grid Lines:
Background Colour: Customize the background colour to improve contrast and
aesthetics.
Grid Lines: Adjust the appearance of grid lines to guide the viewer’s eye.
8. Export and Save Options:
High-Resolution Output: Ensure that exported graphs are of high resolution for
publications or presentations.
Save in Multiple Formats: Save plots in different formats (PDF, PNG, SVG) for
versatility.
Plot Customization:
Plot customization means fine-tuning the visual elements of a graph so that complex data
is conveyed effectively.
Customizing plots in R can be done using various functions and parameters to modify the
appearance of different plot elements. Here’s a basic overview of how you can customize
plots in R:
1. Titles and Labels
2. Colors and Symbols
3. Adding a Legend
4. Axis Customization
5. Lines
6. Annotations
7. Plot Range
8. Defining a Particular Layout
1. Titles and Labels: You can add titles to your plot and label the axes to describe your
data.
• main: Main title of the plot.
• xlab and ylab: Labels for the x and y-axes.
x <- 1:10
y <- sin(x)
plot(x, y, main="Sine Wave", xlab="X-axis", ylab="Y-axis")
Output
2. Colors and Symbols
You can change the colour of points or lines and choose different symbols for
data points.
• col → Changes the color.
• pch → Changes the symbol type (1 = circle, 2 = triangle, etc.).
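A short sketch of col and pch in use is given below. The rule for colouring (blue for non-negative values, red otherwise) and the name point_cols are assumptions made for this example only.

```r
x <- 1:10
y <- sin(x)

# One colour per point: blue when y >= 0, red otherwise
point_cols <- ifelse(y >= 0, "blue", "red")

# pch = 17 draws filled triangles; col accepts a vector of colours
plot(x, y, col = point_cols, pch = 17,
     main = "Colours and Symbols", xlab = "X-axis", ylab = "Y-axis")
```

Passing a vector to col or pch lets each point carry its own colour or symbol, which is a simple way to distinguish groups.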
3. Adding a Legend
A legend helps identify what each colour or symbol in the plot represents.
• legend() → Adds a legend box.
Example
plot(x, y, col="red", pch=3)
legend("topright", legend="Data Points", col="red", pch=3)
4. Axis Customization
You can modify the appearance of the X and Y axes — such as tick marks and limits.
• xlim, ylim → Set axis limits.
• xaxt, yaxt → Hide or show axes.
• axis() → Manually customize axis ticks.
Example
plot(x, y, xlim=c(0, 15), ylim=c(-1.5, 1.5), main="Custom Axes")
Output
axis() Function
The axis() function is used to manually add or customize the axes after the plot is created.
Syntax
axis(side, at, labels, col, lty, ...)
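Argument    Meaning
side        Side of the plot on which the axis is drawn: 1 = bottom, 2 = left, 3 = top, 4 = right.
at          Positions at which tick marks are drawn.
labels      Labels for the tick marks (TRUE for the default numeric labels, or a vector of labels).
col         Colour of the axis line and tick marks.
lty         Line type of the axis line.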
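5. Lines
Syntax of lines():
lines(x, y, col, lty, lwd)
Parameter    Description
x, y         Coordinates of the points to be joined.
col          Line colour.
lty          Line type (1 = solid, 2 = dashed, 3 = dotted).
lwd          Line width.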
Example
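# Create data
x <- 1:10
y1 <- x^2
# lines() adds to an existing plot, so open an empty plot first
plot(x, y1, type = "n", main = "Using lines()", xlab = "X", ylab = "Y")
lines(x, y1, type = "o", col = "blue", lty = 2, lwd = 2)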
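Syntax of abline():
abline(a, b, col, lty, lwd)
Parameter    Description
a, b         Intercept and slope of the line.
h            y-value at which to draw a horizontal line (used as abline(h = ...)).
v            x-value at which to draw a vertical line (used as abline(v = ...)).
col          Line colour.
lty          Line type.
lwd          Line width.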
x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a horizontal line at y=50
abline(h=50, col="red", lty=2)
x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a vertical line at x=5
abline(v=5, col="red", lty=2)
x <- 1:10
y <- x^2
plot(x, y, main="Using abline()", xlab="X", ylab="Y")
# Add a regression line
model<-lm(y~x)
abline(model, col="red", lty=2)
6. Annotations: Annotations are extra pieces of information added directly onto a plot —
usually as text, arrows, or labels — to explain or highlight something important in the data.
text(): The text() function adds text annotations to a plot.
Example:
x <- 1:10
y <- x^2
plot(x, y, main="Using text()", xlab="X", ylab="Y")
text(x, y, labels = y, pos = 1, col = "purple")
7. Plot Range: It controls the minimum and maximum values shown along each axis.
xlim and ylim: Set the range of the x and y axes.
x <- 1:10
y <- x^2
plot(x, y, xlim = c(0, 15), ylim = c(0, 150), main = "Custom Plot Range")
8. Defining a Particular Layout
The arrangement of plots in a single window can be controlled using the layout() function in
R. This function allows you to divide the plotting area into multiple regions (panels),
so you can display multiple plots in one figure.
#1. Creates a matrix from the vector
lay_Mat <- matrix(c(1, 3, 2, 3), 2, 2)
print(lay_Mat)
Output
     [,1] [,2]
[1,]    1    2
[2,]    3    3
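Each number in the layout matrix names the panel that occupies that cell, so panel 3 spans the whole bottom row. The sketch below passes the matrix to layout() and fills the three panels; the plots themselves are arbitrary placeholders chosen for illustration.

```r
# Layout matrix: panels 1 and 2 on top, panel 3 across the bottom
lay_mat <- matrix(c(1, 3, 2, 3), 2, 2)

# layout() returns the number of panels it defined
n_panels <- layout(lay_mat)

plot(1:10, main = "Panel 1")                   # top left
barplot(c(3, 5, 2), main = "Panel 2")          # top right
plot(1:10, (1:10)^2, type = "l",
     main = "Panel 3 (full width)")            # bottom row

# Return to a single-panel layout
layout(1)
```

Calling layout.show(n_panels) right after layout() is a convenient way to preview the panel arrangement before plotting.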
Plotting Regions and Margins
Definition:
Regions are areas of a graph that can be resized, relocated, or formatted with colors, borders,
or other visual properties.
• The figure region is the space containing the axes, labels, and title.
• The outer region (outer margins) is the space around the figure region, useful for
additional text or decorations.
• The margins are the space between the plot area and the border of the figure. Margins
can be adjusted with the mar parameter.
Three Main Plot Regions in R Graphics
• Plot Region:
This is the main area where your actual data (points, lines, bars, etc.) is drawn. It uses
the coordinate system defined by the X and Y axes.
• Figure Region:
Contains the plot region along with space for axis labels and titles. Often referred to
as figure margins.
• Outer Region:
The area outside the figure region, used for additional annotations, global titles, or
legends that apply to multiple plots.
1. Plotting Regions
Use rect() to draw rectangles or regions on a plot.
Example:
# Draw a rectangle from (1, 2) to (4, 6)
plot(1:10, type = "n") # Create an empty plot
rect(1, 2, 4, 6, col = "lightblue", border = "blue")
2. Plotting Margins
Margins are the spaces between the chart border and the plot area (canvas). You can adjust
margins on any of the four sides: bottom, left, top, and right. Use the par() function with the
mar parameter to change the margin sizes.
The values represent the number of text lines used as margin space.
par(mar = c(bottom, left, top, right))
Example
par(mar = c(6, 2, 4, 5)) # Set margins (bottom, left, top, right)
plot(1:10)
Outer margin (oma)
• The outer margins are especially useful for adding text to a combination of plots.
• The current outer margins can be accessed with par("oma").
In this example we set
par(oma = c(6, 1, 2, 7) )
plot(1:20,col="green")
This means:
• Six lines are displayed below the plot,
• One on the left,
• Two on the top, and
• Seven on the right.
Syntax of mtext()
mtext(text, side, line = 0, outer = FALSE, at = NA, adj=NA, col = NA, font = NA, ...)
Argument    Description
text        The text string you want to add (e.g., "My Label").
side        The side of the plot where text should appear: 1 = bottom, 2 = left, 3 = top, 4 = right (default side is 3).
line        The line (distance) from the plot margin where the text is placed. Increasing this moves the text farther away from the plot.
outer       Logical (TRUE or FALSE). If TRUE, text is placed in the outer margin (outside the figure region).
at          Position along the axis (e.g., where along x or y to place the text).
adj         Horizontal alignment of text (0 = left, 0.5 = center (default), 1 = right).
col         Text color.
font        Font style (1 = plain, 2 = bold, 3 = italic, 4 = bold italic).
...         Other graphical parameters like family, las, etc.
Example
par(oma=c(1,4,3,2), mar=c(4:7))
plot(1:10)
box("figure", lty=2,col="orange")
box("outer", lty=3,col="darkblue")
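The outer margin is most useful when several plots share one title. The following sketch reserves three lines of outer margin at the top and writes a common heading there with mtext(); the panel contents and the title text are placeholders chosen for this example.

```r
# Two panels side by side, with 3 lines of outer margin on top
old_par <- par(mfrow = c(1, 2), oma = c(0, 0, 3, 0))

plot(1:10, main = "Left panel")
plot(10:1, main = "Right panel")

# outer = TRUE places the text in the outer margin, above both panels
mtext("Shared title in the outer margin", side = 3, line = 1,
      outer = TRUE, font = 2)

# Record the outer margins in use, then restore the previous settings
oma_used <- par("oma")
par(old_par)
```

Saving the value returned by par() and restoring it afterwards keeps later plots from inheriting these settings.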
Clipping means restricting (or “cutting off”) anything that is drawn outside a specific area
of the plot. In other words, Clipping controls where R is allowed to draw things (like points,
lines, or legends). You can control clipping behaviour using the parameter xpd inside par()
function.
Setting                  Meaning
xpd = FALSE (Default)    Drawing is clipped to the plot region.
xpd = TRUE               Drawing is clipped to the figure region, so it can extend into the figure margins.
xpd = NA                 Drawing is clipped to the device region, so it can also extend into the outer margins.
Example1
par(xpd = TRUE) # Allow drawing outside plot region
plot(1:10, 1:10, main = "xpd = TRUE Example")
text(10, 11, "Outside!", col = "red", cex = 1.2)
Example2
# Sample dataset
sample_data <- data.frame(
x = 1:10,
y = c(3, 5, 7, 6, 9, 8, 10, 12, 11, 13),
group = rep(1:2, each = 5) # Two groups: 1 and 2
)
print(sample_data)
# Set margin and allow drawing outside the plot
par(mar = c(4, 4, 3, 4), xpd = TRUE)
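Example2 stops after setting the margins. One way it might continue is to place a legend in the right-hand margin, which only works because xpd = TRUE relaxes clipping. This is a sketch under that assumption: the wider right margin, the inset value, and the legend labels are illustrative choices, not taken from the notes.

```r
# Sample dataset with two groups
sample_data <- data.frame(
  x = 1:10,
  y = c(3, 5, 7, 6, 9, 8, 10, 12, 11, 13),
  group = rep(1:2, each = 5)
)

# Wider right margin (8 lines) and clipping relaxed to the figure region
old_par <- par(mar = c(4, 4, 3, 8), xpd = TRUE)

plot(sample_data$x, sample_data$y, col = sample_data$group, pch = 19,
     main = "Legend outside the plot", xlab = "x", ylab = "y")

# A negative horizontal inset pushes the legend into the right margin
legend("topright", inset = c(-0.3, 0), legend = c("Group 1", "Group 2"),
       col = 1:2, pch = 19, title = "Group")

# Restore the previous graphical parameters
par(old_par)
```

With the default xpd = FALSE the same legend() call would be clipped at the edge of the plot region and would not be visible in the margin.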
Example
plot(1, 1)
lst <- locator(type = "o", pch = 4, lty = 2, lwd = 3, col = "red")
print(lst)
Here:
• type = "o" → draws both points and lines
• pch = 4 → symbol shape
• lty = 2 → dashed line
• col = "red" → color of the line/points
Output
$x
[1] 0.6305104 0.6956254 0.7637690 0.6305104
$y
[1] 0.9767428 1.2892597 1.0094480 0.9767428
3. Ad Hoc Annotation
• The locator() function can also be used for interactive annotations — like adding
legends or labels at a specific spot.
• Since it returns valid (x, y) coordinates, you can use those coordinates to position text
or legends dynamically.
Example 1
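# Step 1: Create a simple scatter plot
x <- 1:10
y <- c(2, 3, 5, 4, 7, 8, 6, 9, 10, 11)
plot(x, y, pch = 19, col = "blue",
main = "Click on a point to add a label",
xlab = "X-axis", ylab = "Y-axis")
# Step 2: Wait for one mouse click and store its coordinates (interactive sessions only)
pt <- locator(1)
# Step 3: Write a label at the clicked position
text(pt$x, pt$y, labels = "You clicked here!")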
Output
How This Works
1. The plot appears with blue dots.
2. R waits for you to click once (because of locator(1)).
3. Wherever you click, R saves the x and y coordinates in pt.
4. The text() function then writes “You clicked here!” exactly at that position.
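# Ask how many points to label (run interactively, with a plot already open)
n <- as.integer(readline("How many points do you want to label? "))
i <- 1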
while (i <= n) {
cat("Click", i, "on the plot...\n")
pt <- locator(1)
label <- paste("Point", i)
text(pt$x, pt$y, labels = label, col = "red", pos = 4)
cat("You clicked at: X =", pt$x, ", Y =", pt$y, "\n")
i <- i + 1
}
cat(" Done! All points labeled.\n")
Output
How many points do you want to label? 6
Click 1 on the plot...
You clicked at: X = 1.01956 , Y = 2.068436
Click 2 on the plot...
You clicked at: X = 2.024678 , Y = 2.96783
Click 3 on the plot...
You clicked at: X = 3.046832 , Y = 4.971027
Click 4 on the plot...
You clicked at: X = 3.96677 , Y = 4.030751
Click 5 on the plot...
You clicked at: X = 4.954852 , Y = 7.015106
Click 6 on the plot...
You clicked at: X = 6.045149 , Y = 7.914501
Done! All points labeled.
You can customize traditional plots extensively. R allows us to customize plots to make them
more colourful, neat, and informative. You can change colours, point shapes, labels, axes, and
boxes to make your graphs look better and easier to read.
Example
print(head(mtcars, 10)) # First 10 rows of the built-in mtcars data
hp <- mtcars$hp
mpg <- mtcars$mpg
wtex <- mtcars$wt / mean(mtcars$wt)
# Adjust margins using par()
par(mar = c(5, 4, 4, 2)) # (bottom, left, top, right)
Output
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
2. Customizing Boxes
You can add or change the box around your plot using the box() function. The box() function
in R is used to draw a box around the current plot region. It helps highlight the plotting area
or customize its boundary style.
Syntax:
box(which = "plot", lty = "solid", lwd = 1, col = "black", bty = "o")
Argument    Description
which       Specifies which box to draw: "plot", "figure", "inner", or "outer" (default = "plot")
lty         Line type (1 = solid, 2 = dashed, etc.)
lwd         Line width
col         Box color
bty         Box type (same as in par()):
            • "o" = full box (default)
            • "l" = L-shaped box
            • "7", "c", "u", "]" = different open-sided styles
            • "n" = no box
3. Customizing Axes
The axis() function lets you control how and where axes appear on the plot.
You can specify which side of the plot the axis should appear on and customize tick marks
and labels.
Syntax:
axis(side, at = NULL, labels = TRUE)
Parameters:
• side: Defines which side to draw the axis on.
o 1 = bottom
o 2 = left
o 3 = top
o 4 = right
• at: Points at which to draw tick marks.
• labels: Labels to display at tick marks (TRUE by default).
Example
x <- 1:5
y <- x * x
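The example above breaks off after the data are created. A possible continuation, offered only as a sketch, suppresses the default axes and redraws them with axis(); the tick labels and the name x_ticks are invented for illustration.

```r
x <- 1:5
y <- x * x

# xaxt = "n" and yaxt = "n" suppress the default axes
plot(x, y, xaxt = "n", yaxt = "n", pch = 19,
     main = "Custom Axes", xlab = "X", ylab = "Y")

# Bottom axis: one tick per data point with text labels
x_ticks <- axis(side = 1, at = x,
                labels = c("one", "two", "three", "four", "five"))

# Left axis: ticks every 5 units, labels drawn horizontally (las = 1)
axis(side = 2, at = seq(0, 25, by = 5), las = 1)
```

Drawing the axes separately gives full control over tick positions and labels, at the cost of a couple of extra lines.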
R provides tools to control fonts, add mathematical symbols, and use colors in plots to make
them more visually appealing and informative.
a. Font Control
Example
par(mar = c(3, 3, 3, 3))
plot(1, 1, type = "n", xlim = c(-1, 1), ylim = c(0, 7), xaxt = "n", yaxt = "n", ann = FALSE)
text(0.6, 6, "Sans text (default)", family = "sans", font = 1)
text(0.5, 5, "Serif text Bold", family = "serif", font = 2)
text(0.4, 4, "Mono text Italic", family = "mono", font = 3)
text(0.3, 3, "Mono bold italic", family = "mono", font = 4)
b. Greek Symbols
Note:
• expression() helps display symbols or equations.
• paste() combines text and symbols together.
c. Mathematical Expressions
You can show full mathematical formulas using expression() and paste().
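A small sketch of expression() and paste() in plot labels follows. The curve and the formula for the sample mean are chosen here purely to demonstrate the syntax; the names title_expr and xlab_expr are illustrative.

```r
x <- seq(0, 2 * pi, length.out = 50)

# paste() inside expression() joins plain text with a formula
title_expr <- expression(paste("Sine curve: ", y == sin(theta)))

# ~ inserts a space between a symbol and quoted text
xlab_expr <- expression(theta ~ "(radians)")

plot(x, sin(x), type = "l", main = title_expr,
     xlab = xlab_expr, ylab = expression(sin(theta)))

# A fuller formula: the sample mean written with bar(), frac() and sum()
text(pi, 0.5, expression(bar(x) == frac(sum(x[i], i == 1, n), n)))
```

Names such as theta, alpha, and beta are rendered as Greek letters, and == is drawn as a single equals sign.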
Defining Colors in R
Example
plot(1:10, col = "#FFA500", pch = 16, main = "Hexadecimal Color")
3. RGB Values
Use the rgb() function to mix red, green, and blue values (0–1 range).
Color Code
Red rgb(1, 0, 0)
Green rgb(0, 1, 0)
Blue rgb(0, 0, 1)
Yellow rgb(1, 1, 0)
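rgb() returns a hexadecimal colour string, so its result can be used anywhere a colour name is accepted. The sketch below shows both the 0–1 scale from the table and the 0–255 scale via maxColorValue; the variable names are illustrative.

```r
# Components on the default 0-1 scale
red_hex    <- rgb(1, 0, 0)
yellow_hex <- rgb(1, 1, 0)

# Components on a 0-255 scale, selected with maxColorValue
orange_hex <- rgb(255, 165, 0, maxColorValue = 255)

print(c(red_hex, yellow_hex, orange_hex))

plot(1:3, col = c(red_hex, yellow_hex, orange_hex), pch = 16, cex = 3,
     main = "Colours built with rgb()")
```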
4. HCL Values
hcl() uses Hue, Chroma, and Luminance: hue is an angle (0–360°), chroma starts at 0 with an
upper limit that depends on hue and luminance, and luminance runs from 0 to 100.
Example:
plot(1:10, col = hcl(240, 100, 50), pch = 16, main = "HCL Color")
Plotting in Higher Dimensions
Plotting in higher dimensions often involves techniques like 3D plots, color mapping, facet
grids, or even interactive visualizations to represent more than three dimensions effectively.
Representing and Using Colour: Colour plays an important role in data visualization.
It can enhance aesthetics or aid interpretation by distinguishing values and variables.
Understanding how R handles colour helps in effective visualization.
1. RGB Color Representation
➢ Colors are defined by combinations of three primary colors: Red, Green, Blue
(RGB).
➢ Each component has an intensity from 0 to 255.
➢ RGB combinations yield 16,777,216 (256³) possible colors.
Expressed as (R, G, B) triplets:
(0, 0, 0) → black
(255, 255, 255) → white
(0, 255, 0) → green
Using Colors in R
✓ R includes 650+ named colors accessible via colors() function.
colors()
[1] "white" "aliceblue" "antiquewhite"
[4] "antiquewhite1" "antiquewhite2" "antiquewhite3"
[7] "antiquewhite4" "aquamarine" "aquamarine1"
[10] "aquamarine2" "aquamarine3" "aquamarine4"
[13] "azure" "azure1" "azure2"
✓ Default R palette can be viewed or set using palette() function.
To view the current default palette call function palette().
>palette()
[1] "black" "#DF536B" "#61D04F" "#2297E6" "#28E2E5"
"#CD0BBC" "#F5C710" "gray62"
To set a custom palette pass a vector of color names or hexadecimal
codes to palette() function.
>palette(c("red","green","blue","yellow"))
For example, the hex codes for "black", "green3", and "pink" are "#000000", "#00CD00",
and "#FFC0CB" respectively.
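Named colours can be converted to hex codes by combining col2rgb() and rgb(). This is one possible way to obtain such codes, shown as a sketch; the notes do not say which method was originally used, and the names rgb_matrix and hex_codes are illustrative.

```r
named_cols <- c("black", "green3", "pink")

# col2rgb() returns a 3-row matrix of red, green, blue values (0-255)
rgb_matrix <- col2rgb(named_cols)
print(rgb_matrix)

# Transpose so each colour is a row, then convert to hex strings
hex_codes <- rgb(t(rgb_matrix), maxColorValue = 255)
print(hex_codes)
```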
Built-in Palettes:
When you make plots in R, you often want to use different colours to make
them look clear and attractive. A colour palette is a group of colours that R can
generate for you automatically.
Why Use Palettes?
• To create smooth colour transitions (e.g., heat maps, elevation maps).
• To represent values that change gradually.
• To choose many colours easily without selecting them manually.
Example
plot(1:10, col = rainbow(10), pch = 19)
Custom Palettes in R
Sometimes the built-in color palettes (like rainbow(), heat.colors(),
terrain.colors(), etc.) are not enough. In such cases, you can create your own color
palette using the colorRampPalette() function.
What is colorRampPalette()?
• It is a function that creates color gradients between two or more colors.
• You provide a vector of color names, and it returns a new function.
• This new function can generate any number of colors smoothly
transitioning between the given ones.
Syntax
palette_function <- colorRampPalette(colors = c("color1", "color2", ...))
Example 1
To generate Purple to Yellow Gradient
purple_yellow <- colorRampPalette(colors = c("purple", "yellow"))
plot(1:10, col = purple_yellow(10), pch = 19)
Example 2
To generate red, orange, yellow green gradient
red_o <- colorRampPalette(colors = c("red", "orange","yellow","green"))
plot(1:30,col=red_o(30),pch=19,cex=2)
3D SCATTER PLOTS:
scatterplot3d(
x = height,
y =weight,
z = age,
color=color,
pch=shapes,
xlab = "Height",
ylab = "Weight",
zlab = "Age",
main = "Height,Weight & Age"
)
# 3D Scatter Plot
scatterplot3d(
x = height,
y = weight,
z = age,
color = color,
pch = shapes,
xlab = "Height",
ylab = "Weight",
zlab = "Age",
main = "Height, Weight & Age (Shape by Age Group)"
)
# Add Legend
legend("topright",
legend = c("18-20 (Circle)", "21-23 (Triangle)", "24-25 (Square)"),
pch = c(16, 17, 15),
col = c("green","red","blue"))
3. Remove the box around the plot
We can remove the box around the plot by setting the box parameter of the scatterplot3d()
function to FALSE.
library(scatterplot3d)
x <- 1:10
y <- x
z <- x^2
scatterplot3d(x, y, z,color=rainbow(10),pch=c(1:10),box=FALSE)
Example 1
library(scatterplot3d)
x <- 1:10
y <- x
z <- x^2
scatterplot3d(
x, y, z,
type = "h", # adds vertical lines from points to base
lty.hplot = 2, # dashed lines for verticals
highlight.3d = TRUE
)
Example2
library(scatterplot3d)
lty.hplot = 2 Makes the “h” lines dashed instead of solid (line type 2).
lty.hide = 3 Makes the non-visible box sides dotted (line type 3).