0% found this document useful (0 votes)
11 views4 pages

Implementing Linear Regression in Python

The document outlines an experiment to implement Linear Regression, a popular machine learning algorithm used for predictive analysis of continuous variables. It explains the mathematical representation, types of linear regression (simple and multiple), and provides a Python program to estimate coefficients and plot the regression line. The conclusion states that the implementation of linear regression in Python was successful.

Uploaded by

patilmanasi0904
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views4 pages

Implementing Linear Regression in Python

The document outlines an experiment to implement Linear Regression, a popular machine learning algorithm used for predictive analysis of continuous variables. It explains the mathematical representation, types of linear regression (simple and multiple), and provides a Python program to estimate coefficients and plot the regression line. The conclusion states that the implementation of linear regression in Python was successful.

Uploaded by

patilmanasi0904
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd

lOMoAR cPSD| 59981395

Experiment No. 1

Aim: To implement Linear Regression.


Theory:
Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis. Linear
regression makes predictions for continuous/real or numeric variables such as
sales, salary, age, product price, etc.
Linear regression algorithm shows a linear relationship between a dependent (y)
and one or more independent (y) variables, hence called as linear regression.
Since linear regression shows the linear relationship, which means it finds how
the value of the dependent variable is changing according to the value of the
independent variable.
The linear regression model provides a sloped straight line representing the
relationship between the variables. Consider the below image:

Mathematically, we can represent a linear regression as:


y= a0+a1x+ ε
lOMoAR cPSD| 59981395

Here,
Y= Dependent Variable (Target Variable)
X= Independent Variable (predictor Variable)
a0= intercept of the line (Gives an additional degree of
freedom) a1 = Linear regression coefficient (scale factor
to each input value). ε = random error
The values for x and y variables are training datasets for Linear Regression
model representation.
Types of Linear Regression
Linear regression can be further divided into two types of the algorithm:
• Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Simple Linear Regression.
• Multiple Linear regression:
If more than one independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression algorithm is
called Multiple Linear Regression.

Program:
import numpy as np
import [Link] as plt
def estimate_coef(x,
y):
# number of
observations/points
n = [Link](x)

# mean of x and y
vector m_x =
[Link](x) m_y =
[Link](y)

# calculating cross-deviation and deviation about x


SS_xy = [Link](y * x) - n * m_y *
m_x
SS_xx = [Link](x * x) - n * m_x *
m_x
lOMoAR cPSD| 59981395

# calculating regression
coefficients b_1 = SS_xy / SS_xx
b_0 = m_y - b_1 * m_x

return (b_0, b_1)


def plot_regression_line(x,
y, b):
# plotting the actual points as scatter
plot [Link](x, y, color="m",
marker="o", s=30)

# predicted response vector


y_pred = b[0] + b[1] * x

# plotting the regression line


[Link](x, y_pred, color="g")

# putting labels
[Link]('x')
[Link]('y')

# function to show plot


[Link]()
def
main():
# observations / data
x = [Link]([0, 1, 2, 3, 4, 5, 6, 7, 8,
9]) y = [Link]([1, 3, 2, 5, 7, 8, 8, 9,
10, 12])

# estimating
coefficients b =
estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {} \nb_1 =
{}".format(b[0], b[1]))

# plotting regression line


plot_regression_line(x, y, b)
if __name__ ==
"__main__":
main()
lOMoAR cPSD| 59981395

Output:

Conclusion: Thus, we have successfully implemented linear regression in


python.

Common questions

Powered by AI

Simple linear regression involves using a single independent variable to predict the dependent variable, resulting in a formula Y = a0 + a1X + ε. The model is characterized by a single slope and intercept. Conversely, multiple linear regression incorporates more than one independent variable, extending the formula to Y = a0 + a1X1 + a2X2 + ... + anXn + ε, where X1, X2,..., Xn are independent variables each with their own linear regression coefficient (a1, a2,..., an). This results in a more complex model that can capture more intricate relationships between variables, but it also increases the risk of overfitting and demands careful selection and interpretation of coefficients .

Numpy facilitates the computational process of linear regression by providing efficient data structures and operations for handling numerical calculations. It supports array manipulations needed for mean calculations, deviation analyses, and performing algebraic operations required for estimating regression coefficients. Matplotlib assists in the visualization process by offering tools for creating plots, allowing the visualization of the relationship between data points, and the regression line, which aids in the interpretation of how well the regression model fits the data. Together, these libraries streamline the development process, making computation and visualization more accessible and efficient .

The calculation of cross-deviation (SS_xy) and deviation about the independent variable (SS_xx) are critical steps in deriving linear regression coefficients. SS_xy captures the covariance between X and Y, indicating how much they vary in relation to one another, while SS_xx measures the variance of X. These calculations enable the determination of the slope (b1) of the regression line as they are used to compute b1 = SS_xy / SS_xx, where b1 quantifies the change in the dependent variable (Y) for a unit change in the independent variable (X). Accurately calculating these deviations allows for an accurate estimation of the regression slope, which is pivotal for predictive accuracy .

From the Python implementation, the linear regression model's accuracy can be assessed by comparing the regression line to empirical data points, visually determining how well the model predicts the dependent variable. Real-world applicability is demonstrated through the model's simplicity and efficiency in establishing relationships between variables. However, model accuracy depends on the linear relationship assumption and the absence of multicollinearity, especially in multiple regression. Thus, while linear regression is applicable in many scenarios, its use in complex real-world problems requires ensuring model assumptions hold true and sufficient data pre-processing to maintain efficacy .

The random error (ε) in the linear regression equation captures the variability in the dependent variable (Y) that is not explained by the independent variable(s) (X). This error term acknowledges that the model may not perfectly predict every outcome due to unmeasured variables, random fluctuations, or model simplification. Factors influencing its magnitude include the inherent noise in the data, measurement errors, omitted variables that affect Y, and possible model misspecification. A smaller random error indicates a more accurate model, while a larger error suggests the presence of additional factors or variability not captured in the model .

Visualizing the regression line alongside actual data points in linear regression provides immediate insights into model performance. It allows for visual assessment of the fit, revealing how closely the regression line approximates the real data distribution. Observations can be made about regions where the model overestimates or underestimates the dependent variable. Such visualization assists in detecting heteroscedasticity, outliers, or non-linear patterns that may necessitate model adjustments or indicate poor model applicability. This practice enhances understanding of the relationship modeled, potentially guiding model refinements .

Using a single independent variable in simple linear regression simplifies model interpretation and reduces data requirements, making computations less intensive. However, it limits the model's ability to capture complex patterns, potentially leading to biased estimates if relevant variables are omitted. Conversely, multiple linear regression increases model flexibility and explanatory power by considering several variables simultaneously. This can improve prediction accuracy but introduces complexities such as multicollinearity, higher data requirements, and increased computational cost. The choice between them depends on the specific context and the complexity of relationships among the variables involved .

The key components of a linear regression model include the dependent variable (Y), the independent variable (X), the intercept of the line (a0), the linear regression coefficient (a1), and the random error (ε). The dependent variable represents the outcome we aim to predict. The independent variable(s) serve as the predictors for the dependent variable. The intercept (a0) provides an additional degree of freedom by allowing the regression line to intersect the Y-axis. The linear regression coefficient (a1) acts as a scale factor to each input value, determining the slope of the line, which indicates how much the dependent variable changes per unit change in the independent variable. The random error (ε) accounts for the variability in Y that X does not explain .

The implementation of a linear regression model in Python demonstrates the machine learning process through several key stages. Initially, data input occurs via defining observations of independent and dependent variables, exemplified in numpy arrays (x, y). Next, the model estimates coefficients using the `estimate_coef` function, which involves mathematical calculations, such as mean computation and deviation analysis to obtain slope and intercept coefficients. Following this, these coefficients are utilized in the predictive model represented by a line, which is calculated using the regression equation. Lastly, the `plot_regression_line` function showcases visualization, where the actual data points and the regression line are plotted using matplotlib, allowing for visual assessment of the model's performance against the observed data flow ends with displaying the plotted graph .

Implementing a linear regression model in Python serves significant educational purposes for students learning machine learning. It offers practical experience in understanding fundamental concepts, such as data pre-processing, parameter estimation, and model evaluation. The exercise reinforces theoretical knowledge through coding practice, bridging the gap between theory and application. Students enhance their problem-solving skills by debugging, fine-tuning models, and interpreting outputs. Additionally, it fosters an appreciation for the role of libraries like numpy and matplotlib in simplifying complex processes, preparing students for more advanced machine learning concepts and projects .

You might also like