
Regression Assignment

The document outlines a regression assignment requiring the application of a complete machine learning pipeline on a selected dataset. Key steps include data loading, cleaning, outlier analysis, exploratory data analysis, encoding, scaling, model training, and evaluation. Additionally, optional tasks include model interpretation and comparisons with other regression techniques.

Uploaded by

bashaadel422
Copyright
© All Rights Reserved

REGRESSION ASSIGNMENT

Objective:
Select any regression dataset (from Kaggle or any open-source repository) and apply
the complete ML pipeline covered in class, including data understanding, cleaning,
visualization, outlier detection, encoding, scaling, model training, evaluation, and
interpretation.
Instructions:
Choose a regression dataset with:
- At least 1000 rows
- 1 numerical target variable
- Mixed numerical and categorical features
Required Steps:
1) Load & Inspect the Dataset
- Load dataset using pandas
- Show first 10 rows
- Print .info() and .describe()
- Identify numerical vs categorical features
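
A minimal sketch of step 1. The synthetic frame below is a hypothetical stand-in so the snippet runs on its own; with a real dataset it would be replaced by something like pd.read_csv on your chosen file. All column names (area, rooms, city, price) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data; a real submission would load its own file,
# e.g. df = pd.read_csv("your_dataset.csv")  (file name is an assumption).
rng = np.random.default_rng(42)
n = 1000
df = pd.DataFrame({
    "area": rng.normal(120, 30, n),          # numerical feature
    "rooms": rng.integers(1, 7, n),          # numerical feature
    "city": rng.choice(["A", "B", "C"], n),  # categorical feature
    "price": rng.normal(300, 50, n),         # numerical target
})

print(df.head(10))    # first 10 rows
df.info()             # dtypes and non-null counts
print(df.describe())  # summary statistics for numerical columns

# Split column names by dtype
numerical_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()
print("Numerical:", numerical_cols)
print("Categorical:", categorical_cols)
```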
2) Handle Missing Values
- Check for null values
- Remove or impute
- Explain the choice
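
One possible way to approach step 2, shown on a tiny hypothetical frame. Median imputation for a numerical column and mode imputation for a categorical column are assumptions chosen for illustration; the assignment leaves the choice (and its justification) to you.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per feature column
df = pd.DataFrame({
    "area": [50.0, np.nan, 80.0, 120.0, 95.0],
    "city": ["A", "B", None, "B", "B"],
    "price": [100.0, 150.0, 160.0, 240.0, 190.0],
})

# Count nulls per column before doing anything
missing_before = df.isnull().sum()
print(missing_before)

# Impute: median for the numerical column, most frequent value for the categorical one
df["area"] = df["area"].fillna(df["area"].median())
df["city"] = df["city"].fillna(df["city"].mode()[0])

missing_after = df.isnull().sum()
print(missing_after)
```

Dropping rows with df.dropna() is the alternative; it is simpler but discards data, which matters more when many rows are affected.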
3) Remove Duplicates
- Detect duplicated rows
- Remove them
- Report how many were removed
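
Step 3 can be sketched as follows, again on a small hypothetical frame whose values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data: rows 2 and 4 repeat rows 0 and 1
df = pd.DataFrame({
    "area": [50, 60, 50, 70, 60],
    "city": ["A", "B", "A", "C", "B"],
    "price": [100, 120, 100, 140, 120],
})

# duplicated() flags every repeat of an earlier row
n_duplicates = int(df.duplicated().sum())
rows_before = len(df)

# Keep the first occurrence of each row and rebuild the index
df = df.drop_duplicates().reset_index(drop=True)
removed = rows_before - len(df)

print(f"Duplicated rows detected: {n_duplicates}")
print(f"Rows removed: {removed}")
```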
4) Outlier Analysis
- Boxplots for numerical features
- Identify outliers using IQR
- Decide whether to keep/remove with justification
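
A sketch of the IQR part of step 4 for a single numerical column. The 1.5 x IQR fences are the conventional rule and are an assumption here, since the assignment only says "IQR"; the data are hypothetical, and the boxplot itself is left out to keep the snippet free of plotting dependencies.

```python
import pandas as pd

# Hypothetical numerical feature with one obvious extreme value
s = pd.Series([10, 12, 11, 13, 12, 11, 14, 100], name="area")

q1 = s.quantile(0.25)
q3 = s.quantile(0.75)
iqr = q3 - q1

# Conventional fences: anything beyond 1.5 * IQR from the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

outliers = s[(s < lower) | (s > upper)]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}")
print(f"Fences: [{lower}, {upper}]")
print("Outliers:", outliers.tolist())
```

Whether to drop such points depends on whether they are data errors or genuine extreme observations, which is the justification the step asks for.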
5) EDA (Exploratory Data Analysis)
- Histograms for numerical features
- Countplots for categorical features
- Correlation matrix + heatmap
- Interpret strongest 2 correlations
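
For the correlation part of step 5, one way to pick out the two strongest correlations programmatically is sketched below. The data are synthetic and the column names are assumptions; plotting (histograms, countplots, heatmap) is omitted here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: price rises with area and falls with age
rng = np.random.default_rng(0)
n = 500
area = rng.normal(100, 20, n)
age = rng.uniform(0, 50, n)
df = pd.DataFrame({
    "area": area,
    "age": age,
    "price": 3 * area - 2 * age + rng.normal(0, 10, n),
})

corr = df.corr(numeric_only=True)
print(corr.round(2))

# Keep only the upper triangle so each pair appears once and the diagonal is excluded
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Rank by absolute value so strong negative correlations are not overlooked
strongest = pairs.sort_values(key=lambda v: v.abs(), ascending=False).head(2)
print(strongest)
```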
6) Encode Categorical Variables
- Label Encoding or One-Hot Encoding
- Explain why chosen method is appropriate
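
A sketch of step 6 using One-Hot Encoding via pandas. The column names are hypothetical, and drop_first=True is an assumption made here: it drops one dummy per category to avoid perfectly collinear columns in a linear model with an intercept.

```python
import pandas as pd

# Hypothetical data with one nominal (unordered) categorical feature
df = pd.DataFrame({
    "area": [50, 60, 70, 80],
    "city": ["A", "B", "C", "A"],
    "price": [100, 120, 140, 160],
})

# One column per category, minus the first ("A" becomes the baseline)
encoded = pd.get_dummies(df, columns=["city"], drop_first=True, dtype=int)
print(encoded)
```

Label Encoding maps categories to integers, which implies an order; it is usually a better fit for ordinal features than for nominal ones such as a city name.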
7) Feature Scaling
- Apply StandardScaler on numerical features
- Fit only on the training set, then transform both the training and test sets (this requires the split from step 8 to be done first)
8) Train/Test Split
- 80% Train / 20% Test
- random_state=42
- Show shapes
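
Steps 7 and 8 together can be sketched as below. The split is done first so that the scaler only ever sees training data. The features and target are synthetic and their names are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical numerical features and target
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "area": rng.normal(120, 30, n),
    "rooms": rng.integers(1, 7, n).astype(float),
})
y = 2 * X["area"] + 10 * X["rooms"] + rng.normal(0, 5, n)

# 80% train / 20% test with the required random_state
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Fit the scaler on the training set only, then transform both sets
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
```

Fitting on the full dataset would leak test-set statistics (mean and standard deviation) into training, which is why the fit is restricted to X_train.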
9) Train Regression Model
- Use Linear Regression
- Fit and show coefficients
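
A minimal sketch of step 9. The data are generated from known weights (3, -2, intercept 5) purely so the fitted coefficients can be checked by eye; the feature names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical, already-scaled features with a known linear relationship
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "area_scaled": rng.normal(size=200),
    "rooms_scaled": rng.normal(size=200),
})
y = 3.0 * X["area_scaled"] - 2.0 * X["rooms_scaled"] + 5.0 + rng.normal(0, 0.1, 200)

model = LinearRegression()
model.fit(X, y)

# Pair each coefficient with its feature name for readability
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients)
print("Intercept:", model.intercept_)
```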
10) Predictions
- Predict on test set
- Show first 10 actual vs predicted
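
Step 10 might look like the sketch below, which places the first 10 actual and predicted values side by side. The synthetic data and the extra error column are assumptions added for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data with a linear signal plus noise
rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))
y = X @ np.array([4.0, -1.0, 2.0]) + rng.normal(0, 0.5, 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
model = LinearRegression().fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)

# First 10 actual vs predicted values, with the signed error
comparison = pd.DataFrame({"actual": y_test[:10], "predicted": y_pred[:10]})
comparison["error"] = comparison["actual"] - comparison["predicted"]
print(comparison.round(2))
```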
11) Model Evaluation
Compute:
- MAE
- MSE
- RMSE
- R² Score
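
The four metrics in step 11 can be computed as sketched below. The tiny arrays are hypothetical values chosen so the results are easy to verify by hand; RMSE is taken as the square root of MSE.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values (absolute errors: 1, 0, 1, 2)
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 2) / 4 = 1.0
mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 4) / 4 = 1.5
rmse = np.sqrt(mse)                        # sqrt(1.5)
r2 = r2_score(y_true, y_pred)              # 1 - 6 / 20 = 0.7

print(f"MAE:  {mae:.3f}")
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {rmse:.3f}")
print(f"R2:   {r2:.3f}")
```

MAE and RMSE are in the units of the target, while R² is unitless, which makes the first two easier to explain in terms of the dataset.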

Bonus (Optional):
• Model Interpretation
Explain:
- Which feature has strongest effect and why
- Whether model is underfitting/overfitting
- Whether scaling improved accuracy
Also,
- Learning Curve
- Residual Plot
- Compare Linear Regression with Ridge/Lasso
- Polynomial Regression
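
For the bonus comparisons, one possible setup is sketched below. The synthetic target deliberately contains a squared term so that Polynomial Regression has something to capture; the alpha values and the degree are arbitrary assumptions, not recommended settings. Comparing train and test R² per model gives a rough indication of underfitting or overfitting.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Hypothetical data: linear in the first feature, quadratic in the second
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(1000, 2))
y = 1.5 * X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.3, 1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each pipeline fits its scaler on the training data only
models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "lasso": make_pipeline(StandardScaler(), Lasso(alpha=0.1)),
    "poly2": make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        StandardScaler(),
        LinearRegression(),
    ),
}

# Store (train R2, test R2) for each model
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:>6}: train R2={scores[name][0]:.3f}, test R2={scores[name][1]:.3f}")

# Residuals of the polynomial model; these are what a residual plot would show
residuals = y_test - models["poly2"].predict(X_test)
print("Mean residual:", residuals.mean())
```

A low score on both sets suggests underfitting, while a high train score paired with a much lower test score suggests overfitting.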
