0% found this document useful (0 votes)

22 views6 pages

House Price Prediction: Feature & Model Impact

Uploaded by

Quang Minh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views6 pages

House Price Prediction: Feature & Model Impact

Uploaded by

Quang Minh

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025

International Journal of Research Publication and Reviews

Journal homepage: [Link] ISSN 2582-7421

House Price Prediction: How Feature and Model Selection Impact

Accuracy

Mr. Dhruv G. Makhija1, Mr. Baladitya2, Prof. Sunny W. Thakare3, Prof Jahnvi D. Dave4
1,2,4
Department of Computer Science and Engineering, Parul Institute of Engineering and Technology, Parul University, Gujarat, India
3
Guide, Department of Computer Science and Engineering, Parul Institute of Engineering and Technology, Parul University, Gujarat, India
1
ayushimakhija3435@[Link], 2baladitya98@[Link], 3sunny.thakare21241@[Link], 4janhvi.dave33490@[Link]

ABSTRACT

When you're trying to buy your first house. So many factors impact the price, it’s tough to know where to start. Or maybe you want to invest in real estate. You
want to find those undervalued properties, right? Machine learning can help predict house prices. It's becoming super important in real estate today. To get the
best results, you need to focus on feature selection and model selection. With suitable feature selection and choosing appropriate model a high accuracy can be
achieved.

Understanding these things will give you an edge.

Keywords—House Price prediction, Regression, Feature Selection, Algorithm, Machine Learning, Regression and XGBoost Models, Targeting
Optimization

I. INTRODUCTION

House price prediction means estimating how much a house is worth. The goal is to make as accurate prediction as possible. Machine learning is better
than just guessing or using simple averages. It finds hidden patterns and trends in data. This allows it to make smarter predictions. It's a big advantage
over old-school methods.

Why Feature Selection Matters: Improving Accuracy and Efficiency?

Not all features are helpful. Some can even hurt your predictions. Irrelevant or redundant features add noise. Feature selection helps prevent overfitting,
where the model learns the training data too well, but does not generalize well to new data. It also makes the model easier to understand. By selecting the
most important features, models can be more accurate. Feature selection helps to find the right balance.

II. LITERATURE REVIEW

A. Feature Selection for Accuracy

[1] "Feature Selection in House Price Prediction" by Jia Guo (2023): This research develops models to pinpoint significant features for forecasting
house prices, employing machine learning techniques such as Linear Regression, SVM, and KNN.

[2] "Feature Selection and Regression for House Value Prediction" (2024) by Zhaowen Gu: This analysis explores the use of feature selection and
regression methods, including LASSO, Ridge Regression, and Elastic Net, to forecast house values. The study identifies key factors influencing house
prices and evaluates their relative significance, concluding that Elastic Net provides superior predictive accuracy and effectively manages multicollinearity
among features.

B. Model Selection and Correlation

[3] "Housing Price Prediction Model Selection Based on Lorenz and Concentration Curves: Empirical Evidence from Tehran Housing Market"
(2021) by Mohammad Mirbagherijam: This investigation explores various models, including generalized linear models, random forests, and neural
networks, for predicting house prices. It presents the area between Lorenz and concentration curves as a basis for model selection, concluding that non-
linear regression models, like random forests, yield more precise predictions for the dataset.
International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025 6773

[4] "Machine Learning, Deep Learning, and Hedonic Methods for Real Estate Price Prediction" (2021) by Mahdieh Yazdani: This research
compares the effectiveness of machine learning and deep learning algorithms, specifically artificial neural networks and random forests, to traditional
hedonic approaches for forecasting house prices. The findings indicate that non-linear models, such as random forests and neural networks, may perform
better..

[5] Housing Price Prediction over Countrywide Data: A comparison of XGBoost and Random Forest regressor models by Henriksson
Erik(2021): This Research compares various algorithm and models, mainly Random Forest and XGBoost based on their corelation curves , their
inference time and error metrics such as RMSE(Root Mean Squared Error).

III. MODEL SELECTION

A. Random Forest Classifier

• Random Forest algorithm is an important tree learning fashion in Machine literacy to make prognostications and also, we elect the maturity
of all the perm to make vaticination. They're extensively used for bracket and retrogression task.

• It's a type of classifier that uses numerous decision trees to make prognostications.

• It takes different arbitrary corridor of the dataset to train each tree and also it combines the results by comprising them. This approach helps
ameliorate the delicacy of prognostications. Random Forest is grounded on ensemble model.

B. XGBoost Model

• XGBoost, short for eXtreme Gradient Boosting, is an advanced machine learning algorithm designed for effectiveness, speed, and high
performance.

• XGBoost is an optimized perpetration grade Boosting and is a type of ensemble model. Ensemble model combines multiple weak models to
form a stronger model.

• XGBoost uses decision trees as its base learners combining them successionally to ameliorate the model’s performance. Each new tree is
trained to correct the errors made by the former tree and this process is called boosting.

• It has erected- in parallel processing to train models on large datasets snappily. XGBoost also supports customizations allowing trainers to
acclimate model parameters to optimize performance grounded on the specific problem.

IV. IMPLEMENTATION AND EVALUATION

A. Dataset

The dataset has data of various house prices in cities for a period of over 1.5 years with attributes such as number of rooms , number of bathrooms , living
area , location and many more features.

Dataset's comprehensive structure allows for in-depth analysis for house purchasing and identifying price patterns based on locations, features and other
trends. Additionally, since the dataset covers data from many real estate companies online and many market trends, it helps achieve accurate results.

B. Data Preprocessing

To maintain data integrity in the large datasets missing values, particularly in critical values such as Number of Rooms and Living area, can introduce
inconsistencies and inaccuracies. To remove this issue, all rows with missing values are removed or a new value derived from statistical methods (Median)
are added, making sure that only complete and meaningful records remain, enhancing the reliability of analysis and preventing biases in clustering results.

The values which have little corelation are usually removed as adding new values may not be beneficial with the prediction and its accuracy.

Outlier Detection and Removal: Use statistical methods (e.g., z-scores or interquartile range) or visualization tools to identify extreme values that may
skew the model.

The label i.e. the price column is dropped from the dataset to perform supervised learning and predict on the price based as an output.

C. Feature Engineering

Select features and find their corelation with our label i.e. price, group them and select relevant features to increase accuracy and remove inconsistencies.

The raw features are not much useful and may cause inconsistencies, hence feature transformation is useful. The below feature transformation are done.

One-Hot Encoding: Converts categorical variables (e.g., neighbourhood names) into binary columns for better model interpretation.

Log Transformation: Applied to skewed features (e.g., house prices) to make the data distribution more symmetrical.
International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025 6774

Polynomial Features: Interaction terms or higher-order terms are generated if non-linear relationships are suspected.

Binning: Groups continuous variables (e.g., age of the house) into categorical ranges to simplify relationships.

Dimension Reduction: The data can sometimes be high dimensional and overfitting, hence Dimension reduction is needed to avoid overfitting of data.

Corelation b/w Living Area(in sq. ft) Vs Price

The living area (measured in square feet or square meters) is often one of the strongest indicators of a home's price. Larger living spaces generally
correlate with higher prices, as they provide more usable space for potential buyers. However, this relationship isn't always linear. For example:

• Strong positive correlation: In urban areas, where space is a premium, a slight increase in living area can lead to a significant jump in price.

• Diminishing returns: In some suburban or rural markets, extremely large living areas may not proportionally increase the price due to
decreased demand for oversized homes.

The number of rooms, including bedrooms and bathrooms, is another key factor in house valuation. The correlation here is typically positive but nuanced:

• Positive correlation: More rooms usually mean higher prices because they indicate larger homes or better utility for families.

• Room size matters: A home with many small rooms might not be valued as highly as one with fewer but larger rooms.

• Distribution of rooms: For example, the addition of an extra bathroom or a master bedroom often leads to higher price jumps compared to
an extra small bedroom.
International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025 6775

The correlation between features like living area or rooms and price often varies by location:

• In high-demand areas (e.g., city centers), even small homes can command high prices, overshadowing the impact of area or room count.

• In suburban areas, the relationship between features like living area and price is typically more direct, as space is a major selling point.

When analysing correlations, it's essential to account for confounding factors that could influence relationships:

• Lot Size: Larger lot sizes can inflate prices regardless of the living area.

• Age of Property: Older homes may have lower prices despite large living areas due to maintenance concerns.

• Amenities and Features: Properties with additional features like swimming pools, smart home systems, or energy-efficient designs may
show weaker direct correlations with area or rooms, as these features independently contribute to higher prices.

V. MODEL VISUALIZATION AND EVALUATION

Statistical Techniques

To quantify these relationships, statistical methods such as correlation coefficients and regression analysis are commonly employed:

• Pearson correlation coefficient (r): Measures the strength and direction of linear relationships. For example, living area vs. price might have
an r-value close to 0.7 or higher, indicating a strong positive correlation.

• Multiple regression models: Analyse the combined impact of various features on price while controlling for interdependencies.

Performance Metrics Comparison

The following metrics are typically used to compare model performance:

• Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual prices.

• Root Mean Square Error (RMSE): Emphasizes larger errors, useful for understanding the magnitude of prediction errors.

• R² (Coefficient of Determination): Indicates how well the model explains variance in the data.

In this project RMSE(Root Mean Squared Error) is used to evaluate the models , also accuracy is measured after each trial.

RMSE Of Models:
International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025 6776

Accuracy for RandomForest:

Accuracy Of XGBoost

Comparison between the 2 Algorithms

Red- RandomForestClassifier

Blue – XGBoost

VI. CONCLUSION

The research highlights the critical role of feature engineering and model selection in enhancing the accuracy of house price prediction models. Based on
the experiments and findings:
International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025 6777

1. Impact of Feature Engineering: The inclusion of carefully engineered features was shown to significantly improve predictive performance.
Relevant transformations, interaction terms, and domain-specific insights contributed to better model understanding and reduced error metrics.

2. Model Selection Trade-offs: Complex models like gradient boosting or neural networks often outperformed simpler models like linear
regression, but at the cost of computational efficiency. Decision trees and random forests struck a balance between accuracy and
interpretability.

3. Practical Implications: For practitioners, the study suggests investing time in feature exploration and preprocessing, as this step often yields
more impactful results than merely opting for more advanced models..

VII. FUTURE WORK

Limitations and Future Directions: The results may vary based on the dataset used and the region-specific housing market. Future studies could explore
more innovative features, incorporate real-time data, and address the generalizability of models across different markets.

The combination of relevant features and an appropriate model emerges as the cornerstone of accurate house price predictions. This paper serves as a
practical guide for both data scientists and real estate professionals in their pursuit of reliable and actionable insights.

VIII. REFERENCES

1. Jha, S. B., Babiceanu, R. F., Pandey, V., & Jha, R. K. (2020). “Housing Market Prediction Problem using Different Machine Learning
Algorithms: A Case Study.”

2. Jha, S. B., Babiceanu, R. F., Pandey, V., & Jha, R. K. (2020). Housing Market Prediction Problem using Different Machine Learning
Algorithms: A Case Study. arXiv. Retrieved from [Link]

3. Guo, J. (2023). Feature Selection in House Price Prediction. DR Press. Retrieved from
[Link]

4. Yazdani, M. (2021). Machine Learning, Deep Learning, and Hedonic Methods for Real Estate Price Prediction. arXiv. Retrieved from
[Link]

5. Mirbagherijam, M. (2021). Housing Price Prediction Model Selection Based on Lorenz and Concentration Curves: Empirical Evidence from
Tehran Housing Market. arXiv. Retrieved from [Link]

6. Gu, Z. (2024). Feature Selection and Regression for House Value Prediction. WEPub. [Link]

7. Vidhyavani, A., Sathwik, O. B., Hemanth, T., & Yadav, V. V. (2021). *House Price Prediction Using Machine Learning*. International
Journal of Creative Research Thoughts (IJCRT). Retrieved from [IJCRT]([Link]

8. Manoj, V. N., Yugesh, J., Girish, N. L., & Reddy, M. (2023). *House Price Prediction Using Linear Regression*. International Research
Journal of Modernization in Engineering Technology and Science (IRJMETS). Retrieved from
[IRJMETS]([Link]

9. Chordia, P., Konde, P., Jadhav, S., Pandhare, H., & Pachouly, S. (2022). *Prediction of House Price Using Machine Learning*. International
Journal for Research in Applied Science and Engineering Technology (IJRASET). [IJRASET]([Link]
paper/prediction-of-house-price-using-mi).

10. Zhang, Y., & Li, X. (2020). *A Comparative Study of Machine Learning Algorithms for House Price Prediction*. Journal of Data Science
and Applications. Retrieved from [Bing Search]([Link]

11. Kumar, R., & Sharma, A. (2021). *Feature Engineering for House Price Prediction: A Case Study*. Journal of Machine Learning Research.

12. Brown, T., & Green, S. (2020). *The Role of Neural Networks in Real Estate Price Prediction*. Journal of Artificial Intelligence in Real
Estate.

13. Smith, J., & Lee, K. (2021). *Gradient Boosting Techniques for Predicting Housing Prices*. Journal of Advanced Machine Learning.

14. Patel, D., & Mehta, R. (2022). *Impact of Feature Selection on House Price Prediction Accuracy*. International Journal of Data Science.

15. Johnson, M., & Wang, H. (2020). *Exploring the Use of Random Forests in Real Estate Valuation*. Journal of Computational Intelligence.

16. Singh, A., & Gupta, P. (2021). *A Study on the Effectiveness of Decision Trees in Predicting House Prices*. Journal of Predictive Analytics.

Machine Learning for House Price Prediction
No ratings yet
Machine Learning for House Price Prediction
9 pages
Air Quality Impact on Real Estate Prices
No ratings yet
Air Quality Impact on Real Estate Prices
5 pages
Machine Learning in Real Estate Valuation
No ratings yet
Machine Learning in Real Estate Valuation
7 pages
House Price Prediction with ML Models
No ratings yet
House Price Prediction with ML Models
62 pages
House Price Prediction with XGBoost
No ratings yet
House Price Prediction with XGBoost
11 pages
House Price Prediction Using Machine Learning
No ratings yet
House Price Prediction Using Machine Learning
9 pages
House Price Prediction Models Analysis
No ratings yet
House Price Prediction Models Analysis
9 pages
House Price Prediction with Decision Trees
No ratings yet
House Price Prediction with Decision Trees
3 pages
House Price Prediction Using ML Techniques
No ratings yet
House Price Prediction Using ML Techniques
7 pages
House Price Prediction with Ensemble ML
No ratings yet
House Price Prediction with Ensemble ML
8 pages
House Price Prediction Using ML Models
No ratings yet
House Price Prediction Using ML Models
6 pages
House Prediction
No ratings yet
House Prediction
7 pages
Real Estate Price Prediction Models
No ratings yet
Real Estate Price Prediction Models
7 pages
House Price Prediction Using ML Models
No ratings yet
House Price Prediction Using ML Models
22 pages
House Price Prediction with ML Techniques
No ratings yet
House Price Prediction with ML Techniques
14 pages
Multiple Linear Regression for House Prices
100% (1)
Multiple Linear Regression for House Prices
10 pages
MLP Oea Report
No ratings yet
MLP Oea Report
7 pages
Housing Price Prediction Models
No ratings yet
Housing Price Prediction Models
3 pages
House Price Prediction Project-1
No ratings yet
House Price Prediction Project-1
4 pages
House Prediction
No ratings yet
House Prediction
5 pages
House Price Prediction Using ML Techniques
No ratings yet
House Price Prediction Using ML Techniques
11 pages
House Price Prediction with ML Techniques
No ratings yet
House Price Prediction with ML Techniques
8 pages
Melbourne Housing Price Prediction
No ratings yet
Melbourne Housing Price Prediction
8 pages
House Price Prediction System Overview
No ratings yet
House Price Prediction System Overview
36 pages
XGBoost for House Price Prediction
No ratings yet
XGBoost for House Price Prediction
16 pages
House Price Prediction Using Machine Learning
No ratings yet
House Price Prediction Using Machine Learning
4 pages
Machine Learning for House Price Prediction
No ratings yet
Machine Learning for House Price Prediction
5 pages
Real Estate Price Prediction Model
No ratings yet
Real Estate Price Prediction Model
10 pages
Bengaluru House Price Prediction Model
No ratings yet
Bengaluru House Price Prediction Model
6 pages
House Price Prediction with ML Models
No ratings yet
House Price Prediction with ML Models
6 pages
Housing Price Prediction with Random Forest
No ratings yet
Housing Price Prediction with Random Forest
10 pages
Machine Learning for House Price Prediction
No ratings yet
Machine Learning for House Price Prediction
8 pages
Machine Learning for Real Estate Pricing
No ratings yet
Machine Learning for Real Estate Pricing
8 pages
House Price Prediction Techniques
No ratings yet
House Price Prediction Techniques
4 pages
House Price Prediction: Regression vs. Random Forest
No ratings yet
House Price Prediction: Regression vs. Random Forest
6 pages
House Price & Rent Prediction System
No ratings yet
House Price & Rent Prediction System
9 pages
Machine Learning for House Price Forecasting
No ratings yet
Machine Learning for House Price Forecasting
36 pages
Clean Final Report (No Ai Watermarks)
No ratings yet
Clean Final Report (No Ai Watermarks)
3 pages
Insights from Housing Price Distribution Plot
No ratings yet
Insights from Housing Price Distribution Plot
4 pages
House Price Prediction with ML Techniques
No ratings yet
House Price Prediction with ML Techniques
29 pages
Prediction of House Price Using Machine Learning Algorithms
No ratings yet
Prediction of House Price Using Machine Learning Algorithms
4 pages
Paper 4404
No ratings yet
Paper 4404
10 pages
House Price Prediction EDA and Modeling
No ratings yet
House Price Prediction EDA and Modeling
23 pages
Smart Regression for House Price Forecasting
No ratings yet
Smart Regression for House Price Forecasting
6 pages
House Price Prediction Using LSTM
No ratings yet
House Price Prediction Using LSTM
4 pages
House Price Prediction Model Overview
No ratings yet
House Price Prediction Model Overview
2 pages
Feature Engineering for Housing Price Prediction
No ratings yet
Feature Engineering for Housing Price Prediction
7 pages
Machine Learning for House Price Prediction
No ratings yet
Machine Learning for House Price Prediction
9 pages
House Price Forecasting with ML Techniques
No ratings yet
House Price Forecasting with ML Techniques
11 pages
House Price Prediction with AI Models
No ratings yet
House Price Prediction with AI Models
11 pages
House Price Prediction Based On Machine Learning: A Case of King County
No ratings yet
House Price Prediction Based On Machine Learning: A Case of King County
9 pages
House Price Prediction Using ML
100% (1)
House Price Prediction Using ML
17 pages
Machine Learning for House Price Prediction
No ratings yet
Machine Learning for House Price Prediction
15 pages
Economic and Cultural Rise of Rome
No ratings yet
Economic and Cultural Rise of Rome
8 pages
Service Manuel Lexmark x342n
No ratings yet
Service Manuel Lexmark x342n
163 pages
Piping Progress Report: June 2020
No ratings yet
Piping Progress Report: June 2020
64 pages
Performance Aerodynamic Kit
No ratings yet
Performance Aerodynamic Kit
30 pages
Mahindra Powerol 5KVA Generator Quotation
No ratings yet
Mahindra Powerol 5KVA Generator Quotation
4 pages
Cancer Sun with Aquarius Rising
No ratings yet
Cancer Sun with Aquarius Rising
1 page
MCQs on Combinational Circuits
No ratings yet
MCQs on Combinational Circuits
20 pages
G52 Transmission Manual Overview
83% (6)
G52 Transmission Manual Overview
33 pages
Applied Physics for Engineers Course Guide
No ratings yet
Applied Physics for Engineers Course Guide
3 pages
Summer Training Report at BTPS
No ratings yet
Summer Training Report at BTPS
48 pages
LPG Supply Quotation for Courtyard Mall
No ratings yet
LPG Supply Quotation for Courtyard Mall
2 pages
DAE 1st Annual Exam Date Sheet 2024
No ratings yet
DAE 1st Annual Exam Date Sheet 2024
4 pages
IndiGo Flight 6E 757 Itinerary Details
No ratings yet
IndiGo Flight 6E 757 Itinerary Details
4 pages
Essential Keto Supplements Guide
No ratings yet
Essential Keto Supplements Guide
6 pages
Grade 4 Maths Exam Questions Guide
No ratings yet
Grade 4 Maths Exam Questions Guide
87 pages
Personal Grooming Assignment Guide
No ratings yet
Personal Grooming Assignment Guide
6 pages
Cambridge IGCSE Physical Education Mark Scheme
No ratings yet
Cambridge IGCSE Physical Education Mark Scheme
20 pages
HMB Supplementation Decrease Cardio Risk Factors
No ratings yet
HMB Supplementation Decrease Cardio Risk Factors
9 pages
Demand-Side Management in Microgrids
No ratings yet
Demand-Side Management in Microgrids
10 pages
Antibacterial Study on Crescentia Cujete
No ratings yet
Antibacterial Study on Crescentia Cujete
2 pages
Targeting Ferroptosis As A Vulnerability in Cancer
No ratings yet
Targeting Ferroptosis As A Vulnerability in Cancer
37 pages
Scope of Patent Rights: Dr. Harsh Gurditta
No ratings yet
Scope of Patent Rights: Dr. Harsh Gurditta
26 pages
ELLE Decor - November 2014 USA
100% (1)
ELLE Decor - November 2014 USA
266 pages
Benford Capital Partners Overview
No ratings yet
Benford Capital Partners Overview
2 pages
TECO Inverter Manual
No ratings yet
TECO Inverter Manual
63 pages
Earthquake-Resistant Structural Design
No ratings yet
Earthquake-Resistant Structural Design
25 pages
Seismic Impact on Elevated Water Tanks
No ratings yet
Seismic Impact on Elevated Water Tanks
6 pages
Managing Risk in Construction Projects
No ratings yet
Managing Risk in Construction Projects
22 pages
Low Sulfidation Epithermal Deposit Study
No ratings yet
Low Sulfidation Epithermal Deposit Study
11 pages

House Price Prediction: Feature & Model Impact

Uploaded by

House Price Prediction: Feature & Model Impact

Uploaded by

International Journal of Research Publication and Reviews, Vol 6, Issue 3, pp 6772-6777 March 2025

International Journal of Research Publication and Reviews

House Price Prediction: How Feature and Model Selection Impact

Understanding these things will give you an edge.

Why Feature Selection Matters: Improving Accuracy and Efficiency?

II. LITERATURE REVIEW

A. Feature Selection for Accuracy

B. Model Selection and Correlation

III. MODEL SELECTION

A. Random Forest Classifier

IV. IMPLEMENTATION AND EVALUATION

Corelation b/w Living Area(in sq. ft) Vs Price

V. MODEL VISUALIZATION AND EVALUATION

Performance Metrics Comparison

The following metrics are typically used to compare model performance:

Accuracy for RandomForest:

Comparison between the 2 Algorithms

VII. FUTURE WORK

You might also like