(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 8, No. 10, 2017
Modeling House Price Prediction using Regression
Analysis and Particle Swarm Optimization
Case Study: Malang, East Java, Indonesia
Adyan Nur Alfiyatin, Hilman Taufiq, Ruth Ema Febrita, Wayan Firdaus Mahmudy
Faculty of Computer Science, Brawijaya University, Malang, Indonesia
Abstract—House prices increase every year, so there is a need for a system that predicts future house prices. House price prediction can help a developer determine the selling price of a house and can help a customer choose the right time to purchase one. Three factors influence the price of a house: physical condition, concept, and location. This research aims to predict house prices based on the NJOP values of houses in Malang city using regression analysis and particle swarm optimization (PSO). PSO is used to select the influential variables, and regression analysis is used to determine the optimal coefficients for prediction. The results show that the combination of regression and PSO is suitable, achieving a minimum prediction error of IDR 14,186.

Keywords—House prediction; regression analysis; particle swarm optimization

I. INTRODUCTION

Investment is a business activity that attracts many people in this era of globalization. Common investment objects include gold, stocks, and property. Property investment in particular has increased significantly since 2011, in both demand and sales [1]. One reason for the rising property demand is the large population of Indonesia. The Indonesian Central Bureau of Statistics states that 50% of the population of East Java is classified as young, with an age of approximately 30 years [2]. This census result indicates that the younger generation will need or buy a house in the future. Based on preliminary research, two house price standards apply in house sale transactions: the price set by the developer (market selling price) and the price based on the Value of Selling Tax Object (NJOP). According to Lim, et al., the fundamental problem for a developer is determining the selling price of a house [3]. In setting the price, the developer must calculate carefully and choose an appropriate method, because property prices increase continuously and almost never fall in either the long or the short term [4].

There are several approaches to determining the price of a house; one of them is prediction analysis. The first approach is quantitative prediction, which utilizes time-series data [5]. The time-series approach looks for the relationship between current prices and prevailing prices. The second approach uses linear regression based on hedonic pricing [6], [7]. Previous research by Gharehchopogh, et al. [7] using a linear regression approach obtained an error of 0.929 against the actual price. In linear regression, the coefficients are generally determined using the least squares method, but it takes a long time to find the best formula.

Particle swarm optimization (PSO) is proposed to find the coefficients in order to obtain optimal results [8]. Previous studies such as Marini and Walczak [9], [10] show that PSO obtains better results than other hybrid methods. PSO has several advantages; in a small search space, PSO searches for solutions more effectively [11]. Although the global search of PSO is less than optimal [12], in an optimization problem the values of the variables in the regression equation can reach a maximum solution using PSO [12], [13].

This research aims to create a house price prediction model using regression and PSO to obtain optimal prediction results. PSO is used to select the influential variables in house price prediction, and regression is used to determine the optimal coefficients for prediction. In this study, we want to know the performance of the developed model on time-series data. House price predictions are expected to help people who plan to buy a house know the future price range, so that they can plan their finances well. House price predictions are also beneficial for property investors who want to know the trend of housing prices in a certain location. This research focuses on Malang City, because Malang is one of the tourism and urban cities in East Java.

II. RELATED WORK
A. House Price Affecting Factors
There are several factors that affect house prices. Rahadi, et al. [14] divide these factors into three main groups: physical condition, concept, and location. Physical conditions are properties possessed by a house that
can be observed by human senses, including the size of the house, the number of bedrooms, the availability of a kitchen and garage, the availability of a garden, the land and building area, and the age of the house [15]. The concept is an idea offered by developers to attract potential buyers, for example a minimalist home, a healthy and green environment, or an elite environment.

Location is an important factor in shaping the price of a house, because the location determines the prevailing land price [16]. The location also determines the ease of access to public facilities such as schools, campuses, hospitals and health centers, as well as family recreation facilities such as malls and culinary tours, or even a beautiful view [17], [18]. The factors affecting house prices are summarized in Table 1.
TABLE I. HOUSE PRICE AFFECTING FACTORS
Literature | Factors (grouped as Physical condition, Concept, Location)
[15] (Limsombunchai, 2004) | √ √ √ √ √ √ √
[18] (Jim and Chen, 2009) | √ √ √ √
[17] (Kisilevich, Keim and Rokach, 2013) | √ √
[16] (Zhu and Wei, 2013) | √ √ √ √ √
[14] (Rahadi, et al., 2015) | √ √ √ √ √ √ √ √ √ √ √ √
[19] (Bryant, 2016) | √ √ √ √
B. Hedonic Pricing
Hedonic pricing is a price prediction model based on hedonic price theory, which assumes that the value of a property is the sum of the values of all its attributes [20]. In practice, hedonic pricing can be implemented using a regression model. Equation (1) shows the regression model for determining a price.

$$y = a x_1 + b x_2 + \dots + n x_i \qquad (1)$$

where $y$ is the predicted price and $x_1, x_2, \dots, x_i$ are the attributes of a house, while $a, b, \dots, n$ are the regression coefficients of each variable in the determination of house prices.

III. DATA SET
In this research, we use house price data based on NJOP from the Land and Building Tax (PBB) payment structure. Due to limited access to the data, this study used time-series data of 9 houses scattered across the Malang City area for 2014-2017. Missing values at a given time are filled in under the assumption that land prices tend to change every 2 years, while building prices tend to be stable.

The tabulated data for each house include: home id, address (street name), longitude-latitude, year, building area, land area, NJOP building price (IDR/m2), NJOP land price (IDR/m2), distance from the city center (km), number of campuses, number of restaurants, number of health facilities, number of playgrounds, number of schools, number of traditional markets or malls, number of worship places, and ease of access to public transportation. The city center in this study is defined as the location of the Malang City square. The distance to the city center is calculated using Google Maps. Ease of access to public transportation is calculated within a radius of 400 meters. Nearby objects within a given radius are counted using buffering techniques accessed through the site [Link]

IV. RESEARCH METHODOLOGY

Fig. 1. Research flow diagram.

Based on Fig. 1, the regression analysis and particle swarm optimization processes are described in the following sections:

A. Regression analysis
The prediction model used in this research is hedonic pricing, implemented as a regression model with the standard formula shown in (1). The dependent variable Y is the NJOP price, and the independent variables x1-x14 consist of year, building area, land area, NJOP land price (IDR/m2), NJOP building price (IDR/m2), distance to the city center, number of campuses, number of restaurants, number of health facilities, number of amusement parks, number of educational facilities, number of traditional markets, number of worship places, and ease of access to public transportation, as shown in (2).
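As a rough illustration of how the coefficients of a hedonic regression of the form (1) can be determined by least squares, the following minimal Python sketch solves the normal equations for a small data set. The three attributes, the toy houses, the coefficient values, and the helper names (`solve_linear_system`, `fit_least_squares`, `predict`) are all hypothetical and chosen only for illustration; they are not the data or the implementation used in this study.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b]; rows are copied so the inputs stay unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_least_squares(rows, prices):
    """Return coefficients minimising the squared error of y = sum(c_k * x_k)."""
    k = len(rows[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, prices)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, attrs):
    """Hedonic price: weighted sum of the attributes, as in (1)."""
    return sum(c * x for c, x in zip(coefs, attrs))


# Hypothetical toy data: building area (m2), land area (m2), public transport (0/1)
houses = [[50, 100, 1], [70, 120, 0], [90, 150, 1], [60, 200, 0]]
true_coefs = [3.0, 2.0, 10.0]  # assumed values used only to generate toy prices
prices = [predict(true_coefs, h) for h in houses]

coefs = fit_least_squares(houses, prices)
print(coefs)
```

Because the toy prices are generated without noise, the fitted coefficients should match the assumed ones up to rounding error; with real NJOP data the fit would only approximate the observed prices.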
In this case, the public transportation variable is 0 or 1: 0 means that no public transport passes the area within 200 meters, and 1 means that public transport passes through the area.

B. Particle Swarm Optimization (PSO)
PSO is a stochastic optimization method that represents solutions as particles [21]. A number of particles are generated randomly, where each particle consists of several dimensions with position $x_i$ and velocity $v_i$. Each particle measures its fitness value as shown in (3).

$$f(x) = \varepsilon \ \text{from prediction} \qquad (3)$$

where $f(x)$ is the fitness value of each particle, which indicates the prediction error. Each particle explores the solution search space to obtain optimal results. The displacement from one position to another is strongly influenced by the velocity of each particle; to obtain the best position, a dynamic velocity formulation (4) is required [22].

$$v_i^{t+1} = w \cdot v_i^{t} + c_1 r_1 (p_i - x_i^{t}) + c_2 r_2 (p_{gi} - x_i^{t}) \qquad (4)$$

where $v_i$ is the velocity of the particle in dimension i to n, $t$ denotes the iteration, and $w$ is the inertia weight, whose value is obtained dynamically using (5) [23]. $p_i$ is the best position ever obtained by each particle, while $p_{gi}$ is the best position ever achieved by the whole swarm. $c_1$ and $c_2$ are the cognitive and social constants, which in this study are 2.5 and 0.5, respectively. $r_1$ and $r_2$ are 0.5 and 2.5. Once the velocity is obtained, the position is updated using (6).

( – ) , (5)

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \qquad (6)$$

In PSO, particles that move too fast can cause the method to fail to find the optimum solution. This problem can be handled by velocity clamping [9], which constrains the velocity of each particle using (7).

$$\text{if } v_{ij}^{t+1} > v_j^{\max} \text{ then } v_{ij}^{t+1} = v_j^{\max}; \qquad \text{if } v_{ij}^{t+1} < v_j^{\min} \text{ then } v_{ij}^{t+1} = v_j^{\min} \qquad (7)$$

The value of $v_j^{\max}$ is generated using (8), and $v_j^{\min}$ is the negative of $v_j^{\max}$.

(8)

The calculation cycle of the velocity $v_i$ and the updated position $x_i$ is repeated until the maximum iteration is reached. When the iterations are over, the best particle is taken as the optimum solution.

C. Testing Methods
The model developed in this research is tested using several measures: Mean Absolute Percentage Error (MAPE), Mean Absolute Error (MAE), and Root Mean Square Error (RMSE). MAPE is the average of the absolute percentage errors of the predicted results, so it indicates how large the prediction error is relative to the actual value. MAPE is described in (9).

$$\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \times 100\% \qquad (9)$$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of data.

MAE is the average of the absolute errors of the predicted results. MAE is useful when errors are measured in certain units. MAE can be calculated using (10).

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right| \qquad (10)$$

RMSE measures prediction performance by considering the prediction error of each data point. The RMSE formula is given in (11).

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2} \qquad (11)$$

V. EXPERIMENT AND RESULT

The experimental process examines the parameters used in particle swarm optimization through a particle test, an iteration test, and an inertia weight combination test.

The PSO algorithm generates the population and the initial velocity in the range [0-100]. Ranges from -1000 to 1000 were tested, and the range 0-100 was found to give the highest-fitness solutions. The particle test and the iteration test for each model use multiples of 100, with a maximum of 3000 particles, because testing more than 3000 particles requires a longer computation time. Each test is run 5 times, and the fitness value is the average of the test results. The last test is the inertia weight combination test, performed to examine the displacement velocity of each particle; the inertia weight is tested in the range [0.1-0.9]. The result of each parameter test is shown in Table 2.

TABLE II. TEST RESULT OF PARAMETER
M | Test Particles | Fitness | Iteration Test | Fitness | Inertia weight | Fitness
1 | 1800 | 39950.9474 | 700 | 186.704 | 0.8, 0.4 | 2420.86
2 | 1800 | 825.9134 | 1900 | 45242.522 | 0.2, 0.7 | 86434.266
3 | 500 | 139.68 | 1800 | 814.624 | 0.3, 0.8 | 298492.2
4 | 2000 | 201506.91 | 500 | 69.38 | 0.2, 0.7 | 2.126
5 | 2500 | 539040.066 | 1900 | 124.27 | 0.3, 0.9 | 243.902
6 | 800 | 214060.584 | 600 | 297389.054 | 0.4, 0.7 | 846.2638
7 | 1900 | 236999.218 | 1800 | 581.986 | 0.4, 0.9 | 8.75

M-1 represents the Karang Besuki area, M-2 the Tunggulwulung area, M-3 the Lowokwaru area, M-4 the Puncak Trikora area, M-5 the Sumbersari area, M-6 the Dinoyo area, and M-7 the Manggar area. The experimental results show that the fitness value depends on the data being tested, so this research would benefit from more data.
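The three error measures used for testing (MAPE, MAE, and RMSE) can be computed as in the short Python sketch below. It uses the standard definitions of these measures; the function names and the sample values are hypothetical and are not taken from the experiments in this paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def mae(actual, predicted):
    """Mean absolute error, in the unit of the data."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error, in the unit of the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical sample values for illustration only
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(mape(actual, predicted), mae(actual, predicted), rmse(actual, predicted))
```

MAPE is scale-free, while MAE and RMSE are expressed in the unit of the price data; RMSE penalizes large individual errors more heavily than MAE.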
After the parameter tests, the error values are calculated using RMSE, MAE, and MAPE. A comparison of the test values is shown in Table 3.

TABLE III. RESULT OF TESTING METHOD
Methods | Accuracy: MAPE | Accuracy: MAE | Accuracy: RMSE
Regression | 4.84552 | 4.84552 | 2201253
Regression - PSO
Model 1 | 0.73255 | 2837.2 | 14186
Model 2 | 0.0238 | 5520.95 | 44168
Model 3 | 0.02251 | 16635.9 | 99816
Model 4 | 5.84929 | 16798.2 | 67193
Model 5 | 0.42763 | 44950.7 | 179803
Model 6 | 0.07718 | 34153.1 | 170765
Model 7 | 0.0932 | 19830.8 | 79323

VI. CONCLUSION
In this paper, several tests have been performed using linear regression and particle swarm optimization for house price prediction. Based on the NJOP data of 9 houses, the system builds 7 house price prediction models, each representing one area: Kelurahan Karang Besuki, Tunggulwulung, Lowokwaru, Puncak Trikora, Sumbersari, Dinoyo, and Manggar. Based on the results of the particle test, iteration test, and inertia weight test, it can be concluded that M-1, which represents the Karang Besuki area, obtains the best parameters for optimal prediction. These parameter values are 1800 particles, 700 iterations, and inertia weights of 0.4 and 0.8, which give a minimum prediction error (RMSE) of IDR 14,186. For the other models, the prediction errors are still large. Future research will use other methods suited to time-series data, together with more data, to obtain smaller prediction errors.

REFERENCES
[1] R. M. A. van der Schaar, "Analysis of Indonesian Property Market; Overview and Foreign Ownership," Investment Indonesian. 2015.
[2] The Central Bureau of Statistics, "Population Census," 2015.
[3] W. T. Lim, L. Wang, and Y. Wang, "Singapore Housing Price Prediction Using Neural Networks," Int. Conf. Nat. Comput. Fuzzy Syst. Knowl. Discov., vol. 12, pp. 518–522, 2016.
[4] Y. Feng and K. Jones, "Comparing multilevel modelling and artificial neural networks in house price prediction," 2015 2nd IEEE Int. Conf. Spat. Data Min. Geogr. Knowl. Serv., pp. 108–114, 2015.
[5] R. Ghodsi, "Estimation of Housing Prices by Fuzzy Regression and Artificial Neural Network," in Fourth Asia International Conference on Mathematical/Analytical Modelling and Computer Simulation, 2010, no. 1.
[6] A. Azadeh, B. Ziaei, and M. Moghaddam, "A hybrid fuzzy regression-fuzzy cognitive map algorithm for forecasting and optimization of housing market fluctuations," Expert Syst. Appl., vol. 39, no. 1, pp. 298–315, 2012.
[7] F. S. Gharehchopogh, T. H. Bonab, and S. R. Khaze, "A Linear Regression Approach to Prediction of Stock Market Trading Volume: A Case Study," Int. J. Manag. Value Supply Chain., vol. 4, no. 3, pp. 25–31, 2013.
[8] H.-I. Hsieh, T.-P. Lee, and T.-S. Lee, "A Hybrid Particle Swarm Optimization and Support Vector Regression Model for Financial Time Series Forecasting," Int. J. Bus. Adm., vol. 2, no. 2, pp. 48–56, 2011.
[9] F. Marini and B. Walczak, "Particle swarm optimization (PSO). A tutorial," Chemom. Intell. Lab. Syst., vol. 149, pp. 153–165, 2015.
[10] A. Hayder M. Albehadili Abdurrahman and N. Islam, "An Algorithm for Time Series Prediction Using," Int. J. Sci. Knowl. Comput. Inf. Technol., vol. 4, no. 6, pp. 26–33, 2014.
[11] Y. P. Anggodo and W. F. Mahmudy, "Automatic Clustering and Optimized Fuzzy Logical Relationship for Minimum Living Needs Forecasting," J. Environ. Eng. Sustain. Technol., vol. 4, no. 1, pp. 1–7, 2017.
[12] Y. P. Anggodo, W. Cahyaningrum, A. N. Fauziyah, I. L. Khoiriyah, K. Oktavianis, and I. Cholissodin, "Hybrid K-means Dan Particle Swarm Optimization Untuk Clustering Nasabah Kredit," J. Teknol. Inf. dan Ilmu Komput., vol. 4, no. 2, pp. 1–6, 2017.
[13] Y. P. Anggodo, A. K. Ariyani, M. K. Ardi, and W. F. Mahmudy, "Optimation of Multi-Trip Vehicle Routing Problem with Time Windows using Genetic Algorithm," J. Environ. Eng. Sustain. Technol., vol. 3, no. 2, pp. 92–97, 2017.
[14] R. A. Rahadi, S. K. Wiryono, D. P. Koesrindartotoor, and I. B. Syamwil, "Factors influencing the price of housing in Indonesia," Int. J. Hous. Mark. Anal., vol. 8, no. 2, pp. 169–188, 2015.
[15] V. Limsombunchai, "House price prediction: Hedonic price model vs. artificial neural network," Am. J. …, 2004.
[16] D. X. Zhu and K. L. Wei, "The Land Prices and Housing Prices: Empirical Research Based on Panel Data of 11 Provinces and Municipalities in Eastern China," Int. Conf. Manag. Sci. Eng., no. 2009, pp. 2118–2123, 2013.
[17] S. Kisilevich, D. Keim, and L. Rokach, "A GIS-based decision support system for hotel room rate estimation and temporal price prediction: The hotel brokers' context," Decis. Support Syst., vol. 54, no. 2, pp. 1119–1133, 2013.
[18] C. Y. Jim and W. Y. Chen, "Value of scenic views: Hedonic assessment of private housing in Hong Kong," Landsc. Urban Plan., vol. 91, no. 4, pp. 226–234, 2009.
[19] L. Bryant, "Housing affordability in Australia: an empirical study of the impact of infrastructure charges," J. Hous. Built Environ., 2016.
[20] S. Rosen, "Hedonic Prices and Implicit Markets: Product Differentiation in Pure Competition," J. Polit. Econ., vol. 82, no. 1, pp. 34–55, 1974.
[21] J. Kennedy and R. Eberhart, "Particle swarm optimization," 1995 IEEE Int. Conf. Neural Networks (ICNN 95), vol. 4, pp. 1942–1948, 1995.
[22] R. C. Eberhart and Y. Shi, "Comparing inertia weights and constriction factors in particle swarm optimization," IEEE Congr. Evol. Comput., vol. 1, no. 7, pp. 84–88, 2000.
[23] A. Ratnaweera, S. K. Halgamuge, and H. C. Watson, "Self-organizing hierarchical Particle swarm optimizer with time varying acceleration coefficients," IEEE Trans. Evol. Comput., vol. 8, no. 3, pp. 240–255, 2004.