RecordNumber
4033
Author
Farahzadi, Mehdi
Crop_Body
Mehdi Farahzadi, Rahman Farnoosh and Mohammad Hassan Behzadi
Title of Article
Machine Learning Models for Housing Prices Forecasting using Registration Data
Title Of Journal
Journal of Statistical Research of Iran
PublishInfo
Statistical Research and Training Center
Publication Year
2020
Volume
17
Issue Number
1
Page
191-214
Keywords
Housing price forecasting, nearest neighbor regression, random forest regression, support vector regression, long short-term memory neural network, extreme gradient boosting regression
Abstract
This article aims to identify the machine learning model that forecasts
housing prices with the highest accuracy and the lowest error. Five
important machine learning algorithms are used to predict housing prices:
Nearest Neighbor Regression (KNNR), Support Vector Regression (SVR),
Random Forest Regression (RFR), Extreme Gradient Boosting Regression
(XGBR), and the Long Short-Term Memory neural network (LSTM). The research
uses data from the Statistics Center of Iran on purchases and sales of
residential units in Tehran from 2014 to 2020; the data set comprises
998,299 transactions and 11 features. Preprocessing steps such as
missing-data handling, batch data conversion, and normalization are applied
to the housing data set to obtain the final, error-free data set. The data
set is divided into training and test sets using K-fold cross-validation,
chosen for its simplicity, effectiveness, and wide acceptance. Several
evaluation criteria, including MSE, RMSE, MAE, ME, and R², are used to
compare the models and identify the best one. Comparing the models on all
evaluation criteria in all K-fold subsets shows the stability and
superiority of the Extreme Gradient Boosting Regression model.
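
The evaluation procedure described in the abstract (K-fold splitting, then per-fold error metrics) can be sketched roughly as follows. This is a minimal illustration and not the authors' code. The data are synthetic, the single "area" feature and the tiny nearest-neighbor regressor stand in for the real 11-feature data set and the five models, and ME is assumed to mean the mean signed error, which the abstract does not define. All function names are hypothetical.

```python
import math
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled K-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, ME and R2 for one fold."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # predicted minus actual
    sse = sum(e * e for e in errors)
    mean_y = sum(y_true) / n
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "MSE": sse / n,
        "RMSE": math.sqrt(sse / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "ME": sum(errors) / n,  # assumption: ME = mean signed error
        "R2": 1.0 - sse / ss_tot if ss_tot > 0 else float("nan"),
    }


def knn_predict(train_x, train_y, x, k=3):
    """Average the targets of the k nearest training points (one feature)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)


def cross_validate(xs, ys, n_splits=5):
    """Return one metrics dict per fold."""
    scores = []
    for train, test in k_fold_indices(len(xs), n_splits):
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        preds = [knn_predict(tx, ty, xs[i]) for i in test]
        scores.append(regression_metrics([ys[i] for i in test], preds))
    return scores


# Synthetic stand-in data: price grows roughly linearly with area, plus noise.
rng = random.Random(42)
areas = [rng.uniform(40, 200) for _ in range(100)]
prices = [3.0 * a + rng.gauss(0, 10) for a in areas]

fold_scores = cross_validate(areas, prices)
for fold, s in enumerate(fold_scores, start=1):
    print(fold, {name: round(value, 3) for name, value in s.items()})
```

Comparing models then amounts to running the same folds for each candidate and checking whether one model leads on every metric in every fold, which is the sense in which the abstract speaks of stability.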