Driven by the climate crisis and improvements in public transportation networks, countries around the world are strongly promoting low-carbon travel. Bike sharing, as a new business model, has a positive impact on the urban environment and transportation. Accurately estimating the hourly demand for shared bikes is essential for a metropolis to offer stable bike rental services. Data mining and predictive analytics can now be used to forecast this hourly demand. The data used in this article comprise the Seoul rented bike count dataset and weather information. This paper discusses various machine learning models for rental bike demand prediction, including Linear Regression, Ridge Regression, Lasso Regression, K-Nearest Neighbors, Random Forest, Decision Tree Regression, Support Vector Machine, and Gradient Boosting Decision Tree. Different parameter tuning methods are applied to improve the performance of the base predictive models. In addition, redundant and irrelevant features are removed to improve the performance of each base model. After the individual base predictors are evaluated, several competent ones are selected to compose a stacking-based ensemble model. Experimental results show that the stacking-based ensemble model outperforms the base predictive models on all evaluation indicators.
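As a rough illustration of the stacking idea described in the abstract, the following Python sketch combines two toy base predictors through a linear meta-learner trained on out-of-fold predictions. It is not the authors' implementation: the synthetic data generator, the two base learners (`fit_hour_mean`, `fit_temp_line`), the 5-fold split, the unconstrained least-squares meta-learner, and the choice of RMSE as the indicator are all assumptions made for illustration, standing in for the Seoul dataset and the tuned models listed in the paper.

```python
import math
import random

def make_data(n, seed):
    """Synthetic hourly rows (hour, temperature, demand); NOT the Seoul data."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        hour = i % 24
        temp = rng.uniform(-5.0, 30.0)
        demand = (100 + 80 * math.sin(math.pi * hour / 12)
                  + 5 * temp + rng.gauss(0, 10))
        rows.append((hour, temp, demand))
    return rows

def fit_hour_mean(rows):
    """Hypothetical base learner 1: mean demand per hour of day."""
    sums, counts = {}, {}
    for hour, _, y in rows:
        sums[hour] = sums.get(hour, 0.0) + y
        counts[hour] = counts.get(hour, 0) + 1
    overall = sum(y for _, _, y in rows) / len(rows)
    means = {h: sums[h] / counts[h] for h in sums}
    return lambda hour, temp: means.get(hour, overall)

def fit_temp_line(rows):
    """Hypothetical base learner 2: one-variable least squares on temperature."""
    n = len(rows)
    mt = sum(t for _, t, _ in rows) / n
    my = sum(y for _, _, y in rows) / n
    slope = (sum((t - mt) * (y - my) for _, t, y in rows)
             / sum((t - mt) ** 2 for _, t, _ in rows))
    return lambda hour, temp: my + slope * (temp - mt)

BASE_FITTERS = [fit_hour_mean, fit_temp_line]

def fit_stack(rows, k=5):
    """Stacking: a linear meta-learner fitted on out-of-fold base predictions."""
    # Level 0: each row is predicted by base models that never saw it.
    oof = [[0.0, 0.0] for _ in rows]
    for fold in range(k):
        train = [r for i, r in enumerate(rows) if i % k != fold]
        models = [fit(train) for fit in BASE_FITTERS]
        for i, (hour, temp, _) in enumerate(rows):
            if i % k == fold:
                oof[i] = [m(hour, temp) for m in models]
    # Level 1: least squares with intercept, solved on centered variables.
    n = len(rows)
    ys = [y for _, _, y in rows]
    m1 = sum(p[0] for p in oof) / n
    m2 = sum(p[1] for p in oof) / n
    my = sum(ys) / n
    s11 = sum((p[0] - m1) ** 2 for p in oof)
    s22 = sum((p[1] - m2) ** 2 for p in oof)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in oof)
    s1y = sum((p[0] - m1) * (y - my) for p, y in zip(oof, ys))
    s2y = sum((p[1] - m2) * (y - my) for p, y in zip(oof, ys))
    det = s11 * s22 - s12 ** 2
    w1 = (s1y * s22 - s12 * s2y) / det
    w2 = (s11 * s2y - s12 * s1y) / det
    bias = my - w1 * m1 - w2 * m2
    # Refit the base learners on all training rows for use at prediction time.
    full = [fit(rows) for fit in BASE_FITTERS]
    return lambda hour, temp: (bias + w1 * full[0](hour, temp)
                               + w2 * full[1](hour, temp))

def rmse(model, rows):
    """Root mean squared error of a predictor on (hour, temp, demand) rows."""
    return math.sqrt(sum((model(h, t) - y) ** 2 for h, t, y in rows) / len(rows))

train_rows = make_data(480, seed=0)
test_rows = make_data(240, seed=1)
scores = {
    "hour_mean": rmse(fit_hour_mean(train_rows), test_rows),
    "temp_line": rmse(fit_temp_line(train_rows), test_rows),
    "stacked": rmse(fit_stack(train_rows), test_rows),
}
for name, value in scores.items():
    print(f"{name}: RMSE = {value:.2f}")
```

The meta-learner is fitted on out-of-fold predictions rather than in-sample ones so that it weights each base predictor by how well it generalizes, not by how closely it fits the rows it was trained on. In this toy setup each base learner captures only one of the two effects in the data (hour of day or temperature), which is why combining them can be expected to reduce the error; results on real data depend on the base models chosen.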
Published in: American Journal of Information Science and Technology (Volume 7, Issue 2)
DOI: 10.11648/j.ajist.20230702.13
Page(s): 62-69
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2023. Published by Science Publishing Group
Keywords: Data Mining, Predictive Analytics, Regression Models, Ensemble Models, Bike Sharing Demand
APA Style
Xinxue Lin, Chang Lu. (2023). A Stacking-Based Ensemble Model for Prediction of Metropolitan Bike Sharing Demand. American Journal of Information Science and Technology, 7(2), 62-69. https://doi.org/10.11648/j.ajist.20230702.13
ACS Style
Xinxue Lin; Chang Lu. A Stacking-Based Ensemble Model for Prediction of Metropolitan Bike Sharing Demand. Am. J. Inf. Sci. Technol. 2023, 7(2), 62-69. doi: 10.11648/j.ajist.20230702.13
AMA Style
Xinxue Lin, Chang Lu. A Stacking-Based Ensemble Model for Prediction of Metropolitan Bike Sharing Demand. Am J Inf Sci Technol. 2023;7(2):62-69. doi: 10.11648/j.ajist.20230702.13
@article{10.11648/j.ajist.20230702.13,
  author = {Xinxue Lin and Chang Lu},
  title = {A Stacking-Based Ensemble Model for Prediction of Metropolitan Bike Sharing Demand},
  journal = {American Journal of Information Science and Technology},
  volume = {7},
  number = {2},
  pages = {62-69},
  doi = {10.11648/j.ajist.20230702.13},
  url = {https://doi.org/10.11648/j.ajist.20230702.13},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajist.20230702.13},
  year = {2023}
}
TY  - JOUR
T1  - A Stacking-Based Ensemble Model for Prediction of Metropolitan Bike Sharing Demand
AU  - Xinxue Lin
AU  - Chang Lu
Y1  - 2023/04/20
PY  - 2023
N1  - https://doi.org/10.11648/j.ajist.20230702.13
DO  - 10.11648/j.ajist.20230702.13
T2  - American Journal of Information Science and Technology
JF  - American Journal of Information Science and Technology
JO  - American Journal of Information Science and Technology
SP  - 62
EP  - 69
PB  - Science Publishing Group
SN  - 2640-0588
UR  - https://doi.org/10.11648/j.ajist.20230702.13
VL  - 7
IS  - 2
ER  -