In modelling count data, least-squares regression suffers from several methodological and statistical limitations when the dependent variable is a discrete, non-negative integer count. Unlike the classical regression model, count data models are non-linear, and the response variable is discrete and takes non-negative values only. A good starting point for modelling count data is the Poisson regression model, since it lends itself well to the nature of count data. However, its equi-dispersion assumption (variance equal to the mean) renders it inappropriate for modelling over-dispersed data. The Negative Binomial regression model has been widely used and is often treated as the default regression model for over-dispersed count data. This model is a modification of the Poisson regression model and, though widely used, it might not be the best model for over-dispersion; other models have been found to perform better. Over-dispersion in this study was defined relative to the Poisson model. This study models over-dispersed count data using the discrete Weibull (DW) regression model and an artificial neural network model with a median neuron in the hidden layer. After the two models were fitted to simulated data and real data, the artificial neural network model outperformed the discrete Weibull regression model. Applied to a data set from a German health survey, the DW regression model gave an RMSE of 69.0668, against 35.5652 for the artificial neural network.
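The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the type 1 discrete Weibull parameterisation, P(Y = y) = q^(y^β) − q^((y+1)^β) for y = 0, 1, 2, ..., which may differ in detail from the regression form used in the study. The function names (`dw_pmf`, `dw_sample`, `dispersion_index`, `rmse`) and the parameter values q = 0.8, β = 0.7 are hypothetical choices made for illustration, not taken from the paper.

```python
import math
import random
import statistics


def dw_pmf(y, q, beta):
    """Type 1 discrete Weibull pmf: q**(y**beta) - q**((y + 1)**beta), y = 0, 1, 2, ..."""
    if y < 0:
        return 0.0
    return q ** (y ** beta) - q ** ((y + 1) ** beta)


def dw_sample(q, beta, rng):
    """Draw one count by inverting the survival function P(Y >= y) = q**(y**beta)."""
    u = 1.0 - rng.random()           # uniform on (0, 1], so log(u) is finite
    t = math.log(u) / math.log(q)    # non-negative: both logs are <= 0
    return math.floor(max(t, 0.0) ** (1.0 / beta))


def dispersion_index(counts):
    """Sample variance divided by sample mean. Values above 1 indicate
    over-dispersion relative to the Poisson model, whose ratio is 1."""
    return statistics.variance(counts) / statistics.mean(counts)


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted counts."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative parameter values only; they are not estimates from the study.
rng = random.Random(2020)
counts = [dw_sample(q=0.8, beta=0.7, rng=rng) for _ in range(5000)]

print("dispersion index:", round(dispersion_index(counts), 2))
baseline = [statistics.mean(counts)] * len(counts)
print("RMSE of a constant mean prediction:", round(rmse(counts, baseline), 2))
```

Sampling by inversion keeps the sketch free of third-party dependencies. The RMSE of a constant mean prediction is only a baseline against which a fitted regression or neural network could be compared; it is not a reproduction of the reported figures.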
Published in | International Journal of Data Science and Analysis (Volume 6, Issue 5) |
DOI | 10.11648/j.ijdsa.20200605.15 |
Page(s) | 153-162 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2020. Published by Science Publishing Group |
Keywords | Over-dispersion, Count, Discrete Weibull, Artificial Neural Network |
APA Style
Kipkorir Collins, Anthony Waititu, Anthony Wanjoya. (2020). Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. International Journal of Data Science and Analysis, 6(5), 153-162. https://doi.org/10.11648/j.ijdsa.20200605.15
ACS Style
Kipkorir Collins; Anthony Waititu; Anthony Wanjoya. Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. Int. J. Data Sci. Anal. 2020, 6(5), 153-162. doi: 10.11648/j.ijdsa.20200605.15
AMA Style
Kipkorir Collins, Anthony Waititu, Anthony Wanjoya. Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data. Int J Data Sci Anal. 2020;6(5):153-162. doi: 10.11648/j.ijdsa.20200605.15
@article{10.11648/j.ijdsa.20200605.15,
  author = {Kipkorir Collins and Anthony Waititu and Anthony Wanjoya},
  title = {Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data},
  journal = {International Journal of Data Science and Analysis},
  volume = {6},
  number = {5},
  pages = {153-162},
  doi = {10.11648/j.ijdsa.20200605.15},
  url = {https://doi.org/10.11648/j.ijdsa.20200605.15},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20200605.15},
  year = {2020}
}
TY  - JOUR
T1  - Discrete Weibull and Artificial Neural Network Models in Modelling Over-dispersed Count Data
AU  - Kipkorir Collins
AU  - Anthony Waititu
AU  - Anthony Wanjoya
Y1  - 2020/10/26
PY  - 2020
N1  - https://doi.org/10.11648/j.ijdsa.20200605.15
DO  - 10.11648/j.ijdsa.20200605.15
T2  - International Journal of Data Science and Analysis
JF  - International Journal of Data Science and Analysis
JO  - International Journal of Data Science and Analysis
SP  - 153
EP  - 162
PB  - Science Publishing Group
SN  - 2575-1891
UR  - https://doi.org/10.11648/j.ijdsa.20200605.15
VL  - 6
IS  - 5
ER  -