Peer-Reviewed

Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries

Received: 19 February 2018     Accepted: 19 March 2018     Published: 23 March 2018
Abstract

Count data arise in a wide range of disciplines. Poisson, negative binomial (NB), zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) regression are among the models proposed for count responses. All four are plausible candidates for a given dataset, but there is no established rule for choosing the one that will perform best. This study assessed these models at various degrees of zero inflation. Datasets were simulated from a ZIP distribution at eight levels of zero inflation (0%, 2%, 5%, 10%, 15%, 20%, 30% and 40%). Poisson and NB estimated the regression coefficients well when the proportion of zeros was below 15%. The two zero-inflated models (ZIM) performed well at higher degrees of zero inflation: beyond 15% for ZIP and beyond 20% for ZINB. Exploratory examination of the caries data revealed zero inflation of 3.23%, which is below the 15% threshold. Early childhood caries (ECC) data for children aged 3-6 years who visited Lady Northey Dental Clinic were therefore analyzed with Poisson and NB regression. The Akaike information criterion (AIC) was used to compare the competing models both under simulation and with the real data. Poisson yielded lower AIC values than the other three models at lower zero-inflation rates. ZIP had the lowest AIC value at the 10%, 15%, 20%, 30% and 40% levels of zero inflation. The NB model had the lowest AIC value when the real data were analyzed. Father's education level (primary school completed), chewing gum several times a week, and consuming jam several times a day, juice every day, soda every day and sweets several times a week were found to be significant risk factors for ECC.
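The simulation design described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it assumes intercept-only models (no covariates), compares only Poisson and ZIP, and uses a hypothetical Poisson mean of 3 and sample size of 2000. The function names (`simulate_zip`, `fit_zip`, `compare_aic`) are introduced here purely for illustration. Only the eight zero-inflation levels come from the abstract.

```python
import math
import random


def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate with Knuth's method (fine for small lam)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_zip(n, lam, pi, seed=0):
    """Simulate n ZIP counts: a structural zero with probability pi, else Poisson(lam)."""
    rng = random.Random(seed)
    return [0 if rng.random() < pi else poisson_draw(rng, lam) for _ in range(n)]


def poisson_logpmf(k, lam):
    """Log of the Poisson probability mass function at k."""
    return -lam + k * math.log(lam) - math.lgamma(k + 1)


def zip_loglik(y, lam, pi):
    """Log-likelihood of an intercept-only ZIP model."""
    total = 0.0
    for k in y:
        structural = pi if k == 0 else 0.0
        total += math.log(structural + (1.0 - pi) * math.exp(poisson_logpmf(k, lam)))
    return total


def fit_zip(y, iters=200):
    """EM estimates (lam, pi) for an intercept-only ZIP model.

    Assumes y contains at least one positive count.
    """
    n = len(y)
    n0 = sum(1 for k in y if k == 0)
    total = sum(y)
    lam, pi = total / n, 0.5 * n0 / n
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        # M-step: update the mixing proportion and the Poisson mean
        pi = n0 * w / n
        lam = total / (n - n0 * w)
    return lam, pi


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik


def compare_aic(y):
    """Return (AIC of Poisson, AIC of ZIP) for intercept-only fits to y."""
    lam_pois = sum(y) / len(y)  # closed-form Poisson MLE
    ll_pois = sum(poisson_logpmf(k, lam_pois) for k in y)
    lam_zip, pi_zip = fit_zip(y)
    ll_zip = zip_loglik(y, lam_zip, pi_zip)
    return aic(ll_pois, 1), aic(ll_zip, 2)


if __name__ == "__main__":
    # Zero-inflation levels taken from the abstract; lam and n are assumptions.
    for pi in (0.0, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40):
        y = simulate_zip(n=2000, lam=3.0, pi=pi, seed=42)
        aic_pois, aic_zip = compare_aic(y)
        print(f"pi={pi:.2f}  AIC Poisson={aic_pois:.1f}  AIC ZIP={aic_zip:.1f}")
```

Because ZIP nests Poisson, its extra parameter can only be penalized by AIC when zero inflation is absent or small, while at high inflation the Poisson fit deteriorates quickly. A full replication would add covariates, the NB and ZINB models, and repeated simulation runs, for example with an established count-regression library.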

Published in International Journal of Data Science and Analysis (Volume 4, Issue 1)
DOI 10.11648/j.ijdsa.20180401.15
Page(s) 24-31
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Simulation, RMSE, Competing Models

Cite This Article
  • APA Style

    Agnes Njambi Wanjau, Samuel Musili Mwalili, Oscar Ngesa. (2018). Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries. International Journal of Data Science and Analysis, 4(1), 24-31. https://doi.org/10.11648/j.ijdsa.20180401.15

    ACS Style

    Agnes Njambi Wanjau; Samuel Musili Mwalili; Oscar Ngesa. Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries. Int. J. Data Sci. Anal. 2018, 4(1), 24-31. doi: 10.11648/j.ijdsa.20180401.15

    AMA Style

    Agnes Njambi Wanjau, Samuel Musili Mwalili, Oscar Ngesa. Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries. Int J Data Sci Anal. 2018;4(1):24-31. doi: 10.11648/j.ijdsa.20180401.15

  • @article{10.11648/j.ijdsa.20180401.15,
      author = {Agnes Njambi Wanjau and Samuel Musili Mwalili and Oscar Ngesa},
      title = {Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries},
      journal = {International Journal of Data Science and Analysis},
      volume = {4},
      number = {1},
      pages = {24-31},
      doi = {10.11648/j.ijdsa.20180401.15},
      url = {https://doi.org/10.11648/j.ijdsa.20180401.15},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20180401.15},
     year = {2018}
    }
    

  • TY  - JOUR
    T1  - Assessment and Selection of Competing Models for Count Data: An Application to Early Childhood Caries
    AU  - Agnes Njambi Wanjau
    AU  - Samuel Musili Mwalili
    AU  - Oscar Ngesa
    Y1  - 2018/03/23
    PY  - 2018
    N1  - https://doi.org/10.11648/j.ijdsa.20180401.15
    DO  - 10.11648/j.ijdsa.20180401.15
    T2  - International Journal of Data Science and Analysis
    JF  - International Journal of Data Science and Analysis
    JO  - International Journal of Data Science and Analysis
    SP  - 24
    EP  - 31
    PB  - Science Publishing Group
    SN  - 2575-1891
    UR  - https://doi.org/10.11648/j.ijdsa.20180401.15
    VL  - 4
    IS  - 1
    ER  - 

Author Information
  • Agnes Njambi Wanjau, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Samuel Musili Mwalili, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya

  • Oscar Ngesa, Department of Statistics and Actuarial Sciences, Jomo Kenyatta University of Agriculture and Technology, Nairobi, Kenya
