Longitudinal studies play an important role in scientific research. Their defining characteristic is that observations are collected from each subject repeatedly, over time or under different conditions. Missing values are common in such studies and pose a fundamental challenge because they can introduce bias, even under well-controlled conditions. Three missing data mechanisms are distinguished: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Several methods have been developed in the literature to handle missing values in longitudinal data. The most commonly used include complete case analysis (CCA), mean imputation (Mean), last observation carried forward (LOCF), hot deck (HOT), regression imputation (Regress), K-nearest neighbor (KNN), the expectation-maximization (EM) algorithm, and multiple imputation (MI). This article presents a comparative study of the efficiency of these eight methods under the different missing data mechanisms. The comparison is carried out through a simulation study. It is concluded that MI is the most effective method, as it has the smallest standard errors, while the EM algorithm has the largest relative bias. The methods are also compared using a real data application.
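Two of the single-imputation methods named above, mean imputation and LOCF, are simple enough to illustrate directly. The following is a minimal sketch only, not the implementation used in the study: the function names `mean_impute` and `locf_impute`, the use of `None` to mark a missing value, and the example measurements are all assumptions made for illustration.

```python
from statistics import mean
from typing import List, Optional

# One subject's repeated measurements; None marks a missing value (assumed encoding).
Series = List[Optional[float]]


def mean_impute(values: Series) -> List[float]:
    """Replace each missing value with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def locf_impute(values: Series) -> Series:
    """Carry the last observed value forward; leading gaps stay missing."""
    result: Series = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v
        result.append(last)
    return result


# Hypothetical subject who misses visit 2 and drops out after visit 3.
subject = [5.0, None, 7.0, None, None]
print(mean_impute(subject))  # [5.0, 6.0, 7.0, 6.0, 6.0]
print(locf_impute(subject))  # [5.0, 5.0, 7.0, 7.0, 7.0]
```

Both methods fill each gap with a single value, so an analysis that treats the filled-in values as observed tends to understate uncertainty; multiple imputation addresses this by producing several completed data sets and combining the results.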
Published in | International Journal of Statistical Distributions and Applications (Volume 3, Issue 4) |
DOI | 10.11648/j.ijsd.20170304.13 |
Page(s) | 72-80 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2017. Published by Science Publishing Group |
Dropout Missing, Longitudinal Data, Missing Data, Multiple Imputations, Single Imputation
APA Style
Ahmed Mahmoud Gad, Rania Hassan Mohamed Abdelkhalek. (2017). Imputation Methods for Longitudinal Data: A Comparative Study. International Journal of Statistical Distributions and Applications, 3(4), 72-80. https://doi.org/10.11648/j.ijsd.20170304.13
ACS Style
Ahmed Mahmoud Gad; Rania Hassan Mohamed Abdelkhalek. Imputation Methods for Longitudinal Data: A Comparative Study. Int. J. Stat. Distrib. Appl. 2017, 3(4), 72-80. doi: 10.11648/j.ijsd.20170304.13
AMA Style
Ahmed Mahmoud Gad, Rania Hassan Mohamed Abdelkhalek. Imputation Methods for Longitudinal Data: A Comparative Study. Int J Stat Distrib Appl. 2017;3(4):72-80. doi: 10.11648/j.ijsd.20170304.13
@article{10.11648/j.ijsd.20170304.13,
  author = {Ahmed Mahmoud Gad and Rania Hassan Mohamed Abdelkhalek},
  title = {Imputation Methods for Longitudinal Data: A Comparative Study},
  journal = {International Journal of Statistical Distributions and Applications},
  volume = {3},
  number = {4},
  pages = {72-80},
  doi = {10.11648/j.ijsd.20170304.13},
  url = {https://doi.org/10.11648/j.ijsd.20170304.13},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsd.20170304.13},
  year = {2017}
}
TY - JOUR
T1 - Imputation Methods for Longitudinal Data: A Comparative Study
AU - Ahmed Mahmoud Gad
AU - Rania Hassan Mohamed Abdelkhalek
Y1 - 2017/11/10
PY - 2017
N1 - https://doi.org/10.11648/j.ijsd.20170304.13
DO - 10.11648/j.ijsd.20170304.13
T2 - International Journal of Statistical Distributions and Applications
JF - International Journal of Statistical Distributions and Applications
JO - International Journal of Statistical Distributions and Applications
SP - 72
EP - 80
PB - Science Publishing Group
SN - 2472-3509
UR - https://doi.org/10.11648/j.ijsd.20170304.13
VL - 3
IS - 4
ER -