The validity and usefulness of empirical data require that the data analyst ascertain the cleanliness of the collected data before any statistical analysis begins. In this study, petroleum demand data covering a 24-hour period were collected from 1515 households in 10 clusters. The primary sampling units were stratified into three economic classes: 50% were drawn from the low class, 28% from the middle class and 22% from the high class. Only 63.6% of the questionnaires were complete; the missing values were imputed by multivariate imputation by chained equations (MICE), with the aid of auxiliary information from a past survey. The proportion of missing data and its pattern were determined, and the study assumed that the data were missing at random. Two nonparametric estimators, Nadaraya-Watson and local polynomial, and the design-based Horvitz-Thompson estimator were fitted to estimate the total demand for petroleum, a commodity with no close substitute. Comparing the three estimators, the study found that the local polynomial approach appeared the most efficient, with low bias, and that it handled boundary bias better than the Nadaraya-Watson and Horvitz-Thompson estimators. The results were used to estimate the long-term gaps in petroleum demand in Nairobi County, Kenya.
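The three estimators compared above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation and uses hypothetical data. It assumes a Gaussian kernel, a fixed bandwidth `h = 0.1`, local *linear* fitting (degree 1) as the local polynomial, equal inclusion probabilities n/N, and a sample that is already complete (i.e. after any MICE step). The two smoother-based totals use a model-based prediction form (observed values for sampled units plus smoother predictions for unsampled ones), which is one common way to turn a smoother into a population total and is an assumption here. The toy population, `x_pop`, `y_pop` and every helper name are hypothetical.

```python
import math
from typing import Callable, Sequence

Smoother = Callable[[float, Sequence[float], Sequence[float], float], float]


def gaussian_kernel(u: float) -> float:
    """Gaussian kernel without its normalising constant (it cancels in both smoothers)."""
    return math.exp(-0.5 * u * u)


def nadaraya_watson(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Kernel-weighted average of the sampled y values around x0 (local constant fit)."""
    weights = [gaussian_kernel((xi - x0) / h) for xi in xs]
    return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)


def local_linear(x0: float, xs: Sequence[float], ys: Sequence[float], h: float) -> float:
    """Degree-1 local polynomial: weighted least squares of y on (1, x - x0).

    The fitted intercept is the estimate at x0; the 2x2 normal equations
    are solved in closed form, so no matrix library is needed.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(xs, ys):
        d = xi - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * yi
        t1 += w * d * yi
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def horvitz_thompson_total(ys: Sequence[float], pis: Sequence[float]) -> float:
    """Design-based total: each sampled y weighted by 1 / inclusion probability."""
    return sum(yi / pi for yi, pi in zip(ys, pis))


def prediction_total(x_pop: Sequence[float], sample_idx: Sequence[int],
                     ys: Sequence[float], h: float, smoother: Smoother) -> float:
    """Observed y for sampled units plus smoother predictions for unsampled units."""
    sampled = set(sample_idx)
    xs = [x_pop[i] for i in sample_idx]
    predicted = sum(smoother(x_pop[j], xs, ys, h)
                    for j in range(len(x_pop)) if j not in sampled)
    return sum(ys) + predicted


# Hypothetical toy population (not the survey data): auxiliary x known for all
# N units, demand y = 2 + 3x, and a systematic sample of every 5th unit.
N = 100
x_pop = [i / (N - 1) for i in range(N)]
y_pop = [2 + 3 * x for x in x_pop]
sample_idx = list(range(0, N, 5))              # n = 20; last sampled x is 95/99
ys = [y_pop[i] for i in sample_idx]
pis = [len(sample_idx) / N] * len(sample_idx)  # inclusion probability n/N

ht = horvitz_thompson_total(ys, pis)
nw = prediction_total(x_pop, sample_idx, ys, 0.1, nadaraya_watson)
ll = prediction_total(x_pop, sample_idx, ys, 0.1, local_linear)
print(f"true={sum(y_pop):.2f}  HT={ht:.2f}  NW={nw:.2f}  LL={ll:.2f}")
```

In this toy run the local linear total recovers the true total of 350, because a local linear fit reproduces a linear trend exactly. Nadaraya-Watson falls below it, because the unsampled units beyond the last sampled `x` can only borrow from values to their left, which is the boundary bias mentioned in the abstract. Horvitz-Thompson is design-unbiased only on average over repeated samples; this single fixed systematic sample starting at `x = 0` happens to give a total below the truth.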
Published in | International Journal of Data Science and Analysis (Volume 6, Issue 1)
DOI | 10.11648/j.ijdsa.20200601.11
Page(s) | 1-11
License | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright | Copyright © The Author(s), 2020. Published by Science Publishing Group
Keywords | Clean Data, Missing Data, Imputation, Petroleum Total Demand
APA Style
Benard Mworia Warutumo, Pius Nderitu Kihara, Levi Mbugua. (2020). Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis. International Journal of Data Science and Analysis, 6(1), 1-11. https://doi.org/10.11648/j.ijdsa.20200601.11
TY - JOUR T1 - Estimating Total Energy Demand from Incomplete Data Using Non-parametric Analysis AU - Benard Mworia Warutumo AU - Pius Nderitu Kihara AU - Levi Mbugua Y1 - 2020/01/08 PY - 2020 N1 - https://doi.org/10.11648/j.ijdsa.20200601.11 DO - 10.11648/j.ijdsa.20200601.11 T2 - International Journal of Data Science and Analysis JF - International Journal of Data Science and Analysis JO - International Journal of Data Science and Analysis SP - 1 EP - 11 PB - Science Publishing Group SN - 2575-1891 UR - https://doi.org/10.11648/j.ijdsa.20200601.11 VL - 6 IS - 1 ER -