Variable selection for count data via penalized Poisson regression is challenging when the explanatory variables are correlated. To estimate the coefficients and perform variable selection simultaneously, the Lasso penalty has been successfully applied to Poisson regression. However, the Lasso has two major limitations. In the p > n case, the Lasso selects at most n variables before it saturates, because of the nature of the convex optimization problem; this is a limiting feature for a variable selection method. Moreover, if there is a group of variables among which the pairwise correlations are very high, the Lasso tends to select only one variable from the group, with no control over which one is selected. To address these issues, we propose the elastic net method, which accommodates correlation between explanatory variables while providing consistent variable selection. Real-world data and a simulation study show that the elastic net often outperforms the Lasso while enjoying a similar sparsity of representation. In addition, the elastic net encourages a grouping effect, whereby strongly correlated predictors tend to enter the model together.
Published in | International Journal of Data Science and Analysis (Volume 5, Issue 5) |
DOI | 10.11648/j.ijdsa.20190505.14 |
Page(s) | 99-103 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2019. Published by Science Publishing Group |
Keywords | Penalized, Poisson Regression, Elastic Net Penalty, Lasso |
APA Style
Josephine Mwikali, Samuel Mwalili, Anthony Wanjoya. (2019). Penalized Poisson Regression Model Using Elastic Net and Least Absolute Shrinkage and Selection Operator (Lasso) Penality. International Journal of Data Science and Analysis, 5(5), 99-103. https://doi.org/10.11648/j.ijdsa.20190505.14