Advances in science and data collection technology allow researchers to gather large amounts of high-dimensional data from many fields. Variable selection for high-dimensional data has seen some development, but most existing studies consider only main effects. In many important practical problems, however, main effects alone may not adequately describe the relationship between the response variable and the predictor variables, so variable selection with interaction terms in high-dimensional data is of greater practical interest. This article therefore focuses on robust estimation for semi-parametric models with interactions in high-dimensional data under the framework of modal regression. A two-stage regularization method is applied to implement variable selection. At Stage 1, B-spline basis functions are used to approximate the non-parametric function, and the parametric and non-parametric components are selected simultaneously based on modal regression and adaptive least absolute shrinkage and selection operator (LASSO) estimation. At Stage 2, the model consists of the variables selected at Stage 1 together with interaction terms derived from the main effects. To maintain the heredity structure between the main effects of the linear part and the interaction effects, selection is applied only to the interaction terms in order to identify the important interaction effects. Under suitable regularity conditions, the oracle properties of the variable selection and the consistency of the hierarchical structure are proved. Numerical results are also presented to demonstrate the performance of the method.
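As an illustration of how the Stage 2 candidate set might be assembled, the following is a minimal sketch and not the authors' implementation. It assumes strong heredity (an interaction is a candidate only when both parent main effects were selected at Stage 1) and pairwise products only. The function names, the `include_squares` flag, and the example data are hypothetical.

```python
from itertools import combinations


def interaction_candidates(selected_main, include_squares=False):
    """Return index pairs (j, k) of candidate interaction terms.

    Assumption (strong heredity): x_j * x_k is a candidate only if both
    x_j and x_k were selected as main effects at Stage 1.
    """
    idx = sorted(set(selected_main))
    pairs = list(combinations(idx, 2))
    if include_squares:
        # Optionally treat squared terms x_j * x_j as candidates too.
        pairs += [(j, j) for j in idx]
    return sorted(pairs)


def interaction_columns(rows, pairs):
    """Compute the product x_j * x_k for every row and candidate pair."""
    return [[row[j] * row[k] for (j, k) in pairs] for row in rows]


# Hypothetical Stage 1 output: predictors 0, 3 and 5 survived selection.
selected = [0, 3, 5]
pairs = interaction_candidates(selected)

# Two toy observations with six predictors each.
rows = [
    [1.0, 9.0, 9.0, 2.0, 9.0, 3.0],
    [2.0, 9.0, 9.0, 0.5, 9.0, 4.0],
]
stage2_design = interaction_columns(rows, pairs)

print(pairs)
print(stage2_design)
```

In a full Stage 2 fit, these product columns would be appended to the selected main effects, with a penalty applied to the interaction coefficients only, so that the main effects chosen at Stage 1 stay in the model.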
Published in: International Journal of Statistical Distributions and Applications (Volume 9, Issue 2)
DOI: 10.11648/j.ijsd.20230902.11
Page(s): 49-61
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2023. Published by Science Publishing Group
Keywords: Semi-Parametric Models with Interaction, Variable Selection, Modal Regression, Adaptive LASSO
APA Style
Yafeng Xia, Na Kui. (2023). Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data. International Journal of Statistical Distributions and Applications, 9(2), 49-61. https://doi.org/10.11648/j.ijsd.20230902.11
@article{10.11648/j.ijsd.20230902.11,
  author = {Yafeng Xia and Na Kui},
  title = {Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data},
  journal = {International Journal of Statistical Distributions and Applications},
  volume = {9},
  number = {2},
  pages = {49-61},
  doi = {10.11648/j.ijsd.20230902.11},
  url = {https://doi.org/10.11648/j.ijsd.20230902.11},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsd.20230902.11},
  year = {2023}
}
TY - JOUR
T1 - Variable Selection for Semi-Parametric Models with Interaction Under High Dimensional Data
AU - Yafeng Xia
AU - Na Kui
Y1 - 2023/04/15
PY - 2023
N1 - https://doi.org/10.11648/j.ijsd.20230902.11
DO - 10.11648/j.ijsd.20230902.11
T2 - International Journal of Statistical Distributions and Applications
JF - International Journal of Statistical Distributions and Applications
JO - International Journal of Statistical Distributions and Applications
SP - 49
EP - 61
PB - Science Publishing Group
SN - 2472-3509
UR - https://doi.org/10.11648/j.ijsd.20230902.11
VL - 9
IS - 2
ER -