With the rapid development of information and data-acquisition technology, models that consider only linear main effects can no longer provide accurate predictions, and the interaction effects of the predictors on the response can no longer be ignored. Variable selection for models with interaction terms has therefore become an important research topic in statistical analysis. In this paper, we study variable selection for a partially linear model with interaction terms in the high-dimensional setting, using the profile forward selection method. We propose two algorithms: a two-stage interaction selection algorithm (iPFST) under the strong genetic condition, and a profile forward selection algorithm (iPFSM) under the marginality principle. Theoretically, we use the consistency of the profile estimators to prove that they have a uniform convergence rate, and we use screening consistency to prove that both iPFST and iPFSM uniformly identify all important linear main effect terms and all important interaction effect terms with probability 1. Seven regularity conditions for the theoretical results are given. Numerical simulations show the superiority of iPFST and iPFSM in variable selection, and a comparison of the two shows that iPFST performs better than iPFSM. Finally, we give detailed technical proofs.
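The abstract does not spell out the algorithmic steps, so the following is only a minimal sketch of the general idea behind a two-stage profiled forward selection, not the authors' iPFST procedure. It assumes a model of the form y = (main effects of X) + (pairwise interactions of X) + g(U) + noise, profiles out g(U) with a Nadaraya-Watson smoother, and runs a fixed number of greedy steps instead of a data-driven stopping rule. The function names `profile`, `forward_select`, and `two_stage_select`, the bandwidth `h`, and the step counts `n_main` and `n_inter` are all hypothetical choices made for illustration.

```python
import numpy as np

def profile(U, M, h=0.1):
    """Subtract a Nadaraya-Watson (Gaussian kernel) estimate of E[M | U]
    from each column of M. U has shape (n,), M has shape (n, k)."""
    W = np.exp(-0.5 * ((U[:, None] - U[None, :]) / h) ** 2)
    W /= W.sum(axis=1, keepdims=True)  # each row of weights sums to 1
    return M - W @ M

def forward_select(Z, y, n_steps, base=None):
    """Greedy forward regression: at each step, add the column of Z that
    gives the smallest residual sum of squares. Columns in `base` (if any)
    are always kept in the fitted model."""
    n = len(y)
    base = np.empty((n, 0)) if base is None else base
    selected = []
    for _ in range(n_steps):
        best_j, best_rss = None, np.inf
        for j in range(Z.shape[1]):
            if j in selected:
                continue
            cols = np.column_stack([base, Z[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(cols, y, rcond=None)
            rss = float(np.sum((y - cols @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

def two_stage_select(X, U, y, n_main, n_inter, h=0.1):
    """Stage 1 selects main effects from the profiled design. Stage 2 searches
    only interactions whose two parent main effects were both selected
    (a strong-heredity-style restriction). Assumes n_main >= 2."""
    y_t = profile(U, y[:, None], h)[:, 0]
    main = forward_select(profile(U, X, h), y_t, n_main)
    pairs = [tuple(sorted((a, b)))
             for i, a in enumerate(main) for b in main[i + 1:]]
    inter_cols = np.column_stack([X[:, a] * X[:, b] for a, b in pairs])
    picked = forward_select(profile(U, inter_cols, h), y_t, n_inter,
                            base=profile(U, X[:, main], h))
    return main, [pairs[k] for k in picked]
```

Restricting the second stage to products of already-selected main effects shrinks the candidate set from all p(p-1)/2 pairs to at most n_main(n_main-1)/2, which is what makes an interaction search feasible when p is large. In practice the fixed step counts would be replaced by a model-size criterion.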
Published in | International Journal of Statistical Distributions and Applications (Volume 8, Issue 1) |
DOI | 10.11648/j.ijsd.20220801.12 |
Page(s) | 14-23 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2022. Published by Science Publishing Group |
Keywords | Profile Forward Selection, Strong Genetic Condition, Marginality Principle, Screening Consistency, Variable Selection |
APA Style
Yafeng Xia, Lin Yan. (2022). Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms. International Journal of Statistical Distributions and Applications, 8(1), 14-23. https://doi.org/10.11648/j.ijsd.20220801.12
Published | 2022/04/23 |
ISSN | 2472-3509 |