Peer-Reviewed

Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms

Received: 17 March 2022     Accepted: 6 April 2022     Published: 23 April 2022
Abstract

With the rapid development of information technology and data acquisition technology, models that consider only linear main effects often cannot provide accurate predictions, and the interaction effects of the predictors on the response can no longer be ignored. Variable selection for models with interaction terms has therefore become an important research topic in statistical analysis. In this paper, we study variable selection for a partially linear model with interaction terms, using the profile forward selection method for high-dimensional data. We propose a two-stage interaction selection algorithm (iPFST) under the strong genetic condition and a profile forward selection algorithm (iPFSM) under the marginality principle. Theoretically, we use the consistency of the profile estimators to prove that they have a uniform convergence rate, and we use screening consistency to prove that the iPFST and iPFSM algorithms can uniformly identify all important linear main effect terms and all important interaction effect terms with probability 1. Seven regularity conditions for the theorems are given. Numerical simulations show that iPFST and iPFSM perform well in variable selection, and a comparison of the two algorithms indicates that iPFST outperforms iPFSM. Finally, we give detailed technical proofs.
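The two-stage idea described above can be illustrated with a minimal sketch. This is not the authors' iPFST implementation; it is a simplified illustration under several assumptions that do not come from the article: the nonparametric component is profiled out with a Nadaraya-Watson smoother using a Gaussian kernel and a fixed bandwidth `h`, each stage runs a fixed number of forward steps instead of using a data-driven stopping criterion, and all function names (`nw_smoother`, `forward_select`, `two_stage_sketch`) are hypothetical. Stage one selects main effects from the profiled data; stage two considers only interactions between selected main effects.

```python
import itertools
import numpy as np


def nw_smoother(u, h):
    """Nadaraya-Watson smoothing matrix (Gaussian kernel); each row sums to 1."""
    d = (u[:, None] - u[None, :]) / h
    k = np.exp(-0.5 * d ** 2)
    return k / k.sum(axis=1, keepdims=True)


def forward_select(y, X, steps):
    """Greedy forward regression: each step adds the column that minimizes the RSS."""
    selected = []
    for _ in range(min(steps, X.shape[1])):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            A = X[:, selected + [j]]
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            rss = float(np.sum((y - A @ coef) ** 2))
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected


def two_stage_sketch(y, X, u, h=0.1, main_steps=3, inter_steps=1):
    """Hypothetical two-stage selection: main effects first, then their pairwise products."""
    S = nw_smoother(u, h)
    # Profile step: remove the smooth dependence on u from the response and predictors.
    y_t, X_t = y - S @ y, X - S @ X
    # Stage 1: forward selection of main effects on the profiled data.
    mains = forward_select(y_t, X_t, main_steps)
    # Stage 2: candidate interactions are restricted to pairs of selected main effects.
    pairs = list(itertools.combinations(sorted(mains), 2))
    if not pairs:
        return mains, []
    Z = np.column_stack([X[:, a] * X[:, b] for a, b in pairs])
    Z_t = Z - S @ Z
    M = X_t[:, mains]
    coef, *_ = np.linalg.lstsq(M, y_t, rcond=None)
    resid = y_t - M @ coef  # what the selected main effects leave unexplained
    picked = forward_select(resid, Z_t, inter_steps)
    return mains, [pairs[k] for k in picked]


# Simulated example (assumed design): three main effects, one interaction, smooth term in u.
rng = np.random.default_rng(0)
n, p = 300, 20
u = rng.uniform(size=n)
X = rng.standard_normal((n, p))
y = (3 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + 2.5 * X[:, 0] * X[:, 1]
     + np.sin(2 * np.pi * u) + 0.1 * rng.standard_normal(n))
mains, inters = two_stage_sketch(y, X, u)
print(sorted(mains), inters)
```

Restricting stage two to products of already selected main effects keeps the candidate set small: with three selected main effects there are only three candidate pairs, rather than all 190 pairs of the 20 predictors.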

Published in International Journal of Statistical Distributions and Applications (Volume 8, Issue 1)
DOI 10.11648/j.ijsd.20220801.12
Page(s) 14-23
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2022. Published by Science Publishing Group

Keywords

Profile Forward Selection, Strong Genetic Condition, Marginality Principle, Screening Consistency, Variable Selection

Cite This Article
  • APA Style

    Yafeng Xia, Lin Yan. (2022). Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms. International Journal of Statistical Distributions and Applications, 8(1), 14-23. https://doi.org/10.11648/j.ijsd.20220801.12

    ACS Style

    Yafeng Xia; Lin Yan. Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms. Int. J. Stat. Distrib. Appl. 2022, 8(1), 14-23. doi: 10.11648/j.ijsd.20220801.12

    AMA Style

    Yafeng Xia, Lin Yan. Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms. Int J Stat Distrib Appl. 2022;8(1):14-23. doi: 10.11648/j.ijsd.20220801.12

  • @article{10.11648/j.ijsd.20220801.12,
      author = {Yafeng Xia and Lin Yan},
      title = {Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms},
      journal = {International Journal of Statistical Distributions and Applications},
      volume = {8},
      number = {1},
      pages = {14-23},
      doi = {10.11648/j.ijsd.20220801.12},
      url = {https://doi.org/10.11648/j.ijsd.20220801.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijsd.20220801.12},
     year = {2022}
    }
    

  • TY  - JOUR
    T1  - Variable Selection Based on Profile Forward Selection of Partial Linear Models with Interactive Terms
    AU  - Yafeng Xia
    AU  - Lin Yan
    Y1  - 2022/04/23
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ijsd.20220801.12
    DO  - 10.11648/j.ijsd.20220801.12
    T2  - International Journal of Statistical Distributions and Applications
    JF  - International Journal of Statistical Distributions and Applications
    JO  - International Journal of Statistical Distributions and Applications
    SP  - 14
    EP  - 23
    PB  - Science Publishing Group
    SN  - 2472-3509
    UR  - https://doi.org/10.11648/j.ijsd.20220801.12
    VL  - 8
    IS  - 1
    ER  - 

Author Information
  • Yafeng Xia, School of Sciences, Lanzhou University of Technology, Department of University, Lanzhou, China

  • Lin Yan, School of Sciences, Lanzhou University of Technology, Department of University, Lanzhou, China
