Peer-Reviewed

The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining

Received: 26 April 2018     Published: 27 April 2018
Abstract

The value that introduced talents create for colleges and universities is an important indicator in evaluating talent introduction. This work addresses the demand of big data systems for anomaly detection and prediction in the talent introduction process. In this article, after reducing the dimension of the data by principal component analysis, we establish an outlier detection model using a distance-based method (Mahalanobis distance), a density-based method (local outlier factor), and clustering-based methods (two-step, k-means). We find 15 significant outliers and find that publishing SSCI papers and having experience at C9 institutions have a significant effect on obtaining National Foundation of China funding. Finally, after eliminating the abnormal values, we use support vector machines, decision trees (C4.5, C5.0), Bayes, and random forests to establish the talent prediction model. Comparing the four methods, we find that the support vector machine and decision tree methods achieve higher prediction accuracy. After optimization, their accuracies reach 75.00% and 72.09%, respectively.
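As a rough illustration of the distance-based step named in the abstract, the sketch below flags records by their Mahalanobis distance from the sample mean. It is not the authors' implementation. It assumes that the distance in the abstract is the Mahalanobis distance, that the data have already been reduced to two principal-component scores, and that a cutoff of 2.0 is acceptable. The `scores` values, the function names, and the cutoff are all hypothetical.

```python
import math


def mahalanobis_distances(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with divisor n - 1.
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Entries of the inverse of the 2x2 covariance matrix.
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    dists = []
    for x, y in points:
        dx, dy = x - mx, y - my
        d2 = dx * dx * ixx + 2 * dx * dy * ixy + dy * dy * iyy
        dists.append(math.sqrt(max(d2, 0.0)))
    return dists


def flag_outliers(points, threshold=2.0):
    """Return indices of points whose distance exceeds the (assumed) threshold."""
    dists = mahalanobis_distances(points)
    return [i for i, d in enumerate(dists) if d > threshold]


# Hypothetical scores on two principal components; the last record is unusual.
scores = [
    (0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (0.0, 0.3),
    (-0.1, -0.1), (0.3, 0.0), (-0.2, -0.3), (5.0, 4.0),
]
print(flag_outliers(scores))  # prints [7]
```

Because the outlier itself inflates the estimated covariance, the largest attainable distance with n points is (n - 1) / sqrt(n), about 2.47 for the eight points here. A cutoff has to be chosen with the sample size in mind, and a robust covariance estimate may be preferable on real data.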

Published in International Journal on Data Science and Technology (Volume 4, Issue 1)
DOI 10.11648/j.ijdst.20180401.12
Page(s) 6-14
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2018. Published by Science Publishing Group

Keywords

Data Mining, Outlier Excavation, Machine Learning, Talent Identification

Cite This Article
  • APA Style

    Junlong Zhang, Dan Zhao, Huijie Wang. (2018). The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. International Journal on Data Science and Technology, 4(1), 6-14. https://doi.org/10.11648/j.ijdst.20180401.12


    ACS Style

    Junlong Zhang; Dan Zhao; Huijie Wang. The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. Int. J. Data Sci. Technol. 2018, 4(1), 6-14. doi: 10.11648/j.ijdst.20180401.12


    AMA Style

    Junlong Zhang, Dan Zhao, Huijie Wang. The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining. Int J Data Sci Technol. 2018;4(1):6-14. doi: 10.11648/j.ijdst.20180401.12


  • @article{10.11648/j.ijdst.20180401.12,
      author = {Junlong Zhang and Dan Zhao and Huijie Wang},
      title = {The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining},
      journal = {International Journal on Data Science and Technology},
      volume = {4},
      number = {1},
      pages = {6-14},
      doi = {10.11648/j.ijdst.20180401.12},
      url = {https://doi.org/10.11648/j.ijdst.20180401.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20180401.12},
     year = {2018}
    }
    


  • TY  - JOUR
    T1  - The Outliers and Prediction Analysis of University Talents Introduced Based on Data Mining
    AU  - Junlong Zhang
    AU  - Dan Zhao
    AU  - Huijie Wang
    Y1  - 2018/04/27
    PY  - 2018
    N1  - https://doi.org/10.11648/j.ijdst.20180401.12
    DO  - 10.11648/j.ijdst.20180401.12
    T2  - International Journal on Data Science and Technology
    JF  - International Journal on Data Science and Technology
    JO  - International Journal on Data Science and Technology
    SP  - 6
    EP  - 14
    PB  - Science Publishing Group
    SN  - 2472-2235
    UR  - https://doi.org/10.11648/j.ijdst.20180401.12
    VL  - 4
    IS  - 1
    ER  - 


Author Information
  • Junlong Zhang, Dan Zhao, Huijie Wang: School of Data Sciences, Zhejiang University of Finance and Economics, Hangzhou, China
