Peer-Reviewed

Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search

Received: 13 September 2021     Accepted: 4 October 2021     Published: 15 October 2021
Abstract

At the core of a deep learning network are parameters that are updated through a learning process driven by samples. Whenever a sample is fed into the network, the parameters change according to the gradient. Here, the number of samples used per update and the amount of learning applied per update are crucial; these are the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs considerable time and effort. Many papers have therefore tried to improve the efficiency of this optimization process by automatically tuning a single hyper-parameter. However, a global optimum cannot be guaranteed by simply combining separately optimized hyper-parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to that of complete search, which suggests that it can be a practical alternative to complete search.
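
To make the idea concrete, the sketch below illustrates one plausible form of coordinate-wise greedy search over batch size and learning rate for a small Keras MLP on Fashion-MNIST. The candidate grids, the default learning rate, the search order (batch size first, then learning rate), and the MLP architecture are illustrative assumptions, not necessarily the paper's exact settings.

    # Minimal sketch of coordinate-wise greedy search over batch size and
    # learning rate for an MLP on Fashion-MNIST. Candidate grids, defaults,
    # and search order are assumptions made for illustration.
    import tensorflow as tf

    (x_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()
    x_train = x_train / 255.0  # scale pixels to [0, 1]

    def evaluate(batch_size, learning_rate, epochs=5):
        """Train a small MLP and return its best validation accuracy."""
        model = tf.keras.Sequential([
            tf.keras.layers.Flatten(input_shape=(28, 28)),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(10, activation='softmax'),
        ])
        model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=learning_rate),
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
        history = model.fit(x_train, y_train, batch_size=batch_size,
                            epochs=epochs, validation_split=0.1, verbose=0)
        return max(history.history['val_accuracy'])

    batch_candidates = [32, 64, 128, 256, 512]     # assumed candidate grid
    lr_candidates = [0.001, 0.01, 0.05, 0.1, 0.5]  # assumed candidate grid
    default_lr = 0.01                              # assumed default

    # Greedy step 1: search batch size with the learning rate held at a default.
    best_batch = max(batch_candidates, key=lambda b: evaluate(b, default_lr))

    # Greedy step 2: search learning rate with the chosen batch size fixed.
    best_lr = max(lr_candidates, key=lambda lr: evaluate(best_batch, lr))

    print('greedy choice:', best_batch, best_lr)
    # Complete (grid) search would evaluate every (batch size, learning rate)
    # pair; the greedy pass trains only on the sum of the two grid sizes.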

Published in American Journal of Computer Science and Technology (Volume 4, Issue 4)
DOI 10.11648/j.ajcst.20210404.11
Page(s) 90-96
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Hyper-parameters, Batch Size, Learning Rate, Greedy Search

Cite This Article
  • APA Style

    Mingyu Bae. (2021). Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. American Journal of Computer Science and Technology, 4(4), 90-96. https://doi.org/10.11648/j.ajcst.20210404.11


    ACS Style

    Mingyu Bae. Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. Am. J. Comput. Sci. Technol. 2021, 4(4), 90-96. doi: 10.11648/j.ajcst.20210404.11


    AMA Style

    Mingyu Bae. Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. Am J Comput Sci Technol. 2021;4(4):90-96. doi: 10.11648/j.ajcst.20210404.11


  • @article{10.11648/j.ajcst.20210404.11,
      author = {Mingyu Bae},
      title = {Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search},
      journal = {American Journal of Computer Science and Technology},
      volume = {4},
      number = {4},
      pages = {90-96},
      doi = {10.11648/j.ajcst.20210404.11},
      url = {https://doi.org/10.11648/j.ajcst.20210404.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210404.11},
      abstract = {At the core of a deep learning network are parameters that are updated through a learning process driven by samples. Whenever a sample is fed into the network, the parameters change according to the gradient. Here, the number of samples used per update and the amount of learning applied per update are crucial; these are the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs considerable time and effort. Many papers have therefore tried to improve the efficiency of this optimization process by automatically tuning a single hyper-parameter. However, a global optimum cannot be guaranteed by simply combining separately optimized hyper-parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to that of complete search, which suggests that it can be a practical alternative to complete search.},
      year = {2021}
    }
    


  • TY  - JOUR
    T1  - Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search
    AU  - Mingyu Bae
    Y1  - 2021/10/15
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajcst.20210404.11
    DO  - 10.11648/j.ajcst.20210404.11
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 90
    EP  - 96
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20210404.11
    AB  - At the core of a deep learning network are parameters that are updated through a learning process driven by samples. Whenever a sample is fed into the network, the parameters change according to the gradient. Here, the number of samples used per update and the amount of learning applied per update are crucial; these are the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs considerable time and effort. Many papers have therefore tried to improve the efficiency of this optimization process by automatically tuning a single hyper-parameter. However, a global optimum cannot be guaranteed by simply combining separately optimized hyper-parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to that of complete search, which suggests that it can be a practical alternative to complete search.
    VL  - 4
    IS  - 4
    ER  - 


Author Information
  • North London Collegiate School Jeju, Jeju, Korea
