Peer-Reviewed

Study and Analysis of Topic Modelling Methods and Tools – A Survey

Received: 30 January 2017     Accepted: 18 February 2017     Published: 9 March 2017
Abstract

Topic models are now widely used to identify topics in text corpora. Topic modelling is a mechanism for extracting the common topics that occur across a collection of documents. Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms help in developing new paradigms for searching, browsing, and summarizing large archives of text. This paper presents a survey of important topic modelling techniques and tools, with an emphasis on probabilistic topic models. The primary aim of this paper is to help researchers who do not have a strong background in mathematics or statistics feel comfortable using topic modelling methods and tools in their research. In addition, the merits and demerits of the topic modelling methods are summarized.

Published in American Journal of Mathematical and Computer Modelling (Volume 2, Issue 3)
DOI 10.11648/j.ajmcm.20170203.12
Page(s) 84-87
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2017. Published by Science Publishing Group

Keywords

Topic Models, Topic Modelling Methods, LSA, PLSA, LDA, CTM, Tools

Cite This Article
  • APA Style

    Himanshu Sharma, Arvind K. Sharma. (2017). Study and Analysis of Topic Modelling Methods and Tools – A Survey. American Journal of Mathematical and Computer Modelling, 2(3), 84-87. https://doi.org/10.11648/j.ajmcm.20170203.12

    ACS Style

    Himanshu Sharma; Arvind K. Sharma. Study and Analysis of Topic Modelling Methods and Tools – A Survey. Am. J. Math. Comput. Model. 2017, 2(3), 84-87. doi: 10.11648/j.ajmcm.20170203.12

    AMA Style

    Himanshu Sharma, Arvind K. Sharma. Study and Analysis of Topic Modelling Methods and Tools – A Survey. Am J Math Comput Model. 2017;2(3):84-87. doi: 10.11648/j.ajmcm.20170203.12

  • @article{10.11648/j.ajmcm.20170203.12,
      author = {Himanshu Sharma and Arvind K. Sharma},
      title = {Study and Analysis of Topic Modelling Methods and Tools – A Survey},
      journal = {American Journal of Mathematical and Computer Modelling},
      volume = {2},
      number = {3},
      pages = {84-87},
      doi = {10.11648/j.ajmcm.20170203.12},
      url = {https://doi.org/10.11648/j.ajmcm.20170203.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20170203.12},
      year = {2017}
    }
    

  • TY  - JOUR
    T1  - Study and Analysis of Topic Modelling Methods and Tools – A Survey
    AU  - Himanshu Sharma
    AU  - Arvind K. Sharma
    Y1  - 2017/03/09
    PY  - 2017
    N1  - https://doi.org/10.11648/j.ajmcm.20170203.12
    DO  - 10.11648/j.ajmcm.20170203.12
    T2  - American Journal of Mathematical and Computer Modelling
    JF  - American Journal of Mathematical and Computer Modelling
    JO  - American Journal of Mathematical and Computer Modelling
    SP  - 84
    EP  - 87
    PB  - Science Publishing Group
    SN  - 2578-8280
    UR  - https://doi.org/10.11648/j.ajmcm.20170203.12
    VL  - 2
    IS  - 3
    ER  - 

Author Information
  • School of CSE, Jaipur National University, Jaipur, India

  • Dept of CSI, University of Kota, Kota, Rajasthan, India
