Nowadays, topic models are widely used to identify topics in text corpora. Topic modelling is a method for extracting the common topics that occur across a collection of documents. Topic models are a suite of algorithms that uncover the hidden thematic structure in document collections. These algorithms can help develop new paradigms for searching, browsing and summarizing large text archives. This paper surveys important topic modelling techniques and tools, with a focus on probabilistic topic models. The primary aim of this paper is to help researchers who do not have a strong background in mathematics or statistics feel comfortable using topic modelling methods and tools in their research. In addition, the merits and demerits of topic modelling methods are summarized.
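To make "uncovering hidden thematic structure" concrete, here is a minimal sketch of Latent Dirichlet Allocation (LDA, one of the methods named in the keywords) fitted with collapsed Gibbs sampling in plain Python. It is an illustration under stated assumptions, not the method of this paper or of any particular tool: the function name `lda_gibbs`, the toy four-document corpus, and the settings `alpha=0.1`, `beta=0.01`, two topics and 200 sampling sweeps are all hypothetical choices. Each token's topic is resampled in proportion to how often its document uses that topic and how often that topic emits the word; the final counts give per-document topic proportions and per-topic word distributions.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical helper: fit LDA to tokenized docs by collapsed Gibbs sampling.

    Returns (vocab, doc_topic, topic_word), where doc_topic[d][k] estimates the
    share of topic k in document d and topic_word[k][v] estimates p(vocab[v] | k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment z[d][i] for every token.
    z = []
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1

    topics = range(K)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Take the token out of the counts before resampling it.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Unnormalized full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Smoothed point estimates from the final sample.
    doc_topic = [
        [(n_dk[d][t] + alpha) / (len(doc) + K * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        [(n_kw[t][v] + beta) / (n_k[t] + V * beta) for v in range(V)]
        for t in topics
    ]
    return vocab, doc_topic, topic_word


# Toy corpus (hypothetical): two genetics documents and two sports documents.
docs = [
    "gene dna genome sequence gene".split(),
    "dna gene protein genome".split(),
    "ball game team score game".split(),
    "team player score ball".split(),
]
vocab, theta, phi = lda_gibbs(docs, num_topics=2)
for t, dist in enumerate(phi):
    top = sorted(range(len(vocab)), key=lambda v: -dist[v])[:3]
    print(f"topic {t}:", [vocab[v] for v in top])
```

On a corpus this small the two topics tend to separate the genetics words from the sports words, but the split can vary with the seed and the number of sweeps; for real collections, dedicated topic modelling tools such as MALLET or Gensim are the practical choice.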
Published in | American Journal of Mathematical and Computer Modelling (Volume 2, Issue 3)
DOI | 10.11648/j.ajmcm.20170203.12
Page(s) | 84-87
Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright | Copyright © The Author(s), 2017. Published by Science Publishing Group
Keywords | Topic Models, Topic Modelling Methods, LSA, PLSA, LDA, CTM, Tools
APA Style
Himanshu Sharma, Arvind K. Sharma. (2017). Study and Analysis of Topic Modelling Methods and Tools – A Survey. American Journal of Mathematical and Computer Modelling, 2(3), 84-87. https://doi.org/10.11648/j.ajmcm.20170203.12
@article{10.11648/j.ajmcm.20170203.12, author = {Himanshu Sharma and Arvind K. Sharma}, title = {Study and Analysis of Topic Modelling Methods and Tools – A Survey}, journal = {American Journal of Mathematical and Computer Modelling}, volume = {2}, number = {3}, pages = {84-87}, doi = {10.11648/j.ajmcm.20170203.12}, url = {https://doi.org/10.11648/j.ajmcm.20170203.12}, eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajmcm.20170203.12}, abstract = {Now days, topic models have been widely used to identify topics in text corpora. Topic modelling is a mechanism of extracting common topics which occurs among the collection of documents. Topic models are actually a suite of algorithms which uncover the hidden thematic structure in document collections. These algorithms shall definitely be help to develop new paradigms to search, browse and summarize large archive of texts. This paper presents a survey of various important topic modelling techniques and tools which highlights the probabilistic topic models. The primary aim of this paper is to help researchers who do not have a strong background in mathematics or statistics to feel comfortable with using topic modelling methods and tools in their research work. Apart from it, the merits and demerits of topic modelling methods are also summarized.}, year = {2017} }
TY  - JOUR
T1  - Study and Analysis of Topic Modelling Methods and Tools – A Survey
AU  - Himanshu Sharma
AU  - Arvind K. Sharma
Y1  - 2017/03/09
PY  - 2017
N1  - https://doi.org/10.11648/j.ajmcm.20170203.12
DO  - 10.11648/j.ajmcm.20170203.12
T2  - American Journal of Mathematical and Computer Modelling
JF  - American Journal of Mathematical and Computer Modelling
JO  - American Journal of Mathematical and Computer Modelling
SP  - 84
EP  - 87
PB  - Science Publishing Group
SN  - 2578-8280
UR  - https://doi.org/10.11648/j.ajmcm.20170203.12
AB  - Now days, topic models have been widely used to identify topics in text corpora. Topic modelling is a mechanism of extracting common topics which occurs among the collection of documents. Topic models are actually a suite of algorithms which uncover the hidden thematic structure in document collections. These algorithms shall definitely be help to develop new paradigms to search, browse and summarize large archive of texts. This paper presents a survey of various important topic modelling techniques and tools which highlights the probabilistic topic models. The primary aim of this paper is to help researchers who do not have a strong background in mathematics or statistics to feel comfortable with using topic modelling methods and tools in their research work. Apart from it, the merits and demerits of topic modelling methods are also summarized.
VL  - 2
IS  - 3
ER  - 