Peer-Reviewed

Data Mining and Revealing Hidden Sentiment in Tweets Using Spark

Received: 10 December 2021     Accepted: 22 January 2022     Published: 29 March 2022
Abstract

Data science matters in everyday life because it spans many fields: it applies scientific methods, processes, algorithms, and systems to extract knowledge and insight from data, whether that data is structured or not. Data science is often called the oil of the 21st century to highlight this importance and scientific value. This paper carries out three main steps of data analysis: collecting data from different Internet sources and storing it within the system; cleaning the data to obtain structured data; and applying the algorithms that classify the data. We collected more than 1,600,000 tweets about the educational process and the prospects for future online education. The work focuses on Sentiment Analysis (SA) using Spark, a modern technology that has been highly successful since its emergence. Spark handled the data quickly and accurately and gave good results, which encouraged us to test it with two Machine Learning algorithms: Support Vector Machine (SVM) and Maximum Entropy (MaxEnt).
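The cleaning step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact cleaning rules, so the patterns below (removing URLs and @mentions, keeping hashtag words, dropping everything that is not a lowercase letter) and the names `clean_tweet` and `tokenize` are assumptions chosen for illustration, not the author's method.

```python
import re

# Hypothetical cleaning rules; the paper does not specify its own.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Normalize one raw tweet into lowercase, space-separated words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # drop links
    text = MENTION_RE.sub(" ", text)     # drop @user mentions
    text = text.replace("#", " ")        # keep the hashtag word, drop the symbol
    text = NON_LETTER_RE.sub(" ", text)  # drop digits, punctuation, emoji
    return SPACE_RE.sub(" ", text).strip()


def tokenize(text):
    """Split a cleaned tweet into word tokens for a downstream classifier."""
    return clean_tweet(text).split()


if __name__ == "__main__":
    raw = "Loving #OnlineLearning with @prof_x!! https://t.co/abc123 :)"
    print(clean_tweet(raw))
    print(tokenize(raw))
```

Because `clean_tweet` is a pure function of a single string, it could be applied record by record in a distributed map over the collected tweets before feature extraction and classification.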

Published in International Journal on Data Science and Technology (Volume 8, Issue 1)
DOI 10.11648/j.ijdst.20220801.13
Page(s) 14-21
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2022. Published by Science Publishing Group

Keywords

Sentiment Analysis, Maximum Entropy, SVM, Machine Learning (ML), Spark

Cite This Article
  • APA Style

    Ameen Abdullah Qaid Aqlan. (2022). Data Mining and Revealing Hidden Sentiment in Tweets Using Spark. International Journal on Data Science and Technology, 8(1), 14-21. https://doi.org/10.11648/j.ijdst.20220801.13

    ACS Style

    Ameen Abdullah Qaid Aqlan. Data Mining and Revealing Hidden Sentiment in Tweets Using Spark. Int. J. Data Sci. Technol. 2022, 8(1), 14-21. doi: 10.11648/j.ijdst.20220801.13

    AMA Style

    Ameen Abdullah Qaid Aqlan. Data Mining and Revealing Hidden Sentiment in Tweets Using Spark. Int J Data Sci Technol. 2022;8(1):14-21. doi: 10.11648/j.ijdst.20220801.13

  • BibTeX

    @article{10.11648/j.ijdst.20220801.13,
      author = {Ameen Abdullah Qaid Aqlan},
      title = {Data Mining and Revealing Hidden Sentiment in Tweets Using Spark},
      journal = {International Journal on Data Science and Technology},
      volume = {8},
      number = {1},
      pages = {14-21},
      doi = {10.11648/j.ijdst.20220801.13},
      url = {https://doi.org/10.11648/j.ijdst.20220801.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20220801.13},
     year = {2022}
    }
    

  • RIS

    TY  - JOUR
    T1  - Data Mining and Revealing Hidden Sentiment in Tweets Using Spark
    AU  - Ameen Abdullah Qaid Aqlan
    Y1  - 2022/03/29
    PY  - 2022
    N1  - https://doi.org/10.11648/j.ijdst.20220801.13
    DO  - 10.11648/j.ijdst.20220801.13
    T2  - International Journal on Data Science and Technology
    JF  - International Journal on Data Science and Technology
    JO  - International Journal on Data Science and Technology
    SP  - 14
    EP  - 21
    PB  - Science Publishing Group
    SN  - 2472-2235
    UR  - https://doi.org/10.11648/j.ijdst.20220801.13
    VL  - 8
    IS  - 1
    ER  - 

Author Information
  • Ameen Abdullah Qaid Aqlan, Department of Computer Science, Kakatiya University, Warangal, India
