Peer-Reviewed

Social Media Data Extraction Method Benchmarking Comparison

Received: 7 April 2019     Accepted: 13 August 2019     Published: 28 August 2019
Abstract

Social media use continues to grow. A large amount of information spreads through Twitter, one of the most popular platforms, especially since U.S. President Trump has used it as his main free outlet for official news. Social media platforms like Twitter have therefore become important sources of information, which can be further analyzed with text analytics models to support decision-making. In this paper, we first review several text analytics methods and then compare four tweet retrieval methods and software packages: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. Seven feature-related criteria are applied to compare the methods on ease of use, extraction timing, and capability to accommodate big data. Our results may be approximate, because we might not have been able to observe all the capabilities and features of the software. With that caveat, they indicate that Python plus Tweepy is the most suitable method for big data projects (millions of tweets or more) and for real-time text data extraction. Next Analytics retrieves historical text messages more conveniently, through Excel, and can trace further back in time, which could considerably improve social media analysis capabilities.

Published in International Journal on Data Science and Technology (Volume 5, Issue 2)
DOI 10.11648/j.ijdst.20190502.12
Page(s) 40-44
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2019. Published by Science Publishing Group

Keywords

Natural Language Processing, Text Analytics, Twitter Analysis, Social Media, Software Analysis, Big Data Analysis

Cite This Article
  • APA Style

    Zhenhuan Sui. (2019). Social Media Data Extraction Method Benchmarking Comparison. International Journal on Data Science and Technology, 5(2), 40-44. https://doi.org/10.11648/j.ijdst.20190502.12

    ACS Style

    Zhenhuan Sui. Social Media Data Extraction Method Benchmarking Comparison. Int. J. Data Sci. Technol. 2019, 5(2), 40-44. doi: 10.11648/j.ijdst.20190502.12

    AMA Style

    Zhenhuan Sui. Social Media Data Extraction Method Benchmarking Comparison. Int J Data Sci Technol. 2019;5(2):40-44. doi: 10.11648/j.ijdst.20190502.12

  • @article{10.11648/j.ijdst.20190502.12,
      author = {Zhenhuan Sui},
      title = {Social Media Data Extraction Method Benchmarking Comparison},
      journal = {International Journal on Data Science and Technology},
      volume = {5},
      number = {2},
      pages = {40-44},
      doi = {10.11648/j.ijdst.20190502.12},
      url = {https://doi.org/10.11648/j.ijdst.20190502.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20190502.12},
     year = {2019}
    }
    

  • TY  - JOUR
    T1  - Social Media Data Extraction Method Benchmarking Comparison
    AU  - Zhenhuan Sui
    Y1  - 2019/08/28
    PY  - 2019
    N1  - https://doi.org/10.11648/j.ijdst.20190502.12
    DO  - 10.11648/j.ijdst.20190502.12
    T2  - International Journal on Data Science and Technology
    JF  - International Journal on Data Science and Technology
    JO  - International Journal on Data Science and Technology
    SP  - 40
    EP  - 44
    PB  - Science Publishing Group
    SN  - 2472-2235
    UR  - https://doi.org/10.11648/j.ijdst.20190502.12
    VL  - 5
    IS  - 2
    ER  - 

Author Information
  • Department of Integrated Systems Engineering, The Ohio State University, Columbus, USA
