Social media use continues to grow. A large volume of information spreads through Twitter, one of the most popular platforms, particularly since U.S. President Trump has used it as his main free outlet for official news. Social media platforms such as Twitter have therefore become important sources of information, which can be further analyzed with text analytics models to support decision-making. In this paper, we first review several text analytics methods and then investigate four tweet retrieval methods and software tools: Twitter Analytics, Application for Twitter, Python plus Tweepy, and Next Analytics. Seven feature-related criteria are applied to compare the methods in terms of ease of use, extraction timing, and capability to accommodate big data. Our results may be approximate, because we might not have observed all the capabilities and features of the software. With that caveat, they show that Python plus Tweepy is the most suitable method for big data projects (millions of tweets or more) and for real-time text data extraction. Next Analytics retrieves historical text messages more conveniently, through Excel, and can trace further back in time, which could give much better capabilities in social media analysis.
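As a rough illustration of the kind of text analytics step the abstract refers to, the sketch below cleans a few tweets and counts term frequencies using only the Python standard library. It is not the paper's method: the sample tweets, the stopword list, and the function names `tokenize` and `term_counts` are all hypothetical, and in practice the input would come from one of the extraction tools compared here.

```python
import re
from collections import Counter

# Hypothetical sample tweets (assumption); real input would come from an
# extraction tool such as those compared in the paper.
SAMPLE_TWEETS = [
    "Big data is changing decision making! https://example.com/a #BigData",
    "@analyst Real-time data extraction with Python is fast #bigdata",
    "RT @someone: Text analytics turns tweets into decisions",
]

URL_RE = re.compile(r"https?://\S+")   # links carry no topical words
MENTION_RE = re.compile(r"@\w+")       # user handles
TOKEN_RE = re.compile(r"[a-z0-9]+")    # runs of lowercase letters and digits
# Tiny illustrative stopword list (assumption), not a standard one.
STOPWORDS = {"is", "with", "into", "rt", "the", "a"}


def tokenize(tweet):
    """Lowercase a tweet, drop URLs and mentions, and return non-stopword tokens."""
    text = URL_RE.sub(" ", tweet.lower())
    text = MENTION_RE.sub(" ", text)
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOPWORDS]


def term_counts(tweets):
    """Count how often each token occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


if __name__ == "__main__":
    for term, n in term_counts(SAMPLE_TWEETS).most_common(5):
        print(term, n)
```

Term counts like these are only a starting point; steps such as stemming or topic modeling would build on the same cleaned tokens.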
Published in | International Journal on Data Science and Technology (Volume 5, Issue 2) |
DOI | 10.11648/j.ijdst.20190502.12 |
Page(s) | 40-44 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2019. Published by Science Publishing Group |
Keywords | Natural Language Processing, Text Analytics, Twitter Analysis, Social Media, Software Analysis, Big Data Analysis |
APA Style
Zhenhuan Sui. (2019). Social Media Data Extraction Method Benchmarking Comparison. International Journal on Data Science and Technology, 5(2), 40-44. https://doi.org/10.11648/j.ijdst.20190502.12
BibTeX
@article{10.11648/j.ijdst.20190502.12,
  author = {Zhenhuan Sui},
  title = {Social Media Data Extraction Method Benchmarking Comparison},
  journal = {International Journal on Data Science and Technology},
  volume = {5},
  number = {2},
  pages = {40-44},
  doi = {10.11648/j.ijdst.20190502.12},
  url = {https://doi.org/10.11648/j.ijdst.20190502.12},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdst.20190502.12},
  year = {2019}
}
RIS
TY - JOUR
T1 - Social Media Data Extraction Method Benchmarking Comparison
AU - Zhenhuan Sui
Y1 - 2019/08/28
PY - 2019
N1 - https://doi.org/10.11648/j.ijdst.20190502.12
DO - 10.11648/j.ijdst.20190502.12
T2 - International Journal on Data Science and Technology
JF - International Journal on Data Science and Technology
JO - International Journal on Data Science and Technology
SP - 40
EP - 44
PB - Science Publishing Group
SN - 2472-2235
UR - https://doi.org/10.11648/j.ijdst.20190502.12
VL - 5
IS - 2
ER -