Aspect Based Sentiment Analysis: Feature Extraction using Latent Dirichlet Allocation (LDA) and Term Frequency - Inverse Document Frequency (TF-IDF) in Machine Learning (ML)
DOI:
https://doi.org/10.53840/myjict8-2-102
Keywords:
Aspect-Based Sentiment Analysis, Opinion Mining, Feature Extraction, Topic Modeling, LDA, Count Vectorizer, TF-IDF, SVM, NB
Abstract
The growth of social networks, blogs, forums, and e-commerce websites has produced a tremendous and rapidly increasing volume of data, notably textual data. Twitter is one of the most popular social media platforms; during the COVID-19 pandemic, people all around the world used social media to share their opinions and concerns about the pandemic that changed their lives. This led to a significant rise in tweets about the coronavirus, including positive, negative, and neutral tweets about the virus's impact. Sentiment analysis of such data faces challenges: sparse data limits understanding, while topic coherence and interpretability need improvement to yield clearer insights. The primary goal of this paper is to improve the accuracy and effectiveness of sentiment analysis during the COVID-19 pandemic through the application of advanced techniques and classifiers. We experiment with Support Vector Machine (SVM) and Naive Bayes (NB) classifiers on Twitter data to build high-accuracy machine learning models. Using Latent Dirichlet Allocation (LDA) for feature extraction, we aim to capture comprehensive aspects and topics for sentiment analysis. Additionally, we explore Count Vectorizer and Term Frequency - Inverse Document Frequency (TF-IDF) as word embedding techniques. The main objectives are to extract topics, understand public concerns about COVID-19, and compare classifier performance in Aspect-Based Sentiment Analysis on COVID-19 tweets. This paper applies sentiment analysis techniques such as LDA, Count Vectorizer, and SVM to enable more nuanced sentiment analysis during the COVID-19 pandemic, with SVM classification reaching a notable 85% accuracy.
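As an illustration of the TF-IDF weighting mentioned in the abstract, the following is a minimal sketch in plain Python. It assumes the textbook form, tf(t, d) = count(t, d) / |d| and idf(t) = ln(N / df(t)); the abstract does not state which variant the authors used, and library implementations often apply smoothed versions. The function name `tf_idf` and the toy tweets are hypothetical and are not taken from the paper's dataset.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumes the unsmoothed textbook formula:
    weight = (count / doc_length) * ln(n_docs / doc_frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy, made-up tweets used only for illustration.
tweets = [
    "vaccine rollout gives hope",
    "lockdown again no hope",
    "vaccine side effects worry",
]
docs = [tweet.split() for tweet in tweets]
weights = tf_idf(docs)
```

In this toy corpus, "rollout" appears in only one tweet while "vaccine" appears in two, so "rollout" receives the higher weight within the first tweet. Rarer terms are treated as more distinctive, which is the property that makes TF-IDF features useful as input to classifiers such as SVM or NB.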
Copyright (c) 2023 Malaysian Journal of Information and Communication Technology (MyJICT)

This work is licensed under a Creative Commons Attribution 4.0 International License.

