Using Natural Language Processing to Detect Offensive Text and Cyberbullying in Social Media: A Review

Abdulkarim Faraj Alqahtani; Mohammad Ilyas

doi:10.53840/myjict7-2-163

Authors

Abdulkarim Faraj Alqahtani, Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA
Mohammad Ilyas, Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

Keywords:

Detection of offensive texts, Natural Language Processing, Cyberbullying, Sentiment Analysis, Machine Learning

Abstract

Social media use has grown in recent years, increasing the level of interaction among people. These interactions have negative side effects: they give some users the opportunity to harm others through bullying. This behavior, which is prevalent on some social media platforms, needs to be identified and mitigated because it hurts victims and may lead them to resent society. In this paper, we discuss different forms of cyberbullying, including the methods and techniques used, its effects, and recent research on detecting and preventing it. We also review solutions proposed in prior research to detect and reduce this phenomenon. In addition, we review Natural Language Processing (NLP) techniques used to detect cyberbullying in text data and present various models that detect offensive text on some social media platforms. We review the machine learning algorithms that achieve the highest accuracy, present visualizations of negative and positive text, and discuss challenges that NLP algorithms face in detecting cyberbullying. For the experimental part, we analyzed over 39,000 tweets from Twitter, using machine learning algorithms to classify and predict instances of cyberbullying related to religion, age, gender, and ethnicity. We applied three machine learning algorithms to this dataset and compared their performance using various metrics. The results of this analysis are used to detect short texts that contain cyberbullying. This paper has two aims: to review 11 previous research papers that propose solutions combining machine learning and NLP to detect and reduce this behavior, and to evaluate three machine learning algorithms on a Twitter dataset.
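
The classification experiment described in the abstract can be illustrated with a minimal sketch. The abstract does not name the three algorithms or the metrics, so the multinomial Naive Bayes classifier and the accuracy metric below are assumptions chosen only for illustration. The example sentences, the label names, and the helper functions (tokenize, train, predict, accuracy) are hypothetical and are not taken from the paper or its dataset.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy examples for illustration only; NOT from the paper's dataset.
TRAIN = [
    ("you are too old to be posting here grandpa", "age"),
    ("kids your age should not be online", "age"),
    ("girls cannot play football go back to the kitchen", "gender"),
    ("typical woman driver", "gender"),
    ("people of your religion are all the same", "religion"),
    ("your faith makes you stupid", "religion"),
    ("go back to your country", "ethnicity"),
    ("people of your race are not welcome", "ethnicity"),
    ("had a great day at the beach", "not_cyberbullying"),
    ("looking forward to the game tonight", "not_cyberbullying"),
]

# Hypothetical held-out examples, also invented for this sketch.
HELD_OUT = [
    ("go back to your own country", "ethnicity"),
    ("what a great day", "not_cyberbullying"),
]


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        class_docs[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict(model, text):
    """Return the class with the highest log-posterior under Naive Bayes."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Out-of-vocabulary tokens are skipped (a design choice of this sketch).
    tokens = [t for t in tokenize(text) if t in vocab]
    best_label, best_score = None, -math.inf
    for label in sorted(class_docs):
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)  # log prior
        for t in tokens:
            # Laplace (add-one) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][t] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the given label."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


model = train(TRAIN)
print("held-out accuracy:", accuracy(model, HELD_OUT))
```

Comparing several algorithms, as the abstract describes, would amount to training each one on the same split and reporting the same metrics for each. A dataset of the size mentioned in the abstract would normally call for an established library rather than a hand-written classifier.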
