The Framework of Data Preparation for Mental Health Detection on Twitter
DOI: https://doi.org/10.53840/myjict7-2-162
Keywords: Data Preparation, Mental Health Detection, Twitter
Abstract
This study proposes a framework for preparing Twitter data for mental health detection, a task that requires a specific sequence of steps. Twitter, an Online Social Network (OSN), provides massive data with rich content. However, this data comes in various types: text, audio, image, and video. Moreover, tweets carry hashtags, URLs, retweets, and mentions alongside their meaningful content. Tweets also contain noisy and inconsistent data, anomalies, incomplete data, short-form words, and meaningless data. A data preparation framework for mental health detection is proposed to address these problems. The framework comprises Data Collection and Extraction, Expert Manual Annotation, Text Cleaning, and Text Representation. Each step of the framework was carried out experimentally using the Python language. A total of 19,744 English-language tweets related to mental health problems were collected over one week. Because the study concerns mental health problems, the manual annotation requires an expert, who annotates the clean text data to detect mental health problems. Text representation was performed using N-Grams, TF-IDF, Bag of Words, and Lemma. These methods represent the data for modelling with machine learning techniques. The framework could also serve as a data preparation process for other detection problems. Data preparation is essential for efficient data modelling.
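The Text Cleaning and Text Representation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`clean_tweet`, `ngrams`, `bag_of_words`, `tfidf`), the regular expressions, and the TF-IDF weighting (term frequency times the natural log of inverse document frequency) are all assumptions chosen for illustration, and it uses only the standard library. Lemmatization is omitted because it needs an external resource such as a lemmatizer from an NLP library.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for Twitter-specific noise (assumptions, not from the paper).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
RETWEET_RE = re.compile(r"^RT\s+", re.IGNORECASE)
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text):
    """Remove the retweet marker, URLs, and mentions; keep hashtag words."""
    text = RETWEET_RE.sub("", text.strip())
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # drop the symbol, keep the word
    text = NON_ALPHA_RE.sub(" ", text.lower())
    return " ".join(text.split())  # collapse repeated whitespace


def ngrams(tokens, n):
    """Return the word n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(tokens):
    """Count how often each token occurs in one document."""
    return Counter(tokens)


def tfidf(docs):
    """Weight each term by relative frequency times log(N / document frequency).

    docs is a list of token lists; the result is one dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    raw = "RT @user: Feeling so #anxious today!! https://t.co/abc123"
    tokens = clean_tweet(raw).split()
    print(tokens)
    print(ngrams(tokens, 2))
    print(bag_of_words(tokens))
```

In this sketch a term that appears in every document receives a TF-IDF weight of zero, which is one common convention; library implementations often add smoothing, so their values will differ.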
License
Copyright (c) 2022 Malaysian Journal of Information and Communication Technology (MyJICT)

This work is licensed under a Creative Commons Attribution 4.0 International License.

