Phishing Websites Detection using Machine Learning Approaches

Raja Azlina Raja Mahmood; Tan Jun Ren

doi:10.53840/myjict7-2-158

Authors

Raja Azlina Raja Mahmood, Department of Communication Technology and Network, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia
Tan Jun Ren, Department of Communication Technology and Network, Faculty of Computer Science and Information Technology, Universiti Putra Malaysia

Keywords:

Phishing website detection, machine learning, feature selection

Abstract

Phishing is a form of fraud that attempts to obtain sensitive information via email, websites, phone calls, or other forms of communication. The number of phishing attacks has increased significantly in recent years as more services, such as online banking, are offered online. Attackers design a phishing website that closely resembles a genuine website in order to steal victims' account credentials, which can lead to identity theft and financial loss. This study aims to detect phishing websites using supervised machine learning algorithms. Six classifiers were implemented: Random Forest, K-Nearest Neighbors, Support Vector Machine, Decision Tree, Logistic Regression and Multilayer Perceptron. The performance of the classifiers was studied with 30 baseline features and with different subsets of important features. A wrapper-based feature selection method was used to reduce the feature set to subsets of 15, 4 and 2 features. The results show that the Random Forest classifier using all 30 features is the most accurate model for detecting phishing websites, with an accuracy of 97.41%, a precision of 97.14%, a recall of 98.25% and an F1-score of 97.69%.
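The four scores reported in the abstract are standard confusion-matrix metrics. The following is a minimal sketch of how they are conventionally computed, assuming phishing is treated as the positive class. The function names and the example counts are hypothetical and are not taken from the paper. Only the precision and recall values passed to the final check come from the abstract.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) from confusion-matrix counts.

    tp: phishing sites correctly flagged      fp: genuine sites wrongly flagged
    fn: phishing sites missed                 tn: genuine sites correctly passed
    """
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only (not the paper's confusion matrix).
acc, prec, rec, f1 = classification_metrics(tp=90, fp=10, fn=5, tn=95)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} f1={f1:.3f}")

# Consistency check using the precision and recall reported in the abstract.
print(f"F1 from reported precision and recall: {f1_from(0.9714, 0.9825):.4f}")
```

The last line evaluates to 0.9769, which agrees with the reported F1-score of 97.69%.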
