Handling Ambiguity in Jawi-Rumi Machine Transliteration Using Multinomial Naive Bayes (NBM) Classification

Che Wan Shamsul Bahri Che Wan Ahmad; Khairuddin Omar; Mohammad Faidzul Nasruddin; Mohd Zamri Murah

doi:10.53840/myjict7-1-8

Authors

Che Wan Shamsul Bahri Che Wan Ahmad, Fakulti Sains dan Teknologi Maklumat, Kolej Universiti Islam Antarabangsa Selangor (KUIS), Bandar Seri Putra, Bangi, Selangor, Malaysia
Khairuddin Omar, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM)
Mohammad Faidzul Nasruddin, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM)
Mohd Zamri Murah, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM)


Keywords:

homograph, natural language processing (NLP), Jawi, machine transliteration

Abstract

This paper discusses the problem of ambiguity in Jawi-Rumi machine transliteration of Jawi homograph words. Machine transliteration (MT) is the process of automatically converting text from a source script to a target script. In Malay Jawi-Rumi MT, it is difficult to obtain high-accuracy transliteration of Jawi homographs. Homographs are words that share the same spelling but have different meanings and pronunciations. The old Jawi spelling contained many homographs; their number was reduced after Dewan Bahasa dan Pustaka (DBP) introduced the “Pedoman Ejaan Jawi yang Disempurnakan” (PEJYD) in 1986. The main issue in Malay Jawi-Rumi machine transliteration is word-level inaccuracy when a Jawi word is transliterated to Rumi. For example, the word “بيرو” can be transliterated as ‘biru’ (blue) or ‘biro’ (bureau), and the word “بيليق” as ‘bilik’ (room) or ‘belek’ (turn around). This paper proposes using the Multinomial Naive Bayes (NBM) classification method for homograph disambiguation in Jawi-Rumi MT. Test results show that the accuracy of this method can reach up to 67 percent.
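To illustrate the general idea of using a multinomial Naive Bayes classifier to choose between the Rumi readings of a Jawi homograph, the following is a minimal sketch and not the authors' implementation. It assumes that the words surrounding the homograph are available in Rumi form and serve as features, that add-one (Laplace) smoothing is used, and that a handful of invented toy examples stand in for a real training corpus. The function names `train_nbm` and `classify` and the data in `TOY_EXAMPLES` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nbm(examples):
    """Collect class and word counts from (context_words, label) pairs."""
    class_counts = Counter()            # number of training examples per reading
    word_counts = defaultdict(Counter)  # context-word counts per reading
    vocab = set()                       # all context words seen in training
    for words, label in examples:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return {"class_counts": class_counts,
            "word_counts": word_counts,
            "vocab": vocab}


def classify(model, words):
    """Return the reading with the highest log posterior for the context."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior of the reading
        for w in words:
            if w in model["vocab"]:  # words never seen in training are skipped
                # Add-one smoothed multinomial likelihood of the context word.
                score += math.log((counts[w] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy contexts (assumption) for the homograph that reads
# either 'biru' (blue) or 'biro' (bureau).
TOY_EXAMPLES = [
    (["langit", "warna", "laut"], "biru"),
    (["baju", "warna", "cantik"], "biru"),
    (["pejabat", "kerajaan", "pegawai"], "biro"),
    (["pejabat", "aduan", "awam"], "biro"),
]

model = train_nbm(TOY_EXAMPLES)
print(classify(model, ["warna", "langit"]))     # context about colour and sky
print(classify(model, ["pegawai", "pejabat"]))  # context about officers and offices
```

In this sketch each candidate reading is treated as a class, so one such classifier would be trained per homograph. The choice of context window and features here is only an assumption for illustration.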

