Rule-Based Old Jawi Word Stemmer for Malay
DOI: https://doi.org/10.53840/myjict6-1-48
Keywords: word stemmer, Jawi, information retrieval, transliteration
Abstract
A word stemmer removes the affixes of a word to produce its base (root) word. Stemming is widely used in natural language processing (NLP) tasks such as machine transliteration, machine translation and document retrieval. Stemming reduces dictionary size, because morphological variants of the same word do not need to be entered repeatedly; instead, they are placed in the same group. Malay can be written in two scripts: the Rumi (Latin) spelling system and the Jawi spelling system. Most research on Malay word stemmers has focused on the Rumi spelling system rather than the Jawi spelling system. This paper proposes a Malay word stemmer for Jawi script that uses a set of Old Jawi rules (rules that constrain the various forms of words derived in Old Jawi). The stemmer, called PEJAL, uses 187 rules. It was tested on 2,500 derived Jawi words containing prefixes, circumfixes, suffixes and infixes. The experimental results show that 88.5% of the Jawi words were stemmed correctly.
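The abstract describes PEJAL only at a high level. As a hedged illustration of how a rule-based stemmer of this general kind can work, the Python sketch below tries ordered affix-removal rules (a circumfix, then prefixes, then suffixes) and accepts a candidate stem only if it appears in a root dictionary, then groups words by stem to show how conflation shrinks a dictionary. Everything here is an assumption for illustration: the `Rule`, `stem` and `group_by_stem` names, the five rules, their order and the two-word root list are hypothetical and are not the paper's 187 rules, and the dictionary check is a design choice the abstract does not mention. The example words use common modern Jawi spellings (for example دماکن for dimakan); Old Jawi spelling, which PEJAL targets, may write some affixes differently.

```python
"""Minimal rule-based stemmer sketch for Malay words in Jawi script.

ASSUMPTION: the rules, rule order and root list are hypothetical examples for
illustration only; they are not the 187 PEJAL rules described in the paper.
"""
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rule:
    """Strip a prefix and/or a suffix (an empty string means 'no affix')."""
    name: str
    prefix: str = ""
    suffix: str = ""

    def apply(self, word: str) -> Optional[str]:
        """Return the candidate stem, or None if the rule does not match."""
        if not (word.startswith(self.prefix) and word.endswith(self.suffix)):
            return None
        stem = word[len(self.prefix):len(word) - len(self.suffix)]
        return stem or None  # reject rules that would consume the whole word


# Hypothetical rules, tried in order: circumfix, then prefixes, then suffixes.
RULES = (
    Rule("di-...-kan", prefix="د", suffix="کن"),
    Rule("ber-", prefix="بر"),
    Rule("di-", prefix="د"),
    Rule("-kan", suffix="کن"),
    Rule("-an", suffix="ن"),
)

# Hypothetical root (base word) dictionary: jalan, makan.
ROOTS = {"جالن", "ماکن"}


def stem(word: str, rules=RULES, roots=ROOTS) -> str:
    """Return the root of `word`, or `word` unchanged if no rule yields a known root."""
    if word in roots:
        return word
    for rule in rules:
        candidate = rule.apply(word)
        # Accept a candidate only if it is a known root; otherwise try the next rule.
        if candidate is not None and candidate in roots:
            return candidate
    return word


def group_by_stem(words: List[str]) -> Dict[str, List[str]]:
    """Conflate words sharing a root, so one dictionary entry can serve them all."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for word in words:
        groups[stem(word)].append(word)
    return dict(groups)


if __name__ == "__main__":
    # dijalankan, berjalan, dimakan, makanan, kitab
    for w in ["دجالنکن", "برجالن", "دماکن", "ماکنن", "کتاب"]:
        print(w, "->", stem(w))
```

Note that دماکن (dimakan) also matches the hypothetical di-...-kan circumfix rule, which yields the non-word ما; the root check rejects it, and the di- rule then returns ماکن (makan). A realistic rule set would also need rules for meN- nasalization (for example menulis from tulis), which plain prefix stripping cannot recover.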