Pencantas Perkataan Jawi Lama (Old Jawi Stemmer) dalam Bahasa Melayu berasaskan Petua

A Rule-Based Old Jawi Word Stemmer for Malay

Authors

  • Che Wan Shamsul Bahri C.W.Ahmad, Fakulti Sains & Teknologi Maklumat, Kolej Universiti Islam Antarabangsa (KUIS)
  • Khairuddin Omar, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM), Malaysia
  • Mohammad Faidzul Nasruddin, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM), Malaysia
  • Mohd Zamri Murah, Fakulti Teknologi dan Sains Maklumat, Universiti Kebangsaan Malaysia (UKM), Malaysia

DOI:

https://doi.org/10.53840/myjict6-1-48

Keywords:

word stemmer, Jawi, information retrieval, transliteration

Abstract

 

A word stemmer removes the affixes from a word to produce its root word. Stemming is widely used in natural language processing (NLP) tasks such as machine transliteration, machine translation and document retrieval. Stemming also reduces dictionary size, because morphologically related words need not be entered repeatedly; instead, they are grouped under the same root. Malay is written in two scripts, using either the Rumi (Latin) spelling system or the Jawi spelling system. Most studies of Malay word stemmers have focused on Rumi spelling rather than Jawi. This paper proposes a Malay word stemmer for Jawi script based on a set of old Jawi rules (a set of rules used to constrain the various forms of old Jawi derived words). The stemmer, called PEJAL, uses 187 rules. It was tested on 2500 derived Jawi words containing prefixes, circumfixes, suffixes and infixes. The experimental results show that 88.5% of the Jawi words were stemmed correctly.
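The general idea of rule-based affix removal described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the affix lists, the small root dictionary, and the function names `stem`, `group_by_stem` and `accuracy` are hypothetical, and the examples use Rumi (Latin-script) spellings for readability. These are not the 187 PEJAL rules, which the abstract does not list, and infix handling is omitted.

```python
from collections import defaultdict

# Hypothetical affix rules on Rumi (Latin-script) Malay, for illustration only.
# They are NOT the PEJAL rules for old Jawi.
PREFIXES = ["ber", "ter", "di", "ke", "pe", "se"]
SUFFIXES = ["kan", "an", "i"]

# Tiny hypothetical root dictionary used to validate candidate stems.
ROOTS = {"main", "jalan", "makan", "baik", "ajar"}


def stem(word: str) -> str:
    """Return the first dictionary root reached by stripping affixes, else the word."""
    if word in ROOTS:
        return word  # already a root: avoids over-stemming, e.g. "jalan" -> "jal"

    candidates = []
    # Prefix-only candidates.
    for p in PREFIXES:
        if word.startswith(p):
            candidates.append(word[len(p):])
    # Suffix-only candidates.
    for s in SUFFIXES:
        if word.endswith(s):
            candidates.append(word[:-len(s)])
    # Circumfix candidates (a prefix and a suffix removed together).
    for p in PREFIXES:
        for s in SUFFIXES:
            if (word.startswith(p) and word.endswith(s)
                    and len(word) > len(p) + len(s)):
                candidates.append(word[len(p):-len(s)])

    for candidate in candidates:
        if candidate in ROOTS:
            return candidate
    return word  # no rule produced a known root


def group_by_stem(words):
    """Group derived words under a shared root, as a stem-keyed dictionary would."""
    groups = defaultdict(list)
    for w in words:
        groups[stem(w)].append(w)
    return dict(groups)


def accuracy(pairs):
    """Fraction of (word, expected_root) pairs that are stemmed correctly."""
    correct = sum(stem(word) == root for word, root in pairs)
    return correct / len(pairs)
```

Validating each candidate against a root dictionary is one common design choice for rule-based stemmers, used here only to keep the sketch from over-stemming; whether PEJAL works this way is not stated in the abstract.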

Published

30-06-2021

Section

Articles

How to Cite

C.W.Ahmad, C. W. S. B., Omar, K., Nasruddin, M. F., & Murah, M. Z. (2021). Pencantas Perkataan Jawi Lama (Old Jawi Stemmer) dalam Bahasa Melayu berasaskan Petua: Old Jawi Stemmer (Old Jawi Stemmer) in Malay based Tips. Malaysian Journal of Information and Communication Technology (MyJICT), 6(1), 34-43. https://doi.org/10.53840/myjict6-1-48
