Analyzing Malay Stemmer Performance Towards Fuzzy Logic Ranking Function on Malay Text Corpus

Rodzman, Shaiful Bakhtiar bin; Ronie, Mohamad Fitri Izuan Abdul; Ismail, Normaly Kamal; Rahman, Nurazzah Abd; Ahmad, Faudziah; Nor, Zulhilmi Mohamed

doi:10.1109/infrkm.2018.8464767

Cited by 7 publications

(4 citation statements)

References 4 publications


“…For example, term "makanan" will be stemmed to its root word "makan". Dictionary for Malay root word and morphological rules for Malay language are applied in the stemming process [16]. Lastly, the stemmed word are used as keywords to search from the indexed file before the retrieved documents are ranked and displayed to the user [16,26,27].…”

Section: Methods

mentioning

confidence: 99%


Development of mobile application for Malay translated hadith search engine

Rahman, Syamil, Rodzman. 2020. IJEECS

Self Cite


This paper presents the development of a mobile application for a Malay translated hadith search engine. Current hadith web applications are designed for desktop computers rather than mobile devices, which require a simple and user-friendly interface. Web applications also need an internet connection to use. Due to the increasing use of mobile applications among mobile phone users, many existing web applications have moved to mobile-based applications to cater for the growing number of mobile users. In this study, a mobile application for Android and iOS has been developed using the Flutter framework, a hybrid mobile application framework. A Malay translated hadith search engine mobile application can assist those who are seeking knowledge to learn more about certain topics in hadith, the second source of Islamic knowledge. The application has search and directory features for users to browse the 2028 Sahih Bukhari hadith collection. Users can enter a query using the search feature to find selected hadith in the Malay language. Queries are processed to search for relevant hadith, and the results are displayed to the user. Evaluation using Recall and Precision shows that, on average, Recall is 73% and Precision is 33%. Functionality testing was also conducted against the functional requirements or specifications. Results show that all requirements were successfully tested.
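The stemming step quoted in the citation statement above (reducing "makanan" to "makan" with a root-word dictionary and morphological rules) can be illustrated with a minimal sketch. This is not the stemmer from the cited work: the `ROOT_WORDS` dictionary, the `PREFIXES` and `SUFFIXES` lists, and the `stem` function are hypothetical, and the affix lists cover only a small illustrative subset of Malay morphology.

```python
# Hypothetical mini-dictionary of Malay root words; a real system would load a full lexicon.
ROOT_WORDS = {"makan", "minum", "ajar", "baca"}

# Illustrative subset of Malay affixes (assumption, not the cited rule set).
PREFIXES = ("meng", "mem", "men", "ber", "di", "me")
SUFFIXES = ("kan", "an", "i")


def strip_suffix(word, roots):
    """Return the root if removing one known suffix yields a dictionary word, else None."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and word[: -len(suffix)] in roots:
            return word[: -len(suffix)]
    return None


def stem(word, roots=ROOT_WORDS):
    """Dictionary-checked affix stripping: accept a candidate only if it is a known root."""
    word = word.lower()
    if word in roots:
        return word

    # Rule 1: suffix only, e.g. "makanan" -> "makan".
    candidate = strip_suffix(word, roots)
    if candidate is not None:
        return candidate

    # Rule 2: prefix only, then prefix plus suffix, e.g. "diajarkan" -> "ajar".
    for prefix in PREFIXES:
        if word.startswith(prefix):
            rest = word[len(prefix):]
            if rest in roots:
                return rest
            candidate = strip_suffix(rest, roots)
            if candidate is not None:
                return candidate

    # No rule produced a dictionary word: leave the term unchanged.
    return word


if __name__ == "__main__":
    for term in ("makanan", "membaca", "diajarkan"):
        print(term, "->", stem(term))
```

Checking every candidate against the dictionary keeps the rules from over-stemming words that merely end in an affix-like string; the stemmed terms would then serve as search keywords against the index.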


Section: Methods

mentioning

confidence: 99%

“…Hadith Sahih Al Bukhari Malay from Ar-Rahman Labs features Complete Sahih Bukhari book in Malay but this application only supports keyword search [14]. Mutiara Hadis is the pioneer of the Malay Hadith Information Retrieval [15][16][17][18][19][20]. It is one of the web-based application search engines for hadith translated in Malay language.…”

mentioning

confidence: 99%

Development of mobile application for Malay translated hadith search engine

Rahman, Syamil, Rodzman. 2020. IJEECS

Self Cite


“…The aim is to rid the corpus of white space, missing values, duplicate reviews, stop words, non-ASCII characters, and typos that could negatively affect the result of analytics [2]. Stemming [10] reduces words into their base form, where they are considered as one single feature, for example, "walking," "walked" and "walks" are stemmed to "walk." Language detection (LD) in preprocessing helps to reduce the extracted corpus size by filtering out unrelated text based on the language used [11], [12].…”

Section: Introduction

mentioning

confidence: 99%

Categorization of Malay Social Media Text and Normalization of Spelling Variations and Vowel-less Words

Maskat, Rahman. 2020. Int. J. Adv. Sci. Eng. Inf. Technol.


As more data are introduced, they bring along missing values, inconsistencies, and heterogeneities, the so-called unclean aspects. Text analytics relies on clean data to produce reliable results. Pre-processing is an essential phase in text analytics, specifically language detection and normalization. The problem with conducting text analytics on Malay social media text is how substantially it has diverged from formal Malay in spelling and construction, making it difficult to process. Recent works normalize only cherry-picked types of Malay social media text, described in simple and narrow categorizations. A formal categorization is necessary to describe the different patterns of Malay social media text and to allow the selection of suitable methods for handling them. In this paper, we propose a non-exhaustive formal categorization for Malay social media text based on its inherent nature. We refer to it as Social Media Malay Language (SMML) to differentiate it from the standard Malay language. The categories are spelling variations, Malay-English mix sentences, loan words/phrases, slang-based words, and vowel-less words. We also conducted normalization on two of the SMML categories, spelling variations and vowel-less words, using two similarity matching techniques (i.e., nGram Tversky Index and Levenshtein). Our result shows that similarity-matching techniques can detect both categories, but a more sophisticated technique is necessary to improve the precision score. The normalization of the remaining categories requires extensive further research.
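The Levenshtein-based similarity matching mentioned in this abstract can be sketched as follows. This is only an illustration of the general technique, not the authors' pipeline: the `LEXICON` list, the `normalize` function, and the `max_distance` threshold are hypothetical, and the nGram Tversky Index variant is not shown.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical lexicon of standard Malay words (must be non-empty).
LEXICON = ["makan", "minum", "tidur"]


def normalize(token, lexicon=LEXICON, max_distance=2):
    """Map a token to its closest lexicon word if it is within max_distance edits."""
    best = min(lexicon, key=lambda word: levenshtein(token, word))
    return best if levenshtein(token, best) <= max_distance else token


if __name__ == "__main__":
    # A vowel-less form such as "mkn" is two insertions away from "makan".
    print(normalize("mkn"))
```

A fixed distance threshold like this tends to match unrelated short words as well, which is consistent with the abstract's remark that a more sophisticated technique is needed to improve precision.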


“…Removals [4] typically involve white spaces, missing values, duplicate reviews, stop words, non-ascii characters and typos which could have an adverse influence on the result. Stemming [19] replaces words with their canonical form for example stand in place of standing, stood and stands. Language detection (LD) [6,7,24] is crucial for language-dependent tokenisers and could considerably decrease the size of data extracted.…”

Section: Introduction

mentioning

confidence: 99%

A taxonomy of Malay social media text

Maskat, Munarko. 2019. IJEECS


In this paper, we propose a preliminary taxonomy of Malay social media text. Performing text analytics on Malay social media text is a challenge. The formal Malay language follows specific spelling and sentence construction rules. However, the Malay language used in social media differs in both aspects. This impedes the accuracy of text analytics. Due to the complexity of Malay social media text, many researchers have chosen to focus on classifying the formal Malay language. To the best of our knowledge, we are the first to propose a formal taxonomy for Malay text in social media. Narrow and informal categorisations of Malay social media text can be found amidst efforts to pre-process social media text, yet these cherry-pick only some categories to handle. We have differentiated Malay social media text from the formal Malay language by identifying it as Social Media Malay Language, or SMML. It consists of spelling variations, Malay-English mix sentences, Malay-spelling English words, slang-based words, vowel-less words, number suffixes and manner of expression. This taxonomy is expected to serve as a guideline in research and commercial products.
