Recent studies in the field of text-based personality recognition experiment with different languages, feature extraction techniques, and machine learning algorithms to create better and more accurate models; however, little focus is placed on exploring the language use of a group of individuals defined by nationality. Individuals of the same nationality share certain practices and communicate certain ideas that can become embedded in their natural language. Many nationals also speak more than one language; Filipinos, for example, speak Filipino and English, the two national languages of the Philippines. The addition of several regional/indigenous languages, along with the prevalence of code-switching, allows a Filipino to have a rich vocabulary. This presents an opportunity to create a text-based personality model based on how Filipinos speak, regardless of the language they use. To do so, data was collected from 250 Filipino Twitter users. Different combinations of data processing techniques were tested to create personality models for each of the Big Five traits. The results for both regression and classification show that Conscientiousness is consistently the easiest trait to model, followed by Extraversion. Classification models for Agreeableness and Neuroticism performed poorly, but still better than those for Openness. An analysis of personality trait score representation showed that classifying extreme outliers generally produces better results for all traits except Neuroticism and Openness.
SpellCheF is a spell checker for Filipino that uses a hybrid approach to detect and correct misspelled words in a document. Its approach combines dictionary lookup, n-gram analysis, Soundex, and character distance measurements. It is a plug-in for OpenOffice Writer. Two sets of spelling rules and guidelines, namely the Komisyon sa Wikang Filipino 2001 Revision of the Alphabet and Guidelines in Spelling the Filipino Language (or KWF) and the Gabay sa Editing sa Wikang Filipino (or GABAY) rulebooks, were incorporated into the system. SpellCheF is composed of three modules: the lexicon builder, the detector, and the corrector. These three modules used both manually formulated and learned rules to carry out their tasks. Test results showed that the lexicon builder was able to correctly categorize words based on the spelling rules used. It also generated three databases: (1) the KWF-compliant words database, (2) the GABAY-compliant words database, and (3) the database of words common to both KWF and GABAY. The detector module had an overall error rate of 7% in identifying misspellings. Furthermore, n-gram analysis was observed to perform better than simple dictionary lookup. The corrector module had a 94% accuracy rate in generating word suggestions. Results also showed that using the Soundex code generated more suggestions than using n-gram analysis. However, the first character of the Soundex-based suggestions was always the same as the first character of the misspelled word.
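The last observation follows from how Soundex codes are built: the code keeps the word's first letter verbatim and encodes only the consonants after it. The sketch below illustrates this with the standard American Soundex algorithm in Python. It is not SpellCheF's implementation, which may use a variant adapted to Filipino; the `suggest` helper and the small lexicon are hypothetical and exist only for illustration.

```python
# Standard American Soundex (an assumption; SpellCheF's variant may differ).
CODES = {}
for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                       "4": "l", "5": "mn", "6": "r"}.items():
    for ch in letters:
        CODES[ch] = digit


def soundex(word: str) -> str:
    """Return the 4-character Soundex code of a word."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    code = first.upper()          # the first letter is kept as-is
    prev = CODES.get(first, "")   # its digit still suppresses a duplicate
    for ch in letters[1:]:
        if ch in "hw":
            continue              # h and w do not separate duplicate codes
        digit = CODES.get(ch, "")  # vowels and uncoded letters map to ""
        if digit and digit != prev:
            code += digit
        prev = digit
    return (code + "000")[:4]     # pad with zeros or truncate to length 4


def suggest(misspelled: str, lexicon: list[str]) -> list[str]:
    """Hypothetical helper: lexicon words sharing the misspelling's code."""
    target = soundex(misspelled)
    return [w for w in lexicon if soundex(w) == target]


lexicon = ["gamot", "gamit", "kamot", "kamay"]
print(suggest("gamut", lexicon))  # candidates all start with "g"
```

Because the first letter is part of the code, a misspelling whose first character is wrong cannot retrieve the intended word through Soundex matching alone, which is one reason to combine it with other measures such as n-gram analysis or character distance.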