Identifying the language of an unknown text is not a new problem; what is new is the task of distinguishing closely related languages. Malay and Indonesian, like many other language pairs, are very similar, which makes it difficult to search, retrieve, classify, and above all translate texts written in either of the two languages. We have built a language identifier that determines whether a text is written in Malay or Indonesian, and the approach could be used in any similar situation. It uses the frequency and rank of character trigrams, lists of exclusive words, and the format of numbers. The trigrams are derived from the most frequent words in each language. The current program contains the following language models: Malay/Indonesian (661 trigrams), Dutch (826 trigrams), English (652 trigrams), French (579 trigrams), and German (482 trigrams). The trigrams of an unknown text are looked up in each language model. The language of the input text is the one with the highest values for two ratios: "number of shared trigrams / total number of trigrams" and "number of winner trigrams / number of shared trigrams". If the language found at the trigram search level is 'Malay or Indonesian', the text is then scanned for the format of numbers and for certain exclusive words.
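The trigram search described in this abstract can be illustrated with a minimal Python sketch. Several details are assumptions, since the abstract does not specify them: words are padded with underscores before trigrams are extracted, a "winner trigram" is taken to be a shared trigram whose relative frequency is highest in that language's model, and the two ratios are combined by multiplying them. The function names (`char_trigrams`, `identify`) are hypothetical, and the models here are built from raw sample text rather than from the most frequent words of each language.

```python
from collections import Counter


def char_trigrams(text):
    """Count character trigrams, word by word, with '_' marking word boundaries."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"  # assumption: boundary padding is not stated in the abstract
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def identify(text, models):
    """Return (best_language, scores) for `text`.

    `models` maps a language name to a Counter of trigram frequencies.
    """
    text_trigrams = set(char_trigrams(text))
    if not text_trigrams:
        return None, {}

    # Relative frequency of each trigram inside each language model.
    totals = {lang: sum(model.values()) for lang, model in models.items()}

    def rel_freq(lang, trigram):
        return models[lang][trigram] / totals[lang] if totals[lang] else 0.0

    scores = {}
    for lang, model in models.items():
        shared = [t for t in text_trigrams if t in model]
        if not shared:
            scores[lang] = 0.0
            continue
        # Assumed definition of "winner": this language has the highest
        # relative frequency for the trigram among all models.
        winners = [
            t for t in shared
            if max(models, key=lambda other: rel_freq(other, t)) == lang
        ]
        shared_ratio = len(shared) / len(text_trigrams)
        winner_ratio = len(winners) / len(shared)
        # Assumption: the two ratios are combined by multiplication.
        scores[lang] = shared_ratio * winner_ratio

    best = max(scores, key=scores.get)
    return best, scores
```

With models built from real corpora, a result of 'Malay or Indonesian' would then be passed to a second stage that checks number formats and exclusive words; that stage is not sketched here.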
Problem statement: A topic is a sequence of words that stands for the content of a text. Knowing the topic of a document helps people become aware of its content and facilitates their search process. Approach: This paper proposes an automatic algorithm that identifies the topic of a textual document based on the chunks corresponding to each sentence in the document. Results and conclusion: We achieved an 86% match rate, for both total and partial matching, on our experimental data sample.
This paper presents a method to generate fill-in clues and answers for automatically building a crossword. Answers are capitalised words present in an input sentence, and clues are segments of the dependency syntactic structure of that sentence. The (Clue, ANSWER) pairs are extracted from a collection of raw sentences related to the history of Sarawak. This work is at an early stage, so the proposed method for automatically generating fill-in clues was tested on a small set of sentences; the results obtained are promising. Nearly 53% of the generated fill-in clues are considered correct. The major contribution of this work is the innovative strategy used to read the result of a pre-order depth-first search applied to a dependency graph in order to generate the clues. The clue and answer generator is implemented in Python.
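To make the idea of reading a pre-order depth-first search over a dependency structure more concrete, here is a small Python sketch. It is not the paper's strategy, which the abstract does not describe; it is a hypothetical simplification in which a clue is the subtree rooted at a chosen head, read back in sentence order with the answer replaced by a blank. The dependency tree is written by hand for illustration (no parser is called), and the names `preorder` and `fill_in_clue` are assumptions.

```python
def preorder(tree, node):
    """Yield token indices in pre-order: the head first, then each dependent's subtree."""
    yield node
    for child in tree.get(node, []):
        yield from preorder(tree, child)


def fill_in_clue(tokens, tree, head, answer_index, blank="_____"):
    """Build a (clue, ANSWER) pair from the subtree rooted at `head`.

    Returns None when the answer token is not part of that subtree.
    """
    segment = sorted(preorder(tree, head))  # restore sentence order
    if answer_index not in segment:
        return None
    words = [blank if i == answer_index else tokens[i] for i in segment]
    return " ".join(words), tokens[answer_index].upper()


# Hand-built, hypothetical dependency tree: head index -> dependent indices.
tokens = ["James", "Brooke", "became", "Rajah", "of", "Sarawak", "in", "1841"]
tree = {2: [1, 3, 6], 1: [0], 3: [4], 4: [5], 6: [7]}

# Subtree rooted at "Rajah" (index 3), with "Sarawak" (index 5) as the answer.
print(fill_in_clue(tokens, tree, head=3, answer_index=5))
```

Choosing a different head changes the length of the clue: rooting the segment at the main verb yields the whole sentence with one blank, while rooting it lower in the tree yields a shorter phrase.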