Keywords: scansion, English, poetry, out-of-vocabulary words

We present a system based on finite-state technology (FST) that performs metrical scansion of verse written in English. Scansion is the traditional task of analyzing the lines of a poem, marking the stressed and unstressed elements and dividing each line into metrical feet. The system's workflow consists of several subtasks built around finite-state machines that analyze verse by performing tokenization, part-of-speech tagging, stress placement, and stress-pattern prediction for unknown words. The scanner also classifies poems according to the predominant type of metrical foot found. We present a brief evaluation of the system using a gold-standard corpus of human-scanned verse, on which it achieves a per-syllable accuracy of 86.78%. The program uses open-source components and is released under the GNU GPL.
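To make the evaluation metric concrete, the following is a minimal sketch of how a per-syllable accuracy could be computed. It is not the system's own evaluation code: the function name `per_syllable_accuracy`, the encoding of each line as a string of `S` (stressed) and `U` (unstressed) marks, and the choice to reject lines whose syllable counts differ are all assumptions made for illustration.

```python
def per_syllable_accuracy(gold_lines, predicted_lines):
    """Return the fraction of syllables whose predicted stress mark
    matches the gold-standard mark.

    Assumption: each line is a string with one character per syllable,
    'S' for stressed and 'U' for unstressed.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_lines, predicted_lines):
        # Assumption: lines must agree on syllable count; a real
        # evaluation might align or penalize mismatches instead.
        if len(gold) != len(pred):
            raise ValueError("syllable counts differ: %r vs %r" % (gold, pred))
        total += len(gold)
        correct += sum(g == p for g, p in zip(gold, pred))
    return correct / total if total else 0.0


# Hypothetical data: one iambic pentameter line and one dactylic line.
gold = ["USUSUSUSUS", "SUUSUU"]
pred = ["USUSUSUSUS", "SUSSUU"]  # one wrong mark in the second line

accuracy = per_syllable_accuracy(gold, pred)
print(f"{accuracy:.2%}")
```

With these invented lines, 15 of the 16 syllables match. Counting per syllable rather than per line means a line with a single wrong mark still contributes its correct syllables to the score.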
In this paper, we describe research that uses machine learning techniques to build a comma checker to be integrated into a grammar checker for Basque. After several experiments, and trained on a small corpus of 100,000 words, the system correctly decides where not to place commas with a precision of 96% and a recall of 98%. It also achieves a precision of 70% and a recall of 49% in the task of placing commas. Finally, we show that these results can be improved by training on a larger and more homogeneous corpus, that is, a larger corpus written by a single author.
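The two pairs of figures above are precision and recall computed separately for each class ("no comma" and "comma"). A minimal sketch of that computation follows, assuming each token position is labeled `1` if a comma follows it and `0` otherwise. The function name `precision_recall` and the sample labels are hypothetical and do not come from the paper's data.

```python
def precision_recall(gold, predicted, label):
    """Compute precision and recall for one class label.

    precision = TP / (TP + FP), recall = TP / (TP + FN)
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical labels: 1 = a comma follows this token, 0 = no comma.
gold = [0, 0, 1, 0, 1, 0, 0, 1]
pred = [0, 0, 1, 0, 0, 0, 1, 1]

comma_p, comma_r = precision_recall(gold, pred, label=1)
no_comma_p, no_comma_r = precision_recall(gold, pred, label=0)
print(f"comma:    precision={comma_p:.2f} recall={comma_r:.2f}")
print(f"no comma: precision={no_comma_p:.2f} recall={no_comma_r:.2f}")
```

Because most positions in running text carry no comma, the "no comma" class is typically much easier to predict, which is consistent with the gap between the two sets of figures reported above.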