We propose a method to control the level of a sentence in a text simplification task. Text simplification is a monolingual translation task that translates a complex sentence into a simpler, easier-to-understand alternative. In this study, we use the grade level of the US education system as the level of the sentence. Our text simplification method succeeds in translating an input into a specific grade level by considering the levels of both sentences and words. The sentence level is considered by adding the target grade level as input. The word level, by contrast, is considered by adding weights to the training loss based on words that frequently appear in sentences of the desired grade level. Although existing models that consider only the sentence level may control syntactic complexity, they tend to generate words beyond the target level. Our approach can control both lexical and syntactic complexity and achieve aggressive rewriting. Experimental results indicate that the proposed method improves both BLEU and SARI.
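The two mechanisms described in this abstract, a target grade level added to the input and a word-weighted training loss, can be illustrated with a minimal sketch. The abstract does not specify the weighting formula or the token format, so everything here is an assumption made for illustration: the function names, the `<grade>` token format, and the smoothed frequency-ratio weighting are all hypothetical, not the authors' implementation.

```python
import math
from collections import Counter


def add_grade_token(tokens, grade):
    """Prepend a special token carrying the target grade level (hypothetical format)."""
    return [f"<{grade}>"] + list(tokens)


def word_weights(corpus_by_grade, grade, smoothing=1.0):
    """Weight words that are over-represented in sentences of the target grade.

    Assumption: the weight is the ratio of a word's smoothed relative
    frequency in the target grade to its smoothed relative frequency in the
    whole corpus, floored at 1.0 so no word is down-weighted.
    """
    in_grade = Counter(w for sent in corpus_by_grade[grade] for w in sent)
    overall = Counter(
        w for sents in corpus_by_grade.values() for sent in sents for w in sent
    )
    vocab_size = len(overall)
    n_grade = sum(in_grade.values()) + smoothing * vocab_size
    n_all = sum(overall.values()) + smoothing * vocab_size
    weights = {}
    for word, count_all in overall.items():
        p_grade = (in_grade[word] + smoothing) / n_grade
        p_all = (count_all + smoothing) / n_all
        weights[word] = max(1.0, p_grade / p_all)
    return weights


def weighted_nll(target_tokens, token_probs, weights):
    """Mean negative log-likelihood with a per-word weight on each target token.

    token_probs[i] is the model probability assigned to target_tokens[i].
    Words missing from `weights` get the neutral weight 1.0.
    """
    total = 0.0
    for token, prob in zip(target_tokens, token_probs):
        total += -weights.get(token, 1.0) * math.log(prob)
    return total / len(target_tokens)


# Toy corpus keyed by grade level (illustrative data only).
corpus = {2: [["cat", "sat"]], 8: [["feline", "sat"]]}
weights_g2 = word_weights(corpus, grade=2)
source = add_grade_token(["the", "feline", "sat"], grade=2)
loss = weighted_nll(["cat", "sat"], [0.5, 0.5], weights_g2)
```

In this toy setup, a word seen only in grade 2 sentences gets a weight above 1, so mispredicting it costs more during training than mispredicting a word that is equally common at every level.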
This study introduces three language resources for Japanese lexical simplification: 1) an evaluation dataset, 2) lexica, and 3) a toolkit that can be used to develop and benchmark Japanese lexical simplification systems. The word complexity lexicon adopted in this study was automatically expanded using a classifier trained on a small word complexity lexicon created by Japanese language teachers. Based on this word complexity estimator, simplified word pairs were extracted from a large-scale synonym lexicon to build a simplified synonym lexicon that is useful for lexical simplification. In addition, a Python library was developed that implements automatic evaluation and the key methods for each subtask, easing the construction of a lexical simplification pipeline. The experimental results on the developed evaluation dataset revealed that the proposed method, which is based on the developed lexicon, achieves the highest performance on Japanese lexical simplification.
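The extraction step described in this abstract, filtering a synonym lexicon with a word complexity estimator to obtain simplified word pairs, can be sketched as follows. This is only an illustration under stated assumptions: the function name, the integer complexity scale, and the toy entries are hypothetical and do not reflect the API of the toolkit the abstract mentions.

```python
def extract_simplified_pairs(synonym_pairs, complexity):
    """Return (complex_word, simple_word) pairs from a synonym list.

    Assumptions: `complexity` maps a word to a numeric level where a higher
    value means a harder word. Pairs with an unknown word or with equal
    complexity are skipped, since neither side is a simplification.
    """
    pairs = []
    for first, second in synonym_pairs:
        if first not in complexity or second not in complexity:
            continue
        if complexity[first] > complexity[second]:
            pairs.append((first, second))
        elif complexity[second] > complexity[first]:
            pairs.append((second, first))
    return pairs


# Toy data: levels on a hypothetical 1 (easy) to 3 (hard) scale.
complexity = {"購入する": 3, "買う": 1, "食べる": 1}
synonyms = [("買う", "購入する"), ("食べる", "食う")]
simplified = extract_simplified_pairs(synonyms, complexity)
```

Here the second synonym pair is dropped because one of its words has no complexity estimate, which is the gap that expanding the lexicon with a classifier is meant to close.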