We introduce the RUSE metric for the WMT18 metrics shared task. Sentence embeddings can capture global information that cannot be captured by local features based on character or word N-grams. Although training sentence embeddings on small-scale translation datasets with manual evaluation is difficult, sentence embeddings trained on large-scale data from other tasks can improve the automatic evaluation of machine translation. We use a multi-layer perceptron regressor based on three types of sentence embeddings. Experimental results on the WMT16 and WMT17 datasets show that the RUSE metric achieves state-of-the-art performance in both the segment- and system-level metrics tasks with embedding features only.
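The abstract does not spell out how the hypothesis and reference embeddings are fed to the regressor; a common way to combine such an embedding pair into one feature vector is concatenation plus element-wise product and absolute difference. The sketch below illustrates that combination only; the function name is my own and the downstream MLP is omitted.

```python
import numpy as np

def combine_embeddings(h_hyp, h_ref):
    """Combine a hypothesis embedding and a reference embedding into a
    single feature vector for a regressor: the two vectors themselves,
    their element-wise product, and their absolute difference.
    Illustrative sketch, not the paper's exact feature set."""
    h_hyp = np.asarray(h_hyp, dtype=float)
    h_ref = np.asarray(h_ref, dtype=float)
    return np.concatenate([h_hyp, h_ref, h_hyp * h_ref, np.abs(h_hyp - h_ref)])
```

For d-dimensional sentence embeddings this yields a 4d-dimensional feature vector per hypothesis/reference pair.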
We propose a method to control the level of a sentence in a text simplification task. Text simplification is a monolingual translation task that translates a complex sentence into a simpler, easier-to-understand alternative. In this study, we use the grade levels of the US education system as sentence levels. Our text simplification method succeeds in translating an input into a specific grade level by considering the levels of both sentences and words. The sentence level is considered by adding the target grade level as input. The word level is considered by adding weights to the training loss based on words that frequently appear in sentences of the desired grade level. Although existing models that consider only the sentence level may control syntactic complexity, they tend to generate words beyond the target level. Our approach can control both lexical and syntactic complexity and achieve aggressive rewriting. Experimental results indicate that the proposed method improves both the BLEU and SARI metrics.
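The word-level control above weights the training loss toward words characteristic of the target grade. A minimal sketch of one plausible weighting scheme, assuming the grade-graded corpus is a dict of sentence lists; the function name and the exact formula (boosting words that are relatively more frequent in the target grade than overall) are illustrative, not the paper's:

```python
from collections import Counter

def word_loss_weights(sentences_by_grade, target_grade, scale=1.0):
    """Assign each vocabulary word a training-loss weight >= 1.0 that grows
    with how much more frequent the word is in the target grade's sentences
    than in the corpus overall. Illustrative sketch only."""
    target_counts = Counter(
        w for s in sentences_by_grade[target_grade] for w in s.split())
    total_counts = Counter(
        w for ss in sentences_by_grade.values() for s in ss for w in s.split())
    n_target = sum(target_counts.values())
    n_total = sum(total_counts.values())
    weights = {}
    for w, c in total_counts.items():
        p_target = target_counts.get(w, 0) / n_target  # freq within target grade
        p_all = c / n_total                            # freq overall
        weights[w] = 1.0 + scale * max(0.0, p_target - p_all) / p_all
    return weights
```

Words that never occur at the target grade keep the base weight of 1.0, so the loss is never down-weighted, only boosted toward grade-appropriate vocabulary.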
We have constructed two research resources for Japanese lexical simplification. One is a simplification system that supports the reading comprehension of a wide range of readers, including children and language learners. The other is an evaluation dataset that enables open comparison with other systems. Both the system and the dataset are made publicly available, providing the first such resources for the Japanese language.
We introduce the TMU systems for the complex word identification (CWI) shared task 2018. The TMU systems use random forest classifiers and regressors whose features include the number of characters, the number of words, and the frequency of the target word in various corpora. Our simple systems performed best on 5 of the 12 tracks. Ablation analysis confirmed the usefulness of a learner corpus for the CWI task.
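The feature set described above is simple enough to sketch directly. In this hedged illustration, each corpus is represented as a word-frequency dict; the function name and the dict-based interface are assumptions, not the shared-task code:

```python
def cwi_features(target, frequency_lists):
    """Build a feature vector for a target word or phrase:
    - number of characters
    - number of words
    - frequency of the target in each corpus (one dict per corpus).
    Missing entries count as frequency 0. Illustrative sketch; the actual
    systems feed such features into random forest classifiers/regressors."""
    return [len(target), len(target.split())] + [
        freqs.get(target, 0) for freqs in frequency_lists
    ]
```

The resulting fixed-length vectors can be passed straight to, e.g., scikit-learn's `RandomForestClassifier`.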
Paraphrase generation can be regarded as monolingual translation. Unlike bilingual machine translation, paraphrase generation rewrites only a limited portion of an input sentence. Hence, previous methods based on machine translation often behave conservatively and fail to make necessary rewrites. To solve this problem, we propose a neural model for paraphrase generation that first identifies words in the source sentence that should be paraphrased. These words are then paraphrased by negative lexically constrained decoding, which avoids outputting them as they are. Experiments on text simplification and formality transfer show that our model improves the quality of paraphrasing by making the necessary rewrites to an input sentence.
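At its core, a negative lexical constraint forbids the decoder from emitting certain tokens. A toy greedy-decoding step showing the idea, with token scores as a dict of log-probabilities; the function name and interface are my own simplification of constrained beam search:

```python
import math

def negatively_constrained_argmax(logprobs, banned_ids):
    """One greedy decoding step under a negative lexical constraint:
    token ids in banned_ids (e.g. the identified source words that must be
    rewritten) are excluded, and the best remaining token is returned.
    Toy sketch; real systems apply this inside beam search."""
    best_id, best_score = None, -math.inf
    for token_id, score in logprobs.items():
        if token_id in banned_ids:
            continue  # the constraint: never output a banned source word
        if score > best_score:
            best_id, best_score = token_id, score
    return best_id
```

Without the constraint the decoder would simply copy the source word; banning it forces a genuine rewrite.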
We propose a reference-less metric trained on manual evaluations of system outputs for grammatical error correction. Previous studies have shown that reference-less metrics are promising; however, existing metrics are not optimized for manual evaluation of system output because no dataset of system outputs with manual evaluations exists. This study manually evaluates the output of grammatical error correction systems in order to optimize the metric. Experimental results show that the proposed metric improves the correlation with manual evaluation in both system- and sentence-level meta-evaluation. Our dataset and metric will be made publicly available.
Advanced pre-trained models for text representation have achieved state-of-the-art performance on various text classification tasks. However, the discrepancy between the semantic similarity of texts and labelling standards affects classifiers, i.e., it leads to lower performance in cases where classifiers must assign different labels to semantically similar texts. To address this problem, we propose a simple multitask learning model that uses negative supervision. Specifically, our model encourages texts with different labels to have distinct representations. Comprehensive experiments show that our model outperforms the state-of-the-art pre-trained model on both single- and multi-label classification, sentence and document classification, and classification in three different languages.
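"Encouraging texts with different labels to have distinct representations" can be expressed as an auxiliary loss that penalizes similarity between differently-labelled examples. A numpy sketch of that idea; the function name and the pairwise mean-cosine formulation are assumptions for illustration, not the paper's exact objective:

```python
import numpy as np

def negative_supervision_loss(reps, labels):
    """Mean cosine similarity over all pairs of examples with *different*
    labels; minimizing this pushes differently-labelled texts apart in
    representation space. Illustrative sketch of negative supervision."""
    reps = np.asarray(reps, dtype=float)
    reps = reps / np.linalg.norm(reps, axis=1, keepdims=True)  # unit vectors
    total, n_pairs = 0.0, 0
    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if labels[i] != labels[j]:
                total += float(reps[i] @ reps[j])
                n_pairs += 1
    return total / n_pairs if n_pairs else 0.0
```

In a multitask setup this auxiliary loss would be added to the usual classification loss, so the encoder learns representations that both predict labels and separate label classes.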
We propose a new dataset for evaluating Japanese lexical simplification methods. Previous datasets have several deficiencies: all of them substitute only a single target word, and some of them extract sentences only from a newswire corpus. In addition, most of these datasets do not allow ties and integrate the simplification rankings of all annotators without considering their quality. In contrast, our dataset has the following advantages: (1) it is the first controlled and balanced dataset for Japanese lexical simplification with high correlation with human judgment, and (2) the consistency of the simplification ranking is improved by allowing candidates to tie and by considering the reliability of annotators.