Ryohei Sasano scite author profile

Controlling Output Length in Neural Encoder-Decoders

Neural encoder-decoder models have shown great success in many sequence generation tasks. However, previous work has not investigated situations in which we would like to control the length of encoder-decoder outputs. This capability is crucial for applications such as text summarization, in which we have to generate concise summaries of a desired length. In this paper, we propose methods for controlling the output sequence length for neural encoder-decoder models: two decoding-based methods and two learning-based methods. Results show that our learning-based methods can control length without degrading summary quality in a summarization task.

A Corpus-Based Analysis of Canonical Word Order of Japanese Double Object Constructions

Sasano, Okumura (2016)

The canonical word order of Japanese double object constructions has attracted considerable attention among linguists and has been a topic of many studies. However, most of these studies require either manual analyses or measurements of human characteristics such as brain activities or reading times for each example. Thus, while these analyses are reliable for the examples they focus on, they cannot be generalized to other examples. On the other hand, the trend of actual usage can be collected automatically from a large corpus. Thus, in this paper, we assume that there is a relationship between the canonical word order and the proportion of each word order in a large corpus and present a corpus-based analysis of canonical word order of Japanese double object constructions.

A fully-lexicalized probabilistic model for Japanese zero anaphora resolution

Sasano, Kawahara, Kurohashi (2008)

This paper presents a probabilistic model for Japanese zero anaphora resolution. First, this model recognizes discourse entities and links all mentions to them. Zero pronouns are then detected by case structure analysis based on automatically constructed case frames. Their appropriate antecedents are selected from the entities with high salience scores, based on the case frames and several preferences on the relation between a zero pronoun and an antecedent. Case structure and zero anaphora relations are determined simultaneously based on probabilistic evaluation metrics.

A Simple Approach to Unknown Word Processing in Japanese Morphological Analysis

Sasano, Kurohashi, Okumura (2014), Journal of Natural Language Processing

This paper presents a simple but effective approach to unknown word processing in Japanese morphological analysis, which handles 1) unknown words that are derived from words in a pre-defined lexicon and 2) unknown onomatopoeias. Our approach leverages derivation rules and onomatopoeia patterns, and correctly recognizes certain types of unknown words. Experiments revealed that our approach recognized about 4,500 unknown words in 100,000 Web sentences with only roughly 80 harmful side effects and a 6% loss in speed.

Sequential Span Classification with Neural Semi-Markov CRFs for Biomedical Abstracts

Yamada, Hirao, Sasano, et al. (2020)

Dividing biomedical abstracts into several segments with rhetorical roles is essential for supporting researchers' information access in the biomedical domain. Conventional methods have regarded the task as a sequence labeling task based on sequential sentence classification, i.e., they assign a rhetorical label to each sentence by considering the context in the abstract. However, these methods have a critical problem: they are prone to mislabel longer runs of consecutive sentences that share the same rhetorical label. To tackle the problem, we propose sequential span classification, which assigns a rhetorical label not to a single sentence but to a span of consecutive sentences. Accordingly, we introduce Neural Semi-Markov Conditional Random Fields to assign the labels to such spans by considering all possible spans of various lengths. Experimental results obtained from the PubMed 20k RCT and NICTA-PIBOSO datasets demonstrate that our proposed method achieved the best micro sentence-F1 score as well as the best micro span-F1 score.
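
The idea of scoring "all possible spans of various lengths" can be illustrated with a small semi-Markov Viterbi search. This is only a sketch under stated assumptions, not the authors' model: the function name `best_segmentation`, the `span_score` and `trans_score` callbacks, the `max_len` cap, and the toy scores are all hypothetical, and a real system would obtain span scores from a neural encoder rather than a lookup table.

```python
from typing import Callable, Dict, List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_exclusive, label)


def best_segmentation(
    n: int,
    labels: List[str],
    span_score: Callable[[int, int, str], float],
    trans_score: Callable[[str, str], float],
    max_len: int,
) -> Tuple[float, List[Span]]:
    """Find the highest-scoring split of n sentences (n >= 1) into labeled spans.

    Hypothetical helper: every span of up to max_len consecutive sentences
    is considered with every label, as in semi-Markov decoding.
    """
    neg_inf = float("-inf")
    # best[end][lab]: best score for sentences[0:end] when the last span has label lab
    best: List[Dict[str, float]] = [
        {lab: neg_inf for lab in labels} for _ in range(n + 1)
    ]
    # back[end][lab]: (start of last span, label of the span before it)
    back: List[Dict[str, Optional[Tuple[int, Optional[str]]]]] = [
        {lab: None for lab in labels} for _ in range(n + 1)
    ]

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            for lab in labels:
                emit = span_score(start, end, lab)
                if start == 0:
                    cand, prev = emit, None  # first span: no transition
                else:
                    prev = max(
                        labels,
                        key=lambda p: best[start][p] + trans_score(p, lab),
                    )
                    cand = best[start][prev] + trans_score(prev, lab) + emit
                if cand > best[end][lab]:
                    best[end][lab] = cand
                    back[end][lab] = (start, prev)

    # Follow back-pointers from the best final label to recover the spans.
    lab = max(labels, key=lambda l: best[n][l])
    total = best[n][lab]
    spans: List[Span] = []
    end = n
    while end > 0:
        start, prev = back[end][lab]
        spans.append((start, end, lab))
        end, lab = start, prev
    spans.reverse()
    return total, spans


# Toy per-sentence scores (made up for illustration only).
SENT_SCORES = [
    {"BACKGROUND": 2, "RESULTS": 0},
    {"BACKGROUND": 1, "RESULTS": 0},
    {"BACKGROUND": 0, "RESULTS": 2},
    {"BACKGROUND": 0, "RESULTS": 1},
]


def toy_span_score(start: int, end: int, label: str) -> float:
    # A span's score is the sum of its sentences' scores for that label.
    return sum(s[label] for s in SENT_SCORES[start:end])


def toy_trans_score(prev: str, cur: str) -> float:
    # Penalize adjacent spans with the same label so they merge into one span.
    return -1.0 if prev == cur else 0.0


score, spans = best_segmentation(
    4, ["BACKGROUND", "RESULTS"], toy_span_score, toy_trans_score, max_len=3
)
```

With these toy scores the search groups the first two sentences into one BACKGROUND span and the last two into one RESULTS span, rather than labeling each sentence independently.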
