The effect of corpus size on case frame acquisition for discourse analysis

Sasano, Ryohei; Kawahara, Daisuke; Kurohashi, Sadao

doi:10.3115/1620754.1620830

Cited by 9 publications

(7 citation statements)

References 12 publications


“…However, the research trend of zero anaphora resolution has shifted from such rule-based approaches to machine learning-based approaches because in machine learning we can easily integrate many different types of information, such as morpho-syntactic, semantic and discourse-related information. Researchers have developed methods of zero anaphora resolution for Chinese (Zhao and Ng, 2007; Chen and Ng, 2013), Japanese (Seki et al, 2002; Isozaki and Hirao, 2003; Iida et al, 2007a; Taira et al, 2008; Sasano et al, 2008; Sasano et al, 2009; Imamura et al, 2009; Watanabe et al, 2010; Hayashibe et al, 2011; Iida and Poesio, 2011; Yoshikawa et al, 2011; Hangyo et al, 2013; Yoshino et al, 2013) and Italian (Iida and Poesio, 2011). One critical issue in zero anaphora resolution is optimizing the outputs of sub-problems (e.g., zero anaphor detection and antecedent identification).…”

Section: Related Work (mentioning)

confidence: 99%

Intra-sentential Zero Anaphora Resolution using Subject Sharing Recognition

Iida, Torisawa, Hashimoto, et al. 2015

Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing


In this work, we improve the performance of intra-sentential zero anaphora resolution in Japanese using a novel method of recognizing subject sharing relations. In Japanese, a large portion of intra-sentential zero anaphora can be regarded as subject sharing relations between predicates, that is, the subject of some predicate is also the unrealized subject of other predicates. We develop an accurate recognizer of subject sharing relations for pairs of predicates in a single sentence, and then construct a subject shared predicate network, which is a set of predicates that are linked by the subject sharing relations recognized by our recognizer. We finally combine our zero anaphora resolution method exploiting the subject shared predicate network and a state-of-the-art ILP-based zero anaphora resolution method. Our combined method achieved a significant improvement over the ILP-based method alone on intra-sentential zero anaphora resolution in Japanese. To the best of our knowledge, this is the first work to explicitly use an independent subject sharing recognizer in zero anaphora resolution.


“…However, it would take three months for this experiment using this 100 million word corpus. Although it is best to use the largest possible corpus for this kind of knowledge acquisition tasks (Sasano et al, 2009), it is infeasible to scale to giga-word corpora using such joint models.…”

Section: Overview (mentioning)

confidence: 99%

A Step-wise Usage-based Method for Inducing Polysemy-aware Verb Classes

Kawahara, Peterson, Palmer 2014

Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Self Cite


We present an unsupervised method for inducing verb classes from verb uses in gigaword corpora. Our method consists of two clustering steps: verb-specific semantic frames are first induced by clustering verb uses in a corpus and then verb classes are induced by clustering these frames. By taking this step-wise approach, we can not only generate verb classes based on a massive amount of verb uses in a scalable manner, but also deal with verb polysemy, which is bypassed by most of the previous studies on verb clustering. In our experiments, we acquire semantic frames and verb classes from two giga-word corpora, the larger comprising 20 billion words. The effectiveness of our approach is verified through quantitative evaluations based on polysemy-aware gold-standard data.


“…Case Frames Construction is a workflow to create data structures in natural language processing called case frames [30,31]. Case frames, which are frequent patterns of predicate-argument structures, are generated by analysing a large amount of text collected from the web.…”

Section: Case Frames Construction (mentioning)

confidence: 99%

File-access patterns of data-intensive workflow applications and their implications to distributed filesystems

Shibata, Choi, Taura 2010

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing


This paper studies five real-world data intensive workflow applications in the fields of natural language processing, astronomy image analysis, and web data analysis. Data intensive workflows are increasingly becoming important applications for cluster and Grid environments. They open new challenges to various components of workflow execution environments including job dispatchers, schedulers, file systems, and file staging tools. The keys to achieving high performance are efficient data sharing among executing hosts and locality-aware scheduling that reduces the amount of data transfer. While much work has been done on scheduling workflows, many of them use synthetic or random workload. As such, their impacts on real workloads are largely unknown. Understanding characteristics of real-world workflow applications is a required step to promote research in this area. To this end, we analyse real-world workflow applications focusing on their file access patterns and summarize their implications to schedulers and file system/staging designs.
