Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Com 2009
DOI: 10.3115/1620754.1620830

The effect of corpus size on case frame acquisition for discourse analysis

Abstract: This paper reports the effect of corpus size on case frame acquisition for discourse analysis in Japanese. For this study, we collected a Japanese corpus consisting of up to 100 billion words, and constructed case frames from corpora of six different sizes. Then, we applied these case frames to syntactic and case structure analysis, and zero anaphora resolution. We obtained better results by using case frames constructed from larger corpora; the performance was not saturated even with a corpus size of 100 bill…

Cited by 9 publications (7 citation statements)
References 12 publications
“…However, the research trend of zero anaphora resolution has shifted from such rule-based approaches to machine learning-based approaches because in machine learning we can easily integrate many different types of information, such as morpho-syntactic, semantic and discourse-related information. Researchers have developed methods of zero anaphora resolution for Chinese (Zhao and Ng, 2007; Chen and Ng, 2013), Japanese (Seki et al., 2002; Isozaki and Hirao, 2003; Iida et al., 2007a; Taira et al., 2008; Sasano et al., 2008; Sasano et al., 2009; Imamura et al., 2009; Watanabe et al., 2010; Hayashibe et al., 2011; Iida and Poesio, 2011; Yoshikawa et al., 2011; Hangyo et al., 2013; Yoshino et al., 2013) and Italian (Iida and Poesio, 2011). One critical issue in zero anaphora resolution is optimizing the outputs of sub-problems (e.g., zero anaphor detection and antecedent identification).…”
Section: Related Work (mentioning)
confidence: 99%
“…However, it would take three months for this experiment using this 100 million word corpus. Although it is best to use the largest possible corpus for this kind of knowledge acquisition task (Sasano et al., 2009), it is infeasible to scale to giga-word corpora using such joint models.…”
Section: Overview (mentioning)
confidence: 99%
“…Case Frames Construction is a workflow to create data structures in natural language processing called case frames [30,31]. Case frames, which are frequent patterns of predicate-argument structures, are generated by analysing large amounts of text collected from the web.…”
Section: Case Frames Construction (mentioning)
confidence: 99%