Abstract: Information on subcategorization and selectional restrictions in a valency dictionary is important for natural language processing tasks such as monolingual parsing, accurate rule-based machine translation and automatic summarization. In this paper we present an efficient method of assigning valency information and selectional restrictions to entries in a bilingual dictionary, based on information in an existing valency dictionary. The method is based on two assumptions: words with similar meaning have similar…
“…In addition to being useful for people as a bilingual dictionary, it is also widely used in NLP applications. For example, it has been the base to make compound noun lexicons (Tanaka and Matsuo, 1999; Ohmori and Higashida, 1999), new bilingual lexicons (Paik et al, 2001; Apel, 2002; Sjöbergh, 2005; Zhang et al, 2005; Fujita and Bond, 2006; Bond and Ogura, 2007) and machine translation transfer rules (Nichols et al, 2007).…”
The JMdict/EDICT Japanese-English Dictionary is a freely available dictionary distributed in XML (JMdict) and text (EDICT) formats. It is widely used as a source of lexical material in dictionary systems and text-processing projects. We propose two refinements to make the dictionary more computationally tractable: marking entries where the English is not a translation equivalent and expanding contracted entries. We then propose and apply semi-automatic methods to refine existing entries. The resulting dictionary is shown to be more suitable for the construction of machine translation rules.
“…For each Chinese gloss marked with 'v', its synset ID was used to obtain the English verb synset from WordNet. The verb frames of the English synset were borrowed as an indication of different verb sub-categories, under the assumption that words with similar meaning behave similarly syntactically (Fujita & Bond, 2007). Zhong lexicon entries were then generated.…”
This thesis describes the development of Zhong, a computational resource grammar for Chinese, in the framework of Head-driven Phrase Structure Grammar (HPSG: Pollard & Sag, 1994) using Minimal Recursion Semantics (Copestake et al., 2005). In order to increase the grammar's coverage for practical applications, a corpus-driven approach was adopted to systematically expand its lexical and syntactic coverage. The lexicon was expanded through semi-automatic learning of lexical entries from an annotated Chinese corpus. Various language phenomena commonly observed in corpora have been analyzed and modeled in the grammar, especially those involving the particle 的 DE. The entire grammar and associated tools are available under an open-source license. A treebank with 798 sentences has been built with the parse trees from the grammar's output. With appropriate trees manually selected from the parses, the treebank was used as a gold standard to train a statistical model which can be used to rank the grammar's output parse trees, both to improve its performance in applications and to help grammar engineers during development and debugging. To evaluate the grammar's suitability to support applications like grammar feedback systems for second language learners, a small extension of the grammar was also built with MAL-rules and MAL-types to enable the parsing of sentences containing grammatical errors and the detection of the specific errors. The information provided by the grammar would thus allow the feedback system to identify the errors and give appropriate suggestions to the learner.
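The frame-borrowing step described in the citation statement above (verb-marked Chinese gloss, then English synset, then verb frames, then lexicon entries) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the gloss list, the placeholder synset IDs, the frame strings, and the names `GLOSSES`, `VERB_FRAMES`, `LexEntry`, and `borrow_frames` are hypothetical, and a plain dictionary stands in for the WordNet lookup. It is not the thesis's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical toy input: (Chinese gloss, part-of-speech mark, synset ID).
# The synset IDs are made-up placeholders, not real WordNet offsets.
GLOSSES = [
    ("吃", "v", "syn-eat"),
    ("书", "n", "syn-book"),
    ("睡", "v", "syn-sleep"),
]

# Hypothetical stand-in for WordNet: synset ID -> verb frames of the
# English verb synset. Real frames would be read from WordNet itself.
VERB_FRAMES = {
    "syn-eat": ["Somebody ----s something", "Somebody ----s"],
    "syn-sleep": ["Somebody ----s"],
}


@dataclass(frozen=True)
class LexEntry:
    """One generated lexicon entry: a lemma paired with a borrowed frame."""
    lemma: str
    frame: str


def borrow_frames(glosses, verb_frames):
    """Generate one entry per (verb gloss, borrowed frame) pair.

    Only glosses marked 'v' are considered; each borrowed frame is
    treated as a separate verb sub-category for that lemma.
    """
    entries = []
    for lemma, pos, synset_id in glosses:
        if pos != "v":
            continue  # non-verb glosses receive no verb frames
        for frame in verb_frames.get(synset_id, []):
            entries.append(LexEntry(lemma, frame))
    return entries


ENTRIES = borrow_frames(GLOSSES, VERB_FRAMES)
for entry in ENTRIES:
    print(entry.lemma, "->", entry.frame)
```

Emitting one entry per frame reflects the quoted idea that each borrowed frame signals a distinct verb sub-category; a real pipeline would also need to map each frame onto a lexical type of the grammar, which this sketch leaves out.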