We have previously developed a framework for bi-directional English-to-Chinese and Chinese-to-English machine translation using grammars semi-automatically induced from unannotated corpora. The framework adopts an example-based machine translation (EBMT) approach. This work reports on three extensions to the framework. First, we investigate the comparative merits of three distance metrics (Kullback-Leibler, Manhattan norm and Gini index) for agglomerative clustering in grammar induction. Second, we seek an automatic evaluation method, based on the BLEU metric, that can also consider multiple translation outputs generated for a single input sentence. Third, our previous investigation shows that Chinese-to-English translation has lower performance due to incorrect use of English inflectional forms, a consequence of random selection among translation alternatives. We present an improved selection strategy that leverages information from the example parse trees in our EBMT paradigm.
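As a rough illustration of how two of the distance metrics named above could compare candidate words during agglomerative clustering, the sketch below computes a symmetrized Kullback-Leibler distance and a Manhattan (L1) distance between smoothed context distributions. Everything here is an assumption made for illustration: the function names, the add-one smoothing, the symmetrization, the toy vocabulary, and the idea of representing each word by the words that precede it are not taken from the abstract. The Gini index variant is omitted because the abstract does not say how it is defined as a distance.

```python
import math
from collections import Counter


def distribution(contexts, vocab, alpha=1.0):
    """Turn observed context words into a smoothed probability vector.

    Additive smoothing (alpha) is an assumption; it keeps every
    probability positive so the KL divergence stays finite.
    """
    counts = Counter(contexts)
    total = len(contexts) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def symmetric_kl(p, q):
    """Symmetrized KL, D(p || q) + D(q || p), usable as a distance."""
    return kl_divergence(p, q) + kl_divergence(q, p)


def manhattan(p, q):
    """Manhattan (L1) distance between two probability vectors."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


# Hypothetical toy data: the word seen immediately before each target word.
vocab = ["to", "from", "on"]
boston = distribution(["to", "from", "to"], vocab)
denver = distribution(["to", "from", "from"], vocab)
monday = distribution(["on", "on", "on"], vocab)

for name, metric in [("symmetric KL", symmetric_kl), ("Manhattan", manhattan)]:
    print(name, "boston-denver:", metric(boston, denver))
    print(name, "boston-monday:", metric(boston, monday))
```

Under either metric, the two words with similar contexts come out closer to each other than to the third, so an agglomerative procedure would merge them first. Which metric gives better grammars is the empirical question the work investigates.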
This work extends the semi-automatic grammar induction approach previously proposed in [1]. The data-driven approach learns semantic and phrasal categories from a training corpus of unannotated natural language queries in a specific domain. The approach can be seeded with pre-specified semantic categories to expedite the learning process. Grammar rules are automatically acquired by an agglomerative clustering procedure, and the resulting grammar may be hand-edited easily for refinement. This work attempts to improve the grammar induction framework by leveraging information in the SQL query that accompanies every training query. The SQL expression specifies the action of database access in relation to the query, and hence provides information about meaningful natural language structures that should be captured in the induced grammar. We have also incorporated the use of Information Gain in place of Mutual Information to capture phrasal structures, as well as the determination of an automatic stopping criterion for agglomerative clustering.