Kai-Chung Siu scite author profile

Kai-Chung Siu

5 Publications

50 Citation Statements Received

60 Citation Statements Given


Affiliations

Chinese University of Hong Kong

Publications


Semiautomatic acquisition of semantic structures for understanding domain-specific natural language queries

Meng

Siu

2002

IEEE Trans. Knowl. Data Eng.


Abstract: This paper describes a methodology for semiautomatic grammar induction from unannotated corpora of information-seeking queries in a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to (spoken) natural language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or on the availability of annotated corpora. To strive for reasonable coverage on real data, as well as portability across domains and languages, we adopt a statistical approach. Agglomerative clustering using the symmetrized divergence criterion groups words "spatially." These words have similar left and right contexts and tend to form semantic classes. Agglomerative clustering using mutual information groups words "temporally." These words tend to co-occur sequentially to form phrases or multiword entities. Our approach is amenable to the optional injection of prior knowledge to catalyze grammar induction. The resultant grammar is interpretable by humans and is amenable to hand-editing for refinement. Hence, our approach is semiautomatic in nature. Experiments were conducted using the ATIS (Air Travel Information Service) corpus and the semiautomatically-induced grammar G_SA is compared to an entirely handcrafted grammar G_H. G_H took two months to develop and gave concept error rates of 7 percent and 11.3 percent, respectively, in language understanding of two test corpora. G_SA took only three days to produce and gave concept errors of 14 percent and 12.2 percent on the corresponding test corpora. These results provide a desirable trade-off between language understanding performance and grammar development effort.
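The "spatial" clustering criterion mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a word's context is the single token to its left and right, that the symmetrized divergence is the sum of the two directed Kullback-Leibler divergences, that left and right distances are simply added, and that counts are smoothed additively. The toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter

def context_distribution(sentences, word, offset, vocab, eps=1e-3):
    """Smoothed distribution over the tokens found at `offset` from `word`
    (-1 = left neighbour, +1 = right neighbour). Smoothing is an assumption."""
    counts = Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # sentence boundary markers
        for i, tok in enumerate(padded):
            if tok == word:
                j = i + offset
                if 0 <= j < len(padded):
                    counts[padded[j]] += 1
    total = sum(counts.values()) + eps * len(vocab)
    return {v: (counts[v] + eps) / total for v in vocab}

def kl(p, q):
    """Directed Kullback-Leibler divergence D(p || q) over a shared vocabulary."""
    return sum(p[v] * math.log(p[v] / q[v]) for v in p)

def symmetrized_divergence(p, q):
    """Assumed form of the symmetrized criterion: D(p || q) + D(q || p)."""
    return kl(p, q) + kl(q, p)

def word_distance(sentences, w1, w2, vocab):
    """Sum of left-context and right-context symmetrized divergences."""
    return sum(
        symmetrized_divergence(
            context_distribution(sentences, w1, off, vocab),
            context_distribution(sentences, w2, off, vocab),
        )
        for off in (-1, 1)
    )

# Hypothetical toy corpus in the style of air-travel queries.
sentences = [
    ["flights", "from", "boston", "to", "denver"],
    ["flights", "from", "denver", "to", "boston"],
    ["show", "flights", "to", "dallas"],
]
vocab = sorted({tok for s in sentences for tok in s} | {"<s>", "</s>"})

d_cities = word_distance(sentences, "boston", "denver", vocab)
d_mixed = word_distance(sentences, "boston", "flights", vocab)
print(d_cities < d_mixed)
```

In this toy corpus "boston" and "denver" share the same left and right neighbours, so their distance is smaller than that between "boston" and "flights". An agglomerative procedure would merge the closest pair first and then recompute distances with the merged class treated as one token.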


A novel channel distortion measure for vector quantization and a fuzzy model for codebook index assignment

Siu¹,

Meng²

1999


This paper describes a methodology for semi-automatic grammar induction from unannotated corpora belonging to a restricted domain. The grammar contains both semantic and syntactic structures, which are conducive to language understanding. Our work aims to ameliorate the reliance of grammar development on expert handcrafting or the availability of annotated corpora. To strive for a reasonable model for real data, as well as portability across domains and languages, we adopt a statistical approach. Our approach is also amenable to the optional injection of prior knowledge to aid grammar induction, and subsequent hand editing for grammar refinement. This constitutes the semi-automatic nature of the approach. Experiments with the ATIS corpus showed positive results in semantic parsing, when compared to an entirely handcrafted grammar.


Semi-automatic grammar induction for bi-directional English-Chinese machine translation

Siu¹,

Meng²

2001


CU VOCAL: corpus-based syllable concatenation for Chinese speech synthesis across domains and dialects

Meng¹,

Keung²,

Siu³

et al. 2002


This paper describes CU VOCAL, a Chinese text-to-speech synthesis system that adopts the approach of corpus-based syllable concatenation. We have demonstrated the applicability of the approach primarily for Cantonese, a major dialect of Chinese predominant in Hong Kong, South China and many overseas Chinese communities. This work extends our previous work as described in [1]. Our approach is able to synthesize speech from free-form text, and it can also be optimized for response generation in specific application domains. We have also demonstrated the portability of the approach to Putonghua, the official Chinese dialect, in a domain-optimized setting. Coarticulatory context is expressed in terms of distinctive features. Tonal context is also included. We conducted a series of listening tests using CU VOCAL, which gave favorable performance.


Example-based bi-directional Chinese-English machine translation with semi-automatically induced grammars

Siu¹,

Meng²,

Wong³

2003


We have previously developed a framework for bi-directional English-to-Chinese/Chinese-to-English machine translation using semi-automatically induced grammars from unannotated corpora. The framework adopts an example-based machine translation (EBMT) approach. This work reports on three extensions to the framework. First, we investigate the comparative merits of three distance metrics (Kullback-Leibler, Manhattan-Norm and Gini Index) for agglomerative clustering in grammar induction. Second, we seek an automatic evaluation method that can also consider multiple translation outputs generated for a single input sentence based on the BLEU metric. Third, our previous investigation shows that Chinese-to-English translation has lower performance due to incorrect use of English inflectional forms, a consequence of random selection among translation alternatives. We present an improved selection strategy that leverages information from the example parse trees in our EBMT paradigm.

