2009
DOI: 10.1109/tasl.2008.2011509

Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions

Abstract: We develop a framework pertaining to automatic semantic interpretation of multimodal user interactions using speech and pen gestures. The two input modalities abstract the user's intended message differently into input events, e.g., key terms/phrases in speech or different types of gestures in the pen modality. The proposed framework begins by generating partial interpretations for each input event as a ranked list of hypothesized semantics. We devise a cross-modality semantic integration procedure to…

Cited by 2 publications (5 citation statements)
References 21 publications (27 reference statements)

Citation statements, ordered by relevance:
“…Speech recognition performance evaluated based on the top-scoring recognition hypotheses gave an overall character accuracy of around 78%. We have also developed a pen gesture recognizer based on a simple algorithm that proceeds through a sequential procedure of recognizing a point, a circle and a stroke [37]. This simple pen gesture recognition algorithm can generate N-best output hypotheses.…”
Section: Design and Collection of a Multimodal Corpus
confidence: 99%
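
The excerpt above describes a pen gesture recognizer that tests sequentially for a point, a circle and a stroke and emits N-best output hypotheses. Below is a minimal sketch of such a sequential classifier, assuming simple geometric features; the thresholds and scoring heuristics are illustrative assumptions, not rules or values from the cited paper.

```python
import math

# Hypothetical thresholds -- not taken from the cited paper.
MAX_POINT_EXTENT = 10.0   # max bounding-box diagonal (pixels) for a "point"
CLOSURE_RATIO = 0.2       # end-to-start gap / path length below which a trace looks closed

def classify_pen_gesture(points):
    """Score a pen trace (list of (x, y) tuples) as point / circle / stroke.

    Returns an N-best list of (label, score) pairs sorted by descending score,
    mirroring the sequential point -> circle -> stroke test described above.
    """
    xs, ys = zip(*points)
    extent = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    path_len = sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(points, points[1:])) or 1e-9
    closure = math.hypot(points[-1][0] - points[0][0],
                         points[-1][1] - points[0][1]) / path_len

    # Crude confidence scores in [0, 1]; a real recognizer would be trained.
    point_score = max(0.0, 1.0 - extent / MAX_POINT_EXTENT)
    circle_score = max(0.0, 1.0 - closure / CLOSURE_RATIO) * min(1.0, extent / MAX_POINT_EXTENT)
    stroke_score = max(0.0, 1.0 - point_score - circle_score)

    hypotheses = [("point", point_score), ("circle", circle_score), ("stroke", stroke_score)]
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)

if __name__ == "__main__":
    # A short, nearly stationary trace should rank "point" first.
    print(classify_pen_gesture([(0, 0), (1, 1), (2, 1)]))
```
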
“…This is used to indicate that there are too few or too many pen gestures aligned with one SLR or vice versa. The detailed explanation of this process of Viterbi alignment is provided in [37]. An illustrative example is shown in Fig.…”
Section: Cross-Modal Alignment
confidence: 99%
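
The alignment mentioned above can be sketched as a monotonic dynamic program (a Viterbi-style search) in which each spoken locative reference (SLR) absorbs a variable number of pen gestures, and a penalty is charged when too few or too many gestures attach to one SLR. The expected-count inputs and unit penalty below are assumptions for illustration, not the cost model of the cited paper.

```python
def align_slrs_to_gestures(slr_expected_counts, num_gestures, unit_penalty=1.0):
    """Monotonic DP alignment of SLRs (in spoken order) to pen gestures (in temporal order).

    slr_expected_counts: expected number of gestures per SLR, e.g. 1 for a singular
                         reference, 2+ for a plural one (hypothetical convention).
    Returns (total_penalty, gestures_assigned_per_slr).
    """
    n = len(slr_expected_counts)
    INF = float("inf")
    # dp[i][j]: minimal penalty for aligning the first i SLRs to the first j gestures.
    dp = [[INF] * (num_gestures + 1) for _ in range(n + 1)]
    choice = [[0] * (num_gestures + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        expected = slr_expected_counts[i - 1]
        for j in range(num_gestures + 1):
            for k in range(j + 1):          # SLR i absorbs k gestures
                cand = dp[i - 1][j - k] + unit_penalty * abs(k - expected)
                if cand < dp[i][j]:
                    dp[i][j] = cand
                    choice[i][j] = k
    # Backtrack the best assignment.
    assignment, j = [], num_gestures
    for i in range(n, 0, -1):
        k = choice[i][j]
        assignment.append(k)
        j -= k
    assignment.reverse()
    return dp[n][num_gestures], assignment

if __name__ == "__main__":
    # Two SLRs (singular, then plural expecting 2 gestures) against 3 observed gestures.
    print(align_slrs_to_gestures([1, 2], 3))  # -> (0.0, [1, 2])
```
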
“…An SLR can be a direct reference (a full name, an abbreviated name or a contextual phrase such as "my current location") or an indirect one [11]. It may also be a singular, aggregated or plural reference, or unspecified in number: a singular reference can be a direct reference with a full name or an abbreviated name.…”
Section: Spoken Locative References
confidence: 99%
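
A minimal data-structure sketch of the SLR taxonomy quoted above (direct vs. indirect; singular, aggregated, plural or unspecified number); the class and field names are illustrative, not identifiers from the cited work.

```python
from dataclasses import dataclass
from enum import Enum

class Directness(Enum):
    DIRECT = "direct"       # full name, abbreviated name, or contextual phrase
    INDIRECT = "indirect"

class Number(Enum):
    SINGULAR = "singular"
    AGGREGATED = "aggregated"
    PLURAL = "plural"
    UNSPECIFIED = "unspecified"

@dataclass
class SpokenLocativeReference:
    surface_form: str       # e.g., "my current location"
    directness: Directness
    number: Number

# Example: a direct, singular reference expressed as a contextual phrase.
slr = SpokenLocativeReference("my current location", Directness.DIRECT, Number.SINGULAR)
print(slr)
```
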
“…Previously, we have applied Belief Networks [12,13] for task goal inference based on unimodal (speech-only) inputs. However, previous studies [11,19] that compare the spoken part of multimodal inputs with unimodal (speech-only) inputs show that the former generally has simpler syntactic structures, more diverse vocabularies and different term ordering. Therefore, we explore the use of latent semantic modeling (LSM) for task goal inference, with the objective of uncovering the associations between (unimodal or multimodal) terms and task goals through a data-derived latent space.…”
Section: Introduction
confidence: 96%
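
One way to read the LSM idea above is as latent semantic analysis over a term-by-task-goal co-occurrence matrix: a truncated SVD yields a latent space in which a new bag of terms can be scored against each task goal. The toy vocabulary, goals and counts below are invented for illustration, and the folding-in convention is just one common choice, not necessarily the cited paper's.

```python
import numpy as np

terms = ["restaurant", "near", "route", "from", "to", "show"]
goals = ["FIND_NEARBY", "PLAN_ROUTE"]
# Invented co-occurrence counts of each term with each task goal (rows = terms, cols = goals).
W = np.array([[5, 0],
              [4, 1],
              [0, 6],
              [0, 4],
              [1, 5],
              [2, 2]], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
goal_vecs = (np.diag(s_k) @ Vt_k).T           # each task goal as a point in the latent space

def infer_goal(query_terms):
    """Fold a bag of terms into the latent space and pick the closest task goal."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    q_latent = q @ U_k                         # project the query onto the k latent axes
    sims = [float(q_latent @ g / (np.linalg.norm(q_latent) * np.linalg.norm(g) + 1e-12))
            for g in goal_vecs]
    return goals[int(np.argmax(sims))], sims

print(infer_goal({"route", "from", "to"}))     # expected to favour PLAN_ROUTE
```
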