2009
DOI: 10.1109/tasl.2008.2011509

Cross-Modality Semantic Integration With Hypothesis Rescoring for Robust Interpretation of Multimodal User Interactions

Abstract: We develop a framework pertaining to automatic semantic interpretation of multimodal user interactions using speech and pen gestures. The two input modalities abstract the user's intended message differently into input events, e.g., key terms/phrases in speech or different types of gestures in the pen modality. The proposed framework begins by generating partial interpretations for each input event as a ranked list of hypothesized semantics. We devise a cross-modality semantic integration procedure to…

Cited by 2 publications (5 citation statements)
References 21 publications (27 reference statements)

Citation statements, ordered by relevance:
“…Speech recognition performance evaluated based on the top-scoring recognition hypotheses gave an overall character accuracy of around 78%. We have also developed a pen gesture recognizer based on a simple algorithm that proceeds through a sequential procedure of recognizing a point, a circle and a stroke [37]. This simple pen gesture recognition algorithm can generate N-best output hypotheses.…”
Section: Design and Collection of a Multimodal Corpus
confidence: 99%
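
The excerpt above describes a pen gesture recognizer that tests sequentially for a point, a circle and a stroke and emits N-best output hypotheses. Below is a minimal sketch of such a sequential classifier, assuming simple geometric features; the thresholds and scoring heuristics are illustrative assumptions, not rules or values from the cited paper.

```python
import math

# Hypothetical thresholds -- not taken from the cited paper.
MAX_POINT_EXTENT = 10.0   # max bounding-box diagonal (pixels) for a "point"
CLOSURE_RATIO = 0.2       # end-to-start gap / path length below which a trace looks closed

def classify_pen_gesture(points):
    """Score a pen trace (list of (x, y) tuples) as point / circle / stroke.

    Returns an N-best list of (label, score) pairs sorted by descending score,
    mirroring the sequential point -> circle -> stroke test described above.
    """
    xs, ys = zip(*points)
    extent = math.hypot(max(xs) - min(xs), max(ys) - min(ys))
    path_len = sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1), (x2, y2) in zip(points, points[1:])) or 1e-9
    closure = math.hypot(points[-1][0] - points[0][0],
                         points[-1][1] - points[0][1]) / path_len

    # Crude confidence scores in [0, 1]; a real recognizer would be trained.
    point_score = max(0.0, 1.0 - extent / MAX_POINT_EXTENT)
    circle_score = max(0.0, 1.0 - closure / CLOSURE_RATIO) * min(1.0, extent / MAX_POINT_EXTENT)
    stroke_score = max(0.0, 1.0 - point_score - circle_score)

    hypotheses = [("point", point_score), ("circle", circle_score), ("stroke", stroke_score)]
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)

if __name__ == "__main__":
    # A short, nearly stationary trace should rank "point" first.
    print(classify_pen_gesture([(0, 0), (1, 1), (2, 1)]))
```
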
“…This is used to indicate that there are too few or too many pen gestures aligned with one SLR or vice versa. The detailed explanation of this process of Viterbi alignment is provided in [37]. An illustrative example is shown in Fig.…”
Section: Cross-Modal Alignment
confidence: 99%
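
The alignment mentioned above can be sketched as a monotonic dynamic program (a Viterbi-style search) in which each spoken locative reference (SLR) absorbs a variable number of pen gestures, and a penalty is charged when too few or too many gestures attach to one SLR. The expected-count inputs and unit penalty below are assumptions for illustration, not the cost model of the cited paper.

```python
def align_slrs_to_gestures(slr_expected_counts, num_gestures, unit_penalty=1.0):
    """Monotonic DP alignment of SLRs (in spoken order) to pen gestures (in temporal order).

    slr_expected_counts: expected number of gestures per SLR, e.g. 1 for a singular
                         reference, 2+ for a plural one (hypothetical convention).
    Returns (total_penalty, gestures_assigned_per_slr).
    """
    n = len(slr_expected_counts)
    INF = float("inf")
    # dp[i][j]: minimal penalty for aligning the first i SLRs to the first j gestures.
    dp = [[INF] * (num_gestures + 1) for _ in range(n + 1)]
    choice = [[0] * (num_gestures + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        expected = slr_expected_counts[i - 1]
        for j in range(num_gestures + 1):
            for k in range(j + 1):          # SLR i absorbs k gestures
                cand = dp[i - 1][j - k] + unit_penalty * abs(k - expected)
                if cand < dp[i][j]:
                    dp[i][j] = cand
                    choice[i][j] = k
    # Backtrack the best assignment.
    assignment, j = [], num_gestures
    for i in range(n, 0, -1):
        k = choice[i][j]
        assignment.append(k)
        j -= k
    assignment.reverse()
    return dp[n][num_gestures], assignment

if __name__ == "__main__":
    # Two SLRs (singular, then plural expecting 2 gestures) against 3 observed gestures.
    print(align_slrs_to_gestures([1, 2], 3))  # -> (0.0, [1, 2])
```
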
“…An SLR can be a direct reference (a full name, an abbreviated name or a contextual phrase such as "my current location") or an indirect one [11]. It may also be a singular, aggregated or plural reference, or unspecified in number: a singular reference can be a direct reference with a full name or an abbreviated name.…”
Section: Spoken Locative References
confidence: 99%
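
A minimal data-structure sketch of the SLR taxonomy quoted above (direct vs. indirect; singular, aggregated, plural or unspecified number); the class and field names are illustrative, not identifiers from the cited work.

```python
from dataclasses import dataclass
from enum import Enum

class Directness(Enum):
    DIRECT = "direct"       # full name, abbreviated name, or contextual phrase
    INDIRECT = "indirect"

class Number(Enum):
    SINGULAR = "singular"
    AGGREGATED = "aggregated"
    PLURAL = "plural"
    UNSPECIFIED = "unspecified"

@dataclass
class SpokenLocativeReference:
    surface_form: str       # e.g., "my current location"
    directness: Directness
    number: Number

# Example: a direct, singular reference expressed as a contextual phrase.
slr = SpokenLocativeReference("my current location", Directness.DIRECT, Number.SINGULAR)
print(slr)
```
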
“…Previously, we have applied Belief Networks [12,13] for task goal inference based on unimodal (speech-only) inputs. However, previous studies [11,19] that compare the spoken part of multimodal inputs with unimodal (speech-only) inputs show that the former generally has simpler syntactic structures, more diverse vocabularies and different term ordering. Therefore, we explore the use of latent semantic modeling (LSM) for task goal inference, with the objective of uncovering the associations between (unimodal or multimodal) terms and task goals through a data-derived latent space.…”
Section: Introduction
confidence: 96%
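
One way to read the LSM idea above is as latent semantic analysis over a term-by-task-goal co-occurrence matrix: a truncated SVD yields a latent space in which a new bag of terms can be scored against each task goal. The toy vocabulary, goals and counts below are invented for illustration, and the folding-in convention is just one common choice, not necessarily the cited paper's.

```python
import numpy as np

terms = ["restaurant", "near", "route", "from", "to", "show"]
goals = ["FIND_NEARBY", "PLAN_ROUTE"]
# Invented co-occurrence counts of each term with each task goal (rows = terms, cols = goals).
W = np.array([[5, 0],
              [4, 1],
              [0, 6],
              [0, 4],
              [1, 5],
              [2, 2]], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
goal_vecs = (np.diag(s_k) @ Vt_k).T           # each task goal as a point in the latent space

def infer_goal(query_terms):
    """Fold a bag of terms into the latent space and pick the closest task goal."""
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    q_latent = q @ U_k                         # project the query onto the k latent axes
    sims = [float(q_latent @ g / (np.linalg.norm(q_latent) * np.linalg.norm(g) + 1e-12))
            for g in goal_vecs]
    return goals[int(np.argmax(sims))], sims

print(infer_goal({"route", "from", "to"}))     # expected to favour PLAN_ROUTE
```
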