2012 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI) 2012
DOI: 10.1109/mfi.2012.6343045

An architecture for incremental information fusion of cross-modal representations

Abstract: We present an architecture for natural language processing that parses an input sentence incrementally and merges information about its structure with a representation of visual input, thereby changing the results of parsing. At each step of incremental processing, each element in the context representation is judged on whether it matches the content of the sentence fragment up to that step. The information contained in the best-matching subset then influences the result of parsing the subsentence. As processing…
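
The matching step described in the abstract can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the names `best_matching_subset` and `incremental_matches`, the representation of context elements as sets of labels, and the word-overlap score are hypothetical and are not taken from the paper. The sketch covers only the selection of the best-matching context elements for each sentence prefix, not how that selection feeds back into the parser.

```python
from typing import Dict, List, Set


def best_matching_subset(prefix: List[str], context: Dict[str, Set[str]]) -> List[str]:
    """Return names of the context elements that best match the sentence prefix.

    Scoring by word overlap is an illustrative assumption, not the paper's method.
    """
    words = set(prefix)
    scores = {name: len(words & labels) for name, labels in context.items()}
    best = max(scores.values(), default=0)
    if best == 0:
        return []  # nothing in the context matches the fragment yet
    return sorted(name for name, score in scores.items() if score == best)


def incremental_matches(sentence: List[str], context: Dict[str, Set[str]]) -> List[List[str]]:
    """Re-evaluate the context after each new word, giving one result per prefix."""
    return [best_matching_subset(sentence[: i + 1], context) for i in range(len(sentence))]


# Hypothetical visual context: two elements, each described by a set of labels.
context = {
    "obj1": {"man", "sees"},
    "obj2": {"woman", "telescope"},
}
steps = incremental_matches(["the", "man", "sees", "the", "woman"], context)
```

Here `steps` holds one selection per prefix, so a downstream parser could, in principle, consult the selection for the current prefix before committing to an analysis.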

citations
Cited by 4 publications
(1 citation statement)
references
References 4 publications
“…Another challenge in speech emotion classification is the fusion of the multiple features. A number of previous researches [14,15,16,17,18,19,20] have been reported which focused on major fusion strategies. While most of the above mentioned fusion methods yielded good performance, they almost simply concatenated the multiple features into a single high-dimensional feature vector and fed it into a final classifier or a shallow fusion model which has difficulty in joining learning intrinsic correlations between different acoustic feature representations.…”
Section: Introduction (mentioning)
confidence: 99%