2010
DOI: 10.1109/tamd.2010.2051437

Spatio–Temporal Multimodal Developmental Learning

Abstract: It is elusive how the skull-enclosed brain enables spatio-temporal multimodal developmental learning. By multimodal, we mean that the system has at least two sensory modalities, e.g., visual and auditory in our experiments. By spatio-temporal, we mean that the behavior from the system depends not only on the spatial pattern in the current sensory inputs, but also on those of the recent past. Traditional machine learning requires humans to train every module using hand-transcribed data and handcrafted …
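
As the abstract notes, the system's behavior must depend on both the current spatial pattern and the spatial patterns of the recent past. One minimal way to illustrate that dependence is to buffer the last k frames of each modality and concatenate them into a single spatio-temporal feature vector; the sketch below does exactly that. The class name, window length, and dimensions are assumptions made for this example, not the paper's actual architecture.

```python
import numpy as np
from collections import deque

class SpatioTemporalContext:
    """Buffer the last k frames of two modalities (e.g., visual and
    auditory) and expose them as one concatenated feature vector.
    Illustrative sketch only, not the paper's architecture."""

    def __init__(self, k=3, visual_dim=64, audio_dim=16):
        # Pre-fill with zero frames so the vector has a fixed length
        # from the very first time step.
        self.visual = deque([np.zeros(visual_dim)] * k, maxlen=k)
        self.audio = deque([np.zeros(audio_dim)] * k, maxlen=k)

    def update(self, visual_frame, audio_frame):
        # Push the current spatial pattern of each modality; the oldest
        # frame falls out of the k-step window automatically.
        self.visual.append(np.asarray(visual_frame, dtype=float))
        self.audio.append(np.asarray(audio_frame, dtype=float))

    def vector(self):
        # A learner reading this vector responds to the current inputs
        # AND the recent past, in the sense the abstract describes.
        return np.concatenate([*self.visual, *self.audio])
```

Any downstream classifier or regressor trained on vector() then discriminates temporal context as well as the instantaneous spatial pattern.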

Cited by 11 publications (5 citation statements)
References 39 publications

“…Results of such investigations provide further support for the findings described in earlier sections, suggesting that spatially and/or temporally aligned information from multiple modalities (auditory, visual, tactile, and proprioceptive) guides our attention, influences the manner in which we explore and engage with our environment, and shapes our perceptual representations to facilitate the acquisition of higher-order skills including object and self-recognition, word learning, and social behaviors such as imitation and cooperation. In addition, this work reinforces the role of top-down processes such as executive control in multisensory function (e.g., Al-azzawi et al 2018, Arsenio & Fitzpatrick 2005, Metta & Fitzpatrick 2003, Torres-Jara et al 2005, Wang & Xin 2018, Zhang & Weng 2010). This type of work will undoubtedly yield new insights into the mechanisms by which multisensory experience bootstraps broader cognition and learning, and by which cognitive processes influence one's interaction and engagement with the environment to impact sensory experience.…”
Section: Discussion (supporting)
confidence: 73%

“…Each of these inputs is processed into a fixed-length vector, then lexical items arise by associations between vectors that represent the corresponding speech and an object's shape. Zhang and Weng (2003) process raw temporally uncoupled audio-visual data using a hierarchical tree clustering method, under touch guidance for classification by a self-organizing autonomous incremental learner (SAIL) robot (see also Zhang and Weng (2010)). Roy (2005) also highlights the importance of combining physical actions and speech in order to interpret words and basic speech acts in terms of schemas, which are grounded through a causal-predictive cycle of action and perception.…”
Section: Related Work (mentioning)
confidence: 99%
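
The excerpt above describes lexical items arising from associations between fixed-length vectors for co-occurring speech and object shape. A minimal sketch of one way such an association table could work, assuming simple nearest-neighbor recall over stored pairs (the class and method names are illustrative, not the cited systems' implementations):

```python
import numpy as np

class CrossModalAssociator:
    """Store temporally aligned (speech, shape) vector pairs and recall
    one modality from the other by nearest-neighbor lookup.
    Illustrative sketch only."""

    def __init__(self):
        self.speech_vecs, self.shape_vecs = [], []

    def associate(self, speech_vec, shape_vec):
        # Each co-occurring pair becomes one stored association,
        # the seed of a lexical item.
        self.speech_vecs.append(np.asarray(speech_vec, dtype=float))
        self.shape_vecs.append(np.asarray(shape_vec, dtype=float))

    def recall_shape(self, speech_vec):
        # Return the shape vector whose paired speech vector is
        # closest to the query.
        dists = [np.linalg.norm(s - speech_vec) for s in self.speech_vecs]
        return self.shape_vecs[int(np.argmin(dists))]
```
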
“…The hierarchical discriminant regression (HDR) by Hwang & Weng [190] and the incremental HDR (IHDR) by Weng & Hwang [191] use clusters in the high-dimensional output space as virtual labels to supervise clusters in ascending space. The HDR engine has been used for a variety of applications, from robot visual navigation [191], speech recognition [192], skill transfer [193], to visuo-auditory joint learning [194]. In these systems, the numerical output vector and the input vector were combined as an expanded input to the regressor.…”
Section: Motor Output Is Directly Used For Learning (mentioning)
confidence: 99%
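
To make the "virtual label" idea in this excerpt concrete: cluster the numerical output space, then use the cluster indices as class labels to supervise discrimination in the input space. The sketch below substitutes flat k-means plus logistic regression (via scikit-learn) for HDR's incremental hierarchical tree, so it illustrates only the supervision scheme, not the HDR/IHDR algorithm itself; the function names are invented for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def virtual_label_discriminant(X, Y, n_clusters=5):
    """Clusters in the output space Y act as 'virtual labels' that
    supervise a discriminant over the input space X. A flat stand-in
    for HDR's incremental tree, shown for illustration only."""
    virtual_labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(Y)
    clf = LogisticRegression(max_iter=1000).fit(X, virtual_labels)
    return clf, virtual_labels

def expanded_input(X, Y):
    # The "expanded input" variant from the excerpt: the numerical
    # output vector is combined with the input vector, and the joint
    # vector is fed to the regressor.
    return np.hstack([X, Y])
```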