2002
DOI: 10.1207/s15516709cog2601_4

Learning words from sights and sounds: a computational model

Abstract: This paper presents an implemented computational model of word acquisition which learns directly from raw multimodal sensory input. Set in an information theoretic framework, the model acquires a lexicon by finding and statistically modeling consistent cross-modal structure. The model has been implemented in a system using novel speech processing, computer vision, and machine learning algorithms. In evaluations the model successfully performed speech segmentation, word discovery and visual categorization from …

citations
Cited by 358 publications
(345 citation statements)
references
References 38 publications
“…In [16], an incremental version of Support Vector Machines is used to acquire visual categories. In the context of human-robot interaction, some recent approaches also explore the combination of incremental learning and interaction with teachers to ground vocabulary about physical objects [10,5,13].…”
Section: State Of the Art (mentioning)
confidence: 99%
“…A number of approaches to language acquisition in the phonetic realm frame the problem as recognizing entire words in longer utterances before phones are acquired (ten Bosch et al 2008; Roy and Pentland 2002; Werker and Curtin 2005). Presumably, phones would then be derived in a subsequent step from the recognized words, although some authors raise the possibility that phones and segmentation into phones do not play any important role in language acquisition and that all learning happens on the level of words and utterances.…”
Section: Semi-supervised Approaches (mentioning)
confidence: 99%
“…Blackburn and Young 1996; Roy and Pentland 2002; Coen 2006). However, these systems are not primarily concerned with segmentation and correspondence learning.…”
Section: Semi-supervised Approaches (mentioning)
confidence: 99%
“…Behaviour-based approaches are "data-driven" because the result of the learning process is determined largely by low-level features in the environment and less by any pre-defined knowledge in an ontology. Recent work in concept formation involving symbol grounding includes [12,13].…”
Section: Agent Architectures To Support Grounding (mentioning)
confidence: 99%