2010
DOI: 10.1527/tjsai.25.549

統計的モデル選択に基づいた連続音声からの語彙学習 (Lexical Learning from Continuous Speech Based on Statistical Model Selection)

Abstract: This paper proposes a method for the unsupervised learning of lexicons from pairs of a spoken utterance and an object as its meaning, under the condition that no a priori linguistic knowledge other than acoustic models of Japanese phonemes is used. The main problems are the word segmentation of spoken utterances and the learning of the phoneme sequences of the words. To obtain a lexicon, a statistical model, which represents the joint probability of an utterance and an object, is learned based on the …
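The idea sketched in the abstract (and restated by citation [20] below) is that a candidate segmentation of an utterance is scored by a joint probability that combines a phoneme acoustic term, a word-bigram term, and a word-meaning term, so that model selection can compare candidate lexicons. A minimal toy sketch, not the paper's implementation — all probability values and names here are hypothetical stand-ins:

```python
from math import log

def joint_log_prob(words, obj, acoustic_lp, bigram_p, meaning_p):
    """log P(utterance, object) under one segmentation into `words`.

    acoustic_lp: dict word -> log P(phonemes | word)  (toy stand-in)
    bigram_p:    dict (prev, word) -> P(word | prev)
    meaning_p:   dict (obj, word) -> P(obj | word)
    """
    lp = 0.0
    prev = "<s>"
    for w in words:
        lp += acoustic_lp.get(w, -10.0)            # acoustic model term
        lp += log(bigram_p.get((prev, w), 1e-4))   # word-bigram term
        prev = w
    # meaning term: object generated from the best-matching word
    lp += log(max(meaning_p.get((obj, w), 1e-4) for w in words))
    return lp

# Toy models for the phoneme string "aka" paired with a red object.
acoustic_lp = {"aka": -1.0, "a": -2.0, "ka": -2.0}
bigram_p = {("<s>", "aka"): 0.5, ("<s>", "a"): 0.3, ("a", "ka"): 0.2}
meaning_p = {("red-object", "aka"): 0.8}

# Compare two candidate segmentations of the same phoneme string:
whole = joint_log_prob(["aka"], "red-object", acoustic_lp, bigram_p, meaning_p)
split = joint_log_prob(["a", "ka"], "red-object", acoustic_lp, bigram_p, meaning_p)
print(whole > split)  # the unsegmented word wins under these toy numbers
```

In the paper the component models are learned from data and lexicons are compared by statistical model selection; here the numbers are fixed only to show how the three terms combine into one score.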

Cited by 6 publications (10 citation statements). References 10 publications. Citing publications appeared in 2010 and 2016.
“…In addition to grounding individual words, previous work has also investigated grounding phrases (referring expressions) to visual objects through semantic decomposition, for example using context free grammars that connect linguistic structures with underlying visual properties [6]. A recent work has applied statistical model selection to automatically acquire lexicons by combining a phoneme acoustic model, a word bigram model, and a word meaning model [20]. Different from above previous work, in our work, the visual perception is indicated by eye gaze.…”
Section: Related Work
confidence: 99%
“…Several studies on language acquisition by robots have assumed that robots have no prior lexical knowledge. These studies differ from speech recognition studies based on a large vocabulary and natural language processing studies based on lexical, syntactic, and semantic knowledge [1], [2]. Studies on Akira Taniguchi and Tadahiro Taniguchi are with Ritsumeikan University, 1-1-1 Noji Higashi, Kusatsu, Shiga 525-8577, Japan (email:a.taniguchi@em.ci.ritsumei.ac.jp; taniguchi@em.ci.ritsumei.ac.jp).…”
Section: Introduction
confidence: 99%
“…A parser, represented in the form of a lexicon and an inventory containing syntactic constructions, is then estimated based on co-occurrence frequencies, which are often utilized to establish mappings between form (typically words) and meaning (e.g., [7][8][9]). Alignments are computed both bottom-up by first determining structures of rather low complexity and top-down by including syntactic information; learning linguistic structures of rather low complexity from speech has been addressed previously, e.g., learning (novel) words (e.g., [10][8][11]) or semantically meaningful sequences, so-called acoustic morphemes (e.g., [12][13][14]). …”
Section: Introduction
confidence: 99%
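The excerpt above describes estimating form–meaning mappings from co-occurrence frequencies. As a rough illustration of that idea only — the data and function names are hypothetical, not taken from any of the cited works — each word form can be mapped to the meaning it most frequently co-occurs with across utterance/meaning pairs:

```python
from collections import Counter

def learn_mapping(pairs):
    """pairs: list of (words, meaning); returns word -> most frequent meaning."""
    counts = Counter()
    for words, meaning in pairs:
        for w in words:
            counts[(w, meaning)] += 1   # count word/meaning co-occurrences
    mapping = {}
    for (w, m), c in counts.items():
        # keep the meaning with the highest co-occurrence count per word
        if w not in mapping or c > counts[(w, mapping[w])]:
            mapping[w] = m
    return mapping

pairs = [
    (["red", "ball"], "BALL"),
    (["red", "cup"], "CUP"),
    (["blue", "ball"], "BALL"),
]
print(learn_mapping(pairs)["ball"])  # -> BALL
```

The cited works of course go further (syntactic constructions, bottom-up/top-down alignment); this sketch only shows the co-occurrence-counting core that the mapping step builds on.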