2004 IEEE International Conference on Acoustics, Speech, and Signal Processing
DOI: 10.1109/icassp.2004.1327189

Dynamic Bayesian networks for meeting structuring

Abstract: This paper is about the automatic structuring of multiparty meetings using audio information. We have used a corpus of 53 meetings, recorded using a microphone array and lapel microphones for each participant. The task was to segment meetings into a sequence of meeting actions, or phases. We have adopted a statistical approach using dynamic Bayesian networks (DBNs). Two DBN architectures were investigated: a two-level hidden Markov model (HMM) in which the acoustic observations were concatenated; and a multist…

Cited by 48 publications (45 citation statements)
References 7 publications
“…Previously, we have outlined a meeting action recognition framework based on acoustic and lexical related features and a layered multistream dynamic Bayesian network model [19], [21]. This model combines the advantages of independent feature-stream processing together with a structured approach.…”
Section: B Group Action Recognition
Citation type: mentioning
confidence: 99%
“…The vector consists of all possible products of the six sound activity locations during a time window of three frames [19] where each vector highlights the turn taking interaction pattern around the time . Considering, for simplicity, a smaller turn taking matrix evaluated only on two frames the diagonal elements highlight whether a speaker active at time , is still speaking at time .…”
Section: B Speaker Turn Features
Citation type: mentioning
confidence: 99%
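
The turn-taking feature described in the statement above can be illustrated with a small sketch. This is not the cited authors' code: the function name `turn_taking_features`, the layout of the `activity` table, and the use of 0/1 activity values are assumptions made for illustration only. The sketch forms every product of per-location sound activity values taken across the frames of a window, which gives 6 × 6 × 6 = 216 values for six locations and three frames.

```python
from itertools import product


def turn_taking_features(activity, t, window=3):
    """Return all products of per-location activity over `window` frames ending at t.

    `activity[frame][location]` is a hypothetical table of sound activity
    values (for example 0/1 flags); this layout is an assumption, not
    something specified in the quoted text.
    """
    if t < window - 1:
        raise ValueError("not enough past frames for the requested window")
    # Frames ordered from the current one backwards: t, t-1, ..., t-window+1.
    frames = [activity[t - k] for k in range(window)]
    n_locations = len(frames[0])
    features = []
    # One feature per combination of locations, one location per frame.
    for locations in product(range(n_locations), repeat=window):
        value = 1.0
        for frame, loc in zip(frames, locations):
            value *= frame[loc]
        features.append(value)
    return features


# Two locations and a two-frame window give a 2 x 2 matrix, flattened row by row.
same_speaker = turn_taking_features([[1, 0], [1, 0]], t=1, window=2)
speaker_change = turn_taking_features([[1, 0], [0, 1]], t=1, window=2)
```

In the two-frame case the flattened positions 0 and 3 are the diagonal of the matrix. In `same_speaker` only position 0 is non-zero, because the same location is active in both frames. In `speaker_change` only the off-diagonal position 2 is non-zero, because activity moved from location 0 to location 1.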
“…Dielmann et al. [15] proposed two approaches for meeting structuring from audio-only features using multilevel Dynamic Bayesian Networks (DBNs). The first DBN decomposed the group activities as sequences of sub-actions with no explicit meaning.…”
Section: Turn-taking Patterns
Citation type: mentioning
confidence: 99%
“…Hierarchical Hidden Markov Models (HHMMs) and layered hidden Markov models have been used to model various phenomena that exhibit stochastic structures at several different levels in areas such as speech and text recognition, modeling of group actions in meetings and extracting context from video [12], [15]-[18]. Zhang et al. used a two-layer HMM to model individual and group actions during meetings in [16].…”
Section: Theoretical Background and Related Work
Citation type: mentioning
confidence: 99%