Proceedings of the 30th ACM International Conference on Multimedia 2022
DOI: 10.1145/3503161.3548379
Improving Meeting Inclusiveness using Speech Interruption Analysis

Abstract: Meetings are a pervasive method of communication within all types of companies and organizations, and using remote collaboration systems to conduct meetings has increased dramatically since the COVID-19 pandemic. However, not all meetings are inclusive, especially in terms of the participation rates among attendees. In a recent large-scale survey conducted at Microsoft, the top suggestion given by meeting participants for improving inclusiveness is to improve the ability of remote participants to interrupt and…

Cited by 4 publications (3 citation statements) · References 40 publications
“…This resulted in six speech features overall. We approximated interruptions as defined by Fu et al. [61] by checking two conditions: (a) the interrupter starts speaking before the interrupted has finished, and (b) the interrupter stops speaking after the interrupted. In addition, three emotional features, i.e., valence, arousal, and dominance, were derived from our speech audio signals using the wav2vec 2.0 model provided by Wagner et al. [62] on their GitHub repository (https://github.com/audeering/w2v2-how-to). Using such a deep learning model for feature extraction from raw speech data has been shown to be beneficial, as it is not constrained by existing knowledge of speech and emotion [63].…”
Section: Audio Feature Calculation
confidence: 99%
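The two-condition heuristic quoted in this statement reduces to a simple overlap check on per-speaker speech segments. The sketch below is a minimal illustration, not code from Fu et al. or the citing work; the `Segment` and `find_interruptions` names are hypothetical, and the requirement that the interrupter start after the interrupted began is an added assumption used to orient each pair.

```python
# Minimal sketch of the quoted two-condition interruption heuristic.
# Hypothetical helper names; not the implementation of Fu et al. [61].
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Segment:
    speaker: str
    start: float  # seconds
    end: float    # seconds

def find_interruptions(segments: List[Segment]) -> List[Tuple[Segment, Segment]]:
    """Return (interrupter, interrupted) pairs satisfying both conditions."""
    pairs = []
    for interrupted in segments:
        for interrupter in segments:
            if interrupter.speaker == interrupted.speaker:
                continue
            # (a) the interrupter starts speaking before the interrupted has
            #     finished (and after the interrupted started -- an added
            #     assumption, used only to orient who interrupted whom)
            starts_inside = interrupted.start < interrupter.start < interrupted.end
            # (b) the interrupter stops speaking after the interrupted
            ends_after = interrupter.end > interrupted.end
            if starts_inside and ends_after:
                pairs.append((interrupter, interrupted))
    return pairs

# Example: B starts at 3.5 s while A is still talking and keeps speaking past A.
if __name__ == "__main__":
    segments = [Segment("A", 0.0, 5.0), Segment("B", 3.5, 8.0)]
    for interrupter, interrupted in find_interruptions(segments):
        print(f"{interrupter.speaker} interrupted {interrupted.speaker}")
```

On this example, B's segment starts inside A's and ends after it, so B is flagged as having interrupted A.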
“…Our definition of interruptions given in Section 3.3.3 could be considered simplistic, whereas Fu et al. [61] also incorporated multi-modal features, including video data, to decide "who attained the floor".…”
Section: Limitations
confidence: 99%
“…These three values are calculated in an absolute and relative manner, resulting in six speech features overall. We approximate interruptions as defined by Fu et al. [47] by checking two conditions: (a) the interrupter starts speaking before the interrupted has finished, and (b) the interrupter speaks longer than the interrupted. In addition, three emotional features, i.e., valence, arousal, and dominance, are derived from our speech audio signals using the wav2vec 2.0 model provided by Wagner et al. [48] on their GitHub repository (https://github.com/audeering/w2v2-how-to).…”
Section: Audio Feature Calculation
confidence: 99%
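Both citing works derive valence, arousal, and dominance from the publicly released wav2vec 2.0 model of Wagner et al. (the linked w2v2-how-to repository). The sketch below is patterned on that repository's usage example from memory: the Hugging Face model name, the regression-head layout, and the output order are assumptions to be checked against the repository, not verified code.

```python
# Hedged sketch of extracting valence/arousal/dominance with the wav2vec 2.0
# model from Wagner et al. (https://github.com/audeering/w2v2-how-to).
# The regression head below follows that repository's README from memory and
# may differ in detail from the published code.
import numpy as np
import torch
import torch.nn as nn
from transformers import Wav2Vec2Processor
from transformers.models.wav2vec2.modeling_wav2vec2 import (
    Wav2Vec2Model,
    Wav2Vec2PreTrainedModel,
)

class RegressionHead(nn.Module):
    """Small feed-forward head producing the three emotion dimensions."""
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.dropout = nn.Dropout(config.final_dropout)
        self.out_proj = nn.Linear(config.hidden_size, config.num_labels)

    def forward(self, features):
        x = self.dropout(features)
        x = torch.tanh(self.dense(x))
        x = self.dropout(x)
        return self.out_proj(x)

class EmotionModel(Wav2Vec2PreTrainedModel):
    """wav2vec 2.0 encoder plus regression head, mean-pooled over time."""
    def __init__(self, config):
        super().__init__(config)
        self.wav2vec2 = Wav2Vec2Model(config)
        self.classifier = RegressionHead(config)
        self.init_weights()

    def forward(self, input_values):
        hidden_states = self.wav2vec2(input_values)[0]
        pooled = torch.mean(hidden_states, dim=1)
        return self.classifier(pooled)

# Assumed checkpoint name; confirm against the w2v2-how-to repository.
model_name = "audeering/wav2vec2-large-robust-12-ft-emotion-msp-dim"
processor = Wav2Vec2Processor.from_pretrained(model_name)
model = EmotionModel.from_pretrained(model_name)

signal = np.zeros(16000, dtype=np.float32)  # 1 s of 16 kHz audio as a stand-in
inputs = processor(signal, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    scores = model(inputs.input_values)
# Three continuous scores, roughly in [0, 1]; consult the repository for the
# exact output order (documented there as arousal, dominance, valence).
print(scores)
```

In practice the 16 kHz meeting audio (or per-speaker segments) would replace the silent stand-in signal, and the per-segment scores would be aggregated into the meeting-level emotional features described in the quoted statements.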