Robust speaker turn role labeling of TV Broadcast News shows

Damnati, Géraldine; Charlet, Delphine

doi:10.1109/icassp.2011.5947650

Cited by 22 publications

(15 citation statements)

References 9 publications

“…In [1], we have proposed a multi-stage process for speaker turn role labeling. In this work, we pursue this features (basically word n-grams, with the possibility of adding numerical features such as elocution speed, etc.). The aim is to find the label among (polit, ¬polit) which gives the highest probability given the feature vector.…”

Section: Role Labeling Of Speaker Turns

mentioning

confidence: 99%

“…In previous work [1], we proposed a multi-view approach applied to TV Broadcast News (TVBN) shows, where three categories of speaker turns were distinguished (anchor speaker, reporters and other speakers). Beyond this 3-fold distinction, characterization of non-journalist speakers can be extended to special categories of people which have specific speaking style and lexical fields, such as politicians, sportsmen, lawyers… This work proposes to focus on one particular category, namely politician speakers.…”

Section: Introduction

mentioning

confidence: 99%

“…First, it is a rare events detection task: in the TVBN corpus described in details in [1], 8% of the non-journalist speaker turns correspond to politicians (representing only 3% of the total amount of speaker turns). What's more, politician speech is characterized by several dimensions. The lexical dimension is of course characteristic of political speech in general but politicians can also be characterized by a particular elocution mode.…”

Section: Introduction

mentioning

confidence: 99%

Detecting politician speech in TV broadcast news shows

Charlet

Damnati

2012

2012 10th International Workshop on Content-Based Multimedia Indexing (CBMI)

Self Cite

Politician speaker turn detection in TV Broadcast News shows is addressed in this paper. After a first role labeling pass of speaker turns among anchor, reporter and other, turns labeled as other are submitted to a politician speech detection process. The proposed approach combines acoustical and lexical cues as well as contextual information, and does not use any specific politician model (person-independent). Experiments on a set of 101 TV broadcast news shows show that the proposed approach, which relies on fully automatic processing, enables to detect politician speech with an equal error rate of 12.1%, which turns to a maximal F-measure of 70.3% due to the unbalanced distribution among politicians and non-politicians.

“…[1][2][3][4][5][6][7][8][9] Concerning the audio data, the automatic analysis of the audio signals can offer the users useful information. In the case of broadcast news, automatic processing is related to tasks such as sound recognition, 10,11 speaker recognition, 12 anchor detection, 13 role detection, [14][15][16] story boundary detection, 2,17,18 summary construction from anchor talking, 9,19 channel's quality detection, 20 sound event detection, 21,22 non-linguistic human-produced sounds detection, 5,6,[23][24][25] audio type segmentation in sport games, 4,26,27 highlight scene extraction from sports games, 3 violence scene detection, 28 music characteristics classification, 29,30 jingle detection, 1 commercial block detection, 8 voice activity detection, 31 language recognition, 32 emotion recognition 33 and speech recognition. 34 Sound recognition is the cornerstone of analysis as typically precedes the other stages.…”

Section: Introduction

mentioning

confidence: 99%

Data-Driven Audio Feature Space Clustering for Automatic Sound Recognition in Radio Broadcast News

Theodorou

Mporas

Lazaridis

et al.

2017

Int. J. Artif. Intell. Tools

Aiming at an automatic sound recognizer for radio broadcasting events, a methodology of clustering the audio feature space using the discrimination ability of the audio descriptors as a criterion is investigated in this work. From a given and closed set of audio events, commonly found in broadcast news transmissions, a large set of audio descriptors is extracted and their data-driven ranking of relevance is clustered, providing a more robust feature selection. The clusters of the feature space are fed to machine learning algorithms implemented as classification models during the experimental evaluation. This methodology showed that support vector machines provide significantly good results, considering the achieved accuracy, due to their ability to cope well with high-dimensional experimental conditions.

“…Typical roles considered in BN audio are formal roles (also referred as functional roles), i.e., roles imposed from the news format and related to the task each speaker performs in the show like anchorman, journalists, interviewees or soundbites. Common features used to train statistical classifiers consist of lexical features [1] as well as structural features from the recording, prosodic features and Dialog Acts [2,3,4]. More recently, automatic role labeling has also been studied in spontaneous conversations including Broadcast Conversations (BC) [3,5,6] as well as meeting recordings [7,8].…”

Section: Introduction

mentioning

confidence: 99%

Automatic speaker role labeling in AMI meetings: Recognition of formal and social roles

Sapru

Valente

2012

2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

This work aims at investigating the automatic recognition of speaker role in meeting conversations from the AMI corpus. Two types of roles are considered: formal roles, fixed over the meeting duration and recognized at recording level, and social roles related to the way participants interact between themselves, recognized at speaker turn level. Various structural, lexical and prosodic features as well as Dialog Act tags are exhaustively investigated and combined for this purpose. Results reveal an accuracy of 74% in recognizing the speakers formal roles and an accuracy of 66% (percentage of time) in correctly labeling the social roles. Feature analysis reveals that lexical features provide the higher performances in formal/functional role recognition while prosodic features provide the higher performances in social role recognition. Furthermore results reveal that social role recognition in case of rare roles in the corpus can be improved through the use of lexical and Dialog Act information combined over short time windows.
