Interspeech 2014
DOI: 10.21437/interspeech.2014-599

A deep neural network approach for sentence boundary detection in broadcast news

Abstract: This paper presents a deep neural network (DNN) approach to sentence boundary detection in broadcast news. We extract prosodic and lexical features at each inter-word position in the transcripts and learn a sequential classifier to label these positions as either boundary or non-boundary. This work is realized by a hybrid DNN-CRF (conditional random field) architecture. The DNN accepts prosodic feature inputs and non-linearly maps them into boundary/non-boundary posterior probability outputs. Subsequently, the…
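
The DNN stage described in the abstract (prosodic features in, boundary/non-boundary posteriors out) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and not taken from the paper: the three feature names, the layer sizes, the sigmoid hidden units, and the random untrained weights. The CRF stage is not shown because the abstract is truncated before describing it.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    # weights holds one row per output unit.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Random untrained weights (assumption: a real model would be trained).
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def boundary_posteriors(features, layers):
    # Non-linear hidden layers, then a 2-way softmax that yields
    # [P(non-boundary), P(boundary)] for one inter-word position.
    h = features
    for weights, biases in layers[:-1]:
        h = [sigmoid(z) for z in dense(h, weights, biases)]
    weights, biases = layers[-1]
    return softmax(dense(h, weights, biases))


rng = random.Random(0)
# Hypothetical sizes: 3 prosodic inputs, 4 hidden units, 2 output classes.
layers = [init_layer(3, 4, rng), init_layer(4, 2, rng)]

# Hypothetical features per inter-word position:
# [pause duration, pitch reset, final-syllable lengthening]
positions = [[0.05, 0.10, 0.90], [0.62, -1.30, 1.40]]
posteriors = [boundary_posteriors(x, layers) for x in positions]

for i, p in enumerate(posteriors):
    print(f"position {i}: P(non-boundary)={p[0]:.3f}, P(boundary)={p[1]:.3f}")
```

Because the weights are untrained, the printed probabilities carry no meaning; the sketch only shows the shape of the mapping from a feature vector to a two-class posterior at each inter-word position.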

Cited by 26 publications (7 citation statements)
References 18 publications (19 reference statements)
“…Early studies on the punctuation restoration task explored a wide range of features such as lexical, acoustic, prosodic, and their combination (Gravano et al, 2009; Levy et al, 2012; Xu et al, 2014; Che et al, 2016a; Szaszák and Tündik, 2019). Graphical model such as conditional random field has been widely used for this task (Lu and Ng, 2010; Zhang et al, 2013) before the emerging of neural network.…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…Cuendet et al (2006) makes use of a range of lexical and acoustic features. Xu et al (2014) used prosodic and lexical features to implement a DNN combined with a CRF classifier for broadcast news speech. Atterer et al (2008) used syntactic ground-truth information, in a rare incremental approach, to predict whether the current word on Switchboard is the end of the utterance (dialog act).…”
Section: Related Work (citation type: mentioning)
confidence: 99%
“…They rely on the transcript content to extract features like bag-of-word, POS tags or word embeddings [7,12,16,18,24,27,31]. Mixture of acoustic and lexical features have also been explored [1,13,14,33], which is advantageous when both audio signal and transcript are available.…”
Section: Sentence Boundary Evaluation (citation type: mentioning)
confidence: 99%