Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Lang 2003
DOI: 10.3115/1073445.1073474

Comma restoration using constituency information

Abstract: Automatic restoration of punctuation from unpunctuated text has application in improving the fluency and applicability of speech recognition systems. We explore the possibility that syntactic information can be used to improve the performance of an HMM-based system for restoring punctuation (specifically, commas) in text. Our best methods reduce sentence error rate substantially, by some 20%, with an additional 8% reduction possible given improvements in extraction of the requisite syntactic information.

Citations: cited by 8 publications (3 citation statements)
References: 10 publications (12 reference statements)
“…Punctuation placement is determined by a variety of features; considering all possible interactions of these features is hard. We believe that corpus-based algorithms for automatic restoration of punctuation developed for speech recognition applications (Beeferman, Berger, and Lafferty 1998; Shieber and Tao 2003) could help in our task, and we plan to experiment with them in the future.…”
mentioning
confidence: 99%
“…The sentence boundary detection problem is deeply connected to the punctuation recovery problem, especially when predicting punctuation like full stops, question marks, and exclamation marks (Shieber and Tao, 2003), which corresponds to sentence boundaries. These tasks provide a basis for further Natural Language Processing (NLP) tasks, and its impact on subsequent tasks has been analyzed in many speech processing studies (Mrozinsk et al, 2006; Ostendorf et al, 2008).…”
Section: Literature Review
mentioning
confidence: 99%
“…Numerous other strategies have also been devised: combining n-grams with constituency parse information (Shieber and Tao, 2003); maximum entropy using n-gram and part-of-speech features (Huang and Zweig, 2002); conditional random fields (CRFs) (Ueffing et al, 2013); feed-forward neural networks and CRFs on n-gram and lexical features; even reframing the problem as monolingual machine translation (Peitz et al, 2011).…”
Section: Related Work
mentioning
confidence: 99%