Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - 2003
DOI: 10.3115/1119176.1119208

Named entity recognition using a character-based probabilistic approach

Abstract: We present a named entity recognition and classification system that uses only probabilistic character-level features. Classifications by multiple orthographic tries are combined in a hidden Markov model framework to incorporate both internal and contextual evidence. As part of the system, we perform a preprocessing stage in which capitalisation is restored to sentence-initial and all-caps words with high accuracy. We report f-values of 86.65 and 79.78 for English, and 50.62 and 54.43 for the German datasets.
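As a rough illustration of the "orthographic trie" idea mentioned in the abstract, the following minimal sketch builds a single prefix trie over characters and stores class counts at every node. It is not the authors' implementation. The class names (`TrieNode`, `OrthographicTrie`), the labels, and the choice to back off to the deepest matched node without smoothing are all assumptions made for this example. Combining several tries inside a hidden Markov model is not shown.

```python
from collections import Counter


class TrieNode:
    """One character position in the trie (hypothetical helper class)."""

    def __init__(self):
        self.children = {}       # next character -> TrieNode
        self.counts = Counter()  # class label -> training words passing through


class OrthographicTrie:
    """Prefix trie over characters; each node counts the classes seen below it."""

    def __init__(self):
        self.root = TrieNode()

    def add(self, word, label):
        # Count the label at the root and at every prefix of the word.
        node = self.root
        node.counts[label] += 1
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
            node.counts[label] += 1

    def class_probs(self, word):
        # Follow the word as far as the trie allows, then use the relative
        # class frequencies at the deepest node reached (assumed back-off).
        node = self.root
        for ch in word:
            nxt = node.children.get(ch)
            if nxt is None:
                break
            node = nxt
        total = sum(node.counts.values())
        if total == 0:
            return {}
        return {label: n / total for label, n in node.counts.items()}
```

For example, after adding "London" and "Londonderry" as `LOC` and "Long" as `PER`, `class_probs("Londres")` stops at the prefix "Lond" and returns `{'LOC': 1.0}`, so an unseen word still receives a classification from its longest known prefix.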

Citations: cited by 11 publications (4 citation statements)
References: 5 publications
“…Some years later, many researchers incorporated machine learning algorithms to their systems, but there was still a strong dependency on external resources and domain-specific features and rules (Tjong Kim Sang and De Meulder, 2003). In addition, the majority of the systems used Maximum Entropy (Bender et al., 2003; Chieu and Ng, 2003b; Curran and Clark, 2003; Florian et al., 2003b; Klein et al., 2003) and Hidden Markov Models (Florian et al., 2003b; Klein et al., 2003; Mayfield et al., 2003; Whitelaw and Patrick, 2003). Furthermore, McCallum and Li (2003) used a CRF combined with web-augmented lexicons.…”
Section: Related Work (mentioning)
confidence: 99%
“…Modeling at the orthographic level has been shown to be a successful method of named entity recognition. Orthographic Tries (Cucerzan and Yarowsky, 1999; Whitelaw and Patrick, 2003) and character n-gram modelling are two methods for capturing orthographic features. While Tries give a rich representation of a word, they are fixed to one boundary of a word and cannot extend beyond unseen character sequences.…”
Section: Character N-gram Modelling (mentioning)
confidence: 99%
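The contrast drawn in the statement above can be made concrete with a small sketch of character n-gram extraction. Unlike a trie, which is anchored at one word boundary, n-grams are taken from every position in the word. The function name `char_ngrams`, the default `n=3`, and the `#` boundary marker are assumptions made for this illustration and are not taken from any of the cited papers.

```python
def char_ngrams(word, n=3, pad="#"):
    """Return all character n-grams of a word, with boundary markers added.

    The pad symbol marks the start and end of the word so that prefixes
    and suffixes remain distinguishable from word-internal sequences.
    """
    padded = pad + word + pad
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]
```

Under these assumptions, `char_ngrams("Bonn")` yields `['#Bo', 'Bon', 'onn', 'nn#']`, so evidence from the middle and end of a word is available even when its beginning has never been seen.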
“…To the best knowledge of the authors, the only other attempt to use computational inference methods for this task is Whitelaw and Patrick (2003). Here we assumed all words in the training and raw data sets that were not sentence initial, did not occur in a title sentence, and did not immediately follow punctuation were in the correct case.…”
Section: Normalising Case Information (mentioning)
confidence: 99%
“…Chieu and Ng [9] successfully used local features, which are near the word, and global features, which are in the whole document together. Klein et al. [14] and Whitelaw et al. [15] report that character-based features are useful for recognizing some special structure for the name entity.…”
Section: Name Entity Recognition Paraphrases Acquisition and Heuris (mentioning)
confidence: 99%