Learning multilingual named entity recognition from Wikipedia

Nothman, Joel; Ringland, Nicky; Radford, Will; Murphy, Tara; Curran, James

doi:10.1016/j.artint.2012.03.006

Cited by 274 publications

(211 citation statements)

References 37 publications

Supporting

Mentioning

187

Contrasting

Unclassified

Order By: Relevance

“…They are commonly used to train statistical machine learners but are limited in scope due to the cost of manual annotation. This is a problem because others have shown that more training data leads to higher accuracy language models [1], [2].…”

Section: Introductionmentioning

confidence: 99%

Named Entity Recognizer for Filipino Text Using Conditional Random Field

Patrícia¹,

Alfonso²,

Domingo³

et al. 2013

IJFCC

View full text Add to dashboard Cite

Abstract-The study for a Named Entity Recognizer for Filipino Text Using Conditional Random Field (NERF-CRF) focused creating a system which identifies and classifies named entities present in a given corpus. The named entities were classified into four, namely: person, place, date and org. Named entities that are identified but do not fall in the four classifications are tagged as etc.Different modules were created to achieve the study's purpose, including a tokenizer and a part-of-speech tagger. The conditional random field approach was used in the classification of identified named entities. Filipino biographies were the corpus used in testing the system. The results, based on solving for the F-measure, indicate that the system is 83% accurate, and best in identifying named entity Date with 0% error rate but is unsatisfactory in distinguishing named entity place and org, with 42% and 33% error rates respectively.

show abstract

Section: Introductionmentioning

confidence: 99%

Named Entity Recognizer for Filipino Text Using Conditional Random Field

Patrícia¹,

Alfonso²,

Domingo³

et al. 2013

IJFCC

View full text Add to dashboard Cite

show abstract

“…With the development of multilingual Wikipedia, researchers have been employing it in many multilingual applications [3,16,17,20,23,24]. Similar to the English-only contexts, each dimension in a multilingual context representation vector represented the relatedness of the target entity with a set of entities/words in the corresponding language.…”

Section: Related Workmentioning

confidence: 99%

What’s New? Analysing Language-Specific Wikipedia Entity Contexts to Support Entity-Centric News Retrieval

Zhou

Demidova

Cristea

2017

Transactions on Computational Collective Intelligence XXVI

View full text Add to dashboard Cite

“…Richman and Schone utilized the multilingual characteristics of Wikipedia to annotate a large corpus of text with NER tags [14]. Similarly, Nothman et al [15] automatically created multilingual training annotations for NER by exploiting the text and structure of parallel Wikipedia articles in different languages.…”

Section: Named Entity Extractionmentioning

confidence: 99%

A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation

Habib

Keulen

2013

Language Processing and Intelligent Information Systems

View full text Add to dashboard Cite

Abstract. Toponym extraction and disambiguation are key topics recently addressed by fields of Information Extraction and Geographical Information Retrieval. Toponym extraction and disambiguation are highly dependent processes. Not only toponym extraction effectiveness affects disambiguation, but also disambiguation results may help improving extraction accuracy. In this paper we propose a hybrid toponym extraction approach based on Hidden Markov Models (HMM) and Support Vector Machines (SVM). Hidden Markov Model is used for extraction with high recall and low precision. Then SVM is used to find false positives based on informativeness features and coherence features derived from the disambiguation results. Experimental results conducted with a set of descriptions of holiday homes with the aim to extract and disambiguate toponyms showed that the proposed approach outperform the state of the art methods of extraction and also proved to be robust. Robustness is proved on three aspects: language independence, high and low HMM threshold settings, and limited training data.

show abstract

Learning multilingual named entity recognition from Wikipedia

Cited by 274 publications

References 37 publications

Named Entity Recognizer for Filipino Text Using Conditional Random Field

Named Entity Recognizer for Filipino Text Using Conditional Random Field

What’s New? Analysing Language-Specific Wikipedia Entity Contexts to Support Entity-Centric News Retrieval

A Hybrid Approach for Robust Multilingual Toponym Extraction and Disambiguation

Contact Info

Product

Resources

About