A Comparison of Personal Name Matching: Techniques and Practical Issues

Christen, Peter

doi:10.1109/icdmw.2006.2

Cited by 228 publications

(195 citation statements)

References 23 publications

Mentioning: 181

“…To our knowledge, it is the only available parallel corpus Arabic-English. Since this corpus is small, we decided to test on a parallel newspaper corpus which contains 11942 sentences extracted from ANN 4 , referred in the following as ‫ܥ‬ .…”

Section: Results (mentioning)

confidence: 99%


How to Match Bilingual Tweets?

Karima

Smaïli

2017

Computer Science & Information Technology (CS & IT)



“…Name matching can be defined such as the process of determining, whether two name strings are instances of the same name [4]. This task is not difficult, if the two languages use the same alphabet.…”

Section: Proper Names In Arabic (mentioning)

confidence: 99%

How to Match Bilingual Tweets?

Karima

Smaïli

2017

Computer Science & Information Technology (CS & IT)


“…Compute the phonetic codification for each word from each transcription using a given codification algorithm. A general description and comparison of the codification algorithms used in our experiments can be found in [17], for further details we refer to [3,4,5,6,7].…”

Section: Constructing the Combined Representation (mentioning)

confidence: 99%

Combining Word and Phonetic-Code Representations for Spoken Document Retrieval

Reyes-Barragán

Montes-y-Gómez

Villaseñor-Pineda

2011

Computational Linguistics and Intelligent Text Processing


Abstract. The traditional approach for spoken document retrieval (SDR) uses an automatic speech recognizer (ASR) in combination with a word-based information retrieval method. This approach has only showed limited accuracy, partially because ASR systems tend to produce transcriptions of spontaneous speech with significant word error rate. In order to overcome such limitation we propose a method which uses word and phonetic-code representations in collaboration. The idea of this combination is to reduce the impact of transcription errors in the processing of some (presumably complex) queries by representing words with similar pronunciations through the same phonetic code. Experimental results on the CLEF-CLSR-2007 corpus are encouraging; the proposed hybrid method improved the mean average precision and the number of retrieved relevant documents from the traditional word-based approach by 3% and 7% respectively.


“…Surveys [8,9]. review the various approaches, including named attributes computations [5], schema mapping [2,17] and duplicate detection in hierarchical data [10], all which inform the construction of profile linkage techniques.…”

Section: Record Linkage and Entity Resolution (mentioning)

confidence: 99%

“…We adopt the Jaro Winker metric, as it been reported to be one of the best performing [5] metrics for name-like feature. As many identities may have similar or even identical namesakes, the usernames alone are not sufficiently discriminative.…”

Section: Feature Selection (mentioning)

confidence: 99%

Online Social Network Profile Linkage

Zhang

Kan

Liu

et al. 2014

Information Retrieval Technology


Abstract. Piecing together social signals from people in different online social networks is key for downstream analytics. However, users may have different usernames in different social networks, making the linkage task difficult. To enable this, we explore a probabilistic approach that uses a domain-specific prior knowledge to address this problem of online social network user profile linkage. At scale, linkage approaches that are based on a naïve pairwise comparisons that have quadratic complexity become prohibitively expensive. Our proposed threshold-based canopying framework, named OPL, reduces this pairwise comparisons, and guarantees a upper bound theoretic linear complexity with respect to the dataset size. We evaluate our approaches on real-world, large-scale datasets obtained from Twitter and Linkedin. Our probabilistic classifier integrating prior knowledge into Naïve Bayes performs at over 85% F1-measure for pairwise linkage, comparable to state-of-the-art approaches.

