The strength of co-authorship in gene name disambiguation

Farkas, Richárd

doi:10.1186/1471-2105-9-69

Cited by 9 publications

(5 citation statements)

References 12 publications

(31 reference statements)


“…These can then be used to train a classifier to distinguish the correct identifier from incorrect ones [58]. Knowledge of paper co-authorship has been found to be useful in identifier disambiguation [59], based on the idea that an author uses gene names consistently across all of their publications or may work on a specific set of genes consistently.…”

Section: Entity Normalisation (mentioning)

confidence: 99%

What the papers say: Text mining for genomics and systems biology

2010


Keeping up with the rapidly growing literature has become virtually impossible for most scientists. This can have dire consequences. First, we may waste research time and resources on reinventing the wheel simply because we can no longer maintain a reliable grasp on the published literature. Second, and perhaps more detrimental, judicious (or serendipitous) combination of knowledge from different scientific disciplines, which would require following disparate and distinct research literatures, is rapidly becoming impossible for even the most ardent readers of research publications. Text mining -- the automated extraction of information from (electronically) published sources -- could potentially fulfil an important role -- but only if we know how to harness its strengths and overcome its weaknesses. As we do not expect that the rate at which scientific results are published will decrease, text mining tools are now becoming essential in order to cope with, and derive maximum benefit from, this information explosion. In genomics, this is particularly pressing as more and more rare disease-causing variants are found and need to be understood. Not being conversant with this technology may put scientists and biomedical regulators at a severe disadvantage. In this review, we introduce the basic concepts underlying modern text mining and its applications in genomics and systems biology. We hope that this review will serve three purposes: (i) to provide a timely and useful overview of the current status of this field, including a survey of present challenges; (ii) to enable researchers to decide how and when to apply text mining tools in their own research; and (iii) to highlight how the research communities in genomics and systems biology can help to make text mining from biomedical abstracts and texts more straightforward.
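The co-authorship idea quoted in the citation statement above can be illustrated with a minimal sketch. This is not Farkas's actual system: the function name, the data layout, and the gene identifiers are all hypothetical, and the sketch simply assumes that a per-author history of previously used gene identifiers is already available. An ambiguous mention is resolved to the candidate identifier that the paper's authors have used most often elsewhere.

```python
from collections import Counter


def disambiguate_by_coauthorship(candidates, authors, author_history):
    """Pick the candidate gene ID most often linked to the paper's authors.

    candidates: candidate gene identifiers for one ambiguous mention
    authors: author names on the paper containing the mention
    author_history: dict mapping author name -> list of gene IDs seen in
        that author's other papers (hypothetical, pre-collected data)
    Returns the best-supported candidate, or None when no author has any
    history with any of the candidates.
    """
    votes = Counter()
    for author in authors:
        for gene_id in author_history.get(author, []):
            if gene_id in candidates:
                votes[gene_id] += 1
    if not votes:
        return None
    # max() keeps the first maximum, so ties follow the order of `candidates`.
    return max(candidates, key=lambda gene_id: votes[gene_id])


# Hypothetical example data (identifiers are made up for illustration).
history = {
    "A. Author": ["GENE:1", "GENE:1", "GENE:7"],
    "B. Author": ["GENE:1"],
}
chosen = disambiguate_by_coauthorship(
    ["GENE:7", "GENE:1"], ["A. Author", "B. Author"], history
)
print(chosen)
```

A real system would combine such author evidence with other features rather than rely on a simple vote; the vote is used here only to keep the sketch short.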


“…Author disambiguation has been used for diverse applications such as building social networks, normalizing gene names and analyzing collaborations. Farkas (35) was successful in using authors' information for improving the accuracy of a baseline gene normalization system from 80% to 97%. Large scale social network analysis of disambiguated author information is useful for finding key scientific leaders who are "low publishers" in scientific journals (36).…”

Section: Limitations (mentioning)

confidence: 99%

NEMO: Extraction and normalization of organization names from PubMed affiliations

Jonnalagadda, Topham

2010

Disc Collab


Background: We are witnessing an exponential increase in biomedical research citations in PubMed. However, translating biomedical discoveries into practical treatments is estimated to take around 17 years, according to the 2000 Yearbook of Medical Informatics, and much information is lost during this transition. Pharmaceutical companies spend huge sums to identify opinion leaders and centers of excellence. Conventional methods such as literature search, survey, observation, self-identification, expert opinion, and sociometry not only need much human effort, but are also non-comprehensive. Such huge delays and costs can be reduced by “connecting those who produce the knowledge with those who apply it”. A humble step in this direction is large-scale discovery of persons and organizations involved in specific areas of research. This can be achieved by automatically extracting and disambiguating author names and affiliation strings retrieved through Medical Subject Heading (MeSH) terms and other keywords associated with articles in PubMed. In this study, we propose NEMO (Normalization Engine for Matching Organizations), a system for extracting organization names from the affiliation strings provided in PubMed abstracts, building a thesaurus (list of synonyms) of organization names, and subsequently normalizing them to a canonical organization name using the thesaurus. Results: We used a parsing process that involves multi-layered rule matching with multiple dictionaries. The normalization process involves clustering based on weighted local sequence alignment metrics to address synonymy at word level, and local learning based on finding connected components to address synonymy. The graphical user interface and Java client library of NEMO are available at http://lnxnemo.sourceforge.net. Conclusion: NEMO associates each biomedical paper and its authors with a unique organization name and the geopolitical location of that organization. This system provides more accurate information about organizations than the raw affiliation strings provided in PubMed abstracts. It can be used for: a) bimodal social network analysis that evaluates the research relationships between individual researchers and their institutions; b) improving author name disambiguation; c) augmenting National Library of Medicine (NLM)’s Medical Articles Record System (MARS) system for correcting errors due to OCR on affiliation strings that are in small fonts; and d) improving PubMed citation indexing strategies (authority control) based on normalized organization name and country.


“…In the biomedical domain researchers have focused on supervised methods [8][9][10][11] and using established knowledge [12][13][14][15] to perform gene name normalization and resolve abbreviations. According to the recent BioCreAtIvE challenge, the former problem can be solved with up to 81% success rate [14] for human genes, which are challenging with 5.5 synonyms per name (therefore many genes are named identically).…”

Section: Algorithms For Word Sense Disambiguation (mentioning)

confidence: 99%

“…The above approaches use cosine similarity [12], SVM [10,11], Bayes, decision trees, induced rules [8], and background knowledge sources such as the Unified Medical Language System (UMLS) [16], Medical Subject Headings (MeSH) [17], and the Gene Ontology (GO) [18]. Two approaches use metadata, such as authors [15] and Journal Descriptor Indexing [13]. Most of the unsupervised approaches so far were evaluated outside the biomedical domain [19][20][21][22][23][24][25], with the exception of [26], who used relations between terms given by the UMLS for unsupervised WSD of medical documents and achieved 74% precision and 49% recall.…”

Section: Algorithms For Word Sense Disambiguation (mentioning)

confidence: 99%

Biomedical word sense disambiguation with ontologies and metadata: automation meets accuracy

et al. 2009


Background: Ontology term labels can be ambiguous and have multiple senses. While this is no problem for human annotators, it is a challenge for automated methods that identify ontology terms in text. Classical approaches to word sense disambiguation use co-occurring words or terms. However, most treat ontologies as simple terminologies, without making use of the ontology structure or the semantic similarity between terms. Another useful source of information for disambiguation is metadata. Here, we systematically compare three approaches to word sense disambiguation, which use ontologies and metadata, respectively.

