Technical terminology: some linguistic properties and an algorithm for identification in text

Justeson, John S.; Katz, Slava M.

doi:10.1017/s1351324900000048

Cited by 540 publications

(339 citation statements)

References 16 publications

Supporting

Mentioning

327

Contrasting

Unclassified


“…This filtering step eliminates only 0.9% of the valid gene and protein names from consideration and significantly increases the accuracy of the recognized terms. It is possible to replace this simple term extractor with a more sophisticated method, relying on frequency and distributional information (Justeson and Katz, 1995) and complementing that with techniques that utilize approximate string matching to identify term variants and new terms similar to old ones, such as the method proposed by Krauthammer et al (2000) specifically for biological terms.…”

Section: Data Collection (mentioning)

confidence: 99%

Disambiguating proteins, genes, and RNA in text: a machine learning approach

Hatzivassiloglou, Duboue, Rzhetsky

2001

Bioinformatics


We present an automated system for assigning protein, gene, or mRNA class labels to biological terms in free text. Three machine learning algorithms and several extended ways for defining contextual features for disambiguation are examined, and a fully unsupervised manner for obtaining training examples is proposed. We train and evaluate our system over a collection of 9 million words of molecular biology journal articles, obtaining accuracy rates up to 85%.


“…We drop all entries according to this heuristic rule. Naturally, many far more sophisticated algorithms can be employed here, e.g., matching grammatical pattern devised to select true keywords, which could be employed, when the knowledge about the part-of-speech classification is available [12,13]. However, the simple stopword method worked well enough for us, especially that we are mostly aiming at labels for further applications in machine learning and hence we can afford having certain fraction of "bogus labels".…”

Section: Processing Methods (mentioning)

confidence: 99%

Tagging Scientific Publications Using Wikipedia and Natural Language Processing Tools

Łopuszyński, Bolikowski

2014

Communications in Computer and Information Science


Abstract. In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. As a first source of labels Wikipedia is employed, second label set is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on the dataset consisting of abstracts from 0.7 million of scientific documents deposited in the ArXiv preprint collection. We believe that obtained tags can be later on applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).


“…a dependency pattern with semantic (UMLS) class labels for ARG1 and ARG2. To find the appropriate semantic label for a complex argument, we first extract its main term using a linguistic filter adapted from (Justeson and Katz, 1995). The filter extracts a sub-string of the argument that matches the following POS-tag regular expression:…”

Section: Learning Patterns (mentioning)

confidence: 99%

Relation Extraction for Open and Closed Domain Question Answering

Bouma, Fahmi, Mur

2011

Interactive Multi-Modal Question-Answering


One of the most accurate methods in Question Answering uses off-line information extraction to find answers for frequently asked questions. It requires automatic extraction from text of all relation instances for relations that users frequently ask for. In this chapter, we present two methods for learning relation instances for relations relevant in a closed and open domain (medical) question answering system. Both methods try to learn automatically dependency paths that typically connect two arguments of a given relation. The first (lightly supervised) method starts from a seed list of argument instances, and extracts dependency paths from all sentences in which a seed pair occurs. This method works well for large text collections and for seeds which are easily identified, such as named entities, and is well-suited for open domain question answering. In a second experiment, we concentrate on medical relation extraction for the question answering module of the IMIX system. The IMIX corpus is relatively small and relation instances may contain complex noun phrases that do not occur frequently in the exact same form in the corpus. In this case, learning from annotated data is necessary. We show that dependency patterns enriched with semantic concept labels give accurate results for relations that are relevant for a medical question answering system. Both methods improve the performance of the Dutch question answering system Joost.
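The citation statements above refer to a part-of-speech filter from Justeson and Katz (1995). The adapted expression used by the citing paper is elided in the quote and is not reconstructed here. As a rough illustration only, the sketch below uses the pattern commonly attributed to the original paper, ((A | N)+ | (A | N)*(N P)?(A | N)*) N, with A = adjective, N = noun, P = preposition, restricted to multi-word candidates that occur at least twice. The single-letter tag scheme, the function name `candidate_terms`, and the sample data are assumptions made for this example; the input is assumed to be already POS-tagged.

```python
import re
from collections import Counter

# Pattern over a string of one-letter tags (A, N, P; everything else is O).
# This is the form usually attributed to Justeson and Katz (1995); it is an
# illustration, not the adapted filter of any citing paper.
TERM_PATTERN = re.compile(r"(?:[AN]+|[AN]*(?:NP)?[AN]*)N")


def candidate_terms(tagged, min_freq=2):
    """Return {term: count} for multi-word tag-pattern matches.

    `tagged` is a list of (word, tag) pairs using the simplified tags
    A, N, P; any other tag is treated as a boundary (O).
    """
    tags = "".join(tag if tag in {"A", "N", "P"} else "O" for _, tag in tagged)
    counts = Counter()
    # finditer yields leftmost, non-overlapping matches only, which is a
    # simplification: nested sub-phrases are not counted separately.
    for match in TERM_PATTERN.finditer(tags):
        if match.end() - match.start() >= 2:  # skip single nouns
            words = [w.lower() for w, _ in tagged[match.start():match.end()]]
            counts[" ".join(words)] += 1
    # Keep only candidates that meet the frequency threshold.
    return {term: n for term, n in counts.items() if n >= min_freq}


# Hypothetical, hand-tagged sample text.
sample = [
    ("the", "O"), ("central", "A"), ("processing", "N"), ("unit", "N"),
    ("uses", "O"), ("degrees", "N"), ("of", "P"), ("freedom", "N"),
    (";", "O"), ("the", "O"), ("central", "A"), ("processing", "N"),
    ("unit", "N"),
]

if __name__ == "__main__":
    print(candidate_terms(sample))
    print(candidate_terms(sample, min_freq=1))
```

Matching on a string of one-letter tags keeps the filter a plain regular expression; a real pipeline would first map the output of a POS tagger onto these three classes.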

