2018
DOI: 10.14569/ijacsa.2018.091225

Efficient Reduction of Overgeneration Errors for Automatic Controlled Indexing with an Application to the Biomedical Domain

Abstract: Studies on MetaMap and MaxMatcher have shown that both concept extraction systems suffer from overgeneration problems. Overgeneration occurs when an extraction system mistakenly selects an irrelevant concept. One reason for these errors is that these systems use the words to weight the terms of the concepts. In this paper, an Integer Linear Programming model is used to select the optimal subset of extracted concept mentions covering the largest number of important words in the document to be indexed. …
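The selection step described in the abstract can be illustrated with a toy sketch. The paper's actual ILP formulation is not given here, so everything below is an assumption made for illustration: the objective (total weight of distinct covered words), the cap on the number of selected mentions (`max_mentions`), the mention ids, and the word weights are all hypothetical. The sketch also swaps the ILP solver for plain exhaustive enumeration, which returns the same optimum on tiny inputs but does not scale.

```python
from itertools import combinations


def select_mentions(mentions, word_weights, max_mentions):
    """Pick at most `max_mentions` mentions maximizing the weight of covered words.

    mentions:     dict mapping a (hypothetical) mention id to the set of
                  document words that mention covers
    word_weights: dict mapping a document word to its importance weight
    Exhaustive search stands in for an ILP solver; this is an assumption,
    not the paper's method.
    """
    best_score, best_subset = 0.0, ()
    names = sorted(mentions)  # fixed order makes tie-breaking deterministic
    for size in range(1, max_mentions + 1):
        for subset in combinations(names, size):
            # Each word counts once, no matter how many mentions cover it.
            covered = set().union(*(mentions[m] for m in subset))
            score = sum(word_weights.get(w, 0.0) for w in covered)
            if score > best_score:
                best_score, best_subset = score, subset
    return list(best_subset), best_score


# Hypothetical data: three extracted mentions and three weighted words.
mentions = {
    "m1": {"diabetes", "ocular"},
    "m2": {"ocular"},
    "m3": {"complications"},
}
word_weights = {"diabetes": 3.0, "ocular": 2.0, "complications": 1.5}

selected, score = select_mentions(mentions, word_weights, max_mentions=2)
print(selected, score)  # ['m1', 'm3'] 6.5
```

Here `m2` is dropped because the word it covers is already covered by `m1`, so it adds nothing to the objective. This is the intuition behind using coverage of important words to filter redundant or irrelevant mentions.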

Cited by 1 publication (1 citation statement)
References 6 publications

“…For example, given the noun phrase "ocular complications," we obtain three concepts "Ocular", "Complications" and "Complications Specific to Antepartum or Postpartum" because they share at least one word. Recently, a new solution for the over-generation problem was suggested by [45] using an Integer Linear Programming model. This new suggested solution allows the selection of the most relevant concepts by converting the highest number of important terms in the text, which actually was not evaluated.…”
Section: Introduction (mentioning)
confidence: 99%
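The "ocular complications" example in the citation statement can be reproduced with a toy matcher. This is a minimal sketch that assumes candidates are retrieved whenever a concept name shares at least one word with the phrase. It is not the matching algorithm of MetaMap or MaxMatcher, and the concept list is a tiny hand-made stand-in ("Renal" is added only to show a non-match).

```python
def candidate_concepts(phrase, concept_names):
    """Return every concept name sharing at least one word with `phrase`.

    Toy word-overlap matcher (an assumption for illustration), using
    case-insensitive whitespace tokenization.
    """
    phrase_words = set(phrase.lower().split())
    return [
        name
        for name in concept_names
        if phrase_words & set(name.lower().split())
    ]


# Hand-made concept list; a real system would query a full vocabulary.
concepts = [
    "Ocular",
    "Complications",
    "Complications Specific to Antepartum or Postpartum",
    "Renal",
]

for name in candidate_concepts("ocular complications", concepts):
    print(name)
# Ocular
# Complications
# Complications Specific to Antepartum or Postpartum
```

The third candidate is retrieved only because it shares the word "complications" with the phrase, which is the kind of overgeneration error the statement describes.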