Proceedings of the Fourth ACM Conference on Digital Libraries 1999
DOI: 10.1145/313238.313253

Semantic indexing for a complete subject discipline

Abstract: As part of the Illinois Digital Library Initiative (DLI) project we developed "scalable semantics" technologies. These statistical techniques enabled us to index large collections for deeper search than word matching. Through the auspices of the DARPA Information Management program, we are developing an integrated analysis environment, the Interspace Prototype, that uses "semantic indexing" as the foundation for supporting concept navigation. These semantic indexes record the contextual correlation of noun phr…

Citations: cited by 14 publications (7 citation statements)
References: 23 publications
“…In the exclusive set, the largest domain is Artificial Intelligence with 856375 web pages while the largest domain in the inclusive set is Operating Systems with 634544 web pages. A similar exercise was performed in the biological sciences in [8]. In our experiment, the exclusive partition set was used to build the subject repositories.…”
Section: Domain Partitions (mentioning)
confidence: 99%
“…[6,7,8] Such methods have also been investigated in the "Interspace" 4 prototype. [9,10] The main principles of this approach are discussed in section 3.2.…”
Section: Treatment Of Semantic Heterogeneity (mentioning)
confidence: 99%
“…[14] Full-text terms are obtained by tokenising the full-text of an Internet document, eliminating stop words, and stemming the remaining terms using a Porter stemmer. 9 For weighting the terms the inverse document frequency is used. [15] The full-text is then indexed with full-text terms having a weight greater than a certain minimum threshold.…”
Section: Fig 2 Parallel Corpus Simulation With Vague Abstract and F (mentioning)
confidence: 99%
“…Locating similar cohorts requires the use of sophisticated statistical matching techniques, such as self-organizing maps [64] or concept co-occurrence [65]. However, the health care providers need only use the matching software, not understand the algorithms that are implemented. The underlying technology would be embedded within the health care infrastructure for Internet health monitors.…”
Section: Internet Health Monitors Across Whole Populations (mentioning)
confidence: 99%
“…A major use of such a national database is to support physicians in prescribing treatments for a patient, by examining treatment patterns of cohorts of similar patients in the database. Locating similar cohorts requires the use of sophisticated statistical matching techniques, such as self‐organizing maps [64] or concept co‐occurrence [65]. However, the health care providers need only use the matching software, not understand the algorithms that are implemented.…”
Section: Internet Health Monitors Across Whole Populations (mentioning)
confidence: 99%
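
The full-text indexing steps quoted in one of the citation statements above (tokenising, removing stop words, stemming, weighting by inverse document frequency, and keeping only terms above a minimum weight) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the stop list, the `crude_stem` helper (a simple suffix stripper standing in for a real Porter stemmer), the `log(N / df)` form of the inverse document frequency, the threshold value, and the sample documents. None of it is taken from the cited papers.

```python
import math
import re
from collections import defaultdict

# Hypothetical, tiny stop list for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "and", "for", "is", "in", "to", "with"}


def crude_stem(token):
    """Stand-in for a Porter stemmer: strips a few common suffixes only."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def full_text_terms(text):
    """Tokenise, drop stop words, and stem; returns the set of distinct terms."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(t) for t in tokens if t not in STOP_WORDS}


def idf_weights(docs):
    """Inverse document frequency, assumed here to be log(N / df)."""
    doc_freq = defaultdict(int)
    for text in docs.values():
        for term in full_text_terms(text):
            doc_freq[term] += 1
    n_docs = len(docs)
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def build_index(docs, min_weight):
    """Index each document under its terms whose weight exceeds min_weight."""
    weights = idf_weights(docs)
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in full_text_terms(text):
            if weights[term] > min_weight:
                index[term].add(doc_id)
    return dict(index)


# Made-up sample documents.
docs = {
    "d1": "Semantic indexing of digital libraries",
    "d2": "Indexing the libraries with concept spaces",
    "d3": "Concept navigation in digital collections",
}
index = build_index(docs, min_weight=0.5)
```

With this threshold, terms that occur in two of the three sample documents fall below the cut-off and are left out of the index, while terms unique to one document are kept.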