XML holds the promise to yield (1) a more precise search by providing additional information in the elements, (2) a better integrated search of documents from heterogeneous sources, (3) a powerful search paradigm using structural as well as content specifications, and (4) data and information exchange to share resources and to support cooperative search. We survey several indexing techniques for XML documents, grouping them into flatfile, semistructured, and structured indexing paradigms. Searching techniques and supporting techniques for searching are reviewed, including full text search and multistage search. Because searching XML documents can be very flexible, various search result presentations are discussed, as well as database and information retrieval system integration and XML query languages. We also survey various retrieval models, examining how they would be used or extended for retrieving XML documents. To conclude the article, we discuss various open issues that XML poses with respect to information retrieval and database research.

Introduction

An Internet search engine (e.g., Altavista or Infoseek) returns thousands of so-called matched documents from a single query, some of which are relevant and others irrelevant to the query. End users typically have problems with organizing and digesting such vast quantities of information, in which much (i.e., 75% as pointed out by Selberg and Etzioni, 1997) of the information retrieved is likely to be irrelevant. XML holds the promise that searching can be done more precisely because structural, self-describing information and meta-data (e.g., RDF) is available, to allow for context-based and/or category-based search.
XML also holds the promise to model heterogeneous data, generated from databases (DBs) or from word processors, thereby enabling search engines to locate and process heterogeneous documents or records.

An XML document consists of a set of elements, which are hierarchically structured, as defined by the user. Each element has a name (e.g., p for paragraph), which is defined by the user. Data of an element (say, p) can be stored inside the element delimited by its start tag (i.e., <p>) and its end tag (i.e., </p>), or it can be stored as values in its attribute (e.g., <p id="1">). Certain attribute value types are reserved for referencing (e.g., IDREF). An XML element is accessed typically using the XPath language. Child elements and their parent element are separated by a slash. For example, the XPath /header/author/first selects the element named first by starting at the root element header and then descending through its author child.

It is possible to use other mark-up languages (e.g., HTML) or proprietary formats but XML appears to be suitable for a wide variety of information retrieval (IR) tasks, specific enough to reduce modeling complexity and open enough for easy and rapid adoption. A major advantage of XML over HTML is that users can define their own tags. Tag names are typically chosen to incorporate some relationship to the semantics of the contents or the type of co...
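The element, attribute, and XPath description above can be illustrated with a minimal sketch in Python, using the standard library's xml.etree.ElementTree module. The sample document, the author name, and the paragraph text are invented for illustration and do not come from the article. ElementTree supports only a subset of XPath, and its paths are relative to the element they are called on, so the absolute path /header/author/first is written as author/first when evaluated from the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element names follow the text's example,
# the content is made up for illustration.
doc = """<header>
  <author><first>Ada</first><last>Lovelace</last></author>
  <p id="1">Data stored as element content.</p>
</header>"""

root = ET.fromstring(doc)  # root is the <header> element

# /header/author/first, expressed relative to the root element.
first = root.find("author/first")
print(first.text)

# Data can live between the start and end tags, or in an attribute.
p = root.find("p")
print(p.text)       # element content
print(p.get("id"))  # attribute value
```

A full XPath engine would accept the absolute path directly; the relative form here is a property of ElementTree, not of XPath itself.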
Advances in networking technology and the establishment of the Information Superhighway have rendered the virtual library a concrete possibility. We are currently investigating user experience in [...] through a large virtual environment in the context of the Internet. This provides users with the ability to view various virtual objects from different distances and angles, using common web browsers. To deliver a good performance for such applications, we need to address several issues in different research disciplines. First, we must be able to model virtual objects effectively. The recently developed techniques for multi-resolution object modeling in computer graphics are of great value here, since they are capable of simplifying the object models and therefore reducing the time to render them. Second, with the limited bandwidth constraint of the Internet, we need to reduce the response time by reducing the amount of data requested over the network. One alternative is to cache object models of high affinity. Prefetching object models by predicting those which are likely to be used in the near future and downloading them in advance will lead to a similar improvement. Third, the Internet often suffers from disconnection. A caching mechanism that allows objects to be cached with at least their minimum resolution will be useful to provide at least a coarse ...
Distributed virtual environments allow users at different geographical locations to share and interact within a common virtual environment via a local network or through the Internet. To deliver a good performance for such applications, we need to address several issues in different research disciplines. First, we must be able to model virtual objects effectively. The recently developed multi-resolution techniques for object modeling are of great value here, since they are capable of simplifying the object models and therefore reducing the time to render them. This may greatly reduce the demand for rendering performance on the client machines. Second, with the constraint of the limited bandwidth of the Internet, we need to reduce the response time by reducing the amount of data requested over the network. Caching of suitable object models of high affinity will reduce the amount of data requested over the network for a faster response time. Prefetching object models by predicting those which are likely to be used in the near future and downloading them in advance will lead to a similar improvement. Third, the Internet often suffers from disconnection. A caching mechanism that allows objects to be cached with at least their minimum resolutions will be useful to provide at least a coarse view of the objects to the viewer for improved visual perception. In this paper, we describe our implementation of a distributed walkthrough system. Two techniques are fundamental to our system, a multi-resolution caching mechanism and a set of object prefetching mechanisms. Towards the end of the paper, we quantify the performance of the proposed mechanisms.
Internal thought refers to the process of directing attention away from a primary visual task to internal cognitive processing. Internal thought is a pervasive mental activity and closely related to primary task performance. As such, automatic detection of internal thought has significant potential for user modelling in intelligent interfaces, particularly for e-learning applications. Despite the close link between the eyes and the human mind, only a few studies have investigated vergence behaviour during internal thought and none has studied moment-to-moment detection of internal thought from gaze. While prior studies relied on long-term data analysis and required a large number of gaze characteristics, we describe a novel method that is computationally light-weight and that only requires eye vergence information that is readily available from binocular eye trackers. We further propose a novel paradigm to obtain ground truth internal thought annotations that exploits human blur perception. We evaluate our method for three increasingly challenging detection tasks: (1) during a controlled math-solving task, (2) during natural viewing of lecture videos, and (3) during daily activities, such as coding, browsing, and reading. Results from these evaluations demonstrate the performance and robustness of vergence-based detection of internal thought and, as such, open up new directions for research on interfaces that adapt to shifts of mental attention.
Caching of remote data in a mobile client's local storage can improve data access performance and data availability. Traditional approaches are page-based, without taking advantage of the semantics of cached data. It is difficult for a client to determine if a query could be answered entirely based on locally cached data, forcing it to contact the database server for additional data. We propose a semantic caching mechanism which allows data to be cached as a collection of possibly related blocks, each of which is the result of a previously evaluated query. We investigate mechanisms for transforming projection-selection queries to reuse cached data blocks. This avoids transmitting unwanted data items over low bandwidth wireless channels. Cache replacement techniques based on the semantics of cached data are also proposed. We describe the design of our prototype and study its performance.
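The idea of answering a query from previously cached query results can be sketched as follows. This is a simplified illustration, not the prototype described in the abstract: it assumes selection queries over a single integer attribute with closed ranges, ignores projection, and uses hypothetical names (Block, SemanticCache, store, lookup). A lookup returns the rows that can be served locally plus the remainder range, if any, that would still have to be requested from the server.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """Result of an earlier query: all rows with lo <= row[key] <= hi."""
    lo: int
    hi: int
    rows: list


@dataclass
class SemanticCache:
    key: str                                   # attribute the selections range over
    blocks: list = field(default_factory=list)

    def store(self, lo, hi, rows):
        self.blocks.append(Block(lo, hi, list(rows)))

    def _select(self, block, lo, hi):
        return [r for r in block.rows if lo <= r[self.key] <= hi]

    def lookup(self, lo, hi):
        """Return (local_rows, remainder); remainder is None if fully cached."""
        # Case 1: some block contains the whole query range.
        for b in self.blocks:
            if b.lo <= lo and hi <= b.hi:
                return self._select(b, lo, hi), None
        # Case 2: some block covers the lower part; only the rest is remote.
        for b in self.blocks:
            if b.lo <= lo <= b.hi:
                return self._select(b, lo, b.hi), (b.hi + 1, hi)
        # Case 3: nothing usable is cached; the whole query goes to the server.
        return [], (lo, hi)


cache = SemanticCache(key="age")
cache.store(20, 40, [{"name": "a", "age": 25}, {"name": "b", "age": 38}])

rows, remainder = cache.lookup(30, 40)    # fully answered from the cache
rows2, remainder2 = cache.lookup(35, 50)  # partly cached, ages 41..50 still needed
rows3, remainder3 = cache.lookup(60, 70)  # not cached at all
```

Only the remainder range needs to cross the wireless link, which is the source of the bandwidth saving the abstract refers to. A fuller treatment would also handle overlap on the upper side, combine several blocks, and check that the cached blocks contain the projected attributes.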