A multidatabase system provides integrated access to heterogeneous, autonomous local databases in a distributed system. An important problem in current multidatabase systems is identification of semantically similar data in different local databases. The Summary Schemas Model (SSM) is proposed as an extension to multidatabase systems to aid in semantic identification. The SSM uses a global data structure to abstract the information available in a multidatabase system. This abstracted form allows users to use their own terms (imprecise queries) when accessing data rather than being forced to use system-specified terms. The system uses the global data structure to match the user's terms to the semantically closest available system terms. A simulation of the SSM is presented to compare imprecise-query processing with corresponding query-processing costs in a standard multidatabase system. The costs and benefits of the SSM are discussed, and future research directions are presented.
In this paper, we propose a new term weighting scheme called Term Frequency -Inverse Corpus Frequency (TF-ICF). It does not require term frequency information from other documents within the document collection and thus, it enables us to generate the document vectors of N streaming documents in linear time. In the context of a machine learning application, unsupervised document clustering, we evaluated the effectiveness of the proposed approach in comparison to five widely used term weighting schemes through extensive experimentation. Our results show that TF-ICF can produce document clusters that are of comparable quality as those generated by the widely recognized term weighting schemes and it is significantly faster than those methods.
hen Sun Microsystems introduced its first workstation, the company could not have imagined how quickly workstations would revolutionize computing. The idea of a community of engineers, scientists, or researchers time-sharing on a single mainframe computer could hardly have become ancient any more quickly. The almost instant wide acceptance of workstations and desktop computers indicates that [hey were quickly recognized as giving the best and most flexible performance for the dollar.Desktop computers proliferated for three reasons. First, very large scale integration (VLSI) and wafer scale integration (WSI), despite some problems, increased the gate density of chips while dramatically lowering their production cost.' Moreover. increased gate density permits a more complicated processor, which in turn promotes parallelism.Second, desktop computers distribute processing power to the user in an easily customized open architecture. Rcal-time applications that require intensive 110 and computation need not consume all the resources of a supercomputer. Also, desktop computers support high-definition screens with color and motion far exceeding those available with any multiple-user. shared-resource mainframe.Third, economical. high-bandwidth networks allow desktop computers to share data. thus retaining the most appealing aspect of centralized computing, resource sharing. Moreover, networks allow the computers to share data with dissimilar computing machines. That is perhaps the most important reason for the acceptance of desktop computers. since all the performance in the world is worth little if the machine is isolated.Today's workstations have redefined the way the computing community distributes processing resources, and tomorrow's machines will continue this trend with higher bandwidth networks and higher computational performance. One way to obtain higher computational performance is t o use special parallel coprocessors t o perform functions such as motion and color support of high-definition screens. Future computationally intensive applications suited for desktop computing machines include real-time text, speech, and image processing. These applications require massive parallelism.'
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.