In traditional text clustering methods, documents are represented as "bags of words" without considering the semantic information of each document. For instance, if two documents use different sets of core words to describe the same topic, they may be wrongly assigned to different clusters because they share no core words, even though the words they use are probably synonyms or semantically related in other forms. The most common way to solve this problem is to enrich the document representation with background knowledge from an ontology. This approach has two major issues: (1) the coverage of the ontology is limited, even for WordNet or MeSH, and (2) using ontology terms as replacement or additional features may cause information loss or introduce noise. In this paper, we present a novel text clustering method that addresses both issues by enriching the document representation with Wikipedia concept and category information. We develop two approaches, exact match and relatedness match, to map text documents to Wikipedia concepts and, further, to Wikipedia categories. The documents are then clustered based on a similarity metric that combines document content, concept, and category information. Experimental results with the proposed clustering framework on three datasets (20-newsgroups, TDT2, and LA Times) show that clustering performance improves significantly when the document representation is enriched with Wikipedia concepts and categories.
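The combined similarity metric described above can be sketched as a weighted sum of cosine similarities over three sparse vectors per document. This is a minimal illustration, not the paper's exact formulation: the weights `alpha`, `beta`, and `gamma` are hypothetical placeholders, as the abstract does not state how the three components are combined.

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two sparse vectors represented as dicts.
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_similarity(d1, d2, alpha=0.4, beta=0.4, gamma=0.2):
    """Weighted combination of word-level, Wikipedia-concept, and
    Wikipedia-category similarity. The weights are illustrative only."""
    return (alpha * cosine(d1["words"], d2["words"])
            + beta * cosine(d1["concepts"], d2["concepts"])
            + gamma * cosine(d1["categories"], d2["categories"]))
```

Any clustering algorithm that consumes a pairwise similarity (e.g. agglomerative clustering) can then operate on `combined_similarity` instead of plain bag-of-words cosine.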
Assessing network vulnerability before potentially disruptive events such as natural disasters or malicious attacks is vital for network planning and risk management. It enables us to identify and safeguard against the most destructive scenarios, in which overall network connectivity falls dramatically. Existing vulnerability assessments mainly focus on the inhomogeneous properties of graph elements, such as node degree; however, these measures and the corresponding heuristic solutions provide neither an accurate evaluation over general network topologies nor performance guarantees for large-scale networks. To this end, we investigate a measure called pairwise connectivity and formulate the vulnerability assessment problem as a new graph-theoretic optimization problem called β-disruptor, which aims to discover the set of critical nodes or edges whose removal results in the maximum decline in global pairwise connectivity. Our results consist of an NP-completeness and inapproximability proof for this problem, an O(log n log log n) pseudo-approximation algorithm for detecting the set of critical nodes, and an O(log^1.5 n) pseudo-approximation algorithm for detecting the set of critical edges. In addition, we devise an efficient heuristic algorithm and validate the performance of our model and algorithms through extensive simulations.
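The pairwise connectivity measure underlying β-disruptor counts the number of connected vertex pairs, i.e. the sum of C(|C|, 2) over the connected components C of the graph. A minimal sketch of evaluating it before and after a node removal (the representation as an adjacency dict is an assumption for illustration):

```python
from collections import deque

def pairwise_connectivity(adj, removed=frozenset()):
    """Number of connected vertex pairs in the graph with `removed`
    nodes deleted: sum over components C of |C| * (|C| - 1) / 2.
    `adj` maps each node to the set of its neighbors."""
    seen = set(removed)
    total = 0
    for start in adj:
        if start in seen:
            continue
        # BFS to measure the size of one connected component.
        size, queue = 0, deque([start])
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        total += size * (size - 1) // 2
    return total
```

The decline caused by removing a candidate set S is then simply `pairwise_connectivity(adj) - pairwise_connectivity(adj, removed=S)`, which is the objective the critical-node variant maximizes.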
Minimum-latency beaconing schedule (MLBS) in synchronous multihop wireless networks seeks a beaconing schedule with the shortest latency. This problem is NP-hard even when the interference radius is equal to the transmission radius. All prior work assumes that the interference radius equals the transmission radius, and the best-known approximation ratio for MLBS under this special interference model is 7. In this paper, we present a new approximation algorithm called strip coloring for MLBS under the general protocol interference model. Its approximation ratio is at most 5 when the interference radius equals the transmission radius, and is between 3 and 6 in general.
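The connection between beaconing schedules and graph coloring can be illustrated with a simple first-fit coloring of an interference conflict graph: each color class is a time slot in which all its nodes may beacon together, and the latency is the number of slots used. This is only a hedged sketch of the general idea; it is not the paper's strip-coloring algorithm, which exploits the geometric structure of the network to achieve the stated ratio bounds.

```python
def greedy_slot_assignment(conflict):
    """First-fit coloring of a conflict graph {node: set(conflicting nodes)}.
    Nodes sharing a slot number may beacon simultaneously; the latency of
    the schedule is max(slot) + 1."""
    slot = {}
    for u in sorted(conflict):          # deterministic order stands in for node IDs
        used = {slot[v] for v in conflict[u] if v in slot}
        s = 0
        while s in used:                # smallest slot free of conflicts
            s += 1
        slot[u] = s
    return slot
```

Under the protocol interference model, two nodes conflict when one lies within the other's interference radius, which is how the `conflict` graph would be built from node positions.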
In this paper, we study the problem of distributed virtual backbone construction in sensor networks, where the coverage areas of nodes are disks with different radii. We model this problem as the construction of a minimum connected dominating set (MCDS) in geometric k-disk graphs. We derive the size relationship between any maximal independent set (MIS) and an MCDS in geometric k-disk graphs, and apply it to analyze the performance of the two distributed connected dominating set (CDS) algorithms we propose in this paper. These algorithms have bounded performance ratios and low communication overhead. To the best of our knowledge, the results reported in this paper represent the state of the art.
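CDS constructions of this kind typically start from a maximal independent set, whose size relative to the MCDS drives the performance ratio. A minimal centralized sketch of the MIS phase (the sequential greedy order is an assumption standing in for the distributed election the paper's algorithms would perform):

```python
def greedy_mis(adj):
    """Greedy maximal independent set on a graph {node: set(neighbors)}.
    Each selected node blocks its neighbors, so no two selected nodes
    are adjacent, and every unselected node has a selected neighbor
    (i.e. the MIS is also a dominating set)."""
    mis, blocked = set(), set()
    for u in sorted(adj):       # deterministic order stands in for node IDs
        if u not in blocked:
            mis.add(u)
            blocked.add(u)
            blocked |= adj[u]   # neighbors can no longer join the MIS
    return mis
```

A second phase would then add connector nodes to join the MIS into a connected dominating set; the MIS-vs-MCDS size bound derived in the paper is what turns this two-phase construction into a bounded performance ratio.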
The goal of the next-generation Web is to build virtual communities in which software agents and people can cooperate by sharing knowledge. To achieve this goal, the emerging Semantic Web community has proposed ontologies to express knowledge in a machine-understandable way. The process of building and maintaining ontologies, known as ontology engineering, presents unique challenges: the lack of trustworthy, authoritative knowledge sources and the absence of a centralized repository for locating ontologies to be reused. In this paper, we propose a Semantic Web portal called OntoKhoj that is designed to simplify the ontology engineering process. OntoKhoj is built on algorithms for searching, aggregating, ranking, and classifying ontologies on the Semantic Web. It 1) allows agents and ontology engineers to retrieve trustworthy, authoritative knowledge, and 2) expedites ontology engineering through extensive reuse of ontologies. We have implemented the OntoKhoj portal and validated our system on real ontological data from the Semantic Web.
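One natural way to rank ontologies by authoritativeness is a PageRank-style computation over the graph of cross-ontology references. The abstract mentions ranking but not its exact algorithm, so the sketch below is a hedged illustration of the general link-analysis approach, not OntoKhoj's implementation.

```python
def rank_ontologies(links, damping=0.85, iters=50):
    """PageRank-style ranking over {ontology: [ontologies it references]}.
    An ontology referenced by many highly ranked ontologies is treated
    as more authoritative."""
    nodes = set(links) | {v for vs in links.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        new = {u: (1 - damping) / n for u in nodes}
        for u in nodes:
            outs = links.get(u, [])
            if outs:
                share = damping * rank[u] / len(outs)
                for v in outs:
                    new[v] += share
            else:
                # Dangling ontology: spread its rank uniformly.
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank
```

The resulting scores can order search results so that widely reused ontologies surface first, supporting the reuse goal described above.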