Wai Gen Yee scite author profile

This paper introduces a new algorithm for clustering data in high-dimensional feature spaces, called GARDENHD. The algorithm is organized around the notion of data space reduction, i.e. the process of detecting dense areas (dense cells) in the space. It performs effective and efficient elimination of empty areas that characterize typical high-dimensional spaces and an efficient adjacency-connected agglomeration of dense cells into larger clusters. It produces a compact representation that can effectively capture the essence of data. GARDENHD is a hybrid of cell-based and density-based clustering. However, unlike typical clustering methods in its class, it applies a recursive partition of sparse regions in the space using a new space-partitioning strategy. The properties of this partitioning strategy greatly facilitate data space reduction. The experiments on synthetic and real data sets reveal that GARDENHD and its data space reduction are effective, efficient, and scalable.
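The general idea of finding dense cells and merging adjacent ones can be sketched as follows. This is a minimal illustration of generic cell-based clustering on a uniform grid, not GARDENHD itself: the abstract's recursive partitioning of sparse regions is not reproduced, and the function name and the `cell_width` and `min_points` parameters are assumptions made for the example.

```python
from collections import defaultdict, deque


def dense_cell_clusters(points, cell_width, min_points):
    """Group points into clusters of face-adjacent dense grid cells.

    Hypothetical helper for illustration; uses a fixed uniform grid.
    """
    # Map each point to the integer coordinates of the cell containing it.
    cells = defaultdict(list)
    for p in points:
        key = tuple(int(x // cell_width) for x in p)
        cells[key].append(p)

    # Keep only dense cells; empty regions are never materialized.
    dense = {k for k, pts in cells.items() if len(pts) >= min_points}

    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        # Breadth-first search over dense cells that share a face.
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            cell = queue.popleft()
            members.extend(cells[cell])
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(members)
    return clusters
```

Because only occupied cells are stored in the dictionary, the cost depends on the number of points rather than on the total number of cells in the space, which is what makes this style of approach plausible in high dimensions.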


Bridging the Gap between Response Time and Energy-Efficiency in Broadcast Schedule Design

Yee

Navathe

Omiecinski

et al. 2002


The partitioned exponential file for database storage management

2006


Efficient data allocation over multiple channels at broadcast servers

Yee

Navathe

Omiecinski

et al. 2002

IEEE Trans. Comput.

108


Broadcast is a scalable way of disseminating data because broadcasting an item satisfies all outstanding client requests for it. However, because the transmission medium is shared, individual requests may have high response times. In this paper, we show how to minimize the average response time given multiple broadcast channels by optimally partitioning data among them. We also offer an approximation algorithm that is less complex than the optimal and show that its performance is near-optimal for a wide range of parameters. Finally, we briefly discuss the extensibility of our work with two simple, yet seldom researched extensions, namely, handling varying sized items and generating single channel schedules.
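The partitioning problem can be illustrated with a small dynamic program. This is a sketch under stated assumptions, not the paper's algorithm: items have unit size, each channel cycles through its items in a flat schedule (so an item on a channel with N items waits N/2 ticks on average), and only contiguous ranges of the popularity-sorted list are considered. The function name `partition_items` is hypothetical.

```python
from itertools import accumulate


def partition_items(probs, num_channels):
    """Split popularity-sorted items into contiguous channel groups by DP.

    Assumes 1 <= num_channels <= len(probs), unit-size items, and a
    flat cyclic schedule on every channel (illustrative assumptions).
    """
    p = sorted(probs, reverse=True)
    n = len(p)
    prefix = [0.0] + list(accumulate(p))

    def cost(i, j):
        # Items i..j-1 share one channel: average wait is (j - i) / 2 ticks,
        # weighted by the total access probability of those items.
        return (j - i) / 2 * (prefix[j] - prefix[i])

    inf = float("inf")
    # best[k][j]: minimum cost of placing the first j items on k channels.
    best = [[inf] * (n + 1) for _ in range(num_channels + 1)]
    cut = [[0] * (n + 1) for _ in range(num_channels + 1)]
    best[0][0] = 0.0
    for k in range(1, num_channels + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = best[k - 1][i] + cost(i, j)
                if c < best[k][j]:
                    best[k][j], cut[k][j] = c, i

    # Walk the cut points backwards to recover the groups.
    groups, j = [], n
    for k in range(num_channels, 0, -1):
        i = cut[k][j]
        groups.append(p[i:j])
        j = i
    return best[num_channels][n], groups[::-1]
```

Under this cost model, popular items tend to land on channels with few items, so they repeat often, while the long tail shares slower channels.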


Scaling replica maintenance in intermittently synchronized mobile databases

Yee

Donahoo

Omiecinski

et al. 2001


To avoid the high cost of continuous connectivity, a class of mobile applications employs replicas of shared data that are periodically updated. Updates to these replicas are typically performed on a client-by-client basis (that is, the server individually computes and transmits updates to each client), limiting scalability. By basing updates on replica groups (instead of clients), however, update generation complexity is no longer bound by client population size. Clients then download updates of pertinent groups. Proper group design reduces redundancies in server processing, disk usage, and bandwidth usage, and diminishes the tie between the complexity of updating replicas and the size of the client population. In this paper, we expand on previous work done on group design, include a detailed I/O cost model for update generation, and propose a heuristic-based greedy algorithm for group computation. Experimental results with an adapted commercial replication system demonstrate a significant increase in overall scalability over the client-centric approach.

Figure 1: The update server maintains the primary copy and distributes updates on demand to intermittently connected clients that maintain replicas.


A Tool for Information Retrieval Research in Peer-to-Peer File Sharing Systems

Nguyen

Yee

Jia

2007


Efficient Query Routing by Improved Peer Description in P2P Networks

Yee

Nguyen

Jia

2008


Peer-to-peer file-sharing systems commonly use the set-of-terms model (the union of the terms in the shared files) to describe succinctly a peer's shared files. This information is shared with neighbors who use it to guide query routing decisions. The problem with this model, however, is that it falsely suggests term co-occurrences that do not exist in any single file. Consequently, queries get routed erroneously to peers that have no matching files, wasting network and computation resources in the process. We reduce the number of co-occurrence errors by partitioning each peer's file set and representing the peer as several file partitions instead of one. Experimental evidence demonstrates that it is possible to reduce the network traffic between neighbors by over 50% at virtually no cost.
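A toy example may make the false co-occurrence problem concrete. The file contents, the query, and the hand-picked partition below are all invented for illustration; the paper's actual partitioning method is not described in the abstract and is not reproduced here.

```python
def peer_matches(descriptions, query):
    """A peer looks like a match if any one description holds every query term."""
    return any(query <= d for d in descriptions)


# Three shared files, each represented by its set of terms (made-up data).
files = [{"red", "car"}, {"blue", "boat"}, {"red", "truck"}]
query = {"red", "boat"}

# Set-of-terms model: a single description, the union of all file terms.
single = [set().union(*files)]

# Partitioned model: one description per partition. This split is chosen
# by hand for the example; it keeps the two "red" files together.
partitioned = [files[0] | files[2], files[1]]

# Ground truth: does any single file contain every query term?
truly_matches = any(query <= f for f in files)
```

Here no single file contains both "red" and "boat", yet the single union description contains both terms, so a neighbor would route the query to this peer. With the two-partition description, neither partition contains both terms and the query is not routed.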



Wai Gen Yee

Efficient data access to multi-channel broadcast programs

Clustering high-dimensional data using an efficient and effective data space reduction
