Philippe Cudré-Mauroux scite author profile

Location Based Social Networks (LBSNs) have been widely used as a primary data source to study the mutual impact of mobility and social relationships. Traditional approaches manually define features to characterize users' mobility homophily and social proximity, and show that mobility and social features can help friendship and location prediction tasks, respectively. However, these handcrafted features not only require tedious human effort but are also difficult to generalize. In this paper, by revisiting user mobility and social relationships based on a large-scale LBSN dataset collected over a long period, we propose LBSN2Vec, a hypergraph embedding approach designed specifically for LBSN data for automatic feature learning. Specifically, LBSN data intrinsically forms a hypergraph including both user-user edges (friendships) and user-time-POI-semantic hyperedges (check-ins). Based on this hypergraph, we first propose a random-walk-with-stay scheme to jointly sample user check-ins and social relationships, and then learn node embeddings from the sampled (hyper)edges by preserving n-wise node proximity (n = 2 or 4). Our evaluation results show that LBSN2Vec consistently and significantly outperforms state-of-the-art graph embedding methods on both friendship and location prediction tasks, with average improvements of 32.95% and 25.32%, respectively. Moreover, using LBSN2Vec, we discover the asymmetric impact of mobility and social relationships on predicting each other, which can serve as a guideline for future research on friendship and location prediction in LBSNs.
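
To make the idea of jointly sampling friendships and check-ins concrete, here is a minimal sketch of a random walk that sometimes "stays" at a user to emit one of that user's check-in hyperedges instead of moving to a friend. The function name, the single stay_prob parameter, and the data layout (a friend list per user, check-ins as (user, time, poi, category) tuples) are illustrative assumptions; the paper's actual walk and its parameters may differ.

```python
import random

def random_walk_with_stay(friends, checkins, start, walk_length, stay_prob, rng):
    """Hypothetical sketch: sample friendship edges and check-in hyperedges along one walk.

    friends:  dict user -> list of friend users (undirected social graph)
    checkins: dict user -> list of (user, time, poi, category) hyperedges
    rng:      a random.Random instance, passed in for reproducibility
    """
    samples = []
    user = start
    for _ in range(walk_length):
        # "Stay": with probability stay_prob, emit one of the current user's
        # check-in hyperedges instead of moving along the social graph.
        if checkins.get(user) and rng.random() < stay_prob:
            samples.append(("checkin", rng.choice(checkins[user])))
            continue
        neighbours = friends.get(user, [])
        if not neighbours:
            break  # dead end: the user has no friends to walk to
        nxt = rng.choice(neighbours)
        samples.append(("friendship", (user, nxt)))
        user = nxt
    return samples
```

The sampled friendship edges (pairs) and check-in hyperedges (4-tuples) would then feed an embedding objective that preserves pairwise and 4-wise proximity, respectively; that learning step is not shown here.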

TrajStore: An adaptive storage system for very large trajectory data sets

Cudré-Mauroux

Wu

Madden

2010

The rise of GPS and broadband-speed wireless devices has led to tremendous excitement about a range of applications broadly characterized as "location based services". Current database storage systems, however, are inadequate for manipulating the very large and dynamic spatio-temporal data sets required to support such services. Proposals in the literature either present new indices without discussing how to cluster data, potentially resulting in many disk seeks for lookups of densely packed objects, or use static quadtrees or other partitioning structures, which rapidly become suboptimal as the data or queries evolve. As a result of these performance limitations, we built TrajStore, a dynamic storage system optimized for efficiently retrieving all data in a particular spatio-temporal region. TrajStore maintains an optimal index on the data and dynamically co-locates and compresses spatially and temporally adjacent segments on disk. By letting the storage layer evolve with the index, the system adapts to incoming queries and data and is able to answer most queries via a very limited number of I/Os, even when the queries target regions containing hundreds or thousands of different trajectories.
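
As a rough illustration of why co-locating spatially adjacent data keeps region queries cheap, the sketch below uses a plain adaptive quadtree: a leaf cell splits when it exceeds a capacity, each leaf's point list stands in for one co-located disk page, and a region query counts the leaves it touches as a proxy for I/Os. This is a generic structure chosen for illustration, not TrajStore's cost-based index (which also adapts to the query workload, merges cells, and compresses segments); the class, the capacity, and the MAX_DEPTH guard are assumptions.

```python
from dataclasses import dataclass, field

MAX_DEPTH = 16  # assumed guard so many identical points cannot split forever

@dataclass
class Cell:
    """Hypothetical quadtree cell; a leaf's `points` list models one co-located page."""
    x0: float
    y0: float
    x1: float
    y1: float
    capacity: int
    depth: int = 0
    points: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def insert(self, p):
        """Insert a point (x, y, payload), splitting the leaf if it overflows."""
        if self.children:
            self._child_for(p).insert(p)
            return
        self.points.append(p)
        if len(self.points) > self.capacity and self.depth < MAX_DEPTH:
            self._split()

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        d, c = self.depth + 1, self.capacity
        self.children = [
            Cell(self.x0, self.y0, mx, my, c, d), Cell(mx, self.y0, self.x1, my, c, d),
            Cell(self.x0, my, mx, self.y1, c, d), Cell(mx, my, self.x1, self.y1, c, d),
        ]
        pts, self.points = self.points, []
        for p in pts:  # redistribute the overflowing page into the four children
            self._child_for(p).insert(p)

    def _child_for(self, p):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(1 if p[0] >= mx else 0) + (2 if p[1] >= my else 0)]

    def leaves(self):
        if not self.children:
            return [self]
        return [leaf for c in self.children for leaf in c.leaves()]

    def query(self, qx0, qy0, qx1, qy1, pages_read):
        """Return points inside the box; append every leaf touched to pages_read."""
        if qx1 < self.x0 or qx0 > self.x1 or qy1 < self.y0 or qy0 > self.y1:
            return []
        if not self.children:
            pages_read.append(self)
            return [p for p in self.points if qx0 <= p[0] <= qx1 and qy0 <= p[1] <= qy1]
        out = []
        for c in self.children:
            out.extend(c.query(qx0, qy0, qx1, qy1, pages_read))
        return out
```

Because nearby points end up in the same leaf, a small query region reads only a few of the leaves rather than scanning the whole data set.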

HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms with Concept Drift

Yang

Rettig

et al. 2017

Histogram-based similarity has been widely adopted in many machine learning tasks. However, measuring histogram similarity is a challenging task for streaming data, where the elements of a histogram are observed in a streaming manner. First, the ever-growing cardinality of histogram elements makes any similarity computation inefficient. Second, concept drift in the data streams also impairs the accurate assessment of similarity. In this paper, we propose to overcome these challenges with HistoSketch, a fast similarity-preserving sketching method for streaming histograms with concept drift. Specifically, HistoSketch is designed to incrementally maintain a set of compact, fixed-size sketches of streaming histograms to approximate similarity between the histograms, with special consideration given to gradually forgetting outdated histogram elements. We evaluate HistoSketch on multiple classification tasks using both synthetic and real-world datasets. The results show that our method is able to efficiently approximate similarity for streaming histograms and quickly adapt to concept drift. Compared to full streaming histograms that gradually forget outdated histogram elements, HistoSketch dramatically reduces classification time (a 7500x speedup) with only a modest loss in accuracy (about 3.5%).
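
The sketch below shows the general shape of a fixed-size, similarity-preserving sketch with exponential forgetting. It swaps in a simpler sampler than the paper's consistent weighted sampling: each slot runs an "exponential race" in which element e scores -ln(u)/w_e for a hash-derived u, and the element with the smallest score owns the slot, so two sketches agree on a slot with probability equal to a weighted Jaccard-type similarity of their histograms. Decaying every weight by lambda only rescales all scores, so each stored minimum can be updated in place. The class name, the decay parameter, and the BLAKE2b-based hashing are assumptions, and for simplicity this toy also keeps the decayed histogram, which a real sketch would avoid.

```python
import hashlib
import math

def _uniform(slot, element):
    """Deterministic pseudo-random value in (0, 1], shared by all sketches for (slot, element)."""
    digest = hashlib.blake2b(f"{slot}:{element}".encode(), digest_size=8).digest()
    return (int.from_bytes(digest, "big") + 1) / (2**64 + 1)

class DecayingSketch:
    """Hypothetical fixed-size sketch of a streaming histogram with exponential forgetting."""

    def __init__(self, size, decay=1.0):
        self.size = size
        self.decay = decay                # lambda in (0, 1]; 1.0 disables forgetting
        self.weights = {}                 # decayed histogram, kept only to keep the toy simple
        self.minima = [math.inf] * size   # smallest -ln(u) / weight seen in each slot
        self.labels = [None] * size       # element that currently owns each slot

    def update(self, element, increment=1.0):
        # Forget: scale every weight by lambda. Each score is -ln(u) / weight, so every
        # stored minimum grows by 1 / lambda and no slot changes owner.
        for e in self.weights:
            self.weights[e] *= self.decay
        self.minima = [m / self.decay for m in self.minima]
        w = self.weights.get(element, 0.0) + increment
        self.weights[element] = w
        # Only the updated element's scores changed (they shrank), so compare just those.
        for k in range(self.size):
            score = -math.log(_uniform(k, element)) / w
            if score < self.minima[k]:
                self.minima[k], self.labels[k] = score, element

def similarity(a, b):
    """Fraction of slots owned by the same element in both sketches."""
    matches = sum(1 for x, y in zip(a.labels, b.labels) if x is not None and x == y)
    return matches / a.size
```

Larger sketches give lower-variance similarity estimates; the decay factor controls how quickly old elements lose their slots after a concept drift.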

GridVine: An Infrastructure for Peer Information Management

Cudré-Mauroux

Agarwal

Aberer

2007

IEEE Internet Comput.

GridVine is a semantic overlay infrastructure based on a peer-to-peer (P2P) access structure. Built following the principle of data independence, it separates a logical layer, in which data, schemas, and schema mappings are managed, from a physical layer consisting of a structured P2P network supporting decentralized indexing, key load-balancing, and efficient routing. The system is decentralized, yet fosters semantic interoperability through pair-wise schema mappings and query reformulation. GridVine's heterogeneous but semantically related information sources can be queried transparently using iterative query reformulation. The authors discuss a reference implementation of the system and several mechanisms for resolving queries collaboratively.
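
Iterative query reformulation over pair-wise mappings can be pictured as a breadth-first traversal of a graph whose nodes are schemas and whose edges are attribute mappings. The minimal sketch below models only that local rewriting step, not GridVine's routing over the P2P network; the function name, the dictionary layout, and the choice to drop predicates a mapping cannot translate are all assumptions made for illustration.

```python
from collections import deque

def reformulate(query, source_schema, mappings):
    """Hypothetical sketch: propagate an attribute-value query through pair-wise mappings.

    query:    dict attribute -> value, expressed in source_schema
    mappings: dict (schema_a, schema_b) -> dict attribute_in_a -> attribute_in_b
    Returns dict schema -> reformulated query for every schema reached.
    """
    results = {source_schema: dict(query)}
    frontier = deque([source_schema])
    while frontier:
        schema = frontier.popleft()
        current = results[schema]
        for (src, dst), attr_map in mappings.items():
            if src != schema or dst in results:
                continue  # not an outgoing mapping, or dst already reformulated
            # Keep only the predicates whose attributes this mapping can translate.
            translated = {attr_map[a]: v for a, v in current.items() if a in attr_map}
            if translated:
                results[dst] = translated
                frontier.append(dst)
    return results
```

For example, with mappings A to B (title to name, year to date) and B to C (name to label), a query on A's title and year reaches C as a query on label alone, because the year predicate has no mapping into C.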

Efficient Versioning for Scientific Array Databases

Seering

Cudré-Mauroux

Madden

2012

In this paper, we describe a versioned database storage manager we are developing for the SciDB scientific database. The system is designed to efficiently store and retrieve array-oriented data, exposing a "no-overwrite" storage model in which each update creates a new "version" of an array. This makes it possible to compare versions produced at different times or by different algorithms, and to create complex chains and trees of versions. We present algorithms to efficiently encode these versions, minimizing storage space or I/O cost while still providing efficient access to the data. Additionally, we present an optimal algorithm that, given a long sequence of versions, determines which versions to encode in terms of each other (using delta compression) to minimize total storage space. We compare the performance of these algorithms on real-world data sets from the National Oceanic and Atmospheric Administration (NOAA), OpenStreetMap, and several other sources. We show that our algorithms provide better performance than existing version control systems not optimized for array data, both in terms of storage size and access time, and that our delta-compression algorithms substantially reduce the total storage space when versions exist with a high degree of similarity.

I. INTRODUCTION

In the SciDB project (http://scidb.org), we are building a new database system designed to manage very large array-oriented data, which arises in many scientific applications. Rather than trying to represent such arrays inside a relational model (which we found to be inefficient in our previous work [1]), the key idea in SciDB is to build a database from the ground up using arrays as the primary storage representation, with a query language for manipulating those arrays. Such an array-oriented data model and query language is useful in many scientific applications, such as astronomy and biology, where the raw data consists of large collections of imagery or sequence data that need to be filtered, subsetted, and processed.

As part of the SciDB project, we have spent a large amount of time talking to scientists about their requirements for a data management system (see the "Use Cases" section of the scidb.org website), and one of the features that is consistently cited is the need to access historical versions of data, representing, for example, previous sensor readings or data derived from historical raw data (implying the need for a no-overwrite storage model). In this paper, we present the design of the no-overwrite storage manager we have developed for SciDB based on the
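
One standard way to pose the "which versions to encode in terms of each other" question is as a minimum spanning tree: add a virtual root, let the edge from the root to a version cost its materialized size, let the edge between two versions cost their delta size, and keep the cheapest tree. The sketch below does this with Prim's algorithm on flat lists of cell values; it is not necessarily the paper's algorithm, and the cost proxies (non-zero cells for a full copy, differing cells for a delta) and function names are assumptions.

```python
def full_cost(version):
    """Assumed proxy for materialized size: number of non-zero cells."""
    return sum(1 for x in version if x != 0)

def delta_cost(a, b):
    """Assumed proxy for delta size: number of cells that differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def plan_storage(versions):
    """Prim's algorithm on the version graph plus a virtual root.

    Returns (parent, total): parent[v] is None if v is stored materialized,
    otherwise the version that v is delta-encoded against.
    """
    n = len(versions)
    best = {v: full_cost(versions[v]) for v in range(n)}  # cheapest known way to store v
    parent = {v: None for v in range(n)}
    done = set()
    total = 0
    while len(done) < n:
        # Attach the cheapest not-yet-stored version (ties go to the lowest index).
        v = min((u for u in range(n) if u not in done), key=lambda u: best[u])
        done.add(v)
        total += best[v]
        for u in range(n):
            if u not in done:
                c = delta_cost(versions[v], versions[u])
                if c < best[u]:
                    best[u], parent[u] = c, v
    return parent, total
```

The spanning-tree formulation relies on delta costs being symmetric, as they are for this cell-count proxy; if encoding u against v costs something different from encoding v against u, a minimum arborescence algorithm would be needed instead.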
