Xubin Pei scite author profile

Xubin Pei

5 Publications

7 Citation Statements Received

13 Citation Statements Given


Publications


DualTable: A hybrid storage model for update optimization in Hive

Liu, Rabl, et al. 2015


Hive is the most mature and prevalent data warehouse tool providing a SQL-like interface in the Hadoop ecosystem. It is successfully used in many Internet companies and shows its value for big data processing in traditional industries. However, enterprise big data processing systems, such as Smart Grid applications, usually require complex business logic and involve many data manipulation operations such as updates and deletes. Hive cannot offer sufficient support for these while preserving high query performance. Hive using the Hadoop Distributed File System (HDFS) for storage cannot implement data manipulation efficiently, and Hive on HBase suffers from poor query performance even though it supports faster data manipulation. There is a project based on Hive issue Hive-5317 to support update operations, but it has not been finished in Hive's latest version. Since this ACID-compliant extension adopts the same data storage format on HDFS, the update performance problem is not solved. In this paper, we propose a hybrid storage model called DualTable, which combines the efficient streaming reads of HDFS and the random write capability of HBase. Hive on DualTable provides better data manipulation support and preserves query performance at the same time. Experiments on a TPC-H data set and on a real smart grid data set show that Hive on DualTable is up to 10 times faster than Hive when executing update and delete operations.


Applying Big Data Analytics Into Network Security: Challenges, Techniques and Outlooks

Zhang, Shen, Pei, et al. 2016


Queue reorganization for subscription congestion avoidance in publish/subscribe systems

Yan, Muthusamy, Chen, et al. 2013


An Efficient Distributed Database Clustering Algorithm for Big Data Processing

Sun, Fu, Deng, et al. 2017


This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is taken as an input vector, and its features are extracted and fed to the input layer of the deep neural network. The connection weights are trained by the backpropagation (BP) algorithm, and the network output is trained by adjusting these weights. Finally, the data clustering results are judged according to the similarity of the current vector corresponding to the output data. Experimental results on small-scale distributed systems show that this method achieves better test set accuracy than the traditional k-means clustering method and is more suitable for large-scale data clustering in distributed environments.


Large-Scale Data Storage and Management Scheme Based on Distributed Database Systems

Sun, Deng, Fu, et al. 2017


Existing distributed database management systems can provide data storage with high access bandwidth through clusters, and offer reliable data replication, fault detection, and fast automatic system recovery. However, with the existing network platform software and computing model, there is still a need to establish a virtual organization that can implement specific requirements based on the data manipulation mechanism and data access patterns in the network. This paper proposes a large-scale data storage and management scheme based on a distributed database. Compared with other technical solutions, it is more suitable for the use and control of large-scale data access and for other types of application and data access modes as needed. The mode can cover multiple orders of magnitude of data exchange, data processing, and input/output systems. The experimental results show that this scheme has the advantages of good reliability, easy operation, and support for processing large amounts of data.
