Abstract. This paper proposes a distributed data clustering technique based on a deep neural network. First, each record in the distributed database is treated as an input vector, and its features are extracted and fed to the input layer of the deep neural network. The connection weights are trained with the backpropagation (BP) algorithm, and the network's output is trained by adjusting these weights. Finally, the clustering results are determined according to the similarity of the vectors corresponding to the output data. Experimental results on small-scale distributed systems show that this method achieves better test-set accuracy than the traditional k-means clustering method and is more suitable for large-scale data clustering in distributed environments.
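The final step described above, assigning records to clusters by the similarity of their output vectors, might look roughly like the sketch below. This is only an illustration under stated assumptions: the abstract does not name the similarity measure or the grouping rule, so cosine similarity, the `threshold` parameter, and the function names `cosine_similarity` and `group_by_similarity` are all hypothetical choices, and the network itself is left out.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def group_by_similarity(outputs, threshold=0.9):
    """Assign each output vector to the most similar existing cluster.

    A vector whose best similarity is below `threshold` (a hypothetical
    parameter) starts a new cluster and becomes its representative.
    Returns one integer cluster label per input vector.
    """
    representatives = []
    labels = []
    for vec in outputs:
        best_label, best_sim = None, threshold
        for label, rep in enumerate(representatives):
            sim = cosine_similarity(vec, rep)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        if best_label is None:
            representatives.append(vec)
            best_label = len(representatives) - 1
        labels.append(best_label)
    return labels
```

Here `outputs` stands in for the vectors produced by the trained network, one per record; in a distributed setting each node could presumably run this grouping on its local outputs before results are merged.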
With its high bandwidth and low latency, the InfiniBand protocol has been widely used in distributed databases and high-performance computing in recent years. Compared with the traditional PCI bus, 10 Gigabit Ethernet, and Myrinet, InfiniBand not only leads in latency and bandwidth but also offers better quality of service. This paper first introduces the basics and principles of the InfiniBand protocol and its current applications in high-performance computing. Building on InfiniBand, it then proposes a high-performance message passing method based on remote direct memory access (RDMA) that performs very well on indicators such as latency and peak bandwidth. The experimental results show that, compared with the traditional technology, the proposed method reduces latency by more than 20% and more than doubles bandwidth, and that it improves system performance by reducing the transmission time of control messages.
Efficiently scheduling resources in large-scale data centers is a key problem for distributed resource management systems. In cloud computing environments, the quantity of resources and the number of users grow dramatically. This raises challenges such as accessing vast amounts of resource information, handling highly concurrent user requests, and coping with the heavy load that updates to massive numbers of resources place on the system. Traditional resource management systems based on centralized or hierarchical structures scale poorly and cannot satisfy new large-scale applications. Existing distributed resource scheduling methods (such as those based on P2P routing or DHTs) cannot adequately handle highly concurrent user requests and high-frequency resource updates. We propose a resource scheduling model based on a key-value store. The model uses the key-value store to solve the problems of storing massive numbers of resources and efficiently accessing resource information. A distributed resource scheduling method based on range partitioning locates appropriate resources rapidly and reduces the cost of resource updates through an extended invalid push protocol. The evaluation on data from PlanetLab shows that, compared to current methods, the model improves scheduling efficiency.
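To make the range-partitioning idea concrete, the following is a minimal single-process sketch, not the paper's implementation. It assumes resources are indexed by one numeric attribute (for example, free memory in GB) and that each partition would live on a separate key-value store node; the class name `RangePartitionedIndex`, its methods, and the split points are all hypothetical, and the push protocol for updates is not modeled.

```python
import bisect


class RangePartitionedIndex:
    """Toy range-partitioned index over a single numeric resource attribute."""

    def __init__(self, boundaries):
        # n sorted split points define n + 1 contiguous key ranges.
        self.boundaries = sorted(boundaries)
        # Each dict stands in for one key-value store node (an assumption).
        self.partitions = [dict() for _ in range(len(self.boundaries) + 1)]

    def partition_for(self, key):
        """Return the index of the partition whose range contains `key`."""
        return bisect.bisect_right(self.boundaries, key)

    def put(self, key, resource_id):
        """Register a resource under its attribute value."""
        bucket = self.partitions[self.partition_for(key)]
        bucket.setdefault(key, set()).add(resource_id)

    def find_at_least(self, minimum):
        """Return ids of resources whose attribute is >= `minimum`.

        Only partitions that can contain qualifying keys are visited,
        which is the point of partitioning by range.
        """
        matches = []
        for part in self.partitions[self.partition_for(minimum):]:
            for key in sorted(part):
                if key >= minimum:
                    matches.extend(sorted(part[key]))
        return matches
```

With split points `[4, 8]`, a request for at least 4 units skips the first partition entirely and scans only the ranges that can hold a match.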