Random walk is a means of network node sampling that requires little index maintenance and can function on almost all connected network topologies. With careful guidance, a random walk can generate node samples that follow a desired probability distribution, requiring only that the sampling probabilities of the currently visited node and its direct neighbors be known at each walk step. This paper describes a broad range of network applications that can benefit from such guided random walks in dynamic and decentralized settings. It also examines several key issues in implementing random walks in self-organizing networks: the convergence time of random walks, the impact of dynamic network changes (particularly the resulting walker losses), and the difficulty of pacing walk steps without clocks synchronized across network nodes. Our results suggest that, with proper management, these issues do not cause significant problems in many realistic network environments.
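One standard way to guide a walk toward a target distribution using only local information is a Metropolis-Hastings step: propose a uniformly random neighbor, then accept or reject the move. The sketch below is illustrative, not the paper's implementation. It assumes an undirected, connected graph stored as a dict of neighbor lists, strictly positive (possibly unnormalized) target weights, and that a node can also learn each neighbor's degree, which the acceptance ratio for a uniform-neighbor proposal needs. The names `mh_step` and `sample` are hypothetical.

```python
import random
from collections import Counter


def mh_step(graph, target, current, rng):
    """One Metropolis-Hastings walk step using only local information.

    graph:  dict node -> list of neighbors (assumed undirected, connected)
    target: dict node -> positive, possibly unnormalized, desired weight
    Reads only the current node's and one proposed neighbor's degree/weight.
    """
    neighbors = graph[current]
    proposal = rng.choice(neighbors)          # proposal prob = 1/deg(current)
    d_i, d_j = len(neighbors), len(graph[proposal])
    # Acceptance ratio corrects both the target bias and the degree bias
    # of the uniform-neighbor proposal.
    accept = min(1.0, (target[proposal] * d_i) / (target[current] * d_j))
    if rng.random() < accept:
        return proposal
    return current                            # rejected: walker stays put


def sample(graph, target, start, steps, rng):
    """Run one walker for `steps` steps and count visits per node."""
    counts = Counter()
    node = start
    for _ in range(steps):
        node = mh_step(graph, target, node, rng)
        counts[node] += 1
    return counts


if __name__ == "__main__":
    # Star graph: an unguided walk would sit on the hub half the time.
    star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
    uniform = {v: 1.0 for v in star}
    c = sample(star, uniform, 0, 100_000, random.Random(42))
    print({v: round(c[v] / 100_000, 3) for v in star})
```

On the star graph, a plain random walk spends half its steps at the hub, whereas the guided walk spends roughly a quarter of its steps at each node, because moves from a leaf to the high-degree hub are accepted only one third of the time.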
Abstract: Network structure construction and global state maintenance are expensive in large-scale, dynamic peer-to-peer (p2p) networks. With inherent topology independence and low state maintenance overhead, random walk is an excellent tool in such network environments. However, current uses are limited to unguided or heuristic random walks with no guarantee on their converged node visitation probability distribution. Such a convergence guarantee is essential for strong analytical properties and high performance of many p2p applications. In this paper, we investigate an approach that lets random walks converge to application-desired node visitation probability distributions while requiring only information about the direct neighbors of each peer. Our approach is guided by the Metropolis-Hastings algorithm for Markov chain Monte Carlo (MCMC) sampling. Our contributions are threefold. First, we analyze the convergence time of the random walk node visitation probability distribution on common p2p network topologies. Second, we analyze the fault tolerance of our random walks in dynamic networks with potential walker losses. Third, we demonstrate the effectiveness of random walks in three realistic network applications: random membership subset management, search, and load balancing. Both search and load balancing require random walks with biased node visitation distributions to achieve application-specific goals. Our analysis, simulations, and Internet experiment demonstrate the advantage of our random walks over alternative topology-independent, index-free approaches.
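Convergence time can be made concrete on small graphs by building the Metropolis-Hastings transition matrix exactly and iterating the walker's distribution until its total variation distance to the target falls below a threshold. This is a minimal numerical sketch under stated assumptions, not the paper's analysis: the threshold `eps`, the step cap `max_steps`, and the function names `mh_transition` and `mixing_time` are hypothetical, and the graph is assumed undirected and connected with strictly positive target weights.

```python
def mh_transition(graph, target):
    """Exact Metropolis-Hastings transition probabilities P[i][j].

    Uses a uniform-neighbor proposal; rejected mass becomes a self-loop.
    """
    P = {i: {} for i in graph}
    for i, nbrs in graph.items():
        d_i = len(nbrs)
        stay = 1.0
        for j in nbrs:
            a = min(1.0, (target[j] * d_i) / (target[i] * len(graph[j])))
            p = a / d_i
            P[i][j] = P[i].get(j, 0.0) + p
            stay -= p
        P[i][i] = P[i].get(i, 0.0) + stay
    return P


def mixing_time(graph, target, start, eps=0.01, max_steps=10_000):
    """First step t at which TV(dist_t, pi) < eps, or None if never reached."""
    total = sum(target.values())
    pi = {v: w / total for v, w in target.items()}
    P = mh_transition(graph, target)
    dist = {v: 0.0 for v in graph}
    dist[start] = 1.0                      # walker starts at a known node
    for t in range(1, max_steps + 1):
        new = {v: 0.0 for v in graph}
        for i, mass in dist.items():
            if mass:
                for j, p in P[i].items():
                    new[j] += mass * p
        dist = new
        tv = 0.5 * sum(abs(dist[v] - pi[v]) for v in graph)
        if tv < eps:
            return t, dist
    return None, dist


if __name__ == "__main__":
    ring = {i: [(i - 1) % 10, (i + 1) % 10] for i in range(10)}
    biased = {i: float(i + 1) for i in range(10)}
    print(mixing_time(ring, biased, start=0)[0])
```

One caveat the sketch makes visible: on an even-length ring with a uniform target, every proposal is accepted, the chain is periodic, and the distance never drops below `eps`, so `mixing_time` returns `None`. A biased target (or a "lazy" walk that stays put with some probability) introduces self-loops that make the chain aperiodic.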
Summary: Considerable amounts of data are being generated during the development and operation of unconventional reservoirs. Statistical methods that can provide data-driven insights into production performance are gaining in popularity. Unfortunately, the application of advanced statistical algorithms remains somewhat of a mystery to petroleum engineers and geoscientists. The objective of this paper is to provide some clarity on this issue, focusing on how to build robust predictive models and how to develop decision rules that help identify the factors separating good wells from poor performers. The data for this study come from wells completed in the Wolfcamp Shale Formation in the Permian Basin. Data categories used in the study included well location and assorted metrics capturing various aspects of well architecture, well completion, stimulation, and production. Predictive models for the production metric of interest are built using simple regression and more advanced methods such as random forests (RFs), support-vector regression (SVR), gradient-boosting machine (GBM), and multidimensional Kriging. The data-fitting process involves splitting the data into a training set and a test set, building a regression model on the training set, and validating it on the test set. Repeated application of this "cross-validation" procedure yields valuable information regarding the robustness of each regression-modeling approach. Furthermore, decision rules that can identify extreme behavior in production wells (i.e., the top x% of wells vs. the bottom x%, as ranked by the production metric) are generated using the classification and regression tree (CART) algorithm. The resulting decision tree (DT) provides useful insights into which variables (or combinations of variables) can drive production performance into such extreme categories.
The main contributions of this paper are to provide guidelines on how to build robust predictive models, and to demonstrate the utility of DTs for identifying factors responsible for good vs. poor wells.
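The split, fit, validate, repeat loop described above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the study's workflow: it uses a single predictor and ordinary least squares in place of the RF, SVR, GBM, and Kriging models, repeated random holdout splits as one form of cross-validation, and RMSE as the test score. The helper `label_extremes` shows how wells might be tagged as top x% vs. bottom x% before a CART tree is trained on them. All function names are hypothetical.

```python
import random
import statistics


def fit_simple_ols(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(pairs, slope, intercept):
    """Root-mean-square error of the fitted line on (x, y) pairs."""
    sq = [(y - (slope * x + intercept)) ** 2 for x, y in pairs]
    return (sum(sq) / len(sq)) ** 0.5


def repeated_holdout(data, test_frac=0.2, repeats=50, seed=0):
    """Repeat: shuffle, split train/test, fit on train, score on test.

    Returns the mean and standard deviation of the test RMSE; a small
    spread suggests the modeling approach is robust to the choice of split.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        n_test = max(1, int(len(shuffled) * test_frac))
        test, train = shuffled[:n_test], shuffled[n_test:]
        slope, intercept = fit_simple_ols([x for x, _ in train],
                                          [y for _, y in train])
        scores.append(rmse(test, slope, intercept))
    return statistics.fmean(scores), statistics.stdev(scores)


def label_extremes(values, pct):
    """Map index -> 0 (bottom pct%) or 1 (top pct%); middle wells omitted.

    Assumes pct <= 50 so the two groups do not overlap.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    k = int(len(values) * pct / 100)
    labels = {i: 0 for i in order[:k]}
    labels.update({i: 1 for i in order[len(order) - k:]})
    return labels
```

On synthetic data such as `y = 2x + noise` with unit-variance noise, `repeated_holdout` should report a mean test RMSE close to 1. Comparing the mean and spread across model types is the kind of robustness check the procedure is meant to provide.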