While many statistical models and methods are now available for network analysis, resampling network data remains a challenging problem. Cross-validation is a useful general tool for model selection and parameter tuning, but it is not directly applicable to networks, since splitting network nodes into groups requires deleting edges and destroys some of the network structure. Here we propose a new network resampling strategy based on splitting node pairs rather than nodes, which is applicable to cross-validation for a wide range of network model selection tasks. We provide a theoretical justification for our method in a general setting and examples of how our method can be used in specific network model selection and parameter tuning tasks. Numerical results on simulated networks and on a citation network of statisticians show that this cross-validation approach works well for model selection.

Statistical methods for analyzing networks have received a lot of attention because of their wide-ranging applications in areas such as sociology, physics, biology and medical sciences. Statistical network models provide a principled approach to extracting salient information about the network structure while filtering out the noise. Perhaps the simplest statistical network model is the famous Erdős-Rényi model [Erdős and Rényi, 1960], which served as a building block for a large body of more complex models, including the stochastic block model (SBM) [Holland et al., 1983], the degree-corrected stochastic block model (DCSBM) [Karrer and Newman, 2011], the mixed membership block model [Airoldi et al., 2008], and the latent space model [Hoff et al., 2002], to name a few.

While there has been plenty of work on models for networks and algorithms for fitting them, inference for these models is commonly lacking, making it hard to take advantage of the full power of statistical modeling.
Data splitting methods provide a general, simple, and relatively model-free inference framework and are commonly used in modern statistics, with cross-validation (CV) being the tool of choice for many model selection and parameter tuning tasks. For networks, both tasks are important: while there are plenty of models to choose from, it is much less clear how to select the best model for the data, and how to choose tuning parameters for the selected model, which is often necessary in order to fit it. In classical settings where the data points are assumed to be an i.i.d. sample, cross-validation works by splitting the data into multiple parts (folds), holding out one fold at a time as a test set, fitting the model on the remaining folds and computing its error on the held-out fold, and finally averaging the errors across all folds to obtain the cross-validation error. The model or the tuning parameter is then chosen to minimize this error. To explain the challenge of applying this idea to networks, we first introduce a probabilistic framework.

Recall $n$ is the number of nodes and $A$ is the $n \times n$ adjacency matrix. Let $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$ be the diagonal m...
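As a rough illustration of the node-pair splitting idea in the excerpt above, the following sketch assigns each unordered node pair of an undirected network to a fold and masks one fold of the adjacency matrix. It covers only the splitting step; the excerpt does not say how the model is fitted on the remaining pairs, so nothing here should be read as the authors' procedure. The function names `split_node_pairs` and `training_view` and the choice to fill held-out entries with 0 are assumptions made for illustration.

```python
import numpy as np


def split_node_pairs(n, n_folds, seed=0):
    """Assign every unordered node pair (i, j), i < j, to one of n_folds folds.

    Returns an n x n symmetric integer matrix of fold labels, with -1 on the
    diagonal because self-loops are not treated as node pairs here.
    """
    rng = np.random.default_rng(seed)
    iu, ju = np.triu_indices(n, k=1)               # all pairs with i < j
    labels = rng.permutation(len(iu)) % n_folds    # balanced random labels
    folds = np.full((n, n), -1, dtype=int)
    folds[iu, ju] = labels
    folds[ju, iu] = labels                         # undirected: keep symmetry
    return folds


def training_view(A, folds, k):
    """Mask the node pairs in fold k.

    Returns (A_train, observed): `observed` is True where the entry is kept,
    and held-out entries of A_train are set to 0 (a placeholder assumption).
    """
    observed = folds != k
    A_train = np.where(observed, A, 0)
    return A_train, observed
```

Because folds are drawn over pairs rather than over nodes, no node is removed from the training view as a whole; only some of its pairs are hidden, which is the contrast with node splitting that the abstract draws.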
While graphical models for continuous data (Gaussian graphical models) and discrete data (Ising models) have been extensively studied, there is little work on graphical models for data sets with both continuous and discrete variables (mixed data), which are common in many scientific applications. We propose a novel graphical model for mixed data, which is simple enough to be suitable for high-dimensional data, yet flexible enough to represent all possible graph structures. We develop a computationally efficient regression-based algorithm for fitting the model by focusing on the conditional log-likelihood of each variable given the rest. The parameters have a natural group structure, and sparsity in the fitted graph is attained by incorporating a group lasso penalty, approximated by a weighted lasso penalty for computational efficiency. We demonstrate the effectiveness of our method through an extensive simulation study and apply it to a music annotation data set (CAL500), obtaining a sparse and interpretable graphical model. While we focus on binary discrete variables for the main presentation, we also show that the proposed methodology can be easily extended to general discrete variables.
Prediction algorithms typically assume the training data are independent samples, but in many modern applications samples come from individuals connected by a network. For example, in adolescent health studies of risk-taking behaviors, information on the subjects' social network is often available and plays an important role through network cohesion, the empirically observed phenomenon of friends behaving similarly. Taking cohesion into account in prediction models should allow us to improve their performance. Here we propose a network-based penalty on individual node effects to encourage similarity between predictions for linked nodes, and show that incorporating it into prediction leads to improvement over traditional models both theoretically and empirically when network cohesion is present. The penalty can be used with many loss-based prediction methods, such as regression, generalized linear models, and Cox's proportional hazards model. Applications to predicting levels of recreational activity and marijuana usage among teenagers from the AddHealth study based on both demographic covariates and friendship networks are discussed in detail and show that our approach to taking friendships into account can significantly improve predictions of behavior while providing interpretable estimates of covariate effects.

PREDICTION MODELS FOR NETWORK-LINKED DATA

... observations $(y_1, x_1), (y_2, x_2), \dots, (y_n, x_n)$, where $y_i \in \mathbb{R}$ is the response variable and $x_i \in \mathbb{R}^p$ is the vector of covariates for observation $i$. We write $Y = (y_1, y_2, \dots, y_n)^T$ for the response vector, and $X = (x_1, x_2, \dots, x_n)^T$ for the $n \times p$ design matrix. We treat $X$ as fixed and assume its columns have been standardized to have mean 0 and variance 1. We also observe the network connecting the observations, $G = (V, E)$, where $V = \{1, 2, \dots, n\}$ is the node set of the graph, and $E \subset V \times V$ is the edge set.
We represent the graph by its adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $A_{uv} = 1$ if $(u, v) \in E$ and 0 otherwise. We assume there are no loops, so $A_{vv} = 0$ for all $v \in V$, and the network is undirected, i.e., $A_{uv} = A_{vu}$. The (unnormalized) Laplacian of $G$ is given by $L = D - A$, where $D = \mathrm{diag}(d_1, d_2, \dots, d_n)$ is the degree matrix, with node degree $d_u$ defined by $d_u = \sum_{v \in V} A_{uv}$.
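The definitions above can be checked numerically with a short sketch. It builds the unnormalized Laplacian from an adjacency matrix and evaluates the quadratic form $\alpha^T L \alpha$, which for an undirected graph without loops equals $\sum_{(u,v) \in E} (\alpha_u - \alpha_v)^2$ with each edge counted once, so it is small when linked nodes have similar values. The excerpt does not state the exact form of the network-based penalty, so using this quadratic form as a cohesion measure is an assumption here, and the names `laplacian`, `quadratic_form`, and `edge_difference_sum` are illustrative.

```python
import numpy as np


def laplacian(A):
    """Unnormalized graph Laplacian L = D - A, with D the diagonal degree matrix."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)            # d_u = sum over v of A_uv
    return np.diag(degrees) - A


def quadratic_form(A, alpha):
    """Evaluate alpha^T L alpha for a vector of node-level values alpha."""
    alpha = np.asarray(alpha, dtype=float)
    return float(alpha @ laplacian(A) @ alpha)


def edge_difference_sum(A, alpha):
    """Sum of (alpha_u - alpha_v)^2 over edges, counting each undirected edge once."""
    A = np.asarray(A, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    iu, ju = np.triu_indices(len(alpha), k=1)
    return float(np.sum(A[iu, ju] * (alpha[iu] - alpha[ju]) ** 2))


# Path graph 1 - 2 - 3 with node values 1, 2, 4.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
alpha = np.array([1.0, 2.0, 4.0])
# The two computations agree up to floating-point error.
assert np.isclose(quadratic_form(A, alpha), edge_difference_sum(A, alpha))
```

Computing the same quantity in two ways is a convenient sanity check that an adjacency matrix is symmetric and loop-free before it is used to build $L$.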
Harmful fungi in nature not only cause diseases in plants, but also cause fungal infection and poisoning when people and animals eat food derived from crops contaminated with them. Unfortunately, such fungi are becoming increasingly resistant to traditional synthetic antifungal drugs, which makes prevention and control increasingly difficult. This means they are potentially very harmful to human health and lifestyle. Antifungal peptides are natural substances produced by organisms to defend themselves against harmful fungi. As a result, they have become an important research subject in efforts to deal with harmful fungi and overcome their drug resistance. Moreover, they are expected to be developed into new therapeutic drugs against drug-resistant fungi in clinical application. This review focuses on antifungal peptides that have been isolated from bacteria, fungi, and other microorganisms to date. Their antifungal activity and the factors affecting it are outlined in terms of their antibacterial spectra and effects. The toxic effects of the antifungal peptides and common solutions to them are mentioned. The mechanisms of action of the antifungal peptides are described according to their action pathways. The work provides a useful reference for further clinical research and the development of safe antifungal drugs that have high efficiencies and broad application spectra.