The area of graph property testing seeks to understand the relation between the global properties of a graph and its local statistics. In the classical model, the local statistics of a graph are defined relative to the uniform distribution over the graph's vertex set. A graph property P is said to be testable if the local statistics of a graph allow one to distinguish between graphs satisfying P and those that are far from satisfying it. Goldreich recently introduced a generalization of this model in which one endows the vertex set of the input graph with an arbitrary and unknown distribution, and asked which of the properties that can be tested in the classical model can also be tested in this more general setting. We completely resolve this problem by giving a (surprisingly "clean") characterization of these properties. To this end, we prove a removal lemma for vertex-weighted graphs which is of independent interest.
Introduction
Background and the main result
Property testers are fast randomized algorithms whose goal is to distinguish (with high probability, say, 2/3) between objects satisfying some fixed property P and those that are ε-far from satisfying it. Here, ε-far means that at least an ε-fraction of the input object must be modified in order to obtain an object satisfying P. The study of such problems originated in the seminal papers of Rubinfeld and Sudan [28], Blum, Luby and Rubinfeld [9], and Goldreich, Goldwasser and Ron [20]. Problems of this nature have been studied in so many areas that it would be impossible to survey them here. Instead, the reader is referred to the recent monograph [18] for more background and references. While this area studies questions in theoretical computer science, it has several strong connections with central problems in extremal combinatorics, most notably with the regularity method and the removal lemma; see Subsection 1.2.
The classical property testing model assumes that one can uniformly sample entries of the input. In distribution-free testing one assumes that the input is endowed with some arbitrary and unknown distribution D, which also affects the way one defines the distance to satisfying a property. As discussed in [19], one motivation for this model is that it can handle settings in which one cannot produce uniformly distributed entries from the input. Another motivation is that the distribution D can assign higher weight/importance to the parts of the input that we want to have a higher impact on the distance to satisfying the given property. Until very recently, problems of this type were studied almost exclusively in the setting of testing properties of functions; see [10,11,15,17,24].
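
To make the distribution-free setting concrete, the following is a minimal Python sketch of a one-sided tester that samples vertices from D rather than uniformly. It is an illustration only and not the algorithm of this paper. The choice of triangle-freeness as the property, the access model (i.i.d. vertex samples from D plus adjacency queries among the sampled vertices), the sample size `q`, and all function names are assumptions made for the example. No claim is made about how large `q` must be for a given ε.

```python
import itertools
import random


def sample_vertices(weights, q, rng):
    """Draw q vertices i.i.d. from the distribution D encoded by `weights`.

    `weights` maps each vertex to a nonnegative weight (hypothetical
    representation of D; a real tester only has sampling access to D).
    """
    vertices = list(weights)
    return rng.choices(vertices, weights=[weights[v] for v in vertices], k=q)


def has_triangle(adj, vertices):
    """Return True if the subgraph induced on the distinct sampled
    vertices contains a triangle. `adj` maps a vertex to its neighbor set."""
    distinct = sorted(set(vertices))
    for a, b, c in itertools.combinations(distinct, 3):
        if b in adj[a] and c in adj[a] and c in adj[b]:
            return True
    return False


def accepts_triangle_free(adj, weights, q, rng):
    """One-sided tester sketch: reject only if the sample exposes a triangle.

    A triangle-free graph is therefore always accepted, whatever D is.
    """
    sample = sample_vertices(weights, q, rng)
    return not has_triangle(adj, sample)


if __name__ == "__main__":
    rng = random.Random(0)
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    uniform = {0: 1.0, 1: 1.0, 2: 1.0}
    skewed = {0: 1.0, 1: 1.0, 2: 0.0}  # vertex 2 has zero weight under D
    print(accepts_triangle_free(triangle, uniform, 50, rng))
    print(accepts_triangle_free(triangle, skewed, 50, rng))
```

When D places zero weight on one vertex of the only triangle, the sketch never samples that vertex and so accepts. This matches the intuition that distance to the property is measured relative to D: parts of the graph that D ignores do not contribute to it.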
Let us mention that distribution-free testing is similar in spirit to the celebrated PAC learning model of Valiant [31], see also the discussion in [27].
Our investigation here concerns a distribution-free variant of the adjacency matrix model, also known as the dense graph model. The adjacency matrix model was first defined and studied in [20], where the area of property testing was first...