Impact of sampling design in estimation of graph characteristics

Cem, Emrah; Tozal, Mehmet Engin; Sarac, Kamil

doi:10.1109/pccc.2013.6742788

Cited by 9 publications

(7 citation statements)

References 30 publications


“…In order to assess how representative the subset RepDKG is, we evaluated whether the sampled graph is able to preserve the distributions of several characteristic topological graph properties such as degree, path length and clustering coefficients (Ahmed, Neville & Kompella, 2011), (Cem, Tozal & Sarac, 2013). We have compared the statistics of our subgraph with the statistics of the subgraphs that are gained using a variety of sampling techniques such as Fire Forest Sampling (FFS) (Leskovec & Faloutsos, 2006), Snowball Sampling (SS) (Lee, Kim & Jeong, 2006) and Metropolis-Hastings Sampling (MHS) (Lu & Bressan, 2012).…”

Section: Experimental Evaluation Of the Proposed Model (mentioning)

confidence: 99%

Student model initialization using domain knowledge ontology representative subset

Grubišić

Žitko

Stankov

2020

J. Technol. Sci. Educ.


In intelligent e-learning systems that adapt a learning and teaching process to student knowledge, it is important to adapt the system as quickly as possible. However, adaptation is not possible until the student model is initialized. In this paper, a new approach to student model initialization using a domain knowledge representative subset is described. The approach defines which concepts from domain knowledge should be included in the initial test so the system can make conclusions about what students truly know about domain knowledge. This representative subset of domain knowledge is defined using a non-semantic mathematical approach based on graph theory. The initial test, created over a domain knowledge representative subset, guarantees encompassing all concepts that are relevant to domain knowledge. A two-level case study is conducted on what would be the representative subset of one selected domain knowledge. It compares semantically selected domain knowledge representative subsets (semantic analysis was done by domain area experts) to a non-semantically, mathematically selected domain knowledge representative subset. The results of the case study show that problems of inequality of semantically selected domain knowledge representative subsets are easily overcome using the presented approach.
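The citation statement in this entry describes checking whether a sampled subgraph preserves the distribution of properties such as degree. A minimal sketch of one such check follows, assuming the graph is held as a plain adjacency dictionary and using the Kolmogorov-Smirnov distance between degree distributions as the comparison. The function names, the ring-graph helper, and the choice of KS distance are illustrative assumptions, not the method used by the cited or citing papers.

```python
import random
from collections import Counter


def degree_sequence(adj):
    """Return the list of vertex degrees for an adjacency dict {v: set(neighbors)}."""
    return [len(nbrs) for nbrs in adj.values()]


def induced_subgraph(adj, nodes):
    """Keep only the given nodes and the edges that run between them."""
    keep = set(nodes)
    return {v: adj[v] & keep for v in keep}


def ks_distance(xs, ys):
    """Largest gap between the empirical CDFs of two samples (KS statistic)."""
    cx, cy = Counter(xs), Counter(ys)
    fx = fy = 0.0
    gap = 0.0
    for value in sorted(set(cx) | set(cy)):
        fx += cx[value] / len(xs)  # Counter returns 0 for unseen values
        fy += cy[value] / len(ys)
        gap = max(gap, abs(fx - fy))
    return gap


def ring_graph(n):
    """Hypothetical toy graph: n vertices on a cycle, each with degree 2."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


if __name__ == "__main__":
    full = ring_graph(100)
    rng = random.Random(0)
    # Uniform node sampling stands in for FFS, SS, or MHS here.
    sample = induced_subgraph(full, rng.sample(sorted(full), 30))
    print(ks_distance(degree_sequence(full), degree_sequence(sample)))
```

A distance near 0 suggests the sample's degree distribution tracks the original; the same comparison could be repeated for path lengths or clustering coefficients.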



“…Researchers have also studied the task of how to choose an effective sampling scheme, how to evaluate different sampling schemes for studying a particular estimation problem [33,70], and the impact of underlying graph structure and the studied graph property on the effectiveness of the sampling algorithms [73].…”

Section: Other Related Work (mentioning)

confidence: 99%

Estimation of structural properties of online social networks at the extreme

Cem

Sarac

2016

Computer Networks

Self Cite


Sampling is a commonly used technique for studying structural properties of online social networks (OSNs). Due to privacy, business, and performance concerns, OSN service providers impose limitations on data access for third parties. The implication of this practice is that one needs to come up with an applicable sampling scheme that can function under these limitations to efficiently estimate structural properties of interest. In this paper, we study how accurately some important properties of graphs can be estimated under a limited data access model. More specifically, we consider random neighbor access (RNA) model as a rather limited data access model in OSNs. In the RNA model, the only query available to get data from the studied graph is the random neighbor query which returns the id of a random neighbor for a given vertex id. We propose various sampling schemes and estimators for average degree and network size under the RNA model. We conduct extensive experiments on both real world OSN graphs and synthetic graphs (1) to measure the performance of the proposed estimators and (2) to identify the factors affecting the accuracy of our estimators. We find that while the average degree estimators can make accurate estimations with reasonable sample sizes despite the extreme data access limitations of the RNA model, network size estimators require quite large sample sizes for accurate estimations.

Figure 2: Proposed Estimators: Illustration of which estimator is used based on the sampling scheme and probing type. Probing type is applicable only on ERSRW sampling scheme. We propose average degree estimator under only ERSRW sampling, so choosing the sampling scheme step is not applicable.

1 We use the term estimation performance as a combined measure of precision and 4 precision of the estimation; while in the estimation of the network size, it increases the accuracy, especially when the sampling fraction f > 1.

3. The dynamic nature of the underlying graph adds one more layer to the complexity of the estimation problem. The accuracy of the estimation is limited by how fast the samples can be collected and how fast the property of interest changes. As opposed to the static graph case, larger sample sizes do not provide better estimation results especially when the property of interest increases or decreases over time as the old data becomes unrepresentative of the current data.

The outline of the paper is as follows: Section 2 presents the background and the related work. Section 3 presents the RNA model and sampling designs. Section 4 presents the estimators for the RNA model. Section 5 presents our experimental evaluations. Section 6 discusses the practical issues. Section 7 concludes the paper.

Background and Related Work

The RNA model enables us to perform a walk, but not a SRW 2, on the underlying graph. Nevertheless, the estimation techniques proposed under SRW sampling form the basis for those under the RNA model as the sampling schemes that we use under the RNA model, namely RSRW and ERSRW, are the modificatio...
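The random neighbor access (RNA) model described in the abstract above exposes a single query: given a vertex id, return the id of a random neighbor. A minimal sketch of that access model, and of a plain walk driven only by that query, might look as follows. The class and function names are hypothetical, the neighbor is assumed to be drawn uniformly, and this is not the RSRW or ERSRW scheme, whose details are not given here.

```python
import random


class RandomNeighborOracle:
    """Hides the graph; the only public query is random_neighbor(v)."""

    def __init__(self, adj, seed=None):
        # Adjacency is stored privately so callers cannot read degrees or neighbor lists.
        self._adj = {v: sorted(nbrs) for v, nbrs in adj.items()}
        self._rng = random.Random(seed)
        self.queries = 0  # number of queries issued so far

    def random_neighbor(self, v):
        """Return the id of one neighbor of v, chosen uniformly (an assumption)."""
        self.queries += 1
        return self._rng.choice(self._adj[v])


def walk(oracle, start, steps):
    """Collect vertex ids by repeatedly asking for a random neighbor of the current vertex."""
    path = [start]
    for _ in range(steps):
        path.append(oracle.random_neighbor(path[-1]))
    return path


if __name__ == "__main__":
    # Hypothetical toy graph: a triangle (0, 1, 2) with a pendant vertex 3.
    toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
    oracle = RandomNeighborOracle(toy, seed=1)
    print(walk(oracle, start=0, steps=10), oracle.queries)
```

Because the caller never sees a degree or a full neighbor list, any estimator built on top of this interface has to work from the sequence of returned ids alone, which is the constraint the abstract emphasizes.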


“…Moreover, designing more efficient algorithms and/or leveraging computing power are not always easily available [2,5,19]. Secondly, due to the limitations in data collection mechanism, contemporary graphs that are considered as complete graphs are not completely accessible and partially visible to the users [7,11]. One approach to overcome these features of contemporary graph-structured data collections is sampling, i.e., to sample a representative subgraph and exploit its characteristics.…”

Section: Introduction (mentioning)

confidence: 99%

Cluster-preserving sampling from fully-dynamic streaming graphs

Zhang

Zhu

Pei

et al. 2019

Information Sciences

