Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
DOI: 10.18653/v1/2020.emnlp-main.667

Evaluating the Calibration of Knowledge Graph Embeddings for Trustworthy Link Prediction

Abstract: Little is known about the trustworthiness of predictions made by knowledge graph embedding (KGE) models. In this paper we take initial steps toward this direction by investigating the calibration of KGE models, or the extent to which they output confidence scores that reflect the expected correctness of predicted knowledge graph triples. We first conduct an evaluation under the standard closed-world assumption (CWA), in which predicted triples not already in the knowledge graph are considered false, and show t…
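The notion of calibration in the abstract can be made concrete with a small sketch: a common way to measure it is the Expected Calibration Error, which compares binned confidence scores against empirical correctness. This is an illustrative implementation, not the paper's own evaluation protocol; the scores and labels passed in are assumed inputs.

```python
# Hedged sketch: Expected Calibration Error (ECE) over binned confidences.
# A model is well calibrated when, e.g., triples predicted with ~0.8
# confidence are correct ~80% of the time.
import numpy as np

def expected_calibration_error(confidences, labels, n_bins=10):
    """Size-weighted average of |accuracy - mean confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    labels = np.asarray(labels, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = labels[mask].mean()        # empirical correctness in this bin
        conf = confidences[mask].mean()  # mean predicted confidence
        ece += mask.mean() * abs(acc - conf)
    return ece
```

A perfectly calibrated model would have ECE close to zero; overconfident models show large gaps between confidence and accuracy in the high-confidence bins.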

Cited by 21 publications (13 citation statements)
References 21 publications
“…In future work, we are seeking to improve UEs quality obtained using the DPP dropout with the help of calibration (Safavi et al, 2020)…”
Section: Discussion
confidence: 99%
“…In fact, it is often good practice to use so-called "hard" negative samples, which are similar to entities in T. A better alternative for finding entities not in T would be using more advanced techniques as proposed in [16].…”
Section: Closed-World Assumption
confidence: 99%
“…We use Recall (R@k), the fraction of known missing/future links that are in the size-k set returned by the method, and Precision (P@k), the fraction of the k pairs that are known to be missing/future links. Recall is a more important metric, since (1) the returned set of pairs P does not contain final predictions, but rather pairs for a LP method to make final decisions about, and (2) our real-world graphs are inherently incomplete, and thus pairs returned that are not known to be missing links could nonetheless be missing in the original dataset prior to ground-truth removal (i.e., the open-world assumption [25]). We report both in Table III.…”
Section: B. Recall and Precision (RQ1)
confidence: 99%
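The R@k and P@k definitions in the quoted passage can be sketched directly. This is an illustrative implementation under assumed inputs (a ranked list of candidate pairs and a set of known true links), not code from the cited work:

```python
# Hedged sketch of the quoted metrics: given a ranked list of returned
# node pairs and the set of known missing/future links, P@k is the
# fraction of the top-k pairs that are true links, and R@k is the
# fraction of all true links recovered in the top k.
def precision_at_k(returned_pairs, true_links, k):
    hits = sum(1 for p in returned_pairs[:k] if p in true_links)
    return hits / k

def recall_at_k(returned_pairs, true_links, k):
    hits = sum(1 for p in returned_pairs[:k] if p in true_links)
    return hits / len(true_links)
```

As the passage notes, P@k can understate quality on incomplete graphs: a returned pair absent from the ground truth may still be a genuinely missing link, which is one reason the authors weight recall more heavily.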
“…Link prediction is a long-studied problem that attempts to predict either missing links in an incomplete graph, or links that are likely to form in the future. This has applications in discovering unknown protein interactions to speed up the discovery of new drugs, friend recommendation in social networks, knowledge graph completion, and more [1], [15], [16], [25]. Techniques range from heuristics, such as predicting links based on the number of common neighbors between a pair of nodes, to machine learning techniques, which formulate the link prediction problem as a binary classification problem over node pairs [7], [29].…”
Section: Introduction
confidence: 99%
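The common-neighbors heuristic mentioned in the passage above can be sketched in a few lines. The adjacency-dict representation and names here are illustrative assumptions, not from the cited works:

```python
# Hedged sketch of the common-neighbors link-prediction heuristic:
# score a candidate pair (u, v) by the number of neighbors they share;
# higher scores suggest a link is more likely to exist or form.
def common_neighbors_score(adj, u, v):
    """adj maps each node to a set of its neighbors."""
    return len(adj.get(u, set()) & adj.get(v, set()))
```

Such heuristic scores can rank all non-adjacent pairs, or serve as features for the binary-classification formulation of link prediction that the passage describes.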