Ontology alignment systems are evaluated by various performance scores, which are usually computed as ratios directly related to the frequency of true positives. However, such ratios provide little information regarding the uncertainty of the overall performance of the corresponding systems. Comparisons are also drawn merely by juxtaposing the computed scores, declaring one system superior to another provided that its score is higher. A comparison based solely on two figures, however, neither quantifies the significance of the difference nor determines the extent to which one system is better. The problem is compounded for comparisons over multiple benchmarks, since averages and micro-averages of performance scores are then considered. In this paper, the evaluation of alignment systems is translated into a statistical inference problem by introducing the notion of risk for alignment systems. The risk with respect to a performance score is shown to follow a binomial distribution and to be equivalent to the complement of that score, e.g., precision risk = 1 − precision. It is also demonstrated that the maximum likelihood estimate (MLE) of the risk is precisely the conventional evaluation by ratios. Instead of the MLE, a Bayesian model is developed to estimate the risk with respect to a score (or, equivalently, the score itself) as a probability distribution from the performance of the systems over single or multiple benchmarks. As a result, the evaluation outcome is a distribution instead of a single figure, which provides a broader view of the overall system performance. A Bayesian test is also devised to compare systems based on their estimated risks, which computes the confidence that one system is superior to another. We report the results of applying the proposed methodology to multiple tracks of the Ontology Alignment Evaluation Initiative (OAEI).
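
As a concrete illustration of the construction sketched above, the following Python snippet estimates the precision risk as a posterior distribution under a binomial likelihood and compares two systems by Monte Carlo sampling. It is a minimal sketch only: the counts, the uniform Beta(1, 1) prior, and the helper name precision_risk_posterior are illustrative assumptions, not the paper's exact model or prior.

import numpy as np
from scipy import stats

def precision_risk_posterior(tp, fp, alpha=1.0, beta=1.0):
    # Each retrieved correspondence is treated as a Bernoulli trial of being a
    # false positive, so the false-positive count is Binomial(tp + fp, risk),
    # where risk = 1 - precision. With a Beta(alpha, beta) prior on the risk,
    # the posterior is Beta(alpha + fp, beta + tp).
    return stats.beta(alpha + fp, beta + tp)

# Hypothetical true/false positive counts for two alignment systems on one benchmark.
post_a = precision_risk_posterior(tp=180, fp=20)   # MLE of risk = 0.10
post_b = precision_risk_posterior(tp=170, fp=30)   # MLE of risk = 0.15

# Bayesian comparison: confidence that system A has lower precision risk than
# system B, estimated by sampling from the two posteriors.
rng = np.random.default_rng(0)
samples_a = post_a.rvs(size=100_000, random_state=rng)
samples_b = post_b.rvs(size=100_000, random_state=rng)
print("P(risk_A < risk_B) =", np.mean(samples_a < samples_b))

The printed probability plays the role of the confidence that one system is superior to the other; the MLE of the risk (fp / (tp + fp)) coincides with the mode of the posterior under the uniform prior, mirroring the equivalence between the MLE and the conventional ratio-based evaluation.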