Automated process discovery techniques allow us to extract business process models from event logs. The quality of process models discovered by these techniques can be assessed with respect to various quality criteria related to simplicity and accuracy.¹ One of these criteria, namely precision, captures the extent to which the behavior allowed by a discovered process model is observed in the log. While numerous measures of precision have been proposed in the literature, a recent study has shown that none of them fulfils a set of five axioms that capture intuitive properties behind the concept of precision. In addition, several existing precision measures suffer from scalability issues when applied to models discovered from real-life event logs. This paper presents a versatile framework for defining precision measures based on behavior abstractions. The key idea is that a precision measure can be defined by three ingredients: a function that abstracts the behavior of a process model (e.g., as a transition system), a function that does the same for an event log, and a function that compares the behavior abstraction of the model with that of the log. We show empirically that different instances of this framework allow us to strike different tradeoffs between scalability and sensitivity. We also show that two instances of the framework based on lossless abstraction functions yield precision measures that fulfil all the above-mentioned axioms.

¹ A third accuracy criterion in automated process discovery is generalization: the extent to which the process model captures behavior that, while not observed in the log, is implied by it.
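To make the three ingredients concrete, here is a minimal Python sketch of one possible instantiation. It assumes a directly-follows (order-2) abstraction with artificial start and end markers and a set-based comparison (the share of model pairs that also occur in the log). The automaton encoding and the names `abstract_model`, `abstract_log` and `compare` are hypothetical illustrations, not the paper's implementation. Directly-follows pairs are a lossy abstraction, so this is not one of the lossless instances mentioned above.

```python
from typing import Dict, Iterable, List, Set, Tuple

START, END = "▶", "■"  # artificial markers for trace start and trace end

# Hypothetical model encoding: a labelled automaton given as
# state -> list of (activity label, target state).
Automaton = Dict[str, List[Tuple[str, str]]]
Pair = Tuple[str, str]


def abstract_model(transitions: Automaton, initial: str, finals: Set[str]) -> Set[Pair]:
    """Directly-follows pairs allowed by the automaton.

    Assumes the automaton is trim: every state is reachable from `initial`
    and can reach a state in `finals`, so each pair occurs in some full run.
    """
    # Labels entering each state; the initial state is "entered" by START.
    incoming: Dict[str, Set[str]] = {initial: {START}}
    for arcs in transitions.values():
        for label, target in arcs:
            incoming.setdefault(target, set()).add(label)
    pairs: Set[Pair] = set()
    for state, preds in incoming.items():
        # Labels leaving the state; final states can also be left via END.
        succs = {label for label, _ in transitions.get(state, [])}
        if state in finals:
            succs.add(END)
        pairs.update((a, b) for a in preds for b in succs)
    return pairs


def abstract_log(log: Iterable[List[str]]) -> Set[Pair]:
    """Directly-follows pairs observed in the log, with start/end markers."""
    pairs: Set[Pair] = set()
    for trace in log:
        padded = [START, *trace, END]
        pairs.update(zip(padded, padded[1:]))
    return pairs


def compare(model_abs: Set[Pair], log_abs: Set[Pair]) -> float:
    """Share of the model's abstract behavior that is also observed in the log."""
    if not model_abs:
        return 1.0
    return len(model_abs & log_abs) / len(model_abs)


# Model: a, then b or c, then d. The log never shows c.
model = {"s0": [("a", "s1")], "s1": [("b", "s2"), ("c", "s2")], "s2": [("d", "s3")]}
log = [["a", "b", "d"], ["a", "b", "d"]]

precision = compare(abstract_model(model, "s0", {"s3"}), abstract_log(log))
print(round(precision, 3))  # 0.667
```

The model allows six directly-follows pairs and the log exhibits four of them, so the sketch reports 4/6. Swapping in a different abstraction (for example, longer k-grams, or the full trace set for an acyclic model) would change the scalability/sensitivity tradeoff without touching `compare`.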
Conformance checking encompasses a body of process mining techniques that aim to find and describe the differences between a process model capturing the expected process behavior and a corresponding event log recording the observed behavior. Alignments are an established technique to compute the distance between a trace in the event log and the closest execution trace of a corresponding process model. Given a cost function, an alignment is optimal when it has the minimum cost, for example the fewest mismatches between a log trace and a model trace under a unit cost function. Determining optimal alignments, however, is computationally expensive, especially in light of the growing size and complexity of event logs from practice, which can easily exceed one million events with traces of several hundred activities. A common limitation of existing alignment techniques is their inability to exploit repetitions in the log. By exploiting a specific form of sequential pattern in traces, namely tandem repeats, we propose a novel technique that uses pre- and post-processing steps to reduce the length of a trace and recompute the alignment cost, guaranteeing that the resulting cost never under-approximates the optimal cost. In an extensive empirical evaluation with 50 real-life model-log pairs and against five state-of-the-art alignment techniques, we show that the proposed compression approach systematically outperforms the baselines by up to an order of magnitude in the presence of traces with repetitions, and that the cost over-approximation, when it occurs, is negligible.
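The sketch below illustrates the two ideas in this paragraph under simplifying assumptions. First, the model is given as a hypothetical finite list of model traces; real models with loops have infinitely many traces, and practical aligners search the model's state space instead. Second, the alignment cost uses unit costs: log-only and model-only moves cost 1, and synchronous moves cost 0. Third, each tandem repeat is greedily cut down to `keep` copies of its shortest repeating unit. The names `alignment_cost`, `optimal_cost`, `compress_tandem_repeats` and the parameter `keep` are illustrative and do not describe the paper's exact algorithm.

```python
from typing import List, Sequence


def alignment_cost(trace: Sequence[str], model_trace: Sequence[str]) -> int:
    """Minimum number of log-only and model-only moves needed to align two sequences."""
    n, m = len(trace), len(model_trace)
    # d[i][j] = cost of aligning trace[:i] with model_trace[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # only log moves
    for j in range(m + 1):
        d[0][j] = j  # only model moves
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(d[i - 1][j], d[i][j - 1]) + 1
            if trace[i - 1] == model_trace[j - 1]:
                best = min(best, d[i - 1][j - 1])  # synchronous move, cost 0
            d[i][j] = best
    return d[n][m]


def optimal_cost(trace: Sequence[str], model_traces: List[List[str]]) -> int:
    """Cost of an optimal alignment against the closest of the given model traces."""
    return min(alignment_cost(trace, mt) for mt in model_traces)


def compress_tandem_repeats(trace: List[str], keep: int = 2) -> List[str]:
    """Greedily shrink each tandem repeat u^k (k >= 2) to at most `keep` copies of u."""
    out: List[str] = []
    i, n = 0, len(trace)
    while i < n:
        found = None
        for p in range(1, (n - i) // 2 + 1):  # try the shortest unit first
            unit = trace[i:i + p]
            reps = 1
            while trace[i + reps * p:i + (reps + 1) * p] == unit:
                reps += 1
            if reps >= 2:
                found = (p, reps)
                break
        if found:
            p, reps = found
            out.extend(trace[i:i + min(reps, keep) * p])
            i += reps * p  # skip the whole repeat
        else:
            out.append(trace[i])
            i += 1
    return out


trace = ["a", "b", "c", "b", "c", "b", "c", "d"]
model_traces = [["a", "b", "c", "d"], ["a", "b", "c", "b", "c", "d"]]

compressed = compress_tandem_repeats(trace)
print(compressed)  # ['a', 'b', 'c', 'b', 'c', 'd']
print(optimal_cost(trace, model_traces), optimal_cost(compressed, model_traces))  # 2 0
```

Here the compressed trace aligns perfectly with one model trace, while the original trace costs 2. The cost computed on a compressed trace therefore cannot simply be reported as is. This illustrates why the approach pairs compression with a post-processing step that recomputes the cost; that step is not sketched here.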