Understanding and measuring the predictability of consumer purchasing (basket) behaviour is of significant value. Predictability measures such as entropy have been well studied and applied in other sectors, but their development for, and application to, the very large multi-dimensional data sets found in retailing are less common. The few existing methods, we demonstrate, do not accord with intuition, which can lead to misunderstandings between those who conduct the analysis and those who act on the insights. To show these issues in context, we set out the requirements for such a measure in this domain. We then develop a novel entropy-based measure that directly quantifies the predictability of basket composition. We call it bundle entropy: a value of zero denotes that a bundle is totally predictable, and a value of one that it is totally unpredictable. We empirically compare bundle entropy against existing measures using two large-scale, real-world transactional data sets, each covering more than 2,000 households (frequent shoppers) over two years. First, we demonstrate that the proposed measure is the only one that behaves according to the desired properties. Second, we show empirically that bundle entropy differs noticeably from the other measures. Finally, we present several use-case analyses and discuss the practical utility of the proposed measure.
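To illustrate what a zero-to-one, entropy-based predictability scale means for baskets, the sketch below computes a generic normalized Shannon entropy over a household's observed basket compositions. It is not the bundle entropy proposed here, whose definition this abstract does not give. The function name `normalized_basket_entropy`, the choice to treat each basket as an unordered set of items, and the normalization by the logarithm of the number of distinct observed baskets are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def normalized_basket_entropy(baskets):
    """Hypothetical illustration: generic normalized Shannon entropy of baskets.

    NOT the paper's bundle entropy. Each basket is treated as an unordered
    set of item identifiers, so {"milk", "bread"} and {"bread", "milk"} are
    the same outcome. Returns a value in [0, 1]: 0 when every trip has the
    same basket, 1 when all distinct baskets occur equally often.
    """
    if not baskets:
        raise ValueError("need at least one basket")
    # Count how often each distinct basket composition occurs.
    counts = Counter(frozenset(b) for b in baskets)
    k = len(counts)
    if k == 1:
        # A single repeated basket is perfectly predictable.
        return 0.0
    n = sum(counts.values())
    # Shannon entropy (natural log) of the empirical basket distribution.
    h = -sum((c / n) * log(c / n) for c in counts.values())
    # Assumed normalization: divide by the maximum entropy for k outcomes.
    return h / log(k)


if __name__ == "__main__":
    routine = [{"milk", "bread"}] * 4
    varied = [{"milk"}, {"bread"}, {"eggs"}, {"tea"}]
    mixed = [{"milk", "bread"}] * 3 + [{"milk", "eggs"}]
    for name, trips in [("routine", routine), ("varied", varied), ("mixed", mixed)]:
        print(name, round(normalized_basket_entropy(trips), 3))
```

This generic version treats two baskets that differ by a single item as entirely different outcomes; it is shown only to make the two ends of the scale concrete.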