Increasingly, datasets are so large that they must be summarized so that the resulting dataset is of a more manageable size while retaining as much of the knowledge inherent in the entire dataset as possible. One consequence is that the data may no longer be formatted as single values, as is the case for classical data, but may instead be represented by lists, intervals, distributions, and the like. These summarized data are examples of symbolic data. This article looks at the concept of symbolic data in general and then reviews the methods currently available to analyze such data. It quickly becomes clear that the range of methodologies available is analogous to the developments before 1900 that formed a foundation for the inferential statistics of the 1900s, methods largely limited to small (by comparison) datasets and classical data formats. The scarcity of available methodologies for symbolic data also becomes clear, drawing attention to an enormous need for a vast catalog (so to speak) of new symbolic methodologies, along with rigorous mathematical and statistical foundational work for these methods.
Abstract. When observations in large data sets are aggregated into smaller, more manageable data sets, the resulting classifications of observations invariably involve symbolic data. In this paper, covariance and correlation functions are introduced for interval-valued symbolic data. These functions and their associated terms are then used to fit linear regression models to such data. The methods are illustrated with an example from cardiology.
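To make the idea of fitting a line to interval-valued data concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not necessarily the paper's exact definitions: values are assumed to be uniformly spread within each interval for the mean and variance, and the covariance is computed from interval midpoints. The function names and the sample intervals are hypothetical.

```python
from typing import List, Tuple

# An interval-valued observation: (lower, upper) with lower <= upper.
Interval = Tuple[float, float]


def interval_mean(xs: List[Interval]) -> float:
    """Mean of interval data: the average of the interval midpoints."""
    return sum(a + b for a, b in xs) / (2 * len(xs))


def interval_variance(xs: List[Interval]) -> float:
    """Variance assuming values are uniform within each interval (assumption).

    The second moment of a uniform variable on [a, b] is (a^2 + ab + b^2) / 3.
    """
    second_moment = sum(a * a + a * b + b * b for a, b in xs) / (3 * len(xs))
    return second_moment - interval_mean(xs) ** 2


def midpoint_covariance(xs: List[Interval], ys: List[Interval]) -> float:
    """Covariance of the interval midpoints (a simplifying assumption)."""
    mx, my = interval_mean(xs), interval_mean(ys)
    total = sum(
        ((a + b) / 2 - mx) * ((c + d) / 2 - my)
        for (a, b), (c, d) in zip(xs, ys)
    )
    return total / len(xs)


def fit_line(xs: List[Interval], ys: List[Interval]) -> Tuple[float, float]:
    """Return (intercept, slope) of a least-squares-style line y = b0 + b1 * x."""
    slope = midpoint_covariance(xs, ys) / interval_variance(xs)
    intercept = interval_mean(ys) - slope * interval_mean(xs)
    return intercept, slope


if __name__ == "__main__":
    # Hypothetical interval observations, invented purely for illustration.
    x = [(1.0, 3.0), (2.0, 5.0), (4.0, 8.0)]
    y = [(10.0, 14.0), (12.0, 19.0), (18.0, 30.0)]
    print(fit_line(x, y))
```

When every interval collapses to a single point (lower equals upper), these functions reduce to the classical population mean, variance, and covariance, which is a useful sanity check on any interval-valued generalization.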