Yager and Petry recently proposed a quality-based methodology for combining data provided by multiple probabilistic sources to improve the quality of information delivered to decision-makers. This paper serves as a companion paper that adapts their methodology to possibilistic sources. Possibility theory is particularly well suited to coping with incomplete information from poor-data sources. The methodology and algorithms used in the probabilistic approach are adapted to the possibilistic case. Both approaches are then compared by means of a numerical example and four experimental benchmark datasets, one of which, the IRIS dataset, is data-poorer than the other three (the Diabetes, Glass, and Liver-disorder datasets). As in the probabilistic case, a vector representation is introduced for a possibility distribution, and Gini's formulation of entropy is used. However, Gini's entropy must be applied differently than in the probabilistic case, which affects the selection of subsets. A fusion scheme is designed to select the best-quality subsets according to two information quality factors: quantity of information and source credibility. Results obtained from comparing the two approaches on the four experimental benchmarks confirm the superiority of the possibilistic approach in the presence of information scarcity or incompleteness.
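As an illustrative sketch only (the paper's exact possibilistic adaptation is not reproduced here), the classical Gini formulation of entropy on a distribution represented as a vector can be computed as follows; note that applying it directly to a possibility vector, where components need not sum to 1, can yield negative values, which hints at why it must be handled differently in the possibilistic case:

```python
def gini_entropy(dist):
    """Gini's formulation of entropy: 1 minus the sum of squared components."""
    return 1.0 - sum(x * x for x in dist)

# Probabilistic case: components sum to 1.
p = [0.5, 0.3, 0.2]

# Possibilistic case: the distribution is normalized so that at least one
# component equals 1; the others lie anywhere in [0, 1].
pi = [1.0, 0.6, 0.2]

print(gini_entropy(p))   # 0.62
print(gini_entropy(pi))  # -0.4: no longer bounded as in the probabilistic case
```

The negative value for the possibility vector illustrates, under these assumed example numbers, why the probabilistic formulation cannot be reused unchanged.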
Measuring similarity is of great interest in many research areas, such as data science, machine learning, pattern recognition, text analysis, and information retrieval, to name a few. The literature has shown that possibility is an attractive notion in the context of distinguishability assessment and can lead to very efficient and computationally inexpensive learning schemes. This paper focuses on determining the similarity between two possibility distributions. A review of existing similarity measures within the possibilistic framework is presented first. The measures are then analyzed with respect to their capacity to satisfy a set of properties that a similarity measure should possess. Most existing possibilistic similarity measures produce undesirable outcomes, since they generally depend on the application context. A new similarity measure, called InfoSpecificity, is introduced, and the similarity measures are categorized into three main families: morphic-based, amorphic-based, and hybrid. Two experiments are conducted on four benchmark databases to compare the efficiency of the possibilistic similarity measures when applied to real data. The empirical experiments show good results for the hybrid methods, particularly with the InfoSpecificity measure. In general, the hybrid methods outperform the other two categories on small samples, i.e., in a poor-data context (or poorly informed environment), where possibility theory can be used to greatest benefit.
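To make the kind of pairwise comparison studied here concrete, the sketch below implements one simple, commonly used similarity of possibility distributions, the ratio of their pointwise minimum (intersection) to their pointwise maximum (union). This is not the paper's InfoSpecificity measure, whose definition is not reproduced here; the distributions used are illustrative examples:

```python
def minmax_similarity(pi1, pi2):
    """Jaccard-like similarity of two possibility distributions defined
    over the same finite universe: sum of pointwise minima divided by
    sum of pointwise maxima."""
    num = sum(min(a, b) for a, b in zip(pi1, pi2))
    den = sum(max(a, b) for a, b in zip(pi1, pi2))
    return num / den if den else 1.0

pi1 = [1.0, 0.7, 0.2]
pi2 = [0.8, 1.0, 0.2]

print(minmax_similarity(pi1, pi2))  # 1.7 / 2.2 ≈ 0.773
print(minmax_similarity(pi1, pi1))  # 1.0: identical distributions
```

A measure of this form depends only on the component values, not on their arrangement, which is characteristic of the amorphic-based family mentioned in the abstract.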