Uncertain Groupings: Probabilistic Combination of Grouping Data

Wanders, Brend; Keulen, Maurice van; Vet, P.E. van der

doi:10.1007/978-3-319-22849-5_17

Cited by 7 publications

(7 citation statements)

References 15 publications

“…The latter presents itself both in the number of partitionings as well as in the size of the descriptive sentences. From our experience with a bio-informatics use case [6], the number of partitionings can easily grow into the thousands in real-world applications. The size of the descriptive sentences is determined by the complexity of the dependencies between assertions, its low-level representation, and allowed expressiveness.…”

Section: Optimizations (mentioning)

confidence: 99%

“…In our research we actively apply this technology for soft computing data processing tasks such as indeterministic deduplication [4], probabilistic XML data integration [5], and probabilistic integration of data about groupings [6]. Based on these experiences, we find that there are still important open problems in dealing with uncertain data and that the available systems are inadequate on certain aspects.…”

Section: Introduction (mentioning)

confidence: 99%

“…Optimization opportunities There has been some work on optimization for probabilistic databases, for example, in the context of MayBMS/SPROUT [8,9], but as we experienced in [6], where we apply MayBMS to a bio-informatics homology use case, the research prototypes do not scale well enough to thousands of random variables. By generalizing certain concepts in our formal foundation, we hope to create better understanding of optimization opportunities.…”

Section: Introduction (mentioning)

confidence: 99%

Revisiting the formal foundation of Probabilistic Databases

Wanders

Keulen

2015

Proceedings of the 2015 Conference of the International Fuzzy Systems Association and the European Society for Fuzzy Logic and

Self Cite

One of the core problems in soft computing is dealing with uncertainty in data. In this paper, we revisit the formal foundation of a class of probabilistic databases with the purpose to (1) obtain data model independence, (2) separate metadata on uncertainty and probabilities from the raw data, (3) better understand aggregation, and (4) create more opportunities for optimization. The paper presents the formal framework and validates data model independence by showing how to obtain a probabilistic Datalog as well as a probabilistic relational algebra by applying the framework to their non-probabilistic counterparts. We conclude with a discussion on the latter three goals.

“…Throughout the paper we use an information extraction scenario as running example: the "Paris Hilton example". Although this scenario is from the Natural Language Processing (NLP) domain, note that it is equally applicable to other data integration scenarios such as semantic duplicates [7], entity resolution, uncertain groupings [8], etc.…”

Section: Running Example (mentioning)

confidence: 99%

“…For details on the first phase, we refer to [2,3], as well as [7][8][9] for techniques on specific extraction and integration problems (merging semantic duplicates, merging grouping data, and information extraction from natural language text, respectively). This paper focuses on the second phase of this process, namely on the problem of how to incorporate evidence of users in the probabilistically integrated data with the purpose to continuously improve its quality as more evidence is gathered.…”

Section: Introduction (mentioning)

confidence: 99%

Rule-Based Conditioning of Probabilistic Data

Keulen

Kaminski

Matheja

et al. 2018

Lecture Notes in Computer Science

Self Cite
