2018
DOI: 10.1109/tkde.2017.2750683

Computing Crowd Consensus with Partial Agreement

Abstract: Crowdsourcing has been widely established as a means to enable human computation at large scale, in particular for tasks that require manual labelling of large sets of data items. Answers obtained from heterogeneous crowd workers are aggregated to obtain a robust result. However, existing methods for answer aggregation are designed for discrete tasks, where answers are given as a single label per item. In this paper, we consider partial-agreement tasks that are common in many applications such as imag…

Cited by 31 publications (10 citation statements)
References 52 publications (77 reference statements)
“…This assumption matches with real-life settings, e.g. training data, prior knowledge, crowdsourcing, human expert [11,20,21,22,23]. Note that the output size is also known as recall since the size of ground truth is fixed.…”
mentioning
confidence: 71%
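A minimal sketch of the reasoning behind the recall remark in the statement above, assuming the reported output R is a subset of a fixed ground truth G (an assumption introduced here for illustration, not stated in the quoted text):

\[
\mathrm{recall}(R) \;=\; \frac{|R \cap G|}{|G|} \;=\; \frac{|R|}{|G|} \quad \text{(assuming } R \subseteq G\text{)},
\]

so with |G| fixed, the output size |R| determines recall up to the constant factor 1/|G|.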
“…Also, the baseline performs worse for small compression ratios (e.g. 10%), highlighting the practicality of our approach: Acknowledging cognitive load limits of users (b ≤ 20 according to [20,21,42,43,47–49]), our approach helps to identify important data regularities, outperforming a (naive) uniform partitioning. Even with a compression ratio of 90%, the distortion of uniform partitioning is two times higher than our approach.…”
Section: Evaluating the Partition Quality
mentioning
confidence: 96%
“…The sample-label information collected via crowdsourcing is generally erroneous, due to the fact that online workers may lack expertise and proper incentives [30], [32]. This heterogeneous nature leads to the diverse submission quality of the completed tasks, pressing an urgent need for quality control [26], [27], [29], [33]–[39].…”
Section: Related Work
mentioning
confidence: 99%
“…As a result, they may perform poorly when dealing with the more general multi-label data setting, where each object may have a set of non-exclusive labels, and labels may exhibit semantic correlations. Several multi-label crowd consensus algorithms have been recently proposed [26], [27], [29], [38], [39]. Nowak et al [38] studied the inter-annotator agreement for multilabel image annotation and focused on the annotation quality differences between expert and non-expert workers.…”
Section: Related Work
mentioning
confidence: 99%