2012
DOI: 10.14778/2535568.2448943

Truth finding on the deep web

Abstract: The amount of useful information available on the Web has been growing at a dramatic pace in recent years and people rely more and more on the Web to fulfill their information needs. In this paper, we study truthfulness of Deep Web data in two domains where we believed data are fairly clean and data quality is important to people's lives: Stock and Flight. To our surprise, we observed a large amount of inconsistency on data from different sources and also some sources with quite low accuracy. We further applie…
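The abstract's two observations, inconsistent values across sources and some sources with low accuracy, can be made concrete with a minimal majority-voting baseline. The sketch below is illustrative only: the claims, source names (`src1` to `src3`) and item names are hypothetical, and majority voting is a simple stand-in, not the fusion method the paper evaluates.

```python
from collections import Counter, defaultdict


def group_by_item(claims):
    """Map each data item to {source: claimed value}."""
    by_item = defaultdict(dict)
    for source, item, value in claims:
        by_item[item][source] = value
    return by_item


def majority_vote(by_item):
    """Pick the most frequently claimed value per item (ties: first value seen)."""
    return {item: Counter(votes.values()).most_common(1)[0][0]
            for item, votes in by_item.items()}


def source_accuracy(by_item, truths):
    """Fraction of each source's claims that agree with the voted value."""
    correct, total = Counter(), Counter()
    for item, votes in by_item.items():
        for source, value in votes.items():
            total[source] += 1
            correct[source] += int(value == truths[item])
    return {s: correct[s] / total[s] for s in total}


# Hypothetical claims: (source, data item, claimed value).
claims = [
    ("src1", "stock:XYZ.close", 101.5),
    ("src2", "stock:XYZ.close", 101.5),
    ("src3", "stock:XYZ.close", 99.0),
    ("src1", "flight:XY123.departure", "08:05"),
    ("src2", "flight:XY123.departure", "08:15"),
    ("src3", "flight:XY123.departure", "08:05"),
]

by_item = group_by_item(claims)
# Inconsistency: number of distinct values reported for each item.
distinct = {item: len(set(votes.values())) for item, votes in by_item.items()}
truths = majority_vote(by_item)
accuracy = source_accuracy(by_item, truths)
print(accuracy)  # {'src1': 1.0, 'src2': 0.5, 'src3': 0.5}
```

Here both items have two conflicting values, and a source's accuracy is measured only against the voted value, so it is as reliable as the vote itself.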

Cited by 216 publications (159 citation statements)
References 16 publications (31 reference statements)
“…While extremely powerful, there are scenarios where this sampling model does not apply. Most importantly, data sources are not always independent [24]. Furthermore, the number of data sources l has to be large enough to have sufficient overlap between the sources (see Section 6).…”
Section: Data Integration As Sampling Process (mentioning, confidence: 99%)
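The two caveats in this excerpt, independence of sources and sufficient overlap, can be illustrated with the two-source Lincoln-Petersen (capture-recapture) estimator, a classical estimator that assumes the two samples are drawn independently. This is an illustrative choice: the excerpt does not say which estimator the citing work uses, and the function name and sets below are hypothetical.

```python
def lincoln_petersen(source_a: set, source_b: set) -> float:
    """Estimate total population size from two samples, assuming independence."""
    overlap = len(source_a & source_b)
    if overlap == 0:
        # Without overlap the estimator is undefined ("sufficient overlap" caveat).
        raise ValueError("no overlap between sources; estimate is undefined")
    return len(source_a) * len(source_b) / overlap


population = range(100)                         # true size: 100 entities
a = {x for x in population if x % 2 == 0}       # 50 entities
b_indep = {x for x in population if x % 5 == 0} # 20 entities, independent of a
b_dep = {x for x in population if x % 4 == 0}   # 25 entities, all contained in a

print(lincoln_petersen(a, b_indep))  # 100.0
print(lincoln_petersen(a, b_dep))    # 50.0
```

When the second source is independent of the first, the estimate recovers the true size; when it is positively dependent (every entity it reports is also in `a`), the overlap is inflated and the population is underestimated by half.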
“…These problems have been studied in areas such as knowledge discovery, web personalization, and fact checking [7][8][9][10]. In order to make sense of the data, we must address problems such as the missing or inconsistent data problems while at the same time coping with the sheer amount of data presented to us.…”
Section: Introduction (mentioning, confidence: 99%)
“…While the truth discovery problem has been studied from different perspectives [12], it remains inefficient. Waguih et al [18] experimentally evaluated the performance of several truth discovery algorithms on three computing nodes on both real-world and synthetic datasets with various configurations, and concluded that most algorithms have efficiency problems.…”
Section: Introduction (mentioning, confidence: 99%)
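The truth discovery algorithms compared in the excerpt are not specified here. As a hedged illustration of the general idea, the sketch below alternates weighted voting with re-estimating each source's accuracy until the chosen values stop changing. It is a simplified, hypothetical variant (the function `iterative_truth_discovery`, sources `A` to `E`, items `x1` to `x4`, and the values are assumptions), not the method of [12], [18], or the original paper.

```python
from collections import defaultdict


def iterative_truth_discovery(claims, max_iter=20):
    """Alternate weighted voting and source-accuracy estimation.

    claims: list of (source, item, value) tuples.
    Returns (truths, weights).
    """
    sources = {s for s, _, _ in claims}
    weights = {s: 1.0 for s in sources}  # first round equals plain majority voting
    truths = {}
    for _ in range(max_iter):
        # Step 1: weighted vote per item (ties go to the first value seen).
        scores = defaultdict(lambda: defaultdict(float))
        for source, item, value in claims:
            scores[item][value] += weights[source]
        new_truths = {item: max(votes, key=votes.get)
                      for item, votes in scores.items()}
        # Step 2: each source's weight becomes its accuracy against the new truths.
        correct, total = defaultdict(int), defaultdict(int)
        for source, item, value in claims:
            total[source] += 1
            correct[source] += int(value == new_truths[item])
        weights = {s: correct[s] / total[s] for s in sources}
        if new_truths == truths:  # converged: chosen values unchanged
            break
        truths = new_truths
    return truths, weights


# Hypothetical claims: A, B, C agree on x1..x3; D and E disagree with them.
claims = []
for item in ("x1", "x2", "x3"):
    claims += [(s, item, "08:05") for s in ("A", "B", "C")]
    claims += [(s, item, "08:15") for s in ("D", "E")]
claims += [("A", "x4", "08:05"), ("D", "x4", "08:15"), ("E", "x4", "08:15")]

truths, weights = iterative_truth_discovery(claims)
print(truths["x4"])  # 08:05
```

Plain majority voting would choose `08:15` for `x4` (two votes against one); after reweighting, `A` agrees with the consensus on `x1` to `x3` while `D` and `E` do not, so `08:05` wins. Each iteration is one linear pass over the claims, so total cost grows with the number of claims times the number of iterations.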