2015
DOI: 10.1109/tkde.2014.2320737

Are Data Sets Like Documents?: Evaluating Similarity-Based Ranked Search over Scientific Data

Abstract: The past decade has seen a dramatic increase in the amount of data captured and made available to scientists for research. This increase amplifies the difficulty scientists face in finding the data most relevant to their information needs. In prior work, we hypothesized that Information Retrieval-style ranked search can be applied to data sets to help a scientist discover the most relevant data amongst the thousands of data sets in many formats, much like text-based ranked search helps users make sense of the …

Cited by 7 publications
(6 citation statements)
References 30 publications
“…A true primary key will have a uniqueness ratio of 1, but because the profiler uses the Hyperloglog sketch to compute this, we may have a small error rate, so we simply check that the value is close to 1. When we retrieve the content-similar candidates, we iterate over them (line 18) and check whether they are PK/FK candidates (line 19), in which case we add the candidate PK/FK relationship to the EKG (20). This method is similar to PowerPivot's [7] approach, and works well in practice, as shown in the evaluation.…”
Section: Building and Maintaining the EKG (mentioning)
confidence: 98%
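The uniqueness-ratio check described in the statement above can be sketched as follows. This is a minimal illustration, not the cited system's implementation: the names `uniqueness_ratio` and `is_pk_candidate` and the `tolerance` value are hypothetical, and an exact distinct count via a Python `set` stands in for the HyperLogLog estimate mentioned in the quote.

```python
from typing import Hashable, Sequence


def uniqueness_ratio(values: Sequence[Hashable]) -> float:
    """Return distinct values divided by total values (0.0 for an empty column)."""
    if not values:
        return 0.0
    # Assumption: an exact count replaces the approximate HyperLogLog sketch.
    return len(set(values)) / len(values)


def is_pk_candidate(values: Sequence[Hashable], tolerance: float = 0.05) -> bool:
    """Treat a column as a primary-key candidate when its ratio is close to 1.

    The tolerance is a hypothetical threshold; it absorbs the small error an
    approximate distinct counter would introduce, in either direction.
    """
    return abs(1.0 - uniqueness_ratio(values)) <= tolerance


if __name__ == "__main__":
    ids = list(range(98)) + [0, 1]      # 98 distinct values out of 100
    categories = ["a", "a", "b", "b"]   # 2 distinct values out of 4
    print(is_pk_candidate(ids))
    print(is_pk_candidate(categories))
```

Using the absolute difference rather than a one-sided comparison matters once an approximate counter is swapped in, since an estimate can overshoot the true distinct count and push the ratio slightly above 1.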
“…T d behind the oth our similarity fu data within the uch summarizati search engine ty between the g [14]. The result ity [14,15] We separately t poral, and envi es from a prior the dataset con that study. We s Figure 5: in eac oint-wise calcul culated from the scores for tempo atial search term in Figure 5c.…”
Section: Zing Data (mentioning)
confidence: 99%
“…With correlat search term tes n is reasonably v imilar ranking. W of using a co ance is reasonabl rsus applying th f the bubble re in both scoring other high-scor and scores calc ons in the datase ested search ter ironmental varia study [15]. We ntents to the 2,9 show the results ch case we grap lation (on the v e dataset summ oral search term ms in Figure 5 The results for d n. Finally, we co ms for each searc g orientation of most cases, the ween the two app ficient of determ mbined scores.…”
Section: Zing Data (mentioning)
confidence: 99%
“…From user perspectives, research has revealed the similarities and differences between dataset search and document retrieval (Kern & Mathiak, 2015; Megler & Maier, 2015), and recognized that locating data for reuse in research is challenging for many (Gregory et al., 2020). Specifically, data discovery and reuse are purpose-driven activities, characterized by a variety of data needs and discovery strategies (Gregory et al., 2020b) that are interwoven with processes of sensemaking (Koesten et al., 2020).…”
Section: Introduction (mentioning)
confidence: 99%