Characteristics of Dataset Retrieval Sessions: Experiences from a Real-Life Digital Library

Carevic, Zeljko; Roy, Debapriya Basu; Mayr, Philipp

doi:10.1007/978-3-030-54956-5_14

Cited by 6 publications

(12 citation statements)

References 11 publications


“…A comprehensive literature review on dataset retrieval practices is provided in [17] focusing on dataset retrieval practices in different disciplines. Research in this area covers, for instance, the analysis of information-seeking behaviour during dataset retrieval through observations [24], questionnaires and interviews [15,23], and transaction-log studies [9,22].…”

Section: Related Work (mentioning)

confidence: 99%

“…In Equation 4, the weight of the query ($w_q$) can be defined in a similar way as defined in retrievability (Equation 1). The usefulness of a document may also depend on the difficulty of the query [11,12]. A document $d$ should be considered more useful if it is retrieved and consumed following a query $Q$ than any other document, say $d'$ with an associated query $Q'$ which is relatively easier than $Q$ (i.e.…”

Section: From Retrievability To Usefulness (mentioning)

confidence: 99%

“…Recently, numerous studies have been conducted to further identify the characteristics of dataset retrieval. These studies include the observation of data retrieval practices [24], interviews and online questionnaires [15,23] and transaction log analysis [9,22].…”

Section: Introduction (mentioning)

confidence: 99%


Studying retrievability of publications and datasets in an integrated retrieval system

Roy, Carevic, Mayr (2022)

Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries

Self Cite


In this paper, we investigate the retrievability of datasets and publications in a real-life Digital Library (DL). The measure of retrievability was originally developed to quantify the influence that a retrieval system has on the access to information. Retrievability can also enable DL engineers to evaluate their search engine to determine the ease with which the content in the collection can be accessed. Following this methodology, in our study, we propose a system-oriented approach for studying dataset and publication retrieval. A speciality of this paper is the focus on measuring the accessibility biases of various types of DL items and including a metric of usefulness. Among other metrics, we use Lorenz curves and Gini coefficients to visualize the differences of the two retrievable document types (specifically datasets and publications). Empirical results reported in the paper show a distinguishable diversity in the retrievability scores among the documents of different types.
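The abstract above names Lorenz curves and Gini coefficients as the tools for comparing how evenly retrievability is spread over datasets and publications. A minimal sketch of both, assuming a plain list of non-negative retrievability scores per document type; the function names and the sample scores are hypothetical and are not taken from the paper:

```python
def gini(scores):
    """Gini coefficient of non-negative scores.

    0 means every document is equally retrievable; values near 1 mean
    retrievability is concentrated in a few documents.
    """
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Standard formula on ascending values x_1..x_n:
    # G = sum_i (2i - n - 1) * x_i / (n * sum_i x_i)
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


def lorenz_points(scores):
    """Points (share of documents, cumulative share of retrievability)."""
    values = sorted(scores)
    n = len(values)
    total = sum(values)
    points = [(0.0, 0.0)]
    running = 0.0
    for i, x in enumerate(values, start=1):
        running += x
        points.append((i / n, running / total if total else 0.0))
    return points


if __name__ == "__main__":
    # Hypothetical retrievability scores, for illustration only.
    dataset_scores = [0, 1, 1, 2, 12]
    publication_scores = [3, 3, 4, 4, 5]
    print("datasets:", gini(dataset_scores))
    print("publications:", gini(publication_scores))
    print(lorenz_points(dataset_scores))
```

A higher Gini coefficient for one document type would indicate that a small share of its items receives most of the retrievability, which is the kind of accessibility bias the abstract describes.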




“…To date, a few studies exist that have attempted to understand the dataset users’ behaviour and intentions (Kacprzak et al., 2019; Carevic et al., 2020; Chen et al.…”

Section: Introduction (mentioning)

confidence: 99%

“…Majority of existing research in the literature focuses on user behaviour for searching textual documents/web pages, images or videos. Limited research but growing interest exists in the research community to uncover user dataset search behaviour, in light of the vast amount of datasets that is becoming available on the Web due to the Open Data initiatives (Carevic et al., 2020).…”

Section: Introduction (mentioning)

confidence: 99%

Large-scale analysis of query logs to profile users for dataset search

Sharifpour, Zhang (2022)


Purpose: With an explosion of datasets available on the Web, dataset search has gained attention as an emerging research domain. Understanding users' dataset behaviour is imperative for providing effective data discovery services. In this paper, the authors present a study on users' dataset search behaviour through the analysis of search logs from a research data discovery portal.

Design/methodology/approach: Using query and session based features, the authors apply cluster analysis to discover distinct user profiles with different search behaviours. One particular behavioural construct of our interest is users' expertise that the authors generate via computing semantic similarity between users' search queries and the title of metadata records in the displayed search results.

Findings: The findings revealed that there are six distinct classes of user behaviours for dataset search, namely: Expert Research, Expert Search, Expert Explore, Novice Research, Novice Search and Novice Explore.

Research limitations/implications: The user profiles are derived based on analysis of the search log of the research data catalogue in this study. Further research is needed to generalise the user profiles to other dataset search settings. Future research can take on a confirmatory approach to verify these user groups and establish a deeper understanding of their information needs.

Practical implications: The findings in this paper have implications for designing search systems that tailor search results matching the diverse information needs of different user groups.

Originality/value: We propose for the first time a taxonomy of users for dataset search based on their domain expertise and search behaviour.


Learning Domain‐specific Semantic Representation from Weakly Supervised Data to Improve Research Dataset Retrieval

Luo, Hong, Wang, et al. (2022)

Proceedings of the Association for Information Science and Tech


Along with the development of the data‐driven research paradigm, there are exponentially increasing datasets, which bring challenges to researchers in the efficient retrieval of relevant datasets. Previous studies mainly focused on query expansion methods based on sparse retrieval models to improve the accuracy and recall in retrieval. We investigated the use of semantically rich information to retrieve relevant datasets and the benefits of using domain‐specific dense vector representation as opposed to general representation. First, we used pairs of metadata fields that have semantic relevance to construct the domain‐specific weakly supervised training data. Then, a pre‐trained transformer‐based deep learning model is fine‐tuned on the training data using the contrastive learning method. Finally, dense vector representations of the queries and datasets are obtained based on the fine‐tuned model. The relevance of a dataset to a query is measured by the similarity between the vectors. To evaluate the performance of the proposed model, we collected 104,683 datasets from 13 research data repositories, recruited volunteers to design research‐oriented queries, and annotated the retrieval results. The experimental results show that compared with the domain‐independent fine‐tuned model, our proposed method can improve the NDCG@10 score by about 5%.
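The abstract above scores a dataset against a query by the similarity of their dense vectors and reports results as NDCG@10. A minimal sketch of those two steps, assuming cosine similarity as the similarity measure and hand-written toy vectors in place of the fine-tuned model's output; the function names, vectors, and relevance grades are hypothetical, and the ideal ranking is computed only from the grades of the returned list:

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_by_similarity(query_vec, dataset_vecs):
    """Return dataset ids ordered from most to least similar to the query."""
    return sorted(
        dataset_vecs,
        key=lambda name: cosine(query_vec, dataset_vecs[name]),
        reverse=True,
    )


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k graded results."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(gains, k=10):
    """NDCG@k, normalised by the best ordering of the same grades."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0


if __name__ == "__main__":
    # Toy vectors and relevance grades, for illustration only.
    query = [1.0, 0.0]
    datasets = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    grades = {"a": 2, "b": 0, "c": 3}
    ranking = rank_by_similarity(query, datasets)
    print(ranking, ndcg_at_k([grades[name] for name in ranking]))
```

In an evaluation like the one described, the grades would come from the volunteers' annotations of the retrieved datasets rather than from a hard-coded dictionary.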

