Purpose
Citation contexts have proven useful in many scenarios. However, existing context-based recommendation approaches ignore the role of diversity in reducing redundancy and thus cannot cover the broad range of user interests. To address this gap, the paper proposes a novel task: recommending a set of diverse citation contexts extracted from a list of citing articles. This will help users understand how other scholars have cited an article and decide which articles to cite in their own writing.
Design/methodology/approach
This research combines three semantic distance algorithms with three diversification re-ranking algorithms to produce diversified recommendations on the CiteSeerX data set, and then evaluates the generated citation context lists through a user case study on 30 articles.
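As an illustration of how a semantic distance can be paired with a re-ranking step, the following is a minimal sketch of a greedy, MMR-style re-ranker over context vectors. It is not one of the algorithms evaluated in the paper (which are not specified here, apart from Integer Linear Programming), and the names `cosine_distance`, `rerank_diverse`, `scores` and `trade_off` are hypothetical. The input vectors stand in for embeddings such as averaged word2vec vectors.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def rerank_diverse(vectors, scores, k, trade_off=0.5):
    """Greedily pick k context indices, balancing score against novelty.

    vectors   -- one embedding per citation context (assumed input)
    scores    -- a relevance score per context (assumed input)
    trade_off -- 1.0 ranks by score only; lower values favour diversity
    """
    remaining = list(range(len(vectors)))
    selected = []
    while remaining and len(selected) < k:

        def gain(i):
            if not selected:
                return scores[i]
            # Novelty is the distance to the closest already-selected context.
            novelty = min(cosine_distance(vectors[i], vectors[j]) for j in selected)
            return trade_off * scores[i] + (1 - trade_off) * novelty

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With three toy contexts where the first two are near-duplicates, `rerank_diverse(vectors, scores, k=2)` skips the duplicate in favour of the dissimilar third context, whereas `trade_off=1.0` falls back to a plain score ordering.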
Findings
Results show that a diversification strategy combining “word2vec” and “Integer Linear Programming” leads to a better reading experience for participants than the other strategies, including the CiteSeerX approach of sorting the list by citation counts.
Practical implications
This diversifying recommendation task is valuable for developing better systems in information retrieval, automatic academic recommendations and summarization.
Originality/value
The originality of the research lies in proposing a novel task that recommends a diversified context list describing how other scholars have cited an article, thereby making citing decisions easier. A novel mixed approach is explored to identify the most efficient diversification strategy. In addition, rather than a traditional information retrieval evaluation, a user evaluation framework is introduced to reflect user information needs more objectively.
Modern scientific research is characterized by the sharing of datasets and the reuse of data for developing new models and theories. This paper describes a study to identify research articles that contain data use and reuse information. Applying a bootstrapping-based unsupervised training strategy, we were able to develop text patterns automatically from a large training collection of research articles. These patterns were then used to distinguish articles with data use and reuse from those without data usage. Our experiments using Computer Science literature showed that the identification could achieve more than 85% pattern extensibility. We also demonstrate how the results of the identification could be utilized to gain insights on data sharing and reuse in a scientific field.
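To make the bootstrapping idea concrete, here is a toy sketch under stated assumptions: patterns are simply the (previous word, next word) pair surrounding a known dataset name, and the loop alternates between inducing patterns from known names and harvesting new names with those patterns. The pattern representation, the seeds, and the names `bootstrap_patterns`, `mentions_data_use` and `min_support` are all hypothetical; the study's actual pattern format is not described in this abstract.

```python
from collections import Counter


def bootstrap_patterns(sentences, seed_names, rounds=2, min_support=2):
    """Alternate between learning context patterns and finding new names.

    A pattern is the (previous token, next token) pair around a known name.
    Only patterns seen at least min_support times in a round are kept.
    """
    names = {name.lower() for name in seed_names}
    patterns = set()
    tokenized = [sentence.lower().split() for sentence in sentences]
    for _ in range(rounds):
        # Step 1: count the contexts in which known names occur.
        counts = Counter()
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                if tokens[i] in names:
                    counts[(tokens[i - 1], tokens[i + 1])] += 1
        patterns |= {p for p, c in counts.items() if c >= min_support}
        # Step 2: apply the patterns to harvest new candidate names.
        for tokens in tokenized:
            for i in range(1, len(tokens) - 1):
                if (tokens[i - 1], tokens[i + 1]) in patterns:
                    names.add(tokens[i])
    return patterns, names


def mentions_data_use(sentence, patterns):
    """Return True if any learned pattern matches somewhere in the sentence."""
    tokens = sentence.lower().split()
    return any(
        (tokens[i - 1], tokens[i + 1]) in patterns
        for i in range(1, len(tokens) - 1)
    )
```

Starting from a single seed name, the learned patterns can then flag sentences (and, by aggregation, articles) that mention datasets the seeds never covered, which is the sense in which the patterns extend beyond the training seeds.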