SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search
Preprint, 2020
DOI: 10.48550/arxiv.2010.05987

Abstract: With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus. Clinicians, researchers, and policymakers need to be able to search these articles effectively. In this work, we present a zero-shot ranking algorithm that adapts to COVID-related scientific literature. Our approach filters training data from another collection down to medical-related queries, uses a neural reranking model pre-trained on scie…
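The query-filtering step described in the abstract can be illustrated with a minimal sketch. This is a hypothetical keyword-based filter over a toy query list, not the paper's actual filtering method (which the truncated abstract does not fully specify); the term list and queries are placeholders.

```python
# Hypothetical sketch: keep only training queries that look medical-related.
# MEDICAL_TERMS and the example queries are illustrative assumptions.
MEDICAL_TERMS = {"symptom", "vaccine", "diagnosis", "virus", "treatment"}

def is_medical(query: str) -> bool:
    """Return True if any token in the query matches the medical term list."""
    return any(tok in MEDICAL_TERMS for tok in query.lower().split())

queries = [
    "what causes a persistent cough symptom",
    "best hiking trails in oregon",
    "how does a vaccine trigger immunity",
]
medical_queries = [q for q in queries if is_medical(q)]
```

A real system would use a curated medical vocabulary rather than a hand-written set, but the shape of the filter is the same: a predicate applied to each query in the source collection.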

Cited by 2 publications (4 citation statements); references 7 publications.
“…This prompted the creation of the Covid-19 Open Research Dataset (CORD-19) [81] and the TREC-COVID [60,73] benchmarking effort on one hand, and a flurry of new research and development of IR systems specializing in this task [9,69,80] on the other. In particular, MacAvaney et al. [48,49] created Med-MARCO, a subset of the MS MARCO dataset restricted to medical-related questions. Subsequently, several groups benchmarking on TREC-COVID employed this subset for model training [46,83,88,89], while others explored fine-tuning on the full MS MARCO for this task [5,40,46,53,63,71].…”
Section: On Transfer Learning From MS MARCO to Other IR Benchmarks
confidence: 99%
“…In a meta-analysis of participating runs in the TREC-COVID challenge, Chen and Hersh [9] found the use of the MS MARCO dataset for fine-tuning to be associated with higher retrieval performance. Similar to Med-MARCO [48,49], Hamzei et al. [31] study a place-related subset of the MS MARCO dataset. Another interesting case study in this context is the application of MS MARCO to conversational search, where it has been useful both for the creation of new benchmarks [17,18,59] and for model training [26, 39, 50, 70, 77-79, 84, 86].…”
Section: On Transfer Learning From MS MARCO to Other IR Benchmarks
confidence: 99%
“…These datasets are generally constructed by a set of human curators who were provided with a list of queries (or questions) and a set of supposedly relevant documents, and the goal was to select the most pertinent documents for each query. In addition, multiple datasets have been used to train question answering models, such as COVIDQA [11], COVID-19 Questions [12], COVID-QA [13], InfoBot Dataset [14], MS-MARCO [15], Med-MARCO [16], Natural Questions [17], SQuAD [18], BioASQ [10], M-CID [19] and QuAC [20]. Other datasets were used to train document summarization models.…”
Section: Datasets
confidence: 99%
“…Granularity/Levels of Representations: We also noticed that the works used different levels of granularity, which depend on the intended tasks and the available computational resources. For example, to achieve the task of document retrieval, some works opted for simple document-level representations [53], while other works either used more granular representations [12, 32, 37, 40, 50, 54, 55, 56] or a mix of more granular representations with document-level representations [16, 24, 38, 39, 44, 51, 52, 57]. Using KGs: Knowledge graphs were used in multiple works for different purposes.…”
Section: Exploratory Search Applications
confidence: 99%
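The document-level versus finer-grained representation choice described in the citation statement above can be sketched minimally: the same text is treated either as one retrieval unit or split into overlapping word-window passages. This is an illustrative assumption about what "more granular" means here, not the implementation of any cited system; all names are hypothetical.

```python
# Illustrative sketch: document-level vs passage-level retrieval units.
# Window size and stride are arbitrary illustrative values.
def split_into_passages(text: str, size: int = 5, stride: int = 3) -> list:
    """Split a document into overlapping word-window passages."""
    words = text.split()
    passages = []
    for start in range(0, max(len(words) - size, 0) + 1, stride):
        passages.append(" ".join(words[start:start + size]))
    return passages

doc = "coronavirus literature search requires effective ranking of scientific articles"
doc_level = [doc]                          # one coarse unit per document
passage_level = split_into_passages(doc)   # several finer-grained units
```

Passage-level units raise indexing cost but let a ranker score the most relevant part of a long article, which is the trade-off the surveyed works navigate.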