Analyzing more than 1M citations to better understand scientific research on COVID-19

Tue Mar 24 2020

COVID-19 has caused worldwide anxiety and thousands of deaths. Each day, new data and research are published. With the high level of activity and collaboration worldwide, what gets published today may be outdated the week after. How do researchers, medical professionals, policymakers, and the public keep up to date and informed?

At scite, we have created an easy way for anyone to see how a scientific article has been cited and, specifically, if it has been supported or contradicted by subsequent research. We do this by analyzing millions of full-text publications, extracting the citation statements from these publications, and then classifying these as supporting or contradicting evidence.

Recently, to help the world make more sense of COVID-19 research, we turned our attention and novel functionality to COVID-19 papers and preprints.

How scite helps trace the impact and reliability of COVID-19 research

To analyze research on COVID-19 and coronaviruses in general, we identified relevant publications using the CORD-19 dataset, a list of papers and preprints compiled from a variety of publishers and databases. From this list, we were able to download 20,268 PDFs from publishers. After processing these documents, we found that 16,775 had citation statements (and references) we could extract, amounting to 1,266,672 citation statements in total. We then applied our deep learning model to classify each citation statement as supporting, contradicting, or mentioning, and added these to our database to make them discoverable. We’ve also openly released all citation tallies, along with the citation statements from open documents, on Zenodo.
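To make the idea of a citation tally concrete, here is a minimal sketch in Python. It is not scite's implementation: the function name, the input format of (cited paper, label) pairs, and the toy identifiers are all assumptions for illustration, and the deep learning classifier that produces the labels is not shown. The three document counts are the ones quoted above.

```python
from collections import Counter, defaultdict

# The three classes named in the post.
LABELS = ("supporting", "contradicting", "mentioning")

# Figures quoted in the post.
DOWNLOADED_PDFS = 20_268
EXTRACTED_DOCS = 16_775
CITATION_STATEMENTS = 1_266_672


def tally_citations(statements):
    """Count classified citation statements per cited paper.

    `statements` is an iterable of (cited_id, label) pairs. The labels are
    assumed to come from an upstream classifier (hypothetical here).
    """
    tallies = defaultdict(Counter)
    for cited_id, label in statements:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        tallies[cited_id][label] += 1
    # Report all three labels for every paper, including zero counts.
    return {
        cited_id: {label: counts[label] for label in LABELS}
        for cited_id, counts in tallies.items()
    }


if __name__ == "__main__":
    # Toy data with made-up identifiers, not real scite output.
    toy = [
        ("paper-A", "supporting"),
        ("paper-A", "contradicting"),
        ("paper-A", "mentioning"),
        ("paper-B", "mentioning"),
    ]
    print(tally_citations(toy))

    # Share of downloaded PDFs that yielded extractable citation statements,
    # and the average number of statements per processed document.
    print(f"{100 * EXTRACTED_DOCS / DOWNLOADED_PDFS:.1f}% of PDFs processed")
    print(f"{CITATION_STATEMENTS / EXTRACTED_DOCS:.1f} statements per document")
```

Keeping zero counts in the output means a paper with no contradicting citations is reported explicitly as such rather than by omission.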

How does this help?

To show the utility of scite when looking at COVID-19 papers, let’s take a single example report — “Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia” — published in the New England Journal of Medicine.

Could the reported behavior of the virus in the Wuhan population be replicated elsewhere? The article page on the publisher’s website does not make this clear, and finding out would require someone to read every paper that cites it, a massive amount of work.

With scite, we’ve made it easy to see how this article has been cited (Figure 1). scite’s approach found nine citation statements that support the article.

To name just a few:

  • Kaplan created a model that reproduced the R0 reported in the study, a result that also seems to be in line with other studies using different methods.
  • Kaiyuan Sun et al. reproduced the reported incubation period using crowdsourced data.
  • Natalie M. Linton et al. showed similar results for the incubation period and other time intervals that govern the epidemiological dynamics of COVID-19 infections, using publicly available event-date data.

About four studies contradicted aspects of the article, including:

  • Mizumoto et al. reported a higher reproduction number for the virus in the confined setting of the cruise ship Diamond Princess.
  • Biao Tang et al. estimated a higher reproduction number using a deterministic compartmental model based on the clinical progression of the disease, the epidemiological status of the individuals, and intervention measures.

Search citation contexts

Aside from identifying citations that make a claim about an article, scite offers many other features. Users can search through all citation contexts, authors, and titles using the search field in the top right-hand corner (Figure 2). This can be useful, for example, if researchers want to find other modeling studies that use the epidemiological data reported in the NEJM paper. Users can easily find 31 citation statements featuring the word “model” in the title, in the citation context itself, or in the header of the section where the citation originated.
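The kind of search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how scite's search works: the `CitationStatement` record, its field names, and the simple case-insensitive substring match are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CitationStatement:
    """One citation statement (hypothetical record layout)."""
    citing_title: str  # title of the paper making the citation
    section: str       # header of the section the citation appears in
    context: str       # text surrounding the citation


def search_statements(statements, term):
    """Return statements whose title, section header, or context contains `term`.

    Matching is a case-insensitive substring test, so "model" also matches
    "models" and "modeling". An empty term matches nothing.
    """
    needle = term.casefold()
    if not needle:
        return []
    return [
        s for s in statements
        if any(
            needle in field.casefold()
            for field in (s.citing_title, s.section, s.context)
        )
    ]


if __name__ == "__main__":
    # Made-up records for illustration only.
    toy = [
        CitationStatement("A compartmental model of spread", "Methods",
                          "We take the incubation period from [1]."),
        CitationStatement("Clinical features of patients", "Discussion",
                          "This is consistent with [1]."),
    ]
    for hit in search_statements(toy, "model"):
        print(hit.citing_title)
```

A real search service would likely use a full-text index with stemming rather than substring matching; the sketch only shows which three fields the query is checked against.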

Use the scite plugin to integrate smart citations into your workflow

We believe the recent coronavirus outbreak is a global challenge that can only be tackled through collaboration. By continuously analyzing new COVID-19 papers, we hope to keep the scientific community up to date with Smart Citations (citations that display the context of the citation and describe whether the paper provides supporting or contradicting evidence) on research that can make a difference. We will update the Zenodo dataset regularly and share new versions widely as they become available.

To allow for more seamless integration into scientists’ workflow, scite offers a free plugin for Chrome and Firefox. With this plugin, users can see Smart Citations with citation counts from scite on every website with a scientific paper.

Let us know what you think

Let us know about any missing features. Our team of developers is working day in and day out to make scite more and more useful.