Microsoft Academic is a free academic search engine and citation index that is similar to Google Scholar but can be automatically queried. Its data is potentially useful for bibliometric analysis if it is possible to search effectively for individual journal articles. This article compares different methods of finding journal articles in its index by searching for a combination of title, authors, publication year and journal name, and uses the results for the widest published correlation analysis of Microsoft Academic citation counts for journal articles so far. Based on 126,312 articles from 323 Scopus subfields in 2012, the optimal strategy to find articles with DOIs is to search for them by title and filter out those with incorrect DOIs. This finds 90% of journal articles. For articles without DOIs, the optimal strategy is to search for them by title and then filter out matches with dissimilar metadata. This finds 89% of journal articles, with an additional 1% of incorrect matches. The remaining articles seem mainly to be either not indexed by Microsoft Academic or indexed with a different language version of their title. From the matches, Scopus citation counts and Microsoft Academic counts have an average Spearman correlation of 0.95, with the lowest for any single field being 0.63. Thus, Microsoft Academic citation counts are almost universally equivalent to Scopus citation counts for articles that are not recent, but there are national biases in the results.
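The two filtering strategies summarised above can be illustrated with a minimal sketch. It is not the implementation used in this study: the `Record` fields, the helper names, the exact-year rule, the 0.8 journal-similarity threshold and the decision to reject candidates that lack a DOI are all assumptions made for illustration. The candidate list stands in for whatever a title search returns.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


@dataclass
class Record:
    """Hypothetical article record; field names are illustrative only."""
    title: str
    year: int
    journal: str
    authors: Tuple[str, ...]  # author surnames
    doi: Optional[str] = None


def normalise(text: str) -> str:
    """Lowercase and replace runs of non-alphanumerics with single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalised strings."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def metadata_agrees(query: Record, cand: Record,
                    journal_threshold: float = 0.8) -> bool:
    """Assumed rule: same year, plus a similar journal name or a shared author."""
    same_year = query.year == cand.year
    same_journal = similarity(query.journal, cand.journal) >= journal_threshold
    shared_author = bool({normalise(a) for a in query.authors}
                         & {normalise(a) for a in cand.authors})
    return same_year and (same_journal or shared_author)


def pick_match(query: Record, candidates: Iterable[Record]) -> Optional[Record]:
    """Return the first title-search candidate that survives filtering."""
    for cand in candidates:
        if normalise(cand.title) != normalise(query.title):
            continue
        if query.doi:
            # Article has a DOI: keep only candidates with the same DOI.
            # Assumption: candidates without a DOI are rejected here.
            if cand.doi and cand.doi.lower() == query.doi.lower():
                return cand
        elif metadata_agrees(query, cand):
            # Article has no DOI: fall back to comparing the other metadata.
            return cand
    return None
```

In this sketch a title match alone is never accepted: a candidate must also pass either the DOI check or the metadata check, which mirrors the "search by title, then filter" structure of both strategies.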
Introduction
Citation-based indicators frequently support formal and informal research evaluations (Wilsdon, Allen, Belfiore, Campbell, Curry, Hill, et al., 2015). They are typically gathered from Scopus or the Web of Science (WoS), both of which index large numbers of journal articles and some other document types. Previous research has found Google Scholar to return higher citation counts than Scopus and WoS for most fields (Falagas, Pitsouni, Malietzis, & Pappas, 2008; Halevi, Moed, & Bar-Ilan, 2017) because of its inclusion of open access online publications in addition to publisher databases. It is not possible to use Google Scholar for large-scale citation analyses because it does not allow automatic data harvesting (Halevi, Moed, & Bar-Ilan, 2017), except for individual academics through the Publish or Perish software (Harzing, 2007). Microsoft Academic, which was officially released in July 2017, is like Google Scholar in its coverage of academic literature, harvesting from publishers and the open web (Harzing & Alakangas, 2017ab; Paszcza, 2016; Thelwall, in press-a, submitted), but allows automatic data harvesting. It is therefore a promising source of citation data for large-scale citation analyses. It should be especially useful for fields with many online publications and for recently published research since it includes citations from preprints (Thelwall, in press-a, submitted). Nevertheless, one important limitation is that it does not allow DOI searches (Hug, Ochsner, & Brändle, 2017) and so it is not clear whether it is possible to obtain reasonably compr...