2023
DOI: 10.48550/arxiv.2301.10140
Preprint

The Semantic Scholar Open Data Platform

Abstract: The volume of scientific output is creating an urgent need for automated tools to help scientists keep up with developments in their field. Semantic Scholar (S2) is an open data platform and website aimed at accelerating science by helping scholars discover and understand scientific literature. We combine public and proprietary data sources using state-of-the-art techniques for scholarly PDF content extraction and automatic knowledge graph construction to build the Semantic Scholar Academic Graph, the largest …

citations
Cited by 8 publications
(7 citation statements)
references
References 9 publications
“…A major aspect of Semantic Scholar is to integrate machine learning methods to enhance data quality and search. They did, for instance, develop a system for publication deduplication named S2APLER and perform citation linking based on fuzzy text-matching heuristics (Kinney et al, 2023). Data is provided free and open by Semantic Scholar and can be accessed via API.…”
Section: Related Work (mentioning)
confidence: 99%
“…Bibliometric and citation analyses are performed with the aid of providers of bibliometric data and are based on the structured data they provide. Semantic Scholar (Kinney et al, 2023) or Crossref (Hendricks, Tkaczyk, Lin, & Feeney, 2020), for instance, provide powerful application programming interfaces (API) to access the already pre-processed data about millions of articles. Such infrastructure should also build the basis to integrate software in bibliometric analyses.…”
Section: Introduction (mentioning)
confidence: 99%
“…The Web of Science was the database selected to obtain the papers. There are several other options, such as dimensions (Hook et al, 2018), Semantic Scholar (Kinney et al, 2023), OpenAlex (Priem et al, 2022), and Scopus. Among these options, the Web of Science and Scopus include many multidisciplinary, international, and peer-reviewed journals and conferences.…”
Section: Information Sources (mentioning)
confidence: 99%
“…To generate the replay-captions, we first parse titles of ecology papers corresponding to the query “Serengeti+Wildlife” with the Semantic Scholar API (55). We then use the Rapid Automatic Keyword Extraction (Rake (56)) on all titles to keep only keywords from them.…”
Section: Caption Templates (mentioning)
confidence: 99%
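
The keyword step described in the last citation statement can be illustrated with a minimal sketch of RAKE-style scoring. This is a simplified variant written for illustration, not the citing authors' implementation: the stopword list is a tiny assumed one, the function names are hypothetical, and the titles are passed in as plain strings rather than fetched from the Semantic Scholar API.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real RAKE lists are much larger).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "with", "by", "at", "from"}


def candidate_phrases(text):
    """Split a title into candidate phrases at stopwords and punctuation."""
    phrases = []
    for fragment in re.split(r"[^\w\s-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_keywords(titles):
    """Score each candidate phrase as the sum of degree/frequency over its words."""
    phrases = [p for title in titles for p in candidate_phrases(title)]
    freq = defaultdict(int)    # number of occurrences of each word
    degree = defaultdict(int)  # total length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `rake_keywords` on a list of title strings returns `(phrase, score)` pairs, with longer multi-word phrases generally ranked above isolated words.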