The World Wide Web is revolutionizing the way that researchers access scientific information. Articles are increasingly being made available on the homepages of authors or institutions, at journal Web sites, or in online archives. However, scientific information on the Web is largely disorganized. This article introduces the creation of digital libraries incorporating Autonomous Citation Indexing (ACI). ACI autonomously creates citation indices similar to the Science Citation Index®. An ACI system autonomously locates articles, extracts citations, identifies identical citations that occur in different formats, and identifies the context of citations in the body of articles. ACI can organize the literature and provide most of the advantages of traditional citation indices, such as literature search using citation links, and the evaluation of articles based on citation statistics. Furthermore, ACI can provide significant advantages over traditional citation indices. No manual effort is required for indexing, which should result in a reduction in cost and an increase in the availability of citation indices. An ACI system can also provide more comprehensive and up-to-date indices of the literature by indexing articles on the Web, technical reports, conference papers, etc. Furthermore, ACI makes it easy to browse the context of citations to given articles, allowing researchers to quickly and easily see what subsequent researchers have said about a given article. Digital libraries incorporating ACI may significantly improve scientific dissemination and feedback.
Due to the ease of electronic dissemination, the world of scientific literature on the Web has grown rapidly, becoming a large, highly current database of published research. This acceleration of publication has exacerbated the difficulty researchers face in keeping up to date with relevant new research trends. We believe that automatic tools to help researchers keep up with the latest relevant publications will be increasingly important in the future. One such tool, CiteSeer, is an automatic generator of scientific literature databases. CiteSeer uses sophisticated acquisition, parsing, and presentation methods to eliminate most of the manual effort required to perform a literature survey of publications on the Web. It also includes a personalized recommendation system that uses browsing behavior and automatic learning to adapt to individual research interests, even as they change over time. CiteSeer can proactively recommend new relevant research papers as they appear on the Web as well as discover new citations, keywords, and authors that may be indicative of novel research trends of interest to the user.
The web has greatly improved access to scientific literature. However, scientific articles on the web are largely disorganized, with research articles being spread across archive sites, institution sites, journal sites, and researcher homepages. No index covers all of the available literature, and the major web search engines typically do not index the content of PostScript/PDF documents at all. This paper discusses the creation of digital libraries of scientific literature on the web, including the efficient location of articles, full-text indexing of the articles, autonomous citation indexing, information extraction, display of query-sensitive summaries and citation context, hubs and authorities computation, similar document detection, user profiling, distributed error correction, graph analysis, and detection of overlapping documents. The software for the system is available at no cost for non-commercial use.
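The abstract above lists "hubs and authorities computation" without describing how it is done. As an illustration only, the following is a minimal sketch of the standard iterative hubs-and-authorities (HITS) update applied to a small citation graph. The function names `hits` and `_normalize`, the graph representation, and the fixed iteration count are assumptions made for this example, not details of the system the abstract describes.

```python
import math


def _normalize(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores to unit Euclidean length; leave an all-zero vector unchanged."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(links: dict[str, list[str]], iterations: int = 50):
    """Return (hub, authority) scores for a directed citation graph.

    `links` maps each paper to the papers it cites (hypothetical input format).
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the papers that cite this one.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Hub score: sum of the updated authority scores of the papers cited.
        new_hub = {n: 0.0 for n in nodes}
        for src, targets in links.items():
            new_hub[src] = sum(new_auth[t] for t in targets)
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    # Toy graph: a survey cites two papers, one of which cites the other.
    graph = {"survey": ["a", "b"], "a": ["b"], "b": []}
    hub_scores, auth_scores = hits(graph)
    print(max(auth_scores, key=auth_scores.get))
    print(max(hub_scores, key=hub_scores.get))
```

In this toy graph the most-cited paper ends up with the highest authority score and the survey with the highest hub score, which is the intuition behind using the two scores to surface influential articles and good overview articles respectively.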
Scientific literature is increasingly becoming available on the World Wide Web. This paper considers the matching of citations found in different papers in order to autonomously construct a citation index from papers in electronic format. Citation indices of scientific literature have traditionally been constructed manually, partly because it can be difficult to autonomously determine if two citations refer to the same paper (citations can be written in many different formats). We present four algorithms for autonomous citation matching. The algorithms are based on edit-distance computation, word matching, word and phrase matching, and subfield extraction. The word and phrase matching algorithm obtains the lowest error rate, and the subfield algorithm is the most computationally efficient. We quantitatively compare the accuracy and efficiency of the algorithms on a number of datasets.
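To make two of the approaches named in this abstract concrete, here is a rough sketch of citation matching by edit distance and by word matching. It is not the paper's algorithm: the normalization step, the Jaccard word-overlap measure, the greedy grouping, the threshold of 0.5, and all function names are assumptions chosen for illustration, and the citation strings in the usage example are invented placeholders.

```python
import re


def normalize(citation: str) -> list[str]:
    """Lowercase a citation and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", citation.lower())


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] from edit distance over normalized strings."""
    na, nb = " ".join(normalize(a)), " ".join(normalize(b))
    longest = max(len(na), len(nb))
    return 1.0 if longest == 0 else 1.0 - edit_distance(na, nb) / longest


def word_similarity(a: str, b: str) -> float:
    """Jaccard overlap of the word sets, so word order does not matter."""
    wa, wb = set(normalize(a)), set(normalize(b))
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def group_citations(citations: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Greedily place each citation in the first group whose first member matches."""
    groups: list[list[str]] = []
    for citation in citations:
        for group in groups:
            if word_similarity(citation, group[0]) >= threshold:
                group.append(citation)
                break
        else:
            groups.append([citation])
    return groups


if __name__ == "__main__":
    # Invented placeholder citations: two formats of one paper, plus another paper.
    c1 = "Smith, J., Doe, A. (1998). Matching citations automatically. Journal of Examples, 12(3)."
    c2 = "J. Smith and A. Doe. Matching citations automatically. Journal of Examples 12(3), 1998."
    c3 = "Roe, R. (1999). Ranking pages by link structure. Example Proceedings."
    for group in group_citations([c1, c2, c3]):
        print(group)
```

Word matching tolerates the reordering of authors, year, and venue that different citation styles introduce, whereas a raw edit distance penalizes such reordering heavily. This is one plausible reason a word-based method could reach a lower error rate than an edit-distance method, although the sketch itself does not demonstrate that result.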