Database tomography is an information extraction and analysis system that operates on textual databases. Its primary use to date has been to identify the pervasive technical thrusts and themes intrinsic to large textual databases, and the interrelationships among these themes and sub-themes. Its two main algorithmic components are multiword phrase frequency analysis and phrase proximity analysis. This paper shows how database tomography can be used to enhance information retrieval from large textual databases through the newly developed process of simulated nucleation. The principles of simulated nucleation are presented, and the advantages for information retrieval are delineated. An application is described in which a database of journal articles focused on near-Earth space science and technology is developed from the Science Citation Index and the Engineering Compendex.
Database Tomography (DT) is a textual database analysis system consisting of two major components: 1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment 2) the interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a hypersonic/supersonic flow (HSF) database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain expert provided the pervasive technical themes of the HSF database, and the phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the HSF literature supplemented the DT results with author/journal/institution publication and citation data. The HSF results are compared with past analyses of similarly structured near-Earth space and Chemistry databases. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains.
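The abstracts do not specify the DT algorithms themselves, so the following is only a minimal sketch of what multiword phrase frequency extraction could look like, assuming simple lowercase word tokenization and fixed-length phrases (n-grams). The function names, the stopword list, and the sample text are hypothetical and chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "on", "with"}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(text, n=2):
    """Count every n-word phrase that neither starts nor ends with a stopword.

    Punctuation is discarded, so phrases may span sentence boundaries.
    """
    tokens = tokenize(text)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
            continue
        counts[" ".join(gram)] += 1
    return counts


# Invented sample text standing in for a database of abstracts.
sample = (
    "Hypersonic flow over a blunt body. "
    "Boundary layer transition in hypersonic flow. "
    "Shock wave boundary layer interaction."
)
freqs = phrase_frequencies(sample, n=2)
for phrase, count in freqs.most_common(5):
    print(count, phrase)
```

In this sketch the highest-frequency phrases are only candidates; as the abstract notes, identifying which of them are pervasive themes is left to the domain expert.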
This paper shows how Database Tomography can be used to derive technical intelligence from the published literature. Database Tomography is a patented system for analyzing large amounts of textual computerized material. It includes algorithms for extracting multi-word phrase frequencies and performing phrase proximity analyses. Phrase frequency analysis provides the pervasive themes of a database, and the phrase proximity analysis provides the relationships among the pervasive themes, and between the pervasive themes and sub-themes. One potential application of Database Tomography is to obtain the thrusts and interrelationships of a technical field from papers published in the literature within that field. This paper provides applications of Database Tomography to analyses of both the non-technical field of Research Impact Assessment (RIA) and the technical field of Chemistry. A database of relevant RIA articles was analyzed to produce characteristics and key features of the RIA field. The recent prolific RIA authors, the journals prolific in RIA papers, the prolific institutions in RIA, the prolific keywords specified by the authors, and the authors whose works are cited most prolifically, as well as the particular papers/journals/institutions cited most prolifically, are identified. The pervasive themes of RIA are identified through multi-word phrase analyses of the database. A phrase proximity analysis of the database shows the relationships among the pervasive themes, and the relationships between the pervasive themes and sub-themes. A similar process was applied to Chemistry, with the exception that the database was limited to one year's issues of the Journal of the American Chemical Society. Wherever possible, the RIA and Chemistry results are compared. Finally, the conceptual use of Database Tomography to help identify promising research directions is discussed.
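Phrase proximity analysis is described here only at a high level. One plausible reading, sketched below under stated assumptions, is to count the phrases that occur within a fixed number of words of each occurrence of a chosen theme phrase. The window size, the function names, and the sample text are hypothetical and are not taken from the patented system.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def proximity_counts(text, theme, window=4, n=2):
    """Count n-word phrases lying within `window` words of each theme occurrence.

    Phrases that overlap the theme phrase itself are skipped.
    """
    tokens = tokenize(text)
    theme_tokens = tokenize(theme)
    k = len(theme_tokens)
    counts = Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] != theme_tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + k + window)
        for j in range(lo, hi - n + 1):
            # Keep phrases entirely before or entirely after the theme.
            if j + n <= i or j >= i + k:
                counts[" ".join(tokens[j:j + n])] += 1
    return counts


# Invented sample text standing in for a database of abstracts.
sample = (
    "Research impact assessment uses peer review. "
    "Peer review supports research impact assessment."
)
near = proximity_counts(sample, "research impact assessment", window=4, n=2)
for phrase, count in near.most_common(3):
    print(count, phrase)
```

Phrases that repeatedly appear near a theme are candidates for related themes or sub-themes. A raw count like this favors common phrases, so a practical analysis would probably normalize by each phrase's overall frequency; that step is omitted here.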