We investigate ways to improve the interpretability of LDA topic models by better analyzing and visualizing their outputs. We focus on what we refer to as topic similarity networks: graphs in which nodes represent latent topics in text collections and links represent similarity among topics. We describe efficient and effective approaches to both building and labeling such networks. Visualizations of topic models based on these networks are shown to be a powerful means of exploring, characterizing, and summarizing large collections of unstructured text documents. They help to "tease out" non-obvious connections among different sets of documents and provide insights into how topics form larger themes. We demonstrate the efficacy and practicality of these approaches through two case studies: 1) NSF grants for basic research spanning a 14-year period and 2) the entire English portion of Wikipedia.
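The idea of a topic similarity network can be illustrated with a minimal sketch. The abstract does not say which similarity measure or linking rule the authors use, so the cosine similarity, the `threshold` parameter, the function names, and the toy topic-word distributions below are all assumptions made for illustration only.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic-word weight vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return dot / (norm_p * norm_q)


def build_topic_network(topic_word, threshold=0.5):
    """Return edges (i, j, similarity) linking topics i < j.

    Nodes are topic indices; a link is kept when the similarity is at or
    above `threshold` (a hypothetical cutoff, not taken from the paper).
    """
    edges = []
    n = len(topic_word)
    for i in range(n):
        for j in range(i + 1, n):
            sim = cosine_similarity(topic_word[i], topic_word[j])
            if sim >= threshold:
                edges.append((i, j, sim))
    return edges


# Toy topic-word distributions over a 4-word vocabulary (made-up numbers).
topics = [
    [0.70, 0.20, 0.05, 0.05],
    [0.60, 0.30, 0.05, 0.05],
    [0.05, 0.05, 0.20, 0.70],
]
edges = build_topic_network(topics, threshold=0.5)
for i, j, sim in edges:
    print(f"topic {i} -- topic {j} (similarity {sim:.2f})")
```

In this toy input only the first two topics share most of their probability mass, so they are the only pair linked. In practice the rows would come from a fitted LDA model's topic-word matrix, and the resulting edge list could be passed to a graph library for layout and labeling.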
We present an effective multifaceted system for exploratory analysis of highly heterogeneous document collections. Our system is based on intelligently tagging individual documents in a purely automated fashion and exploiting these tags in a powerful faceted browsing framework. The tagging strategies employed include both unsupervised and supervised approaches based on machine learning and natural language processing. As one of our key tagging strategies, we introduce the KERA algorithm (Keyword Extraction for Reports and Articles). KERA extracts topic-representative terms from individual documents in a purely unsupervised fashion and is shown to be significantly more effective than state-of-the-art methods. Finally, we evaluate how well our system helps users locate documents pertaining to military critical technologies buried deep in a large, heterogeneous sea of information.
DEFINITIONS. IDA publishes the following documents to report the results of its work.
Reports: Reports are the most authoritative and most carefully considered products IDA publishes. They normally embody results of major projects which (a) have a direct bearing on decisions affecting major programs, (b) address issues of significant concern to the Executive Branch, the Congress and/or the public, or (c) address issues that have significant economic implications. IDA Reports are reviewed by outside panels of experts to ensure their high quality and relevance to the problems studied, and they are released by the President of IDA.
Group Reports: Group Reports record the findings and results of IDA established working groups and panels composed of senior individuals addressing major issues which otherwise would be the subject of an IDA Report. IDA Group Reports are reviewed by the senior individuals responsible for the project and others as selected by IDA to ensure their high quality and relevance to the problems studied, and are released by the President of IDA.
Papers: Papers, also authoritative and carefully considered products of IDA, address studies that are narrower in scope than those covered in Reports. IDA Papers are reviewed to ensure that they meet the high standards expected of refereed papers in professional journals or formal Agency reports.
Documents: IDA Documents are used for the convenience of the sponsors or the analysts (a) to record substantive work done in quick reaction studies, (b) to record the proceedings of conferences and meetings, (c) to make available preliminary and tentative results of analyses, (d) to record data developed in the course of an investigation, or (e) to forward information that is essentially unanalyzed and unevaluated. The review of IDA Documents is suited to their content and intended use.
The work reported in this publication was conducted under IDA's Independent Research Program. Its publication does not imply endorsement by the Department of Defense or any other Government Agency, nor should the contents be construed as reflecting the official position of any Government Agency.
This Paper has been reviewed by IDA to assure that it meets high standards of thoroughness, objectivity, and appropriate analytical methodology and that the results, conclusions and recommendations are properly supported by the material presented.
Approved for public release, unlimited distribution. Unclassified.
Institute for Defense Analyses. AUTHOR(S): T-B5-490 Herbert R. Brown, Robert M. Rolfe. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): Institute for Defense Analyses, 1801 N. Beauregard St., Alexandria, VA 22311-1772. PERFORMING ORGANIZATION REPORT NUMBER: IDA Paper P-2300. SPONSORING/MONITORING AGENCY NAME(S) AND ADDRESS(ES): OASD(P&L), Room 2B322, The Pentagon, Washington, D.C. 20301-8000. DISTRIBUTION/AVAILABILITY STATEMENT: Approved for public release, unlimited distribution. DISTRIBUTION CODE: 2A.
ABSTRACT (Maximum 200 words): IDA Paper P-2300, Joint DoD/Industry Study on Opportunities in Integrated Diagnostics, documents a study which identified DoD opportunities i...