Ai Kawazoe scite author profile

BioCaster: detecting public health rumors with a Web-based text mining system

Summary: BioCaster is an ontology-based text mining system for detecting and tracking the distribution of infectious disease outbreaks from linguistic signals on the Web. The system continuously analyzes documents reported from over 1700 RSS feeds, classifies them for topical relevance and plots them onto a Google map using geocoded information. The background knowledge for bridging the gap between layman's terms and formal coding systems is contained in the freely available BioCaster ontology, which includes information in eight languages focused on the epidemiological role of pathogens as well as geographical locations with their latitudes/longitudes. The system consists of four main stages: topic classification, named entity recognition (NER), disease/location detection and event recognition. Higher order event analysis is used to detect more precisely specified warning signals that can then be notified to registered users via email alerts. Evaluation of the system for topic recognition and entity identification is conducted on a gold standard corpus of annotated news articles. Availability: The BioCaster map and ontology are freely available via a web portal at http://www.biocaster.org. Contact: collier@nii.ac.jp

A framework for enhancing spatial and temporal granularity in report-based health surveillance systems

Chanlekha

Kawazoe

Collier

2010

BMC Med Inform Decis Mak

Background: Current public concern over the spread of infectious diseases has underscored the importance of health surveillance systems for the speedy detection of disease outbreaks. Several international report-based monitoring systems have been developed, including GPHIN, Argus, HealthMap, and BioCaster. A vital feature of these report-based systems is the geo-temporal encoding of outbreak-related textual data. Until now, automated systems have tended to use an ad-hoc strategy for processing geo-temporal information, normally involving the detection of locations that match pre-determined criteria, and the use of document publication dates as a proxy for disease event dates. Although these strategies appear to be effective enough for reporting events at the country and province levels, they may be less effective at discovering geo-temporal information at more detailed levels of granularity. In order to improve the capabilities of current Web-based health surveillance systems, we introduce the design for a novel scheme called spatiotemporal zoning. Method: The proposed scheme classifies news articles into zones according to the spatiotemporal characteristics of their content. In order to study the reliability of the annotation scheme, we analyzed the inter-annotator agreements on a group of human annotators for over 1000 reported events. Qualitative and quantitative evaluation is made on the results including the kappa and percentage agreement. Results: The reliability evaluation of our scheme yielded very promising inter-annotator agreement, more than a 0.9 kappa and a 0.9 percentage agreement for event type annotation and temporal attributes annotation, respectively, with a slight degradation for the spatial attribute. However, for events indicating an outbreak situation, the annotators usually had inter-annotator agreements with the lowest granularity location. Conclusions: We developed and evaluated a novel spatiotemporal zoning annotation scheme. The results of the scheme evaluation indicate that our annotated corpus and the proposed annotation scheme are reliable and could be effectively used for developing an automatic system. Given the current advances in natural language processing techniques, including the availability of language resources and tools, we believe that a reliable automatic spatiotemporal zoning system can be achieved. In the next stage of this work, we plan to develop an automatic zoning system and evaluate its usability within an operational health surveillance system.
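The abstract reports agreement as a kappa value and a percentage agreement. As a rough illustration of how those two measures relate, here is a minimal Python sketch. It assumes two annotators and Cohen's kappa; the paper may have used a different kappa variant (for example one suited to more than two annotators). The label names and the toy annotations are hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter


def percentage_agreement(labels_a, labels_b):
    """Fraction of items on which the two annotators chose the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(labels_a)
    p_observed = percentage_agreement(labels_a, labels_b)
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: probability that both annotators pick the same label
    # if each labels independently according to their own label frequencies.
    p_expected = sum(
        counts_a[label] * counts_b[label] for label in set(counts_a) | set(counts_b)
    ) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical event-type annotations from two annotators.
annotator_1 = ["outbreak", "outbreak", "other", "other"]
annotator_2 = ["outbreak", "other", "other", "other"]

print(percentage_agreement(annotator_1, annotator_2))
print(cohen_kappa(annotator_1, annotator_2))
```

In this toy example the annotators agree on three of four items, yet kappa is noticeably lower than the raw agreement because half of that agreement would be expected by chance. This is why both figures are usually reported together.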

Classifying disease outbreak reports using n-grams and semantic features

Conway

Doan

Kawazoe

et al. 2009

International Journal of Medical Informatics

This paper explores the benefits of using n-grams and semantic features for the classification of disease outbreak reports, in the context of a text mining system, BioCaster, that identifies and tracks emerging infectious disease outbreaks from online news. We show that a combination of bag-of-words features, n-grams and semantic features, in conjunction with feature selection, improves classification accuracy at a statistically significant level when compared to previous work. A novel feature of the work reported in this paper is the use of a semantic tagger, the USAS tagger, to generate features.
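To make the feature types in this abstract concrete, the following is a minimal Python sketch of combining bag-of-words (unigram) and n-gram counts into one feature set. It is an illustration only: the whitespace tokenizer, the function names and the sample sentence are assumptions, and the semantic features from the USAS tagger and the feature selection step described in the paper are not reproduced here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, max_n=2):
    """Count unigrams (bag of words) and higher-order n-grams up to max_n."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    features = Counter()
    for n in range(1, max_n + 1):
        features.update(ngrams(tokens, n))
    return features


# Hypothetical news headline.
features = extract_features("Avian flu outbreak reported in the province")
print(features["outbreak"])   # unigram feature count
print(features["avian flu"])  # bigram feature count
```

The resulting counts could be passed to any standard text classifier; the bigram "avian flu" carries information that the separate unigrams "avian" and "flu" do not.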

A multilingual ontology for infectious disease surveillance: rationale, design and challenges

Collier

Kawazoe

Jin

et al. 2007

Lang Resources & Evaluation

A lack of surveillance system infrastructure in the Asia-Pacific region is seen as hindering the global control of rapidly spreading infectious diseases such as the recent avian H5N1 epidemic. As part of improving surveillance in the region, the BioCaster project aims to develop a system based on text mining for automatically monitoring Internet news and other online sources in several regional languages. At the heart of the system is an application ontology which serves the dual purpose of enabling advanced searches on the mined facts and of allowing the system to make intelligent inferences for assessing the priority of events. However, it became clear early on in the project that existing classification schemes did not have the necessary language coverage or semantic specificity for our needs. In this article we present an overview of our needs and explore in detail the rationale and methods for developing a new conceptual structure and multilingual terminological resource that focusses on priority pathogens and the diseases they cause. The ontology is made freely available as an online database and downloadable OWL file.

The role of roles in classifying annotated biomedical text

Doan

Kawazoe

Collier

2007

This paper investigates the roles of named entities (NEs) in annotated biomedical text classification. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology. Some concepts were classified as Types, while others were identified as Roles. Types are specified as NE classes and Roles are integrated into NEs as attributes. We focus on the Roles of NEs by extracting and using them in different ways as features in the classifier. Experimental results show that: 1) Roles for each NE greatly helped improve performance of the system, 2) combining information about NE classes with their Roles contributes significantly to the improvement of performance. We discuss in detail the effect of each Role on the accuracy of text classification.

Towards role-based filtering of disease outbreak reports

Doan

Kawazoe

Conway

et al. 2009

Journal of Biomedical Informatics

This paper explores the role of named entities (NEs) in the classification of disease outbreak reports. In the annotation schema of BioCaster, a text mining system for public health protection, important concepts that reflect information about infectious diseases were conceptually analyzed with a formal ontological methodology and classified into types and roles. Types are specified as NE classes and roles are integrated into NEs as attributes, for example whether a chemical is being used as a therapy for some infectious disease. We focus on the roles of NEs and explore different ways to extract, combine and use them as features in a text classifier. In addition, we investigate the combination of roles with semantic categories of disease-related nouns and verbs. Experimental results using naïve Bayes and Support Vector Machine (SVM) algorithms show that: (1) roles in combination with NEs improve performance in text classification, (2) roles in combination with semantic categories of noun and verb features contribute substantially to the improvement of text classification. Both these results were statistically significant compared to the baseline "raw text" representation. We discuss in detail the effects of roles on each NE and on semantic categories of noun and verb features in terms of accuracy, precision/recall and F-score measures for the text classification task.

Structuring an event ontology for disease outbreak detection

et al. 2008

Background: This paper describes the design of an event ontology being developed for application in the machine understanding of infectious disease-related events reported in natural language text. This event ontology is designed to support timely detection of disease outbreaks and rapid judgment of their alerting status by 1) bridging a gap between layman's language used in disease outbreak reports and public health experts' deep knowledge, and 2) making multi-lingual information available.

An Inference Problem Set for Evaluating Semantic Theories and Semantic Processing Systems for Japanese

Kawazoe

Tanaka

Mineshima

et al. 2017
