This paper discusses experiments on the automatic extraction of keywords from abstracts using a supervised machine learning algorithm. The main point is that adding linguistic knowledge to the representation (such as syntactic features), rather than relying only on statistics (such as term frequency and n-grams), gives a better result as measured against keywords previously assigned by professional indexers. In more detail, extracting NP-chunks gives better precision than n-grams, and adding the POS tag(s) assigned to the term as a feature yields a dramatic improvement in the results, independent of the term selection approach applied.
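To make the statistics-only baseline mentioned in this abstract concrete, the following is a minimal sketch of n-gram candidate extraction with term-frequency counts. The tokenizer, the small stopword list, and the function names are assumptions made for illustration and are not taken from the paper; the NP-chunk and POS-tag features would need a tagger or chunker and are not shown.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset(
    {"a", "an", "the", "of", "in", "on", "and", "for", "to", "is", "are", "by", "from", "using"}
)


def tokenize(text):
    """Lowercase the text and return alphabetic tokens (internal hyphens kept)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def ngram_candidates(text, max_n=3):
    """Count every n-gram (n = 1..max_n) that neither starts nor ends with a stopword.

    Sentence boundaries are ignored, so some n-grams span two sentences.
    The returned Counter maps each candidate term to its term frequency.
    """
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts
```

In a supervised setup, each candidate and its frequency would become one training instance, labelled by whether a professional indexer assigned that term as a keyword.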
In the field of syndromic surveillance, various sources are exploited for outbreak detection, monitoring and prediction. This paper describes a study on queries submitted to a medical web site, with influenza as a case study. The hypothesis of the work was that queries on influenza and influenza-like illness would provide a basis for estimating the timing of the peak and the intensity of the yearly influenza outbreaks that would be as good as the existing laboratory and sentinel surveillance. We calculated the occurrence of various queries related to influenza from search logs submitted to a Swedish medical web site for two influenza seasons. These figures were subsequently used to generate two models, one to estimate the number of laboratory-verified influenza cases and one to estimate the proportion of patients with influenza-like illness reported by selected General Practitioners in Sweden. We applied partial least squares regression, an approach designed for highly correlated data. In our work, we found that certain web queries on influenza follow the same pattern as that obtained by the two other surveillance systems for influenza epidemics, and that they have equal power for the estimation of the influenza burden in society. Web queries give unique access to ill individuals who are not (yet) seeking care. This paper shows the potential of web queries as an accurate, cheap and labour extensive source for syndromic surveillance.
This paper presents a study on if and how automatically extracted keywords can be used to improve text categorization. In summary, we show that a higher performance (as measured by micro-averaged F-measure on a standard text categorization collection) is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full-texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, either represented as unigrams or intact. Of these two experiments, the unigrams have the best performance, although neither performs as well as headlines only.
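The combination described in this abstract, giving higher weights to full-text words that are also extracted as keywords, could be sketched as below. The multiplicative `boost` factor, the use of raw term frequency as the base weight, and the function name are assumptions for illustration; the abstract does not state the actual weighting scheme.

```python
from collections import Counter


def boosted_term_weights(tokens, keywords, boost=2.0):
    """Return term-frequency weights, multiplied by `boost` for keyword words.

    tokens   -- the tokenized full text of one document
    keywords -- automatically extracted keywords (possibly multi-word)
    boost    -- hypothetical multiplier for words that occur in a keyword
    """
    # Split multi-word keywords into their unigrams.
    keyword_terms = {word for kw in keywords for word in kw.lower().split()}
    tf = Counter(token.lower() for token in tokens)
    return {
        term: count * (boost if term in keyword_terms else 1.0)
        for term, count in tf.items()
    }


weights = boosted_term_weights(["oil", "prices", "rose", "oil"], ["oil prices"])
```

The resulting dictionary can be fed to any bag-of-words categorizer in place of plain term frequencies.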
Background: The rapidly increasing dissemination of carbapenem-resistant Enterobacteriaceae (CRE) in both humans and animals poses a global threat to public health. However, the transmission of CRE between humans and animals has not yet been well studied. Objectives: We investigated the prevalence, risk factors, and drivers of CRE transmission between humans and their backyard animals in rural China. Methods: We conducted a comprehensive sampling strategy in 12 villages in Shandong, China. Using the household [residents and their backyard animals (farm and companion animals)] as a single surveillance unit, we assessed the prevalence of CRE at the household level and examined the factors associated with CRE carriage through a detailed questionnaire. Genetic relationships among human- and animal-derived CRE were assessed using whole-genome sequencing–based molecular methods. Results: A total of 88 New Delhi metallo-β-lactamases–type carbapenem-resistant Escherichia coli (NDM-EC), including 17 from humans, 44 from pigs, 12 from chickens, 1 from cattle, and 2 from dogs, were isolated from 65 of the 746 households examined. The remaining 12 NDM-EC were from flies in the immediate backyard environment. The NDM-EC colonization in households was significantly associated with a) the number of species of backyard animals raised/kept in the same household, and b) the use of human and/or animal feces as fertilizer. Discriminant analysis of principal components (DAPC) revealed that a large proportion of the core genomes of the NDM-EC belonged to strains from hosts other than their own, and several human isolates shared closely related core single-nucleotide polymorphisms and blaNDM genetic contexts with isolates from backyard animals. Conclusions: To our knowledge, we are the first to report evidence of direct transmission of NDM-EC between humans and animals. Given the rise of NDM-EC in community and hospital infections, combating NDM-EC transmission in backyard farm systems is needed.
https://doi.org/10.1289/EHP5251
Summary: For the purpose of developing a national system for outbreak surveillance, local outbreak signals were compared in three sources of syndromic data: telephone triage of acute gastroenteritis, web queries about symptoms of gastrointestinal illness, and over-the-counter (OTC) pharmacy sales of antidiarrhoeal medication. The data sources were compared against nine known waterborne and foodborne outbreaks in Sweden in 2007–2011. Outbreak signals were identified for the four largest outbreaks in the telephone triage data and the two largest outbreaks in the data on OTC sales of antidiarrhoeal medication. No signals could be identified in the data on web queries. The signal magnitude for the fourth largest outbreak indicated a tenfold larger outbreak than officially reported, supporting the use of telephone triage data for situational awareness. For the two largest outbreaks, telephone triage data on adult diarrhoea provided outbreak signals at an early stage, weeks and months in advance, respectively, potentially serving the purpose of early event detection. In conclusion, telephone triage data provided the most promising source for surveillance of point-source outbreaks.
Summary: An evaluation was conducted to determine which syndromic surveillance tools complement traditional surveillance by serving as earlier indicators of influenza activity in Sweden. Web queries, medical hotline statistics, and school absenteeism data were evaluated against two traditional surveillance tools. Cross-correlation calculations used aggregated weekly data for all-age, nationwide activity for four influenza seasons, from 2009/2010 to 2012/2013. The surveillance tool indicative of earlier influenza activity, by way of statistical and visual evidence, was identified. The web query algorithm and medical hotline statistics performed as well as each other and as the traditional surveillance tools. School absenteeism data were not a reliable resource for influenza surveillance. Overall, the syndromic surveillance tools did not perform with enough consistency, either in season lead or in earlier timing of the peak week, to be considered early indicators. They do, however, capture incident cases before they have formally entered the primary healthcare system.
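The cross-correlation on weekly data mentioned in this abstract can be illustrated with a minimal sketch that computes the Pearson correlation between two weekly series at a range of lags. The choice of Pearson's r, the lag convention (positive lag means the candidate series leads the reference), and all function names are assumptions for illustration, not the evaluation's actual procedure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def cross_correlation(candidate, reference, max_lag):
    """Return {lag: r} for lags in -max_lag..max_lag (in weeks).

    At a positive lag, candidate[t] is paired with reference[t + lag],
    so a peak at lag > 0 suggests the candidate leads the reference.
    """
    if len(candidate) != len(reference):
        raise ValueError("series must have the same length")
    n = len(candidate)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = candidate[:n - lag], reference[lag:]
        else:
            x, y = candidate[-lag:], reference[:n + lag]
        result[lag] = pearson(x, y)
    return result


def best_lag(candidate, reference, max_lag):
    """Return the lag with the highest correlation."""
    correlations = cross_correlation(candidate, reference, max_lag)
    return max(correlations, key=correlations.get)
```

With real data, `candidate` would hold weekly counts from a syndromic source and `reference` the weekly counts from a traditional surveillance system; `max_lag` should stay small relative to the series length so each lag still has enough overlapping weeks.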
Objectives: To characterize the mobile colistin resistance gene mcr-5 in Aeromonas hydrophila from backyard pigs in rural areas of China. Methods: Pig faecal samples from 194 households were directly tested for the presence of mcr-5 by PCR assay and the phenotypic antimicrobial susceptibility profiles of the mcr-5-positive isolates were determined using the broth dilution method. The genomic location and transferability of mcr-5 were analysed by S1-PFGE with Southern blotting and DNA hybridization, and natural transformation, respectively. One strain isolated from an mcr-5-positive sample was subjected to WGS and the stability of the mcr-5-harbouring plasmid over successive generations was examined by subculturing. Results: One mcr-5-positive A. hydrophila isolate showing resistance, with a colistin MIC of 4 mg/L, was isolated from a backyard pig faecal sample. mcr-5 was located on a 7915 bp plasmid designated pI064-2, which could naturally transform into a colistin-susceptible A. hydrophila strain of porcine origin and mediated colistin resistance in both the original isolate and its transformants. The plasmid backbone (3790 bp) of pI064-2 showed 81% nucleotide sequence identity to the corresponding region of the ColE2-type plasmid pAsa1 from Aeromonas salmonicida, while similar replication primases are widely distributed among aeromonads, Enterobacteriaceae and Pseudomonas species. Conclusions: To the best of our knowledge, this is the first identification of the novel colistin resistance gene mcr-5 in an A. hydrophila isolate from the faeces of a backyard pig. mcr-5 is expected to be able to disseminate among different bacterial species and genera.
Background: The assumption behind the presented work is that the information people search for on the internet reflects the disease status in society. By having access to this source of information, epidemiologists can get a valuable complement to traditional surveillance and potentially gain new and timely epidemiological insights. For this purpose, the Swedish Institute for Infectious Disease Control collaborates with a medical web site in Sweden. Methods: We built an application consisting of two conceptual parts. One part allows trends, based on user-specified requests, to be extracted from anonymous web query data from a Swedish medical web site. The second conceptual part permits tailored analyses of particular diseases, where more complex statistical methods are applied to the data. To evaluate the epidemiological relevance of the output, we compared Google search data and search data from the medical web site. Results: In the paper, we give concrete examples of the output from the web query-based system. We also present results from the comparison between data from the search engine Google and search data from the national medical web site. Conclusions: The application is in regular use at the Swedish Institute for Infectious Disease Control. A system based on web queries is flexible in that it can be adapted to any disease; we get information on individuals other than those who seek medical care; and the data do not suffer from reporting delays. Although Google data are based on a substantially larger search volume, search patterns obtained from the medical web site may still convey more information from an epidemiological perspective. Furthermore, we can see advantages in having full access to the raw data.