We participated (as Team 9) in the Article Classification Task of the BioCreative II.5 Challenge: binary classification of full-text documents relevant for protein-protein interaction. We used two distinct classifiers for the online and offline challenges: 1) the lightweight Variable Trigonometric Threshold (VTT) linear classifier we successfully introduced in BioCreative 2 for binary classification of abstracts and 2) a novel Naive Bayes classifier using features from the citation network of the relevant literature. We supplemented the supplied training data with full-text documents from the MIPS database. The lightweight VTT classifier was very competitive in this new full-text scenario: it was a top-performing submission in this task, taking into account the rank product of the Area Under the interpolated precision and recall Curve, Accuracy, Balanced F-Score, and Matthews Correlation Coefficient performance measures. The novel citation network classifier for the biomedical text mining domain, while not a top-performing classifier in the challenge, performed above the central tendency of all submissions, and therefore indicates a promising new avenue to investigate further in bibliome informatics.
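The rank product used above to compare submissions can be sketched as follows. This is a minimal illustration, not the challenge's official scoring code: the function names, the tie-handling rule (tied scores share the best rank), and the scores in the usage example are all assumptions, and the numbers are made up rather than taken from the challenge results.

```python
from math import prod, sqrt


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from a binary confusion matrix."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: return 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def rank_by_measure(scores):
    """Map each team to its rank on one measure (1 = highest score).

    Assumption: tied teams share the best rank among them.
    """
    return {
        team: 1 + sum(other > score for other in scores.values())
        for team, score in scores.items()
    }


def rank_product(table):
    """table: {measure: {team: score}} -> {team: product of per-measure ranks}.

    A lower rank product means a better overall standing.
    """
    per_measure = [rank_by_measure(scores) for scores in table.values()]
    teams = per_measure[0].keys()
    return {team: prod(r[team] for r in per_measure) for team in teams}


# Hypothetical scores for three hypothetical teams.
example = {
    "auc": {"A": 0.9, "B": 0.8, "C": 0.7},
    "mcc": {"A": 0.5, "B": 0.6, "C": 0.1},
}
print(rank_product(example))  # {'A': 2, 'B': 2, 'C': 9}
```

Multiplying ranks rather than averaging raw scores keeps measures on different scales from dominating one another, which is presumably why it suits a comparison across four heterogeneous measures.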
Global climate change (GCC) is projected to bring higher-intensity precipitation and higher-variability temperature regimes to the Northeastern United States. The interactive effects of GCC with anthropogenic land use and land cover changes (LULCCs) are unknown for watershed-level hydrological dynamics and nutrient fluxes to freshwater lakes. Increased nutrient fluxes can promote harmful algal blooms, which are also exacerbated by warmer water temperatures due to GCC. To address the complex interactions of climate, land and humans, we developed a cascading integrated assessment model to test the impacts of GCC and LULCC on the hydrological regime, water temperature, water quality, bloom duration and severity through 2040 in transnational Lake Champlain's Missisquoi Bay. Temperature and precipitation inputs were statistically downscaled from four global circulation models (GCMs) for three Representative Concentration Pathways. An agent-based model was used to generate four LULCC scenarios. Combined climate and LULCC scenarios drove a distributed hydrological model to estimate river discharge and nutrient input to the lake. Lake nutrient dynamics were simulated with a 3D hydrodynamic-biogeochemical model. We find accelerated GCC could drastically limit land management options to maintain water quality, but the nature and severity of this impact vary dramatically by GCM and GCC scenario.
Background: Driven by the COVID-19 pandemic and the dire need to discover an antiviral drug, we explored the landscape of the SARS-CoV-2 biomedical publications to identify potential treatments. Objective: The aims of this study are to identify off-label drugs that may have benefits for the coronavirus disease pandemic, present a novel ranking algorithm called CovidX to recommend existing drugs for potential repurposing, and validate the literature-based outcome with drug knowledge available in clinical trials. Methods: To achieve these objectives, we applied natural language processing techniques to identify drugs and linked entities (eg, disease, gene, protein, chemical compounds). When such entities are linked, they form a map that can be further explored using network science tools. The CovidX algorithm was based upon a notion that we called “diversity.” A diversity score for a given drug was calculated by measuring how “diverse” the drug is across the categories of biological entities linked to it (regardless of the cardinality of actual instances in each category). The algorithm validates the ranking and rewards those drugs that are currently being investigated in open clinical trials. The open clinical trials serve as a validating mechanism for the PubMed results and provide up-to-date evidence on the fast development of this disease. Results: From the analyzed biomedical literature, the algorithm identified 30 possible drug candidates for repurposing, ranked them accordingly, and validated the ranking outcomes against evidence from clinical trials. The top 10 candidates according to our algorithm are hydroxychloroquine, azithromycin, chloroquine, ritonavir, losartan, remdesivir, favipiravir, methylprednisolone, rapamycin, and tilorone dihydrochloride. Conclusions: The ranking shows both consistency and promise in identifying drugs that can be repurposed. We believe, however, the full treatment to be a multifaceted, adjuvant approach where multiple drugs may need to be taken at the same time.
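One way to read the "diversity" notion in the abstract above is as a count of distinct entity categories linked to a drug, ignoring how many instances fall in each category. The sketch below follows that reading only; it is not the published CovidX implementation. The input layout, the function names, the additive `trial_bonus`, and the placeholder drug and entity names are all assumptions made for illustration.

```python
from collections import defaultdict


def diversity_scores(links):
    """Count distinct entity categories per drug.

    links: iterable of (drug, entity_category, entity_name) tuples.
    The number of entities inside a category is deliberately ignored.
    """
    categories = defaultdict(set)
    for drug, category, _entity in links:
        categories[drug].add(category)
    return {drug: len(cats) for drug, cats in categories.items()}


def rank_drugs(links, in_open_trial, trial_bonus=1):
    """Rank drugs by diversity, rewarding those in open clinical trials.

    Assumption: the reward is a fixed additive bonus; the abstract does not
    say how the clinical-trial evidence is weighted.
    """
    scores = diversity_scores(links)
    for drug in scores:
        if drug in in_open_trial:
            scores[drug] += trial_bonus
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Placeholder data, not extracted from any real literature.
links = [
    ("drug_a", "gene", "G1"),
    ("drug_a", "gene", "G2"),
    ("drug_a", "protein", "P1"),
    ("drug_b", "gene", "G1"),
    ("drug_b", "disease", "D1"),
    ("drug_b", "chemical", "C1"),
]
print(diversity_scores(links))           # {'drug_a': 2, 'drug_b': 3}
print(rank_drugs(links, in_open_trial=set()))  # ['drug_b', 'drug_a']
```

Here `drug_a` has three links but only two categories, so it scores lower than `drug_b`, whose three links span three categories.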
The importance of searching biomedical literature for drug interactions and side effects is apparent. Current digital libraries (e.g., PubMed) suffer from infrequent tagging and metadata annotation updates. Such limitations leave literature unlinked to new scientific evidence, which poses considerable challenges for scientists searching biomedical repositories. In this paper, we present a network mining approach that provides a bridge for linking and searching drug-related literature. Our contributions here are twofold: (1) an efficient algorithm called HashPairMiner to address the run-time complexity issues demonstrated in its predecessor algorithm, HashnetMiner, and (2) a database of discoveries hosted on the web to facilitate literature search using the results produced by HashPairMiner. Though the K-H network model and the HashPairMiner algorithm are fairly young, their outcome is evidence of the considerable promise they offer to the biomedical science community in general and the drug research community in particular.
Great efforts are now underway to control the coronavirus disease 2019 (COVID-19). Millions of people are medically examined, and their data keep piling up awaiting classification. The data are typically both incomplete and heterogeneous, which hampers classical classification algorithms. Some researchers have recently modified the popular KNN algorithm as a solution, where they handle incompleteness by imputation and heterogeneity by converting categorical data into numbers. In this article, we introduce a novel KNN variant (KNNV) algorithm that provides better results as demonstrated by thorough experimental work. We employ rough set theoretic techniques to handle both incompleteness and heterogeneity, as well as to find an ideal value for K. The KNNV algorithm takes an incomplete, heterogeneous dataset, containing medical records of people, and identifies those cases with COVID-19. In the process, we use two popular distance metrics, Euclidean and Mahalanobis, in an effort to widen the operational scope. The KNNV algorithm is implemented and tested on a real dataset from the Italian Society of Medical and Interventional Radiology. The experimental results show that it can efficiently and accurately classify COVID-19 cases. It is also compared to three KNN derivatives. The comparison results show that it greatly outperforms all its competitors in terms of four metrics: precision, recall, accuracy, and F-Score. The algorithm given in this article can be easily applied to classify other diseases. Moreover, its methodology can be further extended to do general classification tasks outside the medical field.
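For context, the baseline strategy that the abstract above contrasts with KNNV (imputation for missing values, numeric codes for categorical values, then plain KNN) might look like the sketch below. It does not implement KNNV or its rough set techniques, and it uses Euclidean distance only. The mean/mode imputation rule, the integer coding of categories, the function names, and the toy records are all assumptions; each column is assumed to have at least one observed value, and a query category unseen in training would raise a `KeyError`.

```python
from collections import Counter
from math import sqrt


def fit_preprocessor(rows):
    """Learn per-column fill values and category codes (None = missing)."""
    fills, codes = [], []
    for col in zip(*rows):
        present = [v for v in col if v is not None]
        if all(isinstance(v, (int, float)) for v in present):
            fills.append(sum(present) / len(present))  # mean imputation
            codes.append(None)
        else:
            fills.append(Counter(present).most_common(1)[0][0])  # mode imputation
            codes.append({c: i for i, c in enumerate(sorted(set(present)))})
    return fills, codes


def transform(row, fills, codes):
    """Impute missing values and map categories to numbers."""
    out = []
    for value, fill, mapping in zip(row, fills, codes):
        value = fill if value is None else value
        out.append(float(mapping[value]) if mapping else float(value))
    return out


def knn_predict(train_rows, labels, query, k=3):
    """Majority vote among the k nearest training rows (Euclidean distance)."""
    fills, codes = fit_preprocessor(train_rows)
    encoded = [transform(r, fills, codes) for r in train_rows]
    q = transform(query, fills, codes)
    scored = sorted(
        ((sqrt(sum((a - b) ** 2 for a, b in zip(x, q))), y)
         for x, y in zip(encoded, labels)),
        key=lambda pair: pair[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy records (temperature, symptom flag, lab value); purely illustrative.
train_rows = [
    (36.5, "no", 0.1),
    (36.7, None, 0.2),
    (39.0, "yes", 0.9),
    (38.8, "yes", None),
    (None, "yes", 0.8),
]
labels = ["neg", "neg", "pos", "pos", "pos"]
print(knn_predict(train_rows, labels, (38.9, "yes", None)))  # pos
```

Integer codes impose an artificial ordering and distance on categories, and imputed values hide the fact that data were missing; these are the kinds of weaknesses the abstract says KNNV is meant to address.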