Ayush Singhal scite author profile

The practice of precision medicine will ultimately require databases of genes and mutations for healthcare providers to reference in order to understand the clinical implications of each patient’s genetic makeup. Although the highest quality databases require manual curation, text mining tools can facilitate the curation process, increasing accuracy, coverage, and productivity. However, to date there are no available text mining tools that offer high-accuracy performance for extracting disease-gene-variant triplets from biomedical literature. In this paper we propose a high-performance machine learning approach to automate the extraction of disease-gene-variant triplets from biomedical literature. Our approach is unique because we identify the genes and protein products associated with each mutation from not just the local text content, but from a global context as well (from the Internet and from all literature in PubMed). Our approach also incorporates protein sequence validation and disease association using a novel text-mining-based machine learning approach. We extract disease-gene-variant triplets from all abstracts in PubMed related to a set of ten important diseases (breast cancer, prostate cancer, pancreatic cancer, lung cancer, acute myeloid leukemia, Alzheimer’s disease, hemochromatosis, age-related macular degeneration (AMD), diabetes mellitus, and cystic fibrosis). We then evaluate our approach in two ways: (1) a direct comparison with the state of the art using benchmark datasets; (2) a validation study comparing the results of our approach with entries in a popular human-curated database (UniProt) for each of the previously mentioned diseases. In the benchmark comparison, our full approach achieves a 28% improvement in F1-measure (from 0.62 to 0.79) over the state-of-the-art results. For the validation study with UniProt Knowledgebase (KB), we present a thorough analysis of the results and errors. Across all diseases, our approach returned 272 triplets (disease-gene-variant) that overlapped with entries in UniProt and 5,384 triplets without overlap in UniProt. Analysis of the overlapping triplets and of a stratified sample of the non-overlapping triplets revealed accuracies of 93% and 80% for the respective categories (cumulative accuracy, 77%). We conclude that our process represents an important and broadly applicable improvement to the state of the art for curation of disease-gene-variant relationships.
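The validation step described in the abstract, splitting extracted triplets into those that overlap a curated database and those that do not, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the `Triplet` type, the `normalize` rule, and the toy entries are hypothetical and are not taken from the paper or from UniProt, and the authors' actual matching procedure is not described in the abstract.

```python
from typing import Iterable, NamedTuple, Set, Tuple


class Triplet(NamedTuple):
    """A disease-gene-variant triplet (hypothetical representation)."""
    disease: str
    gene: str
    variant: str


def normalize(t: Triplet) -> Triplet:
    # Assumed normalization: case-insensitive disease names, upper-case
    # gene symbols and variant strings. Real matching would likely need more.
    return Triplet(t.disease.strip().lower(),
                   t.gene.strip().upper(),
                   t.variant.strip().upper())


def split_by_overlap(extracted: Iterable[Triplet],
                     curated: Iterable[Triplet]) -> Tuple[Set[Triplet], Set[Triplet]]:
    """Return (overlapping, non_overlapping) extracted triplets."""
    curated_set = {normalize(t) for t in curated}
    extracted_set = {normalize(t) for t in extracted}
    return extracted_set & curated_set, extracted_set - curated_set


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data for illustration only.
extracted = [
    Triplet("Cystic Fibrosis", "cftr", "G551D"),
    Triplet("hemochromatosis", "HFE", "C282Y"),
    Triplet("hemochromatosis", "HFE", "H63D"),
]
curated = [
    Triplet("cystic fibrosis", "CFTR", "G551D"),
    Triplet("hemochromatosis", "HFE", "C282Y"),
]

overlapping, non_overlapping = split_by_overlap(extracted, curated)
print(len(overlapping), len(non_overlapping))
```

In a setup like this, the overlapping set can be checked against the curated entries directly, while the non-overlapping set would need sampling and manual review, as the abstract describes.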


Text mining for precision medicine: automating disease-mutation relationship extraction from biomedical literature

Singhal, Simmons. 2016


Ensemble of Convolutional Neural Networks Improves Automated Segmentation of Acute Ischemic Lesions Using Multiparametric Diffusion-Weighted MRI

Winzeck, Mocking, Bezerra, et al. 2019. AJNR Am J Neuroradiol


Background and Purpose: Accurate automated infarct segmentation is needed for acute ischemic stroke studies relying on infarct volumes as an imaging phenotype or biomarker that require large numbers of subjects. This study investigates whether an ensemble of convolutional neural networks (CNN) trained on multiparametric DWI maps outperforms single networks trained on solo DWI parametric maps. Materials and Methods: CNNs were trained on combinations of DWI, ADC, and low b-value-weighted images from 116 subjects. The performances of the networks (measured by Dice score, sensitivity, and precision) were compared to one another and to ensembles of 5 networks. To assess the generalizability of the approach, the best-performing model was applied to an independent evaluation cohort of 151 subjects. Agreement between manual and automated segmentations for identifying patients with large lesion volumes was calculated across multiple thresholds (21 cm³, 31 cm³, 51 cm³, and 70 cm³). Results: An ensemble of CNNs trained on DWI, ADC, and low b-value-weighted images produced the most accurate acute infarct segmentation over individual networks (p<0.0001). Automated volumes correlated with manually measured volumes (Spearman’s ρ=0.91, p<0.0001) for the independent cohort. For the task of identifying patients with large lesion volumes, agreement between manual outlines and automated outlines was high (Cohen’s κ 0.86 to 0.90, p<0.0001). Conclusion: Acute infarcts are more accurately segmented using ensembles of CNNs trained with multiparametric maps than using a single model trained with a solo map. Automated lesion segmentation can perform with high agreement with manual techniques for identifying patients with large lesion volumes.
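The evaluation metrics named in this abstract (Dice score, sensitivity, precision, and Cohen’s κ for agreement on "large lesion" classification at a volume threshold) can be sketched as follows. This is a minimal illustration under stated assumptions: masks are flattened to lists of 0/1 voxels, the volumes are invented toy numbers, and the helper names are hypothetical. It is not the authors' evaluation code.

```python
from typing import Sequence, Tuple


def confusion_counts(manual: Sequence[int], automated: Sequence[int]) -> Tuple[int, int, int]:
    """Count true positives, false positives, false negatives over voxels."""
    tp = sum(1 for m, a in zip(manual, automated) if m and a)
    fp = sum(1 for m, a in zip(manual, automated) if not m and a)
    fn = sum(1 for m, a in zip(manual, automated) if m and not a)
    return tp, fp, fn


def dice(tp: int, fp: int, fn: int) -> float:
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # both masks empty: perfect overlap


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp) if (tp + fp) else 0.0


def cohen_kappa(a: Sequence[bool], b: Sequence[bool]) -> float:
    """Cohen's kappa for two binary raters over the same subjects."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # chance agreement
    return (observed - expected) / (1 - expected) if expected != 1 else 1.0


# Toy voxel masks (1 = lesion), illustrative only.
manual_mask = [0, 1, 1, 1, 0, 0]
auto_mask = [0, 1, 1, 0, 1, 0]
tp, fp, fn = confusion_counts(manual_mask, auto_mask)
print(dice(tp, fp, fn), sensitivity(tp, fn), precision(tp, fp))

# Toy per-subject lesion volumes in cm^3, illustrative only.
manual_vol = [10, 25, 40, 60, 80, 15]
auto_vol = [12, 19, 45, 55, 90, 14]
for threshold in (21, 31, 51, 70):  # thresholds listed in the abstract
    kappa = cohen_kappa([v > threshold for v in manual_vol],
                        [v > threshold for v in auto_vol])
    print(threshold, round(kappa, 3))
```

Whether "large" means strictly greater than the threshold or greater than or equal to it is an assumption here; the abstract does not say.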



Text Mining Genotype-Phenotype Relationships from Biomedical Literature for Database Curation and Precision Medicine
