PURPOSE Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier, which classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure. MATERIALS AND METHODS We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining followed by human review of identified studies, with the traditional procedure, which requires human review of all studies. Ten high-quality gene–cancer penetrance meta-analyses spanning 16 gene–cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage). RESULTS Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (132 of 142 studies identified) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies). CONCLUSION We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human effort in the literature review process.
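The workload and coverage figures reported above follow directly from the counts in the abstract. The short sketch below reproduces that arithmetic; the function names are illustrative and do not come from the paper.

```python
def workload_reduction(reviewed_semiauto: int, reviewed_traditional: int) -> float:
    """Fraction of human review avoided relative to the traditional procedure."""
    return 1.0 - reviewed_semiauto / reviewed_traditional


def coverage(identified: int, total: int) -> float:
    """Fraction of gold-standard studies that the procedure identified."""
    return identified / total


# Counts taken from the abstract.
reduction = workload_reduction(2774, 16941)
coverage_before_refs = coverage(132, 142)
coverage_after_refs = coverage(141, 142)

print(f"workload reduction: {reduction:.0%}")
print(f"coverage before reference review: {coverage_before_refs:.0%}")
print(f"coverage after reference review: {coverage_after_refs:.0%}")
```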
Background Several breast cancer risk-assessment models exist. Few studies have evaluated the predictive accuracy of multiple models in large screening populations. Methods We evaluated the performance of the BRCAPRO, Gail, Claus, Breast Cancer Surveillance Consortium (BCSC), and Tyrer-Cuzick models in predicting risk of breast cancer over 6 years among 35 921 women aged 40–84 years who underwent mammography screening at Newton-Wellesley Hospital from 2007 to 2009. We assessed model discrimination using the area under the receiver operating characteristic curve (AUC) and assessed calibration using the ratio of observed to expected (O/E) cases. We calculated the square root of the Brier score and the positive and negative predictive values of each model. Results Our results confirmed the good calibration and comparable moderate discrimination of the BRCAPRO, Gail, Tyrer-Cuzick, and BCSC models. The Gail model had a slightly better O/E ratio and AUC (O/E = 0.98, 95% confidence interval [CI] = 0.91 to 1.06, AUC = 0.64, 95% CI = 0.61 to 0.65) compared with BRCAPRO (O/E = 0.94, 95% CI = 0.88 to 1.02, AUC = 0.61, 95% CI = 0.59 to 0.63) and Tyrer-Cuzick (version 8, O/E = 0.84, 95% CI = 0.79 to 0.91, AUC = 0.62, 95% CI = 0.60 to 0.64) in the full study population, and the BCSC model had the highest AUC among women with available breast density information (O/E = 0.97, 95% CI = 0.89 to 1.05, AUC = 0.64, 95% CI = 0.62 to 0.66). All models had poorer predictive accuracy for human epidermal growth factor receptor 2 positive and triple-negative breast cancers than for hormone receptor positive, human epidermal growth factor receptor 2 negative breast cancers. Conclusions In a large cohort of patients undergoing mammography screening, existing risk prediction models had similar, moderate predictive accuracy and good calibration overall. Models that incorporate additional genetic and nongenetic risk factors and estimate risk of tumor subtypes may further improve breast cancer risk prediction.
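For readers unfamiliar with the three metrics named above, the following is a minimal sketch of how an O/E ratio, an AUC, and the square root of the Brier score can be computed from predicted risks and observed outcomes. The toy data and function names are hypothetical, and the study's own software and confidence-interval methods are not reproduced here.

```python
from math import sqrt


def observed_to_expected(outcomes, risks):
    """Calibration: observed case count divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


def auc(outcomes, risks):
    """Discrimination: probability that a random case has a higher predicted
    risk than a random non-case, counting ties as one half."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                wins += 1.0
            elif case_risk == control_risk:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def root_brier(outcomes, risks):
    """Square root of the mean squared difference between risk and outcome."""
    squared_errors = [(r - y) ** 2 for y, r in zip(outcomes, risks)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy cohort: 1 = developed breast cancer, 0 = did not.
toy_outcomes = [1, 0, 0, 1, 0, 0]
toy_risks = [0.30, 0.10, 0.20, 0.15, 0.05, 0.20]

toy_oe = observed_to_expected(toy_outcomes, toy_risks)
toy_auc = auc(toy_outcomes, toy_risks)
toy_root_brier = root_brier(toy_outcomes, toy_risks)
print(toy_oe, toy_auc, toy_root_brier)
```

An O/E ratio near 1 indicates good calibration, and an AUC of 0.5 corresponds to no discrimination, which is why the reported values around 0.61 to 0.64 are described as moderate.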
PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
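The bag-of-ngrams representation, the 60/20/20 split, and the accuracy metric described above can be sketched as follows. This is an illustration only: the n-gram range, the whitespace tokenizer, the random seed, and all function names are assumptions, and the SVM and CNN themselves are not implemented here.

```python
import random
from collections import Counter


def bag_of_ngrams(text, max_n=2):
    """Count word n-grams up to length max_n (max_n=2 is an assumed setting)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for start in range(len(tokens) - n + 1):
            counts[" ".join(tokens[start:start + n])] += 1
    return counts


def split_60_20_20(items, seed=0):
    """Shuffle, then split into training, tuning, and evaluation subsets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 60 // 100
    n_tune = len(shuffled) * 20 // 100
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    evaluate = shuffled[n_train + n_tune:]
    return train, tune, evaluate


def accuracy(true_labels, predicted_labels):
    """Fraction of papers whose predicted label matches the annotation."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# 3740 annotated papers, as in the penetrance experiment; indices stand in for papers.
train_ids, tune_ids, eval_ids = split_60_20_20(range(3740))
print(len(train_ids), len(tune_ids), len(eval_ids))

features = bag_of_ngrams("BRCA1 mutation carriers")
print(sorted(features))
```

A linear classifier such as an SVM would then be trained on these sparse count vectors, with the tuning subset used to choose hyperparameters before a single evaluation on the held-out 20%.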
Twitter provides a rich database of spatiotemporal information about users who broadcast their real-time opinions, sentiment, and activities. In this paper, we sought to investigate the holistic influence of land use and time period on public sentiment. A total of 880,937 tweets posted by 26,060 active users were collected across Massachusetts (MA), USA, from 31 November 2012 to 3 June 2013. The IBM Watson Alchemy API (application program interface) was employed to quantify the sentiment scores conveyed by tweets on a large scale. We then statistically analyzed the sentiment scores across different locations and times. A multivariate linear mixed-effects model was used to quantify the fixed effects of land use and time period on the variation in sentiment scores, accounting for the clustering of tweets within users. The results revealed clear spatiotemporal patterns in users’ sentiment. Higher sentiment scores were mainly observed in commercial and public areas, around noon and in the evening, and on weekends. Our findings suggest that social media outputs can be used to better understand the spatial and temporal patterns of public happiness and well-being in cities and regions.
Background: Zhejiang Province, located in southeastern China, is frequently hit by tropical cyclones. This study quantified the associations between infectious diarrhea and the seven tropical cyclones that made landfall in Zhejiang from 2005 to 2011 to assess the impacts of the accompanying precipitation on the studied diseases. Method: A unidirectional case-crossover study design was used to evaluate the impacts of tropical storms and typhoons on infectious diarrhea. Principal component analysis (PCA) was applied to eliminate multicollinearity. A multivariate logistic regression model was used to estimate the odds ratios (ORs) and the 95% confidence intervals (CIs). Results: For all typhoons studied, the greatest impacts on bacillary dysentery and other infectious diarrhea were identified at a lag of 6 days (OR = 2.30, 95% CI: 1.81–2.93) and a lag of 5 days (OR = 3.56, 95% CI: 2.98–4.25), respectively. For all tropical storms, impacts on these diseases were highest at a lag of 2 days (OR = 2.47, 95% CI: 1.41–4.33) and a lag of 6 days (OR = 2.46, 95% CI: 1.69–3.56), respectively. Tropical cyclone precipitation was a risk factor for bacillary dysentery and other infectious diarrhea when daily precipitation reached 25 mm and 50 mm, with the largest OR = 3.25 (95% CI: 1.45–7.27) and OR = 3.05 (95% CI: 2.20–4.23), respectively. Conclusions: Both typhoons and tropical storms could contribute to an increase in the risk of bacillary dysentery and other infectious diarrhea in Zhejiang. Tropical cyclone precipitation may also be a risk factor for these diseases when it reaches or exceeds 25 mm and 50 mm, respectively. Public health preventive and intervention measures should consider the adverse health impacts of tropical cyclones.
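To make the reported odds ratios and confidence intervals easier to interpret, the sketch below computes an unadjusted OR with a Wald 95% CI from a 2x2 table. It is a deliberate simplification: the study itself used a case-crossover design with multivariate logistic regression, and the counts and function name here are hypothetical.

```python
from math import exp, log, sqrt


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (z=1.96 for 95%)."""
    odds_ratio = (exposed_cases * unexposed_controls) / (
        unexposed_cases * exposed_controls)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(1 / exposed_cases + 1 / unexposed_cases
                     + 1 / exposed_controls + 1 / unexposed_controls)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
example_or, example_lower, example_upper = odds_ratio_wald_ci(20, 10, 10, 20)
print(f"OR = {example_or:.2f}, 95% CI: {example_lower:.2f}-{example_upper:.2f}")
```

An interval that excludes 1, as in each result reported above, indicates an association that is statistically significant at the 5% level.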
The surveillance of multidrug resistance among patients with previously treated TB who also possess these risk factors and the management of patients with MDR-TB should be reinforced.