Zhengyi Deng scite author profile

PURPOSE Quantifying the risk of cancer associated with pathogenic mutations in germline cancer susceptibility genes (that is, penetrance) enables the personalization of preventive management strategies. Conducting a meta-analysis is the best way to obtain robust risk estimates. We have previously developed a natural language processing (NLP)-based abstract classifier that classifies abstracts as relevant to penetrance, prevalence of mutations, both, or neither. In this work, we evaluate the performance of this NLP-based procedure. MATERIALS AND METHODS We compared the semiautomated NLP-based procedure, which involves automated abstract classification and text mining, followed by human review of identified studies, with the traditional procedure that requires human review of all studies. Ten high-quality gene–cancer penetrance meta-analyses spanning 16 gene–cancer associations were used as the gold standard by which to evaluate the performance of our procedure. For each meta-analysis, we evaluated the number of abstracts that required human review (workload) and the ability to identify the studies that were included by the authors in their quantitative analysis (coverage). RESULTS Compared with the traditional procedure, the semiautomated NLP-based procedure led to a lower workload across all 10 meta-analyses, with an overall 84% reduction (2,774 abstracts v 16,941 abstracts) in the amount of human review required. Overall coverage was 93% (we were able to identify 132 of 142 studies) before reviewing references of identified studies. Reasons for the 10 missed studies included blank and poorly written abstracts. After reviewing references, nine of the previously missed studies were identified and coverage improved to 99% (141 of 142 studies). CONCLUSION We demonstrated that an NLP-based procedure can significantly reduce the review workload without compromising the ability to identify relevant studies. NLP algorithms have promising potential for reducing human efforts in the literature review process.
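The workload and coverage figures reported in this abstract can be checked with simple arithmetic. The sketch below is only an illustration: the function names `workload_reduction` and `coverage` are hypothetical and do not come from the paper, and the inputs are the counts quoted in the abstract.

```python
def workload_reduction(reviewed: int, total: int) -> float:
    """Fraction of abstracts that no longer need human review."""
    return 1 - reviewed / total


def coverage(found: int, included: int) -> float:
    """Fraction of gold-standard studies that the procedure identified."""
    return found / included


# Counts quoted in the abstract above.
reduction = workload_reduction(2774, 16941)
coverage_before = coverage(132, 142)   # before reviewing references
coverage_after = coverage(141, 142)    # after reviewing references

print(f"workload reduction: {reduction:.0%}")
print(f"coverage before reference review: {coverage_before:.0%}")
print(f"coverage after reference review: {coverage_after:.0%}")
```

Rounded to whole percentages, these values agree with the 84%, 93%, and 99% stated in the abstract.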

Performance of Breast Cancer Risk-Assessment Models in a Large Mammography Cohort

McCarthy, Guan, Welch, et al. 2019

Background Several breast cancer risk-assessment models exist. Few studies have evaluated predictive accuracy of multiple models in large screening populations. Methods We evaluated the performance of the BRCAPRO, Gail, Claus, Breast Cancer Surveillance Consortium (BCSC), and Tyrer-Cuzick models in predicting risk of breast cancer over 6 years among 35 921 women aged 40–84 years who underwent mammography screening at Newton-Wellesley Hospital from 2007 to 2009. We assessed model discrimination using the area under the receiver operating characteristic curve (AUC) and assessed calibration by comparing the ratio of observed-to-expected (O/E) cases. We calculated the square root of the Brier score and positive and negative predictive values of each model. Results Our results confirmed the good calibration and comparable moderate discrimination of the BRCAPRO, Gail, Tyrer-Cuzick, and BCSC models. The Gail model had slightly better O/E ratio and AUC (O/E = 0.98, 95% confidence interval [CI] = 0.91 to 1.06, AUC = 0.64, 95% CI = 0.61 to 0.65) compared with BRCAPRO (O/E = 0.94, 95% CI = 0.88 to 1.02, AUC = 0.61, 95% CI = 0.59 to 0.63) and Tyrer-Cuzick (version 8, O/E = 0.84, 95% CI = 0.79 to 0.91, AUC = 0.62, 95% CI = 0.60 to 0.64) in the full study population, and the BCSC model had the highest AUC among women with available breast density information (O/E = 0.97, 95% CI = 0.89 to 1.05, AUC = 0.64, 95% CI = 0.62 to 0.66). All models had poorer predictive accuracy for human epidermal growth factor receptor 2 positive and triple-negative breast cancers than for hormone receptor positive, human epidermal growth factor receptor 2 negative breast cancers. Conclusions In a large cohort of patients undergoing mammography screening, existing risk prediction models had similar, moderate predictive accuracy and good calibration overall. Models that incorporate additional genetic and nongenetic risk factors and estimate risk of tumor subtypes may further improve breast cancer risk prediction.
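For readers unfamiliar with the metrics named in this abstract, the following is a minimal sketch of how an O/E ratio, the square root of the Brier score, and an AUC could be computed from binary outcomes and predicted risks. It is not the authors' code: the function names and the five-person toy data are assumptions made purely for illustration, and confidence intervals are not computed.

```python
import math


def observed_to_expected(outcomes, risks):
    """Calibration: observed case count divided by the sum of predicted risks."""
    return sum(outcomes) / sum(risks)


def root_brier(outcomes, risks):
    """Square root of the mean squared difference between risk and outcome."""
    squared_errors = [(r - y) ** 2 for y, r in zip(outcomes, risks)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(outcomes, risks):
    """Discrimination: probability that a random case has a higher predicted
    risk than a random non-case, counting ties as one half."""
    cases = [r for y, r in zip(outcomes, risks) if y == 1]
    controls = [r for y, r in zip(outcomes, risks) if y == 0]
    wins = sum(
        1.0 if c > n else 0.5 if c == n else 0.0
        for c in cases
        for n in controls
    )
    return wins / (len(cases) * len(controls))


# Hypothetical toy data: 1 = developed breast cancer, 0 = did not.
toy_outcomes = [0, 0, 1, 0, 1]
toy_risks = [0.1, 0.2, 0.3, 0.4, 0.8]

print("O/E:", round(observed_to_expected(toy_outcomes, toy_risks), 2))
print("sqrt(Brier):", round(root_brier(toy_outcomes, toy_risks), 2))
print("AUC:", round(auc(toy_outcomes, toy_risks), 2))
```

An O/E ratio near 1 indicates good calibration, and an AUC of 0.5 corresponds to discrimination no better than chance, which is the scale on which the reported values of 0.61 to 0.64 read as moderate.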

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Bao, Deng, Wang, et al. 2019

JCO Clinical Cancer Informatics

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.

Using Twitter to Better Understand the Spatiotemporal Patterns of Public Sentiment: A Case Study in Massachusetts, USA

Cao, MacNaughton, Deng, et al. 2018

IJERPH

Twitter provides a rich database of spatiotemporal information about users who broadcast their real-time opinions, sentiment, and activities. In this paper, we sought to investigate the holistic influence of land use and time period on public sentiment. A total of 880,937 tweets posted by 26,060 active users were collected across Massachusetts (MA), USA, from 31 November 2012 to 3 June 2013. The IBM Watson Alchemy API (application program interface) was employed to quantify the sentiment scores conveyed by tweets on a large scale. We then statistically analyzed the sentiment scores across different spaces and times. A multivariate linear mixed-effects model was used to quantify the fixed effects of land use and time period on the variations in sentiment scores, accounting for the clustering effect of users. The results revealed clear spatiotemporal patterns in users’ sentiment. Higher sentiment scores were mainly observed in commercial and public areas, around noon and in the evening, and on weekends. Our findings suggest that social media outputs can be used to better understand the spatial and temporal patterns of public happiness and well-being in cities and regions.

Pathologic findings in reduction mammoplasty specimens: a surrogate for the population prevalence of breast cancer and high-risk lesions

Acevedo, Armengol, Deng, et al. 2018

Breast Cancer Res Treat

Shorter survival and later stage at diagnosis among unmarried patients with cutaneous melanoma: A US national and tertiary care center study

Rachidi, Deng, Sullivan, et al. 2020

Journal of the American Academy of Dermatology

Impacts of Tropical Cyclones and Accompanying Precipitation on Infectious Diarrhea in Cyclone Landing Areas of Zhejiang Province, China

Deng, Xun, Zhou, et al. 2015

IJERPH

Background: Zhejiang Province, located in southeastern China, is frequently hit by tropical cyclones. This study quantified the associations between infectious diarrhea and the seven tropical cyclones that landed in Zhejiang from 2005–2011 to assess the impacts of the accompanying precipitation on the studied diseases. Method: A unidirectional case-crossover study design was used to evaluate the impacts of tropical storms and typhoons on infectious diarrhea. Principal component analysis (PCA) was applied to eliminate multicollinearity. A multivariate logistic regression model was used to estimate the odds ratios (ORs) and the 95% confidence intervals (CIs). Results: For all typhoons studied, the greatest impacts on bacillary dysentery and other infectious diarrhea were identified on lag 6 days (OR = 2.30, 95% CI: 1.81–2.93) and lag 5 days (OR = 3.56, 95% CI: 2.98–4.25), respectively. For all tropical storms, impacts on these diseases were highest on lag 2 days (OR = 2.47, 95% CI: 1.41–4.33) and lag 6 days (OR = 2.46, 95% CI: 1.69–3.56), respectively. The tropical cyclone precipitation was a risk factor for both bacillary dysentery and other infectious diarrhea when daily precipitation reached 25 mm and 50 mm with the largest OR = 3.25 (95% CI: 1.45–7.27) and OR = 3.05 (95% CI: 2.20–4.23), respectively. Conclusions: Both typhoons and tropical storms could contribute to an increase in risk of bacillary dysentery and other infectious diarrhea in Zhejiang. Tropical cyclone precipitation may also be a risk factor for these diseases when it reaches or is above 25 mm and 50 mm, respectively. Public health preventive and intervention measures should consider the adverse health impacts from tropical cyclones.

Proportions and Risk Factors of Developing Multidrug Resistance Among Patients with Tuberculosis in China: A Population-Based Case–Control Study

Huai, Huang, Cheng, et al. 2016

Microbial Drug Resistance

Validation of a Semiautomated Natural Language Processing–Based Procedure for Meta-Analysis of Cancer Susceptibility Gene Penetrance