We present EDA: easy data augmentation techniques for boosting performance on text classification tasks. EDA consists of four simple but powerful operations: synonym replacement, random insertion, random swap, and random deletion. On five text classification tasks, we show that EDA improves performance for both convolutional and recurrent neural networks. EDA demonstrates particularly strong results for smaller datasets; on average, across five datasets, training with EDA while using only 50% of the available training set achieved the same accuracy as normal training with all available data. We also performed extensive ablation studies and suggest parameters for practical use.
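Two of the four operations, random swap and random deletion, can be sketched in a few lines of Python. This is a hedged illustration of the idea only, not the authors' released implementation; the `n` and `p` parameters are illustrative defaults, and synonym replacement and random insertion (which need a thesaurus such as WordNet) are omitted.

```python
import random

def random_swap(words, n=1):
    """Swap the positions of two randomly chosen words, n times.

    Assumes the sentence has at least two words.
    """
    words = words[:]  # copy so the caller's list is untouched
    for _ in range(n):
        i, j = random.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p=0.1):
    """Delete each word independently with probability p, keeping at least one word."""
    kept = [w for w in words if random.random() > p]
    return kept if kept else [random.choice(words)]

sentence = "a sad lonely dog wandered the streets".split()
augmented = random_deletion(random_swap(sentence, n=1), p=0.1)
```

Both operations preserve most of the sentence's content while perturbing word order and length, which is why they act as label-preserving augmentations for classification.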
Classification of histologic patterns in lung adenocarcinoma is critical for determining tumor grade and treatment for patients. However, this task is often challenging due to the heterogeneous nature of lung adenocarcinoma and the subjective criteria for evaluation. In this study, we propose a deep learning model that automatically classifies the histologic patterns of lung adenocarcinoma on surgical resection slides. Our model uses a convolutional neural network to identify regions of neoplastic cells, then aggregates those classifications to infer predominant and minor histologic patterns for any given whole-slide image. We evaluated our model on an independent set of 143 whole-slide images. It achieved a kappa score of 0.525 and an agreement of 66.6% with three pathologists for classifying the predominant patterns, slightly higher than the inter-pathologist kappa score of 0.485 and agreement of 62.7% on this test set. All evaluation metrics for our model and the three pathologists were within 95% confidence intervals of agreement. If confirmed in clinical practice, our model can assist pathologists in improving classification of lung adenocarcinoma patterns by automatically pre-screening and highlighting cancerous regions prior to review. Our approach can be generalized to any whole-slide image classification task, and code is made publicly available at https://github.com/BMIRDS/deepslide.
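One straightforward way to aggregate patch-level classifications into predominant and minor histologic patterns is to rank class frequencies over the neoplastic patches. The sketch below is a hedged illustration of that idea, not the paper's exact heuristic; the `minor_threshold` fraction is a hypothetical parameter.

```python
from collections import Counter

def aggregate_patterns(patch_labels, minor_threshold=0.05):
    """Infer predominant and minor histologic patterns from patch-level labels.

    patch_labels: list of class names predicted for each tissue patch.
    minor_threshold: hypothetical minimum fraction for a minor pattern.
    """
    counts = Counter(patch_labels)
    total = sum(counts.values())
    ranked = counts.most_common()            # classes sorted by frequency
    predominant = ranked[0][0]
    minor = [cls for cls, n in ranked[1:] if n / total >= minor_threshold]
    return predominant, minor

pred, minor = aggregate_patterns(
    ["acinar"] * 60 + ["lepidic"] * 30 + ["solid"] * 10
)
# pred == "acinar", minor == ["lepidic", "solid"]
```

The threshold keeps rare, likely spurious patch predictions from being reported as minor patterns.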
Background: The Paris System for Urine Cytopathology (the Paris System) has succeeded in making the analysis of liquid-based urine preparations more reproducible. Any algorithm seeking to automate this system must accurately estimate the nuclear-to-cytoplasmic (N:C) ratio and produce a qualitative "atypia score." The authors propose a hybrid deep-learning and morphometric model that reliably automates the Paris System. Methods: Whole-slide images (WSI) of liquid-based urine cytology specimens were extracted from 51 negative, 60 atypical, 52 suspicious, and 54 positive cases. Morphometric algorithms were applied to decompose images into their component parts, and statistics, including the N:C ratio, were tabulated using segmentation algorithms to create organized data structures, dubbed rich information matrices (RIMs). These RIM objects were enhanced using deep-learning algorithms to include qualitative measures. The augmented RIM objects were then used to reconstruct WSIs with filtering criteria and to generate pancellular statistical information. Results: The described system was used to calculate the N:C ratio for all cells, generate object classifications (atypical urothelial cell, squamous cell, crystal, etc.), filter the original WSI to remove unwanted objects, rearrange the WSI to an efficient, condensed-grid format, and generate pancellular statistics containing quantitative/qualitative data for every cell in a WSI. In addition to developing novel techniques for managing WSIs, a system capable of automatically tabulating the Paris System criteria also was generated. Conclusions: A hybrid deep-learning and morphometric algorithm was developed for the analysis of urine cytology specimens that could reliably automate the Paris System and provide many avenues for increasing the efficiency of digital screening for urine WSIs and other cytology preparations.
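Given per-cell segmentation masks, the N:C ratio reduces to an area calculation. The sketch below is a hedged illustration using one common definition (nuclear area divided by cytoplasmic area); the paper's morphometric pipeline may define or measure the ratio differently.

```python
import numpy as np

def nc_ratio(nucleus_mask, cell_mask):
    """Nuclear-to-cytoplasmic (N:C) ratio from binary segmentation masks.

    nucleus_mask, cell_mask: boolean arrays of the same shape; cell_mask
    covers the whole cell (nucleus plus cytoplasm).
    Returns nuclear area / cytoplasmic area.
    """
    nuclear_area = int(nucleus_mask.sum())
    cytoplasmic_area = int(cell_mask.sum()) - nuclear_area
    if cytoplasmic_area <= 0:
        return float("inf")  # degenerate segmentation: no visible cytoplasm
    return nuclear_area / cytoplasmic_area
```

A high N:C ratio in an urothelial cell is one of the Paris System's core criteria for atypia, which is why the ratio is the natural per-cell statistic to tabulate into the RIM objects.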
IMPORTANCE Deep learning–based methods, such as the sliding window approach for cropped-image classification and heuristic aggregation for whole-slide inference, for analyzing histological patterns in high-resolution microscopy images have shown promising results. These approaches, however, require a laborious annotation process and are fragmented. OBJECTIVE To evaluate a novel deep learning method that uses tissue-level annotations for high-resolution histological image analysis for Barrett esophagus (BE) and esophageal adenocarcinoma detection. DESIGN, SETTING, AND PARTICIPANTS This diagnostic study collected deidentified high-resolution histological images (N = 379) for training a new model composed of a convolutional neural network and a grid-based attention network. Histological images of patients who underwent endoscopic esophagus and gastroesophageal junction mucosal biopsy between January 1, 2016, and December 31, 2018, at Dartmouth-Hitchcock Medical Center (Lebanon, New Hampshire) were collected. MAIN OUTCOMES AND MEASURES The model was evaluated on an independent testing set of 123 histological images with 4 classes: normal, BE-no-dysplasia, BE-with-dysplasia, and adenocarcinoma. Performance of this model was measured and compared with that of the current state-of-the-art sliding window approach using the following standard machine learning metrics: accuracy, recall, precision, and F1 score. RESULTS Of the independent testing set of 123 histological images, 30 (24.4%) were in the BE-no-dysplasia class, 14 (11.4%) in the BE-with-dysplasia class, 21 (17.1%) in the adenocarcinoma class, and 58 (47.2%) in the normal class. Classification accuracies of the proposed model were 0.85 (95% CI, 0.81–0.90) for the BE-no-dysplasia class, 0.89 (95% CI, 0.84–0.92) for the BE-with-dysplasia class, and 0.88 (95% CI, 0.84–0.92) for the adenocarcinoma class.
The proposed model achieved a mean accuracy of 0.83 (95% CI, 0.80–0.86) and marginally outperformed the sliding window approach on the same testing set. The F1 scores of the attention-based model were at least 8% higher for each class compared with the sliding window approach: 0.68 (95% CI, 0.61–0.75) vs 0.61 (95% CI, 0.53–0.68) for the normal class, 0.72 (95% CI, 0.63–0.80) vs 0.58 (95% CI, 0.45–0.69) for the BE-no-dysplasia class, 0.30 (95% CI, 0.11–0.48) vs 0.22 (95% CI, 0.11–0.33) for the BE-with-dysplasia class, and 0.67 (95% CI, 0.54–0.77) vs 0.58 (95% CI, 0.44–0.70) for the adenocarcinoma class. However, this outperformance was not statistically significant. CONCLUSIONS AND RELEVANCE Results of this study suggest that the proposed attention-based deep neural network framework for BE and esophageal adenocarcinoma detection is important because it is based solely on tissue-level annotations, unlike existing methods that are based on regions of interest. This new model is expected to open avenues for applying deep learning to digital pathology.
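The core idea of attention-based aggregation is to replace a fixed heuristic (e.g., majority vote over sliding windows) with learned weights over patch embeddings. The sketch below is a hedged, generic attention-pooling illustration in NumPy, in the spirit of attention-based multiple-instance learning; it is not the paper's architecture, and the projection `w` and scoring vector `v` stand in for learned parameters.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(patch_feats, w, v):
    """Pool a grid of patch embeddings into one slide-level embedding.

    patch_feats: (n_patches, d) array of patch embeddings.
    w: (d, k) projection and v: (k,) scoring vector -- hypothetical
    stand-ins for learned attention parameters.
    """
    scores = np.tanh(patch_feats @ w) @ v   # one scalar score per patch
    alpha = softmax(scores)                  # attention weights sum to 1
    return alpha @ patch_feats, alpha        # weighted sum of patch features
```

Because the weights are learned end-to-end from the slide-level label, training needs only tissue-level annotations rather than region-of-interest outlines, which is the property the abstract highlights.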
Question: Are deep neural networks trained on data from a single institution for classification of colorectal polyps on digitized histopathology slides generalizable across multiple external institutions? Findings: A new deep neural network was developed based on 326 slide images from our institution to classify the four most common polyp types on digitized histopathology slides. In addition to evaluation on an internal test set of 157 slide images, we evaluated the model on an external test set of 238 slide images from 24 institutions across 13 states in the United States. This model achieved mean accuracies of 93.5% and 87.0% on the internal and external test sets, respectively, which were comparable with the performance of local pathologists on these test sets. Meaning: Deep neural networks could provide a generalizable approach for the classification of colorectal polyps on digitized histopathology slides and, if confirmed in clinical trials, could potentially improve the efficiency, reproducibility, and accuracy of one of the most common cancer screening procedures.
Data augmentation has recently seen increased interest in NLP due to more work in low-resource domains, new tasks, and the popularity of large-scale neural networks that require large amounts of training data. Despite this recent upsurge, this area is still relatively underexplored, perhaps due to the challenges posed by the discrete nature of language data. In this paper, we present a comprehensive and unifying survey of data augmentation for NLP by summarizing the literature in a structured manner. We first introduce and motivate data augmentation for NLP, and then discuss major methodologically representative approaches. Next, we highlight techniques that are used for popular NLP applications and tasks. We conclude by outlining current challenges and directions for future research. Overall, our paper aims to clarify the landscape of existing literature in data augmentation for NLP and motivate additional work in this area. We also present a GitHub repository with a paper list that will be continuously updated at https://github.com/styfeng/DataAug4NLP.
Context: Celiac disease (CD) prevalence and diagnosis have increased substantially in recent years. The current gold standard for CD confirmation is visual examination of duodenal mucosal biopsies. An accurate computer-aided biopsy analysis system using deep learning can help pathologists diagnose CD more efficiently. Subjects and Methods: In this study, we trained a deep learning model to detect CD on duodenal biopsy images. Our model uses a state-of-the-art residual convolutional neural network to evaluate patches of duodenal tissue and then aggregates those predictions for whole-slide classification. We tested the model on an independent set of 212 images and evaluated its classification results against reference standards established by pathologists. Results: Our model identified CD, normal tissue, and nonspecific duodenitis with accuracies of 95.3%, 91.0%, and 89.2%, respectively. The area under the receiver operating characteristic curve was >0.95 for all classes. Conclusions: We have developed an automated biopsy analysis system that achieves high performance in detecting CD on biopsy slides. Our system can highlight areas of interest and provide preliminary classification of duodenal biopsies before review by pathologists. This technology has great potential for improving the accuracy and efficiency of CD diagnosis.