In order to provide high-quality care, health professionals must efficiently identify the presence, possibility, or absence of symptoms, treatments, and other relevant entities in free-text clinical notes. This is the task of assertion detection: identifying the assertion class (present, possible, absent) of an entity based on textual cues in unstructured text. We evaluate state-of-the-art medical language models on the task and show that they outperform the baselines in all three classes. As transferability is especially important in the medical domain, we further study how the best-performing model behaves on unseen data from two other medical datasets. For this purpose, we introduce a newly annotated set of 5,000 assertions for the publicly available MIMIC-III dataset. We conclude with an error analysis that reveals situations in which the models still go wrong and points towards future research directions.
Extracting structured information from unstructured data is one of the key challenges in modern information retrieval applications, including e-commerce. Here, we demonstrate how recent advances in machine learning, combined with a recently published multilingual data set with standardized fine-grained product category information, enable robust product attribute extraction in challenging transfer learning settings. Our models can reliably predict product attributes across online shops, languages, or both. Furthermore, we show that our models can be used to match product taxonomies between online retailers.
Background: Automated cell-level characterization of the tumor microenvironment (TME) at scale is key to data-driven immuno-oncology. Artificial intelligence (AI)-powered analysis of hematoxylin and eosin (H&E) images scales well and has recently been translated into diagnostics. However, robust TME analysis based solely on H&E data is limited by the stain's properties and by manual pathologist annotations, both in number and accuracy. In this study, we quantify the error introduced by pathologists' morphological assessment and mitigate this error by training AI systems without manual pathologist annotations, using labels determined directly from IHC profiles. Methods: The work was carried out on 239 clinical NSCLC cases. CK-KL1, CD3+CD20, and Mum1 were used for defining carcinoma (CA), lymphocyte (LY), and plasma (PL) cells. For evaluation, representative regions were annotated by 3 trained pathologists. The workflow is based on co-registration of same-section H&E and IHC stained images with single-cell precision. Cells were detected in H&E and labelled using rule-based algorithms that incorporated IHC information. These H&E data were used to train neural networks (NN). Results: (A) The inter-rater agreement of pathologists annotating on H&E is increased when information from registered IHC images is provided. (B) The concordance of pathologists on H&E-only compared to on H&E+IHC shows that pathologists miss or misclassify cells with a certain error. (C) NNs trained with IHC-based labels achieve similar performance for cell type classification on H&E as pathologists on H&E. Conclusion: This study demonstrates the value of combining histomorphological and IHC data for improved cell annotation. Our novel workflow provides a quantitative benchmark and facilitates training of accurate AI models for quantitative characterization of tumor and TME from H&E sections.
A) Inter-rater agreement by metric, stain, and cell type

            By cell count,            By cell location,
            Pearson correlation       Krippendorff’s alpha
Cell type   H&E-only    H&E+IHC       H&E-only    H&E+IHC
CA          0.86        0.98          0.43        0.90
LY          0.88        0.99          0.21        0.76
PL          0.77        0.96          0.32        0.87

B) Performance of individual pathologists in H&E

            Against consensus        Against own annotations in H&E+IHC
            in H&E+IHC
            By cell count,           By cell location,    By cell location,
Cell type   Pearson correlation      Precision            Recall
CA          0.84                     0.76                 0.77
LY          0.78                     0.70                 0.60
PL          0.76                     0.69                 0.21

C) NN against annotator H&E+IHC consensus

            By cell count,
Cell type   Pearson correlation
CA          0.84
LY          0.92
PL          0.75

Citation Format: Thomas Mrowiec, Sharon Ruane, Simon Schallenberg, Gabriel Dernbach, Rumyana Todorova, Cornelius Böhm, Walter de Back, Blanca Pablos, Roman Schulte-Sasse, Ivana Trajanovska, Adelaida Creosteanu, Emil Barbuta, Marcus Otte, Christian Ihling, Hans Juergen Grote, Juergen Scheuenpflug, Viktor Matyas, Maximilian Alber, Frederick Klauschen. Immunohistochemistry-informed AI systems for improved characterization of tumor-microenvironment in clinical non-small cell lung cancer H&E samples [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr 457.