Medical Imaging 2020: Imaging Informatics for Healthcare, Research, and Applications 2020
DOI: 10.1117/12.2549565

Exploiting biomedical literature to mine out a large multimodal dataset of rare cancer studies

citations
Cited by 3 publications
(4 citation statements)
references
References 0 publications
“…For images detected as histopathology to be useful for training clinical decision support systems, further steps are required [12]. Much of the cancer research is on animals so, when learning human tissue classification, it is important to filter out animal tissue samples even from the same organ.…”
Section: Combining Text and Images For Data Analysis
mentioning
confidence: 99%
“…With more than 2 million articles in total, an average of 3.5 figures per article including 1.5 compound figures of 4 subfigures each, it sums up to approximately seven million figures, including 3 million compound figures with 12 million subfigures for a total of >16 million figures available in 2018 if separated correctly. With an expected increase of nearly 3 million figures in 2019 and more in the following years, it promises in the near future very large amounts of training data in various applications, modalities and particularly rare cases that are strongly oversampled in the literature compared to clinical archives [12]. Difficulties related to this type of data, that will be developed in more details in Section 3, include the heterogeneity and non-guaranteed quality of the images, the presence of compound images and the automation of ground truth labels extraction from the text.…”
Section: Introduction
mentioning
confidence: 99%
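The figure counts in the statement above follow from simple arithmetic, and a short sketch can make the steps explicit. The function and key names below are hypothetical, chosen only for illustration. The sketch assumes exactly 2 million articles (the statement says "more than 2 million") and reads the 1.5 compound figures as part of the 3.5 figures per article, so the results are lower bounds on the quoted totals.

```python
def estimate_figure_counts(articles, figures_per_article,
                           compound_per_article, subfigures_per_compound):
    """Reproduce the back-of-the-envelope counts from the quoted statement.

    All parameter and key names are illustrative, not taken from the cited paper.
    """
    total_figures = articles * figures_per_article
    compound_figures = articles * compound_per_article
    subfigures = compound_figures * subfigures_per_compound
    # Non-compound figures are usable as they are; compound figures are
    # replaced by their subfigures once separated.
    simple_figures = total_figures - compound_figures
    usable_figures = simple_figures + subfigures
    return {
        "total_figures": round(total_figures),
        "compound_figures": round(compound_figures),
        "subfigures": round(subfigures),
        "simple_figures": round(simple_figures),
        "usable_figures": round(usable_figures),
    }


# Averages quoted in the statement, with the article count taken as exactly 2 million.
counts = estimate_figure_counts(
    articles=2_000_000,
    figures_per_article=3.5,
    compound_per_article=1.5,
    subfigures_per_compound=4,
)

for name, value in counts.items():
    print(f"{name}: {value:,}")
```

Under these assumptions the 7 million figures split into 4 million non-compound figures and 3 million compound figures, and the 12 million subfigures plus the 4 million non-compound figures give the 16 million total mentioned in the statement.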
“…Biomedical datasets are increasingly made publicly available leading to thousands of potentially available multi-modal data, e.g. from challenges [5][6][7], open-access databases [8][9][10][11][12], and scientific literature [13]. Some examples of online databases are UniProt [14] which aims to provide comprehensive and high-quality resources on protein sequences and functional information and the Kyoto Encyclopedia of Genes and Genomes (KEGG), a professional knowledge base for the biological interpretation of large-scale molecular datasets, such as genomic and metagenomic sequences [15]. Semantic-based approaches represent an extraordinary and increasingly exploited opportunity for biomedical sciences because they allow to create a unique, multilingual representation of medical concepts. While in Linguistics Semantics refers to the meaning of words, phrases, or sentences, in Computer Science Semantics refers to the study of properties, categories, and relationships among concepts of a specific area [16].…”
mentioning
confidence: 99%