2020
DOI: 10.1109/access.2020.3005684

Information Extraction for Intestinal Cancer Electronic Medical Records

Abstract: Structured electronic medical records facilitate the mining and extraction of medical data, and they are an effective way to make use of valuable data resources. However, hospitals have accumulated large amounts of unstructured data in electronic medical records that cannot be searched effectively, resulting in a serious waste of resources. In this paper, we study the problem of extracting attribute values from the unstructured text of electronic medical records. By observing…
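
To make the task concrete at the input/output level (free-text record in, attribute-value pairs out), here is a minimal rule-based sketch. It is a hypothetical baseline and not the method proposed in the paper, whose description is cut off above; the attribute names, regular expressions, and sample sentence are all invented for illustration.

```python
import re

# Hypothetical attribute patterns for illustration only; they are NOT taken
# from the paper and cover just one phrasing variant per attribute.
PATTERNS = {
    "tumor_size": re.compile(r"tumou?r size[:\s]+(\d+(?:\.\d+)?\s*cm)", re.IGNORECASE),
    "location": re.compile(r"located in the ([a-z ]+?)(?:[,.]|$)", re.IGNORECASE),
    "differentiation": re.compile(
        r"((?:well|moderately|poorly)[- ]differentiated)", re.IGNORECASE
    ),
}


def extract_attributes(text: str) -> dict[str, str]:
    """Return the first match for each attribute found in a free-text note."""
    result = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[name] = match.group(1).strip()
    return result


note = ("Tumor size: 3.5 cm, located in the sigmoid colon. "
        "Moderately differentiated adenocarcinoma.")
print(extract_attributes(note))
```

Hand-written patterns like these only recognize the exact phrasings they encode, so any variation in how a record is worded slips past them; that brittleness is one common reason learned extraction models are used for this task instead.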

Cited by 7 publications (2 citation statements). References 31 publications (33 reference statements).

“…Our comparison of embedding features and unigram features clearly demonstrates the added value of lexically-abstracted embedding features, which enable data-driven models to capitalize on similar and related words beyond exact matches (46,73). As observed in prior literature (30,74,75), word embeddings that balance a training corpus that is representative of the target information with corpus size achieve the best performance for specialized tasks. While our results led us to use the most specialized PT-OT corpus for our word2vec embeddings, the performance of our more general NIHCC corpus (approximately 155,000 documents) was comparable to PT-OT results, and MIMIC embeddings were not far behind.…”
Section: A Template For Expanding Automated Coding To New Concept Dom… (citation type: supporting; confidence: 60%)

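The quoted point, that embedding features let models "capitalize on similar and related words beyond exact matches", can be shown with a toy comparison. Unigram features overlap only when the same token appears in both texts, whereas averaged word vectors still give a high similarity for related words. The three-dimensional vectors below are hand-made stand-ins assumed for illustration; a real system would use trained embeddings such as the word2vec models mentioned in the quote.

```python
import math

# Hand-made toy "embeddings": values are invented for illustration and do
# not come from any trained model.
EMBEDDINGS = {
    "walking": [0.9, 0.1, 0.0],
    "ambulation": [0.85, 0.15, 0.05],
    "difficulty": [0.1, 0.9, 0.0],
    "impaired": [0.15, 0.8, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def unigram_overlap(a, b):
    """Jaccard overlap of the two token sets: exact matches only."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def mean_vector(tokens):
    """Average the embeddings of known tokens into one document vector."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    dim = len(next(iter(EMBEDDINGS.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


doc_a = ["difficulty", "walking"]
doc_b = ["impaired", "ambulation"]
print(unigram_overlap(doc_a, doc_b))  # 0.0: the documents share no tokens
print(cosine(mean_vector(doc_a), mean_vector(doc_b)))  # close to 1 for these toy vectors
```

The toy only shows why lexical abstraction helps match paraphrases; as the quote notes, how well this works in practice depends on the corpus the embeddings were trained on.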
“…Syndrome differentiation of Yin and Yang deficiency is based on the physiological and pathological characteristics of the Yin and the Yang, and involves analyzing and summarizing a variety of disease-related information that is collected according to four diagnostics for identification [6]. A large amount of critical information on healthcare is buried in unstructured narratives, such as medical records, which makes its computational analysis difficult [7]. Moreover, mastering syndrome differentiation in TCM is a complicated and time-consuming process.…”
Section: Introduction (citation type: mentioning; confidence: 99%)