2019
DOI: 10.1093/jamia/ocz153

Automatic extraction of cancer registry reportable information from free-text pathology reports using multitask convolutional neural networks

Abstract: Objective: We implement 2 different multitask learning (MTL) techniques, hard parameter sharing and cross-stitch, to train a word-level convolutional neural network (CNN) specifically designed for automatic extraction of cancer data from unstructured text in pathology reports. We show the importance of learning related information extraction (IE) tasks leveraging shared representations across the tasks to achieve state-of-the-art performance in classification accuracy and computational efficie…

Citations: cited by 65 publications (63 citation statements)
References: 18 publications
“…Each pathology report in our dataset is associated with a unique tumor ID; the same tumor ID may be associated with one or more pathology reports. For each tumor ID, Certified Tumor Registrars (CTRs) manually assigned ground truth labels for key data elements such as cancer site and histology based off all data available for that tumor ID according to the SEER program coding and staging manual [2]. We note that these ground truth labels are at the tumor level rather than at the report level; as a consequence of this labelling scheme, tumor IDs associated with multiple pathology reports may have a tumor-level label that does not reflect the content within individual pathology reports.…”
Section: Dataset and Pre-processing (mentioning)
confidence: 99%
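
The labelling scheme quoted above can be illustrated with a minimal Python sketch. It is not the citing authors' code: the record layout (`report_id`, `tumor_id`), the function name `propagate_tumor_labels`, and the example values are all hypothetical. It copies each tumor-level label onto every report sharing that tumor ID and flags the reports whose label is shared, since those are the ones where the label may not match the report's own content.

```python
from collections import defaultdict


def propagate_tumor_labels(reports, tumor_labels):
    """Attach the tumor-level label to every report of that tumor.

    `reports` is a list of dicts with hypothetical keys "report_id" and
    "tumor_id"; `tumor_labels` maps tumor ID -> label assigned by a registrar.
    """
    # Group reports by tumor ID: one tumor may have one or more reports.
    by_tumor = defaultdict(list)
    for report in reports:
        by_tumor[report["tumor_id"]].append(report)

    labelled = []
    for tumor_id, group in by_tumor.items():
        for report in group:
            labelled.append({
                "report_id": report["report_id"],
                "tumor_id": tumor_id,
                # The label describes the tumor, not this specific report.
                "label": tumor_labels[tumor_id],
                # True when several reports inherit the same tumor-level label.
                "shared_label": len(group) > 1,
            })
    return labelled


# Illustrative data only; IDs and site codes are made up for the example.
reports = [
    {"report_id": "r1", "tumor_id": "t1"},
    {"report_id": "r2", "tumor_id": "t1"},
    {"report_id": "r3", "tumor_id": "t2"},
]
tumor_labels = {"t1": "C50.9", "t2": "C34.1"}
labelled = propagate_tumor_labels(reports, tumor_labels)
```

Reports with `shared_label` set to `True` are candidates for the label noise the statement describes; how to handle them (filtering, aggregating reports per tumor, or leaving them in) is a modelling choice the quoted text does not specify.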
“…Similar to our previous studies, we applied standard text pre-processing techniques to clean our corpus [2,3]. After excluding metadata (e.g., patient ID, registry ID) in cancer pathology reports, text was cleaned by removing any consecutive punctuation and by lowercasing all alphabetical characters.…”
Section: Dataset and Pre-processing (mentioning)
confidence: 99%
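
The cleaning steps named in this statement (dropping metadata, removing consecutive punctuation, lowercasing) might look roughly like the Python sketch below. Several details are assumptions rather than the citing authors' method: the metadata field names in `METADATA_FIELDS` are hypothetical, reports are assumed to arrive as dicts of named fields, and "removing consecutive punctuation" is read here as collapsing a run of punctuation characters to its first character.

```python
import re
import string

# Hypothetical field names; the statement only gives patient ID and
# registry ID as examples of excluded metadata.
METADATA_FIELDS = {"patient_id", "registry_id"}

# Character class matching any single ASCII punctuation character.
_PUNCT = "[" + re.escape(string.punctuation) + "]"
# A punctuation character followed by one or more further punctuation characters.
_PUNCT_RUN = re.compile("(" + _PUNCT + ")" + _PUNCT + "+")


def clean_report(report):
    """Return cleaned free text for one report given as a dict of fields."""
    # 1. Exclude metadata fields and join the remaining text fields.
    text = " ".join(
        str(value) for key, value in report.items() if key not in METADATA_FIELDS
    )
    # 2. Collapse each run of consecutive punctuation to its first character
    #    (an assumed interpretation of "removing consecutive punctuation").
    text = _PUNCT_RUN.sub(r"\1", text)
    # 3. Lowercase all alphabetical characters.
    return text.lower()


example = {
    "patient_id": "12345",
    "text": "Invasive Ductal Carcinoma... Grade: 2!!",
}
cleaned = clean_report(example)
```

With the example record above, `cleaned` is `"invasive ductal carcinoma. grade: 2!"`: the patient ID is dropped, `...` and `!!` each collapse to one character, and the text is lowercased.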