An automated parsing routine was written to extract the »site«, »diagnostic«, and »modifier« components from the diagnostic statements in the diagnostic summaries of surgical pathology reports. Such parsed reports appear to be suitable for input into an information retrieval system for surgical pathology reports. Data were input through a key-to-tape device producing a computer-compatible magnetic tape with a record size of 870 bytes. The statements were parsed through syntactic and morphological analysis, using the common prepositions, the common punctuation marks, and the morphemic constructions common in medical terms. A total of sixty-two delimiters were used. Certain suffix transformations were performed, converting some »site« adjectives to »site« nouns and some »diagnostic« nouns to »site« nouns. In all, 1,108 diagnostic statements were processed; the latest version had an error rate of 9.3% on the last 493 statements.
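The delimiter-based splitting and suffix transformations described above can be illustrated with a minimal sketch. It is not the original routine: the abstract does not list the sixty-two delimiters or the suffix rules, so the `DELIMITERS` set, the `SUFFIX_RULES` table, and both function names are hypothetical, and the sketch does not attempt to label segments as »site«, »diagnostic«, or »modifier«.

```python
import re

# Hypothetical subset of delimiters (common prepositions and punctuation).
# The original routine used sixty-two; the abstract does not enumerate them.
DELIMITERS = {"of", "with", "in", ",", ";"}

# Hypothetical suffix rules, tried in order (longest suffix first).
# Each pair is (suffix to strip, replacement to append).
SUFFIX_RULES = [
    ("icitis", "ix"),  # diagnostic noun -> site noun: appendicitis -> appendix
    ("ial", "us"),     # site adjective -> site noun: bronchial -> bronchus
    ("ic", ""),        # site adjective -> site noun: colonic -> colon
]


def split_on_delimiters(statement):
    """Split a diagnostic statement into segments at delimiter tokens."""
    tokens = re.findall(r"[a-z]+|[,;:]", statement.lower())
    segments, current = [], []
    for tok in tokens:
        if tok in DELIMITERS:
            # A delimiter closes the current segment and is itself dropped.
            if current:
                segments.append(" ".join(current))
                current = []
        else:
            current.append(tok)
    if current:
        segments.append(" ".join(current))
    return segments


def to_site_noun(word):
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word
```

For example, `split_on_delimiters("Adenocarcinoma of colon, poorly differentiated")` yields three segments, and `to_site_noun("colonic")` maps the adjective to `colon`. Rules this simple would over-apply to ordinary words ending in "ic", which may be one reason the original speaks of only "certain" transformations.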
With the objective of providing easier access to pathology specimens, slides, and Kodachromes, with linkage to x-rays and the remainder of the patient’s medical records, an automated natural language parsing routine based on dictionary look-up was written for Surgical Pathology document-pairs, each consisting of a Request for Examination (authored by clinicians) and its corresponding report (authored by pathologists). These documents were input to the system in free-text English without manual editing or coding. Two types of indices were prepared. The first was an »inverted« file, available for on-line retrieval, for display of the content of the document-pairs, frequency counts of cases, or listings of cases in table format. Retrievable items include the patient’s and specimen’s identification data, date of operation, names of the clinician and pathologist, etc. The English content of the operative procedure, clinical findings, and pathologic diagnoses can be retrieved through logical combinations of keywords. The second type of index was a catalog. Three catalog files (»operation«, »clinical«, and »pathology«) were prepared by alphabetizing lines formed by rotating phrases so that each line is headed by a keyword. These keywords were automatically selected and standardized by the parsing routine, and the phrases were extracted from each sentence of each input document. Over 2,500 document-pairs have been entered and are currently being used for medical education.
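The two index types described above can be sketched as follows. This is an illustration under stated assumptions, not the system's implementation: the function names are hypothetical, tokenization is plain whitespace splitting, "logical combination" is shown only as AND, and the keyword set is passed in by hand rather than selected by a dictionary-driven parser.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def query_all(index, *keywords):
    """Return ids of documents containing every keyword (logical AND)."""
    sets = [index.get(k.lower(), set()) for k in keywords]
    return set.intersection(*sets) if sets else set()


def rotate_phrase(phrase, keywords):
    """Rotate a phrase so each keyword heads one line, then alphabetize."""
    words = phrase.lower().split()
    lines = []
    for i, word in enumerate(words):
        if word in keywords:
            # Move the words before the keyword to the end of the line.
            lines.append(" ".join(words[i:] + words[:i]))
    return sorted(lines)
```

With three toy documents, `query_all(index, "carcinoma", "colon")` returns only the ids whose text contains both words, and `rotate_phrase("chronic inflammation of colon", {"inflammation", "colon"})` produces one catalog line per keyword, sorted alphabetically.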