INTRODUCTION

Not only does the problem of correcting spelling errors by computer have a long history, it is evidently of considerable current interest, as papers 17,95 and letters 18,30,57,66,69 on the topic continue to appear rapidly. This is not surprising, since techniques useful in detecting and correcting mis-spellings normally have other important applications. Moreover, both the power of small computers and the routine production of machine-readable text have increased enormously over the last decade, to the point where automatic spelling error detection/correction has become not only feasible but highly desirable.

Potential applications for spelling error detection/correction techniques are numerous. Early papers focused on the correction of output from optical character recognition (OCR), voice recognition, or Morse code, or on spelling errors in program code, but the domain of most interest today is probably the correction of machine-readable text made available by word processing. However, methods for assessing the similarity of two strings of symbols, which are widely used to compare mis-spellings with dictionary words, are of very general interest, e.g., for determining the evolutionary distance of proteins. 56,70,72 Similarly, one can imagine spelling correction techniques being extended to almost any kind of error-prone transmission, even to partially decrypted code. Also, spelling error detection involves searching large dictionaries, a capability that is obviously of widespread utility.

This note attempts to provide a comprehensive bibliography of papers in English on the major aspects of spelling error detection and correction of English text. The author is solely responsible for the content of the annotations.

SPELLING ERROR DETECTION

The goal of spelling error detection is basically to decide if a text string is a valid word; this is normally done by determining whether or not the string is in a system dictionary. As both the dictionary and the number of words to be processed are usually large in real-world systems, it is important to make the dictionary search highly efficient. Note that words need not be literally present in the dictionary; they may be stored much more economically as, for example, hash codes, patterns of bits distributed over a long string, or n-grams. However, with compressed representations one usually has to be content with a very high probability that a given word is present or absent, rather than with the certainty given by a literal dictionary. Similarly, the dictionary may be searched via tries, trees, hash coding (scatter storage), or a variety of other techniques.
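The compressed, probabilistic dictionary representation mentioned above (a word stored only as a pattern of bits distributed over a long string) can be illustrated with a short sketch. The Python example below is not from any of the cited papers; the hash functions, bit-array size, and number of hashes are assumptions chosen only for illustration. A failed lookup is a certain non-word, while a successful lookup means only "very probably present".

```python
import hashlib

# Illustrative bit-pattern dictionary: each word sets a few bits in a long bit
# array. Membership tests can give false positives (a non-word may look
# present) but never false negatives, matching the trade-off described above.

BIT_ARRAY_SIZE = 1 << 20   # assumed size; real systems tune this to the dictionary
NUM_HASHES = 3             # assumed number of independent hash functions

def _bit_positions(word: str):
    """Derive NUM_HASHES bit positions from salted hashes of the word."""
    for salt in range(NUM_HASHES):
        digest = hashlib.md5(f"{salt}:{word}".encode("utf-8")).hexdigest()
        yield int(digest, 16) % BIT_ARRAY_SIZE

def build_bit_dictionary(words):
    """Set the corresponding bits for every dictionary word."""
    bits = bytearray(BIT_ARRAY_SIZE // 8)
    for word in words:
        for pos in _bit_positions(word.lower()):
            bits[pos // 8] |= 1 << (pos % 8)
    return bits

def probably_in_dictionary(bits, word):
    """True means 'very likely a valid word'; False certainly flags a non-word."""
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in _bit_positions(word.lower()))

# Usage: tokens whose membership test fails are reported as probable errors.
bits = build_bit_dictionary(["spelling", "error", "detection", "correction"])
for token in ["spelling", "detection", "detektion"]:
    if not probably_in_dictionary(bits, token):
        print(f"possible spelling error: {token}")
```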
Using natural language, a computerized indexing and retrieval system was developed on a commercial database program, DATATRIEVE (Digital Equipment Corporation, Japan). Summarized anatomical diagnoses of nearly 4000 autopsy cases have been registered over a 13-year period at Tokyo Metropolitan Geriatric Hospital. There were 187,367 words in the pathological diagnoses, with 4689 distinct words excluding articles, prepositions and conjunctions. 'Atrophy', 'congestion' and 'metastasis' were the most frequent words, with frequencies of 4335, 3377, and 3373, respectively. There were 2497 distinct clinical diagnoses, among which 'pneumonia', 'hypertension' and 'DIC' predominated. Each step of retrieval by character strings from the sequential data file requires less than a minute.
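As a rough illustration of character-string retrieval from a sequential data file, the sketch below scans line-oriented diagnosis records for a query substring. It is a hypothetical reconstruction, not the DATATRIEVE implementation used in the study; the file layout, field separator, and file name are assumptions.

```python
def retrieve_by_string(path: str, query: str):
    """Sequentially scan a one-case-per-line file of summarized anatomical
    diagnoses and return the cases whose diagnosis text contains the query."""
    query = query.lower()
    hits = []
    with open(path, encoding="utf-8") as records:
        for line in records:
            # Assumed layout: case identifier, a tab, then the diagnosis text.
            case_id, _, diagnosis = line.rstrip("\n").partition("\t")
            if query in diagnosis.lower():
                hits.append((case_id, diagnosis))
    return hits

# Example: list all autopsy cases whose diagnoses mention 'metastasis'.
# for case_id, diagnosis in retrieve_by_string("autopsy_diagnoses.txt", "metastasis"):
#     print(case_id, diagnosis)
```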
Context.—Abbreviations are used frequently in pathology reports and medical records. Efforts to identify and organize free-text concepts must correctly interpret medical abbreviations. During the past decade, the author has collected more than 12 000 medical abbreviations, concentrating on terms used or interpreted by pathologists.

Objective.—The purpose of the study is to provide readers with a listing of abbreviations. The listing of abbreviations is reviewed for the purpose of determining the variety of ways that long forms are shortened.

Design.—Abbreviations fell into different classes. These classes seemed amenable to distinct algorithmic approaches to their correct expansions. A discussion of these abbreviation classes was included to assist informaticians who are searching for ways to write software that expands abbreviations found in medical text. Classes were separated by the algorithmic approaches that could be used to map abbreviations to their correct expansions. A Perl implementation was developed to automatically match expansions with Unified Medical Language System concepts.

Measurements.—The abbreviation list contained 12 097 terms; 5772 abbreviations had unique expansions. There were 6325 polysemous abbreviation/expansion pairs. The expansions of 8599 abbreviations mapped to Unified Medical Language System concepts. Three hundred twenty-four abbreviations could be confused with unabbreviated words. Two hundred thirteen abbreviations had different expansions depending on whether the American or the British spellings were used. Nine hundred seventy abbreviations ended in the letter “s.”

Results.—There were 6 nonexclusive groups of abbreviations classed by expansion algorithm, as follows: (1) ephemeral; (2) hyponymous; (3) monosemous; (4) polysemous; (5) masqueraders of common words; and (6) fatal (abbreviations whose incorrect expansions could easily result in clinical errors).

Conclusion.—Collecting and classifying abbreviations creates a logical approach to the development of class-specific algorithms designed to expand abbreviations. A large listing of medical abbreviations is placed into the public domain. The most current version is available at http://www.pathologyinformatics.org/downloads/abbtwo.htm.
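The class-specific expansion strategy described above can be sketched as a lookup that treats monosemous and polysemous abbreviations differently. The structure below is a hypothetical Python illustration, not the author's Perl implementation; the sample table entries and the simple context-matching rule are assumptions.

```python
# Hypothetical abbreviation table: monosemous entries map to one expansion,
# polysemous entries list every candidate and require context to disambiguate.
ABBREVIATIONS = {
    "ihc": ["immunohistochemistry"],                      # monosemous: expand directly
    "ca": ["carcinoma", "calcium", "cancer antigen"],     # polysemous: needs context
}

def expand(abbrev: str, context_words=()):
    """Return the single expansion for a monosemous abbreviation; for a
    polysemous one, prefer a candidate whose words share a stem with the
    surrounding context, otherwise return all candidates for review."""
    candidates = ABBREVIATIONS.get(abbrev.lower())
    if candidates is None:
        return None                      # unknown (possibly ephemeral) abbreviation
    if len(candidates) == 1:
        return candidates[0]
    context = {w.lower() for w in context_words}
    for candidate in candidates:
        if any(word[:5] in c for c in context for word in candidate.split()):
            return candidate
    return candidates

print(expand("ihc"))                         # 'immunohistochemistry'
print(expand("ca", ["serum", "calcium"]))    # 'calcium'
```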