INTRODUCTION

Not only does the problem of correcting spelling errors by computer have a long history, it is evidently of considerable current interest, as papers 17,95 and letters 18,30,57,66,69 on the topic continue to appear rapidly. This is not surprising, since techniques useful in detecting and correcting mis-spellings normally have other important applications. Moreover, both the power of small computers and the routine production of machine-readable text have increased enormously over the last decade, to the point where automatic spelling error detection/correction has become not only feasible but highly desirable.

Potential applications for spelling error detection/correction techniques are numerous. Early papers focused on the correction of output from optical character recognition (OCR), voice recognition, or Morse code, or on spelling errors in program code, but the domain of most interest today is probably the correction of machine-readable text made available by word processing. However, methods for assessing the similarity of two strings of symbols, which are widely used to compare mis-spellings with dictionary words, are of very general interest, e.g., for determining the evolutionary distance of proteins. 56,70,72 Similarly, one can imagine spelling correction techniques being extended to almost any kind of error-prone transmission, even to partially decrypted code. Also, spelling error detection involves searching large dictionaries, a capability that is obviously of widespread utility.

This note attempts to provide a comprehensive bibliography of papers in English on the major aspects of spelling error detection and correction of English text. The author is solely responsible for the content of the annotations.

SPELLING ERROR DETECTION

The goal of spelling error detection is basically to decide if a text string is a valid word; this is normally done by determining whether or not the string is in a system dictionary. As both the dictionary and the number of words to be processed are usually large in real-world systems, it is important to make the dictionary search highly efficient. Note that words need not be literally present in the dictionary; they may be stored much more economically as, for example, hash codes, patterns of bits distributed over a long string, or n-grams. However, with compressed representations one usually has to be content with a very high probability that a given word is present or absent, rather than with the certainty given by a literal dictionary. Similarly, the dictionary may be searched via tries, trees, hash coding (scatter storage), or a variety of other techniques.
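The compressed, probabilistic dictionary representation mentioned above (a word stored only as a pattern of bits distributed over a long string) can be illustrated with a short sketch. The Python example below is not from any of the cited papers; the hash functions, bit-array size, and number of hashes are assumptions chosen only for illustration. A failed lookup is a certain non-word, while a successful lookup means only "very probably present".

```python
import hashlib

# Illustrative bit-pattern dictionary: each word sets a few bits in a long bit
# array. Membership tests can give false positives (a non-word may look
# present) but never false negatives, matching the trade-off described above.

BIT_ARRAY_SIZE = 1 << 20   # assumed size; real systems tune this to the dictionary
NUM_HASHES = 3             # assumed number of independent hash functions

def _bit_positions(word: str):
    """Derive NUM_HASHES bit positions from salted hashes of the word."""
    for salt in range(NUM_HASHES):
        digest = hashlib.md5(f"{salt}:{word}".encode("utf-8")).hexdigest()
        yield int(digest, 16) % BIT_ARRAY_SIZE

def build_bit_dictionary(words):
    """Set the corresponding bits for every dictionary word."""
    bits = bytearray(BIT_ARRAY_SIZE // 8)
    for word in words:
        for pos in _bit_positions(word.lower()):
            bits[pos // 8] |= 1 << (pos % 8)
    return bits

def probably_in_dictionary(bits, word):
    """True means 'very likely a valid word'; False certainly flags a non-word."""
    return all(bits[pos // 8] & (1 << (pos % 8)) for pos in _bit_positions(word.lower()))

# Usage: tokens whose membership test fails are reported as probable errors.
bits = build_bit_dictionary(["spelling", "error", "detection", "correction"])
for token in ["spelling", "detection", "detektion"]:
    if not probably_in_dictionary(bits, token):
        print(f"possible spelling error: {token}")
```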
Using natural language, a computerized indexing and retrieval system was developed on a commercial database program, DATATRIEVE (Digital Equipment Corporation, Japan). Summarized anatomical diagnoses of nearly 4000 autopsy cases have been registered over a 13-year period at Tokyo Metropolitan Geriatric Hospital. There were 187,367 words in the pathological diagnoses, with 4689 distinct words excluding articles, prepositions and conjunctions. 'Atrophy', 'congestion' and 'metastasis' were the most frequent words, with frequencies of 4335, 3377, and 3373, respectively. There were 2497 distinct clinical diagnoses, among which 'pneumonia', 'hypertension' and 'DIC' predominated. Each step of retrieval by character strings from the sequential data file requires less than a minute.
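As a rough illustration of character-string retrieval from a sequential data file, the sketch below scans line-oriented diagnosis records for a query substring. It is a hypothetical reconstruction, not the DATATRIEVE implementation used in the study; the file layout, field separator, and file name are assumptions.

```python
def retrieve_by_string(path: str, query: str):
    """Sequentially scan a one-case-per-line file of summarized anatomical
    diagnoses and return the cases whose diagnosis text contains the query."""
    query = query.lower()
    hits = []
    with open(path, encoding="utf-8") as records:
        for line in records:
            # Assumed layout: case identifier, a tab, then the diagnosis text.
            case_id, _, diagnosis = line.rstrip("\n").partition("\t")
            if query in diagnosis.lower():
                hits.append((case_id, diagnosis))
    return hits

# Example: list all autopsy cases whose diagnoses mention 'metastasis'.
# for case_id, diagnosis in retrieve_by_string("autopsy_diagnoses.txt", "metastasis"):
#     print(case_id, diagnosis)
```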
Context.—Abbreviations are used frequently in pathology reports and medical records. Efforts to identify and organize free-text concepts must correctly interpret medical abbreviations. During the past decade, the author has collected more than 12 000 medical abbreviations, concentrating on terms used or interpreted by pathologists.

Objective.—The purpose of the study is to provide readers with a listing of abbreviations. The listing of abbreviations is reviewed for the purpose of determining the variety of ways that long forms are shortened.

Design.—Abbreviations fell into different classes. These classes seemed amenable to distinct algorithmic approaches to their correct expansions. A discussion of these abbreviation classes was included to assist informaticians who are searching for ways to write software that expands abbreviations found in medical text. Classes were separated by the algorithmic approaches that could be used to map abbreviations to their correct expansions. A Perl implementation was developed to automatically match expansions with Unified Medical Language System concepts.

Measurements.—The abbreviation list contained 12 097 terms; 5772 abbreviations had unique expansions. There were 6325 polysemous abbreviation/expansion pairs. The expansions of 8599 abbreviations mapped to Unified Medical Language System concepts. Three hundred twenty-four abbreviations could be confused with unabbreviated words. Two hundred thirteen abbreviations had different expansions depending on whether the American or the British spellings were used. Nine hundred seventy abbreviations ended in the letter “s.”

Results.—There were 6 nonexclusive groups of abbreviations classed by expansion algorithm, as follows: (1) ephemeral; (2) hyponymous; (3) monosemous; (4) polysemous; (5) masqueraders of common words; and (6) fatal (abbreviations whose incorrect expansions could easily result in clinical errors).

Conclusion.—Collecting and classifying abbreviations creates a logical approach to the development of class-specific algorithms designed to expand abbreviations. A large listing of medical abbreviations is placed into the public domain. The most current version is available at http://www.pathologyinformatics.org/downloads/abbtwo.htm.
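The class-specific expansion strategy described above can be sketched as a lookup that treats monosemous and polysemous abbreviations differently. The structure below is a hypothetical Python illustration, not the author's Perl implementation; the sample table entries and the simple context-matching rule are assumptions.

```python
# Hypothetical abbreviation table: monosemous entries map to one expansion,
# polysemous entries list every candidate and require context to disambiguate.
ABBREVIATIONS = {
    "ihc": ["immunohistochemistry"],                      # monosemous: expand directly
    "ca": ["carcinoma", "calcium", "cancer antigen"],     # polysemous: needs context
}

def expand(abbrev: str, context_words=()):
    """Return the single expansion for a monosemous abbreviation; for a
    polysemous one, prefer a candidate whose words share a stem with the
    surrounding context, otherwise return all candidates for review."""
    candidates = ABBREVIATIONS.get(abbrev.lower())
    if candidates is None:
        return None                      # unknown (possibly ephemeral) abbreviation
    if len(candidates) == 1:
        return candidates[0]
    context = {w.lower() for w in context_words}
    for candidate in candidates:
        if any(word[:5] in c for c in context for word in candidate.split()):
            return candidate
    return candidates

print(expand("ihc"))                         # 'immunohistochemistry'
print(expand("ca", ["serum", "calcium"]))    # 'calcium'
```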