Abstract: Reference intervals are critical for the interpretation of laboratory
results. The development of reference intervals using traditional methods is
time consuming and costly. An alternative approach, known as an a
posteriori method, requires an expert to enumerate diagnoses and
procedures that can affect the measurement of interest. We develop a method,
LIMIT, to use laboratory test results from a clinical database to identify ICD9
codes that are associated with extreme laboratory results, thus automating the
a…
“…Some studies have used repeat testing as an assumption of illness and, therefore, an exclusion criterion (11,12), whereas Kouri et al (1) used discharge diagnoses related to the analyte of interest to exclude patients. Poole et al (13) recently used unsupervised computer learning to identify ICD9 codes associated with extreme laboratory values and then used these codes to exclude individuals from the reference population. Here, we demonstrate how data mining coupled with disease-specific exclusion criteria was used in the a posteriori sampling technique to select a large reference population of euthyroid patients from the laboratory database to establish age-based thyroid-stimulating hormone (TSH) reference intervals.…”
Background
Serum thyroid-stimulating hormone (TSH) reference intervals are dependent on population characteristics, including prevalent thyroid disease and iodine status. Studies in the US have demonstrated increasing TSH levels with age, and the American Thyroid Association recommends higher TSH goals for older patients taking thyroid supplementation, but few laboratories offer age-specific reference intervals for TSH. Our objective was to establish TSH reference ranges in our racially diverse population in northern California.
Methods
Data mining of electronic medical records was used with the a posteriori approach to select a euthyroid reference population for TSH reference intervals. A report gathered all TSH results from 2 weeks from >1 year in the past, excluding results from patients with thyroid-related disease or medication use at any time before or after the TSH test.
Results
The reference population numbered 33038 and consisted of approximately 44% of the total TSH results reported in the selected time periods. The population identified as 46.5% white, 18.3% Asian, 17.0% Hispanic/Latino, 8.0% black/African American, and 10.3% other or unknown. These data demonstrate an increase in the median and 97.5th percentile of TSH levels with increasing age in adults. No clinically significant difference was seen between female and male individuals or between the self-identified races, except for lower TSH levels in the black/African American population.
Conclusions
The a posteriori approach using data mining for disease-specific criteria proved to be an efficient method for obtaining a large healthy reference population. Age-specific TSH reference ranges could prevent inappropriate diagnoses of subclinical hypothyroidism in older patients.
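The a posteriori selection and age partitioning described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record fields (`patient_id`, `age`, `tsh`), the age bands, and the interpolation rule are all assumptions made for the example. Results from excluded patients are dropped, the rest are grouped by age band, and the 2.5th and 97.5th percentiles of each band are taken as the reference limits.

```python
from collections import defaultdict
from math import floor


def percentile(sorted_values, p):
    """Linear-interpolation percentile (p in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    k = (len(sorted_values) - 1) * p / 100.0
    lo = floor(k)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (k - lo)


def age_band(age):
    """Hypothetical adult age bands; the source does not list its partitions."""
    if age < 18:
        return "<18"
    if age < 40:
        return "18-39"
    if age < 60:
        return "40-59"
    if age < 80:
        return "60-79"
    return "80+"


def reference_intervals(results, excluded_patients):
    """Return {age band: (2.5th percentile, 97.5th percentile)} of TSH.

    results: iterable of dicts with keys 'patient_id', 'age', 'tsh' (assumed).
    excluded_patients: set of patient ids with thyroid-related disease or
    medication at any time, i.e. the a posteriori exclusion list.
    """
    by_band = defaultdict(list)
    for r in results:
        if r["patient_id"] in excluded_patients:
            continue  # drop every result from an excluded patient
        by_band[age_band(r["age"])].append(r["tsh"])
    return {
        band: (percentile(sorted(v), 2.5), percentile(sorted(v), 97.5))
        for band, v in by_band.items()
    }
```

With real data each band would need enough results for the tail percentiles to be stable; the sketch performs no such check.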
“…This may simply involve excluding values beyond an arbitrary limit, such as those more than 10 times the upper reference limit 13 or involve a statistical test, such as that of Tukey, or others. [14][15][16] Another data pre-processing step that may be used is the exclusion of data from particular referral sites where there is a high likelihood that the patients have significant disease, such as intensive care units and oncology departments. 13,17 It may also be appropriate to exclude data from additional referral sites depending on the analyte of interest, for example lipid and renal clinics.…”
Section: Data Pre-processing
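The Tukey-style outlier exclusion mentioned in the citation statement above can be sketched as follows. This is an illustrative sketch only: the multiplier `k=1.5` and the use of "inclusive" quartiles are conventional choices assumed here, not details taken from the cited review.

```python
from statistics import quantiles


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values lying within the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if lower <= v <= upper]
```

For example, `drop_outliers([1, 2, 3, 4, 5, 6, 7, 8, 100])` keeps the values 1 through 8 and discards 100, because the fences for that list are -3 and 13.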
“…This was done by the Laboratory Mining for Individualized Threshold (LIMIT) study, which used an unsupervised machine learning algorithm to identify diagnostic codes that were significantly associated with outlier results for the analyte of interest. 14 The 'learning' component of the algorithm involved setting values for 4 parameters (one of which, for instance, governed the sensitivity to outlier detection). These values were set using data for serum sodium because of its well-established reference interval.…”
Section: Subjects With Disease Excluded From the Extracted Data
Reference intervals are relied upon by clinicians when interpreting their patients’ test results. Therefore, laboratorians directly contribute to patient care when they report accurate reference intervals. The traditional approach to establishing reference intervals is to perform a study on healthy volunteers. However, the practical aspects of the staff time and cost required to perform these studies make this approach difficult for clinical laboratories to routinely use. Indirect methods for deriving reference intervals, which utilise patient results stored in the laboratory’s database, provide an alternative approach that is quick and inexpensive to perform. Additionally, because large amounts of patient data can be used, the approach can provide more detailed reference interval information when multiple partitions are required, such as with different age-groups.
However, if the indirect approach is to be used to derive accurate reference intervals, several considerations need to be addressed. The laboratorian must assess whether the assay and patient population were stable over the study period, whether data ‘clean-up’ steps should be used prior to data analysis and, often, how the distribution of values from healthy individuals should be modelled. The assumptions and potential pitfalls of the particular indirect technique chosen for data analysis also need to be considered. A comprehensive understanding of all aspects of the indirect approach to establishing reference intervals allows the laboratorian to harness the power of the data stored in their laboratory database and ensure the reference intervals they report are accurate.
“…In this case, the very notion of reference limits would change, and ML, by leveraging and improving other statistical approaches, could help limit the misinterpretation of values outside of reference limits or of apparently normal data but also diagnostic for some conditions (e.g. [25]). Furthermore, some envision an ML-based clinical decision support that, by predicting correlated test results and enhancing the diagnostic value of multianalyte sets of test results, could help to reduce redundant laboratory testing [26] and, hence, lower healthcare costs, which are estimated to total $5 billion yearly in the United States alone [27].…”
Section: Review Of Machine Learning In Medicine