An unsupervised learning method to identify reference intervals from a clinical database

Poole, Sarah; Schroeder, Lee F.; Shah, Nigam H.

doi:10.1016/j.jbi.2015.12.010

Cited by 45 publications

(25 citation statements)

References 39 publications


“…Some studies have used repeat testing as an assumption of illness and, therefore, an exclusion criterion (11,12), whereas Kouri et al (1) used discharge diagnoses related to the analyte of interest to exclude patients. Poole et al (13) recently used unsupervised computer learning to identify ICD9 codes associated with extreme laboratory values and then used these codes to exclude individuals from the reference population. Here, we demonstrate how data mining coupled with disease-specific exclusion criteria was used in the a posteriori sampling technique to select a large reference population of euthyroid patients from the laboratory database to establish age-based thyroid-stimulating hormone (TSH) reference intervals.…”

mentioning

confidence: 99%

Reference Intervals Generated by Electronic Medical Record Data Mining with Clinical Exclusions: Age-Specific Intervals for Thyroid-Stimulating Hormone from 33038 Euthyroid Patients

Drees

Huang

Petrie

et al. 2018

The Journal of Applied Laboratory Medicine


Background: Serum thyroid-stimulating hormone (TSH) reference intervals are dependent on population characteristics, including prevalent thyroid disease and iodine status. Studies in the US have demonstrated increasing TSH levels with age, and the American Thyroid Association recommends higher TSH goals for older patients taking thyroid supplementation, but few laboratories offer age-specific reference intervals for TSH. Our objective was to establish TSH reference ranges in our racially diverse population in northern California.

Methods: Data mining of electronic medical records was used with the a posteriori approach to select a euthyroid reference population for TSH reference intervals. A report gathered all TSH results from 2 weeks from >1 year in the past, excluding results from patients with thyroid-related disease or medication use at any time before or after the TSH test.

Results: The reference population numbered 33038 and consisted of approximately 44% of the total TSH results reported in the selected time periods. The population identified as 46.5% white, 18.3% Asian, 17.0% Hispanic/Latino, 8.0% black/African American, and 10.3% other or unknown. These data demonstrate an increase in the median and 97.5 percentile of TSH levels with increasing age in adults. No clinically significant difference was seen between female and male individuals or between the self-identified races, except for lower TSH levels in the black/African American population.

Conclusions: The a posteriori approach using data mining for disease-specific criteria proved to be an efficient method for obtaining a large healthy reference population. Age-specific TSH reference ranges could prevent inappropriate diagnoses of subclinical hypothyroidism in older patients.
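The abstract reports the median and 97.5 percentile of TSH by age group. As a rough illustration of how such age-partitioned limits can be computed from an already-filtered reference population, here is a minimal sketch. It assumes a nonparametric interval (2.5th and 97.5th percentiles with linear interpolation), which the abstract does not confirm as the authors' method; the function names, the age bins, and the input layout are hypothetical.

```python
from collections import defaultdict


def percentile(sorted_vals, p):
    """Percentile p (0-100) of a pre-sorted list, using linear interpolation."""
    if not sorted_vals:
        raise ValueError("no values")
    k = (len(sorted_vals) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def reference_intervals(results, age_bins):
    """Group (age_years, value) pairs into half-open age bins [low, high)
    and return {bin: (2.5th percentile, median, 97.5th percentile)}.

    Assumption: `results` already excludes patients with relevant disease
    or medication use; this function does no clinical filtering itself.
    """
    grouped = defaultdict(list)
    for age, value in results:
        for low, high in age_bins:
            if low <= age < high:
                grouped[(low, high)].append(value)
                break
    intervals = {}
    for age_bin, vals in grouped.items():
        vals.sort()
        intervals[age_bin] = (
            percentile(vals, 2.5),
            percentile(vals, 50),
            percentile(vals, 97.5),
        )
    return intervals
```

Called with a list of (age, result) pairs and bins such as `[(18, 40), (40, 60), (60, 120)]`, it returns one (lower limit, median, upper limit) triple per bin that contains data. Comparing the medians and upper limits across bins is one way to look for the kind of age trend the abstract describes.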



“…This may simply involve excluding values beyond an arbitrary limit, such as those more than 10 times the upper reference limit [13], or involve a statistical test, such as that of Tukey, or others [14][15][16]. Another data pre-processing step that may be used is the exclusion of data from particular referral sites where there is a high likelihood that the patients have significant disease, such as intensive care units and oncology departments [13,17]. It may also be appropriate to exclude data from additional referral sites depending on the analyte of interest, for example lipid and renal clinics.…”

Section: Data Pre-processing

mentioning

confidence: 99%
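The pre-processing steps named in the statement above (a fixed multiple of the upper reference limit, a Tukey-style test, and exclusion of high-risk referral sites) can be sketched as follows. This is a minimal illustration under stated assumptions, not the cited review's procedure: the 1.5 x IQR fence multiplier, the order of the steps, the function names, and the (site, value) record layout are all hypothetical choices.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (low, high) Tukey fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def preprocess(records, upper_reference_limit, excluded_sites,
               limit_multiple=10, k=1.5):
    """records: list of (referral_site, value) pairs.

    Step 1: drop results from excluded referral sites (e.g. ICU, oncology).
    Step 2: drop results above limit_multiple x the upper reference limit.
    Step 3: drop results outside the Tukey fences of what remains.
    """
    ceiling = limit_multiple * upper_reference_limit
    kept = [
        value
        for site, value in records
        if site not in excluded_sites and value <= ceiling
    ]
    low, high = tukey_fences(kept, k)
    return [value for value in kept if low <= value <= high]
```

Applying the gross ceiling before the fences keeps extreme values from inflating the interquartile range that the fences depend on. Tukey's fences implicitly assume a roughly symmetric distribution, so a skewed analyte may need transforming first.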

“…This was done by the Laboratory Mining for Individualized Threshold (LIMIT) study, which used an unsupervised machine learning algorithm to identify diagnostic codes that were significantly associated with outlier results for the analyte of interest [14]. The 'learning' component of the algorithm involved setting values for 4 parameters (one of which, for instance, governed the sensitivity to outlier detection). These values were set using data for serum sodium because of its well-established reference interval.…”

Section: Subjects With Disease Excluded From the Extracted Data

mentioning

confidence: 99%
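The general idea in the statement above, finding diagnostic codes associated with outlier results and then excluding patients who carry them, can be illustrated with a deliberately simplified sketch. This is not the LIMIT algorithm: it swaps in Tukey fences for outlier detection and a plain enrichment ratio in place of a significance test, and its parameters (`k`, `min_support`, `min_ratio`) are hypothetical stand-ins that do not correspond to the four parameters mentioned in the quote.

```python
import statistics
from collections import Counter


def outlier_flags(values, k=1.5):
    """Flag each value lying outside the Tukey fences (assumed outlier rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v < low or v > high for v in values]


def codes_enriched_in_outliers(patients, k=1.5, min_support=3, min_ratio=3.0):
    """patients: list of (value, set_of_diagnostic_codes).

    Return codes whose outlier rate is at least min_ratio times the overall
    outlier rate and that occur in at least min_support patients.
    """
    flags = outlier_flags([value for value, _ in patients], k)
    base_rate = sum(flags) / len(flags)
    if base_rate == 0:
        return set()
    total, in_outliers = Counter(), Counter()
    for (_, codes), is_outlier in zip(patients, flags):
        for code in codes:
            total[code] += 1
            in_outliers[code] += is_outlier  # bool counts as 0 or 1
    return {
        code
        for code, n in total.items()
        if n >= min_support and (in_outliers[code] / n) / base_rate >= min_ratio
    }


def reference_population(patients, **params):
    """Keep values only from patients carrying none of the enriched codes."""
    excluded = codes_enriched_in_outliers(patients, **params)
    return [value for value, codes in patients if not (codes & excluded)]
```

Tuning such thresholds against an analyte with a well-established reference interval, as the quote describes for serum sodium, gives a way to check that the retained population reproduces known limits before applying the same settings to other analytes.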

Indirect Reference Intervals: Harnessing the Power of Stored Laboratory Data

2019

CBR


Reference intervals are relied upon by clinicians when interpreting their patients’ test results. Therefore, laboratorians directly contribute to patient care when they report accurate reference intervals. The traditional approach to establishing reference intervals is to perform a study on healthy volunteers. However, the practical aspects of the staff time and cost required to perform these studies make this approach difficult for clinical laboratories to routinely use. Indirect methods for deriving reference intervals, which utilise patient results stored in the laboratory’s database, provide an alternative approach that is quick and inexpensive to perform. Additionally, because large amounts of patient data can be used, the approach can provide more detailed reference interval information when multiple partitions are required, such as with different age-groups. However, if the indirect approach is to be used to derive accurate reference intervals, several considerations need to be addressed. The laboratorian must assess whether the assay and patient population were stable over the study period, whether data ‘clean-up’ steps should be used prior to data analysis and, often, how the distribution of values from healthy individuals should be modelled. The assumptions and potential pitfalls of the particular indirect technique chosen for data analysis also need to be considered. A comprehensive understanding of all aspects of the indirect approach to establishing reference intervals allows the laboratorian to harness the power of the data stored in their laboratory database and ensure the reference intervals they report are accurate.


“…In this case, the very notion of reference limits would change, and ML, by leveraging and improving other statistical approaches, could help limit the misinterpretation of values outside of reference limits or of apparently normal data but also diagnostic for some conditions (e.g. [25]). Furthermore, some envision an ML-based clinical decision support that, by predicting correlated test results and enhancing the diagnostic value of multianalyte sets of test results, could help to reduce redundant laboratory testing [26] and, hence, lower healthcare costs, which are estimated to total $5 billion yearly in the United States alone [27].…”

Section: Review Of Machine Learning In Medicine

mentioning

confidence: 99%