Background
Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive.
Objective
Our objective was to automatically extract from EHRs the descriptions of behaviors noted by clinicians as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD and support analysis of changes over time as well as enable integration with other relevant data.
Methods
We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities themselves appear in the EHRs as very diverse expressions of the diagnostic criteria written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms.
Results
We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among behaviors and interests criteria (A3 criteria), 1 (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs.
Conclusions
Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
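To make the rule-based approach and the reported metrics concrete, the following is a minimal sketch of how a lexicon-plus-pattern extractor and its evaluation could look. It is not the authors' parser: the lexicon terms, the single pattern, the `A1` label assignment, and the confusion counts are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical lexicons: illustrative terms only, not the study's 92 lexicons.
LEXICONS = {
    "deficit": ["no", "poor", "limited", "lacks"],
    "social_behavior": ["eye contact", "peer interaction"],
}


def alternation(terms):
    """Build a regex alternation, longest term first so multiword terms win."""
    ordered = sorted(terms, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in ordered) + ")"


# Hypothetical pattern: a deficit term, up to two intervening words,
# then a social-behavior term. Labeled "A1" purely as an example.
PATTERNS = {
    "A1": re.compile(
        r"\b" + alternation(LEXICONS["deficit"]) + r"\b"
        + r"(?:\s+\w+){0,2}?\s+"
        + alternation(LEXICONS["social_behavior"]) + r"\b",
        re.IGNORECASE,
    ),
}


def extract(sentence):
    """Return (criterion_label, matched_text) pairs found in one sentence."""
    return [
        (label, match.group(0))
        for label, pattern in PATTERNS.items()
        for match in pattern.finditer(sentence)
    ]


def metrics(tp, fp, fn, tn):
    """Precision, recall (sensitivity), and specificity from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return precision, recall, specificity


if __name__ == "__main__":
    print(extract("He makes poor eye contact with adults."))
    # Hypothetical counts, not taken from the study.
    print(metrics(tp=3, fp=1, fn=4, tn=92))
```

A pattern of this kind only fires on wording it anticipates, which is one way to read the abstract's combination of high precision and lower recall: matches are rarely wrong, but unanticipated phrasings are missed.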
Background
Quantifying associations between genetic mutations and loss of ambulation (LoA) among males diagnosed with childhood‐onset dystrophinopathy is important for understanding variation in disease progression and may be useful in clinical trial design.
Methods
Genetic and clinical data from the Muscular Dystrophy Surveillance, Tracking, and Research Network for 358 males born and diagnosed from 1982 to 2011 were analyzed. LoA was defined as the age at which independent ambulation ceased. Genetic mutations were classified by overall type (deletion, duplication, or point mutation); deletions were further grouped into those amenable to exon‐skipping therapy (exons 8, 20, 44–46, 51–53) and all other deletions. Cox proportional hazards regression modeling was used to estimate hazard ratios (HRs) and 95% confidence intervals (CIs).
Results
Mutation type did not predict time to LoA. After controlling for corticosteroid use, deletions amenable to skipping of exon 8 (HR = 0.22; 95% CI = 0.08, 0.63) and exon 44 (HR = 0.30; 95% CI = 0.12, 0.78) were associated with delayed LoA compared to other exon deletions.
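As a reading aid for the hazard ratios above, the sketch below recovers the log-scale coefficient and an approximate standard error from a reported HR and its 95% CI. It assumes the intervals are Wald intervals, symmetric on the log scale with z = 1.96, which is common for Cox models but is not stated in the abstract; the function name `log_scale_summary` is hypothetical.

```python
import math


def log_scale_summary(hr, ci_low, ci_high, z=1.96):
    """Back out log(HR), an approximate SE, and the CI's geometric midpoint.

    Assumes a Wald interval: exp(beta - z*SE), exp(beta + z*SE).
    """
    beta = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # Under the Wald assumption the geometric mean of the bounds equals the HR.
    midpoint_hr = math.sqrt(ci_low * ci_high)
    return beta, se, midpoint_hr


if __name__ == "__main__":
    # Values as reported in the Results.
    for label, hr, lo, hi in [("exon 8", 0.22, 0.08, 0.63),
                              ("exon 44", 0.30, 0.12, 0.78)]:
        beta, se, mid = log_scale_summary(hr, lo, hi)
        print(f"{label}: beta={beta:.3f}, SE~{se:.3f}, CI midpoint HR~{mid:.3f}")
```

Under this assumption, the geometric midpoint of each reported interval lies within rounding distance of the reported HR, so the published figures are internally consistent. An HR below 1 here corresponds to a lower hazard of LoA at any given age, that is, delayed loss of ambulation.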
Conclusions
Delayed LoA in males with mutations amenable to exon‐skipping therapy is consistent with previous studies. These findings suggest that clinical trials including exon 8 and 44 skippable males should consider mutation information prior to randomization.
The diagnosis of Duchenne and Becker muscular dystrophy (DBMD) is made by genetic testing in approximately 95% of cases. Although specific mutations can be associated with skeletal muscle phenotype, pulmonary and cardiac comorbidities (leading causes of death in Duchenne) have not been associated with Duchenne muscular dystrophy mutation type or location and vary within families. Therefore, identifying predictors for phenotype severity beyond frameshift prediction is important clinically. We performed a systematic review assessing research related to genotype–phenotype correlations in DBMD. While there are severity differences across the spectrum and within mild and severe forms of DBMD, few protective or exacerbating mutations within the dystrophin gene were reported. Except for intellectual disability, clinical test results reporting genotypic information are insufficient for clinical prediction of severity and comorbidities and the predictive validity is too low to be useful when advising families. Including expanded information coupled with proposed severity predictions in clinical genetic reports for DBMD is critical for improving anticipatory guidance.
Key Clinical Message
We present a patient with a clinical diagnosis of Joubert syndrome with COACH phenotype who carries two TMEM67 variants of uncertain significance (VUS). One VUS can be reclassified as “likely pathogenic” by adding clinical data. As genetic testing becomes more accessible, more VUS will require clinical correlation for accurate classification.
Documentation in medical records during clinical evaluations for FAS is lower than optimal for cross-provider communication and surveillance purposes. Lack of documentation limits the quality and quantity of information in records that serve as a major source of data for public health surveillance systems.
This study demonstrates a significant difference in the distribution of lesion level of spina bifida patients born in the postfortification era, based on neurologic function. Further research with a larger sample size is needed to determine if this observation holds true nationally.