Human syntactic processing shows many signs of taking place within a general-purpose short-term memory. But this kind of memory is known to have a severely constrained storage capacity—possibly constrained to as few as three or four distinct elements. This article describes a model of syntactic processing that operates successfully within these severe constraints, by recognizing constituents in a right-corner transformed representation (a variant of left-corner parsing) and mapping this representation to random variables in a Hierarchic Hidden Markov Model, a factored time-series model which probabilistically models the contents of a bounded memory store over time. Evaluations of the coverage of this model on a large syntactically annotated corpus of English sentences, and of the accuracy of a bounded-memory parsing strategy based on this model, suggest that this model may be cognitively plausible.
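A minimal sketch of the memory-bound idea, assuming a hypothetical four-element store, a toy category set, and a generic HMM forward step; this is an illustration of the bounded store only, not the article's actual right-corner/HHMM model:

```python
import itertools
from collections import defaultdict

D = 4                     # the "three or four distinct elements" bound
CATS = ("-", "NP", "VP")  # "-" marks an empty store position (toy set)

# Hidden states are all possible store configurations: |CATS|**D of them.
STATES = list(itertools.product(CATS, repeat=D))

def forward_step(prior, transition, emission, word):
    """One HMM forward step: update the distribution over bounded-store
    contents given the next observed word."""
    posterior = defaultdict(float)
    for s_prev, p_prev in prior.items():
        for s_next in STATES:
            p = p_prev * transition(s_prev, s_next) * emission(s_next, word)
            if p > 0.0:
                posterior[s_next] += p
    return dict(posterior)
```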
Background: Social risk factors are important dimensions of health and are linked to access to care, quality of life, health outcomes, and life expectancy. However, in the Electronic Health Record, data related to many social risk factors are primarily recorded in free-text clinical notes, rather than as more readily computable structured data, and hence cannot currently be easily incorporated into automated assessments of health. In this paper, we present Moonstone, a new, highly configurable rule-based clinical natural language processing system designed to automatically extract information that requires inferencing from clinical notes. Our initial use case for the tool is focused on the automatic extraction of social risk factor information — in this case, housing situation, living alone, and social support — from clinical notes. Nursing notes, social work notes, emergency room physician notes, primary care notes, hospital admission notes, and discharge summaries, all derived from the Veterans Health Administration, were used for algorithm development and evaluation. Results: An evaluation of Moonstone demonstrated that the system is highly accurate in extracting and classifying the three variables of interest (housing situation, living alone, and social support). The system achieved positive predictive value (i.e., precision) scores ranging from 0.66 (homeless/marginally housed) to 0.98 (lives at home/not homeless), accuracy scores ranging from 0.63 (lives in facility) to 0.95 (lives alone), and sensitivity (i.e., recall) scores ranging from 0.75 (lives in facility) to 0.97 (lives alone). Conclusions: The Moonstone system is — to the best of our knowledge — the first freely available, open source natural language processing system designed to extract social risk factors from clinical text with good (lives in facility) to excellent (lives alone) performance. Although developed with the social risk factor identification task in mind, Moonstone provides a powerful tool to address a range of clinical natural language processing tasks, especially those that require nuanced linguistic processing in conjunction with inference capabilities.
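As a rough illustration of rule-based extraction of a variable like housing situation, the following sketch uses hypothetical regular-expression rules; Moonstone's actual rule files, vocabularies, and inference machinery are far richer:

```python
import re

# Hypothetical patterns, not Moonstone's actual rule files or vocabularies.
RULES = [
    (re.compile(r"\blives alone\b", re.I), "lives alone"),
    (re.compile(r"\b(homeless|no fixed address)\b", re.I),
     "homeless/marginally housed"),
    (re.compile(r"\b(nursing home|assisted living)\b", re.I),
     "lives in facility"),
    (re.compile(r"\blives (at home|with (his|her|their) \w+)\b", re.I),
     "lives at home/not homeless"),
]

def classify_housing(note_text):
    """Return the first matching housing-situation label, or None."""
    for pattern, label in RULES:
        if pattern.search(note_text):
            return label
    return None

print(classify_housing("Pt is a 67yo veteran who lives alone."))  # lives alone
```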
Opinion mining aims at extracting useful subjective information from large amounts of text. Opinion holder recognition is a task that has not yet been addressed for the Arabic language. This task essentially requires a deep understanding of clause structure. Unfortunately, the lack of a robust, publicly available Arabic parser further complicates the research. This paper presents pioneering research on opinion holder extraction in Arabic news that is independent of any lexical parser. We investigate constructing a comprehensive feature set to compensate for the lack of parsing structural outcomes. The proposed feature set is adapted from previous work on English, coupled with our proposed semantic field and named entity features. Our feature analysis is based on Conditional Random Fields (CRF) and semi-supervised pattern recognition techniques. Different research models are evaluated via cross-validation experiments, achieving an F-measure of 54.03. We publicly release our research outcome corpus and lexicon to the opinion mining community to encourage further research.
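A minimal sketch of a CRF setup of this kind, using the sklearn-crfsuite package; the per-token feature names (part of speech, named entity tag, semantic field) are illustrative stand-ins for the paper's full feature set:

```python
import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(tokens, i):
    """Per-token feature dict; 'ne' (named entity tag) and 'sem'
    (semantic field) are illustrative stand-ins."""
    tok = tokens[i]
    return {
        "word": tok["form"],
        "pos": tok["pos"],
        "ne": tok.get("ne", "O"),
        "sem_field": tok.get("sem", "_"),
        "prev_word": tokens[i - 1]["form"] if i > 0 else "<S>",
        "next_word": tokens[i + 1]["form"] if i < len(tokens) - 1 else "</S>",
    }

# X: one list of per-token feature dicts per sentence;
# y: BIO labels marking opinion-holder spans.
crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
# crf.fit(X_train, y_train); y_pred = crf.predict(X_test)
```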
The aim of this study was to estimate how much type 2 diabetic patients in a low-income country contribute from their own income toward controlling the disease and preventing chronic diabetes complications through good glycaemic control. Socio-economic and demographic data were obtained from 822 adult diabetic patients attending public or private diabetic clinics in Khartoum State, Sudan. The average annual income of diabetic patients was estimated at USD 1,923. The direct cost of diabetes control was USD 175 per year. This included the cost of drugs and ambulatory care, although the drug supply was insufficient for 52% of the patients. Glycosylated hemoglobin, as a measure of glycaemic control, was determined to be unsatisfactory in 77% of patients. This gives an immediate indication that current practices in diabetes control in urban Sudan are not cost-effective. Patients attending private clinics had a higher income and a higher cost of diabetes control than those attending public clinics. However, both groups had a similar proportion of poor glycaemic control, which reflects the insufficient care given to diabetic patients, mainly due to deficient resources and inefficient utilization of the scarce resources that are available.
Objective: To design and assess a method to leverage individuals' temporal data for predicting their healthcare cost. To achieve this goal, we first used patients' temporal data in its fine-grain form, as opposed to coarse-grain form. Second, we devised novel spike detection features to extract temporal patterns that improve the performance of cost prediction. Third, we evaluated the effectiveness of different types of temporal features, based on cost, visit, and medical information, for the prediction task. Materials and methods: We used three years of medical and pharmacy claims data from 2013 to 2016 from a healthcare insurer, where the first two years were used to build the model to predict the costs in the third year. To prepare the data for modeling and prediction, the time series data of cost, visit, and medical information were extracted in the form of fine-grain features (i.e., segmenting each time series into a sequence of consecutive windows and representing each window by various statistics such as the sum). Then, temporal patterns of the time series were extracted and added to the fine-grain features using a novel set of spike detection features (i.e., capturing the fluctuation of data points). Gradient Boosting was applied on the final set of extracted features. Moreover, the contribution of each type of data (i.e., cost, visit, and medical) was assessed. We benchmarked the proposed predictors against existing methods, including those that use coarse-grain features, which represent each time series with overall statistics such as the sum, together with the most recent portion of the values in the series. All prediction performances were measured in terms of Mean Absolute Percentage Error (MAPE). Results: Gradient Boosting applied on fine-grain predictors outperformed coarse-grain predictors, with a MAPE of 3.02 versus 8.14 (p<0.01). Enhancing the fine-grain features with the temporal pattern extraction features (i.e., spike detection features) further improved the MAPE to 2.04 (p<0.01). Removing cost, visit, and medical status data resulted in MAPEs of 10.24, 2.22, and 2.07, respectively (p<0.01 for the first two comparisons and p=0.63 for the third). Conclusions: Leveraging fine-grain temporal patterns for healthcare cost prediction significantly improves prediction performance. Enhancing fine-grain features with the extraction of temporal cost and visit patterns significantly improved performance; however, medical features did not have a significant effect on prediction performance. Gradient Boosting outperformed all other prediction models.
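A minimal sketch of the two feature families described above, assuming monthly cost values and an illustrative spike threshold (not the paper's exact definitions):

```python
import numpy as np

def fine_grain_features(series, window=3):
    """Segment a time series into consecutive windows and summarize
    each window with simple statistics."""
    feats = {}
    for w, start in enumerate(range(0, len(series), window)):
        chunk = series[start:start + window]
        feats[f"w{w}_sum"] = float(np.sum(chunk))
        feats[f"w{w}_mean"] = float(np.mean(chunk))
        feats[f"w{w}_max"] = float(np.max(chunk))
    return feats

def spike_features(series, k=2.0):
    """Flag jumps more than k times the mean absolute point-to-point
    change -- one plausible notion of a 'spike'."""
    diffs = np.diff(np.asarray(series, dtype=float))
    thresh = k * (np.mean(np.abs(diffs)) + 1e-9)
    return {"n_spikes": int(np.sum(diffs > thresh)),
            "max_jump": float(diffs.max()) if diffs.size else 0.0}

monthly_cost = [120, 110, 130, 900, 125, 140]  # toy data
print({**fine_grain_features(monthly_cost), **spike_features(monthly_cost)})
```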
Background: The aim of this study was to propose an analytical approach to develop high-performing predictive models for congestive heart failure (CHF) readmission using an operational dataset with incomplete records and changing data over time. Methods: Our analytical approach involves three steps: pre-processing, systematic model development, and risk factor analysis. For pre-processing, variables that were absent in >50% of records were removed. Moreover, the dataset was divided into a validation dataset and derivation datasets, which were separated into three temporal subsets based on changes to the data over time. For systematic model development, using the different temporal datasets and the remaining explanatory variables, the models were developed by combining the use of various (i) statistical analyses to explore the relationships between the validation and the derivation datasets; (ii) adjustment methods for handling missing values; (iii) classifiers; (iv) feature selection methods; and (v) discretization methods. We then selected the best derivation dataset and the models with the highest predictive performance. For risk factor analysis, factors in the highest-performing predictive models were analyzed and ranked using (i) statistical analyses of the best derivation dataset, (ii) feature rankers, and (iii) a newly developed algorithm to categorize risk factors as strong, regular, or weak. Results: The analysis dataset consisted of 2,787 CHF hospitalizations at University of Utah Health Care from January 2003 to June 2013. In this study, we used the complete-case analysis and mean-based imputation adjustment methods; the wrapper subset feature selection method; and four ranking strategies based on information gain, gain ratio, symmetrical uncertainty, and wrapper subset feature evaluators. The best-performing models resulted from the use of a complete-case analysis derivation dataset combined with the Class-Attribute Contingency Coefficient discretization method and a voting classifier that averaged the results of multinomial logistic regression and voting feature intervals classifiers. Of 42 final model risk factors, discharge disposition, discretized age, and indicators of anemia were the most significant. This model achieved a c-statistic of 86.8%. Conclusion: The proposed three-step analytical approach enhanced predictive model performance for CHF readmissions. It could potentially be leveraged to improve predictive model performance in other areas of clinical medicine.
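A minimal sketch of the pre-processing rule and the voting idea, assuming a pandas DataFrame of hospitalizations; scikit-learn has no voting feature intervals classifier (a Weka method), so a Naïve Bayes model stands in purely for illustration:

```python
import pandas as pd
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

def drop_sparse_columns(df: pd.DataFrame, max_missing: float = 0.5):
    """Pre-processing step: drop variables absent in more than half
    of the records."""
    return df.loc[:, df.isna().mean() <= max_missing]

# Soft voting averages the two models' predicted probabilities, mirroring
# the paper's averaging of multinomial logistic regression and voting
# feature intervals (GaussianNB is only a stand-in for the latter).
model = VotingClassifier(
    estimators=[("mlr", LogisticRegression(max_iter=1000)),
                ("nb", GaussianNB())],
    voting="soft",
)
# model.fit(X_train, y_train); model.predict_proba(X_test)
```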
The 2015 Clinical TempEval Challenge addressed the problem of temporal reasoning in the clinical domain by providing an annotated corpus of pathology and clinical notes related to colon cancer patients. The challenge consisted of six subtasks: TIMEX3 span detection, event span detection, TIMEX3 attribute classification, event attribute classification, document time relation classification, and narrative container relation classification. Our BluLab team participated in all six subtasks. For the TIMEX3 and event subtasks, we developed a ClearTK support vector machine pipeline using mainly simple lexical features, along with information from rule-based systems. For the relation subtasks, we employed a conditional random fields classification approach, with input from a rule-based system for the narrative container relation subtask. Our team ranked first in all TIMEX3 and event subtasks, as well as in the document time relation subtask.
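The team's actual pipeline was built in ClearTK (Java); an analogous token-level sketch of span detection with simple lexical features plus a rule-based flag, using scikit-learn, might look like this:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def lexical_features(tokens, i, rule_hits):
    """Simple lexical features for one token, plus the output of a
    rule-based system as an extra binary feature."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "shape": "".join("X" if c.isupper() else "x" if c.isalpha()
                         else "d" if c.isdigit() else c for c in tok),
        "suffix3": tok[-3:],
        "rule_says_event": tok.lower() in rule_hits,
    }

clf = Pipeline([("vec", DictVectorizer()), ("svm", LinearSVC())])
# clf.fit(per_token_feature_dicts, bio_labels)
```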
Objectives: Finding recent clinical studies that warrant changes in clinical practice (“high impact” clinical studies) in a timely manner is very challenging. We investigated a machine learning approach to find recent studies with high clinical impact to support clinical decision making and literature surveillance. Methods: To identify recent studies, we developed our classification model using time-agnostic features that are available as soon as an article is indexed in PubMed®, such as journal impact factor, author count, and study sample size. Using a gold standard of 541 high impact treatment studies referenced in 11 disease management guidelines, we tested the following null hypotheses: 1) the high impact classifier with time-agnostic features (HI-TA) performs equivalently to PubMed’s Best Match sort and a MeSH-based Naïve Bayes classifier; and 2) HI-TA performs equivalently to a high impact classifier with both time-agnostic and time-sensitive features (HI-TS) developed in a previous study. The primary outcome for both hypotheses was mean top 20 precision. Results: The differences in mean top 20 precision between HI-TA and the three baselines (PubMed’s Best Match, a MeSH-based Naïve Bayes classifier, and HI-TS) were not statistically significant (12% vs. 3%, p=0.101; 12% vs. 11%, p=0.720; 12% vs. 25%, p=0.094, respectively). Recall of HI-TA was low (7%). Conclusion: HI-TA had performance equivalent to state-of-the-art approaches that depend on time-sensitive features. With the advantage of relying only on time-agnostic features, the proposed approach can be used as an adjunct to help clinicians identify recent high impact clinical studies to support clinical decision making. However, low recall limits the use of HI-TA for literature surveillance.
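The primary outcome metric is straightforward to state precisely; a small sketch, with illustrative variable names:

```python
def top_k_precision(ranked_ids, relevant_ids, k=20):
    """Fraction of the top k retrieved articles that are high impact."""
    top = ranked_ids[:k]
    return sum(a in relevant_ids for a in top) / k

def mean_top_k_precision(rankings, gold, k=20):
    """rankings: {query: ranked article ids}; gold: {query: set of
    high-impact article ids}. Averages top-k precision over queries."""
    scores = [top_k_precision(r, gold[q], k) for q, r in rankings.items()]
    return sum(scores) / len(scores)
```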