Label ranking aims to learn a mapping from instances to rankings over a finite number of predefined labels. Random forest is one of the most powerful and successful general-purpose machine learning algorithms of modern times. In this paper, we present a random forest label ranking method that uses random decision trees to retrieve nearest neighbors. We have developed a novel two-step rank aggregation strategy to effectively aggregate the neighboring rankings discovered by the random forest into a final predicted ranking. Compared with existing methods, the new random forest method has several advantages, including an intrinsically scalable tree data structure, a highly parallelizable computational architecture, and superior performance. We present extensive experimental results to demonstrate that our new method achieves highly competitive performance compared with state-of-the-art methods, both on datasets with complete rankings and on datasets with only partial ranking information.
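To make the idea of aggregating neighboring rankings concrete, here is a minimal sketch using a plain Borda count. This is not the two-step strategy of the paper, whose details the abstract does not give; the function name `borda_aggregate` and the rule that labels missing from a partial ranking score zero points are assumptions made for illustration only.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate rankings (lists of labels, best first) with a Borda count.

    Illustrative stand-in only: the paper's own two-step aggregation is
    not specified in the abstract.
    """
    # Collect every label that appears in at least one neighbor ranking.
    labels = sorted({label for ranking in rankings for label in ranking})
    m = len(labels)
    scores = defaultdict(float)
    for ranking in rankings:
        for position, label in enumerate(ranking):
            # First place earns m points, second m - 1, and so on.
            # Assumption: labels absent from a partial ranking earn nothing.
            scores[label] += m - position
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(labels, key=lambda label: (-scores[label], label))


neighbor_rankings = [["a", "b", "c"], ["a", "c", "b"], ["b", "a", "c"]]
consensus = borda_aggregate(neighbor_rankings)
print(consensus)
```

In a forest-based setting, `neighbor_rankings` would hold the rankings of the training instances that share leaf nodes with the query instance.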
Clinical Named Entity Recognition (CNER) aims to identify and classify clinical terms such as diseases, symptoms, treatments, exams, and body parts in electronic health records, which is a fundamental and crucial task for clinical and translational research. In recent years, deep neural networks have achieved significant success in named entity recognition and many other Natural Language Processing (NLP) tasks. Most of these algorithms are trained end-to-end and can automatically learn features from large-scale labeled datasets. However, these data-driven methods typically lack the capability to process rare or unseen entities. Previous statistical methods and feature engineering practice have demonstrated that human knowledge can provide valuable information for handling rare and unseen cases. In this paper, we address the problem by incorporating dictionaries into deep neural networks for the Chinese CNER task. Two different architectures that extend the Bi-directional Long Short-Term Memory (Bi-LSTM) neural network and five different feature representation schemes are proposed to handle the task. Computational results on the CCKS-2017 Task 2 benchmark dataset show that the proposed method achieves highly competitive performance compared with state-of-the-art deep learning methods.
The evaluation of classifier performance plays a critical role in the construction and selection of classification models. Although many performance metrics have been proposed in the machine learning community, no general guidelines are available to practitioners regarding which metric to select for evaluating a classifier's performance. In this paper, we attempt to provide practitioners with a strategy for selecting performance metrics for classifier evaluation. First, we investigate seven widely used performance metrics, namely classification accuracy, F-measure, kappa statistic, root mean square error, mean absolute error, the area under the receiver operating characteristic curve, and the area under the precision-recall curve. Second, we use Pearson linear correlation and Spearman rank correlation to analyze the potential relationships among these seven metrics. Experimental results show that these commonly used metrics can be divided into three groups, and that all metrics within a given group are highly correlated with one another but less correlated with metrics from different groups.
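The two correlation measures named above can be sketched in a few lines of standard-library Python. The metric values below are made-up numbers for four hypothetical classifiers, not results from the paper, and the helper names (`pearson`, `average_ranks`, `spearman`) are chosen here for illustration.

```python
import math


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores of four classifiers under two metrics.
accuracy = [0.70, 0.80, 0.85, 0.90]
rmse = [0.50, 0.42, 0.40, 0.30]
print(pearson(accuracy, rmse), spearman(accuracy, rmse))
```

Because higher accuracy goes with lower error in this toy data, both coefficients are negative; grouping metrics by correlation strength would normally use the absolute value.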
Entity and relation extraction is a necessary step in structuring medical text. However, the feature extraction ability of the bidirectional long short-term memory network used in existing models is suboptimal. At the same time, language models have achieved excellent results in a growing number of natural language processing tasks. In this paper, we present a focused attention model for the joint entity and relation extraction task. Our model integrates the well-known BERT language model into joint learning through a dynamic range attention mechanism, thus improving the feature representation ability of the shared parameter layer. Experimental results on coronary angiography texts collected from Shuguang Hospital show that the F1-scores of the named entity recognition and relation classification tasks reach 96.89% and 88.51%, which outperform state-of-the-art methods by 1.65% and 1.22%, respectively.
Critical node problems (CNPs) involve finding a set of critical nodes from a graph whose removal results in optimizing a predefined measure over the residual graph. As useful models for a variety of practical applications, these problems are computationally challenging. In this paper, we study the classic CNP and introduce an effective memetic algorithm for solving CNP. The proposed algorithm combines a double backbone-based crossover operator (to generate promising offspring solutions), a component-based neighborhood search procedure (to find high-quality local optima), and a rank-based pool updating strategy (to guarantee a healthy population). Extensive evaluations on 42 synthetic and real-world benchmark instances show that the proposed algorithm discovers 24 new upper bounds and matches 15 previous best-known bounds. We also demonstrate the relevance of our algorithm for effectively solving a variant of the classic CNP, called the cardinality-constrained CNP. Finally, we investigate the usefulness of each key algorithmic component.
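For the classic CNP, the measure usually minimized is the pairwise connectivity of the residual graph: the number of node pairs still joined by a path once the chosen nodes are removed. A minimal sketch of that objective follows; the adjacency-dictionary format and the function name `pairwise_connectivity` are assumptions for illustration, and the memetic algorithm itself is not reproduced here.

```python
from collections import deque


def pairwise_connectivity(adj, removed):
    """Count connected node pairs after deleting the nodes in `removed`.

    `adj` maps every node to an iterable of its neighbors (undirected).
    """
    seen = set(removed)  # removed nodes are never visited
    total = 0
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search to measure the size of this component.
        seen.add(start)
        size = 0
        queue = deque([start])
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        # A component with `size` nodes contributes size choose 2 pairs.
        total += size * (size - 1) // 2
    return total


# Path graph 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(pairwise_connectivity(path, set()))  # whole path stays connected
print(pairwise_connectivity(path, {2}))    # middle node removed
```

Removing the middle node splits the path into two components of two nodes each, which is why it is a better choice than an end node in this toy instance.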
The relationship between abnormal peripheral nerve electrophysiology and abnormal cardiovascular autonomic function has been studied in four groups of diabetic subjects, comparable with regard to age, duration, and type of diabetes. Thirty-three had no symptoms of neuropathy, 28 had newly developed painful neuropathy, 24 had chronic painful neuropathy, and 21 had painless neuropathy with associated recurrent foot ulcers. In all three symptomatic groups, electrophysiology and autonomic function were more abnormal than in asymptomatic diabetic subjects. There was a significant overall relationship between peripheral nerve (electrophysiologic) and autonomic (cardiovascular reflex) dysfunction. However, when considered by groups, the degree of cardiovascular reflex abnormality was similar in the three symptomatic groups, whereas electrophysiology was appreciably worse in the foot ulcer group than in patients with painful neuropathy. Thus, patients with painful neuropathy had a higher ratio of autonomic (small fiber) abnormality to electrophysiologic (large fiber) abnormality. By contrast, foot ulceration was associated with the worst electrophysiologic (large fiber) abnormality. Heavier alcohol consumption and more severe retinopathy were also related to foot ulceration. In diabetic subjects with symmetrical sensory neuropathy, the relationship between large fiber and small fiber damage is not uniform. We conclude that there may be different etiologic influences on large and small fiber neuropathy in diabetic subjects and that the predominant type of fiber damage may determine the form of the presenting clinical syndrome.
As a useful model for a variety of practical applications, the maximum diversity problem (MDP) is computationally challenging. In this paper, we present an opposition-based memetic algorithm (OBMA) for solving MDP, which integrates the concept of opposition-based learning (OBL) into the well-known memetic search framework. OBMA explores both candidate solutions and their opposite solutions during its initialization and evolution processes. Combined with a powerful local optimization procedure and a rank-based quality-and-distance pool updating strategy, OBMA establishes a suitable balance between exploration and exploitation in its search process. Computational results on 80 popular MDP benchmark instances show that the proposed algorithm matches the best-known solutions for most instances and finds improved best solutions (new lower bounds) for 22 instances. We provide experimental evidence to highlight the beneficial effect of opposition-based learning for solving MDP.
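The sketch below shows the MDP objective (the sum of pairwise distances among the selected elements) together with one plausible way to build an "opposite" solution: drawing the same number of elements from those not currently selected. The abstract does not define OBMA's exact construction, so `opposite_solution` and its fallback rule are assumptions for illustration, not the authors' operator.

```python
import random
from itertools import combinations


def diversity(solution, dist):
    """MDP objective: sum of distances over all pairs of selected elements."""
    return sum(dist[i][j] for i, j in combinations(sorted(solution), 2))


def opposite_solution(solution, n, rng):
    """Build a same-size solution from the unselected elements.

    Assumed construction: sample from the complement; if the complement
    is too small, take all of it and fill up from the current solution.
    """
    m = len(solution)
    complement = [i for i in range(n) if i not in solution]
    if len(complement) >= m:
        return set(rng.sample(complement, m))
    fill = rng.sample(sorted(solution), m - len(complement))
    return set(complement) | set(fill)


# Toy instance: six points on a line, distance = index difference.
n = 6
dist = [[abs(i - j) for j in range(n)] for i in range(n)]
current = {0, 1}
opposite = opposite_solution(current, n, random.Random(0))
print(diversity(current, dist), diversity(opposite, dist))
```

Evaluating both a candidate and its opposite, then keeping the better one, is the general pattern behind opposition-based learning.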
Background: Electronic health records (EHRs) provide possibilities to improve patient care and facilitate clinical research. However, applications of EHRs face many challenges, such as temporality, high dimensionality, sparseness, noise, random error, and systematic bias. In particular, traditional machine learning methods have difficulty using temporal information effectively, even though the sequential information in EHRs is very useful. Method: In this paper, we propose a general-purpose patient representation learning approach to summarize sequential EHRs. Specifically, a recurrent neural network based denoising autoencoder (RNN-DAE) is employed to encode the in-hospital records of each patient into a low-dimensional dense vector. Results: Based on EHR data collected from Shuguang Hospital affiliated to Shanghai University of Traditional Chinese Medicine, we experimentally evaluate our proposed RNN-DAE method on both a mortality prediction task and a comorbidity prediction task. Extensive experimental results show that our proposed RNN-DAE method outperforms existing methods. In addition, we apply the “Deep Feature” representation produced by our proposed RNN-DAE method to track similar patients with t-SNE, which also yields some interesting observations. Conclusion: We propose an effective unsupervised RNN-DAE method to summarize patient sequential information in EHR data. Our proposed RNN-DAE method is useful for both the mortality prediction task and the comorbidity prediction task.