Biomedical named entity recognition (BNER) is the basis of biomedical text mining and one of the core subtasks of information extraction. Earlier BNER models based on conventional machine learning rely on time-consuming feature engineering. Although most neural network methods alleviate this problem by learning features automatically, they cannot focus on the most significant regions of the input when capturing features. In this paper, we propose an attention-based BiLSTM-CRF model. First, the model adopts a bidirectional long short-term memory network (BiLSTM) to capture more complete context information. In addition, an attention mechanism is introduced to improve the vector representations produced by the BiLSTM. We design different attention weight redistribution methods and fuse them, which effectively prevents the loss of significant information during feature extraction. Finally, a conditional random field (CRF) layer on top of the BiLSTM models the strong dependencies between tags in the sequence, which the BiLSTM alone cannot handle. With this simple architecture, our model achieves reasonable performance on the JNLPBA corpus, obtaining an F1-score of 73.50. The model enhances the ability of the neural network to extract significant information and relies on no feature engineering, using only general pretrained word vectors, which gives it high portability and extensibility.
INDEX TERMS Biomedical text, named entity recognition, attention mechanism, long short-term memory, conditional random field.
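The CRF layer's role in enforcing tag dependencies can be illustrated with Viterbi decoding over a linear-chain CRF. The following is a minimal sketch, not the authors' implementation: the emission scores stand in for BiLSTM (plus attention) outputs, the transition scores are hand-picked, and the tag names and the function `viterbi_decode` are hypothetical. Training, the attention layer, and start/end transitions are omitted.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    tags: Sequence[str],
) -> Tuple[List[str], float]:
    """Return the highest-scoring tag sequence and its total score.

    emissions[t][j]: score of tag j at token t (e.g. from a BiLSTM layer)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(tags)
    # best[j] = score of the best partial path that ends in tag j
    best = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # previous tag that maximises (path score + transition into j)
            i_best = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[i_best] + transitions[i_best][j] + emissions[t][j])
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Trace the best path backwards from the highest-scoring final tag.
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return [tags[j] for j in path], best[last]


# Hypothetical scores for three tokens; O -> I-protein is heavily penalised.
tags = ["O", "B-protein", "I-protein"]
emissions = [
    [0.6, 0.5, 0.0],
    [0.1, 0.0, 0.9],
    [1.0, 0.0, 0.2],
]
transitions = [
    [0.0, 0.0, -10.0],  # from O
    [0.0, 0.0, 0.0],    # from B-protein
    [0.0, 0.0, 0.0],    # from I-protein
]
path, score = viterbi_decode(emissions, transitions, tags)
print(path, score)
```

Taking the per-token argmax of the emissions alone would yield O, I-protein, O, which is not a valid BIO sequence; the transition penalty on O to I-protein steers decoding to B-protein, I-protein, O.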
In recent years, separating effective target signals from mixed signals has become a popular and challenging topic in signal research. Blind source separation (BSS) based on swarm intelligence (SI) algorithms, referred to as SI-BSS, has become an effective method for linear-mixture BSS. However, SI-BSS suffers from incomplete separation: not all signal sources can be separated. This paper proposes an improved SI-based BSS algorithm based on signal cross-correlation (SI-XBSS). Our method creates a candidate separation pool that contains more separated signals than traditional SI-BSS produces, and it identifies the final separated signals as those with the minimum cross-correlation in the pool. SI-XBSS was applied with six SI algorithms (Particle Swarm Optimization (PSO), Genetic Algorithm (GA), Differential Evolution (DE), Sine Cosine Algorithm (SCA), Butterfly Optimization Algorithm (BOA), and Crow Search Algorithm (CSA)) and compared with traditional SI-BSS. The results showed that SI-XBSS achieved a higher separation success rate, on average more than 35% higher than that of traditional SI-BSS. Moreover, SI-SDR increased by 14.72 on average.
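The final selection step of SI-XBSS picks, from a pool of candidate separated signals, those with the minimum cross-correlation. The abstract does not give the exact criterion, so the sketch below assumes one: choose the subset of `n_sources` candidates whose largest pairwise absolute zero-lag normalized cross-correlation is smallest. The swarm-intelligence search that fills the pool is not shown, and `zero_lag_corr`, `select_least_correlated`, and the test signals are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence, Tuple


def zero_lag_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Normalised cross-correlation (Pearson coefficient) of two signals at lag 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither signal is constant


def select_least_correlated(
    pool: Sequence[Sequence[float]], n_sources: int
) -> Tuple[int, ...]:
    """Return indices of the n_sources candidates whose largest pairwise
    |correlation| is smallest (exhaustive search; fine for small pools)."""
    if n_sources < 2:
        raise ValueError("need at least two sources to compare")
    best_subset: Tuple[int, ...] = ()
    best_cost = math.inf
    for subset in combinations(range(len(pool)), n_sources):
        cost = max(
            abs(zero_lag_corr(pool[i], pool[j])) for i, j in combinations(subset, 2)
        )
        if cost < best_cost:
            best_subset, best_cost = subset, cost
    return best_subset


# Hypothetical pool: two orthogonal tones plus a near-duplicate of the first.
N = 64
s0 = [math.sin(2 * math.pi * 3 * t / N) for t in range(N)]
s1 = [math.cos(2 * math.pi * 5 * t / N) for t in range(N)]
near_dup = [0.9 * a + 0.1 * b for a, b in zip(s0, s1)]
pool = [s0, near_dup, s1]
print(select_least_correlated(pool, n_sources=2))
```

The exhaustive search grows combinatorially with pool size; a greedy selection would be a cheaper alternative for large pools.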
As the biomedical literature grows exponentially, biomedical named entity recognition (BNER) has become an important task in biomedical information extraction. In previous deep learning studies, pretrained word embeddings have become an indispensable part of neural network models, effectively improving their performance. However, biomedical literature typically contains numerous polysemous and ambiguous words, so fixed pretrained word representations are not appropriate. Therefore, this paper adopts pretrained Embeddings from Language Models (ELMo) to generate dynamic word embeddings according to context. In addition, to mitigate insufficient training data in specific fields and to introduce richer input representations, we propose a multitask learning multichannel bidirectional gated recurrent unit (BiGRU) model. Multiple feature representations (e.g., word-level, contextualized word-level, and character-level) are fed into the different channels, either separately or in combination. Because the BiGRU captures features automatically, manual intervention and feature engineering can be avoided. In the merge layer, multiple methods are designed to integrate the outputs of the multichannel BiGRU. We combine the BiGRU with a conditional random field (CRF) to address label dependencies in sequence labeling. Moreover, within a multitask learning framework, we introduce auxiliary corpora that share the same entity types as the main corpora under evaluation, then train our model on these separate corpora with shared parameters. Our model obtains promising results on the JNLPBA and NCBI-disease corpora, with F1-scores of 76.0% and 88.7%, respectively. The latter is the best performance among existing reported feature-based models.
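The merge layer's integration methods are not specified here; element-wise sum, averaging, and concatenation are common choices for fusing multichannel recurrent outputs. The sketch below shows these three for a single token, assuming hypothetical 2-dimensional outputs per channel; the function names are illustrative, not from the paper.

```python
from typing import List, Sequence

Vector = List[float]


def _check_same_dim(channels: Sequence[Sequence[float]]) -> None:
    if len({len(ch) for ch in channels}) != 1:
        raise ValueError("sum/mean merging needs equal-sized channel outputs")


def merge_concat(channels: Sequence[Sequence[float]]) -> Vector:
    """Concatenate the channel outputs; output size is the sum of channel sizes."""
    return [v for ch in channels for v in ch]


def merge_sum(channels: Sequence[Sequence[float]]) -> Vector:
    """Element-wise sum; keeps the per-channel size."""
    _check_same_dim(channels)
    return [sum(vals) for vals in zip(*channels)]


def merge_mean(channels: Sequence[Sequence[float]]) -> Vector:
    """Element-wise average across channels."""
    _check_same_dim(channels)
    return [v / len(channels) for v in merge_sum(channels)]


# Hypothetical 2-dimensional BiGRU outputs for one token from three channels
# (word-level, ELMo, character-level).
token_outputs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(merge_concat(token_outputs))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(merge_sum(token_outputs))     # [9.0, 12.0]
print(merge_mean(token_outputs))    # [3.0, 4.0]
```

Concatenation preserves every channel's features at the cost of a larger input to the CRF layer, while sum and mean keep the dimension fixed but require all channels to have the same output size.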
With the rapid growth of the Smart Grid, electricity load analysis has become the simplest and most effective way to divide users into groups and understand their behavior. This paper proposes AUD-MTS, an Abnormal User Detection approach based on power load multi-step clustering with Multiple Time Scales. First, we combine hidden feature learning with a Restricted Boltzmann Machine (RBM) and K-Means clustering to extract typical short-term load patterns. Second, a time-scale conversion shifts the subject of analysis from load patterns to user behavior. Finally, a long-term two-step clustering divides users along both coarse-grained and fine-grained dimensions in order to detect abnormal users with a customized OutlierIndex. Experiments are conducted on a year of 24-point power load data from American users in all states. Measured against the 16 commercial building types defined by the U.S. Department of Energy, the clustering in AUD-MTS reaches an accuracy of 87.5%, outperforming other common clustering algorithms on AMI (Advanced Metering Infrastructure) data. In addition, AUD-MTS improves the OutlierIndex score by 0.16 compared with other outlier detection algorithms, showing that the proposed method can detect abnormal users precisely and efficiently. Furthermore, we summarize possible causes of abnormal behavior changes, including federal holidays, climate zones, and summertime, which together account for 82.3% of anomalies, and discuss countermeasures for each. The remaining anomalies may come from potential electricity theft, which requires further investigation.
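The short-term step clusters daily load curves into typical patterns. The sketch below shows plain K-Means (Lloyd's algorithm) on peak-normalized 24-point profiles; it assumes raw profiles in place of the RBM hidden features used in AUD-MTS, uses hand-picked initial centroids, and the synthetic "daytime" and "evening" profiles are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Profile = List[float]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def normalise(profile: Sequence[float]) -> Profile:
    """Scale a daily profile by its peak so clustering compares shape, not size."""
    peak = max(profile)
    return [v / peak for v in profile]


def kmeans(
    points: Sequence[Sequence[float]],
    init_centroids: Sequence[Sequence[float]],
    max_iter: int = 100,
) -> Tuple[List[int], List[Profile]]:
    """Plain Lloyd's K-Means with caller-supplied initial centroids."""
    centroids = [list(c) for c in init_centroids]
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels or [], centroids


# Hypothetical 24-point daily profiles: office-like daytime load vs evening load.
daytime = [1.0 if 8 <= h <= 17 else 0.2 for h in range(24)]
evening = [1.0 if 18 <= h <= 22 else 0.3 for h in range(24)]
profiles = [
    normalise(p)
    for p in (daytime, [1.1 * v for v in daytime], evening, [0.9 * v for v in evening])
]
labels, centroids = kmeans(profiles, init_centroids=[profiles[0], profiles[2]])
print(labels)  # [0, 0, 1, 1]
```

In practice, initial centroids would typically come from k-means++ seeding or several random restarts rather than being chosen by hand.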
Plant growth is closely related to stem structure. The ultrasonic echo signal of a plant stem carries rich information about the stem's structure, providing an effective means of analyzing its structural characteristics. In this paper, we propose extracting energy features from ultrasonic echo signals to study plant stem structure. First, we found that different kinds of plant stems exhibit clearly different ultrasonic energy changes in both the time domain and the frequency domain. We then propose a feature extraction method, the density energy feature, to better characterize the interspecific differences among plant stems. To evaluate the 24-dimensional ultrasonic features extracted, the information gain method and a correlation evaluation method were used to compute each feature's contribution. The results showed that the mean density, an improved feature, was the most significant contributing feature for the four living plant stems. Finally, the top three features by contribution were selected and combined pairwise into 2-D feature maps, which clearly distinguish the stem species, especially grass and wood stems. This research shows that the ultrasonic energy features of plant stems offer a new perspective for distinguishing structural differences among plant stem species.
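Information gain ranks a feature by how much it reduces the entropy of the species label. The sketch below assumes equal-width binning of each continuous feature (the paper's discretization is not stated) and uses made-up feature values for six stems; the density energy features themselves are not reproduced, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def discretise(values: Sequence[float], n_bins: int) -> List[int]:
    """Equal-width binning of a continuous feature (an assumed choice)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain(
    values: Sequence[float], labels: Sequence[Hashable], n_bins: int = 2
) -> float:
    """IG = H(labels) - sum over bins of P(bin) * H(labels | bin)."""
    bins = discretise(values, n_bins)
    n = len(labels)
    conditional = 0.0
    for b in set(bins):
        subset = [lab for lab, bb in zip(labels, bins) if bb == b]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical feature values for six stems (three grass, three wood).
species = ["grass"] * 3 + ["wood"] * 3
separating = [0.10, 0.20, 0.15, 0.80, 0.90, 0.85]
uninformative = [0.10, 0.90, 0.20, 0.10, 0.90, 0.20]
print(information_gain(separating, species))     # 1.0 bit
print(information_gain(uninformative, species))  # about 0 bits
```

The number of bins is itself a tuning choice: too few bins hide differences between species, while too many bins on a small sample inflate the apparent gain.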
Non-destructive online detection of water changes in plant stems has become a focus of research on plant water physiology. In this paper, the ultrasonic radio-frequency echo (RFID) technique was used to detect water changes in stems. An improved hybrid differential Akaike's Information Criterion (AIC) algorithm was proposed to automatically compute the position of the primary ultrasonic echo in stems, which is the key parameter for detecting water changes in stems. This method overcomes the inaccurate localization of the primary echo caused by anisotropic ultrasound propagation and heterogeneous stems. First, the improved algorithm was analyzed and its accuracy was verified on a set of simulated signals. Then, a set of cut stem samples was subjected to ultrasonic detection during water absorption, and the correlation between stem moisture content and the ultrasonic velocities computed with the algorithm was evaluated; the average correlation coefficient between the two reached about 0.98. Finally, living sunflowers grown at different soil moistures were subjected to in situ ultrasonic detection from 9:00 to 18:00. The results showed a positive correlation between soil moisture and the primary ultrasonic echo position, especially from 12:00 to 18:00, with an average coefficient of 0.92. The ultrasonic detection results for sunflower stems also differed significantly across soil moistures. Therefore, the improved AIC algorithm provides an effective way to compute the primary echo position in stems and thus to detect water changes in stems in situ.
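For reference, the sketch below implements the classic variance-based AIC onset picker, which estimates the arrival of an echo as the split point that best divides a trace into a low-variance segment and a high-variance segment. It is not the improved hybrid differential AIC proposed here, whose details the abstract does not give; the synthetic trace and the function `aic_onset` are hypothetical.

```python
import math
import random
from typing import Sequence


def variance(segment: Sequence[float]) -> float:
    mean = sum(segment) / len(segment)
    return sum((v - mean) ** 2 for v in segment) / len(segment)


def aic_onset(x: Sequence[float], eps: float = 1e-12) -> int:
    """Classic variance-based AIC picker: the onset is the split point k that
    minimises AIC(k) = k*log(var(x[:k])) + (N-k-1)*log(var(x[k:]))."""
    n = len(x)
    best_k, best_aic = -1, math.inf
    for k in range(2, n - 1):  # keep at least two samples on each side
        aic = k * math.log(variance(x[:k]) + eps) + (n - k - 1) * math.log(
            variance(x[k:]) + eps
        )
        if aic < best_aic:
            best_k, best_aic = k, aic
    return best_k


# Hypothetical trace: 100 samples of low noise, then an echo from sample 100.
rng = random.Random(0)
noise = [rng.gauss(0.0, 0.01) for _ in range(200)]
signal = [
    noise[i] + (0.5 * math.cos(2 * math.pi * (i - 100) / 10) if i >= 100 else 0.0)
    for i in range(200)
]
print(aic_onset(signal))  # expected at or very near 100
```

This direct form recomputes both variances at every split, which is O(N²); cumulative sums would reduce it to O(N) for long RF traces.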