Chen Gao scite author profile

Summary: Short text similarity plays an important role in natural language processing (NLP) and has been applied in many fields. Because short texts lack sufficient context, measuring their similarity is difficult. Using semantic similarity to calculate textual similarity has attracted attention from academia and industry and has achieved better results. In this survey, we conduct a comprehensive and systematic analysis of semantic similarity. We first propose three categories of semantic similarity methods: corpus‐based, knowledge‐based, and deep learning (DL)‐based. We analyze the pros and cons of representative and novel algorithms in each category, and we also cover applications of these similarity measures in other areas of NLP. We then evaluate state‐of‐the‐art DL methods on four common datasets; the results show that DL‐based methods better address the challenges of short text similarity, such as sparsity and complexity. In particular, the bidirectional encoder representations from transformers (BERT) model makes full use of the scarce information in short texts, together with semantic information, and obtains higher accuracy and F1 values. Finally, we put forward some future directions.
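
To illustrate why the lack of context makes short text similarity hard, here is a minimal sketch of a purely lexical baseline: cosine similarity over word counts. This is not one of the surveyed methods; the function names and example sentences are assumptions chosen for illustration only.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text and count its alphanumeric word tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the word-count vectors of two texts."""
    va, vb = bag_of_words(a), bag_of_words(b)
    # Only words present in both texts contribute to the dot product.
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example sentences: a paraphrase pair with no shared words,
# and a pair that shares three of its four words.
paraphrase_score = cosine_similarity("the film was great", "an excellent movie")
overlap_score = cosine_similarity("the film was great", "the film was good")
print(paraphrase_score, overlap_score)
```

The paraphrase pair shares no words, so a lexical measure scores it as entirely dissimilar even though the meaning is close. This is the sparsity problem that semantic similarity methods aim to address.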

Data and knowledge-driven named entity recognition for cyber security

Gao, Zhang, Liu. 2021. Cybersecur

Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms in large volumes of heterogeneous, multisource cyber security text. Deep neural networks learn text features automatically from large datasets, but this data-driven approach usually handles rare entities poorly. Gasmi et al. proposed a deep learning method for NER in the cyber security domain and achieved good results, reaching an F1 value of 82.8%, but it still has difficulty accurately identifying rare entities and complex words. To cope with this challenge, this paper proposes a new model that combines data-driven deep learning with knowledge-driven dictionary methods, building dictionary features to assist rare entity recognition. In addition, an attention mechanism is added to the data-driven deep learning model to enrich the local features of the text, model the context better, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and is more effective at identifying cyber security entities: precision, recall, and F1 reached 90.19%, 86.60%, and 88.36%, respectively.
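
The three reported figures can be checked against each other, since F1 is the harmonic mean of precision and recall. The sketch below assumes the standard definition, F1 = 2PR / (P + R); the function name is illustrative and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


reported_precision = 90.19  # percent, as stated in the abstract
reported_recall = 86.60     # percent, as stated in the abstract

computed_f1 = f1_score(reported_precision, reported_recall)
print(round(computed_f1, 2))
```

Rounded to two decimals, the computed value agrees with the 88.36% stated in the abstract.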

The characteristics and dynamics of management controls in IJVs: Evidence from a Sino-Japanese case

Tang, Okano, et al. 2013. Management Accounting Research

Numerical Control Machine Tool Fault Diagnosis Using Hybrid Stationary Subspace Analysis and Least Squares Support Vector Machine with a Single Sensor

et al. 2017

Tool fault diagnosis in numerical control (NC) machines plays a significant role in ensuring manufacturing quality. However, current methods of tool fault diagnosis lack accuracy. This paper therefore proposes a fault diagnosis method based on stationary subspace analysis (SSA) and a least squares support vector machine (LS-SVM) that uses only a single sensor. First, the dimensionality of the vibration signal observed by the single sensor is expanded by phase space reconstruction, and SSA is then used to extract stationary and non-stationary sources from the resulting multi-dimensional signal, without requiring independence or prior information about the source signals. Subsequently, 10 dimensionless parameters in the time-frequency domain are calculated for the non-stationary sources to generate samples for training the LS-SVM. Finally, the measured vibration signals from tools of unknown state are separated by SSA, and their non-stationary sources serve as test samples for the trained LS-SVM. Experimental validation demonstrated that the proposed method achieves better diagnosis accuracy than three previous methods based on LS-SVM alone, on principal component analysis and LS-SVM, or on SSA and linear discriminant analysis.
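
The phase space reconstruction step, which turns a single-sensor signal into a multi-dimensional one, is commonly done by time-delay embedding. The sketch below shows that general technique only. The abstract does not state the embedding dimension or delay, so `dim`, `delay`, and the sample values are hypothetical, and the SSA and LS-SVM stages are not covered.

```python
def delay_embed(signal, dim, delay):
    """Build time-delay vectors [x[i], x[i+delay], ..., x[i+(dim-1)*delay]]."""
    if dim < 1 or delay < 1:
        raise ValueError("dim and delay must be positive integers")
    # Each vector spans (dim - 1) * delay samples beyond its starting index.
    n_vectors = len(signal) - (dim - 1) * delay
    if n_vectors <= 0:
        raise ValueError("signal is too short for this dim and delay")
    return [
        [signal[i + k * delay] for k in range(dim)]
        for i in range(n_vectors)
    ]


# Hypothetical one-dimensional samples standing in for a vibration signal.
samples = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
embedded = delay_embed(samples, dim=3, delay=2)
print(embedded)
```

Each row of `embedded` is one point in the reconstructed space, so a single channel yields a `dim`-dimensional signal that a multi-channel method such as SSA could then take as input.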

A Novel Hot Topic Detection Framework With Integration of Image and Short Text Information From Twitter

Zhang, et al. 2019. IEEE Access

Twitter data has a limited number of features and noisy text information, which makes extracting valuable information, and therefore hot topic detection, a challenging task. In this paper, a novel four-stage framework is proposed to improve the performance of topic detection. Data preprocessing is the first stage. Deep learning is then exploited to enrich short text information via image understanding. Next, an improved latent Dirichlet allocation is used to optimize the image effective word pairs, which improves the accuracy of the extracted topic words. Finally, short text and images are integrated for topic detection, in which the corresponding topics are mined based on fuzzy matching of topic words. Extensive experiments show that the proposed framework significantly improves the performance of topic detection and outperforms the selected baseline methods.
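
The abstract does not say how the fuzzy matching of topic words is carried out. As one possible reading, the sketch below matches words by character-level similarity using `difflib.SequenceMatcher` from the Python standard library. The function name, the 0.8 threshold, and the example words are all assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def fuzzy_match(word, candidates, threshold=0.8):
    """Return the candidates whose similarity ratio to `word` meets the threshold."""
    word = word.lower()
    return [
        c for c in candidates
        if SequenceMatcher(None, word, c.lower()).ratio() >= threshold
    ]


# Hypothetical topic words extracted from text and from image descriptions.
text_topic_word = "earthquake"
image_topic_words = ["earthquakes", "election", "quake"]
print(fuzzy_match(text_topic_word, image_topic_words))
```

With this threshold, near-identical spellings such as singular and plural forms match, while shorter fragments and unrelated words do not. A lower threshold would admit looser matches at the cost of more false positives.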


A survey on the techniques, applications, and performance of short text semantic similarity
