WEAC: Word embeddings for anomaly classification from event logs

Pande, Amit; Ahuja, Vishal

doi:10.1109/bigdata.2017.8258034

Cited by 13 publications

(12 citation statements)

References 12 publications

“…The applied anomaly detection techniques based on one-class SVM and k-medoids cluster dissimilarity are probably for the first time combined with the GloVe representation, applied to text data, and compared to their counterparts using the bag of words representation. Prior work using word2vec for anomaly detection (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018) did not combine it with clustering-based detection methods and did not include comparisons with bag of words. 3.…”

Section: Novelty (mentioning)

confidence: 99%

“…This work revisits the less popular but more easily and widely applicable idea of applying general-purpose algorithms to text transformed to a vector representation (Manevitz and Yousef 2002). There is already some evidence that recent developments in word embeddings make this path more useful (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018).…”

Section: Unsupervised Anomaly Detection (mentioning)

confidence: 99%

“…This produces word vectors (vector representations of words), but a related doc2vec algorithm can be used to obtain document vectors as well (Le and Mikolov 2014). There are some recent encouraging results with anomaly detection using the word2vec representation (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018). A related embedding-based data representation was also proposed for anomaly detection in a series of categorical events (Chen et al 2016).…”

Section: Global Vectors (mentioning)

confidence: 99%

“…Besides the most common bag of words text representation, a recent more refined Global Vectors (GloVe) representation (Pennington, Socher, and Manning 2014) based on word embeddings is employed which makes it easy to control the dimensionality and apply arbitrary general-purpose classification and clustering algorithms. While text representations based on word embeddings are becoming popular (Goldberg and Levy 2014; Lau and Baldwin 2016; Bakarov 2018), there have been only few demonstrations of their utility for anomaly detection (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018), using word2vec (Mikolov et al 2013a) rather than GloVe embeddings. The applied anomaly detection techniques based on one-class SVM and k-medoids cluster dissimilarity are probably for the first time combined with the GloVe representation, applied to text data, and compared to their counterparts using the bag of words representation. Prior work using word2vec for anomaly detection (Bertero et al 2017; Pande and Ahuja 2017; Bakarov et al 2018) did not combine it with clustering-based detection methods and did not include comparisons with bag of words. Unlike in most prior work on anomaly detection using word embeddings (Bertero et al 2017; Bakarov et al 2018), large datasets are used (tens of thousands rather than hundreds) to better match the scale of realistic applications. Unlike in most prior work on clustering-based anomaly detection (He et al 2003; Al-Zoubi 2009; Gao 2009; Amer and Goldstein 2012), the cluster dissimilarity approach is applied as a modeling algorithm, that is, with an anomaly detection model created on the training set and applicable to new data. New cluster dissimilarity-based anomaly score definitions are proposed that may be promising alternatives to those known from the literature (He et al 2003; Amer and Goldstein 2012).…”

Section: Introduction (mentioning)

confidence: 99%

Unsupervised modeling anomaly detection in discussion forums posts using global vectors for text representation

Cichosz

2020

Nat. Lang. Eng.

Anomaly detection can be seen as an unsupervised learning task in which a predictive model created on historical data is used to detect outlying instances in new data. This work addresses possibly promising but relatively uncommon application of anomaly detection to text data. Two English-language and one Polish-language Internet discussion forums devoted to psychoactive substances received from home-grown plants, such as hashish or marijuana, serve as text sources that are both realistic and possibly interesting on their own, due to potential associations with drug-related crime. The utility of two different vector text representations is examined: the simple bag of words representation and a more refined Global Vectors (GloVe) representation, which is an example of the increasingly popular word embedding approach. They are both combined with two unsupervised anomaly detection methods, based on one-class support vector machines (SVM) and based on dissimilarity to k-medoids clusters. The GloVe representation is found definitely more useful for anomaly detection, permitting better detection quality and ameliorating the curse of dimensionality issues with text clustering. The cluster dissimilarity approach combined with this representation outperforms one-class SVM with respect to detection quality and appears a more promising approach to anomaly detection in text data.
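
The cluster dissimilarity idea summarized in this abstract can be illustrated with a minimal Python sketch. It is not the method from the cited paper: the two-dimensional `EMBEDDINGS` table is a hypothetical stand-in for pretrained GloVe vectors, documents are represented by simple averaging, the k-medoids routine is a plain alternating variant, and the score is assumed to be the distance to the nearest medoid rather than any of the paper's own score definitions.

```python
import math

# Hypothetical 2-d vectors standing in for pretrained GloVe embeddings.
EMBEDDINGS = {
    "plant": (1.0, 0.1), "grow": (0.9, 0.2), "seed": (1.0, 0.0),
    "light": (0.8, 0.3), "soil": (0.9, 0.1),
    "invoice": (0.0, 1.0), "refund": (0.1, 0.9),
}

def doc_vector(tokens):
    """Average the vectors of known tokens (unknown tokens are skipped)."""
    vecs = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vecs:
        raise ValueError("no known tokens in document")
    dim = len(vecs[0])
    return tuple(sum(v[i] for v in vecs) / len(vecs) for i in range(dim))

def k_medoids(points, k, iterations=20):
    """Alternate between assigning points to medoids and re-picking medoids."""
    medoids = list(points[:k])  # deterministic start, chosen for the sketch only
    for _ in range(iterations):
        clusters = [[] for _ in medoids]
        for p in points:
            nearest = min(range(len(medoids)),
                          key=lambda j: math.dist(p, medoids[j]))
            clusters[nearest].append(p)
        # New medoid: the member with the smallest total distance to its cluster.
        new_medoids = [
            min(c, key=lambda cand: sum(math.dist(cand, q) for q in c)) if c else m
            for c, m in zip(clusters, medoids)
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids

def anomaly_score(point, medoids):
    """Assumed score: distance to the closest medoid; larger is more anomalous."""
    return min(math.dist(point, m) for m in medoids)

# Build the model on training documents, then score unseen ones.
train = [["plant", "grow", "seed"], ["seed", "soil", "light"],
         ["grow", "light", "plant"], ["soil", "plant"]]
medoids = k_medoids([doc_vector(d) for d in train], k=2)
typical = anomaly_score(doc_vector(["plant", "soil", "seed"]), medoids)
unusual = anomaly_score(doc_vector(["invoice", "refund"]), medoids)
```

Because the medoids are fitted on the training documents only, the same `anomaly_score` function can be applied to new data, which mirrors the "modeling" use of clustering described in the citation statements above.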

“…3. Before the classification algorithm is applied, the time windows of each log undergo normalization, construction of a vector representation [16], and assignment of TF-IDF weights to each word [17]. 4.…”

Section: Using Storage System Software Logs for Fault Diagnosis (unclassified)

Application of Ontological Modelling Methods and Text Classification Algorithms for Storage System Faults Detection

Uspenskij

2020

Izvestiya of SSC RAS

This paper describes application of diagnostic model, created with ontological modelling methods and machine learning text classification algorithms, for fault detection, based on system log messages data, in enterprise-level storage system. Proposed fault detection model uses external procedures for the description of the relations between parameters and states of storage systems, based on the implementation of the machine learning algorithms. As an example of such relation, author describes application of the text classification method for the task of software log analysis.
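
The preprocessing step quoted from this paper (normalizing each log time window, building a vector representation, and assigning TF-IDF weights to words) might look roughly like the Python sketch below. The normalization rule (lowercasing and replacing numbers and hex identifiers with a `<num>` placeholder), the sample log messages, and the function names are all assumptions made for illustration; the paper's actual normal form and weighting variant are not given in the excerpt.

```python
import math
import re
from collections import Counter

def normalize(message):
    """Lowercase a message and replace numeric/hex ids with <num> (assumed rule)."""
    message = message.lower()
    message = re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<num>", message)
    return re.findall(r"<num>|[a-z_]+", message)

def tf_idf(windows):
    """Return one {term: weight} dict per window, using tf * log(N / df)."""
    tokenized = [[t for msg in w for t in normalize(msg)] for w in windows]
    n = len(tokenized)
    # Document frequency: in how many windows each term occurs at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weighted = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        weighted.append({t: (c / total) * math.log(n / df[t])
                         for t, c in counts.items()})
    return weighted

# Hypothetical log windows; each window is a list of messages.
windows = [
    ["Disk 3 online", "Volume 7 mounted"],
    ["Disk 4 online", "Volume 9 mounted"],
    ["Disk 5 failed", "Controller 0x1F reset"],
]
weights = tf_idf(windows)
```

Terms that occur in every window, such as `disk`, receive zero weight, while terms confined to one window, such as `failed`, receive positive weight. The resulting sparse dictionaries could then be passed to a text classifier of the kind the abstract mentions.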

Anomaly Classification with Unknown, Imbalanced and Few Labeled Log Data

2022

AI and Machine Learning for Network and Security Management
