An integrated framework for optimizing automatic monitoring systems in large IT infrastructures

Tang, Liang; Li, Tao; Shwartz, Larisa; Pinel, Florian; Grabarnik, Genady Ya.

doi:10.1145/2487575.2488209

Cited by 25 publications

(5 citation statements)

References 29 publications


“…LogSig [26] is a data mining based log parsing method that has been demonstrated in [47]. It parses logs through a three-step process: (1) word pair generation, (2) log clustering, and (3) log template generation.…”

Section: B Log Parsing Methods For Servers (Supercomputers) Distributed Systems and Applications (mentioning)

confidence: 99%

Efficient and Robust Syslog Parsing for Network Devices in Datacenter Networks

et al. 2020


Syslog parsing is of vital importance for the detection, diagnosis and prediction of network device failures in a datacenter. A common approach to syslog parsing is to extract templates from historical syslogs, after which syslogs are matched to these templates. To address the problems in the existing syslog parsing techniques, we propose a novel framework, Craftsman, which identifies frequent combinations of (syslog) words and then applies them as templates. Craftsman empirically extracts templates accurately, is extremely efficient in template matching, and naturally supports incremental learning. To compare the performance of Craftsman and three other template learning techniques designed for network devices, we evaluate them on two years' worth of syslogs collected from network devices deployed across 10+ datacenters of a tier-one service provider. The experiments demonstrate that Craftsman achieves a close-to-one accuracy (as measured by rand index), and improves the computational efficiency by 6.88 to 10.25 times in template matching, and by 730 to 6847 times in syslog parsing. It also improves the accuracy (as measured by F1 measure) of failure prediction by 13.07% to 188%. In addition, we demonstrate Craftsman's superior generality by comparing it with three widely-applied log parsing methods over five large log datasets collected from servers, distributed systems and applications.

INDEX TERMS: Syslog parsing, network device, prefix tree, datacenter network, frequent pattern.



“…In a number of real-world applications, such as healthcare and network security, it is crucial to reduce the Bayes risk (BR), i.e., the expected misclassification loss. For example, a typical case is to distinguish between the false positive errors and the false negative errors and treat them differently [55], [54].…”

Section: Related Work (mentioning)

confidence: 99%

MAS-Encryption and its Applications in Privacy-Preserving Classifiers

Gao, Xia, et al. 2022

IEEE Trans. Knowl. Data Eng.


Homomorphic encryption (HE) schemes, such as fully homomorphic encryption (FHE), support a number of useful computations on ciphertext in a broad range of applications, such as e-voting, private information retrieval, cloud security, and privacy protection. While FHE schemes do not require any interaction during computation, the key limitations are large ciphertext expansion and inefficiency. Thus, to overcome these limitations, we develop a novel cryptographic tool, MAS-Encryption (MASE), to support real-value input and secure computation on the multiply-add structure. The multiply-add structures exist in many important protocols, such as classifiers and outsourced protocols, and we will explain how MASE can be used to protect the privacy of these protocols, using two case study examples. Specifically, the first case study example is the privacy-preserving Naive Bayes classifier that can achieve minimal Bayes risk, and the other example is the privacy-preserving support vector machine. We prove that the constructed classifiers are secure and evaluate their performance using real-world datasets. Experiments show that our proposed MASE scheme and MASE based classifiers are efficient, in the sense that we achieve an optimal tradeoff between computation efficiency and communication interactions. Thus, we avoid the inefficiency of FHE based paradigm.


“…Alternately, some research efforts, such as those in [37,38,39], have noted the importance of ticket correlation for incident resolution, claiming that the latter can be extended with advanced functions to enhance the incident resolution process, as the information in the tickets is related to incidents generated by events that have already been identified as network failures, and as such, some related alerts should exist. Other efforts, such as those in [40,41,42], use ITSs for several purposes, such as studying and characterizing the nature and causes of routing changes and the observed instability. In these references, the authors use simple ticket preprocessing operations to reduce the total number of tickets before correlating them.…”

Section: Related Work (mentioning)

confidence: 99%

Fusing information from tickets and alerts to improve the incident resolution process

2019


In the context of network incident monitoring, alerts are useful notifications that provide IT management staff with information about incidents. They are usually triggered in an automatic manner by network equipment and monitoring systems, thus containing only technical information available to the systems that are generating them. On the other hand, ticketing systems play a different role in this context. Tickets represent the business point of view of incidents. They are usually generated by human intervention and contain enriched semantic information about ongoing and past incidents. In this article, our main hypothesis is that incorporating tickets information into the alert correlation process will be beneficial to the incident resolution life-cycle in terms of accuracy, timing, and overall incident's description. We propose a methodology to validate this hypothesis and suggest a solution to the main challenges that appear. The proposed correlation approach is based on the time alignment of the events (alerts and tickets) that affect common elements in the network. For this we use real alert and ticket datasets obtained from a large telecommunications network. The results have shown that using ticket information enhances the incident resolution process, mainly by reducing and aggregating a higher percentage of alerts compared with standard alert correlation systems that only use alerts as the main source of information. Finally, we also show the applicability and usability of this model by applying it to a case study where we analyze the performance of the management staff.

