2018 IEEE/ACM Innovating the Network for Data-Intensive Science (INDIS) 2018
DOI: 10.1109/indis.2018.00004

Flowzilla: A Methodology for Detecting Data Transfer Anomalies in Research Networks

Abstract: Research networks are designed to support high-volume scientific data transfers that span multiple network links. Like any other network, research networks experience anomalies. Anomalies are deviations from profiles of normality in a research network's traffic levels. Diagnosing anomalies is critical both for network operators and users (e.g., scientists). In this paper we present Flowzilla, a general framework for detecting and quantifying anomalies on scientific data transfers of arbitrary size. Flowzilla i…

Cited by 7 publications (2 citation statements)
References: 16 publications
“…Therefore we also consider two additional cases: a large number of small file transfers, and a small number of large file transfers. This simulates the different distributions that have been observed on a real NERSC DTN [21]. In each case, the time between file transfers has been chosen such that the total expected data transferred is roughly 750 GB per day.…”
Section: Methods (mentioning)
confidence: 99%
“…Machine Learning (ML) methods have been developed for a number of networking tasks for science data flows, for example, detecting flow anomalies [6] and classifying elephant and mice flows [4]. In particular, ML methods are developed to estimate the connection RTT and loss rate under deterministic periodic losses in [15] for 10 Gbps emulated connections with 0-366 ms RTT; these connections represent local, cross-country, continental, and round-the-earth distances.…”
Section: Introduction (mentioning)
confidence: 99%