Ting-Fang Yen scite author profile

Abstract. We address the problem of detecting early-stage infection in an enterprise setting by proposing a new framework based on belief propagation inspired by graph theory. Belief propagation can be used either with "seeds" of compromised hosts or malicious domains (provided by the enterprise security operations center, or SOC) or without any seeds. In the latter case we develop a detector of C&C communication particularly tailored to enterprises, which can detect a stealthy compromise of only a single host communicating with the C&C server. We demonstrate that our techniques perform well on detecting enterprise infections. We achieve high accuracy with low false detection and false negative rates on two months of anonymized DNS logs released by Los Alamos National Lab (LANL), which include APT infection attacks simulated by LANL domain experts. We also apply our algorithms to 38TB of real-world web proxy logs collected at the border of a large enterprise. Through careful manual investigation in collaboration with the enterprise SOC, we show that our techniques identified hundreds of malicious domains overlooked by state-of-the-art security products.
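
The seeded mode described in this abstract can be illustrated with a much simpler stand-in: hard labels spread back and forth over a host-to-domain graph, starting from known compromised hosts. This is only a sketch under stated assumptions, not the paper's belief propagation or its scoring. The function name `spread_from_seeds`, the `ignore_domains` allowlist, the round limit, and the sample data are all hypothetical.

```python
from collections import defaultdict


def spread_from_seeds(edges, seed_hosts, ignore_domains=frozenset(), max_rounds=5):
    """Spread 'suspicious' labels from seed hosts over (host, domain) edges.

    Assumption: any domain contacted by a labeled host becomes suspicious
    unless it is in ignore_domains (a hypothetical allowlist of popular sites).
    """
    host_to_domains = defaultdict(set)
    domain_to_hosts = defaultdict(set)
    for host, domain in edges:
        if domain in ignore_domains:
            continue
        host_to_domains[host].add(domain)
        domain_to_hosts[domain].add(host)

    hosts, domains = set(seed_hosts), set()
    for _ in range(max_rounds):
        # Domains reached from labeled hosts, then hosts reached from those domains.
        new_domains = {d for h in hosts for d in host_to_domains.get(h, ())} - domains
        new_hosts = {h for d in new_domains for h in domain_to_hosts.get(d, ())} - hosts
        if not new_domains and not new_hosts:
            break  # nothing changed, so the labels have converged
        domains |= new_domains
        hosts |= new_hosts
    return hosts, domains


# Hypothetical log entries: (internal host, contacted domain).
edges = [
    ("h1", "evil.example"),
    ("h2", "evil.example"),
    ("h2", "c2.example"),
    ("h1", "news.example"),
    ("h3", "news.example"),
]
hosts, domains = spread_from_seeds(edges, {"h1"}, ignore_domains={"news.example"})
```

In this toy run the label moves from `h1` to `evil.example`, then to `h2`, then to `c2.example`, while `h3` stays unlabeled because its only shared domain is on the allowlist.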

Traffic Aggregation for Malware Detection

Yen

Reiter

122

Abstract. Stealthy malware, such as botnets and spyware, is hard to detect because its activities are subtle and do not disrupt the network, in contrast to DoS attacks and aggressive worms. Stealthy malware, however, does communicate to exfiltrate data to the attacker, to receive the attacker's commands, or to carry out those commands. Moreover, since malware rarely infiltrates only a single host in a large enterprise, these communications should emerge from multiple hosts within coarse temporal proximity to one another. In this paper, we describe a system called TĀMD (pronounced "tamed") with which an enterprise can identify candidate groups of infected computers within its network. TĀMD accomplishes this by finding new communication "aggregates" involving multiple internal hosts, i.e., communication flows that share common characteristics. We describe characteristics for defining aggregates (including flows that communicate with the same external network, that share similar payload, and/or that involve internal hosts with similar software platforms) and justify their use in finding infected hosts. We also detail efficient algorithms employed by TĀMD for identifying such aggregates, and demonstrate a particular configuration of TĀMD that identifies new infections for multiple bot and spyware examples, within traces of traffic recorded at the edge of a university network. This is achieved even when the number of infected hosts comprises only about 0.0097% of all internal hosts in the network.
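
One of the aggregate characteristics named above, flows that communicate with the same external network, can be sketched as a grouping step. The code below is a minimal illustration and not TĀMD's algorithm: the /24 prefix length, the `min_hosts` threshold, the `known_networks` history set, and the function name are assumptions made for the example.

```python
import ipaddress
from collections import defaultdict


def destination_aggregates(flows, known_networks=frozenset(), min_hosts=2, prefix_len=24):
    """Group internal hosts by the external network they contact.

    flows: iterable of (internal_host, external_ip) pairs.
    known_networks: networks already seen in past traffic (hypothetical
    history), excluded so that only new aggregates are reported.
    """
    by_network = defaultdict(set)
    for internal_host, external_ip in flows:
        # strict=False masks the host bits, e.g. 203.0.113.7/24 -> 203.0.113.0/24
        network = ipaddress.ip_network(f"{external_ip}/{prefix_len}", strict=False)
        by_network[str(network)].add(internal_host)
    return {
        net: hosts
        for net, hosts in by_network.items()
        if net not in known_networks and len(hosts) >= min_hosts
    }


# Hypothetical flow records using documentation address ranges.
flows = [
    ("10.0.0.1", "203.0.113.7"),
    ("10.0.0.2", "203.0.113.200"),
    ("10.0.0.3", "198.51.100.5"),
    ("10.0.0.1", "192.0.2.9"),
    ("10.0.0.4", "192.0.2.10"),
]
aggregates = destination_aggregates(flows, known_networks={"192.0.2.0/24"})
```

Here only `203.0.113.0/24` is reported: it is contacted by two internal hosts and is absent from the history set, whereas `192.0.2.0/24` is already known and `198.51.100.0/24` involves a single host.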

An Epidemiological Study of Malware Encounters in a Large Enterprise

Yen

Heorhiadi

Oprea

et al. 2014

We present an epidemiological study of malware encounters in a large, multi-national enterprise. Our data sets allow us to observe or infer not only malware presence on enterprise computers, but also malware entry points, network locations of the computers (i.e., inside the enterprise network or outside) when the malware was encountered, and for some web-based malware encounters, web activities that gave rise to them. By coupling this data with demographic information for each host's primary user, such as his or her job title and level in the management hierarchy, we are able to paint a reasonably comprehensive picture of malware encounters for this enterprise. We use this analysis to build a logistic regression model for inferring the risk of hosts encountering malware; those ranked highly by our model have a more than 3× higher rate of encountering malware than the base rate. We also discuss where our study confirms or refutes other studies and guidance that our results suggest.
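
The comparison against the base rate in this abstract is a lift calculation: the encounter rate among the highest-ranked hosts divided by the rate over all hosts. The sketch below shows only that arithmetic on made-up numbers. The function name, the `fraction` cutoff, and the sample scores are assumptions, and no part of the study's model or data is reproduced.

```python
def top_fraction_lift(scores, encountered, fraction=0.1):
    """Ratio of the encounter rate in the top-scored fraction to the base rate.

    scores: model risk scores, one per host (higher means riskier).
    encountered: 1 if the host encountered malware, else 0.
    """
    if len(scores) != len(encountered) or not scores:
        raise ValueError("scores and encountered must be non-empty and equal length")
    base_rate = sum(encountered) / len(encountered)
    if base_rate == 0:
        raise ValueError("base rate is zero, so lift is undefined")
    # Sort hosts from highest to lowest score and keep the top k.
    ranked = sorted(zip(scores, encountered), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(outcome for _, outcome in ranked[:k]) / k
    return top_rate / base_rate


# Made-up scores and outcomes for ten hosts.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
encountered = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
lift = top_fraction_lift(scores, encountered, fraction=0.2)
```

With these toy values the top two hosts both encountered malware (rate 1.0) against a base rate of 0.2, giving a lift of 5.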

Are Your Hosts Trading or Plotting? Telling P2P File-Sharing and Bots Apart

Yen

Reiter

2010

Abstract. Peer-to-peer (P2P) substrates are now widely used for both file-sharing and botnet command-and-control. Despite the commonality of their substrates, we show that the different goals and circumstances of these applications give rise to behaviors that can be distinguished in network flow records. Using features related to traffic volume, persistence of network connections, amount of "churn" among peers, and differences between human-driven and machine-driven traffic, we develop a technique for identifying P2P bots (the Plotters) and, in particular, separating them from file-sharing hosts (the Traders). Evaluations performed on traffic recorded at the edge of a university network show that we can achieve, e.g., 87.50% detection of Storm bots with a 0.81% false positive rate. We also demonstrate the significant extent to which Plotter behaviors would need to change to evade our techniques.
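
Of the features listed above, peer "churn" is the easiest to make concrete. One plausible reading, assumed here and not taken from the paper, is the fraction of peers in each time window that the host had not contacted in any earlier window. The function name, the windowing, and the sample peers are hypothetical.

```python
def new_peer_fractions(peers_by_window):
    """Per-window fraction of contacted peers not seen in any earlier window.

    peers_by_window: list of iterables of peer addresses, in time order.
    The first window only initializes the history; empty windows are skipped.
    """
    seen = set()
    fractions = []
    for index, peers in enumerate(peers_by_window):
        peers = set(peers)
        if index > 0 and peers:
            fractions.append(len(peers - seen) / len(peers))
        seen |= peers
    return fractions


# Hypothetical peers contacted by one host in three consecutive windows.
windows = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b"}]
churn = new_peer_fractions(windows)
```

For this host one of three peers is new in the second window and none are new in the third; a host contacting entirely fresh peers in every window would score 1.0 throughout.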

Browser Fingerprinting from Coarse Traffic Summaries: Techniques and Implications

Yen

Huang

Monrose

et al. 2009

Abstract. We demonstrate that the browser implementation used at a host can be passively identified with significant precision and recall, using only coarse summaries of web traffic to and from that host. Our techniques utilize connection records containing only the source and destination addresses and ports, packet and byte counts, and the start and end times of each connection. We additionally provide two applications of browser identification. First, we show how to extend a network intrusion detection system to detect a broader range of malware. Second, we demonstrate the consequences of web browser identification for the deanonymization of web sites in flow records that have been anonymized.

Ting-Fang Yen

Detection of Early-Stage Enterprise Infection by Mining Large-Scale Log Data
