Xusheng Xiao scite author profile

The rapid increase in both the quantity and complexity of data that are being generated daily in the field of environmental science and engineering (ESE) demands accompanied advancement in data analytics. Advanced data analysis approaches, such as machine learning (ML), have become indispensable tools for revealing hidden patterns or deducing correlations for which conventional analytical methods face limitations or challenges. However, ML concepts and practices have not been widely utilized by researchers in ESE. This feature explores the potential of ML to revolutionize data analysis and modeling in the ESE field, and covers the essential knowledge needed for such applications. First, we use five examples to illustrate how ML addresses complex ESE problems. We then summarize four major types of applications of ML in ESE: making predictions; extracting feature importance; detecting anomalies; and discovering new materials or chemicals. Next, we introduce the essential knowledge required and current shortcomings in ML applications in ESE, with a focus on three important but often overlooked components when applying ML: correct model development, proper model interpretation, and sound applicability analysis. Finally, we discuss challenges and future opportunities in the application of ML tools in ESE to highlight the potential of ML in this field.

show abstract

AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context

Yang

et al. 2015

View full text Add to dashboard Cite

Mobile malware attempts to evade detection during app analysis by mimicking security-sensitive behaviors of benign apps that provide similar functionality (e.g., sending SMS messages), and suppressing their payload to reduce the chance of being observed (e.g., executing only its payload at night). Since current approaches focus their analyses on the types of securitysensitive resources being accessed (e.g., network), these evasive techniques in malware make differentiating between malicious and benign app behaviors a difficult task during app analysis. We propose that the malicious and benign behaviors within apps can be differentiated based on the contexts that trigger securitysensitive behaviors, i.e., the events and conditions that cause the security-sensitive behaviors to occur. In this work, we introduce AppContext, an approach of static program analysis that extracts the contexts of security-sensitive behaviors to assist app analysis in differentiating between malicious and benign behaviors. We implement a prototype of AppContext and evaluate AppContext on 202 malicious apps from various malware datasets, and 633 benign apps from the Google Play Store. AppContext correctly identifies 192 malicious apps with 87.7% precision and 95% recall. Our evaluation results suggest that the maliciousness of a security-sensitive behavior is more closely related to the intention of the behavior (reflected via contexts) than the type of the security-sensitive resources that the behavior accesses.

show abstract

High Fidelity Data Reduction for Big Data Security Dependency Analyses

Zhang¹,

Wu²,

Li³

et al. 2016

106

View full text Add to dashboard Cite

Inferring method specifications from natural language API descriptions

et al. 2012

View full text Add to dashboard Cite

Precise identification of problems for structural test generation

Xiao

Xie

Tillmann

et al. 2011

View full text Add to dashboard Cite

An important goal of software testing is to achieve at least high structural coverage. To reduce the manual efforts of producing such high-covering test inputs, testers or developers can employ tools built based on automated structural test-generation approaches. Although these tools can easily achieve high structural coverage for simple programs, when they are applied on complex programs in practice, these tools face various problems, such as (1) the external-methodcall problem (EMCP), where tools cannot deal with method calls to external libraries; (2) the object-creation problem (OCP), where tools fails to generate method-call sequences to produce desirable object states. Since these tools currently could not be powerful enough to deal with these problems in testing complex programs in practice, we propose cooperative developer testing, where developers provide guidance to help tools achieve higher structural coverage. To reduce the efforts of developers in providing guidance to tools, in this paper, we propose a novel approach, called Covana, which precisely identifies and reports problems that prevent the tools from achieving high structural coverage primarily by determining whether branch statements containing notcovered branches have data dependencies on problem candidates. We provide two techniques to instantiate Covana to identify EMCPs and OCPs. Finally, we conduct evaluations on two open source projects to show the effectiveness of Covana in identifying EMCPs and OCPs.

show abstract

Automated extraction of security policies from natural-language software documents

et al. 2012

View full text Add to dashboard Cite

Access Control Policies (ACP) specify which principals such as users have access to which resources. Ensuring the correctness and consistency of ACPs is crucial to prevent security vulnerabilities. However, in practice, ACPs are commonly written in Natural Language (NL) and buried in large documents such as requirements documents, not amenable for automated techniques to check for correctness and consistency. It is tedious to manually extract ACPs from these NL documents and validate NL functional requirements such as use cases against ACPs for detecting inconsistencies. To address these issues, we propose an approach, called Text2Policy, to automatically extract ACPs from NL software documents and resource-access information from NL scenario-based functional requirements. We conducted three evaluations on the collected ACP sentences from publicly available sources along with use cases from both open source and proprietary projects. The results show that Text2Policy effectively identifies ACP sentences with the precision of 88.7% and the recall of 89.4%, extracts ACP rules with the accuracy of 86.3%, and extracts action steps with the accuracy of 81.9%.

show abstract

Relation extraction for inferring access control rules from natural language artifacts

Slankas

Xiao

Williams

et al. 2014

View full text Add to dashboard Cite

With over forty years of use and refinement, access control, often in the form of access control rules (ACRs), continues to be a significant control mechanism for information security. However, ACRs are typically either buried within existing natural language (NL) artifacts or elicited from subject matter experts. To address the first situation, our research goal is to aid developers who implement ACRs by inferring ACRs from NL artifacts. To aid in rule inference, we propose an approach that extracts relations (i.e., the relationship among two or more items) from NL artifacts such as requirements documents. Unlike existing approaches, our approach combines techniques from information extraction and machine learning. We develop an iterative algorithm to discover patterns that represent ACRs in sentences. We seed this algorithm with frequently occurring nouns matching a subject-actionresource pattern throughout a document. The algorithm then searches for additional combinations of those nouns to discover additional patterns. We evaluate our approach on documents from three systems in three domains: conference management, education, and healthcare. Our evaluation results show that ACRs exist in 47% of the sentences, and our approach effectively identifies those ACR sentences with a precision of 81% and recall of 65%; our approach extracts ACRs from those identified ACR sentences with an average precision of 76% and an average recall of 49%.

show abstract

Context-sensitive delta inference for identifying workload-dependent performance bottlenecks

Xiao

Han

Zhang

et al. 2013

View full text Add to dashboard Cite

Software hangs can be caused by expensive operations in responsive actions (such as time-consuming operations in UI threads). Some of the expensive operations depend on the input workloads, referred to as workload-dependent performance bottlenecks (WDPBs). WDPBs are usually caused by workload-dependent loops (i.e., WDPB loops) that contain relatively expensive operations. Traditional performance testing and single-execution profiling may not reveal WDPBs due to incorrect assumptions of workloads. To address these issues, we propose the ∆Infer approach that predicts WDPB loops under large workloads via inferring iteration counts of WDPB loops using complexity models for the workload size. ∆Infer incorporates a novel concept named context-sensitive delta inference that consists of two parts: temporal inference for inferring the complexity models of different program locations, and spatial inference for identifying WDPB loops as WDPB candidates. We conducted evaluations on two popular open-source GUI applications, and identified impactful WDPBs that caused 10 performance bugs.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

hi@scite.ai

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Xusheng Xiao

Machine Learning: New Ideas and Tools in Environmental Science and Engineering

AppContext: Differentiating Malicious and Benign Mobile App Behaviors Using Context

High Fidelity Data Reduction for Big Data Security Dependency Analyses

Inferring method specifications from natural language API descriptions

Precise identification of problems for structural test generation

Automated extraction of security policies from natural-language software documents

Relation extraction for inferring access control rules from natural language artifacts

Context-sensitive delta inference for identifying workload-dependent performance bottlenecks

Contact Info

Product

Resources

About