Are Your Training Datasets Yet Relevant?

Allix, Kevin; Bissyandé, Tegawendé F.; Klein, Jacques; Le Traon, Yves

doi:10.1007/978-3-319-15618-7_5

Cited by 54 publications

(50 citation statements)

References 29 publications


“…These results are particularly notable since previous work has demonstrated that machine learning-based Android malware detection was unable to obtain an F1 score higher than 70% in a time-aware scenario [56]. In that work, dates newer in time resulted in lower F1 scores; however, RevealDroid actually improves to as high as 99%.…”

Section: A31 Rq1: Detection Accuracy (mentioning, confidence: 72%)

“…We grouped apps into two-year time periods, due to the fact that some years only have a few apps, mainly 2009 with 29 apps, and 2017 with 130 apps. Similar to [55], we consider the year of the last modified date of classes.dex in an app as the year from which it originates. We consider any transformed app as belonging to the same year as its original version, in order to determine the actual effect of obfuscation on product accuracy for each time period.…”

Section: Rq3 Time-aware Analysis (mentioning, confidence: 99%)
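The grouping rule in the statement above (take the year of the classes.dex last-modified date, bucket apps into two-year periods, and place each transformed app in its original's year) can be sketched in Python. This is a minimal illustration under stated assumptions, not the citing authors' code: the `App` record, its `original` field for transformed apps, and 2009 as the first bucket boundary are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class App:
    # Hypothetical record for illustration only.
    name: str
    dex_year: int                   # year of the classes.dex last-modified date
    original: Optional[str] = None  # for a transformed app, the name of its original


def two_year_period(year: int, start: int = 2009) -> str:
    """Map a year to a two-year bucket label anchored at `start`, e.g. 2010 -> '2009-2010'."""
    low = start + ((year - start) // 2) * 2
    return f"{low}-{low + 1}"


def group_by_period(apps: list[App], start: int = 2009) -> dict[str, list[str]]:
    """Group app names into two-year periods.

    A transformed app is placed in the period of its original version, so its
    own (possibly newer) dex date is ignored. Raises KeyError if the original
    is not in `apps`.
    """
    original_years = {a.name: a.dex_year for a in apps if a.original is None}
    groups: dict[str, list[str]] = defaultdict(list)
    for app in apps:
        year = original_years[app.original] if app.original is not None else app.dex_year
        groups[two_year_period(year, start)].append(app.name)
    return dict(groups)
```

When the dataset spans an odd number of years, the last bucket may contain a single year, which is why the start year is left as a parameter in this sketch.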

“…In a time-agnostic scenario, training and testing as part of machine learning is conducted without considering the age of apps in the dataset. This scenario has been utilized to evaluate an overwhelming majority of machine learning-based Android malware-detection approaches [56]. A time-aware scenario uses the modification date of apps to determine training and testing sets, which avoids training on apps from the future to test on apps from the past.…”

Section: A31 Rq1: Detection Accuracy (mentioning, confidence: 99%)
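The difference between the time-agnostic and time-aware scenarios in the statement above comes down to how the train/test split is drawn. A hedged sketch follows: the `(name, date)` records, the single `cutoff` date, and the 25% random test fraction are illustrative assumptions, not the protocol used by Allix et al. or by the citing papers.

```python
import random
from datetime import date

# Hypothetical dataset entry: (app name, modification date of classes.dex).
AppRecord = tuple[str, date]


def time_agnostic_split(apps: list[AppRecord], test_fraction: float = 0.25, seed: int = 0):
    """Random split that ignores app age, so training may include apps newer than test apps."""
    shuffled = apps[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def time_aware_split(apps: list[AppRecord], cutoff: date):
    """Train on apps modified before `cutoff`; test on apps modified on or after it."""
    train = [a for a in apps if a[1] < cutoff]
    test = [a for a in apps if a[1] >= cutoff]
    return train, test
```

In the time-aware split every training app is strictly older than every test app, so no app from the future can inform a prediction about the past; the random split offers no such guarantee.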

“…for precision, recall, and the F1 score; and the Total number of apps. To evaluate RevealDroid in a time-aware scenario, we followed the methodology described by Allix et al [56]. Specifically, we extracted the modification date of the classes.dex file in each app's APK file.…”

Section: A31 Rq1: Detection Accuracy (mentioning, confidence: 99%)
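Because an APK is a ZIP archive, the modification date of classes.dex mentioned in the statement above can be read from the archive's entry metadata. The sketch below uses Python's standard `zipfile` module; the function name `dex_modification_date` is hypothetical, and it assumes the APK stores its main bytecode in a top-level entry named `classes.dex`.

```python
import zipfile
from datetime import datetime


def dex_modification_date(apk) -> datetime:
    """Return the last-modified timestamp recorded for classes.dex in an APK.

    `apk` may be a path or a binary file-like object. The value comes from the
    ZIP entry's DOS date/time field: two-second resolution, no time zone.
    """
    with zipfile.ZipFile(apk) as zf:
        info = zf.getinfo("classes.dex")  # raises KeyError if the entry is missing
        return datetime(*info.date_time)
```

The timestamp only reflects what the packaging tool recorded, so a tool that normalizes or rewrites entry timestamps would make this date unreliable as a proxy for an app's age.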


Self-protection of Android systems from inter-component communication attacks

Hammad, Garcia, Malek (2018)

Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering


“…In a time-agnostic scenario, training and testing as part of machine learning is conducted without considering the age of apps in the dataset. This scenario has been utilized to evaluate an overwhelming majority of machine-learning-based Android malware-detection approaches [13]. A time-aware scenario uses the modification date of apps to determine training and testing sets, which avoids training on apps from the future to test on apps from the past.…”

Section: Rq1: Detection Accuracy (mentioning, confidence: 99%)

Untitled

TOSEM (2018)

The number of malicious Android apps is increasing rapidly. Android malware can damage or alter other files or settings, install additional applications, and so on. To determine such behaviors, a security analyst can significantly benefit from identifying the family to which an Android malware belongs rather than only detecting if an app is malicious. Techniques for detecting Android malware, and determining their families, lack the ability to handle certain obfuscations that aim to thwart detection. Moreover, some prior techniques face scalability issues, preventing them from detecting malware in a timely manner. To address these challenges, we present a novel machine-learning-based Android malware detection and family identification approach, RevealDroid, that operates without the need to perform complex program analyses or to extract large sets of features. Specifically, our selected features leverage categorized Android API usage, reflection-based features, and features from native binaries of apps. We assess RevealDroid for accuracy, efficiency, and obfuscation resilience using a large dataset consisting of more than 54,000 malicious and benign apps. Our experiments show that RevealDroid achieves an accuracy of 98% in detection of malware and an accuracy of 95% in determination of their families. We further demonstrate RevealDroid's superiority against state-of-the-art approaches. CCS Concepts: • Security and privacy → Software security engineering; • Software and its engineering → Software reliability;


Attacking Speaker Recognition Systems with Phoneme Morphing

Turner, Lovisotto, Martinovic (2019)

Lecture Notes in Computer Science

As voice interfaces become more widely available they increasingly implement speaker recognition, to provide both personalized functionalities and security via authentication. In this paper, we present a method that transforms the voice of one person so that it resembles the voice of a victim, such that it can be used to deceive speaker recognition systems into believing an utterance was spoken by the victim. The transformation only requires short pieces of audio recordings from the source and victim voices, and does not require specific words to be spoken by the victim. We show that the attack can be improved by using a population of source voices and we provide a metric to identify promising source voices, from within such a population. We evaluate our attack along a set of dimensions, including: varying quantity, quality and types of known victim audio, verification and identification systems, white- and black-box models and both over-the-wire and over-the-air access. We test the audio transformation on two different proprietary models: (i) the Azure Speaker Recognition API and (ii) the Siri voice activation of an Apple iPhone, showing that individuals can easily be impersonated by obtaining as little as one minute of their audio, even when such audio is recorded in noisy conditions. With attempts from only three source voices, our attack achieves success rates of over 40% in the weakest assumption scenario against the Azure Verification API and rates of over 80% in all scenarios against Siri.
