CTF-All-In-One

8.15 EMULATOR vs REAL PHONE: Android Malware Detection Using Machine Learning


What is your take-away message from this paper

The authors investigate machine-learning-based Android malware detection that uses dynamic analysis on real devices, and compare it with emulator-based detection.

What are motivations for this work

malware

The rapid growth in the number of malware samples targeting Android devices has highlighted the need for efficient mechanisms that can detect zero-day malware.

anti-emulator techniques

Sophisticated Android malware employs detection-avoidance techniques to hide its malicious activities from analysis tools. These include a wide range of anti-emulator techniques, in which the malware detects that it is running inside an emulator and hides its malicious behavior.
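
To make the anti-emulator idea concrete, here is a minimal Python sketch of the kind of environment check such malware might perform. It inspects a dictionary of Android system properties (as could be read with `adb shell getprop`). The property names and values below are common emulator indicators chosen for illustration; they are an assumption, not a list taken from the paper, and real malware would run equivalent checks in Java/Kotlin on the device itself.

```python
from typing import Dict, List

# Illustrative emulator indicators (assumption: common Android emulator
# artifacts, not the list used by the paper or by any specific malware family).
EMULATOR_HINTS = {
    "ro.kernel.qemu": lambda v: v == "1",                       # QEMU-based kernel
    "ro.hardware": lambda v: v in ("goldfish", "ranchu"),       # emulator board names
    "ro.product.model": lambda v: "sdk" in v.lower(),           # e.g. "sdk_gphone_x86"
    "ro.build.fingerprint": lambda v: v.startswith("generic"),  # generic build images
}


def emulator_indicators(props: Dict[str, str]) -> List[str]:
    """Return the names of properties whose values look emulator-like."""
    return [
        name
        for name, looks_emulated in EMULATOR_HINTS.items()
        if name in props and looks_emulated(props[name])
    ]


def looks_like_emulator(props: Dict[str, str]) -> bool:
    """An evasive sample would stay dormant if this returns True."""
    return bool(emulator_indicators(props))


# Example property sets: an emulator image and a real handset (hypothetical values).
emulator_props = {"ro.kernel.qemu": "1", "ro.hardware": "ranchu",
                  "ro.product.model": "sdk_gphone_x86"}
phone_props = {"ro.hardware": "qcom", "ro.product.model": "Pixel 3"}
print(emulator_indicators(emulator_props))  # ['ro.kernel.qemu', 'ro.hardware', 'ro.product.model']
print(looks_like_emulator(phone_props))     # False
```

Running the same sample on a real phone sidesteps this whole class of check, which is the motivation for device-based analysis.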

What is the proposed solution

Hence, we have designed and implemented a Python-based tool that enables dynamic analysis on real phones, automatically extracts dynamic features and potentially mitigates anti-emulation detection. Furthermore, to validate this approach, we undertake a comparative analysis of emulator-based vs device-based detection using several machine learning algorithms. We examine the performance of these algorithms in both environments after investigating how effectively run-time features can be obtained in each.

phone based dynamic analysis and feature extraction

Since our aim is to compare emulator-based detection with device-based detection, we need to extract features for supervised learning from both environments. For the emulator-based learning, we utilized the DynaLog dynamic analysis framework.
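
The authors' phone-side tool is not reproduced in these notes. As a rough sketch under stated assumptions, a Python harness that analyses one app on a USB-connected phone could drive it with standard `adb` commands: clear the log, install the APK, exercise the app with the `monkey` random-event generator, dump the log and uninstall. The function names `analysis_commands` and `run_on_phone`, the default event count and the timeout are hypothetical; this is neither DynaLog's nor the authors' interface.

```python
import subprocess
from typing import List, Optional


def analysis_commands(apk_path: str, package: str, events: int = 500,
                      serial: Optional[str] = None) -> List[List[str]]:
    """Build the adb command sequence for analysing one app (hypothetical harness)."""
    adb = ["adb"] + (["-s", serial] if serial else [])  # target a specific device
    return [
        adb + ["logcat", "-c"],                   # clear the log buffer
        adb + ["install", "-r", apk_path],        # install, replacing any old copy
        adb + ["shell", "monkey", "-p", package,  # send pseudo-random UI events
               "-v", str(events)],
        adb + ["logcat", "-d"],                   # dump the log and exit
        adb + ["uninstall", package],             # clean up for the next app
    ]


def run_on_phone(apk_path: str, package: str, events: int = 500,
                 serial: Optional[str] = None, timeout: int = 300) -> str:
    """Run the sequence on a connected phone and return the captured log."""
    log = ""
    for cmd in analysis_commands(apk_path, package, events, serial):
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout)
        if cmd[-2:] == ["logcat", "-d"]:
            log = result.stdout  # raw material for feature extraction
    return log
```

Calling `run_on_phone("sample.apk", "com.example.sample")` requires `adb` on the PATH and a phone with USB debugging enabled; the returned log would then be parsed into a feature vector. A real harness would also need to handle apps that crash or hang, since crashes were a key difference between the two environments in the experiments.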


Feature extraction

The DynaLog outputs are pre-processed into a file of feature vectors, one per application. The InfoGain feature-ranking algorithm in WEKA is then used to select the top 100 ranked features.
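
InfoGain ranking scores each feature by how much it reduces the entropy of the class label: IG(Y; X) = H(Y) - Σ_v P(X = v) · H(Y | X = v). Below is a minimal pure-Python sketch, assuming discrete (e.g. binary present/absent) feature vectors; WEKA's `InfoGainAttributeEval` additionally discretizes numeric attributes, which is omitted here. The helper names and the example API-call feature names are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(column: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for one discrete feature."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def top_k_features(vectors: List[List[int]], labels: List[int],
                   names: List[str], k: int = 100) -> List[str]:
    """Rank features by information gain and keep the k best (ties keep input order)."""
    scores = {name: info_gain([v[i] for v in vectors], labels)
              for i, name in enumerate(names)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy data: 1 = malware, 0 = benign; feature names are hypothetical API calls.
names = ["sendTextMessage", "getDeviceId", "openConnection"]
vectors = [[1, 1, 0], [1, 0, 1], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 0, 0]
print(top_k_features(vectors, labels, names, k=2))  # ['sendTextMessage', 'getDeviceId']
```

In the toy data only `sendTextMessage` separates the classes perfectly (IG = 1 bit); the other two features carry no information, so they tie at 0 and keep their input order.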

Machine learning classifiers

The features were divided into five different sets to compare the performance of the machine learning algorithms.

What is the work’s evaluation of the proposed solution

Dataset

The dataset used for the experiments consists of a total of 2444 Android applications. Of these, 1222 were malware samples obtained from 49 families of the Android malware genome project. The rest were 1222 benign samples obtained from Intel Security (McAfee Labs).

Machine learning algorithms

The following algorithms were used in the experiments:

Metrics

Five metrics were used to evaluate the performance of the detection approaches.
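
The five metrics are not listed in these notes, but the results report TPR, FPR and F-measure. As a sketch, these (plus TNR and FNR) can be computed from confusion-matrix counts with malware as the positive class. The function name and the example counts are hypothetical, and the F-measure here is for the malware class only, whereas WEKA can also report a class-weighted average.

```python
from typing import Dict


def detection_metrics(tp: int, fn: int, tn: int, fp: int) -> Dict[str, float]:
    """Common detection metrics with malware as the positive class."""
    tpr = tp / (tp + fn)   # detection rate (recall) on malware
    tnr = tn / (tn + fp)   # benign apps correctly accepted
    fpr = fp / (fp + tn)   # benign apps wrongly flagged, = 1 - TNR
    fnr = fn / (fn + tp)   # malware missed, = 1 - TPR
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tpr / (precision + tpr)  # harmonic mean of precision and TPR
    return {"TPR": tpr, "TNR": tnr, "FPR": fpr, "FNR": fnr, "F": f_measure}


m = detection_metrics(tp=90, fn=10, tn=80, fp=20)  # hypothetical counts
print(m["TPR"], m["FPR"], round(m["F"], 3))  # 0.9 0.2 0.857
```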

Experiment 1: Emulator vs Device analysis and feature extraction


Experiment 2: Emulator vs Device Machine learning detection comparison


Our experiments showed that several features were extracted more effectively from the phone than from the emulator on the same dataset. Furthermore, 23.8% more apps were fully analyzed on the phone than on the emulator.

This indicates that the phone is a better environment for efficient analysis, since far more apps crash when analyzed on the emulator.

Our phone-based analysis achieved up to a 0.926 F-measure, a 93.1% TPR and a 92% TNR (i.e. an 8% FPR) with the RandomForest classifier, and in general phone-based results were better than emulator-based results.
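
A quick arithmetic check shows why the reported 92% figure is almost certainly a true negative rate (TNR) rather than an FPR. With the balanced dataset (1222 malware, 1222 benign), precision is TP / (TP + FP) and the malware-class F-measure is 2 · precision · TPR / (precision + TPR). The sketch plugs in TPR = 0.931 with an 8% FPR and with a 92% FPR; only the former lands on the reported 0.926. It assumes the reported F-measure is close to the malware-class value, and the function name is hypothetical.

```python
def f_measure_from_rates(tpr: float, fpr: float,
                         n_malware: int = 1222, n_benign: int = 1222) -> float:
    """Malware-class F-measure implied by a TPR/FPR pair on a given dataset."""
    tp = tpr * n_malware  # expected true positives
    fp = fpr * n_benign   # expected false positives
    precision = tp / (tp + fp)
    return 2 * precision * tpr / (precision + tpr)


print(round(f_measure_from_rates(0.931, 0.08), 3))  # 0.926
print(round(f_measure_from_rates(0.931, 0.92), 3))  # 0.653
```

A 92% FPR would mean most benign apps were flagged, which would drive precision to about 0.5 and the F-measure far below the reported value.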

We therefore conclude that, to reduce the impact of malware anti-emulation and of the environmental shortcomings of emulators that affect analysis efficiency, it is important to develop more effective device-based machine learning detection solutions.

What is your analysis of the identified problem, idea and evaluation

Countermeasures against anti-emulation techniques are becoming increasingly important in Android malware detection.

What are the contributions

What are future directions for this research

Hence, future work will aim to investigate more effective, larger-scale device-based machine learning solutions using larger sample datasets. Future work could also investigate an alternative set of dynamic features to those used in this study.

What questions are you left with