The paper proposes a machine learning detection method based on the combinations of permissions an application actually uses and on its API calls.
The current Android system places no restriction on the number of permissions an application can request, so developers tend to request more permissions than they actually need in order to make sure the application runs successfully, which results in permission abuse.
Some traditional detection methods consider only the requested permissions and ignore whether they are actually used, which leads to the misidentification of some malware.
We present a machine learning detection method that is based on actually used permission combinations and API calls.
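The distinction between requested and actually used permissions can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mapping table `API_PERMISSION_MAP` is a small hand-written stand-in for a PScout-style API-to-permission mapping, and the helper names `actually_used_permissions` and `permission_combinations` are hypothetical.

```python
from itertools import combinations

# Illustrative stand-in for an API-to-permission mapping (assumption: the
# real mapping would come from a tool such as PScout and be far larger).
API_PERMISSION_MAP = {
    "android.telephony.SmsManager.sendTextMessage": {
        "android.permission.SEND_SMS",
    },
    "android.telephony.TelephonyManager.getDeviceId": {
        "android.permission.READ_PHONE_STATE",
    },
}


def actually_used_permissions(requested, api_calls, mapping=API_PERMISSION_MAP):
    """Keep only the requested permissions that some observed API call needs."""
    needed = set()
    for call in api_calls:
        needed |= mapping.get(call, set())
    return set(requested) & needed


def permission_combinations(permissions, size=2):
    """Enumerate permission combinations of a given size in a stable order."""
    return [tuple(combo) for combo in combinations(sorted(permissions), size)]


requested = {
    "android.permission.SEND_SMS",
    "android.permission.READ_PHONE_STATE",
    "android.permission.CAMERA",  # requested, but no observed call needs it
}
api_calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "android.telephony.TelephonyManager.getDeviceId",
]

used = actually_used_permissions(requested, api_calls)
pairs = permission_combinations(used)
```

Here `CAMERA` is requested but dropped from the feature set because no observed API call maps to it, which is the over-request case the paper targets.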
The framework consists of four main parts:
The authors collected a total of 2375 Android applications. The 1170 malware samples come from 23 families in the "genetic engineering" collection (presumably the Android Malware Genome Project). The 1205 benign samples come from the official Google market.
We evaluate the classification performance of five different algorithms on the feature sets extracted from the applications: API calls, permission combinations, actually used permission combinations together with API calls, and requested permissions. In addition, information gain and CFS feature selection algorithms are used to select useful features and improve the efficiency of the classifiers.
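To make the information gain step concrete, the following is a minimal sketch of ranking binary features by information gain. It covers information gain only (not CFS), and the function names and the toy samples are assumptions for illustration rather than the paper's actual pipeline or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the entropy left after splitting on the feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder


def rank_features(samples, labels, top_k):
    """Return the top_k feature names by information gain (ties broken by name)."""
    names = sorted({name for sample in samples for name in sample})
    scored = [
        (information_gain([sample.get(name, 0) for sample in samples], labels), name)
        for name in names
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]


# Toy data: 1 = malware, 0 = benign; feature names are made up.
labels = [1, 1, 0, 0]
samples = [
    {"SEND_SMS": 1, "INTERNET": 1},
    {"SEND_SMS": 1, "INTERNET": 0},
    {"SEND_SMS": 0, "INTERNET": 1},
    {"SEND_SMS": 0, "INTERNET": 0},
]
selected = rank_features(samples, labels, top_k=1)
```

In this toy set `SEND_SMS` separates the classes perfectly while `INTERNET` carries no information, so only the former is selected.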
Feature extraction shows some differences between requested permissions and actually used permissions, which is important for improving efficiency:
The experiments show that the feature set of actually used permission combinations and API calls achieves better performance:
The main idea of the paper is to use actually used permissions instead of declared permissions. However, PScout cannot provide a complete mapping between permissions and API calls, which may introduce errors.
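A toy sketch of this concern: if the mapping misses an entry, a permission that the application genuinely uses is silently dropped from the "actually used" set. The API names, permission names, and both mapping tables below are hypothetical and only serve to illustrate the effect.

```python
def used_permissions(requested, api_calls, mapping):
    """Requested permissions that are backed by at least one mapped API call."""
    covered = set()
    for call in api_calls:
        covered |= mapping.get(call, set())
    return set(requested) & covered


def permissions_lost(requested, api_calls, complete_mapping, partial_mapping):
    """Permissions recognized under the complete mapping but missed under the partial one."""
    return used_permissions(requested, api_calls, complete_mapping) - used_permissions(
        requested, api_calls, partial_mapping
    )


# Hypothetical mappings: the partial one lacks the entry for api.readContacts.
complete_mapping = {
    "api.sendSms": {"SEND_SMS"},
    "api.readContacts": {"READ_CONTACTS"},
}
partial_mapping = {
    "api.sendSms": {"SEND_SMS"},
}

lost = permissions_lost(
    requested={"SEND_SMS", "READ_CONTACTS"},
    api_calls=["api.sendSms", "api.readContacts"],
    complete_mapping=complete_mapping,
    partial_mapping=partial_mapping,
)
```

With the partial mapping, `READ_CONTACTS` would be treated as requested but unused, even though the application does call the corresponding API.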
Why not also evaluate the classifiers obtained when using declared permission combinations together with API calls?