An Ensemble-Based Malware Detection Model Using Minimum Feature Set

  • Ivan Zelinka Technical University of Ostrava, Czech Republic
  • Eslam Amer
Keywords: malware detection, machine learning, ensemble learning


Current commercial antivirus detection engines still rely on signature-based methods. However, with the rapid growth in the number of new malware samples, these methods are no longer adequate. In this paper, we introduce a malware detection model based on ensemble learning. The model is trained on a minimal set of significant features extracted from the file header. Evaluations show that the ensemble models slightly outperform individual classification models. Experimental evaluations show that our model can predict unseen malware with an accuracy of 0.998 and a false positive rate of 0.002. The paper also compares the performance of the proposed model with that of other machine learning techniques. We advocate machine-learning-based approaches as a replacement for conventional signature-based methods.
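
To make the two reported metrics and the ensemble idea concrete, the following is a minimal sketch only. It assumes hard majority voting over three hypothetical base classifiers and uses invented toy labels; the abstract does not state which ensemble scheme, base models, or data the authors used, so none of the names or numbers below come from the paper.

```python
# Illustrative sketch: hard majority voting plus accuracy / false positive rate.
# All labels are invented toy data (1 = malware, 0 = benign), not the paper's results.

def majority_vote(predictions):
    """Combine per-model label lists into one list by hard majority voting."""
    combined = []
    for votes in zip(*predictions):
        # A sample is flagged as malware when more than half the models say so.
        combined.append(1 if 2 * sum(votes) > len(votes) else 0)
    return combined


def accuracy_and_fpr(y_true, y_pred):
    """Return (accuracy, false positive rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    accuracy = (tp + tn) / len(pairs)
    # FPR = benign files wrongly flagged / all benign files.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical ground truth and outputs of three hypothetical base models.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
model_a = [1, 1, 0, 0, 0, 1, 0, 1]
model_b = [1, 0, 1, 0, 0, 0, 0, 1]
model_c = [1, 1, 1, 0, 1, 0, 0, 0]

ensemble = majority_vote([model_a, model_b, model_c])

for name, pred in [("model_a", model_a), ("model_b", model_b),
                   ("model_c", model_c), ("ensemble", ensemble)]:
    acc, fpr = accuracy_and_fpr(y_true, pred)
    print(f"{name}: accuracy={acc:.3f}, false positive rate={fpr:.3f}")
```

In this toy setup each base model errs on different samples, so the vote cancels the individual mistakes. That is the general intuition for why an ensemble may outperform its members, not a claim about the size of the gain reported in the paper.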


How to Cite
Zelinka, I. and Amer, E. 2019. An Ensemble-Based Malware Detection Model Using Minimum Feature Set. MENDEL 25, 2 (Dec. 2019), 1–10. DOI: