Deep Learning-Based Malware Detection Under Real-World Constraints: A Systematic Review of Class Imbalance, Concept Drift, and Interpretability

Abubakar Bello Bodinga; Ahmed Baita Garko; Nuruddeen Mahmud Ibrahim; Danlami Gabi

doi:10.63561/jca.v3i1.1206

Authors

Abubakar Bello Bodinga, Department of Computer Science, Abdullahi Fodio University of Science and Technology, Kebbi State, Nigeria; Information and Communication Technology Department, Usmanu Danfodiyo University Teaching Hospital, Sokoto State, Nigeria
Ahmed Baita Garko, Department of Computer Science, Federal University Dutse, Dutse, Jigawa State, Nigeria
Nuruddeen Mahmud Ibrahim, Department of Cyber Security, Nile University of Nigeria
Danlami Gabi, Department of Computer Science, Faculty of Computing, University of Technology, Malaysia, Skudai, Johor

DOI:

https://doi.org/10.63561/jca.v3i1.1206

Keywords:

Deep Learning, Malware, Real-World Constraints, Systematic Review, Class Imbalance

Abstract

Malware detection remains one of the most persistent and complex challenges in cybersecurity. The rapid evolution of attack techniques, fueled by the professionalization of cybercrime, continually outpaces traditional defenses. While deep learning has significantly enhanced detection capabilities, its real-world deployment is critically hampered by three interconnected and often overlooked challenges: extreme class imbalance, concept drift (including adversarial evolution), and the interpretability gap of black-box models. This Systematic Literature Review (SLR) synthesizes state-of-the-art research from 2014 to 2025 on malware detection across static, dynamic, and hybrid analysis methods, with a focused analysis of these three constraints. Following a PRISMA-guided methodology, this review analyzes 162 high-quality studies. It reveals that, while research has progressed from foundational deep learning applications to advanced solutions such as generative augmentation for imbalance, self-supervised test-time adaptation for drift, and integrated explainable AI (XAI) pipelines, critical gaps persist. Our synthesis yields five key insights: (1) deep learning enhances accuracy but remains brittle under real-world data imbalance and adversarial drift; (2) current drift adaptation strategies, including recent federated and hybrid approaches, seldom holistically model adversarial intent; (3) GAN-based augmentation improves minority-class detection but lacks robust, security-focused evaluation of synthetic samples; (4) interpretability studies, despite recent integration efforts, remain fragmented and are rarely validated with human analysts to ensure actionable intelligence; and, most critically, (5) no existing architecture jointly and seamlessly integrates continuous drift adaptation, dynamic imbalance correction, and operational interpretability. This review not only maps the evolution of these challenges but also crystallizes the pressing need for a unified framework. It provides the foundational justification for the proposed MAD-FIT (Malware Adaptive Detection with Fusion, Interpretation, and Training Dynamics) framework, which is designed to bridge these gaps and advance the field toward robust, adaptive, and trustworthy next-generation malware detection systems.

