2024 Symposium Posters


Malware Language Processing “MLP”: Developing a new paradigm for malware analysis and classification using Machine Learning and Artificial Intelligence



Primary Investigator:
Dongyan Xu

Project Members:
Solomon Sonya

Abstract
Malware continues to increase in prevalence and sophistication. Successfully exploiting networks and digital systems has become a highly profitable operation for malicious threat actors. VirusTotal reported a daily submission of 2M+ malware samples in March 2024 (VirusTotal, 2024). Of those 2 million daily submissions, over 1 million per day were unique malware samples. Traditional detection mechanisms, including antivirus software, fail to adequately detect new and varied malware (Jhaveri et al., 2022; Johnson and Haddad, 2021; Geis, 2019). Artificial Intelligence and Machine Learning models provide advanced capabilities that can enhance cybersecurity. Building a robust, automated, artificially intelligent malware analysis pipeline and producing new malware datasets, however, are not trivial. The purpose of this poster is to present current progress in our research aimed at developing a robust malware classification framework. We are developing this framework to automate malware analysis and feature extraction and to produce new, standardized malware datasets for future Machine Learning analysis. Additionally, this research reports on the status of a new Malware Ensemble Classification Facility that leverages several Machine Learning models to enhance the classification of malware. To our knowledge, this is the first research that utilizes Machine Learning to provide enhanced classification of an entire 200+ gigabyte malware family corpus consisting of 80K+ unique malware samples and 70+ malware families.