Marcus Botacin - Texas A&M
Students: Fall 2025, unless noted otherwise, sessions will be virtual on Zoom.
Join us live on Zoom >
( Register to receive a reminder )
Wednesday, Oct 29, 2025 04:30pm - 05:30pm ET
( Register to receive a reminder )
Wednesday, Oct 29, 2025 04:30pm - 05:30pm ET
Malware Detection under Concept Drift: Science and Engineering
Oct 29, 2025
Abstract
The current largest challenge in ML-based malware detection is maintaining high detection rates while samples evolve, causing classifiers to drift. What is the best way to solve this problem? In this talk, Dr. Botacin presents two views on the problem: the scientific and the engineering. In the first part of the talk, Dr. Botacin discusses how to make ML-based drift detectors explainable. The talk discusses how one can split the classifier knowledge into two: (1) the knowledge about the frontier between Malware (M) and Goodware (G); and (2) the knowledge about the concept of the (M and G) classes, to understand whether the concept or the classification frontier changed. The second part of the talk discusses how the experimental conditions in which the drift handling approaches are developed often mismatch the real deployment settings, causing the solutions to fail to achieve the desired results. Dr Botacin points out ideal assumptions that do not hold in reality, such as: (1) the amount of drifted data a system can handle, and (2) the immediate availability of oracle data for drift detection, when in practice, a scenario of label delays is much more frequent. The talk demonstrates a solution for these problems via a 5K+ experiment, which illustrates (1) how to explain every drift point in a malware detection pipeline and (2) how an explainable drift detector also makes online retraining to achieve higher detection rates and requires fewer retraining points than traditional approaches.About the Speaker

Page: >https://marcusbotacin.github.io/
Ways to Watch
