Behavioral Feature Extraction for Network Anomaly Detection
James P. Early
Tech report number: CERIAS TR 2005-55
This dissertation presents an analysis of the network traffic features commonly used in network-based anomaly detection systems. The analysis is designed to identify how the selection of a particular protocol attribute affects detection performance, and it provides a guide for making judicious feature selections when building network-based anomaly detection models. We introduce a protocol analysis methodology called Inter-flow versus Intra-flow Analysis (IVIA) for partitioning protocol attributes based on operational behavior. When protocol attributes are used as features for network anomaly detection models, the method aids in the construction of flow models and identifies both the attributes that contribute to model accuracy and those that are likely to generate false positive alerts. We introduce a set of data preprocessing operations that transform these previously identified "noisy" attributes into useful features for anomaly detection; we refer to these as behavioral features. Deriving this new class of features from observed measurements is feasible without undue computational effort, so the derivation can keep pace with network traffic. Empirical results using unsupervised learning show that models based on behavioral features can achieve higher classification accuracy with markedly lower false positive rates than their counterparts based on traditional packet header features. Behavioral features are also used in the context of supervised learning to build classifiers of server application flow behavior.
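The abstract does not spell out how IVIA performs its partitioning. Purely as a rough illustration of the inter-flow versus intra-flow distinction, the sketch below labels each protocol attribute by whether its value changes within a flow, between flows, or both. It assumes packets are dictionaries and that a flow is identified by a unidirectional (source address, source port, destination address, destination port) tuple. The function name `partition_attributes`, the packet representation, and the two boolean criteria are hypothetical; they are not taken from the dissertation and do not reproduce its method.

```python
from collections import defaultdict


def partition_attributes(packets, flow_key, attributes):
    """Hypothetical sketch: for each attribute, return a pair
    (varies_within_flows, varies_between_flows)."""
    # values[attr][flow] -> set of values of attr observed in that flow
    values = {a: defaultdict(set) for a in attributes}
    for pkt in packets:
        key = flow_key(pkt)
        for a in attributes:
            values[a][key].add(pkt[a])

    labels = {}
    for a in attributes:
        per_flow = values[a]
        # Intra-flow: does any single flow show more than one value?
        varies_within = any(len(v) > 1 for v in per_flow.values())
        # Inter-flow: do different flows show different sets of values?
        varies_between = len({frozenset(v) for v in per_flow.values()}) > 1
        labels[a] = (varies_within, varies_between)
    return labels


# Toy trace: two TCP connections to the same web server (illustrative data).
packets = [
    {"src": "10.0.0.1", "sport": 40001, "dst": "10.0.0.9", "dport": 80, "ttl": 64, "flags": "S"},
    {"src": "10.0.0.1", "sport": 40001, "dst": "10.0.0.9", "dport": 80, "ttl": 64, "flags": "A"},
    {"src": "10.0.0.2", "sport": 51515, "dst": "10.0.0.9", "dport": 80, "ttl": 128, "flags": "S"},
    {"src": "10.0.0.2", "sport": 51515, "dst": "10.0.0.9", "dport": 80, "ttl": 128, "flags": "A"},
]

# Assumed flow key: unidirectional 4-tuple (protocol omitted for brevity).
flow_key = lambda p: (p["src"], p["sport"], p["dst"], p["dport"])

labels = partition_attributes(packets, flow_key, ["sport", "ttl", "flags"])
print(labels)
```

In this toy trace, `flags` changes within each flow but shows the same set of values in both flows, while `sport` and `ttl` stay fixed within a flow and differ between flows. A real analysis would also need to handle bidirectional flows and flow termination, which this sketch ignores.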
2005-08