Knowledge Modeling of Phishing Emails
Tech report number
CERIAS TR 2016-3
This dissertation investigates whether or not malicious phishing emails are detected better when a meaningful representation of the email bodies is available. The natural language processing theory of Ontological Semantics Technology is used for its ability to model the knowledge representation present in the email messages. Known good and phishing emails were analyzed and their meaning representations fed into machine learning binary classifiers. Unigram language models of the same emails were used as a baseline for comparing the performance of the meaningful data. The end results show how a binary classifier trained on meaningful data is better at detecting phishing emails than a unigram language model binary classifier at least using some of the selected machine learning algorithms.
2016 – 8 – 6