A Semantic Baseline for Spam Filtering

Christian F. Hempelmann - Texas A&M University-Commerce

Jan 30, 2013

Abstract

This paper presents a meaning-based approach to spam filtering that distinguishes text without content, text with little content, and text with normal content, based on the amount of meaning that can be automatically processed in the way humans process it. The basic method assumes that a semantic analyzer will produce less output from semantically less well-formed input text than from semantically well-formed text. The method was pilot-tested on a corpus of blog spam. Future improvements are sketched, including a method to distinguish semantically unified from semantically disparate text. The tested method, and even more so the projected improvements, should take the spam-filtering arms race to a new level that is very costly to spam producers.
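The core assumption, that less meaningful text yields less analyzer output, can be illustrated with a minimal sketch. This is not the speaker's ontological-semantic analyzer: the toy lexicon, the names semantic_yield and classify, and the two thresholds are all hypothetical stand-ins chosen only to show the three-way split described in the abstract.

```python
import re

# Hypothetical toy lexicon mapping words to concept labels. It stands in for a
# real semantic analyzer, which the abstract does not specify.
TOY_LEXICON = {
    "filter": "FILTER-EVENT",
    "spam": "UNSOLICITED-MESSAGE",
    "blog": "WEB-LOG",
    "server": "COMPUTER",
    "blocks": "PREVENT-EVENT",
    "comments": "TEXT-MESSAGE",
    "pills": "MEDICATION",
}


def semantic_yield(text):
    """Return the fraction of word tokens the toy analyzer maps to a concept."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    analyzed = sum(1 for token in tokens if token in TOY_LEXICON)
    return analyzed / len(tokens)


def classify(text, low=0.1, high=0.4):
    """Bucket text by semantic yield; the thresholds are assumed, not measured."""
    score = semantic_yield(text)
    if score < low:
        return "no content"
    if score < high:
        return "little content"
    return "normal content"


if __name__ == "__main__":
    samples = [
        "The server blocks spam comments",
        "cheap pills here now click fast buy today",
        "xqzt vrrp lkjh qwpo",
    ]
    for sample in samples:
        print(f"{classify(sample):15} {sample}")
```

In a real system the lexicon lookup would be replaced by the output of a full semantic analyzer, and the thresholds would have to be calibrated on a labeled corpus such as the blog spam mentioned above.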

About the Speaker

Christian F. Hempelmann is Assistant Professor of Computational Linguistics and Director of the Ontological Semantic Technology Lab at Texas A&M-Commerce. He received his PhD in 2003 from Purdue University, specializing in humor and in ontological semantics and NLP applied to information security at the Center for Education and Research in Information Assurance and Security (CERIAS). After a post-doc in psychology at Memphis University and a professorship at Georgia Southern University, he has worked in the NLP industry since 2006, first at the Internet search engine hakia.com, then at Riverglass, Inc., developing full-scale ontological-semantic solutions. He is a member of the Editorial Board of the International Journal on Advances in Intelligent Systems and the Journal for Humor Research and has (co-)authored over forty articles.

Unless otherwise noted, the security seminar is held on Wednesdays at 4:30 P.M. in STEW G52, West Lafayette Campus.

© 1999-2013 Purdue University. All rights reserved.

Use/Reuse Guidelines

CERIAS Seminar materials are intended for educational, non-commercial use only and any or all commercial use is prohibited. Any use must attribute "The CERIAS Seminar at Purdue University." Opinions expressed in the recordings are not necessarily representative of the views of CERIAS or of Purdue University.