CERIAS - Center for Education and Research in Information Assurance and Security

Purdue University - Discovery Park

A Semantic Baseline for Spam Filtering

Christian F. Hempelmann - Texas A&M University-Commerce

Jan 30, 2013

Size: 264.1MB



This paper presents a meaning-based method for spam filtering that distinguishes text with no content, text with little content, and text with normal content, based on how much meaning can be processed automatically in the way humans process it. The basic method assumes that a semantic analyzer will produce less output from semantically ill-formed input text than from semantically well-formed text. The method was pilot-tested on a corpus of blog spam. Future improvements are sketched, including a method to distinguish semantically unified text from semantically disparate text. The tested method, and even more so the projected improvements, will open the way to taking the spam-filtering arms race to a new level, one that is very costly for spam producers.
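To make the core assumption concrete, here is a minimal sketch in Python. It is not the author's analyzer: the tiny lexicon (`LEXICON`), the table of licensed concept pairs (`LICENSED`), the stopword list, the scoring formula in the hypothetical `meaning_score`, and the thresholds in the hypothetical `classify` are all illustrative assumptions. The toy "analyzer" maps content words to concepts and counts adjacent concept pairs that its selectional constraints allow to combine, so text that yields little such output is scored as having little or no content.

```python
import re

# Hypothetical toy lexicon: content word -> ontology concept.
LEXICON = {
    "dog": "ANIMAL", "cat": "ANIMAL", "chased": "PURSUE", "ate": "INGEST",
    "food": "FOOD", "owner": "HUMAN", "fed": "FEED", "park": "PLACE",
}

# Hypothetical selectional constraints: concept pairs allowed to combine
# when they appear next to each other (in this order).
LICENSED = {
    ("ANIMAL", "PURSUE"), ("PURSUE", "ANIMAL"), ("ANIMAL", "INGEST"),
    ("INGEST", "FOOD"), ("HUMAN", "FEED"), ("FEED", "ANIMAL"),
}

# Function words are ignored; they carry no concept of their own here.
STOPWORDS = {"the", "a", "an", "in", "its", "his", "her", "to", "of"}


def meaning_score(text: str) -> float:
    """Return the share of 'semantic output' the toy analyzer produces, in [0, 1)."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    if not tokens:
        return 0.0
    # Output part 1: concepts the lexicon can assign.
    concepts = [LEXICON[t] for t in tokens if t in LEXICON]
    # Output part 2: adjacent concepts that the constraints let combine.
    links = sum(1 for a, b in zip(concepts, concepts[1:]) if (a, b) in LICENSED)
    # Each token yields at most one concept and at most one link.
    return (len(concepts) + links) / (2 * len(tokens))


def classify(score: float) -> str:
    """Map a score to the three content classes (hypothetical thresholds)."""
    if score < 0.2:
        return "no content"
    if score < 0.6:
        return "little content"
    return "normal content"


if __name__ == "__main__":
    for sample in ["The dog chased the cat", "The food chased the owner",
                   "xqz brrt viagra kwik"]:
        s = meaning_score(sample)
        print(f"{sample!r}: {s:.2f} -> {classify(s)}")
```

With these toy tables, "The dog chased the cat" scores 5/6 and is classified as normal content, "The food chased the owner" (real words, but no licensed combinations) falls into the little-content band, and a string of unknown tokens scores 0. A real system would replace the lexicon and pair table with a full ontology and semantic analyzer and would tune any thresholds on labeled data.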

About the Speaker

Christian F. Hempelmann is Assistant Professor of Computational Linguistics and Director of the Ontological Semantic Technology Lab at Texas A&M-Commerce. He received his PhD from Purdue University in 2003, specializing in ontological semantics, in NLP applied to information security at the Center for Education and Research in Information Assurance and Security (CERIAS), and in humor. Following a postdoc in psychology at the University of Memphis and a professorship at Georgia Southern University, he moved to the NLP industry in 2006, working first at the Internet search engine hakia.com and then at Riverglass, Inc., developing full-scale ontological-semantic solutions. He is a member of the editorial boards of the International Journal on Advances in Intelligent Systems and the Journal for Humor Research, and he has (co-)authored over forty articles.

Unless otherwise noted, the security seminar is held on Wednesdays at 4:30 p.m. in STEW G52 (Suite 050B) on the West Lafayette campus.


The views, opinions and assumptions expressed in these videos are those of the presenter and do not necessarily reflect the official policy or position of CERIAS or Purdue University. All content included in these videos is the property of Purdue University, the presenter and/or the presenter’s organization, and is protected by U.S. and international copyright laws. The collection, arrangement and assembly of all content in these videos and on the hosting website is the exclusive property of Purdue University. You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any other way exploit any part of the copyrighted material without permission from CERIAS, Purdue University.