A Semantic Baseline for Spam Filtering
Christian F. Hempelmann - Texas A&M University-Commerce
Jan 30, 2013
Abstract
This paper presents a meaning-based approach to spam filtering that distinguishes text without content from text with little content and from text with normal content, based on the amount of meaning that can be automatically processed in the way humans process it. The basic method assumes that a semantic analyzer will produce less output from semantically ill-formed input text than from semantically well-formed text. The method was pilot-tested on a corpus of blog spam. Future improvements, including a method to distinguish semantically unified from semantically disparate text, are sketched. The tested method, and even more so the projected improvements, will open the way to taking the spam filtering arms race to a new level that is very costly to spam producers.
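The core assumption of the abstract, that an analyzer yields less output for text with less meaning, can be illustrated with a minimal sketch. This is not the speaker's ontological-semantic analyzer: the lexicon `TOY_LEXICON`, the functions `analyze`, `meaning_density`, and `classify`, and the two thresholds are all hypothetical stand-ins chosen only to show the shape of the idea, with a simple word-to-concept lookup playing the role of the analyzer.

```python
import re

# Hypothetical toy lexicon mapping words to concept labels. A real
# ontological-semantic analyzer would be far richer than a word lookup.
TOY_LEXICON = {
    "doctor": "PHYSICIAN",
    "patient": "PATIENT",
    "treated": "TREAT",
    "hospital": "HOSPITAL",
    "buy": "BUY",
    "cheap": "LOW-COST",
    "pills": "MEDICATION",
}


def analyze(text):
    """Return (tokens, concepts) where concepts are the lexicon hits."""
    tokens = re.findall(r"[a-z]+", text.lower())
    concepts = [TOY_LEXICON[t] for t in tokens if t in TOY_LEXICON]
    return tokens, concepts


def meaning_density(text):
    """Fraction of tokens for which the toy analyzer produced output."""
    tokens, concepts = analyze(text)
    return len(concepts) / len(tokens) if tokens else 0.0


def classify(text, low=0.1, high=0.4):
    """Bucket text by analyzer output; thresholds are arbitrary assumptions."""
    density = meaning_density(text)
    if density < low:
        return "no content"
    if density < high:
        return "little content"
    return "normal content"


if __name__ == "__main__":
    for sample in (
        "The doctor treated the patient in the hospital",
        "xqz vlk pills qwerty zzz foo bar",
        "asdf qwer zxcv",
    ):
        print(f"{classify(sample):15s} {sample}")
```

A lookup like this only measures how many words the analyzer recognizes; it does not judge whether the recognized concepts combine into a well-formed meaning, which is what the method described in the abstract relies on.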
About the Speaker
Christian F. Hempelmann is Assistant Professor of Computational Linguistics and Director of the Ontological Semantic Technology Lab at Texas A&M-Commerce. He received his PhD in 2003 from Purdue University with a specialization in ontological semantics and NLP applied to information security, at the Center for Education and Research in Information Assurance and Security (CERIAS), and to humor. After a post-doc in psychology at the University of Memphis and a professorship at Georgia Southern University, he has worked in the NLP industry since 2006, first at the Internet search engine hakia.com, then at Riverglass, Inc., developing full-scale ontological-semantic solutions. He is a member of the Editorial Board of the International Journal on Advances in Intelligent Systems and the Journal for Humor Research and has (co-)authored over forty articles.
The views, opinions and assumptions expressed in these videos are those of the presenter and do not necessarily reflect the official policy or position of CERIAS or Purdue University. All content included in these videos is the property of Purdue University, the presenter and/or the presenter’s organization, and is protected by U.S. and international copyright laws. The collection, arrangement and assembly of all content in these videos and on the hosting website are the exclusive property of Purdue University. You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any other way exploit any part of copyrighted material without permission from CERIAS, Purdue University.