Christian F. Hempelmann - Texas A&M University-Commerce
"A Semantic Baseline for Spam Filtering"
Jan 30, 2013
Abstract
This paper presents a meaning-based approach to spam filtering that distinguishes text without content, text with little content, and text with normal content, based on the amount of meaning that can be automatically processed in the way humans do. The basic method assumes that a semantic analyzer will produce less output from semantically less well-formed input text than from semantically well-formed text. The method was pilot-tested on a corpus of blog spam. Future improvements, including a method to distinguish semantically unified from semantically disparate text, are sketched. The tested method, and even more so the projected improvements, open the way to taking the spam-filtering arms race to a new level that is very costly for spam producers.
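As a rough illustration of the three-way distinction described in the abstract, the following minimal Python sketch scores a text by how much of it a stand-in "analyzer" can process. Everything here is an assumption made for illustration: the toy lexicon, the function names `meaning_ratio` and `classify`, and the thresholds `low` and `high` are hypothetical, and simple lexicon coverage is swapped in for the ontological semantic analyzer the talk refers to, which is not specified here.

```python
import re

# Hypothetical toy lexicon; a stand-in for the knowledge resources of a real
# semantic analyzer, which this sketch does not attempt to reproduce.
TOY_LEXICON = {
    "the", "a", "about", "spam", "filter", "blocks", "message",
    "blog", "comment", "reader", "writes", "post",
}


def meaning_ratio(text, lexicon=TOY_LEXICON):
    """Return the fraction of word tokens the stand-in analyzer recognizes."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in lexicon)
    return known / len(tokens)


def classify(text, low=0.2, high=0.6):
    """Bucket a text by its ratio; the thresholds are illustrative assumptions."""
    ratio = meaning_ratio(text)
    if ratio < low:
        return "no content"
    if ratio < high:
        return "little content"
    return "normal content"
```

With this toy lexicon, `classify("xqzv bnmk plrt")` returns `"no content"`, `classify("nice post about zzkq wwpl")` returns `"little content"`, and `classify("the reader writes a comment about the post")` returns `"normal content"`. A real system along the lines of the abstract would measure the output of a full semantic analysis rather than word coverage.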
About the Speaker
Christian F. Hempelmann is Assistant Professor of Computational Linguistics and Director of the Ontological Semantic Technology Lab at Texas A&M-Commerce. He received his PhD in 2003 from Purdue University, specializing in ontological semantics and NLP applied to information security, at the Center for Education and Research in Information Assurance and Security (CERIAS), and to humor. After a post-doc in psychology at Memphis University and a professorship at Georgia Southern University, he has worked in the NLP industry since 2006, first at the Internet search engine hakia.com, then at Riverglass, Inc., developing full-scale ontological-semantic solutions. He is a member of the Editorial Board of the International Journal on Advances in Intelligent Systems and the Journal for Humor Research and has (co-)authored over forty articles.
Unless otherwise noted, the security seminar is held on Wednesdays at 4:30 P.M. in STEW G52 (Suite 050B), West Lafayette Campus.