CERIAS - Center for Education and Research in Information Assurance and Security

Purdue University - Discovery Park

Using Statistical Analysis to Locate Spam Web Pages

Dennis Fetterly - Microsoft

Dec 08, 2004



Commercial web sites depend more than ever on prominent placement in the result pages returned by search engines. "Spam" web pages are pages created for the sole purpose of misleading search engines and misdirecting traffic to target sites. Certain classes of spam pages, in particular those that are machine-generated, differ in some of their properties from web pages in general. As a result, these pages can be identified through statistical analysis. We have examined a variety of such properties, including linkage structure, page content, and page evolution, and have found that outliers in the statistical distributions of these properties are predominantly caused by web spam. This is joint work with Mark Manasse and Marc Najork.
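
The outlier idea in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistical test the authors used, so this example assumes a modified z-score based on the median and the median absolute deviation (MAD). The function name `flag_outliers`, the threshold of 3.5, and the sample link counts are hypothetical and chosen only for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Assumption: a median/MAD score stands in for whatever test the
    talk actually describes; the abstract does not name one.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that extreme
    # values cannot inflate the way they inflate a standard deviation.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so anything
        # different from it is treated as an outlier.
        return [i for i, v in enumerate(values) if v != med]
    # 0.6745 rescales the MAD so the score is comparable to a
    # standard z-score when the data are roughly normal.
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Made-up out-link counts for eight pages; the last one is extreme.
out_links_per_page = [10, 12, 11, 9, 13, 10, 11, 500]
print(flag_outliers(out_links_per_page))  # [7]
```

A flagged index only marks a page as statistically unusual for that one property. Whether an unusual page is actually spam would still need separate confirmation.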

About the Speaker

Dennis Fetterly is a Technologist in Microsoft Research's Silicon Valley lab, which he joined in May 2003. His research interests include a wide variety of web-related topics, including web crawling, the evolution and clustering of pages on the web, and identifying spam web pages.

Unless otherwise noted, the security seminar is held on Wednesdays at 4:30 P.M. in STEW G52 (Suite 050B), West Lafayette Campus.


The views, opinions and assumptions expressed in these videos are those of the presenter and do not necessarily reflect the official policy or position of CERIAS or Purdue University. All content included in these videos is the property of Purdue University, the presenter and/or the presenter’s organization, and is protected by U.S. and international copyright laws. The collection, arrangement and assembly of all content in these videos and on the hosting website is the exclusive property of Purdue University. You may not copy, reproduce, distribute, publish, display, perform, modify, create derivative works, transmit, or in any other way exploit any part of copyrighted material without permission from CERIAS, Purdue University.