Using Statistical Analysis to Locate Spam Web Pages
Dennis Fetterly - Microsoft
Dec 08, 2004
Abstract
Commercial web sites depend more than ever on prominent placement in the result pages returned by search engines. "Spam" web pages are pages created for the sole purpose of misleading search engines and misdirecting traffic to target sites. Certain classes of spam pages, in particular those that are machine-generated, differ in some of their properties from web pages in general. As a result, these pages can be identified through statistical analysis. We have examined a variety of such properties, including linkage structure, page content, and page evolution, and have found that outliers in the statistical distributions of these properties are predominantly caused by web spam. Joint work with Mark Manasse and Marc Najork.
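As a rough illustration of the idea in the abstract, the sketch below flags outliers in the distribution of a single page property. It is not the method presented in the talk: the abstract does not say which statistic was used, so the robust z-score based on the median absolute deviation, the function name `flag_outliers`, the threshold of 3.5, and the out-link counts are all assumptions chosen for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Uses the median absolute deviation (MAD), an assumed choice of
    statistic, because it is not distorted by the outliers themselves.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # No spread around the median, so this rule cannot rank anything.
        return []
    # 0.6745 rescales the MAD to be comparable to a standard deviation
    # for normally distributed data.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical out-link counts for eight pages (made-up numbers).
out_links = [12, 15, 9, 14, 11, 13, 10, 480]
suspects = flag_outliers(out_links)
print(suspects)  # indices of pages worth a closer look
```

A flagged index only marks a page as statistically unusual for that one property. Whether it is actually spam would still need to be checked, for example by combining several properties or by manual inspection.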
About the Speaker
Dennis Fetterly is a Technologist in Microsoft Research's Silicon Valley lab, which he joined in May 2003. His research interests span a wide variety of web-related topics, including web crawling, the evolution and clustering of pages on the web, and identifying spam web pages.
Unless otherwise noted, the security seminar is held on Wednesdays at 4:30 P.M. in STEW G52, West Lafayette Campus.
© 1999-2013 Purdue University. All rights reserved.
Use/Reuse Guidelines
CERIAS Seminar materials are intended for educational, non-commercial use only and any or all commercial use is prohibited. Any use must attribute "The CERIAS Seminar at Purdue University." Opinions expressed in the recordings are not necessarily representative of the views of CERIAS or of Purdue University.