Rahul Potharaju - Purdue University
"Towards Automated Problem Inference from Trouble Tickets"
Apr 17, 2013
Abstract
The growing demand for cloud services is driving the need to give users always-on, safe access to their data and applications. Examples include web search, social networking, email, e-commerce, video streaming, data analytics, and even mission-critical services such as power grid control. Such environments must be highly available and secure. This requirement is often met by having experts monitor the system 24x7 so that any problems are resolved within a reasonable time. The pressure to resolve problems as quickly as possible fosters a "whatever-it-takes-to-fix-the-problem" attitude among experts and produces a constant flow of informal text documenting the debugging steps taken. Understanding this informal text at scale is key to uncovering large-scale problem trends, which will enable us to learn from mistakes and improve system design.
In this talk, I will present NetSieve, a system we built to perform automated problem inference from trouble tickets. Specifically, I will show how statistical natural language processing (NLP) can be combined with knowledge representation, ontology modeling, and human-guided learning to automatically analyze the natural language text in trouble tickets and infer problem symptoms, troubleshooting activities, and resolution actions. I will also discuss the fundamental challenges that arise when extracting meaning from such massive open-domain text corpora. Finally, I will describe how we applied NetSieve in a large data center setting to automatically analyze 10K+ network trouble tickets, and how we used the results to improve several key network operations.
About the Speaker
Rahul Potharaju is a PhD student in the Computer Science department at Purdue University and a member of CERIAS, advised by Prof. Cristina Nita-Rotaru. He earned his Master's degree in Computer Science from Northwestern University in 2009. He has over two years of industrial research experience on collaborative projects with Microsoft Research, Redmond, and the Motorola Applied Research Center. His current work focuses on large-scale Internet measurements, problem inference systems, intrusion detection, the security of smartphone architectures, and the reliability of data centers from both a hardware and a software perspective. A recurring theme in his research is combining cross-domain techniques, such as natural language processing, with statistical machine learning and data mining to make surprising inferences in the networking and smartphone areas.
Unless otherwise noted, the Fall and Spring security seminar series is held on Wednesdays at 4:30 p.m. in STEW G52 (Suite 050B), West Lafayette Campus.