Yuhong Nan - Purdue University
Semantics-Driven, Learning-Based Privacy Discovery in Mobile Apps
Feb 26, 2020
Abstract
A long-standing challenge in analyzing information leaks within mobile apps is to automatically identify the code operating on sensitive data. Because all existing solutions rely on system APIs (e.g., IMEI, GPS location) or features of user interfaces (UI), content from app servers, such as a user's Facebook profile or payment history, falls through the cracks.
In this talk, I will introduce ClueFinder, a novel semantics-driven solution for automatic discovery of sensitive user data, including data from the server side. ClueFinder uses natural language processing (NLP) to automatically locate the program elements (variables, methods, etc.) of interest, and then performs a learning-based program structure analysis to accurately identify those that indeed carry sensitive content. Using this new technique, we analyzed over 400k popular apps, an unprecedented scale for this type of research. Our findings bring to light the pervasiveness of information leaks and the channels through which the leaks happen, including unintentional over-sharing across libraries and aggressive data acquisition behaviors.
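As a rough illustration of the first step only (locating program elements whose names suggest sensitive content), the sketch below splits identifiers into word tokens and matches them against a small keyword set. It is not ClueFinder's implementation: the keyword list, the function names `split_identifier` and `find_candidates`, and the simple set-intersection matching are all assumptions made for illustration, and the learning-based program structure analysis is omitted entirely.

```python
import re

# Hypothetical keyword list; the abstract does not specify the vocabulary used.
SENSITIVE_TERMS = {"password", "email", "phone", "address", "payment", "profile"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase word tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return [p.lower() for p in parts]


def find_candidates(identifiers):
    """Map each identifier to the sensitive terms found among its tokens.

    Identifiers with no matching token are left out of the result.
    """
    hits = {}
    for ident in identifiers:
        matched = sorted(set(split_identifier(ident)) & SENSITIVE_TERMS)
        if matched:
            hits[ident] = matched
    return hits


if __name__ == "__main__":
    names = ["getUserEmail", "payment_history", "mBtnOk"]
    for ident, terms in find_candidates(names).items():
        print(ident, terms)
```

Name matching of this kind only produces candidates. An identifier such as `emailButtonLabel` would also match, which is presumably why a second, structure-aware step is needed to confirm that an element actually carries sensitive content.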