Privacy in Text and Search
Principal Investigator: Chris Clifton
Text and search pose particular privacy challenges, as the AOL query log anonymization failure demonstrated. We are developing techniques that allow relevant texts to be identified while controlling disclosure of information, both by those searching for information and by those providing content. This work builds on previous success in text mining and privacy-preserving data mining to allow search and analysis of documents while respecting privacy and security constraints. Recent advances include:
- A new methodology to generate “cover queries” that effectively hide user intent from a search engine.
- A technique for efficiently comparing two document corpora to identify similar documents, without disclosing document contents.
- A method for generalizing text to protect against re-identification through information not removed by traditional de-identification techniques.
Ongoing research includes application of this work in support of healthcare research.
Ahmet Erhan Nergiz, Mehmet Ercan Nergiz, Thomas Pedersen, and Chris Clifton, “Practical and Secure Integer Comparison and Interval Check”, 2010 IEEE International Conference on Privacy, Security, Risk and Trust (PASSAT2010), Minneapolis, Minnesota, August 20-22, 2010.
Mummoorthy Murugesan, Wei Jiang, Chris Clifton, Luo Si and Jaideep Vaidya, “Efficient Privacy-Preserving Similar Document Detection”, The VLDB Journal, 19(4):457-475, August 2010.
Mummoorthy Murugesan and Chris Clifton, “Providing Privacy through Plausibly Deniable Search”, 2009 SIAM International Conference on Data Mining (SDM09), Sparks, Nevada, April 30-May 2, 2009.
Keywords: anonymization, privacy-preserving data mining, security