Shanti S. Gupta Distinguished Professor of Statistics; Professor of Computer Science (Courtesy)
A.B. in Math from Princeton University; Ph.D. from Yale University
Statistics, Data Visualization, Machine Learning, Massive Datasets, Computer Networking, Cybersecurity
Analysis of very large databases of packet traces to study a wide variety of network engineering problems. Research in methods of data visualization, data mining, statistical model building, and machine learning.
2 Wilcoxon Prizes and Youden Prize from Technometrics, Fellow of American Statistical Association, International Statistical Institute, American Association for the Advancement of Science, and Institute of Mathematical Statistics, ISI Highly Cited, Statistician of the Year, 1996, American Statistical Association.
ACM, IEEE, American Statistical Association, Institute of Mathematical Statistics, International Statistical Institute, The Interface Foundation of North America (Computer Science and Statistics)
120 articles and 4 books, listed in Google Scholar.
William S. Cleveland is the Shanti S. Gupta Distinguished
Professor of Statistics and Courtesy Professor of Computer Science
at Purdue University. His areas of methodological research are in
statistics, machine learning, and data visualization. Cleveland has
analyzed data sets ranging from very small to very large in his research
in cybersecurity, computer network performance and control, disease
surveillance, visual perception, environmental science, healthcare
engineering, and customer opinion polling.
In the course of this work, Cleveland has developed many new methods
for data analysis that are widely used throughout engineering, science,
medicine, and business. The work has been disseminated through 120
journal articles, 4 books, and hundreds of talks and tutorials. In 2002
he was selected as a Highly Cited Researcher by the American Society
for Information Science & Technology in the newly formed mathematics
category. In 1996 he was chosen national Statistician of the Year by
the Chicago Chapter of the American Statistical Association.
A major current research area being developed by Cleveland is divide
and recombine: an approach to the analysis of the massive datasets that
now occur ubiquitously in all technical fields. In divide and recombine,
the data are divided into subsets, analysis methods are applied to each
subset, and the output of each method is recombined across subsets. The
approach, which allows "embarrassingly parallel" computation, requires
the rethinking of the tools of data analysis to develop theory and methods
for division and recombination.
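
The three steps described above (divide, apply a method to each subset, recombine) can be illustrated with a small sketch. This is only an illustration under stated assumptions, not Cleveland's own software or method: the function names, the round-robin division, the choice of a least-squares line as the per-subset method, and the averaging used for recombination are all hypothetical choices made for the example.

```python
from statistics import mean


def divide(rows, n_subsets):
    """Divide: split rows into n_subsets subsets (round-robin; an assumed scheme)."""
    return [rows[i::n_subsets] for i in range(n_subsets)]


def fit_line(subset):
    """Apply: ordinary least-squares fit of y = a + b*x on a single subset."""
    xs = [x for x, _ in subset]
    ys = [y for _, y in subset]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in subset)
    slope = sxy / sxx
    return (y_bar - slope * x_bar, slope)


def recombine(fits):
    """Recombine: average the per-subset estimates (one simple, assumed choice)."""
    return (mean(a for a, _ in fits), mean(b for _, b in fits))


def divide_and_recombine(rows, n_subsets):
    subsets = divide(rows, n_subsets)
    # Each fit depends only on its own subset, so this loop could be
    # distributed across workers without any communication between them.
    fits = [fit_line(s) for s in subsets]
    return recombine(fits)


# Toy data lying exactly on the line y = 2 + 3x.
data = [(float(x), 2.0 + 3.0 * x) for x in range(100)]
intercept, slope = divide_and_recombine(data, n_subsets=4)
```

Because no subset's computation depends on any other, the per-subset loop is the "embarrassingly parallel" part. How to divide the data and how to recombine the outputs so that the result is statistically sound are the questions the approach leaves to new theory and methods.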
In cybersecurity, Cleveland and partners John Gerth and Pat Hanrahan
at Stanford Computer Science are putting divide and recombine to work
analyzing massive databases of packet traces: timestamps and headers.
They are building models and algorithms for surveillance and forensics
to help cybersecurity analysts detect, understand, and control attacks.