Machine Learning on Massive Data Sets
Alexander Gray
College of Computing, Georgia Institute of Technology
Tuesday, April 26, 2011, 15:00-16:00
This talk will discuss the new statistical and computational foundations demanded by next-generation challenges in data analysis. Two challenges of scale keep growing in importance and ubiquity: massive datasets and various curses of dimensionality. The talk will highlight new learning methods and new general algorithmic strategies for the fundamental "inner-loop" computations at the root of large classes of statistics and machine learning methods, both classical and modern. The work is general enough that it impacts other areas of scientific computing, such as physical simulation and linear algebra. Applications in a wide variety of areas will be given, as well as an overview of our unique open-source machine learning library.
Speaker Bio: Alexander Gray received Bachelor's degrees in Applied Mathematics and Computer Science from UC Berkeley and a PhD in Computer Science from Carnegie Mellon University, and worked in the Machine Learning Systems Group of NASA's Jet Propulsion Laboratory for 6 years. He currently directs the FASTlab (Fundamental Algorithmic and Statistical Tools Laboratory) at Georgia Tech, a group of ~20 people, including 12 PhD students, that works on the problem of how to perform machine learning, data mining, and statistics on massive datasets, and on related problems in scientific computing and applied mathematics. Employing a multi-disciplinary array of technical ideas (from discrete algorithms and data structures, computational geometry, computational physics, Monte Carlo methods, convex optimization, linear algebra, and distributed computing), the lab has developed the current fastest algorithms for several fundamental statistical methods, and also develops new statistical machine learning methods for difficult aspects of real-world data, such as in astrophysics and biology. This work has enabled high-profile scientific results which have been featured in Science and Nature, and has received a National Science Foundation CAREER award, three best paper awards, and three best paper award nominations. He has given tutorials and invited talks on efficient algorithms for machine learning at venues including ICML, NIPS, and SIAM Data Mining, and is a member of the National Academies Committee on the Analysis of Massive Data. He is a frequent invited speaker in the emerging area of astrostatistics/astroinformatics.
Contact: J. E. Terrill
Note: Visitors from outside NIST must contact Robin Bickel, (301) 975-3668, at least 24 hours in advance.