Biography: Suvrit Sra is an Associate Professor in the EECS Department at MIT, a core faculty member of the Laboratory for Information and Decision Systems (LIDS) and the Institute for Data, Systems, and Society (IDSS), and a member of the MIT-ML and Statistics groups. He obtained his PhD in Computer Science from the University of Texas at Austin. Before moving to MIT, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. He held visiting faculty positions at UC Berkeley (EECS) and Carnegie Mellon University (Machine Learning Department) during 2013–2014. His research bridges a number of mathematical areas, such as differential geometry, matrix analysis, convex analysis, probability theory, and optimization, with machine learning. He founded the OPT (Optimization for Machine Learning) series of workshops, held from 2008 to 2017 (OPT2008–OPT2017) at the NeurIPS (formerly NIPS) conference. He co-edited a book of the same name (MIT Press, 2011). He is also a co-founder and chief scientist of macro-eyes, a global healthcare+AI-for-good startup.
Title: Do we understand how to find critical points in nonsmooth optimization?
Abstract: Machine learning is full of nonconvex, nonsmooth optimization problems, yet the “nonsmoothness” is almost always swept under the rug. In this talk, I will not ignore this key property, and will discuss the computational complexity of finding critical points of a rich class of nonsmooth, nonconvex functions. In particular, the class chosen contains widely used ReLU neural networks as a special case. I will focus on two key ideas: first, that it is impossible to find an ϵ-stationary point using first-order methods in finite time; and second, that (δ,ϵ)-stationarity offers a natural alternative notion. I will describe a formal, implementable algorithm for finding points that satisfy this modified notion of stationarity, together with its complexity. Time permitting, I will also highlight some open directions and other recent progress.
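
For reference, the two notions of stationarity named in the abstract can be written out as below. The abstract gives no definitions, so this is only a sketch that assumes the usual formalization through the Clarke generalized gradient and the Goldstein δ-subdifferential. The symbols ∂f, ∂_δ f, and B_δ(x) are assumptions introduced here, not notation taken from the talk.

```latex
% Assumptions (not stated in the abstract): f is locally Lipschitz,
% \partial f is the Clarke generalized gradient, and B_\delta(x) is the
% closed ball of radius \delta around x. Requires amsmath.
\[
  \text{$\epsilon$-stationary:}\qquad
  \min_{g \in \partial f(x)} \|g\| \le \epsilon
\]
\[
  \partial_\delta f(x) := \operatorname{conv}\Bigl(\,\bigcup_{y \in B_\delta(x)} \partial f(y)\Bigr),
  \qquad
  \text{$(\delta,\epsilon)$-stationary:}\quad
  \min_{g \in \partial_\delta f(x)} \|g\| \le \epsilon
\]
```

Under these assumed definitions, take f(x) = |x|. Every x ≠ 0 has ∂f(x) = {sign(x)}, so no such point is ϵ-stationary for any ϵ < 1. Any x with |x| ≤ δ is nevertheless (δ,ϵ)-stationary, because B_δ(x) contains the origin and ∂_δ f(x) therefore contains 0.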