Data Science Seminar Series

The data science seminar series aims to promote research related to machine learning, computer science, statistics, and their interface. We invite both internal and external speakers to present the latest cutting-edge research. All staff and students are welcome to attend our virtual seminars!

Michaelmas Term 2021 

Monday 18 October 2021, 2-3pm - Rada Mihalcea (University of Michigan)

Biography: Rada Mihalcea is the Janice M. Jenkins Collegiate Professor of Computer Science and Engineering at the University of Michigan and the Director of the Michigan Artificial Intelligence Lab. Her research interests are in computational linguistics, with a focus on lexical semantics, multilingual natural language processing, and computational social sciences. She serves or has served on the editorial boards of Computational Linguistics, Language Resources and Evaluation, Natural Language Engineering, the Journal of Artificial Intelligence Research, IEEE Transactions on Affective Computing, and Transactions of the Association for Computational Linguistics. She was a program co-chair for EMNLP 2009 and ACL 2011, and a general chair for NAACL 2015 and *SEM 2019. She currently serves as ACL President. She is the recipient of a Presidential Early Career Award for Scientists and Engineers, awarded by President Obama (2009), and is an ACM Fellow (2019) and an AAAI Fellow (2021). In 2013, she was made an honorary citizen of her hometown of Cluj-Napoca, Romania.

Title: TextRank: Bringing Order Into Texts (Revisited)

Abstract: TextRank is a framework for the application of graph-based ranking algorithms to structures derived from text, which demonstrates how the synergy between graph-theoretical algorithms and graph-based text representations can result in efficient unsupervised methods for several natural language processing tasks. The original TextRank framework was proposed more than ten years ago, when it was found to be effective for several text processing applications, including word sense disambiguation, extractive summarization, and keyphrase extraction. Since then, TextRank has continued to be used in numerous language processing applications, oftentimes leading to performance comparable to state-of-the-art algorithms, despite being a light unsupervised methodology. In this talk, I will revisit TextRank and talk about some of its successful applications over the recent years.
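The core idea — running a PageRank-style ranking over a word co-occurrence graph — can be sketched in a few lines. This is an illustrative toy, not the authors' implementation; the function name, window size, and damping value are choices made here for the example:

```python
# Minimal TextRank-style keyword ranking: build an undirected word
# co-occurrence graph, then run PageRank-style power iteration on it.
# Illustrative sketch only, not the original TextRank implementation.

def textrank_keywords(words, window=2, damping=0.85, iters=50):
    # Link each word to the words appearing within a sliding window.
    neighbors = {w: set() for w in words}
    for i, w in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[j] != w:
                neighbors[w].add(words[j])
                neighbors[words[j]].add(w)
    # Power iteration of the PageRank update over the graph.
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        new = {}
        for w in neighbors:
            rank = sum(score[u] / len(neighbors[u])
                       for u in neighbors[w] if neighbors[u])
            new[w] = (1 - damping) + damping * rank
        score = new
    # Words sorted by score, highest first.
    return sorted(score, key=score.get, reverse=True)

tokens = "graph based ranking algorithms rank graph vertices by importance".split()
print(textrank_keywords(tokens)[:3])
```

Highly connected words (here, the repeated "graph") accumulate the most rank, which is what makes the method an effective unsupervised keyphrase extractor.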

Register here

Monday 8 November 2021, 3-4pm - Suvrit Sra (MIT)

Website

Biography: Suvrit Sra is an Associate Professor in the EECS Department at MIT, and also a core faculty member of the Laboratory for Information and Decision Systems (LIDS), the Institute for Data, Systems, and Society (IDSS), as well as a member of MIT-ML and Statistics groups. He obtained his PhD in Computer Science from the University of Texas at Austin. Before moving to MIT, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, Tübingen, Germany. He held visiting faculty positions at UC Berkeley (EECS) and Carnegie Mellon University (Machine Learning Department) during 2013-2014. His research bridges a number of mathematical areas such as differential geometry, matrix analysis, convex analysis, probability theory, and optimization with machine learning. He founded the OPT (Optimization for Machine Learning) series of workshops, held at the NeurIPS (formerly NIPS) conference from 2008 to 2017, and co-edited a book of the same name (MIT Press, 2011). He is also a co-founder and chief scientist of macro-eyes, a global healthcare+AI-for-good startup.

Title: Do we understand how to find critical points in nonsmooth optimization?

Abstract: Machine learning is full of nonconvex nonsmooth optimization problems, yet almost always the “nonsmoothness” is swept under the rug. In this talk, I will not ignore this key property, and will discuss the computational complexity of finding critical points of a rich class of nonsmooth, nonconvex functions. In particular, the class chosen contains widely used ReLU neural networks as a special case. I will focus on two key ideas: first, that it is impossible to find an ϵ-stationary point using first-order methods in finite time; and second, a natural alternative notion of (δ,ϵ)-stationarity. I will describe a formal, implementable algorithm and its complexity for finding this modified notion of stationarity. Time permitting, I will highlight some open directions and other recent progress.
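The contrast between the two notions can be illustrated numerically on the one-dimensional ReLU-like function f(x) = |x|: near the kink the gradient is always ±1, so no nearby point is ϵ-stationary for ϵ < 1, yet an average of gradients sampled from a δ-ball around 0 has near-zero norm. The helper names below are hypothetical, and this is only an illustration of the definition, not the algorithm from the talk:

```python
import random

# Numerical illustration of (δ,ϵ)-stationarity for f(x) = |x|.
# At x = 0 the gradient is ±1 at every nearby differentiable point,
# so no ϵ-stationary point exists for ϵ < 1; yet averaging gradients
# sampled over a δ-ball around 0 gives a vector of norm ≈ 0, so 0 is
# (δ,ϵ)-stationary. Hypothetical helper names; sketch only.

def grad_abs(x):
    # Gradient of |x| wherever it is differentiable.
    return 1.0 if x > 0 else -1.0

def averaged_grad_norm(x, delta, n_samples=10000, seed=0):
    # Average sampled gradients within the δ-ball around x: a crude
    # proxy for the minimal-norm element of the Goldstein subdifferential.
    rng = random.Random(seed)
    grads = [grad_abs(x + rng.uniform(-delta, delta)) for _ in range(n_samples)]
    return abs(sum(grads) / n_samples)

print(averaged_grad_norm(0.0, delta=0.1))  # near 0: (δ,ϵ)-stationary
print(averaged_grad_norm(1.0, delta=0.1))  # exactly 1: far from stationary
```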

Register here

Monday 15 November 2021, 3.30-4.30pm - Po-Ling Loh (University of Cambridge)

Website

Biography: I am a Lecturer in the Statistical Laboratory in the Department of Pure Mathematics and Mathematical Statistics at the University of Cambridge. My current research interests include high-dimensional statistics, optimization, network inference, robust statistics, and differential privacy. I am also interested in statistical applications to medical imaging and epidemiology. 

Title: A modern take on Huber regression

Abstract: In the first part of the talk, we discuss the use of a penalized Huber M-estimator for high-dimensional linear regression. We explain how a fairly straightforward analysis yields high-probability error bounds that hold even when the additive errors are heavy-tailed. However, the parameter governing the shape of the Huber loss must be chosen in relation to the scale of the error distribution. We discuss how to use an adaptive technique, based on Lepski's method, to overcome the difficulties traditionally faced by applying Huber M-estimation in a context where both location and scale are unknown.
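As a concrete illustration of the estimator's robustness, here is a toy one-parameter Huber regression fit by gradient descent (no penalty term; `tau` plays the role of the shape parameter that, as the abstract notes, must be matched to the error scale — all names and values here are choices made for the example):

```python
# Toy sketch of Huber M-estimation for one-parameter linear regression.
# The Huber loss is quadratic for small residuals and linear for large
# ones, so a single huge outlier barely moves the fit, unlike least squares.

def huber_grad(r, tau):
    # Derivative of the Huber loss: r for |r| <= tau, ±tau beyond.
    return r if abs(r) <= tau else tau * (1.0 if r > 0 else -1.0)

def huber_fit(xs, ys, tau=1.0, lr=0.01, iters=2000):
    # Plain gradient descent on the empirical Huber risk in the slope beta.
    beta = 0.0
    for _ in range(iters):
        g = sum(-x * huber_grad(y - beta * x, tau)
                for x, y in zip(xs, ys)) / len(xs)
        beta -= lr * g
    return beta

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 100.0]   # true slope 2, last response corrupted
print(huber_fit(xs, ys))           # slope stays close to 2
```

For comparison, ordinary least squares on the same data gives a slope above 10: the single corrupted response dominates the squared loss but contributes only a bounded gradient to the Huber loss.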

In the second part of the talk, we turn to a more complicated setting where both the covariates and responses may be heavy-tailed and/or adversarially contaminated. We show how to modify the Huber regression estimator by first applying an appropriate "filtering" procedure to the data based on the covariates. We prove that in low-dimensional settings, this filtered Huber regression estimator achieves near-optimal error rates. We further show that the commonly used least trimmed squares and least absolute deviation estimators may similarly be made robust to contaminated covariates via the same covariate filtering step. This is based on joint work with Ankit Pensia (UW-Madison) and Varun Jog (Cambridge).

Take a look at the slides from Po-Ling's talk. 

Register here

Monday 22 November 2021, 2-3pm - Vitaliy Kurlin (University of Liverpool)


Biography: Vitaliy Kurlin is a Data Scientist at the Materials Innovation Factory at Liverpool and a Royal Academy of Engineering Fellow at the Cambridge Crystallographic Data Centre. He leads the group developing the new area of Periodic Geometry for applications in crystallography and materials science.

Title: Mathematical Data Science for solid crystalline materials

Abstract: Most real data is ambiguous in the sense that the same real object has too many (often infinitely many) data representations. For example, any periodic lattice can be represented by infinitely many different linear bases. Data Science aims to define equivalence relations on real objects to make their classifications meaningful. The most natural equivalence of solid crystalline materials is rigid motion or isometry, because periodic crystal structures are determined in a rigid form.
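The basis ambiguity is easy to see computationally: two visibly different bases can generate exactly the same lattice, so a basis cannot serve as an isometry invariant. A toy sketch in 2D (the helper name is chosen here for the example, and this is not one of the talk's invariants):

```python
# Two different bases generating the same periodic lattice: the square
# lattice Z^2 from the standard basis and from a sheared basis. Any
# unimodular change of basis gives the same infinite point set, which is
# why a basis is an ambiguous representation of a crystal lattice.

def lattice_points(basis, n=3):
    # All integer combinations i*b1 + j*b2 with |i|, |j| <= n.
    (a, b), (c, d) = basis
    return {(i * a + j * c, i * b + j * d)
            for i in range(-n, n + 1) for j in range(-n, n + 1)}

b1 = [(1.0, 0.0), (0.0, 1.0)]   # square lattice, standard basis
b2 = [(1.0, 0.0), (1.0, 1.0)]   # same lattice, different (sheared) basis

p1, p2 = lattice_points(b1, 6), lattice_points(b2, 6)
# Restricted to a central region (away from truncation effects at the
# boundary), the two generated point sets coincide exactly.
central1 = {p for p in p1 if max(abs(p[0]), abs(p[1])) <= 3}
central2 = {p for p in p2 if max(abs(p[0]), abs(p[1])) <= 3}
print(central1 <= p2 and central2 <= p1)
```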

This talk will describe recent advances in isometry classifications of finite and periodic point sets. The implemented isometry invariants completely distinguished all (hundreds of thousands of) real periodic crystals in the world's largest Cambridge Structural Database. This is joint work with colleagues from the Data Science Theory and Applications group.

Register here

Monday 6 December 2021, 1-2pm - Negar Kiyavash (École polytechnique fédérale de Lausanne)



Biography: Negar Kiyavash is the chair of Business Analytics (BAN) at École polytechnique fédérale de Lausanne (EPFL) at the College of Management of Technology. Prior to joining EPFL, she was a faculty member at the University of Illinois, Urbana-Champaign, and at Georgia Institute of Technology. Her research interests are broadly in the area of statistical learning and applied probability with special focus on network inference and causality. She is a recipient of the NSF CAREER and AFOSR YIP awards.

Title: Database alignment: fundamental limits and efficient algorithms

Abstract: As data collection becomes ubiquitous, understanding the potential benefits, as well as the risks, posed by the availability of such large amounts of data becomes more pressing. Identifying how data from different sources relate to each other could allow data to be merged and augmented. On the positive side, this could help, for instance, in deducing the functionality of proteins by comparing the protein interaction networks of different species. On the negative side, such alignment could cause unintended exposure of confidential information. A famous case of such a breach occurred when customer data from the anonymized Netflix Prize database was revealed through alignment with public IMDb profiles.

In this talk we present information-theoretic results on database alignment, showing how the size of the databases and the correlation between their elements determine the success of alignment. Database alignment is closely related to the equally interesting problem of network alignment, a generalization of the graph isomorphism problem.
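The setting can be sketched with a toy experiment: one database is a permuted, noisy copy of another, and nearest-neighbour alignment recovers the hidden permutation when the noise is small relative to the separation between entries (i.e., when the databases are strongly correlated). This illustrates the problem setup only, not the talk's algorithms; all names and parameters are choices made for the example:

```python
import random

# Toy database alignment: B is a row-permuted, noisy copy of A.
# We align each row of A to its nearest row of B in Euclidean distance;
# with small noise (strong correlation) the true permutation is recovered.

def align(A, B):
    # For each row of A, return the index of the closest row of B.
    return [min(range(len(B)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(A[i], B[j])))
            for i in range(len(A))]

rng = random.Random(0)
n, d, sigma = 20, 5, 0.01
A = [[rng.random() for _ in range(d)] for _ in range(n)]
perm = list(range(n))
rng.shuffle(perm)                     # hidden correspondence to recover
B = [None] * n
for i in range(n):
    B[perm[i]] = [x + rng.gauss(0, sigma) for x in A[i]]

recovered = align(A, B)
print(sum(recovered[i] == perm[i] for i in range(n)) / n)  # fraction aligned
```

Increasing `sigma` (weakening the correlation) or shrinking `d` relative to `n` makes entries harder to distinguish and the recovered fraction drops, mirroring the information-theoretic thresholds the abstract refers to.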

Register here

Lent Term 2022 

Monday 24 January 2022, 2pm - Silvia Villa (University of Genoa)



Biography: TBC

Title: TBC

Abstract: TBC

Monday 31 January 2022, 2pm - Cynthia Rudin (Duke University)



Biography: Cynthia Rudin is a professor of computer science, electrical and computer engineering, statistical science, mathematics, and biostatistics & bioinformatics at Duke University. She directs the Interpretable Machine Learning Lab, whose goal is to design predictive models with reasoning processes that are understandable to humans. Her lab applies machine learning in many areas, such as healthcare, criminal justice, and energy reliability. She holds an undergraduate degree from the University at Buffalo, and a PhD from Princeton University. She is the recipient of the 2022 Squirrel AI Award for Artificial Intelligence for the Benefit of Humanity from the Association for the Advancement of Artificial Intelligence (the “Nobel Prize of AI”). She is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics. Her work has been featured in many news outlets including the NY Times, Washington Post, Wall Street Journal, and Boston Globe.

Title: TBC

Abstract: TBC

Monday 7 February 2022, 2pm - Joseph Salmon (Université de Montpellier)

Website

Biography: Since 2018, I have been a full professor at Université de Montpellier and an associate member of the INRIA Parietal team. For the spring and summer quarters of 2018, I was a visiting assistant professor in the Statistics Department at UW. From 2012 to 2018, I was an assistant professor at Télécom ParisTech. In 2011 and 2012, I was a postdoctoral associate at Duke University, working with Rebecca Willett.

Title: TBC

Abstract: TBC

Monday 14 February 2022, 2pm - Volkan Cevher (École polytechnique fédérale de Lausanne)

Website

Biography: Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA in 2005. He was a Research Scientist with the University of Maryland, College Park from 2006-2007 and also with Rice University in Houston, TX, from 2008-2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include machine learning, signal processing theory, optimization theory and methods, and information theory. Dr. Cevher is an ELLIS fellow and was the recipient of the Google Faculty Research Award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, an ERC Consolidator Grant in 2016, and an ERC Starting Grant in 2011.

Title: TBC

Abstract: TBC

Monday 7 March 2022, 2pm - Dino Sejdinovic (University of Oxford) 

Website

Biography: I am an Associate Professor in Statistics at the University of Oxford and a Fellow of Mansfield College. I conduct research at the interface between machine learning and statistical methodology.

Title: TBC

Abstract: TBC


Past Seminars

MT 2020 -  LT 2021 (PDF)