
Titles and Abstracts

Below you will find the titles and abstracts of our workshop speakers' talks, along with a short biography of each speaker.

Day One speakers - 27th March

Martin Anthony, London School of Economics


Biography - Martin Anthony is a Professor of Mathematics at LSE and is currently Head of the Department of Mathematics. His research interests lie in the mathematical theory of machine learning and in the theory of Boolean and pseudo-Boolean functions. 

Title - Sample width in classification.

Abstract - It has proven useful, when using the sign of a real-valued function for binary classification, to use functions that achieve a 'large margin' on a labelled training sample, since better generalization error bounds are then possible and such classifiers are also more robust. For general binary-valued functions, which do not arise directly from real-valued functions in this way, it is not immediately clear what could serve as an analogue of the margin. We investigate how alternative notions of 'regularity' of binary-valued functions with respect to a training sample can analogously be used to guide the selection of a 'good' classifier from the class.

Joint work with Joel Ratsaby. 
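For readers unfamiliar with the term, the margin referred to above has a standard definition (the textbook notion, not anything specific to this work): for a real-valued function $f$ used to classify via the sign of $f(x)$, and a labelled sample $S$,

```latex
\gamma(f, S) \;=\; \min_{1 \le i \le m} y_i\, f(x_i),
\qquad S = \big((x_1, y_1), \dots, (x_m, y_m)\big), \quad y_i \in \{-1, +1\}.
```

A large positive $\gamma$ means every training point is classified correctly with room to spare; the talk asks what can play this role when the classifier is binary-valued and no such real-valued magnitude is available.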

Moez Draief, Huawei Research 


Biography - Moez Draief is the Head of the Machine Learning Foundations group at Huawei Technologies. His team works on machine learning and optimization approaches to next-generation telecommunications networks such as Software Defined Networks and 5G wireless technology. Prior to joining Huawei, he was an Associate Professor in the Intelligent Systems and Networks group at Imperial College London. He has also held positions at Cambridge University, the University of Pennsylvania, and Microsoft Research. He is a Fellow of the Royal Statistical Society and a life fellow of Clare Hall College, Cambridge. He previously held a Marie Curie Fellowship and a Leverhulme Trust Fellowship.

Title - Machine Learning for Telecommunications Networks at Huawei.

Abstract - Next-generation wireless technology, known as 5G, is expected to offer in excess of 20 gigabits per second, compared to less than half that for 4G, with latencies below 1 millisecond, making it possible to download an HD movie in the blink of an eye and to roll out applications such as virtual and augmented reality. At Huawei, we are working on a number of research problems that will offer seemingly infinite bandwidth. This requires cleverly leveraging the data at our disposal throughout the network to make it autonomous and aware of the context it is operating in. I will describe a number of challenges in using machine learning approaches to enable this vision in settings such as edge caching, radio map acquisition, and application identification.
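The headline rates translate directly into transfer times. A back-of-the-envelope sketch (idealized: the 20 Gb/s figure is a peak rate, the 5 GB file size is an assumption, and protocol overhead is ignored):

```python
def download_seconds(file_gigabytes: float, link_gigabits_per_s: float) -> float:
    """Idealized transfer time: gigabytes -> gigabits, divided by the link rate."""
    return file_gigabytes * 8 / link_gigabits_per_s

print(download_seconds(5, 20))  # a 5 GB HD film at a 5G peak of 20 Gb/s: 2.0 s
print(download_seconds(5, 1))   # the same film on a 1 Gb/s link: 40.0 s
```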

Yang Cao, University of Edinburgh


Biography - Yang Cao is a research associate at LFCS, School of Informatics, the University of Edinburgh. He received his Ph.D. degree from the University of Edinburgh in 2016, supervised by Prof. Wenfei Fan. He has been developing methods for querying big data, from both fundamental and practical perspectives. He has published a number of papers in major database conferences and journals such as SIGMOD, PODS, VLDB, ICDE, and TODS. He is the recipient of a SIGMOD Best Paper Award, a Microsoft Research Asia Fellowship, and a Microsoft Research Asia Young Scholarship.

Title - Is Big Data Analytics beyond the Reach of Small Companies?

Abstract - Big data analytics is often prohibitively costly. It is typically conducted by parallel processing with a cluster of machines, and is considered a privilege of big companies that can afford the resources. This talk argues that big data analytics is accessible to small companies with constrained resources. As evidence, we present BEAS, a system for querying big data with bounded resources. BEAS advocates a resource-bounded query evaluation paradigm, based on a theory of bounded evaluation and a data-driven approximation scheme. As a proof of concept, BEAS has been found to improve the query evaluation of our industry collaborators by orders of magnitude.
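BEAS's actual machinery (bounded evaluation exploiting access constraints) is beyond a few lines, but the flavour of resource-bounded querying can be sketched with a generic sampling estimator: answer an aggregate while touching only a fixed number of tuples. The function and data below are illustrative, not part of BEAS:

```python
import random

def approx_sum(rows, budget, seed=42):
    """Estimate sum(rows) while reading at most `budget` rows:
    sample without replacement, then scale the sample total back up."""
    rng = random.Random(seed)
    sample = rng.sample(rows, min(budget, len(rows)))
    return sum(sample) * len(rows) / len(sample)

data = list(range(10_000))          # true sum: 49,995,000
estimate = approx_sum(data, 500)    # reads only 5% of the data
print(f"{estimate:,.0f}")
```

The estimator's error shrinks as the budget grows, which is the resource/accuracy trade-off that a bounded-evaluation system exposes to the user.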

Piotr Fryzlewicz, London School of Economics


Biography - Piotr Fryzlewicz is a professor of statistics in the Department of Statistics at the London School of Economics. He has previously worked at Winton Capital Management, the University of Bristol and Imperial College London. His research interests are in multiscale modelling and estimation, time series, change-point detection, high-dimensional statistical inference and statistical learning. He is a former Joint Editor of the Journal of the Royal Statistical Society Series B.

Title - Multiscale thinking in data analysis, recursive algorithms, and data-adaptive change-point detection.

Abstract - The talk starts on a general note: we first attempt to define a "multiscale" method / algorithm as a recursive program acting on a dataset in a suitable way. Wavelet transformations, unbalanced wavelet transformations and binary segmentation are all examples of multiscale methods in this sense. Using the example of binary segmentation, we illustrate the benefits of the recursive formulation of multiscale algorithms from the software implementation and theoretical tractability viewpoints. 

We then turn more specific and study the canonical problem of a-posteriori detection of multiple change-points in the mean of a piecewise-constant signal observed with noise. Even in this simple set-up, many publicly available state-of-the-art methods struggle for certain classes of signals. In particular, this underperformance is observed in methods that work by minimising a "fit to the data plus a penalty" criterion, the reason being that it is challenging to think of a penalty that works well over a wide range of signal classes. To overcome this issue, we propose a new approach whereby methods "learn" from the data as they proceed, and, as a result, operate differently for different signal classes. As an example of this approach, we revisit our earlier change-point detection algorithm, Wild Binary Segmentation, and make it data-adaptive by equipping it with a recursive mechanism for deciding "on the fly" how many sub-samples of the input data to draw, and where to draw them. This is in contrast to the original Wild Binary Segmentation, which is not recursive. We show that this significantly improves the algorithm, particularly for signals with frequent change-points.
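To make the recursive formulation concrete, here is a minimal noiseless sketch of plain binary segmentation with a CUSUM-type statistic (standard textbook material, not the talk's data-adaptive Wild Binary Segmentation; the threshold value is an arbitrary illustration):

```python
import math

def cusum_stat(x, b):
    """Weighted CUSUM statistic for a mean change after position b (1 <= b < len(x))."""
    n = len(x)
    left = sum(x[:b]) / b
    right = sum(x[b:]) / (n - b)
    return math.sqrt(b * (n - b) / n) * abs(left - right)

def binary_segmentation(x, threshold, offset=0):
    """Recursive binary segmentation: split at the CUSUM maximiser whenever
    the statistic exceeds the threshold, then recurse on both halves."""
    n = len(x)
    if n < 2:
        return []
    b = max(range(1, n), key=lambda j: cusum_stat(x, j))
    if cusum_stat(x, b) <= threshold:
        return []
    return (binary_segmentation(x[:b], threshold, offset)
            + [offset + b]
            + binary_segmentation(x[b:], threshold, offset + b))

# Noiseless piecewise-constant signal with mean changes after positions 50 and 100:
signal = [0.0] * 50 + [5.0] * 50 + [2.0] * 50
print(binary_segmentation(signal, threshold=3.0))  # → [50, 100]
```

Wild Binary Segmentation replaces the single global CUSUM scan with maxima over many random sub-intervals; the data-adaptive version described in the talk additionally decides recursively how many such sub-intervals to draw, and where.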

Neil Lawrence, Amazon 


Biography - Neil Lawrence leads Amazon Research Cambridge, where he is a Director of Machine Learning. He is on leave of absence from the University of Sheffield, where he is a Professor in Computational Biology and Machine Learning in the Department of Computer Science.

Neil’s main research interest is machine learning through probabilistic models. He focuses on both the algorithmic side of these models and their application. His recent focus has been on the deployment of machine learning technology in practice, particularly under the banner of data science. He is also the co-host of the Talking Machines podcast.

Title - Time for Professionalisation?

Abstract - Machine learning methods and software are becoming widely deployed. But how are we sharing expertise about bottlenecks and pain points in deploying solutions? In terms of the practice of data science, we seem to be at a similar point today as software engineering was in the early 1980s. Best practice is not widely understood or deployed. In this talk we will focus on two particular components of data science solutions: the preparation of data and the deployment of machine learning systems. 

Laurent Massoulié, Inria


Biography - Laurent Massoulié graduated from the Ecole Polytechnique, Palaiseau, France, in 1991, and received the Ph.D. degree in automatic control from Paris Sud University, Orsay, France, in 1995. He is a researcher at Inria where he leads the Microsoft Research-Inria Joint Centre, and a Professor at the Applied Mathematics Centre of Ecole Polytechnique. His research focuses on probabilistic modeling and design of algorithms for machine learning as well as “large networks,” including P2P and social networks. He has held positions with France Telecom R&D from 1995 to 1999, Microsoft Research, Cambridge, U.K., from 1999 to 2006, and Technicolor, Paris, France, from 2006 to 2012. Dr. Massoulié has served as Associate Editor of Queueing Systems: Theory and Applications from 2000 to 2006, the IEEE/ACM TRANSACTIONS ON NETWORKING in 2008, and the Stochastic Systems Journal from 2011 to the present. He has coauthored the Best Paper Award-winning papers of IEEE INFOCOM 1999, ACM SIGMETRICS 2005, and ACM CoNEXT 2007, been elected a Technicolor Fellow in 2011, and received the “Grand Prix Scientifique” from the Del Duca Foundation in 2017.

Title - Phase transitions on community detectability for various types of stochastic block models.

Abstract - In this talk we will survey available results and open questions on detectability of communities using polynomial-time algorithms for several variants of the stochastic block model (SBM). We will in particular consider degree-corrected SBMs and labelled SBMs, and discuss how the phase transition captured by the so-called Kesten-Stigum threshold in the classical case translates to these other two scenarios.
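For reference, in the classical symmetric two-community SBM on $n$ vertices, with within-community edge probability $a/n$ and between-community probability $b/n$, the Kesten-Stigum threshold has a simple closed form (the standard statement; the degree-corrected and labelled variants discussed in the talk require modified versions):

```latex
(a - b)^2 \;>\; 2\,(a + b).
```

Above this threshold, the communities can be detected in polynomial time; for two communities, below it, detection better than random guessing is impossible for any algorithm.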

Vahab Mirrokni, Google Research 


Biography - Vahab Mirrokni is a principal scientist, heading the algorithms research groups at Google Research, New York. The group consists of three main sub-teams: market algorithms, large-scale graph mining, and large-scale optimization. He received his PhD from MIT in 2005 and his B.Sc. from Sharif University of Technology in 2001. He joined Google Research in 2008, after spending a couple of years at Microsoft Research, MIT and Amazon.com. He is the co-winner of paper awards at KDD'15, ACM EC'08, and SODA'05. His research areas include algorithms, distributed and stochastic optimization, and computational economics. At Google, he mainly works on algorithmic and economic problems related to search and online advertising. Recently he has been working on online ad allocation problems, distributed algorithms for large-scale graph mining, and mechanism design for advertising exchanges.

Title - Distributed Graph Mining: Theory and Practice

Abstract - Mining huge graphs is an important part of any modern data mining platform. Dealing with web-scale graphs introduces a variety of challenges, both from a distributed optimization perspective and from a systems perspective. In this talk, I present techniques that allow us to perform clustering, balanced partitioning, and link prediction on graphs with tens of trillions of edges. Along the way, I will discuss systems such as MapReduce and distributed hash table services, and algorithmic techniques such as composable core-sets.
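To illustrate the composable core-set idea mentioned at the end: each machine summarizes its partition of the data with a small "core-set", and the union of the core-sets is then solved centrally. The sketch below uses greedy k-center as both the local and the global solver; it is a generic illustration of the pattern, not Google's implementation:

```python
import math

def greedy_k_center(points, k):
    """Greedy 2-approximation for k-center: repeatedly pick the point
    farthest from the centers chosen so far."""
    centers = [points[0]]
    while len(centers) < k and len(centers) < len(points):
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(far)
    return centers

def composable_core_set_k_center(partitions, k):
    """Compute a core-set per partition (the map phase), union them,
    then solve on the union (the reduce phase)."""
    union = [p for part in partitions for p in greedy_k_center(part, k)]
    return greedy_k_center(union, k)

partitions = [[(0.0, 0.0), (0.1, 0.0), (0.2, 0.0)],
              [(10.0, 10.0), (10.1, 10.0)],
              [(0.0, 10.0)]]
print(composable_core_set_k_center(partitions, 3))  # three centers, one per cluster
```

The same compose-then-solve pattern maps directly onto a single MapReduce round: core-sets in the map phase, the final solve in the reduce phase.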

Sofia Olhede, UCL


Biography - Sofia Olhede has been a professor of Statistics and an Honorary Professor of Computer Science at University College London since 2007. She was awarded her PhD in 2003 at Imperial College London, where she was a Lecturer (assistant professor) and then Senior Lecturer (associate professor) between 2002 and 2006. She is Director of UCL's Centre for Data Science. Sofia served on the UK Royal Society's Machine Learning Committee and the British Academy and Royal Society Data Governance Project, and is a member of the Personal Data and Individual Access Control section of the IEEE Global Initiative for Ethical Considerations in Artificial Intelligence and Autonomous Systems. She currently holds a European Research Council consolidator fellowship, and previously held a five-year UK Engineering and Physical Sciences Research Council Leadership Fellowship. Sofia founded and chairs the UCL 'Theory of Big Data' international workshop series, which attracts around 200 participants every year and is now in its third successful year.

Title - Network comparison.

Abstract - The topology of any complex system is key to understanding its structure and function. Fundamentally, algebraic topology guarantees that any system represented by a network can be understood through its closed paths. The length of each path provides a notion of scale, which is vitally important in characterizing dominant modes of system behavior. Here, by combining topology with scale, we prove the existence of universal features which reveal the dominant scales of any network. We use these features to compare several canonical network types in the context of a social media discussion which evolves through the sharing of rumors, leaks and other news. Our analysis enables for the first time a universal understanding of the balance between loops and tree-like structure across network scales, and an assessment of how this balance interacts with the spreading of information online. Crucially, our results allow networks to be quantified and compared in a purely model-free way that is theoretically sound, fully automated, and inherently scalable.
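The loop/tree balance the abstract refers to has a classical baseline: the number of independent loops in a graph (its first Betti number), which is zero exactly for forests. The talk's scale-dependent features refine this single number; the sketch below computes only the classical quantity:

```python
def cycle_rank(n_vertices, edges):
    """First Betti number E - V + C: the number of independent loops.
    C (the number of connected components) is found with union-find."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    components = len({find(v) for v in range(n_vertices)})
    return len(edges) - n_vertices + components

print(cycle_rank(4, [(0, 1), (1, 2), (2, 3)]))          # a path (a tree): 0
print(cycle_rank(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))  # a 4-cycle: 1
```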

Alexandre Proutiere, KTH Stockholm 


Biography - Alexandre Proutiere has been a professor in Automatic Control at KTH, Royal Institute of Technology, in Stockholm since 2011; his research interests lie in statistical inference for supervised, unsupervised, and reinforcement learning, and its applications. Before joining KTH, he was a permanent researcher at Microsoft Research in Cambridge (UK). He held an ERC consolidator grant from 2012 to 2017, and is one of the funded professors of WASP, the Wallenberg Autonomous System Program. He serves or has served on the editorial boards of IEEE Trans. on Information Theory, IEEE Trans. on Networking, and IEEE Trans. on Control of Networked Systems. He received the ACM Sigmetrics Rising Star Award in 2009, and best paper awards at ACM Sigmetrics and Mobihoc in 2004, 2009, and 2010. A. Proutiere graduated in mathematics from Ecole Normale Superieure (Paris) and qualified as an engineer from Telecom Paris Tech. He received his PhD in applied mathematics from Ecole Polytechnique (Palaiseau, France) in 2003.

Title - Online Learning of Optimally Diverse Rankings.

Abstract - Search engines answer users’ queries by listing relevant items (e.g. documents, songs, products, web pages, ...). They rely on algorithms that learn to rank items so as to present an ordered list containing the most relevant items. The main challenge in designing such algorithms stems from the fact that queries often have different meanings for different users. In absence of any contextual information about the query, one often has to adhere to the diversity principle, i.e., to return a list covering the various possible topics or meanings of the query. We formalize this learning-to-rank problem as an online combinatorial optimization problem; we propose fundamental performance limits satisfied by any algorithm, and devise LDR, an algorithm achieving these limits. Numerical experiments on artificial and real-world data illustrate the LDR optimality. The talk is also the opportunity to present recent advances in stochastic online optimization problems, and challenges ahead.  (Joint work with Stefan Magureanu)
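The diversity principle mentioned above can be made concrete with a small greedy sketch: given a guess of how likely each topic is to be the query's intended meaning, build the list by repeatedly adding the item that covers the most not-yet-covered probability mass. This greedy is a standard set-cover-style baseline shown for illustration only; LDR itself is an online learning algorithm with optimality guarantees, not this offline greedy, and all item and topic names below are made up:

```python
def greedy_diverse_list(coverage, topic_probs, length):
    """Greedily build a list maximising the probability that at least one
    item covers the user's intended topic (weighted maximum coverage)."""
    chosen, covered = [], set()
    for _ in range(min(length, len(coverage))):
        best = max((i for i in coverage if i not in chosen),
                   key=lambda i: sum(topic_probs[t] for t in coverage[i] - covered))
        chosen.append(best)
        covered |= coverage[best]
    return chosen

# Hypothetical items and the query meanings (topics) each one covers:
coverage = {"a": {"jazz"}, "b": {"news"}, "c": {"jazz", "sport"}}
probs = {"jazz": 0.5, "news": 0.3, "sport": 0.2}
print(greedy_diverse_list(coverage, probs, 2))  # → ['c', 'b']
```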

Tim Roughgarden, Stanford University 


Biography - Tim Roughgarden is a Professor of Computer Science and (by courtesy) Management Science and Engineering at Stanford University. He joined the Stanford faculty in 2004, following a PhD at Cornell and a postdoc at UC Berkeley. His research interests include the many connections between computer science and economics, as well as the design, analysis, applications, and limitations of algorithms. For his research, he has been awarded the ACM Grace Murray Hopper Award, the Presidential Early Career Award for Scientists and Engineers (PECASE), the Kalai Prize in Computer Science and Game Theory, the Social Choice and Welfare Prize, the Mathematical Programming Society's Tucker Prize, and the EATCS-SIGACT Gödel Prize. He was an invited speaker at the 2006 International Congress of Mathematicians, the Shapley Lecturer at the 2008 World Congress of the Game Theory Society, and a Guggenheim Fellow in 2017. His books include Twenty Lectures on Algorithmic Game Theory (2016) and Algorithms Illuminated (2017).

Title - Distribution-Free Models of Social and Information Networks.

Abstract - The mathematical study of social and information networks has historically centered around generative models for such networks (preferential attachment, the Chung-Lu random graph model, Kronecker graphs, etc.). This talk proposes distribution-free models of social and information networks --- novel classes of graphs that capture all plausible such networks. Our models are motivated by triadic closure, the property that vertices with one or more mutual neighbors tend to also be neighbors; this is one of the most universal signatures of social networks. We prove structural results on the clustering properties of such graphs, and give algorithmic applications to clustering and clique-finding problems.

Includes joint work with Jacob Fox, Rishi Gupta, C. Seshadhri, Fan Wei, and Nicole Wein. 
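Triadic closure can be quantified directly from a graph: count the wedges (two-edge paths) and the fraction of them that close into triangles. A minimal sketch with a toy adjacency structure (the models in the talk impose a lower bound on closure rates rather than measuring a single graph):

```python
from itertools import combinations

def closure_rate(adj):
    """Fraction of wedges (two-edge paths u-v-w) that close into triangles."""
    wedges = closed = 0
    for v, nbrs in adj.items():
        for u, w in combinations(sorted(nbrs), 2):
            wedges += 1
            if w in adj[u]:
                closed += 1
    return closed / wedges if wedges else 0.0

# A triangle {1, 2, 3} with a pendant vertex 4 attached to 3:
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(closure_rate(graph))  # 3 of the 5 wedges are closed: 0.6
```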

Day Two speakers - 28th March

Clement Calauzenes, Criteo


Biography - Clement Calauzenes is a Research Scientist Lead at Criteo Research. He received his PhD in Computer Science from Université Pierre et Marie Curie in 2013, advised by Prof. Patrick Gallinari. His current and past research focuses on learning-to-rank theory, recommender systems, counterfactual inference, and computational advertising. https://www.linkedin.com/in/clementcalauzenes/

Title - Revenue Maximizing Auctions: the Buyer Perspective

Abstract - Online advertising is currently the fastest-growing form of advertising, so a large effort has been put into studying revenue-maximizing auctions and their practical implementations. A crucial assumption is that buyers are "blind" to the repetition of auctions and behave as one-shot bidders. In online advertising, however, most bidders observe a large share of auctions, which grants them the ability to reason in expectation rather than one shot at a time.

It is therefore crucial to take the point of view of buyers reasoning in expectation, to understand whether they have an incentive to move away from truthful bidding, and to study potential equilibrium strategies as well as their impact on the bidders' payoffs and the seller's revenue.
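The seller's side of this tension is easy to make concrete. Under the classical one-shot (truthful) assumption, the revenue-maximizing reserve price comes from Myerson's virtual value (a textbook computation, independent of the talk's specific results): with value distribution $F$ and density $f$,

```latex
\phi(v) \;=\; v - \frac{1 - F(v)}{f(v)}, \qquad \phi(r^\ast) = 0;
\quad \text{for } v \sim \mathrm{Uniform}[0,1]:\;\; \phi(v) = 2v - 1,\;\; r^\ast = \tfrac{1}{2}.
```

A buyer who knows the seller will fit such a reserve to observed bids no longer has an incentive to reveal values truthfully, which is exactly the deviation the talk studies.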

Nimrod Priell, Facebook


Biography - Nimrod is a research scientist manager at Facebook. His team of researchers develops and applies state-of-the-art methodology in statistics, computational social science, and machine learning to understand and optimize Facebook's products and accelerate the company's success. Nimrod holds an M.S. in Mathematics from NYU and a B.S. in Mathematics from the Hebrew University of Jerusalem.

Title - Innovation at the frontiers of Data Science at Facebook.

Abstract - The term 'data science' today means different things to different people, with much recent attention directed to AI and deep learning. Nimrod will highlight how much broader the opportunities are: big data and scalable computation can transform practices well beyond classification and unsupervised learning, in disciplines as far-ranging as experimentation, observational inference, surveying, operations research and optimization, and network science. These fields are also experiencing a renaissance of methods, new problems, and novel solutions, meaning that a modern data scientist can be valuable by knowing more than just ML. Nimrod will discuss examples from the Core Data Science team at Facebook (research.fb.com/category/data-science/) showing how these techniques are used in practice.

Elizeu Santos-Neto, Google


Biography - Elizeu works on the application of statistical methods to characterize the search ads ecosystem and to design metrics that track the quality and value of ads from the users' and advertisers' standpoints. He also actively collaborates with academia on topics that include information retrieval, collaborative systems, and machine learning. Elizeu earned a PhD in Computer Engineering from the University of British Columbia (Vancouver, Canada), with a focus on the characterization and design of online peer production systems such as peer-to-peer networks, collaborative tagging communities, and online social networks.

Title - Search Ads Overview.

Abstract - How do we show relevant ads to users while taking into account the value to advertisers, publishers, and Google? In this talk, I will present a high-level overview of Search Ads and the challenges and questions that drive our work to keep users and advertisers happy.

Dean Straw, Proximity London 


Biography - Dean is responsible for a small team of analysts who work across several clients including P&G, VW, Ikea, Specsavers, and Virgin Atlantic. Dean has been at Proximity for twelve years; prior to this he worked for the Home Office and Legal & General. His role at Proximity is to maximise the potential of his clients' data by discovering new and innovative analytical solutions to solve business problems.

Title - Using data analysis to inform marketing challenges – an agency view. 

Abstract - Proximity is one of the UK’s most successful customer engagement agencies. Our unique approach to problem solving, Creative Intelligence, combines the power of data and creativity to deliver highly effective communications programmes for our clients.

The presentation will describe some of the challenges our clients typically face and demonstrate how we support them, using data analytics to:

  • Investigate and reveal the influences on customer behaviour
  • Drive the relationship between the brand and its customers to the next level
  • Improve the efficiency of marketing programmes

The heads of the agency's analytics teams, Dean Straw and Dawn Mills, will present case studies from our portfolio of clients.

Ashish Umre, XL Catlin


Biography - Ashish is responsible for developing XL Catlin's strategic vision for, and participation in, AI and emerging technologies, partnering with startups to experiment with and realise innovative business solutions, new business models, and commercial opportunities.

Ashish previously led the Advanced Analytics & Data Science Practice at Tesco, supporting initiatives in predictive analytics, AI/machine learning, and behavioural analytics for marketing, loyalty and personalisation, supply chain, distribution, forecasting, and routing optimisation, as well as agile product management, driving maturity in product analytics and customer experience both in-store and online.

Ashish has a long-standing academic research background from his Masters and Doctoral research in artificial intelligence, machine learning, neuroscience, statistics, and sociobiology. He has been a mentor and advisor to many startups and continues to serve on the advisory boards of some of these companies.

Title - Mummy, what's a steering wheel? And other stories from insurance.

Abstract - A perspective on the challenges faced by the commercial and specialty insurance sector, and on how insurers can leverage internal and external data, along with emerging technologies, to understand complex risk in an uncertain world.

Jim Webber, Neo4j


Biography - Dr. Jim Webber is Chief Scientist at the leading graph database Neo4j, where he works on R&D for highly scalable graph databases and writes open source software. Jim has written two books on integration and distributed systems: "Developing Enterprise Web Services" on XML Web Services and "REST in Practice" on using the Web for building large-scale systems. His most recent book is "Graph Databases", which focuses on the Neo4j database. His blog is located at http://jimwebber.org and he tweets often @jimwebber.

Title - Eventual Consistency Will Ruin Your Graphs, Eventually.

Abstract - Graph data has made a huge impact in computing systems, and network science is the new wave of data science. But many databases we might use to store and query graphs were not designed with this use-case in mind and struggle with correctness. Given the importance of connected data to the modern world, we should understand the dependability qualities of so-called non-native graph databases so that we can be aware of when our tools produce garbage, often without warning or detection. 

This talk will present early results from an investigation into eventually consistent graph databases. It will informally explain the domain, typical database designs, and present a probabilistic model for reasoning about eventual corruption in graph data under no-fault operation. What’s more alarming is that the time to garbage is low, such that it impacts any reasonable production database. 

Our early results suggest that eventual consistency is not strong enough to support dependable graph databases and that users should be cautious about the quality of their graph data.
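The "time to garbage is low" claim is easy to appreciate with a toy back-of-the-envelope model (an illustration of the arithmetic only, not the probabilistic model from the talk and not any specific database's design): suppose each edge is stored as two half-edges, one per endpoint record, and an eventually consistent write occasionally lands on only one side, leaving a dangling half-edge.

```python
import random

def corruption_probability(n_edges, p_partial, trials=5_000, seed=0):
    """Monte Carlo estimate of P(at least one dangling half-edge), where each
    of n_edges writes independently ends up half-applied with probability
    p_partial."""
    rng = random.Random(seed)
    bad = sum(
        any(rng.random() < p_partial for _ in range(n_edges))
        for _ in range(trials)
    )
    return bad / trials

# Even a one-in-ten-thousand partial write makes corruption likely at scale;
# analytically P = 1 - (1 - p)^E, which tends to 1 as the edge count grows.
print(corruption_probability(1_000, 1e-4))  # close to 1 - (1 - 1e-4)**1000 ~ 0.10
```

Since the probability of staying clean decays as (1 - p)^E, even a tiny partial-write rate drives a large graph to a corrupted state almost surely, which is the intuition behind the dependability concern above.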

Edin Zajmovic, Thomson Reuters


Biography - Edin Zajmovic is Director of a new Thomson Reuters program that aims to help Financial & Risk customers with their big-data challenges. He heads product strategy for a set of new and innovative solutions collectively called big, open, linked data solutions. Prior to this role, Edin was the EMEA Head of Investment Management product lines for Thomson Reuters, managing business and product strategy for asset management product lines serving investment management professionals in the EMEA region. Prior to joining Thomson Reuters in 2014, he held various roles, primarily in commercial management, as head of global and UK strategic accounts at the financial software provider FactSet Research Systems. Before this, he ran account management for FactSet's large US Northeast investment management firms and was based in Boston, USA, where he started his career as a consultant with FactSet Research Systems. He has a Master's degree in Finance/Economics and a BA in Finance and Entrepreneurship.

Title - Optimization of NLP & Knowledge Graphs for Capital markets analysis.

Abstract - Against Thomson Reuters' strong background in data analytics and solutions, Edin will discuss key themes in big data in the finance sector, covering topics such as the big data market, the evolution of TR text data, NLP and linked data solutions, and case studies and examples of Big, Open, Linked Data (BOLD) solution applications.